Failed to process [...] because [...] is an exact copy and already exists

Post by **plittlefield** » 27 Aug 2018, 12:17

I already had 4 episodes of a TV show, but when I try to copy the remaining episodes and skip the existing ones, it stops and fails...

 filebot --action copy --conflict skip -rename --format "{n} {s00e00}" --output "/home/paully/Plex/TV/House" "/home/paully/Downloads/Videos/ToDo/House MD Season 1, 2, 3, 4, 5, 6, 7 & 8 + Extras DVDRip TSV/Season 1/" -non-strict
Rename episodes using [TheTVDB]
Auto-detected query: [House MD]
Fetching episode data for [House]
Processed 0 files
Failed to process [/home/paully/Downloads/Videos/ToDo/House MD Season 1, 2, 3, 4, 5, 6, 7 & 8 + Extras DVDRip TSV/Season 1/House MD Season 1 Episode 01 - Pilot.avi] because [/home/paully/Plex/TV/House/House S01E01.avi] is an exact copy and already exists
Failure (°_°)

I have tried all 3 conflict options (auto, override and skip) and each one results in the same "already exists" error.

In the end, I deleted the 4 episodes I already had, and then ran the same command line without the override option and it completed OK, but it still does not answer the problem, or maybe I did something wrong?

Thanks,

Paully

Post by **rednoah** » 27 Aug 2018, 17:19

This warning is completely unrelated to the --conflict option:

Code: Select all

Failed to process [...] because [...] is an exact copy and already exists

This error message means that you're somehow processing the exact same file twice. It is basically a sanity check to make sure you're not doing something strange, and it prevents you from accidentally processing the same files over and over again. If you repeatedly run FileBot on the same files, you need to use some sort of logic to call FileBot only on files that haven't been processed with FileBot before.
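
One way to sketch that kind of "only new files" logic in bash is to keep a plain-text list of source paths that have already been handled. This is only an illustration: the function names `list_unprocessed` and `process_new` and the list file are hypothetical, not part of FileBot, and the FileBot flags in the usage comment are simply copied from the first post.

```shell
# list_unprocessed DIR LIST
# Print every file under DIR whose path is not yet recorded in LIST
# (LIST is a plain-text file with one already-processed path per line).
list_unprocessed() {
  local dir=$1 list=$2 f
  touch "$list"
  find "$dir" -type f | sort | while IFS= read -r f; do
    # -F: fixed string, -x: whole line must match, -q: quiet
    grep -Fxq -- "$f" "$list" || printf '%s\n' "$f"
  done
}

# process_new DIR LIST COMMAND [ARGS...]
# Run COMMAND once per unprocessed file (file path appended as the last
# argument) and record the file in LIST only if COMMAND succeeded.
process_new() {
  local dir=$1 list=$2 f
  shift 2
  list_unprocessed "$dir" "$list" | while IFS= read -r f; do
    if "$@" "$f" < /dev/null; then
      printf '%s\n' "$f" >> "$list"
    fi
  done
}

# Usage (assumption: flags taken from the command in the first post):
# process_new "/home/paully/Downloads/Videos/ToDo" "$HOME/.filebot-processed.txt" \
#   filebot --action copy --conflict skip -rename --format "{n} {s00e00}" \
#   --output "/home/paully/Plex/TV/House" -non-strict
```

Files that fail are not recorded, so they are retried on the next run; whether that is desirable depends on why they failed.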

Post by **plittlefield** » 28 Aug 2018, 10:02

It seems strange how the

Code: Select all

--action test

option (which I ran first) works just fine and says that 4 files will be skipped!?!

It also seems strange how it DID work when I deleted the files that SHOULD have been skipped.

So, is there a 'clear memory' option, so that I can run the command and do this next time I need to... or will I have to delete any files that are already there again?

Sorry, I am slightly confused

Paul

Post by **rednoah** » 28 Aug 2018, 10:50

1. --action test doesn't access any files, so there are no IO errors, no sanity checks, no --conflict actions, etc.

2. If you delete the file that should have been skipped, either source or destination, then FileBot can't tell that your way of doing things is not ideal, because it'll look as if you're processing the file for the first time.

3. There is no way to clear memory, because the destination file is the memory. We shouldn't be talking about how we can trick FileBot into letting you do things you probably shouldn't. We should rather have a look at what you are doing, what you are trying to achieve, why you are doing things the way you are doing them, and then maybe find a better solution or approach.

I see many issues with the command you posted. Most notably, you seem to be organizing for Plex, but completely ignore the Plex naming standard in multiple ways (series folder name does not match series name, no season folder, no title in episode, etc).

Maybe you just want to process recently added files? How come you process the same files again in the first place?

Post by **plittlefield** » 28 Aug 2018, 15:14

Hey, thanks for the detailed reply

I have acquired files in one folder and wanted to copy them to the Plex folder with the naming convention that my usual command line FileBot + Transmission script works with and which Plex is happy with (at least it works for me!)

This time, because the folder size was HUGE, I had started to copy the files from Downloaded folder A to Plex folder B by hand and did 4 of them:

Code: Select all

cp -av /path/to/downloaded/file.mkv "/path/to/Plex/House/House S01E01.mkv"

Then, I suddenly thought I can save some time by using FileBot to rename them and copy them at the same time.

It worked, and Plex is happy, but FileBot stumbled on the 4 files I had already copied by hand.

Like I say, I normally use a bigger BASH command line with the script:amc and ut_title etc which works every time.

This one time I wanted to do it "by hand" and use FileBot (4.8) to copy the files for me...

...see what I mean?

Thanks,

Paully

Post by **rednoah** » 29 Aug 2018, 04:19

1. I see. So normally this wouldn't be an issue then, since you can always just use FileBot the first time around, without copying a few files manually first.

2. The amc script works better for incremental tasks because it keeps track of which files have or have not been processed yet. If you already have a Transmission post-process script, you can actually just use that, so that automated calls and manual calls do exactly the same thing.

e.g.

Code: Select all

export TR_TORRENT_DIR="/path/to/downloaded"
export TR_TORRENT_NAME="file.mkv"
./transmission-postprocess.sh

You can make your own helper script / function to do that in a single command.
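
A minimal sketch of such a helper in bash, assuming the post-process script reads `TR_TORRENT_DIR` and `TR_TORRENT_NAME` as in the example above. The function name `tr_process` and its optional second argument are hypothetical.

```shell
# tr_process FILE [SCRIPT]
# Call the Transmission post-process script for a single file, setting the
# same environment variables that Transmission itself would set.
# SCRIPT defaults to ./transmission-postprocess.sh (as in the example above).
tr_process() {
  local file=$1 script=${2:-./transmission-postprocess.sh} dir name
  # Resolve the containing folder to an absolute path; give up if it is missing.
  dir=$(cd "$(dirname -- "$file")" && pwd) || return 1
  name=$(basename -- "$file")
  # The variables are set for this one call only and do not leak into the shell.
  TR_TORRENT_DIR=$dir TR_TORRENT_NAME=$name "$script"
}

# Usage:
# tr_process /path/to/downloaded/file.mkv
```

Setting the variables as a command prefix instead of using `export` keeps the interactive shell clean between calls.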

Post by **plittlefield** » 29 Aug 2018, 06:47

Aaahh, nice... exported variables!

I shall try and report back later.

Thanks,

Paully

Post by **cgacord** » 06 Dec 2018, 14:04

When you use --conflict skip, shouldn't filebot see there is a conflict at the destination, stop processing the file, and then move on to the next file for processing?

For some reason, when I run into any conflicts, filebot reports the conflict and then quits, not processing any more files.

Is there an option that will let filebot continue after failing due to a conflict?

Thanks

Post by **rednoah** » 06 Dec 2018, 14:30

Yes, this is typically the result if you process the same files over and over, for no good reason, abusing limited resources and breaking things for everyone.

Failed to process [...] because [...] is an exact copy and already exists indicates that there's something seriously broken with how you do things, and instead of working anyway, it forces you to fix things and do it right.

What exactly are you doing?

Why are you processing the exact same file into the exact same path more than once?

Why not only process new files?

Post by **CUSTOMER** » 14 May 2020, 21:02

Well, what a disappointing answer from the owner... In case you didn't notice, we are customers and we paid for these resources. We paid for software that fails to deliver a service. The job is pretty simple: add an option to move those files away. You are asked again and again to add this option, but you are still not hearing this: your customers are unhappy about it. You claim it's easy to find software that does this job... it's actually easier to find people complaining about this error, with no practical answer from you other than "get rid of your dupes dude".

Tell me about wasting resources ... you think guys that are launching again and again the same script trying to figure out why the hell this software cannot simply move a file away are not wasting your precious resources ? Instead of doing the job once and never see those files again ?

Go ahead, delete this comment and think again.

Post by **rednoah** » 15 May 2020, 06:33

CUSTOMER wrote: 14 May 2020, 21:02 Tell me about wasting resources ... you think guys that are launching again and again the same script trying to figure out why the hell this software cannot simply move a file away are not wasting your precious resources ? Instead of doing the job once and never see those files again ?

The error message is quite clear. What exactly did you find confusing about it? What was your thought process when you read the error message? Since we're simply moving and renaming files, how come you've got the same file twice?

CUSTOMER wrote: 14 May 2020, 21:02 Go ahead, delete this comment and think again.

Why would I do that? These kinds of posts are rare, and definitely something we wanna keep around for posterity.

Well...

Are you looking for a technical solution, or do you just need an outlet to complain a little bit? Because personally, I'd be somewhat more interested in the former. The latter doesn't really do much for me if it doesn't include technical information or new arguments.

If you're looking for improved support for your particular use case, then please describe and explain your particular use case, so we can keep that in mind when adding these kinds of sanity checks, because whatever it is you're doing, it's really not as common as you might think, and depending on what exactly it is, there's usually an easy workaround:

rednoah wrote: 06 Dec 2018, 14:30 Failed to process [...] because [...] is an exact copy and already exists indicates that there's something seriously broken with how you do things, and instead of working anyway, it forces you to fix things and do it right.

What exactly are you doing?

Why are you processing the exact same file into the exact same path more than once?

Why not only process new files?

This is from 2 years ago. Answers would nevertheless be much appreciated.

Alternatively, you may want to consider using the GUI application, since that'll make everything straight-forward and leave little room for user error.

Post by **unpressurized** » 04 Jun 2020, 15:01

rednoah wrote: 15 May 2020, 06:33 Why are you processing the exact same file into the exact same path more than once?

I download individual episodes as they air, then later download a "pack" of the whole season. That pack will sometimes contain some of the files I've already downloaded. The reason to do this is consistent quality - maybe only 720p was available at the time for some episodes, but the pack will have 1080p for all episodes. Or only x264 was available for some episodes and x265 for others, but now the pack has x265 for all episodes.

This means quite often I've downloaded a file, processed it into place, then months later I'm downloading the same file and trying to process it into the same place. I try to avoid re-downloading the same file twice, but bandwidth is cheaper than my time and attention.

The default behavior of FileBot is good. People should examine their processes. However, I've examined my process and I still want the option of getting rid of the source file if it's a duplicate - either deleting it, or moving it to an alternate directory. Moving it is probably safer. Something like:

Code: Select all

--move-duplicates=/home/media/duplicates/

If duplicates were moved to another directory, then I could examine them later at my leisure, without stopping my automation. I want this for both "is an exact copy and already exists" and for "Skipped because already exists".

Alternatively, if FileBot simply exited with a different error code for "is an exact copy and already exists" and for "Skipped because already exists", then I could do my own deletion/moving via scripting. I'd have to invoke FileBot separately for each individual file, but again, CPU time is cheaper than my attention.

Post by **rednoah** » 04 Jun 2020, 16:27

unpressurized wrote: 04 Jun 2020, 15:01 Alternatively, if FileBot simply exited with a different error code for "is an exact copy and already exists" and for "Skipped because already exists", then I could do my own deletion/moving via scripting. I'd have to invoke FileBot separately for each individual file, but again, CPU time is cheaper than my attention.

You could eliminate binary duplicates as a pre-processing step:
viewtopic.php?p=23171#p23171

e.g.

Code: Select all

filebot -script fn:duplicates --mode binary /input /output -rename --output /duplicates --format {fn}

Post by **unpressurized** » 06 Jun 2020, 23:02

rednoah wrote: 04 Jun 2020, 16:27 You could eliminate binary duplicates as a pre-processing step:
viewtopic.php?p=23171#p23171

e.g.
Code: Select all
filebot -script fn:duplicates --mode binary /input /output -rename --output /duplicates --format {fn}

Long story short, I couldn't get the duplicates script to work even after enabling the .xattr directories and making sure both source and destination had filled .xattr directories (by moving them to a temporary directory, then renaming them). The only message I ever got was "Done ヾ(＠⌒ー⌒＠)ノ". I then tried to modify renall.groovy or some other script to try and add error-checking to the script (try to rename, and if that fails, then rename to another directory), which also didn't get far.

Then I saw that the error messages in the log file include the source filename. I could just parse the log file after FileBot runs!

So, here are my three lines of shell scripting that solve the issue for me. They move matches and exact matches to their own directories after FileBot runs. You need "--log-file /tmp/filebot.log" in your FileBot invocation.

Code: Select all

grep '^Skipped \[' /tmp/filebot.log | sed 's/^[^\[]*\[\([^]]*\)\].*$/\1/' | while IFS= read -r line; do mv -v -- "${line}" /media/incoming/dupes/; done
grep '^Failed to process \[' /tmp/filebot.log | sed 's/^[^\[]*\[\([^]]*\)\].*$/\1/' | while IFS= read -r line; do mv -v -- "${line}" /media/incoming/exactdupes/; done
mv /tmp/filebot.log /tmp/filebot.log.bak

FileBot still fails at the first exact match, but you can just re-run it, or invoke FileBot for each file from a script. Renaming the log is necessary in order to not re-process the same files over and over.

Actually after writing this up, I realized I don't even need sed; cut will do:

Code: Select all

grep '^Skipped \[' /tmp/filebot.log | cut -d \] -f1 | cut -d \[ -f2 | while IFS= read -r line; do mv -v -- "${line}" /media/incoming/dupes/; done
grep '^Failed to process \[' /tmp/filebot.log | cut -d \] -f1 | cut -d \[ -f2 | while IFS= read -r line; do mv -v -- "${line}" /media/incoming/exactdupes/; done
mv /tmp/filebot.log /tmp/filebot.log.bak

This is not a good solution since it can break if the log format is ever changed, but it solves the problem for me so I'm sharing it. I'd still rather that FileBot had the ability to move duplicates as they are discovered.

Post by **rednoah** » 07 Jun 2020, 04:27

unpressurized wrote: 06 Jun 2020, 23:02 Long story short, I couldn't get the duplicates script to work even after enabling the .xattr directories and making sure both source and destination had filled .xattr directories (by moving them to a temporary directory, then renaming them).

If you're looking into xattr, then you're already on the wrong path. You'll want to look for binary duplicates (same bytes) and not logical duplicates (same movie id as per xattr metadata) and remove them from the input folder beforehand.

e.g.

Code: Select all

$ filebot -script fn:duplicates --mode binary OUTPUT INPUT -rename --output DUPLICATES --format '{fn}'
[*] [39513879, f5655e0cc6906599, DC33B050]
[+] 1. ...\OUTPUT\dts_animated_logo_lossless-DWEU.mkv
[-] 2. ...\INPUT\dts_animated_logo_lossless-DWEU.mkv
1 duplicates
Rename files using [Plain File]
[MOVE] from [...\INPUT\dts_animated_logo_lossless-DWEU.mkv] to [...\DUPLICATES\dts_animated_logo_lossless-DWEU.mkv]
Processed 1 files

--mode binary was introduced recently, so you'll probably need FileBot 4.9.1 or higher for this to work.

Post by **unpressurized** » 11 Jun 2020, 20:16

rednoah wrote: 07 Jun 2020, 04:27 If you're looking into xattr, then you're already on the wrong path. You'll want to look for binary duplicates (same bytes) and not logical duplicates (same movie id as per xattr metadata) and remove them from the input folder beforehand.

Well, I also want to deal with logical duplicates, but your post did make me check and it turns out that I only had 4.8.5, so I upgraded to 4.9.1 and tried again.

Code: Select all

/opt/filebot/filebot.sh -script fn:duplicates --mode binary /media/incoming /media/en/serials -rename --output /media/exactdupes --format "{n}{' ('+y+')'}/{episode.special ? 'Specials' : 'Season '+s.pad(2)}/{n} - {episode.special ? 'S00E'+special.pad(2) : s00e00} - {t.replaceAll(/[\`´‘’ʻ]/, /'/).replaceAll(/[!?.]+$/)}{' '+source}{' '+vf}{' '+vc}{' '+ac}{'.'+lang}"

/media/incoming had exactly one file, which I copied from my media library to make sure it was an exact duplicate.

I started FileBot and then walked away for a bit and came back, and it was still running. So I re-ran it with 'truss' (like 'strace') to see what it was doing. FileBot was going through each media file in my library, checking for an extended attribute named "CRC32" (won't work because I'm using ZFS), then opening and reading the file (to generate a checksum?), then trying and failing to set that same "CRC32" extended attribute.

Code: Select all

30658: extattr_get_file("/media/en/serials/TVSHOW (2016)/Season 2/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv",EXTATTR_NAMESPACE_USER,"CRC32",0x0,0) ERR#87 'Attribute not found'
30607: openat(AT_FDCWD,"/media/en/serials/TVSHOW (2016)/Season 2/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv",O_RDONLY,00) = 76 (0x4c)
30672: extattr_set_file("/media/en/serials/TVSHOW (2016)/Season 2/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv",EXTATTR_NAMESPACE_USER,"CRC32","165031EE",8) = 8 (0x8)

My library has 71 TB across 81,544 files. It would take days to checksum my library. Worse, without extended attributes, it would take days every time I ran this script!

I put "net.filebot.xattr.store=.xattr" back in my system.properties, and sure enough it started creating .xattr directories to store this CRC32 extended attribute:

Code: Select all

31046: openat(AT_FDCWD,"/media/en/serials/TVSHOW (2016)/Season 2/.xattr/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv/CRC32",O_RDONLY,00) ERR#2 'No such file or directory'
31046: stat("/media/en/serials/TVSHOW (2016)/Season 2/.xattr/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv",0x7fffdfff8e58) ERR#2 'No such file or directory'
31046: mkdir("/media/en/serials/TVSHOW (2016)/Season 2/.xattr/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv",0777) ERR#2 'No such file or directory'
31046: access("/media/en/serials/TVSHOW (2016)/Season 2/.xattr",F_OK) ERR#2 'No such file or directory'
31046: mkdir("/media/en/serials/TVSHOW (2016)/Season 2/.xattr",0777) = 0 (0x0)
31046: mkdir("/media/en/serials/TVSHOW (2016)/Season 2/.xattr/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv",0777) = 0 (0x0)
31046: openat(AT_FDCWD,"/media/en/serials/TVSHOW (2016)/Season 2/.xattr/TVSHOW (2016) - s02e01 - EPISODETITLE 1080p WEB-DL AVC.mkv/CRC32",O_WRONLY|O_CREAT|O_TRUNC,0666) = 76 (0x4c)

To me it looks like the only sane way to run this script would be to enable extended attributes, so it can store the "CRC32" attribute.

Anyways, I don't want to checksum my whole library, nor do I want to check for duplicates within my library. I purposefully hard-link files when the file has multiple languages, which would look like duplicates.

I just want to check the destination filepath. If FileBot knows that '/media/incoming/TVSHOW - S31E22 - EPISODETITLE 1080p x264 AAC.mkv' is going to end up as '/media/en/serials/TVSHOW (1989)/Season 31/TVSHOW - S31E22 - EPISODETITLE 1080p x264 AAC.mkv' based on the --format parameter, then there's no need to check any other file.

Do FileBot scripts have access to the destination file path without actually performing the rename?

Post by **rednoah** » 12 Jun 2020, 05:59

Well, --mode binary wouldn't typically be expected to read all files, because the file size, or differences in the first few bytes of the file, are typically enough to tell that two files are not the same, i.e. most files are not read at all, some files are read a little bit, and (almost) only actual duplicates would be expected to be read completely.
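
The "cheap checks first" idea described above can be sketched with standard tools. This is only an illustration of the principle, not FileBot's actual implementation, and the function name `same_bytes` is hypothetical.

```shell
# same_bytes A B
# Succeed only if A and B are byte-for-byte identical, trying the cheapest
# check first.
same_bytes() {
  local a=$1 b=$2 size_a size_b
  # Step 1: files of different size can never be binary duplicates.
  # The arithmetic expansion strips the padding some wc versions print.
  size_a=$(($(wc -c < "$a")))
  size_b=$(($(wc -c < "$b")))
  [ "$size_a" -eq "$size_b" ] || return 1
  # Step 2: cmp stops reading at the first differing byte, so two files are
  # read completely only when they really are identical.
  cmp -s "$a" "$b"
}

# Usage:
# same_bytes /input/file.mkv /output/file.mkv && echo "binary duplicate"
```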

However, if all your files are hardlinks, and if all those hardlinks are present in both input and output folders, then --mode binary will indeed not work well for your particular use case.

But since you know all your files are hardlinks, checking for duplicates is even easier, just check the file id, and only process files where the file id doesn't already exist in the destination folder, and then instead of passing an entire folder to the filebot command, you just pass a set of files as arguments (only the files you want, excluding the files you don't want).

e.g. find all files from /input that have exactly one hardlink (neatly ignoring already hardlinked files) and then pass them on to a filebot call:

Code: Select all

find /input -type f -links 1 -exec filebot -script fn:sysenv {} +

TL;DR if you consistently use hardlinks, then finding hardlink duplicates with standard tools like find will be exceedingly simple and straight-forward, fast too, since the file system itself already keeps track of everything for normal operation.
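
As a rough sketch of the "check the file id" idea in bash: the helper names `is_linked_into` and `new_inputs` are hypothetical, and `-samefile` is a GNU/BSD `find` extension rather than POSIX, so this assumes one of those implementations.

```shell
# is_linked_into FILE DIR
# Succeed if some file under DIR shares FILE's inode, i.e. is a hard link
# to it. Note that FILE matches itself if it lives under DIR.
is_linked_into() {
  local file=$1 dir=$2
  [ -n "$(find "$dir" -type f -samefile "$file" 2>/dev/null | head -n 1)" ]
}

# new_inputs INPUT DEST
# Print the files under INPUT that have no hard link anywhere under DEST.
new_inputs() {
  local input=$1 dest=$2 f
  find "$input" -type f | sort | while IFS= read -r f; do
    is_linked_into "$f" "$dest" || printf '%s\n' "$f"
  done
}

# Usage:
# new_inputs /input /output
```

This walks the destination once per input file, so on a very large library the `-links 1` filter shown above is much cheaper whenever every processed file is known to be hard-linked.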

EDIT:

The duplicates script has been enhanced with hardlink awareness, though FileBot 4.9.2 (r7667) is required for that to work:
viewtopic.php?p=23171#p23171

Post by **cheaters** » 03 Oct 2020, 21:42

rednoah wrote: 07 Jun 2020, 04:27
If you're looking into xattr, then you're already on the wrong path. You'll want to look for binary duplicates (same bytes) and not logical duplicates (same movie id as per xattr metadata) and remove them from the input folder beforehand.

e.g.
Code: Select all
$ filebot -script fn:duplicates --mode binary OUTPUT INPUT -rename --output DUPLICATES --format '{fn}'
[*] [39513879, f5655e0cc6906599, DC33B050]
[+] 1. ...\OUTPUT\dts_animated_logo_lossless-DWEU.mkv
[-] 2. ...\INPUT\dts_animated_logo_lossless-DWEU.mkv
1 duplicates
Rename files using [Plain File]
[MOVE] from [...\INPUT\dts_animated_logo_lossless-DWEU.mkv] to [...\DUPLICATES\dts_animated_logo_lossless-DWEU.mkv]
Processed 1 files
--mode binary was introduced recently, so you'll probably need FileBot 4.9.1 or higher for this to work.

I will ask this here since it seems an appropriate spot, but will start a new thread if someone thinks that would be better.

How would mode binary work for video tracks that are exact duplicates but their respective audio tracks are different - one being dubbed and one not. I think when filebot checks for binary duplicates it's looking at the video track and not the audio track?

I have seen that two movies will have the same crc32 while having different audio tracks.

Post by **rednoah** » 04 Oct 2020, 02:48

jprokos wrote: 03 Oct 2020, 21:42 How would mode binary work for video tracks that are exact duplicates but their respective audio tracks are different - one being dubbed and one not. I think when filebot checks for binary duplicates it's looking at the video track and not the audio track?

Modifying the file in any way whatsoever, even a single bit flip, will make the file different as far as --mode binary is concerned.

jprokos wrote: 03 Oct 2020, 21:42 I have seen that two movies will have the same crc32 while having different audio tracks.

That would be extremely unlikely. I'd start by ruling out human error by comparing the files byte by byte to find out if there are any bytes that are indeed different.
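
A byte-by-byte comparison like that can be done with the standard `cmp` tool. The small wrapper below is just a sketch, and the name `count_differing_bytes` is hypothetical.

```shell
# count_differing_bytes A B
# Print how many byte positions differ between A and B. cmp -l prints one
# line per differing byte; if the files differ in length, only the common
# prefix is compared and cmp reports the early EOF on stderr (hidden here).
count_differing_bytes() {
  cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Usage:
# count_differing_bytes movie-dubbed.mkv movie-original.mkv
# Plain `cmp A B` is enough if you only want the offset of the first difference.
```

A result of 0 for two files of equal size means they are identical, which is what would be expected if their checksums really match.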

Post by **rednoah** » 27 Dec 2022, 12:17

Re-organize previously organized files using local xattr metadata
