Filebot excluding files that have not been processed

edifus · Post by **edifus** » 11 Aug 2020, 14:59

❱ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"

❱ filebot -version
FileBot 4.9.1 (r7372) / OpenJDK Runtime Environment 11.0.8 / Linux 5.4.0-42-generic (amd64)

Trying to run Filebot and getting an error that an exact copy of the file already exists after I renamed the source file. I am running with the '--conflict override' option and it usually will overwrite an existing hardlink with no issues, even when it's the same exact file, and I've never seen this error before. Regardless, I can remove the existing hardlink and it will process the file again. The real issue is that this error causes Filebot to exit and all the files it was going to process are now added to the exclude file, but they haven't been processed.

Code: Select all

Run script [fn:amc] at [Tue Aug 11 10:40:55 EDT 2020]
Parameter: excludeList = /home/edifus/logs/tv.exclude.log
Parameter: ignore = sample[.]
Parameter: artwork = n
Parameter: extras = n
Parameter: skipExtract = y
Parameter: unsorted = n
Parameter: minFileSize = 0
Parameter: minLengthMS = 0
Parameter: seriesFormat = {n}/{episode.special ? 'Specials' : 'Season '+s.pad(2)}/{n.space('.').replaceAll(/[!?,\']+/)}.{episode.special ? 'S00E'+special.pad(2) : s00e00}.{t.space('.').replaceAll(/[!?,\']+/).replacePart('Part.$1')}{'.'+vf}{'.'+source.replace('WEBDL', 'WEB-DL').replace('WEBRIP', 'WEBRip').replace('BLURAY', 'BluRay')}{'.'+ac.replace('MPEG Audio', 'MP3').replace('AC3', 'DD')}{audio.FormatProfile =~ /MA Core/ ? '-HD.MA' : ''}{audio.FormatProfile =~ /ES/ ? '-ES' : ''}{audio.FormatProfile =~ /Pro/ ? '-Pro' : ''}{channels}{'.'+vc.replace('AVC', 'H.264')}{'-'+group}
Argument[0]: /data/downloads/tv
Use excludes: /home/edifus/logs/tv.exclude.log (848)
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E01.Youre.the.Indian.Now.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E02.Freight.Trains.and.Monsters.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E03.An.Acceptable.Surrender.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E04.Going.Back.to.Cali.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E05.Cowboys.and.Dreamers.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E06.All.for.Nothing.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E07.The.Beating.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
Input: /data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E08.I.Killed.a.Man.Today.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
xattr: [Yellowstone.2018.S03E01.Youre.the.Indian.Now.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv] => [Yellowstone (2018) - 3x01 - You're the Indian Now]
Group: {Series=yellowstone 2018} => [Yellowstone.2018.S03E01.Youre.the.Indian.Now.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E02.Freight.Trains.and.Monsters.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E03.An.Acceptable.Surrender.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E04.Going.Back.to.Cali.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E05.Cowboys.and.Dreamers.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E06.All.for.Nothing.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E07.The.Beating.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv, Yellowstone.2018.S03E08.I.Killed.a.Man.Today.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv]
Rename episodes using [TheTVDB] with [Airdate Order]
Auto-detected query: [Yellowstone (2018)]
Fetching episode data for [Yellowstone (2018)]
Fetching episode data for [Yellowstone]
Fetching episode data for [Yellowstone Live]
Fetching episode data for [Wild Yellowstone]
Processed 0 files
CmdlineException: Failed to process [/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E01.Youre.the.Indian.Now.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv] because [/data/plex/tv/Yellowstone (2018)/Season 03/Yellowstone.(2018).S03E01.Youre.the.Indian.Now.1080p.AMZN.WEB-DL.EDD2.0.H.264-NTb.mkv] is an exact copy and already exists [Last-Modified: Tue Aug 11 10:13:40 EDT 2020]
Finished without processing any files
Abort (×_×)

Now my 'tv.exclude.log' file has all the aborted files and they won't be processed unless I remove them from the exclude file.

Code: Select all

cat ~/logs/tv.exclude.log
...
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E01.Youre.the.Indian.Now.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E02.Freight.Trains.and.Monsters.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E03.An.Acceptable.Surrender.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E04.Going.Back.to.Cali.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E05.Cowboys.and.Dreamers.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E06.All.for.Nothing.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E07.The.Beating.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv
/data/downloads/tv/Yellowstone.2018.S03.AMZN.WEB-DL.DDP2.0.H.264-NTb/Yellowstone.2018.S03E08.I.Killed.a.Man.Today.1080p.AMZN.WEB-DL.DDP2.0.H.264-NTb.mkv

If Filebot aborts and the files are not processed, the files shouldn't be added to the exclude list.

Post by **rednoah** » 11 Aug 2020, 15:30

1.

edifus wrote: ↑11 Aug 2020, 14:59 If Filebot aborts and the files are not processed, the files shouldn't be added to the exclude list.

That would lead to an infinite loop on files that cannot be processed for some reason. The exclude list is designed to prevent that from happening.

edifus wrote: ↑11 Aug 2020, 14:59 Trying to run Filebot and getting an error that an exact copy of the file already exists after I renamed the source file. I am running with the '--conflict override' option and it usually will overwrite an existing hardlink with no issues, even when it's the same exact file, and I've never seen this error before.

If you rename files in the input folder, then you will end up with the exact same file being processed twice, which is typically the result of running filebot in an infinite loop over and over. That's different from processing different files at different times that just so happen to evaluate to the same destination file path.

The exclude list approach is simple and general, but may not be ideal for everyone at all times. If you strictly use hardlinks, then you can check if a file has been processed or not simply by checking the link count, regardless of file name. If link count is 1, then it's a new file you want to process. If link count is 2 or more, then you have already processed and linked this file.

2.
You can use the duplicates script to preemptively eliminate binary duplicates from the input folder:
viewtopic.php?p=23171#p23171

This is an expensive operation, so I wouldn't choose this path unless you expect non-excluded binary duplicates in the input on a regular basis.

edifus · Post by **edifus** » 11 Aug 2020, 17:19

Makes sense, but no chance for an option to continue processing the remaining files in this scenario? I see how adding the bad file to the exclude list keeps Filebot out of an infinite loop, but the other files were never even attempted and were still put on the exclude list.

Are you saying Filebot can check if a file has more than one link and exclude it from processing accordingly? If so, how would this be accomplished? Right now it's using a directory input so Filebot picks and chooses which files to process.

Post by **rednoah** » 11 Aug 2020, 17:47

edifus wrote: ↑11 Aug 2020, 17:19 Makes sense, but no chance for an option to continue processing the remaining files in this scenario? I see how adding the bad file to the exclude list keeps Filebot out of an infinite loop, but the other files were never even attempted and were still put on the exclude list.

Well, opening the exclude list file in your favorite text editor and deleting the last few lines would take less time than searching the manual for an additional option, so there are no additional options for that. Plus this particular issue won't come up during typical usage. Remember to not rename files and you're good.

The exclude list is meant to be simple, not perfect. There are many conceivable ways of keeping track of processed / not yet processed files that are arguably better at keeping track of things, at the cost of added complexity or universal applicability.

edifus wrote: ↑11 Aug 2020, 17:19 Are you saying Filebot can check if a file has more than one link and exclude it from processing accordingly? If so, how would this be accomplished? Right now it's using a directory input so Filebot picks and chooses which files to process.

The file system naturally keeps track of that, unrelated to FileBot.

e.g. find files with exactly 1 link (i.e. files that have not yet been processed)

Code: Select all

find . -type f -links 1

e.g. find files with 2 or more links (i.e. that have been processed already)

Code: Select all

find . -type f -links +1

You can use find to select the files you wanna process, then add the -exec option to call filebot with the selected files passed in as arguments.

e.g. this command will select and process the mp4 file the first time around, but then the second time around filebot isn't even called because the file already has 2 links now:

Code: Select all

$ find . -type f -iname '*.mp4' -links 1 -exec filebot -rename --action HARDLINK --log INFO -non-strict {} +
[HARDLINK] from [Alias.1x01.mp4] to [Alias - 1x01 - Truth Be Told.mp4]
$ find . -type f -iname '*.mp4' -links 1 -exec filebot -rename --action HARDLINK --log INFO -non-strict {} +
$
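
The link-count selection above can be tried out without touching real media. This is only a rough sketch: the helper names list_unprocessed and list_processed and the scratch file names are made up for illustration, and the ln call is just a stand-in for FileBot's HARDLINK action. It also assumes nothing else on the system creates hard links to the input files.

```shell
#!/bin/sh
# Hypothetical helpers (names are not part of FileBot or find).

# Files under DIR with exactly one link: assumed not yet hardlinked anywhere.
list_unprocessed() { find "$1" -type f -links 1; }

# Files under DIR with two or more links: assumed already hardlinked.
list_processed() { find "$1" -type f -links +1; }

# Scratch demo with made-up file names.
demo=$(mktemp -d)
mkdir "$demo/in" "$demo/out"
touch "$demo/in/a.mp4" "$demo/in/b.mp4"

# Stand-in for FileBot's HARDLINK action: a.mp4 now has 2 links.
ln "$demo/in/a.mp4" "$demo/out/a.renamed.mp4"

echo "unprocessed:"; list_unprocessed "$demo/in"
echo "processed:";   list_processed "$demo/in"

rm -rf "$demo"
```

Note that renaming a file inside the input folder does not change its link count, so this kind of check is not affected by renames the way a path-based exclude list is.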

edifus · Post by **edifus** » 11 Aug 2020, 18:55

Right, I was thinking it would go that direction, using the shell to identify files with hardlinks. Is it still preferable to run Filebot hundreds of times (in rare circumstances) rather than once to process a lot of new files? Not trying to be cheeky, just trying to get a better understanding of whether I should change my workflow. Thinking it through, using find to locate only files without any hard links would let me streamline my workflow a bit. Definitely gives me something to consider.

Post by **rednoah** » 12 Aug 2020, 01:49

find -exec + will find all files, and then call filebot once with many arguments. This is generally recommended.

You can do find -exec \; to spawn a new filebot process for each file. This is generally not recommended because your CPU will be spending most of the time initializing the runtime, but it can work. This way you get an exit code for each file, so you can check what worked and what failed file by file. It's going to be much slower though. In some corner cases, it might work slightly differently, for better or worse.
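
A rough sketch of the one-process-per-file variant with per-file exit codes, in case that is useful. The function name process_each is hypothetical, and the wrapper simply reports any non-zero exit code next to the file that caused it. It assumes a POSIX sh and a find that supports -links.

```shell
#!/bin/sh
# process_each DIR CMD [ARGS...]
# Hypothetical helper: runs CMD once per file in DIR that has exactly
# one link, appending the file path as the last argument, and prints
# a line for every file where CMD exited non-zero.
process_each() {
    dir=$1; shift
    find "$dir" -type f -links 1 -exec sh -c '
        file=$1; shift
        "$@" "$file"
        status=$?
        [ "$status" -eq 0 ] || echo "exit $status: $file"
    ' sh {} "$@" \;
}
```

It could be called with the same options as the find example earlier in the thread, e.g. process_each . filebot -rename --action HARDLINK --log INFO -non-strict, at the cost of one runtime startup per file.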

EDIT:

FileBot r7898 adds File.linkCount so --file-filter "f.linkCount == 1" is now an option. find -links will always be much faster though.

pandiloko · Post by **pandiloko** » 22 Sep 2022, 10:33

Sorry for necrobumping but I thought it would be better to just keep this info in the same discussion. It is not the very same problem but IMHO very closely related.

I'm currently dealing with this problem. The way I went about this was to make a copy of the exclude file just before calling filebot and, depending on the exit code, restore the exclude file or not. This still has the problem of becoming an endless loop, but I think we can make a distinction between unrecoverable errors or errors which need user intervention (folder not found, permissions, search failed, etc.) and auto-recoverable errors like in my case, where the internet was temporarily down and filebot showed:

Code: Select all

Fetch failed: Try again in 2 seconds (2 more) => connect timed out: Unable to connect to api.themoviedb.org at this time. Please try again later.
Fetch failed: Try again in 8 seconds (1 more) => connect timed out: Unable to connect to api.themoviedb.org at this time. Please try again later.
Failed to fetch resource: Network Error: connect timed out: Unable to connect to api.themoviedb.org at this time. Please try again later.
Network Error: connect timed out: Unable to connect to api.themoviedb.org at this time. Please try again later.
Finished without processing any files
Failure (×_×)⌒☆

Unfortunately, I didn't catch the exit code but now I'm logging the exit codes and my plan is to only restore the copy of exclude if the exit code corresponds to a recoverable error.

PS: I use filebot as part of a big script which also does other things in a big endless while loop until Ctrl-C

Post by **rednoah** » 22 Sep 2022, 12:48

Failure (×_×)⌒☆ means Exit Code 3 though that could be for all kinds of reasons. You'll probably need to grep the console output for specific keywords if you want to be smart about doing things / not doing things depending on what FileBot did or did not do.

pandiloko · Post by **pandiloko** » 22 Sep 2022, 15:19

You're right. As described by you in an older thread:

viewtopic.php?p=43137#p43137

3 includes network problems. I guess I'm better off parsing the output for "Network Error" to be sure. Thanks!!
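
Something along these lines is what I have in mind, as a rough sketch only. The function name run_guarded is made up, and the rule "exit code 3 plus the text Network Error in the output means recoverable" is my assumption based on the output I posted, not documented FileBot behaviour.

```shell
#!/bin/sh
# run_guarded EXCLUDE_FILE CMD [ARGS...]
# Hypothetical wrapper: snapshots the exclude list, runs CMD, and
# restores the snapshot only if CMD exited with code 3 AND its output
# contains "Network Error" (assumed to mean a recoverable failure).
run_guarded() {
    exclude=$1; shift
    backup=$(mktemp)
    log=$(mktemp)
    [ -f "$exclude" ] || : > "$exclude"   # start from an empty list if missing
    cp "$exclude" "$backup"

    status=0
    "$@" > "$log" 2>&1 || status=$?
    cat "$log"                            # pass the console output through

    if [ "$status" -eq 3 ] && grep -q 'Network Error' "$log"; then
        cp "$backup" "$exclude"           # roll back: nothing was really processed
    fi
    rm -f "$backup" "$log"
    return "$status"
}
```

Any other failure leaves the exclude list as FileBot wrote it, so files that fail for non-network reasons stay excluded and the endless loop is still avoided.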

Post by **rednoah** » 23 Sep 2022, 17:25

Note that you can configure the internal retry behaviour if you think that network errors resolve themselves over time in your environment:
viewtopic.php?t=12941

e.g. you can write a custom configuration file:

Code: Select all

filebot -script fn:properties --def net.filebot.CachedResource.retryLimit=2 net.filebot.CachedResource.retryDelay=PT2S net.filebot.CachedResource.retryMultiplier=4

The default behaviour is try 1st request, wait 2s, try 2nd request, wait 8s, try 3rd request based on retryLimit=2 retryDelay=PT2S retryMultiplier=4. Just setting retryLimit=8 will give you the last retry at 36 hours.

Post by **rednoah** » 27 Dec 2022, 12:17

Re-organize previously organized files using local xattr metadata
