Condition for applying duplicate index pattern

nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Condition for applying duplicate index pattern

Post by nls »

FileBot's duplicate index feature is very useful for cases where a single episode spans multiple files. I can use e.g.

Code: Select all

--format "/rootpath/{plex.tail} - part{di}"
to comply with Plex, but I only want to add the " - part{di}" segment when there is actually a duplicate in the renamed file set. How could I achieve that? There is currently no such thing as {dn} for "number of duplicates found" or similar to build a condition with.
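For context, with that format the two halves of a split episode would come out named along these lines (show, title and numbers are just placeholders):

Code: Select all

Show Name/Season 01/Show Name - S01E01 - Episode Title - part1.mkv
Show Name/Season 01/Show Name - S01E01 - Episode Title - part2.mkv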
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Condition for applying duplicate index pattern

Post by rednoah »

You can use {model} to access the binding values of all matches.

e.g. count the number of occurrences of the current episode in the entire match model:

Code: Select all

model.episode.count(episode)
So you can check that and add {di} at the end if the number of occurrences is greater than one:

Code: Select all

model.episode.count(episode) > 1 ? " - part$di" : null
So here's the final copy & paste format expression:

Code: Select all

{plex.tail.derive{model.episode.count(episode) > 1 ? " - part$di" : null}}

EDIT:

Never mind. Newer versions do have the {dc} duplicate count binding built in, so this will work:

Code: Select all

{plex.tail.derive{dc > 1 ? " - part$di" : null}}
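If you want to see what {dc} and {di} actually evaluate to for your files, a throwaway test format along these lines will print them next to each episode (the output layout is just for illustration):

Code: Select all

{episode} [dc: {dc}, di: {di}]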
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

Thanks, the last expression is working perfectly!
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

Well, I rushed that statement a bit. There are 2 problems with duplicate detection:
  • It groups different file types together for duplicate detection, so filename.mkv and filename.srt are considered duplicates, while they're obviously not.
  • It detects duplicates on a global level, ignoring lower-level data: episodes within the same season are obvious candidates, while episodes across different seasons/series are not. The issue is probably that the duplicate detection data is not reset between such groups, so in a large set I end up with file names like these for all files (note the " - part" at the end):

Code: Select all

xxx/Season 02/xxx - S02E07 - Deutschland 93 - part.mkv
xxx/Season 02/xxx - S02E07 - Deutschland 93 - part.eng.srt
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Condition for applying duplicate index pattern

Post by rednoah »

Sorry, don't really have a fix for that. You might be able to implement all the grouping and filtering you need via the {model} binding and some code though.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

One possible solution would be to introduce the concept of object hierarchies based on scraped file metadata, like how tv/series_name/season_no/file_type/episode forms a 5-level hierarchy. For handling duplicates, it would be sufficient to use tv+series_name+season_no+file_type as a grouping parameter for matching episode objects. If we call this e.g. group_id, only group_id+episode could ever form a duplicate. It would require a bigger code overhaul than it's probably worth, though. It's just an idea.
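Just to illustrate, the grouping key might look something like this in format-expression terms; purely a sketch, with n and s being the standard series-name and season-number bindings:

Code: Select all

{
	plex.tail.derive{
		// purely illustrative: only files that share series name (n), season (s)
		// and file extension are ever compared when counting duplicate episodes
		def x = model.count{ [it.n, any{it.s}{null}, it.ext, it.episode] == [n, any{s}{null}, ext, episode] }
		return x > 1 ? " - part$di" : null
	}
}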
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Condition for applying duplicate index pattern

Post by rednoah »

1.
You can do a per-file extension duplicate index like so:

Code: Select all

{
	plex.tail.derive{
		def x = model.count{ [it.ext, it.episode] == [ext, episode] }
		return x > 1 ? " - part$x" : null
	}
}
:idea: You can modify this code to count by any grouping.
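For instance, a purely illustrative variation that additionally keys on the source folder (via the standard {folder} binding), so duplicates are only counted among files coming from the same folder:

Code: Select all

{
	plex.tail.derive{
		// illustrative: require matching source folder in addition to matching
		// extension and episode, so the count never spans folders
		def x = model.count{ [it.folder, it.ext, it.episode] == [folder, ext, episode] }
		return x > 1 ? " - part$x" : null
	}
}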

:idea: Note that FileBot doesn't simply "name the subtitle file exactly like the corresponding video file"; for flexibility in naming subtitles, it always relies on the format happening to yield the same base name for both the video file and its corresponding subtitle files.



2.
nls wrote: 23 Dec 2018, 18:05
  • It detects duplicates on a global level, ignoring lower-level data: episodes within the same season are obvious candidates, while episodes across different seasons/series are not. The issue is probably that the duplicate detection data is not reset between such groups, so in a large set I end up with file names like these for all files (note the " - part" at the end):
I don't really understand this part.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

I might not have been accurate. I wanted to say that once a duplicate is detected in a file set, possibly spanning hundreds of series (and the seasons within them), dc never gets reset, regardless of series or season. That's why the " - part" segment was attached to ALL file names, not just the few dupes.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

rednoah wrote: 23 Dec 2018, 20:28 1.
You can do a per-file extension duplicate index like so:

Code: Select all

{
	plex.tail.derive{
		def x = model.count{ [it.ext, it.episode] == [ext, episode] }
		return x > 1 ? " - part$x" : null
	}
}
Thanks, I'll try this tomorrow.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Condition for applying duplicate index pattern

Post by rednoah »

It's not possible for this code to just yield " - part" without the Integer number at the end:

Code: Select all

dc > 1 ? " - part$di" : null
If your rename model is full of Episode objects, then {dc} will find duplicates based on the TheTVDB Episode ID, which is globally unique across all TV shows, seasons, etc. Video and subtitle file pairs are matched to the same Episode object, so they count as duplicates, which explains the subtitle interference issue. But different episodes of the same TV show wouldn't be duplicates.

:idea: A set of examples and screenshots that lets me reproduce the behaviour you describe would be helpful.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

That " - part" was attached because of some weird shell mishap involving sudo. Sorry, that issue does not exist; it was just my own mistake. The duplicate detection issue where filename.mkv and filename.srt are flagged as dupes still stands, though, and I'll try to handle that using your suggestions above.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Condition for applying duplicate index pattern

Post by rednoah »

Oh! The shell interpreted $di as a shell variable, which was empty, so that part ended up missing from the format value that was actually passed in. No worries, happens to the best of us. Everyone stumbles over this one eventually. :lol:
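For reference, when the whole format is passed inside a double-quoted shell argument, the dollar sign (and the inner quotes) have to be escaped so that the shell leaves them for FileBot, along these lines:

Code: Select all

--format "{plex.tail.derive{dc > 1 ? \" - part\$di\" : null}}"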
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

No, I escaped it properly... If I use "sudo -iu long_command" it breaks, but if I use "sudo -u long_command" it works, the only difference being sudo's -i option. I don't want to get to the bottom of it, since it works one way and it isn't a FileBot issue. (It probably has something to do with how -i passes the command line to the user's shell via the shell's -c parameter.)

The complex scheme with model.count doesn't seem to work properly, but I'll run a series of tests before I can give more details.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

While testing the model.count solution, I ran into an issue when trying to extend the grouping condition. I need to extend it with the language, because .srt files in different languages for the same episode are detected as duplicates, e.g.:

Code: Select all

filename.eng.srt -> seriesname - SXEXX - title - part1.eng.srt
filename.ger.srt -> seriesname - SXEXX - title - part2.ger.srt
when it should be:

Code: Select all

filename.eng.srt -> seriesname - SXEXX - title.eng.srt
filename.ger.srt -> seriesname - SXEXX - title.ger.srt
So I tried this:

Code: Select all

		def x = model.count{ [it.ext, it.episode, it.lang] == [ext, episode, lang] }
But it disables duplicate detection altogether. I'm possibly using the wrong expression, or language is not detected from the .srt filenames the way I think it is. Language detection is not documented very well.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: Condition for applying duplicate index pattern

Post by rednoah »

{lang} has no value for non-subtitle files and errors out with Binding "lang": undefined, which means model.lang will never work, because it's going to be undefined for at least some non-subtitle files.

You could try this:

Code: Select all

{
	plex.tail.derive{
		def key = { m -> [m.ext, m.episode, any{m.lang.ISO2}{null}] }
		def keyEquals = { m -> key(m) == key(self) }
		def dc = model.count(keyEquals)
		def di = model.findIndexOf(keyEquals)
		return dc > 1 ? " - part$di" : null
	}
}
We use any{...}{...} to catch any error that {lang} might throw and default to null instead. Since Language.equals() doesn't work for us here, we have to compare the language code values instead.
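As a standalone illustration of that fallback pattern (the 'und' default is just for the example):

Code: Select all

{
	// yields the two-letter language code for subtitle files, and the literal
	// fallback 'und' when the {lang} binding is undefined for a file
	any{ lang.ISO2 }{ 'und' }
}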

There are probably lots of corner cases where this code doesn't do what we want, depending on the layout of your files, how many subtitles you have per language, missing subtitles for some files, etc.

Maybe it'll work, but I suspect it'll require many more iterations. Your particular use case just isn't really supported, and format code seems to be unexpectedly tricky for this one.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

I ended up using this formula:

Code: Select all

{
        plex.tail.derive{
                def x = model.count{ [it.ext, it.episode, any{it.lang.ISO2}{null}] == [ext, episode, any{lang.ISO2}{null}] }
                return x > 1 ? \" - part\$di\" : null
        }
}
It's working nicely for individual directories with a season's worth of files, but recursively scanning about a thousand files ran so long that I stopped it: it was using 100% CPU for about 4-5 minutes without producing any output. I think it calculates the count for every single file, and that's not working out very well. If I pass directories as parameters, will it build one large file set from all the directories first, or will it handle them one by one internally? I could also pass one directory (or a few) at a time using xargs; FileBot startup time is not exactly fast, but that should work.

But anyway, having multi-file episodes and external subtitle files for them is not that much of an edge case. For instance, Family Guy has a double episode in each season. This scenario should be supported, I think.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: Condition for applying duplicate index pattern

Post by rednoah »

1.
{model} doesn't scale well at all, because you'll be accessing bindings for all items for every single item, so format time increases quadratically rather than linearly: for 1,000 files that's on the order of 1,000 × 1,000 = 1,000,000 binding lookups per run. Optimizations are possible, but ultimately algorithmic complexity is against you.

However, if you're using the command-line, just processing files folder by folder will largely resolve the issue. If you're currently using -rename -r, then using -script fn:renall will probably resolve it, since this particular script processes media files folder by folder.


2.
Please call tree on your Family Guy folder and post the output, so I can see the folder structure and filenames and maybe figure out a solution.

:idea: Having multiple files per episode seems to be a very rare corner case, though, and it's indeed not (well) supported, because I have no automated tests for this use case. It comes up, but only once or twice every few years. Either way, I'll need some example files to look into it.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

Sorry, I was mistaken: FG has specials almost every season, not double eps. Season 8 has the problematic double episode. I use a multi-pass approach:

1. scrape into xattrs
2. symlink to temp dir (using --db xattr)
3. convert relative symlinks to absolute (there really should be a switch to make them absolute to begin with)
4. rsync symlinks to target dir for Plex, etc.

This is all so that I can keep the original folder/file structure intact and update the target structure with new objects only. It's working pretty well, barring these few cases, which would pose some problems anyway. Obviously I only run steps 2-4 for the entire set of files. This usage pattern might be peculiar, but I have my reasons for doing it this way.

fn:renall sounds good, I'll try it, thanks.
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

Well, -script fn:renall doesn't seem to process any files for me. My command line is:

Code: Select all

filebot --encoding UTF-8 -rename --db xattr --action symlink -non-strict --file-filter "f.xattr['net.filebot.metadata'] != null" --format "/targetdir/
{
        plex.tail.derive{
                def x = model.count{ [it.ext, it.episode, any{it.lang.ISO2}{null}] == [ext, episode, any{lang.ISO2}{null}] }
                return x > 1 ? \" - part\$di\" : null
        }
}
" -r -script fn:renall "path1" "path2" ...
Without -script fn:renall it works on single dirs, but not on the whole set, as mentioned before.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: Condition for applying duplicate index pattern

Post by rednoah »

1.
I see. The renall script expects one or more root folders as input arguments and then traverses the folder hierarchy folder by folder. However, by using -r, FileBot just ends up passing a bunch of files on to the script, which the script subsequently ignores, since it doesn't have any folders to work with in its input.

The --file-filter option was introduced recently and may or may not be compatible with the renall script. Since you can't use -r, I reckon that --file-filter won't work either (and if you do use it the way you do now, you might just end up excluding the input folder right away). The good news here is that your --file-filter doesn't do anything and can be removed from the command, since --db xattr ignores non-xattr-tagged files by default (unless -non-strict is set).

:idea: Note that the renall script processes "media folders, folder by folder" (see source), meaning that it'll ignore folders that don't contain at least one media file.


2.
As a simple workaround for creating symlinks (or doing anything else, really) you can use your own program or shell script as the "rename action", like so:

Code: Select all

--action /path/to/plex-link.sh

Code: Select all

#!/bin/sh -xu
# create a symlink at the destination path ($2) pointing to the original file ($1)
ln -s "$1" "$2"
:arrow: viewtopic.php?f=4&t=4915
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

rednoah wrote: The good news here is that your --file-filter doesn't do anything and can be removed from the command, since --db xattr ignores non-xattr-tagged files by default (unless -non-strict is set).
-non-strict is a leftover from copy-pasting my 1st-pass command line. It might not do anything for the renaming pass in general, but coincidentally it makes my filter work as intended. Anyway, I can change the command line by removing both options, and according to your post it should then just rename files that have proper xattr metadata.

So, I'll run the 2nd pass according to your hints and report back later. I was already thinking of coding a small script to do the 2nd-pass renaming, but I guess this is still manageable using FileBot. This problem might be a corner case, but I absolutely want to avoid manual renaming even for a few files and want to make an idempotent, repeatable process that needs no human intervention.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: Condition for applying duplicate index pattern

Post by rednoah »

nls wrote: 26 Dec 2018, 11:15 So, I'll run the 2nd pass according to your hints and report back later. I was already thinking of coding a small script to do the 2nd-pass renaming, but I guess this is still manageable using FileBot. This problem might be a corner case, but I absolutely want to avoid manual renaming even for a few files and want to make an idempotent, repeatable process that needs no human intervention.
Very understandable.

At this point, the FileBot / Groovy version really just boils down to this, so there's plenty of room for additional lines of code and customization:

Code: Select all

#!/usr/bin/env filebot -script

args.eachMediaFolder{
	rename(folder: it)
}
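For instance, a minimal sketch of a customized variant; the log line is just illustrative:

Code: Select all

#!/usr/bin/env filebot -script

// process each media folder on its own, so {model} in the format expression
// only ever sees the files of that one folder
args.eachMediaFolder{ folder ->
	log.info "Processing $folder"
	rename(folder: folder)
}
You can then run it with -script /path/to/your-script.groovy instead of -script fn:renall.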
nls
Posts: 45
Joined: 19 Aug 2018, 21:07

Re: Condition for applying duplicate index pattern

Post by nls »

OK, finally it looks like it works perfectly. The final command line is:

Code: Select all

sudo -u media filebot --encoding UTF-8 -rename --db xattr --action symlink --format "/target_root/
{
        plex.tail.derive{
                def x = model.count{ [it.ext, it.episode, any{it.lang.ISO2}{null}] == [ext, episode, any{lang.ISO2}{null}] }
                return x > 1 ? \" - part\$di\" : null
        }
}
" -script fn:renall "source_root"
It renames everything using the Plex ruleset and also properly tags filenames with " - partX" where appropriate. Thanks for all the help!