Increase Matching Accuracy and Skip Poor Matches?

Post by **ember1205** » 28 May 2017, 16:26

Using filebot under Linux and it works very well when it's able to correctly match. I'm finding about half of the movies are matching incorrectly, even when the folder name from the rip contains a "perfect match" for the name. For example, "Disturbia (2007)" ripped to folder DISTURBIA_WS_20170528_113217 (the date and time info in the folder name comes from the date and time that the rip occurred). When filebot swept through to attempt to rename it, it came back with "The Way (2017)" which is a movie in Korean. Huh?

I'm using TMDb for matching and outputting in a Plex compatible format.

Why did filebot choose a movie with a completely different name? I'm doing these renames in bulk, so adding additional criteria to pare down the matches to a better list is not possible. I don't even HAVE the additional information (like the year of release), so I can't add it.

Is there something I can do to increase match accuracy and skip any that don't match at least a certain level?

Post by **rednoah** » 28 May 2017, 16:59

1.
You can use Match Mode: Strict but that'll probably just result in most of your files being ignored because they can't be matched reliably:
viewtopic.php?f=3&t=4695

2.
If you strip away all the misleading information, then it might work better. Depending on how your files are currently named and organized, preprocessing files may be more or less easy.

e.g. not so great:

Code: Select all

DISTURBIA_WS_20170528_113217

e.g. maybe a bit better:

Code: Select all

DISTURBIA

@see viewtopic.php?f=3&t=3228

Post by **ember1205** » 28 May 2017, 17:05

Honestly, being ignored is better since incorrect naming means that I not only have to figure out what the name USED to be, but then I have to rename everything and then "Fix" the matches in Plex. It's more time-consuming to correct mistakes than to do the work manually in the first place.

Maybe I'll look at stripping the date/time information from the folder names and then running filebot against it to see what happens. And then I'll add the strict mode to see how much it helps.

Post by **rednoah** » 28 May 2017, 17:16

1.
Are you using the GUI? You should always double check the matches, especially if you already know that files are badly named and thus prone to mismatches.

2.
I'd just create a Plain File Preset for mass-preprocessing all the files. Presumably they're all in the same format, so it's quite easy, and once you have a Preset it'll just take a few seconds in the future.

e.g. DISTURBIA_WS_20170528_113217 ➔ DISTURBIA

Code: Select all

n.before(/_WS|_[0-9]{8}/)
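
For a headless setup, the same cleanup can be previewed in plain shell before any preset or format is involved. This is only a sketch: `clean_name` is a hypothetical helper, and the `sed` pattern is an approximation of what `before(/_WS|_[0-9]{8}/)` does (keep everything before the first `_WS` or the first `_` followed by 8 digits). Turning the remaining underscores into spaces is an extra step that the expression above does not perform.

```shell
#!/bin/sh
# Hypothetical helper: approximate n.before(/_WS|_[0-9]{8}/) with sed,
# then turn leftover underscores into single spaces.
clean_name() {
  printf '%s\n' "$1" \
    | sed -E 's/(_WS|_[0-9]{8}).*$//; s/_+/ /g; s/ +$//'
}

# Preview only: print what each rip folder name would be cleaned to.
for name in DISTURBIA_WS_20170528_113217 \
            I_AM_LEGEND_20170528_091420 \
            THE_QUEEN________________________20170528_110755; do
  printf '%s -> %s\n' "$name" "$(clean_name "$name")"
done
```

Nothing is renamed here; the loop only prints the before and after names so the pattern can be checked against real folder names first.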

Post by **ember1205** » 29 May 2017, 02:24

I'm trying to automate the process as much as I can, so everything is headless. No GUI, all command-line driven and called from cron.

Post by **rednoah** » 29 May 2017, 04:17

I wouldn't recommend unattended automation if your files don't contain name + year, because it'll never work reliably without the year to narrow down the options.

Depending on how your files are organized, you could run this command beforehand to clean up the names a little bit:

Code: Select all

filebot -rename --db xattr -non-strict -r /input --output /staging --format {fn.before(/_WS|_[0-9]{8}/)}

Post by **ember1205** » 29 May 2017, 14:37

What, exactly, does that command do? What is the xattr database and why does one use it?

Why do I need the year if the title is an exact match?

Here are some more mistakes:

Code: Select all

I_AM_LEGEND_20170528_091420/title00.mkv -> King Arthur - Legend of the Sword (2017).mkv
THE_MUMMY_TOMB_OF_THE_DRAGON_20170528_092945/title00.mkv -> The Mummy (1999)/The Mummy (1999).mkv
BOBBY_20170528_104951/title00.mkv -> Tina and Bobby (2017).mkv
THE_QUEEN________________________20170528_110755/title00.mkv -> The Concubine (2012).mkv
DISTURBIA_WS_20170528_113217/title00.mkv -> The Way (2017).mkv

Post by **rednoah** » 29 May 2017, 15:03

1.
The movie name is often not as unique as you might think, and the name you have may not always match the name in the database as exactly as you might think.

2.
xattr uses xattr metadata for processing and the -non-strict option allows you to process plain files that haven't been xattr tagged, so that you can fix the filename.

tl;dr xattr is for processing files offline

3.
So "DISTURBIA_WS_20170528_113217" is the folder name? And "title00" is the filename? This is the kind of information that really should have been in the very first post.

What does this command do?

Code: Select all

$ filebot -rename --db xattr -non-strict -r DISTURBIA_WS_20170528_113217 --output .  --format '{folder.name.before(/_WS|_[0-9]{8}/)}'
Rename files using [Extended Attributes]
[MOVE] From [DISTURBIA_WS_20170528_113217/title00.mp4] to [DISTURBIA.mp4]

Why does one use it?

Code: Select all

$ filebot -rename DISTURBIA.mp4 --db TheMovieDB
Rename movies using [TheMovieDB]
Auto-detect movie from context: [DISTURBIA.mp4]
[MOVE] From [DISTURBIA.mp4] to [Disturbia (2007).mp4]
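
Put together for a cron job, the two steps above might look like the sketch below. The filebot commands and flags are the ones shown in this thread; everything else is an assumption made for illustration: the `/input` and `/staging` defaults, the `run` wrapper, and the `DRY_RUN` switch, which defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. INPUT, STAGING, run and DRY_RUN are hypothetical names.
INPUT=${INPUT:-/input}
STAGING=${STAGING:-/staging}

# Print the command unless DRY_RUN=0, so the script is safe to try out.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pass 1: move title00.* out of each rip folder, named after the folder
# with the _WS / date-time suffix stripped (no online lookup involved).
cleanup_pass() {
  run filebot -rename --db xattr -non-strict -r "$INPUT" \
      --output "$STAGING" --format '{folder.name.before(/_WS|_[0-9]{8}/)}'
}

# Pass 2: match the cleaned names against TheMovieDB.
match_pass() {
  run filebot -rename -r "$STAGING" --db TheMovieDB
}

cleanup_pass && match_pass
```

Run it once as-is to read the printed commands, then set `DRY_RUN=0` when they look right. Note that the dry run prints the arguments without their shell quoting.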

Post by **ember1205** » 29 May 2017, 15:23

rednoah wrote:1.
The movie name is often not as unique as you might think, and the name you have may not always match the name in the database as exactly as you might think.

I understand... I've found a number of movies that have the same titles, so this isn't terribly surprising. My question wasn't about name-uniqueness, though, it was about name accuracy. How does "I Am Legend" become "Legend of the Sword"? Not. Even. Close.

rednoah wrote: 2.
xattr uses xattr metadata for processing and the -non-strict option allows you to process plain files that haven't been xattr tagged, so that you can fix the filename.

tl;dr xattr is for processing files offline

Is this a static database provided by filebot? Curious as to how this helps with renaming.

rednoah wrote:3.
So "DISTURBIA_WS_20170528_113217" is the folder name? And "title00" is the filename? This is the kind of information that really should have been in the very first post.

Half of it was there.

I did specify that the movies rip to folders with those names, but I didn't specify that the movie filenames were "title00.mkv" or similar. If it helps, I'm using makemkvcon (linux) to rip the content from the discs.

rednoah wrote:What does this command do?

Code: Select all

$ filebot -rename --db xattr -non-strict -r DISTURBIA_WS_20170528_113217 --output .  --format '{folder.name.before(/_WS|_[0-9]{8}/)}'
Rename files using [Extended Attributes]
[MOVE] From [DISTURBIA_WS_20170528_113217/title00.mp4] to [DISTURBIA.mp4]

Why does one use it?

Code: Select all

$ filebot -rename DISTURBIA.mp4 --db TheMovieDB
Rename movies using [TheMovieDB]
Auto-detect movie from context: [DISTURBIA.mp4]
[MOVE] From [DISTURBIA.mp4] to [Disturbia (2007).mp4]

Maybe I'll give this a shot to see if it at least helps. I don't expect perfection, but I'm missing at a rate of somewhere between 2:1 and 3:1. That's pretty bad.

Post by **rednoah** » 29 May 2017, 15:58

1.
アバター has nothing in common with Avatar (2009) and yet it'd be a correct match. In this case, I am Legend somehow isn't an option in the first place, so FileBot will end up with the next best option. Why is I am Legend not an option in the first place? Nobody knows. You'd have to debug the code for a few hours to find out exactly why it doesn't work in this particular case.

2.
Metadata and Extended Attributes: viewtopic.php?t=324

Post by **ember1205** » 29 May 2017, 16:06

So, the next 10 movies to go through matched at a rate of about 3:1 instead of missing at a rate of 3:1 by using your suggested xattr rename FIRST and then using my existing rename after it. Much better. I'll keep tinkering to see if I can raise the percentages even more.

Thanks for the tips.
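
For unattended runs, one way to skip poor matches such as `DISTURBIA_WS_20170528_113217 -> The Way (2017)` is to run FileBot with `--action test` first and compare each proposed name with the rip folder name before anything is moved. The sketch below is not a FileBot feature. All helper names are hypothetical, it expects log lines in the `[TEST] From [...] to [...]` form, and the rule it applies (the proposed title must start with the cleaned folder title) is a crude heuristic that will let some wrong matches through.

```shell
#!/bin/sh
# Sketch only: none of these helpers are part of FileBot.

# Lower-case a name and reduce it to space-separated words.
normalize() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/ /g; s/^ +//; s/ +$//'
}

# Title implied by the rip folder of a source path, suffix stripped.
folder_title() {
  normalize "$(basename "$(dirname "$1")" | sed -E 's/(_WS|_[0-9]{8}).*$//')"
}

# Title FileBot proposes: file name minus extension and "(year)".
proposed_title() {
  normalize "$(basename "$1" | sed -E 's/\.[^.]*$//; s/ *\([0-9]{4}\)$//')"
}

# Read "[TEST] From [a] to [b]" lines on stdin, print OK or SUSPECT for each.
check_matches() (
  while IFS= read -r line; do
    case $line in
      '[TEST] From ['*'] to ['*']') ;;
      *) continue ;;
    esac
    from=${line#'[TEST] From ['}; from=${from%%'] to ['*}
    to=${line##*'] to ['};        to=${to%']'}
    want=$(folder_title "$from")
    got=$(proposed_title "$to")
    # OK only if the proposed title begins with the folder title.
    if [ -n "$want" ] && [ "${got#"$want"}" != "$got" ]; then
      echo "OK $want -> $got"
    else
      echo "SUSPECT $want -> $got"
    fi
  done
)
```

A test run could be piped into it, for example `filebot -rename -r /path/to/rips --db TheMovieDB --action test | check_matches` (the path is a placeholder), and only the folders reported as `OK` processed for real. The check relies on the file still sitting in its rip folder, so it has to run on the original folders.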

Post by **DHoarder** » 30 May 2017, 10:42

I've also found filebot to be less accurate recently, especially for documentaries. Here are logs for two documentaries which were identified as movies.

Code: Select all

Run script [fn:amc] at [Sun May 28 21:54:07 IST 2017]
Parameter: pushbullet = *****
Parameter: subtitles = en
Parameter: clean = y
Parameter: movieFormat = //mnt/MyMedia/{info.genres.contains('Documentary') ? 'Documentaries': info.SpokenLanguages =~ /hi/  ? 'Hindi' : 'Movies'}/{Collection.replaceAll(/Saga Collection/, Saga).replaceAll(/[`\ï¿½\ï¿½\ï¿½\?""\ï¿½\ï¿½]/, "'").replaceAll(/[:|]/, " - ").replaceAll(/[?]/, "!").replaceAll(/[*\s]+/, " ")}\{norm = {it.upperInitial().lowerTrail().replaceTrailingBrackets().replaceAll(/[`\ï¿½\ï¿½\ï¿½\?""\ï¿½\ï¿½]/, "'").replaceAll(/[:|]/, " - ").replaceAll(/[?]/, "!").replaceAll(/[*\s]+/, " ").replaceAll(/\b[IiVvXx]+\b/, { it.upper() }).replaceAll(/\b[0-9](?i:th|nd|rd)\b/, { it.lower() }).replaceFirst(/^(?i)(The)\s(.+)/, /$2, $1/)}; norm(n)}{if (!norm(n).equals(norm(primaryTitle))) ' ('+norm(primaryTitle)+')'}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""} ({y})/{norm(n)}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""}{'.' + fn.matchAll(/extended|uncensored|remastered|unrated|uncut|directors.cut|special.edition|mind.bending.edition/)*.upperInitial()*.lowerTrail().sort().join(', ').replaceAll(/[._]/, " ")}{" Part $pi"}{".$y"}{".$vf"}{".$vc"}{".$source"}{".$group"}
Parameter: seriesFormat = //mnt/MyMedia/{info.genres.contains('Documentary') ? 'Documentaries': 'TV Shows'}/{norm = {it.upperInitial().lowerTrail().replaceTrailingBrackets().replaceAll(/[`\ï¿½\ï¿½\ï¿½\?""\ï¿½\ï¿½]/, "'").replaceAll(/[:|]/, " - ").replaceAll(/[?]/, "!").replaceAll(/[*\s]+/, " ").replaceAll(/\b[IiVvXx]+\b/, { it.upper() }).replaceAll(/\b[0-9](?i:th|nd|rd)\b/, { it.lower() }).replaceFirst(/^(?i)(The)\s(.+)/, /$2, $1/)}; norm(n)}{(!norm(n).equals(norm(primaryTitle))) ' ('+norm(primaryTitle)+')'}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""} ({y})/{episode.special ? 'Special' : 'Season '+s.pad(2)}/{norm(n)} {episode.special ? 'S00E'+special.pad(2) : s00e00} {norm(t)}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""}{'.' + fn.matchAll(/extended|uncensored|remastered|unrated|special[ ._-]edition/)*.upperInitial()*.lowerTrail().sort().join(', ').replaceAll(/[.]/, " ")}{".$y"}{".$vf"}{".$vc"}{".$source"}{".$group"}
Argument[0]: /home/shashank/Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv
Input: /home/shashank/Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv
Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv [series: Abandoned, movie: Angels Crest (2011)]
Exclude Series: Abandoned
Group: [tvs:null, mov:angels crest 2011] => [Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv]
Get [English] subtitles for 1 files
CmdlineException: OpenSubtitles: Please enter your login details by calling `filebot -script fn:configure`
Rename movies using [TheMovieDB]
Auto-detect movie from context: [/home/shashank/Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv]
[TEST] From [/home/shashank/Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv] to [/mnt/MyMedia/Movies/Angels Crest (2011)/Angels Crest.2011.HDTV.mkv]
Processed 1 files
Done ヾ(＠⌒ー⌒＠)ノ

Code: Select all

Run script [fn:amc] at [Sun May 28 22:10:37 IST 2017]
Parameter: pushbullet = *****
Parameter: subtitles = en
Parameter: clean = y
Parameter: movieFormat = //mnt/MyMedia/{info.genres.contains('Documentary') ? 'Documentaries': info.SpokenLanguages =~ /hi/  ? 'Hindi' : 'Movies'}/{Collection.replaceAll(/Saga Collection/, Saga).replaceAll(/[`\ï¿½\ï¿½\ï¿½\?""\ï¿½\ï¿½]/, "'").replaceAll(/[:|]/, " - ").replaceAll(/[?]/, "!").replaceAll(/[*\s]+/, " ")}\{norm = {it.upperInitial().lowerTrail().replaceTrailingBrackets().replaceAll(/[`\ï¿½\ï¿½\ï¿½\?""\ï¿½\ï¿½]/, "'").replaceAll(/[:|]/, " - ").replaceAll(/[?]/, "!").replaceAll(/[*\s]+/, " ").replaceAll(/\b[IiVvXx]+\b/, { it.upper() }).replaceAll(/\b[0-9](?i:th|nd|rd)\b/, { it.lower() }).replaceFirst(/^(?i)(The)\s(.+)/, /$2, $1/)}; norm(n)}{if (!norm(n).equals(norm(primaryTitle))) ' ('+norm(primaryTitle)+')'}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""} ({y})/{norm(n)}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""}{'.' + fn.matchAll(/extended|uncensored|remastered|unrated|uncut|directors.cut|special.edition|mind.bending.edition/)*.upperInitial()*.lowerTrail().sort().join(', ').replaceAll(/[._]/, " ")}{" Part $pi"}{".$y"}{".$vf"}{".$vc"}{".$source"}{".$group"}
Parameter: seriesFormat = //mnt/MyMedia/{info.genres.contains('Documentary') ? 'Documentaries': 'TV Shows'}/{norm = {it.upperInitial().lowerTrail().replaceTrailingBrackets().replaceAll(/[`\ï¿½\ï¿½\ï¿½\?""\ï¿½\ï¿½]/, "'").replaceAll(/[:|]/, " - ").replaceAll(/[?]/, "!").replaceAll(/[*\s]+/, " ").replaceAll(/\b[IiVvXx]+\b/, { it.upper() }).replaceAll(/\b[0-9](?i:th|nd|rd)\b/, { it.lower() }).replaceFirst(/^(?i)(The)\s(.+)/, /$2, $1/)}; norm(n)}{(!norm(n).equals(norm(primaryTitle))) ' ('+norm(primaryTitle)+')'}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""} ({y})/{episode.special ? 'Special' : 'Season '+s.pad(2)}/{norm(n)} {episode.special ? 'S00E'+special.pad(2) : s00e00} {norm(t)}{fn.contains('3D') || fn.contains('3-D') ? ' '+'3D':""}{'.' + fn.matchAll(/extended|uncensored|remastered|unrated|special[ ._-]edition/)*.upperInitial()*.lowerTrail().sort().join(', ').replaceAll(/[.]/, " ")}{".$y"}{".$vf"}{".$vc"}{".$source"}{".$group"}
Argument[0]: /media/shashank/22E60E2037E23638/PBS.Nova.Why.Trains.Crash.720p.HDTV.x264.AAC.MVGroup.org.mp4
Input: /media/shashank/22E60E2037E23638/PBS.Nova.Why.Trains.Crash.720p.HDTV.x264.AAC.MVGroup.org.mp4
PBS.Nova.Why.Trains.Crash.720p.HDTV.x264.AAC.MVGroup.org.mp4 [series: NOVA, movie: Why (1971)]
Exclude Series: NOVA
Group: [tvs:null, mov:why 1971] => [PBS.Nova.Why.Trains.Crash.720p.HDTV.x264.AAC.MVGroup.org.mp4]
Get [English] subtitles for 1 files
CmdlineException: OpenSubtitles: Please enter your login details by calling `filebot -script fn:configure`
Rename movies using [TheMovieDB]
Auto-detect movie from context: [/media/shashank/22E60E2037E23638/PBS.Nova.Why.Trains.Crash.720p.HDTV.x264.AAC.MVGroup.org.mp4]
[TEST] From [/media/shashank/22E60E2037E23638/PBS.Nova.Why.Trains.Crash.720p.HDTV.x264.AAC.MVGroup.org.mp4] to [/mnt/MyMedia/Movies/Why (Detenuto In Attesa Di Giudizio) (1971)/Why.1971.HDTV.mp4]
Processed 1 files
Done ヾ(＠⌒ー⌒＠)ノ

Post by **rednoah** » 30 May 2017, 10:48

What would be the correct match for each of these files? Please include TheTVDB / TheMovieDB links.

Post by **DHoarder** » 30 May 2017, 10:54

rednoah wrote:What would be the correct match for each of these files? Please include TheTVDB / TheMovieDB links.

Here we go:

doc1: https://thetvdb.com/?tab=season&seriesi ... 2287&lid=7

doc2: http://thetvdb.com/?tab=episode&seriesi ... 1762&lid=7

Post by **rednoah** » 30 May 2017, 12:47

These files don't quite look like episode files. You'll need to force TV mode for these kinds of files.

Please read Advanced Fine-Tuning for details: viewtopic.php?f=4&t=215
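
For the amc script used in the logs above, forcing TV mode is, as far as I know, done with `--def ut_label=TV`. Treat that as an assumption and confirm it against the linked topic. The sketch below only prints the command by default; `run`, `DRY_RUN`, `OUTPUT` and `amc_tv_test` are hypothetical names, and `--action test` keeps FileBot in the same test mode that produced the `[TEST]` lines above, so nothing is moved.

```shell
#!/bin/sh
# Sketch only. OUTPUT, run, DRY_RUN and amc_tv_test are hypothetical names.
OUTPUT=${OUTPUT:-/path/to/output}

# Print the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Test-run the amc script on the given files with TV mode forced
# (ut_label=TV is an assumption, see the note above).
amc_tv_test() {
  run filebot -script fn:amc "$@" --output "$OUTPUT" \
      --action test --def ut_label=TV
}

amc_tv_test /home/shashank/Abandoned.Engineering.Series.1.1of3.Silent.Cities.720p.HDTV.x264.AAC.MVGroup.org.mkv
```

The custom `movieFormat` and `seriesFormat` parameters from the logs are left out here for brevity; they would be passed as further `--def` values.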
