Filebot, TheMovieDB, etc. don't like colons

topbanana
Posts: 47
Joined: 22 Jan 2015, 04:51

Post by topbanana »

I, like many, have set FileBot to replace colon characters : with one of the replacements ꞉
But if i drop a few movies that contain these colon replacements into FileBot and get it to 'Fetch & Match data' from TheMovieDB, to check for any misnamed, updated names, or year changes, then it mostly pops up with the 'Failed to identify some of the following files:'. And it's almost always files with the colon in the filename, and it often lists wildly wrong possible movies, and often not the actual movie.
If i replace these colon replacement characters with hyphens, the 'normal' replacement character, then all these movies are ID'd perfectly.

Could FileBot just strip this 'problem' character (or characters) out of the search string it sends to TheMovieDB, etc.?
rednoah
The Source
Posts: 23833
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Filebot, TheMovieDB, etc. don't like colons

Post by rednoah »

:?: Since you presumably have sample file paths at hand, can I ask you to paste sample file paths as text so that I can run tests?

I notably cannot reproduce the problem with the first test case I came up with:

Code:

Avatar꞉ The Way of Water.mkv


:?: If you have already matched the files, then xattr metadata should take care of identifying the file, not the file name. Does xattr metadata not work on your target file system? See Re-process previously organised files using local xattr metadata for details and examples.



:idea: If you're using the CLI, then the --q option allows you to extract the query from the file path with your own code. A ꞉ ratio to : colon replacement could be done there.
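To illustrate the idea of deriving the query from the file path, here is a minimal Python sketch. It is not the `--q` syntax itself and not part of FileBot; the function name `query_from_path` and the lookup table are hypothetical, and the table only covers the two colon lookalikes that come up in this thread.

```python
from pathlib import PureWindowsPath

# Hypothetical lookup table: colon lookalikes mapped back to the ASCII colon.
# U+A789 is Modifier Letter Colon, U+2236 is Ratio.
COLON_LOOKALIKES = str.maketrans({"\uA789": ":", "\u2236": ":"})


def query_from_path(path: str) -> str:
    """Return the file name without extension, with lookalikes restored to ':'."""
    name = PureWindowsPath(path).stem  # drops the folder and the final extension
    return name.translate(COLON_LOOKALIKES)


print(query_from_path(r"Documentary Films\Atomic꞉ Living in Dread and Promise (2015) (720p).mkv"))
```

The result still contains the year and resolution tags, so a real query would likely need further trimming.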
:idea: Please read the FAQ and How to Request Help.

Post by topbanana »

Here's a few:

Code:

Halloween꞉ Resurrection (2002) (720p)
Highlander꞉ Endgame (2000) (720p)
Hellraiser꞉ Revelations (2011) (720p)
Hellraiser꞉ Hellworld (2005) (720p)
Highlander꞉ The Source (2007) (720p)
How to Train Your Dragon꞉ Homecoming (2019) (720p)
For these, i tried renaming them, replacing the colon with a hyphen, and they were all ID'd straight away (and then FileBot renamed them back, lol!)

Cheers.

Post by rednoah »

rednoah wrote: 05 Feb 2025, 03:16 :?: If you have already matched the files, then xattr metadata should take care of identifying the file, not the file name. Does xattr metadata not work on your target file system? See Re-process previously organised files using local xattr metadata for details and examples.
:?: So the second question is about xattr metadata. I ask because it's apparently not working in your case, and it would be an alternative solution and also the reason why other users may not run into the same issue.





EDIT:

:idea: An additional solution would be adding an *.nfo file with the movie ID to each movie folder (assuming that you are using movie folders) or adding the movie ID to the movie folder name / movie file name, as seen in the How do I organize files for Plex? guides.

Post by topbanana »

Yeah, i see that FileBot adds the xattr metadata.
I can't remember if i cleared them all off the hdd a while ago, as it used to pop up with the annoying popup about losing attributes when copying them to exfat/fat32 usb sticks... And i do remember setting '-no-xattr' for FileBot at the time! lol.

RE: .nfos... Nah, i just have a clean folder full of JUST the movies, nothing else. And yes, i know i lose out on stuff... I use EMBY, and if i clean install and forget to backup my emby folders, i have to start from scratch.

But...

I was thinking that if FileBot could simply strip out 'troublesome' characters that confuse TheMovieDB, et al., then the issue would be solved for everyone, not just me.
For me, i'm just doing some random occasional housework on my movie collection, which caused me to notice it.

Does FileBot 'prepare' the filenames already, before submitting them to the dbs?
I was guessing that stripping out characters, or even replacing them, wouldn't be an uncommon/abnormal thing when dealing with dbs like these?

Similar with Emby's difficulties in IDing movies with only the filenames... It seems smarter to just leverage software to fix it once, and then it's fixed for everyone, forever.

Post by rednoah »

topbanana wrote: 05 Feb 2025, 07:40 I was thinking that if FileBot could simply strip out 'troublesome' characters that confuse TheMovieDB, et al., then the issue would be solved for everyone, not just me.
While this is doable, the issue at hand seems to be quite isolated to your fairly unique combination of circumstances. Using ∶ ratio characters is itself rare and very much not recommended, and it's only a problem if you have no xattr metadata, no NFO files, and no ID markers, any of which would have compensated for bad naming. "Avatar꞉ The Way of Water" works despite everything for some reason.

topbanana wrote: 05 Feb 2025, 07:40 I was guessing that stripping out characters, or even replacing them, wouldn't be an uncommon/abnormal thing when dealing with dbs like these?
FileBot does strip certain patterns. Assuming that there's no movie that actually uses ∶ ratio as part of the movie name, we could strip ∶ ratio as well. But your ∶ ratio replacement for the : colon does not exist in the wild, so I'm somewhat reluctant to add custom code for everyone that only benefits users who generate badly named files on purpose. That said, if the issue comes up more often from a variety of users over time, that would change the calculus.

topbanana wrote: 05 Feb 2025, 07:40 Similar with Emby's difficulties in IDing movies with only the filenames...
If you are using Emby, then you really really must name files correctly, with correct file / folder names, movie folders, NFO files, ID markers, etc, even if that's not your first preference. The {emby.id} binding makes it easy. You can then always generate a secondary structure using hardlinks for your own viewing from the primary well-named structure. That requires little time and no additional disk space, so you can have files the way Emby wants and have files the way you want.




:arrow: I'd use Plain File Mode to strip all the ∶ ratio characters from all the file names:

Format:

{ fn.replace('꞉', '') }
and then use Movie Mode in a second step to name / organise everything correctly with the {emby.id} binding once and for all:

Format:

{ emby.id }


:idea: I have also investigated the sample movies above. "Avatar꞉ The Way of Water" works because it's a highly rated movie. The sample movies above were not in the search index because they were too badly rated or too short. That's something we can improve upon. If you clear the cache, it'll work better right away, at the very least for all the sample movies listed above.

Post by topbanana »

I learnt to replace the colon with ratio right here on your forums. From other users. So i guess i'm not the only one.
viewtopic.php?p=55655 - Replace : colon with ∶ ratio
viewtopic.php?t=13680 - [SNIPPET] Replace Characters, Words or Patterns
Perhaps i'm the only one that's made the connection between the ratio character and emby/TheMovieDB not iding the movies. Most users don't even visit these forums, let alone post.

I have my naming convention that leaves all my media human readable, and they have simple filenames. When replacing the illegal characters that Windows doesn't like, of all of them, the ratio character is the one that is the least noticeable, the most perfect swap! (? and * look spaced out maaaan!). Emby, does now recognise most of the media straight off, so it's working fine. And once it's in, it's all good. So, nah, emby has to put up with what i've given it, it's not changing.

Emby, FileBot, etc. are pieces of software, that can, and should try to just magically accept any filename that's thrown at them. Again, if you do some smart coding once, it's fixed for everyone, forever. It just works. Which we'd guess is your goal?
rednoah wrote: 05 Feb 2025, 10:18 But your ∶ ratio replacement for the : colon does not exist in the wild, so I'm somewhat reluctant to add custom code for everyone that only benefits users who generate badly named files on purpose.
These filenames aren't so much 'badly' named, more that they're working around the limitations of Microsoft Windows' illegal characters. Again, learnt here on your forums. See above.

Code:

Halloween꞉ Resurrection (2002) (720p)
Highlander꞉ Endgame (2000) (720p)
Hellraiser꞉ Revelations (2011) (720p)
Hellraiser꞉ Hellworld (2005) (720p)
Highlander꞉ The Source (2007) (720p)
How to Train Your Dragon꞉ Homecoming (2019) (720p)
I'd say that they look to be the best, almost the simplest form... The movie name, the movie year, the movie's resolution. Everyone would look at these and know exactly what they're looking at. They don't look badly generated to us. They look indistinguishable from the original movie name.
rednoah wrote: 05 Feb 2025, 10:18 FileBot does strip certain patterns. Assuming that there's no movie that actually uses ∶ ratio as part of the movie name, we could strip ∶ ratio as well. But your ∶ ratio replacement for the : colon does not exist in the wild, so I'm somewhat reluctant to add custom code for everyone that only benefits users who generate badly named files on purpose. That said, if the issue comes up more often from a variety of users over time, that would change the calculus.
Correct, no movie actually uses ∶ ratio as part of the movie name... This isn't the issue... It's that they do use the colon : character, which is illegal on Windows. And us FileBot users commonly use your code to replace it. FileBot/TheMovieDB/et al. are mostly searching using filename versions of movie names, not the strict, verbatim original movie name.
I've been using the ratio character for a few years, and only a few days ago did i twig that it's the thing that causes FileBot/TheMovieDB to not id a few movies. So again, you're probably not going to get a flood of users +1ing this forum thread, as it is just a niche, edge-case, but one that does exist, is easy to reproduce, and i was guessing is easy to fix. I thought stripping punctuation from a search string would be common.
Software exists to do clever, complex, repetitive things for us. And we write it to be as flexible as possible, to make the experience as smooth and as perfect as possible. Whatever we throw at it. It just works.

search.php?keywords=illegal+character&t ... mit=Search - FileBot forums Search: 'illegal character'
search.php?keywords=replace+character&t ... mit=Search - FileBot forums Search: 'replace character'

So, if it does strip certain patterns, have it strip one more character? And FileBot gets even more polished!

Post by rednoah »

rednoah wrote: 05 Feb 2025, 10:18 :idea: I have also investigated the sample movies above. "Avatar꞉ The Way of Water" works because it's a highly rated movie. The sample movies above were not in the search index because they were too badly rated or too short. That's something we can improve upon. If you clear the cache, it'll work better right away, at the very least for all the sample movies listed above.
The solution we have implemented works for all of the test cases listed above:
Screenshot


:arrow: Please run filebot -clear-cache once (or wait 1 month) and try again. If you find additional file paths that still do not work, please paste them here and we'll have a look.

Post by topbanana »

Running FileBot 5.1.6 (r10435)
Cleared the cache with Ctrl + Shift + Del
Restarted Filebot.
Throwing in some documentary films, and it has issues with most of, but not all the files with the ratio colon replacement character.
Is there a newer version with the change?

rednoah wrote: 05 Feb 2025, 10:18 :arrow: I'd use Plain File Mode to strip all the ∶ ratio characters from all the file names:

Code:

{ fn.replace('꞉', '') }
and then use Movie Mode in a second step to name / organise everything correctly with the {emby.id} binding once and for all:

Code:

{ emby.id }
So, according to 'Everything', i have 7,800 folders and media files with the ratio character! :-D
They look perfect, they conform to Windows' illegal character limitations. So they're staying.

The only issue i get with them is FileBot. Where i got the idea to use them from (from the forums).

If FileBot already strips out stuff when submitting to the databases, just strip out this character, or perhaps all the replacement characters that FileBot users commonly use. The ratio character has been shown to cause problems, is reproducible, and isn't a surprise. So adding these changes to FileBot will make FileBot just work more often, for its users.

Post by rednoah »

topbanana wrote: 14 Feb 2025, 04:57 Throwing in some documentary films, and it has issues with most of, but not all the files with the ratio colon replacement character.
Please paste a few file paths so that we can run tests and confirm. You can press F7 to copy & paste file paths as text that are currently loaded into FileBot.

Post by topbanana »

Code:

Documentary Films\'85꞉ The Greatest Team in Football History (2016) (720p).mkv	
Documentary Films\13 Lost꞉ The Untold Story of the Thai Cave Rescue (2020) (720p).mkv	
Documentary Films\24 7꞉ Kelly Slater (2019) (720p).mkv	
Documentary Films\27꞉ Gone Too Soon (2018) (720p).mkv	
Documentary Films\30 Years of Garbage꞉ The Garbage Pail Kids Story (2017) (720p).mkv	
Documentary Films\1972꞉ Munich's Black September (1972 - Münchens schwarzer September) (2022) (720p).mkv	
Documentary Films\2012꞉ Time for Change (2010).mkv	
Documentary Films\2022꞉ The Year from Space (2023) (720p).mkv	
Documentary Films\Accidental Courtesy꞉ Daryl Davis, Race & America (2016) (720p).mkv	
Documentary Films\A Disturbance in the Force꞉ How the Star Wars Holiday Special Happened (2023) (720p).mkv	
Documentary Films\Adrenaline Rush꞉ The Science of Risk (2002) (720p).mkv	
Documentary Films\Africa꞉ The Serengeti (1994) (720p).mkv	
Documentary Films\Alaska꞉ Spirit of the Wild (1998) (720p).mkv	
Documentary Films\A Life of Endless Summers꞉ The Bruce Brown Story (2020) (720p).mkv	
Documentary Films\A Life of Speed꞉ The Juan Manuel Fangio Story (Fangio, el hombre que domaba las máquinas) (2020) (720p).mkv	
Documentary Films\American Boy꞉ A Profile of Steven Prince (1978) (720p).mkv	
Documentary Films\Antarctica꞉ An Adventure of a Different Nature (1991) (720p).mkv	
Documentary Films\Antarctica꞉ A Year on Ice (2013) (720p).mkv	
Documentary Films\APEX꞉ The Story of the Hypercar (2016) (720p).mkv	
Documentary Films\Apollo꞉ The Forgotten Films (2019) (720p).mkv	
Documentary Films\Atomic Hope꞉ Inside the Pro-Nuclear Movement (2023) (720p).mkv	
Documentary Films\Atomic꞉ Living in Dread and Promise (2015) (720p).mkv	
Documentary Films\A Year in the Ice꞉ The Arctic Drift (2021) (720p).mkv	
Oh, and yes, since these are all in the "Documentary Films" folder, all the 'results' that it suggested are almost all the same selection of Oscar awards related stuff!!! lol.

F7???!!!!
Please document these useful features!!!

Post by rednoah »

topbanana wrote: 14 Feb 2025, 05:49 F7???!!!!
Please document these useful features!!!
How to Request Help and Q: Does FileBot have keyboard shortcuts? would ideally get you started with learning about useful keyboard shortcuts and debug features. We could add additional links to the docs if you were to point out a page where an additional link would be helpful.

Post by rednoah »

The story deepens! The sample paths you posted use “꞉” U+A789 Modifier Letter Colon and not “∶” U+2236 Ratio. I was only aware of the latter ∶ ratio replacement. Looks like there's not one but two “:” U+003A Colon lookalike characters!
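Lookalike characters like these are hard to tell apart by eye, so printing their code points helps. A minimal Python sketch using only the standard library (the helper name `describe` is made up for illustration and has nothing to do with FileBot):

```python
import unicodedata


def describe(ch: str) -> str:
    """Return code point, Unicode name and general category for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"


# ASCII colon, Ratio, Modifier Letter Colon
for ch in (":", "\u2236", "\uA789"):
    print(describe(ch))
```

The three characters fall into three different general categories (Po, Sm and Sk), which is one reason a search backend may treat them differently.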



I ran a few tests and here are my findings:

✅ TMDB website search with ∶ Ratio works: Atomic∶ Living in Dread and Promise

✅ TMDB website search with ꞉ Modifier Letter Colon works too: Atomic꞉ Living in Dread and Promise

✅ TMDB API search with ∶ Ratio works:

Console Output:

$ filebot -list --db TheMovieDB --q "Atomic∶ Living in Dread and Promise"
Atomic: Living in Dread and Promise (2015)

❌ TMDB API search with ꞉ Modifier Letter Colon does not work:

Console Output:

$ filebot -list --db TheMovieDB --q "Atomic꞉ Living in Dread and Promise"
No search results


:arrow: I will run more tests and strip the ꞉ Modifier Letter Colon character, possibly the entire “Modifier Symbol” character class.

Post by topbanana »

rednoah wrote: 14 Feb 2025, 06:32 The story deepens! The sample paths you posted use “꞉” U+A789 Modifier Letter Colon and not “∶” U+2236 Ratio. I was only aware of the latter ∶ ratio replacement. Looks like there's not one but two “:” U+003A Colon lookalike characters!
Ah...
I do seem to remember trying a different one before...
I think i got it from the forums ages ago...
Or possibly from just searching Charmap...
rednoah wrote: 14 Feb 2025, 06:32 I ran a few tests and here are my findings:

✅ TMDB website search with ∶ Ratio works: Atomic∶ Living in Dread and Promise

✅ TMDB website search with ꞉ Modifier Letter Colon works too: Atomic꞉ Living in Dread and Promise

✅ TMDB API search with ∶ Ratio works:

Console Output:

$ filebot -list --db TheMovieDB --q "Atomic∶ Living in Dread and Promise"
Atomic: Living in Dread and Promise (2015)

❌ TMDB API search with ꞉ Modifier Letter Colon does not work:

Console Output:

$ filebot -list --db TheMovieDB --q "Atomic꞉ Living in Dread and Promise"
No search results
Sometimes they do id, yes...
rednoah wrote: 14 Feb 2025, 06:32 :arrow: I will run more tests and strip the ꞉ Modifier Letter Colon character, possibly the entire “Modifier Symbol” character class.
If you just have a function that strips out characters... then you can just add to it as we discover another. But stripping out all the popular ones can only help 99%. And we don't mind if there's the odd movie that it can't id for mysterious reasons... But we do if it doesn't id dozens, hundreds, thousands, all for the same reason!
It seems obvious that if we just send normal words to the dbs, they'll more likely id it... There aren't many/any movies that absolutely rely on any of these modifier letter characters!

So hopefully, this should just fix it, for good!

Post by topbanana »

Thinking about it, i think i changed from “∶” U+2236 Ratio, to “꞉” U+A789 Modifier Letter Colon, as it looks more natural.
So yeah, i don't think i use 'Ratio'

Here's a few more! lol.
https://www.amp-what.com/unicode/search/colon

Post by rednoah »

FileBot r10503 (see Latest Beta Revisions and Release Candidates) now takes care of stripping the entire \p{Sk} character class in addition to the \p{Punct} character class. This change makes all the Documentary Films sample files above work as expected.
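For readers who want to see roughly what stripping these two character classes does to a query, here is a minimal Python sketch. It is an illustration only, not FileBot's actual code. In Java regular expressions, \p{Sk} is the Unicode "Modifier Symbol" category and \p{Punct} matches ASCII punctuation by default. Python's `re` module has no \p{...} classes, so the sketch uses `unicodedata` and `string.punctuation` instead. Replacing each stripped character with a space is an assumption, as is the function name `strip_symbols`.

```python
import string
import unicodedata

# ASCII punctuation, standing in for Java's default \p{Punct}.
ASCII_PUNCT = set(string.punctuation)


def strip_symbols(query: str) -> str:
    """Replace ASCII punctuation and Sk characters with spaces, then collapse whitespace."""
    cleaned = "".join(
        " " if ch in ASCII_PUNCT or unicodedata.category(ch) == "Sk" else ch
        for ch in query
    )
    return " ".join(cleaned.split())


print(strip_symbols("Atomic꞉ Living in Dread and Promise"))
```

Note that “∶” U+2236 Ratio is in the Sm (Math Symbol) category, so this sketch leaves it alone. According to the tests earlier in this thread, the TMDB API search already handles that character.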

Post by topbanana »

rednoah wrote: 14 Feb 2025, 13:12 FileBot r10503 (see Latest Beta Revisions and Release Candidates) now takes care of stripping the entire \p{Sk} character class in addition to the \p{Punct} character class. This change makes all the Documentary Films sample files above work as expected.
I'd love to test it... But...
Guess who bought it from Microsoft Store! (I so wish i hadn't... It's the ONLY app i need the Microsoft Store for!)
May i ask for a trial licence please :D

Post by rednoah »

Sent. Happy testing.

Post by topbanana »

It just works... Perfectly as expected! 8-)

Thanks!!!

And thanks for the testing licence, too!