Filebot, TheMovieDB, etc. don't like colons
I, like many, have set FileBot to replace colon characters (:) with one of the lookalike replacements, such as ꞉.
But if I drop a few movies containing these colon replacements into FileBot and have it 'Fetch & Match data' from TheMovieDB, to check for misnamed files, updated names, or year changes, then it mostly pops up with 'Failed to identify some of the following files:'. It's almost always the files with the colon replacement in the filename, and it often lists wildly wrong candidate movies, and often not the actual movie.
If I replace these colon replacement characters with hyphens, the 'normal' replacement character, then all these movies are ID'd perfectly.
Could FileBot just strip this 'problem' character (or characters) out of the search string it sends to TheMovieDB, etc.?
Re: Filebot, TheMovieDB, etc. don't like colons

Notably, I cannot reproduce the problem with the first test case I came up with:
Code:
Avatar꞉ The Way of Water.mkv


Re: Filebot, TheMovieDB, etc. don't like colons
Here are a few:
For these, I tried renaming them, replacing the colon with a hyphen, and they were all ID'd straight away (and then FileBot renamed them back, lol!)
Code:
Halloween꞉ Resurrection (2002) (720p)
Highlander꞉ Endgame (2000) (720p)
Hellraiser꞉ Revelations (2011) (720p)
Hellraiser꞉ Hellworld (2005) (720p)
Highlander꞉ The Source (2007) (720p)
How to Train Your Dragon꞉ Homecoming (2019) (720p)
Cheers.
Re: Filebot, TheMovieDB, etc. don't like colons
rednoah wrote (05 Feb 2025, 03:16):
If you have already matched the files, then xattr metadata should take care of identifying the file, not the file name. Does xattr metadata not work on your target file system? See "Re-process previously organised files using local xattr metadata" for details and examples.
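A quick way to answer "does xattr metadata work on the target file system?" from the JVM side is the standard NIO capability check below. This is a minimal sketch with a hypothetical class name (`XattrSupport`), not FileBot code, and it assumes that FileBot's xattr storage depends on the volume supporting user-defined file attributes, which this thread does not confirm. Volumes without that capability (exFAT and FAT32 sticks are the usual suspects) would be expected to report `false`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.UserDefinedFileAttributeView;

public class XattrSupport {

    // Ask the file store (volume) that holds `path` whether it supports
    // user-defined extended attributes. Assumption: this is a reasonable
    // proxy for whether xattr metadata can be stored on that volume.
    static boolean supportsUserXattr(Path path) throws IOException {
        return Files.getFileStore(path)
                    .supportsFileAttributeView(UserDefinedFileAttributeView.class);
    }

    public static void main(String[] args) throws IOException {
        // Check the folder given on the command line, or the current directory.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(path.toAbsolutePath() + " -> user xattr supported: " + supportsUserXattr(path));
    }
}
```

For example, `java XattrSupport.java "D:\Movies"` (Java 11+ single-file launch) checks the volume that holds the movie folder.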

EDIT:

Re: Filebot, TheMovieDB, etc. don't like colons
Yeah, I see that FileBot adds the xattr metadata.
I can't remember if I had cleared them all off the HDD a while ago, as it used to pop up the annoying warning about losing attributes when copying files to exFAT/FAT32 USB sticks... And I do remember setting '-no-xattr' for FileBot at the time! lol.
RE: .nfos... Nah, I just have a clean folder full of JUST the movies, nothing else. And yes, I know I miss out on stuff... I use Emby, and if I clean install and forget to back up my Emby folders, I have to start from scratch.
But...
I was thinking that if FileBot could simply strip out 'troublesome' characters that confuse TheMovieDB, et al., then the issue would be solved for everyone, not just me.
For me, I'm just doing some occasional housekeeping on my movie collection, which is how I noticed it.
Does FileBot already 'prepare' the filenames before submitting them to the databases?
I'd guess that stripping out, or even replacing, characters wouldn't be an uncommon thing when dealing with databases like these?
The same goes for Emby's difficulties in ID'ing movies from filenames alone... It seems smarter to fix it once in software, and then it's fixed for everyone, forever.
Re: Filebot, TheMovieDB, etc. don't like colons
While this is doable, the issue at hand seems to be quite isolated to your fairly unique combination of circumstances: using ∶ ratio characters is itself rare and very much not recommended, and it's only a problem if you have no xattr, no NFO files, and no ID markers, any of which would have compensated for bad naming. "Avatar꞉ The Way of Water" works despite everything, for some reason.
FileBot does strip certain patterns. Assuming that no movie actually uses ∶ ratio as part of its name, we could strip ∶ ratio as well. But your ∶ ratio colon replacement does not exist in the wild, so I'm somewhat reluctant to add custom code for everyone that only benefits users who generate badly named files on purpose. That said, if the issue comes up more often from a variety of users over time, that would change the calculus.
If you are using Emby, then you really, really must name files correctly, with correct file and folder names, movie folders, NFO files, ID markers, etc., even if that's not your first preference. The {emby.id} binding makes it easy. You can then always generate a secondary structure from the primary, well-named structure using hardlinks for your own viewing. That takes little time and no additional disk space, so you can have files the way Emby wants them and the way you want them.

Format:
{ fn.replace('꞉', '') }
Format:
{ emby.id }

Re: Filebot, TheMovieDB, etc. don't like colons
I learned to replace the colon with ratio right here on your forums, from other users. So I guess I'm not the only one.
viewtopic.php?p=55655 - Replace : colon with ∶ ratio
viewtopic.php?t=13680 - [SNIPPET] Replace Characters, Words or Patterns
Perhaps I'm the only one who's made the connection between the ratio character and Emby/TheMovieDB not ID'ing the movies. Most users don't even visit these forums, let alone post.
I have a naming convention that leaves all my media human-readable, with simple filenames. Of all the replacements for the characters that Windows doesn't allow, the ratio character is the least noticeable, the most perfect swap! (The ? and * replacements look spaced out, maaaan!) Emby does now recognise most of the media straight off, so it's working fine. And once it's in, it's all good. So, nah, Emby has to put up with what I've given it; that's not changing.
Emby, FileBot, etc. are pieces of software that can, and should, try to just magically accept any filename thrown at them. Again, if you do some smart coding once, it's fixed for everyone, forever. It just works. Which I'd guess is your goal?
These filenames aren't so much 'badly' named as working around Microsoft Windows' illegal filename characters, which I learned here on your forums (see above).
Code:
Halloween꞉ Resurrection (2002) (720p)
Highlander꞉ Endgame (2000) (720p)
Hellraiser꞉ Revelations (2011) (720p)
Hellraiser꞉ Hellworld (2005) (720p)
Highlander꞉ The Source (2007) (720p)
How to Train Your Dragon꞉ Homecoming (2019) (720p)
I'd say these are the best, almost the simplest, form: the movie name, the movie year, the movie's resolution. Everyone would look at these and know exactly what they're looking at. They don't look badly generated to us; they look indistinguishable from the original movie names.
Correct, no movie actually uses ∶ ratio as part of its name... That isn't the issue. The issue is that movie names do use the : colon character, which is illegal on Windows, and we FileBot users commonly use your code to replace it. FileBot/TheMovieDB/et al. are mostly searching with filename versions of movie names, not the strict, verbatim original movie names.
I've been using the ratio character for a few years, and only a few days ago did I twig that it's what causes FileBot/TheMovieDB to not ID a few movies. So again, you're probably not going to get a flood of users +1'ing this forum thread, as it is just a niche edge case, but one that does exist, is easy to reproduce, and, I was guessing, is easy to fix. I thought stripping punctuation from a search string would be common.
Software exists to do clever, complex, repetitive things for us. And we write it to be as flexible as possible, to make the experience as smooth and as perfect as possible, whatever we throw at it. It just works.
search.php?keywords=illegal+character&t ... mit=Search - FileBot forums Search: 'illegal character'
search.php?keywords=replace+character&t ... mit=Search - FileBot forums Search: 'replace character'
So, if it does strip certain patterns, have it strip one more character? And FileBot gets even more polished!
Re: Filebot, TheMovieDB, etc. don't like colons
The solution we have implemented works for all of the test cases listed above:
rednoah wrote (05 Feb 2025, 10:18):
I have also investigated the sample movies above. "Avatar꞉ The Way of Water" works because it's a highly rated movie. The sample movies above were not in the search index because they were too badly rated or too short. That's something we can improve upon. If you clear the cache it'll work better right away, at the very least for all the sample movies listed above.


Re: Filebot, TheMovieDB, etc. don't like colons
Running FileBot 5.1.6 (r10435)
Cleared the cache with Ctrl + Shift + Del
Restarted FileBot.
Throwing in some documentary films, it has issues with most, but not all, of the files with the ratio colon replacement character.
Is there a newer version with the change?

They look perfect, and they conform to Windows' illegal character limitations. So they're staying.
The only issue I get with them is with FileBot, which is where I got the idea to use them (from the forums).
If FileBot already strips out stuff when submitting to the databases, just strip out this character too, or perhaps all the common replacement characters that FileBot users commonly use. The ratio character has been shown to cause problems, is reproducible, and isn't a surprise. So adding these changes will make FileBot just work more often, for its users.
So, according to 'Everything', I have 7,800 folders and media files with the ratio character!
rednoah wrote:
I'd use Plain File Mode to strip all the ∶ ratio characters from all the file names:
Code:
{ fn.replace('꞉', '') }
and then use Movie Mode in a second step to name / organise everything correctly with the {emby.id} binding once and for all:
Code:
{ emby.id }

Re: Filebot, TheMovieDB, etc. don't like colons
Please paste a few file paths so that we can run tests and confirm. You can press F7 to copy the paths of the files currently loaded into FileBot as text.
Re: Filebot, TheMovieDB, etc. don't like colons
Code:
Documentary Films\'85꞉ The Greatest Team in Football History (2016) (720p).mkv
Documentary Films\13 Lost꞉ The Untold Story of the Thai Cave Rescue (2020) (720p).mkv
Documentary Films\24 7꞉ Kelly Slater (2019) (720p).mkv
Documentary Films\27꞉ Gone Too Soon (2018) (720p).mkv
Documentary Films\30 Years of Garbage꞉ The Garbage Pail Kids Story (2017) (720p).mkv
Documentary Films\1972꞉ Munich's Black September (1972 - Münchens schwarzer September) (2022) (720p).mkv
Documentary Films\2012꞉ Time for Change (2010).mkv
Documentary Films\2022꞉ The Year from Space (2023) (720p).mkv
Documentary Films\Accidental Courtesy꞉ Daryl Davis, Race & America (2016) (720p).mkv
Documentary Films\A Disturbance in the Force꞉ How the Star Wars Holiday Special Happened (2023) (720p).mkv
Documentary Films\Adrenaline Rush꞉ The Science of Risk (2002) (720p).mkv
Documentary Films\Africa꞉ The Serengeti (1994) (720p).mkv
Documentary Films\Alaska꞉ Spirit of the Wild (1998) (720p).mkv
Documentary Films\A Life of Endless Summers꞉ The Bruce Brown Story (2020) (720p).mkv
Documentary Films\A Life of Speed꞉ The Juan Manuel Fangio Story (Fangio, el hombre que domaba las máquinas) (2020) (720p).mkv
Documentary Films\American Boy꞉ A Profile of Steven Prince (1978) (720p).mkv
Documentary Films\Antarctica꞉ An Adventure of a Different Nature (1991) (720p).mkv
Documentary Films\Antarctica꞉ A Year on Ice (2013) (720p).mkv
Documentary Films\APEX꞉ The Story of the Hypercar (2016) (720p).mkv
Documentary Films\Apollo꞉ The Forgotten Films (2019) (720p).mkv
Documentary Films\Atomic Hope꞉ Inside the Pro-Nuclear Movement (2023) (720p).mkv
Documentary Films\Atomic꞉ Living in Dread and Promise (2015) (720p).mkv
Documentary Films\A Year in the Ice꞉ The Arctic Drift (2021) (720p).mkv
F7???!!!!
Please document these useful features!!!
Re: Filebot, TheMovieDB, etc. don't like colons
"How to Request Help" and "Q: Does FileBot have keyboard shortcuts?" should get you started with useful keyboard shortcuts and debug features. We could add additional links to the docs if you point out a page where an additional link would be helpful.
Re: Filebot, TheMovieDB, etc. don't like colons
The story deepens! The sample paths you posted use “꞉” U+A789 Modifier Letter Colon and not “∶” U+2236 Ratio. I was only aware of the latter ∶ ratio replacement. Looks like there's not one but two “:” U+003A Colon lookalike characters!
I ran a few tests and here are my findings:
TMDB website search with ∶ Ratio works: Atomic∶ Living in Dread and Promise
TMDB website search with ꞉ Modifier Letter Colon works too: Atomic꞉ Living in Dread and Promise
TMDB API search with ∶ Ratio works:
Console Output:
$ filebot -list --db TheMovieDB --q "Atomic∶ Living in Dread and Promise"
Atomic: Living in Dread and Promise (2015)
TMDB API search with ꞉ Modifier Letter Colon does not work:
Console Output:
$ filebot -list --db TheMovieDB --q "Atomic꞉ Living in Dread and Promise"
No search results
I will run more tests and strip the ꞉ Modifier Letter Colon character, possibly the entire “Modifier Symbol” character class.
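To see why the two lookalikes are treated differently, the minimal Java sketch below (hypothetical class `ColonLookalikes`, not FileBot code) prints the code point, Unicode general category, and Unicode name of each colon variant using the standard `Character.getType` and `Character.getName` methods. U+A789 Modifier Letter Colon falls into the “Modifier Symbol” (Sk) class mentioned above, while U+2236 Ratio is a math symbol (Sm) and the ASCII colon is ordinary punctuation (Po).

```java
public class ColonLookalikes {

    // Translate the java.lang.Character type constants relevant here into
    // the two-letter Unicode general category abbreviations.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.OTHER_PUNCTUATION: return "Po";
            case Character.MATH_SYMBOL:       return "Sm";
            case Character.MODIFIER_SYMBOL:   return "Sk";
            case Character.MODIFIER_LETTER:   return "Lm";
            default:                          return "other";
        }
    }

    // Format one character as "U+XXXX <category> <Unicode name>".
    static String describe(int codePoint) {
        return String.format("U+%04X %s %s", codePoint, category(codePoint), Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // ASCII colon, the Ratio replacement, and the Modifier Letter Colon replacement
        int[] colons = { 0x003A, 0x2236, 0xA789 };
        for (int cp : colons) {
            System.out.println(describe(cp));
        }
    }
}
```

Run with `java ColonLookalikes.java` (Java 11 or later); it prints one line per character, for example `U+A789 Sk MODIFIER LETTER COLON`.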

Re: Filebot, TheMovieDB, etc. don't like colons
Ah...
I do seem to remember trying a different one before...
I think I got it from the forums ages ago...
Or possibly from just searching Charmap...
Sometimes they do ID, yes...
If you just have a function that strips out characters, then you can just add to it as we discover more. Stripping out all the popular ones can only help, and would cover 99% of cases. And we don't mind the odd movie that it can't ID for mysterious reasons... But we do mind if it doesn't ID dozens, hundreds, thousands, all for the same reason!
It seems obvious that if we just send normal words to the databases, they'll be more likely to ID the movie... There aren't many, if any, movies that absolutely rely on any of these modifier letter characters!
So hopefully, this should just fix it, for good!
Re: Filebot, TheMovieDB, etc. don't like colons
Thinking about it, I think I changed from “∶” U+2236 Ratio to “꞉” U+A789 Modifier Letter Colon, as it looks more natural.
So yeah, I don't think I use 'Ratio'.
Here are a few more! lol.
https://www.amp-what.com/unicode/search/colon
Re: Filebot, TheMovieDB, etc. don't like colons
FileBot r10503 (see Latest Beta Revisions and Release Candidates) now takes care of stripping the entire \p{Sk} character class in addition to the \p{Punct} character class. This change makes all the Documentary Films sample files above work as expected.
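As a rough illustration of what stripping both character classes from a search query can look like, here is a minimal Java sketch. It is not FileBot's implementation: the class name `QueryNormalizer`, replacing each stripped character with a space, and collapsing whitespace afterwards are all assumptions. One detail worth knowing: in `java.util.regex`, `\p{Punct}` covers only ASCII punctuation by default, so a non-ASCII lookalike such as U+A789 (category Sk) is caught only by the additional `\p{Sk}` class.

```java
import java.util.regex.Pattern;

public class QueryNormalizer {

    // ASCII punctuation (\p{Punct} is ASCII-only in java.util.regex by default)
    // plus every Unicode "Modifier Symbol" (\p{Sk}), e.g. U+A789 MODIFIER LETTER COLON.
    private static final Pattern STRIP = Pattern.compile("[\\p{Punct}\\p{Sk}]");

    // Runs of whitespace, used to collapse the gaps left behind.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Hypothetical helper: replace stripped characters with a space,
    // then collapse repeated whitespace and trim both ends.
    static String normalize(String query) {
        String stripped = STRIP.matcher(query).replaceAll(" ");
        return SPACES.matcher(stripped).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String[] queries = {
            "Atomic\uA789 Living in Dread and Promise",          // Modifier Letter Colon (Sk): stripped
            "Atomic\u2236 Living in Dread and Promise",          // Ratio (Sm): left as-is
            "'85\uA789 The Greatest Team in Football History"    // apostrophe (Punct) and Sk: both stripped
        };
        for (String q : queries) {
            System.out.println(normalize(q));
        }
    }
}
```

With these rules, `Atomic꞉ Living in Dread and Promise` becomes `Atomic Living in Dread and Promise`, while the ∶ Ratio variant passes through untouched because Ratio is a math symbol (Sm), not Sk; per the console tests earlier in the thread, the TMDB API already finds that one.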
Re: Filebot, TheMovieDB, etc. don't like colons
I'd love to test it... But...
Guess who bought it from the Microsoft Store! (I so wish I hadn't... It's the ONLY app I need the Microsoft Store for!)
May I ask for a trial licence, please?

Re: Filebot, TheMovieDB, etc. don't like colons
Sent. Happy testing.
Re: Filebot, TheMovieDB, etc. don't like colons
It just works... Perfectly as expected!
Thanks!!!
And thanks for the testing licence, too!
