Detecting language in filename
Posted: 03 Apr 2016, 00:46
by jeff1326
I am trying to extract the spoken language from the filename, so I wrote this as a proof of concept:
Code:
{fn.space('..').matchAll(/(?i)(?:\.)(VFI?|FR|ENGLISH|EN|SPANISH|SPA)(?:\.|$)/)}
If I test it with this fake filename:
Virgin..Suicides..vfi..vf..1999..1080p..FR..EN..X264..AC3-mHDgz
On https://regex101.com/r/aK9fB1/2, I get the expected behavior.
On http://regexr.com/3d4vj, the result seems to match FileBot's behavior.
In FileBot, I get this result:
[.FR., .EN.]
I used a non-capturing group for the dots, so why do I get the dots in the result?
What did I miss?
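A non-capturing group (?:...) still consumes the text it matches. The only difference is that this text does not get a group number of its own. So the dots are part of the overall match (group 0) but not of capturing group 1, and what you see depends on which group the calling API returns. Below is a minimal check with plain java.util.regex on the fake filename above. It is not FileBot's matchAll, and the helper name collect is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // The proof-of-concept pattern from the first post; (?i) makes it case-insensitive.
    static final Pattern LANGUAGE =
            Pattern.compile("(?i)(?:\\.)(VFI?|FR|ENGLISH|EN|SPANISH|SPA)(?:\\.|$)");

    // Hypothetical helper: returns the requested group of every successive match.
    static List<String> collect(String text, int group) {
        List<String> result = new ArrayList<>();
        Matcher m = LANGUAGE.matcher(text);
        while (m.find()) {
            result.add(m.group(group));
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "Virgin..Suicides..vfi..vf..1999..1080p..FR..EN..X264..AC3-mHDgz";
        System.out.println(collect(name, 0)); // [.vfi., .vf., .FR., .EN.]  whole match, dots included
        System.out.println(collect(name, 1)); // [vfi, vf, FR, EN]          capturing group only
    }
}
```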
Re: Detecting language in filename
Posted: 03 Apr 2016, 00:58
by jeff1326
I saw that rednoah made matchAll case-insensitive by default!
And I found that the following code works as expected:
Code:
{fn.space('..').matchAll(/(?:\.)VFI?|FR|ENGLISH|EN|SPANISH|SPA(?:\.|$)/)}
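One caveat about this version: the alternatives are not wrapped in a group, and `|` has the lowest precedence. The pattern therefore reads as `(?:\.)VFI?` or `FR` or `ENGLISH` or `EN` or `SPANISH` or `SPA(?:\.|$)`. Only the first alternative requires a leading dot and only the last requires a trailing one, so FR and EN are matched anywhere, even inside ordinary words.

Below is a quick check with java.util.regex. The (?i) flag is added explicitly because Java is case-sensitive by default, and the helper name matches is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationPrecedence {
    // Parses as: (?:\.)VFI?  |  FR  |  ENGLISH  |  EN  |  SPANISH  |  SPA(?:\.|$)
    static final Pattern UNGROUPED =
            Pattern.compile("(?i)(?:\\.)VFI?|FR|ENGLISH|EN|SPANISH|SPA(?:\\.|$)");

    // Hypothetical helper: returns the whole match (group 0) of every successive match.
    static List<String> matches(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = UNGROUPED.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The leading dot is still part of the VF matches.
        System.out.println(matches("Virgin..Suicides..vfi..vf..1999..1080p..FR..EN..X264..AC3-mHDgz"));
        // prints [.vfi, .vf, FR, EN]

        // FR and EN are found inside an ordinary word.
        System.out.println(matches("Friends.S01E01")); // [Fr, en]
    }
}
```

Wrapping the alternatives in a group and anchoring them with \b, as later posts in this thread do, avoids both effects.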
Re: Detecting language in filename
Posted: 03 Apr 2016, 03:27
by rednoah
Fixed, so String.matchAll() should work more like String.match() now.
Re: Detecting language in filename
Posted: 03 Apr 2016, 14:48
by jeff1326
Which version of FileBot will include this fix?
Re: Detecting language in filename
Posted: 03 Apr 2016, 18:24
by rednoah
Re: Detecting language in filename
Posted: 04 Apr 2016, 00:54
by jeff1326
So I came up with this:
Code:
{(any{audios.language}{[]} + any{fn.replaceAll('-','.').toUpperCase().space('..').matchAll(/(?:\.)(VFI?|FR|TRUEFRENCH|FRENCH|ENGLISH|EN|SPA|SPANISH)(?:\.|$)/)}{[]}).join('-').toUpperCase().tokenize('-').sort().unique()}
It sometimes gives this result when no language is found in the filename and no metadata is found either:
I tried a few things based on what I saw in this post, without success.
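The any{...}{[]} wrappers in the expression above try their closures in order and use the next one when the previous one fails. That is how the whole expression can end up as an empty list.

Below is a rough Java analogue. It assumes that "fails" means throwing an exception or yielding null; that is an assumption about FileBot's any, not a description of its source. The helper name any is hypothetical:

```java
import java.util.List;
import java.util.function.Supplier;

class AnyFallback {
    // Hypothetical helper, NOT FileBot's implementation: returns the first candidate
    // that neither throws nor yields null, or null if every candidate fails.
    @SafeVarargs
    static <T> T any(Supplier<? extends T>... candidates) {
        for (Supplier<? extends T> candidate : candidates) {
            try {
                T value = candidate.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                // A failing candidate (e.g. a binding that cannot be read) is skipped.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The first candidate stands in for a binding that throws; the second is the [] fallback.
        List<String> languages = AnyFallback.<List<String>>any(
                () -> { throw new IllegalStateException("Cannot open media file"); },
                () -> List.of());
        System.out.println(languages); // []
    }
}
```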
Re: Detecting language in filename
Posted: 04 Apr 2016, 07:05
by rednoah
1. Yes, your code can be expected to work this way: if something throws an exception, then the result will be an empty list [].
2. It may be obvious to you, but I can't know what you're trying to achieve. No result instead of an empty list? You probably don't want to use any.
Your format seems overly complicated. Try thinking like this instead:
Code:
{[audios.language*.toLocale().displayName, fn.matchAll(/\b(FRENCH|ENGLISH)\b/)].flatten()*.upper().sort().unique() ?: null}
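The suggestion above works in a few steps:

- Put both sources in one list and flatten it.
- Upper-case, sort and de-duplicate the entries.
- Turn an empty result into null. An empty Groovy list is falsy, so ?: null replaces it.

Below is a minimal Java sketch of that pipeline. It assumes the audio languages are already display names such as "French", and the class and method names are hypothetical. FileBot's matchAll fails with "Pattern not found" when nothing matches, as reported in the next post. This sketch differs: it simply treats no match as contributing nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MergeLanguages {
    // Same pattern as the suggestion above; case-insensitive, as reported for matchAll in this thread.
    private static final Pattern FILENAME_LANGUAGE =
            Pattern.compile("\\b(FRENCH|ENGLISH)\\b", Pattern.CASE_INSENSITIVE);

    // Returns sorted, de-duplicated, upper-cased languages, or null when there are none.
    static List<String> languages(List<String> audioLanguageNames, String fileName) {
        List<String> all = new ArrayList<>(audioLanguageNames);
        Matcher m = FILENAME_LANGUAGE.matcher(fileName);
        while (m.find()) {
            all.add(m.group(1));
        }
        List<String> result = all.stream()
                .map(s -> s.toUpperCase(Locale.ROOT))
                .sorted()
                .distinct()
                .collect(Collectors.toList());
        return result.isEmpty() ? null : result;
    }

    public static void main(String[] args) {
        System.out.println(languages(List.of("French"), "Movie.2015.FRENCH.ENGLISH.1080p")); // [ENGLISH, FRENCH]
        System.out.println(languages(List.of(), "Robot.Jox.1989")); // null
    }
}
```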
Re: Detecting language in filename
Posted: 05 Apr 2016, 01:38
by jeff1326
1. "?: null" is the answer.
2. I use any so that I can add the audio language list to the language list from the filename, and then call unique on the result. The goal is to find all spoken languages: since the language of some files can't be detected from the audio properties, I want to detect it from the filename.
I want a result like [FR, EN], [FR], [EN, SPA], or nothing.
I was planning to replace TRUEFRENCH, TRUFRENCH, VF, VFI, VFF, and FRENCH with FR, ENGLISH with EN, and SPANISH with SPA.
3. If I use your code, I get something like:
[net.filebot.format.BindingException: Binding "audios": Cannot open media file: Y:\Downloads\Sleeping.with.Other.People.2015.LIMITED.MULTi.1080p.BluRay.x264-LOST.mkv] Sleeping with Other People (2015)
or like this:
[java.lang.Exception: Pattern not found] Robot Jox (1989)
Neither my regex nor yours with word boundaries matches ".VF_", so I added ".space('.')".
With your word-boundary regex, I can drop ".replaceAll('-','.')".
I'm getting closer with this; I just have some trouble with this regex:
{(any{audios.language}{[]} + any{fn.space('.').upper().matchAll(/\b(?:(?:(?:TRUE?)*?(FR)(?:ENCH)*?)|(?:(EN)(?:GLISH)*?)|(?:(SPA)(?:NISH)*?)|(?:(VF)i?))\b/)}{[]}).flatten()*.upper().unique() ?: null}
I'll try again tomorrow after work; time to sleep now.
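The ".VF_" problem comes from how \b is defined. A word boundary sits between a word character and a non-word character, and the underscore counts as a word character, so there is no boundary between F and _.

Below is a minimal java.util.regex check. The replace('_', '.') call is only a hypothetical stand-in for the separator clean-up done with fn.space('.') in this thread; it is not what space() actually does:

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    private static final Pattern VF = Pattern.compile("\\bVF\\b");

    static boolean hasVf(String name) {
        return VF.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(hasVf("MOVIE.VF.1080P")); // true: '.' is not a word character
        System.out.println(hasVf("MOVIE.VF_1080P")); // false: '_' is a word character, so no boundary after F
        // Hypothetical stand-in for normalizing separators before matching.
        System.out.println(hasVf("MOVIE.VF_1080P".replace('_', '.'))); // true
    }
}
```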

Re: Detecting language in filename
Posted: 06 Apr 2016, 00:33
by jeff1326
Well, it looks like regex matching in FileBot doesn't return the same results as in PCRE, JavaScript, or Python.
See this:
https://regex101.com/r/oB9pN7/1
I was testing with filebot-r3843.jar.
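Part of the difference is which group gets reported. In the regex from the earlier post, each alternative has its own capturing group. In any single match, only the group of the alternative that actually matched is set, and the others are null.

Below is a minimal java.util.regex check on a simplified version of that pattern, with three alternatives and a hypothetical helper groupOf. It shows what collecting group 1, group 2 or group 0 yields:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationGroups {
    // Simplified: one capturing group per alternative, like the earlier regex.
    private static final Pattern P = Pattern.compile("\\b(?:(FR)|(EN)|(SPA))\\b");

    // Collects the given group of every match; a group that did not participate is null.
    static List<String> groupOf(String text, int group) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            result.add(m.group(group));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groupOf("MOVIE.FR.EN.SPA", 1)); // [FR, null, null]
        System.out.println(groupOf("MOVIE.FR.EN.SPA", 2)); // [null, EN, null]
        System.out.println(groupOf("MOVIE.FR.EN.SPA", 0)); // [FR, EN, SPA]
    }
}
```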
Re: Detecting language in filename
Posted: 06 Apr 2016, 05:08
by rednoah
The regex works the same. But if you use capturing groups, matchAll will give you the first capturing group. You probably want capturing group 0 (the whole match) for each match, so you have to call
String.matchAll(/pattern/, 0) (or use non-capturing groups (?:asdf) every time).
e.g.
Match Group 1:
Code:
{'A B C D E F'.matchAll(/\w \w (\w)/)}
Code:
{'A B C D E F'.matchAll(/\w \w (\w)/, 1)}
Match Group 0:
Code:
{'A B C D E F'.matchAll(/\w \w \w/)}
Code:
{'A B C D E F'.matchAll(/\w \w \w/, 0)}
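The default described above can be sketched in plain Java: group 1 when the pattern has capturing groups, otherwise group 0, overridable with a second argument. This is a hypothetical re-implementation for illustration, not FileBot's source. It also makes two assumptions drawn from the thread:

- Patterns are compiled case-insensitively, because matchAll was reported to be case-insensitive by default.
- It throws "Pattern not found" when nothing matches, because that is the error reported earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchAllSketch {
    // Default group: 1 if the pattern has capturing groups, otherwise 0 (the whole match).
    static List<String> matchAll(String text, String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        int defaultGroup = p.matcher("").groupCount() > 0 ? 1 : 0;
        return collect(text, p, defaultGroup);
    }

    // Explicit group, like String.matchAll(/pattern/, n).
    static List<String> matchAll(String text, String regex, int group) {
        return collect(text, Pattern.compile(regex, Pattern.CASE_INSENSITIVE), group);
    }

    private static List<String> collect(String text, Pattern p, int group) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            result.add(m.group(group));
        }
        if (result.isEmpty()) {
            throw new IllegalStateException("Pattern not found");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchAll("A B C D E F", "\\w \\w (\\w)"));    // [C, F]
        System.out.println(matchAll("A B C D E F", "\\w \\w (\\w)", 1)); // [C, F]
        System.out.println(matchAll("A B C D E F", "\\w \\w \\w"));      // [A B C, D E F]
        System.out.println(matchAll("A B C D E F", "\\w \\w \\w", 0));   // [A B C, D E F]
    }
}
```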
Re: Detecting language in filename
Posted: 08 Apr 2016, 23:57
by jeff1326
1. Finally, I ended up with this for the language detection: detect the languages present in the audio information, fall back on filename detection, and otherwise simply ignore it.
Code:
{(any{audios.language}{fn.space('.').upper().matchAll(/(?<=\bTRUE?|\b)FR(?=ENCH\b|\b)|\bEN(?=GLISH\b|\b)|\bSPA(?=NISH\b|\b)|\bVF(?=I\b|F\b|\b)/, 0)}{[]}).flatten()*.upper().flatten()*.replaceAll('VF', 'FR').unique() ?: null}
2. You should add your match-group example here:
http://www.filebot.net/naming.html
3. What does toLocale() do?
EDIT: Never mind.
viewtopic.php?f=6&t=1221&p=18527&hilit=toLocale#p18527
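The filename part of the final expression in item 1 can be approximated in plain Java:

- Match the same lookaround pattern and take group 0 of each match.
- Map VF to FR.
- Remove duplicates, keeping first-seen order.
- Return null when nothing is found.

This is a sketch, not FileBot's evaluation. It leaves out the audio-metadata branch and assumes the name is already dot-separated, which is the job fn.space('.') does in this thread. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilenameLanguages {
    // Pattern from the final expression; group 0 is used, so the lookarounds only constrain the match.
    private static final Pattern LANGUAGE = Pattern.compile(
            "(?<=\\bTRUE?|\\b)FR(?=ENCH\\b|\\b)|\\bEN(?=GLISH\\b|\\b)|\\bSPA(?=NISH\\b|\\b)|\\bVF(?=I\\b|F\\b|\\b)");

    // Returns distinct language codes in first-seen order, or null if none were found.
    static List<String> detect(String dottedName) {
        Matcher m = LANGUAGE.matcher(dottedName.toUpperCase(Locale.ROOT));
        LinkedHashSet<String> codes = new LinkedHashSet<>();
        while (m.find()) {
            codes.add(m.group().replace("VF", "FR")); // VF, VFI and VFF all map to FR
        }
        return codes.isEmpty() ? null : new ArrayList<>(codes);
    }

    public static void main(String[] args) {
        System.out.println(detect("Movie.2016.TRUEFRENCH.ENGLISH.VFF.1080p")); // [FR, EN]
        System.out.println(detect("Robot.Jox.1989")); // null
    }
}
```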
Re: Detecting language in filename
Posted: 09 Apr 2016, 00:07
by rednoah
Creates a Locale object from a language code String.
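For comparison, the standard JDK way to go from a language code to a Locale and its display name is Locale.forLanguageTag. FileBot's toLocale() may resolve codes differently, so treat this only as an analogue. The display name is pinned to English here, whereas .displayName in the earlier format uses the default locale:

```java
import java.util.Locale;

class LocaleDemo {
    // Language code -> Locale -> English display name.
    static String displayName(String languageCode) {
        return Locale.forLanguageTag(languageCode).getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(displayName("fr")); // French
        System.out.println(displayName("en")); // English
        System.out.println(displayName("es")); // Spanish
    }
}
```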