Detecting language in filename

All about user-defined episode / movie / file name format expressions
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Detecting language in filename

Post by jeff1326 »

I am trying to keep the spoken language from the filename, so I wrote this as a proof of concept:

Code: Select all

{fn.space('..').matchAll(/(?i)(?:\.)(VFI?|FR|ENGLISH|EN|SPANISH|SPA)(?:\.|$)/)}
If I run my test with this fake filename:
Virgin..Suicides..vfi..vf..1999..1080p..FR..EN..X264..AC3-mHDgz
On https://regex101.com/r/aK9fB1/2, I get the expected behavior.
On http://regexr.com/3d4vj, the result seems to match FileBot's behavior.

In FileBot I get this result:
[.FR., .EN.]
I used a non-capturing group for the dot, so why do I get the dot in the result?
What did I miss?
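
A minimal Java sketch of the effect asked about here, assuming FileBot's regex engine behaves like plain java.util.regex. A non-capturing group still consumes the characters it matches, so the dots remain part of the whole match (group 0); a lookaround only asserts that they are there. The class and method names are hypothetical, and the sample strings are shortened fragments of the fake filename.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NonCapturingDemo {
    // Collect the whole match (group 0) of every occurrence of regex in input.
    static List<String> wholeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group()); // group 0 = everything the pattern consumed
        }
        return out;
    }

    public static void main(String[] args) {
        String tags = "(VFI?|FR|ENGLISH|EN|SPANISH|SPA)";
        // Non-capturing groups still consume the dots, so they end up in group 0.
        String consuming = "(?i)(?:\\.)" + tags + "(?:\\.|$)";
        // Lookarounds check for the dots without consuming them.
        String lookaround = "(?i)(?<=\\.)" + tags + "(?=\\.|$)";

        System.out.println(wholeMatches(consuming, "1080p..FR..EN..X264")); // [.FR., .EN.]
        System.out.println(wholeMatches(consuming, "1080p.FR.EN.X264"));    // [.FR.]
        System.out.println(wholeMatches(lookaround, "1080p.FR.EN.X264"));   // [FR, EN]
    }
}
```

The second call shows why the doubled dots were needed with the consuming pattern: the trailing dot of one match is no longer available as the leading dot of the next tag.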
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Re: Detecting language in filename

Post by jeff1326 »

I saw that rednoah made matchAll case-insensitive by default! :D

And I found that the following code works as expected:

Code: Select all

{fn.space('..').matchAll(/(?:\.)VFI?|FR|ENGLISH|EN|SPANISH|SPA(?:\.|$)/)}
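
One thing worth checking in the pattern above: without a group around the alternatives, the leading (?:\.) binds only to VFI? and the trailing (?:\.|$) only to SPA, so FR, ENGLISH, EN and SPANISH can match anywhere in the name. The following Java sketch illustrates this with plain java.util.regex (assumed to behave like FileBot's engine); the filename, class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationDemo {
    // Collect the given group of every case-insensitive match of regex in input.
    static List<String> matches(String regex, String input, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String name = "FRIENDS..EN..X264"; // hypothetical filename

        // Ungrouped: parsed as  \.VFI?  |  FR  |  ENGLISH  |  EN  |  SPANISH  |  SPA(\.|$)
        String ungrouped = "(?:\\.)VFI?|FR|ENGLISH|EN|SPANISH|SPA(?:\\.|$)";
        // Grouped: the dots apply to every alternative.
        String grouped = "(?:\\.)(VFI?|FR|ENGLISH|EN|SPANISH|SPA)(?:\\.|$)";

        System.out.println(matches(ungrouped, name, 0)); // [FR, EN, EN]
        System.out.println(matches(grouped, name, 1));   // [EN]
    }
}
```

The ungrouped form picks up "FR" and "EN" from inside the title word "FRIENDS", so it may match more than intended on some filenames.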
rednoah
The Source
Posts: 23932
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Detecting language in filename

Post by rednoah »

Fixed, so String.matchAll() should work more like String.match() now.
:idea: Please read the FAQ and How to Request Help.
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Re: Detecting language in filename

Post by jeff1326 »

Which version of FileBot will include this fix?
rednoah
The Source
Posts: 23932
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Detecting language in filename

Post by rednoah »

:idea: Please read the FAQ and How to Request Help.
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Re: Detecting language in filename

Post by jeff1326 »

So I came up with:

Code: Select all

{(any{audios.language}{[]} + any{fn.replaceAll('-','.').toUpperCase().space('..').matchAll(/(?:\.)(VFI?|FR|TRUEFRENCH|FRENCH|ENGLISH|EN|SPA|SPANISH)(?:\.|$)/)}{[]}).join('-').toUpperCase().tokenize('-').sort().unique()}
This sometimes gives the following result when no language is found in the filename and no metadata is found either:

Code: Select all

[]
I tried a few things based on what I saw in this post, without success.
rednoah
The Source
Posts: 23932
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Detecting language in filename

Post by rednoah »

1.
Yes, your code can be expected to work this way:

Code: Select all

any{something}{[]}
If something throws an exception, then the result will be [], an empty list.


2.
It may be obvious to you, but I can't know what you're trying to achieve. No result instead of empty list? You probably don't wanna use any. ;)

Your format seems overly complicated. Try thinking like this instead:

Code: Select all

{[audios.language*.toLocale().displayName, fn.matchAll(/\b(FRENCH|ENGLISH)\b/)].flatten()*.upper().sort().unique() ?: null}
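
The idea behind the expression above can be sketched in plain Java: merge the audio languages with the filename matches, uppercase, sort, de-duplicate, and yield null instead of an empty list. In Groovy an empty list counts as false, which is why `?: null` turns [] into null. Everything here is a hypothetical stand-in (class name, method name, sample inputs); the audio languages are assumed to arrive as display names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LanguageListDemo {
    private static final Pattern TAG =
            Pattern.compile("\\b(FRENCH|ENGLISH)\\b", Pattern.CASE_INSENSITIVE);

    // Returns the sorted, de-duplicated, uppercased languages, or null if there are none.
    static List<String> languages(List<String> audioLanguages, String fileName) {
        TreeSet<String> result = new TreeSet<>(); // keeps entries sorted and unique
        for (String language : audioLanguages) {
            result.add(language.toUpperCase(Locale.ROOT));
        }
        Matcher m = TAG.matcher(fileName);
        while (m.find()) {
            result.add(m.group(1).toUpperCase(Locale.ROOT));
        }
        // Mirrors "?: null": an empty result becomes null.
        return result.isEmpty() ? null : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(languages(List.of("English"), "Movie.FRENCH.1080p.mkv")); // [ENGLISH, FRENCH]
        System.out.println(languages(List.of(), "Robot.Jox.1989.mkv"));              // null
    }
}
```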
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Re: Detecting language in filename

Post by jeff1326 »

1.
"?: null" is the answer :D

2.
I use any so that I can add the audio language list to the language list from the filename and then call unique on the result. The goal is to find all spoken languages; since the audio parameters of some files can't be detected, I want to detect them from the filename as well.

I want a result like [FR, EN] or [FR] or [EN, SPA] or nothing.
I was planning to replace TRUEFRENCH, TRUFRENCH, VF, VFI, VFF and FRENCH with FR, ENGLISH with EN, and SPANISH with SPA.

3.
If I use your code, I get something like:
[net.filebot.format.BindingException: Binding "audios": Cannot open media file: Y:\Downloads\Sleeping.with.Other.People.2015.LIMITED.MULTi.1080p.BluRay.x264-LOST.mkv] Sleeping with Other People (2015)
or like this:
[java.lang.Exception: Pattern not found] Robot Jox (1989)
Neither my regex nor yours with the word boundary matches ".VF_", so I added ".space('.')".
With your boundary regex, I can remove ".replaceAll('-','.')".

I'm getting closer with this, but I still have some trouble with this regex:
{(any{audios.language}{[]} + any{fn.space('.').upper().matchAll(/\b(?:(?:(?:TRUE?)*?(FR)(?:ENCH)*?)|(?:(EN)(?:GLISH)*?)|(?:(SPA)(?:NISH)*?)|(?:(VF)i?))\b/)}{[]}).flatten()*.upper().unique() ?: null}
I'll try again tomorrow after work, time to sleep now :cry:
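
Two details from this post can be sketched together in plain Java: the alias mapping from point 2, and the ".VF_" problem from point 3. In java.util.regex the underscore counts as a word character, so there is no \b between "VF" and "_"; turning underscores into dots first restores the boundary. This is only a sketch under assumptions: the class, the method names and the sample filename are hypothetical, and replace('_', '.') merely stands in for what .space('.') is used for in the post.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagNormalizeDemo {
    // Aliases listed in the post, mapped to the short code they should become.
    private static final Map<String, String> ALIASES = Map.of(
            "TRUEFRENCH", "FR", "TRUFRENCH", "FR", "VF", "FR", "VFI", "FR",
            "VFF", "FR", "FRENCH", "FR", "ENGLISH", "EN", "SPANISH", "SPA");

    private static final Pattern TAG = Pattern.compile(
            "\\b(TRUEFRENCH|TRUFRENCH|FRENCH|VFI|VFF|VF|FR|ENGLISH|EN|SPANISH|SPA)\\b");

    // True if "VF" appears as a whole word according to \b.
    static boolean findsVf(String name) {
        return Pattern.compile("\\bVF\\b").matcher(name).find();
    }

    // Detect tags in the filename and map each one to its short code, without duplicates.
    static List<String> detect(String fileName) {
        // '_' is a word character, so make it a separator before matching.
        String name = fileName.toUpperCase(Locale.ROOT).replace('_', '.');
        LinkedHashSet<String> codes = new LinkedHashSet<>();
        Matcher m = TAG.matcher(name);
        while (m.find()) {
            String tag = m.group(1);
            codes.add(ALIASES.getOrDefault(tag, tag)); // FR, EN, SPA map to themselves
        }
        return new ArrayList<>(codes);
    }

    public static void main(String[] args) {
        System.out.println(findsVf("MOVIE.VF_1080P"));            // false
        System.out.println(findsVf("MOVIE.VF.1080P"));            // true
        System.out.println(detect("Movie.VF_1080p.ENGLISH.mkv")); // [FR, EN]
    }
}
```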
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Re: Detecting language in filename

Post by jeff1326 »

Well, it looks like the regex in FileBot doesn't return the same results as in PCRE, JS or Python.

See this: https://regex101.com/r/oB9pN7/1

I was testing with filebot-r3843.jar.
rednoah
The Source
Posts: 23932
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Detecting language in filename

Post by rednoah »

The regex works the same. But matchAll will give you the first capturing group if you use capturing groups. You probably want capturing group 0 for each match, so you have to call String.matchAll(/pattern/, 0) (or use non-capturing groups (?:asdf) every time).

e.g.

Match Group 1:

Code: Select all

{'A B C D E F'.matchAll(/\w \w (\w)/)}

Code: Select all

{'A B C D E F'.matchAll(/\w \w (\w)/, 1)}
Match Group 0:

Code: Select all

{'A B C D E F'.matchAll(/\w \w \w/)}

Code: Select all

{'A B C D E F'.matchAll(/\w \w \w/, 0)}
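
For readers who want to reproduce the difference outside FileBot, here is a minimal Java sketch of a matchAll-style helper that takes an explicit group index. It is a hypothetical stand-in built on java.util.regex, not FileBot's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchAllGroupDemo {
    // Hypothetical helper: return the requested group of every match in input.
    static List<String> matchAll(String input, String regex, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(group)); // 0 = whole match, 1 = first capturing group
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matchAll("A B C D E F", "\\w \\w (\\w)", 1)); // [C, F]
        System.out.println(matchAll("A B C D E F", "\\w \\w \\w", 0));   // [A B C, D E F]
    }
}
```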
jeff1326
Posts: 22
Joined: 25 Mar 2016, 16:00

Re: Detecting language in filename

Post by jeff1326 »

1. Finally, I ended up with this for the language detection: detect the languages present in the audio information and fall back to filename detection. Otherwise we simply ignore it.

Code: Select all

{(any{audios.language}{fn.space('.').upper().matchAll(/(?<=\bTRUE?|\b)FR(?=ENCH\b|\b)|\bEN(?=GLISH\b|\b)|\bSPA(?=NISH\b|\b)|\bVF(?=I\b|F\b|\b)/, 0)}{[]}).flatten()*.upper().flatten()*.replaceAll('VF', 'FR').unique() ?: null}
2. You should add your match group example here: http://www.filebot.net/naming.html

3. What does this do?

Code: Select all

.toLocale()
EDIT: Nevermind :oops: viewtopic.php?f=6&t=1221&p=18527&hilit=toLocale#p18527
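
The lookaround pattern from point 1 can be tried outside FileBot with a small Java sketch. It covers only the filename part of the expression (uppercase, match group 0, replace VF with FR, de-duplicate) and assumes java.util.regex semantics; the class name, method name and sample filenames are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FinalPatternDemo {
    // The pattern from the post: lookarounds keep TRUE/ENCH/GLISH/NISH/I/F out of the match.
    private static final Pattern TAGS = Pattern.compile(
            "(?<=\\bTRUE?|\\b)FR(?=ENCH\\b|\\b)"
            + "|\\bEN(?=GLISH\\b|\\b)"
            + "|\\bSPA(?=NISH\\b|\\b)"
            + "|\\bVF(?=I\\b|F\\b|\\b)");

    // Return the short language codes found in the filename, in order of appearance.
    static List<String> detect(String fileName) {
        String name = fileName.toUpperCase(Locale.ROOT);
        LinkedHashSet<String> codes = new LinkedHashSet<>();
        Matcher m = TAGS.matcher(name);
        while (m.find()) {
            codes.add(m.group().replace("VF", "FR")); // group 0, with VF treated as FR
        }
        return new ArrayList<>(codes);
    }

    public static void main(String[] args) {
        System.out.println(detect("Movie.TRUEFRENCH.1080p")); // [FR]
        System.out.println(detect("Movie.VFI.ENGLISH.x264")); // [FR, EN]
        System.out.println(detect("Friends.1994.SPANISH"));   // [SPA]
    }
}
```

The last call shows that the word boundaries keep "FR" and "EN" inside "Friends" from being reported.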
rednoah
The Source
Posts: 23932
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Detecting language in filename

Post by rednoah »

Creates a Locale object from a language code String.