Full Roman numeral support

ZGab · Post by **ZGab** » 09 May 2016, 23:59

Hello,

Here is a full Roman numeral regular expression for the FileBot fn:amc script.

{n.replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))\b/, { it.upper() })}
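To try the pattern outside FileBot, here is a minimal sketch in plain Java. FileBot format expressions are Groovy, and Groovy's replaceAll hands the pattern to java.util.regex, so it should match the same way. The class RomanUpper and its helper upperMatches are hypothetical names, and FileBot's upper() is assumed to be a plain uppercase conversion.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RomanUpper {
    // The four alternatives from the post. Each one needs at least one numeral
    // letter (M, C or D, X or L, I or V), so the group never matches "".
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // First version: a whole word, matched case-insensitively.
    static final Pattern ROMAN = Pattern.compile("\\b(?i:" + CORE + ")\\b");

    // Uppercases every match, like replaceAll(pattern) { it.upper() } in Groovy.
    static String upperMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String up = m.group().toUpperCase(Locale.ROOT);
            // quoteReplacement keeps '$' and '\' in the text literal
            m.appendReplacement(out, Matcher.quoteReplacement(up));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperMatches(ROMAN, "Rocky ii"));             // Rocky II
        System.out.println(upperMatches(ROMAN, "Star Wars Episode iv")); // Star Wars Episode IV
        System.out.println(upperMatches(ROMAN, "Part iiii"));            // Part iiii
    }
}
```

"Part iiii" is left alone because IIII is not a valid numeral under this pattern (I is repeated at most three times).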

I found the regular expression here: http://stackoverflow.com/questions/2673 ... expression (comment from Corin, 5 votes)
I adapted it for fn:amc scripting after testing it on http://www.regexplanet.com/advanced/java/index.html

Note: in FileBot fn:amc, replaceAll works correctly only if every group is non-capturing, either (?:...) or a group carrying a modifier such as ignore-case (?i:...).
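As far as I understand Groovy's replaceAll(regex, Closure), a one-argument closure receives the matched string only when the pattern has no capturing groups; with groups it receives the list of groups instead, so it is no longer a plain String and it.upper() stops working. A quick way to check a pattern is to count its capturing groups with java.util.regex. GroupCheck and capturingGroups are hypothetical names.

```java
import java.util.regex.Pattern;

public class GroupCheck {
    // Same four alternatives as in the post.
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // Counts capturing groups; (?:...) and (?i:...) are not counted.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // The posted pattern: only non-capturing groups.
        System.out.println(capturingGroups("\\b(?i:" + CORE + ")\\b"));   // 0
        // The same pattern wrapped in a plain (...) group.
        System.out.println(capturingGroups("\\b((?i:" + CORE + "))\\b")); // 1
    }
}
```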

ZGab · Post by **ZGab** » 19 May 2016, 20:30

For anyone interested in this regex, here is a new version:


{n.replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))(?:[^\w']|\Z)/, { it.upper() })}

Compared to the previous version, it uppercases a matching Roman numeral string only if its last character is not followed by a single quote (apostrophe).
This is useful for French and similar cases, to avoid unexpected uppercasing:


"À l'aveugle" instead of "À L'aveugle"


"La vie d'Adèle" instead of "La vie D'Adèle"

etc...
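The effect of the new ending can be checked with the same kind of Java sketch (class and helper names are hypothetical): the first version uppercases the lone D or L before an apostrophe, the second leaves it alone. Note that [^\w'] consumes the character after the numeral, so that character is also passed through the uppercase conversion.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RomanApostrophe {
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // First version: closing \b, so the "d" of "d'Adele" counts as a whole word.
    static final Pattern V1 = Pattern.compile("\\b(?i:" + CORE + ")\\b");
    // Second version: the numeral must be followed by a character that is
    // neither a word character nor an apostrophe, or by the end of the input.
    // That following character is part of the match.
    static final Pattern V2 = Pattern.compile("\\b(?i:" + CORE + ")(?:[^\\w']|\\Z)");

    // Uppercases every match, like replaceAll(pattern) { it.upper() } in Groovy.
    static String upperMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String up = m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(up));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String adele = "La vie d'Ad\u00e8le";   // accented e written as an escape
        System.out.println(upperMatches(V1, adele));              // La vie D'Adele (with accent)
        System.out.println(upperMatches(V2, adele));              // unchanged
        System.out.println(upperMatches(V2, "Rocky ii, part 2")); // Rocky II, part 2
    }
}
```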

Post by **kim** » 19 May 2016, 20:51

Just use {n} (99.99% of the time it's better than using any format to replace it).

For the filename, use: {"La vie d'Adèle".upperInitial()} (the literal title is only there because it's easy to read)

btw:

- (original title): La vie d'Adèle - Chapitres 1 et 2
- France: La vie d'Adèle
- France (French title): La vie d'Adèle

http://www.imdb.com/title/tt2278871/rel ... tt_ql_dt_2
https://www.themoviedb.org/movie/152584 ... uage=fr-FR

ZGab · Post by **ZGab** » 19 May 2016, 21:35

You're right, but to be able to use the main title with special characters, I'm using the following format:


    ...{norm =
        {
            it
                .transliterate(defines.mylang.lower() + ';')
                .replaceAll(/[`´‘’"“”]/, "'")      // quote-like characters -> apostrophe ('?' is handled by the next step)
                .replaceAll(/[:|]/, " - ")
                .replaceAll(/[?]/, "!")
                .replaceAll(/[*\s]+/, " ")
                .replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))(?:[^\w']|\Z)/, { it.upper() })
                .replaceAll(/\b[0-9]+(?i:st|nd|rd|th)\b/, { it.lower() })  // ordinals such as 1st, 2nd, 21st in lowercase
        };
    norm(n)}...
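To try the chain outside FileBot, here is a rough Java port of norm. It assumes the transliterate step is skipped (it depends on FileBot and on the mylang define) and that upper()/lower() are plain case conversions; NormSketch and mapMatches are hypothetical names. The original quote class also lists a '?', which looks like an encoding casualty: kept in, it would turn every '?' into an apostrophe before the [?] step could see it, so this port leaves it out. For ordinals it uses \b[0-9]+(?i:st|nd|rd|th)\b, which also covers 1st and multi-digit numbers such as 21st.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NormSketch {
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // Roman numeral step (second version of the regex from this thread).
    static final Pattern ROMAN = Pattern.compile("\\b(?i:" + CORE + ")(?:[^\\w']|\\Z)");
    // Ordinal suffixes: 1st, 2nd, 3rd, 4th, 21st, ...
    static final Pattern ORDINAL = Pattern.compile("\\b[0-9]+(?i:st|nd|rd|th)\\b");

    // Replaces each match with f applied to the matched text.
    static String mapMatches(Pattern p, String s, Function<String, String> f) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(f.apply(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Port of the norm closure, without the transliterate(...) step.
    static String norm(String it) {
        String s = it
            // backtick, acute accent, curly single/double quotes and the
            // straight double quote -> apostrophe
            .replaceAll("[`\u00B4\u2018\u2019\"\u201C\u201D]", "'")
            // ':' and '|' are not allowed in Windows file names
            .replaceAll("[:|]", " - ")
            .replaceAll("[?]", "!")
            // collapse runs of '*' and whitespace into a single space
            .replaceAll("[*\\s]+", " ");
        s = mapMatches(ROMAN, s, x -> x.toUpperCase(Locale.ROOT));
        return mapMatches(ORDINAL, s, x -> x.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(norm("Pathfinders : Vers la victoire")); // Pathfinders - Vers la victoire
        System.out.println(norm("Y a-t-il un pilote dans l'avion ?"));
        System.out.println(norm("Rocky ii: the 21ST century"));     // Rocky II - the 21st century
    }
}
```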

I think it's useful in cases like these:

https://www.themoviedb.org/movie/813-ai ... uage=fr-FR


Y a-t-il un pilote dans l'avion ?

https://www.themoviedb.org/movie/61346- ... uage=fr-FR


Pathfinders : Vers la victoire

It also helps with transliteration from non-Latin alphabets: there the letter case may change, and I want to force the correct case for Roman numerals.
It may be unnecessary; you're probably right.

ZGab · Post by **ZGab** » 19 May 2016, 21:58

I'll stop using this Roman numeral regex because of Unicode characters:


.replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))(?:[^\w']|\Z)/, { it.upper() })

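applied to:

Le désastre

gives:

Le DÉsastre

The cause is the accented character é that follows the matching D. In Java regexes, \w only matches ASCII word characters ([a-zA-Z_0-9]) unless Unicode character classes are enabled, so é counts as a non-word character: [^\w'] matches it, it becomes part of the match, and it is uppercased together with the D.

Note: the regex could work with the ending

(?:[^\p{L}']|\Z)

but in the end I found this too complicated for nothing.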
ref: http://www.regular-expressions.info/unicode.html#prop
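The difference between the two endings shows up directly in Java (hypothetical class and helper names): with [^\w'], the é after the D is consumed and uppercased, while [^\p{L}'] refuses to end the numeral in front of any Unicode letter.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RomanUnicodeEnding {
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // Ending of the second version: \w is ASCII-only by default in Java,
    // so an accented letter after the numeral satisfies [^\w'].
    static final Pattern ASCII_ENDING =
        Pattern.compile("\\b(?i:" + CORE + ")(?:[^\\w']|\\Z)");
    // Ending suggested in the post: not a Unicode letter (\p{L}) and not an
    // apostrophe, or the end of the input.
    static final Pattern LETTER_ENDING =
        Pattern.compile("\\b(?i:" + CORE + ")(?:[^\\p{L}']|\\Z)");

    // Uppercases every match, like replaceAll(pattern) { it.upper() } in Groovy.
    static String upperMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String up = m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(up));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Le d\u00e9sastre";   // accented e written as an escape
        System.out.println(upperMatches(ASCII_ENDING, title));   // D and accented E uppercased
        System.out.println(upperMatches(LETTER_ENDING, title));  // unchanged
        System.out.println(upperMatches(LETTER_ENDING, "Rocky ii")); // Rocky II
    }
}
```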

Post by **kim** » 19 May 2016, 22:01

{n.ascii()}
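Presumably the idea is that transliterating to ASCII first sidesteps the problem, because [^\w'] then never sees an accented letter. FileBot's ascii() is not reproduced here; as a rough stand-in (an assumption, not FileBot's implementation), this hypothetical Java sketch strips combining marks after NFD decomposition and then applies the second version of the regex.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsciiFirst {
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // Second version of the regex, with the ASCII-only [^\w'] ending.
    static final Pattern ROMAN = Pattern.compile("\\b(?i:" + CORE + ")(?:[^\\w']|\\Z)");

    // Rough stand-in for ascii(), NOT FileBot's implementation: decompose
    // accented letters (NFD) and drop the combining marks (\p{M}).
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Uppercases every match, like replaceAll(pattern) { it.upper() } in Groovy.
    static String upperMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String up = m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(up));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String title = stripAccents("Le d\u00e9sastre");
        System.out.println(title);                      // Le desastre
        System.out.println(upperMatches(ROMAN, title)); // Le desastre
    }
}
```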

Post by **rednoah** » 26 May 2016, 09:40

This looks like fun. Please post your test data filenames so we can play with that as well.

I'd start with this:


{n.replaceAll(/(?i)\b[XVI]+\b/){it.upper()}}

Maybe not super-complete, but probably good enough for 99.99% of use cases and a bit easier to read.
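To see what the simpler pattern trades away, here is a hypothetical Java comparison of the two: both handle the common case; the simple one accepts invalid runs such as IIII and ignores L, C, D and M, while the full one (first version) also catches the single letter before an apostrophe.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleVsFull {
    static final String CORE =
          "M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3})";

    // rednoah's pattern: any run of X, V and I forming a whole word.
    static final Pattern SIMPLE = Pattern.compile("(?i)\\b[XVI]+\\b");
    // First full version from this thread.
    static final Pattern FULL = Pattern.compile("\\b(?i:" + CORE + ")\\b");

    // Uppercases every match, like replaceAll(pattern) { it.upper() } in Groovy.
    static String upperMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String up = m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(up));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"Rocky ii", "Part iiii", "Part xl", "La vie d'Adele"};
        for (String s : samples) {
            // simple result | full result
            System.out.println(upperMatches(SIMPLE, s) + " | " + upperMatches(FULL, s));
        }
    }
}
```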

ZGab · Post by **ZGab** » 03 Jun 2016, 23:27

Sure, you're right; I had done the same before. But I'd like to avoid the 0.01% of exceptions.

In the end, I consider that {n} is enough and TheMovieDB always returns the right name.

I'm a fan of regexes, for when they're needed.
