Full roman numerals support

ZGab
Posts: 17
Joined: 09 May 2016, 20:35

Post by ZGab »

Hello,

Here is a full Roman numeral regular expression for use with the FileBot fn:amc script.

Code: Select all

{n.replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))\b/, { it.upper() })}
I found the regular expression here: http://stackoverflow.com/questions/2673 ... expression (comment from Corin, 5 votes)
I just adapted it for fn:amc scripting after testing it on http://www.regexplanet.com/advanced/java/index.html

Note: with a closure such as { it.upper() }, replaceAll works correctly only if no group captures, i.e. every group is either non-capturing (?:...) or a modifier group such as ignore-case (?i:...). As far as I can tell, with capturing groups the closure is passed the groups rather than just the matched text.
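For anyone who wants to try the regex outside FileBot, here is a minimal Java sketch, assuming the same Java regex engine as the tester linked above. The class and method names are made up for illustration, and toUpperCase stands in for FileBot's upper().

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for experimenting; not part of FileBot.
class RomanUpperSketch {
    // Same regex as in the format above: non-capturing groups only,
    // wrapped in an ignore-case group and two word boundaries.
    static final Pattern ROMAN = Pattern.compile(
        "\\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))\\b");

    // Uppercase every match and leave the rest of the name untouched.
    static String upperRomans(String name) {
        return ROMAN.matcher(name).replaceAll(
            m -> Matcher.quoteReplacement(m.group().toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(upperRomans("rocky iv"));   // rocky IV
        System.out.println(upperRomans("henry viii")); // henry VIII
    }
}
```

Each of the four alternatives forces at least one character in a different position (thousands, hundreds, tens, units), which is what keeps the pattern from matching the empty string.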
ZGab
Posts: 17
Joined: 09 May 2016, 20:35

Re: Full roman numerals support

Post by ZGab »

For anyone interested in this regex, here is a new version:

Code: Select all

{n.replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))(?:[^\w']|\Z)/, { it.upper() })}
Compared to the previous version, it uppercases a matching Roman numeral string only if its last character is not followed by an apostrophe.
This is useful for French and similar cases, to avoid unexpected uppercase:

Code: Select all

"À l'aveugle" instead of "À L'aveugle"

Code: Select all

"La vie d'Adèle" instead of "La vie D'Adèle"
etc...
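A quick way to check the apostrophe rule is a small Java sketch like the one below. It assumes the same Java regex engine; the class and method names are hypothetical, and the test titles are kept ASCII-only on purpose.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for experimenting; not part of FileBot.
class RomanApostropheSketch {
    // Second version of the regex: the closing word boundary is replaced by
    // "a character that is neither a word character nor an apostrophe, or end of input".
    static final Pattern ROMAN = Pattern.compile(
        "\\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))"
        + "(?:[^\\w']|\\Z)");

    // The trailing character is part of the match, so it is uppercased too;
    // quoteReplacement keeps a trailing $ or backslash from being misread.
    static String upperRomans(String name) {
        return ROMAN.matcher(name).replaceAll(
            m -> Matcher.quoteReplacement(m.group().toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(upperRomans("l'aveugle"));          // l'aveugle
        System.out.println(upperRomans("rocky iv - part 2"));  // rocky IV - part 2
    }
}
```

Note that the character after the numeral is consumed by the match rather than just checked, so whatever it is also goes through the uppercase step.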
kim
Power User
Posts: 1251
Joined: 15 May 2014, 16:17

Re: Full roman numerals support

Post by kim »

just use {n} (99.99% of the time it's better than using any format to replace it) ;)
use for filename: {"La vie d'Adèle".upperInitial()} (only because it's ez to read)

btw:
(original title) La vie d'Adèle - Chapitres 1 et 2
France La vie d'Adèle
France (French title) La vie d'Adèle
http://www.imdb.com/title/tt2278871/rel ... tt_ql_dt_2
https://www.themoviedb.org/movie/152584 ... uage=fr-FR
ZGab
Posts: 17
Joined: 09 May 2016, 20:35

Re: Full roman numerals support

Post by ZGab »

You're right, but to be able to use the main title with special characters, I'm using the following format:

Code: Select all

    ...{norm =
        {
            it
                .transliterate(defines.mylang.lower() + ';')
                .replaceAll(/[`´‘’?""“”]/, "'")
                .replaceAll(/[:|]/, " - ")
                .replaceAll(/[?]/, "!")
                .replaceAll(/[*\s]+/, " ")
                .replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))(?:[^\w']|\Z)/, { it.upper() })
                .replaceAll(/\b[0-9]+(?i:st|nd|rd|th)\b/, { it.lower() })
        };
    norm(n)}...
I think it's useful in cases such as:

https://www.themoviedb.org/movie/813-ai ... uage=fr-FR

Code: Select all

Y a-t-il un pilote dans l'avion ?
https://www.themoviedb.org/movie/61346- ... uage=fr-FR

Code: Select all

Pathfinders : Vers la victoire
Or with transliteration from a non-Roman alphabet: in that case the letter case may change, and I want to force the correct case for Roman numerals.
It may be unnecessary; you're probably right.
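The punctuation steps of the format above can be tried in isolation. Below is a rough Java sketch under several assumptions: transliterate and the Roman numeral and ordinal steps are left out, norm is a plain method with a made-up class name rather than the closure above, and the ? that appears in the quote class as posted is dropped. With it, question marks would already have become apostrophes before the ? to ! step runs, so I assume it stands for a quote character that did not survive display.

```java
// Hypothetical helper for experimenting; not part of FileBot.
class TitleNormSketch {
    static String norm(String title) {
        return title
            // Backtick, acute accent, curly quotes and double quotes become a plain apostrophe.
            .replaceAll("[`\u00B4\u2018\u2019\u201C\u201D\"]", "'")
            // Colons and pipes are not allowed in many file systems.
            .replaceAll("[:|]", " - ")
            // Question marks become exclamation marks.
            .replaceAll("[?]", "!")
            // Asterisks and runs of whitespace collapse to a single space.
            .replaceAll("[*\\s]+", " ");
    }

    public static void main(String[] args) {
        System.out.println(norm("Pathfinders : Vers la victoire"));    // Pathfinders - Vers la victoire
        System.out.println(norm("Y a-t-il un pilote dans l'avion ?")); // Y a-t-il un pilote dans l'avion !
    }
}
```

The whitespace step comes last on purpose: replacing " : " with " - " leaves doubled spaces that only the final step cleans up.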
ZGab
Posts: 17
Joined: 09 May 2016, 20:35

Re: Full roman numerals support

Post by ZGab »

I'll stop using this Roman numeral regex because of Unicode characters:

Code: Select all

.replaceAll(/\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))(?:[^\w']|\Z)/, { it.upper() })
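On:
Le désastre
it gives:
Le DÉsastre
This happens because the accented character é that follows the matching D is not an ASCII word character, so it matches [^\w'] (in Java regex, \w is ASCII-only by default). The é is then part of the match and gets uppercased together with the D.
Note: the regex could work with the ending
(?:[^\p{L}']|\Z)/
but in the end I found this too complicated for too little benefit.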
ref: http://www.regular-expressions.info/unicode.html#prop
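The two endings can be compared side by side with a small Java sketch, again assuming the same Java regex engine. The class, constant and method names are hypothetical, and the accented letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for experimenting; not part of FileBot.
class RomanEndingSketch {
    // Roman numeral part of the regex, without any ending.
    static final String CORE =
        "\\b(?i:M{1,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|C?D|D?C{1,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|X?L|L?X{1,3})(?:IX|IV|V?I{0,3})"
        + "|M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|I?V|V?I{1,3}))";

    // Ending from the posted format: rejects ASCII word characters only.
    static final Pattern ASCII_ENDING = Pattern.compile(CORE + "(?:[^\\w']|\\Z)");
    // Ending from the note: rejects any Unicode letter.
    static final Pattern LETTER_ENDING = Pattern.compile(CORE + "(?:[^\\p{L}']|\\Z)");

    static String upper(Pattern pattern, String name) {
        return pattern.matcher(name).replaceAll(
            m -> Matcher.quoteReplacement(m.group().toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String title = "Le d\u00E9sastre"; // "Le desastre" with an acute accent on the first e
        System.out.println(upper(ASCII_ENDING, title));  // the d and the accented e are uppercased
        System.out.println(upper(LETTER_ENDING, title)); // unchanged
    }
}
```

The \p{L} ending still lets digits through, so a name part like "v2" would be uppercased; whether that matters depends on your titles.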
kim
Power User
Posts: 1251
Joined: 15 May 2014, 16:17

Re: Full roman numerals support

Post by kim »

{n.ascii()}
rednoah
The Source
Posts: 22991
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Full roman numerals support

Post by rednoah »

This looks like fun. Please post your test data filenames so we can play with that as well. ;)


I'd start with this:

Code: Select all

{n.replaceAll(/(?i)\b[XVI]+\b/){it.upper()}}

Maybe not super-complete, but probably good enough for 99.99% of use-cases and a bit easier to read. :lol:
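To see what the short pattern does and does not cover, here is a minimal Java sketch, assuming the same Java regex engine. The class and method names are made up, and toUpperCase stands in for upper().

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for experimenting; not part of FileBot.
class SimpleRomanSketch {
    // Any whole word made only of X, V and I, in either case.
    static final Pattern SIMPLE = Pattern.compile("(?i)\\b[XVI]+\\b");

    static String upperRomans(String name) {
        return SIMPLE.matcher(name).replaceAll(
            m -> Matcher.quoteReplacement(m.group().toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(upperRomans("rocky iv"));           // rocky IV
        System.out.println(upperRomans("final fantasy xiii")); // final fantasy XIII
        System.out.println(upperRomans("l'aveugle"));          // l'aveugle
        System.out.println(upperRomans("super bowl xl"));      // super bowl xl
    }
}
```

Since L, C, D and M are not in the character class, the French l' and d' cases are left alone for free, at the cost of missing numerals from 40 upwards.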
ZGab
Posts: 17
Joined: 09 May 2016, 20:35

Re: Full roman numerals support

Post by ZGab »

Sure, you're right, I did the same before. But I'd like to avoid the 0.01% of exceptions :)
In the end, I consider that {n} is enough and that TheMovieDB always returns the correct name.

I'm a regex fan, if ever needed :)