DIN5007-2 Transliterator (aka converting umlauts)
Posted: 08 Nov 2017, 16:00
The old issue: converting umlauts like äöü to ae, oe and ue. This can be handled very elegantly in a script with a custom Transliterator. The following line takes any input and transliterates it to Latin script, then handles umlauts before reducing everything else to ASCII characters.
Code: Select all
// DIN5007 applies to German words only. Don't use this for foreign words, e.g. Motörhead → Motorhead
{n.transliterate("Any-Latin; DIN5007_2; Latin-ASCII")}
To register the Transliterator in Java, something like the following is needed, and only once (rules is a String holding the rule set):
Code: Select all
Transliterator.registerInstance(
    Transliterator.createFromRules("DIN5007_2", rules, Transliterator.FORWARD));
The rules are as follows, with additional checks so that e.g. Äffin becomes Aeffin rather than AEffin:
Code: Select all
$beforeLower = [[:Mn:][:Me:]]* [:Lowercase:] ;
ä → ae;
ö → oe;
ü → ue;
ß → ss;
Ä } $beforeLower → Ae;
Ö } $beforeLower → Oe;
Ü } $beforeLower → Ue;
Ä → AE;
Ö → OE;
Ü → UE;
ẞ → SS;
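To make the effect of the context check ($beforeLower) easier to see, here is a rough plain-Java approximation of the rules above. It is only a sketch and not how ICU evaluates the rules: the class name Din5007Sketch and its methods are made up for illustration, it assumes the input is already in Latin script, and it leaves out the Any-Latin and Latin-ASCII steps entirely.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of ICU4J: mimics the DIN5007_2 rules in plain Java.
final class Din5007Sketch {

    // Mirrors $beforeLower: optional combining marks (Mn/Me), then a lowercase character.
    static boolean lowercaseFollows(String s, int from) {
        int j = from;
        while (j < s.length()) {
            int cp = s.codePointAt(j);
            int type = Character.getType(cp);
            if (type == Character.NON_SPACING_MARK || type == Character.ENCLOSING_MARK) {
                j += Character.charCount(cp); // skip the mark and keep looking
                continue;
            }
            return Character.isLowerCase(cp);
        }
        return false; // end of input: nothing lowercase follows
    }

    static String transliterate(String input) {
        // Assumption: compose first, so that "a" + combining diaeresis is seen as one character.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00E4': out.append("ae"); break; // ä
                case '\u00F6': out.append("oe"); break; // ö
                case '\u00FC': out.append("ue"); break; // ü
                case '\u00DF': out.append("ss"); break; // ß
                case '\u00C4': out.append(lowercaseFollows(s, i + 1) ? "Ae" : "AE"); break; // Ä
                case '\u00D6': out.append(lowercaseFollows(s, i + 1) ? "Oe" : "OE"); break; // Ö
                case '\u00DC': out.append(lowercaseFollows(s, i + 1) ? "Ue" : "UE"); break; // Ü
                case '\u1E9E': out.append("SS"); break; // ẞ (capital sharp s)
                default: out.append(c); // everything else passes through unchanged
            }
        }
        return out.toString();
    }
}
```

Calling Din5007Sketch.transliterate("Äffin") gives Aeffin, while an all-caps word such as ÜBER gives UEBER, which is the distinction the three $beforeLower rules are there for.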