Non-English characters do not display right (Linux)

Posted: 14 Dec 2012, 09:22
by magnulu

Wonderful program, thank you!

I am renaming some movies using the format '{director} - {n} ({y})/{n}{" CD$pi"}'. The movies are on an NTFS disk mounted on a headless Linux box using ntfs-3g, and the files are shared via Samba with different Windows computers on my home network. My problems are with displaying "special" characters. Example:

The movie Moolaadé renames to "Ousmane Sembene - Mooladé (2004)/Mooladé.avi". On my Linux box this shows up as "Ousmane Sembene - Moolad? (2004)/Moolad?.avi", but my Windows computers see it as "OTBXOS~Y/M1ZNAB~O.AVI".

Can someone please explain why this happens and, preferably, how I can fix it? Could it be my locale settings? I am located in Norway and have configured my headless Linux box to use nb_no-latin1 so that the special Norwegian characters show correctly in the shell (they still don't show correctly in every program, like pico and vi, but that is just laziness). I am using TurnKey Torrent Server (11.3-lucid-x86).

Thanks in advance for any help, it would be very much appreciated!

Posted: 14 Dec 2012, 15:46
by rednoah
Good thing you asked, I spent like 8 hours just two weeks ago figuring that out... :x

Those fuckers didn't configure the locale correctly, actually they didn't configure it at all. :x

You'll need to tell Java which charset the filenames are encoded with. One way is to set the environment variable LANG:

export LANG=en_US.utf8
@see ...
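A minimal pre-flight check before running FileBot, assuming a POSIX sh; `effective_ctype` and `is_utf8_locale` are hypothetical helper names, not part of FileBot. POSIX gives LC_ALL precedence over LC_CTYPE, and LC_CTYPE over LANG, so exporting LANG alone does nothing if LC_ALL is set to something else. This likely matters here because a JVM started with no locale configured typically falls back to ASCII for filenames and writes a literal '?' for every character it cannot encode, which would fit the "Moolad?" names in the first post.

```shell
#!/bin/sh
# Print the locale value that decides the character encoding (LC_CTYPE),
# following POSIX precedence: LC_ALL, then LC_CTYPE, then LANG, else "C".
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-C}"
    fi
}

# Succeed (exit status 0) if that locale names a UTF-8 codeset,
# whether spelled "UTF-8" (nb_NO.UTF-8) or "utf8" (en_US.utf8).
is_utf8_locale() {
    case "$(effective_ctype)" in
        *[Uu][Tt][Ff]-8* | *[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "OK: locale is $(effective_ctype)"
else
    echo "WARNING: locale is $(effective_ctype); try: export LANG=en_US.utf8" >&2
fi
```

`locale` prints the settings the current shell actually uses, and `locale -a` lists the installed locales (glibc lists UTF-8 ones as e.g. `en_US.utf8`); exporting a locale that is not installed won't help. On JDK 7 and later, `java -XshowSettings:properties -version` also prints `sun.jnu.encoding`, an implementation-specific property holding the charset the JVM uses for file names.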

Posted: 16 Dec 2012, 11:05
by magnulu
Thank you for clearing this up!! I am very thankful.

I still have some problems setting up the locale correctly with UTF-8 and nb_NO, but I will not bother you with that. I will look into it when time allows and report back if I achieve something fathomable (usually that does not happen; instead the solution materializes out of nowhere after two sleepless nights of trying absolutely everything, and I have no idea whatsoever how or why it suddenly works).

By the way, I changed dir_bin in your script to /usr/share/filebot to get it working. This results in "permission denied" warnings about the cache. I just close my eyes.

Posted: 16 Dec 2012, 12:27
by rednoah
What I learned was that filenames are really just char[] (raw bytes), and the only thing you can rely on is that it's a 0-terminated char[]. Each application decides how to encode a Unicode string into the char[] that becomes the filename. Even GTK and KDE have different standards for how filenames should be encoded (one uses the user's locale, the other forces UTF-8, I believe).
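To make this concrete, a small POSIX sh sketch (the `hexbytes` helper is hypothetical): the same title becomes a different byte sequence depending on the charset, and at the Linux system-call level a filename is just whichever bytes the writing program produced.

```shell
#!/bin/sh
# The title "Moolaadé" written as printf escapes, so this script stays ASCII.
# printf expands \ddd (octal) escapes found in its format string.
utf8_name='Moolaad\303\251'   # UTF-8: é (U+00E9) is 0xC3 0xA9, two bytes
latin1_name='Moolaad\351'     # ISO-8859-1: é is 0xE9, one byte

# Print the bytes of a printf-escaped string as space-separated hex.
# od -An -tx1 dumps one hex pair per byte; xargs (running echo) joins
# its output onto a single line.
hexbytes() {
    printf "$1" | od -An -tx1 | xargs
}

echo "UTF-8:      $(hexbytes "$utf8_name")"
echo "ISO-8859-1: $(hexbytes "$latin1_name")"
```

It prints `4d 6f 6f 6c 61 61 64 c3 a9` for UTF-8 and `4d 6f 6f 6c 61 61 64 e9` for ISO-8859-1. Both are valid filenames; a program that reads them back with the other charset shows the wrong characters (the UTF-8 bytes read as ISO-8859-1 come out as "MoolaadÃ©").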

Posted: 27 Dec 2012, 20:49
by magnulu
Hello again,

All working! A few words about what I did, maybe it can help somebody else.

1. Got the locale configured right by following the steps from this post: ... ult-locale. In my case I used nb_NO.UTF-8.

I SSH into my headless TurnKey box, so I also had to change the encoding preference in my SSH client to UTF-8 to get everything working. That took some time to figure out! Among other things, I had problems with the Norwegian characters being "sticky" (they would not show up until I pressed another key AFTER pressing the wanted character, as with the ~ character), but that disappeared after changing this. I tried other things (different loadkeys combinations) at the same time, but I think changing the SSH client preferences did the trick.

2. Used your script with great success! I had an annoying typo in the locale variable in your script that no doubt caused problems.

Like I said, I also changed the dir_bin path in your script, which made it work, but with the "permission denied" warnings about the cache when not running as root. I chowned the /usr/share/filebot directory to my main (and only) user on the system to get rid of those. If you have any opinions on other or better ways to fix this, feel free to share them!
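The post linked in step 1 is elided above. As a rough sketch only, assuming an Ubuntu-based system such as TurnKey 11.3 (lucid), where `locale-gen` accepts locale names (on Debian proper you would typically enable the locale in /etc/locale.gen instead), generating nb_NO.UTF-8 and making it the system default might look like this; run it as root.

```shell
#!/bin/sh
# Assumption: Ubuntu-style locale tools, as on TurnKey 11.3 (Ubuntu 10.04
# "lucid"), where locale-gen accepts locale names. Run as root.
set -e

# Compile the Norwegian UTF-8 locale (glibc then lists it as "nb_NO.utf8").
locale-gen nb_NO.UTF-8

# Make it the system default; this writes LANG to /etc/default/locale,
# which takes effect for new login sessions.
update-locale LANG=nb_NO.UTF-8

# Confirm the locale is now installed.
locale -a | grep -qiEx 'nb_no\.utf-?8' && echo "nb_NO.UTF-8 is installed"
```

After logging in again, `locale` should report nb_NO.UTF-8; as noted in step 1, the SSH client's encoding has to be UTF-8 as well.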

Thanks again for helping out. Highly appreciated!