issue with special characters

gdelc
Donor
Posts: 41
Joined: 16 Jun 2013, 19:46

Post by gdelc »

Hello, I have found a problem with the Mac OS edition (I don't know whether it occurs on other platforms).

The problem is reproducible if you have a Mac and a NAS, and mount the NAS as an NFS share.

Files whose names contain accented letters such as é à ç ô aren't processed by the mediainfo commands, so parameters like vf and sdhd don't match.
rednoah
The Source
Posts: 22999
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: issue with special characters

Post by rednoah »

That seems like a tricky one. Possibly a filename encoding mismatch between Java Unicode strings and whatever native NS classes are used internally by mediainfo when talking to OSX. Possibly something that has to be fixed in the mediainfo code.

In any case, I don't have a Mac, so I can't do much about native issues right now.
:idea: Please read the FAQ and How to Request Help.

Post by gdelc »

Perhaps, or perhaps not, because the mediainfo GUI processes the files without a glitch. Are you using UTF-8 or another character encoding?
If you use UTF-8, it would be useful to enforce NFC form on the parameters. You can read http://twiki.org/cgi-bin/view/Codev/UnicodeMac if you need a rest and can't fall asleep ;)

I can run tests for you if you want :D Give me the builds, and I'll try processing a bunch of files.

Post by rednoah »

Yep, I'd guess it's an NFC issue, because the UTF-8 strings differ bytewise depending on how the accents are encoded.

I'm trying this:

Code: Select all

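public synchronized boolean open(File file) {
	String path = file.getAbsolutePath();
	if (Platform.isMac()) {
		// on Mac, normalize the path to NFC before handing it to libmediainfo
		path = Normalizer.normalize(path, Form.NFC);
		// debug output for the test build
		System.out.println("Normalizer.normalize(path, Form.NFC) => " + path);
	}
	// pass the path to the native library as a wide-char string via JNA
	return file.isFile() && MediaInfoLibrary.INSTANCE.Open(handle, new WString(path)) > 0;
}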
Please grab the latest jar from HEAD and give it a try.
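
For anyone following along, here is a minimal standalone sketch of the bytewise difference mentioned above. It is not FileBot code; the class name `NormalizationBytes` and its helpers are made up for illustration. It only uses `java.text.Normalizer` to show that the same visible name encodes to different UTF-8 bytes in NFC and NFD.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalizationBytes {

    // Normalize the text to the given form, then encode it as UTF-8.
    public static byte[] utf8(String text, Form form) {
        return Normalizer.normalize(text, form).getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated uppercase hex for easy comparison.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "R\u00E9flexion"; // "Réflexion" written with a composed é
        System.out.println("NFC: " + hex(utf8(name, Form.NFC)));
        System.out.println("NFD: " + hex(utf8(name, Form.NFD)));
    }
}
```

In NFC the é is a single code point (two UTF-8 bytes); in NFD it becomes a plain e followed by a combining acute accent, so a byte-for-byte filename comparison between the two forms fails even though both look identical on screen.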

Post by gdelc »

I got this output from the CLI:

Code: Select all

filebot -mediainfo "/Volumes/Qmultimedia/video/films/#/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012) .tt1195478(6.2).mkv"
Normalizer.normalize(path, Form.NFC) => /Volumes/Qmultimedia/video/films/#/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012) .tt1195478(6.2).mkv
Normalizer.normalize(path, Form.NFC) => /Volumes/Qmultimedia/video/films/#/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012) .tt1195478(6.2).mkv
Normalizer.normalize(path, Form.NFC) => /Volumes/Qmultimedia/video/films/#/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012) .tt1195478(6.2).mkv
Normalizer.normalize(path, Form.NFC) => /Volumes/Qmultimedia/video/films/#/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012) .tt1195478(6.2).mkv
5 Ans de R?flexion (2012) .tt1195478(6.2) [   ]
So there's no output, because no file is named with a question mark in place of the accents :(

Post by rednoah »

The ? means the console can't display that Unicode character. It's not actually the ? character.

Added two more test jars for NFKC and NFD. Maybe one of those works.
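
A small sketch of why a `?` can show up in console output while the underlying string is intact. This assumes the console charset simply cannot represent é; US-ASCII stands in here for whatever charset the real console used, and `ConsoleReplacement` is a made-up name. Java's `String.getBytes(Charset)` substitutes the charset's replacement byte for unmappable characters, which is `?` for US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleReplacement {

    // Encode the text with the given charset and decode it again,
    // roughly what happens when a string is printed through a limited console stream.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "R\u00E9flexion"; // the real string still holds é
        // US-ASCII is only a stand-in for a console charset that lacks é.
        System.out.println("as printed: " + roundTrip(name, StandardCharsets.US_ASCII));
        System.out.println("string itself contains '?': " + name.contains("?"));
    }
}
```

The replacement only happens on the way out to the console; the path passed on to the native library is not affected by it.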

Post by gdelc »

Java's default character set is MacRoman. Is that any help?

Post by rednoah »

Never mind console charsets. Try the other test jars with different Unicode NFs. Hopefully one of the four will just work.

Post by rednoah »

Updates? Are any of the test jars working?

Post by gdelc »

- Via the GUI: it doesn't even launch when the jar is embedded in the app package (for any test version).

- Via direct launch of the JAR package: the parent folder (/Volumes/Qmultimedia/video/films/#) is displayed, but the folder containing é (5 Ans de Réflexion (2012)) does not appear in the 'load' box. Trying to scan the # folder shows a [java.lang.NullPointerException] on every load attempt (for any test version).

- Via the CLI
version test1:

Code: Select all

Normalizer.normalize(path, Form.NFC) => /Volumes/Qmultimedia/video/films/#/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012)/5 Ans de R?flexion (2012) .tt1195478(6.2).mkv
four times and

Code: Select all

5 Ans de R?flexion (2012) .tt1195478(6.2) [   ]
So test 1 fails.

version test2:

Code: Select all

Normalizer.normalize(path, Form.NFKC) => /Volumes/Qmultimedia/video/films/#/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012) .tt1195478(6.2).mkv
four times and

Code: Select all

5 Ans de R?flexion (2012) .tt1195478(6.2) [   ]
So test 2 fails too. The path displays correctly in the normalizer output, but mediainfo processing fails the same way as usual.
version test3:

Code: Select all

Normalizer.normalize(path, Form.NFD) => /Volumes/Qmultimedia/video/films/#/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012) .tt1195478(6.2).mkv
four times and

Code: Select all

5 Ans de R?flexion (2012) .tt1195478(6.2) [   ]
So test 3 fails again in the same way...
And version test4 as well:

Code: Select all

Normalizer.normalize(path, Form.NFKD) => /Volumes/Qmultimedia/video/films/#/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012) .tt1195478(6.2).mkv
four times and

Code: Select all

5 Ans de R?flexion (2012) .tt1195478(6.2) [   ]
It seems some people have found solutions, but I'm far from understanding all of it :) They mention javac settings, and also the normalizer...

http://shlrm.org/blog/2012/10/04/osx-java-utf-8-oh-my/
http://stackoverflow.com/questions/3610 ... ion-issues
http://hints.macworld.com/article.php?s ... 8053951714
http://lists.apple.com/archives/java-de ... 00058.html

Hope it helps ;)

Post by rednoah »

Btw, are you using Apple's JDK 6 or Oracle's JDK 7? If you only tried Apple JDK 6, please try again with Oracle JDK 7.

This is a hard one...

Post by gdelc »

Apple's JDK is unable to run FileBot (of course I tried ;) ), so I freshly uninstalled Oracle JRE 7 and installed JDK 7 before testing :D

Post by gdelc »

Perhaps the problem comes not from FileBot (in the NFD/NFKC/NFKD modes) but from the mediainfo library?

Post by gdelc »

Tried with the dylib found on SourceForge, revision 0.7.63:

Code: Select all

Normalizer.normalize(path, Form.NFKD) => /Volumes/Qmultimedia/video/films/#/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012)/5 Ans de Réflexion (2012) .tt1195478(6.2).mkv
5 Ans de Réflexion (2012) .tt1195478(6.2) [   ]
Are we on the right track?

Post by rednoah »

Still not working? In the latest release I normalize with NFD. That should be the right form, but I guess it still didn't work. Other than that I have no clue. I guess updating to the latest libmediainfo doesn't help either? There's not much I can do on my side at this point.

Post by gdelc »

I guess the mediainfo CLI is to blame for the last part of the bug. I'll try to file a report with them :)

Post by gdelc »

I ran some tests and got complete, correct output from mediainfo when passing the file directly on the command line...

I just passed the path between double quotes.

Tested with the 3.61 CLI, same as 3.60, and the same with the GUI.
Is there a way to enable a verbose mode? It's obviously a parsing issue or something like it, no?

Post by rednoah »

Nope, it's a native interface issue. FileBot is not calling the mediainfo CLI tool. It hooks directly into the libmediainfo C interface. I guess the conversion between Java String and C wchar_t* gets messed up somewhere, or it's just the Unicode normalization form. Suffice it to say, it'd be very hard for me to debug libmediainfo even if I had a Mac to play with.

Post by gdelc »

Did you try this?

Code: Select all

options = "#{options} -Dfile.encoding=UTF-8" if java.lang.System.getProperty('file.encoding') == 'MacRoman'

Post by gdelc »

Tried it directly on the command line; no luck.

Post by gdelc »

A wise man said "when you don't know where you are going, look back at where you came from"...

I tried -script fn:sysinfo again, and here are the results...

Code: Select all

Macgregor:~ greg$ java -Dfile.encoding=UTF8 -jar /Applications/FileBot.app/Contents/Resources/Java/FileBot.jar -script fn:sysinfo
FileBot 3.61 (r1646)
JNA Native: 3.5.0
MediaInfo: java.lang.UnsatisfiedLinkError: Unable to load library 'mediainfo': dlopen(libmediainfo.dylib, 9): image not found
7-Zip-JBinding: net.sf.sevenzipjbinding.SevenZipNativeInitializationException: Failed to load 7z-JBinding: no 7-Zip-JBinding in java.library.path
Extended Attributes: DISABLED
Java(TM) SE Runtime Environment 1.7.0_25
64-bit Java HotSpot(TM) 64-Bit Server VM
Mac OS X (x86_64)
Done ヾ(@⌒ー⌒@)ノ

Code: Select all

Macgregor:~ greg$ FileBot -script fn:sysinfo
FileBot 3.61 (r1646)
JNA Native: 3.5.0
MediaInfo: MediaInfoLib - v0.7.60
7-Zip-JBinding: OK
Extended Attributes: java.lang.NullPointerException
Java(TM) SE Runtime Environment 1.7.0_25 (headless)
64-bit Java HotSpot(TM) 64-Bit Server VM
Mac OS X (x86_64)
Done ヾ(@⌒ー⌒@)ノ
Macgregor:~ greg$ 

Post by gdelc »

Opened a new terminal instance and got these results... I'll reboot the whole machine!

Code: Select all

Macgregor:~ greg$ java -jar /Applications/FileBot.app/Contents/Resources/Java/FileBot.jar -script fn:sysenv
# Java System Properties #
sun.boot.library.path: /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib
ehcache.disk.store.dir: /Users/greg/.filebot/cache/0
gopherProxySet: false
java.version: 1.7.0_25
java.vm.name: Java HotSpot(TM) 64-Bit Server VM
java.awt.graphicsenv: sun.awt.CGraphicsEnvironment
java.specification.vendor: Oracle Corporation
os.version: 10.8.4
ftp.nonProxyHosts: local|*.local|169.254/16|*.169.254/16
sun.os.patch.level: unknown
os.name: Mac OS X
java.specification.name: Java Platform API Specification
user.name: greg
sun.java.launcher: SUN_STANDARD
socksNonProxyHosts: local|*.local|169.254/16|*.169.254/16
user.dir: /Users/greg
java.ext.dirs: /Users/greg/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/ext:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
sun.cpu.endian: little
user.home: /Users/greg
java.vm.specification.version: 1.7
grape.root: /Users/greg/.filebot/grape
java.endorsed.dirs: /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/endorsed
file.separator: /
sun.arch.data.model: 64
sun.cpu.isalist: 
file.encoding: UTF-8
java.home: /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre
java.vendor.url: http://java.oracle.com/
sun.management.compiler: HotSpot 64-Bit Tiered Compilers
java.class.path: /Applications/FileBot.app/Contents/Resources/Java/FileBot.jar
user.language: fr
java.runtime.name: Java(TM) SE Runtime Environment
java.vm.specification.vendor: Oracle Corporation
java.class.version: 51.0
http.agent: FileBot 3.61
file.encoding.pkg: sun.io
java.vm.info: mixed mode
swing.crossplatformlaf: javax.swing.plaf.nimbus.NimbusLookAndFeel
java.vendor: Oracle Corporation
sun.jnu.encoding: UTF-8
awt.toolkit: sun.lwawt.macosx.LWCToolkit
sun.font.fontmanager: sun.font.CFontManager
http.nonProxyHosts: local|*.local|169.254/16|*.169.254/16
user.country: FR
os.arch: x86_64
sun.boot.class.path: /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/lib/JObjC.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/classes
sun.io.unicode.encoding: UnicodeBig
line.separator: 

java.vm.version: 23.25-b01
java.io.tmpdir: /var/folders/7q/qx89427162sdgjb88dz944q40000gn/T/
sun.java.command: /Applications/FileBot.app/Contents/Resources/Java/FileBot.jar -script fn:sysenv
java.awt.printerjob: sun.lwawt.macosx.CPrinterJob
java.vendor.url.bug: http://bugreport.sun.com/bugreport/
java.vm.specification.name: Java Virtual Machine Specification
java.library.path: /Users/greg/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
java.runtime.version: 1.7.0_25-b15
java.specification.version: 1.7
path.separator: :
user.timezone: 
java.vm.vendor: Oracle Corporation
# Environment Variables #
_: /usr/bin/java
HOME: /Users/greg
SHELL: /bin/bash
__CF_USER_TEXT_ENCODING: 0x1F5:0:91

JAVA_ARCH: x86_64
Apple_PubSub_Socket_Render: /tmp/launch-lQLTHL/Render
SHLVL: 1
SECURITYSESSIONID: 186a4
LANG: fr_FR.UTF-8
LOGNAME: greg
SSH_AUTH_SOCK: /tmp/launch-S14RE4/Listeners
com.apple.java.jvmTask: CommandLine
PWD: /Users/greg
TERM: xterm-256color
TERM_SESSION_ID: D3D92F31-B5AB-4E09-9109-FACCC652E772
COMMAND_MODE: unix2003
TERM_PROGRAM: Apple_Terminal
PATH: /Library/Frameworks/Python.framework/Versions/2.7/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/git/bin:/Developer/usr/bin
TERM_PROGRAM_VERSION: 309
TMPDIR: /var/folders/7q/qx89427162sdgjb88dz944q40000gn/T/
USER: greg
Apple_Ubiquity_Message: /tmp/launch-U4Cblw/Apple_Ubiquity_Message
Done ヾ(@⌒ー⌒@)ノ

Post by rednoah »

Can't think of anything here. Looks good to me. I'm not sure if much can be done with Java system properties.

Internal things seem to be all set to UTF-8...

Code: Select all

sun.jnu.encoding: UTF-8
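
For reference, a minimal sketch of how those encoding settings can be read from a plain Java program, independent of FileBot's sysenv script. The class name `EncodingReport` is made up; `sun.jnu.encoding` is an internal property of Oracle/OpenJDK runtimes and may be null on other JVMs.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {

    // Collect the encoding-related settings the JVM is currently running with.
    public static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<String, String>();
        report.put("file.encoding", System.getProperty("file.encoding"));
        // Internal property used for filenames and other native strings; may be null.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        report.put("defaultCharset", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Running it once with and once without `-Dfile.encoding=UTF-8` shows whether the flag actually changes anything in a given terminal session.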

Post by gdelc »

I'll be glad to dig further myself if you can give me some clues to follow, starting with how to trace, step by step, the operations and results of the instructions submitted to FileBot.

IMO Unicode normalization doesn't get through for UTF-16LE or UTF-8 accented characters, so an 'ê' is translated to 'e+^' and transmitted as just 'e', and mediainfo doesn't find the misnamed file...

Post by rednoah »

Unicode NF has nothing to do with UTF-16LE/BE or UTF-8. A normalized string can be encoded however you want. The problem is that when you type ê, you'd type ^+e (decomposed), and that results in the character ê (composed). They look the same, but they're two different byte sequences.
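
A minimal sketch of that point, with a made-up class name `FormVersusEncoding`: the composed and decomposed ê differ in every encoding, and they only become equal once both are brought to the same normalization form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.Arrays;

public class FormVersusEncoding {

    public static final String COMPOSED = "\u00EA";    // ê as one code point
    public static final String DECOMPOSED = "e\u0302"; // e + combining circumflex

    // True if both strings encode to the same bytes in the given charset.
    public static boolean sameBytes(String a, String b, Charset charset) {
        return Arrays.equals(a.getBytes(charset), b.getBytes(charset));
    }

    // True if both strings are equal once normalized to the same form.
    public static boolean sameAfterNormalizing(String a, String b, Form form) {
        return Normalizer.normalize(a, form).equals(Normalizer.normalize(b, form));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes equal:    " + sameBytes(COMPOSED, DECOMPOSED, StandardCharsets.UTF_8));
        System.out.println("UTF-16LE bytes equal: " + sameBytes(COMPOSED, DECOMPOSED, StandardCharsets.UTF_16LE));
        System.out.println("equal after NFC:      " + sameAfterNormalizing(COMPOSED, DECOMPOSED, Form.NFC));
        System.out.println("equal after NFD:      " + sameAfterNormalizing(COMPOSED, DECOMPOSED, Form.NFD));
    }
}
```

So switching the charset alone cannot make the two spellings match; only normalizing both sides to the same form does.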

The first thing I'd debug is to add logging to libmediainfo and dump the filenames, both as they are natively and as FileBot passes them in via JNA. Somehow there must be a difference.

1. FileBot might be passing in the wrong value. But I'm passing in NFD now, so I think that's OK.
2. JNA messes up the filename when converting the Java String to a native wide-char string.
3. libmediainfo somehow messes up the correct filename it gets from JNA.

I think that without help from zenitram, the author of mediainfo, I can't figure this out or fix it.
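
On the Java side, at least, the filenames can be dumped as code points to see which form the filesystem hands back. This is only a rough sketch with a made-up class name `CodePointDump`; it covers the Java half of the comparison and cannot show what JNA or libmediainfo end up receiving.

```java
import java.io.File;
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class CodePointDump {

    // Render a string as a sequence of U+XXXX code points.
    public static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", codePoint));
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list(); // null if dir is not a readable directory
        if (names == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (String name : names) {
            System.out.println(name);
            System.out.println("  as listed: " + dump(name));
            System.out.println("  NFD:       " + dump(Normalizer.normalize(name, Form.NFD)));
        }
    }
}
```

If the "as listed" and "NFD" lines differ for an accented name on the NFS share, the share is returning a form other than NFD, which would be one concrete difference to report upstream.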