Language inconsistently detected

All your suggestions, requests and ideas for future development
Pator
Power User
Posts: 15
Joined: 04 Oct 2016, 07:29

Post by Pator »

Hi,

1)
I have a movie that returns an empty array for both languages and info.SpokenLanguages (BTW, are they the same thing?).
Do they come from TMDB, as I think?

However, on the TMDB page the original language is defined (a FileBot binding such as info.OriginalLanguage does not exist either).

The movie in question is
https://www.themoviedb.org/movie/347760-aaaaaaaah

I guess this is a bug, since filebot cannot get the language from the TMDB

2)
Interestingly, I did find the language inside media.Video_Language_List and media.Audio_Language_List. As I understand it, FileBot builds the audio.language array from the second one. But it fails.

Is this a bug?

And BTW what is the difference between audio.* and audios.*?
==
My goal is to put all movies in a language subdirectory: based on the first audio stream if it is detected in the file metadata, otherwise using OriginalLanguage (and not the spoken languages, which do not exist in this case anyway, because that list is not necessarily sorted by relevance).

Thanks.
rednoah
The Source
Posts: 22998
Joined: 16 Nov 2011, 08:59
Location: Taipei

Post by rednoah »

1.
spoken_languages and original_language are separate pieces of data. {language} will give you whatever TheMovieDB sends as spoken_languages data, which may or may not be an empty list.

@see http://pastebin.com/7VkLgFgG


2.
media.Video_Language_List is the Video_Language_List property of the container, but audio.language is the language property for each audio stream. Depending on how MediaInfo works, it may or may not be the same.

@see viewtopic.php?t=4285


3.
audio.* exists
audios.* does not exist (anymore)
:idea: Please read the FAQ and How to Request Help.

Post by Pator »

1.
rednoah wrote: spoken_languages and original_language are separate pieces of data. {language} will give you whatever TheMovieDB sends as spoken_languages data, which may or may not be an empty list.

So on one hand I was correct:
languages=info.SpokenLanguages (=TMDB spoken_languages)

But I cannot find any filebot bindings that give me TMDB's original_language

And a small question about conversion
Audio.languages returns e.g. [en, it, es]. Is there any built-in solution to convert it to [eng, ita, spa], [English, Italian, Spanish] and [Inglés, Italiano, Español]?

Would the same solution work for Audio_Language_List or lang? (I think lang has built-in properties for ISO2 and ISO3.)

And while I am at it, what is the difference between old lang (for subtitles) and the newer subt?

Post by rednoah »

1.
Looks like the original_language data is not available because nobody ever asked for that.


2.
Depends on which MediaInfo audio property you choose to use. Looks like you want {audio.LanguageString3} instead of {audio.Language}.

e.g.

Code: Select all

Language	ja
Language/String	Japanese
Language/String1	Japanese
Language/String2	ja
Language/String3	jpn
Language/String4	ja

3.
{lang} is the language and will give you a Language object. {subt} is the subtitle tag, which will give you a String that includes the ISO 639-2/B language code, any tags (e.g. .eng-forced.srt) and dots.

So if you just want to add .eng to the filename you'll want to use {subt} but if you want to do anything fancy with just the language code then you wanna use lang instead.

e.g.

Code: Select all

{lang.locale.getDisplayName(Locale.GERMAN)}
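The conversions asked about in points 2 and 3 (two-letter code to three-letter code, and code to a display name in some language) can be tried outside FileBot with plain java.util.Locale. This is only a sketch: the class and method names (LanguageCodes, iso3, displayName) are made up for illustration, and it does not show how FileBot's own Language object is implemented.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not part of FileBot.
class LanguageCodes {

    // Two-letter ISO 639-1 code -> three-letter code, e.g. "en" -> "eng".
    // Note: for some languages Java returns the terminological code
    // (e.g. "deu" for German), which may differ from what other tools emit.
    static String iso3(String iso2) {
        return Locale.forLanguageTag(iso2).getISO3Language();
    }

    // Language name as shown to a user of the given display locale.
    static String displayName(String iso2, Locale displayLocale) {
        return Locale.forLanguageTag(iso2).getDisplayLanguage(displayLocale);
    }

    public static void main(String[] args) {
        List<String> codes = List.of("en", "it", "es");

        // Three-letter codes
        System.out.println(codes.stream()
                .map(LanguageCodes::iso3)
                .collect(Collectors.toList()));

        // Names in English
        System.out.println(codes.stream()
                .map(c -> displayName(c, Locale.ENGLISH))
                .collect(Collectors.toList()));

        // Names in Spanish (exact spelling and capitalisation depend on the JDK's locale data)
        System.out.println(codes.stream()
                .map(c -> displayName(c, Locale.forLanguageTag("es")))
                .collect(Collectors.toList()));
    }
}
```

Inside a format expression the equivalent is the Locale call shown above, applied to the Locale that {lang} exposes.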

Post by Pator »

2. Everything is so clear now. Or almost.
How do I get ISO3 from the string "English" in media.Audio_Language_List?

"English".toLocale() returns an object with empty properties, except for several that contain the string "english" (lower case).

1. Please include original_language, it's more meaningful than spoken_languages.

From what I see, people often use the latter plus filtering, when what they really need is the former.

Thanks!
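For reference, going from a language name such as "English" back to a three-letter code needs a reverse lookup, because a Locale cannot be constructed from a display name. A minimal sketch in plain Java, assuming the names are English names as the JDK spells them; LanguageNameLookup and iso3ForName are hypothetical names and not FileBot API:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Optional;

// Hypothetical helper, not part of FileBot.
class LanguageNameLookup {

    // lower-cased English language name -> three-letter code
    private static final Map<String, String> ISO3_BY_ENGLISH_NAME = new HashMap<>();

    static {
        // Build the reverse index once from all two-letter ISO 639 codes the JDK knows.
        for (String code : Locale.getISOLanguages()) {
            Locale locale = Locale.forLanguageTag(code);
            try {
                String name = locale.getDisplayLanguage(Locale.ENGLISH).toLowerCase(Locale.ROOT);
                ISO3_BY_ENGLISH_NAME.put(name, locale.getISO3Language());
            } catch (MissingResourceException e) {
                // No three-letter code available for this entry: skip it.
            }
        }
    }

    // Case-insensitive lookup; empty if the name is not a known language name.
    static Optional<String> iso3ForName(String englishName) {
        String key = englishName.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(ISO3_BY_ENGLISH_NAME.get(key));
    }

    public static void main(String[] args) {
        System.out.println(iso3ForName("English"));        // a present Optional
        System.out.println(iso3ForName("not a language")); // an empty Optional
    }
}
```

A container-level value that lists several languages in one string would have to be split into single names first; the separator is not shown in this thread, so that step is left out.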

Post by Pator »

Before you reply, I can tell you that I have seen your additions.

Now, for my test case:
info.originalLanguage returns "en"
info.originalLanguage.toLocale().displayName returns "English"
audioLanguages returns an empty array in this case (and not null). I will need a different test case; I guess that with audioLanguages I will be able to get ISO3, as opposed to media.Audio_Language_List.

But something else changed too (probably due to other commits)

I keep my formats in files, spread over several lines, for easier reading and maintenance.
Newlines (outside {}) were ignored in the past. This allowed neat code formatting.
Now newlines insert a space, which in turn has corrupted my file naming a little bit. I hope this was not intended, since that would make code less readable.

In any case, thanks for being so supportive in general, hope you are getting enough of your well deserved donations.

Post by rednoah »

1.
Generic MediaInfo bindings will give you direct access to any of the MediaInfo String values. All the FileBot bindings give you validated Language objects (which use the ISO3 code as default String representation).

e.g.

Code: Select all

{audio.language*.class} => {audio.language*.toLocale()*.getDisplayName()}

Code: Select all

{audioLanguages*.class} => {audioLanguages*.name}

2.
Pator wrote: Newlines (outside {}) were ignored in the past. This allowed neat code formatting.
Now newlines insert a space, which in turn has corrupted my file naming a little bit. I hope this was not intended, since that would make code less readable.
There is no intended behaviour either way. Please post some examples.

Post by Pator »

rednoah wrote: There is no intended behaviour either way. Please post some examples.
The following code goes into a file, which I load with
--def movieFormat=@filebot/movie.groovy

Code: Select all

{ // Movies directory }
Movies/

{// Language Directory Name}
{any {audio.first().LanguageString}                 // Audio Stream 1 Language 
     {audioLanguages.first().name}                  // Same as above (stream)? Same as below (container)?
     {info.OriginalLanguage.toLocale().displayName} // DB Original Language
     // Container Audio info
     {media.Audio_Language_List.toLocale().displayName.replaceAll(/ .*/,'').upperInitial()}
     {info.SpokenLanguages.first()}                 // DB Languages, Unreliable
     {'_Unknown_'} 
}/

{ // Movie Directory }
{collection +'/'}{primaryTitle} ({y})/

{ // File Name }
{primaryTitle} ({y}){' CD'+pi}

{ // Tags }
{" " + tags} [{imdbid} {info.RunTime}m]

{ // Video Streams }
 [{vc} {resolution}]

{ // Audio Streams, channels sometimes wrong, need to investigate }
 [{ac}-{channels} {any {audio.languageString3} // Audio Streams ISO3 Languages
                       {audioLanguages}        // Same as above (stream) or container info?
                                               // media.Audio_Language_List removed, container audio list unreliable
                       {'[nil]'}
                  }] 

{ // Subtitle Streams, code style does not match the above, needs rewrite }
{ " [$text.format[0] " + any {subt}{if (text.language[0]) text.language} \
                             {'[nil]'} + ']'
}

{ // suffix for separated subtitle files }
{ '.'+lang}
Between "Movies/" and the next "{any" there are 3 newlines (no spaces). Unless I remove the 3 newlines, a space is inserted. This did not happen with 4.7.2, but it does with r4568.

Current test case:

Code: Select all

[TEST] Rename [/share/CACHEDEV1_DATA/Public/Unsorted/TODO/American.Sniper.2014.BluRay.1080p.mkv] to [/share/CACHEDEV1_DATA/Public/Unsorted/DONE/Movies/ English/ American Sniper (2014)/ American Sniper (2014) [tt2179136 134m] [x264 1920x800] [AAC-5.1 [eng]].mkv]
You can see the space after Movies/, English/, (2014)/
It adds a space only if there is none. Therefore you see only one space (as intended) before [tt, [x264 or [AAC

While you are at it, maybe you can comment on my code and on whether it is the best and most efficient way of doing it, especially where I inserted comments with my doubts.

It would also be nice if everybody formatted their formats in a readable way. Most of what is lying around (one-liners with hundreds of characters) is undecipherable to the human eye. Maybe the FileBot UI could provide more than a one-line view for format editing to encourage this, or even open an editor window. Maybe people think only one long line is possible...
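For readers unfamiliar with the any {...}{...} chain used in the format above, the idea is a fallback: try each source in order and take the first one that yields a usable value. The sketch below illustrates that idea in plain Java. It assumes the semantics are "first candidate that evaluates without error to a non-empty value"; FirstAvailable and its any method are hypothetical names and are not FileBot's implementation.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of a fallback chain, not FileBot code.
class FirstAvailable {

    // Returns the first candidate value that is neither null nor empty.
    // A candidate that throws is treated the same as a missing value.
    @SafeVarargs
    static String any(String fallback, Supplier<String>... candidates) {
        for (Supplier<String> candidate : candidates) {
            try {
                String value = candidate.get();
                if (value != null && !value.isEmpty()) {
                    return value;
                }
            } catch (RuntimeException e) {
                // e.g. calling first() on an empty list: fall through to the next candidate
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<String> audioLanguages = List.of();   // assumed: no audio language in the file
        String originalLanguage = "English";       // assumed: value from the movie database

        String directory = any("_Unknown_",
                () -> audioLanguages.get(0),        // throws, so it is skipped
                () -> originalLanguage);            // first usable value

        System.out.println(directory);
    }
}
```

Ordering the candidates from most to least trusted, with a literal as the last resort, mirrors the layout of the language directory block in the format.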

Post by rednoah »

1.
Makes sense. NEWLINE in the String literal parts of the format shall be ignored.

2.
Most people blindly copy & paste code so the results ain't pretty. Even the simple built-in 1-line formats are already beyond most new users, so I don't plan on making it more intimidating than it already is. I'm not planning on building a Groovy IDE here. :D

3.
Your format is probably the best one I've ever seen on here. Nobody ever bothers to organize their code.
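The behaviour agreed on in point 1 (newlines in the literal parts are dropped, newlines inside {...} are left alone) can be sketched as a small preprocessing step. This is an illustration only, not FileBot's actual code: FormatNewlines and stripLiteralNewlines are hypothetical names, and the brace counting is deliberately naive (it does not understand braces inside Groovy string literals or comments).

```java
// Hypothetical illustration, not FileBot code.
class FormatNewlines {

    // Removes line breaks that occur outside of {...} expressions.
    static String stripLiteralNewlines(String format) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // current nesting level of { }

        for (char c : format.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
            // Only literal text (depth 0) loses its line breaks.
            if (depth == 0 && (c == '\n' || c == '\r')) {
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String format = "Movies/\n\n\n{any {a}\n     {b}}/\n{title}";
        System.out.println(stripLiteralNewlines(format));
    }
}
```

With this rule a format file can be laid out over many lines without the layout leaking into the generated path.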

Post by Pator »

1. Yes. NEWLINES in the literal area were in the past ignored. But with the latest version they are not ignored anymore. There must be some regression in one of the latest commits. Do you know which might have caused it?

2. I hardly use the UI, so it would not be for me. But your format editor is already multiline. But the text field is too small to show more than one line (arrow keys work). You would only need to make that field larger to give an expanded view. Just cosmetics.

3. I like to maintain and improve my code, and help others to read it :)

So audio.language is exactly the same thing as audioLanguages? Both are internal FileBot bindings that return objects with data taken from the MediaInfo Audio section, and not from the General section? (One defaults to ISO3 and the other to ISO2, though.)

Post by rednoah »

1.
Fixed with r4576.

2.
It's more like a combobox that allows you to select a value from the list of values below, and then maybe make minor edits. People that can write complex formats are usually people that don't need the Format Editor much anyway, other than for prototyping bits and pieces.

3.
Yes, audioLanguages is based on the language field of each audio stream:
https://github.com/filebot/filebot/blob ... .java#L764

Feel free to subscribe to the Episode / Movie Naming Scheme subforum and see what comes up. Your eyes will bleed, but your format style would definitely up the average quality quite a bit! :lol:


EDIT:


This thread might be useful: viewtopic.php?f=4&t=3125 (guess the NEWLINE wasn't an issue here because we just structured everything as a single Groovy expression)

Other than you and dem61s (myself excluded of course) nobody really posted any readable format AFAIK.

Post by Pator »

The newline was never an issue for me. It suddenly appeared with the latest version when I tested audioLanguages and my existing format got broken.

That being said, i have tested your new patch and it works!

The audio channel detection, I think it is a mediainfo bug with HE-AAC (will investigate in their forums)

Now I have trouble with ut_label=tv still querying TMDB, and with some trivially wrong movie matches, sometimes for no apparent reason, sometimes because the containing folder is used as a hint (many movies in my TODO/ directory got confused with Spanish movies with "Todo" in the title). Now my dir is called XXXXX :/

I will present my findings in a different thread, when ready.
mailus
Posts: 6
Joined: 28 Jan 2017, 15:24

Post by mailus »

Sorry to reply in old thread.
I felt this will be the correct thread to ask this question.

How do I get the complete language name instead of the language code in FileBot_4.7.8_Beta.jar?
In FileBot_4.7.2 I used "language.first()" to get the complete language name.
I recently updated to FileBot_4.7.8_Beta, and "language.first()" no longer gives me the complete name. I suppose the upgrade caused this.

Post by rednoah »

Not sure what you're trying to do... Maybe something like this?

Code: Select all

{languages[0].name}

Code: Select all

{audioLanguages[0].name}