External subtitle languages?

Running FileBot from the console, Groovy scripting, shell scripts, etc
thielj
Posts: 55
Joined: 05 Nov 2017, 22:15

External subtitle languages?

Post by thielj »

Hi

It seems that textLanguages only includes subtitle streams embedded in the video (probably provided by MediaInfo). I haven't checked yet, but the same might be true for audioLanguages and external audio tracks.

Has anyone written a script to collect languages from external subtitle (*.idx, *.ass) and audio files that they would be willing to share?

Is it possible to detect which additional subtitles FileBot was able to download?

Cheers!
kim
Power User
Posts: 1251
Joined: 15 May 2014, 16:17

Re: External subtitle languages?

Post by kim »

If you have an srt file with the same name as the video file and in the same folder, then you can use e.g.

Code: Select all

{fn}{subt}
to rename the srt file with its language, e.g. "movie.eng.srt"
thielj
Posts: 55
Joined: 05 Nov 2017, 22:15

Re: External subtitle languages?

Post by thielj »

What I have is a collection of files, e.g.:

Code: Select all

movie (1234).mp4             <-- includes embedded English audio and German subtitle streams
movie (1234).idx             <-- includes Italian and Spanish subs
movie (1234).sub
movie (1234).eng.srt         <-- automatically downloaded by FileBot
movie (1234).fr.mp3          <-- French audio stream
I have all the groovy code working to pick up the embedded audio ('EN') and subtitle streams ('de') and move all files to a folder "movie (1234) [EN de]".

However, I also want to add tags for the external audio track ('FR'), the titles in the idx/sub ('it' and 'es') and the newly downloaded srt ('en') to come up with a folder named "movie (1234) [EN FR de en es it]".

As these files have already been picked up and analyzed by FileBot, I wonder if it's possible to access this information - without re-inventing the wheel - and use it to format the folder name.
rednoah
The Source
Posts: 24227
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: External subtitle languages?

Post by rednoah »

The {lang} binding should work for companion files:

Code: Select all

$ ls
Avatar.2009.de.mp3
Avatar.2009.fr.mp3
Avatar.2009.mp4

Code: Select all

$ filebot -rename * --db TheMovieDB -non-strict --format "{ny}{'.'+lang.name}" --action TEST
[TEST] From [Avatar.2009.mp4] to [Avatar (2009).mp4]
[TEST] From [Avatar.2009.de.mp3] to [Avatar (2009).German.mp3]
[TEST] From [Avatar.2009.fr.mp3] to [Avatar (2009).French.mp3]

Since {lang} works for each individual item, you can also go through {model} to read the {lang} value of every matched item while formatting any single item:

Code: Select all

$ filebot -rename * --db TheMovieDB -non-strict --format "{ny}{'.'+model.lang.name}" --action TEST
[TEST] from [Avatar.2009.mp4] to [Avatar (2009).[German, French].mp4]
[TEST] from [Avatar.2009.de.mp3] to [Avatar (2009).[German, French].mp3]
[TEST] from [Avatar.2009.fr.mp3] to [Avatar (2009).[German, French].mp3]

{model} is probably one of the most advanced bindings, so you won't find many examples. It's about taking into account all matches when formatting an individual match. It might be very useful for what you're trying to do though.


Note: For SUB/IDX you'll have to read the IDX file (a plain text file) and check the language there. The {lang} binding doesn't do that.
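The IDX check described above can be sketched in plain Groovy, outside of FileBot. This is only an illustration: it assumes the usual VobSub header lines of the form "id: xx, index: N", the helper name idxLanguages is made up for this example, and an inline string stands in for the contents of a real .idx file.

```groovy
// Sketch: collect the language codes declared in a VobSub .idx file.
// Assumes "id: xx, index: N" header lines; idxLanguages is a hypothetical helper name.
List<String> idxLanguages(String idxText) {
    // (?m) makes ^ match at the start of every line, not just at the start of the text
    def matcher = idxText =~ /(?m)^id:\s*([A-Za-z]{2,3})\b/
    // each match is a list of [full match, group 1]; keep the captured code only
    return matcher.collect { match -> match[1].toLowerCase() }.unique()
}

// Inline sample standing in for the contents of "movie (1234).idx"
def sample = '''\
# VobSub index file, v7 (do not modify this line!)
size: 720x576
id: it, index: 0
timestamp: 00:00:01:000, filepos: 000000000
id: es, index: 1
timestamp: 00:00:01:000, filepos: 000001000
'''

println idxLanguages(sample)   // prints [it, es]
```

Inside a format expression the text would presumably come from reading the matched .idx file; how to get at that file object is not shown here.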
:idea: Please read the FAQ and How to Request Help.
thielj
Posts: 55
Joined: 05 Nov 2017, 22:15

Re: External subtitle languages?

Post by thielj »

Thanks, {model} seems to be what I was looking for!
thielj
Posts: 55
Joined: 05 Nov 2017, 22:15

Re: External subtitle languages?

Post by thielj »

I still seem to have issues with FileBot picking up audio tracks (and pure audio files ending in .ac3 or .dts seem to be recognized as movies). Apart from that, here is the language recognition code so far, if anyone has similar needs:

Code: Select all

  // LANGUAGE PROCESSING ------------------------------------------------------------------------------------------------------

  // production language from the movie DB, not necessarily related to any spoken language in the movie!
  def dbOriginalLanguage = call{info.OriginalLanguage};
  // spoken languages from the movie DB, sorted by language name
  def dbSpokenLanguages  = call{languages}?:[];

  // try to guess the primary language of the movie
  def primaryLanguage    = (1 == dbSpokenLanguages.size()) ? dbSpokenLanguages[0] : Language.findLanguage(dbOriginalLanguage);

  // helper classes to enumerate embedded and external streams
  enum StreamType{ AUDIO, TEXT, HARD }
  class Stream {
      StreamType  type
      Language    lang
      Stream(StreamType t, Language l) { type=t; lang=l; }
      Stream(StreamType t, Locale l)   { type=t; lang=Language.getLanguage(l); }
      Stream(StreamType t, String l)   { type=t; lang=Language.findLanguage(l); }
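      // equality covers both type and language, so 'EN' (audio) and 'en' (subtitle) stay distinct;
      // hashCode is kept consistent with equals so that unique() can detect duplicates
      boolean equals(Object obj)       { return obj instanceof Stream && type == obj.type && lang?.ISO2 == obj.lang?.ISO2; }
      int hashCode()                   { return Objects.hash(type, lang?.ISO2); }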
      String toString() {
         switch (type) {
           case StreamType.AUDIO: return lang.ISO2.upper();
           case StreamType.TEXT:  return lang.ISO2;
           case StreamType.HARD:  return lang.ISO2+"-hard";
           default: break;
         }
         return null;
      }
  }

  // Embedded audio streams, provided by mediainfo
  // TODO: access mediainfo data and find the first 'default' audio track, if available?
  def embAudioStreams    = (call{audioLanguages}?:[]) .collect{ new Stream( StreamType.AUDIO, it ) };
  // External audio streams
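  // ==~ requires the whole extension to match; with =~ the ^ and $ only bind to the first and last alternative
  def extAudioStreams    = model.findAll{it.ext ==~ /(?i)mp3|m4a|dts|ac3|wav|ogg/} .collect{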
                             new Stream( StreamType.AUDIO, it.lang?:"und" )
                           };
  // Embedded text/subtitle streams, provided by mediainfo
  def embTextStreams     = (call{textLanguages }?:[]) .collect{ new Stream( StreamType.TEXT, it ) };
  // External text/subtitle streams
  // TODO: which ones are recognized by FileBot ???
  // ==~ requires the whole extension to match one of the alternatives
  def extTextStreams     = model.findAll{it.ext ==~ /(?i)ass|psb|srt|ssa|ssf|sub/} .collect{
                             new Stream( StreamType.TEXT, it.lang?:"und" )
                           };
  // External VobSub titles (.idx and .sub) supporting multiple languages
  def extVobSubStreams   = model.findAll{it.ext=='idx'} .collectMany{
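                             // read the IDX text and capture the language code of every "id: xx" line;
                             // (?m) lets ^ match at each line start (assumes it.file is a File or a file path)
                             (it.file as File).text.findAll(/(?m)^id: ([a-zA-Z]+)/){ match, code -> code }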
                           } .collect{
                             new Stream( StreamType.TEXT, it )
                           };
  // try to match patterns indicating hard subs
  def embHardSubs        = myGrepContext.findAll(RE_HARDSUBS) .collect{ new Stream( StreamType.HARD, it ) };

  // if no other information available, assume the default audio stream is in the  primary language
  if( ! embAudioStreams && primaryLanguage ) embAudioStreams += [ new Stream( StreamType.AUDIO, primaryLanguage) ];

  def myStreams          = embAudioStreams+extAudioStreams+embHardSubs+embTextStreams+extVobSubStreams+extTextStreams;
Just do myStreams.unique() before tagging your file.
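A rough sketch of that last step, written as standalone Groovy so it can be run without FileBot. Plain strings stand in for FileBot's Language objects, and the names Track and folderTag are invented for this example. The point it illustrates is that unique() only drops duplicates if equals and hashCode agree, and that equality has to include the stream type if 'EN' (audio) and 'en' (subtitle) should both survive.

```groovy
// Sketch: de-duplicate streams and join them into a folder tag.
// Track and folderTag are hypothetical names; strings stand in for Language objects.
import groovy.transform.EqualsAndHashCode

@EqualsAndHashCode   // generates equals() and hashCode() over both kind and code
class Track {
    String kind   // 'AUDIO' or 'TEXT'
    String code   // two-letter language code, e.g. 'en'
    // audio tracks are tagged in upper case, subtitles in lower case
    String toString() { kind == 'AUDIO' ? code.toUpperCase() : code.toLowerCase() }
}

String folderTag(List<Track> tracks) {
    // unique(false) returns a new list and leaves the argument untouched
    def tags = tracks.unique(false).collect { it.toString() }
    // audio tags first, then subtitle tags, each group sorted alphabetically
    def audio = tags.findAll { it == it.toUpperCase() }.sort()
    def text  = tags.findAll { it != it.toUpperCase() }.sort()
    return '[' + (audio + text).join(' ') + ']'
}

def tracks = [
    new Track(kind: 'AUDIO', code: 'en'),
    new Track(kind: 'TEXT',  code: 'de'),
    new Track(kind: 'AUDIO', code: 'fr'),
    new Track(kind: 'TEXT',  code: 'it'),
    new Track(kind: 'TEXT',  code: 'es'),
    new Track(kind: 'TEXT',  code: 'en'),
    new Track(kind: 'TEXT',  code: 'en'),   // duplicate, e.g. embedded and external English subs
]

println folderTag(tracks)   // prints [EN FR de en es it]
```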
rednoah
The Source
Posts: 24227
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: External subtitle languages?

Post by rednoah »

Yes, secondary files (i.e. any non-video files that match a video file by name) will automatically get assigned to the same match as the primary file. Unfortunately for your use case, that also means that all MediaInfo bindings will get redirected and retrieve values from the primary video file.

i.e.

Code: Select all

Movie (2000).mkv
Movie (2000).ac3
Movie (2000).dts
thielj
Posts: 55
Joined: 05 Nov 2017, 22:15

Re: External subtitle languages?

Post by thielj »

In general, that's what I want, as long as the mkv/mp4/avi/divx etc is the 'primary file', and external ac3/dts/mp3 tracks are always secondaries.

Are there any hooks to modify the matches before moving on to actions? Is it possible to cache some data between invocations of my script, for example in the 'model' object?

I also noticed that matching seems to happen across sibling folders. For example, I have the same movie in two versions (different codecs/resolutions/etc). This is often due to one being e.g. the German release and the other the original, differing in source material, run-time, etc:

Code: Select all

Movies/(R)/Redacted (2000) [720p DTS 5.1 EN de]/Redacted (2000).nfo
Movies/(R)/Redacted (2000) [720p DTS 5.1 EN de]/Redacted (2000).mkv
Movies/(R)/Redacted (2000) [720p DTS 5.1 EN de]/Redacted (2000).de.srt

Movies/(R)/Redacted (2000) [704x384 DE]/Redacted (2000).nfo
Movies/(R)/Redacted (2000) [704x384 DE]/Redacted (2000).CD1.avi
Movies/(R)/Redacted (2000) [704x384 DE]/Redacted (2000).CD2.avi
As a workaround, I can process these separately I guess. Haven't fully investigated this yet due to the xattr issues.
rednoah
The Source
Posts: 24227
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: External subtitle languages?

Post by rednoah »

1. {model} can be particularly inefficient, because you'll access all files for each individual file each time, so it doesn't scale well. FileBot should be caching some stuff internally. There are a few crazy people that use internal APIs in custom formats. That's for another thread though.


2. The --filter option should give you some control as to what gets matched. However, you don't get any hooks for checking the file/movie matches before processing. If you have the same movie file multiple times, then it will get matched to the same movie object multiple times. Your format will need to account for that if that's something that can happen in your case.
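One possible way for a format to account for duplicate matches is to only look at files that sit in the same folder as the file being formatted. The sketch below is plain Groovy and not FileBot API: sameFolder is a made-up helper, and java.io.File objects stand in for the items of {model}.

```groovy
// Sketch: only consider files that share a folder with the file being formatted.
// sameFolder is a hypothetical helper; plain File objects stand in for {model} items.
List<File> sameFolder(File current, Collection<File> all) {
    all.findAll { it.parentFile == current.parentFile }
}

def files = [
    'Movies/(R)/Redacted (2000) [720p DTS 5.1 EN de]/Redacted (2000).mkv',
    'Movies/(R)/Redacted (2000) [720p DTS 5.1 EN de]/Redacted (2000).de.srt',
    'Movies/(R)/Redacted (2000) [704x384 DE]/Redacted (2000).CD1.avi',
    'Movies/(R)/Redacted (2000) [704x384 DE]/Redacted (2000).CD2.avi',
].collect { new File(it) }

println sameFolder(files[0], files)*.name   // prints [Redacted (2000).mkv, Redacted (2000).de.srt]
```

This keeps the 720p release and the 704x384 release apart even though both are matched to the same movie. It still walks the whole list once per file, so the scaling concern from point 1 applies here as well.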