Subtitles

All your suggestions, requests and ideas for future development
cheesemaker
Posts: 49
Joined: 03 Sep 2012, 10:52

Post by cheesemaker »

Hi there,

I have some movies with VobSub subtitles (.idx and .sub). My naming scheme looks like this:

Code: Select all

"movieFormat=Filme/{n.upperInitial().space('.')}.({y})/{n.upperInitial().space('.')}.({y}){'.'+fn.file.path.match(/EXTENDED|UNCUT|DIRECTORS.CUT/)}{fn.match(/HOU|H-OU/) ? '.3D.H-OU' : ''}{fn.match(/HSBS|H-SBS/) ? '.3D.H-SBS' : ''}{\".\$ac\"}{\".\$vf\"}{\".\$source\"}{\".\$vc\"}{'.'+fn.file.path.match(/REPACK/)}{\"-\$group\"}{\".\$lang\"}" "ut_label=Movie"
Unfortunately, it looks like FileBot can't recognize the language of the .idx files, and it also mixes up the corresponding .sub files.

For example an input folder is structured like this:
My.Movie.2014.German.DTS.DL.1080p.BluRay.x264-ReleaseGroup
-- releasegroup-mymovie-1080p.mkv
-- releasegroup-mymovie-1080p.nfo
-- / Sample
-- -- releasegroup-mymovie-1080p-sample.mkv
-- / Subs
-- -- releasegroup-mymovie-1080p-forced.idx
-- -- releasegroup-mymovie-1080p-forced.sub
-- -- releasegroup-mymovie-1080p.idx
-- -- releasegroup-mymovie-1080p.sub

So I have two problems:
1. FileBot doesn't recognize the language of the .idx file and its corresponding .sub file (using the {lang} binding).
2. FileBot doesn't recognize forced subtitles (I tried fn.match(/forced/), but the output is an error).

The current output is:

Code: Select all

[TEST] Rename [/.../releasegroup-mymovie-1080p.mkv] to [/.../My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.mkv]
[TEST] Rename [/.../Subs/releasegroup-mymovie-1080p-forced.idx] to [/.../My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.idx]
[TEST] Rename [/.../Subs/releasegroup-mymovie-1080p.idx] to [/.../My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.idx]
[TEST] Rename [/.../Subs/releasegroup-mymovie-1080p.sub] to [/.../My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.sub]
[TEST] Rename [/.../Subs/releasegroup-mymovie-1080p-forced.sub] to [/.../My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.sub]
Processed 5 files
Done ヾ(@⌒ー⌒@)ノ
As you can see, I end up with only one subtitle (the forced one), and it has no language suffix. It should be "My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.eng.idx", because I sometimes have more than one language.

So the output should be like this:

Code: Select all

My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.mkv

My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.eng.idx  (| Important,                        |)
My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.eng.sub (| these must be the according files  |)

My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.forced.eng.idx  (| Important,                        |)
My.Movie.(2014).DTS.1080p.BluRay.AVC-ReleaseGroup.forced.eng.sub (| these must be the according files  |)
I hope my problem is clear and that you can help me out a bit.

Thanks,
rednoah
The Source
Posts: 22976
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Subtitles

Post by rednoah »

1.

Code: Select all

{fn.match(/forced/)}
If the fn contains 'forced' it will work. I tried, and it works just fine. It can't not. It's literally what the code does.

You probably get a warning because the sample file you are testing with doesn't have 'forced' in its path. In effect, you're testing 'What if it doesn't contain forced?'.


2.
IDX+SUB are binary image files, not text. Language can't be auto-detected.
:idea: Please read the FAQ and How to Request Help.

Post by cheesemaker »

Thanks for your reply.
1. I'll try it again.

2. .idx files aren't binary; they are plain text, so it would be possible to match the language there. I think the difficulty is handling the corresponding .sub file, which has the same filename (just a different extension).
Here is a sample .idx:

Code: Select all

# VobSub index file, v7 (do not modify this line!)
# Created by BDSup2Sub 4.0.1

# Frame size
size: 1920x1080

# Origin - upper-left corner
org: 0, 0

# Scaling
scale: 100%, 100%

# Alpha blending
alpha: 100%

# Smoothing
smooth: OFF

# Fade in/out in milliseconds
fadein/out: 0, 0

# Force subtitle placement relative to (org.x, org.y)
align: OFF at LEFT TOP

# For correcting non-progressive desync. (in millisecs or hh:mm:ss:ms)
time offset: 0

# ON: displays only forced subtitles, OFF: shows everything
forced subs: OFF

# The palette of the generated file
palette: 000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, 33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb

# Custom colors (transp idxs and the four colors)
custom colors: OFF, tridx: 1000, colors: 000000, 444444, 888888, cccccc

# Language index in use
langidx: 0

# English
id: en, index: 0
# Decomment next line to activate alternative name in DirectVobSub / Windows Media Player 6.x
# alt: English
# Vob/Cell ID: 1, 1 (PTS: 0)
timestamp: 00:02:13:008, filepos: 000000000
timestamp: 00:03:25:205, filepos: 000000800
timestamp: 00:03:27:541, filepos: 000002800
... and so on ...
These files have an "id" field, which in this case is "en" for English. (Another example would be "de" for German.)
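
As a rough illustration of why the language is recoverable, here is a minimal Python sketch (not FileBot code) that pulls the `id:` codes out of the text of an .idx file. The function name `idx_languages` and the regex are assumptions made for this example, based only on the sample above.

```python
import re

# Matches stream header lines such as "id: en, index: 0" (format taken from the sample above).
ID_LINE = re.compile(r"^id:\s*([a-z]{2,3}),\s*index:\s*(\d+)", re.MULTILINE)

def idx_languages(idx_text: str) -> list[str]:
    """Return the language codes of all streams declared in a VobSub .idx text."""
    return [m.group(1) for m in ID_LINE.finditer(idx_text)]

sample = """# Language index in use
langidx: 0

# English
id: en, index: 0
# alt: English
timestamp: 00:02:13:008, filepos: 000000000
"""
print(idx_languages(sample))  # ['en']
```

Anchoring the pattern at the start of a line keeps `langidx: 0` and the commented `# alt:` line from being picked up by accident.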

Best Regards,

Post by rednoah »

Added support for guessing language from .idx contents with r1967.

Post by cheesemaker »

I've tested it with some files and it looks like it's working. Thank you!
mueller56
Posts: 9
Joined: 10 Dec 2015, 18:33

Post by mueller56 »

I was really happy when I found this thread, because I didn't think FileBot supported guessing the language from .idx/.sub pairs. Unfortunately, I found that this feature was apparently dropped in newer versions of FileBot. I know that VobSubs are no longer state of the art, but since the Scene rules still stipulate VobSub as the external subtitle format, they are far from dead.

I try to avoid converting subtitles to a more modern format like *.srt, because a couple of spelling errors always find their way in, and I also try to keep releases in their original state.

It seems there aren't many other users missing this feature, but nevertheless, I hope it can find its way back into the next version of FileBot.

Of course, I could continue to use an old version of FileBot, but I would much rather have all the advantages and bug fixes of the newer versions AND the language guessing for VobSubs.

Apart from this little feature request, I am more than impressed with what FileBot can do, and I use it almost daily for my renaming! Thanks for making my life a little easier :)
Last edited by mueller56 on 11 Dec 2015, 04:57, edited 1 time in total.

Post by rednoah »

1.
For *.idx files you don't need to guess anything, it says so in the file:

Code: Select all

id: en, index: 0
2.
Most people that need "language guessing" actually only need "separate English from <native language>" kind of functionality.

@see viewtopic.php?f=8&t=2726#p15388

Post by mueller56 »

Okay, so for .idx it seems pretty simple then, I could just do this for every language that usually comes with what I have:

Code: Select all

{if (ext == 'idx' && file.text.matchAll('id: en')) '.eng'}
But what about the corresponding .sub file? Basically, it has the same filename as the .idx, just the different extension. Is there a way to rename it just like the .idx?

Post by rednoah »

That'll be a bit trickier... you'd need something like:

Code: Select all

if (ext == 'sub' && file.parentFile.listFiles().findResult{ it == fn+'.idx' }).text.matchAll('id: en')) '.eng'
(pseudo code, not tested)

Post by mueller56 »

I tested the schemes and found that only renaming the .idx works. For the .sub I haven't been able to get it working.

The structure of my files before processing is:

Code: Select all

somefilename.mkv
somefilename.idx
somefilename.sub
somefilename-eng.idx
somefilename-eng.sub
somefilename-eng-forced.idx
somefilename-eng-forced.sub
somefilename-forced.idx
somefilename-forced.sub
So I guess

Code: Select all

ext == 'sub' && file.parentFile.listFiles()
returns all the files in the folder, then

Code: Select all

findResult{ it == fn+'.idx' }
should return only the file that has the basename of the .sub file plus the .idx extension. That, however, doesn't seem to happen. I noticed that you had a closing parenthesis without an opening one at the end of the findResult function, so I just removed it, but still no success. I can't see why this shouldn't work; it seems pretty straightforward to me, but maybe there's a syntax problem?

Post by rednoah »

findResult{ it == fn+'.idx' } returns the result of the closure (a boolean), not the actual object for which the condition is true. :lol:

Anyway, this is shorter, and tested:

Code: Select all

{file.resolveSibling("${fn}.idx").text.matchAll('id: en')}
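
The idea behind `resolveSibling` can be sketched outside FileBot as well. The following Python snippet is only an illustration under assumptions: the helper name `language_for` is made up, and it presumes the .idx sits next to the .sub with the same base name, as in the examples in this thread.

```python
import re
from pathlib import Path
from typing import Optional

def language_for(subtitle: Path) -> Optional[str]:
    """Return the first 'id:' language code for an .idx file or for the .sub that belongs to it."""
    # For movie.sub this looks at movie.idx; for movie.idx it is the file itself.
    idx = subtitle.with_suffix(".idx")
    if not idx.is_file():
        return None
    match = re.search(r"^id:\s*([a-z]{2,3})", idx.read_text(errors="replace"), re.MULTILINE)
    return match.group(1) if match else None
```

Because the .sub is binary, the lookup always goes through the text .idx, so both files of a pair end up with the same language tag.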

Post by mueller56 »

Great, this is near perfection now! One thing I'd like to do though is to recognize if the IDX file contains multiple subtitle tracks or not. Now there would be two ways to do that, but unfortunately I couldn't get either one to work on the command line.
  1. Match against "index: 1". Since "index: 0" is always the first subtitle track, the file must have multiple tracks if the expression finds a match. I couldn't figure out how to negate an expression. I thought it might work like this, but it didn't:

    Code: Select all

    {if (ext == "idx" && !file.text.match("index: 1")) file.text.match("(?<=id: )..")}
    (I added positive lookbehind to the latter match function to match every possible two-letter language code)
    So how can I negate that first match function?
  2. Count the occurrences of "index: " in the IDX file. I tried something like this, but it doesn't work:

    Code: Select all

    {if (ext == "idx" && file.text.matchAll("index: ").count==1) file.text.match("(?<=id: )..")}
I'm sure there's a fairly easy solution to the problem, but I can't seem to be able to find it :(

Btw. is there a way to convert the matched two-letter language codes to three-letter ones?

Post by rednoah »

This regex will match all two- or three-letter language codes:

Code: Select all

id: [a-z]{2,3}
You can do your own replace logic for language codes, either in the format, or via external csv file:
viewtopic.php?f=5&t=182


EDIT:


Actually, you can do it quite easily via Locale code:

Code: Select all

{'en'.toLocale().ISO3Language}

Code: Select all

{'eng'.toLocale().displayLanguage}
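
For readers outside the JVM, where `Locale` is not available, the same conversion can be approximated with a lookup table. This Python sketch is illustrative only: the table `ISO2_TO_ISO3` is a hand-written subset, and `to_iso3` is a hypothetical helper, not part of FileBot.

```python
# Hand-picked subset of ISO 639-1 -> ISO 639-2/T codes; extend as needed.
ISO2_TO_ISO3 = {"en": "eng", "de": "deu", "fr": "fra", "es": "spa", "it": "ita"}

def to_iso3(code: str) -> str:
    """Map a two-letter language code to three letters, leaving unknown codes unchanged."""
    return ISO2_TO_ISO3.get(code.lower(), code)

print(to_iso3("en"))  # eng
```

Returning unknown codes unchanged is a design choice for this sketch, so an unexpected code still shows up in the filename instead of being dropped.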

Post by mueller56 »

Great :) That answers one of my questions. Now I only need to know how to distinguish between IDX files that have only one subtitle track and those that have multiple tracks. How can I count the number of results that the matchAll() function returns, so I can use that in the if-clause :?

Post by rednoah »

Here you go:

Code: Select all

{[1,2,3].size}
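
Spelled out, the hint is to take the size of the list of matches. A Python sketch of that check follows; `single_track_language` is a hypothetical helper, and it treats every `id: xx, index: n` line as one track, which is an assumption based on the sample .idx earlier in the thread.

```python
import re
from typing import Optional

# One match per declared track, e.g. "id: en, index: 0".
TRACK = re.compile(r"^id:\s*([a-z]{2,3}),\s*index:\s*\d+", re.MULTILINE)

def single_track_language(idx_text: str) -> Optional[str]:
    """Return the language code only if the .idx declares exactly one track."""
    codes = TRACK.findall(idx_text)  # list of language codes, one per track
    return codes[0] if len(codes) == 1 else None
```

With this shape, a multi-track .idx yields no language tag at all, which matches the goal of tagging only single-language files.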

Post by mueller56 »

Sorry for digging up this old thread, but after a long time of perfect FileBotness, I have to slightly change my automation process and am now trying to switch to the AMC script instead of all my custom stuff. I am, however, once again stuck on the topic of subtitles :/ I have always used the movie format solution that came out of this thread:

Code: Select all

{n} ({y})/{n} ({y})
{if (ext == 'idx') '.'+file.text.match('(?<=id: )[a-z]{2}').toLocale().ISO3Language}
{if (ext == 'sub') '.'+file.resolveSibling('${fn}.idx').text.match('(?<=id: )[a-z]{2}').toLocale().ISO3Language}
{'.'+fn.match(/forced/)}
This has always worked like a charm via the command line for examples like this one:

Code: Select all

movie.mkv
movie.eng.idx
movie.eng.sub
movie.forced.idx
movie.forced.sub
movie.idx
movie.sub
When using the AMC script with --def movieFormat, the .idx file is renamed properly with the language tag extracted from the file, but the .sub file is missing the language tag. This leads me to believe that the resolveSibling function is the culprit, since that is really the only difference between the .idx and .sub renaming.

Any idea what is causing this problem and how to fix it? :?

EDIT: Of course, just now after posting this, I find the solution :lol: Parameter expansion doesn't work with single quotes, so I guess ${fn} couldn't be expanded. With double quotes around it, it works 8-)

Post by rednoah »

Best to use @files to load complex arguments from text files so you don't have to worry about command-line escaping. ;)