hearing impaired subtitle files (.HI) problems

All your suggestions, requests and ideas for future development
beamer145
Posts: 17
Joined: 27 Jan 2013, 01:30

Post by beamer145 »

FileBot renaming (V3.3) does not seem to like subtitle files that have a .HI tag (hearing impaired) in the name: both the group and the lang bindings often give wrong results.

I can understand that it gets confused with respect to the lang binding. It should be possible to deal with a lot of cases, though, by checking whether another language tag is present right in front of the .hi; if so, that is likely the real language (this still gives problems for "Johnny English" etc., but maybe that is solvable as well, because it is known that English is part of the title).
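The heuristic suggested above could look roughly like the Python sketch below. It is only an illustration of the idea, not how FileBot resolves {lang}; the `LANGUAGE_TAGS` table and the `guess_language` helper are made up for this example.

```python
import re

# Hypothetical, deliberately tiny language table (illustration only).
LANGUAGE_TAGS = {
    "english": "eng", "eng": "eng", "en": "eng",
    "french": "fra", "fr": "fra",
    "hindi": "hin",
}


def guess_language(name):
    """Guess (language, hearing_impaired) from a subtitle name without extension."""
    tokens = [t.lower() for t in re.split(r"[.\s_-]+", name) if t]
    for i, token in enumerate(tokens):
        # A language tag right in front of "hi" is taken as the real language,
        # and "hi" itself is then read as the hearing-impaired marker.
        if token == "hi" and i > 0 and tokens[i - 1] in LANGUAGE_TAGS:
            return LANGUAGE_TAGS[tokens[i - 1]], True
    last = tokens[-1] if tokens else ""
    if last == "hi":
        # Trailing "hi" without a language in front: language stays unknown.
        return None, True
    return LANGUAGE_TAGS.get(last), False
```

A title such as "Johnny English" followed by .HI would still be misread by this sketch, which is exactly the remaining problem mentioned above.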

I don't get why the group gets mixed up though.


Eg with the format string {n}.{s00e00}.{t}.{vc}.{ac}{".$group"}{".$lang"} I get:
xxxxxxx.- Pilot.xvid.asap.English.HI => xxxxx.XviD.MP3.ASAP.hin (lang not ok)
xxxxxxx.FQM.English.HI.C.orig.Addic7ed.com => xxxxx.XviD.MP3.ASAP.eng (lang ok but group not ok)
xxxxxxx.HDTV.XviD-FQM.HI => xxxxx.XviD.MP3.ASAP.hin (lang not ok but he cannot know, group nok)
(the resulting wrong group always seems to be ASAP but that can be a coincidence)


Note that i started from a really messy folder, where (originally manually downloaded) subtitles and video files did not match.
After the rename step, sometimes they still did not match for the .HI files with problems (eg a different group for the subtitle file and for the video file)

Post by beamer145 »

Hmm, sometimes the vc tag of the subtitle file also differs from the video file. How is this tag determined for the subtitle file? (I assumed it just took the corresponding tags from the video file, but apparently that is not always the case.)
rednoah
The Source
Posts: 22974
Joined: 16 Nov 2011, 08:59
Location: Taipei

Post by rednoah »

I suppose .hi gets interpreted as Hindi. I can add a special case for that.

For the other things I need all your filenames to test. Except for {lang}, all the other bindings will be based on the corresponding video file. That assumes that the video and subtitle files are at least named well enough to make that connection.

I guess the naming engine can't currently guess things like Alias.101.english.hi.srt matching Alias.S01E01.Title.avi
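To make the Alias example concrete, here is a rough Python sketch of what pairing such names would involve: both the S01E01 form and the compact 101 form have to be reduced to the same (series, season, episode) key. The `episode_key` and `same_episode` helpers are hypothetical and are not FileBot's matching code.

```python
import re

# Two numbering styles seen in the thread: "S01E01" and the compact "101".
EPISODE_PATTERNS = [
    re.compile(r"\bS(\d{1,2})E(\d{1,2})\b", re.IGNORECASE),
    # Exactly three digits, not glued to letters or digits (skips 720p, x264).
    re.compile(r"(?<![0-9A-Za-z])(\d)(\d{2})(?![0-9A-Za-z])"),
]


def episode_key(name):
    """Return (series, season, episode), or None if no episode marker is found."""
    for pattern in EPISODE_PATTERNS:
        m = pattern.search(name)
        if m:
            # Everything in front of the episode marker is treated as the title.
            title = re.sub(r"[^a-z0-9]+", " ", name[:m.start()].lower()).strip()
            return title, int(m.group(1)), int(m.group(2))
    return None


def same_episode(subtitle_name, video_name):
    key = episode_key(subtitle_name)
    return key is not None and key == episode_key(video_name)
```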

EDIT: Fixed with r1424. Hindi is now exempt from 2-letter language codes. The second one was extremely tricky but it should work as you're expecting now.
:idea: Please read the FAQ and How to Request Help.

Post by beamer145 »

I just tried r1424 and it seems to solve both problems, so I assume there is no longer a need for a list of example filenames which show the problem.
Thanks !!
beamer145
Posts: 17
Joined: 27 Jan 2013, 01:30

Re: hearing impaired subtitle files (.HI) problems

Post by beamer145 »

I just ran into a related minor issue with r1424.
When I have 2 versions of the subtitles file, eg a normal and a .HI version, they will both be transformed to the same filename.

So I get a conflict message when I select rename.
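The conflict described above can be reproduced in a few lines of Python. This is only a sketch under assumptions: `target_name`, `plan_renames` and `find_conflicts` are invented helpers, and matching the tag case-insensitively is my own choice.

```python
import re
from collections import defaultdict

# Assumption: ".HI" counts as a tag when followed by another dot or the end.
HI_TAG = re.compile(r"[.]HI(?=[.]|$)", re.IGNORECASE)


def target_name(source_stem, base_name, keep_hi):
    """Build the new name, optionally carrying the .HI tag over."""
    suffix = ".HI" if keep_hi and HI_TAG.search(source_stem) else ""
    return base_name + suffix


def plan_renames(stems, base_name, keep_hi):
    return {stem: target_name(stem, base_name, keep_hi) for stem in stems}


def find_conflicts(plan):
    """Return every target name that more than one source maps to."""
    by_target = defaultdict(list)
    for source, target in plan.items():
        by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}
```

With `keep_hi=False` the normal and the .HI subtitle collapse onto one name; with `keep_hi=True` they stay apart.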

I can fix this by deleting one of the subtitle files before running filebot, but I was hoping for a different solution.
Probably I can do this automatically by checking for ".HI" myself with the groovy syntax (still need to look into that) and preserve the .HI indication if it is present.
But anyway, for now I tried to manually remove one version from the filelist from within filebot, and I wanted to do that by selecting the name in the output column with a mouseclick and then pressing delete on the keyboard.
Bad idea :) :
- For some reason this resulted in the entry below the selected one being removed (in this case the movie name)
- The group name of the file that should have been removed becomes wrong during the reparsing of all the files at/after the deletion point (BAJSKORV instead of LOL). This is btw the same incorrect group I got when using the original v3.3, so it is probably related somehow.
- The filenames after the removed one (the movie files) are marked in red, which probably indicates some sort of trouble.
- There is something strange going on with the matching of input to output elements, eg there is no empty slot in the output column for the removed element, but there is one for the last element. The output column gets shifted somehow.

Maybe it is also interesting to add tooltips to the buttons below the input and output lists, because when looking for a way to solve the conflict it is not really obvious what these are for.
Eg if I press the up button below the output list, he seems to reparse the filename, but all the movie names get marked in red (as if the reparsing fails the second time ??)

Edit: forgot to add the screenshots I made illustrating the delete problem (before and after pressing delete)
http://imgur.com/BmBUWCS
http://imgur.com/5q8Pc1q

Post by rednoah »

1. Yep, if you have hi and non-hi you'll have to take care of that in the naming scheme via {fn.match(/[.]hi$/)}

2. I don't understand what you mean. I tried a few things and it all works. You can check via {media.filename} in the format what filebot thinks is the corresponding video file.

3. The Up-Down buttons? They just move entries up and down, and since drag-n-reorder is much more efficient for doing that, they're kinda just there for decoration.

4. File<->Movie matches get highlighted in red? Are you sure it's not File<->Episode matches where SxE doesn't match?

5. Did you read the docs on {group}? It doesn't magically know all groups. It only knows the ones I added. Match preference goes like this: original filename > current filename > current foldername.
http://filebot.sourceforge.net/forums/v ... hp?f=5&t=4
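The preference order from point 5 can be sketched in Python as follows. The group list here is a made-up stand-in for the real one, and `find_group` / `detect_group` are hypothetical helpers, so treat this as an illustration of the lookup order only.

```python
import re

# Hypothetical stand-in for the list of known release groups.
KNOWN_GROUPS = ["ASAP", "FQM", "LOL"]


def find_group(text):
    """Return the first known group that appears as a separate token in text."""
    for group in KNOWN_GROUPS:
        pattern = r"(?<![A-Za-z0-9])" + re.escape(group) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text, re.IGNORECASE):
            return group
    return None


def detect_group(original_name, current_name, folder_name):
    """Check original filename, then current filename, then folder name."""
    for candidate in (original_name, current_name, folder_name):
        if candidate:
            group = find_group(candidate)
            if group:
                return group
    return None
```

A group that is not in the list (BAJSKORV in this sketch) is simply never found, whichever of the three names it appears in.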

Post by rednoah »

Yes, it's two stacks. If you delete something on one side then the other side is one short and doesn't align anymore => One click on "Match" will align everything again.
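A small Python sketch of the "two stacks" behaviour, assuming the two columns are independent lists that are only paired by row index when displayed. The `pair_rows` helper and the file names are invented for the example.

```python
from itertools import zip_longest


def pair_rows(files, names):
    """Pair the two columns row by row; the shorter side is padded with None."""
    return list(zip_longest(files, names))


files = ["ep1.avi", "ep1.srt", "ep1.HI.srt"]
names = ["Ep 1.avi", "Ep 1.srt", "Ep 1.HI.srt"]

# Deleting from the output column only: everything below moves up by one row.
del names[1]
rows = pair_rows(files, names)
```

After the delete, `ep1.srt` sits next to `Ep 1.HI.srt` and the empty slot shows up at the bottom rather than at the deleted row, which fits the shifted output column reported earlier in the thread.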

Post by beamer145 »

>Yes, it's two stacks.
Ah crap, with that in mind everything I am seeing makes perfect sense.

I thought pressing delete was supposed to affect both lists, as the fact that both lists have matching solid blue selections kind of suggests that (if only one is affected, it would be more intuitive if there was a visible difference in the way entries are selected. Eg the 'active' list could have a blue selection background and the inactive one a gray background for selected entries, like eg 2xExplorer).

Anyway, sorry to have bothered you.
And thanks for the app !

Post by rednoah »

Fixed the other issue as well. Because of all that "C.hi.orig" it couldn't find the corresponding video file and then just defaulted to the first one.

Post by beamer145 »

Ah, that explains the strange occurrence of always the same group for some files.

BTW maybe interesting for other ppl ending up in this thread: i added an extra match for the xxx.hi.xxx case (yours was only for xxx.hi)
{fn.match(/[.]HI(?=\.)/)}{fn.match(/[.]HI$/)}

( or my full string {n}.{s00e00}.{t}.{vc}.{ac}{".$group"}{".$lang"}{fn.match(/[.]HI(?=\.)/)}{fn.match(/[.]HI$/)} )

Note: maybe there are more optimal ways eg to merge them into one match operation, or to avoid the lookahead needed to not have a double '.' at the end, but I don't really know what is possible in the match operator's pattern argument, eg I am already surprised/happy he does not get confused about the double use of the round brackets '(' and ')' (inside the regexp and around the argument)
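On the question of merging the two matches: a single lookahead with an alternation, `[.]HI(?=[.]|$)`, covers both the xxx.hi.xxx and the xxx.hi case. The Python sketch below only checks the regex logic with Python's `re` module; whether FileBot's match behaves identically in every detail is an assumption, and `find`, `two_step` and `merged` are helper names made up here.

```python
import re


def find(pattern, text):
    """Return the matched text, or an empty string when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""


def two_step(stem):
    # The two separate matches from the post, concatenated.
    return find(r"[.]HI(?=\.)", stem) + find(r"[.]HI$", stem)


def merged(stem):
    # One match: ".HI" followed by either another dot or the end of the name.
    return find(r"[.]HI(?=[.]|$)", stem)
```

Because the lookahead does not consume the following dot, the result is always just `.HI`, so no double dot appears at the end of the new name.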

Post by rednoah »

Maybe like this:

{fn.match(/\.HI\b/)}

Post by beamer145 »

.HI\b will get confused for http://www.imdb.com/title/tt0132213/?ref_=fn_al_tt_5 :) (I think, now I am not sure if a - counts :) )
And there is no way anyone can deal with http://www.imdb.com/title/tt1588694/?ref_=fn_al_tt_1 :)
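Whether a "-" counts can be checked quickly: `\b` is a boundary between a word character (letter, digit, underscore) and anything else, so a hyphen after "Hi" does count. The Python sketch below uses invented file names and assumes case-insensitive matching, which the lowercase `[.]hi$` suggested earlier in the thread hints at but which is not confirmed here.

```python
import re

HI_WORD = re.compile(r"\.HI\b", re.IGNORECASE)


def matches_hi(stem):
    """True if ".HI" is followed by a non-word character or the end of the name."""
    return HI_WORD.search(stem) is not None
```

So a title containing "Hi-" triggers a false positive, a longer word such as "High" does not, and a title with "Hi" as a whole word cannot be told apart from the tag by the name alone.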

I just ran into a new unrelated problem
Eg Terra.Nova.S01E01.720p.HDTV.x264 is matched on the show NOVA instead of terra nova.
This happens both with tvrage as with thetvdb.
When searching for subtitles they were retrieved correctly though.
If I manually search for Terra Nova then I get the correct show, so apparently he is looking for Nova, or is giving that search result a higher probability than the match for Terra Nova.
I am still on r1424.

Post by rednoah »

1. Match for ".HI."

{file.name.match(/[.]HI[.]/)}

Actually it'd be more fun to auto-detect that from the subtitle content:

{file.length() < 10e6 && file.text.match(/(?m)^\(.+\)$/) ? 'HI' : ''}
e.g. for files less than 10 MB, match subtitle text for (something) lines. I'm assuming that HI subtitles contain lines like (whistling) (singing a song) etc
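The content-based idea can be tried outside FileBot with a few lines of Python. This is a sketch of the heuristic only: it checks line by line instead of using a multiline regex (so Windows line endings do not get in the way), and `has_sound_cues` / `hi_suffix` are names made up for the example.

```python
import re

# A line that consists only of "(something)", e.g. "(whistling)".
CUE_LINE = re.compile(r"\(.+\)")


def has_sound_cues(text):
    """True if any subtitle line is nothing but a parenthesised description."""
    return any(CUE_LINE.fullmatch(line.strip()) for line in text.splitlines())


def hi_suffix(text, size_in_bytes, max_bytes=10_000_000):
    """Return 'HI' for small files whose text contains sound-cue lines."""
    if size_in_bytes >= max_bytes:
        return ""  # too large to be a plain-text subtitle, skip it
    return "HI" if has_sound_cues(text) else ""
```

Dialogue that merely contains brackets inside a sentence is not counted, since the whole line has to be bracketed.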

2. Please do some more testing and tell me what might be throwing things off, cause it works for me:

Parameter: ut_kind = single
Parameter: ut_dir = E:\testdata\utorrent-postprocess-test
Parameter: ut_file = Terra.Nova.S01E01.720p.HDTV.x264.avi
Parameter: ut_title = Deluge Test Folder
Input: E:\testdata\utorrent-postprocess-test\Terra.Nova.S01E01.720p.HDTV.x264.avi
Group: [tvs:Terra Nova] => [Terra.Nova.S01E01.720p.HDTV.x264.avi]
Rename episodes using [TheTVDB]
Auto-detected query: [Terra Nova, Nova]
Fetching episode data for [NOVA]
Fetching episode data for [Terra Nova]
Fetching episode data for [NOVA scienceNOW]
[TEST] Rename [E:\testdata\utorrent-postprocess-test\Terra.Nova.S01E01.720p.HDTV.x264.avi] to [E:\testdata\Deluge\output\TV Shows\Terra Nova\Season 1\Terra Nova - S01E01 - Genesis.avi]
Processed 1 files
Done ヾ(@⌒ー⌒@)ノ

Post by beamer145 »

[1] {file.name.match(/[.]HI[.]/)} : doesn't that fail for subtitles ending in .HI? That is why I had to split it up for the 2 cases. Also it returns .HI. giving you an extra . at the end (I think)?
The second expression is a cool idea :)
If I open a few .HI subtitles it seems [] are the delimiters... but I guess there is not really a standard.
So if we go into full paranoia mode we get something like this? :)
{file.length() < 10e6 && file.text.match(/(?m)^[\(\[].+[\)\]]$/) && (file.name.match(/[.]HI[.]/) || file.name.match(/[.]HI$/)) ? '.HI' : ''}

[2] How do I enable the debug output you added ?
I tried a few things from the CLI page but that did not seem to work....
filebot-3.3-r1424-revref.jar --log all => just starts the gui with nothing on the command line stdout (i immediately get a prompt again), and I cannot immediately find a log file
filebot-3.3-r1424-revref.jar --log all -rename D:\Series\TerraNova\test\ => no reaction as far as I can tell , I just go to the prompt again, file has not been renamed
filebot-3.3-r1424-revref.jar -help => also no reaction, just the prompt again

I also tried a few filename variations in the gui, both 'Terra.Nova' and 'Terra Nova' lead to the incorrect result, but 'TerraNova' returns the correct series.

Gui screenshot of the Terra.Nova result to prove that i am not entirely dreaming : http://imgur.com/UUBDRBU

Post by rednoah »

1. {fn} is name without extension, {file.name} however is with extension so there'll always be a .srt or something. The dot we need. ;)

2. For cmdline I recommend replacing the jar in your install folder and then using my filebot startup script. If you used the installer, the 'filebot' command will do that.

3. I'll have a look at terra nova. Please send me the full path of the file. And other files in the same folder.
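Point 1 above, shown with plain Python path handling: the full file name always ends in an extension, so a tag at the end of the base name is still followed by a dot, and the match on the full name carries that trailing dot along. The helper names are made up; `PurePath.stem` stands in for the name without extension.

```python
import re
from pathlib import PurePath


def hi_from_file_name(file_name):
    """Match on the name WITH extension: the tag is always followed by a dot."""
    m = re.search(r"[.]HI[.]", file_name)
    return m.group(0) if m else ""


def hi_from_stem(file_name):
    """Match on the name WITHOUT extension: only finds a tag at the very end."""
    m = re.search(r"[.]HI$", PurePath(file_name).stem)
    return m.group(0) if m else ""
```

The first variant finds the tag in both positions but returns `.HI.`, the second returns `.HI` but misses a tag in the middle of the name.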

EDIT:
E:\testdata\p3252\Terra.Nova.S01E01.720p.HDTV.x264.avi => WORKS

Can't reproduce the issue. What's the full path?

Post by beamer145 »

[2] Mmmm, I did not use the installer, I just downloaded the jar file
filebot-3.3-r1424-revref.jar (at the moment, originally I was using the one linked from the home page)
and the libs
http://sourceforge.net/p/filebot/code/H ... win32-x64/
and put them together in a folder.

I don't have access to my home pc at the moment, but will give it a try later today to see if installing makes any difference, and check if I can get logging enabled.
Other possibility : is there maybe a different behavior in the libs between x64 and x86 ?

[3] For testing, I made a folder with just the single file
D:\Series\TerraNova\test\Terra.Nova.S01E01.720p.HDTV.x264.mkv

After starting the jar file, I think i drag/dropped either the file in the gui or else the 'test' folder, I am not sure anymore.

Post by rednoah »

The "TerraNova" parent folder messed things up due to the missing space. It doesn't match "Terra Nova", while "NOVA" is a substring.

Fixed with r1435.
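The substring effect is easy to reproduce. The Python sketch below is not the r1435 change, just an illustration of why a camel-cased folder name favours "NOVA"; the candidate list is taken from the log earlier in the thread, and splitting camel case before comparing is one possible workaround that I am assuming here.

```python
import re


def normalize(name):
    """Lowercase and collapse everything that is not a letter or digit to a space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def split_camel_case(name):
    """Insert a space at each lowercase-to-uppercase transition."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)


def best_match(folder, candidates):
    """Pick the longest candidate whose normalized name occurs in the folder name."""
    haystack = normalize(folder)
    matching = [c for c in candidates if normalize(c) in haystack]
    return max(matching, key=len, default=None)


CANDIDATES = ["NOVA", "Terra Nova", "NOVA scienceNOW"]
```

`best_match("TerraNova", CANDIDATES)` picks "NOVA", while `best_match(split_camel_case("TerraNova"), CANDIDATES)` picks "Terra Nova".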

PS: In any case I recommend using the platform package (e.g. the .msi) cause that includes optimized settings.

Post by beamer145 »

>The folder "TerraNova" parent folder messed things up due to missing space. Doesn't match "Terra Nova" while "NOVA" is a substring.
Ah, I was not aware the folder name was taken into account when doing the match. I am not a big fan of spaces in my names, especially not folder names, so I camelcase everything.
That gave me loads of fun modifying the XBMC scrapers as well (especially since they do a ToLower() on all the input file/folder names before you arrive at the part where you can modify something in the add-on system).
Is there anything I can modify for filebot in the way he generates search queries from file/folder names, or is that in code (which is still modifiable of course)? From the looks of the r1435 changelist, it seems to be in code.

Anyway, all files are good now except for the double episode:
Terra.Nova.S01E12E13.HDTV.XviD-LOL.avi

That is for some reason still mapped to a nova episode :
NOVA.S01E12-E13.Fusion: The Energy Of Promise & Te mystery Of The Anasazi

Full path:
D:\Series\TerraNova\Season01\Terra.Nova.S01E12E13.HDTV.XviD-LOL.avi

Post by rednoah »

C:\>filebot -list --q "terra nova" --db thetvdb
Terra Nova - 1x01 - Genesis
Terra Nova - 1x02 - Instinct
Terra Nova - 1x03 - What Remains
Terra Nova - 1x04 - The Runaway
Terra Nova - 1x05 - Bylaw
Terra Nova - 1x06 - Nightfall
Terra Nova - 1x07 - Proof
Terra Nova - 1x08 - Vs.
Terra Nova - 1x09 - Now You See Me
Terra Nova - 1x10 - Within
Terra Nova - 1x11 - Occupation / Resistance
No 1x12 & 1x13. Can't work if your episode is invalid.

PS: Found some other Multi-Episode related issues that are fixed now.

Post by beamer145 »

IMDB does list 13 episodes (http://www.imdb.com/title/tt1641349/episodes).

Ah, the reason is mentioned on the wikipedia page (http://en.wikipedia.org/wiki/Terra_Nova_%28TV_series%29) :
"Note: As first and last episodes are two hours, some senders cut them up to air as 13 episodes."

Now I am wondering what he will have done for the subtitles :)

I guess this one will require manual intervention.

Thanks for the support !

Post by rednoah »

TVRage lists it as 13 episodes. So if you match with TVRage data it should work.

Post by beamer145 »

Sorry to bother you again, but I am bumping into something new.

I have files named:
Haven.S01E01.DVDRip.XviD-SAiNTS.avi
Haven.S01E01.DVDRip.XviD-SAiNTS.idx
Haven.S01E01.DVDRip.XviD-SAiNTS.sub
...

I don't know if .idx/.sub is supported by Filebot; since he tries to pick a language for the idx files, I am assuming it is. If not, ignore the rest.

For some reason, he is really slow at processing the sub files when he does the final output name generation (but ok, that is not yet really an issue) + he thinks the .idx files are Slovenian (or whatever the .slo language code stands for) + the sub files are not tagged with the language (maybe because he does not associate them with the .idx and therefore does not know the language connection?).
The idx files contain:
....
# Language index in use
langidx: 0

# English
id: en, index: 0
# Decomment next line to activate alternative name in DirectVobSub / Windows Media Player 6.x
# alt: English
....

So I am not really sure why he picks .slo (for regular srt files he always picked .eng when the language is not part of the filename)

Folder:
D:\Series\Haven\S01\*

Screenshot :
http://imgur.com/l46qyNK
(also shows the slow .sub renaming, it takes 10-20 seconds for each .sub file so plenty of time to take a screenshot)

Post by rednoah »

{lang} is based on the filename if possible but falls back to statistical language detection based on the text. I guess .idx mostly contains frame numbers, while .sub is binary.

Please send me an idx/sub pair and I'll have a look. I'll have to exclude VobSub subtitles from any normal subtitle processing.

EDIT: VobSub idx/sub pairs will be excluded from text-based processing with r1439
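For anyone who wants the language of a VobSub pair anyway: the .idx excerpt quoted earlier in the thread already names it (`langidx: 0` and `id: en, index: 0`), so it can be read directly instead of guessed statistically. The Python sketch below relies only on those two lines as they appear in the excerpt; it is not something FileBot is stated to do, and both helper names are made up.

```python
import re

ID_LINE = re.compile(r"id:\s*([A-Za-z-]+),\s*index:\s*(\d+)")


def vobsub_languages(idx_text):
    """Map stream index -> language id for every 'id: xx, index: n' line."""
    languages = {}
    for line in idx_text.splitlines():
        m = ID_LINE.match(line.strip())  # comment lines start with '#' and are skipped
        if m:
            languages[int(m.group(2))] = m.group(1)
    return languages


def active_language(idx_text):
    """Return the language id of the stream selected by 'langidx', if any."""
    m = re.search(r"^langidx:\s*(\d+)", idx_text, re.MULTILINE)
    if not m:
        return None
    return vobsub_languages(idx_text).get(int(m.group(1)))
```

The same language would then apply to the matching .sub file, since the two files form one subtitle.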