appending subtitles with .sdh postfix based on text inside the subtitle file

Philmag
Posts: 14
Joined: 09 Mar 2023, 17:56


Post by Philmag »

Hi

I have subtitles that are badly named (e.g. 2_English.srt, 3_English.srt). Some are SDH, some are not. I want to rename them so that the '.sdh' postfix is appended after the '.eng', so that Plex doesn't throw an "Unknown" flag on them. I use the GUI, not AMC.

After scouring the forums and cannibalising code here and there, I've done this:

Code:

{drive}/series/{~emby.derive{' '+allOf{vf}{vcf}{ac}}} {if (ext == 'srt' && file.text.match(/(?m)^\[.+\]$/).size() > 2) '.sdh' else '.eng'}


This is supposed to look for lines of text in brackets, and if it gets more than 2 hits, it adds the .sdh postfix. I ripped the regex from one of rednoah's posts in another similar thread. I tried to simplify it so it just looks for a '[' (i.e. 'match(/\[/)'), but that didn't work. Also, I'm not sure why 'size' is needed; I tried 'count', but that didn't work either.

Anyway, the code above works, but since 'match' does not return a true/false boolean, the 'else' branch is never triggered. I can technically live with that, since the .eng postfix will be added anyway by the emby/plex/whatever expression, but I want to do it right. I think I need to use the Groovy find or match operator ('=~' and '==~' respectively), but I can't figure out how.
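A quick way to see the difference between the two operators is a plain Java sketch, since Groovy regexes run on java.util.regex underneath. Roughly, Groovy's =~ gives you a Matcher whose truth value is find() (a match anywhere), while ==~ calls matches() (the whole string must match). The class name RegexOperators and the sample text are made up for illustration, and FileBot's own match()/matchAll() helpers are not reproduced here. The takeaway: count the matches, then compare the count, and you get a real true/false.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOperators {
    // Roughly what `text =~ /regex/` means in a Groovy condition: is there a match anywhere?
    static boolean findAnywhere(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Roughly what `text ==~ /regex/` means: does the WHOLE string match?
    static boolean matchesWhole(String text, String regex) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Count every match; comparing this number gives a real boolean.
    static int countMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String srt = "1\n00:00:01,000 --> 00:00:02,000\n[Door opens]\n\n"
                + "2\n00:00:03,000 --> 00:00:04,000\nHello\n";
        // (?m) lets ^ and $ match at each line instead of only at the start/end of the text
        String bracketLine = "(?m)^\\[.+\\]$";
        System.out.println(findAnywhere(srt, bracketLine));     // true
        System.out.println(matchesWhole(srt, bracketLine));     // false
        System.out.println(countMatches(srt, bracketLine) > 2); // false: only one bracket line
    }
}
```

With (?m), ^ and $ anchor at every line, which is what a per-line subtitle check needs.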

I tried the below, and while it does put .sdh on some subtitles, it's not correct or reliable.

Code:

{drive}/series/{~emby.derive{' '+allOf{vf}{vcf}{ac}}} {if (ext == 'srt' && file.text=~(/(?m)^\[.+\]$/).size() > 2) '.sdh' else '.eng'}

I know zero Groovy and regex, and I've been bashing my head against this task all day; my brain is completely fried at this point. I'm probably missing something extremely basic and silly, like not escaping characters correctly, but I'm really stuck.

Also, I found the String.matchBrackets() function on the help page, but I couldn't get it to work either.

Any help would be really appreciated :)
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Post by rednoah »

e.g. Test Case: match all text snippets that match the [ letters or spaces ] regex pattern:

Code:

{
	'''[NOISE] [STEPS] Hello World'''.matchAll(/\[[\w\s]+\]/)
}
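For anyone who wants to try the pattern outside FileBot, here is a rough Java equivalent of that test case. The helper name matchAll is borrowed for readability, but it is a hypothetical stand-in, not FileBot's String.matchAll(), which may use different flags and error handling. Note that \w also matches digits and underscores, not just letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchAllDemo {
    // Hypothetical stand-in for a matchAll() helper: collect every matching substring, in order.
    static List<String> matchAll(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // [ followed by one or more word characters (letters, digits, _) or whitespace, then ]
        String cue = "\\[[\\w\\s]+\\]";
        System.out.println(matchAll("[NOISE] [STEPS] Hello World", cue)); // [[NOISE], [STEPS]]
    }
}
```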

e.g. Custom .SDH detection (10 or more [...] occurrences) based on subtitle text content:

Code:

{
	f.subtitle && f.text.matchAll(/\[[\w\s]+\]/).size() >= 10 ? '.SDH' : null
}
** since you did not share sample subtitle files we cannot test if this code will actually work for your subtitle files
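The same decision as a small Java sketch, assuming the subtitle text has already been read into a String. SdhSuffix and its parameters are hypothetical names; the threshold of 10 comes from the example above and may need tuning for real files. A null result corresponds to the Groovy example's null, i.e. no postfix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SdhSuffix {
    // A bracketed cue such as [Door opens]; \w also covers digits and '_'.
    private static final Pattern CUE = Pattern.compile("\\[[\\w\\s]+\\]");
    // 10 or more cues => treat as SDH (threshold from the example; adjust for your files).
    static final int THRESHOLD = 10;

    // Returns ".SDH" for subtitle text with enough cues, otherwise null (no postfix).
    static String sdhSuffix(boolean isSubtitle, String text) {
        if (!isSubtitle) {
            return null; // never inspect non-subtitle files
        }
        Matcher m = CUE.matcher(text);
        int cues = 0;
        while (m.find()) {
            cues++;
        }
        return cues >= THRESHOLD ? ".SDH" : null;
    }
}
```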


:!: String.matchBrackets() has been added recently, so whether you have it depends on your FileBot revision. Notably, it'll match () [] and {} so it's not quite what you're looking for.
:idea: Please read the FAQ and How to Request Help.

Post by Philmag »

Thanks Noah!

So the new code you wrote works, but it has the same problem as the initial code I was using, even with the '?' operator: it does not append the postfix to the non-SDH subs. I'll post some screenshots so it's a bit easier to explain. I'm also sharing screenshots of the code, in case I'm doing something stupid when entering it.

I changed 'null' to 'test' so I can check it (since the 'emby' expression already autodetects the language and adds the .eng postfix).
[screenshot]

Once I run it, .sdh is correctly appended to all the sdh subs (the '3_English' files in this example), but the non-sdh ('2_English' files) don't get the .test postfix.
[screenshot]

Then I changed the size() threshold from 10 to 1000 hits as a test.
[screenshot]


This one puts the 'test' postfix only on the SDH subs (the '3_English' files). It seems to trigger on files that have a non-zero number of matches, but fewer than 1000. Files with zero matches (which we can assume the '2_English' files are) do not get the .test postfix. I would expect all the files to get .test here, since none of them have more than 1000 hits. Is it some kind of boolean error? Maybe the logic is conditional on 'matchAll' returning something, and if it doesn't, the whole thing unravels and the 'else' is never triggered. That's just a gut feeling; I don't know how any of this works under the hood.
[screenshot]

I couldn't find how to attach files to a post. If you think it's an issue with the text inside the files, I can upload the .srt files to Google Drive or something.

Also, ignore the '[]' in the filename. It's just because they are not in the same folder as the mkv files, so it can't fetch the derivative info.
Last edited by Philmag on 10 Mar 2023, 07:58, edited 1 time in total.

Post by rednoah »

You can upload text files to pastebin:
https://pastebin.com/



:idea: Are you prototyping your format in the Preset Editor? You'll want to use the Format Editor to prototype your format, so that you can see status and error messages. Select an item that doesn't work right from the list, then do Double-Click -> Edit Format to prototype your format on that specific match.

[screenshot]



:idea: String.matchAll() likely errors out if there are 0 matches, which is useful in many use cases, but not in this particular one. See the docs below for a workaround:
viewtopic.php?t=1895


e.g. always return 0 if something goes wrong, either because there is no match, or the file is not readable, or something else:

Code:

{ any{ f.text.matchAll(/\[[\w\s]+\]/) }{ 0 } }
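To see why the fallback matters, here is a Java sketch built on one assumption taken from the post above: a matchAll-style helper that throws when nothing matches. matchAllOrThrow and countOrZero are hypothetical names; countOrZero plays the role of any{ ... }{ 0 } and turns any failure (no match, missing text, ...) into 0. Calling .size() inside the guarded part means both outcomes are numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FallbackToZero {
    // Hypothetical matchAll-style helper that fails when nothing matches,
    // mirroring the behaviour described in the post above.
    static List<String> matchAllOrThrow(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text); // NullPointerException if text is null
        while (m.find()) {
            hits.add(m.group());
        }
        if (hits.isEmpty()) {
            throw new NoSuchElementException("No match: " + regex);
        }
        return hits;
    }

    // Plays the role of any{ ... }{ 0 }: any failure becomes 0, so the result is always a number.
    static int countOrZero(String text, String regex) {
        try {
            return matchAllOrThrow(text, regex).size();
        } catch (RuntimeException e) {
            return 0;
        }
    }
}
```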

Post by Philmag »

Oh, completely forgot about pastebin. I don't think we need it right now though?

Yeah, I found that page yesterday, but I was too burned out to actually understand it. I'll give it a shot now. I still need to use 'f.text' for it to look inside the srt file, right?

Post by Philmag »

Ok just saw your edit, thanks!

I think I'm using the Format Editor; it does give me syntax errors. (I found out the hard way after doing it 100 times via the Preset Editor: I knew I had an error when it would hang for a bit and then return me to the Preset Editor. Fun times :) )

Post by rednoah »

Yep, f.text will read the file and give you a String value. You'll want to make sure to only read subtitle files, because Groovy won't stop you from reading >700 MB files, which might explain the hang if you're testing with video files instead of subtitle files. :lol:
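A minimal Java sketch of that guard: check the extension first, and only read the file when it looks like a subtitle. The extension list is an assumption for this sketch; FileBot's f.subtitle uses its own list, which may differ. Files.readString() decodes UTF-8, so a Latin-1 encoded .srt would throw here and need a different charset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

class SubtitleGuard {
    // Assumed subtitle extensions for this sketch; FileBot's f.subtitle has its own list.
    private static final Set<String> SUBTITLE_EXTENSIONS = Set.of("srt", "ass", "ssa", "sub", "vtt");

    // Decide by file name only, without opening the file.
    static boolean isSubtitle(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUBTITLE_EXTENSIONS.contains(ext);
    }

    // Read the text only for subtitle files; videos are skipped without being read.
    static Optional<String> readIfSubtitle(Path file) throws IOException {
        Path name = file.getFileName();
        if (name == null || !isSubtitle(name.toString())) {
            return Optional.empty();
        }
        return Optional.of(Files.readString(file)); // UTF-8; other encodings may throw
    }
}
```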

Post by Philmag »

It worked! :)

Code:

{drive}/series/{~emby.derive{' '+allOf{vf}{vcf}{ac}}} {any{f.subtitle && f.text.matchAll(/\[[\w\s]+\]/).size() >= 10} {0} ? '.sdh' : '.test'}
I spent an ungodly amount of time trying to debug it because I stupidly forgot about the ? operator and was instead trying to nest it inside an 'if' statement. It was just printing 'true' in the filename; I was losing my mind lol.

I have two questions: 1) is academic, 2) is practical:

1) By putting {0} as the second expression for {any}, we are telling {any} to always return 0 if it doesn't find anything in the curly brackets preceding the {0}, right? Is that because an unquoted 0 will always evaluate to 'false'? We are doing this so that the '?' ternary operator can then trigger the 'else' branch, because a 'null' return is not 'false' for Groovy truth. So it's like avoiding a null exception, right?

2) I have all the subs in a different directory than the actual video files. The parent directory for the subs is the original release name for each file. It's like this: (you've seen this structure before for sure)

Code:

name.season.episode/2.English.srt
I already moved and renamed the actual video files to their final destination, I just left these behind because at the time I didn't know filebot would be able to handle them, but now I realise that it can handle them very easily.

I have some additional properties on the video file names ({vf}{vcf}{ac}).


To rename and move the subs to the same folder as the video files, I need a two-step operation. 1) Run the code above so it renames the subs and moves them to the folder with the video file. 'derive' doesn't work since the media file is not in the same folder, so I just get empty brackets. 2) Once they are in the final destination, alongside the media files, I run the same code again so that it pulls the ({vf}{vcf}{ac}) data from the media files and renames the subs exactly like the media files (plus the .eng, .sdh, etc. postfixes).

Do you think there is a way to do this in a single step? Somehow tell filebot to lookahead in the destination folder, get the ({vf}{vcf}{ac}) tags and append them to the srt files in one go? Or is it a hard limitation that the subs and media files are under the same parent directory?

I don't mind doing it in two steps, but it is around 4k files and I do like the challenge of coding things so if it's possible in theory, just point me to the right direction and I'll start tinkering with it.

Post by rednoah »

1.
Your code works accidentally, not on purpose, because your any{}{} will return either true/false or 0 which is then used as condition. Using 0 as a condition happens to work because Groovy interprets 0 as false.

e.g. correct code would look like this:

Code:

f.subtitle && any{ f.text.matchAll(/\[[\w\s]+\]/) }{ 0 } > 10 ? '.SDH' : null
tl;dr get a number; then compare that number; avoid using non-boolean values (e.g. null is interpreted as false) in conditions unless you are familiar with Groovy Truth: https://groovy-lang.org/semantics.html#the-groovy-truth
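To make the Groovy Truth point concrete, here is a simplified Java model of the rules from the linked page, covering only the value types that show up in this thread. GroovyTruth.asBoolean is a hypothetical helper, not part of Groovy or FileBot. It shows why the earlier code worked by accident: 0 is falsy, and so are null and an empty list, so several different "nothing found" results all end up in the else branch.

```java
import java.util.Collection;
import java.util.Map;

class GroovyTruth {
    // Simplified model of Groovy Truth, limited to the value types seen in this thread.
    static boolean asBoolean(Object value) {
        if (value == null) {
            return false;                                   // null is false
        }
        if (value instanceof Boolean) {
            return (Boolean) value;                         // booleans are themselves
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue() != 0;     // 0 is false, any other number is true
        }
        if (value instanceof CharSequence) {
            return ((CharSequence) value).length() > 0;     // "" is false
        }
        if (value instanceof Collection) {
            return !((Collection<?>) value).isEmpty();      // [] is false
        }
        if (value instanceof Map) {
            return !((Map<?, ?>) value).isEmpty();          // [:] is false
        }
        return true;                                        // most other objects are true
    }
}
```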


:!: f.text.matchAll(/\[[\w\s]+\]/) can throw all kinds of exceptions, because reading a file can fail for all kinds of reasons. In this case, it's about catching the "No Match" exception. So yes, it's all about catching exceptions, but NPE (NullPointerException) specifically is not one of them.



2.
FileBot can't do a look-ahead in the format code because your format code is what generates the target file path in the first place, i.e. the format doesn't know the result of the format. I'd revert all the video files to their original files paths via the History feature, and then process both video files and subtitle files at once.



3.
Note that .derive() works. It's {vc} and friends that don't work, and .derive() then correctly catches and ignores those errors.

Post by Philmag »

rednoah wrote: 11 Mar 2023, 12:51 [quote of the reply above]
Thank you very much!

Did you remove the 'size()' on purpose, or was it accidental? The code without it returns a "suppressed" error. It works fine if I add size() to your last code. I didn't take a screenshot at the time, but I can reproduce it and post one if you need it.

Also, I discovered that some of the subtitle files use () instead of [] for the captions. I was going to write a regex that looks for both () and [] (using the pipe character | between the alternatives), but then I remembered the String.matchBrackets() function; can't we use that instead? You said "(...)it'll match () [] and {} so it's not quite what you're looking for." Do you mean it matches left and right brackets without text inside them?

To be honest, I don't think we need the code to match text surrounded by brackets; we could just tell it to match a left or right bracket (of any type: (|[|{) and, if the count is over a certain threshold, append the SDH postfix. I don't think a non-SDH sub would have more than a few brackets.

something like this ( I didn't test it, just typed it now in the forum post)

Code:

any{ f.text.matchAll(/\[|/\(|/{) }{ 0 } .size() > 10 
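For the "count single brackets" idea, putting all three bracket types into one character class avoids juggling several /.../ alternatives. A Java sketch with hypothetical names; the threshold of more than 10 is just the guess from the post above, not a tested value:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketCounter {
    // One character class matching a single (, [ or {.
    // Of these three characters, only '[' needs a backslash inside a Java character class.
    private static final Pattern OPENING = Pattern.compile("[\\[({]");

    static int countOpeningBrackets(String text) {
        Matcher m = OPENING.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // "More than 10" is the untested guess from the post above.
    static boolean probablySdh(String text) {
        return countOpeningBrackets(text) > 10;
    }
}
```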

Post by rednoah »

My bad, I should have insisted on sample files for testing instead of winging it, which evidently never works for anyone regardless of experience. :lol: Each {...} is of course supposed to yield an Integer value, so that any{}{} always yields an Integer value, so the first expression of course needs to use .size() to return an Integer and not a List which can't be compared to a Number and thus fails:

Code:

any{ f.text.matchAll(/\[[\w\s]+\]/).size() }{ 0 } > 10
My excuse is that I don't have sample files for testing. :P :roll: :lol:
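The typing issue can be sketched in Java with a rough model of any{...}{...}: try each alternative in order and return the first one that neither throws nor yields null. That behaviour is an assumption made for this sketch, not FileBot's actual implementation, and AnyModel and matchAll are hypothetical names. Because both alternatives return an Integer, the result can be compared with > 10. If the first returned a List and the second an int, the common type would be Object: in Java the comparison would not even compile, while Groovy only fails at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnyModel {
    // Assumed model of any{...}{...}: the first alternative that neither throws nor returns null wins.
    @SafeVarargs
    static <T> T any(Supplier<? extends T>... alternatives) {
        for (Supplier<? extends T> alternative : alternatives) {
            try {
                T value = alternative.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException ignored) {
                // try the next alternative
            }
        }
        throw new IllegalStateException("no alternative produced a value");
    }

    // Hypothetical matchAll that fails on zero matches, as described earlier in the thread.
    static List<String> matchAll(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        if (hits.isEmpty()) {
            throw new IllegalArgumentException("No match");
        }
        return hits;
    }

    // Mirrors: any{ f.text.matchAll(/\[[\w\s]+\]/).size() }{ 0 } > 10
    static boolean looksLikeSdh(String text) {
        String cue = "\\[[\\w\\s]+\\]";
        // Both alternatives yield an Integer, so the result can be compared with a number.
        int cues = AnyModel.<Integer>any(() -> matchAll(text, cue).size(), () -> 0);
        return cues > 10;
    }
}
```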



:arrow: Here's some tested code, which of course needs to be adapted to your use case, since I don't have your sample files for testing:

Code:

{
	'''(X) [Y]'''.matchBrackets().size() >= 3 ? 'Y' : 'N'
}
{
	'''(X) [Y] {Z}'''.matchBrackets().size() >= 3 ? 'Y' : 'N'
}

Code:

NY
** This code uses constant String values instead of reading the subtitle contents so that I can run tests quickly without sample files



:!: Note that String.matchBrackets() requires FileBot 5.* which is still in beta:
viewtopic.php?t=13578



:!: Note that String.matchBrackets() also matches {...} patterns, which are often used for subtitle style and positioning instructions, so you will get false positives in that case. You will probably want to stick to your own regex pattern so you can customize it as needed.

Code:

{
	'''(X) [Y] {Z}'''.matchAll(/\[[\w\s]+\]|\([\w\s]+\)/)
}

Code:

[(X), [Y]]
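To compare the two approaches side by side, here is a Java sketch. ANY_BRACKETS is only an approximation of a "match any bracket pair" helper like matchBrackets(), written for illustration; it is not FileBot's implementation. SQUARE_OR_ROUND is the custom pattern from the post above. The {\an8} sample is an ASS-style positioning tag of the kind that can turn up in subtitle files, and it shows the false positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketStyles {
    // Rough approximation of a "match any bracket pair" helper: (...), [...] and {...}.
    static final String ANY_BRACKETS = "\\([^()]*\\)|\\[[^\\[\\]]*\\]|\\{[^{}]*\\}";
    // The custom pattern from the post above: only [...] and (...) with word characters/whitespace inside.
    static final String SQUARE_OR_ROUND = "\\[[\\w\\s]+\\]|\\([\\w\\s]+\\)";

    static List<String> matchAll(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "(X) [Y] {Z}";
        System.out.println(matchAll(sample, ANY_BRACKETS));    // [(X), [Y], {Z}]
        System.out.println(matchAll(sample, SQUARE_OR_ROUND)); // [(X), [Y]]
        // A positioning tag counts as a hit for the first pattern only.
        String tagged = "{\\an8}Hello";
        System.out.println(matchAll(tagged, ANY_BRACKETS));    // [{\an8}]
        System.out.println(matchAll(tagged, SQUARE_OR_ROUND)); // []
    }
}
```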

Post by Philmag »

Thanks again Noah!

So to keep things simple I won't use 'matchBrackets' for now, although I see that the docker container has an update pending which might be v5. I'll revisit it once updated, thanks a lot for the code, I think I kind of get how it works.

You are right, I should have posted the sub files, I will do so now. Keep in mind that there's thousands of these, with very dubious consistency. Like I just discovered now that some episodes have two nearly identical SDH subs, without non-sdh subs. So I'll either need to add some deconfliction code to append (1) (2) to the filenames or just ignore one of the files. I figured out how to do that a couple weeks ago with the duplicate index/count expressions, but I already forgot and didn't save the code. FUN!

My initial plan was to actually delete all of them and let Bazarr download everything again from opensubs.com, but then I got tinkering with filebot and here we are :) The fun part is that I don't need or want SDH subs...it's just that the work to sort and delete them is the same as to sort and keep them...

I've been experimenting with this "count individual brackets instead of text within brackets" idea, but I can't figure out how to properly escape characters, namely the curly bracket '{'.

This

Code:

any{f.text.matchAll(/\[|\(|\{/)}
doesn't work; it throws a 'missing token }' error.
If I add the } like so,

Code:

any{f.text.matchAll(/\[|\(|\{}/)}
it's happy again.

It's specific to the curly brackets, without them, as in like so

Code:

(/\[|\(/)
it works fine. So how do we properly escape curly brackets? I looked into the documentation for regex and groovy but I couldn't find anything.

I don't see what I'm doing wrong; even this online regex tester confirms that the { is escaped properly, but FileBot doesn't like it. https://regexr.com/7a9uo

Anyway, here's two subs for Archer.2009.S02E01, first normal, second is sdh. (Pastebin doesn't let me upload without an account due to offensive language, lol)

https://zerobin.net/?6d92ec6c6ace515b#e ... qBV7ulR40=

https://zerobin.net/?4c73343d5a4e1b81#v ... qm8kZ0NGs=

Post by rednoah »

1.
FileBot requires all {} inside a {...} expression to be balanced. FileBot interprets the outermost {...} as Groovy code, but it can only find that outermost {...} by counting { open and } close. It has no understanding of the context in which a { is used, so it doesn't know which ones count and which ones don't.

:arrow: So make sure that your code has a { for every } so that FileBot can find the outermost {...} easily:

Code:

{ f.text.matchAll(/\[|\]|\(|\)|\{|\}/) }
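The balancing rule can be illustrated with a naive Java scanner that, as described above, counts raw { and } characters without knowing whether they sit inside a regex or a string. This is a simplified model for illustration, not FileBot's actual parser, and findClosingBrace is a hypothetical name.

```java
class BraceScanner {
    // Naive model: count raw { and } characters, ignoring regex/string context entirely.
    // Returns the index of the brace that closes the one at `start`, or -1 if it never closes.
    static int findClosingBrace(String format, int start) {
        int depth = 0;
        for (int i = start; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The escaped \{ inside the regex still counts as an opening brace for this scanner.
        String unbalanced = "{ f.text.matchAll(/\\[|\\(|\\{/) }";
        String balanced = "{ f.text.matchAll(/\\[|\\]|\\(|\\)|\\{|\\}/) }";
        System.out.println(findClosingBrace(unbalanced, 0)); // -1
        System.out.println(findClosingBrace(balanced, 0) == balanced.length() - 1); // true
    }
}
```

With the unbalanced format the scanner never gets back to depth 0, which lines up with the 'missing token }' error; a version that also matches \} closes cleanly at the last character.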


2.
Looking at the [...] in your sample files, just checking for lines that start with [ and end with ] should be fine:

Code:

285
00:13:27,265 --> 00:13:28,600
[Ski lift shuts off]
e.g.

Code:

{ f.text.matchAll(/^[(\[{].+[)\]}]$/) }

Code:

[[Intercom beeps], [Groans], [Burps], [Speaking in Spanish], [Sia speaking in Spanish], [Twins speak in Spanish], [Door opens], [Door closes], [Knocking on door], [Anka screaming], [Javi screaming], [Screaming], [Anka gasps], [Sighs], [Chuckling], [Groans], [Sighs], [Javi coughing and groaning], [Javi groaning], [Speaking in Spanish], [Javi screaming], [Gasps], [Sighs], [Ski lift shuts off], [Sighs], [Gun clicking empty], [Anka screaming], [Gunfire], [Gasps], [Machine guns firing], [Gunfire], [Speaks in Spanish], [Machine guns firing], [Screams], [Anka screams], [Anka screams], [Screams], [Archer groans], [Archer & anka scream], [Screams], [Screaming], [Sighs], [Giggles], [Snoring], [Burps], [English - us - psdh]]
:idea: Note that this sample file uses [...] exclusively, and does not use (...) or {...} at all.
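Outside FileBot, the same line-based check needs multiline mode switched on explicitly, otherwise ^ and $ only anchor at the very start and end of the whole file. Whether FileBot's matchAll() enables that mode by itself is not confirmed here, so the Java sketch sets (?m) in the pattern. CueLines is a hypothetical name; the sample text reuses the cue shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CueLines {
    // A whole line that starts with (, [ or { and ends with ), ] or }.
    // (?m) makes ^ and $ work per line; without it they only match at the ends of the whole text.
    private static final Pattern CUE_LINE = Pattern.compile("(?m)^[(\\[{].+[)\\]}]$");

    static int countCueLines(String srt) {
        Matcher m = CUE_LINE.matcher(srt);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Windows line endings are fine: in multiline mode $ also matches before \r\n.
        String sample = "285\r\n00:13:27,265 --> 00:13:28,600\r\n[Ski lift shuts off]\r\n\r\n"
                + "286\r\n00:13:29,000 --> 00:13:30,000\r\n- Mid-line [aside] does not count\r\n";
        System.out.println(countCueLines(sample)); // 1
    }
}
```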

Post by Philmag »

Got it, thank you very much! I think this should work without any further problems.

Yeah those two subs only use [...], I have found a couple where (...) is used instead. I wanted to add {...} just to be on the safe side, I have no idea what kind of brackets the other 4000 files are using. They could be using Klingon as far as I know, anything is possible :D

Post by Philmag »

Alright, all done!

I am going to post my solution here for posterity, in case I (or anyone else reading this) need to do this again in a year and I don't remember anything.

I filtered my sub dump to .srt files only and used this code for the first step (partial rename and move to the final location):

Code:

{drive}/series/{~emby.derive{' '+allOf{vf}{vcf}{ac}}}{f.subtitle && any{f.text.matchAll(/[(\[{]|[)\]}]/).size()}{ 0 } > 10 ? '.sdh' : null}
For the second step, load the final location (where the media files and .srt files sit side by side), filter to .srt, and use this code (final rename):

Code:

{drive}/series/{~emby.derive{' '+allOf{vf}{vcf}{ac}}}
Thank you very much Noah, I could not have done this without your help.