[Linux CLI] Read strings from a file and remove them from folder names

Post by **dgnrt3** » 24 Apr 2023, 08:10

Good day y'all,

For the last several weekends I've been working on automating some workflows.
I've seen that it's possible to pass command args via the @ prefix and to use format.groovy files to make complex format expressions more readable, but I couldn't find a way to read a list of strings from a file and then remove those strings from the names of the files/folders you're working with.

I came up with the following command (I've split it up for better readability and am planning to use the @ method to pass args plus format.groovy in the future).

Sysinfo:

Code: Select all

filebot -script fn:sysinfo
[0.027s][warning][cds] A jar file is not the one used while building the shared archive file: /usr/lib/jvm/java-20-openjdk/lib/modules
[0.027s][warning][cds] /usr/lib/jvm/java-20-openjdk/lib/modules timestamp has changed.
FileBot 5.0.2 (r9722)
JNA Native: 6.1.4
MediaInfo: 23.03
Tools: fpcalc/1.5.1 7z/22.01 unrar/6.21 mkvpropedit/75.0.0
Extended Attributes: OK
Unicode Filesystem: OK
GVFS: PlatformGVFS [/run/user/1000/gvfs]
Script Bundle: 2023-04-14 (r896)
Groovy: 4.0.11
JRE: OpenJDK Runtime Environment 20.0.1
JVM: OpenJDK 64-Bit Server VM
CPU/MEM: 16 Core / 8 GB Max Memory / 46 MB Used Memory
OS: Linux (amd64)
HW: Linux des0-arch 6.2.12-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 16:11:27 +0000 x86_64 GNU/Linux
CPU/MEM: AMD Ryzen 7 5800X 8-Core Processor [MemTotal: 33 GB | MemFree: 22 GB | MemAvailable: 25 GB]
STORAGE: NONE
UID/GID: uid=1000(rob) gid=1000(rob) groups=1000(rob),90(network),98(power),150(wireshark),961(autologin),966(adbusers),987(storage),990(optical),991(lp),996(audio),998(wheel)
DATA: /home/rob/.config/filebot
Package: AUR
License: FileBot License P48455038 (Valid-Until: 2024-04-07)
Done ヾ(＠⌒ー⌒＠)ノ

The Java warnings have appeared since an update yesterday, but they don't seem to impair the function of any Java tool, so I didn't worry about them too much.

Example Foldername: Ein.Mann.namens.Otto.2022.German.DTSHD.Dubbed.DL.2160p.Hybrid.WEB.DV.HDR.HEVC-QfG

Code: Select all

unrarall /finished/serie ;

The unrarall command recurses through all subfolders and extracts the rar files into the folder they are in. I found unrarall more convenient for our setup than FileBot's extract function (but it's not a must, obviously).

Code: Select all

find /finished/serie -type f ! -name "*sample*" \( -name "*.mkv" -o -name "*.mp4" -o -name "*.avi" \) -exec

This part searches for all files with the extensions mkv, mp4, or avi whose names don't contain the string "sample" (rednoah mentioned that Linux's find is faster than FileBot's --file-filter, and I'm more used to working with find).
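The filter can be previewed on its own before anything is handed to -exec. The following is only a sketch: `list_videos` is a hypothetical helper name, and `-iname` (case-insensitive matching, available in GNU and BSD find) is an assumption added here; the original command uses the case-sensitive `-name`.

```shell
# list_videos DIR
# Prints every video file below DIR, skipping names that contain "sample".
# Hypothetical helper; -iname is swapped in for -name so that "Sample"
# and ".MKV" are caught as well.
list_videos() {
  find "$1" -type f ! -iname '*sample*' \
    \( -iname '*.mkv' -o -iname '*.mp4' -o -iname '*.avi' \) -print
}

# Usage (prints paths only, nothing is renamed):
#   list_videos /finished/serie
```

Replacing `-print` with `-exec ... {} \;` afterwards runs the command once per listed file.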

Code: Select all

filebot -rename --db TheTVDB --lang German --q "{f.dir.name.replaceAll(/(?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))/,'').replaceAll(/\./, ' ')}" {}

Here I query TheTVDB with the name of the folder the file is in and use a regex to strip the following elements from the name:

one or more letters, digits, or underscores followed by a dash at the start

any character, then s followed by one or more digits, then e followed by one or more digits

any character, then 3 or 4 digits followed by the letter p

the letter h or x followed by 3 digits

a dash followed by any number of letters or digits at the end of the name

But this obviously isn't enough to strip away all of the strings used in practice, so I was wondering if there's a way to create a file with the strings that should be removed and let FileBot read from it. I already tried querying with -non-strict, which works in most cases but not all. I also had a look at the AMC script but couldn't figure out how to properly customize the formats there.

Code: Select all

--output "/sorted/" --format 
"{vf.match(/1080[pP]|2160[pP]/)}/
{f.dir.dir.name.match(/serie/)}/
{f.dir.dir.name.match(/allgemein|anime|doku|erwachsen|kinder/)}/
{n.substring(0,1).toLowerCase()}/
{n.space('.').lower()}/
{'s'+s.pad(2)}/
{allOf{n}{y}{s00e00}{t}{vf.match(/720[pP]|1080[pP]|2160[pP]/)}{vc}{ac}{af}{any{audioLanguages[0].ISO2} 'und'}}.join('.').space('.').lower()}" \;

This format does the following:

It checks whether the video is 1080p or 2160p and decides which top-level folder to put it in, then checks whether it's a series and which sub-genre it belongs to, based on the folder it is currently in.
It then puts the file into the following folder structure: starting letter/series name/sXX/name.year.sXXeXX.title.video resolution (if it matches).video codec.audio codec.audio channels.ISO2 code of first audio track ('und' if undefined).

Thanks for reading and in advance for helping.

Cheers

Post by **rednoah** » 24 Apr 2023, 10:00

This thread should get you started for this general use case:
viewtopic.php?t=12594

Reading files inside your format code is possible. You'll definitely want to use the GUI for prototyping complex custom formats before pasting the result into a CLI call.

e.g.

Code: Select all

{ lines("/path.txt") }

Post by **dgnrt3** » 24 Apr 2023, 10:36

Thanks for your fast reply, rednoah!
Sorry for the confusion... I need this function for --q to get good results from the DBs.

Code: Select all

--q "{f.dir.name.replaceAll(/(?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))/,'').replaceAll(/\./, ' ')}" {}

Would this work there as well?

Update: I wasn't able to get it to work with lines(), so I searched the forums intensively and found csv()! This works for my use case:

Code: Select all

{f.dir.name.replace(csv("/path/to/strings2rm.csv"))}

The CSV file looks like this (each line is a string to remove, followed by a TAB):

Code: Select all

A <TAB>
B <TAB>
C <TAB>
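For comparison, the same idea can be sketched outside FileBot in plain shell. This is not how csv() works internally; it is only an illustration under a few assumptions: `strip_tokens` and `NAME_IN` are hypothetical names, the list file holds one string per line (no TAB column), and matching is literal and case-sensitive.

```shell
# strip_tokens NAME LISTFILE
# Prints NAME with every non-empty line of LISTFILE removed as a literal,
# case-sensitive substring. Hypothetical helper for illustration only.
strip_tokens() {
  NAME_IN="$1" awk '
    BEGIN { name = ENVIRON["NAME_IN"] }
    length($0) > 0 {
      # index() searches literally, so dots and brackets need no escaping
      while ((i = index(name, $0)) > 0) {
        name = substr(name, 1, i - 1) substr(name, i + length($0))
      }
    }
    END { print name }
  ' "$2"
}

# Usage with a throwaway list file
list=$(mktemp)
printf '%s\n' '.German' '.DTSHD' '.Dubbed' '.DL' > "$list"
strip_tokens 'Ein.Mann.namens.Otto.2022.German.DTSHD.Dubbed.DL.2160p' "$list"
# prints: Ein.Mann.namens.Otto.2022.2160p
rm -f "$list"
```

A name cleaned this way could then be passed to --q as a plain string instead of a format expression.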

Post by **dgnrt3** » 24 Apr 2023, 10:46

on a separate note:
at least from my understanding

Code: Select all

{h,x}\d{3}

is a valid regex to match strings starting with h or x followed by 3 numbers

but when used like this:

Code: Select all

{f.dir.name.replaceAll(/(?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))/,'')}

Error Message:

Expression yields empty value: Illegal repetition near index 32 (?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))

UPDATE: I figured out that this regex matches a dot followed by the letter h or x followed by 3 digits:

Code: Select all

\.[hx]\d{3}
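Some background on the error: in Java regular expressions (which Groovy uses), braces introduce a repetition count such as {3,4}, so a brace group with letters in it is rejected as an illegal repetition. A set of alternative characters is written with square brackets, and inside the brackets a comma is just another literal character, so `[hx]` is all that is needed for "h or x".

The whole stripping step can be tried out on the command line with sed. This is a rough approximation rather than the FileBot expression itself: `clean_query` is a hypothetical name, the leading dots are escaped, and POSIX character classes stand in for \w and \d.

```shell
# clean_query NAME
# Approximates the --q expression with sed -E: strips a leading "tag-",
# ".SxxEyy", ".1080p"-style resolutions, ".x264"-style codecs and a trailing
# "-GROUP", then turns the remaining dots into spaces.
# Hypothetical helper; upper and lower case are listed explicitly
# instead of using the (?i) flag.
clean_query() {
  printf '%s\n' "$1" | sed -E \
    -e 's/^[[:alnum:]_]+-|\.[sS][0-9]+[eE][0-9]+|\.[0-9]{3,4}[pP]|\.[hHxX][0-9]{3}|-[[:alnum:]_]*$//g' \
    -e 's/\./ /g'
}

clean_query 'Ein.Mann.namens.Otto.2022.German.DTSHD.Dubbed.DL.2160p.Hybrid.WEB.DV.HDR.HEVC-QfG'
# prints: Ein Mann namens Otto 2022 German DTSHD Dubbed DL Hybrid WEB DV HDR HEVC
```

The output shows why a pattern alone is not enough here: tags such as German, DTSHD, or Dubbed follow no common shape and have to come from a list.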

Thanks again, rednoah, for developing FileBot and supporting everyone here so well!! <3

Post by **rednoah** » 26 Apr 2023, 04:22

Please post one or more file paths that you have issues with, so that we can see and understand the issue at hand and run tests.

EDIT:

The Otto* example folder name in the OP does not seem to require --q custom query detection and should just work out of the box. Perhaps you can share more of the file paths you have issues with so we can check for a common pattern that makes things go awry?
