[Linux CLI] Read strings from a file and remove them from folder names

Running FileBot from the console, Groovy scripting, shell scripts, etc
dgnrt3
Posts: 26
Joined: 21 Apr 2023, 19:58

Post by dgnrt3 »

Good day y'all,

For the last several weekends I've been working on automating some workflows.
I've seen that it's possible to pass command-line arguments via the @ prefix and to use format.groovy files to make complex format expressions more readable, but I couldn't find a way to read a list of strings from a file and then remove those strings from the names of the files/folders you're working with.

I came up with the following command (I've split it up for readability and plan to use the @ method to pass args plus format.groovy in the future).

Sysinfo:

Code: Select all

filebot -script fn:sysinfo
[0.027s][warning][cds] A jar file is not the one used while building the shared archive file: /usr/lib/jvm/java-20-openjdk/lib/modules
[0.027s][warning][cds] /usr/lib/jvm/java-20-openjdk/lib/modules timestamp has changed.
FileBot 5.0.2 (r9722)
JNA Native: 6.1.4
MediaInfo: 23.03
Tools: fpcalc/1.5.1 7z/22.01 unrar/6.21 mkvpropedit/75.0.0
Extended Attributes: OK
Unicode Filesystem: OK
GVFS: PlatformGVFS [/run/user/1000/gvfs]
Script Bundle: 2023-04-14 (r896)
Groovy: 4.0.11
JRE: OpenJDK Runtime Environment 20.0.1
JVM: OpenJDK 64-Bit Server VM
CPU/MEM: 16 Core / 8 GB Max Memory / 46 MB Used Memory
OS: Linux (amd64)
HW: Linux des0-arch 6.2.12-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 16:11:27 +0000 x86_64 GNU/Linux
CPU/MEM: AMD Ryzen 7 5800X 8-Core Processor [MemTotal: 33 GB | MemFree: 22 GB | MemAvailable: 25 GB]
STORAGE: NONE
UID/GID: uid=1000(rob) gid=1000(rob) groups=1000(rob),90(network),98(power),150(wireshark),961(autologin),966(adbusers),987(storage),990(optical),991(lp),996(audio),998(wheel)
DATA: /home/rob/.config/filebot
Package: AUR
License: FileBot License P48455038 (Valid-Until: 2024-04-07)
Done ヾ(@⌒ー⌒@)ノ
The Java warnings have appeared since an update yesterday, but they don't seem to impair any Java tool, so I haven't worried about them.

Example Foldername: Ein.Mann.namens.Otto.2022.German.DTSHD.Dubbed.DL.2160p.Hybrid.WEB.DV.HDR.HEVC-QfG

Code: Select all

unrarall /finished/serie ;
The unrarall command recurses through all subfolders and extracts each rar archive into the folder it is in. I found unrarall more convenient for our setup than FileBot's extract function (but it's not a must, obviously).

Code: Select all

find /finished/serie -type f ! -name "*sample*" \( -name "*.mkv" -o -name "*.mp4" -o -name "*.avi" \) -exec 
This part searches for all files with the extensions mkv, mp4, or avi whose names don't contain the string "sample" (rednoah mentioned that Linux's find is faster than FileBot's --file-filter, and I'm more used to working with find).
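As a rough way to check the find filter on its own before handing the results to FileBot, the sketch below runs the same tests against a throwaway directory. The folder and file names are made up for illustration, and `basename` stands in for the real `-exec` command.

```shell
# Build a scratch tree with one episode file, one sample file and one non-video file.
demo=$(mktemp -d)
mkdir -p "$demo/Show.S01E01.1080p-GRP/Sample"
touch "$demo/Show.S01E01.1080p-GRP/show.s01e01.mkv" \
      "$demo/Show.S01E01.1080p-GRP/Sample/show-sample.mkv" \
      "$demo/Show.S01E01.1080p-GRP/show.nfo"

# Same tests as in the post; -name only looks at the file's base name
# and is case-sensitive, so only lowercase "sample" is excluded.
found=$(find "$demo" -type f ! -name "*sample*" \
  \( -name "*.mkv" -o -name "*.mp4" -o -name "*.avi" \) \
  -exec basename {} \;)

printf '%s\n' "$found"
rm -rf "$demo"
```

Note that a video inside a "Sample" folder whose own file name lacks the word "sample" would still be picked up, because `-name` does not test the directory part of the path.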

Code: Select all

filebot -rename --db TheTVDB --lang German --q "{f.dir.name.replaceAll(/(?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))/,'').replaceAll(/\./, ' ')}" {}
Here I query TheTVDB with the name of the folder the file is in and use a regex to strip the following elements from it:
  • one or more letters, digits, or underscores followed by a dash at the start
  • any character followed by s, one or more digits, e, and one or more digits
  • any character followed by 3 or 4 digits and the letter p
  • the letter h or x followed by 3 digits
  • a dash followed by any number of letters, digits, or underscores at the end of the name
But this obviously isn't enough to strip all the strings used in practice, so I was wondering if there's a way to create a file with strings that should be removed and let FileBot read from it. I already tried querying with -non-strict, which works in most cases but not all. I also had a look at the AMC script but couldn't figure out how to properly customize the formats there.

Code: Select all

--output "/sorted/" --format 
"{vf.match(/1080[pP]|2160[pP]/)}/
{f.dir.dir.name.match(/serie/)}/
{f.dir.dir.name.match(/allgemein|anime|doku|erwachsen|kinder/)}/
{n.substring(0,1).toLowerCase()}/
{n.space('.').lower()}/
{'s'+s.pad(2)}/
{allOf{n}{y}{s00e00}{t}{vf.match(/720[pP]|1080[pP]|2160[pP]/)}{vc}{ac}{af}{any{audioLanguages[0].ISO2} 'und'}}.join('.').space('.').lower()}" \;
This format does the following:

It checks whether the video is 1080p or 2160p to decide which top-level folder to put it in, then checks whether it's a series and which sub-genre it belongs to, based on the folder it's currently in.
It then puts the file into the following folder structure: starting letter/series name/sXX/name.year.sXXeXX.title.video resolution (if it matches).video codec.audio codec.audio channels.ISO2 code of the first audio track ('und' if undefined).
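For orientation, the "starting letter/series name/sXX" part of that structure can be mimicked in plain shell. This is only an illustration of the naming scheme, not FileBot code: the function name `series_subpath` and the series "Some Show" are hypothetical, and the season is assumed to be passed as a plain number without leading zeros.

```shell
# Print <first letter>/<dotted lowercase name>/sNN for a series name and season.
series_subpath() {
  name=$1
  season=$2
  # Lowercase the name and replace spaces with dots, like {n.space('.').lower()}.
  dotted=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr ' ' '.')
  # First character of the lowercased name, like {n.substring(0,1).toLowerCase()}.
  letter=$(printf '%s' "$dotted" | cut -c1)
  # Zero-pad the season to two digits, like {'s'+s.pad(2)}.
  printf '%s/%s/s%02d\n' "$letter" "$dotted" "$season"
}

series_subpath 'Some Show' 1
# prints: s/some.show/s01
```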

Thanks for reading and in advance for helping.

Cheers
rednoah
The Source
Posts: 22970
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: [Linux CLI] Read strings from a file and remove them from folder names

Post by rednoah »

This thread should get you started for this general use case:
viewtopic.php?t=12594


Reading files inside your format code is possible. You'll definitely want to use the GUI for prototyping complex custom formats before pasting the result into a CLI call.

e.g.

Code: Select all

{ lines("/path.txt") }
dgnrt3
Posts: 26
Joined: 21 Apr 2023, 19:58

Re: [Linux CLI] Read strings from a file and remove them from folder names

Post by dgnrt3 »

Thanks for your fast reply rednoah!
Sorry for the confusion... I need this for --q to get good results from the databases.

Code: Select all

--q "{f.dir.name.replaceAll(/(?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))/,'').replaceAll(/\./, ' ')}" {}
Would this work there as well?

Update: I wasn't able to get it to work with lines(), so I searched the forums intensively and found csv()! :) This works for my use case:

Code: Select all

{f.dir.name.replace(csv("/path/to/strings2rm.csv"))}
The csv file looks like this:

Code: Select all

A <TAB>
B <TAB>
C <TAB>
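A file in that layout can be generated from the shell instead of typed by hand, which avoids editors silently turning tabs into spaces. The sketch below assumes each line is the string to remove, one tab character, and an empty second column; the three strings and the temporary file are placeholders.

```shell
# Write one "string<TAB>" line per argument; the empty second column is the replacement.
csv_file=$(mktemp)
printf '%s\t\n' 'German' 'Dubbed' 'DL' > "$csv_file"

# Show both columns to confirm the tabs are really there.
awk -F'\t' '{ printf "%s -> [%s]\n", $1, $2 }' "$csv_file"
```

Each output line should show the string followed by empty brackets, for example `German -> []`.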
Last edited by dgnrt3 on 24 Apr 2023, 13:23, edited 1 time in total.
dgnrt3
Posts: 26
Joined: 21 Apr 2023, 19:58

Re: [Linux CLI] Read strings from a file and remove them from folder names

Post by dgnrt3 »

on a separate note:
at least from my understanding

Code: Select all

{h,x}\d{3} 
is a valid regex to match strings starting with h or x followed by 3 digits

but when used like this:

Code: Select all

{f.dir.name.replaceAll(/(?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))/,'')}
Error Message:
Expression yields empty value: Illegal repetition near index 32 (?i:(^\w+-|.s\d+e\d+|.\d{3,4}p|{h,x}\d{3}|-\w*$))
UPDATE: Figured out that this regex matches a dot followed by the letter h or x followed by 3 digits:

Code: Select all

\.[hx]\d{3}
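Some background on the error: in Java regular expressions, which Groovy uses, `{` starts a repetition quantifier such as `{3}` or `{3,4}`, so `{h,x}` is rejected as an illegal repetition; braces with commas are shell brace expansion, not regex. Inside a character class there are no separators either, so a comma written there is matched literally. The small check below uses `grep -E` rather than FileBot (with `[0-9]` in place of `\d`) to compare `[h,x]` with `[hx]`; the helper name `matches` and the test strings are made up.

```shell
# Succeeds if the extended regex in $1 matches anywhere in the string $2.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

for pattern in '\.[h,x][0-9]{3}' '\.[hx][0-9]{3}'; do
  for name in 'Show.S01E02.1080p.x264-GRP' 'Show.S01E02.,264'; do
    if matches "$pattern" "$name"; then result='match'; else result='no match'; fi
    printf '%-18s %-28s %s\n' "$pattern" "$name" "$result"
  done
done
```

Both patterns match the ".x264" name, but only the `[h,x]` variant also matches ".,264".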
Thanks again, rednoah, for developing FileBot and supporting everyone here so well!! <3
rednoah
The Source
Posts: 22970
Joined: 16 Nov 2011, 08:59
Location: Taipei
Contact:

Re: [Linux CLI] Read strings from a file and remove them from folder names

Post by rednoah »

Please post one or more file paths that you have issues with, so that we can see and understand the issue at hand and run tests.


EDIT:

The Otto* example folder name in the OP does not seem to require --q custom query detection and should just work out of the box. Perhaps you can share more of the file paths you have issues with so we can check for a common pattern that makes things go awry?