Scrape only episode numbers with multiple Episodes per File

Eikooc
Posts: 4
Joined: 14 Jan 2017, 13:36

Post by Eikooc »

Hello,

I want to add missing episode numbers for some series. For files that contain multiple episodes (titles separated by ' + '), only the absolute number of the first episode should be used if the numbers are consecutive; otherwise, each episode number should be listed, separated by commas.

That means I am trying to get the following output format (German titles):
Bibi und Tina\Bibi und Tina ¦ 27, 31 ¦ Spuk auf der Ferieninsel + Amadeus verliebt sich.mp4
Bibi und Tina HD\Bibi und Tina HD ¦ 37 ¦ Aufregung auf dem Kupferberg + Das Mittelalterfest.mp4
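
The numbering rule described above could be sketched in plain Groovy roughly as follows. This is only an illustration: `formatEpisodeNumbers` is a hypothetical helper, not a FileBot function, and it assumes that the second example file holds episodes 37 and 38.

```groovy
// Hypothetical helper (not part of FileBot): builds the text between the ¦ separators.
def formatEpisodeNumbers(List<Integer> numbers) {
    if (!numbers) return ''
    def sorted = numbers.sort(false)   // sort a copy, leave the input untouched
    // consecutive: every number is exactly one more than its predecessor
    def consecutive = sorted.size() < 2 ||
            (1..<sorted.size()).every { sorted[it] == sorted[it - 1] + 1 }
    // consecutive run: first number only; otherwise: all numbers, comma-separated
    consecutive ? sorted.first().toString() : sorted.join(', ')
}

def gapped = formatEpisodeNumbers([27, 31])   // '27, 31'
def run    = formatEpisodeNumbers([37, 38])   // '37'
```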


The input format (German titles) looks something like this:
Benjamin; bärenstark¡\Benjamin; bärenstark¡ ¦ ¦ 2016.04.23_Wo ist Holly¿ + Ein Stern am Theaterhimmel + Rettet Edgar.mp4
My Little Pony\My Little Pony ¦ 133 ¦ 06.11.2016_Scherzkekse.mp4
My Little Pony - Freundschaft ist Magie\My Little Pony - Freundschaft ist Magie ¦ S04E25 ¦ 2016.05.07_Twilights Königreich - Teil 1.mp4
Cosmic Quantum Ray\Cosmic Quantum Ray ¦ - ¦ Das Buckingham-Puzzle + Glitschikus.mp4
Cosmic Quantum Ray HD\Cosmic Quantum Ray HD ¦ 01 ¦ Das Buckingham-Puzzle.mp4


I know this is very inconsistent, but that is exactly what I want to change. Even the number of episodes per file varies within a series, because I sometimes recorded a series multiple times. Only the series name is fixed per series and identical across all of its files.


Currently, I'm using {n} ¦ {absolute} ¦ {t} in FileBot

I also want to keep the characters ¡¿; and the HD indicator. In short: only the text between the ¦ separators should change, except that the date should also be removed (although that can cause filename collisions). I have been looking for a solution for a week now. The problem is that several episodes are not recognized and {absolute} often yields no value. I have tried various regex queries. For example, ((?<=\ |^)[^\+]+(?!\+)) separates the individual episode names in the output format.
The FAQs and forum posts haven't made me much wiser, and I'm slowly starting to despair. I hope someone can help me.


Request Help Infos
OS: Windows 10 Education (64bit)
Java: JDK 8 U 111 (64bit)
FileBot-Version: GUI 4.7.7 (Windows Installer from Chip.de)

Code: Select all

FileBot 4.7.7 (r4678)
JNA Native: 4.0.1
MediaInfo: 0.7.88
7-Zip-JBinding: 9.20
Chromaprint: 1.1.0
Extended Attributes: OK
Script Bundle: 2017-01-05 (r470)
Groovy: 2.4.7
JRE: Java(TM) SE Runtime Environment 1.8.0_111
JVM: 64-bit Java HotSpot(TM) 64-Bit Server VM
CPU/MEM: 8 Core / 1 GB Max Memory / 184 MB Used Memory
OS: Windows 10 (amd64)
Package: MSI
Data: C:\Users\Sonic\AppData\Roaming\FileBot
rednoah
The Source
Posts: 23953
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Scrape only episode numbers with multiple Episodes per File

Post by rednoah »

Unfortunately, multi-episode auto-detection is not possible via airdate pattern or episode title. Multi-episode detection only works for patterns like S04E25-E26 or 25-26 (in Airdate Order mode only).


EDIT:

In this specific case, a little bit of trickery in the format might get you the filename you want:

Code: Select all

S{s.pad(2)}E{episodelist.findAll{ fn =~ it.title }.episode*.pad(2).join('-E')}
The same trickery could be used for faking extra info via a matching airdate in the filename, but TheTVDB doesn't have airdate information for most of that stuff. It's now your job to add the missing airdate/absolute number information. ;)
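
To make the format expression above easier to follow, here is a rough plain-Groovy sketch of the same idea. In FileBot, `fn` and `episodelist` are supplied by the format engine; here they are stand-in values, the middle title is a made-up placeholder, and `padLeft` stands in for what `.pad(2)` appears to do in FileBot.

```groovy
// Stand-in data (assumption): FileBot normally provides fn and episodelist.
def fn = 'Bibi und Tina ¦ ¦ Spuk auf der Ferieninsel + Amadeus verliebt sich'
def episodelist = [
    [title: 'Spuk auf der Ferieninsel', episode: 27],
    [title: 'Placeholder title that is not in the filename', episode: 28],
    [title: 'Amadeus verliebt sich', episode: 31],
]

// Keep every episode whose title occurs somewhere in the filename,
// then zero-pad the episode numbers and join them.
def numbers = episodelist
    .findAll { fn =~ it.title }   // the title is used as a regex, matched via find()
    .collect { it.episode.toString().padLeft(2, '0') }
    .join('-E')

def result = 'E' + numbers   // 'E27-E31'
```

Note that `=~` treats the title as a regular expression, so titles containing characters such as `?` or `+` are interpreted as regex syntax rather than literally.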
:idea: Please read the FAQ and How to Request Help.
Eikooc

Re: Scrape only episode numbers with multiple Episodes per File

Post by Eikooc »

That's great! Is there a way to make the string matching fuzzier, so that small differences still lead to a match? Some examples that don't match:
  • 2x35: Klein - aber oho¡ vs. Klein - aber oho!
  • 3x75: Wachse, Sonnenblume, wachse vs. Wachse, Sonnenblume, wachse!
  • 1x14: Heimlicher Schnappschuss vs. Heimlicher Schnappschuß
  • 3x66: Ein Teddy muss sein vs. Ein Teddy muß sein
I think it is helpful to provide some sample files: http://www.mediafire.com/file/xxmn4k5rk ... Sample.zip (Contains renamed sfv checksum files, so FileBot can work with it.) ;)

Currently, for another series, I'm using:

{file.name.matchAll(/((?<= |^|_)[^\_¦]+(?!\+)(?<!\ ))/).first()} ¦ S{s.pad(2)}E{episodelist.findAll{ fn =~ it.title }.episode*.pad(2).join(', ')} ¦ {file.name.matchAll(/((?<= |^|_)[^\_¦]+(?!\+)(?<!\ ))/).last()}
rednoah

Re: Scrape only episode numbers with multiple Episodes per File

Post by rednoah »

Stripping non-word characters and lower-casing both sides might help with the logic that selects the episodes:

Code: Select all

fn.lower().replaceAll(/\W/) =~ it.title.lower().replaceAll(/\W/)
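
As a rough illustration of what this stripping does to the titles mentioned earlier, here is a plain-Groovy sketch. The closure `stripped` is a hypothetical stand-in for `lower().replaceAll(/\W/)`; plain Groovy needs the explicit empty replacement string.

```groovy
// Hypothetical stand-in for fn.lower().replaceAll(/\W/) in plain Groovy.
def stripped = { String s -> s.toLowerCase().replaceAll(/\W/, '') }

// Punctuation differences disappear:
def a = stripped('Klein - aber oho¡')     // 'kleinaberoho'
def b = stripped('Klein - aber oho!')     // 'kleinaberoho'

// By default \W also removes non-ASCII letters such as ß,
// so the ß and ss spellings still end up different:
def c = stripped('Ein Teddy muß sein')    // 'einteddymusein'
def d = stripped('Ein Teddy muss sein')   // 'einteddymusssein'
```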
Eikooc

Re: Scrape only episode numbers with multiple Episodes per File

Post by Eikooc »

Indeed, that's simple. Everything matches now!
Season removed and matcher improved:
{file.name.matchAll(/((?<= |^|_)[^\_¦]+(?!\+)(?<!\ ))/).first()} ¦
E{episodelist.findAll{
fn.lower().replaceAll(/[\W]|ss/).replaceAll(/([\w\s])\1+/,"\$1") =~
it.title.lower().replaceAll(/[\W]|ss/).replaceAll(/([\w\s])\1+/,"\$1")
}.episode*.pad(2).join(', ')} ¦
{file.name.matchAll(/((?<= |^|_)[^\_¦]+(?!\+)(?<!\ ))/).last()}
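
The matcher in this format can be tried out in isolation with a small plain-Groovy sketch. The closure name `normalize` is an assumption made for illustration; the two regex steps are the ones used above.

```groovy
// Hypothetical helper mirroring the two replaceAll steps from the format above.
def normalize = { String s ->
    s.toLowerCase()
     .replaceAll(/\W|ss/, '')            // drop non-word characters (incl. ß) and every 'ss'
     .replaceAll(/([\w\s])\1+/, '$1')    // collapse runs of the same character: 'dd' -> 'd'
}

def t1 = normalize('Ein Teddy muß sein')         // 'eintedymusein'
def t2 = normalize('Ein Teddy muss sein')        // 'eintedymusein'
def s1 = normalize('Heimlicher Schnappschuß')    // 'heimlicherschnapschu'
def s2 = normalize('Heimlicher Schnappschuss')   // 'heimlicherschnapschu'
```

Both spellings now normalize to the same string, which is why the ß/ss titles match. The trade-off is that the comparison becomes looser, so very similar titles could in principle collide.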



I'm almost done :D
Lastly, I need to use a CSV file, which I created with Excel and Sublime, for 2 series that aren't on TheTVDB. To keep the example consistent, here is a CSV for the same series as before:

Code: Select all

Wo ist Holly?;1
Ein Stern am Theaterhimmel;2
Rettet Edgar;3
Das große Teddylehrbuch;4
I can't get any further. I get errors such as "Binding it: undefined" and "Binding episode: undefined". Is there perhaps another smart solution for looking up the episode names in a CSV file?
rednoah

Re: Scrape only episode numbers with multiple Episodes per File

Post by rednoah »

1.
Not really. The format requires a File-Episode match. But if there's no matching Episode, then you won't get to format it.

This might help though. You can read a csv file to a Map object:

Code: Select all

csv('/path/to/file')
@see viewtopic.php?f=5&t=182


2.
Please ask in the TheTVDB forums about adding the missing shows. I'm sure they'll welcome your contribution.
Eikooc

Re: Scrape only episode numbers with multiple Episodes per File

Post by Eikooc »

I got it working now! The basic lookup:
{csv('A:\\PathToCSV.csv').findResults{k, v -> if(fn =~ k) v }}

And the full format with the better matcher:

{file.name.matchAll(/((?<= |^|_)[^\_¦]+(?!\+)(?<!\ ))/).first()} ¦
{csv('A:\\PathToCSV.csv').findResults{
k, v -> if(
fn.lower().replaceAll(/[\W]|ss/).replaceAll(/([\w\s])\1+/,"\$1") =~
k.lower().replaceAll(/[\W]|ss/).replaceAll(/([\w\s])\1+/,"\$1")
) v.pad(2)
}.join(', ')} ¦
{file.name.matchAll(/((?<= |^|_)[^\_¦]+(?!\+)(?<!\ ))/).last()}
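
For anyone who wants to test the CSV lookup outside FileBot, here is a minimal plain-Groovy sketch. It assumes a semicolon-separated title;number file like the one posted earlier and parses it by hand instead of calling FileBot's `csv()`. The names `table` and `normalize` are illustrative only.

```groovy
// CSV content as posted above; parsed by hand here instead of FileBot's csv().
def csvText = '''Wo ist Holly?;1
Ein Stern am Theaterhimmel;2
Rettet Edgar;3
Das große Teddylehrbuch;4'''

// Build a title -> number map, keeping the file order.
def table = csvText.readLines().collectEntries { line ->
    def (title, number) = line.split(';')
    [(title): number]
}

// Same normalization as in the format above (hypothetical helper name).
def normalize = { String s ->
    s.toLowerCase().replaceAll(/\W|ss/, '').replaceAll(/([\w\s])\1+/, '$1')
}

def fn = 'Benjamin; bärenstark¡ ¦ ¦ 2016.04.23_Wo ist Holly¿ + Ein Stern am Theaterhimmel + Rettet Edgar'

// Collect the zero-padded number of every title found in the filename.
def numbers = table.findResults { k, v ->
    (normalize(fn) =~ normalize(k)) ? v.padLeft(2, '0') : null
}.join(', ')   // '01, 02, 03'
```

Normalizing the key before matching also sidesteps a pitfall of the shorter `fn =~ k` variant: a raw title such as `Wo ist Holly?` is interpreted as a regular expression, so the `?` acts as a quantifier rather than a literal character.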