How to improve AniDB Absolute episode matching
Posted: 13 Aug 2020, 01:39
For various reasons I am adding xattr (episode information) to a large number of files that have been verified/sorted/renamed using one of the AniDB clients.
So I am slowly bumbling through creating a Groovy script that adds the xattr data by having FileBot "rename" the files into a different directory and then moving them back.
The filename format is basically this:
English Series name - Episode#WithVersion - Episode Title [ReleaseGroup][Source][Resolution][Codec][CRC].extension
Several examples:
Code: Select all
Ahiru no Sora - 32 - Time Limit [FFA][www][1920x1080][HEVC][19BA5AAC].mkv
Somali and the Forest Spirit - 03 - The Sea at the Bottom of the Cave [Erai-raws][www][1280x720][h264][004A20E9].mkv
A Certain Scientific Railgun T - 03v2 - Balloon Hunter [Gremlin][www][1280x720][h264][2DB5BB94].mkv
Bofuri I Don't Want to Get Hurt, So I'll Max Out My Defense_ - 08 - Defense and Third Event [HR][www][1920x1080][HEVC][D54E0BFA].mkv
The directory name includes the AniDB ID, so I can pass that in as the query to match them:
Code: Select all
rename(folder:directory, format: aniAddFormat, query: "${z}", order: 'Absolute', db: 'AniDB')
where z is the AniDB ID of the series.
Of the 1657 files I am currently testing with, after two passes (because sometimes another try seems to match a few more) I was only able to match 591.
Huh. I was expecting a higher match ratio in strict mode, considering the names are fairly close to what is in AniDB (I do strip out most control characters, so they don't match 100% on some episodes/series).
So I added some additional Groovy to regex the episode numbers and do a non-strict match, with a filter on the specific episode from the regex:
Code: Select all
rename(file:xfiles, format: aniAddFormat, query: "${z}", filter: "absolute = ${x.toInteger()}", order: 'Absolute', db: 'AniDB', strict:false)
where xfiles is a list of the files, z is the ID of the series, and x is the episode number. I do switch to using special (vs absolute) when trying to match the normal specials (I ignore S101/2 etc).
That worked for most of them in the first pass, and the second pass looks set to finish the rest. But man, this is not very efficient, and I really think I'm pushing my luck with the ban hammer from AniDB.
So my question is: what's wrong with the naming format I am using, and, at least for future files, is there some tweaking I could do that would increase the episode match efficiency?
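As a rough illustration of the regex step mentioned above, something like the following Groovy could pull the episode number out of a filename in this format. This is only a sketch and not the actual script: the pattern and the `parseEpisode` helper name are assumptions made for this example, based solely on the example filenames shown, and it only covers regular (non-special) episodes.

```groovy
// Hypothetical helper (not part of FileBot): splits a name of the form
// "Series - 03v2 - Title [Group][Source][Resolution][Codec][CRC].ext"
// into its parts. Returns null when the name does not fit the pattern.
def parseEpisode(String name) {
    def m = (name =~ /(.+?) - (\d+)(?:v(\d+))? - (.+) ((?:\[[^\]]*\])+)\.(\w+)/)
    if (!m.matches()) return null
    return [
        series : m.group(1),
        episode: m.group(2).toInteger(),                       // "03" becomes 3
        version: m.group(3) ? m.group(3).toInteger() : 1,      // no "vN" suffix is treated as version 1
        title  : m.group(4),
        tags   : m.group(5).findAll(/\[([^\]]*)\]/) { all, tag -> tag }
    ]
}

def info = parseEpisode('A Certain Scientific Railgun T - 03v2 - Balloon Hunter [Gremlin][www][1280x720][h264][2DB5BB94].mkv')
println "absolute = ${info.episode}"   // prints "absolute = 3"
```

The parsed `episode` value plays the role of `x` in the `filter: "absolute = ${x.toInteger()}"` expression, so files that do not fit the pattern (where the helper returns null) can be set aside before any lookup is attempted.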
Code: Select all
FileBot 4.9.1 (r7372)
JNA Native: 6.1.0
MediaInfo: 19.09
7-Zip-JBinding: 9.20
Chromaprint: 1.4.3
Extended Attributes: OK
Unicode Filesystem: OK
Script Bundle: 2020-08-04 (r667)
Groovy: 3.0.3
JRE: OpenJDK Runtime Environment 14
JVM: 64-bit OpenJDK 64-Bit Server VM
CPU/MEM: 4 Core / 2.1 GB Max Memory / 33 MB Used Memory
OS: Windows Server 2019 (amd64)
STORAGE: NTFS [(C:)] @ 87 GB | NTFS [music] @ 24 TB | NTFS [pictures] @ 24 TB | NTFS [video] @ 24 TB | NTFS [multimedia] @ 24 TB | NTFS [animebt] @ 22 TB
DATA: C:\Users\vitki\AppData\Roaming\FileBot
Package: MSI
License: FileBot License xxx (Valid-Until: 2069-11-10)
Done ヾ(＠⌒ー⌒＠)ノ