Fuzzy matching

Running FileBot from the console, Groovy scripting, shell scripts, etc
devster
Posts: 417
Joined: 06 Jun 2017, 22:56

Fuzzy matching

Post by devster »

One of the websites I get my movies from (movies exclusively) has an internal quality assurance system.
However, it's also ratio-expensive to download from them, so I usually get releases elsewhere and try to seed them on that site.
To do this, and because these are generally considered the best releases, I'd like to add a tag to my files.
This is what I have so far:

Code:

{ import groovy.json.JsonSlurper
  import groovy.json.JsonOutput

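  // URL-encode the movie name so that spaces and special characters survive the query string
  def url = new URL("https://website.com/api.php?searchstr=${URLEncoder.encode(n, 'UTF-8')}&scene=2&resolution=$vf")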
  def requestHeaders = [
    "Accept": "application/json",
    "ApiUser": "{{ apiuser }}",
    "ApiKey": "{{ apikey }}"
  ]

  def result = url.get(requestHeaders).text
  def json = new JsonSlurper().parseText(result)
  if (json.TotalResults > 0) {
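    // Nested any() yields a single boolean. The spread form Movies*.Torrents*.any { }
    // returns a list of booleans, which is truthy whenever it is non-empty.
    def qc = json.Movies.any { movie ->
      movie.Torrents.any {
        it.Resolution == vf &&
        it.ReleaseGroup ==~ /(?i)$group/ &&
        it.BooleanQC
      }
    }
    if (qc) { return "(QC)" }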
  }
}
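The difference between the spread form and a nested any() can be checked outside FileBot with a small plain-Groovy sketch. The sample response below is made up for illustration (only the field names come from the snippet above), and `vf` and `group` are ordinary variables standing in for the FileBot bindings:

```groovy
import groovy.json.JsonSlurper

// Hypothetical sample response; only the field names are taken from the post
def json = new JsonSlurper().parseText('''
{
  "TotalResults": 1,
  "Movies": [
    { "Torrents": [
        { "Resolution": "720p",  "ReleaseGroup": "EbP", "BooleanQC": true  },
        { "Resolution": "1080p", "ReleaseGroup": "EbP", "BooleanQC": false }
    ] }
  ]
}
''')

def vf = "1080p"   // stand-in for the {vf} binding
def group = "ebp"  // stand-in for the {group} binding

// Same three conditions as in the post, with a plain case-insensitive comparison
def check = { t ->
  t.Resolution == vf && group.equalsIgnoreCase(t.ReleaseGroup as String) && t.BooleanQC
}

// Spread form: one boolean per movie, collected into a list
def perMovie = json.Movies*.Torrents*.any(check)

// Nested form: a single boolean for the whole response
def anyMatch = json.Movies.any { movie -> movie.Torrents.any(check) }

println "spread: $perMovie (truthy: ${perMovie as boolean}), nested: $anyMatch"
```

No torrent in the sample satisfies all three conditions, yet the spread result is a non-empty list and therefore passes an `if` test, while the nested form correctly reports no match.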
I'd like to replace the ReleaseGroup check with something more robust, like the code below (it only contains test data; hopefully the other checks will already have filtered out different resolutions, for example):

Code:

{
  import net.filebot.similarity.Matcher
  import net.filebot.similarity.Match
  import net.filebot.similarity.SubstringMetric
  // values and candidates are both plain strings in this test
  Matcher<String, String> matcher = new Matcher<String, String>(
    ["Avatar.2009.1080p.BluRay.x264-EbP"], // fn binding
    [
      "Avatar 2009 1080p Extended BluRay x264 EbP",
      "Avatar.2009.1080p.BluRay.x264-Friday21st",
      "Avatar.2009.720p.BluRay.x264-EbP",
      "Avatar.2009.1080p.BluRay.x264-EbP",
      "Avatar.2009.Collectors.Ext.Cut.1080p.BluRay.DTS.dxva.x264.D-Z0N3",
      "Avatar.2009.Extended.Collectors.Edition.3in1.Hybrid.1080p.BluRay.x264-VietHD"
    ], false, new SubstringMetric())
  List<Match<String, String>> matches = matcher.match()
  matches.first().equals() // false for some reason (presumably because equals() without an argument compares against null)
}
Now, is this the best way to go about it? Is there a way to get, for example, a similarity score to check against (besides the boolean .equals())?
Ideally I'd also want to match against the Movie model instead of against {fn} as I'm attempting here.
I only work in black and sometimes very, very dark grey. (Batman)
rednoah
The Source
Posts: 22976
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Fuzzy matching

Post by rednoah »

String.getSimilarity() might help:

Code:

{
	def s1 = "Avatar.2009.1080p.BluRay.x264-EbP"
	def s2 = "Avatar 2009 1080p Extended BluRay x264 EbP"
	s1.getSimilarity(s2)
}

Output:

0.8607595
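With a numeric score available, the release check can become a threshold test over all candidates. The following is a minimal plain-Groovy sketch of that idea. It runs outside FileBot, so it uses a stand-in metric (token-set Jaccard overlap) that is not FileBot's implementation and will produce different numbers than getSimilarity; the `similarity` and `tokens` closures and the 0.8 cut-off are assumptions made for illustration:

```groovy
// Stand-in metric (token-set Jaccard overlap); NOT the metric behind String.getSimilarity()
def tokens = { String s -> s.toLowerCase().split(/[^a-z0-9]+/).findAll { it } as Set }
def similarity = { String a, String b ->
  def ta = tokens(a)
  def tb = tokens(b)
  def union = ta + tb
  union ? ((ta.intersect(tb).size() / union.size()) as double) : 0.0d
}

def fileName = "Avatar.2009.1080p.BluRay.x264-EbP"
def candidates = [
  "Avatar 2009 1080p Extended BluRay x264 EbP",
  "Avatar.2009.1080p.BluRay.x264-Friday21st",
  "Avatar.2009.720p.BluRay.x264-EbP",
  "Avatar.2009.1080p.BluRay.x264-EbP",
  "Avatar.2009.Collectors.Ext.Cut.1080p.BluRay.DTS.dxva.x264.D-Z0N3",
  "Avatar.2009.Extended.Collectors.Edition.3in1.Hybrid.1080p.BluRay.x264-VietHD"
]

def threshold = 0.8d // arbitrary cut-off, to be tuned against real data

// Score every candidate, keep those above the cut-off, best match first
def scored = candidates.collect { [name: it, score: similarity(fileName, it)] }
def accepted = scored.findAll { it.score >= threshold }.sort { -it.score }
def best = accepted ? accepted.first().name : null

accepted.each { println "${it.score}\t${it.name}" }
```

Inside a FileBot format expression, the stand-in closure would be replaced by the script method shown above, for example `fileName.getSimilarity(it)`, and the threshold would need to be re-tuned for that metric.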
devster

Re: Fuzzy matching

Post by devster »

Thanks, I'll test with that and sortBySimilarity. I completely overlooked the script methods.