
Fuzzy matching

Posted: 28 Apr 2019, 19:32
by devster
One of the websites I get my movies from (movies exclusively) has an internal quality assurance system.
However, downloading from them is also ratio-expensive, so I usually get releases elsewhere and try to seed them here.
To do this, and because these are generally considered the best releases, I'd like to add a tag to my files.
This is what I have so far:

Code: Select all

{ import groovy.json.JsonSlurper
  import groovy.json.JsonOutput

  def url = new URL("https://website.com/api.php?searchstr=$n&scene=2&resolution=$vf")
  def requestHeaders = [
    "Accept": "application/json",
    "ApiUser": "{{ apiuser }}",
    "ApiKey": "{{ apikey }}"
  ]

  def result = url.get(requestHeaders).text
  def json = new JsonSlurper().parseText(result)
  if (json.TotalResults > 0) {
    // true only if at least one torrent of any returned movie passes all three checks
    def qc = json.Movies.any { movie ->
        movie.Torrents?.any {
            it.Resolution == vf &&
            it.ReleaseGroup ==~ /(?i)$group/ &&
            it.BooleanQC
        }
    }
    if (qc) { return "(QC)" }
  }
}
I'd like to replace the ReleaseGroup check with something more robust.
Something like the below (it only contains test data; hopefully the other checks will already have filtered out, for example, different resolutions):

Code: Select all

{
  import net.filebot.similarity.Matcher
  import net.filebot.similarity.Match
  import net.filebot.similarity.SubstringMetric
	Matcher<Object, File> matcher = new Matcher<Object, File>(
		["Avatar.2009.1080p.BluRay.x264-EbP"], // fn binding
		[
			"Avatar 2009 1080p Extended BluRay x264 EbP",
			"Avatar.2009.1080p.BluRay.x264-Friday21st",
			"Avatar.2009.720p.BluRay.x264-EbP",
			"Avatar.2009.1080p.BluRay.x264-EbP",
			"Avatar.2009.Collectors.Ext.Cut.1080p.BluRay.DTS.dxva.x264.D-Z0N3",
			"Avatar.2009.Extended.Collectors.Edition.3in1.Hybrid.1080p.BluRay.x264-VietHD"
		], false, new SubstringMetric() );
	List<Match<Object, File>> matches = matcher.match()
     matches.first().equals() // false for some reason
}
Now, is this the best way to go about it? Is there a way to get, for example, a similarity score to check against (besides the boolean .equals())?
Ideally I'd also want to match against the Movie model instead of against {fn} as I'm attempting here.

Re: Fuzzy matching

Posted: 29 Apr 2019, 07:40
by rednoah
String.getSimilarity() might help:

Code: Select all

{
	def s1 = "Avatar.2009.1080p.BluRay.x264-EbP"
	def s2 = "Avatar 2009 1080p Extended BluRay x264 EbP"
	s1.getSimilarity(s2)
}

Code: Select all

0.8607595
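
To use that score as a check rather than just printing it, something along these lines might work. This is only a rough sketch: it assumes it runs as a FileBot format expression (where String.getSimilarity() is available), it reuses the test names from the question, and the threshold value and variable names are made up for illustration.

```groovy
{
	// test data from the question; in a real format expression the reference
	// would be {fn} and the candidates would come from the API response
	def reference = "Avatar.2009.1080p.BluRay.x264-EbP"
	def candidates = [
		"Avatar 2009 1080p Extended BluRay x264 EbP",
		"Avatar.2009.1080p.BluRay.x264-Friday21st",
		"Avatar.2009.720p.BluRay.x264-EbP"
	]
	def threshold = 0.8 // hypothetical cutoff, tune it against real release names

	// pick the candidate with the highest similarity to the reference
	def best = candidates.max { reference.getSimilarity(it) }
	def score = reference.getSimilarity(best)

	// yield the best candidate only if it is similar enough, otherwise nothing
	score >= threshold ? best : null
}
```

Whether a single cutoff separates "same release, different punctuation" from "different group or resolution" would need checking against real data, since names that differ only in the group or resolution token can still score fairly high.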

Re: Fuzzy matching

Posted: 29 Apr 2019, 09:15
by devster
Thanks, I'll test with that and sortBySimilarity. I had completely overlooked the script methods.