Strict mode processing files with bad matches


Post by kim »

Can you make it so that movie matching shows up in the log, like the TV funnel does?
(That way I/we can better understand why it happens and how to fix it.)

About the year having 2x weight: is that like 1 word = 1x weight?

btw: "The Bodyguard 2003"
#1 The Body 2003
#2 The Bodyguard 2004

Post by rednoah »

You can find the relevant code here:
https://github.com/filebot/filebot/blob ... .java#L696

The overall metric consists of 4 metrics that are based on the name, and 1 metric with 2x weight based on the year. The current implementation does not take year ±1 into account when ranking results.

For debugging purposes, writing a script that will give you the values for each individual metric would make sense.

Post by kim »

I don't get much out of just looking at part of the code...

"For debugging purposes, writing a script that will give you the values for each individual metric would make sense."
Meaning you will write it, or I will?
Because I have NO idea how ;)

Post by rednoah »

Here's some inspiration for a script like that:

Code:

// the file path to test
def file = 'Movie/Avatar/Avatar.2009.mkv' as File
// look up movie candidates for this file
def movies = MediaDetection.detectMovie(file, TheMovieDB, Locale.ENGLISH, false)

movies.each { option ->
	// overall similarity between this candidate and the file
	println "$option <=> $file"
	println MediaDetection.movieMatchMetric.getSimilarity(file, option)

	// value of each individual metric that feeds into the overall score
	MediaDetection.movieMatchMetric.metrics.each { metric ->
		println "* ${metric.class}: ${metric.getSimilarity(file, option)}"
	}
}

Post by kim »

* class net.filebot.similarity.NameSimilarityMetric: 0.5074627 = ???
* class net.filebot.media.MediaDetection$1: 0.0 = ???
* class net.filebot.media.MediaDetection$2: 1.3333334 = Year ?
* class net.filebot.similarity.SequenceMatchSimilarity: 0.6 = ???
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0 = ???


Why is the result so different when there is a folder in "def file"?
Memories of 'The Bodyguard' (2005) <=> The Bodyguard (2005)\The Bodyguard.2005.mkv
0.48815918
* class net.filebot.similarity.NameSimilarityMetric: 0.5074627
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 1.3333334
* class net.filebot.similarity.SequenceMatchSimilarity: 0.6
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0

Memories of 'The Bodyguard' (2005) <=> The Bodyguard.2005.mkv
0.6969697
* class net.filebot.similarity.NameSimilarityMetric: 0.6666667
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 2.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.8181818
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0


btw: I just learned that when you search TMDB with a year (e.g. "query=the+bodyguard&year=2005"), any "release information" year will be used for matching. For example, "query=the+bodyguard&year=2008" returns The Bodyguard (1992) because of its BR "Physical" release in 2008. Same with 1999 for GR.

Post by rednoah »

1.
Multiple matches in file and folder name increase the total score.

2.
Mind sharing the links for any TheMovieDB query documentation you might have found?

Post by kim »

1. It looks like the other way around to me:
with folder 0.48815918
without 0.6969697

2. I did not read it anywhere, I just discovered it because "query=the+bodyguard&year=2008" returns The Bodyguard (1992).


What do these lines mean?
* class net.filebot.similarity.NameSimilarityMetric: 0.5074627 = ???
* class net.filebot.media.MediaDetection$1: 0.0 = ???
* class net.filebot.media.MediaDetection$2: 1.3333334 = Year ?
* class net.filebot.similarity.SequenceMatchSimilarity: 0.6 = ???
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0 = ???

Post by rednoah »

1.
I see. It probably works differently because you have no folder structure at all, which reduces the number of components being compared, and thus the average values. This won't happen in real usage scenarios because there's always going to be a parent folder.

I'd test with something like this:
def file = '''X:/path/to/a/real/file.mkv''' as File


2.
I guess TheMovieDB will give you a result even if only the query is a perfect match. Unfortunately, there's nothing I can do about strange behaviour in the query engine. That's one of the reasons why FileBot has its own search index.

Does TheMovieDB give results for that query? Because detectMovie(...) will give you results where TheMovieDB search results are combined with local index lookup (which can definitely give you results based on the name even if the year is wrong).


3.
Hard to explain without understanding the code...

NameSimilarityMetric is fuzzy string similarity (e.g. Hallo <=> Hello is similar):

Code:

* class net.filebot.similarity.NameSimilarityMetric: 0.5074627 = ???

StringEqualsMetric will give you extra points when the filename is exactly the same as the movie name:

Code:

* class net.filebot.media.MediaDetection$1: 0.0 = ???

NumericSimilarityMetric will give you extra points if numeric patterns are similar (e.g. "Best Movie of 1968" is similar to "2001 Space Odyssey 1968"):

Code:

* class net.filebot.media.MediaDetection$2: 1.3333334 = Year ?

SequenceMatchSimilarity is about a matching sequence of words anywhere in the word sequence (e.g. Bodyguard is similar to The Bodyguard):

Code:

* class net.filebot.similarity.SequenceMatchSimilarity: 0.6 = ???

Second SequenceMatchSimilarity is about a matching sequence of words at the beginning of the sequence (e.g. The Bodyguard is similar to The Bodyguard: Part Two but not Bodyguard):

Code:

* class net.filebot.similarity.SequenceMatchSimilarity: 0.0 = ???
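
For what it's worth, in the outputs posted earlier in this thread the overall score matches the plain average of the five printed values, and the year-based value seems to be printed with its 2x weight already applied (2.0 for a full match). Here is a minimal Groovy sketch that checks this against the two "The Bodyguard" results. The helper name `overallScore` is made up for illustration and is not part of FileBot, and this only reproduces the arithmetic seen in the output, not necessarily how the code computes it:

```groovy
// Hypothetical helper (not a FileBot API): average the per-metric values
// exactly as they were printed by the debugging script.
double overallScore(List<Double> values) {
    values.sum() / values.size()
}

// Values copied from the "The Bodyguard.2005.mkv" output in this thread
def withoutFolder = [0.6666667d, 0.0d, 2.0d, 0.8181818d, 0.0d]
// Values copied from the "The Bodyguard (2005)\The Bodyguard.2005.mkv" output
def withFolder = [0.5074627d, 0.0d, 1.3333334d, 0.6d, 0.0d]

println overallScore(withoutFolder)   // compare with the reported 0.6969697
println overallScore(withFolder)      // compare with the reported 0.48815918
```

The small differences in the last digits are likely just float rounding in the printed values.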

Post by kim »

I found a new one that needs a bit of work ;)

viewtopic.php?f=4&t=5416

https://www.themoviedb.org/search/movie ... &year=1999
Query Movie => [psycho 1999]
Rank [Psycho 1999] => [The Psychotic Odyssey of Richard Chase (1999), The Masked Strangler (1999), Kiss [1999] Psycho Circus in Buenos Aires (1999), Psycho (1998), Psycho (1960), Psycho Sisters (1998), Psycho Beach Party (2000), American Psycho (2000), The Maddening (1996)]
The Psychotic Odyssey of Richard Chase (1999) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.4566999
* class net.filebot.similarity.NameSimilarityMetric: 0.1904762
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 2.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.093023255
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
The Masked Strangler (1999) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.4501818
* class net.filebot.similarity.NameSimilarityMetric: 0.09090909
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 2.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.16
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
Kiss [1999] Psycho Circus in Buenos Aires (1999) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.3457041
* class net.filebot.similarity.NameSimilarityMetric: 0.25882354
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 1.3333334
* class net.filebot.similarity.SequenceMatchSimilarity: 0.13636364
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
Psycho (1998) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.17062938
* class net.filebot.similarity.NameSimilarityMetric: 0.30769232
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.54545456
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
Psycho (1960) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.16293707
* class net.filebot.similarity.NameSimilarityMetric: 0.26923078
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.54545456
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
Psycho Sisters (1998) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.10982456
* class net.filebot.similarity.NameSimilarityMetric: 0.23333333
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.31578946
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
Psycho Beach Party (2000) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.08342391
* class net.filebot.similarity.NameSimilarityMetric: 0.15625
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.26086956
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
American Psycho (2000) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.09934427
* class net.filebot.similarity.NameSimilarityMetric: 0.19672132
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.3
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
The Maddening (1996) <=> D:\movies\Psycho.1999.BRRip.XviD.MP3-RARBG.avi
0.013559322
* class net.filebot.similarity.NameSimilarityMetric: 0.06779661
* class net.filebot.media.MediaDetection$1: 0.0
* class net.filebot.media.MediaDetection$2: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
* class net.filebot.similarity.SequenceMatchSimilarity: 0.0
Result: [The Psychotic Odyssey of Richard Chase (1999), The Masked Strangler (1999), Kiss [1999] Psycho Circus in Buenos Aires (1999), Psycho (1998), Psycho (1960), Psycho Sisters (1998), Psycho Beach Party (2000), American Psycho (2000), The Maddening (1996)]

Release dates:
USA 4 December 1998
Singapore 31 December 1998
Germany 7 January 1999
http://www.imdb.com/title/tt0155975/rel ... =tt_ov_inf

Post by rednoah »

Future revisions will take year-off-by-one into consideration when ranking options and picking the best one.
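
As a rough illustration of why the size of that adjustment matters for the Psycho example posted above, here is a sketch that recomputes the Psycho (1998) score with a hypothetical credit for a year that is off by one. The function name `rescore` and the credit values 1.0 and 2.0 are assumptions for illustration only; nothing here reflects how FileBot implements or will implement this. It also assumes the overall score is the average of the five printed values, which is what the posted numbers suggest.

```groovy
// Hypothetical: average of the 5 printed metric values, with the year value
// replaced by an assumed credit for a year that is off by one.
double rescore(double name, double exact, double yearCredit, double seq, double prefixSeq) {
    (name + exact + yearCredit + seq + prefixSeq) / 5
}

// Psycho (1998) vs Psycho.1999.BRRip.XviD.MP3-RARBG.avi (year value was 0.0 in the log)
def halfCredit = rescore(0.30769232d, 0.0d, 1.0d, 0.54545456d, 0.0d)
def fullCredit = rescore(0.30769232d, 0.0d, 2.0d, 0.54545456d, 0.0d)

// Reported score of the current top result,
// The Psychotic Odyssey of Richard Chase (1999)
def currentTop = 0.4566999d

println "half credit: $halfCredit, beats current top: ${halfCredit > currentTop}"
println "full credit: $fullCredit, beats current top: ${fullCredit > currentTop}"
```

Under these assumptions, half credit would still leave Psycho (1998) below the current top result, while full credit would move it above. Other candidates that are off by one year (e.g. Psycho Sisters (1998)) would gain the same credit, so the name-based values would still decide between them.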