Enhance Movie Matching Accuracy with Runtime-Based Disambiguation

Post by **skyanchor** » 23 Mar 2025, 01:14

I appreciate the efficiency and functionality that Filebot offers in organizing media libraries. However, I've encountered challenges when renaming movies with identical titles but different release years. The current matching process sometimes selects incorrect versions, leading to inaccuracies.

To improve precision, I propose integrating **runtime-based disambiguation** into the matching algorithm. By comparing the runtime of local media files with metadata from sources like TMDb, Filebot can more accurately identify and match movies, especially those with identical titles released in different years.

Benefits:

Increased Accuracy: Reduces mismatches by considering the duration of the media file.
User Satisfaction: Minimizes manual corrections, enhancing the user experience.
Competitive Edge: Positions Filebot as a more intelligent and reliable media management tool.

I believe this enhancement would significantly benefit users who manage extensive and diverse movie libraries. Thank you for considering this feature request.

My apologies if this duplicates an existing request.

Here's a code sample:

Code:

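import tmdbsimple as tmdb  # assumption: `tmdb` refers to the tmdbsimple package; tmdb.API_KEY must be set first

def find_best_tmdb_match(movie_title, local_duration):
    """Return the search hit whose TMDb runtime (minutes) is closest to local_duration (minutes)."""
    search = tmdb.Search()
    search.movie(query=movie_title)  # populates search.results
    best_match = None
    smallest_diff = float('inf')

    for result in search.results[:5]:  # limit to first 5 results
        # Search hits carry no runtime, so each hit needs a full details request
        details = tmdb.Movies(result['id']).info()
        tmdb_runtime = details.get('runtime')
        if tmdb_runtime and local_duration:
            diff = abs(tmdb_runtime - local_duration)
            if diff < smallest_diff:
                smallest_diff = diff
                best_match = result

    return best_match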

Post by **rednoah** » 23 Mar 2025, 09:14

skyanchor wrote: ↑23 Mar 2025, 01:14 I appreciate the efficiency and functionality that Filebot offers in organizing media libraries. However, I've encountered challenges when renaming movies with identical titles but different release years. The current matching process sometimes selects incorrect versions, leading to inaccuracies.

Please paste sample file paths as text so we can have a look. The MediaInfo properties would be useful too so that we can confirm the runtime.

skyanchor wrote: ↑23 Mar 2025, 01:14 To improve precision, I propose integrating **runtime-based disambiguation** into the matching algorithm. By comparing the runtime of local media files with metadata from sources like TMDb, Filebot can more accurately identify and match movies, especially those with identical titles released in different years.

The problem here is that TMDB search results do not include the movie runtime for each hit, so we'd have to request full movie details for every single movie before we can make a decision, every single time, significantly slowing the process in the average case even though there is no benefit most of the time. Limiting full movie detail requests to only certain options might help with disambiguation, depending on the case. Probably worth doing if it's necessary to find the correct match, but ideally not something we do for every single file.
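
The trade-off described above can be sketched as follows. This is only an illustration, not FileBot's implementation: `pick_by_runtime`, the `hits` structure and the `fetch_runtime` callback are hypothetical names, and the sample data is made up. The idea is that a full details request is issued only for hits that share the exact queried title, and not at all when the title is unambiguous.

```python
from typing import Callable, Dict, List, Optional


def pick_by_runtime(
    title: str,
    local_minutes: int,
    hits: List[Dict],
    fetch_runtime: Callable[[int], Optional[int]],
) -> Optional[Dict]:
    """Return the best hit, requesting full details only when hits are ambiguous."""
    # Keep only hits whose name equals the query (case-insensitive).
    same_name = [h for h in hits if h["title"].casefold() == title.casefold()]
    if len(same_name) <= 1:
        # Unambiguous (or nothing matched): no detail requests are made.
        return same_name[0] if same_name else None

    # Ambiguous: one detail request per same-named hit, and no more.
    best, best_diff = None, None
    for hit in same_name:
        runtime = fetch_runtime(hit["id"])
        if runtime is None:
            continue
        diff = abs(runtime - local_minutes)
        if best_diff is None or diff < best_diff:
            best, best_diff = hit, diff
    return best


# Made-up search hits and runtimes, for illustration only.
hits = [
    {"id": 1, "title": "Example"},
    {"id": 2, "title": "Example"},
    {"id": 3, "title": "Example Two"},
]
runtimes = {1: 95, 2: 121, 3: 88}
requested = []


def fake_fetch(movie_id):
    requested.append(movie_id)  # record every simulated detail request
    return runtimes[movie_id]


match = pick_by_runtime("Example", 120, hits, fake_fetch)
print(match["id"], requested)  # 2 [1, 2]
```

Only the two same-named hits trigger a detail request here; the third hit is never looked up.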

Matching "Coma (2019).mkv" to Coma (2020) and not Coma (2019) would be a prime example where taking the movie runtime into consideration would be helpful, but letting a movie runtime match overtake a movie year match could be a slippery slope that might do more harm than good overall. The video file duration notably doesn't always match the movie runtime on TMDB; it could easily be off by +20 minutes, maybe more, due to different editions, different cuts, etc. More research and more test cases are needed here.

If you're using the CLI then you could use the --filter option to preemptively exclude short movies from being considered for matching, assuming that you know in advance that you're only going to be processing movies longer than 50 minutes:

Shell:

--filter "runtime > 50"

tl;dr using the movie runtime is a good idea in general; please paste sample file paths as text (the more the better) so we can have a look at it in detail.

If you are generating / rewriting replies with ChatGPT feel free to paste your initial human-handwritten text / prompt alongside the AI-generated text.

Post by **skyanchor** » 23 Mar 2025, 22:19

Thanks for your detailed response. I appreciate the careful thought you've put into this.

I understand your concerns, but I'd like to directly address each of them:

Performance Concerns:

I completely agree that fetching detailed metadata (including runtime) from TMDb for every single search result would be inefficient. However, this overhead can be minimized effectively by only fetching detailed data when multiple ambiguous matches occur (same or similar titles without provided years). This targeted approach should alleviate performance impacts significantly.

Accuracy Concerns:

I acknowledge that runtime discrepancies between editions (theatrical, extended, director's cuts) do exist, but these extreme runtime variations (±20 minutes or more) are relatively uncommon in practice. A reasonable threshold (±5 minutes, for example) could handle the vast majority of cases effectively without causing unintended mismatches.

Slippery Slope Concerns:

I'm not suggesting that runtime disambiguation replace or override existing year-based matching logic. Rather, I'm advocating that runtime be leveraged as an additional, secondary disambiguation factor specifically in cases where the filename lacks a provided year and multiple movies share identical or closely similar titles. This supplementary use of runtime data should improve accuracy significantly without compromising existing functionality.

My ultimate goal here is straightforward: ensuring that Filebot returns consistently formatted filenames that include the correct release year. This is critical because media software like Plex relies heavily on correctly formatted filenames—including release years—for accurate metadata retrieval.

Thanks again for your openness to consider this enhancement. Here is some sample pseudocode that may help to clarify:

Code:

function find_movie_year(title, local_runtime, filename_has_year, filename):
    if filename_has_year:
        return apply_existing_logic(title)

    candidates = tmdb_search(title)

    if candidates.length == 1:
        return candidates[0].year

    # Fetch detailed runtimes for ambiguous cases only
    for candidate in candidates:
        detailed_info = fetch_tmdb_details(candidate.id)
        candidate.runtime = detailed_info.runtime  # stored for the tolerance filter below
        candidate.runtime_difference = abs(candidate.runtime - local_runtime)

        # Immediately prioritize exact runtime match (no commercials)
        if candidate.runtime_difference == 0:
            return candidate.year

    # Check if filename explicitly mentions "[Commercials]"
    if "[Commercials]" in filename:
        adjusted_runtime = local_runtime - 15  # conservative commercial estimate
    else:
        adjusted_runtime = local_runtime

    # Expanded tolerance accounts for commercials, cuts, etc.
    tolerance = 15
    filtered_candidates = filter candidates where abs(candidate.runtime - adjusted_runtime) <= tolerance

    if filtered_candidates.length == 1:
        return filtered_candidates[0].year
    else:
        return apply_existing_disambiguation_logic(candidates)
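
For readers who want to experiment, here is a minimal runnable Python sketch of the pseudocode above. It is not FileBot code. The `Candidate` class and the `fetch_runtime` callback are hypothetical stand-ins for the TMDb lookups, and the sample values are made up. Two deliberate differences from the pseudocode: the branch for filenames that already contain a year is left to the caller, and an exact runtime match only wins if exactly one candidate has it. A return value of `None` means runtime could not decide and the existing logic should take over.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    tmdb_id: int
    year: int
    runtime: Optional[int] = None  # minutes, filled in from full details


def find_movie_year(
    candidates: List[Candidate],
    local_runtime: int,
    filename: str,
    fetch_runtime: Callable[[int], Optional[int]],
    tolerance: int = 15,
) -> Optional[int]:
    """Return the release year if runtime singles out one candidate, else None."""
    if len(candidates) == 1:
        return candidates[0].year  # unambiguous, no detail requests needed

    # Ambiguous: fetch the runtime of every candidate.
    for c in candidates:
        c.runtime = fetch_runtime(c.tmdb_id)
    known = [c for c in candidates if c.runtime is not None]

    # A unique exact match wins outright.
    exact = [c for c in known if c.runtime == local_runtime]
    if len(exact) == 1:
        return exact[0].year

    # Otherwise allow a tolerance, after subtracting a commercial estimate.
    adjusted = local_runtime - 15 if "[Commercials]" in filename else local_runtime
    close = [c for c in known if abs(c.runtime - adjusted) <= tolerance]
    return close[0].year if len(close) == 1 else None


# Made-up candidates and runtimes, for illustration only.
runtimes = {1: 100, 2: 112}
pair = [Candidate(1, 1990), Candidate(2, 2010)]
print(find_movie_year(pair, 112, "Example.mkv", runtimes.get))  # 2010
print(find_movie_year(pair, 106, "Example.mkv", runtimes.get))  # None
```

The second call shows why a wide tolerance is risky: with ±15 minutes both candidates qualify, so runtime alone cannot pick one.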

Post by **skyanchor** » 23 Mar 2025, 23:28

FYI: Here's the entire ChatGPT thread which includes the above:
https://chatgpt.com/share/67e08e84-2d8c ... 85e0240013

Post by **rednoah** » 24 Mar 2025, 06:19

Are you using the GUI or the CLI?

ChatGPT wrote: Provide specific examples, such as “Coma (2019).mkv” vs. “Coma (2020)” with actual runtime data from MediaInfo and TMDb. Real-world examples significantly bolster your case.

Please paste sample file paths as text so we can have a look. The MediaInfo properties would be useful too so that we can confirm the runtime. We really really need file paths for testing. If you cannot provide file paths for some reason, please say so. You can send a private message if you prefer that.

You can use the filebot tool to print the file path and video duration like so:

Console Output:

$ filebot -mediainfo -r .
/volume1/Media/Test/Coma.mkv [640x480 x264 2.0 AC3 0∶23 2.0 Mbps]

Please run this command on your test folder and paste the output, so that we can recreate your use case with sample files and run tests.

EDIT:

ChatGPT wrote: No, I won't provide examples.

I need 1 use case at the very least to implement and test the new feature. That way I can test if the newly written code actually works.

Post by **skyanchor** » 24 Mar 2025, 06:52

Thanks for your reply. While I don't have a specific file in front of me at the moment, here's a representative example to illustrate the issue:

Code:

Filename: Movies/Crash.mp4

TMDb matches:

Crash (1996) — 100 min

Crash (2004) — 112 min

If the file’s actual runtime (e.g., from MediaInfo) is 112 minutes, it’s likely the 2004 film—but without a year in the filename, Filebot may select the wrong version. My suggestion is that, in these ambiguous cases, Filebot could fetch full metadata only when needed and prefer matches that fall within a small runtime tolerance (e.g., ±5 min). This would make matches more accurate while minimizing performance overhead.
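
As a quick worked check of the ±5 minute idea, using only the two runtimes quoted in the example above (the `within_tolerance` helper is a hypothetical name, not part of FileBot):

```python
def within_tolerance(file_minutes, candidates, tolerance=5):
    """Return the (name, runtime) pairs whose runtime is within ±tolerance minutes."""
    return [(name, rt) for name, rt in candidates if abs(rt - file_minutes) <= tolerance]


# Runtimes as quoted in the example above.
crash = [("Crash (1996)", 100), ("Crash (2004)", 112)]

print(within_tolerance(112, crash))  # [('Crash (2004)', 112)]
print(within_tolerance(106, crash))  # [] (both candidates are 6 minutes off)
```

When the filter leaves zero candidates or more than one, runtime cannot settle the match and some other signal is needed.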

I’d also add that this issue is only going to become more common. More and more films are being released with the same titles—intentionally or not—and we already have numerous titles (e.g., Coma, Cinderella, Crash, Possession) that have been reused multiple times across decades. Some have three or more distinct entries under the same name on TMDb. Runtime is one of the few reliable signals still available when filenames don’t include the year.

Thanks again for considering this!

Post by **rednoah** » 24 Mar 2025, 07:26

skyanchor wrote: ↑24 Mar 2025, 06:52 I don't have a specific file in front of me at the moment

Please keep us posted with real-world examples as you find them. You can post the filebot -mediainfo output here or via private message. That goes for any kind of mismatch you find actually. The more test data that we have, the better we can test and fine-tune the logic.

Alternatively, a quick and dirty solution you could do right now, no need to wait for future updates, is the --filter option:
NOTE: we will assume that you're using the CLI for unattended automation, and not the GUI, since you did not respond to this question one way or another

1. The -list command allows us to see what we're working with:

Console Output:

$ filebot -list --q Crash --db TheMovieDB -non-strict --format "{ny} | {runtime}"
Crash (2005) | 112
Crash! (1976) | 89
Crash (1996) | 100
Honeymoon Crasher (2025) | 93
Crash (1990) | 8

2. The --filter option allows us to exclude all the known-bad search results based on known movie name and known runtime:

Console Output:

$ filebot -rename Crash.mp4 --db TheMovieDB --filter "n == /Crash/ && runtime == 112" -non-strict --action TEST --log INFO
[TEST] from [Crash.mp4] to [Crash (2005).mp4]

3. You could then write a script to call different filebot commands for different files as part of your process, perhaps in multiple passes. Pass 1 requires an exact name / runtime match. Pass 2 allows for a ±5 min runtime mismatch. That way you might be able to deal with the low-hanging fruit automatically, so you can then focus on the trickier, less easily identified files.

I'd usually use bash for this kind of use case, but since you seem to be using Windows - and this is the FileBot Forums - let's try this with a filebot -find -exec filebot merry-go-round:

Console Output:

$ filebot -find . -exec filebot -rename '{f}' --db TheMovieDB --filter 'n == /{fn}/ && runtime == {minutes}' -non-strict --action TEST --log INFO
Crash.mp4
Rename movies using [TheMovieDB]
Auto-detect movie from context [Crash.mp4]
Apply filter [n == /Crash/ && runtime == 112] on [33] options
Include [Crash (2005)]
[1] option remaining
[TEST] from [Crash.mp4] to [Crash (2005).mp4]
Processed 1 file
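
The two-pass idea from step 3 could be scripted roughly like this. It is a sketch under assumptions: it only builds and prints the command lines, reusing the options shown above, and the helper names `build_filter` and `build_command` are hypothetical. The range form of the filter for the second pass assumes that `--filter` accepts ordinary comparison operators such as `>=` and `<=`, in the same way as the `runtime > 50` example; that has not been verified here. Movie names containing a `/` would also break the `n == /.../` pattern.

```python
import shlex
from typing import List


def build_filter(name: str, minutes: int, tolerance: int = 0) -> str:
    """Build a --filter expression: exact runtime, or a ±tolerance range."""
    if tolerance == 0:
        return f"n == /{name}/ && runtime == {minutes}"
    low, high = minutes - tolerance, minutes + tolerance
    return f"n == /{name}/ && runtime >= {low} && runtime <= {high}"


def build_command(path: str, name: str, minutes: int, tolerance: int = 0) -> List[str]:
    """Build the filebot argument list for one file and one pass."""
    return [
        "filebot", "-rename", path,
        "--db", "TheMovieDB",
        "--filter", build_filter(name, minutes, tolerance),
        "-non-strict", "--action", "TEST", "--log", "INFO",
    ]


# Pass 1: exact runtime match. Pass 2: allow a ±5 minute mismatch.
for tolerance in (0, 5):
    print(shlex.join(build_command("Crash.mp4", "Crash", 112, tolerance)))
```

A real script would run each command (for example with `subprocess.run`) and only move on to the second pass for files that the first pass left unmatched.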

Post by **rednoah** » 27 Mar 2025, 12:09

FileBot r10530 (see Latest Beta Revisions and Release Candidates) adds enhanced support for matching media info from the file with movie info from the database (e.g. runtime minutes) to help put the correct movie match on top.

e.g.

Console Output:

$ filebot -rename -r . --db TheMovieDB -non-strict --format "{ny} [{minutes} minutes @ {runtime} runtime]" --action TEST --log INFO
[TEST] from [b/Crash.mkv] to [b/Crash (1996) [100 minutes @ 100 runtime].mkv]
[TEST] from [a/Crash.mkv] to [a/Crash (2005) [112 minutes @ 112 runtime].mkv]

That said, we are currently limited to a single real-world test case, so much *much* more testing is required.

EDIT:

[TMDB/Movie] Fix incorrect tmdb match provides multiple real-world file paths with media info tables that support this use case.

Post by **skyanchor** » 07 Apr 2025, 06:28

"I'm curious about the movie titles that have been most re-used over time?"

ChatGPT said:
"You're absolutely right—the reuse of movie titles across different years is a common practice in the film industry, often leading to multiple films sharing the same title across decades. Here are some notable examples:

"Crash":

Crash (1996)
Crash (2004)

"The Black Cat":

The Black Cat (1934)
The Black Cat (1941)

"Crime and Punishment":

Crime and Punishment (1935)
Crime and Punishment (1956)

"The Emperor's Candlesticks":

The Emperor's Candlesticks (1936)
The Emperor's Candlesticks (1937)

"The Three Musketeers":

The Three Musketeers (1921)
The Three Musketeers (1939)
The Three Musketeers (1973)
The Three Musketeers (1993)
The Three Musketeers (2011)

"The Wizard of Oz":

The Wizard of Oz (1925)
The Wizard of Oz (1933)
The Wizard of Oz (1939)

"The Blue Bird":

The Blue Bird (1918)
The Blue Bird (1940)
The Blue Bird (1976)

"Journey into Fear":

Journey into Fear (1943)
Journey into Fear (1975)

"The Last House on the Left":

The Last House on the Left (1972)
The Last House on the Left (2009)

"Gone in 60 Seconds":

Gone in 60 Seconds (1974)
Gone in 60 Seconds (2000)

These examples illustrate how the same title can be used for different films over time, sometimes leading to confusion when organizing or searching for specific movies. Implementing runtime-based disambiguation, as previously discussed, can help accurately identify and differentiate between such films."

Post by **skyanchor** » 07 Apr 2025, 06:33

"And, as I recall, there are multiple films made in the 21st century that have named doubles or triples as well?"

ChatGPT said:
"You're absolutely correct—the reuse of movie titles has continued into the 21st century, leading to multiple films sharing identical names. Here are some notable examples:

"Twilight":

Twilight (1998)
Twilight (2008)

"Alone":

Alone (2007)
Alone (2015)

"Funny Games":

Funny Games (1997)
Funny Games (2007)

"The Fast and the Furious":

The Fast and the Furious (1955)
The Fast and the Furious (2001)

These instances highlight the ongoing trend of reusing film titles, which can lead to confusion when organizing or searching for specific movies. Implementing runtime-based disambiguation, as previously discussed, can help accurately identify and differentiate between such films."

Post by **skyanchor** » 07 Apr 2025, 06:46

I had asked ChatGPT these questions and others weeks ago, when I submitted my request, but you apparently didn't read far enough in the chat to see them, so I included a couple of the replies above where they're hopefully easier to refer back to. Just earlier this evening, I came across another example when I was reading the following article:

https://variety.com/2025/film/global/ch ... 236358586/

The new film is titled "Blame", and I went on imdb.com to see if I could find more information on it. Unfortunately, the 2025 film by that name isn't listed there yet, but there are a few other earlier films titled "Blame", which will make this at least a quadruple-use of a title as soon as the imdb database gets updated. If you include TV movies and short films, the number of duplicate titles is much larger:

https://www.imdb.com/find/?q=blame&s=tt ... =fn_ttl_ex

Post by **rednoah** » 07 Apr 2025, 07:09

I would need the real-world MediaInfo properties as well please. ChatGPT cannot generate that for you. You will need to actually have the file, and then copy & paste the MediaInfo properties either here or on pastebin.

The information provided by toddke was complete and thus extremely helpful. Note that (1) file name, (2) TMDB link and (3) MediaInfo properties are included. That is the information I absolutely need to create an automated test case. If you find more cases, please do make sure to include file name, TMDB link and MediaInfo properties, and share them like so:

toddke wrote: ↑02 Apr 2025, 14:02 Jackpot! (2024)/Jackpot! (2024) - 2160p - WEB-DL - FLUX.mkv; tmdb 1094138. Here is the mediainfo for Jackpot! (2024).

EDIT:

This will be a good test case. Please do provide (1) file name, (2) TMDB link and (3) MediaInfo properties when the movie is actually released (i.e. TMDB link) and you actually have the video file (i.e. file name and MediaInfo properties) at hand:

skyanchor wrote: ↑07 Apr 2025, 06:46 The new film is titled "Blame", and I went on imdb.com to see if I could find more information on it. Unfortunately, the 2025 film by that name isn't listed there yet, but there are a few other earlier films titled "Blame", which will make this at least a quadruple-use of a title as soon as the imdb database gets updated.
