Duplicate Movie Removal Automation?

Running FileBot from the console, Groovy scripting, shell scripts, etc
nostalgicstone
Posts: 3
Joined: 11 Aug 2020, 22:13

Duplicate Movie Removal Automation?

Post by nostalgicstone »

I have been looking for a solution to this for months.

Basically what I am looking for is a way to automate the process of deleting older versions of movies. Let me explain.

In plex, when I download a newer copy of a movie it shows up as a duplicate. That makes sense, if I go to the folders I do indeed have two copies of the movie. I can manually delete the older movies one at a time after hunting them down, but with 6k plus movies that’s going to take way more time and labor than I’m willing to put in.

I tried FileBot's “filebot -script fn:duplicates /path/to/files” but it only lists a tiny fraction of the actual duplicates, and its options do not allow for “delete oldest copy” … only “Delete lower quality” … which is not what I’m looking for.

For example, I have two copies of the cartoon Aladdin. Both are the same resolution but one has a slightly lower bitrate, in this case the newer copy. The reason I want to keep the one with the lower bitrate is that its file size is less than half of the other's, with no perceptible difference.

Can anyone point me in the right direction for finding duplicates and deleting the older one? (Last modified date).

Thanks!
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Re: Duplicate Movie Removal Automation?

Post by rednoah »

If you use a format such as {plex} then duplicates will naturally disappear because different files will coalesce to the same destination file path, and in combination with --conflict override that will result in the most recent files overwriting any files that came before, leaving only the most recent.


filebot -script fn:duplicates indeed only works for files that have been processed with FileBot already and thus have xattr metadata readily available. Please read Metadata and Extended Attributes for details.


There are many ways to go about it, but if only a tiny fraction of your files have xattr metadata, then that eliminates all the easy approaches.
:idea: Please read the FAQ and How to Request Help.

Re: Duplicate Movie Removal Automation?

Post by nostalgicstone »

Thank you for the response, but in Plex this is not the behavior I am seeing. I have hundreds of movie folders that have duplicate movies in them, one old and one new. In the Plex interface it shows "2" at the top left of the movies, indicating that there are 2 copies. There is no easy way within Plex to remove more than one copy at a time. The delete option does not give enough info about the file for me to discern which one is newer or older, and the delete process itself is very slow, one file at a time.

I have read in multiple posts about the xattr data that FileBot needs to find the movies, and yes, every single one of mine has been processed by FileBot, yet I am still only finding 16 duplicates via that script. What's more, even if that script COULD find all the duplicates, it deletes the "lower quality" version, which is not conducive to my goals.

I am a complete noob to filebot as I only recently switched from the old, free version that I used to a limited capacity to the newest paid version.

Can someone please help me out with this from a complete beginners standpoint?

Thanks again!

Re: Duplicate Movie Removal Automation?

Post by rednoah »

1.
nostalgicstone wrote: 12 Aug 2020, 19:09 Thank you for the response, but in plex this is not the behavior I am seeing.
If you use the {plex} format then it's not possible to have duplicates on a file system level (i.e. the same movie twice with the same file name), regardless of the media center software.

e.g. {plex} will always give the same path for the same movie, regardless of media info, and so the second occurrence of the movie will overwrite the first occurrence, neatly eliminating duplicates:

$ filebot -rename *.mkv --db TheMovieDB --action COPY --format {plex} --conflict OVERRIDE
Rename movies using [TheMovieDB]
Auto-detect movie from context [Avatar.2009.HD.mkv]
Auto-detect movie from context [Avatar.2009.SD.mkv]
[COPY] from [Avatar.2009.HD.mkv] to [Movies/Avatar (2009)/Avatar (2009).mkv]
[OVERRIDE] Delete [Movies/Avatar (2009)/Avatar (2009).mkv]
[COPY] from [Avatar.2009.SD.mkv] to [Movies/Avatar (2009)/Avatar (2009).mkv]
Processed 2 files
:idea: You're probably using a custom format that preserves quality information in the file name, thus allowing you to have logical duplicates (i.e. the same movie twice with different file names).



2.
nostalgicstone wrote: 12 Aug 2020, 19:09 I have read in multiple posts about the xattr data that FileBot needs to find the movies, and yes, every single one of mine has been processed by FileBot, yet I am still only finding 16 duplicates via that script.
Metadata and Extended Attributes can be finicky and strongly rely on underlying file system capabilities. If you're exclusively working with a local file system, then Extended Attributes will just work. But once remote network shares are involved, or files are copied around different file systems by different programs, etc., the Extended Attributes will get silently lost at some point.

:idea: You can use filebot -script fn:xattr /path/to/files to view xattr metadata if available, but if the extended attributes were either never stored due to file system limitations, or just got lost later on, then you're out of luck in this regard.

nostalgicstone wrote: 12 Aug 2020, 19:09 Even if that script COULD find all the duplicates, it deletes the "lower quality" version, which is not conducive to my goals.
If that script could find all the duplicates, then modifying that script to "delete older files" instead of "lower quality files" would be a matter of seconds:
https://github.com/filebot/scripts/blob ... groovy#L48



3.
nostalgicstone wrote: 12 Aug 2020, 19:09 Can someone please help me out with this from a complete beginners standpoint?
Unfortunately, there is no built-in one-click solution that I can think of for your particular circumstances and requirements at this point in time.







EDIT: Since find -exec doesn't make it easy to list files in chronological order, this turns out to be much trickier than previously assumed. Doable, but not easy.

:arrow: Personally, I'd just run the amc script (with default settings which includes {plex} as default format) on all your movies (in order of oldest to newest file; this part might be tricky and require some bash scripting) and then process them all into a brand new structure. Since we're using --action duplicate your original structure will remain untouched and the new structure (assumed to be on the same file system) won't use additional disk space because it's all hardlinks. The new structure can't have duplicates because of (1) and will have xattr (2) as an added bonus. Now you just point Plex at the new structure instead of the old one and it's done.
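
A minimal sketch of the "oldest to newest" listing mentioned above, assuming GNU find (for -printf) and file names without embedded newlines. The function name list_oldest_first and the list of extensions are illustrative only and not part of FileBot:

```shell
# List video files under a directory, oldest modification time first.
# Assumption: GNU find is available (-printf is not POSIX).
list_oldest_first() {
  local dir="$1"
  find "$dir" -type f \( -iname '*.mkv' -o -iname '*.mp4' -o -iname '*.avi' \) \
    -printf '%T@\t%p\n' \
    | LC_ALL=C sort -n \
    | cut -f2-
}
```

Each line of the output could then be handed to FileBot in turn, so that newer files are processed last and win under --conflict override.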

Re: Duplicate Movie Removal Automation?

Post by nostalgicstone »

This is all good info, thanks a ton.

A couple of details - when you said:
"e.g. {plex} will always give the same path for the same movie, regardless of media info, and so the second occurrence of the movie will overwrite the first occurrence, neatly eliminating duplicates:"

- I do indeed have the same movie twice, with the same file name. Could it be because one is an MKV and one is an H.265 MP4?


I am not familiar with the ins and outs of the " {plex} format" so thank you very much for the link, I will be checking it out.

When you said the following:
"If that script could find all the duplicates, then modifying that script to "delete older files" instead of "lower quality files" would be a matter of seconds:
https://github.com/filebot/scripts/blob ... groovy#L48 "

I checked that out and I can't make heads or tails of it. I'm assuming that if I understood how the scripts work and how to write them, I could modify them myself but I don't.

Regardless thank you for the info, maybe I can find someone to help me write the script for filebot?

Re: Duplicate Movie Removal Automation?

Post by rednoah »

A.mkv and A.mp4 are two different file names for all intents and purposes, even though some operating systems may choose to hide file extensions from users.

That does throw a spanner in the works though. If the extension makes each file path unique, then you will end up with one file per file extension. I guess this approach won't work for you either.
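
One way to spot this case outside of FileBot is to group files by their path without the extension and report every copy except the newest. A rough sketch, assuming GNU find and awk, paths without tabs or newlines, and a hypothetical function name list_older_copies; it only prints candidates and deletes nothing:

```shell
# Print every file that shares its path-minus-extension with a newer file.
# Assumption: GNU find is available (-printf is not POSIX).
list_older_copies() {
  local dir="$1"
  find "$dir" -type f \( -iname '*.mkv' -o -iname '*.mp4' -o -iname '*.avi' \) \
    -printf '%T@\t%p\n' \
    | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr \
    | awk -F '\t' '{
        stem = $2
        sub("\\.[^./]*$", "", stem)   # drop the final extension only
        if (seen[stem]++) print $2    # newest copy came first; the rest are older
      }'
}
```

Review the list before removing anything. Files with identical modification times are ordered arbitrarily, so either copy may be reported in that case.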

Re: Duplicate Movie Removal Automation?

Post by rednoah »

The duplicates script has been augmented with configurable sort orders:
https://github.com/filebot/scripts/comm ... c3a368ecc8

Default for --mode binary:

--order input
Default for --mode logical:

--order quality
Sort by Least Recently Modified in either --mode binary or --mode logical:

--order time
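
Putting the options above together, a listing of logical duplicates sorted by modification time might look like the following. The wrapper function find_duplicates_by_time and the DRY_RUN switch are hypothetical conveniences; only the FileBot options named in this thread are used, and it is worth checking which file the script treats as the one to keep before letting it delete anything:

```shell
# Hypothetical wrapper around the duplicates script.
# With DRY_RUN=1 it prints the command instead of running it.
find_duplicates_by_time() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "filebot -script fn:duplicates --mode logical --order time $*"
  else
    filebot -script fn:duplicates --mode logical --order time "$@"
  fi
}
```

For example, DRY_RUN=1 find_duplicates_by_time /path/to/files shows the exact command line without touching any files.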