Hi,
My 'TV Shows' folder contains some duplicates, and I'd like to find them.
I've been running each show through filebot (amc), and it has been fixing filenames etc. In some cases, there were multiple copies of an episode. In that case, filebot renames one, then skips the rest because the destination filename already exists.
Since I'm trying to clean up the TV Shows folder, that's not quite what I want in this case. Could I get filebot to move these files somewhere? Or even just print something to the logs about them so I can grep later?
To be clear: I am interested in finding files that:
1) cannot be moved because the destination file already exists
2) are not "better" than the destination file
3) are not equal to the destination file
The only suggestion I saw in the forums is to have filebot move *everything* to a new TV Shows folder... that seems a little extreme. Is that really the preferred way?
Thanks!
Finding duplicates
Re: Finding duplicates
FileBot has no particular support for finding duplicates.
1.
If all files are tagged with xattr metadata (i.e. they have been processed with FileBot before), then finding duplicates takes seconds and nothing needs to be processed again. There's no shared script for that yet, but it's no more than a few lines of code, probably similar to the find-missing-episodes script.
2.
When processing all files and there are duplicates, you should be able to grep them from the output easily, since it'll say SKIPPED [/path/to/file]. In this case you could also make smart use of the {di} duplicate index binding and use it to move those files elsewhere.
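A grep along these lines might work. This is a sketch: it assumes you captured the amc output to a file named amc.log (e.g. via --log-file), and that skipped duplicates appear as the SKIPPED lines described above; the sample log content here is made up for illustration.

```shell
# Sample amc output in the shape described above; in practice you'd
# grep the real log captured with --log-file amc.log.
cat > amc.log <<'EOF'
[MOVE] from [/tv/Show.S01E01.a.mkv] to [/dest/Show - 1x01.mkv]
SKIPPED [/tv/Show.S01E01.b.mkv]
EOF

# List the files that were skipped because the destination already exists:
grep '^SKIPPED' amc.log
```

Alternatively, a custom --format using the {di} binding could route files with a duplicate index greater than 1 into a separate folder instead of skipping them, e.g. something like `{di == 1 ? 'TV Shows' : 'Duplicates'}/...` (illustrative only; adapt it to your own format).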
Re: Finding duplicates
Finding binary duplicates is easy and boring. There are a million tools that do it, and writing a simple one yourself takes 1-3 lines of bash.
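For instance, with GNU coreutils such a one-liner could look like this: hash every file, sort by checksum, and print the groups that share one.

```shell
# Hash every file under the current directory, sort by checksum, and keep
# only lines whose first 40 characters (the sha1 hash) repeat -- i.e. exact
# binary duplicates, printed in blank-line-separated groups.
# (-w and --all-repeated are GNU uniq options.)
find . -type f -exec sha1sum {} + | sort | uniq -w40 --all-repeated=separate
```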
Finding logical duplicates (e.g. some movie with different quality settings) and keeping the best one is much more interesting.
FileBot has got a script for that nowadays: viewtopic.php?t=3986
EDIT:
Find Duplicate Movie or Episode Files
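If I understand the linked topic correctly, the script is invoked something like this (a sketch; check the linked topic for the actual options and behavior):

```shell
# Scan a folder for duplicate movie or episode files with the
# duplicates script referenced above (requires FileBot installed):
filebot -script fn:duplicates "/path/to/TV Shows"
```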
