How to avoid redundant, duplicate hardlinks?


Post by AbedlaPaille »

rednoah wrote: 14 Sep 2016, 05:33
Description:
Finding duplicate movie or episode files is easy once all your content has been renamed and xattr tagged with FileBot. This script allows you to view or delete duplicates.


List Logical Duplicates:

Code: Select all

filebot -script fn:duplicates /path/to/files
:idea: There are logical duplicates when multiple files refer to the same movie or episode (e.g. 1080p and 4K versions of the same movie).



List Binary Duplicates:

Code: Select all

filebot -script fn:duplicates --mode binary /input /output /backup
:idea: There are binary duplicates when multiple files are identical byte for byte (i.e. copies of the same file).

:!: NOTE: FileBot 4.9.2 (r7667) or higher is required to efficiently deduplicate links.


Options:
--action delete         delete duplicate files
-mediainfo <options>    perform the -mediainfo command on duplicate files
-rename <options>       perform the -rename command on duplicate files
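
For example, review the duplicate report first, then repeat the same command with --action delete once the results look right (path is a placeholder):

Code: Select all

# placeholder path: point this at your own library
filebot -script fn:duplicates /path/to/files
filebot -script fn:duplicates --action delete /path/to/files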


Advanced Options:
--order input      sort duplicates by input argument order (default in --mode binary)
--order quality    sort duplicates by highest quality first (default in --mode logical)
--order date       sort duplicates by least recently created (according to Media Encoding Date or File Creation Date)
--order time       sort duplicates by least recently modified (according to File Last-Modified Date)
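
For instance, in --mode binary the argument order decides which copy survives, since the first occurrence is kept (paths are placeholders):

Code: Select all

# placeholder paths: the structure listed first is the one that is kept
filebot -script fn:duplicates --mode binary --order input --action delete /new/structure /old/structure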


Notes:
  • In --mode logical, as per --order quality, the --action delete option will keep only the one file with the highest video quality.
  • In --mode logical, this script will not parse or guess any information from the filename. Files that do not contain xattr metadata will be ignored.
  • In --mode binary, as per --order input, the --action delete option will keep only the first occurrence of a file according to the input argument order.
  • In --mode binary, this script may read files partially or entirely to compute checksums, which are then stored as xattr to speed up subsequent runs.
Say I have a hardlink library organized for browsing by movie rating, and those ratings can change. When FileBot rescans my Plex master structure to create hardlinks, it formats the new hardlinks with the updated rating (which I want) but doesn't delete the older ones (which I'd also want).

Can --order time be altered to keep the most recently modified file instead of the least recently modified one?

I always want the new location/name to be kept while the old one gets deleted.

I realize that after hardlinking I should run a second pass with fn:duplicates on that hardlink location, but I'm not sure what the arguments should be. I always want to keep the best quality, and when duplicates are binary copies I want to keep the new file/location format.
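
My best guess is something like this two-pass run (paths are placeholders; the binary-mode arguments are the part I'm unsure about):

Code: Select all

# placeholder path: keep only the highest-quality logical duplicate
filebot -script fn:duplicates --mode logical --order quality --action delete /path/to/hardlinks
# only list binary duplicates for now; not sure how to make it prefer the new location
filebot -script fn:duplicates --mode binary /path/to/hardlinks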

Re: How to avoid redundant, duplicate hardlinks?

Post by rednoah »

Why not delete the entire secondary structure at the root folder, and then re-generate it with updated movie ratings?

Re: How to avoid redundant, duplicate hardlinks?

Post by AbedlaPaille »

Because my library is big (800) and I'm managing a lot of hardlink structures (10), most based on complex formats. I have been deleting and re-generating, but manually, every month or so, and it takes a while.

Here I'm trying to automate my hardlink creation/management as a follow-up to my qBittorrent amc post-process, so disk usage and processing power grow in importance. Some of those hardlink structures already use --filter to save that same precious time/energy combo, and I'm trying to go further.

Re: How to avoid redundant, duplicate hardlinks?

Post by rednoah »

So you're saying you have a complex, slow format for first-time processing, and then another simple, fast format that primarily copies over the current file path and specifically updates the movie rating?


:idea: Updating the movie rating does require a network request, so it will always be relatively slow. That's not something we can get around either way.


:idea: Files that have already been processed and xattr-tagged can be re-processed relatively quickly via Local Xattr Mode. The updated movie rating will still require a slow online request though.
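
e.g. a minimal sketch of such an update pass, assuming your files are already xattr-tagged (the format and paths are placeholders):

Code: Select all

# placeholder format and paths: metadata is read back from xattr instead of a full online lookup
filebot -rename -r /path/to/hardlinks --db xattr --action move --format '/path/to/hardlinks/{rating}/{ny}/{fn}'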


Well, if you already have a folder full of hardlinks, then you can just move those hardlinks in your update pass. You will end up with empty folders as your files / hardlinks are moved into a new structure (assuming that folder names change as movie ratings change), but the cleaner script can take care of those easily. (NOTE: The GUI will also auto-delete left-behind empty folders, but it's very conservative about it, to make sure you're not accidentally deleting important hidden files and such.)
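
e.g. clean up left-behind clutter and empty folders afterwards (path is a placeholder):

Code: Select all

# placeholder path: deletes clutter files and left-behind empty folders
filebot -script fn:cleaner /path/to/hardlinks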

Re: How to avoid redundant, duplicate hardlinks?

Post by AbedlaPaille »

Interesting idea. Great for a manual pass every once in a while.

The problem is that MOVE isn't compatible with new additions from the master structure.

So I'm stuck with hardlinks/duplicates for post-processing new additions. Can't I make fn:duplicates pick/keep the most recently created file/folder in the case of binary duplicates?

Re: How to avoid redundant, duplicate hardlinks?

Post by rednoah »

Have you tried the recently added --order time option yet?
viewtopic.php?p=23171#p23171

e.g. delete the least recently modified binary duplicates:

Code: Select all

filebot -script fn:duplicates --mode binary --order time --action delete /a /b /c
:!: Run without --action delete first to make sure it works the way you want before deleting files.
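
e.g. the same command without --action delete just lists the duplicates so you can verify the result first (same placeholder paths):

Code: Select all

# same arguments minus --action delete: list duplicates only, delete nothing
filebot -script fn:duplicates --mode binary --order time /a /b /c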

Re: How to avoid redundant, duplicate hardlinks?

Post by AbedlaPaille »

Amazing, I thought it did the opposite!

Looks like I'm all set up. Thanks for your help.