Feature Request: Find and Remove Duplicates without using xattr

All your suggestions, requests and ideas for future development
oppositelock
Posts: 3
Joined: 18 Dec 2018, 17:30

Post by oppositelock »

Would it be possible to have a script that handles duplicates without using xattr? Xattr isn't supported by tools like rclone or GDFS, so there currently isn't a good way to handle duplicates on mounted cloud storage. I think scoring file attributes, similar to the way this Python script (plexdupes.py) does it, would be ideal. Giving users the option to change how certain attributes are scored would be a great addition, but isn't strictly needed. MediaInfo provides the necessary information, so I'm hopeful this wouldn't be too difficult to implement. The same logic could also be used with the --conflict auto option.
rednoah
The Source
Posts: 22923
Joined: 16 Nov 2011, 08:59
Location: Taipei

Post by rednoah »

1. I'm somewhat reluctant, as that would necessarily imply running movie / series / episode auto-detection on each and every file, each and every time that script is run. The plexdupes.py script queries the Plex database, and won't work if you don't have Plex. Kinda like how our existing duplicates script only works for files previously identified by FileBot and thus xattr tagged with full metadata.

:idea: Support for a different "storage engine" for xattr metadata might be interesting here though, so you can switch between filesystem xattr and some other form of storing and retrieving xattr for specific files.


EDIT:

:idea: rclone and gdfs provide a virtual filesystem, so they can handle xattr read/write requests, and store them in some extra hidden file / folder. Maybe a feature request to support xattr would be good, to show the developers that users would be interested in that.



2. The scoring in plexdupes.py is simple and could easily be copied. However, the GPLv3 would probably prohibit me from doing that, even if I just copy the algorithm and scores, but rewrite it in Java / Groovy.

I'm reluctant here as well primarily due to complexity. Using resolution as a single integer number for "quality" is easy to understand and probably works for pretty much any real use case. A complex customizable scoring system (that inherently comes with many odd corner cases) isn't ideal for the "just works" nature of FileBot, and might not even work better for the average user.

:idea: Not sure if I want to build this into FileBot core itself, but it's perfectly conceivable as a 3rd party script.
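
For illustration only, here is a minimal Python sketch of the "resolution as a single integer" idea described above. It is not FileBot's implementation; the `VideoFile` type, the `split_best_and_rest` helper and the sample file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoFile:
    # Hypothetical record: a path plus its vertical resolution in pixels.
    path: str
    height: int


def split_best_and_rest(copies):
    """Return (best, duplicates) for copies of the same movie or episode."""
    # A single integer is the whole "score": the tallest video wins.
    ranked = sorted(copies, key=lambda f: f.height, reverse=True)
    return ranked[0], ranked[1:]


copies = [
    VideoFile("Example Show - S01E01 [720p].mkv", 720),
    VideoFile("Example Show - S01E01 [1080p].mkv", 1080),
    VideoFile("Example Show - S01E01 [480p].avi", 480),
]

best, rest = split_best_and_rest(copies)
print("keep:", best.path)
for f in rest:
    print("duplicate:", f.path)
```

A customizable scoring system would replace the `key` function with a weighted sum over several attributes, which is where the corner cases mentioned above start to appear.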
:idea: Please read the FAQ and How to Request Help.

Post by rednoah »

r5987 now supports storing xattr in plain files on the file system:

-Dnet.filebot.xattr.store=.xattr
e.g.

Alias\Season 01\.xattr\Alias - S01E01 - Truth Be Told.avi\net.filebot.filename
Alias\Season 01\.xattr\Alias - S01E01 - Truth Be Told.avi\net.filebot.metadata
Alias\Season 01\Alias - S01E01 - Truth Be Told.avi
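
The layout above can be modelled with a short Python sketch. This only reproduces the path pattern shown in the listing; it is not FileBot code, and the `xattr_sidecar_path` helper is a hypothetical name.

```python
from pathlib import PureWindowsPath


def xattr_sidecar_path(media_file, store, key):
    """Model of the layout above: <parent>\\<store>\\<file name>\\<key>."""
    media = PureWindowsPath(media_file)
    # The store folder sits next to the media file, and each attribute
    # key becomes a plain file inside a folder named after the media file.
    return media.parent / store / media.name / key


episode = r"Alias\Season 01\Alias - S01E01 - Truth Be Told.avi"
for key in ("net.filebot.filename", "net.filebot.metadata"):
    print(xattr_sidecar_path(episode, ".xattr", key))
```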
888
Posts: 8
Joined: 03 Jan 2019, 09:44

Post by 888 »

Using filebot r5991 on Windows 10 with GDFS. Filebot amc is working correctly, in that it renames and moves files from one Google Team Drive to another with correct ID and metadata in the filename, and cleans residual files. Thank you. It seems to be doing so with fairly minimal data pulled from GD: about 1-2% of the file size for files 500MB - 5GB. Renames are happening server side between TDs on the same account - great! Overall was much quicker than I had expected.

Filebot does not seem to be able to write metadata via GDFS. Not a problem atm as renaming, moving and deduping is the primary objective. But just fyi, received the following error:

Failed to set xattr: FileSystemException: G:\Team Drives\tv\Wolf Hall\Season 01\Wolf Hall - S01E05 - Crows [HDTV 720p x264 AC3 2.0].mkv:net.filebot.metadata: Incorrect function.
using this command:

filebot -script fn:amc --output "G:\Team Drives\tv" --action move --conflict auto -non-strict "G:\Team Drives\tv_raw" --def "ut_label=TV" --def clean=y --log-file amc.log --def excludeList=amc.txt --def seriesFormat="{n}/{'Season '+{s.pad(2)}}/{n} - {s00e00} - {t} [{source} {vf} {vc} {ac} {channels}]"
I'm happy for now. Thanks again!

Post by rednoah »

This error strongly suggests that -Dnet.filebot.xattr.store=.xattr is either not picked up correctly or has not actually been set:

Failed to set xattr: FileSystemException: <path>:net.filebot.metadata: Incorrect function.

Are you sure that FILEBOT_OPTS is set to -Dnet.filebot.xattr.store=.xattr? Please check your Windows Environment Variables.


Please run the sysenv script so we can see what System Properties and Environment Variables are set:

filebot -script fn:sysenv

:!: I'll push a new build (r5995) now to make sure we're on the same page. The new build will also use Windows Attributes to make the folder hidden on Windows (no idea how GDFS deals with that, but it's probably fine).

Post by 888 »

Thanks for the quick reply.

"Are you sure that FILEBOT_OPTS is set to -Dnet.filebot.xattr.store=.xattr? Please check your Windows Environment Variables."

There is no evidence of FILEBOT_OPTS in Windows Environment Variables. I installed from this link, the X64.msi version.

https://get.filebot.net/filebot/FileBot_4.8.5/

Just reinstalled from that link. Checked version is r5995. Ran another test and still have the same xattr error.

Shouldn't the installer set the environment variable?
EDIT: I manually set the FILEBOT_OPTS environment variable. Ran another test, but have the same error.
Last edited by 888 on 03 Jan 2019, 13:25, edited 1 time in total.

Post by rednoah »

I see. This option is indeed something you have to set yourself. Making it default is not planned.

Just run this command in CMD or PowerShell:

cmd /c setx FILEBOT_OPTS "-Dnet.filebot.xattr.store=.xattr"
Alternatively, you can use the Environment Variables dialog to set this option.

Post by 888 »

I set the environment variable manually, then ran the script again. Same error.

I then ran your setx command above, ran the script, same error.

It's not critical at the moment as the result is fine for now. But without xattr I imagine the quality compare feature of the amc script would not work as xattr would be missing for the already uploaded file?

Post by rednoah »

888 wrote: 03 Jan 2019, 13:32 I then ran your setx command above, ran the script, same error.
Strange. Let's do a sanity check.


1. Open CMD and run this:

setx FILEBOT_OPTS "-Dnet.filebot.xattr.store=.xattr"
2. Exit CMD:

exit
3. Open new CMD and run this:

filebot -script fn:sysenv

Please post the output so we can confirm that the options are set correctly, and then look into what might be going awry on your machine.
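
As a rough cross-check outside of FileBot, the same sanity check can be sketched in Python. Note that setx only affects processes started afterwards, so this needs to run from a newly opened console as well. The `system_properties` helper is hypothetical, and it assumes simple whitespace-separated `-Dname=value` tokens without quoting.

```python
import os


def system_properties(opts):
    """Extract -Dname=value pairs from a FILEBOT_OPTS-style string."""
    props = {}
    # Assumption: tokens are separated by whitespace and values contain
    # no spaces; quoted values are not handled by this sketch.
    for token in opts.split():
        if token.startswith("-D") and "=" in token:
            name, value = token[2:].split("=", 1)
            props[name] = value
    return props


opts = os.environ.get("FILEBOT_OPTS", "")
store = system_properties(opts).get("net.filebot.xattr.store")
if store is None:
    print("net.filebot.xattr.store is not set in this process")
else:
    print("net.filebot.xattr.store =", store)
```

If this reports that the property is not set, the variable has not reached newly started processes yet, and FileBot would not see it either.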

Post by rednoah »

888 wrote: 03 Jan 2019, 13:32 But without xattr I imagine the quality compare feature of the amc script would not work as xattr would be missing for the already uploaded file?
--conflict auto tells FileBot how to resolve file path conflicts (i.e. can't move file if file already exists) only, so this is fairly unrelated to there being xattr or not.

The duplicates script on the other hand will check all your files, read xattr, create an inventory, and then check if there are multiple distinct file paths for the same movie or episode.
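
The inventory step can be sketched roughly as follows in Python. This is not the duplicates script itself; `find_duplicates`, the shape of the inventory, and the (series, season, episode) identity tuples are assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicates(inventory):
    """Group file paths by identity and keep groups with more than one path.

    `inventory` maps a file path to the identity read from its stored
    metadata (here a hypothetical (series, season, episode) tuple).
    """
    groups = defaultdict(list)
    for path, identity in inventory.items():
        groups[identity].append(path)
    return {
        identity: sorted(paths)
        for identity, paths in groups.items()
        if len(paths) > 1
    }


inventory = {
    "tv/Example Show/S01E01 [720p].mkv": ("Example Show", 1, 1),
    "tv/Example Show/S01E01 [1080p].mkv": ("Example Show", 1, 1),
    "tv/Example Show/S01E02 [720p].mkv": ("Example Show", 1, 2),
}

duplicates = find_duplicates(inventory)
for identity, paths in duplicates.items():
    print(identity, "->", paths)
```

Files without stored metadata never make it into the inventory, which is why the script depends on xattr (or an xattr store) being readable.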

Post by 888 »

Apologies: Rebooted after the latest env settings, then reran the script. Working correctly now, thank you.

Thank you for clarifying --conflict auto vs the duplicates script. I did misunderstand, thinking that amc included choosing the 'best' version of the same episode or movie. No problem, will run the dupes script separately to handle.

Post by rednoah »

888 wrote: 03 Jan 2019, 14:05 Thank you for clarifying --conflict auto vs the duplicates script. I did misunderstand, thinking that amc included choosing the 'best' version of the same episode or movie.
If you use a format such as {plex} that yields the same path for the same movie, then yes.
If you use a format such as {ny} {vf} {vc} {af} {ac} that may yield different paths for the same movie depending on the specific video file, then no.
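
A small Python sketch of why the format matters: a path conflict can only arise when two files map to the same destination path. The two functions are stand-ins, not FileBot's actual format engine, and the sample metadata is made up.

```python
def identity_only(meta):
    # Stand-in for a format that depends only on the movie's identity.
    return f"{meta['name']} ({meta['year']})"


def with_media_info(meta):
    # Stand-in for a format that also embeds per-file media properties.
    return f"{meta['name']} ({meta['year']}) {meta['vf']} {meta['vc']}"


old = {"name": "Example Movie", "year": 2009, "vf": "720p", "vc": "x264"}
new = {"name": "Example Movie", "year": 2009, "vf": "1080p", "vc": "x265"}

for fmt in (identity_only, with_media_info):
    same_path = fmt(old) == fmt(new)
    outcome = "conflict, so --conflict applies" if same_path else "no conflict, both files are kept"
    print(f"{fmt.__name__}: {outcome}")
```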

Post by 888 »

Understood. I sadly use the custom naming. My choice, so I accept the implications.

In any case, love filebot. Tx

Post by 888 »

Final query. I can post it as another topic if preferred.

Would it be possible to tell filebot where to write/read the .xattr folders/files? Similar to what is done with logs and exclude lists. This would allow storing the xattr data in a separate tree where it could be used/referenced separately from the larger video files.

Post by rednoah »

Yes, maybe. The folder you pass in as net.filebot.xattr.store is resolved relative to the parent folder of the file in question, but if the store path itself is absolute, then that absolute location is used as is.


Try changing:

-Dnet.filebot.xattr.store=.xattr
to:

-Dnet.filebot.xattr.store=X:/MyXattrFolder
and see what happens.


:!: Note that this approach only allows xattr per unique filename, regardless of which folder this unique filename resides in. Probably not an issue for the typical video archive though.
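
To see whether an archive would be affected by that limitation, something like the following Python sketch could list file names that occur in more than one folder. The `colliding_names` helper and the sample paths are hypothetical, and names are compared exactly as written (case differences are not folded).

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def colliding_names(paths):
    """Map each file name found in more than one folder to those folders."""
    folders = defaultdict(set)
    for p in paths:
        path = PureWindowsPath(p)
        # With a single absolute store, only the file name identifies the
        # entry, so the same name in two folders shares one entry.
        folders[path.name].add(str(path.parent))
    return {name: sorted(f) for name, f in folders.items() if len(f) > 1}


paths = [
    r"G:\tv\Show A\Season 01\Episode 01.mkv",
    r"G:\tv\Show B\Season 01\Episode 01.mkv",
    r"G:\tv\Show A\Season 01\Episode 02.mkv",
]

collisions = colliding_names(paths)
for name, where in collisions.items():
    print(name, "appears in", where)
```

File names that include the series or movie name, as in the examples earlier in this thread, would typically not collide.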

Post by 888 »

Your suggestion worked perfectly, running filebot on Win10 with GDFS. The xattr data is written to a folder at the designated location.

Thank you!

Post by 888 »

rednoah wrote: 04 Jan 2019, 08:14
Try changing:

-Dnet.filebot.xattr.store=.xattr
to:

-Dnet.filebot.xattr.store=X:/MyXattrFolder
and see what happens.
This worked beautifully on Win10. Would you have any idea where this is stored or how to change it in Mac OS? I'm digging around in preferences, so far no luck.