Feature Request: Find and Remove Duplicates without using xattr

Posted: 18 Dec 2018, 17:49
by oppositelock
Would it be possible to have a script that handles duplicates without using xattr? Xattr isn't supported by tools like rclone or GDFS, so there isn't currently a good way to handle duplicates on mounted cloud storage. I think scoring file attributes, similar to the way this Python script does, would be ideal. Giving users the option to change how certain attributes are scored would be a great addition, but isn't strictly needed. MediaInfo provides the needed information, so I'm hopeful this wouldn't be too difficult to implement. The same logic could also be used with the --conflict auto option.

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 19 Dec 2018, 10:41
by rednoah
1.
I'm somewhat reluctant, as that would necessarily imply running movie / series / episode auto-detection on each and every file, each and every time the script is run. The plexdupes.py script queries the Plex database, and won't work if you don't have Plex. Similarly, our existing duplicates script only works for files previously identified by FileBot and thus xattr-tagged with full metadata.

:idea: Support for a different "storage engine" for xattr metadata might be interesting here though, so you can switch between filesystem xattr and some other way of storing and retrieving xattr for specific files.


EDIT:

:idea: rclone and gdfs provide a virtual filesystem, so they could handle xattr read/write requests and store them in some extra hidden file / folder. Maybe filing a feature request for xattr support with those projects would be good, to show the developers that users are interested in that.



2.
The scoring in plexdupes.py is simple and could easily be copied. However, the GPLv3 would probably prohibit me from doing that, even if I just copied the algorithm and scores but rewrote them in Java / Groovy.

I'm reluctant here as well primarily due to complexity. Using resolution as a single integer number for "quality" is easy to understand and probably works for pretty much any real use case. A complex customizable scoring system (that inherently comes with many odd corner cases) isn't ideal for the "just works" nature of FileBot, and might not even work better for the average user.
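To make the "resolution as a single integer" idea concrete, here is a minimal Python sketch that picks the copy to keep from a group of files already known to be the same movie or episode. It assumes quality is the frame's pixel count (width * height) and that the resolution values were obtained elsewhere, e.g. from MediaInfo; `VideoFile`, `pick_best` and `to_remove` are hypothetical names, not FileBot code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoFile:
    path: str
    width: int   # pixels; assumed to come from MediaInfo or similar
    height: int


def quality(f: VideoFile) -> int:
    # Single integer "quality": total pixel count of the video frame
    return f.width * f.height


def pick_best(copies: List[VideoFile]) -> VideoFile:
    # Keep the highest-resolution copy; max() returns the first one on ties
    return max(copies, key=quality)


def to_remove(copies: List[VideoFile]) -> List[VideoFile]:
    # Every copy except the one we keep
    best = pick_best(copies)
    return [f for f in copies if f is not best]
```

A single number keeps the outcome easy to predict; weighting codecs, audio and so on would be the kind of customizable scoring with odd corner cases described above.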

:idea: Not sure if I want to build this into FileBot core itself, but it's perfectly conceivable as a 3rd party script.

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 27 Dec 2018, 11:53
by rednoah
r5987 now supports writing xattr to regular files on the filesystem:

-Dnet.filebot.xattr.store=.xattr
e.g.

Alias\Season 01\.xattr\Alias - S01E01 - Truth Be Told.avi\net.filebot.filename
Alias\Season 01\.xattr\Alias - S01E01 - Truth Be Told.avi\net.filebot.metadata
Alias\Season 01\Alias - S01E01 - Truth Be Told.avi
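In this layout each attribute is an ordinary file at `.xattr/<file name>/<attribute name>` next to the media file, which is why it also works on mounts that reject real xattr calls. A minimal Python sketch of that layout, assuming attribute values are stored as plain UTF-8 text (the actual on-disk encoding FileBot uses isn't described here); `write_attr` and `read_attr` are hypothetical helpers:

```python
from pathlib import Path
from typing import Optional

STORE = ".xattr"  # folder name, as in -Dnet.filebot.xattr.store=.xattr


def _attr_file(media: Path, name: str) -> Path:
    # <parent>/.xattr/<media file name>/<attribute name>
    return media.parent / STORE / media.name / name


def write_attr(media: Path, name: str, value: str) -> None:
    target = _attr_file(media, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(value, encoding="utf-8")  # assumed text encoding


def read_attr(media: Path, name: str) -> Optional[str]:
    # None if the attribute was never written for this file
    target = _attr_file(media, name)
    return target.read_text(encoding="utf-8") if target.is_file() else None
```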

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 10:01
by 888
Using FileBot r5991 on Windows 10 with GDFS. The FileBot amc script is working correctly: it renames and moves files from one Google Team Drive to another with the correct ID and metadata in the filename, and cleans up residual files. Thank you. It seems to do so while pulling fairly little data from GD: about 1-2% of the file size for files of 500MB - 5GB. Renames happen server side between TDs on the same account - great! Overall it was much quicker than I had expected.

FileBot does not seem to be able to write metadata via GDFS. Not a problem atm, as renaming, moving and deduping is the primary objective. But just fyi, I received the following error:

Failed to set xattr: FileSystemException: G:\Team Drives\tv\Wolf Hall\Season 01\Wolf Hall - S01E05 - Crows [HDTV 720p x264 AC3 2.0].mkv:net.filebot.metadata: Incorrect function.
using this command:

filebot -script fn:amc --output "G:\Team Drives\tv" --action move --conflict auto -non-strict "G:\Team Drives\tv_raw" --def "ut_label=TV" --def clean=y --log-file amc.log --def excludeList=amc.txt --def seriesFormat="{n}/{'Season '+{s.pad(2)}}/{n} - {s00e00} - {t} [{source} {vf} {vc} {ac} {channels}]"
I'm happy for now. Thanks again!

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 10:36
by rednoah
This error strongly suggests that -Dnet.filebot.xattr.store=.xattr is either not picked up correctly or has not actually been set:

Failed to set xattr: FileSystemException: <path>:net.filebot.metadata: Incorrect function.

Are you sure that FILEBOT_OPTS is set to -Dnet.filebot.xattr.store=.xattr? Please check your Windows Environment Variables.


Please run the sysenv script so we can see what System Properties and Environment Variables are set:

filebot -script fn:sysenv

:!: I'll push a new build (r5995) now to make sure we're on the same page. The new build will also use Windows Attributes to make the folder hidden on Windows (no idea how GDFS deals with that, but it's probably fine).

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 13:00
by 888
Thanks for the quick reply.

"Are you sure that FILEBOT_OPTS is set to -Dnet.filebot.xattr.store=.xattr? Please check your Windows Environment Variables."

There is no evidence of FILEBOT_OPTS in Windows Environment Variables. I installed from this link, the X64.msi version.

https://get.filebot.net/filebot/FileBot_4.8.5/

Just reinstalled from that link. Checked version is r5995. Ran another test and still have the same xattr error.

Shouldn't the installer set the environment variable?
EDIT: I manually set the FILEBOT_OPTS environment variable. Ran another test, but have the same error.

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 13:22
by rednoah
I see. This option is indeed something you have to set yourself. Making it default is not planned.

Just run this command in CMD or PowerShell:

cmd /c setx FILEBOT_OPTS "-Dnet.filebot.xattr.store=.xattr"
Alternatively, you can use the Environment Variables dialog to set this option.

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 13:32
by 888
I set the environment variable manually, then ran the script again. Same error.

I then ran your setx command above, ran the script, same error.

It's not critical at the moment as the result is fine for now. But without xattr I imagine the quality compare feature of the amc script would not work as xattr would be missing for the already uploaded file?

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 13:41
by rednoah
888 wrote: 03 Jan 2019, 13:32
I then ran your setx command above, ran the script, same error.
Strange. Let's do a sanity check.


1. Open CMD and run this:

setx FILEBOT_OPTS "-Dnet.filebot.xattr.store=.xattr"
2. Exit CMD:

exit
3. Open new CMD and run this:

filebot -script fn:sysenv

Please post the output so we can confirm that the options are set correctly, and then look into what might be going awry on your machine.
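A note on step 2: setx stores the variable for processes started afterwards, not for the console it runs in, so the check has to happen in a new window; programs that were already running keep their old environment. As a rough stand-in for that check, here is a Python sketch that reads FILEBOT_OPTS from its own environment and extracts the `-D` system properties. It assumes options are separated by whitespace (quoted values containing spaces are not handled), and `jvm_properties` / `xattr_store` are hypothetical helpers:

```python
import os
from typing import Dict, Mapping, Optional


def jvm_properties(opts: str) -> Dict[str, str]:
    # Collect -Dkey=value tokens; a bare -Dkey (no '=') gets an empty value
    props = {}
    for token in opts.split():
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props


def xattr_store(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    # None means FILEBOT_OPTS is missing or doesn't set the store option
    opts = environ.get("FILEBOT_OPTS", "")
    return jvm_properties(opts).get("net.filebot.xattr.store")


if __name__ == "__main__":
    print("net.filebot.xattr.store =", xattr_store())
```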

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 13:46
by rednoah
888 wrote: 03 Jan 2019, 13:32
But without xattr I imagine the quality compare feature of the amc script would not work as xattr would be missing for the already uploaded file?
--conflict auto tells FileBot how to resolve file path conflicts (i.e. can't move file if file already exists) only, so this is fairly unrelated to there being xattr or not.

The duplicates script on the other hand will check all your files, read xattr, create an inventory, and then check if there's multiple unique file paths that are the same movie or episode.
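Roughly, that inventory step groups files by the identity recorded in their metadata and flags any identity that maps to more than one unique path. A minimal Python sketch, assuming the xattr metadata has already been read and reduced to a hashable key per file, e.g. (series, season, episode); `find_duplicates` is a hypothetical helper, not the actual duplicates script:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def find_duplicates(inventory: Iterable[Tuple[str, Hashable]]) -> Dict[Hashable, List[str]]:
    # inventory: (file path, identity key read from that file's metadata)
    by_identity = defaultdict(set)
    for path, identity in inventory:
        by_identity[identity].add(path)  # a set, so repeated paths count once
    # Keep only identities that map to more than one unique file path
    return {
        identity: sorted(paths)
        for identity, paths in by_identity.items()
        if len(paths) > 1
    }
```

Deciding which copy in each group to keep is then a separate step.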

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 14:05
by 888
Apologies: Rebooted after the latest env settings, then reran the script. Working correctly now, thank you.

Thank you for clarifying --conflict auto vs the duplicates script. I did misunderstand, thinking that amc included choosing the 'best' version of the same episode or movie. No problem, will run the dupes script separately to handle.

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 14:15
by rednoah
888 wrote: 03 Jan 2019, 14:05
Thank you for clarifying --conflict auto vs the duplicates script. I did misunderstand, thinking that amc included choosing the 'best' version of the same episode or movie.
If you use a format such as {plex} that yields the same path for the same movie, then yes.
If you use a format such as {ny} {vf} {vc} {af} {ac} that may yield different paths for the same movie depending on the specific video file, then no.
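Put differently, --conflict auto only gets to compare two copies if the format maps them to the same destination path. A Python sketch with two simplified stand-in formats; `plex_like` and `detailed` only approximate {plex} and {ny} {vf} {vc}, are not FileBot's format engine, and the movie data is made up:

```python
from collections import defaultdict
from typing import Callable, Dict, List

Movie = Dict[str, str]


def plex_like(m: Movie) -> str:
    # Same movie -> same path, regardless of the video file's properties
    return f"Movies/{m['name']} ({m['year']})/{m['name']} ({m['year']}).mkv"


def detailed(m: Movie) -> str:
    # Includes video properties, so different copies get different paths
    return f"Movies/{m['name']} ({m['year']}) {m['vf']} {m['vc']}.mkv"


def colliding(files: List[Movie], fmt: Callable[[Movie], str]) -> Dict[str, int]:
    # Count files per destination; more than one means a path conflict
    counts: Dict[str, int] = defaultdict(int)
    for m in files:
        counts[fmt(m)] += 1
    return {dest: n for dest, n in counts.items() if n > 1}
```

With the first kind of format the second copy hits an existing file and --conflict auto decides between them; with the second kind both copies end up side by side, and only a separate duplicates pass would notice.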

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 03 Jan 2019, 14:56
by 888
Understood. I sadly use the custom naming. My choice, so I accept the implications.

In any case, love filebot. Tx

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 04 Jan 2019, 04:15
by 888
Final query. I can post it as another topic if preferred.

Would it be possible to tell filebot where to write/read the .xattr folders/files? Similar to what is done with logs and exclude lists. This would allow storing the xattr data in a separate tree where it could be used/referenced separately from the larger video files.

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 04 Jan 2019, 08:14
by rednoah
Yes, maybe. The folder you pass in as net.filebot.xattr.store is resolved relative to the parent folder of the file in question, but if the store path itself is absolute, then that absolute path is used as the destination.


Try changing:

-Dnet.filebot.xattr.store=.xattr
to:

-Dnet.filebot.xattr.store=X:/MyXattrFolder
and see what happens.


:!: Note that this approach only allows xattr per unique filename, regardless of which folder that filename resides in. Probably not an issue for the typical video archive though.
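A Python sketch of that resolution rule, using `pathlib.PureWindowsPath` so it runs on any OS: when an absolute segment such as X:/MyXattrFolder is joined, pathlib discards the earlier segments, so a relative store lands beside the file and an absolute one replaces its folder. `xattr_path` is a hypothetical helper and FileBot's own implementation may differ in details; the last check shows the unique-filename caveat.

```python
from pathlib import PureWindowsPath


def xattr_path(media: str, attr: str, store: str) -> PureWindowsPath:
    media_path = PureWindowsPath(media)
    # Relative store (".xattr"): <parent>\.xattr\<name>\<attr>
    # Absolute store ("X:/MyXattrFolder"): X:\MyXattrFolder\<name>\<attr>,
    # because joining an absolute path drops media_path.parent
    return media_path.parent / store / media_path.name / attr
```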

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 04 Jan 2019, 13:42
by 888
Your suggestion worked perfectly, running filebot on Win10 with GDFS. The xattr data is written to a folder at the designated location.

Thank you!

Re: Feature Request: Find and Remove Duplicates without using xattr

Posted: 05 Jan 2019, 03:18
by 888
rednoah wrote: 04 Jan 2019, 08:14
Try changing:

-Dnet.filebot.xattr.store=.xattr
to:

-Dnet.filebot.xattr.store=X:/MyXattrFolder
and see what happens.
This worked beautifully on Win10. Would you have any idea where this is stored or how to change it in Mac OS? I'm digging around in preferences, so far no luck.