Replace invalid Windows filename characters with unicode "equivalents"

All about user-defined episode / movie / file name format expressions

Post by xaeiou »

Hi,

I read through this excellent summary of formatting rules pinned here:

viewtopic.php?f=5&t=2

There are rules in that thread which replace characters that are invalid in Windows filenames with more regular characters (e.g. ":" to "-"). However, unless I missed it, there are none that take the approach I prefer: replacing them with similar-looking Unicode "equivalents". Apologies if I did miss such a rule; please point it out if you've already found one.

This approach works well when your files are on a Linux filesystem (where almost anything goes except "/"), but you're sharing them via Samba to Windows machines.

Here are a couple of references on options, including a program that will help if the files already exist:

https://stackoverflow.com/a/61448658
https://github.com/DDR0/fuseblk-filename-fixer

In the latter case, here are the recommended replacements. This is from the Rust code, but I think it is pretty clear to readers here what's going on. Two of them are escaped with a backslash, as noted in the comments:

const MS_RESERVED_STRINGS: [(&str, &str); 9] = [
	("<", "﹤"),
	(">", "﹥"),
	(":", "ː"),
	("\"", "“"),         // " escaped
	("/", "⁄"),
	("\\", "∖"),        // \ escaped
	("|", "⼁"),
	("?", "﹖"),
	("*", "﹡"),
];

In case this doesn't format properly, you can see the original code here: https://github.com/DDR0/fuseblk-filenam ... rc/main.rs
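
For anyone who wants to try the table outside FileBot, here is a minimal sketch of how it might be applied to a name. The `replace_reserved` helper is hypothetical (my own name, not taken from fuseblk-filename-fixer), and it only handles the nine characters listed above, not other Windows restrictions.

```rust
// Replacement table copied from the listing above.
const MS_RESERVED_STRINGS: [(&str, &str); 9] = [
    ("<", "﹤"),
    (">", "﹥"),
    (":", "ː"),
    ("\"", "“"),
    ("/", "⁄"),
    ("\\", "∖"),
    ("|", "⼁"),
    ("?", "﹖"),
    ("*", "﹡"),
];

/// Hypothetical helper (an assumption, not part of fuseblk-filename-fixer):
/// returns `name` with each reserved character swapped for its lookalike.
fn replace_reserved(name: &str) -> String {
    let mut out = name.to_string();
    // Sequential replacement is safe here because no replacement string
    // contains any of the reserved characters.
    for &(from, to) in MS_RESERVED_STRINGS.iter() {
        out = out.replace(from, to);
    }
    out
}

fn main() {
    // Same test string as used for the format expressions in this thread.
    println!("{}", replace_reserved("<>:\"/\\|?*"));
}
```

Names without any reserved characters pass through unchanged, so it should be safe to run over a whole directory listing.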

It's fairly straightforward to build a format expression from this, but I have a couple of questions:

1. Does anyone have any better replacement suggestions from the Unicode character set? "Better" might include being clearer or prettier on Windows, though of course this is a bit subjective. For example, I prefer replacing "?" with "?" rather than the "﹖" suggested above. I'm still experimenting...

2. A question for rednoah: if it doesn't already exist, would it be worthwhile to have a built-in function for this? It's a fairly common NFS->Samba mapping issue, and there are different systems around that solve this with a single formatting call (which you can usually configure).

Maybe there is something in Groovy for this already, but I couldn't find any suitable function among those listed here: https://www.filebot.net/naming.html

This post (including the subject) edited for clarity when I woke up this morning :)

Re: Replace invalid Windows filename characters with unicode "equivalents"

Post by xaeiou »

Here is my format for the above (including my preferred change for "?"), prefixed here with the string I used to test the replaces:

{'<>:"/\\|?*'.replace('<', '﹤').replace('>', '﹥').replace(':', 'ː').replace('"', '“').replace('/', '⁄').replace('\\', '∖').replace('|', '⼁').replace('?', '?').replace('*', '﹡')}

Re: Replace invalid Windows filename characters with unicode "equivalents"

Post by rednoah »

We could easily add a convenience method, but someone would have to do extensive testing on what actually works well across all platforms and font combinations.


I'd start with sample code that people can just copy & paste & modify according to what works best for them:

'<>:"/\\|?*'.replace(
	'<' : '﹤',
	'>' : '﹥',
	':' : 'ː',
	'"' : '“',
	'/' : '⁄',
	'|' : '⼁',
	'?' : '﹖',
	'*' : '﹡',
	'\\': '∖'
)


Do make sure to avoid using right-to-left characters as replacements, as that will result in really funky cursor movement that will be utterly incomprehensible to non-technical users.
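
One way to sanity-check a candidate list against this is sketched below. It is only a rough heuristic under my own assumptions: `is_probably_rtl` and `risky_replacements` are hypothetical helpers, and the code point ranges cover the common right-to-left script blocks and a few directional control characters rather than implementing the Unicode Bidirectional Algorithm. A thorough check would look up each character's actual bidi class.

```rust
// Replacement characters proposed in this thread.
const REPLACEMENTS: [char; 9] = ['﹤', '﹥', 'ː', '“', '⁄', '∖', '⼁', '﹖', '﹡'];

/// Rough heuristic (an assumption, not the Unicode Bidirectional Algorithm):
/// true if `c` lies in a block of right-to-left scripts or is one of the
/// right-to-left directional control characters.
fn is_probably_rtl(c: char) -> bool {
    matches!(
        c as u32,
        0x0590..=0x08FF        // Hebrew, Arabic, Syriac, Thaana, NKo, ...
        | 0xFB1D..=0xFDFF      // Hebrew and Arabic presentation forms (A)
        | 0xFE70..=0xFEFF      // Arabic presentation forms (B)
        | 0x10800..=0x10FFF    // right-to-left scripts in the supplementary plane
        | 0x1E800..=0x1EFFF    // further right-to-left scripts (e.g. Adlam)
        | 0x200F | 0x202B | 0x202E | 0x2067 // RLM, RLE, RLO, RLI controls
    )
}

/// Hypothetical helper: the candidates that the heuristic flags as risky.
fn risky_replacements(candidates: &[char]) -> Vec<char> {
    candidates
        .iter()
        .copied()
        .filter(|&c| is_probably_rtl(c))
        .collect()
}

fn main() {
    let risky = risky_replacements(&REPLACEMENTS);
    if risky.is_empty() {
        println!("no right-to-left characters found");
    } else {
        println!("avoid these replacements: {:?}", risky);
    }
}
```

Anyone customising the list can add their own candidates to `REPLACEMENTS` and rerun the check before putting them into a format.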

Re: Replace invalid Windows filename characters with unicode "equivalents"

Post by xaeiou »

Thanks. I'm new to Groovy and didn't realise you could combine replaces like that.

rednoah wrote: Do make sure to avoid using right-to-left characters as replacements, as that will result in really funky cursor movement that will be utterly incomprehensible to non-technical users.

Yes, you're right. I have lots of stuff in different languages, and what I see on the screen, especially in console output, isn't always the actual filename, depending on the font used. This is especially true for languages like Thai and Hindi that use reverse displacements to insert diacritics; it's very easy to get very confused.

Appreciate your feedback and I'll update my .groovy files to use the more succinct syntax you've suggested.