[FAQ] What is a hardlink?
Posted: 28 Jan 2025, 05:40
"Boiling 1 egg takes 3 minutes. How long does it take to boil 2 eggs?" Also 3 minutes because the egg boiler is not limited to boiling 1 egg at a time.
tl;dr The thing you think of as a "file" is in fact a hardlink. You already know what a file / hardlink is. You just didn't know that you can have the same file / hardlink at multiple different file paths. That's it.
If you talk about a "file" then the thing you are talking about is actually just the file system entry. However, at the file system level, a "file" consists of 2 distinct things: the file system entry (or hardlink for short) and the physical data on disk.

e.g. If you move a 1 GB file from X:/A to X:/B within the same drive / file system, then the operating system can just modify the file system entry and leave the physical data on disk untouched. No file data is read. No file data is written. That's why the operation is always instant no matter the file size. We refer to this as an "atomic move" operation.
e.g. If you move a 1 GB file from X: to Y: across different drives / file systems, then the operating system will have to create a new file system entry and physically copy the physical data on disk, which means reading 1 GB from X: and writing 1 GB to Y: making the operation extremely slow especially for larger files. We refer to this as a "copy+delete" operation.
e.g. If you hardlink a 1 GB file from X:/A to X:/B within the same drive / file system, then the operating system will create a new file system entry that points to the same physical data on disk. No file data is read. No file data is written. That's why the operation is always instant no matter the file size.
e.g. If you try to hardlink a 1 GB file from X: to Y: across different drives / file systems, then the operating system will throw an I/O error: cross-device link because the requested operation is conceptually impossible. A file system entry on Y: cannot point to physical data that only exists on X:.
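If you want to see for yourself that a move within the same file system only touches the file system entry, you can compare the inode number (the file system's ID for the physical data on disk) before and after the move. This is a minimal sketch, assuming Linux with GNU coreutils (stat -c). The folder names A and B and the file name file.bin are made up for illustration:

```shell
# Sketch only: assumes GNU coreutils (stat -c) on Linux.
work=$(mktemp -d)                      # scratch folder; A and B are on the same file system
mkdir "$work/A" "$work/B"
printf 'example data\n' > "$work/A/file.bin"

# %i prints the inode number, i.e. the ID of the physical data on disk
inode_before=$(stat -c %i "$work/A/file.bin")

# Move within the same file system: only the file system entry changes
mv "$work/A/file.bin" "$work/B/file.bin"
inode_after=$(stat -c %i "$work/B/file.bin")

echo "inode before move: $inode_before"
echo "inode after move:  $inode_after"

rm -rf "$work"                         # clean up the scratch folder
```

Both numbers should be identical, because the data itself was never copied.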
This is a hardlink (i.e. a normal file) with a link count of 1:
If you create an additional hardlink you will then have the same file twice - at different file paths - with a link count of 2:
Note that both files are the original file. The newly created additional hardlink is indistinguishable from the hardlink that was first created when the physical data on disk was written. ls does not account for hardlinks and thus sees 2 files with 293M each for a total of 586M. du however does account for hardlinks and thus shows the actual disk space used by the 2 files at hand.
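If you want to check that both file paths really are the same file, you can compare inode numbers and link counts. This is a minimal sketch, assuming Linux with GNU coreutils (stat -c). A 1 MiB placeholder file stands in for the real video:

```shell
# Sketch only: assumes GNU coreutils (stat -c) on Linux.
work=$(mktemp -d)
head -c 1048576 /dev/zero > "$work/Avatar.mp4"     # 1 MiB placeholder instead of a real video
ln "$work/Avatar.mp4" "$work/Avatar.2009.mp4"      # create an additional hardlink

# %i = inode number, %h = link count, %n = file name
stat -c '%i %h %n' "$work/Avatar.mp4" "$work/Avatar.2009.mp4"

inode_a=$(stat -c %i "$work/Avatar.mp4")
inode_b=$(stat -c %i "$work/Avatar.2009.mp4")
links=$(stat -c %h "$work/Avatar.mp4")

rm -rf "$work"                                     # clean up the scratch folder
```

The same inode number on both lines means that both file paths point to the same physical data on disk.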
You can delete a hardlink without affecting any other hardlink that links to the same physical data on disk:
Note that the physical data on disk is only freed when all corresponding hardlinks are deleted. If you delete a hardlink and the link count remains 1 or higher, then the file system will not free up any disk space.
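The following sketch walks through that behaviour with a tiny text file, assuming Linux with GNU coreutils (stat -c). The file names original.txt and extra.txt are made up for illustration:

```shell
# Sketch only: assumes GNU coreutils (stat -c) on Linux.
work=$(mktemp -d)
printf 'still here\n' > "$work/original.txt"
ln "$work/original.txt" "$work/extra.txt"          # link count is now 2

rm "$work/original.txt"                            # link count drops from 2 to 1

# The data is still reachable via the remaining hardlink
remaining_links=$(stat -c %h "$work/extra.txt")
content=$(cat "$work/extra.txt")
echo "links: $remaining_links, content: $content"

rm -rf "$work"                                     # last hardlink removed, data is freed
```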
However, if you modify a hardlink then you are in effect modifying the physical data on disk, and that will affect all hardlinks since all hardlinks point to the same physical data on disk:
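A small sketch of what "modify" means here, assuming a POSIX shell on Linux. The file names a.txt and b.txt are made up for illustration. Note that this only applies to writes that go into the existing data. Some tools save changes by writing a new file and renaming it over the old path, and in that case the other hardlink keeps the old data:

```shell
# Sketch only: a.txt and b.txt are placeholder names.
work=$(mktemp -d)
printf 'v1\n' > "$work/a.txt"
ln "$work/a.txt" "$work/b.txt"

# In-place write: ">" truncates and rewrites the existing data, so b.txt changes too
printf 'v2\n' > "$work/a.txt"
b_after_write=$(cat "$work/b.txt")

# Replacement: a new file is renamed over a.txt, so a.txt now points to new data
# while b.txt still points to the old data
printf 'v3\n' > "$work/a.txt.tmp"
mv -f "$work/a.txt.tmp" "$work/a.txt"
a_after_replace=$(cat "$work/a.txt")
b_after_replace=$(cat "$work/b.txt")

echo "b.txt after in-place write: $b_after_write"
echo "a.txt after replacement:    $a_after_replace"
echo "b.txt after replacement:    $b_after_replace"

rm -rf "$work"
```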
You can create hardlinks remotely via SMB network shares if you are using a Windows or Linux client.
However, macOS does not support the SMB UNIX extensions, so you will not be able to create hardlinks via SMB network shares from a macOS client. This is solely a client-side limitation of macOS.
Please organize your files like so, and then use /path/to/files as the one and only bind mount:
You can then access your files at /volume1/input and /volume1/output from inside the container.

The 2 distinct things that make up a "file" at the file system level:
- The file system entry or hardlink for short. This is the thing you can see. This is the thing you can open / move / copy / delete / etc. You refer to "the file system entry" as "the file" because that's the thing you can see and interact with.
- The physical data on disk. You cannot see physical data on disk. You cannot interact with physical data on disk. You have no concept whatsoever of "physical data on disk" (i.e. bits on a HDD or SSD) in your mental model since the file system abstracts away all the complexity.

Console Output (a normal file with a link count of 1):
$ ls -lh
total 293M
-rw-r--r-- 1 root root 293M Aug 20 2017 Avatar.mp4

Console Output (creating an additional hardlink, link count of 2):
$ ln -v Avatar.mp4 Avatar.2009.mp4
'Avatar.2009.mp4' => 'Avatar.mp4'
$ ls -lh
total 586M
-rw-r--r-- 2 root root 293M Aug 20 2017 Avatar.2009.mp4
-rw-r--r-- 2 root root 293M Aug 20 2017 Avatar.mp4
$ du -h
293M .

Console Output (deleting one hardlink, link count back to 1):
$ rm -v Avatar.mp4
removed 'Avatar.mp4'
$ ls -lh
total 293M
-rw-r--r-- 1 root root 293M Aug 20 2017 Avatar.2009.mp4

Console Output (with both hardlinks present, overwriting the data via one hardlink changes both):
$ head -c 1M < /dev/random > Avatar.mp4
$ ls -lh
total 2.0M
-rw-r--r-- 2 root root 1.0M Jan 28 08:05 Avatar.2009.mp4
-rw-r--r-- 2 root root 1.0M Jan 28 08:05 Avatar.mp4
Notes on SMB network shares
The SMB protocol typically used for Windows network shares does support hardlinks.

Error:
HARDLINK: Operation not supported: a.mkv -> b.mkv
Notes on Docker Bind Mounts
Docker treats each bind mount as a separate file system. Thus, if you are using --action move or --action hardlink, then the input path and the output path must be on the same bind mount. If you process files across bind mounts, then --action hardlink will fail with I/O error: cross-device link, and --action move and --action duplicate will resort to physically copying files, which is highly inefficient.
Code:
/path/to/files/input
/path/to/files/output
Shell:
-v /path/to/files:/volume1
yml:
volumes:
- /path/to/files:/volume1
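If you are not sure whether your bind mounts are set up correctly, you can simply try to hardlink a small probe file from the input folder to the output folder from inside the container. This is a rough sketch for a POSIX shell. can_hardlink is a hypothetical helper written for this example (not part of docker or any other tool), and the probe file name is made up:

```shell
# Hypothetical helper: try to hardlink a small probe file from folder $1 into folder $2.
# Returns 0 if the hardlink works, 1 if it fails, 2 if the probe file cannot be created.
can_hardlink() {
    probe="$1/.hardlink-probe.$$"
    target="$2/.hardlink-probe.$$"
    touch "$probe" 2>/dev/null || return 2     # input folder missing or not writable
    if ln "$probe" "$target" 2>/dev/null; then
        rm -f "$probe" "$target"               # clean up both hardlinks
        return 0
    fi
    rm -f "$probe"                             # clean up the probe file
    return 1
}

# Paths as seen from inside the container, as in the example above
if can_hardlink /volume1/input /volume1/output; then
    echo "hardlinks work between input and output"
else
    echo "hardlinks do not work here (different mounts, or a missing or read-only folder)"
fi
```

Trying the operation is more reliable than comparing device IDs, because two bind mounts can show the same device ID and still reject hardlinks across the mount boundary.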