[FAQ] What is a hardlink?

Post by rednoah »

"Boiling 1 egg takes 3 minutes. How long does it take to boil 2 eggs?" Also 3 minutes because the egg boiler is not limited to boiling 1 egg at a time.


tl;dr The thing you think of as "file" is in fact a hardlink. You already know what a file / hardlink is. You just didn't know that you can have the same file / hardlink at multiple different file paths. That's it.



:idea: If you talk about a "file" then the thing you are talking about is actually just the file system entry. However, at the file system level, a "file" consists of 2 distinct things:
  1. The file system entry or hardlink for short. This is the thing you can see. This is the thing you can open / move / copy / delete / etc. You refer to "the file system entry" as "the file" because that's the thing you can see and interact with.
  2. The physical data on disk. You cannot see physical data on disk. You cannot interact with physical data on disk. You have no concept whatsoever of "physical data on disk" (i.e. bits on a HDD or SSD) in your mental model since the file system abstracts away all the complexity.
The (1) file system entry points to (2) physical data on disk. Every file is a hardlink because every file is a file system entry that points to physical data on disk. If you move / copy a "file" then the operating system will take care of moving / copying the (1) file system entry and the (2) physical data on disk as necessary.
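As a rough illustration of the two parts described above, the following sketch prints what the file system records for a single path. It assumes GNU coreutils on Linux (the `stat -c` format syntax differs on macOS), and the file name is made up for this example. The "inode number" is the identifier that the file system entry points to; the term is not used elsewhere in this FAQ.

```shell
# Assumption: GNU coreutils (Linux). example.txt is a made-up file name.
workdir=$(mktemp -d)
printf 'hello' > "$workdir/example.txt"

# %n = path (the file system entry you can see)
# %i = inode number (what the entry points to)
# %h = number of hardlinks pointing to that inode
# %s = size of the data in bytes
stat -c 'path=%n inode=%i links=%h size=%s' "$workdir/example.txt"

links=$(stat -c '%h' "$workdir/example.txt")
size=$(stat -c '%s' "$workdir/example.txt")
echo "links=$links size=$size"

rm -r "$workdir"
```

A freshly created file should report a link count of 1, i.e. exactly one file system entry pointing to the data.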



e.g. If you move a 1 GB file from X:/A to X:/B within the same drive / file system, then the operating system can just modify the file system entry and leave the physical data on disk untouched. No file data is read. No file data is written. That's why the operation is always instant no matter the file size. We refer to this as an "atomic move" operation.
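A small sketch of how you could observe this yourself, assuming GNU coreutils on Linux and a temporary directory that sits on a single file system. The folder and file names are placeholders. If the inode number is the same before and after the move, then only the file system entry was changed.

```shell
# Assumption: GNU coreutils (Linux); A, B and movie.bin are placeholder names.
workdir=$(mktemp -d)
mkdir "$workdir/A" "$workdir/B"
head -c 1M /dev/zero > "$workdir/A/movie.bin"

inode_before=$(stat -c '%i' "$workdir/A/movie.bin")
mv "$workdir/A/movie.bin" "$workdir/B/movie.bin"
inode_after=$(stat -c '%i' "$workdir/B/movie.bin")

if [ "$inode_before" = "$inode_after" ]; then
  echo "same inode: only the file system entry was changed"
else
  echo "different inode: the data was copied"
fi

rm -r "$workdir"
```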


e.g. If you move a 1 GB file from X: to Y: across different drives / file systems, then the operating system will have to create a new file system entry on Y: and copy the physical data on disk, which means reading 1 GB from X: and writing 1 GB to Y: and then deleting the original. The time this takes grows with the file size, so the operation is slow for larger files. We refer to this as a "copy+delete" operation.


e.g. If you hardlink a 1 GB file from X:/A to X:/B within the same drive / file system, then the operating system will create a new file system entry that points to the same physical data on disk. No file data is read. No file data is written. That's why the operation is always instant no matter the file size.


e.g. If you try to hardlink a 1 GB file from X: to Y: across different drives / file systems, then the operating system will throw an I/O error: cross-device link because the requested operation is conceptually impossible. You cannot create a file system entry on one disk that points to physical data that only exists on a different disk.
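If you script this yourself, one common pattern is to try the hardlink first and fall back to a physical copy when the link cannot be created. The sketch below is only an illustration; `link_or_copy` is a made-up helper name, not a FileBot command, and the file names are placeholders.

```shell
# link_or_copy SRC DST (hypothetical helper)
# Tries to create a hardlink first; if that fails (e.g. cross-device link),
# falls back to physically copying the data. Prints which method was used.
link_or_copy() {
  if ln "$1" "$2" 2>/dev/null; then
    echo "hardlink"
  else
    cp "$1" "$2" && echo "copy"
  fi
}

workdir=$(mktemp -d)
printf 'data' > "$workdir/a.mkv"

# Both paths are in the same temporary directory, so the hardlink should succeed
method=$(link_or_copy "$workdir/a.mkv" "$workdir/b.mkv")
copy_content=$(cat "$workdir/b.mkv")
echo "method=$method"

rm -r "$workdir"
```

Note that the fallback hides the difference in cost: the hardlink is instant and uses no extra disk space, while the copy reads and writes all of the data.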




:idea: This is a hardlink (i.e. a normal file) with a link count of 1:

Console Output:

$ ls -lh
total 293M
-rw-r--r-- 1 root root 293M Aug 20  2017 Avatar.mp4

:idea: If you create an additional hardlink you will then have the same file twice - at different file paths - with a link count of 2:

Console Output:

$ ln -v Avatar.mp4 Avatar.2009.mp4
'Avatar.2009.mp4' => 'Avatar.mp4'
$ ls -lh
total 586M
-rw-r--r-- 2 root root 293M Aug 20  2017 Avatar.2009.mp4
-rw-r--r-- 2 root root 293M Aug 20  2017 Avatar.mp4
$ du -h
293M	.
Note that both files are the original file. The newly created additional hardlink is indistinguishable from the hardlink that was first created when the physical data on disk was written. ls does not account for hardlinks and thus sees 2 files with 293M each for a total of 586M. du however does account for hardlinks and thus shows the actual disk space used by the 2 files at hand.
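If you want to check whether two paths are hardlinks to the same physical data, a sketch like the following may help. It assumes GNU findutils on Linux for `find -samefile` and `-printf`; the `-ef` test is supported by common shells such as bash and dash. The files are tiny stand-ins created in a temporary folder.

```shell
# Assumption: GNU findutils (Linux). The files are small stand-ins.
workdir=$(mktemp -d)
printf 'data' > "$workdir/Avatar.mp4"
ln "$workdir/Avatar.mp4" "$workdir/Avatar.2009.mp4"

# -ef is true when both paths refer to the same device and inode
if [ "$workdir/Avatar.mp4" -ef "$workdir/Avatar.2009.mp4" ]; then
  same="yes"
else
  same="no"
fi

# List every path below $workdir that is a hardlink to the same data
names=$(find "$workdir" -samefile "$workdir/Avatar.mp4" -printf '%f\n' | sort)

echo "same=$same"
echo "$names"

rm -r "$workdir"
```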


:idea: You can delete a hardlink without affecting any other hardlink that links to the same physical data on disk:

Console Output:

$ rm -v Avatar.mp4
removed 'Avatar.mp4'
$ ls -lh
total 293M
-rw-r--r-- 1 root root 293M Aug 20  2017 Avatar.2009.mp4
Note that the physical data on disk is only freed when all corresponding hardlinks are deleted. If you delete a hardlink and the link count remains 1 or higher, then the file system will not free up any disk space.
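The link count behaviour described above can be sketched as follows, assuming GNU coreutils on Linux and small stand-in files in a temporary folder:

```shell
# Assumption: GNU coreutils (Linux). The files are small stand-ins.
workdir=$(mktemp -d)
printf 'movie data' > "$workdir/Avatar.mp4"
ln "$workdir/Avatar.mp4" "$workdir/Avatar.2009.mp4"

links_before=$(stat -c '%h' "$workdir/Avatar.2009.mp4")

# Delete one hardlink; the other one keeps the data alive
rm "$workdir/Avatar.mp4"

links_after=$(stat -c '%h' "$workdir/Avatar.2009.mp4")
content=$(cat "$workdir/Avatar.2009.mp4")

echo "link count: $links_before -> $links_after"
echo "content still readable: $content"

rm -r "$workdir"
```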


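:!: However, if you modify a hardlink then you are in effect modifying the physical data on disk, and that will affect all hardlinks since all hardlinks point to the same physical data on disk. e.g. with both hardlinks from the earlier example still in place (i.e. before either of them was deleted):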

Console Output:

$ head -c 1M < /dev/random > Avatar.mp4
$ ls -lh
total 2.0M
-rw-r--r-- 2 root root 1.0M Jan 28 08:05 Avatar.2009.mp4
-rw-r--r-- 2 root root 1.0M Jan 28 08:05 Avatar.mp4
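The same effect can be reproduced with small text files, which makes it easier to see that a write through one path is visible through the other. This is a minimal sketch assuming GNU coreutils on Linux:

```shell
# Assumption: GNU coreutils (Linux). The files are small stand-ins.
workdir=$(mktemp -d)
printf 'original' > "$workdir/Avatar.mp4"
ln "$workdir/Avatar.mp4" "$workdir/Avatar.2009.mp4"

# '>' truncates and rewrites the existing data in place,
# so the change is made to the shared physical data
printf 'modified' > "$workdir/Avatar.mp4"

# Read the data through the other hardlink
other=$(cat "$workdir/Avatar.2009.mp4")
links=$(stat -c '%h' "$workdir/Avatar.2009.mp4")

echo "content via the other hardlink: $other (link count $links)"

rm -r "$workdir"
```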




Notes on SMB network shares

The SMB protocol typically used for Windows network shares does support hardlinks.

:idea: You can create hardlinks remotely via SMB network shares, if you are using Windows or Linux.

:!: However, macOS does not support the SMB Unix extensions, so you will not be able to create hardlinks via SMB network shares from a macOS client. This is solely a client-side limitation of macOS.

Error:

HARDLINK: Operation not supported: a.mkv -> b.mkv



Notes on Docker Bind Mounts

Docker treats each bind mount as a separate file system. Thus, if you are using --action move or --action hardlink, then the input path and the output path must be on the same bind mount. If you process files across bind mounts, then --action hardlink will fail with I/O error: cross-device link, and --action move and --action duplicate will resort to physically copying files, which is slow and, in the case of --action duplicate, uses twice the disk space.

:arrow: Please organize your files like so, and then use /path/to/files as one and only bind mount:

Code:

/path/to/files/input
/path/to/files/output

Shell:

-v /path/to/files:/volume1

yml:

volumes:
  - /path/to/files:/volume1
You can then access your files at /volume1/input and /volume1/output from inside the container.
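To verify a setup like this, you could run a quick probe from a shell inside the container that simply tries to create a hardlink between the two folders. The sketch below is an assumption-laden example: `can_hardlink` is a made-up helper name, and the demo runs against a temporary folder so that it works anywhere. Inside the container you would call it as `can_hardlink /volume1/input /volume1/output` instead.

```shell
# can_hardlink SRC_DIR DST_DIR (hypothetical helper)
# Creates a small probe file in SRC_DIR and tries to hardlink it into DST_DIR.
# Returns 0 if the hardlink could be created, 1 otherwise. Cleans up after itself.
can_hardlink() {
  probe_src=$(mktemp "$1/probe.XXXXXX") || return 1
  probe_dst="$2/$(basename "$probe_src").link"
  if ln "$probe_src" "$probe_dst" 2>/dev/null; then
    rm -f "$probe_src" "$probe_dst"
    return 0
  fi
  rm -f "$probe_src"
  return 1
}

# Demo on a temporary folder that mimics the input / output layout
workdir=$(mktemp -d)
mkdir "$workdir/input" "$workdir/output"

if can_hardlink "$workdir/input" "$workdir/output"; then
  result="hardlink possible"
else
  result="hardlink not possible"
fi
echo "$result"

rm -r "$workdir"
```

Actually attempting the hardlink tends to be more reliable than comparing device IDs, because two bind mounts of the same host file system can report the same device and still refuse hardlinks across the mount boundary.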