The basic point is that copying duplicates the file's contents, while linking (soft or hard) does not.
As an abstract model, think of your directory as a table with three columns:
filename      where the file is   content of the file
-----------------------------------------------------
a.txt         sector 13456        abcd
b.txt         sector 67679        bcde
When I copy a file with cp a.txt c.txt, I get the following:
filename      where the file is   content of the file
-----------------------------------------------------
a.txt         sector 13456        abcd
b.txt         sector 67679        bcde
c.txt         sector 79774        abcd
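To check this model on a real machine, here is a small Python sketch, assuming a Linux or macOS system. The "where the file is" column corresponds roughly to the inode number, which os.stat exposes as st_ino; the files live in a throwaway temporary directory rather than a real home directory. A copy gets its own inode, so editing the copy leaves the original untouched.

```python
import os
import shutil
import tempfile

# Assumes a POSIX-like system; the temporary directory stands in for "your directory".
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.txt")
    c = os.path.join(d, "c.txt")
    with open(a, "w") as f:
        f.write("abcd")
    shutil.copyfile(a, c)            # like `cp a.txt c.txt`: the bytes are duplicated
    with open(c, "a") as f:          # append "f" to the copy only
        f.write("f")
    with open(a) as f:
        a_content = f.read()
    with open(c) as f:
        c_content = f.read()
    # Both files exist at the same time, so they must occupy different inodes.
    different_inodes = os.stat(a).st_ino != os.stat(c).st_ino
    print(a_content, c_content, different_inodes)   # abcd abcdf True
```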
When I hard-link a file with ln b.txt d.txt, I get the following:
filename      where the file is   content of the file
-----------------------------------------------------
a.txt         sector 13456        abcd
b.txt         sector 67679        bcde
c.txt         sector 79774        abcd
d.txt         sector 67679        bcde
So now b.txt and d.txt are two names for exactly the same file. If I add a character f to d.txt, it will also appear in b.txt.
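The same experiment with a hard link, again as a Python sketch assuming a Linux or macOS system and a temporary directory. os.link(src, dst) plays the role of ln b.txt d.txt: both names report the same inode (st_ino), the file's link count (st_nlink) goes up to 2, and text appended through one name is visible through the other.

```python
import os
import tempfile

# Assumes a POSIX-like system; paths are inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as d:
    b = os.path.join(d, "b.txt")
    dd = os.path.join(d, "d.txt")
    with open(b, "w") as f:
        f.write("bcde")
    os.link(b, dd)                   # like `ln b.txt d.txt`: a second name, same file
    with open(dd, "a") as f:         # append "f" through the new name...
        f.write("f")
    with open(b) as f:
        b_content = f.read()         # ...and read it back through the old name
    same_inode = os.stat(b).st_ino == os.stat(dd).st_ino
    link_count = os.stat(b).st_nlink # number of names pointing at this inode
    print(b_content, same_inode, link_count)   # bcdef True 2
```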
The main limitation of hard links is that both names must be on the same filesystem. Therefore, most people use soft links (ln -s a.txt e.txt):
filename      where the file is   content of the file
-----------------------------------------------------
a.txt         sector 13456        abcd
b.txt         sector 67679        bcde
c.txt         sector 79774        abcd
d.txt         sector 67679        bcde
e.txt         sector 81233        "Look at where a.txt is located"
As a first-order approximation, soft links are a bit like shortcuts in Windows. However, soft links are part of the filesystem and therefore work with every program. A Windows shortcut is just a file that is interpreted by explorer.exe (and some other programs), so a Windows program has to do its own work to follow a shortcut, whereas in Linux soft links are resolved automatically by the operating system.
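A Python sketch of the e.txt soft link, assuming a Linux or macOS system (creating symlinks on Windows may need extra privileges). os.symlink("a.txt", e) mirrors ln -s a.txt e.txt; os.readlink shows that the link's own "content" is just the path a.txt, while a plain open() follows the link without the program doing anything special. os.lstat describes the link itself and os.stat the file it points to, so their inode numbers differ.

```python
import os
import tempfile

# Assumes a POSIX-like system; paths are inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.txt")
    e = os.path.join(d, "e.txt")
    with open(a, "w") as f:
        f.write("abcd")
    os.symlink("a.txt", e)           # like `ln -s a.txt e.txt` (relative target)
    target = os.readlink(e)          # what the link itself stores: the path "a.txt"
    with open(e) as f:               # an ordinary open() follows the link transparently
        via_link = f.read()
    own_inode = os.lstat(e).st_ino   # lstat: the link's own inode
    file_inode = os.stat(e).st_ino   # stat: follows the link to a.txt
    print(target, via_link, own_inode != file_inode)   # a.txt abcd True
```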
Most people use soft links because they are more flexible: they can point to other filesystems, can be used with NFS, et cetera.
The one use case I have seen for hard links is protecting files against accidental deletion by users. The sysadmin created hard links in a "pointer" directory, and when a user inadvertently rm-ed a file (which apparently happened a lot there), he could restore it in no time: no tape restore, no doubled disk space, etc.
That works as follows:
filename      where the file is   content of the file
-----------------------------------------------------
a.txt         sector 13456        abcd
b.txt         sector 67679        bcde
When the user types rm a.txt, the table will be:
filename      where the file is   content of the file
-----------------------------------------------------
b.txt         sector 67679        bcde
All references to a.txt are gone, and its disk space may be reclaimed for other files.
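Under the hood, rm removes a name (a directory entry), not the data directly; the space can be reclaimed once the file's last name is gone. A Python sketch, assuming a Linux or macOS system and a temporary directory: with only one name (st_nlink is 1), os.remove leaves no way to reach the file through the filesystem.

```python
import os
import tempfile

# Assumes a POSIX-like system; paths are inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.txt")
    with open(a, "w") as f:
        f.write("abcd")
    links_before = os.stat(a).st_nlink   # 1: a.txt is the only name for this file
    os.remove(a)                          # like `rm a.txt`: removes that name
    still_there = os.path.exists(a)
    print(links_before, still_there)      # 1 False
```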
However, if the sysadmin keeps hard links to important files, the table will be:
filename      where the file is   content of the file
-----------------------------------------------------
a.txt         sector 13456        abcd
b.txt         sector 67679        bcde
link.a.txt    sector 13456        abcd
link.b.txt    sector 67679        bcde
When a user now types rm a.txt, the table becomes:
filename      where the file is   content of the file
-----------------------------------------------------
b.txt         sector 67679        bcde
link.a.txt    sector 13456        abcd
link.b.txt    sector 67679        bcde
Because there is still a reference to the file starting at sector 13456, its disk space is not marked as free, so the file is still there. When the user asks whether a.txt could somehow be restored, the sysadmin simply does ln link.a.txt a.txt and a.txt reappears, with its latest edits too! (Of course, link.a.txt lives in another directory on the same filesystem, and this doesn't mean you can forget about backups; but at that time and place, it was a useful option.)
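The whole "pointer directory" trick fits in a short Python sketch, assuming a Linux or macOS system. The directory names home and pointers are hypothetical stand-ins for the user's directory and the sysadmin's pointer directory; both sit inside one temporary directory so they are on the same filesystem, which hard links require.

```python
import os
import tempfile

# Assumes a POSIX-like system; "home" and "pointers" are hypothetical directory names.
with tempfile.TemporaryDirectory() as d:
    home = os.path.join(d, "home")        # the user's directory
    safe = os.path.join(d, "pointers")    # the sysadmin's "pointer" directory
    os.mkdir(home)
    os.mkdir(safe)
    a = os.path.join(home, "a.txt")
    backup = os.path.join(safe, "link.a.txt")

    with open(a, "w") as f:
        f.write("abcd")
    os.link(a, backup)                    # sysadmin: hard link into the pointer dir

    with open(a, "a") as f:               # the user keeps editing a.txt in place...
        f.write("e")
    os.remove(a)                          # ...then accidentally runs `rm a.txt`

    links_left = os.stat(backup).st_nlink # 1: the pointer still keeps the data alive
    os.link(backup, a)                    # sysadmin: `ln link.a.txt a.txt`
    with open(a) as f:
        restored = f.read()
    print(links_left, restored)           # 1 abcde
```

The edit here is written in place, so the pointer sees it. An editor that saves by writing a new file and renaming it over a.txt would give a.txt a new inode, and the pointer would then keep the older version.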