hardlink may not satisfy all the requirements here, but it can be used for what it does: creating the hard links. It accepts file arguments, not only directories, and it appears to always link a group of identical files to the first one in order. It also ignores zero-size files.
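To illustrate what "linking a group to the first in order" means, here is a minimal sketch that uses plain ln instead of hardlink. The helper name link_to_first is hypothetical, and this is not how hardlink itself is implemented; it only mimics the observed behavior for files that cmp reports as byte-identical.

```shell
#!/bin/bash
# link_to_first (hypothetical helper, not part of hardlink):
# replace every later argument with a hard link to the first argument,
# but only if the contents are byte-identical.
link_to_first() {
    local first f
    first=$1
    shift
    for f in "$@"; do
        if [ "$first" -ef "$f" ]; then
            continue                      # already the same inode
        elif cmp -s -- "$first" "$f"; then
            ln -f -- "$first" "$f"        # replace f with a hard link
        else
            printf 'skipped (content differs): %s\n' "$f" >&2
        fi
    done
}

# Demo in a throwaway directory
tmp=$(mktemp -d)
printf 'same\n' > "$tmp/a"
printf 'same\n' > "$tmp/b"
printf 'different\n' > "$tmp/c"
link_to_first "$tmp/a" "$tmp/b" "$tmp/c"
ls -li "$tmp"                             # a and b should share an inode number
rm -rf "$tmp"
```

Unlike hardlink, this sketch does no dry run and does not look at owners or permissions, so it is only meant to show the idea.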
fdupes selects exactly what is needed, but it does not output real file arguments. Instead it prints paragraph-mode output: groups of identical files, with every group terminated by an empty line.
So, to be sure that exactly the selections made by fdupes get hardlinked, we have to call hardlink separately, once per paragraph. This avoids mixing groups in the case where two sets of the same identical files exist for different owners or with different permissions. And of course the files have to be filtered for binaries.
#!/bin/bash
unset arr i
while IFS= read -r f; do
    # add the file to the array if it is binary (-b keeps the filename out of the output grep sees)
    if [[ -n "$f" ]] && file -bi "$f" | grep -q "charset=binary"; then
        arr[++i]="$f"
    fi
    # at the end of a paragraph, if the array has files, hardlink them and reset the array
    if [[ -z "$f" && ${#arr[@]} -gt 0 ]]; then
        printf "\n => Hardlink for %d files:\n" "$i"
        hardlink -n -c -vv "${arr[@]}"
        unset arr i
    fi
done < <(fdupes -rpio time .)
hardlink with the -n parameter only simulates and does not write anything, so test the above as is and remove -n later.
Also, filenames containing newlines are not handled; testing with whitespace in filenames seems to work fine.
hardlink command for this purpose. Read man hardlink. fdupes returns groups of files (in paragraph mode, or one group per line with -1). You lose this grouping after the first command that follows, and you need it for any later processing: where will you point a hard link? To the first or the last file of that group, according to your time order. Also, filenames should be preserved.