
Questions tagged [deduplication]

For questions where a task is to be applied to only one instance of multiple copies of data (files or blocks of data on a filesystem, or strings in a text), or where duplicates of the first such instance are to be ignored for space/time saving purposes.

0 votes
1 answer
32 views

Deduplication tool which is able to only compare two directories against each other?

I tried rdfind and jdupes, and if I specify two directories for them, they both match not only files in one directory against other directory, but also files inside of one of the given directories ...
bodqhrohro's user avatar
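
For the question above, a possible workaround when the usual tools insist on matching within each directory is to build a checksum list per tree and join them, so only cross-directory matches are reported. A rough sketch using GNU coreutils (dirA, dirB and the .md5 file names are placeholders; it breaks on paths containing newlines):

    # checksum every file in each tree, sorted by hash
    find dirA -type f -exec md5sum {} + | sort > a.md5
    find dirB -type f -exec md5sum {} + | sort > b.md5
    # print only hashes present in both trees (cross-directory duplicates)
    join -j 1 a.md5 b.md5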
0 votes
1 answer
38 views

Pop OS with a deduplication filesystem

I'm moving a friend's development machine to Linux (PopOS), permanently. Don't worry guys, he dualbooted and he's ready for the tux. The problem is his drive. It's a 256GB SSD, and he is moving from a ...
DarkGhostHunter's user avatar
-1 votes
2 answers
124 views

Batch rename of files that share the same prefix

I have a list of files on my server with a prefix that I want to de-dupe. These are completely different generated files. It seems to be generated files with {Title} - {yyyy-MM-dd}_{random} - {...
mysterio21_troy's user avatar
0 votes
0 answers
48 views

20+ backup directories, I'd like to dedupe all files to 1 "master directory"

As the title suggests, I have inherited a file structure where there are about 30 "complete or partial backups" of a fileserver full of text files. This obviously makes no sense, and I'd ...
Frank Rizzo's user avatar
5 votes
6 answers
890 views

Keep unique values (comma separated) from each column

I have a .tsv (tab-separated columns) file on a Linux system with the following columns that contain different types of values (strings, numbers) separated by a comma: col1 col2 . NS,NS,...
df_v's user avatar
  • 51
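
For the per-column case above, one way is to split each tab-separated field on commas and rebuild it with first occurrences only. A sketch in awk (file.tsv is a placeholder name):

    awk -F'\t' -v OFS='\t' '{
        for (i = 1; i <= NF; i++) {
            n = split($i, v, ",")          # split the cell on commas
            split("", seen); out = ""      # reset the per-cell "seen" set
            for (j = 1; j <= n; j++)
                if (!seen[v[j]]++) out = out (out == "" ? "" : ",") v[j]
            $i = out                       # put the de-duplicated cell back
        }
        print
    }' file.tsv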
0 votes
2 answers
498 views

Standalone Fileserver with deduplication wanted

Situation: I want to reinstall a homelab server (Windows OS) as a Linux-based server. Server | Purpose: backup system (mostly offline). I currently have an HP Proliant Microserver N54 Turion II Neo N54l ...
David's user avatar
  • 1
1 vote
1 answer
59 views

See if any of a number of zip files contains any of the original files in a directory structure

I have a pretty hard problem here. I have a photo library with a lot of photos in it in various folders. I then started using Google Photos for my photos, I put those originals into Google Photos, and ...
Albert's user avatar
  • 161
-1 votes
2 answers
165 views

Join files together without using space in filesystem [duplicate]

I want to join (concatenate) two files in Linux without using space in the filesystem. Can I do this? A + B = AB, where the file AB uses the sectors or fragments of A and B from the filesystem. Is it possible to ...
ArtEze's user avatar
  • 65
0 votes
2 answers
78 views

I want to list only the text using grep or any other option in a shell script

I have a folder called rules/resources, and inside it I have sub-folders, let's say A, B and C. Each sub-folder contains constraint.yaml. Now I want to grep the constraint.yaml files which contain the ...
Maaaa's user avatar
  • 1
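
If the goal is just to list which constraint.yaml files contain a given string, grep -l is usually enough. A sketch; 'SOME_PATTERN' is a placeholder for the text actually being searched:

    # one level of sub-folders (A, B, C, ...)
    grep -l 'SOME_PATTERN' rules/resources/*/constraint.yaml
    # or, for arbitrary nesting depth
    find rules/resources -name constraint.yaml -exec grep -l 'SOME_PATTERN' {} +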
0 votes
1 answer
253 views

Removing duplicates from multiple directories (more than 2 paths) using rmlint or another tool

I am trying to remove duplicated files and folders from several directories, and I was wondering if rmlint supports inputting multiple directories (I know you can use two directories if one of them is ...
ricardo3889's user avatar
0 votes
2 answers
945 views

Linux command for moving, merging and renaming duplicates

I am trying to move directories (with sub-directories and files) to another directory. With mv some folders are not merging because the same directory exists with files. This is no good because even ...
linuxuser24569's user avatar
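
Where mv refuses to merge into existing directories, rsync can copy the trees into place and drop the sources afterwards. A sketch (source/ and destination/ are placeholders; the trailing slash on source/ matters):

    rsync -a --remove-source-files source/ destination/   # merge files into existing dirs
    find source -depth -type d -empty -delete             # remove the now-empty source dirs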
2 votes
2 answers
3k views

How to get deduplication for an Ext4 partition used by Debian, Ubuntu and Linux Mint?

Ext4 doesn't support deduplication, unlike e.g. Btrfs, bcachefs and ZFS, which support deduplication as standard. How can I get deduplication support for Ext4?
Alfred.37's user avatar
  • 207
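
Ext4 has no block-level deduplication, but file-level deduplication can be approximated offline by replacing identical files with hard links, for example with rdfind (jdupes has a similar hard-link mode). A sketch; /data is a placeholder path:

    rdfind -dryrun true -makehardlinks true /data   # preview what would be linked
    rdfind -makehardlinks true /data                # replace duplicates with hard links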
3 votes
1 answer
1k views

Choosing the right block size for duperemove

I am trying to deduplicate a BTRFS filesystem with multiple subvolumes. Altogether, it holds around 3.5 TB of data, which I expect to be slightly more than half that size after deduping. I am mostly ...
user149408's user avatar
  • 1,425
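
For reference, duperemove takes the block size via -b and benefits from a hash file on large data sets so repeated runs don't rehash everything. A sketch with placeholder paths; check duperemove(8) for the valid -b range on your version:

    duperemove -dr -b 128k --hashfile=/var/tmp/dupehash.db /mnt/btrfs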
4 votes
1 answer
3k views

Is there a way to consolidate (deduplicate) btrfs?

I have a btrfs volume, which I create regular snapshots of. The snapshots are rotated, the oldest being one year old. As a consequence, deleting large files may not actually free up the space for a ...
user149408's user avatar
  • 1,425
0 votes
4 answers
158 views

Filtering duplicates with AWK differing by timestamp

Given the list of files ordered by timestamp as shown below, I am seeking to retrieve the last occurrence of each file (the one at the bottom of each). For example: archive-daily/document-sell-report-...
Luciano's user avatar
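
A common trick for "keep the last occurrence per key" is to reverse the list, keep the first occurrence, and reverse back. A sketch assuming the listing is in files.txt and the de-duplication key is the first whitespace-separated field (adjust the key expression to strip the timestamp part of the name if needed):

    tac files.txt | awk '!seen[$1]++' | tac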
-2 votes
1 answer
575 views

low-memory server with ZFS deduplication: can zram help?

Is it a good idea to use compressed memory if people want to save on expensive RAM when using ZFS with deduplication?
jim7475's user avatar
  • 31
-1 votes
1 answer
390 views

can swap help a low-memory ZFS server?

If we cannot buy much RAM, can we replace the "missing" RAM with a larger amount of swap? Ex.: using a dedicated 512 GB SSD for swap instead of 512 GB RAM for ZFS with deduplication on an ...
jim7475's user avatar
  • 31
0 votes
2 answers
52 views

Why do these two commands to write text-processing results back to the input file behave so differently?

I have a file authorized_keys and want to deduplicate the content, i.e. remove duplicate entries. I found two possible solutions to achieve this: Use cat and uniq, then redirect the output to the ...
vicky's user avatar
  • 1
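
If the difference is the classic one, redirecting output into the input file truncates it before cat or uniq ever read it. Two in-place alternatives that avoid that; the second needs moreutils and preserves the original line order:

    sort -u authorized_keys -o authorized_keys        # sort reads everything before writing
    awk '!seen[$0]++' authorized_keys | sponge authorized_keys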
0 votes
1 answer
77 views

Why did sort -u remove duplicates only in a pipe?

In this command, sort -u removed duplicates. curl https://en.wikipedia.org/wiki/Help:Special_page -s | grep -oP 'Special:\K[a-zA-Z0-9]*' | sort -u > special_page_names In this command, it didn't. ...
Lahor's user avatar
  • 123
1 vote
1 answer
925 views

How to let already copied files share fragments (reflink)?

I copy a file to a different XFS volume on daily basis as follows: # on monday cp --sparse=always /mnt/disk1/huge.file /mnt/disk2/monday/huge.file # on tuesday cp --sparse=always /mnt/disk1/huge.file /...
mgutt's user avatar
  • 497
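
Once the copies exist on a reflink-capable XFS, an out-of-band deduplicator such as duperemove can make them share extents after the fact. A sketch; the first path is taken from the question, the second is a guess at the truncated Tuesday path, and the filesystem must have been created with reflink=1:

    duperemove -d /mnt/disk2/monday/huge.file /mnt/disk2/tuesday/huge.file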
0 votes
1 answer
1k views

lvmvdo doesn't deduplicate my data

I am installing lvmvdo on Debian 11.2 to store Proxmox VM disks, doing the following: 1) apt install -y build-essential libdevmapper-dev libz-dev uuid-dev git sudo libblkid-dev man vim dwarves dkms ...
Yanluis Laya's user avatar
1 vote
1 answer
273 views

Meaning of Deduplicated during Borg Create's Realtime Output

When the borg create command is used with the --progress argument, it outputs lines like this: 5.50 GB O 5.10 GB C 23.95 kB D 15600 N /path/to/current/file/being/processed I was able to locate what the ...
Lonnie Best's user avatar
  • 5,255
1 vote
2 answers
3k views

How to install vdo/kvdo on Ubuntu 20.04?

I would like to know if there is a way to install Red Hat's vdo in Ubuntu 20.04. So far, I have tried to download the source and compile it, but I get the following error: cc -fPIC -fpic -D_GNU_SOURCE ...
k.Cyborg's user avatar
  • 527
6 votes
3 answers
4k views

Finding duplicate files with same filename AND exact same size

I have a huge songs folder with a messy structure and files duplicated in multiple folders. I need a recommendation for a tool or a script that can find and remove duplicates with two simple matches: ...
Electrifyings's user avatar
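
Grouping by basename plus size can be done with GNU find's -printf and a small awk pass; anything sharing both fields is a candidate duplicate. A sketch that only lists the groups, leaving deletion to a manual review (it breaks on names containing tabs or newlines):

    find . -type f -printf '%f\t%s\t%p\n' |
        awk -F'\t' '{
            key = $1 FS $2                 # basename + size
            if (key in first) {            # duplicate group found
                if (first[key] != "") { print first[key]; first[key] = "" }
                print
            } else first[key] = $0         # remember the first sighting
        }'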
4 votes
1 answer
2k views

Is there a fuzzy duplicate finder for videos, that does not require a GUI?

I am currently trying to eliminate duplicated videos with minimal changes. Those might be a slightly different encoding, a lower resolution or just changed meta data. These videos are in a complex ...
Auravendill's user avatar
0 votes
6 answers
556 views

Remove adjacent duplicated words from string

I have a string like this string: one one tow tow three three tow one three How can I remove duplicated words to make it like this: one tow three tow one three The point is that I want to write a ...
meme246p's user avatar
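
One way is to walk the words and emit a word only when it differs from the previous one. A sketch in awk, using the example string from the question:

    echo 'one one tow tow three three tow one three' |
        awk '{
            out = ""; prev = ""
            for (i = 1; i <= NF; i++) {
                if ($i != prev) out = out (out == "" ? "" : " ") $i
                prev = $i                  # only adjacent repeats are dropped
            }
            print out
        }'
    # prints: one tow three tow one three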
0 votes
4 answers
1k views

How to remove duplicate values on the same row using awk?

I want to remove duplicated columns/fields on the same row only. I tried, but I ended up with long code with nested loops, conditions and arrays that doesn't work correctly. input data: 1 2 3 4 1 2 3 ...
CodingNoob's user avatar
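
Per-row field de-duplication needs the "seen" set reset on every line, which is where nested-loop attempts usually go wrong. A short awk sketch (input.txt is a placeholder):

    awk '{
        split("", seen); out = ""          # fresh set for each row
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++) out = out (out == "" ? "" : OFS) $i
        print out
    }' input.txt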
0 votes
2 answers
353 views

Script to find duplicate files by extension and delete them

Recently my NAS was ransomware attacked and all my files were 7zipped. I managed to get the password and extract them and at the same time I renamed the 7zipped file to 7z.bad (so that it's easier ...
Ilias Kouroudis's user avatar
3 votes
1 answer
4k views

Is there any advantage to using a hard-link on ZFS instead of relying upon deduplication when considering only disk space allocation?

If I want to create multiple instances of a file on a ZFS file system, is there any advantage to using a hard-link instead of relying upon deduplication as a method of preserving disk space? This ...
Zhro's user avatar
  • 2,759
0 votes
0 answers
113 views

I want to write a shell script to delete duplicates in a directory

I do find ./ -type f -exec md5 '{}' \; > tempfile tempfile has lines like this: MD5 (.//Photos-10/IMG_20200901_183050612.jpg) = 2e1f245d195b8d2c3z926dbe0410f7b5 I want to check for repeated ...
Muralidhar Kamidi's user avatar
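
On Linux (md5sum rather than BSD md5) the grouping step can be done with sort and GNU uniq comparing only the 32-character hash; --all-repeated prints every member of each duplicate group. A sketch that only lists duplicates, leaving deletion as a separate, reviewed step:

    find . -type f -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate   # group lines whose first 32 chars (the hash) match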
2 votes
2 answers
2k views

Deduplicating Files while moving them to XFS

I've got a folder on a non reflink-capable file system (ext4) which I know contains many files with identical blocks in them. I'd like to move/copy that directory to an XFS file system whilst ...
Marcus Müller's user avatar
1 vote
2 answers
3k views

How to use `rmlint` to remove duplicates only from one location and leave all else untouched?

I have two locations /path/to/a and /path/to/b. I need to find duplicate files in both paths and remove only the items in /path/to/b. rmlint generates quite a large removal script, but it contains ...
ylluminate's user avatar
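
If memory serves, rmlint's "//" separator tags preferred paths, and --keep-all-tagged together with --must-match-tagged restricts removal to the untagged side; treat the exact option names as an assumption and verify them against rmlint --help. A sketch:

    # everything after // is tagged as the side to preserve (/path/to/a)
    rmlint /path/to/b // /path/to/a --keep-all-tagged --must-match-tagged
    # inspect the generated rmlint.sh, then run it
    sh ./rmlint.sh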
2 votes
1 answer
341 views

FIDEDUPERANGE ioctl doesn't behave as expected on btrfs

According to ioctl_fideduperange, The maximum size of src_length is filesystem dependent and is typically 16 MiB. However, I've been able to use a src_length of > 1 GiB successfully with a single call ...
jrw32982's user avatar
  • 875
1 vote
1 answer
712 views

How to use rmlint to merge two large folders?

In exploring options to merge two folders, I've come across a very powerful tool known as rmlint. It has some useful documentation (and Gentle Guide). I have a scenario that I previously mentioned and ...
ylluminate's user avatar
1 vote
2 answers
950 views

Finding duplicate files using bash script

How do you write a bash one-liner that will find binary files with identical contents, permissions, and owner on the same ext4 file-system, from the current working directory recursively, and replace ...
Himadri Ganguly's user avatar
0 votes
1 answer
313 views

Find duplicate paragraphs in two files and delete one

I have two bib files; some of the entries are duplicates. The duplicate entries are paragraphs, or can be identified by the same pattern, e.g. a.bib looks like @InProceedings{Arranged, author = {...
John Chen's user avatar
1 vote
1 answer
31 views

Separately storing parts of text files and their reconstruction: symlinks with multiple targets?

I have two text files whose headers are different, while their contents are the same. $ cat original_file_v1 header 1 beginning header 1 contents header 1 end common contents line 1 common contents ...
BowPark's user avatar
  • 4,985
1 vote
1 answer
171 views

Hard link duplicate files based on just size

I'm currently running rdfind on a directory containing more than 4TB of files. Since the checksum part takes an inordinate amount of time I'm looking for alternatives. I am fairly certain that there ...
Martijn's user avatar
  • 125
2 votes
0 answers
127 views

dedup seems to be working, but free space says otherwise

I'm testing out dedup with zol on Debian. I've set dedup=on on the zvol before copying any data, then copied data to the mounted volume with no problems. Then I've run cp -r folder1/* folder2/ on test ...
laughing muppet's user avatar
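
When checking whether dedup is paying off, note that the usual free-space figures are pool-wide and don't directly show savings; the pool's dedupratio property does. A quick check (tank is a placeholder pool name):

    zpool get dedupratio tank
    zpool list tank        # the DEDUP column shows the same ratio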
2 votes
0 answers
547 views

Can I share a flatpak directory across hosts?

The context is more or less as follows: I run Distro Foo as my main driver. Inside that I have also set up a few chroots or schroots with Distro Bar, Distro Baz, and an older (or newer) version of ...
Luis Machuca's user avatar
0 votes
2 answers
125 views

Remove duplicated first fields of a CSV file

I am trying to remove repetitions of the same value in the first column of a CSV file without changing the other cell contents and alignment (in other columns). My txt: ACCIDENT EP 4 STEM PERCUS,, ...
Jf Viguie's user avatar
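
If the intent is to blank out repeated first-column values while leaving the rest of each row untouched, awk can rewrite just that field. A sketch that assumes a plain CSV without quoted commas (file.csv is a placeholder):

    awk -F, -v OFS=, '{ if (seen[$1]++) $1 = ""; print }' file.csv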
3 votes
2 answers
470 views

Remove duplicate lines from files recursively in place but leave one - make lines unique across files

I have many folders and folders contain files. The same line might appear multiple times in single file and/or in multiple files. Files are not sorted. So there are some lines duplicated across ...
Ikrom's user avatar
  • 131
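
GNU awk 4.1+ can edit files in place, and its "seen" array persists across all files of a single invocation, so the first occurrence anywhere survives and later repeats are removed, across files too. A sketch; the glob is a placeholder, and all files must be handled by one gawk run (batching through xargs would reset the array):

    gawk -i inplace '!seen[$0]++' folder/*/*.txt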
1 vote
1 answer
1k views

Append text file with command output, but replace the words that already exist (don't add the same text twice)

I am appending a command output to a text file. But if I do it again it will have the same text twice in the text file. Is there a way, for example with sed, so that if a word already exists it doesn't add a new ...
Danny's user avatar
  • 15
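
A common idiom is to test for an exact existing line before appending. A sketch where some_command and notes.txt are placeholders:

    line=$(some_command)
    grep -qxF -- "$line" notes.txt || printf '%s\n' "$line" >> notes.txt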
0 votes
1 answer
940 views

How to run Shredder Duplicate Finder (rmlint --gui) on Debian? ("Failed to load shredder: No module named 'shredder'")

I'd like to run the rmlint GUI (Shredder) on Debian10 but I get this error: Failed to load shredder: No module named 'shredder' This might be due to a corrupted install; try reinstalling.
mYnDstrEAm's user avatar
  • 4,480
2 votes
0 answers
274 views

Compare disks for missing files with differing directory structure

Apologies for the length of this post, I've tried to keep it as short as possible. I'm looking for a tool / method which, given two paths, would show which files are not present in one of the paths. The ...
Pete's user avatar
  • 133
5 votes
0 answers
751 views

Backing up a deduplicated BTRFS filesystem

I have some long-term data in a BTRFS volume. I've been using btrfs-dedupe to deduplicate my data and I'm able to save a tremendous amount of disk space between filesystem compression and ...
Naftuli Kay's user avatar
  • 40.8k
7 votes
2 answers
3k views

Is there a block-level storage file system?

I'm looking for a file system that stores files by block content, therefore similar files would only take one block. This is for backup purposes. It is similar to what block-level backup storage ...
MappaM's user avatar
  • 175
10 votes
1 answer
7k views

Is there a way to enable reflink on an existing XFS filesystem?

I currently have a 4TB RAID 1 setup on a small, personal Linux server, which is formatted as XFS in LVM. I am interested in enabling the reflink feature of XFS, but I did not do so when I first ...
TheSola10's user avatar
  • 221
4 votes
1 answer
1k views

How to copy multiple snapshots at once without duplicating data?

I have a live btrfs filesystem of 3.7TiB that's >90% full including old snapshots and a fresh 4TB backup harddisk. How to copy all existing snapshots to the backup harddisk? I tried # btrfs send ...
Daniel Böhmer's user avatar
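
For a chain of read-only snapshots, incremental send with -p reuses the data already present on the destination relative to the given parent, so shared extents are not re-sent. A sketch with placeholder snapshot names:

    # the first snapshot goes over in full
    btrfs send /mnt/src/.snapshots/2023-01-01 | btrfs receive /mnt/backup/
    # later snapshots send only the delta relative to the previous one
    btrfs send -p /mnt/src/.snapshots/2023-01-01 /mnt/src/.snapshots/2023-01-02 | btrfs receive /mnt/backup/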
2 votes
1 answer
68 views

Checking identical files in Linux and deleting according to location

I use fdupes to find and delete identical files. But I want to be able to say something like this ... find all the files that are duplicate in directory A or its subdirectories if there's a ...
interstar's user avatar
  • 1,097