How to Find Duplicate Files in Linux and Remove Them
Last updated August 31, 2017 by Ambarish Kumar
Brief: FSlint is a great GUI tool to find and remove duplicate files in Linux. FDUPES does the same job from the command line.
If, like me, you have the habit of downloading everything from the web, you will end up with multiple duplicate files. I often find the same songs or a bunch of images in different directories, or discover that I have backed up some files in two different places. Locating these duplicate files manually and deleting them to recover disk space is a pain.
If you want to save yourself this pain, there are various Linux applications that will help you locate and remove duplicate files. In this article, we cover how you can find and remove these files in Ubuntu.
Note: You should know what you are doing. If you are using a new tool, it’s always better to try it on a test directory structure to figure out what it does before letting it loose on your root or home folder. It’s also always a good idea to back up your Linux system!
FSlint: GUI tool to find and remove duplicate files
FSlint helps you search and remove duplicate files, empty directories or files with incorrect names. It has a command-line as well as GUI mode with a set of tools to perform a variety of tasks.
To install FSlint, type the below command in Terminal.
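On Ubuntu releases that still carry the package (FSlint depends on Python 2 and was dropped from the repositories of newer releases), the install command should be:

$ sudo apt install fslint

If apt reports that the package has no installation candidate, FSlint is not available for your release, and you will have to get it from the project page or use FDUPES instead.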
Open FSlint from the Dash search.
FSlint includes a number of options to choose from. There are options to find duplicate files, installed packages, bad names, name clashes, temp files, empty directories and so on. Choose the Search Path and the task you want to perform from the left panel, then click Find to locate the files. Once the scan is done, you can select the files you want to remove and click Delete.
You can click on any entry in the search results to open it if you are not sure and want to double-check it before deleting.
Under Advanced search parameters, you can define rules to exclude certain file types or skip directories which you don’t want to search.
FDUPES: CLI tool to find and remove duplicate files
FDUPES is a command-line utility to find and remove duplicate files in Linux. It can list duplicate files in a particular folder or search recursively within it. It asks which file to preserve before deleting the rest, and the noprompt option lets you delete all duplicates, keeping only the first one, without asking.
Installation on Debian / Ubuntu
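On Debian and Ubuntu, fdupes is available in the default repositories:

$ sudo apt install fdupes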
Installation on Fedora
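On Fedora, the equivalent is:

$ sudo dnf install fdupes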
Once installed, you can search for duplicate files using the command below:
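For example, to scan a single folder (the path below is just a placeholder; use your own directory):

$ fdupes ~/Downloads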
To search recursively within a folder, use the -r option:
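For example:

$ fdupes -r ~/Downloads    # same placeholder path as above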
This only lists the duplicate files and does not delete them by itself. You can delete the duplicate files manually or use the -d option:
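For example:

$ fdupes -d ~/Downloads    # placeholder path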
This won’t delete anything silently: it displays all the duplicate files and gives you the option to delete files one by one or select a range to delete. If you want to delete all duplicates without being asked, keeping only the first file in each set, use the noprompt option -N.
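A sketch of that non-interactive form (it deletes without any confirmation, so use it with care):

$ fdupes -dN ~/Downloads    # placeholder path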
FDUPES: finding and removing duplicate files
In the above screenshot, you can see the -d option showing all the duplicate files within the folder and asking you to select the files you want to preserve.
Final Words
There are many other ways and tools to find and delete duplicate files in Linux. Personally, I prefer the FDUPES command-line tool; it’s simple and uses hardly any resources.
How do you find and remove duplicate files on your Linux system? Do tell us in the comment section.
About Ambarish Kumar
DevOps Engineer by profession, believes in «Human Knowledge belongs to the world»!
Have a look at the tool rmlint, it is such a gem!
I have tried rdfind and fdupes extensively. Although I found them to be useful and robust, they were limited by considering files as being duplicates based mainly on their content. Without possible consideration of the name of the file. Then I discovered rmlint recently, and it offers this as an optional argument. And so much more! Very flexible and user-friendly. Different output formats, even duplicate folders … Just what I needed. Plus GUI.
I have separate hard drives that have many of the same songs. I would like to eliminate from one drive the songs that exist on both w/o doing it manually. Can rmlint identify the dupes between drives and eliminate them from only one drive?
From my experience thus far, this should definitely work. So that on drive_E the duplicate songs are eliminated, and on drive_K they are kept, be sure to specify them on the command line like …”Drive_E // Drive_K …” where the drive after the double slash signifies the preferred/tagged ‘original’ having priority.
For example, a composition of a terminal/bash command which I found useful:
rmlint --type="f" --addoutput=csv:rmlint.csv MapOnDrive_E // MapOnDrive_K | tee output.txt
where:
--type="f" specifies files only (not other ‘lint’ like broken symlinks, empty files, (empty) dirs etc.);
--addoutput=csv:rmlint.csv produces an extra .csv (spreadsheet) file named rmlint.csv with the names of the candidates to be removed. This is optional, but I like it;
| tee output.txt generates an extra .txt output file, in addition to the screen output. Also optional, and handy;
Perhaps superfluous to say: don’t worry that this command will delete anything, it does only a “dry-run” providing you with all the means to run the “real thing” next.
Try it out first with a small sample set, eg with both intended directories on another drive, in order to finetune things for your particular use-case. And glance through the options, given by the manpage for rmlint (man rmlint). It is also possible to filter/search for certain filenames/extensions.
The same should work through the GUI version of rmlint, but I have not yet used that.
What changes in the command string you provided that makes it work for real?
Not so easy to answer, as your configuration undoubtedly differs from mine. And this comment box is not that suited to code nor screen-shots. I noticed that the double dashes of the rmlint option ‘--types=’ had been replaced by a single dash, after submitting my reply! Therefore it remains essential to consult the man page, tutorial, and have some confidence with the command line in order to experiment a little.
More to the point: I realize I made a bad typo in the hurry … I wrongly listed the main option as ‘--type="f"’, but that should have been ‘--types="df"’. Sorry.
In order to simulate your particular scenario, I have used two USB sticks, named ‘USB_E’ (eliminate) and ‘USB_K’ (keep). On each I created a directory named ‘Songs’. In ‘USB_E/Songs’ I placed a set of 4 audio files: Song1.wav, Song2.wav, Song1.mp3, Song2.mp3. In ‘USB_K/Songs’ I placed an overlapping set of 4 audio files Song2.wav, Song3.wav, Song2.mp3, Song3.mp3. Thus, the dupes between these folders are: Song2.wav and Song2.mp3.
For this scenario, my basic commands to eliminate the dupes from the external ‘USB_E’, while keeping them on the external ‘USB_K’ are:
$ cd /home/paul/Tmp [enter]
$ rmlint --types="df" "/media/paul/USB_E/Songs" // "/media/paul/USB_K/Songs" [enter]
Note the double dashes before ‘types=’, and the use of quotation marks around the paths should these contain spaces. “/media/paul” is where my external drives are attached, as indicated in the file manager.
After running these commands, rmlint will show in its screen output exactly which files are the duplicates, and which of these it intends to keep, and which it intends to remove. Then, in order to actually carry out the job, run the executable script ‘rmlint.sh’ which rmlint has generated in the current working directory. Run this script as follows:
$ ./rmlint.sh [enter]
The script gives some brief info first, you can still easily abort at this stage. In order to proceed, just type any string key (e.g. a single ‘c’) followed by [enter] at the keyboard. After execution, the script will be removed automatically.
Checking the outcome, ‘USB_E/Songs’ now contains only Song1.wav and Song1.mp3, whereas ‘USB_K/Songs’ still contains Song2.wav, Song3.wav, Song2.mp3, Song3.mp3.
That’s it. It sounds more complicated than it is, really.
rmlint Has a lot of options for finetuning, most of which I have not explored myself. Hope this will now work in your case. Good luck!
I was going to go with rdfind, but your recommendation to rmlint is a wonderful alternative, I found it very useful, very well documented for what I required and easily available on the package manager.
Thanks!
Nice to read. To be fair, I find rdfind also very practical, and slightly prefer it over fdupes, but for some extra options/flexibility rmlint is hard to beat.
And I realized that often simple deduplication can be achieved using a GUI file comparison program (like Meld or Double Commander) by manually selecting the duplicates and then click Delete.
Yeah, I know, but in my case, where I just assumed I had good manual control over my repeated files and they’re a ton, rmlint was a very good one, that separation about letting me know what’s repeated and also building the script just in case I want to use it is a new perspective that I liked a lot for the bulky part.
Hi from Tübingen,S.Germany…………
When I try to install FSLINT:
sudo apt-get install fslint
Reading package lists… Done
Building dependency tree
Reading state information… Done
Package fslint is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package ‘fslint’ has no installation candidate
4 Useful Tools to Find and Delete Duplicate Files in Linux
Organizing your home directory or even system can be particularly hard if you have the habit of downloading all kinds of stuff from the internet.
Often you may find you have downloaded the same mp3, pdf or epub (and all kinds of other file extensions) and copied it to different directories. This may cause your directories to become cluttered with all kinds of useless duplicated stuff.
In this tutorial, you are going to learn how to find and delete duplicate files in Linux using rdfind and fdupes command-line tools, as well as using GUI tools called DupeGuru and FSlint.
A note of caution – always be careful what you delete on your system as this may lead to unwanted data loss. If you are using a new tool, first try it in a test directory where deleting files will not be a problem.
1. Rdfind – Finds Duplicate Files in Linux
Rdfind comes from “redundant data find”. It is a free tool used to find duplicate files across or within multiple directories. It uses checksums and finds duplicates based on file contents, not just names.
Rdfind uses an algorithm to classify the files and detect which of the duplicates is the original; the rest are considered duplicates. The ranking rules are:
- If A was found while scanning an input argument earlier than B, A is higher ranked.
- If A was found at a depth lower than B, A is higher ranked.
- If A was found earlier than B, A is higher ranked.
The last rule is used particularly when two files are found in the same directory.
To install rdfind in Linux, use the following command as per your Linux distribution.
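Package names may vary slightly, but on the major distributions something along these lines should work:

$ sudo apt install rdfind      # Debian, Ubuntu
$ sudo dnf install rdfind      # Fedora
$ sudo pacman -S rdfind        # Arch Linux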
To run rdfind on a directory, simply type rdfind followed by the target directory. Here is an example:
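(The path below is just a placeholder; point rdfind at the directory you want to scan.)

$ rdfind ~/Documents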
Find Duplicate Files in Linux
As you can see rdfind will save the results in a file called results.txt located in the same directory from where you ran the program. The file contains all the duplicate files that rdfind has found. You can review the file and remove the duplicate files manually if you want to.
Another thing you can do is use the -dryrun option, which will provide a list of duplicates without taking any action:
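For example:

$ rdfind -dryrun true ~/Documents    # placeholder path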
When you find the duplicates, you can choose to replace them with hard links.
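This is done with the -makehardlinks option, for example:

$ rdfind -makehardlinks true ~/Documents    # placeholder path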
And if you wish to delete the duplicates, you can run:
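$ rdfind -deleteduplicates true ~/Documents    # placeholder path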
To check other useful options of rdfind, consult its manual page with:
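$ man rdfind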
2. Fdupes – Scan for Duplicate Files in Linux
Fdupes is another program that allows you to identify duplicate files on your system. It is free and open-source and written in C. It uses the following methods to determine duplicate files:
- Comparing partial md5sum signatures
- Comparing full md5sum signatures
- Byte-by-byte comparison verification
Just like rdfind it has similar options:
- Search recursively
- Exclude empty files
- Show the size of duplicate files
- Delete duplicates immediately
- Exclude files with a different owner
To install fdupes in Linux, use the following command as per your Linux distribution.
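As with rdfind, fdupes is in the standard repositories of the major distributions:

$ sudo apt install fdupes      # Debian, Ubuntu
$ sudo dnf install fdupes      # Fedora
$ sudo pacman -S fdupes        # Arch Linux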
Fdupes syntax is similar to rdfind. Simply type the command followed by the directory you wish to scan.
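For example (the path is a placeholder):

$ fdupes ~/Music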
To search files recursively, you will have to specify the -r option, like this:
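$ fdupes -r ~/Music    # placeholder path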
You can also specify multiple directories and mark a specific directory to be searched recursively:
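A sketch of this, assuming the uppercase -R option from the fdupes man page, which applies recursion only to the directories listed after it (both paths are placeholders):

$ fdupes ~/Music -R ~/Backups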
To have fdupes calculate the size of the duplicate files use the -S option.
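For example:

$ fdupes -S ~/Music    # placeholder path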
To gather summarized information about the found files use the -m option.
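For example:

$ fdupes -m ~/Music    # placeholder path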
Scan Duplicate Files in Linux
Finally, if you want to delete all duplicates, use the -d option, like this:
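$ fdupes -d ~/Music    # placeholder path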
Fdupes will ask which of the found files to delete. You will need to enter the file number:
Delete Duplicate Files in Linux
A solution that is definitely not recommended is to use the -N option together with -d, which will preserve only the first file in each set and delete the rest without asking.
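For completeness, a sketch of that form (it keeps only the first file in each set and deletes the rest without confirmation):

$ fdupes -dN ~/Music    # placeholder path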
To get a list of the available fdupes options, review the help page by running:
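$ fdupes --help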
3. dupeGuru – Find Duplicate Files in Linux
dupeGuru is an open-source, cross-platform tool that can be used to find duplicate files on a Linux system. The tool can scan either filenames or content in one or more folders. It can also find filenames that are similar to the one you are searching for.
dupeGuru comes in different versions for Windows, Mac, and Linux. Its quick fuzzy-matching algorithm helps you find duplicate files within minutes. It is customizable: you can pull out exactly the duplicate files you want and wipe unwanted files from the system.
To install dupeGuru in Linux, use the following command as per your Linux distribution.
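dupeGuru is usually not in the default repositories. At the time this article was written, a common route on Ubuntu was the developer's PPA shown below; treat it as an assumption that may no longer be maintained, and check the project's website for the currently recommended install method:

$ sudo add-apt-repository ppa:hsoft/ppa
$ sudo apt-get update
$ sudo apt-get install dupeguru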
DupeGuru – Find Duplicate Files in Linux
4. FSlint – Duplicate File Finder for Linux
FSlint is a free utility that is used to find and clean various forms of lint on a filesystem. It also reports duplicate files, empty directories, temporary files, duplicate/conflicting (binary) names, bad symbolic links and many more. It has both command-line and GUI modes.
To install FSlint in Linux, use the following command as per your Linux distribution.
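On Debian and Ubuntu releases that still carry the package (as noted earlier, newer releases have dropped FSlint because it depends on Python 2):

$ sudo apt install fslint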
FSlint – Duplicate File Finder for Linux
Conclusion
These are very useful tools for finding duplicate files on your Linux system, but you should be very careful when deleting such files.
If you are unsure whether you need a file or not, it is better to create a backup of that file and remember its directory prior to deleting it. If you have any questions or comments, please submit them in the comment section below.