Finding Duplicate Files in Linux

Contents

4 Useful Tools to Find and Delete Duplicate Files in Linux
1. Rdfind – Finds Duplicate Files in Linux
2. Fdupes – Scan for Duplicate Files in Linux
3. dupeGuru – Find Duplicate Files in a Linux
4. FSlint – Duplicate File Finder for Linux
fdupes – A Command Line Tool to Find and Delete Duplicate Files in Linux
What is fdupes?
Install fdupes on a Linux
How to use fdupes command?
Finding Duplicate Files in Linux
Finding Duplicate Files in Linux

4 Useful Tools to Find and Delete Duplicate Files in Linux

Organizing your home directory or even system can be particularly hard if you have the habit of downloading all kinds of stuff from the internet.

Often you may find you have downloaded the same mp3, pdf, or epub (or any other type of file) and copied it to different directories. This can leave your directories cluttered with useless duplicates.

In this tutorial, you are going to learn how to find and delete duplicate files in Linux using the rdfind and fdupes command-line tools, as well as the GUI tools dupeGuru and FSlint.

A note of caution – always be careful what you delete on your system as this may lead to unwanted data loss. If you are using a new tool, first try it in a test directory where deleting files will not be a problem.

1. Rdfind – Finds Duplicate Files in Linux

Rdfind comes from redundant data find. It is a free tool used to find duplicate files across or within multiple directories. It uses checksums and finds duplicates based on file content, not just file names.

Rdfind uses a ranking algorithm to decide which file in each set of identical files is the original and treats the rest as duplicates. The ranking rules are:

If A was found while scanning an input argument earlier than B, A is higher ranked.
If A was found at a depth lower than B, A is higher ranked.
If A was found earlier than B, A is higher ranked.

The last rule is used particularly when two files are found in the same directory.

To install rdfind in Linux, use the following command as per your Linux distribution.
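The commands for each distribution can be sketched as follows. This is a minimal sketch, not taken from the original article: it assumes the package is named rdfind in each repository and that CentOS/RHEL gets it from EPEL. It only prints the command it would suggest and does not install anything; the variable name rdfind_install_cmd is made up for illustration.

```shell
# Pick an install command based on which package manager is on PATH.
# Nothing is installed here; the command is only printed.
if command -v apt-get >/dev/null 2>&1; then
    rdfind_install_cmd="sudo apt-get install rdfind"
elif command -v dnf >/dev/null 2>&1; then
    rdfind_install_cmd="sudo dnf install rdfind"
elif command -v yum >/dev/null 2>&1; then
    # Assumption: on CentOS/RHEL the package comes from the EPEL repository.
    rdfind_install_cmd="sudo yum install epel-release && sudo yum install rdfind"
elif command -v pacman >/dev/null 2>&1; then
    rdfind_install_cmd="sudo pacman -S rdfind"
else
    rdfind_install_cmd="(no supported package manager detected)"
fi
echo "$rdfind_install_cmd"
```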

To run rdfind on a directory simply type rdfind and the target directory. Here is an example:

Find Duplicate Files in Linux

Rdfind saves the results in a file called results.txt in the directory from which you ran the program. The file lists all the duplicate files that rdfind has found. You can review the file and remove the duplicates manually if you want to.

You can also use the -dryrun option, which lists the duplicates without taking any action.
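A dry run is easiest to try on disposable data. The sketch below builds a throwaway directory (the file names are hypothetical) and scans it only if rdfind is installed. It runs from inside the directory so that any results file lands there rather than in your working directory.

```shell
# Throwaway test data: a.txt and b.txt are identical, c.txt is not.
demo_dir="$(mktemp -d)"
printf 'same content\n' > "$demo_dir/a.txt"
cp "$demo_dir/a.txt" "$demo_dir/b.txt"
printf 'something else\n' > "$demo_dir/c.txt"

if command -v rdfind >/dev/null 2>&1; then
    # -dryrun true reports what would be done without changing any file.
    (cd "$demo_dir" && rdfind -dryrun true .)
else
    echo "rdfind is not installed; skipping the scan"
fi
```

After the dry run, both a.txt and b.txt should still be present, since nothing is modified.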

When you find the duplicates, you can replace them with hard links using the -makehardlinks option, or delete them using the -deleteduplicates option.
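Both actions can be tried safely on a scratch directory first. The following sketch (hypothetical file names, run only if rdfind is installed) replaces a duplicate with a hard link; the delete variant is left as a comment because it is destructive.

```shell
# Throwaway test data with one pair of identical files.
demo_dir="$(mktemp -d)"
printf 'same content\n' > "$demo_dir/a.txt"
cp "$demo_dir/a.txt" "$demo_dir/b.txt"

if command -v rdfind >/dev/null 2>&1; then
    # Replace the lower-ranked duplicate with a hard link to the original.
    (cd "$demo_dir" && rdfind -makehardlinks true .)
    # ls -li shows inode numbers; hard-linked files share the same one.
    ls -li "$demo_dir"
    # Destructive alternative, only after reviewing a dry run:
    #   rdfind -deleteduplicates true "$demo_dir"
else
    echo "rdfind is not installed; skipping"
fi
```

With hard links, both names remain usable and the content is stored once; with -deleteduplicates, only the highest-ranked file of each set remains.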

To check other useful options of rdfind, see its manual page (man rdfind).

2. Fdupes – Scan for Duplicate Files in Linux

Fdupes is another program that allows you to identify duplicate files on your system. It is free and open-source and written in C. It uses the following methods to determine duplicate files:

Comparing partial md5sum signatures
Comparing full md5sum signatures
Byte-by-byte comparison for verification

Like rdfind, it offers options to:

Search recursively
Exclude empty files
Show the size of duplicate files
Delete duplicates immediately
Exclude files with a different owner

To install fdupes in Linux, use the following command as per your Linux distribution.

Fdupes syntax is similar to rdfind. Simply type the command followed by the directory you wish to scan.

To search files recursively, specify the -r option.

You can also specify multiple directories and choose which of them is searched recursively.

To have fdupes calculate the size of the duplicate files use the -S option.

To gather summarized information about the found files use the -m option.
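The options above can be tried together on a small scratch tree. This is a sketch under assumptions: the directory layout and file names are made up, and the commands run only if fdupes is installed. The -R option (recurse only into directories listed after it) comes from the fdupes manual page rather than from the text above.

```shell
# Scratch tree: the duplicate lives two levels below the top directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/music" "$demo_dir/backup/old"
printf 'same track\n' > "$demo_dir/music/song.mp3"
cp "$demo_dir/music/song.mp3" "$demo_dir/backup/old/song-copy.mp3"

if command -v fdupes >/dev/null 2>&1; then
    fdupes "$demo_dir/music"                        # one directory, not recursive
    fdupes -r "$demo_dir"                           # recurse into all subdirectories
    fdupes "$demo_dir/music" -R "$demo_dir/backup"  # recurse only into dirs after -R
    fdupes -rS "$demo_dir"                          # also print duplicate sizes
    fdupes -rm "$demo_dir"                          # print a summary only
else
    echo "fdupes is not installed; skipping"
fi
```

None of these commands modifies files; they only report.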

Scan Duplicate Files in Linux

Finally, if you want to delete duplicates, use the -d option.

Fdupes will ask which of the found files to delete. You will need to enter the file number:

Delete Duplicate Files in Linux

Combining -d with the -N option keeps only the first file in each set and deletes the rest without prompting. This is definitely not recommended.
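To see what deletion does without risking real data, it can be run against a throwaway directory. The sketch below uses hypothetical file names and runs only if fdupes is installed. It uses the non-interactive -dN form solely because the test data is disposable; on real files the interactive -d prompt is the safer choice.

```shell
# Disposable data: two identical copies and one unique file.
demo_dir="$(mktemp -d)"
printf 'same data\n' > "$demo_dir/copy1.txt"
cp "$demo_dir/copy1.txt" "$demo_dir/copy2.txt"
printf 'unique data\n' > "$demo_dir/unique.txt"

if command -v fdupes >/dev/null 2>&1; then
    # Interactive form (asks which file of each set to keep):
    #   fdupes -d "$demo_dir"
    # Non-interactive form: keep the first file of each set, delete the others.
    fdupes -dN "$demo_dir"
else
    echo "fdupes is not installed; nothing deleted"
fi
ls "$demo_dir"
```

The unique file is never touched, and at least one copy of each duplicate set always survives.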

To get a list of the options available with fdupes, review the help page by running it with the -h option.

3. dupeGuru – Find Duplicate Files in a Linux

dupeGuru is an open-source, cross-platform tool that can be used to find duplicate files on a Linux system. The tool can scan either filenames or content in one or more folders. It can also find filenames that are similar to the ones you are searching for.

dupeGuru comes in versions for Windows, Mac, and Linux. Its fuzzy matching algorithm helps you find duplicate files quickly. It is customizable: you can pick exactly the duplicates you want and wipe out unwanted files from the system.

To install dupeGuru in Linux, use the following command as per your Linux distribution.

DupeGuru – Find Duplicate Files in Linux

4. FSlint – Duplicate File Finder for Linux

FSlint is a free utility that is used to find and clean various forms of lint on a filesystem. It also reports duplicate files, empty directories, temporary files, duplicate/conflicting (binary) names, bad symbolic links and many more. It has both command-line and GUI modes.

To install FSlint in Linux, use the following command as per your Linux distribution.

FSlint – Duplicate File Finder for Linux

Conclusion

These four tools are useful for finding duplicate files on your Linux system, but you should be very careful when deleting such files.

If you are unsure whether you need a file, create a backup of it and note its original directory before deleting it.


fdupes – A Command Line Tool to Find and Delete Duplicate Files in Linux

Most computer users need to find and remove duplicate files at some point. Finding and removing duplicate files is a tiresome job that demands time and patience, but it can be very easy if your machine runs GNU/Linux, thanks to the ‘fdupes’ utility.

Fdupes – Find and Delete Duplicate Files in Linux

What is fdupes?

Fdupes is a Linux utility written by Adrian Lopez in the C programming language and released under the MIT License. It finds duplicate files in a given set of directories and sub-directories. Fdupes recognizes duplicates by comparing the MD5 signatures of files, followed by a byte-by-byte comparison. Many options can be passed to fdupes to list duplicates, delete them, or replace them with hard links.

The comparison starts in the order:

size comparison > Partial MD5 Signature Comparison > Full MD5 Signature Comparison > Byte-to-Byte Comparison.
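The staged comparison can be approximated with standard tools. The sketch below is not how fdupes is implemented. It assumes GNU find, md5sum, xargs and uniq, assumes file names without whitespace, and skips the partial-MD5 stage. The directory and file names are made up for illustration.

```shell
# Demo data: two true duplicates, one file of the same size with different
# content, and one file of a different size.
demo_dir="$(mktemp -d)"
printf 'duplicate payload\n' > "$demo_dir/one.txt"
cp "$demo_dir/one.txt" "$demo_dir/two.txt"
printf 'different payload\n' > "$demo_dir/three.txt"
printf 'x\n' > "$demo_dir/four.txt"

# Stage 1 (size): keep only files whose size is shared with another file.
size_candidates="$(find "$demo_dir" -type f -printf '%s %p\n' |
    awk '{ seen[$1]++; size[NR] = $1; path[NR] = $2 }
         END { for (i = 1; i <= NR; i++) if (seen[size[i]] > 1) print path[i] }')"

# Stage 2 (full MD5): hash the candidates and keep the lines whose first
# 32 characters (the hash) occur more than once.
hash_groups="$(printf '%s\n' "$size_candidates" | xargs -r md5sum | sort | uniq -w32 -D)"

# Stage 3 (byte-by-byte): confirm every hash match with cmp.
dup_count=0
prev_hash=""
prev_file=""
while read -r hash file; do
    if [ -n "$hash" ] && [ "$hash" = "$prev_hash" ]; then
        if cmp -s "$prev_file" "$file"; then
            echo "duplicate: $file (same as $prev_file)"
            dup_count=$((dup_count + 1))
        fi
    else
        prev_hash="$hash"
        prev_file="$file"
    fi
done <<EOF
$hash_groups
EOF
```

Each stage is cheaper than the next, so most files are ruled out before any full read is needed: four.txt drops out on size alone, three.txt on its hash, and only one.txt and two.txt reach the byte-by-byte check.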

Install fdupes on a Linux

Installing the latest version of fdupes (version 1.51 at the time of writing) is as easy as running the following command on Debian-based systems such as Ubuntu and Linux Mint.

On CentOS/RHEL and Fedora based systems, you need to enable the EPEL repository to install the fdupes package.

Note: The default package manager yum is replaced by dnf from Fedora 22 onwards…
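The installation commands for these families can be sketched as below. This is an illustration, not the original article's listing: it assumes the package is named fdupes everywhere, reads the distribution ID from /etc/os-release, and only prints the suggested command. The variable names are hypothetical.

```shell
# Read the distribution ID from /etc/os-release, if the file exists.
if [ -r /etc/os-release ]; then
    distro_id="$(. /etc/os-release && printf '%s' "$ID")"
else
    distro_id="unknown"
fi

# Map the ID to an install command; nothing is installed here.
case "$distro_id" in
    debian|ubuntu|linuxmint)
        fdupes_install_cmd="sudo apt-get install fdupes" ;;
    centos|rhel)
        # Assumption: fdupes comes from the EPEL repository on these systems.
        fdupes_install_cmd="sudo yum install epel-release && sudo yum install fdupes" ;;
    fedora)
        fdupes_install_cmd="sudo dnf install fdupes" ;;
    *)
        fdupes_install_cmd="(check your package manager for an fdupes package)" ;;
esac
echo "$fdupes_install_cmd"
```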

How to use fdupes command?

1. For demonstration purposes, let’s create a few duplicate files under a directory (say tecmint).

After running the command, verify that the duplicate files were created using the ls command.

The script creates 15 files named tecmint1.txt, tecmint2.txt … tecmint15.txt, and every file contains the same data.
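A comparable set of test files can be generated as follows. This is a sketch: the temporary location and the placeholder file content are assumptions, since the original command and text are not reproduced here.

```shell
# Create a scratch "tecmint" directory holding 15 files with identical content.
demo_dir="$(mktemp -d)/tecmint"
mkdir -p "$demo_dir"
for i in $(seq 1 15); do
    # Placeholder content; any identical text makes the files duplicates.
    printf 'placeholder duplicate content\n' > "$demo_dir/tecmint$i.txt"
done

# Verify that the files were created.
ls "$demo_dir"
file_count="$(ls "$demo_dir" | wc -l | tr -d ' ')"
echo "created $file_count files"
```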

2. Now search for duplicate files within the folder tecmint.

3. Search for duplicates recursively in a directory, including its sub-directories, using the -r option.

It searches across all the files and folders recursively; depending on the number of files and folders, the scan can take some time. In the meantime, the terminal shows the overall progress.

4. See the size of duplicates found within a folder using the -S option.

5. You can see the size of duplicate files for every directory and subdirectories encountered within using the -S and -r options at the same time, as:

6. Instead of searching one folder, or all folders recursively, you can search in two or three specific folders as required. You can also add the -S and/or -r options if needed.

7. To delete the duplicate files while preserving one copy, use the ‘-d’ option. Take extra care with this option, or you might end up losing necessary files or data; the deletion cannot be undone.

All the duplicates are listed and you are prompted to delete them, either one by one, by range, or all in one go. You can enter a range to delete a specific set of files.

8. For safety, you may prefer to write the output of ‘fdupes’ to a file and then review that text file to decide which files to delete. This reduces the chance of deleting a file accidentally.

Note: You may replace ‘/home’ with your desired folder. Also use the ‘-r’ and ‘-S’ options if you want to search recursively and print sizes, respectively.
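Writing the report to a file might look like the sketch below. It scans a throwaway directory instead of /home, and the directory, file names and report path are all hypothetical. The scan runs only if fdupes is installed.

```shell
# Disposable data to scan.
demo_dir="$(mktemp -d)"
printf 'same data\n' > "$demo_dir/first.txt"
cp "$demo_dir/first.txt" "$demo_dir/second.txt"

# Keep the report outside the scanned directory.
report="$(mktemp)"
if command -v fdupes >/dev/null 2>&1; then
    # -r searches recursively and -S prints the size of each duplicate set.
    fdupes -rS "$demo_dir" > "$report"
else
    echo "fdupes is not installed; no scan was run" > "$report"
fi

# Review the report before deciding what to delete.
cat "$report"
```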

9. You may omit the first file from each set of matches by using option ‘-f’.

First, list the files in the directory.

and then omit the first file from each set of matches.
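The two steps can be tried on scratch data as sketched here. The file names are hypothetical and the fdupes call runs only if the tool is installed.

```shell
# Three identical copies in a throwaway directory.
demo_dir="$(mktemp -d)"
printf 'same data\n' > "$demo_dir/copy1.txt"
cp "$demo_dir/copy1.txt" "$demo_dir/copy2.txt"
cp "$demo_dir/copy1.txt" "$demo_dir/copy3.txt"

# Step 1: list the files in the directory.
ls "$demo_dir"

# Step 2: list duplicates, omitting the first file of each set.
if command -v fdupes >/dev/null 2>&1; then
    fdupes -f "$demo_dir"
else
    echo "fdupes is not installed; skipping"
fi
remaining="$(ls "$demo_dir" | wc -l | tr -d ' ')"
```

The -f option only changes what is printed: with three copies it lists two of them, and no file is removed.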

10. Check installed version of fdupes.

11. If you need any help on fdupes you may use switch ‘-h’.

That’s all for now. I am working on an article about another utility for removing duplicate files, fslint, and will post it soon.


Finding Duplicate Files in Linux

Duplicate files can appear when you save backups to disk, edit several versions of the same file at once, or reorganize your directory structure. The same files may be saved several times under different names or in different folders, where they only clutter disk space.

Hunting them down every time can become a real chore. Fortunately, there is a small utility that can save you the time spent finding and removing duplicate files on your computer: FSlint. It is written in Python. Time to tidy up and delete old files.

Finding Duplicate Files in Linux

You can install the utility from the official repositories of most Linux distributions. Let's look at an example for Ubuntu. First, update the package lists:

sudo apt update

Then install the utility:

sudo apt install fslint

Once the installation is complete, you can launch the utility from the main menu.

In the program's main window you can choose among various checks for file system problems. The duplicate search is selected by default. You still need to configure the folders to be searched; by default only the home folder is added.

After selecting the directories, start the duplicate search by clicking the Find button. The utility immediately begins listing the duplicates it finds.

When the search finishes, you can delete the files you do not need: select them with the mouse and click the Delete button. The program asks for confirmation and then deletes the file.

You can also merge duplicate files using hard links. When you click the Merge button, the utility merges all files except the selected ones. In addition, the utility can search for incompatible file names, temporary files, bad links, empty directories and much more. Experiment with it if you like.
