Linux compare two lists

Bash compare two lists find missing items

Below are two example file listings. I need to compare two list files (each one a list of file paths) by file name only, that is, by the characters to the right of the last "/" in each record.

If the file NAME is not found I need the entire row sent to a third file as output.

These are file listings; there might be three files in the second listing and two thousand in the first.

ONE:
1 /home/dev/share/Datafiles/cases.dbf
2 /home/dev/share/Datafiles/cells.csv
3 /home/dev/share/Datafiles/clusters.db
4 /home/dev/share/Datafiles/competition.csv
5 /home/dev/share/Datafiles/coplot.csv
6 /home/dev/share/Datafiles/daphnia.csv
7 /home/dev/share/Datafiles/das.txt
8 /home/dev/share/Datafiles/deaths.sas7bdat
9 /home/dev/share/Datafiles/decay.csv
10 /home/dev/share/Datafiles/example.db
11 /home/dev/share/Datafiles/fertyield.lst
12 /home/dev/share/Datafiles/fisher.csv

TWO:
1 /test/kitchen/cooks/transfer/cases.dbf
2 /test/kitchen/cooks/transfer/cells.csv
3 /test/kitchen/cooks/transfer/clusters.db
4 /test/kitchen/cooks/transfer/coplot.csv
5 /test/kitchen/cooks/transfer/das.txt
6 /test/kitchen/cooks/transfer/deaths.sas7bdat
7 /test/kitchen/cooks/transfer/decay.csv
8 /test/kitchen/cooks/transfer/example.db
9 /test/kitchen/cooks/transfer/fertyield.lst
10 /test/kitchen/cooks/transfer/fisher.csv

Two files that exist in listing ONE are not found in listing TWO: "competition.csv" (#4) and "daphnia.csv" (#6).

Sorting the files does not work: file paths can be very short or very long, and multiple copies of a file can be found in multiple directories.

comm/diff/cmp produced unsatisfactory results, as I'm only looking at the last "X" characters (the file name and extension) at the RIGHT of each row.
(In Microsoft Excel I would simply extract everything to the right of the last "/", row by row, save it to another list and VLOOKUP that list against the first list.)

But this is not a Microsoft installation.

A script to awk in the contents of list (file) two, and search through list (file) one, output not matching to file three?

Also, stripping out the directory names with sed to leave only two lists of file names has been difficult: I don't know which paths I'd be replacing, as they would differ every time. I played around with cut, but the start of the file name could be anywhere from column 10 to column 150. My intuition is that there has to be a way to isolate all characters to the right of the last "/" in the file path.
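One possible way to do this, sketched with awk: using "/" as the field separator makes the last field ($NF) exactly the text to the right of the last "/". The function name missing_by_name and the file names list1.txt, list2.txt and list3.txt are made up for illustration, and the match is exact (case-sensitive, with no trimming of trailing spaces or carriage returns).

```shell
#!/usr/bin/env bash
# missing_by_name LIST_ONE LIST_TWO   (hypothetical helper name)
# Print every row of LIST_ONE whose file name (the text after the last "/")
# does not occur as a file name anywhere in LIST_TWO.
missing_by_name() {
    awk -F/ '
        # While reading the first argument (LIST_TWO): remember each file name.
        FILENAME == ARGV[1] { if (NF) seen[$NF] = 1; next }
        # While reading LIST_ONE: print non-blank rows whose name was never seen.
        NF && !($NF in seen)
    ' "$2" "$1"
}

# Usage (placeholder file names):
#   missing_by_name list1.txt list2.txt > list3.txt
```

With the two listings above saved as list1.txt and list2.txt, this should write rows 4 (competition.csv) and 6 (daphnia.csv) to list3.txt. Sorting is not needed, and it does not matter how long the directory part of each path is.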


How to compare 2 lists of ranges in bash?

Using a bash script (Ubuntu 16.04), I'm trying to compare 2 lists of ranges: does any number in any of the ranges in file1 coincide with any number in any of the ranges in file2? If so, print the row in the second file. Here I have each range as 2 tab-delimited columns (in file1, row 1 represents the range 1-4, i.e. 1, 2, 3, 4). The real files are quite big.


My best attempt has been:

This returns an empty file.

I’m thinking that the script will need to involve range comparisons using if-then conditions and iterate through each line in both files. I’ve found examples of each concept, but can’t figure out how to combine them.

Any help appreciated!


It depends on how big your files are, of course. If they are not big enough to exhaust the memory, you can try this 100% bash solution:

This is just a starting point. There are many possible performance / memory footprint improvements. But they strongly depend on the sizes of your files and on the distributions of your ranges.
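As a rough illustration of what a pure-bash approach can look like (this is a sketch, not the answer's original code): load the ranges of file1 into arrays, then test each range of file2 against them. It assumes two whitespace-separated columns of plain decimal integers with start <= end; the function name overlapping_ranges is made up.

```shell
#!/usr/bin/env bash
# overlapping_ranges FILE1 FILE2   (hypothetical helper name)
# Print each range of FILE2 that shares at least one number with a range of FILE1.
overlapping_ranges() {
    local -a starts=() ends=()
    local s e i
    # Load every range of FILE1 into memory.
    while read -r s e _ || [[ -n $s ]]; do
        [[ -n $s && -n $e ]] || continue
        starts+=("$s"); ends+=("$e")
    done < "$1"
    # Check every range of FILE2 against the stored ranges.
    while read -r s e _ || [[ -n $s ]]; do
        [[ -n $s && -n $e ]] || continue
        for i in "${!starts[@]}"; do
            # Two closed ranges overlap iff each one starts before the other ends.
            if (( s <= ends[i] && starts[i] <= e )); then
                printf '%s\t%s\n' "$s" "$e"
                break   # print each FILE2 row at most once
            fi
        done
    done < "$2"
}
```

The nested loop is quadratic in the number of ranges, which is why the size of the files matters so much here.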

EDIT 1: improved the range overlap test.

EDIT 2: reused the excellent optimization proposed by RomanPerekhrest (unset already printed ranges from file2). The performance should be better when the probability that ranges overlap is high.

EDIT 3: performance comparison with the awk version proposed by RomanPerekhrest (after fixing the initial small bugs): awk is between 10 and 20 times faster than bash on this problem. If performance is important and you hesitate between awk and bash, prefer awk.
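For comparison, an awk version of the same overlap test might look like the sketch below. It is not RomanPerekhrest's code; it assumes two numeric columns per line with start <= end, and the function name overlapping_ranges_awk is made up.

```shell
#!/usr/bin/env bash
# overlapping_ranges_awk FILE1 FILE2   (hypothetical helper name)
# Print each row of FILE2 whose range overlaps any range of FILE1.
overlapping_ranges_awk() {
    awk '
        NF < 2 { next }                       # skip blank or malformed lines
        # First argument (FILE1): store the bounds as numbers.
        FILENAME == ARGV[1] { lo[++n] = $1 + 0; hi[n] = $2 + 0; next }
        # FILE2: print the row at the first overlapping FILE1 range.
        {
            for (i = 1; i <= n; i++)
                if ($1 + 0 <= hi[i] && lo[i] <= $2 + 0) { print; break }
        }
    ' "$1" "$2"
}
```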


Fastest way to tell if two files have the same contents in Unix/Linux?

I have a shell script in which I need to check whether two files contain the same data or not. I do this for a lot of files, and in my script the diff command seems to be the performance bottleneck.

Could there be a faster way to compare the files, maybe a custom algorithm instead of the default diff?


I believe cmp will stop at the first byte difference:
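A minimal sketch of that idea: cmp -s (the POSIX spelling of GNU's --silent) prints nothing and reports the result only through its exit status. The function name files_match and the file names in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# files_match FILE1 FILE2   (hypothetical helper name)
# Succeeds (exit status 0) only when both files are byte-for-byte identical.
files_match() {
    cmp -s -- "$1" "$2"
}

# Usage (placeholder file names):
#   files_match file1 file2 || echo "files are different"
```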

Like @Alex Howansky, I have used 'cmp --silent' for this. But I need both a positive and a negative response, so I use:

I can then run this in the terminal or with a ssh to check files against a constant file.

To quickly and safely compare any two files:
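Something along these lines, as a sketch: the quoted expansions and the "--" keep unusual file names from being split or read as options. The function name compare_files is made up, and note that it also reports "different" when cmp fails, for example because a file is missing.

```shell
#!/usr/bin/env bash
# compare_files FILE1 FILE2   (hypothetical helper name)
compare_files() {
    if cmp -s -- "$1" "$2"; then
        echo "same"
    else
        echo "different"
    fi
}
```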

It's readable, efficient, and works for any file names, including "` $()

Because I suck and don’t have enough reputation points I can’t add this tidbit in as a comment.

But, if you are going to use the cmp command (and don’t need/want to be verbose) you can just grab the exit status. Per the cmp man page:

If a FILE is '-' or missing, read standard input. Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.

So, you could do something like:
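For instance, a sketch that maps the three documented exit statuses to messages (the function name cmp_status and the message texts are made up):

```shell
#!/usr/bin/env bash
# cmp_status FILE1 FILE2   (hypothetical helper name)
cmp_status() {
    cmp -s -- "$1" "$2"
    case $? in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "trouble" ;;    # status 2: e.g. a file is missing or unreadable
    esac
}
```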

EDIT: Thanks for the comments everyone! I updated the test syntax here. However, I would suggest you use Vasili’s answer if you are looking for something similar to this answer in readability, style, and syntax.


For files that are not different, any method will require having read both files entirely, even if the read was in the past.

There is no alternative. So creating hashes or checksums at some point in time requires reading the whole file. Big files take time.

File metadata retrieval is much faster than reading a large file.

So, is there any file metadata you can use to establish that the files are different? File size? Or even the results of the file command, which reads only a small portion of the file?

File size example code fragment:
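A sketch of such a check, assuming GNU coreutils stat (the -c %s format is GNU-specific; BSD stat uses different options). The function names file_size and quick_same are made up.

```shell
#!/usr/bin/env bash
# file_size FILE   (hypothetical helper name)
# Print the size in bytes from the file's metadata, without reading its contents.
file_size() {
    stat -c %s -- "$1"      # GNU stat; an assumption about the platform
}

# quick_same FILE1 FILE2   (hypothetical helper name)
# Compare sizes first; fall back to a full byte comparison only when they match.
quick_same() {
    [ "$(file_size "$1")" = "$(file_size "$2")" ] && cmp -s -- "$1" "$2"
}
```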

If the files are the same size then you are stuck with full file reads.


Comparing the contents of two directories

I have two directories that should contain the same files and have the same directory structure.

I think that something is missing in one of these directories.

Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?


You can use the diff command just as you would use it for files:

If you want to compare subdirectories and the files in them too, you can use the -r option:
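As a sketch, the two variants wrapped in functions (the function names, and dir1 and dir2 in the usage lines, are placeholders):

```shell
#!/usr/bin/env bash
# compare_top_level DIR1 DIR2: compare only the entries directly inside each directory.
compare_top_level() {
    diff -- "$1" "$2"
}

# compare_recursive DIR1 DIR2: also descend into subdirectories.
compare_recursive() {
    diff -r -- "$1" "$2"
}

# Usage (placeholder directory names):
#   compare_top_level dir1 dir2
#   compare_recursive dir1 dir2
```

Entries that exist on one side only are reported as "Only in ..." lines; diff exits with status 1 when it finds any difference.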

A good way to do this comparison is to use find with md5sum, then a diff.

Example

Use find to list all the files in the directory then calculate the md5 hash for each file and pipe it sorted by filename to a file:

Do the same procedure for the other directory:

Then compare the two resulting files with diff:
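The three steps above might look like this, as a sketch: the function name hash_listing and the names dir1, dir2, dir1.txt and dir2.txt are placeholders.

```shell
#!/usr/bin/env bash
# hash_listing DIR   (hypothetical helper name)
# Print "md5  path" for every regular file under DIR, sorted by path (field 2).
hash_listing() {
    find "$1" -type f -exec md5sum {} + | sort -k 2
}

# Usage (placeholder names):
#   hash_listing dir1 > dir1.txt
#   hash_listing dir2 > dir2.txt
#   diff -u dir1.txt dir2.txt
```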

Or as a single command using process substitution:

If you want to see only the changes:

The cut command prints only the hash (first field) to be compared by diff. Otherwise diff will print every line as the directory paths differ even when the hash is the same.
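Put together, the hash-only comparison could be sketched as follows, using bash process substitution so that no intermediate files are needed. The function names hashes_only and dirs_hash_diff are made up.

```shell
#!/usr/bin/env bash
# hashes_only DIR   (hypothetical helper name)
# MD5 of every regular file under DIR, ordered by path, with the paths dropped.
hashes_only() {
    find "$1" -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d' '
}

# dirs_hash_diff DIR1 DIR2   (hypothetical helper name)
# Prints nothing and exits 0 when both trees produce the same sequence of hashes.
dirs_hash_diff() {
    diff <(hashes_only "$1") <(hashes_only "$2")
}
```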

But you won’t know which file changed.

For that, you can try something like
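One variation on this idea (a sketch, not the answer's original command) is to run find from inside each directory, so that the listed paths are relative and the diff output names the file that changed or is missing. The function names relative_hashes and changed_files are made up.

```shell
#!/usr/bin/env bash
# relative_hashes DIR   (hypothetical helper name)
# "md5  ./relative/path" for every regular file under DIR, sorted by path.
relative_hashes() {
    ( cd -- "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

# changed_files DIR1 DIR2   (hypothetical helper name)
# Each differing output line carries the relative path of the affected file.
changed_files() {
    diff <(relative_hashes "$1") <(relative_hashes "$2")
}
```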

This strategy is very useful when the two directories to be compared are not in the same machine and you need to make sure that the files are equal in both directories.

Another good way to do the job is to use Git's diff command (this may cause problems when files have different permissions: every file is then listed in the output):
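A sketch of this, assuming git is installed: the --no-index option lets git diff compare two paths that are not part of a repository. The function name git_dir_diff and the directory names in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# git_dir_diff DIR1 DIR2   (hypothetical helper name)
# Compare two directory trees with git, outside of any repository.
git_dir_diff() {
    git diff --no-index -- "$1" "$2"
}

# Usage (placeholder directory names):
#   git_dir_diff dir1/ dir2/
```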


How to Find Difference Between Two Directories Using Diff and Meld Tools

In an earlier article, we reviewed 9 best file comparison and difference (Diff) tools for Linux and in this article, we will describe how to find the difference between two directories in Linux.


Normally, to compare two files in Linux, we use diff, a simple and original Unix command-line tool that shows the difference between two files. It compares files line by line, is easy to use, and comes pre-installed on most, if not all, Linux distributions.

The question is: how do we get the difference between two directories in Linux? Here, we want to know which files/subdirectories are common to the two directories and which are present in one directory but not in the other.

The conventional syntax for running diff is as follows:

By default, its output is ordered alphabetically by file/subdirectory name as shown in the screenshot below. In this command, the -q switch tells diff to report only when files differ.

[Screenshot: Difference Between Two Directories]

Again diff doesn’t go into the subdirectories, but we can use the -r switch to read the subdirectories as well like this.
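As a sketch, combining -q and -r and keeping only the "Only in ..." lines gives a list of entries that are missing from one side. The function name only_in_one is made up, and LC_ALL=C is set so that diff's messages are in English for the grep to match.

```shell
#!/usr/bin/env bash
# only_in_one DIR1 DIR2   (hypothetical helper name)
# List files and subdirectories that exist in just one of the two trees.
only_in_one() {
    LC_ALL=C diff -qr -- "$1" "$2" | grep '^Only in '
}
```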

Using Meld Visual Diff and Merge Tool

There is a cool graphical option called meld (a visual diff and merge tool for the GNOME Desktop) for those who enjoy using the mouse; it can be installed with your distribution's package manager.

Once you have installed it, search for “meld” in the Ubuntu Dash or Linux Mint Menu, in Activities Overview in Fedora or CentOS desktop and launch it.

You will see the Meld interface below, where you can choose file or directory comparison as well as version control view. Click on directory comparison and move to the next interface.

[Screenshot: Meld Comparison Tool]

Select the directories you want to compare; note that you can add a third directory by checking the option "3-way Comparison".

[Screenshot: Select Comparison Directories]

Once you have selected the directories, click on "Compare".

[Screenshot: Listing Difference Between Directories]

In this article, we described how to find the difference between two directories in Linux. If you know any other command-line or GUI way, don't forget to share your thoughts on this article via the comment section below.
