Join files in Linux

join Command in Linux

The join command in UNIX is a command line utility for joining lines of two files on a common field.

Suppose you have two files that need to be combined so that the output is more useful than either file alone. For example, one file could contain names and the other IDs, and the requirement is to combine both files so that each name and its corresponding ID appear on the same line. The join command is the tool for this. It joins two files based on a key field present in both files. The input files can be separated by white space or any other delimiter.

Syntax: join [OPTION]... FILE1 FILE2

Example: Let us assume there are two files, file1.txt and file2.txt, and we want to combine the contents of these two files.

Now, in order to combine two files, the files must have some common field. In this case, the numbering 1, 2, ... is the common field in both files.

NOTE: When using the join command, both input files should be sorted on the key field on which we are going to join them.

So, the output contains the key followed by all the matching columns from the first file, file1.txt, followed by all the columns of the second file, file2.txt.
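The original sample files are not reproduced here, so the following is only a minimal sketch of the basic join described above. The contents of file1.txt and file2.txt (names and IDs) are assumptions chosen for illustration.

```shell
# Hypothetical sample data; both files are already sorted on field 1.
printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n' > file1.txt
printf '1 101\n2 102\n3 103\n' > file2.txt

# Join on the first field (the default) and print to standard output.
join file1.txt file2.txt
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
```

Each output line starts with the key, followed by the remaining fields of file1.txt and then those of file2.txt.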

Now, if we wanted to create a new file with the joined contents, we could use the following command:
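One way to do this is ordinary shell redirection. This is a sketch only: the output name newjoinfile.txt and the sample data are assumptions, not taken from the original.

```shell
# Hypothetical sample data, sorted on the first field.
printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n' > file1.txt
printf '1 101\n2 102\n3 103\n' > file2.txt

# Redirect the joined lines into a new file instead of the terminal.
join file1.txt file2.txt > newjoinfile.txt

# Show what was written.
cat newjoinfile.txt
```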

Options for join command:

1. -a FILENUM : Also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2.
2. -e EMPTY : Replace missing input fields with EMPTY.
3. -i, --ignore-case : Ignore differences in case when comparing fields.
4. -j FIELD : Equivalent to "-1 FIELD -2 FIELD".
5. -o FORMAT : Obey FORMAT while constructing the output line.
6. -t CHAR : Use CHAR as input and output field separator.
7. -v FILENUM : Like -a FILENUM, but suppress joined output lines.
8. -1 FIELD : Join on this FIELD of file 1.
9. -2 FIELD : Join on this FIELD of file 2.
10. --check-order : Check that the input is correctly sorted, even if all input lines are pairable.
11. --nocheck-order : Do not check that the input is correctly sorted.
12. --help : Display a help message and exit.
13. --version : Display version information and exit.

Using join with options
1. Using the -a FILENUM option: Sometimes one of the files contains extra lines whose key has no match in the other file. By default, join prints only pairable lines. For example, even if file1.txt contains an extra line, as long as the contents of file2.txt are unchanged, the output produced by join stays the same:

What if such unpairable lines are important and must be visible after joining the files? In such cases we can use the -a option, which displays the unpairable lines as well. This option requires a file number so that join knows which file's unpairable lines to print.
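A small sketch of both behaviours, assuming hypothetical sample files in which file1.txt has one extra line (key 4) that has no partner in file2.txt:

```shell
# Hypothetical data: key 4 exists only in file1.txt.
printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n4 KARTIK\n' > file1.txt
printf '1 101\n2 102\n3 103\n' > file2.txt

# Default: only pairable lines are printed, so key 4 is dropped.
join file1.txt file2.txt
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103

# -a 1: additionally print the unpairable lines of the first file.
join -a 1 file1.txt file2.txt
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
# 4 KARTIK
```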

2. Using the -v option: If you want to print only the unpairable lines, i.e. suppress the paired lines in the output, use the -v option.
Like -a, this option takes a file number (1 or 2) that selects the file whose unpairable lines are printed.
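As an illustration with assumed sample data (the same hypothetical files as for -a, where key 4 appears only in file1.txt), -v 1 prints just the leftover line:

```shell
# Hypothetical data: key 4 exists only in file1.txt.
printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n4 KARTIK\n' > file1.txt
printf '1 101\n2 102\n3 103\n' > file2.txt

# -v 1: print only the lines of file 1 that found no partner in file 2.
join -v 1 file1.txt file2.txt
# 4 KARTIK
```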

3. Using the -1, -2 and -j options: As we already know, join combines lines of files on a common field, which is the first field by default. However, the common key is not always the first column in both files, so join provides options for the case where the key is in another column.
If you want the second field of either file, or of both files, to be the common field for the join, use the -1 and -2 command line options. Here -1 and -2 stand for the first and second file, and each option requires a numeric argument that gives the join field for the corresponding file. This is easiest to understand with an example:
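A minimal sketch, assuming hypothetical files where the key is the second field of file1.txt but the first field of file2.txt:

```shell
# Hypothetical data: key is field 2 in file1.txt and field 1 in file2.txt.
printf 'AAYUSH 1\nAPAAR 2\nHEMANT 3\n' > file1.txt
printf '1 101\n2 102\n3 103\n' > file2.txt

# -1 2: join field of file 1 is its 2nd field; -2 1: file 2 uses its 1st.
join -1 2 -2 1 file1.txt file2.txt
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
```

Note that the join field is still printed first, regardless of where it sat in the input.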

So, this is how we can use columns other than the first as the common field for joining.
If the common field is in the same position in both files (other than the first), we can simply replace the -1 [field] -2 [field] part of the command with -j [field]. So, in the above case the command could be:

4. Using the -i option: By default, join is case sensitive. For example, consider the following examples:

Now, if you try joining these two files on the default (first) common field, join prints nothing. That is because the case of the field values differs between the two files. To make join ignore case, use the -i command line option.
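The effect can be sketched with two assumed files whose keys differ only in case (upper.txt and lower.txt are hypothetical names and contents):

```shell
# Hypothetical data: same keys, but upper case in one file, lower in the other.
printf 'A AAYUSH\nB APAAR\nC HEMANT\n' > upper.txt
printf 'a 101\nb 102\nc 103\n' > lower.txt

# Case-sensitive by default: no keys match, so nothing is printed.
join upper.txt lower.txt

# -i: compare the join fields case-insensitively.
join -i upper.txt lower.txt
# A AAYUSH 101
# B APAAR 102
# C HEMANT 103
```

With GNU join, the key printed for a paired line is the one from the first file.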

5. Using the --nocheck-order option: By default, the join command checks whether the supplied input is sorted, and reports if it is not. To suppress this error/warning, use the --nocheck-order option:
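A hedged sketch with assumed data follows. --nocheck-order is a GNU coreutils option, and it only silences the complaint: on unsorted input, join may still miss matching lines, so sorting first is usually the safer fix.

```shell
# Hypothetical data: unsorted1.txt is NOT sorted on its first field.
printf '1 a\n3 c\n2 b\n' > unsorted1.txt
printf '1 x\n2 y\n3 z\n' > sorted2.txt

# Suppress the sort-order check (GNU join). Matches may be silently missed.
join --nocheck-order unsorted1.txt sorted2.txt

# Safer alternative: sort on the join field first, then join.
sort -k 1,1 unsorted1.txt > sorted1.txt
join sorted1.txt sorted2.txt
# 1 a x
# 2 b y
# 3 c z
```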

6. Using the -t option: Most of the time, files use some delimiter to separate the columns. Let us update the files to use a comma as the delimiter.

The -t option is the one we use to specify the delimiter in such cases.
Since comma is the delimiter, we pass it to -t.
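For instance, with hypothetical comma-separated versions of the two files:

```shell
# Hypothetical comma-delimited data, sorted on the first field.
printf '1,AAYUSH\n2,APAAR\n3,HEMANT\n' > file1.txt
printf '1,101\n2,102\n3,103\n' > file2.txt

# -t ',' sets the field separator for both input and output.
join -t ',' file1.txt file2.txt
# 1,AAYUSH,101
# 2,APAAR,102
# 3,HEMANT,103
```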

RootUsers

Guides, tutorials, reviews and news for System Administrators.

Linux How To: Join Two Files – Append One File To Another

With the Bash shell in Linux it is quite simple to append the contents of one file to another. Here we will cover how to perform file concatenation.

In this example we have two files, file1 and file2. Both files contain unique contents, and we want to join them both together without overwriting any of the data.

This can be done quite simply in bash and other shells by using ‘>>’ to append the contents with the ‘cat’ command, as shown below.

First we’ll create our example files.

Now we will concatenate these files together, by adding file2 to the bottom of file1.

The first line above uses ‘>>’ to append the contents of file2 to the end of file1 without overwriting anything. The second line is simply used to output the contents of file1, showing that we have successfully appended the content of file2 to file1.

It’s important that ‘>>’ is used, as this appends the content. If ‘>’ were used instead, file1 would be overwritten and replaced entirely with the contents of file2.
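The commands described above can be sketched as follows; the file contents are assumptions made up for illustration.

```shell
# Create two hypothetical example files.
printf 'line 1 of file1\nline 2 of file1\n' > file1
printf 'line 1 of file2\nline 2 of file2\n' > file2

# Append the contents of file2 to the end of file1 without overwriting it.
cat file2 >> file1

# Show the result: file1 now holds its own lines followed by file2's.
cat file1
# line 1 of file1
# line 2 of file1
# line 1 of file2
# line 2 of file2
```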

We can also add the contents of file1 and file2 into a completely new file, file3.

This way we don’t modify the original contents of either file1 or file2, and instead create a new file, file3, which contains the contents of both file1 and file2 joined together.

As file3 does not exist here, it is created. If the specified file does not exist it will be created; if the destination file already exists, the contents are simply appended to the end of that file.
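A sketch of writing both files into a third one, again with assumed file contents. The rm -f line is only there so that the example starts from a state where file3 does not exist.

```shell
# Create two hypothetical example files and make sure file3 is absent.
printf 'from file1\n' > file1
printf 'from file2\n' > file2
rm -f file3

# Concatenate both files into file3; file1 and file2 stay untouched.
cat file1 file2 >> file3

cat file3
# from file1
# from file2
```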

Summary

As shown we can use the ‘cat’ command and the ‘>>’ operator to append one file to another without removing the original content.

Learn How To Join Files In Linux

Introduction

There are several ways to join files in Linux. Some of the options are very smart and can merge only the differences. Some are just straight file joiners. Why would you need this? Think of the following scenarios: you have shared a file and someone has sent it back to you with amendments, and now you need to join the two into one file. Or you have a really old file and started on a new version, and now you find the older version and want to join the two. Most of the commands discussed here are part of Coreutils and should therefore be available in every distribution.

Whatever you end up using after reading this article, you will be able to join the files in Linux.

Man Pages

All of the commands below have a manual page. If you are new to the Linux command line, the manual pages explain what options there are for you to use. In most cases it also includes example uses. There are websites dedicated to just displaying these manual pages, however you can access most man pages directly in any shell. Just type “man COMMAND” (without double quotes).

If you want a manual that is more complete than just the man page, try this command: "info coreutils 'nl invocation'" (without the double quotes). This example produces the complete manual for the NL command. For more on INFO itself, read its man page: "man info".

SDIFF

SDIFF is one of the smart commands for joining files. If you have two files that are largely the same but have slight differences, SDIFF can merge only the differences. SDIFF is derived from the original UNIX command DIFF. DIFF was developed in the early 1970s at AT&T Bell Labs. The final version of DIFF was released in 1974 as part of the 5th Edition of UNIX.

sdiff [OPTION] FILE1 FILE2

The way you can use SDIFF is so diverse that to describe everything you can do with SDIFF would take up an article on its own.

For more information, check out the SDIFF man page. In a terminal, type:

man sdiff

NL

NL, also a UNIX command, was originally used for numbering lines. Unfortunately I couldn’t find any more history on the NL command. With NL we can join two files directly: it doesn’t compare anything, it just numbers the lines of the files it is given and writes them out one after the other. NL isn’t really meant for file joining but does a decent job of it nonetheless.

nl FILE1 FILE2 > FILE3

SORT

SORT is used for sorting the lines of text files. To accomplish this, it uses keys to sort on. By default, SORT takes the entire line as the key. SORT sorts, merges and compares files; therefore we can also use it to join two files:

sort FILE1 FILE2 > FILE3

Check the man page of SORT for all the options: "man sort". Or check the info page for a full manual: "info coreutils 'sort invocation'" (without the double quotes).
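As a rough illustration, giving sort two input files and redirecting to a third produces one sorted file containing the lines of both. The file names and contents below are assumptions.

```shell
# Two hypothetical unsorted input files.
printf 'pear\napple\n' > fruit1.txt
printf 'cherry\nbanana\n' > fruit2.txt

# Sort the lines of both files together and write them to a third file.
sort fruit1.txt fruit2.txt > fruit_all.txt

cat fruit_all.txt
# apple
# banana
# cherry
# pear
```

Unlike cat, this does not preserve the original line order, so it only suits data where the order does not matter.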

PASTE

Despite popular belief, the UNIX PASTE command is meant for joining files. PASTE has two main options: -d, which sets the delimiter placed between the joined lines, and -s, which pastes each file's lines serially onto one row instead of joining the files in parallel, line by line. Using paste, we can merge two files into a third file:

paste -d ',' FILE1 FILE2 > FILE3
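To make the difference between parallel and serial pasting concrete, here is a small sketch with assumed input files (letters.txt and numbers.txt are hypothetical):

```shell
# Two hypothetical input files with three lines each.
printf 'a\nb\nc\n' > letters.txt
printf '1\n2\n3\n' > numbers.txt

# Parallel (default): line N of each file is joined with the delimiter.
paste -d ',' letters.txt numbers.txt > pasted.txt
cat pasted.txt
# a,1
# b,2
# c,3

# Serial (-s): each input file is collapsed into a single output line.
paste -s -d ',' letters.txt numbers.txt
# a,b,c
# 1,2,3
```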

CAT

The CAT command is meant to be used to concatenate (or join) files. The name is short for catenate, which is a synonym for concatenate. CAT is also often used to display a file’s contents. If you search for definitions or uses of the CAT command, you often find the term Useless Use Of Cat (UUOC). This phrase was made popular by the comp.unix.shell group on Usenet. It was coined because users of that newsgroup were of the opinion that using cat without concatenating was useless. This is also referred to as CAT abuse. According to the group, using cat in this way wastes time and a process. However, you still see a lot of Linux tutorials (on several subjects) use it in this manner. You could therefore conclude that this use of CAT is now widely accepted.

Here is the example:

cat FILE1 >> FILE2

JOIN

The JOIN command is similar to the join you might know from SQL, and it works much the same as a join in a relational database. In addition to other options, you can use -t to specify a delimiter. This is handy if you are joining .csv files, for instance. If you know the format of the files, you can also select the field to join on by using the -1 FIELD and -2 FIELD options. This is useful when you need to join quickly on a single field rather than on the whole line. Here is the general form:

join [OPTION] FILE1 FILE2

JOIN has more options, so again check the man page ("man join") or the info page ("info coreutils 'join invocation'").

Linux and Unix join command tutorial with examples

Tutorial on using join, a UNIX and Linux command to join lines of two files on a common field. Examples of joining two files, sorting before joining, specifying a field separator and specifying the output format.

What is the join command in UNIX?

The join command in UNIX is a command line utility for joining lines of two files on a common field. It can be used to join two files by selecting fields within the line and joining the files on them. The result is written to standard output.

How to join two files

To join two files using the join command, the files must have identical join fields. The default join field is the first field delimited by blanks. For the following example there are two files, foodtypes.txt and foods.txt.

These files share a join field as the first field and can be joined.
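The original file listings are not available here, so the contents below are assumed purely for illustration. Only the file names come from the text.

```shell
# Hypothetical contents; the shared join field is the leading number.
printf '1 Protein\n2 Carbohydrate\n3 Fat\n' > foodtypes.txt
printf '1 Cheese\n2 Potato\n3 Butter\n' > foods.txt

# Join on the default field (field 1, blank-delimited).
join foodtypes.txt foods.txt
# 1 Protein Cheese
# 2 Carbohydrate Potato
# 3 Fat Butter
```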

How to join two files on different fields

To join files using different fields the -1 and -2 options can be passed to join . In the following example there are two files wine.txt and reviews.txt .

These files can be joined by specifying the fields that should be used to join the files. Common to both files is the name of the wine. In wine.txt this is the second field. In reviews.txt this is the first field. The files can be joined using -1 and -2 by specifying these fields.
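A sketch of such a join, with file contents assumed for illustration (the wine name is field 2 of wine.txt and field 1 of reviews.txt, as described above):

```shell
# Hypothetical data, each file already sorted on its join field.
printf 'Red Beaunes France\nWhite Reisling Germany\nRed Riocha Spain\n' > wine.txt
printf 'Beaunes Great!\nReisling Terrible!\nRiocha Meh\n' > reviews.txt

# Join field 2 of wine.txt against field 1 of reviews.txt.
join -1 2 -2 1 wine.txt reviews.txt
# Beaunes Red France Great!
# Reisling White Germany Terrible!
# Riocha Red Spain Meh
```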

How to sort before joining

Join expects that files will be sorted before joining. For this example, suppose the two files from the previous example are not sorted.

Running join on these files results in an error because the files are not sorted.

The sort command can sort the files before passing to join.
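One portable way to do this is to sort each file on its join field into a temporary file and then join those. The data and the .sorted file names below are assumptions; shells such as bash can avoid the temporary files with process substitution.

```shell
# Hypothetical unsorted versions of the two files.
printf 'White Reisling Germany\nRed Riocha Spain\nRed Beaunes France\n' > wine.txt
printf 'Riocha Meh\nBeaunes Great!\nReisling Terrible!\n' > reviews.txt

# Sort each file on the field that will be used for joining.
sort -k 2,2 wine.txt > wine.sorted        # join field is field 2
sort -k 1,1 reviews.txt > reviews.sorted  # join field is field 1

# Join the sorted copies.
join -1 2 -2 1 wine.sorted reviews.sorted
# Beaunes Red France Great!
# Reisling White Germany Terrible!
# Riocha Red Spain Meh
```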

How to specify a field separator for joining

To specify a field separator for joining using the join command use the -t option. An example is a CSV file where the separator is , . In the following example there are two files names.csv and deposits.csv .

Using the -t option, the comma can be set as the delimiter.
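For illustration, assume the two CSV files look like this (the contents are made up; only the file names come from the text):

```shell
# Hypothetical CSV data keyed on a numeric ID in the first column.
printf '1,John Smith,London\n2,Arthur Dent,Newcastle\n3,Sophie Smith,London\n' > names.csv
printf '1,1000\n2,250\n3,500\n' > deposits.csv

# Use a comma as the field separator for input and output.
join -t , names.csv deposits.csv
# 1,John Smith,London,1000
# 2,Arthur Dent,Newcastle,250
# 3,Sophie Smith,London,500
```

Because the separator is a comma, the space inside "John Smith" stays part of a single field.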

How to specify the output format

To specify the output format of join use the -o option. This allows the order of fields that will be shown in the output to be defined, or for only certain fields to be shown.

In the previous example the output was as follows.

To specify the order, the list of fields is passed to -o . For this example this is -o 1.1,1.2,1.3,2.2,2.1 . This formats the output in the desired order.
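A sketch of this format list in action, using assumed CSV contents. Each FILE.FIELD entry selects one field, so 2.2,2.1 prints the second file's value before its ID.

```shell
# Hypothetical CSV data keyed on a numeric ID in the first column.
printf '1,John Smith,London\n2,Arthur Dent,Newcastle\n3,Sophie Smith,London\n' > names.csv
printf '1,1000\n2,250\n3,500\n' > deposits.csv

# Print fields 1-3 of file 1, then field 2 and field 1 of file 2.
join -t , -o 1.1,1.2,1.3,2.2,2.1 names.csv deposits.csv
# 1,John Smith,London,1000,1
# 2,Arthur Dent,Newcastle,250,2
# 3,Sophie Smith,London,500,3
```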

About the author

George Ornbo is a UK based human.

He is interested in people, music, food and writing. In a previous version of himself he wrote books on technology.

Join Command in Unix/Linux Examples

The join command is one of the text processing utilities in Unix/Linux. It is used to combine two files based on matching fields in the files. If you know SQL, the join command is similar to joining two tables in a database.

The syntax of the join command is

join [OPTION]... FILE1 FILE2

The join command options are

Unix Join Command Examples

1. Write a join command to join two files on the first field?

The basic usage of join command is to join two files on the first field. By default the join command matches the files on the first fields when we do not specify the field numbers explicitly. Let’s say we have two files emp.txt and dept.txt

Here we will join on the first field and see the output. By default, the join command treats the field delimiter as space or tab.

Important Note: Before joining the files, make sure both files are sorted on their join fields. Otherwise you will get incorrect results.

2. Write a join command to join the two files? Here use the second field from the first file and the first field from the second file to join.

In this example, we will see how to join two files on different fields rather than the first field. For this consider the below two files as an example

From the above, you can see the join fields are the second field from the emp.txt and the first field from the dept.txt. The join command to match these two files is

Here -1 2 specifies the second field from the first file (emp.txt) and -2 1 specifies the first field from the second file (dept.txt).

You can also see that the two files can be joined on the third field. As both files have the join field in the same position, you can use the -j option in the join command.
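Both variants can be sketched with assumed contents for emp.txt and dept.txt. The columns below (name, department ID, location ID and department ID, department name, location ID) are hypothetical.

```shell
# Hypothetical data: emp.txt = name dept_id loc_id, dept.txt = dept_id dept_name loc_id.
printf 'mark 10 1\nsteve 10 1\nscott 20 2\nchris 30 3\n' > emp.txt
printf '10 hr 1\n20 finance 2\n30 db 3\n' > dept.txt

# Join field 2 of emp.txt with field 1 of dept.txt.
join -1 2 -2 1 emp.txt dept.txt
# 10 mark 1 hr 1
# 10 steve 1 hr 1
# 20 scott 2 finance 2
# 30 chris 3 db 3

# Same field position (3) in both files: -j 3 is short for -1 3 -2 3.
join -j 3 emp.txt dept.txt
# 1 mark 10 10 hr
# 1 steve 10 10 hr
# 2 scott 20 20 finance
# 3 chris 30 30 db
```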

3. Write a join command to select the required fields from the input files in the output? Select the first field from the first file and the second field from the second file in the output.

By default, the join command prints all the fields from both files (the join field is printed only once). We can choose which fields are printed on the terminal with the -o option. We will use the same files from the above example.

Here 1.1 means in the first file select the first field. Similarly, 2.2 means in the second file select the second field.
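A minimal sketch, assuming hypothetical contents for emp.txt (name, department ID) and dept.txt (department ID, department name):

```shell
# Hypothetical data, sorted on the respective join fields.
printf 'mark 10\nsteve 10\nscott 20\nchris 30\n' > emp.txt
printf '10 hr\n20 finance\n30 db\n' > dept.txt

# Join emp field 2 with dept field 1, but print only 1.1 and 2.2.
join -1 2 -2 1 -o 1.1,2.2 emp.txt dept.txt
# mark hr
# steve hr
# scott finance
# chris db
```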

4. Write a command to join two delimited files? Here the delimiter is colon (:)

So far we have joined files with space delimiter. Here we will see how to join files with a colon as delimiter. Consider the below two files.

The -t option is used to specify the delimiter. The join command for joining the files is
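With assumed colon-delimited versions of the two files, the command could look like this:

```shell
# Hypothetical colon-delimited data.
printf 'mark:10\nsteve:10\nscott:20\nchris:30\n' > emp.txt
printf '10:hr\n20:finance\n30:db\n' > dept.txt

# -t ':' makes the colon the field separator for input and output.
join -t ':' -1 2 -2 1 emp.txt dept.txt
# 10:mark:hr
# 10:steve:hr
# 20:scott:finance
# 30:chris:db
```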

5. Write a command to ignore case when joining the files?

If the join fields are in different cases, then the join will not be performed properly. To ignore the case in join use the -i option.

6. Write a join command to print the lines which do not match the values in joining fields?

By default, the join command prints only the matched lines from both files, that is, the lines that satisfy the join condition. We can use the -a option to print the non-matched lines as well.
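Giving -a for both files keeps the unmatched lines from either side, much like a full outer join in SQL. The data below is assumed for illustration: department 30 has no employee, and employee amy's department 40 is missing from dept.txt.

```shell
# Hypothetical data keyed on department ID in field 1 of both files.
printf '10 mark\n20 scott\n40 amy\n' > emp.txt
printf '10 hr\n20 finance\n30 db\n' > dept.txt

# -a 1 -a 2: print paired lines plus the unpairable lines of both files.
join -a 1 -a 2 emp.txt dept.txt
# 10 mark hr
# 20 scott finance
# 30 db
# 40 amy
```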
