Sort linux

Contents

Linux sort command
Overview
Syntax
Options
Checking For Sorted Order
Sorting Multiple Files Using The Output Of find
Comparing Only Selected Fields Of Data
Using sort And join Together
Related commands
The sort command in Linux with examples
Saving the result to another file
Sorting by column number
Checking whether a file is sorted
Sorted data
Removing duplicate entries
Sorting with a pipe in the command
Random sorting
Sorting data from multiple files
Sorting with join
Comparing files using sort
Conclusion

Linux sort command

sort sorts the contents of a text file, line by line.

Overview

sort is a simple and very useful command which will rearrange the lines in a text file so that they are sorted, numerically and alphabetically. By default, the rules for sorting are:

Lines starting with a number will appear before lines starting with a letter.
Lines starting with a letter that appears earlier in the alphabet will appear before lines starting with a letter that appears later in the alphabet.
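Lines starting with a lowercase letter will appear before lines starting with the same letter in uppercase (this holds in most language locales; in the C locale, all uppercase letters sort before lowercase ones).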

The rules for sorting can be changed according to the options you provide to the sort command; these are listed below.

Syntax
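In GNU coreutils the general form is sort [OPTION]... [FILE]...; with no FILE, or with - as the FILE, sort reads standard input. A minimal sketch of that form, using made-up sample lines:

```shell
# Sort lines arriving on standard input; no FILE argument is given.
# LC_ALL=C is set only to make the ordering predictable across systems.
sorted=$(printf 'pear\napple\nfig\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Passing one or more file names instead of piping works the same way: sort concatenates their contents and sorts the result.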

Options

-b, --ignore-leading-blanks	Ignore leading blanks.
-d, --dictionary-order	Consider only blanks and alphanumeric characters.
-f, --ignore-case	Fold lower case to upper case characters.
-g, --general-numeric-sort	Compare according to general numerical value.
-i, --ignore-nonprinting	Consider only printable characters.
-M, --month-sort	Compare (unknown) < 'JAN' < ... < 'DEC'.

Note

If you are using the join command in conjunction with sort, be aware that there is a known incompatibility between the two programs unless you define the locale. If you are using join and sort to process the same input, it is highly recommended that you set LC_ALL to C, which will standardize the localization used by all programs.
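A small sketch of why the locale matters, assuming four made-up input lines. Under LC_ALL=C comparison is by byte value, so the result is the same on every system; under a language locale the order can differ, and a file sorted one way may look unsorted to a program that compares the other way.

```shell
# C locale: every uppercase ASCII letter precedes every lowercase one.
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"

# Whatever locale the current environment defines; the order of these
# four lines may differ from the C-locale result above.
printf 'b\nA\na\nB\n' | sort
```

Exporting LC_ALL=C once at the top of a script applies the same rule to every sort and join call that follows.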

Checking For Sorted Order

If you just want to check to see if your input file is already sorted, use the -c option:

If your data is unsorted, you will receive an informational message reporting the line number of the first unsorted data, and what the unsorted data is:
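A minimal sketch of the check, using two hypothetical files created in a temporary directory. With GNU sort, the message for unsorted input is written to standard error and the exit status is non-zero, which makes -c convenient in scripts.

```shell
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/sorted.txt"
printf 'apple\ncherry\nbanana\n' > "$dir/unsorted.txt"

# Already in order: no output, exit status 0.
sorted_status=0
LC_ALL=C sort -c "$dir/sorted.txt" || sorted_status=$?

# Out of order: a diagnostic naming the first offending line (line 3,
# "banana", in this sample) and a non-zero exit status.
unsorted_status=0
LC_ALL=C sort -c "$dir/unsorted.txt" || unsorted_status=$?

echo "sorted.txt: $sorted_status, unsorted.txt: $unsorted_status"
rm -rf "$dir"
```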

Sorting Multiple Files Using The Output Of find

One useful way to sort data is to sort the input of multiple files, using the output of the find command. The most reliable (and responsible) way to accomplish this is to specify that find produces a NUL-terminated file list as its output, and to pipe that output into sort using the --files0-from option.

Normally, find outputs one file on each line; in other words, it inserts a line break after each file name it outputs. For instance, let’s say we have three files named data1.txt, data2.txt, and data3.txt. find can generate a list of these files using the following command:

This command uses the question mark wildcard to match any file that has a single character after the word "data" in its name, ending in the extension ".txt". It produces the following output:
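The find invocation described above can be sketched as follows. The temporary directory and the extra file data10.txt are assumptions added for illustration; data10.txt shows that "?" matches exactly one character.

```shell
dir=$(mktemp -d)
touch "$dir/data1.txt" "$dir/data2.txt" "$dir/data3.txt" "$dir/data10.txt"

# The pattern is quoted so the shell passes it to find unexpanded.
# find does not promise any particular order, hence the trailing sort.
found=$(cd "$dir" && find . -name 'data?.txt' | LC_ALL=C sort)
printf '%s\n' "$found"
rm -rf "$dir"
```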

It would be nice if we could use this output to tell the sort command, "sort the data in any files found by find as if they were all one big file." The problem with the standard find output is that, even though it's easy for humans to read, it can cause problems for other programs that need to read it in. File names can include unusual characters, including line breaks, so in some cases this format will be read incorrectly by another program.

The correct way to format find's output to be used as a file list for another program is to use the -print0 option when running find. This terminates each file name with the NUL character (ASCII character number zero), which is never allowed inside a file name. This makes things easier for the program reading the file list, since it knows that any time it sees the NUL character, it can be sure it's at the end of a file name.

So, if we run the previous command with the -print0 option at the end, like this:

...it will produce the following output:

You can't see it, but after each file name is a NUL character. This character is non-printable, so it will not appear on your screen, but it's there, and any program you pipe this output to (sort, for example) will see it.

Be careful how you word the find command. It’s important to specify -print0 last; find needs this to be specified after the other options.

Okay, but how do we tell sort to read this file list and sort the contents of all those files?

One way to do it is to pipe the find output to sort, specifying the --files0-from option in the sort command, and specify the file as a dash ("-"), which will read from the standard input. Here's what the command will look like:

...and it will output the sorted data of any files located by find that match the pattern data?.txt, as if they were all one file. This is a very powerful use of sort, so give it a try.
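The whole pipeline can be sketched like this. The file contents are made up, and --files0-from is a GNU sort option, so this assumes GNU coreutils.

```shell
dir=$(mktemp -d)
printf 'pear\napple\n' > "$dir/data1.txt"
printf 'fig\nbanana\n' > "$dir/data2.txt"
printf 'cherry\n'      > "$dir/data3.txt"

# find emits NUL-terminated names; sort reads that list from standard
# input ("-") and sorts the combined contents of the named files.
merged=$(find "$dir" -name 'data?.txt' -print0 | LC_ALL=C sort --files0-from=-)
printf '%s\n' "$merged"
rm -rf "$dir"
```

Because the names are NUL-terminated, the pipeline keeps working even if a matching file name contains spaces or line breaks.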

Comparing Only Selected Fields Of Data

Normally, sort decides how to sort lines based on the entire line: it compares every character from the first character in a line, to the last one.

If, on the other hand, you want sort to compare a limited subset of your data, you can specify which fields to compare using the -k option.

For instance, if you have an input file data.txt with the following data:

...and you sort it without any options, like this:

...you will receive the following output:

...as you can see, nothing was changed from the original data ordering, because the numbers at the beginning of each line were already sorted. However, if you want to sort based on the names, you can use the following command:

This command will sort by the second field, and ignore the first. (The "k" in "-k" stands for "key": we are defining the "sorting key" used in the comparison.)

Fields are defined as anything separated by whitespace; in this case, an actual space character. Our command above will produce the following output:

...which is sorted by the second field, listing the lines alphabetically by name, and ignoring the numbers in the sorting process.
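A sketch of sorting by the second field. The numbered names below are hypothetical sample data, not necessarily the contents of the data.txt discussed above.

```shell
dir=$(mktemp -d)
printf '01 Joe\n02 Marie\n03 Albert\n04 Dave\n' > "$dir/data.txt"

# Without options the leading numbers decide, so the order is unchanged.
plain=$(LC_ALL=C sort "$dir/data.txt")

# With -k 2 the key starts at the second field (the name).
by_name=$(LC_ALL=C sort -k 2 "$dir/data.txt")
printf '%s\n' "$by_name"
rm -rf "$dir"
```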

You can also specify a more complex -k option. The complete positional argument looks like this: -k POS1[,POS2]

...where POS1 is the starting field position, and POS2 is the ending field position. Each field position, in turn, is defined as: F[.C]

...where F is the field number and C is the character within that field to begin the sort comparison.

So, let’s say our input file data.txt contains the following data:

...we can sort by seniority if we specify the third field as the sort key:

...this produces the following output:

Or, we can ignore the first three characters of the third field, and sort solely based on title, ignoring seniority:

We can also specify where in the line to stop comparing. If we sort based on only the third-through-fifth characters of the third field of each line, like this:

...sort will see only the same thing on every line: ".De" and nothing else. As a result, sort will not see any differences in the lines, and the sorted output will be the same as the original file.
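The three key forms can be sketched on hypothetical data of the shape described above (number, name, and a third field such as Sr.Designer). One assumption to note: the sketch passes -t ' ' so that character 1 of a field is its first visible character. Without -t or -b, GNU sort counts the blanks preceding a field as part of it, which shifts every character offset by one.

```shell
dir=$(mktemp -d)
cat > "$dir/data.txt" <<'EOF'
01 Joe Sr.Designer
02 Marie Jr.Developer
03 Albert Jr.Designer
04 Dave Sr.Developer
EOF

# Key = third field onward: groups Jr. before Sr.
by_seniority=$(LC_ALL=C sort -t ' ' -k 3 "$dir/data.txt")

# Key starts at character 4 of field 3, skipping "Sr." / "Jr.".
by_title=$(LC_ALL=C sort -t ' ' -k 3.4 "$dir/data.txt")

# Key = characters 3 through 5 of field 3, which is ".De" on every line,
# so the last-resort whole-line comparison decides the order.
same_key=$(LC_ALL=C sort -t ' ' -k 3.3,3.5 "$dir/data.txt")

printf '%s\n\n%s\n\n%s\n' "$by_seniority" "$by_title" "$same_key"
rm -rf "$dir"
```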

Using sort And join Together

sort can be especially useful when used in conjunction with the join command. Normally join will join the lines of any two files whose first fields match. Let's say you have two files, file1.txt and file2.txt. file1.txt contains the following text:

...and file2.txt contains the following:

If you'd like to sort these two files and join them, you can do so all in one command if you're using the bash command shell, like this:

Here, bash runs each sort command in parentheses and gives join a file name through which it can read that command's output. join therefore receives the sorted contents of file1.txt and file2.txt as its first and second arguments, and joins the lines whose first fields match.
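A sketch of the combined command, with made-up contents for file1.txt and file2.txt. Process substitution, <( ... ), is a bash feature, so the sketch invokes bash explicitly; LC_ALL=C is set so that sort and join agree on the ordering.

```shell
dir=$(mktemp -d)
printf '3 tomato\n1 onion\n2 pepper\n'    > "$dir/file1.txt"
printf '2 grapefruit\n3 apple\n1 mango\n' > "$dir/file2.txt"

# Each <(sort ...) is replaced by a file name that yields the sorted
# lines; join then pairs lines whose first fields are equal.
joined=$(LC_ALL=C bash -c 'join <(sort "$1") <(sort "$2")' _ \
    "$dir/file1.txt" "$dir/file2.txt")
printf '%s\n' "$joined"
rm -rf "$dir"
```

In an interactive bash session the inner command can be typed directly: join <(sort file1.txt) <(sort file2.txt).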

Related commands

comm - Compare two sorted files line by line.
join - Join the lines of two files which share a common field of data.
uniq - Identify, and optionally filter out, repeated lines in a file.


The sort command in Linux with examples

The sort command in Linux arranges records in a particular order according to the option used. It sorts the data in a file line by line. By default it follows a few rules: lines beginning with numbers come before lines beginning with letters, and lines beginning with a lowercase letter come before lines beginning with the same letter in uppercase.

Prerequisite

You need Ubuntu installed and configured in VirtualBox. User accounts must be created with the rights to access applications.

Syntax

Example

This is a simple example of sorting a file that contains names. The names are not in order, so they need to be sorted.

Consider a file named file1.txt. We display the contents of the file with the following command:

Now use this command to sort the text in the file:
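The two steps above can be sketched as follows, with hypothetical names standing in for the contents of file1.txt:

```shell
dir=$(mktemp -d)
printf 'Zoe\nAdam\nMia\nLiam\n' > "$dir/file1.txt"

# Show the file as it is, in its original order.
cat "$dir/file1.txt"

# Print the same lines in sorted order; the file itself is not modified.
sorted_names=$(LC_ALL=C sort "$dir/file1.txt")
printf '%s\n' "$sorted_names"
rm -rf "$dir"
```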

Saving the result to another file

When you run sort, the result is only displayed, not saved. To keep the result, we need to save it. The -o option of the sort command is used for this.

Consider a file named sample1.txt that contains car names. We want to sort them and save the resulting data in a separate file. When the command runs, a file named result.txt is created and the sorted output is stored in it: sort reads the data from sample1.txt, sorts it, and -o writes it to the result file. The saved data can then be displayed with the cat command:

$ sort -o result.txt sample1.txt

The output shows that the data has been sorted and saved in another file.
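A runnable sketch of the same command, with made-up car names in sample1.txt. It also shows one detail worth knowing: -o may name the input file itself, because sort reads all of its input before writing, whereas redirecting with > to the input file would truncate it first.

```shell
dir=$(mktemp -d)
printf 'Toyota\nAudi\nFord\n' > "$dir/sample1.txt"

# Write the sorted lines to result.txt instead of the terminal.
LC_ALL=C sort -o "$dir/result.txt" "$dir/sample1.txt"
saved=$(cat "$dir/result.txt")
printf '%s\n' "$saved"

# Sort the file in place by naming it as both input and output.
LC_ALL=C sort -o "$dir/sample1.txt" "$dir/sample1.txt"
in_place=$(cat "$dir/sample1.txt")
rm -rf "$dir"
```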

Sorting by column number

Sorting is not limited to the first column; lines can be ordered by the values in another column. Take a text file that holds student names and marks. We want to arrange the lines in ascending order of marks, so we use the -k option in the command, while -n requests numeric sorting.

Because the marks are in the second column, 2 is used together with n.
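A sketch of the numeric column sort, assuming a hypothetical marks.txt with a name and a mark on each line. The lexical variant is included only to show what the n modifier changes.

```shell
dir=$(mktemp -d)
printf 'Ann 85\nBob 9\nCid 100\n' > "$dir/marks.txt"

# Key starts at field 2 and is compared as a number: 9 < 85 < 100.
by_mark=$(LC_ALL=C sort -k 2n "$dir/marks.txt")
printf '%s\n' "$by_mark"

# Without n the key is compared as text, so "100" sorts before "85".
as_text=$(LC_ALL=C sort -k 2 "$dir/marks.txt")
rm -rf "$dir"
```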

Checking whether a file is sorted

If you are not sure whether a file is sorted, you can settle the question with a command that checks the file and displays a message. We will look at two basic cases:

First consider an unsorted file containing vegetable names.

The command uses the -c option, which checks whether the data in the file is sorted. If the data is not sorted, the output reports the line number of the first out-of-order entry, along with the entry itself.

From this output you can tell that the 3rd word in the file was out of place.

Sorted data

In this case the data is already in order, so nothing more needs to be done. Consider the file result.txt.

As the result shows, no message is displayed, which indicates that the data in the file is already sorted.

Removing duplicate entries

This is one of the most useful options. It removes repeated entries from a file while putting the remaining entries in order, which also helps keep the data in the file consistent.

Suppose a file named file2.txt contains subject names, with one subject repeated several times. The sort command with the -u option removes the duplicates:

Now you can see that the repeated entries are removed from the output and that the data is sorted as well.
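A sketch of duplicate removal, with hypothetical subject names in file2.txt:

```shell
dir=$(mktemp -d)
printf 'math\nphysics\nmath\nbiology\nmath\n' > "$dir/file2.txt"

# -u keeps one line from each run of lines that compare equal.
unique=$(LC_ALL=C sort -u "$dir/file2.txt")
printf '%s\n' "$unique"
rm -rf "$dir"
```

The original file is left untouched; combine -u with -o to save the deduplicated result.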

Sorting with a pipe in the command

To sort a directory listing by file size, we feed the listing into sort. The ls command with -l produces the long listing, and the pipe passes that listing to sort, which displays the files in order.
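A sketch of such a pipeline. It assumes the usual Linux ls -l layout, in which the size is the fifth whitespace-separated column, and owner and group names without spaces; the three files and their sizes are made up.

```shell
dir=$(mktemp -d)
printf 'x'         > "$dir/small.txt"    # 1 byte
printf '%0100d' 0  > "$dir/medium.txt"   # 100 bytes
printf '%01000d' 0 > "$dir/large.txt"    # 1000 bytes

# Sort the long listing numerically on column 5 (size in bytes).
# The "total" header line has no fifth field, so it sorts first.
ls -l "$dir" | sort -k 5n

# The last line of the sorted listing is the largest file.
largest=$(ls -l "$dir" | sort -k 5n | tail -n 1 | awk '{print $NF}')
echo "largest: $largest"
rm -rf "$dir"
```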

Random sorting

Sometimes you deliberately want to break up an existing arrangement. If the data should appear in no particular sequence and there is no criterion to sort by, random sorting is the better choice. Consider a file named sample3.txt that contains continent names.

The output shows that the entries are arranged in a different order.
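A sketch of random ordering with the -R option of GNU sort, using a hypothetical sample3.txt. The shuffled order changes from run to run, so the sketch only verifies that the same lines are still present.

```shell
dir=$(mktemp -d)
printf 'Africa\nAsia\nEurope\nOceania\n' > "$dir/sample3.txt"

# -R orders lines by a random hash of their keys; identical lines
# therefore stay next to each other.
shuffled=$(sort -R "$dir/sample3.txt")
printf '%s\n' "$shuffled"

# Sorting the shuffled lines again recovers the ordered list.
resorted=$(printf '%s\n' "$shuffled" | LC_ALL=C sort)
rm -rf "$dir"
```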

Sorting data from multiple files

One of the most useful capabilities of sort is sorting data from several files at once. This can be done with the find command: the output of find acts as the input to the command after the pipe, which is sort. By default find prints one file name per line, that is, it inserts a line break after each name.

For example, consider three files named sample1.txt, sample2.txt and sample3.txt. Here "?" stands for any single character that follows the word "sample". find locates all three files, and their data is sorted by the sort command through the pipe:

The output shows that the data from all the sample?.txt files is displayed in alphabetical order.

Sorting with join

This example differs considerably from those discussed earlier in this guide: in addition to sort, it uses join. Both files are sorted first and are then combined with the join command.

Consider two files that you want to combine.

Now use the command below to apply this idea:

The output shows that the data from both files is combined in sorted form.

Comparing files using sort

The same idea works for comparing two files. The technique is the same as for joining: the two files are sorted first, and then their data is compared.

Consider the same two files as in the previous example, sample2.txt and sample3.txt:

The data is sorted and the two files are laid out side by side: lines from sample2.txt appear in one column next to the corresponding lines from sample3.txt in another.
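A sketch of the comparison, assuming that comm is the tool used for it (comm compares two sorted files line by line). The file contents are made up, and temporary sorted copies are used instead of bash process substitution so that the sketch runs in any POSIX shell.

```shell
dir=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$dir/sample2.txt"
printf 'date\ncherry\nbanana\n'  > "$dir/sample3.txt"

# comm requires sorted input, so sort each file first.
LC_ALL=C sort -o "$dir/a.sorted" "$dir/sample2.txt"
LC_ALL=C sort -o "$dir/b.sorted" "$dir/sample3.txt"

# Three tab-indented columns: lines only in the first file, lines only
# in the second file, and lines present in both.
LC_ALL=C comm "$dir/a.sorted" "$dir/b.sorted"

# -12 suppresses columns 1 and 2, leaving only the common lines.
common=$(LC_ALL=C comm -12 "$dir/a.sorted" "$dir/b.sorted")
echo "common: $common"
rm -rf "$dir"
```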

Conclusion

This article covered the basic functions and options of the sort command. The Linux sort command is very useful for maintaining data and for filtering unwanted entries out of files.