Linux sed regexp example

To know how to use sed, people should understand regular expressions (regexp for short). A regular expression is a pattern that is matched against a subject string from left to right. Most characters are ordinary: they stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern The quick brown fox matches a portion of a subject string that is identical to itself.

In most scripts, pattern space is initialized to the content of each line (see How sed works). So it is a useful simplification to think of ^#include as matching only lines where ‘#include’ is the first thing on the line; if there are spaces before it, for example, the match fails. This simplification is valid as long as the original content of pattern space is not modified, for example with an s command.

Within a bracket expression [list], a leading ^ reverses the meaning of list, so that it matches any single character not in list. To include ] in the list, make it the first character (after the ^ if needed); to include - in the list, make it the first or last; to include ^, put it after the first character.

The characters $ , * , . , [ , and \ are normally not special within list . For example, [\*] matches either ‘ \ ’ or ‘ * ’, because the \ is not special here. However, strings like [.ch.] , [=a=] , and [:space:] are special within list and represent collating symbols, equivalence classes, and character classes, respectively, and [ is therefore special within list when it is followed by . , = , or : . Also, when not in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized within list . See Escapes.
regexp1\|regexp2
Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.

regexp1regexp2
Matches the concatenation of regexp1 and regexp2. Concatenation binds more tightly than \|, ^, and $, but less tightly than the other regular expression operators.

\digit
Matches the digit-th \(...\) parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( left-to-right.

\n
Matches the newline character.

\char
Matches char, where char is one of $, *, ., [, \, or ^. Note that the only C-like backslash sequences that you can portably assume to be interpreted are \n and \\; in particular \t is not portable, and matches a ‘t’ under most implementations of sed, rather than a tab character.

Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
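A short sketch of what greedy matching means in practice. The input string is made up for illustration; both commands use only portable basic syntax.

```shell
# .* takes as much as it can, so the match runs to the LAST 'X', not the first.
greedy=$(printf 'aXbXc\n' | sed 's/a.*X/-/')
# Excluding 'X' from the repeated set forces the match to stop at the first 'X'.
short=$(printf 'aXbXc\n' | sed 's/a[^X]*X/-/')
printf '%s\n%s\n' "$greedy" "$short"
```

The first substitution consumes ‘aXbX’, the second only ‘aX’. A negated bracket expression is the usual workaround, since sed has no non-greedy operators.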

Examples:
‘abcdef’ Matches ‘abcdef’.
‘a*b’ Matches zero or more ‘a’s followed by a single ‘b’. For example, ‘b’ or ‘aaaaab’.
‘a\?b’ Matches ‘b’ or ‘ab’.
‘a\+b\+’ Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the shortest possible match, but other examples are ‘aaaab’ or ‘abbbbb’ or ‘aaaaaabbbbbbb’.
‘.*’ ‘.\+’ These two both match all the characters in a string; however, the first matches every string (including the empty string), while the second matches only strings containing at least one character.
‘^main.*(.*)’ This matches a string starting with ‘main’, followed by an opening and closing parenthesis. The ‘n’, ‘(’ and ‘)’ need not be adjacent.
‘^#’ This matches a string beginning with ‘#’.
‘\\$’ This matches a string ending with a single backslash. The regexp contains two backslashes for escaping.
‘\$’ Instead, this matches a string consisting of a single dollar sign, because it is escaped.
‘[a-zA-Z0-9]’ In the C locale, this matches any ASCII letters or digits.
‘[^ tab]\+’ (Here tab stands for a single tab character.) This matches a string of one or more characters, none of which is a space or a tab. Usually this means a word.
‘^\(.*\)\n\1$’ This matches a string consisting of two equal substrings separated by a newline.
‘.\{9\}A$’ This matches nine characters followed by an ‘A’.
‘^.\{15\}A’ This matches the start of a string that contains 16 characters, the last of which is an ‘A’.
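The interval and back-reference forms can be tried as follows. This is a minimal sketch on invented input lines; the back-reference example is applied within a single line rather than across a newline, to keep it short.

```shell
# \{9\} repeats the preceding '.' exactly nine times.
interval=$(printf '123456789A\nshortA\n' | sed -n '/^.\{9\}A$/p')
# \(.*\) captures some text; \1 must then match the same text again.
doubled=$(printf 'abcabc\nabcabd\n' | sed -n '/^\(.*\)\1$/p')
printf '%s\n%s\n' "$interval" "$doubled"
```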

Learning Linux Commands: sed

Original: Learning Linux Commands: sed
Author: Rares Aioanei
Published: November 19, 2011
Russian translation: A. Krivoshey
Translation date: July 2012

Nikolai Ignatushko tested every command mentioned in this article on GNU sed version 4.2.1 in the Gentoo distribution. Not all of the scripts ran well on GNU sed, but the problems were minor and have been fixed. Only the script that replaces hill with mountains had to be substantially reworked.

1. Introduction

Welcome to the second part of our series, which is devoted to the GNU version of sed. Several versions of sed exist on different platforms, but we will focus on GNU sed 4.x. Many of you have heard of sed or have already used it, most likely as a substitution tool. That is only one of its uses, however, and we will try to show you every aspect of this utility. The name stands for "Stream EDitor", and "stream" here can mean a file, a pipe, or simply stdin. We assume you already have basic Linux knowledge; if you have worked with regular expressions, or at least know what they are, everything will be much easier. The article is too short to include a complete guide to regular expressions, so we will instead cover the basic concepts and give a large number of sed usage examples.

2. Installation

There is not much to say here. sed is most likely already installed, because it is used by various system scripts as well as by Linux users who want to work more efficiently. You can find out which version of sed is installed with the following command:
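The command itself did not survive in this copy. Assuming GNU sed, which the article targets, the usual way to ask for the version is the --version option; other implementations may not accept it, so the sketch below falls back to a message.

```shell
# --version is a GNU option; on other sed implementations this prints nothing useful.
version_line=$(sed --version 2>/dev/null | head -n 1)
printf '%s\n' "${version_line:-this sed does not support --version}"
```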

On my system this command shows that GNU sed 4.2.1 is installed, and it also gives a link to the program's home page and other useful information. The package is called "sed" regardless of distribution, except on Gentoo, where it is present implicitly.

3. Concepts

Before going further, we think it is important to focus on what sed actually does, because the phrase "stream editor" says little about its purpose. sed takes text as input, performs the specified operations on every line (unless told otherwise), and outputs the modified text. Those operations can be append, insert, delete, or substitute. This is not as simple as it looks: be warned that there are many options and combinations of options that can make a sed command very hard to understand. We therefore recommend learning the basics of regular expressions so that you understand how it works. Before starting the tutorial, we would like to thank Eric Pement and others for the inspiration and for what he has done for everyone who wants to learn and use sed.

4. Regular expressions

Because sed commands (scripts) remain a mystery to many people, we feel our readers should understand the basic concepts rather than blindly copy and paste commands whose meaning they do not understand. When you want to understand what regular expressions are, the key word is "match", or more precisely "pattern matching". For example, in a report for your department you wrote the name Nick when referring to the network architect. But Nick has left and John has taken his place, so now you must replace the word Nick with John. If the report file is called report.txt, you would run the following command:
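The command is missing from this copy. A minimal sketch of it follows; the two-line report.txt is invented sample data, and report_new.txt is an assumed name for the output file.

```shell
workdir=$(mktemp -d)
printf 'Nick designed the network.\nAsk Nick for the diagrams.\n' > "$workdir/report.txt"
# Replace every occurrence of Nick with John and write the result to a new file.
sed 's/Nick/John/g' "$workdir/report.txt" > "$workdir/report_new.txt"
cat "$workdir/report_new.txt"
```

The input file is left untouched; only the redirected output contains the replacements.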

By default sed writes to stdout; you can use the output redirection operator, as in the example above. This is a very simple example, but it illustrates several points: we search for every match of the pattern "Nick" and replace each one with "John". Note that sed searches case-sensitively, so be careful and check the output file to make sure all the replacements were made. The example above could also have been written like this:
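The alternative form was also lost from this copy. One equivalent way to write the command, offered as an assumption rather than the author's exact wording, feeds the file to sed on stdin:

```shell
workdir=$(mktemp -d)
printf 'Nick designed the network.\n' > "$workdir/report.txt"
# sed reads stdin when no input file is named, so < takes the place of the file argument.
sed 's/Nick/John/g' < "$workdir/report.txt" > "$workdir/report_new.txt"
cat "$workdir/report_new.txt"
```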

Fine, you may say, but where are the regular expressions? We wanted to show an example first; now the most interesting part begins.
If you are not sure whether you wrote "nick" or "Nick" and want to cover both cases, use sed 's/Nick\|nick/John/g'. In GNU sed's default (basic) syntax the alternation operator must be written \|; a bare | is an ordinary character unless you run sed with -E. The vertical bar has a meaning you will recognize if you have studied C: the expression matches "nick" or "Nick". As you will see below, the bar can be used in other ways too, but the meaning stays the same. Other operators widely used in regular expressions are "?", which matches zero or one repetition of the preceding character (so flavou?r matches both flavor and flavour), "*", which matches zero or more, and "+", which matches one or more. As with the bar, GNU sed's basic syntax spells the first and last of these \? and \+. "^" matches the start of a line and "$" matches the end. If you use vi or vim, much of this will look familiar; after all, those tools, along with awk and C, have their roots in the early days of UNIX. We will not say more on the subject, because these characters are easier to understand through examples, but you should know that there are several regular expression flavors: POSIX, POSIX Extended, Perl, and various fuzzy regular expression implementations, all of which guarantee you a headache.
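The difference between the basic and extended spellings of alternation is easy to check. The sketch below assumes GNU sed for the first two commands; the third uses a bracket expression, which works in any POSIX sed.

```shell
# GNU basic syntax: alternation must be written \|
basic=$(printf 'nick and Nick\n' | sed 's/Nick\|nick/John/g')
# Extended syntax (-E): a bare | is the alternation operator
extended=$(printf 'nick and Nick\n' | sed -E 's/Nick|nick/John/g')
# Portable alternative: accept either first letter with a bracket expression
portable=$(printf 'nick and Nick\n' | sed 's/[Nn]ick/John/g')
printf '%s\n%s\n%s\n' "$basic" "$extended" "$portable"
```

All three produce the same line; when only one character differs, the bracket expression is the simplest choice.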

Unix / Linux — Regular Expressions with SED

In this chapter, we will discuss in detail about regular expressions with SED in Unix.

A regular expression is a string that can be used to describe several sequences of characters. Regular expressions are used by several different Unix commands, including ed, sed, awk, grep, and to a more limited extent, vi.

Here SED stands for stream editor. This stream-oriented editor was created exclusively for executing scripts. Thus, all the input you feed into it passes through and goes to STDOUT and it does not change the input file.

Invoking sed

Before we start, let us ensure we have a local copy of /etc/passwd text file to work with sed.

As mentioned previously, sed can be invoked by sending data through a pipe to it as follows −
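The listing is missing here. A minimal sketch, using three made-up passwd-style lines in place of the real /etc/passwd and an empty script so that sed makes no changes:

```shell
sample='root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync'
# With a real file this would be: cat /etc/passwd | sed ''
# An empty script applies no edits, so every line passes through to stdout.
out=$(printf '%s\n' "$sample" | sed '')
printf '%s\n' "$out"
```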

The cat command dumps the contents of /etc/passwd to sed through the pipe into sed’s pattern space. The pattern space is the internal work buffer that sed uses for its operations.

The sed General Syntax

The general syntax for a sed command is /pattern/action.

Here, pattern is a regular expression, and action is one of the commands given in the following table. If pattern is omitted, action is performed for every line as we have seen above.

The slash characters (/) that surround the pattern are required because they are used as delimiters.

Sr.No.	Action	Description
1	p	Prints the line
2	d	Deletes the line
3	s/pattern1/pattern2/	Substitutes the first occurrence of pattern1 with pattern2

Deleting All Lines with sed

We will now see how to delete all lines with sed. Invoke sed again through the pipe, this time with the delete-line editing command, denoted by the single letter d, as in cat /etc/passwd | sed 'd'.

Instead of invoking sed by sending a file to it through a pipe, the sed can be instructed to read the data from a file, as in the following example.

The following command does exactly the same as in the previous example, without the cat command −
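The listing is missing here. The sketch below shows both forms side by side on a temporary file holding two made-up passwd-style lines; with a real system the file argument would be /etc/passwd.

```shell
passwd_copy=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/sh\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\n' > "$passwd_copy"
# Piped form: cat feeds the file to sed, and d deletes every line.
piped=$(cat "$passwd_copy" | sed 'd')
# Direct form: sed opens the file itself, so cat is not needed.
direct=$(sed -e 'd' "$passwd_copy")
printf 'piped: [%s] direct: [%s]\n' "$piped" "$direct"
```

Both variables end up empty, because d discards the pattern space before anything is printed.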

The sed Addresses

The sed also supports addresses. Addresses are either particular locations in a file or a range where a particular editing command should be applied. When the sed encounters no addresses, it performs its operations on every line in the file.

The following command adds a basic address to the sed command you’ve been using −
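The listing is missing here. A minimal sketch on made-up passwd-style lines:

```shell
sample='root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync'
# The address 1 limits the d command to the first input line.
out=$(printf '%s\n' "$sample" | sed '1d')
printf '%s\n' "$out"
```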

Notice that the number 1 is added before the delete edit command. This instructs the sed to perform the editing command on the first line of the file. In this example, the sed will delete the first line of /etc/passwd and print the rest of the file.

The sed Address Ranges

We will now understand how to work with the sed address ranges. So what if you want to remove more than one line from a file? You can specify an address range with sed as follows −
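The listing is missing here. A minimal sketch, using the numbers 1 to 8 from seq as stand-in input so that the surviving lines are easy to see:

```shell
# 1,5 is an address range: the d command applies to lines 1 through 5 inclusive.
out=$(seq 1 8 | sed '1,5d')
printf '%s\n' "$out"
```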

The above command will be applied on all the lines starting from 1 through 5. This deletes the first five lines.

Try out the following address ranges −

Sr.No.	Range	Description
1	'4,10d'	Lines starting from the 4th till the 10th are deleted
2	'10,4d'	Only the 10th line is deleted, because the sed does not work in reverse direction
3	'4,+5d'	This matches line 4 in the file, deletes that line, continues to delete the next five lines, and then ceases its deletion and prints the rest
4	'2,5!d'	This deletes everything except starting from the 2nd till the 5th line
5	'1~3d'	This deletes the first line, steps over the next two lines, and then deletes the fourth line. Sed continues to apply this pattern until the end of the file.
6	'2~2d'	This tells sed to delete the second line, step over the next line, delete the next line, and repeat until the end of the file is reached
7	'4,10p'	Lines starting from the 4th till the 10th are printed
8	'4,d'	This generates a syntax error
9	',10d'	This would also generate a syntax error
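The ranges above can be checked quickly on numbered input. In this sketch, show is a hypothetical helper (not part of sed) that runs a sed script over the numbers 1 to 12 and prints the surviving lines on one row; the addr,+N and first~step forms are GNU extensions.

```shell
# show ARGS...: run sed with ARGS over 1..12 and join the output on one line
show() { seq 1 12 | sed "$@" | paste -sd' ' -; }

show '4,10d'      # lines 4 through 10 removed
show '10,4d'      # only line 10 removed
show '4,+5d'      # GNU: line 4 and the five lines after it removed
show '2,5!d'      # everything except lines 2 through 5 removed
show '1~3d'       # GNU: lines 1, 4, 7, 10 removed
show '2~2d'       # GNU: every even line removed
show -n '4,10p'   # only lines 4 through 10 printed
```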

Note − While using the p action, you should use the -n option to avoid repetition of line printing. Check the difference in between the following two commands −
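The two commands are missing here. A minimal sketch of the difference, on three numbered lines from seq rather than the original input:

```shell
# Without -n, sed prints every line automatically AND p prints line 2 again.
with_autoprint=$(seq 1 3 | sed '2p')
# With -n, automatic printing is off, so only the explicit p produces output.
only_explicit=$(seq 1 3 | sed -n '2p')
printf '%s\n--\n%s\n' "$with_autoprint" "$only_explicit"
```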

The Substitution Command

The substitution command, denoted by s, will substitute any string that you specify with any other string that you specify.

To substitute one string with another, the sed needs to have the information on where the first string ends and the substitution string begins. For this, we proceed with bookending the two strings with the forward slash (/) character.

The command sed 's/root/amrood/' /etc/passwd substitutes the first occurrence on a line of the string root with the string amrood.

It is very important to note that sed substitutes only the first occurrence on a line. If the string root occurs more than once on a line only the first match will be replaced.

For the sed to perform a global substitution, add the letter g to the end of the command as follows −
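The listing is missing here. The sketch below contrasts the two forms on one made-up passwd-style line in which root occurs three times:

```shell
line='root:x:0:0:root user:/root:/bin/sh'
# Without g, only the first match on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/root/amrood/')
# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/root/amrood/g')
printf '%s\n%s\n' "$first" "$all"
```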

Substitution Flags

There are a number of other useful flags that can be passed in addition to the g flag, and you can specify more than one at a time.

Sr.No.	Flag	Description
1	g	Replaces all matches, not just the first match
2	NUMBER	Replaces only the NUMBERth match
3	p	If substitution was made, then prints the pattern space
4	w FILENAME	If substitution was made, then writes result to FILENAME
5	I or i	Matches in a case-insensitive manner
6	M or m	In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline
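A few of these flags in action, as a minimal sketch on made-up input. The I flag is a GNU extension; the numeric and p flags are standard.

```shell
line='root:x:0:0:root:/root:/bin/sh'
# A number selects which match on the line is replaced.
second=$(printf '%s\n' "$line" | sed 's/root/amrood/2')
# With -n, the p flag prints only the lines where a substitution happened.
changed=$(printf '%s\nsync:x:4\n' "$line" | sed -n 's/root/amrood/p')
# Flags can be combined: I ignores case, g replaces every match.
nocase=$(printf 'Root ROOT\n' | sed 's/root/amrood/Ig')
printf '%s\n%s\n%s\n' "$second" "$changed" "$nocase"
```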

Using an Alternative String Separator

Suppose you have to do a substitution on a string that includes the forward slash character. In this case, you can specify a different separator by providing the designated character after the s.
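The listing is missing here. A minimal sketch on one made-up passwd-style line, with the replacement /amrood assumed for illustration:

```shell
line='root:x:0:0:root:/root:/bin/sh'
# The character right after s becomes the delimiter, so / needs no escaping.
out=$(printf '%s\n' "$line" | sed 's:/root:/amrood:g')
printf '%s\n' "$out"
```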

In the above example, we have used : as the delimiter instead of slash / because we were trying to search /root instead of the simple root.

Replacing with Empty Space

Use an empty substitution string to delete the root string from the /etc/passwd file entirely −
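The listing is missing here. A minimal sketch on one made-up passwd-style line:

```shell
line='root:x:0:0:root:/root:/bin/sh'
# Nothing between the second and third delimiter: each match is replaced by nothing.
out=$(printf '%s\n' "$line" | sed 's/root//g')
printf '%s\n' "$out"
```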

Address Substitution

If you want to substitute the string sh with the string quiet only on line 10, you can put the address in front of the command, as in sed '10s/sh/quiet/g'.

Similarly, to do an address range substitution, you could do something like the following −
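The listing is missing here. The sketch below builds a 12-line stand-in for /etc/passwd (the user names are invented) and applies the single-line and the range form to it:

```shell
workdir=$(mktemp -d)
for i in $(seq 1 12); do printf 'user%s:/bin/sh\n' "$i"; done > "$workdir/passwd"
# Single address: the substitution runs on line 10 only.
line10=$(sed '10s/sh/quiet/g' "$workdir/passwd")
# Address range: the substitution runs on lines 1 through 5 only.
first5=$(sed '1,5s/sh/quiet/g' "$workdir/passwd")
printf '%s\n' "$first5"
```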

As you can see from the output, the first five lines had the string sh changed to quiet, but the rest of the lines were left untouched.

The Matching Command

You would use the p option along with the -n option to print all the matching lines as follows −
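The listing is missing here. A minimal sketch on made-up passwd-style lines, using root as the assumed search pattern:

```shell
sample='root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync'
# -n turns off automatic printing; /root/p prints only the lines that match.
out=$(printf '%s\n' "$sample" | sed -n '/root/p')
printf '%s\n' "$out"
```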

Using Regular Expression

While matching patterns, you can use regular expressions, which provide more flexibility.

For example, the command /^daemon/d matches all the lines starting with daemon and then deletes them.

Following is the example which deletes all the lines ending with sh −
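The listing is missing here. The sketch below runs both anchored deletions on made-up passwd-style lines:

```shell
sample='root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync'
# ^ anchors the pattern to the start of the line.
no_daemon=$(printf '%s\n' "$sample" | sed '/^daemon/d')
# $ anchors the pattern to the end of the line.
no_sh=$(printf '%s\n' "$sample" | sed '/sh$/d')
printf '%s\n--\n%s\n' "$no_daemon" "$no_sh"
```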

The following table lists five special characters that are very useful in regular expressions.

Sr.No.	Character	Description
1	^	Matches the beginning of lines
2	$	Matches the end of lines
3	.	Matches any single character
4	*	Matches zero or more occurrences of the previous character
5	[chars]	Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters.

Matching Characters

The following expressions demonstrate the use of metacharacters.

Sr.No.	Expression	Description
1	/a.c/	Matches lines that contain strings such as a+c, a-c, abc, match, and a3c
2	/a*c/	Matches the same strings along with strings such as ace, yacc, and arctic
3	/[tT]he/	Matches the string The and the
4	/^$/	Matches blank lines
5	/^.*$/	Matches an entire line whatever it is
6	/  */	Matches one or more spaces (two spaces, then *)

Following table shows some frequently used sets of characters −

Sr.No.	Set	Description
1	[a-z]	Matches a single lowercase letter
2	[A-Z]	Matches a single uppercase letter
3	[a-zA-Z]	Matches a single letter
4	[0-9]	Matches a single number
5	[a-zA-Z0-9]	Matches a single letter or number

Character Class Keywords

Some special keywords are commonly available to regexps, especially GNU utilities that employ regexps. These are very useful for sed regular expressions as they simplify things and enhance readability.

For example, the characters a through z and the characters A through Z, constitute one such class of characters that has the keyword [[:alpha:]]

Using the alphabet character class keyword, this command prints only those lines in the /etc/syslog.conf file that start with a letter of the alphabet −
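The listing is missing here. Because /etc/syslog.conf does not exist on every system, this sketch runs the same kind of command on a few invented configuration lines:

```shell
conf='# log kernel messages
kern.* /var/log/kern.log
*.info /var/log/messages
 mail.* /var/log/mail.log'
# Print only the lines whose first character is a letter.
out=$(printf '%s\n' "$conf" | sed -n '/^[[:alpha:]]/p')
printf '%s\n' "$out"
```

The comment line, the line starting with * and the indented line are all skipped.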

The following table lists character class keywords available in GNU sed.

Sr.No.	Character Class	Description
1	[[:alnum:]]	Alphanumeric [a-z A-Z 0-9]
2	[[:alpha:]]	Alphabetic [a-z A-Z]
3	[[:blank:]]	Blank characters (spaces or tabs)
4	[[:graph:]]	Any visible characters (excludes whitespace)
5	[[:lower:]]	Lowercase letters [a-z]
6	[[:print:]]	Printable characters (non-control characters)
7	[[:upper:]]	Uppercase letters [A-Z]
8	[[:xdigit:]]	Hex digits [0-9 a-f A-F]

Ampersand Referencing

The sed metacharacter & represents the contents of the pattern that was matched. For instance, say you have a file called phone.txt full of ten-digit phone numbers, such as 5555551212, one per line.

You want to make the area code (the first three digits) surrounded by parentheses for easier reading. To do this, you can use the ampersand replacement character −
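The listing is missing here. A minimal sketch, with a phone.txt made of two invented numbers:

```shell
workdir=$(mktemp -d)
printf '5555551212\n6665551216\n' > "$workdir/phone.txt"
# & in the replacement stands for whatever the pattern matched (the first three digits).
out=$(sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' "$workdir/phone.txt")
printf '%s\n' "$out"
```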

Here in the pattern part you are matching the first 3 digits and then using & you are replacing those 3 digits with the surrounding parentheses.

Using Multiple sed Commands

You can combine multiple sed commands in a single invocation by giving each one its own -e option: sed -e 'command1' -e 'command2' ... -e 'commandN' files

Here command1 through commandN are sed commands of the type discussed previously. These commands are applied to each of the lines in the list of files given by files.

Using the same mechanism, we can write the above phone number example as follows −
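The listing is missing here. A minimal sketch with two -e commands, again on invented numbers: the first wraps the area code in parentheses, the second adds a hyphen after the next three digits.

```shell
workdir=$(mktemp -d)
printf '5555551212\n6665551216\n' > "$workdir/phone.txt"
# Commands run in order on each line, so the second one sees the ')' added by the first.
out=$(sed -e 's/^[[:digit:]]\{3\}/(&)/g' \
          -e 's/)[[:digit:]]\{3\}/&-/g' "$workdir/phone.txt")
printf '%s\n' "$out"
```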

Note − In the above example, instead of repeating the character class keyword [[:digit:]] three times, we replaced it with \{3\}, which means the preceding regular expression is matched three times. We have also used \ to give line break and this has to be removed before the command is run.

Back References

The ampersand metacharacter is useful, but even more useful is the ability to define specific regions in regular expressions. These special regions can be used as reference in your replacement strings. By defining specific parts of a regular expression, you can then refer back to those parts with a special reference character.

To do back references, you have to first define a region and then refer back to that region. To define a region, you insert backslashed parentheses, \( and \), around each region of interest. The first region that you surround this way is then referenced by \1, the second region by \2, and so on.

Assume phone.txt contains formatted numbers such as (555)555-1212, one per line.

Try the following command −
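The listing is missing here. A minimal sketch with three regions, run on one invented number; the labels in the replacement text are chosen for illustration.

```shell
# Region 1 ends at ')', region 2 ends at '-', region 3 is the rest of the line.
out=$(printf '(555)555-1212\n' |
      sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/')
printf '%s\n' "$out"
```

In basic syntax an unescaped ) is an ordinary character, which is why the first region can end with a literal closing parenthesis.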

Note − In the above example, each regular expression inside the parenthesis would be back referenced by \1, \2 and so on. We have used \ to give line break here. This should be removed before running the command.
