Packet loss in Linux

Contents

How to show dropped packets per interface on Linux
Displaying dropped packets per interface on Linux using netstat
To display summary statistics for each protocol, run:
Show TCP stats
Display UDP stats
Showing dropped packets statistics per network interface on Linux using ip
Querying NIC- and driver-specific statistics with ethtool
Finding out why a Linux server is dropping packets
Building dropwatch
Conclusion
Diagnosing Packet Loss in Linux Network Virtualization Layers: Part 4
Building on work described in Parts 1, 2, and 3 of this series, we present a careful study of the Linux kernel source code related to macvtap queue handling and the potential for data races there.
Macvtap device packet handling
Storing inbound packets
Packet queue management
Enqueuing packets
Checking for data race safety
The early “queue full” test
Examining possible data races
A candidate data race
Instrumenting the driver
Summary
Simulating network problems in Linux
Simulating network problems
Packet delay
Packet loss
Adding noise to packets
Packet duplication
Packet reordering
Changing bandwidth
Simulating a connection timeout
REJECT
REJECT with tcp-reset
REJECT with icmp-host-unreachable
Simulating a request timeout
REJECT
REJECT with tcp-reset
REJECT with icmp-host-unreachable
Conclusion

How to show dropped packets per interface on Linux

How do I display dropped packets per interface on Linux operating systems from the command line? How can I determine why a Linux server is dropping packets?

We can use the ip, netstat, or ethtool command to show dropped packet statistics per network interface on Linux. Let us see how to use each of these commands to list dropped packets per interface.

Tutorial details
Difficulty level	Advanced
Root privileges	Yes
Requirements	Linux with GCC compilers
Est. reading time	6 minutes (depends upon your skills & situation)

Displaying dropped packets per interface on Linux using netstat

The netstat command is mostly obsolete; its replacements are the ss and ip commands. However, netstat is still available on older Linux distros that remain in production. Hence, I will start with netstat, but use the ip/ss tools if possible. The syntax is:
netstat -i
netstat --interfaces

Displaying network stats per network interface on Linux

To display summary statistics for each protocol, run:

netstat -s
netstat --statistics

Show TCP stats

netstat --statistics --tcp
netstat -s -t

Display UDP stats

netstat --statistics --udp
netstat -s -u

Showing dropped packets statistics per network interface on Linux using ip

Let us see how to see link device stats using the ip command. The syntax is:
ip -s link
ip -s link show
ip -s link show eth0

In this example, we display link stats for wg0:
ip -s link show wg0

In the output, TX stands for transmit and RX for receive. WireGuard creates the wg0 interface, so drops on it suggest that either WireGuard or the firewall is dropping packets as per policy.

Querying NIC- and driver-specific statistics with ethtool

Pass the -S or --statistics option to display stats. Again, the syntax is straightforward:
ethtool -S {device}
ethtool -S eth0

Another option is to directly query the /proc/net/dev file either using the cat command or column command:
cat /proc/net/dev
column -t /proc/net/dev
The output lists receive and transmit counters for each interface, including a drop column for each direction.
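For scripting, the same counters can be pulled out of /proc/net/dev directly. The following is a minimal Python sketch, not part of the original tutorial: the function name parse_proc_net_dev and the SAMPLE text are made up for illustration, and the column positions assume the usual two-line header layout of that file (receive columns first, then transmit columns).

```python
def parse_proc_net_dev(text):
    """Return {interface: counters} parsed from the text of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive:  bytes packets errs drop fifo frame compressed multicast
        # Transmit: bytes packets errs drop fifo colls carrier compressed
        stats[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_drop": int(fields[3]),
            "tx_packets": int(fields[9]),
            "tx_drop": int(fields[11]),
        }
    return stats


# Hypothetical sample in the layout of /proc/net/dev (values are invented).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 0 3 0 0 0 0 4000 40 0 1 0 0 0 0
"""

for iface, counters in parse_proc_net_dev(SAMPLE).items():
    print(iface, "rx_drop:", counters["rx_drop"], "tx_drop:", counters["tx_drop"])
```

On a real system, pass the contents of /proc/net/dev (for example open("/proc/net/dev").read()) instead of SAMPLE.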

Finding out why a Linux server is dropping packets

We can use the dropwatch tool for this. Its author describes it as a project started in an effort to improve the ability of developers and system administrators to diagnose problems in the Linux networking stack, specifically the ability to diagnose where packets are getting dropped.

Building dropwatch

Install required tools, libs and gcc compiler collection on Ubuntu or Debian Linux:
sudo apt-get install libpcap-dev libnl-3-dev libnl-genl-3-dev binutils-dev libreadline6-dev autoconf libtool pkg-config build-essential
Next, clone the repo and then compile it:
git clone https://github.com/nhorman/dropwatch
cd dropwatch
./autogen.sh
./configure
make
make install

Run it as follows:
# dropwatch -l kas

You can see that nf_hook_slow and icmpv6_rcv are Linux kernel functions. This means I need to search the Linux kernel tree to see what is going on. Naturally, you must understand C programming and have a good understanding of Linux kernel internals, including the TCP/IP stack. Now you know how to see whether packets are dropped at the IP layer, the link layer, the UDP/TCP layer, or the application layer. If packets are dropped in the TCP/IP stack, we need to look into the Linux kernel source code and documentation. See the man page and source code of dropwatch for further information:
man dropwatch
I would also suggest trying tcpdump to dump traffic on a network interface. It often gives hints about the packets, and its captures are easily analyzed in Wireshark:
man tcpdump

Conclusion

You learned about various Linux commands for viewing per-interface packet loss, including excellent tools such as dropwatch. We can also use perf, the Linux profiling utility based on performance counters. Check out perf examples for further information.


Diagnosing Packet Loss in Linux Network Virtualization Layers: Part 4

3 March 2021 5 min read

Kean Kuiper, Senior Engineer
Saju Mathew, Senior Engineer
Rei Odaira, Research Staff Member

Building on work described in Parts 1, 2, and 3 of this series, we present a careful study of the Linux kernel source code related to macvtap queue handling and the potential for data races there.

We will then show how we utilized SystemTap to confirm our theory of the data races. This is part of the series of blogs that is intended for network administrators and developers who are interested in how to diagnose packet loss in the Linux network virtualization layers.

Macvtap device packet handling

Inbound packets are handled by the following function, where “ … ” represents other code not relevant to this discussion:

tap_handle_frame() has two paths that can lead to dropped packets. The first is an early check to give up if the queue is already full, and the second produces a packet for consumption but can still encounter a full queue. To make the process easier to understand, we explore the producing path first.

Storing inbound packets

skb_array_produce() is just a wrapper around a more generic implementation.

ptr_ring_produce() wraps spin lock serialization around the core logic, guaranteeing that only a single CPU adds packets to the queue at a time.

Packet queue management

Before examining queue handling in more detail, let’s look at the structure of the queue itself:

An entry in the queue is either (NULL) or points to a packet stored elsewhere. In an empty queue, all entries are (NULL). The queue is logically structured as a ring, where Producer and Consumer indexes wrap back to the beginning of queue storage when they advance beyond the end.

A full queue, such as the one shown below, has no entries that are (NULL):

This is the only case where the entry indexed by Producer is not (NULL).

Enqueuing packets

Next, we examine the internals of packet handling, remembering that on this path, we are serialized under a lock:

__ptr_ring_produce() adds the packet to the queue after first verifying that the queue has space for the incoming packet, returning -ENOSPC when full. In the path under study here, this would lead to a dropped packet on eventual return to tap_handle_frame().
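To make the queue mechanics concrete, here is a small Python model of the ring as described above. It is a sketch for illustration only, not the kernel code: the class name PtrRing and its methods are invented, None stands in for NULL, and the locking that the kernel wraps around the producer is left out.

```python
import errno

ENOSPC = errno.ENOSPC  # the error the text says is returned when the ring is full


class PtrRing:
    """Toy model of the packet ring; None plays the role of NULL."""

    def __init__(self, size):
        self.size = size
        self.queue = [None] * size  # an empty ring has only NULL entries
        self.producer = 0
        self.consumer = 0

    def produce(self, ptr):
        # Full (or zero-sized) ring: the entry at the producer index is occupied.
        if self.size == 0 or self.queue[self.producer] is not None:
            return -ENOSPC
        self.queue[self.producer] = ptr
        self.producer += 1
        if self.producer >= self.size:  # wrap back to the start of storage
            self.producer = 0
        return 0

    def consume(self):
        ptr = self.queue[self.consumer]
        if ptr is not None:
            self.queue[self.consumer] = None  # free the slot for the producer
            self.consumer = (self.consumer + 1) % self.size
        return ptr


ring = PtrRing(2)
print(ring.produce("pkt-1"))  # stored
print(ring.produce("pkt-2"))  # stored, producer index wraps to 0
print(ring.produce("pkt-3"))  # ring is full, so this one would be dropped
print(ring.consume())         # frees a slot
print(ring.produce("pkt-3"))  # now there is room again
```

The model shows why a non-NULL entry at the producer index is enough to detect a full ring: the producer only ever advances onto a slot the consumer has already cleared.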

Checking for data race safety

Since we did not vary queue sizes during scenarios with dropped packets, the queue full test on r->size is uninteresting. The test of r->queue[r->producer] is correct and safe based on the presence of the serializing lock on this path.

r->queue[r->producer++] = ptr handles storing the packet. Again, this is done under lock. The next code stanza wraps the producer index when necessary.

The early “queue full” test

So far, nothing stands out as a problem. Let’s return to the top level:

We have covered the path from skb_array_produce() on down. Here is the early queue full test:

This is just a wrapper around the now-familiar queue full test r->queue[r->producer]. An important difference is that there are no locks on this path. When suspecting a race condition, this seems quite interesting.

Examining possible data races

This code allows both of the following statements to execute concurrently in the case where inbound packet handling on two CPUs chooses the same queue:

Note that in the second statement above, there are updates both to r->producer and to r->queue[]. Without going deeply into C language semantics, let’s see how the compiler chooses to order these in the system under study. We focus on the code highlighted below:

These are the corresponding x86_64 instructions, with comments on the right following “;”:

We can see here that gcc sequences the r->producer update ahead of the r->queue[] update. A second consideration is the memory ordering model of the CPU. X86_64, being strongly ordered, preserves the order of these memory writes as seen by other processors.

A candidate data race

The two stores in the order described leave a window open where a queue full condition can be falsely detected:

The reader sees the entry just produced by the writer due to its stale value of the producer index. Stepping back a bit, this means that the early un-serialized queue full test in tap_handle_frame() is the problem. A scenario like this fits our observations of occasional packet loss in cross-CPU packet production, but more work is required to confirm these findings.
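The interleaving can be replayed step by step in ordinary sequential code. The Python sketch below is only an illustration of the scenario described here, with invented variable names; it serializes one particular ordering of the reader's two loads and the writer's two stores, whereas the real race happens between two CPUs.

```python
SIZE = 4
queue = [None] * SIZE  # shared ring storage, None stands in for NULL
producer = 0           # shared producer index

# Reader (the unlocked early "queue full" test), step 1: load the producer index.
reader_index = producer

# Writer (holding the lock) completes both stores before the reader continues.
slot = producer
producer = (producer + 1) % SIZE  # store 1: advance the producer index
queue[slot] = "packet-A"          # store 2: publish the packet in the old slot

# Reader, step 2: test the entry at its now-stale index.
looks_full = queue[reader_index] is not None

free_slots = sum(entry is None for entry in queue)
print("reader thinks the queue is full:", looks_full)
print("slots that are actually free:", free_slots)
```

In this replay the reader concludes that the queue is full even though three of the four slots are empty, which is the false "queue full" result that leads to a dropped packet.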

Instrumenting the driver

To get further confidence, we wanted to check whether this branch in tap_handle_frame() was really taken:

SystemTap, which we introduced in Part 2 of this series, is once again the best tool for such a purpose, but this time the setup is a bit more complicated. SystemTap allows you to instrument any machine instruction in the kernel at runtime, but you must know the absolute memory address of the target instruction. Because tap_handle_frame() is in the macvtap kernel module, the first step was to disassemble the module and to identify which jump instruction corresponded to the branch:

After reading the assembly code, we figured out that the je instruction shown above was our target. je is an x86 instruction that jumps to the target specified in the operand when the Zero flag is set. 235c, shown at the head of the line, is the relative memory address of this instruction within the tap_handle_frame() function.

The second step was to obtain the absolute memory address of the head of tap_handle_frame() by searching the /proc/kallsyms file. The /proc/kallsyms file provides the addresses of all of the symbols in the Linux kernel. We calculated the absolute address of the je instruction by adding the absolute address of the head of the function to the relative address of the instruction within the function.

The final step was to determine what condition to check at the je instruction. In the x86 architecture, the Zero flag is at the bit position 0x40 of the flags register, according to the architecture manual. In a SystemTap script, you can read the value of the flags register at the time the instrumented instruction is executed. By reading the assembly code, we figured out that the branch jumps to the drop label when the je instruction falls through. In other words, the packet is dropped when the Zero flag is false at the je instruction.
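The address arithmetic and the flag test from the last two steps can be sketched in a few lines of Python. This is an illustration only: the kallsyms line below is hypothetical (the base address is invented), the helper names are made up, and only the offset 0x235c and the 0x40 mask come from the text.

```python
ZERO_FLAG = 0x40  # bit position of the Zero flag in the flags register (from the text)

# Hypothetical /proc/kallsyms excerpt; the address here is invented for the example.
KALLSYMS_SAMPLE = """\
ffffffffc0000000 t tap_handle_frame\t[tap]
ffffffffc0003000 t tap_open\t[tap]
"""


def symbol_address(kallsyms_text, symbol):
    """Find the absolute address of `symbol` in kallsyms-style text."""
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == symbol:
            return int(parts[0], 16)
    raise KeyError(symbol)


def instruction_address(kallsyms_text, symbol, offset):
    """Absolute address = start of the function + offset within the function."""
    return symbol_address(kallsyms_text, symbol) + offset


def packet_dropped(flags):
    """Per the text, je falls through to the drop path when the Zero flag is clear."""
    return (flags & ZERO_FLAG) == 0


addr = instruction_address(KALLSYMS_SAMPLE, "tap_handle_frame", 0x235C)
print(hex(addr))
print(packet_dropped(0x202))  # Zero flag clear
print(packet_dropped(0x246))  # Zero flag set
```

Reading /proc/kallsyms on a real system usually requires root to see non-zero addresses.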

Putting all of the information together, we came up with the following SystemTap script:

0xFFFFFFFFC1F1B35C is the calculated absolute memory address of the je instruction. This script printed a full message every time the je instruction fell through.

By executing it, we confirmed that the number of messages exactly matched the number of dropped packets at macvtap.

Summary

In this post, we have explained the root cause of the packet loss, which was a concurrency bug in the Linux macvtap driver. We have also shown how we used SystemTap to double-check our finding by instrumenting the target jump instruction. In the next (and final) post, we will present how we took advantage of a kernel patch mechanism to confirm that our proposed patch would actually solve the packet loss issue.


Simulating network problems in Linux

Hi everyone, my name is Sasha, and I lead backend testing at FunCorp. Like many companies, we have a service-oriented architecture. On the one hand, this simplifies the work, since each service is easier to test on its own; on the other hand, we also have to test how the services interact with each other, and that interaction often happens over the network.

In this article I will cover two utilities that let you check basic scenarios describing how an application behaves when the network misbehaves.

Simulating network problems

Software is usually tested on test servers with a good network connection. In the harsh conditions of production things may not go so smoothly, so sometimes programs need to be checked over a poor connection. On Linux, the tc utility helps simulate such conditions.

tc (short for Traffic Control) configures how network packets are transmitted in the system. The utility has a great many capabilities, which are covered in its documentation. I will look at only a few of them here: we are interested in traffic scheduling, for which we use a qdisc, and since we need to emulate an unstable network, we will use the classless qdisc netem.

Let us start an echo server on the server (I used nmap-ncat for this):

To print detailed timestamps at every step of the client-server interaction, I wrote a simple Python script that sends the request Test to our echo server.
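The author's script is not reproduced here, so the following is only a sketch of what such a client might look like. The function names, the step labels, and the throwaway in-process echo server (standing in for the ncat server on port 12345 so that the example runs on its own) are all assumptions.

```python
import socket
import threading
import time


def run_client(host, port, payload=b"Test"):
    """Send payload to an echo server; return (reply, [(step, seconds since start)])."""
    steps = []
    start = time.monotonic()

    def mark(step):
        steps.append((step, time.monotonic() - start))

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        mark("socket created")
        sock.connect((host, port))
        mark("connected")
        sock.sendall(payload)
        mark("request sent")
        reply = sock.recv(1024)
        mark("reply received")
    mark("connection closed")
    return reply, steps


def start_echo_server():
    """Throwaway single-connection echo server for the demo; returns its port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo the request back
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


port = start_echo_server()
reply, steps = run_client("127.0.0.1", port)
for step, elapsed in steps:
    print(f"{elapsed:8.4f}s  {step}")
print("reply:", reply)
```

To use it against a real echo server, call run_client("127.0.0.1", 12345) and skip start_echo_server().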

Let us run it and look at the traffic on the lo interface and port 12345:

Everything is standard: the three-way handshake; PSH/ACK followed by ACK, twice, which is the request and the response exchanged between client and server; and FIN/ACK followed by ACK, twice, which closes the connection.

Packet delay

Now let us set a delay of 500 milliseconds:

We run the client and see that the script now takes 2 seconds:

And what about the traffic? Let us look:

You can see the expected half-second lag in the exchange between client and server. The system behaves far more interestingly with a larger lag: the kernel starts retransmitting some TCP packets. Let us change the delay to 1 second and look at the traffic (I will not show the client output; it has the expected 4 seconds in total duration):

You can see that the client sent the SYN packet twice and the server sent SYN/ACK twice.

Besides a constant value, the delay can be given a deviation, a distribution function, and a correlation (with the value for the previous packet). This is done as follows:

Here we set the delay to a range of 100 to 900 milliseconds; values are picked according to a normal distribution, with a 50 percent correlation with the delay of the previous packet.

You may have noticed that I used add in the first command and change afterwards. The meaning of these commands is obvious, so I will only add that there is also del, which removes the configuration.
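As a rough illustration of how add, change, and del fit together, the Python sketch below assembles the corresponding tc command lines. The helper name netem is invented, and the exact arguments are my reconstruction from the netem syntax (delay TIME JITTER CORRELATION distribution NAME), not a copy of the author's commands; the 100 to 900 millisecond range is expressed as 500ms with a 400ms deviation.

```python
def netem(action, dev, *params):
    """Build the argument list for `tc qdisc <action> dev <dev> root netem ...`."""
    if action not in ("add", "change", "del"):
        raise ValueError(f"unsupported action: {action}")
    return ["tc", "qdisc", action, "dev", dev, "root", "netem", *params]


commands = [
    # Constant 500 ms delay on the loopback interface.
    netem("add", "lo", "delay", "500ms"),
    # 500 ms +/- 400 ms, 50% correlation, normally distributed.
    netem("change", "lo", "delay", "500ms", "400ms", "50%", "distribution", "normal"),
    # Remove the netem configuration again.
    netem("del", "lo"),
]

for argv in commands:
    print(" ".join(argv))
```

The lists can be handed to subprocess.run(argv, check=True) when running as root; the sketch only prints them so that it is safe to run anywhere.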

Packet loss

Now let us try packet loss. According to the documentation, there are as many as three ways to do it: dropping packets randomly with some probability, using a Markov chain with 2, 3, or 4 states to decide on packet loss, or using the Gilbert-Elliott model. In this article I will cover the first (simplest and most obvious) way; the others are described in the documentation.

Let us drop 50% of packets with a 25% correlation:

Unfortunately, tcpdump cannot show packet loss directly, so we will just assume that it really works. What confirms it is the longer and unstable run time of the client.py script (it may finish instantly, or it may take 20 seconds), as well as the increased number of retransmitted packets:
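One way to watch the retransmission counter from a script is to read the Tcp lines of /proc/net/snmp before and after a test run and compare RetransSegs. The sketch below is an assumption about how one might do that, not the method used in the article; the function name and the sample values are invented.

```python
def tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp into a {name: value} dict."""
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    names, values = rows[0][1:], rows[1][1:]  # header row, then value row
    return dict(zip(names, (int(v) for v in values)))


# Hypothetical snapshots in the layout of /proc/net/snmp (values are invented).
BEFORE = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 10 5 0 0 2 1000 900 7 0 1 0
"""
AFTER = BEFORE.replace(" 900 7 ", " 930 19 ")

retransmitted = tcp_counters(AFTER)["RetransSegs"] - tcp_counters(BEFORE)["RetransSegs"]
print("segments retransmitted during the run:", retransmitted)
```

On a live system, read /proc/net/snmp once before starting client.py and once after it exits.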

Adding noise to packets

Besides packet loss, you can simulate packet corruption: noise appears at a random position in the packet. Let us corrupt packets with 50 percent probability and no correlation:

We run the client script (nothing interesting there, but it took 2 seconds) and look at the traffic:

You can see that some packets were retransmitted and that one packet has broken metadata: options [nop,unknown-65 0x0a3dcf62eb3d,[bad opt]>. The main thing is that everything ultimately worked correctly: TCP did its job.

Packet duplication

What else can netem do? For example, simulate the opposite of packet loss: packet duplication. This command also takes 2 arguments: probability and correlation.

Packet reordering

Packets can be shuffled, and in two ways.

In the first, some packets are sent immediately and the rest with the specified delay. An example from the documentation:

With 25% probability (and 50% correlation) a packet is sent immediately; the rest are sent with a 10 millisecond delay.

The second way is when every Nth packet is sent immediately with the specified probability (and correlation), and the rest with the specified delay. An example from the documentation:

Every fifth packet will be sent without delay with 25% probability.

Changing bandwidth

People usually point to TBF for this, but netem can also change the bandwidth of an interface:

This command makes trips across localhost as painful as surfing the internet over a dial-up modem. Besides setting the bit rate, you can also emulate a link-layer protocol model: set the per-packet overhead, the cell size, and the per-cell overhead. For example, this is how to simulate ATM at a bit rate of 56 kbit/s:
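To get a feel for what the overhead and cell parameters do to transmission time, here is a small Python calculation. It reflects my understanding of how cell-based framing is accounted for (pad the packet up to whole cells, then add the per-cell overhead), so treat the formula as an assumption rather than a description of netem's exact implementation; the function names and the 48-byte cell with 5 bytes of overhead (the ATM figures) are chosen for illustration.

```python
def wire_bytes(packet_len, overhead=0, cell_size=0, cell_overhead=0):
    """Bytes that occupy the link for one packet under the assumed framing model."""
    length = packet_len + overhead
    if cell_size:
        cells = -(-length // cell_size)  # ceiling division: partial cells are padded
        length = cells * (cell_size + cell_overhead)
    return length


def transmit_seconds(packet_len, rate_bits_per_sec, **framing):
    """Time needed to push one packet through a link of the given bit rate."""
    return wire_bytes(packet_len, **framing) * 8 / rate_bits_per_sec


RATE = 56_000  # 56 kbit/s, with kbit taken as 1000 bits

plain = transmit_seconds(100, RATE)
atm = transmit_seconds(100, RATE, cell_size=48, cell_overhead=5)
print(f"100-byte packet, no framing: {plain * 1000:.1f} ms")
print(f"100-byte packet, ATM cells:  {atm * 1000:.1f} ms")
```

Under this model a 100-byte packet needs three 48-byte cells, so it occupies 159 bytes on the wire instead of 100.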

Simulating a connection timeout

Another important item in an acceptance test plan is timeouts. It matters because in distributed systems, when one service goes down, the others must fall back to alternatives in time or return an error to the client; under no circumstances should they simply hang while waiting for a response or for a connection to be established.

There are several ways to do this: for example, use a mock that never responds, or attach to the process with a debugger, set a breakpoint in the right place, and stop the process (probably the most perverse way). But one of the most obvious is to firewall ports or hosts. iptables helps with that.

For the demonstration we will firewall port 12345 and run our client script. You can firewall outgoing packets to this port on the sender or incoming packets on the receiver. My examples firewall incoming packets (we use the INPUT chain and the --dport option). Such packets can be given DROP, REJECT, or REJECT with the TCP RST flag, or with ICMP host unreachable (in fact the default behavior is icmp-port-unreachable, and it is also possible to reply with icmp-net-unreachable, icmp-proto-unreachable, icmp-net-prohibited, and icmp-host-prohibited).
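The rule variants listed above differ only in the target and the optional --reject-with argument. The Python sketch below assembles them side by side; the helper name input_rule is invented, and the command lines are my reconstruction from the options named in the text, not a copy of the author's commands.

```python
def input_rule(port, target, reject_with=None, action="-A"):
    """Build an iptables argument list for TCP packets arriving on `port`."""
    argv = ["iptables", action, "INPUT", "-p", "tcp", "--dport", str(port), "-j", target]
    if reject_with is not None:
        argv += ["--reject-with", reject_with]
    return argv


rules = [
    input_rule(12345, "DROP"),
    input_rule(12345, "REJECT"),  # default reply is icmp-port-unreachable
    input_rule(12345, "REJECT", reject_with="tcp-reset"),
    input_rule(12345, "REJECT", reject_with="icmp-host-unreachable"),
]
for argv in rules:
    print(" ".join(argv))

# Using -D (delete) instead of -A (append) removes the same rule again.
print(" ".join(input_rule(12345, "DROP", action="-D")))
```

As with any firewall change, these commands need root and are best tried on a test machine; the sketch only prints them.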

With a DROP rule in place, packets simply "disappear".

We run the client and see that it hangs while connecting to the server. Let us look at the traffic:

You can see that the client sends SYN packets with an exponentially increasing timeout. So we have found a small bug in the client: it should use the settimeout() method to limit how long the client keeps trying to connect to the server.
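A minimal sketch of that fix, assuming a plain TCP client: settimeout() is called before connect(), so a connection attempt that gets no answer fails after a bounded time instead of hanging. The function name try_connect and the returned strings are made up; the demo uses a local listener so that it runs without any firewall rules.

```python
import socket


def try_connect(host, port, timeout):
    """Attempt a TCP connection, giving up after `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # bounds connect() as well as later send/recv calls
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        return "timed out"  # what a DROP rule would produce
    except OSError as exc:
        return f"failed: {exc}"  # for example, connection refused
    finally:
        sock.close()


# Demo against a local listener on an OS-chosen port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

while_listening = try_connect("127.0.0.1", port, timeout=2.0)
listener.close()
after_close = try_connect("127.0.0.1", port, timeout=2.0)

print(while_listening)
print(after_close)
```

With the DROP rule active, the same call against port 12345 should return "timed out" once the timeout expires rather than blocking while the kernel retries the SYN.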

We remove the rule right away:

You can also remove all rules at once:

If you use Docker and need to firewall all traffic going to a container, you can do it as follows:

REJECT

Now let us add a similar rule, but with REJECT:

The client exits after a second with the error [Errno 111] Connection refused. Let us look at the ICMP traffic:

You can see that the client received port unreachable twice and then exited with an error.

REJECT with tcp-reset

Let us try adding the --reject-with tcp-reset option:

In this case the client exits with an error immediately, because it received an RST packet in response to the very first request:

REJECT with icmp-host-unreachable

Let us try one more variant of REJECT:

The client exits after a second with the error [Errno 113] No route to host, and in the ICMP traffic we see ICMP host 127.0.0.1 unreachable.

You can also try the remaining REJECT parameters; I will stop at these 🙂

Simulating a request timeout

Another situation is when the client managed to connect to the server but cannot send it a request. How do we filter packets so that the filtering does not kick in right away? If you look at the traffic of any client-server exchange, you will notice that only the SYN and ACK flags are used while the connection is being established, whereas during data exchange the last packet of the request carries the PSH flag. It is set automatically to avoid buffering. We can use this to build a filter: it lets through all packets except those with the PSH flag. This way the connection is established, but the client cannot send data to the server.
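The idea can be checked against the TCP flag bits themselves. The Python sketch below decodes a flags byte and applies the same test a PSH-only filter would apply; the function names are invented, and the iptables match I have in mind (--tcp-flags PSH PSH, meaning "examine only PSH and match when it is set") is my assumption about how such a rule would be written, since the original command is not shown here.

```python
# Bit values of the classic TCP flags in the TCP header.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def flag_names(flags_byte):
    """List the names of the flags set in a TCP flags byte."""
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]


def matches_psh_filter(flags_byte):
    """True when the packet would be caught by a filter that looks only at PSH."""
    return bool(flags_byte & TCP_FLAGS["PSH"])


# Flags seen in a typical exchange: handshake, request, teardown.
exchange = [
    ("SYN", 0x02),
    ("SYN/ACK", 0x12),
    ("ACK", 0x10),
    ("PSH/ACK (request data)", 0x18),
    ("FIN/ACK", 0x11),
]
for label, flags in exchange:
    verdict = "filtered" if matches_psh_filter(flags) else "passes"
    print(f"{label:24} {flag_names(flags)} -> {verdict}")
```

Only the data-carrying PSH/ACK packet is caught, which is why the handshake completes while the request never reaches the server.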

For DROP the command looks like this:

We run the client and look at the traffic:

We see that the connection is established and the client cannot send data to the server.

REJECT

In this case the behavior is the same: the client cannot send the request, but it receives ICMP 127.0.0.1 tcp port 12345 unreachable and increases the time between request retransmissions exponentially. The command looks like this:

REJECT with tcp-reset

The command looks like this:

We already know that with --reject-with tcp-reset the client receives an RST packet in response, so the behavior is predictable: receiving an RST packet on an established connection means the socket was closed unexpectedly on the other side, so the client should get Connection reset by peer. We run our script and confirm this. And this is what the traffic looks like:

REJECT with icmp-host-unreachable

I think it is already obvious to everyone what the command looks like 🙂 The client's behavior in this case differs slightly from plain REJECT: the client does not increase the timeout between attempts to retransmit the packet.

Conclusion

You do not have to write a mock to check how a service interacts with a hung client or server; sometimes the standard utilities that ship with Linux are enough.

The utilities covered in this article have even more capabilities than described here, so you can come up with your own uses for them. Personally, what I have written about is always enough for me (in fact, even less). If you use these or similar utilities for testing at your company, please write about how exactly. If not, I hope your software becomes better if you decide to test it under network problems using the methods suggested here.
