Contents

What parameters affect application performance? Part 1. TCP Window Size
Wireshark [TCP Window Full] & [Zero Window]
About: rtoodtoo
Troubleshooting “TCP Zero Window” Issues
1 Answer

What parameters affect application performance? Part 1. TCP Window Size

The simplest way to understand the term TCP window size is to picture a conversation between two people. One person talks while the other nods or says "yes", confirming that he has understood, or in effect received, every word said to him. Then the conversation continues. When we meet an especially talkative person, our head fills up quickly, and we start losing the thread or asking the speaker to repeat himself. The same thing happens in the Matrix, the world of numbers and machines.

The TCP window size is the number of octets (starting from the acknowledgment number) that the receiving side is currently prepared to accept without acknowledgment. While the connection is being established, the workstation and the server exchange their maximum TCP window size values. These values are carried in the packets and are easy to see in a traffic capture.

For example, if the receiver's window size is 16384 bytes, the sender can send 16384 bytes without stopping. Given that the maximum segment size (MSS) can be 1460 bytes, the sender can transmit this amount in 12 frames (11 full segments plus one partial segment), and will then wait for the receiver's acknowledgment and a window size update. If the process completes without errors, the window size may be increased. This is how the sliding window is implemented in the TCP protocol stack.
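The frame count in the example above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `split_into_segments` is made up for illustration and is not part of any TCP stack.

```python
def split_into_segments(window_bytes: int, mss_bytes: int) -> list[int]:
    """Split one window's worth of data into MSS-sized payloads.

    Returns the payload size of every segment; only the last one
    may be smaller than the MSS.
    """
    if window_bytes < 0 or mss_bytes <= 0:
        raise ValueError("window must be >= 0 and MSS must be > 0")
    full_segments, remainder = divmod(window_bytes, mss_bytes)
    sizes = [mss_bytes] * full_segments
    if remainder:
        sizes.append(remainder)
    return sizes


# Numbers from the text: a 16384-byte window and a 1460-byte MSS.
segments = split_into_segments(16384, 1460)
print(len(segments), "segments, last one carries", segments[-1], "bytes")
```

With these inputs the sender needs 11 full segments plus a final 324-byte segment before it has to wait for an acknowledgment.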

Depending on the state of the links, the window size can be larger or smaller. Links can be high-speed (high bandwidth) and long (high latency and possibly packet loss), so with a small TCP window we are forced to send one or a few frames and wait for the receiver's acknowledgment, after which the process repeats. As a result, our applications use the available bandwidth inefficiently. There will be many packets, but little real payload will be transferred. To get maximum throughput, the send and receive windows must be sized optimally for the link you are using.

The maximum window size (that is, the maximum amount of data one host can have in flight to another over the link) is calculated by the formula:

Bandwidth (bit/s) * RTT (round-trip time, in seconds) = window size in bits

So if your two offices are connected by a 10 Mbit/s link and the round-trip time is 85 milliseconds, this formula gives a window size of:

10 000 000 * 0.085 / 8 = 106250 bytes

The Window field in the TCP header is 16 bits wide, which means a TCP host can advertise a maximum window size of 65535 bytes. The maximum throughput is therefore:

65535 * 8 / 0.085 = 6.2 Mbit/s

that is, only about 62% of the bandwidth actually available on the link.
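The two calculations above can be reproduced in Python as a quick sanity check. This is a minimal sketch; the function names are chosen for illustration, and the RTT is passed in milliseconds so that the example numbers stay exact.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product: the window needed to keep the link full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Best-case throughput when at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


link_bps = 10_000_000   # 10 Mbit/s link from the example
rtt_ms = 85             # 85 ms round-trip time from the example

needed_window = bdp_bytes(link_bps, rtt_ms)
capped_rate = max_throughput_bps(65535, rtt_ms)

print(f"window needed to fill the link: {needed_window:.0f} bytes")
print(f"throughput with a 65535-byte window: {capped_rate / 1e6:.2f} Mbit/s")
print(f"share of the link used: {capped_rate / link_bps:.0%}")
```

The same two functions can be used to estimate the window needed on any other link by substituting its bandwidth and measured RTT.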

Modern operating systems let you increase the TCP window size and enable dynamic window adjustment depending on the state of the link. RFC 1323 defines window scaling, which allows the receiver to advertise a window larger than 65535 bytes and so makes large windows and high-speed links usable. The TCP Window Scale option specifies a scaling factor that, combined with the 16-bit Window field in the TCP header, can increase the receive window up to a maximum of about 1 GB. The Window Scale option is sent only in synchronization (SYN) segments when the connection is established. In our Wireshark screenshot the factor is 256. Devices talking to each other may specify different scaling factors for their TCP windows.
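How the scale factor combines with the 16-bit Window field can be sketched as follows. The option carries a shift count rather than the multiplier itself, and RFC 1323 limits the shift to 14; the helper names below are illustrative only.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value of the 16-bit Window field
MAX_SHIFT = 14             # largest shift count permitted by RFC 1323


def effective_window(window_field: int, shift: int) -> int:
    """Real window in bytes: the header field shifted left by the scale."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


def min_shift_for(target_window_bytes: int) -> int:
    """Smallest shift count that lets a host advertise the target window."""
    for shift in range(MAX_SHIFT + 1):
        if effective_window(MAX_WINDOW_FIELD, shift) >= target_window_bytes:
            return shift
    raise ValueError("target exceeds the largest scalable window")


print(effective_window(MAX_WINDOW_FIELD, MAX_SHIFT))  # upper limit, about 1 GB
print(1 << 8)                                         # a shift of 8 is a factor of 256
print(min_shift_for(106250))                          # shift needed for the 10 Mbit/s, 85 ms link
```

For the 106250-byte window computed earlier, a shift of 1 (a factor of 2) is already enough.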

So by enabling TCP window scaling and reducing the round-trip time, we can make better use of the available bandwidth and, as a result, speed up our applications. You can verify this by capturing packets and looking at the window sizes and scaling factors the devices agreed on when the connection was established. The dynamic growth and shrinking of the window is a continuous process in TCP and determines the optimal window size for each session. In very efficient networks, windows can become very large because no data is lost. In networks where the infrastructure is congested, the window size will probably stay small.

Wireshark [TCP Window Full] & [Zero Window]

The TCP sliding window is a crucial concept in understanding how TCP behaves. To see how this mechanism works, I rate limited an HTTP download and observed what happens; in this scenario Wireshark reports [TCP Window Full] and [TCP ZeroWindow]. The aim of this post is to show how Wireshark works out that the window is full.

We have a web server and a client machine in this setup. We intentionally rate limit the traffic using wget so that we can investigate this scenario.

Yes, we have downloaded the file. During the download I also took a packet capture on the client side. To understand the behaviour, this rate limiting first needs a bit of explanation. When you set the option "--limit-rate", wget holds the throughput you set by pausing its reads from the socket; once the receive buffer fills, the client sends a TCP segment with Window Size set to 0, which literally instructs the sender to pause. As my aim is to understand how Wireshark notices the window full situation, we start investigating the packet capture right after the client sends a TCP ACK with Window Size zero.

It is easier to understand this event if we zoom into a particular time frame, as the whole story unfolds between Pkt 181 and Pkt 200 in this capture. Take a look at this screenshot before going further in the article, since it shows how we go from the "Window Zero" state to the Window Full state.

Packet no: 181 is sent from client with Win=0 as you can see. At that particular moment, sender knows that it isn’t allowed to send any more packets.

Packet no: 182 This time the client decides to accept packets, so it sets the window size to 22656 (Win=22656). What does this mean in practice? The window size is a number that tells the sender "You can send this number of bytes without expecting any acknowledgement from me", so the sender resumes sending segments as fast as it can from this point on.

Packet no: 183 – 197 In normal TCP communication you don't see this many unacknowledged segments in a row. This is actually a result of our rate limiting.

Now we stop the clock right after packet 197 is received, i.e. at T=0.653210000, and take a snapshot of the receive window at this point.

Have you stopped the clock at Time = 0.653210000? Remember, the client had told the sender: "Your window is 22656 bytes. You can only send this amount with no ACK from me." What we see here is that packets 183 to 197 didn't receive any ACK from the client (receiver). Hence the number of bytes sent but not ACKed is 20272. This means the sender can send 22656 - 20272 = 2384 bytes more, and if it doesn't receive any feedback (ACK), it will then stop.

Packet 198: But something happens here. The receiver (client) decides to acknowledge some of the bytes by setting ACK=4109684389. What does this really mean? With this single number, the receiver is telling the sender "I have received all the bytes up to this byte". With this announcement, the receiver has in fact acknowledged Pkt 183. Not clear? Maybe I can show it like this. Let's open the bytes of Pkt 183 and see how they are counted.

The SEQ number (4109683925) is the number of the first byte in the TCP segment, counted inclusively, and our packet's TCP payload has 464 bytes, which means the last byte's number is 4109684388. Hence in the ACK, Pkt 198, the receiver acknowledges the last byte 4109684388 by setting the ACK to 4109684389. It is important to understand that the ACK number is the next sequence number expected from the other side.
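The byte counting for Pkt 183 can be written out in a few lines of Python. This is a sketch of the arithmetic only; the function names are made up for illustration, and the modulo reflects the fact that TCP sequence numbers are 32-bit values that wrap around.

```python
SEQ_MODULO = 2 ** 32  # sequence numbers are 32-bit and wrap around


def last_byte_seq(seq: int, payload_len: int) -> int:
    """Sequence number of the last payload byte in a segment."""
    if payload_len <= 0:
        raise ValueError("segment carries no payload")
    return (seq + payload_len - 1) % SEQ_MODULO


def expected_ack(seq: int, payload_len: int) -> int:
    """ACK number that acknowledges the whole segment (next expected byte)."""
    return (seq + payload_len) % SEQ_MODULO


# Values taken from Pkt 183 in the capture described above.
seq, payload = 4109683925, 464
print("last byte:", last_byte_seq(seq, payload))
print("ACK that covers it:", expected_ack(seq, payload))
```

The second value is the ACK number the receiver puts in Pkt 198.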

Now what happens? This part is really important for understanding the whole topic. At the beginning of the exchange, in Pkt 182, the receiver had set the window to 22656, but in this latest ACK (Pkt 198) it reduces the window size further to 22272 (Win=22272), so the sender should send fewer bytes. The image below shows how the window is sliding and shrinking at the same time.

What does this snapshot of the Receive window mean?
It means sender can send 2000 Bytes more without any acknowledgement from the receiver.

Pkt 199 (1448 bytes) and Pkt 200 (552 bytes) are sent by the sender, and together they fill this usable window of 2000 bytes. There is therefore no space left in the receive window, and Wireshark immediately detects this and displays the message [TCP Window Full].
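The bookkeeping described above can be sketched in Python. This is not Wireshark's dissector code, only an illustration of the check as this post understands it: keep track of the bytes in flight and flag the segment that leaves no usable window. The function names are hypothetical.

```python
from typing import Optional


def usable_window(advertised_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still send before it has to wait for an ACK."""
    return max(advertised_window - bytes_in_flight, 0)


def first_window_full(advertised_window: int,
                      bytes_in_flight: int,
                      segment_sizes: list[int]) -> Optional[int]:
    """Index of the segment that uses up the window, or None if none does."""
    for index, size in enumerate(segment_sizes):
        bytes_in_flight += size
        if usable_window(advertised_window, bytes_in_flight) == 0:
            return index
    return None


# State right after Pkt 198: Win=22272 with 20272 bytes still unacknowledged.
window, in_flight = 22272, 20272
print("usable window:", usable_window(window, in_flight))

# Pkt 199 carries 1448 bytes and Pkt 200 carries 552 bytes.
full_at = first_window_full(window, in_flight, [1448, 552])
print("window full at segment index:", full_at)
```

Index 1 corresponds to Pkt 200, the segment on which Wireshark shows [TCP Window Full].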

I must say that it is really cool! Wireshark does a wonderful job of helping people troubleshoot network issues.
I hope I haven't made any conceptual mistakes here. This is how I understand this behaviour. If you have anything to add or correct, please do let me know.

About: rtoodtoo

Has worked for more than 10 years as a Network/Support Engineer and is also interested in Python, Linux, Security and SD-WAN. Currently lives in the Netherlands and works as a Network Support Engineer. // JNCIE-SEC #223 / RHCE / PCNSE

Troubleshooting “TCP Zero Window” Issues

I’m currently having a problem troubleshooting a trading application. Let me give a simple diagram of the current network setup:

(Gov’t Stock Exchange Network Router) <--> (256kbps Leased Line) <--> Our Network Devices (5 Switches, 1 Firewall) <--> Trading Server

Our users report that they are experiencing slowness at around 9:30 to 9:45am. I checked the CPU, Memory, Response Time and Link Utilization of all our Network Devices and Interfaces and all of them report normal levels.

Part of the trading process is the communication between the Stock Exchange Network and our Trading Servers so if there is any slowness on that 256kbps leased line link, surely it would contribute to the slowness. Unfortunately, the telco router is not being monitored by the Telco and we’re still asking for permission if we can add their device to our Solarwinds.

So the closest link I could look at is the 100 Mbps link from our switch going to the leased line router on our side.

When the traders are experiencing 3ms to 5ms latency in trading, it shows this:

Transmit: 1500bps — 1900bps

Receive: 2000bps — 2400bps

Bytes Transferred per Minute: 44KB-60KB

Wireshark Reports no problem at this time

Special note, though, on 9:34 to 9:37 every day, because that is when they experience 10ms to 15ms latency in trading:

Transmit: 1900bps — 2400bps

Receive: 2400bps — 3200bps

Bytes Transferred per Minute: 90KB — 170KB

Wireshark reports that I’m getting TCP Zero Window errors (the trade server sending the zero window alert to the stock exchange server), but it only lasts for a few milliseconds and only happens two or three times a day.

And there was even one incident when our traders were experiencing crazy latencies of 1 to 3 minutes in trading!

Wireshark reports that we were getting TCP Zero Window errors (the trade server sending the zero window alert to the stock exchange server) for the whole trading period of that day. This only happened once, and until now I still haven't been able to resolve this issue.

The Trading Server team reports that their CPU, Memory and NIC utilization is normal and of course, everyone is blaming the network guys.

So here are my questions:

0.) Is there something wrong with the way I troubleshoot this problem? I figured I should write this as question no. 0, haha.

1.) When TCP Zero Window happens, what things and devices should I check? Because server team reports that the Memory and NIC utilization of their trading server is normal.

2.) Is there a way to graph the transmit/receive bps and bytes received in Wireshark? What I currently do is go to Statistics -> Conversations -> IPv4, check "Limit Display to this Filter" with the filter ip.addr eq X.X.X.X and ip.addr eq Y.Y.Y.Y and (frame.time ge "DATE HH:MM:SS.000000000" and frame.time le "DATE HH:MM:SS.999999999"), and look at the bps and bytes received.

3.) Are there other things I could look at or check(network devices, etc.)?

Thanks a lot for all your help guys! 🙂

1 Answer

Well. Since you’ve captured packets that show that your Trading Server is sending TCP ACKs with a window size of 0, you at least know the problem is definitely on your side. Which is actually a good thing, because you are in a position to fix it. (There is one thing that might be the issue which would be a problem on their end; I’ll talk about that later.)

You’ve also traced the issue to happening during times of increased throughput, also a good thing.

You said the CPU/RAM usage on your Trading Server reported normal. The application you are using, is it by chance configured to use a limited amount of RAM on the host OS? Maybe a limited percent? Because it would stand to reason that if so, as you had more connections and more throughput, there was less RAM available to the application, and therefore fewer resources available for TCP.

Either way, what OS is your Trading Server using? If you haven’t already, you should look into tuning the OS to dedicate more RAM to TCP. In Windows, there are Registry values you can modify. In Linux, there are config files you can edit.

It would also be wise to make sure that neither your Firewall nor anything else in between is trying to proxy your TCP sessions. That way you know you are dealing with the full "client to server" TCP connection, and not something in between.

The last thing I can offer is to study the TCP packets being sent from the Stock Exchange to your server just before your server sends a Window Size of 0. In particular, look for incoming packets with the value 11 in the IP Header’s ECN field (Explicit Congestion Notification: the last two bits of what used to be the ToS byte, bits 14 and 15 if you’re looking at an IP Header). There is a chance that, if both the Client and Server in the communication support ECN and a router in transit detected congestion, the router turned these bits on to tell the client and server to slow down their transfers. (This is the thing I said might be a problem on their end.)
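To make the ECN check concrete, here is a small Python sketch that pulls the ECN bits out of the second byte of an IPv4 header (the old ToS byte). The codepoint names come from RFC 3168; the helper names are made up for illustration, and how you obtain the raw byte from your capture is left open.

```python
# ECN codepoints as named in RFC 3168 (lowest two bits of the old ToS byte).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # sender does not use ECN
    0b01: "ECT(1)",   # ECN-capable transport
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced, set by a router in transit
}


def ecn_from_tos(tos_byte: int) -> int:
    """Return the two ECN bits (bits 14 and 15 of the IPv4 header)."""
    return tos_byte & 0b11


def dscp_from_tos(tos_byte: int) -> int:
    """Return the six DSCP bits that precede the ECN bits."""
    return (tos_byte >> 2) & 0b111111


def is_congestion_experienced(tos_byte: int) -> bool:
    """True when the packet carries the value 11 in the ECN field."""
    return ecn_from_tos(tos_byte) == 0b11


sample_tos = 0x03  # hypothetical byte: DSCP 0 with both ECN bits set
print(ECN_CODEPOINTS[ecn_from_tos(sample_tos)], is_congestion_experienced(sample_tos))
```

Packets for which `is_congestion_experienced` returns True just before the zero-window ACKs would point to congestion signalled from the network rather than from the trading server itself.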

I think that (tries to) answer questions 0, 1 and 3. I’ll have to dig around a bit more to give you a reliable answer for 2. But I’m pretty confident there is a way.
