fix issues with bandwidth plugin and htb introduction #1097
The htb changes introduced through #921 brought two performance issues that we see in our production environments.
Issue 1
The submitter of the htb changes was not aware of rate2quantum and, consequently, of the quantum issues that can arise with HTB (or any other qdisc, like fq_codel).
https://github.com/containernetworking/plugins/blob/main/plugins/meta/bandwidth/ifb_creator.go#L163
The kernel warns when unreasonable values are specified, because values that are too high cause follow-on issues (like txqueue drops once too many packets are dequeued).
https://github.com/torvalds/linux/blob/baeb9a7d8b60b021d907127509c44507539c15e5/net/sched/sch_htb.c#L2037
https://github.com/torvalds/linux/blob/baeb9a7d8b60b021d907127509c44507539c15e5/net/sched/sch_htb.c#L2052-L2055
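To illustrate, here is a minimal Go sketch (the helper is hypothetical, not the plugin's code) of the derivation and clamping the linked sch_htb.c lines perform, assuming the default rate2quantum of 10:

```go
// htbQuantum is a hypothetical helper mirroring the sch_htb.c logic linked
// above: quantum = rate (bytes/s) / rate2quantum. Outside [1000, 200000] the
// kernel clamps the value and prints "HTB: quantum of class X is small/big.
// Consider r2q change." unless a quantum was set explicitly.
func htbQuantum(rateBytesPerSec, r2q uint64) uint64 {
	if r2q == 0 {
		r2q = 10 // kernel default rate2quantum
	}
	quantum := rateBytesPerSec / r2q
	if quantum < 1000 {
		quantum = 1000
	}
	if quantum > 200000 {
		quantum = 200000
	}
	return quantum
}
```

For example, a 1 Gbit/s class rate is 125,000,000 bytes/s; divided by the default r2q of 10 this gives 12,500,000, far above the 200,000 ceiling, so the kernel clamps the quantum and prints the warning unless a quantum is set explicitly.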
Issue 2
If the txqueuelen is unspecified, the kernel decides on an appropriate value. In our case the kernel uses a reasonable default, txqueuelen=0, for this virtual device when the value is not specified. This means no queueing is done if the receiving kernel code is busy and not picking up packets fast enough. We are talking about the kernel here, not the ring buffers nor /sys/class/net/eth0/queues/tx-*.
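This is easy to observe; a small sketch using github.com/vishvananda/netlink (the device name is illustrative):

```go
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func main() {
	// "ifb0" is an illustrative name for a virtual device that was
	// created without an explicit qlen.
	link, err := netlink.LinkByName("ifb0")
	if err != nil {
		panic(err)
	}
	// Prints the kernel-chosen default (0 in the environment described
	// above), rather than a value forced by the plugin.
	fmt.Println(link.Attrs().TxQLen)
}
```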
Systems should drop early enough to ensure continuity and avoid operating on full buffers. Bursting is OK, but HTB bursts beyond hardware limits should still be limited, because they cause latency for all containers running on the system over longer periods of time. The "full buffer" condition can have several causes; one is the "unlimited HTB class", which allows unlimited rates / bursts for a single container and causes unfairness towards containers running below their configured limits on the same system.
The following screenshot shows the old tbf setup that was compared to the htb setup (numtxqueues / numrxqueues are the same for the tbf and the htb code and match the number of vCPUs).
Dequeueing from veth tx queues happens across 64 concurrent queues (the queue count equals the vCPU count, so kernel code executes concurrently), compared to a single txqueue without "concurrency". This was not relevant until the introduction of HTB, which drops less (causing latency), and of txqueuelen=1000, which causes latency as well.
https://gist.github.com/eqhmcow/939373#file-hfsc-shape-sh-L48-L50
It was intended to make the txqueuelen equal between the veth interfaces, but numtxqueues / numrxqueues were not considered. Once you queue with a different number of queues between veth and ifb, kernel performance issues follow. Similar warnings already exist in veth.c itself, because the host and container numtxqueues need counterparts of at least the same size:
https://github.com/torvalds/linux/blob/9d3684c24a5232c2d7ea8f8a3e60fe235e6a9867/drivers/net/veth.c#L1199-L1206
The usual veth creation may look like the sketch below, where the numtxqueues of outer0 correlates with the numrxqueues of inner0, and the same applies the other way around.
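A sketch using github.com/vishvananda/netlink (interface names and queue counts are examples); to my understanding the library passes the same NumTxQueues / NumRxQueues to the peer end, which keeps the counterparts matched:

```go
package main

import "github.com/vishvananda/netlink"

// createVethPair is an illustrative sketch, not the plugins' code; names and
// queue counts are examples.
func createVethPair() error {
	la := netlink.NewLinkAttrs()
	la.Name = "outer0"
	la.NumTxQueues = 64 // e.g. one queue per vCPU
	la.NumRxQueues = 64 // the peer inner0 is created with the same counts
	return netlink.LinkAdd(&netlink.Veth{LinkAttrs: la, PeerName: "inner0"})
}
```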
The kernel even suggests using numtxqueues for ifb devices, which is missing here: https://github.com/torvalds/linux/blob/e8fc317dfca9021f0ea9ed77061d8df677e47a9f/drivers/net/ifb.c#L396
The upstream netlink library does not yet offer setting numtxqueues / numrxqueues for ifb adapters (it is only supported for veth), so setting the qlen must be avoided until the upstream library adds support and numtxqueues / numrxqueues match between veth and ifb.
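Under that constraint, the ifb creation would look roughly like the following sketch (using github.com/vishvananda/netlink as ifb_creator.go already does; the function name and parameters are illustrative, not the actual patch): leaving TxQLen at its -1 default means IFLA_TXQLEN is never sent, so the kernel keeps its own default.

```go
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// createIfb is a sketch, not the actual patch; ifbDeviceName and mtu stand in
// for whatever the plugin already computes.
func createIfb(ifbDeviceName string, mtu int) error {
	la := netlink.NewLinkAttrs() // TxQLen starts at -1, i.e. IFLA_TXQLEN is not sent
	la.Name = ifbDeviceName
	la.Flags = net.FlagUp
	la.MTU = mtu
	// Deliberately no la.TxQLen assignment: the kernel keeps its own
	// default until numtxqueues / numrxqueues can be matched with the veth.
	if err := netlink.LinkAdd(&netlink.Ifb{LinkAttrs: la}); err != nil {
		return fmt.Errorf("adding link %s: %w", ifbDeviceName, err)
	}
	return nil
}
```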