Network Traffic Control with tc command in Linux

1.0 Network Traffic Control

For any host connected to a network, there is the possibility of network congestion. The network bandwidth is always limited. As the data flow on a network link increases, a time comes when the quality of service (QoS) gets degraded. Some of the data packets are delayed and/or lost. New connections are blocked and the network throughput deteriorates.

The term Quality of Service (QoS) is often used as a synonym for network traffic control.

2.0 Queues

Incoming and outgoing packets are queued before they are received or transmitted, respectively. The queue for incoming packets is known as the ingress queue, and the queue for outgoing packets is called the egress queue. We have more control over the egress queue because it holds packets generated by our own host. We can reorder these packets in the queue, effectively favoring some packets over the rest. The ip -s link command shows the queue capacity (qlen) as a number of packets. If the queue is full and more packets arrive, they are discarded and are not transmitted.

The ingress queue holds packets that other hosts have sent to us. We cannot reorder them; the only thing we can do is drop some of them. For TCP, a dropped packet is never acknowledged, so the sending host takes the missing ACK as a hint of congestion and slows down its transmission to us. This does not work for UDP. Dropped UDP packets are simply lost, as UDP has no ACK and no retransmission.
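As a small sketch of reading the queue capacity from a script, the helper below pulls the qlen value out of ip link output. The function name qlen_from_ip_link and the interface name eth0 are assumptions made for illustration.

```shell
# Hypothetical helper: print the first "qlen N" value found in
# `ip link` output supplied on standard input.
qlen_from_ip_link() {
    sed -n 's/.* qlen \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on a live system (eth0 is an assumed interface name):
#   ip -s link show dev eth0 | qlen_from_ip_link
```

If the interface reports no qlen at all, the helper prints nothing, so a caller should treat empty output as "unknown".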

3.0 TRAFFIC CONTROL ELEMENTS

3.1 SHAPING

Shaping involves delaying the transmission of packets to meet a certain data rate. This is the way we ensure that the output data rate does not exceed the desired value. Shapers can also smooth out the bursts in traffic. Shaping is done at egress.
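To get a feel for the delay that shaping introduces, the sketch below computes how long a burst of data takes to leave the interface at a given shaped rate. The function name and the figures are illustrative assumptions. The commented tc line shows one possible shaper, a token bucket filter (tbf) on an assumed eth0 interface.

```shell
# Hypothetical helper: milliseconds needed to send BYTES at RATE_KBIT.
# bytes * 8 bits / (rate_kbit * 1000 bits/s) seconds
#   = bytes * 8 / rate_kbit milliseconds (integer division)
drain_time_ms() {
    bytes=$1
    rate_kbit=$2
    echo $(( bytes * 8 / rate_kbit ))
}

drain_time_ms 125000 1000   # 125000-byte burst at 1 Mbit/s, prints 1000

# One way to shape egress to about 1 Mbit/s (needs root):
#   tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms
```

A burst that would leave a fast link almost instantly is spread over a full second here, which is exactly the smoothing effect a shaper is meant to have.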

3.2 SCHEDULING

Scheduling is deciding which packet would be transmitted next. This is done by rearranging the packets in the queue. The objectives are to provide a quick response for interactive applications and also to provide adequate bandwidth for bulk transfers like downloads initiated by remote hosts. Scheduling is done at egress.

3.3 POLICING

Policing is measuring the packets received on an interface and limiting these to a particular value. The packets might be reclassified or dropped. Policing is done at ingress.
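A minimal sketch of ingress policing with tc follows. It attaches the ingress qdisc and adds a filter that drops IPv4 traffic above a given rate. The function name, the TC dry-run variable, the interface and the rate are assumptions for illustration; the filter follows the pattern used in the Linux Advanced Routing & Traffic Control HOWTO.

```shell
# Hypothetical helper: police all IPv4 traffic arriving on DEV to RATE.
# Set TC='echo tc' to print the commands instead of running them
# (running them for real needs root).
police_ingress() {
    dev=$1
    rate=$2
    # Attach the special ingress qdisc; its handle is ffff:
    ${TC:-tc} qdisc add dev "$dev" handle ffff: ingress
    # Match every IPv4 packet and drop whatever exceeds the rate.
    ${TC:-tc} filter add dev "$dev" parent ffff: protocol ip prio 50 u32 \
        match ip src 0.0.0.0/0 police rate "$rate" burst 10k drop flowid :1
}

# Dry run with assumed values:
TC='echo tc' police_ingress eth0 800kbit
```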

3.4 DROPPING

Once the traffic exceeds a predefined value, packets are simply dropped. This is done at both the ingress and the egress.

4.0 TRAFFIC CONTROL COMPONENTS

4.1 qdiscs

qdisc is an abbreviation for Queue Discipline. A qdisc is the packet scheduling code that is attached to a network interface. qdiscs are implemented as kernel modules, which can be loaded at run time. A qdisc can drop, forward, queue, delay or reorder packets at a network interface. tc is a user space program for managing the qdiscs of network interfaces. Other terms used for a qdisc are packet scheduler, queuing algorithm and packet scheduler algorithm.

The kernel enqueues packets that are to be sent on a network interface to that interface's qdisc. It then dequeues packets from the qdisc for transmission on the interface.

qdiscs are of two types: classful qdiscs, which contain classes, and classless qdiscs, which do not.

4.2 Classes

A class is a subdivision of a classful qdisc and behaves like a sub-qdisc. A class may contain other classes. Using classes, we can configure the QoS in more detail. When packets are received by a qdisc, they may be queued in inner qdiscs attached to classes. When the kernel asks for packets to transmit, the packets of certain classes may be handed over ahead of others, thereby prioritizing certain types of traffic.
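The nesting described above can be sketched with the HTB qdisc, chosen here purely for illustration. The helper builds a root qdisc, a parent class, two child classes and an inner qdisc on one leaf. The function name, the TC dry-run variable, the interface and all rates are assumptions.

```shell
# Hypothetical helper: build a small class tree on DEV with HTB.
# All rates are made-up figures; set TC='echo tc' for a dry run.
build_class_tree() {
    dev=$1
    # Classful root qdisc; unclassified packets go to class 1:20.
    ${TC:-tc} qdisc add dev "$dev" root handle 1: htb default 20
    # Parent class 1:1 holds the total bandwidth.
    ${TC:-tc} class add dev "$dev" parent 1: classid 1:1 htb rate 10mbit
    # Two child classes inside 1:1; each may borrow up to the ceiling.
    ${TC:-tc} class add dev "$dev" parent 1:1 classid 1:10 htb rate 6mbit ceil 10mbit
    ${TC:-tc} class add dev "$dev" parent 1:1 classid 1:20 htb rate 4mbit ceil 10mbit
    # Inner qdisc attached to leaf class 1:20.
    ${TC:-tc} qdisc add dev "$dev" parent 1:20 handle 20: fq_codel
}

# Dry run on an assumed interface:
TC='echo tc' build_class_tree eth0
```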

4.3 Filters

When a qdisc with classes receives a packet, it needs to decide in which class the packet has to be enqueued. In other words, the packet needs to be classified. Filters are used for the classification of packets. A filter must contain a classifier phrase. The most common classifier is u32, which filters use for selecting packets based on packet attributes.
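As a sketch of a u32 filter in use, the helper below sends SSH traffic to the first class of a prio qdisc. The function name, the TC dry-run variable, the interface and the choice of port 22 are assumptions for illustration.

```shell
# Hypothetical helper: send SSH traffic leaving DEV to the first class
# of a prio qdisc. Set TC='echo tc' for a dry run; real use needs root.
classify_ssh() {
    dev=$1
    # A classful root qdisc; prio creates classes 1:1, 1:2 and 1:3 itself.
    ${TC:-tc} qdisc add dev "$dev" root handle 1: prio
    # u32 classifier: match destination port 22 (mask 0xffff) and
    # enqueue matching packets in class 1:1.
    ${TC:-tc} filter add dev "$dev" protocol ip parent 1: prio 1 u32 \
        match ip dport 22 0xffff flowid 1:1
}

# Dry run on an assumed interface:
TC='echo tc' classify_ssh eth0
```

The flowid at the end of the filter is what ties the classifier to a class; packets that match no filter are left to the qdisc's own default handling.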

5.0 Using the tc command

The tc command has sub-commands to add, change, replace and delete qdiscs, classes and filters. There is also the show sub-command, which gives details of the currently configured objects. For example, running the tc -s qdisc show command on a desktop running Linux gives:

$ tc -s qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 8728071 bytes 59911 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

pfifo_fast has traditionally been the default qdisc for interfaces in Linux. It is a FIFO qdisc with prioritization. It has three bands: FIFO 0, FIFO 1 and FIFO 2. Band 0 is for traffic from interactive applications, where we wish to minimize the delay. Band 1 is for best effort and is the normal service. Band 2 is for bulk data transfers, where the goal is to maximize the throughput and minimize the monetary cost. Packets are put in one of the three bands based on the value of the ToS field in the packet, which the kernel maps to a band through the priomap shown in the output. First, all the packets in FIFO 0 are transmitted. When there are no packets left in FIFO 0, packets in FIFO 1 are transmitted. Lastly, when both FIFO 0 and FIFO 1 are empty, packets in FIFO 2 are transmitted.
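The priomap in the tc output is a lookup table indexed by the packet's Linux priority (0 to 15), which the kernel derives from the ToS field. The sketch below performs the same lookup with the priomap values shown earlier. The function name is hypothetical, and the priority meanings in the comments (0 best effort, 2 bulk, 6 interactive) follow the kernel's TC_PRIO constants.

```shell
# Hypothetical helper: band chosen by pfifo_fast for a Linux packet
# priority (0-15), using the priomap printed by `tc qdisc show`.
band_for_priority() {
    prio=$1
    # Priomap as shown in the tc output; position = priority.
    set -- 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    shift "$prio"
    echo "$1"
}

band_for_priority 6   # interactive traffic, prints 0
band_for_priority 0   # best effort, prints 1
band_for_priority 2   # bulk transfer, prints 2
```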

6.0 Bufferbloat

Network latency is the time taken by a packet to travel from one end of a connection to the other. A typical TCP connection between a sender and a receiver passes through many devices and has many links of varying bandwidth. There are buffers at each processing unit in the communication path so that arriving packets can be stored while the communication link is busy transmitting. Buffers are a necessary part of the communication pipe and help in making effective use of the communication link.

However, as network devices have gained more and more RAM, and because of the misguided objective of preventing packet loss to the maximum extent possible, buffer sizes have grown to high values. The result is that the communication pipe has unreasonably big buffers at intermediate devices. These buffers fill up and obstruct the normal functioning of TCP. TCP relies on end-to-end signalling of packet loss, but with bloated buffers that information is delayed. Also, since the buffers along the communication path are full, packets of high priority traffic as well as normal traffic cannot get through. This results in very high network latency. As it is caused by big buffers filled with in-flight network packets, it is called bufferbloat.

The solution is Active Queue Management (AQM) at hosts and routers. This involves managing the queue in the buffers so that the packet queue is kept within reasonable limits, and signalling the sending TCP to slow down by dropping or marking packets when the queue grows fast. The tc -s qdisc show command on a router running OpenWRT gives the following output:

# tc -s qdisc show dev eth0
qdisc fq_codel 0: root refcnt 2 limit 1024p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 395154953 bytes 464130 pkt (dropped 0, overlimits 0 requeues 4)
 backlog 0b 0p requeues 4
  maxpacket 1414 drop_overlimit 0 new_flow_count 2 ecn_mark 0
  new_flows_len 0 old_flows_len 0

Here the FQ_CoDel qdisc is being used. FQ_CoDel stands for Fair Queuing (FQ) with Controlled Delay (CoDel), an Active Queue Management scheme. FQ_CoDel maintains fairness by keeping a number of FIFO queues and using a hash function to dispatch incoming packets to one of them. Each of these queues is managed by the CoDel queue discipline. CoDel tries to keep the queuing delay of packets near a target value, 5 ms by default. It examines the packets at the head of each queue and drops those that have been waiting too long. With the packets dropped and the delay controlled, the TCP congestion control mechanisms are able to do their work. We can set the FQ_CoDel qdisc for the eth0 interface on the earlier desktop system:

$ sudo tc qdisc add dev eth0 root fq_codel
$ tc -s qdisc show dev eth0
qdisc fq_codel 8001: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 1806 bytes 20 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 256 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

The tc qdisc del command deletes the current qdisc and restores the default pfifo_fast qdisc.

$ sudo tc qdisc del dev eth0 root
$ tc -s qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 528 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
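The dropped counter in the tc -s output is a quick way to see whether a qdisc has been discarding packets. As a rough sketch, the helper below extracts that counter from the statistics. The function name is hypothetical and eth0 is an assumed interface name.

```shell
# Hypothetical helper: print the "dropped" counter of the first qdisc
# found in `tc -s qdisc show` output supplied on standard input.
qdisc_dropped() {
    sed -n 's/.*(dropped \([0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Usage on a live system (eth0 is an assumed interface name):
#   tc -s qdisc show dev eth0 | qdisc_dropped
```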

7.0 Traffic Control with tc command

7.1 Handle

All qdiscs and classes have an individual id of the form m:n, where m is the major number and n is the minor number. Both m and n are hexadecimal numbers limited to 16 bits. The id is used as the handle in the tc command. For a qdisc, the minor number is 0. For a class, the major number is that of the qdisc that the class belongs to. So, if a handle's minor number is 0, it is the id of a qdisc. Otherwise, it is the id of a class whose qdisc is identified by the major number. By convention, the root qdisc is given the handle 1:0, which can be written as 1: for short. The handle ffff:0 is reserved for the ingress qdisc.
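The rule for telling a qdisc id from a class id can be sketched as a small shell function. The function name is hypothetical, and the sketch assumes the handle is always given with a colon, as in 1:0, 1: or 1:2.

```shell
# Hypothetical helper: report whether a tc handle "m:n" names a qdisc
# or a class. Both parts are hexadecimal; an empty minor means 0.
handle_kind() {
    minor=${1#*:}
    if [ $(( 0x${minor:-0} )) -eq 0 ]; then
        echo qdisc
    else
        echo class
    fi
}

handle_kind 1:0     # prints qdisc
handle_kind ffff:   # prints qdisc
handle_kind 1:2     # prints class
```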

7.2 Root qdisc

Each network interface has an egress root qdisc, conventionally given the handle 1:0. As the name suggests, it is the root of the tree of qdiscs. Sub-qdiscs are known as classes. So, at the top of the tree, we have the root qdisc, and the other nodes are classes. The kernel interacts only with the root: it enqueues packets to the root and dequeues packets from the root. Further down the tree, the packets may get classified into one of the classes. The classification is done by filters attached to a classful qdisc. Traffic control on the egress of an interface boils down, in effect, to building up (or down, as trees grow downwards here) this tree.

8.0 Example

Suppose we wish to reduce the bandwidth for wireless users in general and reduce it further for a particular user, identified by the IP address. We can run the following commands on the router.

# tc qdisc add dev wlan0 root handle 1:0 hfsc default 1
# tc class add dev wlan0 parent 1:0 classid 1:1 hfsc sc rate 1mbit ul rate 1mbit
# tc class add dev wlan0 parent 1:0 classid 1:2 hfsc sc rate 400kbit ul rate 400kbit
# tc filter add dev wlan0 protocol all parent 1: u32 match ip dst 192.168.2.157 flowid 1:2

We added the HFSC qdisc as the root qdisc of the wireless interface and set its default class to 1:1. Any packet that is not classified is sent to class 1:1. Without a default class, HFSC drops all packets that are not classified, which is why we set one. We set a bandwidth limit of 1 Mbit/s for the default class, which in effect becomes the default bandwidth for wireless. Next, we create one more class, 1:2, and set its bandwidth to 400 kbit/s. Finally, we attach a filter to the root qdisc that matches the destination IP address of the user whose bandwidth we want to reduce, and sends those packets to flow id 1:2, which is the id of the class created for that user.
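After setting up such a tree, it helps to check what the kernel actually holds. The sketch below lists the qdiscs, classes and filters on an interface using the show sub-commands. The function name and the TC dry-run variable are assumptions for illustration.

```shell
# Hypothetical helper: show the qdiscs, classes and filters on DEV.
# Set TC='echo tc' to print the commands instead of running them.
show_tc_tree() {
    dev=$1
    ${TC:-tc} -s qdisc show dev "$dev"
    ${TC:-tc} -s class show dev "$dev"
    ${TC:-tc} filter show dev "$dev"
}

# Dry run for the wireless interface used in the example:
TC='echo tc' show_tc_tree wlan0

# To remove the whole tree again (needs root):
#   tc qdisc del dev wlan0 root
```

The Sent counters of classes 1:1 and 1:2 in the class listing indicate whether the filter is steering traffic as intended.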

Software: