Research & Development

Posted by Stuart Grace

Recent R&D blog posts have described why, in an IP Studio environment, we sometimes need to send video at 10 gigabits per second or more, and how we've started installing 100 Gigabit Ethernet links in R&D. To put those data rates in context, one 100 gigabit link carries the same volume of data as about 3,000 typical home broadband connections.

Network cables for data rates from 1 to 100 Gbps

Fast networks are typically used where a large number of lower-speed connections are bundled together. For example, if a server is supplying data at 100 Mbps to 40 clients simultaneously, the total data rate adds up to 4 Gbps. This is no problem, because the server can have many CPU cores, each dealing with a few clients. Since Ethernet packets typically carry up to 1,500 bytes, each 100 Mbps stream amounts to only about 8,000 packets per second, which gives the CPU over 100 microseconds to process each packet.

Our requirements in the IP Studio project are different: we have individual streams carrying ultra-high definition video at 4 Gbps or more. Each stream consists of around 340,000 packets per second, all of which must be processed in sequence by a single program – we cannot easily split this work across lots of CPUs. That leaves less than 3 microseconds to process each data packet, and achieving this requires some special software techniques.
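To make the difference concrete, here is the back-of-the-envelope arithmetic behind both of those time budgets, assuming full 1,500-byte Ethernet packets in each case:

```
100 Mbps client stream:    100,000,000 / (1,500 × 8) ≈   8,300 packets/s  →  ~120 µs per packet
4 Gbps UHD video stream: 4,000,000,000 / (1,500 × 8) ≈ 333,000 packets/s  →    ~3 µs per packet
```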

With conventional programming techniques such as the network sockets API, each data packet passes through several layers of software inside the operating system, as the packet's route on the network is determined and the network headers are generated. Along the way, the data is copied from the application's buffers to the socket buffer, and then from the socket buffer to the device driver's buffers. This is fine when you have 100 µs to spare, but doesn't really work when there is less than 3 µs for each packet!
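For a concrete picture of that conventional path, here is a minimal, purely illustrative sender using the ordinary UDP sockets API. The destination address, port and payload size are made up for the example; the point is that every packet costs one system call and at least one copy inside the kernel:

```c
/* Illustrative only: sending one ~1,400-byte UDP packet at a time through the
 * ordinary sockets API.  Every sendto() call crosses into the kernel, which
 * works out the route, builds the Ethernet/IP/UDP headers and copies the
 * payload from our buffer into its own socket buffers. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define PAYLOAD_SIZE 1400   /* payload that fits in a single 1,500-byte frame */

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst = { 0 };
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5004);                        /* example port */
    inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);   /* example receiver */

    unsigned char payload[PAYLOAD_SIZE] = { 0 };

    for (;;) {
        /* One system call (and one copy) per packet: fine at ~10,000
         * packets per second, far too slow at 340,000. */
        sendto(fd, payload, sizeof payload, 0,
               (struct sockaddr *)&dst, sizeof dst);
    }

    close(fd);
    return 0;
}
```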


So we've started bypassing the operating system's network stack completely, using instead a zero-copy kernel-bypass interface in which the application and the network hardware's device driver share a common set of memory buffers. A group of data packets, complete with all their network headers, is created in these buffers by the application. Then, with a single function call, the whole group is handed over to the device driver, which transmits the packets directly onto the network. There is no copying of data, the path between the application and the hardware is short, and packets can be processed efficiently in large batches. The receiving end uses the same technique, but in the reverse direction.
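The post doesn't say which kernel-bypass interface IP Studio uses, but DPDK is one widely used open-source framework that works in exactly this batched, shared-buffer style, so here is a rough sketch of transmission done that way. The port number, burst size, frame size and buffer-pool sizes are illustrative, and error handling is largely omitted:

```c
/* Sketch of batched, zero-copy transmission in the style described above,
 * using DPDK (an assumption for illustration -- not necessarily the
 * interface used in IP Studio).  The application builds complete frames
 * in buffers shared with the NIC driver, then hands a whole batch to the
 * hardware with a single call. */
#include <stdint.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 64     /* packets handed to the driver per call (example) */
#define FRAME_SIZE 1400   /* example frame length */

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* Pool of packet buffers shared between the application and the
     * poll-mode driver: no copying between the two. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "tx_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    uint16_t port = 0;                            /* first NIC bound to DPDK */
    struct rte_eth_conf conf = { 0 };
    rte_eth_dev_configure(port, 1, 1, &conf);     /* 1 RX queue, 1 TX queue */
    rte_eth_rx_queue_setup(port, 0, 1024, rte_socket_id(), NULL, pool);
    rte_eth_tx_queue_setup(port, 0, 1024, rte_socket_id(), NULL);
    rte_eth_dev_start(port);

    struct rte_mbuf *burst[BURST_SIZE];

    for (;;) {
        for (int i = 0; i < BURST_SIZE; i++) {
            burst[i] = rte_pktmbuf_alloc(pool);
            char *frame = rte_pktmbuf_append(burst[i], FRAME_SIZE);
            /* The application writes the Ethernet, IP and UDP headers and
             * the video payload directly into the shared buffer here. */
            (void)frame;
        }
        /* One function call transmits the whole batch; the driver sends
         * the frames straight from the shared buffers. */
        uint16_t sent = rte_eth_tx_burst(port, 0, burst, BURST_SIZE);
        for (uint16_t i = sent; i < BURST_SIZE; i++)
            rte_pktmbuf_free(burst[i]);           /* drop anything not queued */
    }

    return 0;
}
```

The key point is the final call: a single rte_eth_tx_burst() hands a whole batch of ready-made frames to the driver, so the cost of crossing from application to hardware is paid once per batch rather than once per packet.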

 

This has all been built using free, open-source Linux software and standard off-the-shelf network cards, so there are no software licence costs and no special hardware required.

Our application programs have become more complicated as they are now responsible for dealing with the packet headers, a job normally handled by the operating system. But in return, we have measured ten-fold performance gains when sending and receiving very high bitrate video streams. Using this technique, we can send or receive uncompressed UHD 2160p50 video (more than 8 Gbps) using a single CPU core, leaving all the rest of the server's cores free for video processing.

We intend to extend this work to achieve very high throughput uncompressed video streams on the latest generation of 100 Gbps network hardware.
