Launch of BPF & XDP Documentation

We are excited to announce the "BPF & XDP Reference Guide" as part of the Cilium project documentation. We have received many requests for more technical information about BPF and XDP from people who want to learn more about the technology driving the Cilium project.

Daniel Borkmann, kernel developer, Cilium contributor and one of the BPF subsystem maintainers, has started putting together a comprehensive and detailed document that describes BPF and XDP in great depth:

BPF and XDP Reference Guide

It is still being actively worked on, but we wanted to share it with other developers interested in BPF and XDP as early as possible.

BPF and XDP Reference Guide

BPF is a highly flexible and efficient “virtual machine”-like construct in the Linux kernel that allows bytecode to be executed at various hook points in a safe manner. It is used in a number of Linux kernel subsystems, most prominently networking, tracing and security (e.g. sandboxing).

While BPF has existed since 1992, this document covers the extended Berkeley Packet Filter (eBPF) version, which first appeared in kernel 3.18 and obsoletes the original version, referred to as “classic” BPF (cBPF) these days. cBPF is known to many as the packet filter language used by tcpdump. Nowadays, the Linux kernel runs eBPF only; loaded cBPF bytecode is transparently translated into an eBPF representation in the kernel before program execution. This documentation generally uses the term BPF unless explicit differences between eBPF and cBPF are pointed out.

Even though the name Berkeley Packet Filter hints at a packet-filtering-specific purpose, the instruction set is generic and flexible enough these days that there are many use cases for BPF apart from networking. See BPF Users for a list of projects which use BPF.

Cilium uses BPF heavily in its data path; see the Architecture Guide for further information. The goal of this chapter is to provide a BPF reference guide that builds an understanding of BPF and its networking-specific use, including loading BPF programs with tc (traffic control) and XDP (eXpress Data Path), and to aid the development of Cilium’s BPF templates.

BPF Architecture

BPF does not define itself only by its instruction set; it also offers further infrastructure around it, such as maps that act as efficient key/value stores, helper functions to interact with and leverage kernel functionality, tail calls for calling into other BPF programs, security hardening primitives, a pseudo file system for pinning objects (maps, programs), and infrastructure for allowing BPF to be offloaded, for example, to a network card.

LLVM provides a BPF back end, such that tools like clang can be used to compile C into a BPF object file, which can then be loaded into the kernel. BPF is deeply tied into the Linux kernel and allows for full programmability without sacrificing native kernel performance.
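To make these pieces concrete, here is a minimal sketch of a BPF program written in C that defines a map and calls a kernel helper. It assumes libbpf-style headers and BTF-defined maps; the map name pkt_count and the program itself are purely illustrative and not taken from Cilium. A file like this could be compiled into a BPF object file with, for example, clang -O2 -target bpf -c prog.c -o prog.o.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* A single-slot array map acting as a packet counter; maps are the
     * key/value stores mentioned above and can be shared with user space. */
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } pkt_count SEC(".maps");

    SEC("xdp")
    int count_packets(struct xdp_md *ctx)
    {
        __u32 key = 0;
        /* bpf_map_lookup_elem() is one of the kernel helper functions; it
         * returns a pointer to the value stored under key, or NULL. */
        __u64 *value = bpf_map_lookup_elem(&pkt_count, &key);

        if (value)
            __sync_fetch_and_add(value, 1);

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

Loaders then parse the resulting ELF object: the map definition lands in its own section, and the program in the section named by SEC().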

Last but not least, the kernel subsystems making use of BPF are themselves part of BPF’s infrastructure. The two main subsystems discussed throughout this document are tc and XDP, to which BPF programs can be attached. XDP BPF programs are attached at the earliest networking driver stage and trigger a run of the BPF program upon packet reception. By definition, this achieves the best possible packet processing performance, since packets cannot get processed at an even earlier point in software. Driver support is necessary in order to use XDP BPF programs, though. tc BPF programs, on the other hand, don’t need any driver support and can be attached to the receive and transmit paths of any networking device, including virtual ones such as veth devices, since they hook into the kernel stack later than XDP. Apart from tc and XDP programs, there are various other kernel subsystems that use BPF as well, such as tracing (kprobes, uprobes, tracepoints, etc.).
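As a rough illustration of the two attach points, the following sketch (again assuming libbpf-style headers, with placeholder program names) puts a trivial XDP program next to a trivial tc program; the attach commands in the comments are typical iproute2/tc invocations, with eth0 and prog.o standing in for a real device and object file.

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    /* XDP: runs in the driver before an skb is even allocated. Attached
     * with e.g.:  ip link set dev eth0 xdp obj prog.o sec xdp
     * (driver support for XDP is required). */
    SEC("xdp")
    int xdp_pass(struct xdp_md *ctx)
    {
        return XDP_PASS;
    }

    /* tc: runs on the clsact ingress/egress hooks of any device, including
     * virtual ones such as veth. Attached with e.g.:
     *   tc qdisc add dev eth0 clsact
     *   tc filter add dev eth0 ingress bpf da obj prog.o sec classifier */
    SEC("classifier")
    int tc_pass(struct __sk_buff *skb)
    {
        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";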

The following subsections provide further details on individual aspects of the BPF architecture.

Read the full guide here: BPF and XDP Reference Guide

Learn More About Cilium at KubeCon in Berlin!

This week the Cilium team is excited to be in the beautiful city of Berlin at KubeCon / CloudNativeCon EU!

Come by our booth to learn more about how Cilium provides HTTP-aware network security for microservices applications running on Kubernetes! Our booth is located just to the left when you enter the exhibit area (booth S5).

At the booth, you can see a super-cool "Star Wars"-themed demo of Cilium or pick up one of our fresh-off-the-presses Cilium t-shirts. Let's help Linux love Microservices!

Cilium: Networking and security for containers with BPF and XDP

This is a guest post by Daniel Borkmann who was recently recognized through the Google Open Source Peer Bonus program for his work on the Cilium project. We invited Daniel to share his project on our blog.

Our open source project, called Cilium, started as an experiment for Linux container networking tackling four requirements:

Scale

How can we scale in terms of addressing and with regard to network policy?

Extensibility

Can we achieve the extensibility of user space networking in the Linux kernel itself?

Simplicity

What is an appropriate abstraction away from traditional networking?

Performance

Do we sacrifice performance in the process of implementing the aforementioned aspects?


We realize these goals in Cilium with the help of eBPF. eBPF is an efficient and generic in-kernel bytecode engine that allows for full programmability. There are many subsystems in the Linux kernel that utilize eBPF, mainly in the areas of networking, tracing and security.

eBPF can be attached to key ingress and egress points of the kernel's networking data path for every network device. As input, eBPF operates on the kernel's network packet representation and can thus access and mangle various kinds of data, redirect the packet to other devices, perform encapsulations, etc.
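As a hedged sketch of what "redirect the packet to other devices" looks like in practice, the following tc program hands every packet to the egress path of another interface using the bpf_redirect() helper. TARGET_IFINDEX is a placeholder that a real program would receive via a compile-time constant or a map, and the section and function names are illustrative.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define TARGET_IFINDEX 2  /* placeholder ifindex of the destination device */

    SEC("classifier")
    int redirect_all(struct __sk_buff *skb)
    {
        /* bpf_redirect() queues the packet for transmission on the egress
         * path of the given interface; its return value doubles as the
         * tc action code (TC_ACT_REDIRECT). */
        return bpf_redirect(TARGET_IFINDEX, 0);
    }

    char _license[] SEC("license") = "GPL";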

This is a typical workflow: eBPF is programmed in a subset of C and compiled with LLVM, which contains an eBPF back end. LLVM then generates an ELF file containing the program code, map specifications and related relocation data. In eBPF, maps are efficient key/value stores in the kernel that can be shared between various eBPF programs and also with user space. Given the ELF file, tools like tc (traffic control) can parse its contents and load the program into the kernel. Before the program is executed, the kernel verifies the eBPF bytecode in order to make sure that it cannot affect the kernel's stability (e.g. by crashing the kernel or performing out-of-bounds accesses) and always terminates, which requires programs to be free of loops. Once it passes verification, the program is JIT (just-in-time) compiled.
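One consequence of verification is easy to show in code: packet data may only be read after an explicit bounds check against the end of the packet, otherwise the program is rejected at load time. The following XDP sketch (assuming libbpf headers; the filtering decision is purely illustrative) parses just the Ethernet header.

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int parse_eth(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;

        /* The verifier tracks this comparison; reading eth->h_proto without
         * it would be a potential out-of-bounds access and fail verification. */
        if ((void *)(eth + 1) > data_end)
            return XDP_DROP;

        /* Purely illustrative policy: only let IPv4 frames through. */
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_DROP;

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";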

Today, architectures such as x86_64, arm64, ppc64 and s390 have the ability to compile a native opcode image out of an eBPF program, so that instead of being executed through the in-kernel eBPF interpreter, the resulting image can run natively like any other kernel code. tc then installs the program into the kernel's networking data path, and with a capable NIC, the program can also be offloaded entirely into the hardware.

Cilium acts as a middle layer, plugs into container runtimes and orchestrators such as Kubernetes, Docker or CNI, and can generate and atomically update eBPF programs on the fly without requiring a container restart. Thus, unlike with connection proxies, an update of the datapath does not cause connections to be dropped. These programs are specifically tailored and optimized for each container; for example, a feature that a particular container does not need can simply be compiled out, and the majority of configuration becomes constant, allowing LLVM to perform further optimizations.
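A hedged illustration of that specialization: configuration is injected as compile-time constants, so a disabled feature and its branches simply do not exist in the generated bytecode. The names SECLABEL and ENABLE_FEATURE_X below are made up for this example and are not Cilium's actual template parameters.

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    #define SECLABEL         42  /* identity of this container (example value) */
    #define ENABLE_FEATURE_X 0   /* feature flag: branch compiled out when 0 */

    SEC("classifier")
    int from_container(struct __sk_buff *skb)
    {
        /* The label is a constant in the generated code, not a map lookup. */
        skb->mark = SECLABEL;

    #if ENABLE_FEATURE_X
        /* Feature-specific handling would live here; with the flag off, this
         * code is not present in the object file at all. */
    #endif

        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";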

We have implemented many building blocks in Cilium using eBPF, such as NAT64, L3/L4 load balancing with direct server return, a connection tracker, port mapping, access control, an NDisc and ARP responder, and integration with various encapsulations like VXLAN, Geneve and GRE, just to name a few. Since all these building blocks run in the Linux kernel and have a stable API, there is of course no need to cross the kernel/user space boundary, which makes eBPF a perfectly suited and flexible technology for container networking.

One step further in that direction is XDP, which was recently merged into the Linux kernel and allows for DPDK-like performance for the kernel itself. The basic idea is that XDP is tightly coupled with eBPF and hooks into a very early ingress path at the driver layer, where it operates with direct access to the packet's DMA buffer.

This is effectively as low-level as it can get to reach near-optimal performance, which mainly allows for tailoring high-performance load balancers or routers with commodity hardware. Another advantage of XDP is that it reuses the kernel's security model for accessing the device, as opposed to user space based mechanisms. It doesn't require any third party modules and works in concert with the Linux kernel. XDP and tc with eBPF are complementary to each other, and constitute a bigger piece of the puzzle for Cilium itself.

If you’re curious, check out the Cilium code or demos on GitHub.


By Daniel Borkmann, Cilium contributor