BPF Updates 13

This is issue 13 of the regular newsletter around BPF written by Alexander Alemayhu. It summarizes ongoing development, presentations, videos and other information related to BPF and XDP. It is released roughly once a week.


The v4.15 merge window is open, and LWN.net has already published a summary of part 1, which contains a BPF section listing some of the new features:

BPF

The user-space bpftool utility can be used to examine and manipulate BPF programs and maps; see this man page for more information.

Hooks have been added to allow security modules to control access to BPF objects; see this changelog for more information.

A new BPF-based device controller has been added; it uses the version-2 control-group interface. Documentation for this feature is entirely absent, but one can look at the sample program added in this commit that uses it.

The highlights since last time

  • New helper function bpf_getsockopt to retrieve socket options. It supports TCP_CONGESTION for now. The new BPF_SOCK_OPS_BASE_RTT feature significantly improves TCP-NV.
  • It is now possible to attach multiple programs to tracepoints, kprobes and uprobes. The programs will run in sequence. With this change for tracepoints, one application no longer excludes others from attaching to the same call.
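
As a rough illustration of the multi-attach behaviour described in the highlights, here is a toy Python model. It is not kernel code: `HookRegistry`, `attach` and `fire` are hypothetical names, and running programs in attach order is an assumption made for the sketch. It only shows the idea that attaching a second program to a hook adds to the list instead of excluding or replacing the first one.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical model: a "program" is just a Python callable that receives
# the event context. Real BPF programs are loaded and run by the kernel.
Program = Callable[[dict], None]


class HookRegistry:
    """Toy model of hooks that accept more than one attached program."""

    def __init__(self) -> None:
        self._programs: Dict[str, List[Program]] = defaultdict(list)

    def attach(self, hook: str, program: Program) -> None:
        # Attaching appends to the list; it does not replace earlier programs.
        self._programs[hook].append(program)

    def fire(self, hook: str, ctx: dict) -> None:
        # Run every attached program in sequence (attach order is assumed).
        for program in self._programs[hook]:
            program(ctx)
```

With two tools attached to the same hook name, a single `fire` call invokes both callables one after the other, which mirrors the "does not exclude others" point above.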

More interesting topics

  • New helper function bpf_override_function under discussion to allow for error injection via kprobes.
  • BPF runtime finally gets a FAQ section in the kernel's documentation directory.
  • bpftool gets support for dumping JSON.

Presentations

Cilium - Kernel Native Security & DDOS Mitigation for Microservices with BPF

The slides of Cynthia's talk were already in the last issue. Docker has since published the recording as well, and it is definitely worth watching. Fun talk on Cilium, BPF, and Kafka.

Linux Networking Development

Focusing on development areas in the kernel. Also some advice in there for aspiring kernel developers. ;-)

XDP: The Future of Networks

Great introduction to BPF and XDP. With some myth busting and potential improvements.

A Gentle Introduction to [e]BPF - Michael Schubert, Kinvolk GmbH

Good introduction to BPF. Also nice that it shows the structures and links to some tools and the verifier.

LISA 17 - Fast and Safe Production Monitoring of JVM Applications with BPF Magic

Focusing on the tracing case with Java but the approaches could still be applied to other environments.

LISA17 Container Performance Analysis

Goes through some of the tools used at Netflix and a lot of other smaller tools for tracing. The emphasis on identifying the bottlenecks sounds good.

LISA17 Linux Performance Monitoring With BPF

Lab session for tracing tools with BCC. This is useful for learning about tracing on Linux. It also answers basic questions, such as what tracepoints, kprobes and uprobes are, and what some of the limitations of dynamic tracing are. Looks like a lot of fun.

XDP – eXpress Data Path An in-kernel network fast-path A technology overview

Great introduction to BPF and XDP. Also explains the problems and why it is needed.

In case you missed it

Reports from Netconf and Netdev

LWN.net coverage of the discussions from netconf and all the talks from netdev. A lot of interesting BPF topics in there. Check it out!

security things in Linux v4.14

The security summary contains sections on eBPF JIT 32-bit ARM support and seccomp improvements.

SystemTap 3.2 release

SystemTap now has an experimental eBPF backend.

Steven Rostedt proposes a different scheme where tracepoints are placed without a trace event. A kernel module would then have to be loaded from userspace, and there would be no need to add this to the kernel ABI. Will moving the ABI to a module really solve this problem?

LWN.net coverage of Eric Leblond's talk from Kernel Recipes. The recording was already in the last issue.

Projects

awesome-ebpf

A curated list of awesome projects related to eBPF

k8s-snowflake

Configs and scripts for bootstrapping an opinionated Kubernetes cluster anywhere.

libseccomp

The libseccomp library provides an easy to use, platform independent, interface to the Linux Kernel's syscall filtering mechanism. The libseccomp API is designed to abstract away the underlying BPF based syscall filter language and present a more conventional function-call based filtering interface that should be familiar to, and easily adopted by, application developers.

cbpf-rust

Userspace cBPF interpreter and cBPF to eBPF converter

vltrace

vltrace is a syscall tracing tool which utilizes eBPF - an efficient tracing feature of the Linux kernel.

Random cool note

We blew way past 7Mpps with UDP+XDP. I’m sure you know that already though :)

Patches

Please note that netdev and llvm-commits receive a lot of patches and the list below is not meant to be comprehensive.

LLVM

netdev

Cilium v0.10 & v0.11 Released: Double the Fun - Two Updates in One!

We're happy to announce our 2 recent Cilium releases: v0.10 and v0.11!

This is a brief recap of noteworthy functionality, including the expansion of Network Policy, simplifying deployments, Kubernetes integration updates, and Mesos integration. For the full list of changes, please refer to the Release Notes.

BPF Updates 11

The highlights since last time are

- New helper functions `bpf_perf_read_counter_time` and `bpf_perf_prog_read_time`.
- Initial BPF assembly support in LLVM.
- LRU map lookup improvements.

Linux 4.13 was released last week and net-next closed around the same time. The
last `[GIT] Networking` pull request includes a couple of BPF fixes, and so do
the two sent after the merge window opened. See the dates below for all the
details:

- [01 September 2017](https://www.spinics.net/lists/netdev/msg453325.html).
- [05 September 2017](https://www.spinics.net/lists/netdev/msg453873.html).
- [09 September 2017](https://marc.info/?l=linux-netdev&m=150493364601151&w=2).

LLVM [5.0.0](http://lists.llvm.org

BPF updates 10

The highlights since last time are

- A new iteration of the Landlock unprivileged sandbox series.
- A new iteration of the socket redirect series.
- ARM eBPF JIT finally got [merged](https://www.spinics.net/lists/netdev/msg451025.html).
- Bug fixes and tests.

Now that there is 32-bit eBPF JIT support for ARM, will more embedded devices start running eBPF?
[Marvell routers](https://www.mail-archive.com/netdev@vger.kernel.org/msg169582.html),
wifi devices soon? :) Also worth checking out the Landlock documentation, which
is really nice, both rendered and the code comments.

Some interesting topics from the lists

BPF updates 09

This is issue 09 of the regular newsletter around BPF written by Alexander Alemayhu. It summarizes ongoing development, presentations, videos and other information related to BPF and XDP. It is released roughly once a week.

The highlights since the previous issue

  • New comparison instructions that reduce register pressure and stack usage, and potentially yield smaller programs.
  • RFC patchset for BPF socket redirect with an awesome new helper function bpf_sk_redirect_map.
  • Verifier fixes, more tests and alignment tracking work got merged.
  • The XDP redirect series got merged.
  • XDP support for tap got merged.

BPF updates 08

Linux 4.12 was released and net-next is closed. The Kernel Newbies release notes are still under construction but worth checking out for the BPF commits in 4.12.

Most of the new patches from the lists should show up in the next release candidate for 4.13. Some highlights from the recent activity are

  • i40e gets XDP support for drop, pass and tx actions.
  • Iterations of the alignment tracking work. The main changes: dropped the RFC tag and added more tests.
  • NFP flag for XDP offload mode to offer more flexibility for programs that can be offloaded.
  • The new BPF_PROG_TYPE_SOCKET_OPS series got merged.

More interesting topics

  • iproute gets support for IFLA_XDP_PROG_ID. Also cls_bpf and act_bpf start using the BPF program id.
  • BPF program id available for i40e via XDP_QUERY_PROG.
  • A new helper function bpf_skb_adjust_room for adjusting net headroom.

One recurring issue is the asm header issue. While BPF can mix and match headers from kernel and userspace, the asm headers seem to be causing pain. Will one more hack be added on top of BPF, or will we see a clean / nice solution emerge from the discussions?

Tutorial: Applying HTTP security rules with Kubernetes

This blog post focuses on Layer 7 (HTTP) policy rules and how to apply them for both outgoing and incoming connections in the context of a Kubernetes cluster using a ThirdPartyResource. This is a first step in integrating L7 policies into the Kubernetes world; next steps will involve integration with Istio and the Envoy proxy. We will talk about our plans and the details of how Cilium empowers both of them in one of the next blog posts.


The Cilium 0.9 release (Release Notes) was a big step towards awesome Kubernetes integration. One of the many things that we added is a new ThirdPartyResource named CiliumNetworkPolicy. The purpose of CiliumNetworkPolicy is to extend the standardized NetworkPolicy resource and make all of the Cilium functionality available that is not yet accessible via the standard NetworkPolicy.

Step by Step Guide

This step by step guide shows how to apply HTTP security rules in three easy steps.

Step 1: Deploy demo app

We start out with a standard Kubernetes cluster with three worker nodes:

$ kubectl get nodes
NAME      STATUS    AGE
worker0   Ready     115d
worker1   Ready     115d
worker2   Ready     115d

Cilium is deployed as a DaemonSet:

$ kubectl -n kube-system get pods
NAME                                    READY     STATUS    RESTARTS   AGE
cilium-0srz0                            1/1       Running   0          10h
cilium-153hp                            1/1       Running   0          10h
cilium-5pk5c                            1/1       Running   2          10h
cilium-consul-0kf04                     1/1       Running   1          17h

We deploy a simple demo application in the form of Kubernetes deployments. This will create three deployments: app1, app2, and app3. It will also make app1 available via a service app1-service.

$ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/master/examples/minikube/demo.yaml
service "app1-service" created
deployment "app1" created
deployment "app2" created
deployment "app3" created

We can now check the status of these deployments:

$ kubectl get pods
NAME                       READY     STATUS              RESTARTS   AGE
po/app1-2741898079-66lz0   0/1       ContainerCreating   0          40s
po/app1-2741898079-jwfmk   1/1       Running             0          40s
po/app2-2889674625-wxs08   0/1       ContainerCreating   0          40s
po/app3-3000954754-fbqtz   0/1       ContainerCreating   0          40s

Step 2: Create L7/HTTP security policy

We want to define a Layer 7 (HTTP) policy to protect app1. app1 has two API endpoints which can be called: GET /public and GET /private. We want to continue allowing GET /public but prohibit all calls to GET /private. The following policy achieves this:

apiVersion: "cilium.io/v1"
kind: CiliumNetworkPolicy
description: "L7 policy for getting started using Kubernetes guide"
metadata:
  name: "rule1"
spec:
  endpointSelector:
    matchLabels:
      id: app1
  ingress:
  - fromEndpoints:
    - matchLabels:
        id: app2
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        HTTP:
        - method: "GET"
          path: "/public"

We can now import this Layer 7 (HTTP) policy using kubectl:

$ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/master/examples/minikube/l3_l4_l7_policy.yaml
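
To make the intent of the HTTP rule in this policy concrete, here is a minimal Python sketch of whitelist-style matching. It is a toy model, not Cilium's implementation: `HttpRule`, `HTTP_RULES` and `is_request_allowed` are hypothetical names, the endpoint selectors (`endpointSelector`, `fromEndpoints`) are not modelled at all, and treating `method` and `path` as anchored regular expressions is an assumption made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpRule:
    """One entry of a policy's HTTP rule list (method and path patterns)."""
    method: str
    path: str


# The single HTTP rule from the policy above, keyed by its TCP port.
HTTP_RULES = {80: [HttpRule(method="GET", path="/public")]}


def is_request_allowed(port: int, method: str, path: str) -> bool:
    """Return True if some rule for the port matches both method and path.

    Assumptions: patterns are anchored regular expressions, and anything
    that is not explicitly allowed is denied (whitelist semantics).
    """
    rules = HTTP_RULES.get(port, [])
    return any(
        re.fullmatch(rule.method, method) and re.fullmatch(rule.path, path)
        for rule in rules
    )
```

Under these assumptions, `is_request_allowed(80, "GET", "/public")` is true while `is_request_allowed(80, "GET", "/private")` is false, which is the behaviour the policy is meant to enforce for app1.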

Step 3: Test the policy

app1 is now protected. While we can still access app1/public from app2...

$ kubectl exec $APP2_POD -- curl -s http://${SVC_IP}/public
{ 'val': 'this is public' }

... we can no longer access app1/private.

$ kubectl exec $APP2_POD -- curl -s http://${SVC_IP}/private
Access denied

Next Steps

This is just a preview of our first step toward integrating HTTP-layer policies into Kubernetes. We will cover more of our upcoming next steps in follow-up blog posts:

  • Adding L7/HTTP security rule definitions to the Kubernetes NetworkPolicy to no longer require a ThirdPartyResource or CustomResourceDefinition.
  • Integration with the Envoy proxy to enable protocols beyond HTTP (gRPC, MongoDB, ...).
  • The difference between a shared proxy and a sidecar proxy model, and how Cilium can run a hybrid model where this decision can be made per pod.
  • Tight cooperation with the Envoy proxy, where Cilium can share the existing context information it has, e.g. source security identity for ingress rules and existing service load-balancing/routing decisions.
  • Kernel-assisted acceleration of the Envoy proxy.
  • Adding support for CustomResourceDefinition, as ThirdPartyResource will be deprecated with Kubernetes 1.8.

Stay tuned for more blog posts, and feel free to ask questions or provide feedback on our journey so far.