Now that you have completed the basic steps of the tutorial, you are ready to begin writing packet processing programs. This lesson contains the first introduction to packet processing, where we will see how to parse the packet contents, and how to make sure that the kernel verifier will accept your programs.
We will use the loader from the basic04 lesson. The Makefile in this directory
contains a rule to copy it over to this directory, so you can run it as
./xdp_loader
after running make
. The testenv.sh
script will use this
when run with the load
command. In the example below, we assume you’ve
installed the alias for testenv (by running eval $(./testenv.sh alias)
),
so all examples will use the short t
command to refer to the testenv.sh
script.
This lesson will teach you how to parse packet data from XDP. This is done via direct memory access using pointers to the packet data, which is one of the reasons for the high performance attainable using XDP. This works because the kernel verifier will check that all data accesses are within the packet boundaries. We will see how this works, and how you can deal with verifier errors.
You will also learn how to decide the packet verdict via program return code based on the packet data, and we will cover how to structure the packet parsing code to ensure readability and code reuse.
A few points to be aware of while completing the assignments are listed below.
When an XDP program is executed, it will receive as a parameter a pointer to
a struct xdp_md
object, which contains context information about the
packet. This object is defined in bpf.h
as follows:
struct xdp_md {
__u32 data;
__u32 data_end;
__u32 data_meta;
/* Below access go through struct xdp_rxq_info */
__u32 ingress_ifindex; /* rxq->dev->ifindex */
__u32 rx_queue_index; /* rxq->queue_index */
};
The last two items in this struct are just data fields which contain the ifindex and RX queue index that the packet was received on. The program can use this in its decision making (along with the packet data itself).
The three first items are actually pointers, even though they are defined
with the __u32
type. The data
field points to the start of the packet,
the data_end
field points to the end, and the data_meta
field points to
the metadata area that XDP programs can use to store extra metadata to
accompany the packet. In this lesson we will only be working with the data
and data_end
fields.
The verifier will rewrite the pointer accesses when the program is loaded to point to the actual packet data. But to satisfy the compiler type checking we need to cast the fields to pointers when accessing them. For this reason, XDP programs often start with an assignment like this:
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
As mentioned above, packet data is accessed using direct memory reads, which
the verifier will ensure are safe. However, doing this at runtime for every
pointer access would result in a significant performance overhead. So
instead, what the verifier does is check that the XDP program does its own
bounds checking; this is the purpose of the data_end
pointer.
When the verifier performs its static analysis at load time, it will track
all memory address offsets used by the program, and look for comparisons
with the data_end
pointer, which will be set to the end of the packet at
runtime. This means that if the program does something like this:
if (data + 10 < data_end)
/* do something with the first 10 bytes of data */
else
/* skip the packet access */
The verifier can know that all instructions in the true
branch of the if
statement can safely access the first 10 bytes of the packet, while the
else
branch cannot. So if a program does attempt to access the packet data
in the else branch, the program will be rejected.
When going through a packet and parsing subsequent headers, it is generally necessary to keep track of the current parsing position. When using helper functions to parse packet headers, those helper functions generally need to modify the current parser position. To avoid having to deal with pointer arithmetic on pointers to pointers, we encapsulate this in a cursor object, which we can pass to helper functions. The cursor is simply defined as a single-entry struct:
/* Header cursor to keep track of current parsing position */
struct hdr_cursor {
void *pos;
};
The final verdict for what happens to a packet after it has been processed
by the XDP program is communicated to the kernel by means of the program
return code. These are also defined in bpf.h
:
enum xdp_action {
XDP_ABORTED = 0,
XDP_DROP,
XDP_PASS,
XDP_TX,
XDP_REDIRECT,
};
ABORTED
and DROP
will both drop the packet, but ABORTED
will also
trigger a tracepoint event (xdp:xdp_exception
; this has zero overhead when
the tracepoint is not reached). PASS
will allow the packet to continue up to the
kernel networking stack for processing, TX
will retransmit the packet out
of the same interface it was received on, and REDIRECT
will transmit the
packet out of another interface (where the destination interface needs to be
set by a BPF helper call prior to returning REDIRECT
).
Note that the XDP program can perform arbitrary alterations to the packets
before these verdicts are rendered. For the TX
and REDIRECT
actions,
some packet data transformation is generally required (such as rewriting
ethernet header addresses), while for the others it is optional. We will see
how this can be used in the next lesson.
Since an XDP program only receives a pointer to a raw data buffer, it will need to do its own parsing of packet headers. To aid in this, the kernel headers define structs that contain the packet header fields. Parsing packets generally involves a lot of casting of data buffers to the right struct types, as we will see in the assignments below. The header definitions we will be using in this lesson are the following:
Struct | Header file |
---|---|
struct ethhdr | <linux/if_ether.h> |
struct ipv6hdr | <linux/ipv6.h> |
struct iphdr | <linux/ip.h> |
struct icmp6hdr | <linux/icmpv6.h> |
struct icmphdr | <linux/icmp.h> |
Since the packet data comes straight off the wire, the data fields will be
in network byte order. Use the bpf_ntohs()
and bpf_htons()
functions to
convert to and from host byte order, respectively. See the comment at the
top of ../headers/bpf_endian.h for why the bpf_
-prefixed versions are
needed.
Because eBPF programs only have limited support for function calls, helper
functions need to be inlined into the main function. The __always_inline
marker on the function definition ensures this, overriding any inlining
decisions the compiler would otherwise make.
Similarly, because eBPF does not support looping, we need to unroll any
loops in the program. This can be done by adding the #pragma unroll
statement on the line before the loop, and only works with loops where the
number of iterations are known at compile time (such as for loops with a
static counter).
The end goal of this lesson is to build an XDP program that will inspect packet headers, and drop every other ICMP echo request (i.e. ping) packet seen on the interface, while allowing everything else to pass up to the kernel. The assignments below will gradually build up towards this goal.
The starting point for this assignment is the packet parsing program in
xdp_prog_kern.c, which will parse the packet Ethernet header using a
helper function. Each assignment will extend this program by adding new
features. The program includes the stats helper from basic04 which you can
use to monitor what actions the program takes as you develop it. Use t
stats
to run the stats monitoring application (after you have loaded your
BPF program).
The parser function in xdp_prog_kern.c
will parse the Ethernet header, do
bounds checking, and return the next header type and position. However,
there is a bug in the bounds checking logic, so the program will be rejected
by the verifier (test this by running t load
after compiling it).
Your first assignment is to fix this bug (hint: it’s in the if
statement
in parse_ethhdr()
), and make sure that the program can be successfully
loaded onto an interface.
Now that our Ethernet parsing program runs, we will add parsing of the IP
header. To do this, implement the parse_ip6hdr()
function that has a
commented-out prototype below the parse_ethhdr()
function. The function
will be quite similar to parse_ethhdr()
, but you’ll need to look up the
IPv6 header structure definition in ipv6.h
.
When you add bounds checking, notice that the style used in
parse_ethhdr()
, which computes the size of the header and does byte-wise
comparison, is not the only one possible. You can also use pointer
arithmetic-style comparison, which makes use of the fact that incrementing a
pointer will move the memory it is pointing to by the size of the structure.
Using this will get you a bounds check that looks like this:
struct ipv6hdr *ip6h = nh->pos;
/* Pointer-arithmetic bounds check; pointer +1 points to after end of
* thing being pointed to. We will be using this style in the remainder
* of the tutorial.
*/
if (ip6h + 1 > data_end)
return -1;
Don’t forget to also increment the nh->pos pointer so it points to the data right after the IP header, which is what you’ll be looking at when parsing the ICMPv6 header in the next assignment.
To check that your program works, test that it compiles and loads. You can
also change the return code to drop IP packets, and check that this works
using either t tcpdump
or t ping
.
Now that we can successfully parse packets down to the IP header, we will
need to add parsing of the payload that we are interested in. I.e., the
ICMPv6 header. To do this, implement the parse_icmp6hdr()
function.
After parsing the ICMPv6 header, it is finally time to make our decisions
based on the packet payload. In this case, we are interested in the sequence
number. The data structure is quite deeply nested, but the header file also
defines a convenient alias, so the sequence number can be accessed as
icmp6h->icmp6_sequence
(but don’t forget byte order conversion).
With this, we can finally implement the drop logic mentioned above, by
simply returning XDP_DROP
if the sequence number is even, and XDP_PASS
otherwise. Verify that this works by loading the program and running a ping;
you should see responses on every other sequence number:
$ make
$ t load
$ t ping
Running ping from inside test environment:
PING fc00:dead:cafe:1::1(fc00:dead:cafe:1::1) 56 data bytes
64 bytes from fc00:dead:cafe:1::1: icmp_seq=1 ttl=64 time=0.059 ms
64 bytes from fc00:dead:cafe:1::1: icmp_seq=3 ttl=64 time=0.135 ms
^C
--- fc00:dead:cafe:1::1 ping statistics ---
4 packets transmitted, 2 received, 50% packet loss, time 44ms
rtt min/avg/max/mdev = 0.059/0.097/0.135/0.038 ms
Now that we have the basic functionality working, we can improve it to also correctly handle VLAN tags on the Ethernet packets, as an example of how to parse multiple variable headers depending on the payload. In Linux, VLANs are configured by creating virtual interfaces of type vlan; but since the XDP program runs directly on the real interface, it will see all packets with their VLAN tags, before the kernel assigns them to the virtual VLAN interfaces. We can use this to create a parser that will work with any VLAN encapsulation (but see the note about hardware offloads below).
For now we just want to parse the VLAN tags and find the encapsulated IP
header (in the next lesson we will move on to adding and removing VLAN
tags). This means that we can just augment our parse_ethhdr()
function to
also parse VLAN tags. If any tags are found, we simply grab the next header
type from the innermost tag instead of directly from the Ethernet header,
and move the nexthdr pointer to after the end of the VLAN tag.
Unfortunately, the VLAN tag header is not exported by any of the IP header files. However, it is quite simple, so we can just define it ourselves, like this (copied from the internal kernel headers):
struct vlan_hdr {
__be16 h_vlan_TCI;
__be16 h_vlan_encapsulated_proto;
};
The ethertype of a VLAN tag is either ETH_P_8021Q
or ETH_P_8021AD
, both
of which are defined in if_ether
. So we can define a simple helper
function to check if a VLAN tag is present:
static __always_inline int proto_is_vlan(__u16 h_proto)
{
return !!(h_proto == bpf_htons(ETH_P_8021Q) ||
h_proto == bpf_htons(ETH_P_8021AD));
}
Which can be used like this:
if (proto_is_vlan(eth->h_proto)) {
/* Process VLAN tag */
}
Another thing to bear in mind is that a single packet can have several nested VLAN tags. We can handle this by using an unrolled loop to parse subsequent VLAN headers, as long as their encapsulated protocol continues to be on of the VLAN types.
Using the above, modify your parsing program to also work with VLAN tags.
You can test this by setting up the test environment with VLAN interfaces;
simply pass the --vlan
tag to t setup
; or run t reset --vlan
to
re-initialise an existing environment with the addition of a VLAN interface.
Once you have initialised the environment to include VLANs, you can run t
ping --vlan
to run a ping on the VLAN interfaces, and verify that every
other packet is still being dropped.
Since XDP needs to see the VLAN headers as part of the packet headers, it is
important to turn off VLAN hardware offload (which most hardware NICs
support), since that will remove the VLAN tag from the packet header and
instead communicate it out of band to the kernel via the packet hardware
descriptor. The testenv
script already disables VLAN offload when setting
up the environment, but for reference, here is how to turn it off for other
devices, using ethtool:
# Check current setting: ethtool -k DEV | grep vlan-offload # Disable for both RX and TX ethtool --offload DEV rxvlan off txvlan off # Same as: # ethtool -K DEV rxvlan off txvlan off
While we would obviously all like IPv6 to be ubiquitous everywhere, sometimes it is still necessary to handle legacy IPv4 packets. To this end, the final assignment for this lesson is to extend our program to perform the same function for v4 ICMP packets as it does for ICMPv6 packets. This means adding two new parser functions for the IPv4 headers, and handling each according to the payload type of the Ethernet header.
This should be a pretty straight-forward extension of the program. The only
complication to be aware of is that IPv4 headers can vary in size, so you’ll
need to do the bounds checking in two passes: First verify that the iphdr
struct itself fits in the packet payload, then compute the actual header
size as hdrsize = iph->ihl * 4
, and finally verify that this full size
fits in the packet (and adjust the nexthdr pointer accordingly).
To test IPv4 support, you can run t setup --legacy-ip
which will configure
IPv4 addresses on the virtual interfaces, and t ping --legacy-ip
to run a
ping afterwards. Note that you will need to pass both --legacy-ip
and
--vlan
to the setup
(or reset
) commands if you want both at the same
time; however, no IPv4 addresses will be configured on the VLAN interfaces,
so you can’t use t ping
with both.
Once you have added IPv4 support and verified that every other v4 ICMP
packet is being dropped when you load the program, you have completed this
lesson, and you are ready move on to packet02
to learn about packet
modification!