eBPF: opensnoop on Nginx

Figuring out performance issues in production systems can be tricky. System CPU and memory stats can’t always reveal what’s going wrong with the precision needed to nail the problem down.

A new set of tools capable of running safely within the Linux operating system’s kernel provides a framework for gaining deeper insights into a running system.

This framework is called the extended Berkeley Packet Filter (eBPF). Because eBPF’s virtual machine runs within the kernel itself, it can support programs that measure exactly how system resources are being allocated in the wild.

Many eBPF tools have already been developed, particularly by Brendan Gregg, and I’m going to use one of his tools to examine a system running Nginx.

Opensnoop tracks when files are opened by tracing the open() and openat() system calls. This can be useful when examining running programs. Nginx serves files as part of its web-serving function and also reads configuration files, so we should be able to catch it in action. Let's see if we can watch it work.

First we need to install the eBPF tools. I’m running an Ubuntu box, so we’ll use the apt package manager to get started:

sudo apt-get install bpfcc-tools linux-headers-$(uname -r)

Opensnoop can generate a huge amount of output, so to start let's just run it for a short time and count the lines of output:

sudo opensnoop-bpfcc -d 1 | wc -l
// 742

The -d flag sets the trace duration in seconds. On this system we get 742 lines of output, not bad for a snoop on an idle system. Your count will differ, and it will probably change each time you run it.
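A line count alone doesn't say which processes are opening files. As a minimal sketch, the output can be grouped by process name with awk. This assumes the default column layout (PID COMM FD ERR PATH) and process names without spaces; count_by_comm is a hypothetical helper name, not part of the bcc tools.

```shell
# count_by_comm: read opensnoop output on stdin and print "count name"
# for each process name, busiest first.
# Assumes the default columns: PID COMM FD ERR PATH.
count_by_comm() {
    awk '$1 != "PID" && NF >= 5 { n[$2]++ }          # skip header and short lines
         END { for (c in n) print n[c], c }' |
        sort -k1,1nr -k2,2                            # count descending, then name
}

# Example use (needs root and the bpfcc-tools package):
#   sudo opensnoop-bpfcc -d 1 | count_by_comm
```

Paths that contain spaces still count correctly, because only the second column is used as the key.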

Now what about Nginx? The -n flag filters on the process name, accepting full or partial matches. In the opensnoop output the process name is listed under the heading COMM.

sudo opensnoop-bpfcc -d 1 -n nginx | wc -l
// 1

Not much happening here: we’re just getting the one line for the column headings. Let's force the issue by increasing the duration and, in a separate terminal logged into the same machine, reloading Nginx.

sudo opensnoop-bpfcc -d 10 -n nginx | wc -l

----

// separate terminal window

sudo systemctl reload nginx

----

// 99
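If a second terminal isn't handy, the same experiment can probably be run from one shell by putting the trace in the background. The helper below is a rough sketch under that assumption; run_with_trigger and its arguments are hypothetical names, and the 2 second default delay is a guess at how long the tracer needs to attach.

```shell
# run_with_trigger TRACE_CMD TRIGGER_CMD [DELAY_SECONDS]
# Starts TRACE_CMD in the background, waits DELAY_SECONDS (default 2),
# runs TRIGGER_CMD, then waits for the trace to finish.
# Output from both commands goes to stdout.
run_with_trigger() {
    trace_cmd=$1
    trigger_cmd=$2
    delay=${3:-2}

    sh -c "$trace_cmd" &        # start the trace in the background
    trace_pid=$!

    sleep "$delay"              # give the tracer time to attach
    sh -c "$trigger_cmd"        # the action we want to observe

    wait "$trace_pid"           # block until the trace duration is over
}

# Example use (run "sudo -v" first so the background sudo does not prompt):
#   run_with_trigger 'sudo opensnoop-bpfcc -d 10 -n nginx' \
#                    'sudo systemctl reload nginx' | wc -l
```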

We’ve had success. We know that the nginx.conf file should be opened when we reload Nginx, so let’s repeat the process and check that this is true:

sudo opensnoop-bpfcc -d 10 -n nginx | grep nginx.conf

//
255342 nginx 4 0 /etc/nginx/nginx.conf
255342 nginx 5 0 /etc/nginx/conf.d
4028 nginx 3 0 /etc/nginx/nginx.conf
4028 nginx 8 0 /etc/nginx/conf.d

Great, we can see that it has been opened, and we can note that two different process IDs (PIDs) have read this file, all without knowing anything about Nginx internals. Maybe this could tell us how it works?
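One detail in that output: the /etc/nginx/conf.d lines were matched as well. grep treats the pattern as a regular expression, so the "." in nginx.conf matches any character, including the "/" in nginx/conf.d. The sketch below replays the four lines from the trace to show the difference between the regex match and a fixed-string match with grep -F.

```shell
# Sample lines copied from the trace output above
sample='255342 nginx 4 0 /etc/nginx/nginx.conf
255342 nginx 5 0 /etc/nginx/conf.d
4028 nginx 3 0 /etc/nginx/nginx.conf
4028 nginx 8 0 /etc/nginx/conf.d'

# Regex match: "." also matches the "/" in "nginx/conf.d"
regex_matches=$(printf '%s\n' "$sample" | grep -c 'nginx.conf')

# Fixed-string match: only the literal text "nginx.conf" counts
fixed_matches=$(printf '%s\n' "$sample" | grep -cF 'nginx.conf')

echo "regex: $regex_matches, fixed: $fixed_matches"
# prints "regex: 4, fixed: 2"
```

To see only the opens of nginx.conf itself, the trace can be piped through grep -F nginx.conf instead.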

Looking at the full output from a 10 second opensnoop trace, I can see three PIDs in total. On my system they are 255342, 4028 and 255343.
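Picking the PIDs out of a long trace by eye is error-prone. As a sketch, awk can list each distinct PID in the order it first appears, along with how many opens it made. This assumes the default column layout with the PID in the first column; pids_in_order is a hypothetical helper name.

```shell
# pids_in_order: read opensnoop output on stdin and print "PID count"
# for each distinct PID, in order of first appearance in the trace.
pids_in_order() {
    awk '$1 != "PID" && NF >= 5 {
             if (!($1 in count)) order[++k] = $1   # remember first sighting
             count[$1]++
         }
         END { for (i = 1; i <= k; i++) print order[i], count[order[i]] }'
}

# Example use (needs root and the bpfcc-tools package):
#   sudo opensnoop-bpfcc -d 10 -n nginx | pids_in_order
```

Keeping the first-appearance order matters here, since the order in which processes show up in the trace hints at which one triggered the others.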

We can check our PID values with the standard OS tool ‘ps’ (not based on eBPF):

ps aux | grep nginx

//

root 4028 0.0 0.5 56000 5556 ? Ss Jul26 0:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data 255343 0.0 0.5 56240 5196 ? S 07:43 0:00 nginx: worker process

Here we can put the pieces together and guess at how Nginx might work. The master process is PID 4028, which appears between the other two PIDs in the trace sequence. PID 255342 is no longer running, so perhaps it was a short-lived process that read the configuration and handled the reload, then exited once finished. The master Nginx process then created a new process, PID 255343, to run and process requests.
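Checking which traced PIDs are still alive can also be scripted rather than read off the ps output. A minimal sketch, assuming a Linux system with /proc mounted (which any machine running eBPF tools will have); pid_status is a hypothetical helper name.

```shell
# pid_status: read one PID per line on stdin and report whether each
# process still exists, by checking for its /proc/<pid> directory.
pid_status() {
    while read -r pid; do
        if [ -d "/proc/$pid" ]; then
            echo "$pid running"
        else
            echo "$pid exited"
        fi
    done
}

# Example use with the three PIDs from the trace:
#   printf '%s\n' 255342 4028 255343 | pid_status
```

A PID reported as running may belong to an unrelated process if the number has been reused, so this is only a quick check shortly after the trace.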

Of course, you should probably read the documentation for a major piece of software like Nginx, but for internal tools or smaller open source projects documentation may not be available.

Thanks for reading

Please get in touch by email or on Twitter if you have any questions or follow-ups.

Niall McGinness