Install
# Debian / Ubuntu
sudo apt install bpftrace bpfcc-tools linux-headers-$(uname -r)
# Arch
sudo pacman -S bpftrace bcc
# Fedora
sudo dnf install bpftrace bcc-tools
A 5.x or newer kernel with BTF (BPF Type Format) gives the smoothest experience; most distro kernels ship that. Verify:
ls /sys/kernel/btf/vmlinux
sudo bpftrace --info | head
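If vmlinux isn't there, kprobes and tracepoints still work, but type-aware access (reading fields of kernel structs) then needs the kernel headers and explicit #include lines in the script, and BTF-based probes such as kfunc are unavailable. The tracepoint one-liners below read their arguments from the tracepoint format files, so they do not depend on BTF.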
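For setup scripts, the BTF check above can be wrapped in a small helper. This is a minimal sketch: the function name `btf_status` and its optional path argument are made up for illustration, and only the default path comes from the text above.

```shell
# btf_status [PATH]: report whether the kernel exposes BTF type information.
# The function name and the optional PATH argument are illustrative only.
btf_status() (
  path="${1:-/sys/kernel/btf/vmlinux}"
  if [ -r "$path" ]; then
    echo "BTF available: $path"
  else
    echo "BTF missing: $path"
    exit 1   # leaves only the function's subshell, giving a non-zero status
  fi
)
```

Calling `btf_status || echo "install kernel headers"` is one way to branch on the result.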
One-liners
Most of bpftrace's day-to-day value is one-liners against existing probes.
# Count every syscall by name, system-wide
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_*
{ @[probe] = count(); }'
# Histogram of read() sizes from any process
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_read
{ @sizes = hist(args->count); }'
# Print every command executed system-wide (replacement for forkstat / execsnoop)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_execve
{ printf("%s %s\n", comm, str(args->filename)); }'
# Latency of every openat() call (the syscall behind open() on current glibc)
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_openat { @t[tid] = nsecs; }
tracepoint:syscalls:sys_exit_openat /@t[tid]/ {
@lat = hist(nsecs - @t[tid]);
delete(@t[tid]);
}
'
# Track who is opening /etc/shadow
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_openat
/str(args->filename) == "/etc/shadow"/
{ printf("pid=%d comm=%s\n", pid, comm); }
'
Press Ctrl-C to stop. On exit, bpftrace prints every aggregation (@name map) it has accumulated.
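The enter/exit pairing used for the latency one-liner works for any syscall that has both tracepoints. As a sketch, a shell function can stamp out the same program for a given syscall name; `syscall_latency_prog` is a hypothetical helper, not something bpftrace ships.

```shell
# syscall_latency_prog NAME: print a bpftrace program that histograms the
# latency of one syscall. The function name is illustrative; NAME must match
# an existing syscalls:sys_enter_NAME / sys_exit_NAME tracepoint pair.
syscall_latency_prog() {
  cat <<EOF
tracepoint:syscalls:sys_enter_$1 { @t[tid] = nsecs; }
tracepoint:syscalls:sys_exit_$1 /@t[tid]/ {
  @lat_ns = hist(nsecs - @t[tid]);
  delete(@t[tid]);
}
EOF
}

# Usage (needs root and bpftrace):
#   sudo bpftrace -e "$(syscall_latency_prog read)"
```

Keying the start time on tid keeps concurrent calls from different threads apart, and the filter on the exit probe skips calls that began before tracing started.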
Built-in tools
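The bcc and bpftrace projects each ship dozens of curated diagnostic tools. The bcc versions listed here are Python front ends around embedded C; bpftrace ships many of the same tools as plain .bt scripts:
# bcc tools: *-bpfcc in /usr/sbin/ on Debian/Ubuntu, /usr/share/bcc/tools/ elsewhere
# bpftrace versions: *.bt under /usr/share/bpftrace/tools/ (or /usr/sbin/ on some distros)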
biolatency-bpfcc # block I/O latency histogram
biosnoop-bpfcc # every block I/O with PID, device, sector, size, latency
execsnoop-bpfcc # every process exec
opensnoop-bpfcc # every open() with path + result
tcptop-bpfcc # top-style TCP throughput per connection
tcpconnlat-bpfcc # TCP connect latency distribution
runqlat-bpfcc # CPU run queue latency
slabratetop-bpfcc # kernel slab allocations by type
profile-bpfcc # sampling profiler with stack traces
funccount-bpfcc # count calls to a kernel function
Each is a complete diagnostic in itself. Reading the source of the bpftrace (.bt) version of one is the easiest way to learn bpftrace by example.
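To find scripts worth reading, it helps to list what the package actually installed. A small sketch, assuming the .bt files sit in the directory mentioned above; the helper name `list_bt_tools` is made up here.

```shell
# list_bt_tools [DIR]: print the names of the bpftrace scripts (*.bt) in DIR.
# The default is the directory named above; packages may install elsewhere.
list_bt_tools() (
  dir="${1:-/usr/share/bpftrace/tools}"
  for f in "$dir"/*.bt; do
    [ -e "$f" ] || continue      # the glob matched nothing
    basename "$f"
  done
)
```

If the output is empty, try pointing the function at /usr/sbin instead.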
Custom scripts
A bpftrace script lives in a .bt file. Example: file-open latency by command, only for opens that take more than 1 ms:
#!/usr/bin/env bpftrace
// open-slow.bt
tracepoint:syscalls:sys_enter_openat {
@t[tid] = nsecs;
@f[tid] = str(args->filename);
}
tracepoint:syscalls:sys_exit_openat
/@t[tid]/
{
$lat = nsecs - @t[tid];
if ($lat > 1000000) { // 1 ms in ns
printf("%-16s %6dus %s\n", comm, $lat / 1000, @f[tid]);
}
delete(@t[tid]);
delete(@f[tid]);
}
END {
clear(@t);
clear(@f);
}
sudo bpftrace open-slow.bt
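The script prints one line per slow open, which gets noisy on a busy host. A post-processing sketch, assuming the output was first saved to a file (for example with `sudo bpftrace open-slow.bt > slow-opens.log`) and that command names contain no spaces; `slow_open_summary` is a hypothetical helper name.

```shell
# slow_open_summary: read open-slow.bt output on stdin and print, per command,
# the number of slow opens and the worst latency in microseconds.
# Assumes command names without spaces; the function name is illustrative.
slow_open_summary() {
  awk '$2 ~ /^[0-9]+us$/ {
         lat = $2 + 0                 # "1530us" -> 1530
         n[$1]++
         if (lat > max[$1]) max[$1] = lat
       }
       END { for (c in n) printf "%s %d %d\n", c, n[c], max[c] }' | sort
}

# Usage:
#   slow_open_summary < slow-opens.log
```

Summarizing from a saved file rather than a live pipe avoids losing the summary when Ctrl-C interrupts the whole pipeline.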
Probe types worth knowing
- tracepoint:<subsystem>:<name> — stable, kernel-defined trace points. Best choice when one exists for what you want; largely insulated from kernel version churn.
- kprobe:<function> — entry of any kernel function. Unstable across kernel versions but vastly more general.
- kretprobe:<function> — return of a kernel function. Useful for measuring latency or capturing return values.
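- uprobe:<binary>:<function> — entry of a user-space function. For example, uprobe:/usr/lib/x86_64-linux-gnu/libssl.so.3:SSL_read traces every TLS read in any process that loads that OpenSSL library (SSL_read lives in libssl rather than in the openssl binary; the library path and version vary by distro).
- usdt:<binary>:<probe-name> — statically defined trace points the binary opted into (most modern databases, libc, JVMs have these).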
- profile:hz:<freq> — sample at N Hz across all CPUs. The basis for flame-graph profiling.
- interval:<unit>:<n> — fire on one CPU every n units of time (for example interval:s:5 fires every 5 seconds); useful for periodic snapshots of maps.
To discover what's available:
sudo bpftrace -l 'tracepoint:syscalls:*read*'
sudo bpftrace -l 'kprobe:tcp_*'
sudo bpftrace -l 'uprobe:/usr/lib/x86_64-linux-gnu/libc.so.6:*malloc*'
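A broad wildcard can return thousands of probes, so a per-type tally is a useful first look. This sketch only post-processes the listing; the helper name `probe_type_counts` is made up for illustration.

```shell
# probe_type_counts: read `bpftrace -l` output on stdin and count probes per
# type (the text before the first colon). The function name is illustrative.
probe_type_counts() {
  awk -F: 'NF > 1 { n[$1]++ } END { for (t in n) printf "%s %d\n", t, n[t] }' | sort
}

# Usage (needs root and bpftrace):
#   sudo bpftrace -l '*tcp_sendmsg*' | probe_type_counts
```

Once a tracepoint looks promising, `sudo bpftrace -lv 'tracepoint:syscalls:sys_enter_openat'` prints its argument fields, which are the names available through args.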
Aggregations and histograms
Aggregations are what make bpftrace cheaper than naive per-event instrumentation: the counting happens in the kernel, and only the summarized result crosses to user space.
// Linear histogram from 0 to 10000 us, bucket width 100 us
@lat = lhist($elapsed_us, 0, 10000, 100);
// Exponential (default) histogram
@bytes = hist(args->count);
// Count, average, and total in one call (use min() and max() for extremes)
@reads = stats(args->count);
// Per-key
@by_comm[comm] = count(); // count events grouped by process name
@by_pid[pid] = sum(args->count); // sum bytes by PID
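To read the histograms it helps to know which bucket a value lands in. The sketch below mimics the bucketing in plain shell arithmetic: hist() uses power-of-two ranges and lhist() uses fixed-width ranges between min and max. The helper names are made up, and the labels are printed as plain numbers, whereas bpftrace abbreviates large values (1024 appears as 1K).

```shell
# hist_bucket N: print the power-of-two bucket hist() uses for a
# non-negative integer N. Illustrative helper, not part of bpftrace.
hist_bucket() (
  n="$1"
  if [ "$n" -eq 0 ]; then echo "[0]"; exit; fi
  lo=1
  while [ $((lo * 2)) -le "$n" ]; do lo=$((lo * 2)); done
  if [ "$lo" -eq 1 ]; then echo "[1]"; else echo "[$lo, $((lo * 2)))"; fi
)

# lhist_bucket VALUE MIN MAX STEP: print the fixed-width bucket lhist() uses.
# Values below MIN or at/above MAX fall into the open-ended end buckets.
lhist_bucket() (
  v="$1"; min="$2"; max="$3"; step="$4"
  if [ "$v" -lt "$min" ]; then
    echo "(..., $min)"
  elif [ "$v" -ge "$max" ]; then
    echo "[$max, ...)"
  else
    lo=$(( min + (v - min) / step * step ))
    echo "[$lo, $((lo + step)))"
  fi
)
```

For example, `hist_bucket 700` prints `[512, 1024)` and `lhist_bucket 250 0 10000 100` prints `[200, 300)`.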
Real example: tracing slow disk I/O
Question: which process is making disk I/O slow right now? Plain iostat shows device-level latency; bpftrace can pin it to PIDs.
sudo /usr/share/bcc/tools/biosnoop # installed as biosnoop-bpfcc on Debian/Ubuntu
Output: one line per block I/O, with PID, comm, device, R/W, sector, size, latency. Sort by latency, find the offenders.
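Sorting by latency can be done with standard tools once the output is saved to a file. A minimal sketch, assuming latency is the last column as described above; the helper name `slowest_io` is made up for illustration.

```shell
# slowest_io [N]: read biosnoop output on stdin and print the N lines
# (default 10) with the highest latency, taken from the last column.
# Lines whose last field is not a number (such as the header) are skipped.
slowest_io() {
  awk '$NF ~ /^[0-9.]+$/ { print $NF "\t" $0 }' |
    LC_ALL=C sort -rn |
    head -n "${1:-10}" |
    cut -f2-
}

# Usage:
#   slowest_io 20 < biosnoop.log
```

The function prefixes each line with its latency, sorts on that prefix, and strips it again, so the original columns come back unchanged.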
Or build your own. A minimal version of the same idea as a short bpftrace script:
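#!/usr/bin/env bpftrace
// biosnoop-mini.bt
// Both probes key on (device, sector) so issue and completion match up.
// pid and comm are saved at issue time; completion usually runs in interrupt context.
tracepoint:block:block_rq_issue {
@start[args->dev, args->sector] = nsecs;
@iopid[args->dev, args->sector] = pid;
@iocomm[args->dev, args->sector] = comm;
}
tracepoint:block:block_rq_complete
/@start[args->dev, args->sector]/
{
$lat_us = (nsecs - @start[args->dev, args->sector]) / 1000;
printf("%-12s %-8d %-12s %5d us\n", @iocomm[args->dev, args->sector], @iopid[args->dev, args->sector], args->rwbs, $lat_us);
delete(@start[args->dev, args->sector]);
delete(@iopid[args->dev, args->sector]);
delete(@iocomm[args->dev, args->sector]);
}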
Overhead
eBPF runs in the kernel, attached to events that are already happening. Adding a probe to a high-frequency function (e.g. every read()) costs some nanoseconds per call — usually well under 1% on a real workload, but it shows up if instrumenting truly hot paths. The reasonable rule: instrument first, optimize the instrumentation only if it visibly perturbs the workload.
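A back-of-the-envelope estimate shows when a probe starts to matter: multiply the event rate by the per-event cost. The 100 ns figure used below is an assumed placeholder, not a measurement, and `probe_overhead_pct` is a hypothetical helper.

```shell
# probe_overhead_pct EVENTS_PER_SEC NS_PER_EVENT: percent of one CPU spent
# inside the probe. NS_PER_EVENT is an assumed per-hit cost, not a measurement.
probe_overhead_pct() {
  awk -v rate="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", rate * ns / 1e9 * 100 }'
}

# 20,000 events/s at an assumed 100 ns each:
#   probe_overhead_pct 20000 100      prints 0.20
# 1,000,000 events/s at the same assumed cost:
#   probe_overhead_pct 1000000 100    prints 10.00
```

At tens of thousands of events per second the cost is lost in the noise; at a million per second the same probe becomes a visible fraction of a core.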
What's not bpftrace's lane
- Long-running production telemetry — for that, the structured-output BPF-based projects (parca-agent, pixie, cilium/tetragon, biotop running as a service) are the right tool.
- Modifying behavior — bpftrace is read/aggregate only; full eBPF programs can edit packets, redirect syscalls, etc. but that's a C-with-libbpf workflow.
- Kernels older than 4.18 — usable, but BTF and many tracepoints will be missing.