iperf3 + flent: throughput and bufferbloat benchmarking

iperf3: install + run

# Debian / Ubuntu
sudo apt install iperf3

# macOS
brew install iperf3

# Arch / Alpine / Fedora: also packaged as iperf3 (pacman -S iperf3, apk add iperf3, dnf install iperf3)

iperf3 uses a client/server model: one side runs as the server, the other as the client, and either side can be the sender (the client sends by default; -R reverses that).

# On the server end (anywhere on your LAN, or a public server you control)
iperf3 -s

# On the client end
iperf3 -c <server-ip>
# Sends from client to server for 10 seconds; reports throughput

# Reverse direction: server sends to client
iperf3 -c <server-ip> -R

# Longer test
iperf3 -c <server-ip> -t 60

# Multiple parallel streams (often hits link cap better than 1 stream)
iperf3 -c <server-ip> -P 4

# UDP test at 100 Mbps (TCP is default)
iperf3 -c <server-ip> -u -b 100M

# Both directions at once (--bidir needs iperf3 3.7 or later)
iperf3 -c <server-ip> --bidir

How to read iperf3 output

$ iperf3 -c speedtest.example.com -t 30 -P 4
Connecting to host speedtest.example.com, port 5201
[  5] local 10.0.5.10 port 54321 connected to 203.0.113.5 port 5201
...
[ ID] Interval           Transfer     Bitrate         Retr
[SUM]   0.00-30.00  sec  1.20 GBytes   343 Mbits/sec  ...          sender
[SUM]   0.00-30.04  sec  ...                                       receiver
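If you script these tests, something has to pull the headline number out of that output. Below is a minimal bash/awk sketch, assuming a one-direction test and the plain-text layout shown above. The function name receiver_rate is hypothetical. It keeps the last line containing "receiver". With -P, that is the [SUM] line; with a single stream, it is the only receiver line. It then prints the bitrate field.

```shell
#!/usr/bin/env bash
# receiver_rate: print the receiver-side bitrate from iperf3 text output.
# With -P, the [SUM] receiver line comes after the per-stream receiver
# lines, so the last line containing "receiver" is the aggregate.
# Usage: iperf3 -c <server-ip> -t 30 -P 4 | receiver_rate
receiver_rate() {
  awk '
    /receiver/ { last = $0 }            # remember the most recent receiver line
    END {
      if (last == "") exit 1            # no summary: the test failed or was aborted
      n = split(last, f, " ")
      for (i = 2; i <= n; i++)
        if (f[i] ~ /bits\/sec$/) {      # unit field, e.g. Mbits/sec or Gbits/sec
          print f[i-1], f[i]            # value and unit
          exit 0
        }
      exit 1
    }'
}
```

For anything long-lived, iperf3 -J (JSON output) is sturdier than scraping text, because the column layout can change between versions.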

The interesting numbers:

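Bitrate (labelled Bandwidth in older iperf3 releases): throughput. Mbps for residential, Gbps for fast LAN / datacenter. The sender line counts what was sent and the receiver line what actually arrived; the receiver figure is the one to quote.
Retr: TCP retransmissions, shown on the sender line. Near zero is normal on a healthy wired LAN; a noisy Wi-Fi link can show dozens. A few are expected on any path with a congested bottleneck, since loss-based congestion control (CUBIC, the Linux default) finds capacity by pushing until packets drop. Persistently high counts on a lightly loaded link mean real packet loss: investigate the physical layer (cabling, duplex, Wi-Fi interference).
Cwnd (per-interval lines, on Linux): the sender's congestion window. A stream can have at most one window of data in flight per round trip, so single-stream throughput is bounded by window_size / RTT, where the effective window is the smaller of the congestion window and the receiver's advertised window. On a long-fat path (high bandwidth-delay product) a small receive buffer caps throughput well below link rate. Raise it via sysctl (net.core.rmem_max / net.ipv4.tcp_rmem), request a larger socket buffer with iperf3 -w, or use parallel streams.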
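The window_size / RTT bound is easy to check by hand. Here is a small sketch; the function names and example figures are chosen for illustration. A 64 KiB window at 50 ms RTT caps a single stream at roughly 10.5 Mbit/s. Filling a 1 Gbit/s path at 50 ms needs about 6.25 MB in flight.

```shell
#!/usr/bin/env bash
# max_tput_mbps WINDOW_BYTES RTT_MS
# Upper bound for one TCP stream: one window per round trip, in Mbit/s.
max_tput_mbps() {
  awk -v w="$1" -v rtt="$2" 'BEGIN { printf "%.2f\n", (w * 8) / (rtt / 1000) / 1e6 }'
}

# bdp_bytes RATE_MBPS RTT_MS
# Bandwidth-delay product: bytes that must be in flight to keep the link full.
bdp_bytes() {
  awk -v r="$1" -v rtt="$2" 'BEGIN { printf "%.0f\n", (r * 1e6) * (rtt / 1000) / 8 }'
}
```

Linux reserves part of each socket buffer for bookkeeping overhead, so set the tcp_rmem maximum comfortably above the BDP rather than exactly at it.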

The throughput number is incomplete

"How fast" usually conflates two questions:

Capacity — how many bits per second can this link carry? iperf3 measures this.
Quality under load — what does latency look like when the link is being used? iperf3 doesn't measure this; flent does.

A 1 Gbps fiber link with poor buffering (high bufferbloat) can have 200–500 ms of added latency when saturated — making video calls unusable while someone downloads. The same link with proper queue management (CAKE, fq_codel) might have <5 ms of added latency at the same throughput.
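The added latency comes straight from queue arithmetic: a packet arriving behind a full buffer waits for that buffer to drain at the bottleneck rate. The sketch below uses illustrative numbers, not measurements, and its function name is hypothetical. With 1 MB queued on a 20 Mbit/s uplink, a new packet waits 400 ms. It takes 62.5 MB of buffering to add 500 ms at 1 Gbit/s.

```shell
#!/usr/bin/env bash
# queue_delay_ms QUEUED_BYTES RATE_MBPS
# Time for a full FIFO to drain at the bottleneck rate; every new packet
# (a VoIP frame, a game update) waits behind this much data.
queue_delay_ms() {
  awk -v q="$1" -v r="$2" 'BEGIN { printf "%.0f\n", (q * 8) / (r * 1e6) * 1000 }'
}
```

The same buffer size hurts far more on a slow link, which is one reason bufferbloat is often worst in the upload direction.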

flent: install + run

flent is a Python tool that orchestrates concurrent tests (the bulk TCP flows are mostly driven by netperf, with ping, fping, or irtt measuring latency) and produces graphs of latency vs throughput vs time:

# Debian / Ubuntu
sudo apt install flent netperf

# macOS
brew install flent netperf

# Or from PyPI with pipx (netperf still has to be installed separately)
pipx install flent

flent needs a netperf server reachable from the client: netserver, which ships with the netperf package. Either run netserver on a host you control or use one of the public flent / netperf test servers (the project lists some).
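A quick pre-flight check saves a confusing flent failure. This sketch assumes bash, because it uses the /dev/tcp pseudo-device, and netserver's default control port, 12865. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# On the server host, start the netperf daemon (listens on TCP 12865 by default):
#   netserver
#
# netserver_reachable HOST [PORT]
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
netserver_reachable() {
  local host=$1 port=${2:-12865}
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Usage: netserver_reachable <netperf-server> && flent rrul -H <netperf-server> .... A reachable control port does not prove the test will work. netperf opens separate data connections on other ports, and a firewall may still block those.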

The RRUL test

"Realtime Response Under Load" — the canonical bufferbloat measurement. Saturates the link in both directions with multiple flows while measuring ICMP and UDP latency on a separate flow throughout:

# Run RRUL against a netperf server (60 s by default; -l sets the length in seconds)
flent rrul -H <netperf-server> -t "Description of test environment" -p all_scaled -o rrul.png

# Output: a .flent.gz results file (gzip-compressed JSON) + the rrul.png chart
ls *.flent.gz
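flent keeps the raw data, so you can re-plot a run later without repeating the test: flent -i reads a saved .flent.gz instead of running anything. The sketch below picks the newest results file in a directory. The helper name and the output filename are hypothetical.

```shell
#!/usr/bin/env bash
# latest_flent [DIR]
# Print the most recently modified *.flent.gz in DIR (default: current dir).
latest_flent() {
  local dir=${1:-.}
  ls -t "$dir"/*.flent.gz 2>/dev/null | head -n 1
}

# Example: re-render the newest run with the same plot type.
#   flent -i "$(latest_flent .)" -p all_scaled -o rrul-replot.png
```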

The chart shows:

Upper panels: download and upload throughput over time. These should approach link capacity within a few seconds.
Bottom panel: ICMP and UDP latency over the same period. This is the important number.

On a well-queue-managed link, latency stays within a few ms of idle even at full load. On a bufferbloated link, latency rises to hundreds of ms.

Interpreting the latency numbers

<5 ms added under load — excellent. Likely running CAKE or fq_codel queue discipline.
5–30 ms added — good. Most users won't notice.
30–100 ms added — mediocre. Video calls suffer; gaming becomes unpleasant.
>100 ms added — bufferbloat. Anything time-sensitive (calls, games, real-time apps) breaks when the link is loaded.
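To apply these bands to your own numbers, compare a loaded run against an idle baseline. This is a sketch under stated assumptions. The function name is hypothetical. The inputs are latency figures you read off flent, for example a median from the ping test and one from RRUL. The boundaries follow the list above, with one choice made explicit: exactly 5 ms counts as good, and exactly 30 ms or 100 ms counts as mediocre.

```shell
#!/usr/bin/env bash
# classify_bloat IDLE_MS LOADED_MS
# Maps added latency (loaded minus idle) onto the bands listed above.
classify_bloat() {
  awk -v idle="$1" -v loaded="$2" 'BEGIN {
    added = loaded - idle
    if (added < 5)         band = "excellent"
    else if (added < 30)   band = "good"
    else if (added <= 100) band = "mediocre"
    else                   band = "bufferbloat"
    printf "%.1f ms added: %s\n", added, band
  }'
}
```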

If you see >30 ms added in flent, enable a modern queue discipline on the router. CAKE is the gold standard; OpenWrt ships it as a per-interface shaper through its SQM package, while OPNsense (see that tutorial) offers FQ-CoDel in its traffic shaper rather than CAKE. Set the shaper rate slightly below the throughput you actually measured, not the ISP-promised rate, so the queue builds in the router, where the qdisc can manage it, rather than in the ISP modem.
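On a plain Linux router without OpenWrt's SQM, CAKE can be attached with tc from iproute2. This sketch only prints the command. The interface names, the 90% default, and the helper name are assumptions for illustration. It shapes egress (upload) only; shaping the download direction needs an ingress setup such as an IFB device, which OpenWrt's SQM handles for you.

```shell
#!/usr/bin/env bash
# cake_cmd IFACE MEASURED_MBIT [PERCENT]
# Print (not run) a tc command putting CAKE on IFACE's egress, shaped to
# PERCENT (default 90, an assumed starting point) of the measured rate.
# MEASURED_MBIT must be an integer.
cake_cmd() {
  local iface=$1 measured=$2 pct=${3:-90}
  local rate
  rate=$(( measured * pct / 100 ))
  echo "tc qdisc replace dev ${iface} root cake bandwidth ${rate}mbit"
}
```

The printed command needs root and a kernel with sch_cake (mainline since Linux 4.19). Re-run flent after each rate change; if latency under load is still high, lower the percentage.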

The other flent tests worth knowing

# 4 TCP streams up + 4 down, all best effort (RRUL without DiffServ marking)
flent rrul_be -H <server>

# Ping only, with no competing load: the idle baseline to compare loaded runs against
flent ping -H <server>

# RTT fairness: competing TCP flows to hosts with different round-trip times (pass -H once per host)
flent rtt_fair -H <server>

# Single TCP download stream
flent tcp_download -H <server>
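Every loaded result is judged against the idle latency, so it helps to run the baseline first and tag related runs alike. The sketch below uses only tests listed in this section, in an order chosen for illustration. The function name and the DRY_RUN switch are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# run_flent_suite SERVER TITLE
# Idle baseline first, then loaded tests, all tagged with the same -t title
# so the resulting .flent.gz files are easy to match up later.
# DRY_RUN=1 prints the commands instead of running them.
run_flent_suite() {
  local server=$1 title=$2 t
  for t in ping tcp_download rrul; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'flent %s -H %s -t %q\n' "$t" "$server" "$title"
    else
      flent "$t" -H "$server" -t "$title" || return
    fi
  done
}
```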

iperf3 server caveats

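Before version 3.16, iperf3 ran every stream of a test in a single thread, even with -P, so on multi-Gbps LAN tests one CPU core can bottleneck before the NIC. Upgrade to 3.16 or later (one thread per stream), run several server instances on different ports (iperf3 -s -p 5202, iperf3 -s -p 5203) with one client per port, or use iperf2, which is multi-threaded. Note that iperf3 -s -1 only makes the server exit after a single test; it adds no threads. A single iperf3 server also handles only one test at a time.
UDP tests: iperf3 reports loss + jitter. The default UDP rate is only 1 Mbit/s, so always pass -b. UDP at line rate often overflows the receiving socket buffer; on the receiver, watch the Udp InErrors and RcvbufErrors counters in /proc/net/snmp (nstat shows them as UdpInErrors / UdpRcvbufErrors).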
Run iperf3 server on the box that's actually serving your real traffic, not on a separate host — the path to a measurement-dedicated box might not match the production path.
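The /proc/net/snmp counters are awkward to read by eye because names and values sit on separate lines. Below is a bash/awk sketch; the function names are hypothetical. It looks up a Udp counter by name and reports how much that counter grew while a command ran.

```shell
#!/usr/bin/env bash
# udp_counter FIELD [FILE]
# /proc/net/snmp prints a "Udp:" header line of field names followed by a
# "Udp:" line of values; match FIELD in the first and print it from the second.
udp_counter() {
  local field=$1 file=${2:-/proc/net/snmp}
  awk -v want="$field" '
    $1 == "Udp:" && !seen {                 # first Udp: line holds field names
      for (i = 2; i <= NF; i++) name[i] = $i
      seen = 1
      next
    }
    $1 == "Udp:" {                          # second Udp: line holds values
      for (i = 2; i <= NF; i++)
        if (name[i] == want) { print $i; found = 1 }
      exit
    }
    END { exit !found }
  ' "$file"
}

# udp_delta FIELD COMMAND...
# Run COMMAND and print how much FIELD increased while it ran.
udp_delta() {
  local field=$1; shift
  local before after
  before=$(udp_counter "$field") || return
  "$@"
  after=$(udp_counter "$field") || return
  echo "$field +$(( after - before ))"
}
```

Run it on the receiving host. With -R the client is the receiver, so udp_delta RcvbufErrors iperf3 -c <server-ip> -u -b 1G -R counts drops on the machine you are typing on.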

Other tools in the kit

mtr — combined traceroute + ping. Shows per-hop loss + latency. The first tool to reach for when a path is misbehaving.
iftop / nethogs / bmon — live per-flow / per-process / per-interface bandwidth views.
tcptrace / tcpdump (see that tutorial) — analyze captured pcaps for retransmission patterns.
speedtest-cli / fast-cli — CDN-served speed tests; quick approximations but less rigorous than iperf3 to a known endpoint.

One workflow that pays off

When investigating a "the network is slow" complaint:

mtr first: is there packet loss? Loss that starts at one hop and persists through every later hop, including the destination, points to a problem at or just before that hop. Loss shown only at an intermediate hop is usually that router rate-limiting its ICMP replies, not real loss.
iperf3: what's the actual throughput on a clean test? Is it close to the link's rated speed?
flent rrul: what's the latency under load? Is bufferbloat the actual root cause?
tcpdump on a specific connection: do application-level retries or TLS handshake failures explain the slowness users see?
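The first three steps can be strung together in order. This sketch assumes a single test host that runs both iperf3 -s and netserver; split them if yours differ. The function names and the DRY_RUN switch are hypothetical. The tcpdump step stays manual because it needs a specific connection to filter on.

```shell
#!/usr/bin/env bash
# step COMMAND...: run COMMAND, or just print it when DRY_RUN=1.
step() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# triage HOST: path check, clean throughput both ways, then latency under load.
triage() {
  local host=$1
  step mtr -r -w -c 100 "$host" || return          # per-hop loss and latency report
  step iperf3 -c "$host" -t 30 || return           # client -> server throughput
  step iperf3 -c "$host" -t 30 -R || return        # server -> client throughput
  step flent rrul -H "$host" -p all_scaled -o "rrul-$host.png"   # bufferbloat check
}
```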

Many "the internet is slow" complaints turn out to be bufferbloat: throughput is fine, but interactive latency under load is poor. Once you've measured it, the fix is often a single queue-discipline change on the router.