iperf3: install + run
# Debian / Ubuntu
sudo apt install iperf3
# macOS
brew install iperf3
# Arch / Alpine / Fedora similar packages
iperf3 is symmetric — one side runs as server, the other as client; either side can be the source of the flow.
# On the server end (anywhere on your LAN, or a public server you control)
iperf3 -s
# On the client end
iperf3 -c <server-ip>
# Sends from client to server for 10 seconds; reports throughput
# Reverse direction: server sends to client
iperf3 -c <server-ip> -R
# Longer test
iperf3 -c <server-ip> -t 60
# Multiple parallel streams (often hits link cap better than 1 stream)
iperf3 -c <server-ip> -P 4
# UDP test at 100 Mbps (TCP is default)
iperf3 -c <server-ip> -u -b 100M
# Bidirectional simultaneous
iperf3 -c <server-ip> --bidir
How to read iperf3 output
$ iperf3 -c speedtest.example.com -t 30 -P 4
Connecting to host speedtest.example.com, port 5201
[ 5] local 10.0.5.10 port 54321 connected to 203.0.113.5 port 5201
...
[SUM] 0.00-30.00 sec 1.20 GBytes 343 Mbits/sec retr 1.5
[SUM] 0.00-30.04 sec ... receiver
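For scripted runs it can help to pull just the end-of-test figures out of the text report. The following is a minimal sketch, assuming the usual layout in which each summary line ends in sender or receiver and the rate is the field just before a unit such as Mbits/sec. iperf3_summary is a hypothetical helper name, not part of iperf3. For anything long-lived, iperf3's JSON output (-J) is likely a sturdier thing to parse.

```shell
# Print "direction rate unit" for every end-of-test summary line on stdin.
# Assumption: summary lines end with the word "sender" or "receiver".
iperf3_summary() {
    awk '$NF == "sender" || $NF == "receiver" {
        # Find the unit field (e.g. Mbits/sec); the rate is the field before it.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /bits\/sec$/) { print $NF, $(i - 1), $i; break }
        }
    }'
}
```

A possible use with parallel streams, keeping only the totals: iperf3 -c <server-ip> -P 4 | grep '^\[SUM\]' | iperf3_summary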
The interesting numbers:
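- Bandwidth (shown as Bitrate in recent releases): throughput. Expect Mbps on residential links, Gbps on a fast LAN or in a datacenter.
- Retr: TCP retransmissions, counted on the sending side. Should be near zero on a healthy wired LAN; a noisy Wi-Fi link can show dozens. A high count means segments are being lost, either to a bad physical layer or to queue drops under congestion; rule out the physical layer first.
- Window size: on Linux, iperf3 prints the sender's congestion window as Cwnd on each interval line, and -w sets the socket buffer size. If the usable window is small and the link is long-fat (high bandwidth-delay product), the achievable single-stream throughput is bounded by window_size / RTT. Raise the limits via sysctl (net.core.rmem_max, net.ipv4.tcp_rmem) or use parallel streams.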
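The window_size / RTT bound is easy to put numbers on. The sketch below is plain arithmetic, not an iperf3 feature; max_tcp_mbps and bdp_kib are hypothetical helper names, and the units (KiB, ms, Mbit/s) are choices made for this example.

```shell
# Upper bound on single-stream TCP throughput, in Mbit/s.
# Arguments: window in KiB, round-trip time in ms.
max_tcp_mbps() {
    awk -v win_kib="$1" -v rtt_ms="$2" 'BEGIN {
        bits = win_kib * 1024 * 8
        printf "%.1f\n", bits / (rtt_ms / 1000) / 1000000
    }'
}

# Window needed to fill a link (the bandwidth-delay product), in KiB.
# Arguments: link rate in Mbit/s, round-trip time in ms.
bdp_kib() {
    awk -v mbps="$1" -v rtt_ms="$2" 'BEGIN {
        printf "%.0f\n", mbps * 1000000 * (rtt_ms / 1000) / 8 / 1024
    }'
}
```

For example, max_tcp_mbps 64 50 prints 10.5: a 64 KiB window over a 50 ms path cannot exceed roughly 10.5 Mbit/s per stream, whatever the link speed. Going the other way, bdp_kib 1000 50 prints 6104, so filling 1 Gbit/s at 50 ms takes a window of about 6 MiB.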
The throughput number is incomplete
"How fast" usually conflates two questions:
- Capacity — how many bits per second can this link carry? iperf3 measures this.
- Quality under load — what does latency look like when the link is being used? iperf3 doesn't measure this; flent does.
A 1 Gbps fiber link with poor buffering (high bufferbloat) can have 200–500 ms of added latency when saturated — making video calls unusable while someone downloads. The same link with proper queue management (CAKE, fq_codel) might have <5 ms of added latency at the same throughput.
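The added latency comes from data waiting in a queue: a full buffer of B bytes draining at R bits per second delays the packet at the back by B × 8 / R seconds. A minimal sketch of that arithmetic follows; queue_delay_ms is a hypothetical helper, and the buffer sizes in the example are illustrative rather than measured.

```shell
# Queueing delay added by a full buffer, in ms.
# Arguments: buffer size in KiB, rate the buffer drains at in Mbit/s.
queue_delay_ms() {
    awk -v kib="$1" -v mbps="$2" 'BEGIN {
        printf "%.0f\n", kib * 1024 * 8 / (mbps * 1000000) * 1000
    }'
}
```

queue_delay_ms 1024 20 prints 419: a 1 MiB buffer in front of a 20 Mbit/s bottleneck holds about 0.4 s of traffic once it fills. The same buffer draining at 1000 Mbit/s (queue_delay_ms 1024 1000) adds only about 8 ms, which is why the slowest hop on the path tends to dominate.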
flent: install + run
flent is a Python tool that orchestrates concurrent tests (typically using netperf or iperf3 as the data movers) and produces graphs of latency vs throughput vs time:
# Debian / Ubuntu
sudo apt install flent netperf
# macOS
brew install flent netperf
# Or via pip
pipx install flent
flent needs a netperf server reachable from the client. Either set up your own (install netperf on the far host and start netserver there) or use one of the public flent / netperf test servers (the project lists some).
The RRUL test
"Realtime Response Under Load" — the canonical bufferbloat measurement. Saturates the link in both directions with multiple flows while measuring ICMP and UDP latency on a separate flow throughout:
# Run RRUL against a netperf server
flent rrul -H <netperf-server> -t "Description of test environment" -p all_scaled -o rrul.png
# Output: a .flent.gz results file (gzip-compressed JSON) + the rrul.png chart
ls *.flent.gz
The chart shows:
- Upper panels: download and upload throughput over time. Both should approach link capacity within a few seconds.
- Bottom panel: ICMP and UDP latency during the same time period. This is the important number.
On a well-queue-managed link, latency stays within a few ms of idle even at full load. On a bufferbloated link, latency rises to hundreds of ms.
Interpreting the latency numbers
- <5 ms added under load — excellent. Likely running CAKE or fq_codel queue discipline.
- 5–30 ms added — good. Most users won't notice.
- 30–100 ms added — mediocre. Video calls suffer; gaming becomes unpleasant.
- >100 ms added — bufferbloat. Anything time-sensitive (calls, games, real-time apps) breaks when the link is loaded.
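"Added" here means latency under load minus idle latency on the same path. The bands above can be sketched as a small helper; classify_bloat is a hypothetical name, and because the listed ranges share their boundary values, placing exactly 30 ms in "good" and exactly 100 ms in "mediocre" is an assumption of this example.

```shell
# Classify added latency under load using the bands listed above.
# Arguments: idle latency in ms, latency under load in ms.
classify_bloat() {
    awk -v idle="$1" -v loaded="$2" 'BEGIN {
        added = loaded - idle
        if (added < 5)         print "excellent"
        else if (added <= 30)  print "good"
        else if (added <= 100) print "mediocre"
        else                   print "bufferbloat"
    }'
}
```

With a 12 ms idle ping, classify_bloat 12 15 prints excellent and classify_bloat 12 300 prints bufferbloat.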
If you see >30 ms added in flent, enable a modern queue discipline on the router. CAKE is the gold standard and is available in OpenWrt (through the SQM package) as a per-interface shaper; OPNsense (see that tutorial) is FreeBSD-based and offers FQ-CoDel in its traffic shaper instead. Configure the shaper with your actual link rate, typically slightly below the ISP-promised rate, so that the queue builds in the router rather than in the ISP modem.
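On a plain Linux router with the sch_cake module and iproute2's tc, the shaper setup might look like the sketch below. The helper only prints the command so it can be reviewed before it is run as root. cake_cmd is a hypothetical name, eth0 stands in for the WAN interface, and the 95% figure is an assumption standing in for "slightly below" the measured rate.

```shell
# Print (do not run) a tc command that shapes egress with CAKE.
# Arguments: interface, measured link rate in Mbit/s, percent of that rate to shape to.
cake_cmd() {
    awk -v dev="$1" -v mbps="$2" -v pct="$3" 'BEGIN {
        printf "tc qdisc replace dev %s root cake bandwidth %dkbit\n", dev, mbps * 1000 * pct / 100
    }'
}
```

cake_cmd eth0 100 95 prints tc qdisc replace dev eth0 root cake bandwidth 95000kbit. This covers the upload direction only; shaping downloads on the same box usually means redirecting ingress traffic through an IFB device and attaching a second CAKE instance there.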
The other flent tests worth knowing
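# List the tests your flent version ships (names vary between releases)
flent --list-tests
# Just throughput (4 TCP streams up + 4 down); confirm the test name against --list-tests
flent tcp_4up_4down -H <server>
# Latency only, no generated load (an idle baseline to compare against RRUL)
flent ping -H <server>
# RTT fairness: how flows to servers at different RTTs share the link
flent rtt_fair -H <server-1> -H <server-2>
# Single TCP download stream with a concurrent ping
flent tcp_download -H <server>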
iperf3 server caveats
- Before version 3.16, iperf3 is single-threaded: every stream in a test shares one CPU core, so on multi-Gbps LAN tests the CPU may bottleneck before the NIC. Upgrade to 3.16 or later (one thread per stream), run several server processes on different ports (iperf3 -s -p <port>) with one client each, or use iperf2 if you need a multi-threaded server. iperf3 -s -1 makes the server exit after a single test, which is handy when a script starts a fresh server per run.
- UDP tests: iperf3 reports loss + jitter. UDP at line rate often overflows kernel UDP buffers; watch /proc/net/snmp for the Udp InErrors and RcvbufErrors counters climbing.
- Run the iperf3 server on the box that's actually serving your real traffic, not on a separate host; the path to a measurement-dedicated box might not match the production path.
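The UDP counters mentioned above sit in /proc/net/snmp as a pair of Udp: lines, a header row followed by a value row. A minimal sketch for reading them on Linux follows; udp_errors is a hypothetical helper, and it assumes the kernel exposes the InErrors and RcvbufErrors columns.

```shell
# Print UDP InErrors and RcvbufErrors from a /proc/net/snmp-style file.
# Argument: file to read (defaults to /proc/net/snmp).
udp_errors() {
    awk '$1 == "Udp:" {
        # First Udp: line is the header; remember each column position.
        if (!seen) { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        # Second Udp: line carries the values.
        print "InErrors=" $(col["InErrors"]), "RcvbufErrors=" $(col["RcvbufErrors"])
    }' "${1:-/proc/net/snmp}"
}
```

Run udp_errors on the receiving host before and after a UDP test. If the numbers grew during the run, part of the loss iperf3 reported happened in the receiver's kernel rather than on the network.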
Other tools in the kit
- mtr — combined traceroute + ping. Shows per-hop loss + latency. The first tool to reach for when a path is misbehaving.
- iftop / nethogs / bmon — live per-flow / per-process / per-interface bandwidth views.
- tcptrace / tcpdump (see that tutorial) — analyze captured pcaps for retransmission patterns.
- speedtest-cli / fast-cli — CDN-served speed tests; quick approximations but less rigorous than iperf3 to a known endpoint.
One workflow that pays off
When investigating a "the network is slow" complaint:
- mtr first: is there packet loss on a specific hop? Loss that starts at one hop and persists through every later hop points at that hop or the link into it. Loss shown at a single intermediate hop only is usually that router rate-limiting its ICMP replies, not a real problem.
- iperf3: what's the actual throughput on a clean test? Is it close to the link's rated speed?
- flent rrul: what's the latency under load? Is bufferbloat the actual root cause?
- tcpdump on a specific connection: do application-level retries or TLS handshake failures explain why it feels slow to the user?
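The first three steps lend themselves to a small wrapper. What follows is a sketch under stated assumptions: one host runs both an iperf3 server and a netperf server, the test durations are arbitrary, and triage and run are hypothetical helper names. With DRY_RUN=1 set, the commands are printed instead of executed.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Path check, throughput in both directions, then latency under load.
# Argument: host running both an iperf3 server and a netperf server.
triage() {
    host=$1
    run mtr --report --report-wide --report-cycles 20 "$host"
    run iperf3 -c "$host" -t 30
    run iperf3 -c "$host" -t 30 -R
    run flent rrul -H "$host" -l 60 -t triage -p all_scaled -o "rrul-$host.png"
}
```

DRY_RUN=1 triage 192.0.2.10 lists the four commands without touching the network. The tcpdump step is left out on purpose, since which connection to capture depends on what the earlier steps turn up.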
Many "the internet is slow" complaints turn out to be bufferbloat: throughput is fine, but interactive latency under load is poor. Once you measure it, the fix is usually one queue-discipline configuration on the router.