Add network throughput diagnostic skill (Sam & Claude)

Document a clean osa-to-debby HTTPS download test with synchronized pcaps, counters, Hermes summaries, graph inputs, and mandatory cleanup.

---
Build: pass | Tests: pass — 2456 passed (182 files)
Operator & Codex 2026-05-15 12:27:53 +02:00
parent 8637c52e39
commit 466ebefd63


@ -0,0 +1,495 @@
---
name: network-throughput
description: Run a clean osa-to-debby HTTPS network throughput test with packet captures, counters, cleanup, and Hermes graph-ready artifacts
compatibility: FreeBSD 15.x server, Linux client
invoke_patterns:
- "Run throughput test"
- "Run network throughput test"
- "Test download speed"
- "Capture pcaps on both sides"
- "Diagnose image download"
- "Network throughput test"
- "Network troughput test"
estimated_tokens: 1400-2200
---
# network-throughput
Run one clean HTTPS image download from the osa FreeBSD server to debby, with packet capture and counters on both sides, to separate PF/PMTU/MSS/server behavior from access-path/bufferbloat behavior.
Main rule:
> One download. Captures first. No overwrite. Summaries first. Cleanup after.
## Definitions
Record these before the run so there is no ambiguity:
- `client`: debby
- `client network`: house Wi-Fi | hotspot | wired | other
- `server`: osa.smilepowered.org
- `server public IP`: 51.83.197.148
- `server Tailscale IP`: 100.72.229.63
- `URL`: exact image URL under test
- `downloader`: curl or wget only; prefer curl with `--http1.1`
Terminology:
- "osa hotspot" means the phone hotspot/mobile Wi-Fi path.
- "osa server" means `osa.smilepowered.org`, FreeBSD 15, public `51.83.197.148`, Tailscale `100.72.229.63`.
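One way to record these values is a small key=value file kept next to the other artifacts. This is only a sketch: the helper name `write_definitions` and the file name `definitions.txt` are assumptions, not part of the skill.

```sh
# Hypothetical helper: print the run definitions as key=value lines.
# Arguments: client network path, URL under test, downloader.
write_definitions() {
    printf 'client=%s\n' "debby"
    printf 'client_network=%s\n' "$1"
    printf 'server=%s\n' "osa.smilepowered.org"
    printf 'server_public_ip=%s\n' "51.83.197.148"
    printf 'server_tailscale_ip=%s\n' "100.72.229.63"
    printf 'url=%s\n' "$2"
    printf 'downloader=%s\n' "$3"
}

# Usage, once CLIENT_DIR and URL from the Variables section are set:
#   write_definitions "house Wi-Fi" "$URL" "curl --http1.1" > "$CLIENT_DIR/definitions.txt"
```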
## Safety Rules
- Ask before starting any privileged capture or firewall-visible test.
- Use one test download only. Do not run Firefox/browser downloads in parallel.
- Use project-local scratch directories, not system `/tmp` or `/var/tmp`.
- On FreeBSD, run root commands in the visible tmux root window when one is available.
- Do not leave large pcaps or duplicate downloaded images behind after analysis.
- Do not change PF during the measurement unless the purpose of the test is explicitly PF comparison.
- Use UTC timestamps at before-counters, pcap start, download start, download stop, and after-counters.
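The timestamp rule can be wrapped in a small helper so every marker has the same format. This is a sketch; `stamp` is a hypothetical name. It always appends, so it also works for the first marker in a fresh directory.

```sh
# Hypothetical helper: print "<label>_utc=<UTC timestamp>" and append it to a file.
# Arguments: label (for example download_start), path to timestamps.txt.
stamp() {
    label="$1"
    file="$2"
    date -u "+${label}_utc=%Y-%m-%dT%H:%M:%SZ" | tee -a "$file"
}

# Usage:
#   stamp download_start "$CLIENT_DIR/timestamps.txt"
```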
## Variables
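Set these before the run. Generate `TEST_ID` once and reuse the same literal value on both hosts; evaluating `date` separately on each side produces two different IDs, and the artifacts no longer pair up.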
### Server / osa
```sh
TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
SERVER_IF="vtnet0"
SERVER_DIR="/home/clawdie/clawdie-iso/tmp/network-tests/$TEST_ID"
CLIENT_IP="<debby-public-ip-if-known>"
SERVER_IP="51.83.197.148"
SERVER_TS_IP="100.72.229.63"
DURATION_SEC="600"
```
### Linux client / debby / Hermes
Run from the Hermes project/repo root, then keep artifacts under that repo `tmp/` directory:
```sh
TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
CLIENT_DIR="$PWD/tmp/network-tests/$TEST_ID"
URL="https://osa.smilepowered.org/downloads/iso/clawdie-xfce-operator-usb-fbsd15.0-amd64-15.maj.2026.img.gz"
SHA_URL="$URL.sha256"
SERVER_IP="51.83.197.148"
SERVER_TS_IP="100.72.229.63"
DURATION_SEC="600"
```
## Preflight
### Server before-counters
```sh
mkdir -p "$SERVER_DIR"
date -u '+before_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$SERVER_DIR/timestamps.txt"
df -h /home/clawdie > "$SERVER_DIR/df-before.txt"
ifconfig "$SERVER_IF" > "$SERVER_DIR/ifconfig-server-if.txt"
netstat -rn > "$SERVER_DIR/routes.txt"
/sbin/pfctl -si -v > "$SERVER_DIR/pf-before.txt"
netstat -s -p tcp > "$SERVER_DIR/tcp-before.txt"
netstat -s -p ip > "$SERVER_DIR/ip-before.txt"
netstat -s -p icmp > "$SERVER_DIR/icmp-before.txt"
netstat -s -p icmp6 > "$SERVER_DIR/icmp6-before.txt"
```
Also save the TCP lines we care about most:
```sh
netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts?|resends? initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-before-key-lines.txt"
```
### Linux client preflight
```sh
mkdir -p "$CLIENT_DIR"
date -u '+client_preflight_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$CLIENT_DIR/timestamps.txt"
df -h . > "$CLIENT_DIR/df-before.txt"
ip addr > "$CLIENT_DIR/ip-addr.txt"
ip route > "$CLIENT_DIR/ip-route.txt"
curl -4 -s https://ifconfig.me > "$CLIENT_DIR/public-ip.txt"
ip route get "$SERVER_IP" > "$CLIENT_DIR/route-to-osa-public.txt"
```
Pick the outbound interface from `route-to-osa-public.txt`:
```sh
CLIENT_IF="<interface-from-ip-route-get>"
```
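The interface can also be read from the saved route file instead of by eye. A minimal sketch, assuming the usual `ip route get` output in which the interface name follows the word `dev`; `route_dev` is a hypothetical helper name.

```sh
# Hypothetical helper: print the word that follows "dev" in `ip route get` output (stdin).
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage:
#   CLIENT_IF=$(route_dev < "$CLIENT_DIR/route-to-osa-public.txt")
```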
## Packet Capture
Start both captures before the download. Wait until tcpdump prints `listening on ...` on both sides.
Avoid tiny rotating pcap rings. For a 5-10 minute test, either use a single non-overwriting pcap or enough ring files to preserve the beginning/SYN.
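A quick capacity check shows whether the ring can wrap during the run. The sketch below assumes `-C` sizes are roughly megabytes and that the capture rate equals the download rate; `ring_seconds` is a hypothetical helper name.

```sh
# Hypothetical helper: seconds until a tcpdump ring wraps and overwrites its first file.
# Arguments: file size from -C (MB), file count from -W, capture rate (Mbit/s).
ring_seconds() {
    echo $(( $1 * $2 * 8 / $3 ))
}

ring_seconds 200 20 100   # 4000 MB ring at 100 Mbit/s, prints 320
```

At 100 Mbit/s the `-C 200 -W 20` ring used below wraps after about 320 seconds, which is shorter than a 600-second sample, so raise `-W` or shorten the run at that speed.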
### Server capture
If client public IP is known, prefer this filter:
```sh
date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
-s 0 \
-C 200 -W 20 \
-w "$SERVER_DIR/osa-vtnet0-download.pcap" \
"(host $CLIENT_IP and tcp port 443) or icmp or icmp6"
```
If client public IP is not known yet, temporarily capture all HTTPS plus ICMP/PTB signals:
```sh
date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
-s 0 \
-C 200 -W 20 \
-w "$SERVER_DIR/osa-vtnet0-download.pcap" \
"tcp port 443 or icmp or icmp6"
```
### Linux client capture
Run in a root shell or with sudo, but store output under the Hermes project `tmp/` directory:
```sh
date -u '+client_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$CLIENT_DIR/timestamps.txt"
sudo /usr/bin/timeout 900 tcpdump -ni "$CLIENT_IF" \
-s 0 \
-C 200 -W 20 \
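-w "$CLIENT_DIR/debby-osa-public.pcap" \
"(host $SERVER_IP and tcp port 443) or icmp or icmp6"
# ICMP is matched without the host clause so frag-needed/PTB errors from intermediate routers are captured too.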
```
## Client Latency Monitor
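Run this on debby while the download runs. It pings the default gateway, 1.1.1.1, the osa Tailscale IP, and `100.103.255.41` (apparently the domedog row of the result template) in short bursts, which helps identify access-path/bufferbloat symptoms.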
```sh
GW=$(ip route show default | awk '{print $3; exit}')
while true; do
date -u '+utc=%Y-%m-%dT%H:%M:%SZ'
for t in "$GW" 1.1.1.1 "$SERVER_TS_IP" 100.103.255.41; do
echo "### $t"
ping -c 5 -i 0.2 -W 2 "$t"
done
sleep 5
done | tee "$CLIENT_DIR/ping-monitor.log"
```
Stop it when the download stops.
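The log can be reduced to one line per ping burst for the latency rows of the report. A minimal sketch, assuming the iputils summary line `rtt min/avg/max/mdev = ...`; `ping_summary` is a hypothetical helper name. Bursts with 100% loss have no such line and produce no output.

```sh
# Hypothetical helper: print "<target> <avg_ms> <max_ms>" for every ping burst in the log.
# Argument: path to ping-monitor.log.
ping_summary() {
    awk '
        /^### / { target = $2 }               # header written by the monitor loop
        /^rtt min\/avg\/max\/mdev = / {       # iputils summary line
            split($4, rtt, "/")               # rtt[2] is avg, rtt[3] is max
            print target, rtt[2], rtt[3]
        }
    ' "$1"
}

# Usage:
#   ping_summary "$CLIENT_DIR/ping-monitor.log"
```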
## Download Test
Use a CLI downloader, not a browser. For diagnosis, default to a fresh output file and no resume. Prefer `curl --http1.1` so the test is one simple TCP flow and avoids HTTP/2 multiplexing noise.
### Full artifact download
```sh
cd "$CLIENT_DIR"
date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
curl -L \
--fail \
--http1.1 \
--output clawdie-test.img.gz \
--write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
"$URL" \
2>&1 | tee curl-download.log
date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
```
### Bounded 5-10 minute sample
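Use this when disk or time is limited. A timeout exit is acceptable; we care about behavior during the window. Because `timeout` kills curl with SIGTERM, the `--write-out` summary is normally not printed in this mode, so take bytes and duration from the partial file size and `timestamps.txt`.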
```sh
cd "$CLIENT_DIR"
date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
timeout "$DURATION_SEC" curl -L \
--fail \
--http1.1 \
--output clawdie-test.img.gz \
--write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
"$URL" \
2>&1 | tee curl-download.log
date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
```
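For the report, average throughput can be derived from the curl log or, if the log has no `speed_download` line (for example when the run was cut short), from a byte count and a duration. This is a sketch; `curl_mbit` and `avg_mbit` are hypothetical helper names.

```sh
# Hypothetical helper: read speed_download (bytes per second) from the curl log, print Mbit/s.
# Argument: path to curl-download.log.
curl_mbit() {
    awk -F= '$1 == "speed_download" { printf "%.2f\n", $2 * 8 / 1000000 }' "$1"
}

# Hypothetical helper: Mbit/s from a byte count and a duration in seconds.
avg_mbit() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b * 8 / s / 1000000 }'
}

# Usage:
#   curl_mbit "$CLIENT_DIR/curl-download.log"
#   avg_mbit "$(wc -c < "$CLIENT_DIR/clawdie-test.img.gz")" "$DURATION_SEC"
```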
## Postflight
Stop both captures and the ping monitor first (Ctrl-C, or let `timeout 900` expire), then collect after-counters.
### Server after-counters
```sh
date -u '+after_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
/sbin/pfctl -si -v > "$SERVER_DIR/pf-after.txt"
netstat -s -p tcp > "$SERVER_DIR/tcp-after.txt"
netstat -s -p ip > "$SERVER_DIR/ip-after.txt"
netstat -s -p icmp > "$SERVER_DIR/icmp-after.txt"
netstat -s -p icmp6 > "$SERVER_DIR/icmp6-after.txt"
netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts?|resends? initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-after-key-lines.txt"
ls -lh "$SERVER_DIR" > "$SERVER_DIR/ls.txt"
```
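The before/after files can be turned into the deltas the report asks for. A minimal sketch, assuming each counter line starts with its number, as `netstat -s` output does; `counter_value` and `counter_delta` are hypothetical helper names. A pattern that matches nothing counts as 0.

```sh
# Hypothetical helper: first number on the first line matching an extended regex.
# Arguments: counter file, pattern.
counter_value() {
    grep -E -- "$2" "$1" | head -n 1 | awk '{ print $1 }'
}

# Hypothetical helper: after minus before for one counter line.
# Arguments: before file, after file, pattern.
counter_delta() {
    before=$(counter_value "$1" "$3")
    after=$(counter_value "$2" "$3")
    echo $(( ${after:-0} - ${before:-0} ))
}

# Usage:
#   counter_delta "$SERVER_DIR/tcp-before.txt" "$SERVER_DIR/tcp-after.txt" 'retransmit timeout'
```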
### Linux client postflight
```sh
ls -lh "$CLIENT_DIR" > "$CLIENT_DIR/ls.txt"
curl -L --fail -o "$CLIENT_DIR/expected.sha256" "$SHA_URL"
```
If the full image completed, verify checksum:
```sh
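cd "$CLIENT_DIR"
# The image was saved as clawdie-test.img.gz, so compare the digest only, not the file name inside expected.sha256.
expected=$(grep -Eio '[0-9a-f]{64}' expected.sha256 | head -n 1)
printf '%s  %s\n' "$expected" clawdie-test.img.gz | sha256sum -c -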
```
## Summary-First Exchange
Before copying raw pcaps around, generate text summaries on each side. Exchange summaries first; only copy raw pcaps if something suspicious needs deeper inspection.
```sh
PCAP="/path/to/file.pcap"
OUT="$PCAP.summary.txt"
{
echo "## capinfos"
capinfos "$PCAP"
echo
echo "## tcp conversations"
tshark -r "$PCAP" -q -z conv,tcp
echo
echo "## 1s io stats"
tshark -r "$PCAP" -q \
-z io,stat,1,'tcp','icmp || icmpv6','tcp.analysis.retransmission || tcp.analysis.fast_retransmission','tcp.analysis.duplicate_ack','tcp.analysis.zero_window'
echo
echo "## SYN MSS/window options"
tshark -r "$PCAP" -Y 'tcp.flags.syn == 1' -T fields \
-e frame.time_relative \
-e ip.src -e ip.dst \
-e tcp.srcport -e tcp.dstport \
-e tcp.flags.ack \
-e tcp.options.mss_val \
-e tcp.window_size_value \
-e tcp.options.wscale.shift \
-e tcp.options.sack_perm \
| head -100
echo
echo "## interesting TCP/ICMP"
tshark -r "$PCAP" \
-Y 'tcp.analysis.retransmission or tcp.analysis.fast_retransmission or tcp.analysis.duplicate_ack or tcp.analysis.zero_window or tcp.analysis.out_of_order or tcp.analysis.lost_segment or icmp or icmpv6' \
-T fields \
-e frame.time_relative \
-e frame.number \
-e ip.src -e ip.dst \
-e _ws.col.Protocol \
-e tcp.srcport -e tcp.dstport \
-e tcp.seq -e tcp.ack -e tcp.len \
-e tcp.analysis.retransmission \
-e tcp.analysis.fast_retransmission \
-e tcp.analysis.duplicate_ack \
-e tcp.analysis.duplicate_ack_num \
-e tcp.analysis.zero_window \
-e tcp.analysis.out_of_order \
-e tcp.analysis.lost_segment \
-e icmp.type -e icmp.code \
-e icmpv6.type -e icmpv6.code \
| head -300
echo
echo "## expert"
tshark -r "$PCAP" -q -z expert | head -300
} > "$OUT"
```
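With `-C` and `-W`, tcpdump typically writes numbered segments such as `name.pcap00`, `name.pcap01` rather than a single `name.pcap`, so the summary has to run once per segment. The sketch below lists the raw segments with the same glob and exclusions as the cleanup loops; `raw_pcaps` is a hypothetical helper name.

```sh
# Hypothetical helper: list raw capture files in a directory, skipping summaries and .gz files.
# Argument: capture directory.
raw_pcaps() {
    for f in "$1"/*.pcap "$1"/*.pcap[0-9]*; do
        [ -e "$f" ] || continue
        case "$f" in
            *.summary.txt|*.gz) continue ;;
        esac
        printf '%s\n' "$f"
    done
}

# Usage: set PCAP to each segment in turn and run the summary block above.
#   raw_pcaps "$CLIENT_DIR" | while IFS= read -r PCAP; do echo "$PCAP"; done
```

Alternatively, Wireshark's `mergecap -w` can join the segments into one file before summarizing, at the cost of temporarily doubling the disk use.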
## Hermes Graph Inputs
Hermes should process locally from the client-side artifacts plus copied server-side summaries/artifacts:
- `curl-download.log`
- `ping-monitor.log`
- `timestamps.txt`
- `pf-before.txt`, `pf-after.txt`
- `tcp-before.txt`, `tcp-after.txt`
- `ip-before.txt`, `ip-after.txt`
- `icmp-before.txt`, `icmp-after.txt`
- `icmp6-before.txt`, `icmp6-after.txt`
- `*.pcap.summary.txt`
- raw `*.pcap*` only if needed
Recommended graph outputs:
- throughput over time
- retransmissions over time
- duplicate ACKs over time
- RTT estimate over time, if available
- TCP window/zero-window events over time
- bytes by flow
- gateway/internet/Tailscale latency during download
- timeline of stalls or timeout bursts
## Cleanup
Cleanup is mandatory after the report/dashboard exists.
Keep:
- curl log
- ping monitor log
- counter before/after text
- pcap summaries
- final dashboard/report
Remove:
- downloaded test image
- partial image
- raw pcaps, unless still needed
### Client cleanup
```sh
rm -f "$CLIENT_DIR/clawdie-test.img.gz"
rm -f "$CLIENT_DIR"/*.partial
```
If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:
```sh
for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
[ -e "$f" ] || continue
case "$f" in
*.summary.txt|*.gz) continue ;;
esac
rm -f "$f"
done
```
If suspicious and raw pcaps must be kept temporarily, compress only raw capture files:
```sh
for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
[ -e "$f" ] || continue
case "$f" in
*.summary.txt|*.gz) continue ;;
esac
gzip -9 "$f"
done
```
### Server cleanup
If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:
```sh
for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
[ -e "$f" ] || continue
case "$f" in
*.summary.txt|*.gz) continue ;;
esac
rm -f "$f"
done
```
If suspicious and raw pcaps must be kept temporarily, compress only raw capture files:
```sh
for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
[ -e "$f" ] || continue
case "$f" in
*.summary.txt|*.gz) continue ;;
esac
gzip -9 "$f"
done
```
Then confirm space recovered:
```sh
df -h /home/clawdie
```
## Result Summary Template
```text
Test ID:
Date/time UTC:
Client: debby
Client network path: house Wi-Fi | hotspot | wired | other
Server: osa.smilepowered.org
Server public IP: 51.83.197.148
Server Tailscale IP: 100.72.229.63
URL:
Downloader: curl --http1.1 | wget | other
Parallel downloads: no | yes, describe
Server capture duration:
Client capture duration:
Downloaded bytes:
Average throughput:
Checksum result: pass | fail | partial only
Latency:
- Gateway latency during download:
- 1.1.1.1 latency during download:
- osa Tailscale latency during download:
- domedog latency during download:
PF/PMTU/MSS:
- PF state/search/fragment counters changed:
- ICMP frag-needed / IPv6 Packet Too Big seen:
- PMTU black-hole counters:
- SYN MSS values:
TCP:
- Server retransmitted data delta:
- Server retransmit timeout delta:
- Pcap retransmissions:
- Fast retransmissions:
- Duplicate ACKs:
- Zero-window events:
Conclusion:
- PF likely root cause? yes | no | inconclusive
- PMTU/MSS likely root cause? yes | no | inconclusive
- Access path/bufferbloat suspect? yes | no | inconclusive
- Next action:
Cleanup:
- Large client image removed: yes | no
- Large server pcap removed/compressed: yes | no
- Raw pcap copies deleted after conclusion: yes | no
- Disk checked after cleanup: yes | no
```
## Shared Intent Text
```text
Intent for next throughput test:
We want one clean HTTPS download from osa.smilepowered.org to debby, with no Firefox and no parallel downloads. The purpose is to separate server/PF/PMTU/MSS behavior from client/access-path/bufferbloat behavior.
Definitions:
- “osa hotspot” means phone hotspot/mobile Wi-Fi path.
- “osa server” means osa.smilepowered.org, FreeBSD 15, public 51.83.197.148, Tailscale 100.72.229.63.
Rules:
1. Start pcaps on both sides before the download.
2. Use one curl/wget download only, preferably curl --http1.1.
3. Use UTC timestamps for before/start/stop/after.
4. Server collects TCP/PF/ICMP/ICMPv6 counters before and after.
5. Pcaps must not overwrite the beginning of the test; use enough -W or shorten runtime.
6. Exchange pcap summaries first, not raw pcaps.
7. Keep final logs/summaries/dashboard, then clean raw pcaps and downloaded test image.
Success criteria:
- We know bytes/time/throughput.
- We know whether retransmits/dupACKs/timeouts were normal or abnormal.
- We know whether ICMP/PTB/fragmentation-needed/PMTU looked suspicious.
- We know whether latency spikes were local/access-path or server-side.
```