Add network throughput diagnostic skill (Sam & Claude)

Document a clean osa-to-debby HTTPS download test with synchronized pcaps, counters, Hermes summaries, graph inputs, and mandatory cleanup. --- Build: pass | Tests: pass — 2456 passed (182 files)
2026-05-15 12:27:53 +02:00 · 2026-05-15 12:27:53 +02:00 · 466ebefd63
commit 466ebefd63
parent 8637c52e39
1 changed files with 495 additions and 0 deletions
--- a/.agent/skills/network-throughput/SKILL.md
+++ b/.agent/skills/network-throughput/SKILL.md
@@ -0,0 +1,495 @@
+---
+name: network-throughput
+description: Run a clean osa-to-debby HTTPS network throughput test with packet captures, counters, cleanup, and Hermes graph-ready artifacts
+compatibility: FreeBSD 15.x server, Linux client
+invoke_patterns:
+  - "Run throughput test"
+  - "Run network throughput test"
+  - "Test download speed"
+  - "Capture pcaps on both sides"
+  - "Diagnose image download"
+  - "Network throughput test"
+  - "Network troughput test"
+estimated_tokens: 1400-2200
+---
+
+# network-throughput
+
+Run one clean HTTPS image download from the osa FreeBSD server to debby, with packet capture and counters on both sides, to separate PF/PMTU/MSS/server behavior from access-path/bufferbloat behavior.
+
+Main rule:
+
+> One download. Captures first. No overwrite. Summaries first. Cleanup after.
+
+## Definitions
+
+Record these before the run so there is no ambiguity:
+
+- `client`: debby
+- `client network`: house Wi-Fi | hotspot | wired | other
+- `server`: osa.smilepowered.org
+- `server public IP`: 51.83.197.148
+- `server Tailscale IP`: 100.72.229.63
+- `URL`: exact image URL under test
+- `downloader`: curl or wget only; prefer curl with `--http1.1`
+
+Terminology:
+
+- "osa hotspot" means the phone hotspot/mobile Wi-Fi path.
+- "osa server" means `osa.smilepowered.org`, FreeBSD 15, public `51.83.197.148`, Tailscale `100.72.229.63`.
+
+## Safety Rules
+
+- Ask before starting any privileged capture or firewall-visible test.
+- Use one test download only. Do not run Firefox/browser downloads in parallel.
+- Use project-local scratch directories, not system `/tmp` or `/var/tmp`.
+- On FreeBSD, run root commands in the visible tmux root window when one is available.
+- Do not leave large pcaps or duplicate downloaded images behind after analysis.
+- Do not change PF during the measurement unless the purpose of the test is explicitly PF comparison.
+- Use UTC timestamps at before-counters, pcap start, download start, download stop, and after-counters.
+
+## Variables
+
+Set these before the run.
+
+### Server / osa
+
+```sh
+TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
+SERVER_IF="vtnet0"
+SERVER_DIR="/home/clawdie/clawdie-iso/tmp/network-tests/$TEST_ID"
+CLIENT_IP="<debby-public-ip-if-known>"
+SERVER_IP="51.83.197.148"
+SERVER_TS_IP="100.72.229.63"
+DURATION_SEC="600"
+```
+
+### Linux client / debby / Hermes
+
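+Run from the Hermes project/repo root and keep artifacts under that repo's `tmp/` directory. `TEST_ID` is generated separately on each host, so the two values differ by the time between the commands; paste the server's value here instead if both sides should share one ID: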
+
+```sh
+TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
+CLIENT_DIR="$PWD/tmp/network-tests/$TEST_ID"
+URL="https://osa.smilepowered.org/downloads/iso/clawdie-xfce-operator-usb-fbsd15.0-amd64-15.maj.2026.img.gz"
+SHA_URL="$URL.sha256"
+SERVER_IP="51.83.197.148"
+SERVER_TS_IP="100.72.229.63"
+DURATION_SEC="600"
+```
+
+## Preflight
+
+### Server before-counters
+
+```sh
+mkdir -p "$SERVER_DIR"
+date -u '+before_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$SERVER_DIR/timestamps.txt"
+df -h /home/clawdie > "$SERVER_DIR/df-before.txt"
+ifconfig "$SERVER_IF" > "$SERVER_DIR/ifconfig-server-if.txt"
+netstat -rn > "$SERVER_DIR/routes.txt"
+/sbin/pfctl -si -v > "$SERVER_DIR/pf-before.txt"
+netstat -s -p tcp > "$SERVER_DIR/tcp-before.txt"
+netstat -s -p ip > "$SERVER_DIR/ip-before.txt"
+netstat -s -p icmp > "$SERVER_DIR/icmp-before.txt"
+netstat -s -p icmp6 > "$SERVER_DIR/icmp6-before.txt"
+```
+
+Also save the TCP lines we care about most:
+
+```sh
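+# netstat pluralizes counter names ("1 resend" vs "2 resends"), so the optional "s" matches both forms.
+netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts?|resends? initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-before-key-lines.txt"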
+```
+
+### Linux client preflight
+
+```sh
+mkdir -p "$CLIENT_DIR"
+date -u '+client_preflight_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$CLIENT_DIR/timestamps.txt"
+df -h . > "$CLIENT_DIR/df-before.txt"
+ip addr > "$CLIENT_DIR/ip-addr.txt"
+ip route > "$CLIENT_DIR/ip-route.txt"
+curl -4 -s https://ifconfig.me > "$CLIENT_DIR/public-ip.txt"
+ip route get "$SERVER_IP" > "$CLIENT_DIR/route-to-osa-public.txt"
+```
+
+Pick the outbound interface from `route-to-osa-public.txt`:
+
+```sh
+CLIENT_IF="<interface-from-ip-route-get>"
+```
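+
+The interface can also be read from the saved route file rather than by eye. A minimal sketch, assuming the usual iproute2 output in which the interface name follows the `dev` keyword; `route_dev` is a hypothetical helper name, not part of any tool:
+
+```sh
+# route_dev FILE: print the word after "dev" on the first line that has one.
+route_dev() {
+  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }' "$1"
+}
+
+# Usage on debby, after the preflight block has written the route file:
+# CLIENT_IF=$(route_dev "$CLIENT_DIR/route-to-osa-public.txt")
+```
+
+Check the printed value against `ip addr` before starting the capture; an empty result means the route file did not contain a `dev` field.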
+
+## Packet Capture
+
+Start both captures before the download. Wait until tcpdump prints `listening on ...` on both sides.
+
+Avoid tiny rotating pcap rings. For a 5-10 minute test, either use a single non-overwriting pcap or enough ring files to preserve the beginning/SYN.
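+
+The `-C 200 -W 20` ring used in the capture commands below holds 20 files of 200 million bytes each, roughly 4 GB, before tcpdump starts overwriting the oldest file. A rough sketch of how long that lasts at a given capture rate; `ring_seconds` is a hypothetical helper, and the 100 Mbit/s figure is only an example rate, not a measurement:
+
+```sh
+# ring_seconds FILE_MB FILES RATE_MBIT: seconds until a -C/-W ring wraps,
+# ignoring pcap per-packet header overhead.
+ring_seconds() {
+  awk -v mb="$1" -v n="$2" -v rate="$3" \
+    'BEGIN { printf "%d\n", (mb * 1000000 * n * 8) / (rate * 1000000) }'
+}
+
+ring_seconds 200 20 100   # 4 GB ring at 100 Mbit/s -> 320
+```
+
+At a sustained 100 Mbit/s the ring would wrap after about 320 seconds, which is shorter than a 600-second sample, so a faster path needs a larger `-W` or a shorter run to keep the SYN.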
+
+### Server capture
+
+If client public IP is known, prefer this filter:
+
+```sh
+date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
+/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
+  -s 0 \
+  -C 200 -W 20 \
+  -w "$SERVER_DIR/osa-vtnet0-download.pcap" \
+  "(host $CLIENT_IP and tcp port 443) or icmp or icmp6"
+```
+
+If client public IP is not known yet, temporarily capture all HTTPS plus ICMP/PTB signals:
+
+```sh
+date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
+/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
+  -s 0 \
+  -C 200 -W 20 \
+  -w "$SERVER_DIR/osa-vtnet0-download.pcap" \
+  "tcp port 443 or icmp or icmp6"
+```
+
+### Linux client capture
+
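+Run in a root shell or with sudo, but store output under the Hermes project `tmp/` directory. As on the server, keep all ICMP/ICMPv6 outside the host match: fragmentation-needed and Packet Too Big errors are sent by intermediate routers, so a filter tied to `$SERVER_IP` would miss them:
+
+```sh
+date -u '+client_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$CLIENT_DIR/timestamps.txt"
+sudo /usr/bin/timeout 900 tcpdump -ni "$CLIENT_IF" \
+  -s 0 \
+  -C 200 -W 20 \
+  -w "$CLIENT_DIR/debby-osa-public.pcap" \
+  "(host $SERVER_IP and tcp port 443) or icmp or icmp6"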
+```
+
+## Client Latency Monitor
+
+Run this on debby while the download runs. It helps identify access-path/bufferbloat symptoms.
+
+```sh
+GW=$(ip route show default | awk '{print $3; exit}')
+while true; do
+  date -u '+utc=%Y-%m-%dT%H:%M:%SZ'
+  for t in "$GW" 1.1.1.1 "$SERVER_TS_IP" 100.103.255.41; do
+    echo "### $t"
+    ping -c 5 -i 0.2 -W 2 "$t"
+  done
+  sleep 5
+done | tee "$CLIENT_DIR/ping-monitor.log"
+```
+
+Stop it when the download stops.
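+
+To turn `ping-monitor.log` into rows that can be graphed, the per-round statistics can be extracted with awk. A sketch, assuming the Linux iputils summary format (`N% packet loss` and `rtt min/avg/max/mdev = ...`); `ping_summary` is a hypothetical helper name:
+
+```sh
+# ping_summary LOG: print "utc<TAB>target<TAB>loss<TAB>avg_ms<TAB>max_ms" per ping round.
+ping_summary() {
+  awk '
+    /^utc=/ { ts = substr($0, 5) }   # timestamp line written by the monitor loop
+    /^### / { target = $2 }          # target header written by the monitor loop
+    / packet loss/ {
+      for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
+      # ping prints no rtt line when every probe is lost, so emit the row here.
+      if (loss == "100%") printf "%s\t%s\t%s\t\t\n", ts, target, loss
+    }
+    /^rtt / {
+      split($4, r, "/")              # min/avg/max/mdev
+      printf "%s\t%s\t%s\t%s\t%s\n", ts, target, loss, r[2], r[3]
+    }
+  ' "$1"
+}
+
+# Usage: ping_summary "$CLIENT_DIR/ping-monitor.log" > "$CLIENT_DIR/ping-summary.tsv"
+```
+
+The timestamp is the one printed at the start of each loop iteration, so all targets in one round share it.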
+
+## Download Test
+
+Use a CLI downloader, not a browser. For diagnosis, default to a fresh output file and no resume. Prefer `curl --http1.1` so the test is one simple TCP flow and avoids HTTP/2 multiplexing noise.
+
+### Full artifact download
+
+```sh
+cd "$CLIENT_DIR"
+date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+curl -L \
+  --fail \
+  --http1.1 \
+  --output clawdie-test.img.gz \
+  --write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
+  "$URL" \
+  2>&1 | tee curl-download.log
+date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+```
+
+### Bounded 5-10 minute sample
+
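+Use this when disk or time is limited. `--max-time` makes curl stop itself after `$DURATION_SEC` seconds (exit code 28) and still print the `--write-out` statistics, which are lost when an external `timeout` kills curl. A timeout exit is acceptable; we care about behavior during the window.
+
+```sh
+cd "$CLIENT_DIR"
+date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+curl -L --max-time "$DURATION_SEC" \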
+  --fail \
+  --http1.1 \
+  --output clawdie-test.img.gz \
+  --write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
+  "$URL" \
+  2>&1 | tee curl-download.log
+date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+```
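+
+curl reports `speed_download` in bytes per second, while link speeds are usually quoted in Mbit/s. A small sketch that converts the value if the log contains that line; `curl_mbit` is a hypothetical helper name:
+
+```sh
+# curl_mbit LOG: convert curl's speed_download (bytes per second) to Mbit/s.
+curl_mbit() {
+  awk -F= '$1 == "speed_download" { printf "%.2f\n", $2 * 8 / 1000000 }' "$1"
+}
+
+# Usage: curl_mbit "$CLIENT_DIR/curl-download.log"
+```
+
+No output means the log has no `speed_download=` line, for example because curl was killed before it could print its statistics.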
+
+## Postflight
+
+Stop captures first, then collect after-counters.
+
+### Server after-counters
+
+```sh
+date -u '+after_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
+/sbin/pfctl -si -v > "$SERVER_DIR/pf-after.txt"
+netstat -s -p tcp > "$SERVER_DIR/tcp-after.txt"
+netstat -s -p ip > "$SERVER_DIR/ip-after.txt"
+netstat -s -p icmp > "$SERVER_DIR/icmp-after.txt"
+netstat -s -p icmp6 > "$SERVER_DIR/icmp6-after.txt"
+netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts?|resends? initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-after-key-lines.txt"
+ls -lh "$SERVER_DIR" > "$SERVER_DIR/ls.txt"
+```
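+
+The before/after files only become useful once the counters are subtracted. A sketch that pairs lines by position and prints the change in the first number on each line, assuming both `netstat -s` dumps list the same counters in the same order; `counter_delta` is a hypothetical helper name:
+
+```sh
+# counter_delta BEFORE AFTER: print "delta<TAB>after-line" for each line pair.
+counter_delta() {
+  awk '
+    # Return the first run of digits on the line as a number (0 if none).
+    function first_num(s) { return match(s, /[0-9]+/) ? substr(s, RSTART, RLENGTH) + 0 : 0 }
+    NR == FNR { before[FNR] = first_num($0); next }   # first file: remember counters
+    {
+      line = $0
+      sub(/^[ \t]+/, "", line)                        # drop netstat indentation
+      printf "%d\t%s\n", first_num($0) - before[FNR], line
+    }
+  ' "$1" "$2"
+}
+
+# Usage: counter_delta "$SERVER_DIR/tcp-before.txt" "$SERVER_DIR/tcp-after.txt" | grep -E 'retransmit|MTU|black hole'
+```
+
+Running it on the full dumps and filtering afterwards keeps the line pairing stable; an empty BEFORE file breaks the pairing and should be treated as a failed preflight.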
+
+### Linux client postflight
+
+```sh
+ls -lh "$CLIENT_DIR" > "$CLIENT_DIR/ls.txt"
+curl -L --fail -o "$CLIENT_DIR/expected.sha256" "$SHA_URL"
+```
+
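+If the full image completed, verify the checksum. `expected.sha256` most likely names the original image file, while the download was saved as `clawdie-test.img.gz`, so check the digest against the local file name:
+
+```sh
+cd "$CLIENT_DIR"
+# Assumes expected.sha256 contains the digest as 64 hex characters; re-point it at the local file name.
+grep -Eo '[0-9a-fA-F]{64}' expected.sha256 | head -n 1 | sed 's/$/  clawdie-test.img.gz/' | sha256sum -c -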
+```
+
+## Summary-First Exchange
+
+Before copying raw pcaps around, generate text summaries on each side. Exchange summaries first; only copy raw pcaps if something suspicious needs deeper inspection.
+
+```sh
+PCAP="/path/to/file.pcap"
+OUT="$PCAP.summary.txt"
+{
+  echo "## capinfos"
+  capinfos "$PCAP"
+
+  echo
+  echo "## tcp conversations"
+  tshark -r "$PCAP" -q -z conv,tcp
+
+  echo
+  echo "## 1s io stats"
+  tshark -r "$PCAP" -q \
+    -z io,stat,1,'tcp','icmp || icmpv6','tcp.analysis.retransmission || tcp.analysis.fast_retransmission','tcp.analysis.duplicate_ack','tcp.analysis.zero_window'
+
+  echo
+  echo "## SYN MSS/window options"
+  tshark -r "$PCAP" -Y 'tcp.flags.syn == 1' -T fields \
+    -e frame.time_relative \
+    -e ip.src -e ip.dst \
+    -e tcp.srcport -e tcp.dstport \
+    -e tcp.flags.ack \
+    -e tcp.options.mss_val \
+    -e tcp.window_size_value \
+    -e tcp.options.wscale.shift \
+    -e tcp.options.sack_perm \
+    | head -100
+
+  echo
+  echo "## interesting TCP/ICMP"
+  tshark -r "$PCAP" \
+    -Y 'tcp.analysis.retransmission or tcp.analysis.fast_retransmission or tcp.analysis.duplicate_ack or tcp.analysis.zero_window or tcp.analysis.out_of_order or tcp.analysis.lost_segment or icmp or icmpv6' \
+    -T fields \
+    -e frame.time_relative \
+    -e frame.number \
+    -e ip.src -e ip.dst \
+    -e _ws.col.Protocol \
+    -e tcp.srcport -e tcp.dstport \
+    -e tcp.seq -e tcp.ack -e tcp.len \
+    -e tcp.analysis.retransmission \
+    -e tcp.analysis.fast_retransmission \
+    -e tcp.analysis.duplicate_ack \
+    -e tcp.analysis.duplicate_ack_num \
+    -e tcp.analysis.zero_window \
+    -e tcp.analysis.out_of_order \
+    -e tcp.analysis.lost_segment \
+    -e icmp.type -e icmp.code \
+    -e icmpv6.type -e icmpv6.code \
+    | head -300
+
+  echo
+  echo "## expert"
+  tshark -r "$PCAP" -q -z expert | head -300
+} > "$OUT"
+```
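+
+With `-C`/`-W` the capture is split across numbered files, so the summary block has to run once per file. A sketch that reuses the glob from the Cleanup section to list raw capture files while skipping existing summaries and compressed files; `list_raw_pcaps` is a hypothetical helper name:
+
+```sh
+# list_raw_pcaps DIR: print raw capture files in DIR, one per line.
+list_raw_pcaps() {
+  for f in "$1"/*.pcap "$1"/*.pcap[0-9]*; do
+    [ -e "$f" ] || continue                 # unmatched glob stays literal; skip it
+    case "$f" in
+      *.summary.txt|*.gz) continue ;;       # not raw captures
+    esac
+    printf '%s\n' "$f"
+  done
+}
+
+# Usage: list_raw_pcaps "$CLIENT_DIR" | while IFS= read -r PCAP; do ...summary block...; done
+```
+
+Summaries are then written next to each ring file as `<file>.summary.txt`, which the cleanup loops already preserve.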
+
+## Hermes Graph Inputs
+
+Hermes should process locally from the client-side artifacts plus copied server-side summaries/artifacts:
+
+- `curl-download.log`
+- `ping-monitor.log`
+- `timestamps.txt`
+- `pf-before.txt`, `pf-after.txt`
+- `tcp-before.txt`, `tcp-after.txt`
+- `ip-before.txt`, `ip-after.txt`
+- `icmp-before.txt`, `icmp-after.txt`
+- `icmp6-before.txt`, `icmp6-after.txt`
+- `*.pcap.summary.txt`
+- raw `*.pcap*` only if needed
+
+Recommended graph outputs:
+
+- throughput over time
+- retransmissions over time
+- duplicate ACKs over time
+- RTT estimate over time, if available
+- TCP window/zero-window events over time
+- bytes by flow
+- gateway/internet/Tailscale latency during download
+- timeline of stalls or timeout bursts
+
+## Cleanup
+
+Cleanup is mandatory after the report/dashboard exists.
+
+Keep:
+
+- curl log
+- ping monitor log
+- counter before/after text
+- pcap summaries
+- final dashboard/report
+
+Remove:
+
+- downloaded test image
+- partial image
+- raw pcaps, unless still needed
+
+### Client cleanup
+
+```sh
+rm -f "$CLIENT_DIR/clawdie-test.img.gz"
+rm -f "$CLIENT_DIR"/*.partial
+```
+
+If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:
+
+```sh
+for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  rm -f "$f"
+done
+```
+
+If suspicious and raw pcaps must be kept temporarily, compress only raw capture files:
+
+```sh
+for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  gzip -9 "$f"
+done
+```
+
+### Server cleanup
+
+If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:
+
+```sh
+for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  rm -f "$f"
+done
+```
+
+If suspicious and raw pcaps must be kept temporarily, compress only raw capture files:
+
+```sh
+for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  gzip -9 "$f"
+done
+```
+
+Then confirm space recovered:
+
+```sh
+df -h /home/clawdie
+```
+
+## Result Summary Template
+
+```text
+Test ID:
+Date/time UTC:
+Client: debby
+Client network path: house Wi-Fi | hotspot | wired | other
+Server: osa.smilepowered.org
+Server public IP: 51.83.197.148
+Server Tailscale IP: 100.72.229.63
+URL:
+Downloader: curl --http1.1 | wget | other
+Parallel downloads: no | yes, describe
+Server capture duration:
+Client capture duration:
+Downloaded bytes:
+Average throughput:
+Checksum result: pass | fail | partial only
+
+Latency:
+- Gateway latency during download:
+- 1.1.1.1 latency during download:
+- osa Tailscale latency during download:
+- domedog latency during download:
+
+PF/PMTU/MSS:
+- PF state/search/fragment counters changed:
+- ICMP frag-needed / IPv6 Packet Too Big seen:
+- PMTU black-hole counters:
+- SYN MSS values:
+
+TCP:
+- Server retransmitted data delta:
+- Server retransmit timeout delta:
+- Pcap retransmissions:
+- Fast retransmissions:
+- Duplicate ACKs:
+- Zero-window events:
+
+Conclusion:
+- PF likely root cause? yes | no | inconclusive
+- PMTU/MSS likely root cause? yes | no | inconclusive
+- Access path/bufferbloat suspect? yes | no | inconclusive
+- Next action:
+
+Cleanup:
+- Large client image removed: yes | no
+- Large server pcap removed/compressed: yes | no
+- Raw pcap copies deleted after conclusion: yes | no
+- Disk checked after cleanup: yes | no
+```
+
+## Shared Intent Text
+
+```text
+Intent for next throughput test:
+
+We want one clean HTTPS download from osa.smilepowered.org to debby, with no Firefox and no parallel downloads. The purpose is to separate server/PF/PMTU/MSS behavior from client/access-path/bufferbloat behavior.
+
+Definitions:
+- “osa hotspot” means phone hotspot/mobile Wi-Fi path.
+- “osa server” means osa.smilepowered.org, FreeBSD 15, public 51.83.197.148, Tailscale 100.72.229.63.
+
+Rules:
+1. Start pcaps on both sides before the download.
+2. Use one curl/wget download only, preferably curl --http1.1.
+3. Use UTC timestamps for before/start/stop/after.
+4. Server collects TCP/PF/ICMP/ICMPv6 counters before and after.
+5. Pcaps must not overwrite the beginning of the test; use enough -W or shorten runtime.
+6. Exchange pcap summaries first, not raw pcaps.
+7. Keep final logs/summaries/dashboard, then clean raw pcaps and downloaded test image.
+
+Success criteria:
+- We know bytes/time/throughput.
+- We know whether retransmits/dupACKs/timeouts were normal or abnormal.
+- We know whether ICMP/PTB/fragmentation-needed/PMTU looked suspicious.
+- We know whether latency spikes were local/access-path or server-side.
+```