diff --git a/.agent/skills/network-throughput/SKILL.md b/.agent/skills/network-throughput/SKILL.md
new file mode 100644
index 0000000..9cd3996
--- /dev/null
+++ b/.agent/skills/network-throughput/SKILL.md
@@ -0,0 +1,495 @@
+---
+name: network-throughput
+description: Run a clean osa-to-debby HTTPS network throughput test with packet captures, counters, cleanup, and Hermes graph-ready artifacts
+compatibility: FreeBSD 15.x server, Linux client
+invoke_patterns:
+  - "Run throughput test"
+  - "Run network throughput test"
+  - "Test download speed"
+  - "Capture pcaps on both sides"
+  - "Diagnose image download"
+  - "Network throughput test"
+  - "Network troughput test"
+estimated_tokens: 1400-2200
+---
+
+# network-throughput
+
+Run one clean HTTPS image download from osa FreeBSD server to debby, with packet capture and counters on both sides, to separate PF/PMTU/MSS/server behavior from access-path/bufferbloat behavior.
+
+Main rule:
+
+> One download. Captures first. No overwrite. Summaries first. Cleanup after.
+
+## Definitions
+
+Record these before the run so there is no ambiguity:
+
+- `client`: debby
+- `client network`: house Wi-Fi | hotspot | wired | other
+- `server`: osa.smilepowered.org
+- `server public IP`: 51.83.197.148
+- `server Tailscale IP`: 100.72.229.63
+- `URL`: exact image URL under test
+- `downloader`: curl or wget only; prefer curl with `--http1.1`
+
+Terminology:
+
+- "osa hotspot" means the phone hotspot/mobile Wi-Fi path.
+- "osa server" means `osa.smilepowered.org`, FreeBSD 15, public `51.83.197.148`, Tailscale `100.72.229.63`.
+
+## Safety Rules
+
+- Ask before starting any privileged capture or firewall-visible test.
+- Use one test download only. Do not run Firefox/browser downloads in parallel.
+- Use project-local scratch directories, not system `/tmp` or `/var/tmp`.
+- On FreeBSD, run root commands in the visible tmux root window when one is available.
+- Do not leave large pcaps or duplicate downloaded images behind after analysis.
+- Do not change PF during the measurement unless the purpose of the test is explicitly PF comparison.
+- Use UTC timestamps at before-counters, pcap start, download start, download stop, and after-counters.
+
+## Variables
+
+Set these before the run.
+
+### Server / osa
+
+```sh
+TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
+SERVER_IF="vtnet0"
+SERVER_DIR="/home/clawdie/clawdie-iso/tmp/network-tests/$TEST_ID"
+CLIENT_IP=""
+SERVER_IP="51.83.197.148"
+SERVER_TS_IP="100.72.229.63"
+DURATION_SEC="600"
+```
+
+### Linux client / debby / Hermes
+
+Run from the Hermes project/repo root, then keep artifacts under that repo `tmp/` directory:
+
+```sh
+TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
+CLIENT_DIR="$PWD/tmp/network-tests/$TEST_ID"
+URL="https://osa.smilepowered.org/downloads/iso/clawdie-xfce-operator-usb-fbsd15.0-amd64-15.maj.2026.img.gz"
+SHA_URL="$URL.sha256"
+SERVER_IP="51.83.197.148"
+SERVER_TS_IP="100.72.229.63"
+DURATION_SEC="600"
+```
+
+## Preflight
+
+### Server before-counters
+
+```sh
+mkdir -p "$SERVER_DIR"
+date -u '+before_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$SERVER_DIR/timestamps.txt"
+df -h /home/clawdie > "$SERVER_DIR/df-before.txt"
+ifconfig "$SERVER_IF" > "$SERVER_DIR/ifconfig-server-if.txt"
+netstat -rn > "$SERVER_DIR/routes.txt"
+/sbin/pfctl -si -v > "$SERVER_DIR/pf-before.txt"
+netstat -s -p tcp > "$SERVER_DIR/tcp-before.txt"
+netstat -s -p ip > "$SERVER_DIR/ip-before.txt"
+netstat -s -p icmp > "$SERVER_DIR/icmp-before.txt"
+netstat -s -p icmp6 > "$SERVER_DIR/icmp6-before.txt"
+```
+
+Also save the TCP lines we care about most:
+
+```sh
+netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts|resend initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-before-key-lines.txt"
+```
+
+### Linux client preflight
+
+```sh
+mkdir -p "$CLIENT_DIR"
+date -u '+client_preflight_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$CLIENT_DIR/timestamps.txt"
+df -h . > "$CLIENT_DIR/df-before.txt"
+ip addr > "$CLIENT_DIR/ip-addr.txt"
+ip route > "$CLIENT_DIR/ip-route.txt"
+curl -4 -s https://ifconfig.me > "$CLIENT_DIR/public-ip.txt"
+ip route get "$SERVER_IP" > "$CLIENT_DIR/route-to-osa-public.txt"
+```
+
+Pick the outbound interface from `route-to-osa-public.txt`:
+
+```sh
+CLIENT_IF=""
+```
+
+## Packet Capture
+
+Start both captures before the download. Wait until tcpdump prints `listening on ...` on both sides.
+
+Avoid tiny rotating pcap rings. For a 5-10 minute test, either use a single non-overwriting pcap or enough ring files to preserve the beginning/SYN.
+
+### Server capture
+
+If client public IP is known, prefer this filter:
+
+```sh
+date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
+/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
+  -s 0 \
+  -C 200 -W 20 \
+  -w "$SERVER_DIR/osa-vtnet0-download.pcap" \
+  "(host $CLIENT_IP and tcp port 443) or icmp or icmp6"
+```
+
+If client public IP is not known yet, temporarily capture all HTTPS plus ICMP/PTB signals:
+
+```sh
+date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
+/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
+  -s 0 \
+  -C 200 -W 20 \
+  -w "$SERVER_DIR/osa-vtnet0-download.pcap" \
+  "tcp port 443 or icmp or icmp6"
+```
+
+### Linux client capture
+
+Run in a root shell or with sudo, but store output under the Hermes project `tmp/` directory:
+
+```sh
+date -u '+client_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$CLIENT_DIR/timestamps.txt"
+sudo /usr/bin/timeout 900 tcpdump -ni "$CLIENT_IF" \
+  -s 0 \
+  -C 200 -W 20 \
+  -w "$CLIENT_DIR/debby-osa-public.pcap" \
+  "host $SERVER_IP and (tcp port 443 or icmp or icmp6)"
+```
+
+## Client Latency Monitor
+
+Run this on debby while the download runs. It helps identify access-path/bufferbloat symptoms.
+
+```sh
+GW=$(ip route show default | awk '{print $3; exit}')
+while true; do
+  date -u '+utc=%Y-%m-%dT%H:%M:%SZ'
+  for t in "$GW" 1.1.1.1 "$SERVER_TS_IP" 100.103.255.41; do
+    echo "### $t"
+    ping -c 5 -i 0.2 -W 2 "$t"
+  done
+  sleep 5
+done | tee "$CLIENT_DIR/ping-monitor.log"
+```
+
+Stop it when the download stops.
+
+## Download Test
+
+Use a CLI downloader, not a browser. For diagnosis, default to a fresh output file and no resume. Prefer `curl --http1.1` so the test is one simple TCP flow and avoids HTTP/2 multiplexing noise.
+
+### Full artifact download
+
+```sh
+cd "$CLIENT_DIR"
+date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+curl -L \
+  --fail \
+  --http1.1 \
+  --output clawdie-test.img.gz \
+  --write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
+  "$URL" \
+  2>&1 | tee curl-download.log
+date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+```
+
+### Bounded 5-10 minute sample
+
+Use this when disk or time is limited. A timeout exit is acceptable; we care about behavior during the window.
+
+```sh
+cd "$CLIENT_DIR"
+date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+timeout "$DURATION_SEC" curl -L \
+  --fail \
+  --http1.1 \
+  --output clawdie-test.img.gz \
+  --write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
+  "$URL" \
+  2>&1 | tee curl-download.log
+date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
+```
+
+## Postflight
+
+Stop captures first, then collect after-counters.
+
+### Server after-counters
+
+```sh
+date -u '+after_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
+/sbin/pfctl -si -v > "$SERVER_DIR/pf-after.txt"
+netstat -s -p tcp > "$SERVER_DIR/tcp-after.txt"
+netstat -s -p ip > "$SERVER_DIR/ip-after.txt"
+netstat -s -p icmp > "$SERVER_DIR/icmp-after.txt"
+netstat -s -p icmp6 > "$SERVER_DIR/icmp6-after.txt"
+netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts|resend initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-after-key-lines.txt"
+ls -lh "$SERVER_DIR" > "$SERVER_DIR/ls.txt"
+```
+
+### Linux client postflight
+
+```sh
+ls -lh "$CLIENT_DIR" > "$CLIENT_DIR/ls.txt"
+curl -L --fail -o "$CLIENT_DIR/expected.sha256" "$SHA_URL"
+```
+
+If the full image completed, verify the checksum. The download was saved as `clawdie-test.img.gz`, so check the published hash against that name rather than whatever filename `expected.sha256` lists:
+
+```sh
+cd "$CLIENT_DIR"
+# Assumes expected.sha256 contains one 64-hex-digit SHA-256 digest.
+expected_hash=$(grep -Eo '[0-9a-fA-F]{64}' expected.sha256 | head -n 1)
+printf '%s  %s\n' "$expected_hash" clawdie-test.img.gz | sha256sum -c -
+```
+
+## Summary-First Exchange
+
+Before copying raw pcaps around, generate text summaries on each side. Exchange summaries first; only copy raw pcaps if something suspicious needs deeper inspection.
+
+```sh
+PCAP="/path/to/file.pcap"
+OUT="$PCAP.summary.txt"
+{
+  echo "## capinfos"
+  capinfos "$PCAP"
+
+  echo
+  echo "## tcp conversations"
+  tshark -r "$PCAP" -q -z conv,tcp
+
+  echo
+  echo "## 1s io stats"
+  tshark -r "$PCAP" -q \
+    -z io,stat,1,'tcp','icmp || icmpv6','tcp.analysis.retransmission || tcp.analysis.fast_retransmission','tcp.analysis.duplicate_ack','tcp.analysis.zero_window'
+
+  echo
+  echo "## SYN MSS/window options"
+  tshark -r "$PCAP" -Y 'tcp.flags.syn == 1' -T fields \
+    -e frame.time_relative \
+    -e ip.src -e ip.dst \
+    -e tcp.srcport -e tcp.dstport \
+    -e tcp.flags.ack \
+    -e tcp.options.mss_val \
+    -e tcp.window_size_value \
+    -e tcp.options.wscale.shift \
+    -e tcp.options.sack_perm \
+    | head -100
+
+  echo
+  echo "## interesting TCP/ICMP"
+  tshark -r "$PCAP" \
+    -Y 'tcp.analysis.retransmission or tcp.analysis.fast_retransmission or tcp.analysis.duplicate_ack or tcp.analysis.zero_window or tcp.analysis.out_of_order or tcp.analysis.lost_segment or icmp or icmpv6' \
+    -T fields \
+    -e frame.time_relative \
+    -e frame.number \
+    -e ip.src -e ip.dst \
+    -e _ws.col.Protocol \
+    -e tcp.srcport -e tcp.dstport \
+    -e tcp.seq -e tcp.ack -e tcp.len \
+    -e tcp.analysis.retransmission \
+    -e tcp.analysis.fast_retransmission \
+    -e tcp.analysis.duplicate_ack \
+    -e tcp.analysis.duplicate_ack_num \
+    -e tcp.analysis.zero_window \
+    -e tcp.analysis.out_of_order \
+    -e tcp.analysis.lost_segment \
+    -e icmp.type -e icmp.code \
+    -e icmpv6.type -e icmpv6.code \
+    | head -300
+
+  echo
+  echo "## expert"
+  tshark -r "$PCAP" -q -z expert | head -300
+} > "$OUT"
+```
+
+## Hermes Graph Inputs
+
+Hermes should process locally from the client-side artifacts plus copied server-side summaries/artifacts:
+
+- `curl-download.log`
+- `ping-monitor.log`
+- `timestamps.txt`
+- `pf-before.txt`, `pf-after.txt`
+- `tcp-before.txt`, `tcp-after.txt`
+- `ip-before.txt`, `ip-after.txt`
+- `icmp-before.txt`, `icmp-after.txt`
+- `icmp6-before.txt`, `icmp6-after.txt`
+- `*.pcap.summary.txt`
+- raw `*.pcap*` only if needed
+
+Recommended graph outputs:
+
+- throughput over time
+- retransmissions over time
+- duplicate ACKs over time
+- RTT estimate over time, if available
+- TCP window/zero-window events over time
+- bytes by flow
+- gateway/internet/Tailscale latency during download
+- timeline of stalls or timeout bursts
+
+## Cleanup
+
+Cleanup is mandatory after the report/dashboard exists.
+
+Keep:
+
+- curl log
+- ping monitor log
+- counter before/after text
+- pcap summaries
+- final dashboard/report
+
+Remove:
+
+- downloaded test image
+- partial image
+- raw pcaps, unless still needed
+
+### Client cleanup
+
+```sh
+rm -f "$CLIENT_DIR/clawdie-test.img.gz"
+rm -f "$CLIENT_DIR"/*.partial
+```
+
+If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:
+
+```sh
+for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  rm -f "$f"
+done
+```
+
+If suspicious and raw pcaps must be kept temporarily, compress only raw capture files:
+
+```sh
+for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  gzip -9 "$f"
+done
+```
+
+### Server cleanup
+
+If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:
+
+```sh
+for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  rm -f "$f"
+done
+```
+
+If suspicious and raw pcaps must be kept temporarily, compress only raw capture files:
+
+```sh
+for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
+  [ -e "$f" ] || continue
+  case "$f" in
+    *.summary.txt|*.gz) continue ;;
+  esac
+  gzip -9 "$f"
+done
+```
+
+Then confirm space recovered:
+
+```sh
+df -h /home/clawdie
+```
+
+## Result Summary Template
+
+```text
+Test ID:
+Date/time UTC:
+Client: debby
+Client network path: house Wi-Fi | hotspot | wired | other
+Server: osa.smilepowered.org
+Server public IP: 51.83.197.148
+Server Tailscale IP: 100.72.229.63
+URL:
+Downloader: curl --http1.1 | wget | other
+Parallel downloads: no | yes, describe
+Server capture duration:
+Client capture duration:
+Downloaded bytes:
+Average throughput:
+Checksum result: pass | fail | partial only
+
+Latency:
+- Gateway latency during download:
+- 1.1.1.1 latency during download:
+- osa Tailscale latency during download:
+- domedog latency during download:
+
+PF/PMTU/MSS:
+- PF state/search/fragment counters changed:
+- ICMP frag-needed / IPv6 Packet Too Big seen:
+- PMTU black-hole counters:
+- SYN MSS values:
+
+TCP:
+- Server retransmitted data delta:
+- Server retransmit timeout delta:
+- Pcap retransmissions:
+- Fast retransmissions:
+- Duplicate ACKs:
+- Zero-window events:
+
+Conclusion:
+- PF likely root cause? yes | no | inconclusive
+- PMTU/MSS likely root cause? yes | no | inconclusive
+- Access path/bufferbloat suspect? yes | no | inconclusive
+- Next action:
+
+Cleanup:
+- Large client image removed: yes | no
+- Large server pcap removed/compressed: yes | no
+- Raw pcap copies deleted after conclusion: yes | no
+- Disk checked after cleanup: yes | no
+```
+
+## Shared Intent Text
+
+```text
+Intent for next throughput test:
+
+We want one clean HTTPS download from osa.smilepowered.org to debby, with no Firefox and no parallel downloads. The purpose is to separate server/PF/PMTU/MSS behavior from client/access-path/bufferbloat behavior.
+
+Definitions:
+- “osa hotspot” means phone hotspot/mobile Wi-Fi path.
+- “osa server” means osa.smilepowered.org, FreeBSD 15, public 51.83.197.148, Tailscale 100.72.229.63.
+
+Rules:
+1. Start pcaps on both sides before the download.
+2. Use one curl/wget download only, preferably curl --http1.1.
+3. Use UTC timestamps for before/start/stop/after.
+4. Server collects TCP/PF/ICMP/ICMPv6 counters before and after.
+5. Pcaps must not overwrite the beginning of the test; use enough -W or shorten runtime.
+6. Exchange pcap summaries first, not raw pcaps.
+7. Keep final logs/summaries/dashboard, then clean raw pcaps and downloaded test image.
+
+Success criteria:
+- We know bytes/time/throughput.
+- We know whether retransmits/dupACKs/timeouts were normal or abnormal.
+- We know whether ICMP/PTB/fragmentation-needed/PMTU looked suspicious.
+- We know whether latency spikes were local/access-path or server-side.
+```
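
The "Server retransmitted data delta" and "Server retransmit timeout delta" fields in the result template come down to subtracting the before key-line counters from the after ones. The following is a minimal sketch of that subtraction, not part of the original skill: `counter_deltas` is a hypothetical helper name, and it assumes both key-line files were produced by the same `grep`, so line N describes the same counter in each file and starts with the counter value.

```sh
# Hypothetical helper (assumption, not from the skill): print the change in
# the leading counter of each line between two `netstat -s` key-line files.
# Lines are paired by position, so both files must list the same counters
# in the same order. An empty "before" file yields no output.
counter_deltas() {
    before_file=$1
    after_file=$2
    awk '
        NR == FNR { before[FNR] = $1; next }   # first file: remember counters
        {
            label = $0
            sub(/^[ \t]+/, "", label)          # trim netstat indentation
            printf "%+d\t%s\n", $1 - before[FNR], label
        }
    ' "$before_file" "$after_file"
}

# Example, run on the server after postflight:
# counter_deltas "$SERVER_DIR/tcp-before-key-lines.txt" "$SERVER_DIR/tcp-after-key-lines.txt"
```

Each output line is the signed delta, a tab, and the after-run counter line, so it can be pasted next to the template fields. Pairing by position rather than by label text avoids mismatches when the byte counts in parentheses change between the two snapshots.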