Add network throughput diagnostic skill (Sam & Claude)
Document a clean osa-to-debby HTTPS download test with synchronized pcaps, counters, Hermes summaries, graph inputs, and mandatory cleanup. --- Build: pass | Tests: pass — 2456 passed (182 files)
parent 8637c52e39
commit 466ebefd63
1 changed file with 495 additions and 0 deletions
.agent/skills/network-throughput/SKILL.md (normal file, +495)
@@ -0,0 +1,495 @@

---
name: network-throughput
description: Run a clean osa-to-debby HTTPS network throughput test with packet captures, counters, cleanup, and Hermes graph-ready artifacts
compatibility: FreeBSD 15.x server, Linux client
invoke_patterns:
  - "Run throughput test"
  - "Run network throughput test"
  - "Test download speed"
  - "Capture pcaps on both sides"
  - "Diagnose image download"
  - "Network throughput test"
  - "Network troughput test"
estimated_tokens: 1400-2200
---

# network-throughput

Run one clean HTTPS image download from the osa FreeBSD server to debby, with packet capture and counters on both sides, to separate PF/PMTU/MSS/server behavior from access-path/bufferbloat behavior.

Main rule:

> One download. Captures first. No overwrite. Summaries first. Cleanup after.

## Definitions

Record these before the run so there is no ambiguity:

- `client`: debby
- `client network`: house Wi-Fi | hotspot | wired | other
- `server`: osa.smilepowered.org
- `server public IP`: 51.83.197.148
- `server Tailscale IP`: 100.72.229.63
- `URL`: exact image URL under test
- `downloader`: curl or wget only; prefer curl with `--http1.1`

Terminology:

- "osa hotspot" means the phone hotspot/mobile Wi-Fi path.
- "osa server" means `osa.smilepowered.org`, FreeBSD 15, public `51.83.197.148`, Tailscale `100.72.229.63`.

## Safety Rules

- Ask before starting any privileged capture or firewall-visible test.
- Use one test download only. Do not run Firefox/browser downloads in parallel.
- Use project-local scratch directories, not system `/tmp` or `/var/tmp`.
- On FreeBSD, run root commands in the visible tmux root window when one is available.
- Do not leave large pcaps or duplicate downloaded images behind after analysis.
- Do not change PF during the measurement unless the purpose of the test is explicitly PF comparison.
- Record UTC timestamps at before-counters, pcap start, download start, download stop, and after-counters.

## Variables

Set these before the run.

### Server / osa

```sh
TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
SERVER_IF="vtnet0"
SERVER_DIR="/home/clawdie/clawdie-iso/tmp/network-tests/$TEST_ID"
CLIENT_IP="<debby-public-ip-if-known>"
SERVER_IP="51.83.197.148"
SERVER_TS_IP="100.72.229.63"
DURATION_SEC="600"
```

### Linux client / debby / Hermes

Run from the Hermes project/repo root, then keep artifacts under that repo's `tmp/` directory:

```sh
TEST_ID="osa-clean-download-$(date -u +%Y%m%dT%H%M%SZ)"
CLIENT_DIR="$PWD/tmp/network-tests/$TEST_ID"
URL="https://osa.smilepowered.org/downloads/iso/clawdie-xfce-operator-usb-fbsd15.0-amd64-15.maj.2026.img.gz"
SHA_URL="$URL.sha256"
SERVER_IP="51.83.197.148"
SERVER_TS_IP="100.72.229.63"
DURATION_SEC="600"
```

## Preflight

### Server before-counters

```sh
mkdir -p "$SERVER_DIR"
date -u '+before_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$SERVER_DIR/timestamps.txt"
df -h /home/clawdie > "$SERVER_DIR/df-before.txt"
ifconfig "$SERVER_IF" > "$SERVER_DIR/ifconfig-server-if.txt"
netstat -rn > "$SERVER_DIR/routes.txt"
/sbin/pfctl -si -v > "$SERVER_DIR/pf-before.txt"
netstat -s -p tcp > "$SERVER_DIR/tcp-before.txt"
netstat -s -p ip > "$SERVER_DIR/ip-before.txt"
netstat -s -p icmp > "$SERVER_DIR/icmp-before.txt"
netstat -s -p icmp6 > "$SERVER_DIR/icmp6-before.txt"
```

Also save the TCP lines we care about most:

```sh
netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts|resend initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-before-key-lines.txt"
```

### Linux client preflight

```sh
mkdir -p "$CLIENT_DIR"
date -u '+client_preflight_utc=%Y-%m-%dT%H:%M:%SZ' | tee "$CLIENT_DIR/timestamps.txt"
df -h . > "$CLIENT_DIR/df-before.txt"
ip addr > "$CLIENT_DIR/ip-addr.txt"
ip route > "$CLIENT_DIR/ip-route.txt"
curl -4 -s https://ifconfig.me > "$CLIENT_DIR/public-ip.txt"
ip route get "$SERVER_IP" > "$CLIENT_DIR/route-to-osa-public.txt"
```

Pick the outbound interface from `route-to-osa-public.txt`:

```sh
CLIENT_IF="<interface-from-ip-route-get>"
```
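
The interface can also be read out of the saved route lookup instead of being copied by hand. This is a minimal sketch, assuming the usual `ip route get` output in which the interface name follows the `dev` keyword; `route_dev` is a hypothetical helper name and not part of the skill itself.

```sh
# Hypothetical helper: print the word that follows "dev" in saved
# `ip route get` output (first match only).
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }' "$1"
}
```

After the preflight block has written the file, it could be used as `CLIENT_IF=$(route_dev "$CLIENT_DIR/route-to-osa-public.txt")`. Check the result by eye before starting the capture, since a wrong interface silently produces an empty pcap.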

## Packet Capture

Start both captures before the download. Wait until tcpdump prints `listening on ...` on both sides.

Avoid tiny rotating pcap rings. For a 5-10 minute test, either use a single non-overwriting pcap or enough ring files to preserve the beginning/SYN.
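
Whether a ring is "enough" depends on the link speed, so it is worth a quick estimate before the run. tcpdump's `-C` value is in millions of bytes, so a ring of `-C 200 -W 20` holds roughly 4 GB before the oldest file is overwritten. The sketch below is a rough estimate only: `ring_wrap_seconds` is a hypothetical helper, the rate argument is your own guess at the download speed in Mbit/s, and per-packet pcap headers are ignored.

```sh
# Hypothetical helper: approximate seconds until a tcpdump ring wraps.
#   $1 = -C size (millions of bytes), $2 = -W file count, $3 = expected rate in Mbit/s
ring_wrap_seconds() {
    awk -v c="$1" -v w="$2" -v r="$3" 'BEGIN { printf "%d\n", (c * w * 8) / r }'
}

ring_wrap_seconds 200 20 100   # prints 320
ring_wrap_seconds 200 20 50    # prints 640
```

If the estimate is shorter than the planned download window, raise `-W` or shorten the run so the SYN and early slow-start packets survive.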

### Server capture

If the client public IP is known, prefer this filter:

```sh
date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
  -s 0 \
  -C 200 -W 20 \
  -w "$SERVER_DIR/osa-vtnet0-download.pcap" \
  "(host $CLIENT_IP and tcp port 443) or icmp or icmp6"
```

If the client public IP is not known yet, temporarily capture all HTTPS plus ICMP/PTB signals:

```sh
date -u '+server_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
/usr/bin/timeout 900 /usr/sbin/tcpdump -ni "$SERVER_IF" \
  -s 0 \
  -C 200 -W 20 \
  -w "$SERVER_DIR/osa-vtnet0-download.pcap" \
  "tcp port 443 or icmp or icmp6"
```

### Linux client capture

Run in a root shell or with sudo, but store output under the Hermes project `tmp/` directory:

```sh
date -u '+client_pcap_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$CLIENT_DIR/timestamps.txt"
sudo /usr/bin/timeout 900 tcpdump -ni "$CLIENT_IF" \
  -s 0 \
  -C 200 -W 20 \
  -w "$CLIENT_DIR/debby-osa-public.pcap" \
  "host $SERVER_IP and (tcp port 443 or icmp or icmp6)"
```

## Client Latency Monitor

Run this on debby while the download runs. It pings the default gateway, 1.1.1.1, the osa Tailscale IP, and 100.103.255.41 (reported as domedog in the result template) to help identify access-path/bufferbloat symptoms.

```sh
GW=$(ip route show default | awk '{print $3; exit}')
while true; do
  date -u '+utc=%Y-%m-%dT%H:%M:%SZ'
  for t in "$GW" 1.1.1.1 "$SERVER_TS_IP" 100.103.255.41; do
    echo "### $t"
    ping -c 5 -i 0.2 -W 2 "$t"
  done
  sleep 5
done | tee "$CLIENT_DIR/ping-monitor.log"
```

Stop it when the download stops.
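
The log is easier to compare across runs once it is reduced to one line per target. A minimal sketch, assuming Linux `ping` summary lines of the form `rtt min/avg/max/mdev = a/b/c/d ms` and the `### <target>` markers written by the loop above; `ping_summary` is a hypothetical helper name. Rounds with 100% loss print no `rtt` line and are therefore not counted.

```sh
# Hypothetical helper: per-target sample count, mean of the per-round averages,
# and worst per-round maximum, from a ping-monitor.log style file.
ping_summary() {
    awk '
        /^### / { target = $2; next }          # marker line written before each ping
        /min\/avg\/max/ {
            split($0, halves, " = ")           # halves[2] = "min/avg/max/mdev ms"
            split(halves[2], rtt, "/")
            n[target]++
            sum[target] += rtt[2]
            if (rtt[3] + 0 > worst[target] + 0) worst[target] = rtt[3] + 0
        }
        END {
            for (t in n)
                printf "%s samples=%d mean_avg_ms=%.1f worst_max_ms=%.1f\n", t, n[t], sum[t] / n[t], worst[t]
        }
    ' "$1" | LC_ALL=C sort
}
```

A gateway line whose `worst_max_ms` climbs during the download while it stays flat when idle points at the local access path rather than at the server.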

## Download Test

Use a CLI downloader, not a browser. For diagnosis, default to a fresh output file and no resume. Prefer `curl --http1.1` so the test is one simple TCP flow and avoids HTTP/2 multiplexing noise.

### Full artifact download

```sh
cd "$CLIENT_DIR"
date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
curl -L \
  --fail \
  --http1.1 \
  --output clawdie-test.img.gz \
  --write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
  "$URL" \
  2>&1 | tee curl-download.log
date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
```

### Bounded 5-10 minute sample

Use this when disk or time is limited. A timeout exit (curl exit code 28) is acceptable; we care about behavior during the window.

```sh
cd "$CLIENT_DIR"
date -u '+download_start_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
# --max-time lets curl stop itself after $DURATION_SEC, so the --write-out
# summary is still logged; an external `timeout` would kill curl before it prints.
curl -L \
  --fail \
  --http1.1 \
  --max-time "$DURATION_SEC" \
  --output clawdie-test.img.gz \
  --write-out '\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\ntime_total=%{time_total}\nspeed_download=%{speed_download}\nsize_download=%{size_download}\nhttp_code=%{http_code}\n' \
  "$URL" \
  2>&1 | tee curl-download.log
date -u '+download_end_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a timestamps.txt
```
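
curl reports `speed_download` in bytes per second, which is awkward to compare with link speeds quoted in Mbit/s. The sketch below converts the value logged by the `--write-out` template above; `curl_mbps` is a hypothetical helper name, and the conversion uses decimal megabits (1 Mbit = 1,000,000 bits).

```sh
# Hypothetical helper: average throughput in Mbit/s from the
# speed_download=<bytes per second> line of a curl-download.log style file.
curl_mbps() {
    awk -F= '$1 == "speed_download" { printf "%.2f\n", $2 * 8 / 1000000 }' "$1"
}
```

For example, a logged `speed_download=1250000` corresponds to 10.00 Mbit/s. This is the whole-transfer average only; stalls and bursts still have to come from the pcap io stats.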

## Postflight

Stop captures first, then collect after-counters.

### Server after-counters

```sh
date -u '+after_counters_utc=%Y-%m-%dT%H:%M:%SZ' | tee -a "$SERVER_DIR/timestamps.txt"
/sbin/pfctl -si -v > "$SERVER_DIR/pf-after.txt"
netstat -s -p tcp > "$SERVER_DIR/tcp-after.txt"
netstat -s -p ip > "$SERVER_DIR/ip-after.txt"
netstat -s -p icmp > "$SERVER_DIR/icmp-after.txt"
netstat -s -p icmp6 > "$SERVER_DIR/icmp6-after.txt"
netstat -s -p tcp | grep -E 'retransmitted|retransmit timeouts|resend initiated by MTU discovery|Path MTU|black hole' > "$SERVER_DIR/tcp-after-key-lines.txt"
ls -lh "$SERVER_DIR" > "$SERVER_DIR/ls.txt"
```
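
The before/after files only become useful once they are turned into deltas. A minimal sketch, assuming each counter line starts with its numeric value followed by a description, as `netstat -s` output does; `counter_delta` is a hypothetical helper name. Numbers embedded in the description (such as byte totals) are masked as `N` so the same counter matches in both files.

```sh
# Hypothetical helper: print "<delta><TAB><description>" for every counter
# line present in both files.  $1 = before file, $2 = after file.
counter_delta() {
    awk '
        $1 !~ /^[0-9]+$/ { next }                 # skip headers such as "tcp:"
        {
            count = $1
            key = $0
            sub(/^[ \t]*[0-9]+[ \t]*/, "", key)   # drop the leading counter value
            gsub(/[0-9]+/, "N", key)              # mask embedded numbers
        }
        NR == FNR { before[key] = count; next }   # first file: remember the value
        (key in before) { printf "%d\t%s\n", count - before[key], key }
    ' "$1" "$2"
}
```

It could be run as `counter_delta "$SERVER_DIR/tcp-before-key-lines.txt" "$SERVER_DIR/tcp-after-key-lines.txt"`. The counters are host-wide, so the deltas include any other traffic the server handled during the window.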

### Linux client postflight

```sh
ls -lh "$CLIENT_DIR" > "$CLIENT_DIR/ls.txt"
curl -L --fail -o "$CLIENT_DIR/expected.sha256" "$SHA_URL"
```

If the full image completed, verify the checksum:

```sh
cd "$CLIENT_DIR"
# The image was saved as clawdie-test.img.gz, so compare digests, not file names.
expected=$(grep -Eo '[0-9a-f]{64}' expected.sha256 | head -n 1)
[ "$(sha256sum clawdie-test.img.gz | cut -d' ' -f1)" = "$expected" ] && echo pass || echo fail
```

## Summary-First Exchange

Before copying raw pcaps around, generate text summaries on each side. Exchange summaries first; only copy raw pcaps if something suspicious needs deeper inspection.

```sh
PCAP="/path/to/file.pcap"
OUT="$PCAP.summary.txt"
{
  echo "## capinfos"
  capinfos "$PCAP"

  echo
  echo "## tcp conversations"
  tshark -r "$PCAP" -q -z conv,tcp

  echo
  echo "## 1s io stats"
  tshark -r "$PCAP" -q \
    -z io,stat,1,'tcp','icmp || icmpv6','tcp.analysis.retransmission || tcp.analysis.fast_retransmission','tcp.analysis.duplicate_ack','tcp.analysis.zero_window'

  echo
  echo "## SYN MSS/window options"
  tshark -r "$PCAP" -Y 'tcp.flags.syn == 1' -T fields \
    -e frame.time_relative \
    -e ip.src -e ip.dst \
    -e tcp.srcport -e tcp.dstport \
    -e tcp.flags.ack \
    -e tcp.options.mss_val \
    -e tcp.window_size_value \
    -e tcp.options.wscale.shift \
    -e tcp.options.sack_perm \
    | head -100

  echo
  echo "## interesting TCP/ICMP"
  tshark -r "$PCAP" \
    -Y 'tcp.analysis.retransmission or tcp.analysis.fast_retransmission or tcp.analysis.duplicate_ack or tcp.analysis.zero_window or tcp.analysis.out_of_order or tcp.analysis.lost_segment or icmp or icmpv6' \
    -T fields \
    -e frame.time_relative \
    -e frame.number \
    -e ip.src -e ip.dst \
    -e _ws.col.Protocol \
    -e tcp.srcport -e tcp.dstport \
    -e tcp.seq -e tcp.ack -e tcp.len \
    -e tcp.analysis.retransmission \
    -e tcp.analysis.fast_retransmission \
    -e tcp.analysis.duplicate_ack \
    -e tcp.analysis.duplicate_ack_num \
    -e tcp.analysis.zero_window \
    -e tcp.analysis.out_of_order \
    -e tcp.analysis.lost_segment \
    -e icmp.type -e icmp.code \
    -e icmpv6.type -e icmpv6.code \
    | head -300

  echo
  echo "## expert"
  tshark -r "$PCAP" -q -z expert | head -300
} > "$OUT"
```

## Hermes Graph Inputs

Hermes should process locally from the client-side artifacts plus copied server-side summaries/artifacts:

- `curl-download.log`
- `ping-monitor.log`
- `timestamps.txt`
- `pf-before.txt`, `pf-after.txt`
- `tcp-before.txt`, `tcp-after.txt`
- `ip-before.txt`, `ip-after.txt`
- `icmp-before.txt`, `icmp-after.txt`
- `icmp6-before.txt`, `icmp6-after.txt`
- `*.pcap.summary.txt`
- raw `*.pcap*` only if needed

Recommended graph outputs:

- throughput over time
- retransmissions over time
- duplicate ACKs over time
- RTT estimate over time, if available
- TCP window/zero-window events over time
- bytes by flow
- gateway/internet/Tailscale latency during download
- timeline of stalls or timeout bursts
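
As a cross-check for the throughput graph, the wall-clock download window can be derived from the `download_start_utc` and `download_end_utc` lines in the client's `timestamps.txt`. This is a sketch that assumes GNU `date` (as found on the Linux client), which accepts these ISO 8601 strings through `-d`; BSD `date` would need a different invocation. `download_duration_sec` is a hypothetical helper name.

```sh
# Hypothetical helper: seconds between the last download_start_utc and
# download_end_utc entries in a timestamps.txt style file (GNU date assumed).
download_duration_sec() {
    start=$(sed -n 's/^download_start_utc=//p' "$1" | tail -n 1)
    end=$(sed -n 's/^download_end_utc=//p' "$1" | tail -n 1)
    echo $(( $(date -u -d "$end" +%s) - $(date -u -d "$start" +%s) ))
}
```

Dividing `size_download` from the curl log by this duration should land close to curl's own `speed_download`; a large gap suggests the timestamps bracket more than the transfer itself.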

## Cleanup

Cleanup is mandatory after the report/dashboard exists.

Keep:

- curl log
- ping monitor log
- counter before/after text
- pcap summaries
- final dashboard/report

Remove:

- downloaded test image
- partial image
- raw pcaps, unless still needed

### Client cleanup

```sh
rm -f "$CLIENT_DIR/clawdie-test.img.gz"
rm -f "$CLIENT_DIR"/*.partial
```

If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:

```sh
for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
  [ -e "$f" ] || continue
  case "$f" in
    *.summary.txt|*.gz) continue ;;
  esac
  rm -f "$f"
done
```

If something looks suspicious and raw pcaps must be kept temporarily, compress only raw capture files:

```sh
for f in "$CLIENT_DIR"/*.pcap "$CLIENT_DIR"/*.pcap[0-9]*; do
  [ -e "$f" ] || continue
  case "$f" in
    *.summary.txt|*.gz) continue ;;
  esac
  gzip -9 "$f"
done
```

### Server cleanup

If raw pcaps are no longer needed, delete only raw capture files and keep `*.summary.txt`:

```sh
for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
  [ -e "$f" ] || continue
  case "$f" in
    *.summary.txt|*.gz) continue ;;
  esac
  rm -f "$f"
done
```

If something looks suspicious and raw pcaps must be kept temporarily, compress only raw capture files:

```sh
for f in "$SERVER_DIR"/*.pcap "$SERVER_DIR"/*.pcap[0-9]*; do
  [ -e "$f" ] || continue
  case "$f" in
    *.summary.txt|*.gz) continue ;;
  esac
  gzip -9 "$f"
done
```

Then confirm space recovered:

```sh
df -h /home/clawdie
```
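
`df` shows the overall effect, but it does not say which file was left behind. A small sketch that lists anything large still sitting in a test directory; `large_leftovers` is a hypothetical helper name and the 10 MB threshold is an arbitrary assumption, chosen only so that text logs and summaries pass while images and raw pcaps do not.

```sh
# Hypothetical helper: list files larger than 10 MB under the given directory.
# Returns 0 when the directory is clean, 1 when something large remains.
large_leftovers() {
    found=$(find "$1" -type f -size +10M)
    [ -z "$found" ] && return 0
    printf '%s\n' "$found"
    return 1
}
```

It could be run as `large_leftovers "$CLIENT_DIR"` on debby and `large_leftovers "$SERVER_DIR"` on osa before filling in the cleanup lines of the result summary.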

## Result Summary Template

```text
Test ID:
Date/time UTC:
Client: debby
Client network path: house Wi-Fi | hotspot | wired | other
Server: osa.smilepowered.org
Server public IP: 51.83.197.148
Server Tailscale IP: 100.72.229.63
URL:
Downloader: curl --http1.1 | wget | other
Parallel downloads: no | yes, describe
Server capture duration:
Client capture duration:
Downloaded bytes:
Average throughput:
Checksum result: pass | fail | partial only

Latency:
- Gateway latency during download:
- 1.1.1.1 latency during download:
- osa Tailscale latency during download:
- domedog latency during download:

PF/PMTU/MSS:
- PF state/search/fragment counters changed:
- ICMP frag-needed / IPv6 Packet Too Big seen:
- PMTU black-hole counters:
- SYN MSS values:

TCP:
- Server retransmitted data delta:
- Server retransmit timeout delta:
- Pcap retransmissions:
- Fast retransmissions:
- Duplicate ACKs:
- Zero-window events:

Conclusion:
- PF likely root cause? yes | no | inconclusive
- PMTU/MSS likely root cause? yes | no | inconclusive
- Access path/bufferbloat suspect? yes | no | inconclusive
- Next action:

Cleanup:
- Large client image removed: yes | no
- Large server pcap removed/compressed: yes | no
- Raw pcap copies deleted after conclusion: yes | no
- Disk checked after cleanup: yes | no
```

## Shared Intent Text

```text
Intent for next throughput test:

We want one clean HTTPS download from osa.smilepowered.org to debby, with no Firefox and no parallel downloads. The purpose is to separate server/PF/PMTU/MSS behavior from client/access-path/bufferbloat behavior.

Definitions:
- “osa hotspot” means phone hotspot/mobile Wi-Fi path.
- “osa server” means osa.smilepowered.org, FreeBSD 15, public 51.83.197.148, Tailscale 100.72.229.63.

Rules:
1. Start pcaps on both sides before the download.
2. Use one curl/wget download only, preferably curl --http1.1.
3. Use UTC timestamps for before/start/stop/after.
4. Server collects TCP/PF/ICMP/ICMPv6 counters before and after.
5. Pcaps must not overwrite the beginning of the test; use enough -W or shorten runtime.
6. Exchange pcap summaries first, not raw pcaps.
7. Keep final logs/summaries/dashboard, then clean raw pcaps and downloaded test image.

Success criteria:
- We know bytes/time/throughput.
- We know whether retransmits/dupACKs/timeouts were normal or abnormal.
- We know whether ICMP/PTB/fragmentation-needed/PMTU looked suspicious.
- We know whether latency spikes were local/access-path or server-side.
```