# Network / SSH / tmux lag triage reference
Use this when a session investigates interactive lag, Wi-Fi quality, suspected RF interference from a projector or other peripheral, or whether the kernel Wi-Fi driver or firmware is at fault.
## Evidence hierarchy
1. Confirm the Wi-Fi stack:
- `lspci -nnk | sed -n '/Network controller/,+8p'`
- `nmcli -f GENERAL,WIFI-PROPERTIES,IP4 device show <iface>`
- `modinfo <driver>`
- `journalctl -k --since today --no-pager | grep -Ei 'iwlwifi|wlp|wlan|firmware|deauth|disconnect|timeout|fail|error'`
2. Check package freshness against the configured repositories only, not against a vague notion of "newest upstream":
- `apt-cache policy firmware-iwlwifi linux-image-amd64 wireless-regdb network-manager iw`
- `apt list --upgradable 2>/dev/null | grep -E 'linux-image|firmware|wireless|network-manager|iw'`
3. Separate harmless firewall/discovery noise from the interactive path:
- `UFW BLOCK` entries for router DNS traffic or LAN discovery ports 1900 (SSDP), 5353 (mDNS), 5355 (LLMNR), and 3702 (WS-Discovery) can be normal background noise.
- Do not treat it as root cause unless it correlates with retransmits, loss, or the exact failing flow.
4. Inspect actual SSH sockets:
- `ss -nti '( sport = :22 or dport = :22 )'`
- Strong fields: `rtt:<avg>/<var>`, `bytes_retrans`, `retrans:<active>/<total>`, `dsack_dups`, `reord_seen`, `rcv_ooopack`, `cwnd`.
5. Compare network layers:
- router/default gateway ping for local Wi-Fi/AP jitter
- public IP ping for ISP/internet jitter
- overlay/VPN peer ping if Tailscale/WireGuard is involved
6. Inspect Wi-Fi conditions:
- `nmcli -f ACTIVE,SSID,BSSID,CHAN,RATE,SIGNAL,BARS,SECURITY dev wifi list --rescan yes`
- `iw dev <iface> link`
- `ip -s link show <iface>`
- A 2.4 GHz channel combined with a weak or middling signal often explains sluggish SSH even with zero packet loss, because link-layer retries add latency and jitter without dropping packets.
7. For suspected projector/peripheral interference, run a before/after capture with identical duration and compare:
- gateway RTT max/mdev
- SSH retransmits/reordering before vs after
- Wi-Fi signal/rate/channel before vs after
- packet capture only if tshark/tcpdump is installed and useful
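For the gateway RTT max/mdev comparison in step 7, the summary line of `ping` can be reduced to the two numbers that matter. The following is a minimal sketch, not part of the skill's scripts: `ping_max_mdev` is a hypothetical helper name, and it assumes the iputils `ping` summary format (`rtt min/avg/max/mdev = ... ms`); other `ping` implementations print a different summary.

```bash
# Hypothetical helper: read ping output on stdin and print the max and mdev
# values from an iputils-style summary line.
ping_max_mdev() {
  awk -F'[ /=]+' '/min\/avg\/max\/mdev/ {
    # After splitting on spaces, slashes, and "=", the fields are:
    # rtt min avg max mdev <min> <avg> <max> <mdev> ms
    printf "max=%s mdev=%s\n", $8, $9
  }'
}
```

Run it once before and once after the suspected interference source is switched on, with the same packet count each time, for example `ping -c 60 "$ROUTER_IP" | ping_max_mdev`. A jump in `max` or `mdev` toward the gateway points at the local Wi-Fi hop rather than the ISP or overlay path.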
## Reusable scripts
This skill includes:
- `scripts/network-lag-baseline.sh`: single snapshot collector for Wi-Fi, SSH sockets, pings, Tailscale state, and recent logs.
- `scripts/network-interference-capture.sh`: timed before/after test with ping streams and optional tshark/tcpdump packet capture.
Copy the scripts out or run them from the skill directory. Prefer setting environment variables over hardcoding hosts:
```bash
WIFI_IFACE=wlp1s0 ROUTER_IP=192.168.1.1 REMOTE_IP=100.103.255.41 ./network-lag-baseline.sh ./baseline.txt
WIFI_IFACE=wlp1s0 ROUTER_IP=192.168.1.1 REMOTE_IP=100.103.255.41 ./network-interference-capture.sh 90 ./before-projector
WIFI_IFACE=wlp1s0 ROUTER_IP=192.168.1.1 REMOTE_IP=100.103.255.41 ./network-interference-capture.sh 90 ./after-projector
```
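When `ROUTER_IP` is not known in advance, it can be derived from the default route instead of being hardcoded. This is a sketch under assumptions: `gateway_from_routes` and `resolve_router_ip` are hypothetical helpers, and nothing here implies the bundled scripts perform this fallback themselves.

```bash
# Hypothetical helper: read `ip route show default` output on stdin and
# print the address that follows "via" on the first default route.
gateway_from_routes() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++)
      if ($i == "via") { print $(i + 1); exit }
  }'
}

# Hypothetical helper: prefer ROUTER_IP from the environment; otherwise
# fall back to the gateway of the current default route.
resolve_router_ip() {
  if [ -n "${ROUTER_IP:-}" ]; then
    printf '%s\n' "$ROUTER_IP"
  else
    ip route show default | gateway_from_routes
  fi
}
```

A call such as `ROUTER_IP="$(resolve_router_ip)" ./network-lag-baseline.sh ./baseline.txt` keeps an explicit setting authoritative while avoiding a stale hardcoded address.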
## Reporting pattern
Report in this order:
1. Driver/firmware status and whether configured repositories offer newer packages.
2. Kernel/journal findings: explicitly distinguish driver crashes/deauths from unrelated boot warnings or firewall noise.
3. Wi-Fi signal/band/channel and local gateway jitter.
4. SSH socket evidence: retransmits/reordering/RTT variance.
5. Overlay path comparison, if present.
6. Next experiment: before/after test or packet capture.
Avoid jumping to fixes like changing drivers, disabling firewalls, or changing power settings until the above evidence points there.
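For the SSH socket evidence in item 4 of the report, a small filter can reduce `ss -nti` output to the fields named in the evidence hierarchy. This is a minimal sketch: `ss_key_fields` is a hypothetical helper, not part of the skill's scripts, and some fields (for example `dsack_dups` or `rcv_ooopack`) may be absent from a given socket's line depending on the iproute2 version and the socket's history.

```bash
# Hypothetical helper: read `ss -nti` output on stdin and print, per input
# line, only the key:value tokens useful for lag triage. Lines without any
# of these tokens (such as the address line) produce no output.
ss_key_fields() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /^(rtt|bytes_retrans|retrans|dsack_dups|reord_seen|rcv_ooopack|cwnd):/)
        out = out (out ? " " : "") $i
    if (out != "") print out
  }'
}
```

Typical use is `ss -nti '( sport = :22 or dport = :22 )' | ss_key_fields`. The anchored match keeps `rtt:` while skipping `minrtt:`, so the reported value is the smoothed RTT and its variance rather than the minimum.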