# Network / SSH / Wi-Fi diagnostic notes
Use this reference when investigating SSH/tmux lag, Wi-Fi jitter, hotspot switching, Tailscale reachability, or projector/network interference.
## Logging convention for this user
The user dislikes Desktop clutter. For repeated diagnostics, prefer one timestamped logfile under:
```bash
mkdir -p "$HOME/.local/state/hermes/net-tests"
LOG="$HOME/.local/state/hermes/net-tests/net-test-$(date +%Y%m%d-%H%M%S).log"
# Replace ...diagnostics... with the commands to run; 2>&1 also captures stderr in the log.
{ ...diagnostics...; } 2>&1 | tee "$LOG"
```
Use `/tmp` only for throwaway scratch output. Do not create multiple Desktop files unless explicitly requested.
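The convention above can be wrapped in a small helper so every diagnostic run gets its log path the same way. This is a minimal sketch: the function name `new_net_log` and its optional prefix argument are illustrative assumptions, not part of the original notes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the log directory if needed and print a fresh,
# timestamped log path. The name `new_net_log` and the prefix argument are
# assumptions made for illustration.
new_net_log() {
  local prefix="${1:-net-test}"
  local dir="$HOME/.local/state/hermes/net-tests"
  mkdir -p "$dir" || return 1
  printf '%s/%s-%s.log\n' "$dir" "$prefix" "$(date +%Y%m%d-%H%M%S)"
}

# Usage (commented out so that sourcing this file has no side effects):
# LOG=$(new_net_log ssh-wifi)
# { ip -brief addr; ip route; } 2>&1 | tee "$LOG"
```

Keeping the directory in one function means a later change of location touches a single line instead of every diagnostic script.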
## Network-switch pitfall: stale SSH sessions
When a host switches Wi-Fi networks, SSH sessions that were established over the old network's public path may hang or break even though the server is still reachable. Inspect the live sockets and the current addresses and routes before concluding the remote is down:
```bash
ss -nti '( sport = :22 or dport = :22 )'
ip -brief addr
ip route
```
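Signs of a stale or broken old path in the `ss` output include:
- a socket whose local address belongs to the previous network, or a socket in a `FIN-WAIT` state
- nonzero `Send-Q` or `notsent` data
- nonzero `retrans` or `backoff` counters
- an unexpectedly small `pmtu` value (path MTU collapse)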
A Tailscale SSH session to another host may survive the same switch if it is bound to the Tailscale path, while a public SSH session to the same or another host may die.
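Some of the signs listed above can be checked mechanically. The following is a rough sketch, not a complete parser: it reads `ss -nti` output on stdin and prints sockets that show an old local address, a `FIN-WAIT` state, pending `Send-Q`/`notsent` data, or a nonzero `backoff`. The function name `flag_stale_ssh` and its output format are assumptions, and it only strips `:port` suffixes in the simple IPv4-style form.

```bash
#!/usr/bin/env bash
# Hypothetical filter: reads `ss -nti` output on stdin.
# $1: space-separated list of addresses currently configured on this host.
#     If empty, the old-address check is skipped.
flag_stale_ssh() {
  awk -v addrs="$1" '
    # Print the socket collected so far if any warning sign was seen.
    function report(   why) {
      if (peer == "") return
      why = ""
      if (have_addrs && !(laddr in current)) why = why " old-local-addr"
      if (state ~ /^FIN-WAIT/)               why = why " " state
      if (sendq > 0)                         why = why " send-q=" sendq
      if (notsent > 0)                       why = why " notsent=" notsent
      if (backoff > 0)                       why = why " backoff=" backoff
      if (why != "") print laddr " -> " peer ":" why
    }
    BEGIN {
      n = split(addrs, a, " ")
      have_addrs = (n > 0)
      for (i = 1; i <= n; i++) current[a[i]] = 1
    }
    $1 == "State" { next }            # header line
    /^[A-Z]/ {                        # socket line: state, queues, addresses
      report()
      state = $1; sendq = $3 + 0; laddr = $4; peer = $5
      sub(/:[0-9]+$/, "", laddr)      # drop the local port
      notsent = 0; backoff = 0
      next
    }
    {                                 # indented TCP info line for that socket
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^notsent:/) notsent = substr($i, 9) + 0
        if ($i ~ /^backoff:/) backoff = substr($i, 9) + 0
      }
    }
    END { report() }
  '
}

# Usage (assumes `hostname -I` is available to list current addresses):
# ss -nti '( sport = :22 or dport = :22 )' | flag_stale_ssh "$(hostname -I)"
```

An empty result does not prove a session is healthy; it only means none of these particular signs appeared in that snapshot.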
## Always compare public vs tailnet identities
For hosts with both public DNS and MagicDNS/tailnet DNS, resolve and test both forms:
```bash
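# Resolve both names: public DNS and MagicDNS
getent ahosts osa.smilepowered.org
getent ahosts osa.taile682b7.ts.net
# Check which interface and source address each destination would use
ip route get <public-ip>
ip route get <tailscale-ip>
# Tailscale-level reachability (also shows whether the path is direct or relayed)
tailscale ping --timeout=5s --c 5 <name>
# TCP reachability of the SSH port, independent of ICMP
timeout 6 bash -c '</dev/tcp/<host>/22' && echo TCP/22-open || echo TCP/22-failed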
```
ICMP ping may fail while TCP/22 and `tailscale ping` work, so do not use ICMP alone as the availability test.
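To run the same TCP check against both the public and the tailnet name without retyping the one-liner, it can be wrapped in a function. This is a sketch under assumptions: the name `tcp_check` and the argument order (host, port, timeout in seconds) are made up for illustration. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, as the one-liner above does.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the /dev/tcp probe.
# $1: host or IP, $2: port (default 22), $3: timeout in seconds (default 6).
tcp_check() {
  local host="$1" port="${2:-22}" secs="${3:-6}"
  # Host and port are passed as positional arguments to the inner shell so
  # they are never re-parsed as shell code.
  if timeout "$secs" bash -c ': <"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "TCP/$port-open $host"
  else
    echo "TCP/$port-failed $host"
  fi
}

# Usage (commented out; substitute the names under test):
# for h in osa.smilepowered.org osa.taile682b7.ts.net; do tcp_check "$h" 22; done
```

A "failed" result covers refused connections, timeouts, and resolution errors alike, so follow up with `getent ahosts` when the cause matters.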
## Hotspot/router gateway test
Do not hardcode the gateway when comparing Wi-Fi vs phone hotspot. Derive it from the current route:
```bash
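# Gateway of the current default route (third field of "default via <gw> ...")
GW=$(ip route show default | awk '{print $3; exit}')
# Older iputils versions require root for intervals below 0.2 s; raise -i if ping refuses.
ping -c 80 -i 0.1 "$GW"
ping -c 80 -i 0.1 1.1.1.1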
```
In the user's tested setup, the home Wi-Fi gateway was `192.168.1.1`, while the phone hotspot `osa` used `10.91.179.29`. A script hardcoded to `192.168.1.1` therefore falsely reports gateway loss after switching to the hotspot.
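The `awk '{print $3; exit}'` form assumes the default route line starts with `default via <gateway>`. A slightly more defensive sketch looks for the `via` and `dev` keywords instead, which also yields the interface name. The function name `default_gw_dev` and the `-` placeholder for a missing gateway are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical parser: reads `ip route show default` output on stdin and prints
# "<gateway> <device>" for the first default route. A route without a "via"
# hop (for example a point-to-point link) prints "-" as the gateway.
default_gw_dev() {
  awk '$1 == "default" {
    gw = ""; dev = ""
    for (i = 2; i < NF; i++) {
      if ($i == "via") gw  = $(i + 1)
      if ($i == "dev") dev = $(i + 1)
    }
    print (gw == "" ? "-" : gw), dev
    exit
  }'
}

# Usage (commented out):
# read -r GW IFACE < <(ip route show default | default_gw_dev)
# [ "$GW" != "-" ] && ping -c 80 -i 0.2 "$GW"
```

Because the parser reads stdin, it can be tried against saved route output from both the home Wi-Fi and the hotspot without switching networks.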
## Useful single-log command skeleton
```bash
mkdir -p "$HOME/.local/state/hermes/net-tests"
LOG="$HOME/.local/state/hermes/net-tests/ssh-wifi-$(date +%Y%m%d-%H%M%S).log"
{
  date
  echo '## wifi/default route'
  # Adjust the interface name (wlp1s0 here) to match the host.
  ip -brief addr show wlp1s0 || true
  ip route || true
  nmcli -f GENERAL.CONNECTION,GENERAL.DEVICE,GENERAL.STATE,GENERAL.METERED,IP4.ADDRESS,IP4.GATEWAY,IP4.DNS device show wlp1s0 2>/dev/null || true
  nmcli -f IN-USE,SSID,BSSID,CHAN,FREQ,RATE,SIGNAL,BARS,SECURITY dev wifi list --rescan no 2>/dev/null || true
  echo '## tailscale'
  tailscale status 2>&1 || true
  tailscale netcheck 2>&1 || true
  echo '## routes and sockets'
  ss -nti '( sport = :22 or dport = :22 )' 2>/dev/null || true
  GW=$(ip route show default | awk '{print $3; exit}')
  # DOMEDOG_TS_IP is optional; unset or empty targets are skipped.
  for target in "$GW" 1.1.1.1 "${DOMEDOG_TS_IP:-}"; do
    [ -n "$target" ] || continue
    echo "--- ping $target"
    ping -c 30 -i 0.2 "$target" 2>&1 | tail -8 || true
  done
} 2>&1 | tee "$LOG"
echo "$LOG"
```
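When comparing several such logs (for example home Wi-Fi versus hotspot), it can help to reduce each ping section to one line. The sketch below assumes the iputils summary format (`--- <target> ping statistics ---`, a `packet loss` line, and an `rtt min/avg/max/mdev` line); the function name `ping_summary` and its output fields are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical summarizer: reads a diagnostics log on stdin and prints one line
# per ping target with packet loss, average RTT, and mdev (a rough jitter hint).
# Targets with 100% loss have no rtt line, so avg and mdev stay "-".
ping_summary() {
  awk '
    function emit() {
      if (target != "") print target, "loss=" loss, "avg_ms=" avg, "mdev_ms=" mdev
      target = ""
    }
    /^--- .* ping statistics ---/ {
      emit()
      target = $2; loss = "?"; avg = "-"; mdev = "-"
      next
    }
    target != "" && /packet loss/ {
      for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
    }
    target != "" && /^rtt / {
      split($4, r, "/")       # min/avg/max/mdev values
      avg = r[2]; mdev = r[4]
    }
    END { emit() }
  '
}

# Usage (commented out):
# ping_summary < "$LOG"
```

The `--- ping $target` marker lines written by the skeleton are ignored, since only the statistics header starts a new record.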