Network live diagnostics patterns
Use this reference for recurring Wi-Fi/Tailscale/SSH/tmux lag investigations, especially when the user is switching networks or running a large download.
Log location and clutter rule
Prefer a single bounded log under:
~/.local/state/hermes/net-tests/
Avoid placing diagnostic files on the Desktop unless the user explicitly asks. Desktop files become clutter quickly; dashboards and generated artifacts fit better under:
~/.local/share/hermes/net-dashboard/
Safe live monitoring during large downloads
When a large download is active and disk space is limited, do not start unbounded packet captures. First collect lightweight evidence:
- df -h / /home /tmp for disk headroom
- ss -tinp for per-TCP socket RTT, retransmits, reorder, bytes_received
- ip -s link show <iface> for interface counters
- short ping samples to gateway, public internet, and relevant Tailscale peers
- nmcli ... dev wifi for SSID/channel/signal
Use JSONL or compact text summaries. Bound collection by all of:
- max runtime
- sample interval
- max log size
- disk free-space stop threshold
Example safety shape:
mkdir -p ~/.local/state/hermes/net-tests
MON_INTERVAL=10 MON_MAX_SECONDS=1800 MON_MAX_BYTES=2097152 \
MON_WARN_FREE_GB=15 MON_STOP_FREE_GB=10 \
~/.local/share/hermes/net-dashboard/live_download_monitor.py
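The four bounds listed above could be enforced along the following lines. This is a minimal sketch, not the contents of live_download_monitor.py: the function names stop_reason and run_bounded, the take_sample callback, and the "ts" key are all hypothetical, and the defaults simply mirror the MON_* values in the example command. The warn threshold (MON_WARN_FREE_GB) is left out for brevity.

```python
import json
import shutil
import time
from pathlib import Path


def stop_reason(elapsed_s, log_bytes, free_bytes, max_seconds, max_bytes, stop_free_bytes):
    """Return a human-readable reason to stop, or None to keep sampling."""
    if elapsed_s >= max_seconds:
        return "max runtime reached"
    if log_bytes >= max_bytes:
        return "max log size reached"
    if free_bytes <= stop_free_bytes:
        return "free disk below stop threshold"
    return None


def run_bounded(log_path, take_sample, interval=10, max_seconds=1800,
                max_bytes=2 * 1024 * 1024, stop_free_gb=10, sleep=time.sleep):
    """Append one JSON line per sample until any bound is hit; return the reason."""
    log_path = Path(log_path).expanduser()
    log_path.parent.mkdir(parents=True, exist_ok=True)
    started = time.monotonic()
    while True:
        # Re-check every bound before each write so the log never grows unchecked.
        log_bytes = log_path.stat().st_size if log_path.exists() else 0
        free_bytes = shutil.disk_usage(log_path.parent).free
        reason = stop_reason(time.monotonic() - started, log_bytes, free_bytes,
                             max_seconds, max_bytes, stop_free_gb * 1024**3)
        if reason:
            return reason
        # take_sample is a hypothetical callback returning a dict of measurements.
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"ts": time.time(), **take_sample()}) + "\n")
        sleep(interval)
```

Checking the bounds before every write, rather than once at startup, is what keeps the monitor safe when the download itself is consuming the remaining disk space.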
What to inspect in ss -tinp
Relevant fields:
- rtt:<avg>/<variance> - path latency and jitter
- bytes_received - confirms a download is progressing
- bytes_retrans and retrans:<active>/<total> - TCP loss/retry evidence
- reord_seen / rcv_ooopack - packet reordering/out-of-order delivery
- Send-Q - a stuck send queue can explain a frozen SSH session
- stale local IPs after a Wi-Fi switch - old public SSH sessions may die while Tailscale sessions survive
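To turn those fields into something a log or dashboard can consume, a small parser is usually enough. The sketch below assumes the key:value token layout that ss prints on its per-socket info line; parse_ss_info and the output key names are hypothetical, and SAMPLE is a hand-written line in that style rather than a real capture.

```python
def parse_ss_info(line):
    """Extract a few TCP health fields from one `ss -ti` info line."""
    raw = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep and value:
            raw[key] = value

    out = {}
    if "rtt" in raw:
        # rtt:<avg>/<variance>, both in milliseconds
        avg, _, var = raw["rtt"].partition("/")
        out["rtt_ms"] = float(avg)
        out["rtt_var_ms"] = float(var) if var else None
    if "retrans" in raw:
        # retrans:<active>/<total>
        active, _, total = raw["retrans"].partition("/")
        out["retrans_active"] = int(active)
        out["retrans_total"] = int(total) if total else None
    for name in ("bytes_received", "bytes_retrans", "reord_seen", "rcv_ooopack"):
        if name in raw:
            out[name] = int(raw[name])
    return out


# Hand-written sample in the style of `ss -ti` output, not a real capture.
SAMPLE = ("cubic wscale:7,7 rto:212 rtt:11.5/4.25 mss:1448 bytes_retrans:2896 "
          "bytes_received:7340032 retrans:0/2 rcv_rtt:12 rcv_ooopack:3 reord_seen:1")
print(parse_ss_info(SAMPLE))
```

Splitting on whitespace and matching whole keys keeps rcv_rtt from being mistaken for rtt, which a looser substring search would do.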
Large download interpretation
If gateway ping stays clean but internet/Tailscale pings jump to hundreds or thousands of ms during a large download, suspect uplink/downlink saturation or bufferbloat on the hotspot/ISP path, not local Wi-Fi driver failure.
Useful pattern observed:
- hotspot gateway: ~1–3 ms, 0% loss
- internet/Tailscale peers: high latency / intermittent ping loss during download
- active HTTPS socket bytes_received increasing
- SSH sockets remain established but interactive use feels laggy
This indicates the download is filling the pipe and interactive packets wait behind it.
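The same reasoning can be written as a rough triage function. This is only a sketch: summarize and classify are hypothetical names, and the 10 ms and 200 ms thresholds are illustrative assumptions chosen to match "clean gateway" and "hundreds of ms", not measured cutoffs.

```python
from statistics import median


def summarize(rtts_ms, sent):
    """Median RTT and loss fraction for one ping target."""
    loss = 1.0 - len(rtts_ms) / sent if sent else 1.0
    return {"median_ms": median(rtts_ms) if rtts_ms else None, "loss": loss}


def classify(gateway, remote, clean_ms=10.0, laggy_ms=200.0):
    """Rough triage; the thresholds are illustrative assumptions."""
    gw_clean = (gateway["median_ms"] is not None
                and gateway["median_ms"] <= clean_ms
                and gateway["loss"] == 0)
    remote_bad = (remote["median_ms"] is None
                  or remote["median_ms"] >= laggy_ms
                  or remote["loss"] > 0)
    if gw_clean and remote_bad:
        return "upstream saturation or bufferbloat suspected"
    if not gw_clean:
        return "local Wi-Fi hop suspect"
    return "no obvious problem"


gateway = summarize([1.2, 2.0, 2.8], sent=3)   # clean first hop
remote = summarize([450.0, 1200.0], sent=3)    # slow and lossy beyond it
print(classify(gateway, remote))
```

A verdict from this function is a hint about where to look next, for example at the active download's socket counters, not a diagnosis on its own.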
Wireshark/tshark inclusion
Only add packet capture after lightweight counters show what to focus on. Avoid full unfiltered captures during large downloads. If tshark is available, prefer short, filtered, capped captures and extract summaries into the dashboard/log:
sudo tshark -i wlp1s0 -a duration:60 -w capture.pcapng \
  -f 'host <download-ip> or host <tailnet-ip> or port 22'

tshark -r capture.pcapng \
  -Y 'tcp.analysis.retransmission or tcp.flags.reset==1 or dns or tcp.port==22' \
  -T fields -e frame.time -e ip.src -e ip.dst \
  -e tcp.analysis.retransmission -e tcp.flags.reset -e dns.qry.name
Keep raw pcaps under ~/.local/state/hermes/net-tests/ and summarize them; do not dump large packet logs into chat.
Local dashboard pattern
For recurring investigations, a static dashboard is often simpler than a full app. A starter implementation is available at scripts/network_story_dashboard.py:
- JSONL/text logs in ~/.local/state/hermes/net-tests/
- generated HTML in ~/.local/share/hermes/net-dashboard/dashboard.html
- served by python3 -m http.server <port> --directory ~/.local/share/hermes/net-dashboard
This gives history progression without requiring Node, npm, databases, or MCP. MCP is optional later if the workflow becomes a reusable tool API (start capture, summarize latest, append event, etc.).
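The first step of such a static generator is reading the logs back in. The sketch below is not taken from scripts/network_story_dashboard.py; load_events and the "ts" timestamp key are hypothetical, and it assumes one JSON object per line in files ending in .jsonl.

```python
import json
from pathlib import Path


def load_events(log_dir):
    """Read every *.jsonl file under log_dir, skipping lines that do not parse."""
    events = []
    for path in sorted(Path(log_dir).expanduser().glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # e.g. a half-written last line from a live monitor
                if isinstance(record, dict):
                    events.append(record)
    # "ts" is a hypothetical numeric timestamp key; records without it sort first.
    events.sort(key=lambda e: e.get("ts", 0))
    return events
```

Tolerating a malformed trailing line matters because the dashboard may be regenerated while a monitor is still appending to the same file.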
Make dashboards understandable to non-technical viewers
When the dashboard is meant for a roommate/family member or another non-technical observer, do not lead with numeric tables. Lead with a story:
- one big chart that answers "did the network spike?"
- color-coded lines: local gateway/Wi-Fi hop, public internet, Tailscale peer, download/progress, free disk
- checkboxes to turn lines on/off
- plain-language event cards: "Phone hotspot stayed clean", "Internet got laggy under load"
- hide raw ping/SSH/TCP tables under <details><summary>technical details</summary>
- filters for event classes: Download stress, Projector/Epson, Tailscale/SSH, Wi-Fi changes
For interference testing such as "turn on Epson/projector", run at least two comparable bounded windows:
- before event / baseline
- event active / projector on
- optional after event / projector off recovery
Then visualize the event as a clear marker or separate cards so the viewer can say either "lag started exactly when Epson turned on" or "Epson did not correlate with the spikes."
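Before drawing the marker, it helps to compare the windows numerically so the card text is backed by the data. A minimal sketch follows, with hypothetical names (p95, compare_windows) and an assumed rule that the event "correlates" when its 95th-percentile RTT is at least twice the baseline; that factor is a placeholder to tune, not a recommendation.

```python
from statistics import median


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def compare_windows(baseline_ms, event_ms, factor=2.0):
    """Summarize two RTT windows and flag whether the event window is clearly worse."""
    base = {"median": median(baseline_ms), "p95": p95(baseline_ms)}
    event = {"median": median(event_ms), "p95": p95(event_ms)}
    # Assumed rule: compare tail latency, since spikes matter more than the typical value.
    correlated = event["p95"] >= factor * base["p95"]
    return {"baseline": base, "event": event, "correlated": correlated}


baseline = [20, 22, 21, 25, 23]        # projector off
projector_on = [21, 24, 400, 380, 26]  # projector on
print(compare_windows(baseline, projector_on))
```

Both windows should use the same targets, interval, and duration; otherwise a difference in the numbers may reflect the sampling rather than the event.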
Static HTML JSON pitfall
If embedding event data in HTML, use raw JSON in an inert script block:
data_json = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
html = f'<script id="data" type="application/json">{data_json}</script>'
Do not wrap the JSON with html.escape(). Browsers do not decode character entities inside a script element, so in <script type="application/json"> the textContent will still contain literal entities such as &quot;. JSON.parse() then fails, and the dashboard may show only static summary counts with a blank chart/timeline.
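The difference is easy to check from Python before opening a browser. In this sketch json.loads stands in for the browser's JSON.parse(), and the sample data is made up for illustration.

```python
import html
import json

# Made-up sample data containing both quotes and a closing script tag.
data = {"events": [{"label": 'Said "hi" </script>', "rtt_ms": 12.5}]}

# Inert-script embedding: keep the JSON raw, only break up "</".
data_json = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
page = f'<script id="data" type="application/json">{data_json}</script>'

# The script text reaches the parser unchanged, and "\/" is a valid JSON escape.
assert json.loads(data_json) == data

# html.escape() rewrites quotes as entities, which are not valid JSON.
escaped = html.escape(data_json)
try:
    json.loads(escaped)
except json.JSONDecodeError as exc:
    print("escaped payload is not JSON:", exc.msg)
```

Replacing "</" with "<\/" is what prevents a string inside the data from closing the script element early, while still parsing back to the original value.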