docs(hive-routing): fleet SSH reliability — password off, agent keys, PF limits #246

clawdie merged 1 commit from fix/ssh-agent-keys-persistence into main 2026-06-27 23:51:11 +02:00

@ -398,3 +398,51 @@ T2.x Eval Harness 📋 Task success measurement
```
The key insight: local LLM is the **ultimate cache-hit token**. Every token generated on a beefy node's GPU is $0.0000. The routing engine's job is to maximize the use of $0 tokens without compromising task success rates.
## Fleet SSH reliability
Three short configs that prevent SSH interruptions and ksshaskpass popups
on fleet nodes:
### 1. Disable password auth — no brute-force surface
When a fleet node connects and its key is not accepted on the first attempt,
the client falls back to password authentication if sshd still offers it.
Fail2ban counts each failed password attempt and bans the source IP after
`maxretry` failures. With password auth off, there is no password prompt to
brute-force:
```sh
# /etc/ssh/sshd_config
PasswordAuthentication no
```
Caveat: with password auth disabled, a node is reachable only by key. If the
matching private key is lost, recovery needs physical or console access.
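One way to confirm the setting took effect is to inspect the effective configuration that `sshd -T` prints, with keywords in lowercase. The sketch below assumes that output format; `password_auth_off` is a hypothetical helper name, not part of OpenSSH.

```sh
#!/bin/sh
# Hypothetical helper: reads `sshd -T` output on stdin and succeeds
# only if a line reads exactly "passwordauthentication no".
password_auth_off() {
    grep -qix 'passwordauthentication no'
}

# Usage on a node (needs root):
#   sshd -t && sshd -T | password_auth_off && echo "password auth is off"
```

`sshd -t` only checks the syntax of the config file; sshd still has to be reloaded before a changed `sshd_config` applies.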
### 2. Auto-add keys to agent — no ksshaskpass popups
When `ssh-agent` holds no identities, Kitty SSH triggers ksshaskpass on
reconnect. `AddKeysToAgent yes` adds a key to the running agent the first
time it is used, so later connections reuse it without prompting:
```
# ~/.ssh/config
Host *
    AddKeysToAgent yes
```
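To check whether the agent is empty (the state that leads to the popup), the exit status of `ssh-add -l` can be used. Per the OpenSSH manual it is 0 on success, 1 when the command fails (which includes an agent with no identities), and 2 when the agent cannot be contacted. `agent_state` is a hypothetical helper for illustration only.

```sh
#!/bin/sh
# Hypothetical helper: map the exit status of `ssh-add -l` to a label.
agent_state() {
    case "$1" in
        0) echo "keys-loaded" ;;
        1) echo "no-identities" ;;
        2) echo "agent-unreachable" ;;
        *) echo "unknown" ;;
    esac
}

# Usage:
#   ssh-add -l >/dev/null 2>&1; agent_state "$?"
```

A result of `agent-unreachable` suggests that no agent is running or `SSH_AUTH_SOCK` is unset, in which case `AddKeysToAgent` has nothing to add keys to.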
### 3. FreeBSD: PF rate limiting
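On FreeBSD nodes, `max-src-conn-rate 5/60` limits each source address to 5
new connections per 60 seconds. Offenders are added to the `<ssh_brutes>`
table and their existing states are flushed, which provides the same
protection independently of fail2ban. PF accepts these options only inside a
`keep state (...)` list, and the table has no effect until a `block` rule
references it:
```sh
# /etc/pf.conf
table <ssh_brutes> persist
# Drop sources that already tripped the rate limit
block in quick from <ssh_brutes>
# More than 5 new connections in 60 s moves the source into <ssh_brutes>
pass in proto tcp to port 22 keep state \
    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
```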
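A minimal sketch for checking the ruleset and the table with `pfctl`. It assumes `pfctl -t ssh_brutes -T show` prints one address per line, possibly indented; `is_listed` is a hypothetical helper, and `203.0.113.7` is a documentation-only example address.

```sh
#!/bin/sh
# Hypothetical helper: reads `pfctl -t ssh_brutes -T show` output on stdin
# and succeeds if the address given as $1 appears in it.
is_listed() {
    awk -v ip="$1" '$1 == ip { found = 1 } END { exit !found }'
}

# Usage on a FreeBSD node (needs root):
#   pfctl -nf /etc/pf.conf    # parse the ruleset without loading it
#   pfctl -f /etc/pf.conf     # load the ruleset
#   pfctl -t ssh_brutes -T show | is_listed 203.0.113.7 && echo "banned"
#   pfctl -t ssh_brutes -T delete 203.0.113.7    # lift a ban manually
```

Entries stay in the table until they are removed, so a fleet node that trips the limit by accident has to be deleted from `<ssh_brutes>` before it can reconnect.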
---
→ [fail2ban-tailscale skill](../../.agent/skills/fail2ban-tailscale/SKILL.md)
→ [freebsd-admin skill](../../.agent/skills/freebsd-admin/SKILL.md)