diff --git a/.agent/skills/fail2ban-tailscale/SKILL.md b/.agent/skills/fail2ban-tailscale/SKILL.md
new file mode 100644
index 0000000..4175689
--- /dev/null
+++ b/.agent/skills/fail2ban-tailscale/SKILL.md
@@ -0,0 +1,123 @@
+---
+name: fail2ban-tailscale
+description: "Prevent fail2ban from banning fleet SSH traffic. Root cause: password auth enabled triggers password-fallback failures during key negotiation. Fix: disable password auth or whitelist fleet IPs."
+platforms: [linux]
+---
+
+# fail2ban & Fleet SSH Reliability
+
+## Root cause
+
+When a fleet node connects via SSH and the key doesn't match on the first
+attempt, the session falls back to password authentication. Those password
+failures accumulate in fail2ban's counters. Once they reach `maxretry = 5`,
+the source Tailscale IP is banned, breaking all fleet SSH to that node.
+
+The trigger is NOT a brute-force attack. It's the key negotiation
+sequence between trusted nodes during normal fleet operation.
+
+## Fix: choose one path
+
+### Path A: Disable password auth (recommended if key-only)
+
+One setting, permanent. It removes the attack surface entirely, because
+no password attempts means no fail2ban bans:
+
+```sh
+# Set PasswordAuthentication to "no", whether or not the line is commented out
+sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
+# Validate the config before reloading (the unit is named `ssh` on Debian/Ubuntu)
+sudo sshd -t && sudo systemctl reload sshd
+```
+
+Pros: Zero ongoing maintenance. Works for all hosts, known or unknown.
+No IP lists to update. fail2ban becomes irrelevant for SSH.
+
+Cons: Password login is disabled. If a node loses its private key,
+physical/console access is needed. Not suitable for OOTB setups that
+need password auth.
+
+Verification:
+
+```sh
+ssh -o PreferredAuthentications=password localhost
+# Should fail: "Permission denied (publickey)"
+```
+
+### Path B: Whitelist specific fleet IPs (if password auth must stay on)
+
+For nodes that need password auth (OOTB state, temporary access, shared
+machines).
+Whitelist only known fleet nodes. Do NOT whitelist the entire
+`100.64.0.0/10` (that trusts every Tailscale device on any
+tailnet):
+
+```sh
+# Get fleet IPs from any node:
+tailscale status | awk '/active|idle/{print $1}'
+
+# Write the jail config (note: tee overwrites any existing jail.local)
+echo '[DEFAULT]
+ignoreip = 127.0.0.1/8 ::1 100.72.229.63 100.103.255.41 100.73.44.93 100.108.235.54
+
+[sshd]
+enabled = true' | sudo tee /etc/fail2ban/jail.local && sudo systemctl reload fail2ban
+```
+
+Pros: Password auth stays usable for operators.
+
+Cons: Manual maintenance. New node IPs must be added on join, and IP
+changes require updates. If the list goes stale, the ban returns.
+
+### Path C: Both (production hardening)
+
+Two independent controls: if someone accidentally re-enables passwords,
+the whitelist still protects; if the whitelist misses a node, key-only
+auth still blocks brute-force. Apply both Path A and Path B.
+
+## What happens without this
+
+The symptom is `Connection refused` on port 22, even when:
+
+- `sshd` is running and listening on `0.0.0.0:22`
+- `ufw`/`iptables` allows port 22
+- `tailscale ping` works (35ms pong)
+
+The fail2ban ban targets the Tailscale IP: the node appears reachable,
+but SSH from that IP is rejected at the kernel level by the firewall
+rule fail2ban inserted, before it ever reaches `sshd`.
+
+## FreeBSD equivalent: PF rate limiting
+
+FreeBSD nodes don't use fail2ban. The equivalent is PF SSH rate limiting
+with `max-src-conn-rate` and an overload table:
+
+```pf
+# /etc/pf.conf
+table <ssh_brutes> persist
+
+# Block banned sources first: the pass rule below is `quick`, so a block
+# placed after it would never be evaluated for SSH on tailscale0
+block quick from <ssh_brutes>
+
+pass in quick on tailscale0 proto tcp from any to any port = ssh \
+    flags S/SA keep state \
+    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
+```
+
+5 new connections per 60 seconds per source IP. Exceeding the limit adds
+the source to `<ssh_brutes>`. PF does not age table entries out by
+itself, so the 10-minute block assumes a periodic
+`pfctl -t ssh_brutes -T expire 600` (for example from cron). Established
+connections aren't counted, only new TCP handshakes.
+
+Manual unban:
+
+```sh
+sudo pfctl -t ssh_brutes -T delete 100.72.229.63
+```
+
+## Platform summary
+
+| Platform     | Tool     | Fix                                            |
+| ------------ | -------- | ---------------------------------------------- |
+| Linux        | fail2ban | Path A (password off) or Path B (IP whitelist) |
+| FreeBSD      | PF       | `max-src-conn-rate` + overload table           |
+| Mother (osa) | PF       | `max-src-conn-rate` on tailscale0 SSH rule     |
+
+## Related
+
+- `freebsd-admin`: PF rule management, `max-src-conn-rate` SSH rate limiting
+- `mother-hive` wiki: per-node SSH key strategy, forced-command confinement
+- `hive-routing` wiki: fleet communication reliability
diff --git a/.agent/skills/freebsd-admin/SKILL.md b/.agent/skills/freebsd-admin/SKILL.md
index b37cc77..1bc74a0 100644
--- a/.agent/skills/freebsd-admin/SKILL.md
+++ b/.agent/skills/freebsd-admin/SKILL.md
@@ -56,11 +56,46 @@ For update-status questions, use the existing read-only hostd audit ops
 the sysadmin update-report path. Do not expose `freebsd-update fetch` or run
 mutating update commands for a status report.
 
-## Tailscale controlplane exposure
+## SSH & service exposure (PF rules)
 
-When the controlplane API/dashboard is only exposed on Tailscale:
+### Controlplane service ports
+
+When the controlplane API/dashboard is exposed on Tailscale:
 
 - allow `tailscale0` ingress to ports `3100` (direct API) and `443` (nginx proxy)
+
+### SSH rate limiting (FreeBSD equivalent of fail2ban)
+
+FreeBSD doesn't use fail2ban.
+PF handles SSH brute-force protection with
+`max-src-conn-rate` and an overload table:
+
+```pf
+# /etc/pf.conf
+table <ssh_brutes> persist
+
+# Block banned sources first: the pass rule below is `quick`, so a block
+# placed after it would never be evaluated for SSH on tailscale0
+block quick from <ssh_brutes>
+
+pass in quick on tailscale0 proto tcp from any to any port = ssh \
+    flags S/SA keep state \
+    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
+```
+
+- `5/60`: 5 new connections per 60 seconds per source IP
+- `overload`: source added to the `<ssh_brutes>` table when the rate is exceeded
+- `flush global`: kills every existing state from the offending source, whichever rule created it
+- `keep state`: only new TCP handshakes count; existing sessions are free
+- expiry: PF does not age table entries out by itself; a 600-second (10 min) ban assumes a periodic `pfctl -t ssh_brutes -T expire 600`, for example from cron
+
+Manual operations:
+
+```sh
+sudo pfctl -t ssh_brutes -T show                  # list banned IPs
+sudo pfctl -t ssh_brutes -T delete 100.72.229.63  # unban specific IP
+sudo pfctl -t ssh_brutes -T expire 600            # remove entries older than 600 s
+sudo pfctl -t ssh_brutes -T flush                 # clear all bans
+```
+
+For the Linux fleet fail2ban equivalent, see
+[fail2ban-tailscale skill](../fail2ban-tailscale/SKILL.md).
+
 - validate PF before reload (`sudo pfctl -nf /etc/pf.conf`) and then `sudo service pf reload`
 
 ## Workflow