2026-06-28 00:23:58 +02:00
2 changed files with 160 additions and 2 deletions
--- a/.agent/skills/fail2ban-tailscale/SKILL.md
+++ b/.agent/skills/fail2ban-tailscale/SKILL.md
@ -0,0 +1,123 @@
+---
+name: fail2ban-tailscale
+description: "Prevent fail2ban from banning fleet SSH traffic. Root cause: password auth enabled triggers password-fallback failures during key negotiation. Fix: disable password auth or whitelist fleet IPs."
+platforms: [linux]
+---
+
+# fail2ban & Fleet SSH Reliability
+
+## Root cause
+
+When a fleet node connects via SSH and the server rejects its key, the
+client falls back to password authentication (if `sshd` offers it). Those
+failures accumulate in fail2ban's counters. After `maxretry = 5` failures,
+the source Tailscale IP is banned, breaking all fleet SSH from that IP.
+
+The trigger is NOT a brute-force attack. It's the key negotiation
+sequence between trusted nodes during normal fleet operation.
+
+## Fix — choose one path
+
+### Path A: Disable password auth (recommended if key-only)
+
+One line, permanent. With password auth off there are no password
+attempts, so the fallback failures that trigger the ban never occur:
+
+```sh
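+# Only rewrites an existing (possibly commented) line; sshd_config.d drop-ins may override it.
+sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
+# Validate before reloading; the unit is named `ssh` on Debian/Ubuntu.
+sudo sshd -t && sudo systemctl reload sshd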
+```
+
+Pros: Zero ongoing maintenance. Works for all hosts, known or unknown.
+No IP lists to update. fail2ban no longer sees password failures from the fleet.
+
+Cons: Password login is disabled. If a node loses its private key,
+physical/console access is needed. Not suitable for OOTB setups that
+need password auth.
+
+Verification:
+
+```sh
+ssh -o PreferredAuthentications=password localhost
+# Should fail: "Permission denied (publickey)"
+```
+
+### Path B: Whitelist specific fleet IPs (if password auth must stay on)
+
+For nodes that need password auth (OOTB state, temporary access, shared
+machines). Whitelist only known fleet nodes. Do NOT whitelist the
+entire `100.64.0.0/10`, the whole CGNAT range Tailscale allocates from,
+because that trusts every address in it rather than just this fleet:
+
+```sh
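+# Get fleet IPs from any node (matches only peers shown as active or idle):
+tailscale status | awk '/active|idle/{print $1}'
+
+# NOTE: tee overwrites any existing /etc/fail2ban/jail.local.
+echo '[DEFAULT]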
+ignoreip = 127.0.0.1/8 ::1 100.72.229.63 100.103.255.41 100.73.44.93 100.108.235.54
+
+[sshd]
+enabled = true' | sudo tee /etc/fail2ban/jail.local && sudo systemctl reload fail2ban
+```
+
+Pros: Password auth stays usable for operators.
+
+Cons: Manual maintenance — add new node IPs on join. IP changes
+require updates. Forgetting to update → ban returns.
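+
+When a node joins, the running jail can be updated without waiting for a
+config edit. A minimal sketch, assuming the jail is named `sshd` as above;
+`100.101.102.103` is a placeholder address, and `addignoreip` only changes
+the running jail, so `jail.local` still needs the same entry:
+
+```sh
+# Show the addresses the running sshd jail currently ignores.
+sudo fail2ban-client get sshd ignoreip
+
+# Add a newly joined node at runtime (placeholder IP; not persisted across restarts).
+sudo fail2ban-client set sshd addignoreip 100.101.102.103
+```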
+
+### Path C: Both (production hardening)
+
+Two independent controls: if someone accidentally re-enables passwords,
+the whitelist still keeps fleet IPs from being banned; if the whitelist
+misses a node, key-only auth still prevents the password failures that
+cause the ban. Apply both Path A and Path B.
+
+## What happens without this
+
+The symptom is `Connection refused` on port 22, even when:
+
+- `sshd` is running and listening on `0.0.0.0:22`
+- `ufw`/`iptables` allows port 22
+- `tailscale ping` works (35ms pong)
+
+The fail2ban ban targets the Tailscale IP, so the node appears reachable
+but SSH connections are rejected by the firewall rule fail2ban installed.
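+
+To confirm that a ban is the cause and lift it, something like the
+following should work on the affected node, run from a console or an
+unbanned host (a sketch, assuming the jail is named `sshd`; the IP is
+the example fleet address used elsewhere in this skill):
+
+```sh
+# List currently banned IPs for the sshd jail.
+sudo fail2ban-client status sshd
+
+# Unban one fleet IP.
+sudo fail2ban-client set sshd unbanip 100.72.229.63
+```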
+
+## FreeBSD equivalent — PF rate limiting
+
+FreeBSD nodes don't use fail2ban. The equivalent is PF SSH rate limiting
+with `max-src-conn-rate` and an overload table:
+
+```pf
+# /etc/pf.conf
+table <ssh_brutes> persist
+
+# Must precede the pass rule: with `quick`, the first matching rule wins.
+block quick from <ssh_brutes>
+
+pass in quick on tailscale0 proto tcp from any to any port = ssh \
+    flags S/SA keep state \
+    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
+```
+
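+5 new connections per 60 seconds per source IP. Exceeding adds the
+source to `<ssh_brutes>`, and `flush global` kills its existing states.
+The entry stays until it is removed; PF does not age table entries out
+on its own. Only new connections count toward the rate; established
+sessions aren't counted.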
+
+Manual unban:
+
+```sh
+sudo pfctl -t ssh_brutes -T delete 100.72.229.63
+```
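+
+For bans to lift after roughly 10 minutes, something has to expire the
+table entries. A sketch, assuming it is run periodically (for example
+from root's crontab; the schedule is left to the operator):
+
+```sh
+# Remove entries that have been in the table for more than 600 seconds.
+sudo pfctl -t ssh_brutes -T expire 600
+```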
+
+## Platform summary
+
+| Platform     | Tool     | Fix                                            |
+| ------------ | -------- | ---------------------------------------------- |
+| Linux        | fail2ban | Path A (password off) or Path B (IP whitelist) |
+| FreeBSD      | PF       | `max-src-conn-rate` + overload table           |
+| Mother (osa) | PF       | `max-src-conn-rate` on tailscale0 SSH rule     |
+
+## Related
+
+- `freebsd-admin` — PF rule management, `max-src-conn-rate` SSH rate limiting
+- `mother-hive` wiki — per-node SSH key strategy, forced-command confinement
+- `hive-routing` wiki — fleet communication reliability
--- a/.agent/skills/freebsd-admin/SKILL.md
+++ b/.agent/skills/freebsd-admin/SKILL.md
@ -56,11 +56,46 @@ For update-status questions, use the existing read-only hostd audit ops
 the sysadmin update-report path. Do not expose `freebsd-update fetch` or run
 mutating update commands for a status report.

-## Tailscale controlplane exposure
+## SSH & service exposure (PF rules)

-When the controlplane API/dashboard is only exposed on Tailscale:
+### Controlplane service ports
+
+When the controlplane API/dashboard is exposed on Tailscale:

 - allow `tailscale0` ingress to ports `3100` (direct API) and `443` (nginx proxy)
+
+### SSH rate limiting (FreeBSD equivalent of fail2ban)
+
+FreeBSD doesn't use fail2ban. PF handles SSH brute-force protection with
+`max-src-conn-rate` and an overload table:
+
+```pf
+# /etc/pf.conf
+table <ssh_brutes> persist
+
+# Must precede the pass rule: with `quick`, the first matching rule wins.
+block quick from <ssh_brutes>
+
+pass in quick on tailscale0 proto tcp from any to any port = ssh \
+    flags S/SA keep state \
+    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
+```
+
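+- `5/60`: 5 new connections per 60 seconds per source IP
+- `overload`: source added to `<ssh_brutes>` table on exceed
+- `flush global`: kills all existing states from the offending source; it
+  does not expire table entries, which stay until removed with
+  `pfctl -T delete`, `-T flush`, or `-T expire 600`
+- `keep state`: creates the state that `max-src-conn-rate` tracks; only
+  new connections count, existing sessions are free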
+
+Manual operations:
+
+```sh
+sudo pfctl -t ssh_brutes -T show       # list banned IPs
+sudo pfctl -t ssh_brutes -T delete 100.72.229.63  # unban specific IP
+sudo pfctl -t ssh_brutes -T flush      # clear all bans
+```
+
+For the Linux fleet fail2ban equivalent, see
+[fail2ban-tailscale skill](../fail2ban-tailscale/SKILL.md).
+
 - validate PF before reload (`sudo pfctl -nf /etc/pf.conf`) and then `sudo service pf reload`

 ## Workflow