fix/skills-pf-validate-cleanup #250
2 changed files with 160 additions and 2 deletions
123
.agent/skills/fail2ban-tailscale/SKILL.md
Normal file
@ -0,0 +1,123 @@
---
name: fail2ban-tailscale
description: "Prevent fail2ban from banning fleet SSH traffic. Root cause: password auth enabled triggers password-fallback failures during key negotiation. Fix: disable password auth or whitelist fleet IPs."
platforms: [linux]
---

# fail2ban & Fleet SSH Reliability

## Root cause

When a fleet node connects via SSH and its key is not accepted on the first
attempt, the session falls back to password authentication. Each failed
password attempt increments fail2ban's failure counter. Once the count
reaches `maxretry = 5`, fail2ban bans the source Tailscale IP, which breaks
all fleet SSH to that node.

The trigger is NOT a brute-force attack. It is the key negotiation
sequence between trusted nodes during normal fleet operation.

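
To make the counting concrete, here is a minimal sketch that tallies failed password attempts per source IP and reports which sources would reach `maxretry`. The log lines are a hypothetical sample (hostnames, users, and ports are made up), the function names are illustrative, and the model ignores fail2ban's `findtime` window, so treat it as an approximation of the behaviour described above rather than fail2ban's own logic.

```sh
# Hypothetical sample of sshd log lines; real entries live in the journal or
# /var/log/auth.log and their wording can vary between OpenSSH versions.
sample_log='Jan 10 12:00:01 node1 sshd[811]: Failed password for ops from 100.72.229.63 port 50122 ssh2
Jan 10 12:00:03 node1 sshd[811]: Failed password for ops from 100.72.229.63 port 50122 ssh2
Jan 10 12:00:05 node1 sshd[811]: Failed password for ops from 100.72.229.63 port 50122 ssh2
Jan 10 12:01:10 node1 sshd[822]: Failed password for ops from 100.72.229.63 port 50140 ssh2
Jan 10 12:01:12 node1 sshd[822]: Failed password for ops from 100.72.229.63 port 50140 ssh2
Jan 10 12:02:00 node1 sshd[830]: Failed password for ops from 100.103.255.41 port 41822 ssh2
Jan 10 12:02:04 node1 sshd[830]: Accepted publickey for ops from 100.103.255.41 port 41822 ssh2'

# Print "<count> <ip>" for every source with failed password attempts.
count_failures() {
    awk '/Failed password/ {
        # The source address is the word that follows "from".
        for (i = 1; i < NF; i++) if ($i == "from") { n[$(i + 1)]++; break }
    }
    END { for (ip in n) print n[ip], ip }' | sort -rn
}

# Print only the sources whose failure count reached the given threshold.
would_ban() {
    count_failures | awk -v max="$1" '$1 >= max { print $2 }'
}

printf '%s\n' "$sample_log" | count_failures
printf '%s\n' "$sample_log" | would_ban 5
```

In this sample the first node accumulates five failures across two ordinary connection attempts, which is enough to reach the threshold without any attacker being involved.
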
## Fix: choose one path

### Path A: Disable password auth (recommended if key-only)

Two commands, permanent. Removes the password attack surface entirely: with
no password attempts, there are no password failures for fail2ban to count:

```sh
# Force "PasswordAuthentication no", whether the line was commented out or set to yes.
sudo sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Validate the config before reloading so a bad edit cannot take sshd down.
sudo sshd -t && sudo systemctl reload sshd
```

Pros: Zero ongoing maintenance. Works for all hosts, known or unknown.
No IP lists to update. fail2ban no longer has password failures to count
for SSH.

Cons: Password login is disabled. If a node loses its private key,
physical/console access is needed. Not suitable for OOTB setups that
need password auth.

Verification:

```sh
sudo sshd -T | grep -i '^passwordauthentication'   # expect: passwordauthentication no
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no localhost
# Should fail: "Permission denied (publickey)"
```

### Path B: Whitelist specific fleet IPs (if password auth must stay on)

Use this for nodes that need password auth (OOTB state, temporary access,
shared machines). Whitelist only known fleet nodes. Do NOT whitelist the
entire `100.64.0.0/10` range: that trusts every Tailscale device on any
tailnet.

```sh
# Get fleet IPs from any node:
tailscale status | awk '/active|idle/{print $1}'

# Write the whitelist. Note: tee overwrites any existing jail.local.
echo '[DEFAULT]
ignoreip = 127.0.0.1/8 ::1 100.72.229.63 100.103.255.41 100.73.44.93 100.108.235.54

[sshd]
enabled = true' | sudo tee /etc/fail2ban/jail.local && sudo systemctl reload fail2ban
```

Pros: Password auth stays usable for operators.

Cons: Manual maintenance. Add each new node's IP when it joins, and update
the list whenever an IP changes. If the list goes stale, the bans return.

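
Because the list has to be regenerated whenever the fleet changes, it may help to build the `ignoreip` line from `tailscale status` output instead of typing it by hand. The sketch below reuses the `active|idle` filter from the command above; the `build_ignoreip` name and the sample output are hypothetical, and the column layout of `tailscale status` is assumed from the filter, so check it against a real node before relying on it.

```sh
# Hypothetical sample of `tailscale status` output (names and peers are made up).
sample_status='100.72.229.63   node-a  ops@  linux  -
100.103.255.41  node-b  ops@  linux  active; direct 203.0.113.7:41641, tx 1204 rx 988
100.73.44.93    node-c  ops@  linux  idle, tx 220 rx 140
100.108.235.54  node-d  ops@  linux  offline'

# Read `tailscale status` output on stdin and print a fail2ban ignoreip line
# containing loopback plus every peer reported as active or idle.
build_ignoreip() {
    awk 'BEGIN { printf "ignoreip = 127.0.0.1/8 ::1" }
         # Keep only lines whose first column looks like an IPv4 address.
         /active|idle/ && $1 ~ /^[0-9.]+$/ { printf " %s", $1 }
         END { printf "\n" }'
}

printf '%s\n' "$sample_status" | build_ignoreip
```

On a live node the equivalent call would be `tailscale status | build_ignoreip`. Note that the filter skips offline peers and the local node itself, so review the result before writing it to `jail.local`.
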
### Path C: Both (production hardening)

Apply both Path A and Path B for two independent controls. If someone
accidentally re-enables passwords, the whitelist still protects fleet nodes
from bans; if the whitelist misses a node, key-only auth still blocks
brute-force attempts.

## What happens without this

The symptom is `Connection refused` on port 22, even when:

- `sshd` is running and listening on `0.0.0.0:22`
- `ufw`/`iptables` allows port 22
- `tailscale ping` works (35ms pong)

The fail2ban ban targets the Tailscale IP, so the node appears reachable
but its SSH connections are rejected by the firewall rule that fail2ban
installed, before they ever reach `sshd`.

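
One way to confirm that a ban is the cause is to inspect the jail's status. `fail2ban-client status sshd` and `fail2ban-client set sshd unbanip <ip>` are existing fail2ban commands; the `is_banned` helper and the abbreviated sample output below are illustrative, and the exact layout of the status output is an assumption that may differ between fail2ban versions.

```sh
# Abbreviated, hypothetical sample of `fail2ban-client status sshd` output.
sample_status='Status for the jail: sshd
|- Filter
|  |- Currently failed: 0
|  |- Total failed:     5
|- Actions
   |- Currently banned: 1
   |- Total banned:     1
   |- Banned IP list:   100.72.229.63'

# Read jail status on stdin; exit 0 if the given IP is on the banned list.
is_banned() {
    awk -v ip="$1" '
        /Banned IP list:/ {
            # Strip everything up to the label, then compare each listed address.
            sub(/.*Banned IP list:[ \t]*/, "")
            n = split($0, banned, /[ \t]+/)
            for (i = 1; i <= n; i++) if (banned[i] == ip) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

if printf '%s\n' "$sample_status" | is_banned 100.72.229.63; then
    echo "100.72.229.63 is banned"
fi
```

On a live node this could be combined with the unban command, for example `sudo fail2ban-client status sshd | is_banned 100.72.229.63 && sudo fail2ban-client set sshd unbanip 100.72.229.63`.
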
## FreeBSD equivalent: PF rate limiting

FreeBSD nodes don't use fail2ban. The equivalent is PF SSH rate limiting
with `max-src-conn-rate` and an overload table:

```pf
# /etc/pf.conf
table <ssh_brutes> persist

# Must come before the pass rule: with `quick`, the first matching rule wins.
block quick from <ssh_brutes>

pass in quick on tailscale0 proto tcp from any to any port = ssh \
    flags S/SA keep state \
    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
```

The rule allows 5 new connections per 60 seconds per source IP. A source
that exceeds the rate is added to `<ssh_brutes>`, and `flush global` kills
its existing states. Only new TCP handshakes are counted, so established
connections do not push a source over the limit. Entries stay in the table
until they are removed, either manually or with a periodic
`pfctl -t ssh_brutes -T expire 600` (for example from cron) to get a
10-minute ban.

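
To illustrate what "5 new connections per 60 seconds per source" means in practice, the sketch below applies a simplified sliding-window count to a list of connection timestamps. This is a model for intuition only: PF's internal rate accounting is implemented differently, and the `flag_overload` name and the sample data are hypothetical.

```sh
# Hypothetical connection log: "<seconds> <source-ip>", sorted by time.
sample_conns='0 100.72.229.63
0 100.103.255.41
5 100.72.229.63
10 100.72.229.63
15 100.72.229.63
20 100.72.229.63
25 100.72.229.63
30 100.103.255.41
70 100.103.255.41'

# Print each source the first time it opens more than <limit> new connections
# within any <window>-second span. Usage: flag_overload <limit> <window>
flag_overload() {
    awk -v limit="$1" -v window="$2" '
        {
            t = $1 + 0; ip = $2
            times[ip, ++count[ip]] = t
            if (!(ip in start)) start[ip] = 1
            # Slide the window start past connections older than <window> seconds.
            while (times[ip, start[ip]] <= t - window) start[ip]++
            in_window = count[ip] - start[ip] + 1
            if (in_window > limit && !(ip in flagged)) { flagged[ip] = 1; print ip }
        }'
}

printf '%s\n' "$sample_conns" | flag_overload 5 60
```

The first source opens six connections in 25 seconds and is flagged on the sixth; the second opens three connections spread over 70 seconds and stays under the limit.
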
Manual unban:

```sh
sudo pfctl -t ssh_brutes -T delete 100.72.229.63
```

## Platform summary

| Platform     | Tool     | Fix                                            |
| ------------ | -------- | ---------------------------------------------- |
| Linux        | fail2ban | Path A (password off) or Path B (IP whitelist) |
| FreeBSD      | PF       | `max-src-conn-rate` + overload table           |
| Mother (osa) | PF       | `max-src-conn-rate` on tailscale0 SSH rule     |

## Related

- `freebsd-admin`: PF rule management, `max-src-conn-rate` SSH rate limiting
- `mother-hive` wiki: per-node SSH key strategy, forced-command confinement
- `hive-routing` wiki: fleet communication reliability
@ -56,11 +56,46 @@ For update-status questions, use the existing read-only hostd audit ops
the sysadmin update-report path. Do not expose `freebsd-update fetch` or run
mutating update commands for a status report.

-## Tailscale controlplane exposure
## SSH & service exposure (PF rules)

-When the controlplane API/dashboard is only exposed on Tailscale:
### Controlplane service ports

When the controlplane API/dashboard is exposed on Tailscale:

- allow `tailscale0` ingress to ports `3100` (direct API) and `443` (nginx proxy)

### SSH rate limiting (FreeBSD equivalent of fail2ban)

FreeBSD doesn't use fail2ban. PF handles SSH brute-force protection with
`max-src-conn-rate` and an overload table:

```pf
# /etc/pf.conf
table <ssh_brutes> persist

# Must come before the pass rule: with `quick`, the first matching rule wins.
block quick from <ssh_brutes>

pass in quick on tailscale0 proto tcp from any to any port = ssh \
    flags S/SA keep state \
    (max-src-conn-rate 5/60, overload <ssh_brutes> flush global)
```

- `5/60`: 5 new connections per 60 seconds per source IP
- `overload`: source added to `<ssh_brutes>` table on exceed
- `flush global`: kills all existing states from the offending source; table entries persist until removed, e.g. with `pfctl -t ssh_brutes -T expire 600`
- `keep state`: creates the state entries PF uses to track connections per source; only new TCP handshakes count toward the rate

Manual operations:

```sh
sudo pfctl -t ssh_brutes -T show                  # list banned IPs
sudo pfctl -t ssh_brutes -T delete 100.72.229.63  # unban specific IP
sudo pfctl -t ssh_brutes -T flush                 # clear all bans
sudo pfctl -t ssh_brutes -T expire 600            # drop entries added more than 10 min ago (e.g. from cron)
```

For the Linux fleet fail2ban equivalent, see
[fail2ban-tailscale skill](../fail2ban-tailscale/SKILL.md).

- validate PF before reload (`sudo pfctl -nf /etc/pf.conf`) and then `sudo service pf reload`

## Workflow