# Clawdie-AI Networking & Firewall
**Version:** v0.5.0
**Updated:** 12 May 2026
---
## Overview
Every Clawdie-AI instance runs PF as its firewall from first boot. The firewall
is configured by `shell-pf.sh` during `clawdie-firstboot` and is not optional —
all instances ship protected.
```
Internet
vtnet0 (or em0, igb0, etc.; auto-detected)
PF ← block all default, brute-force protection
  ├── port 22 → sshd (password + key auth, rate-limited)
  ├── port 80 → nginx (Let's Encrypt renewals)
  ├── port 443 → nginx (agent UI + HTTPS)
  └── NAT → 192.168.0.0/16 → agent jails
        └── warden0 bridge → 192.168.100.0/24
```
---
## Glasspane Operator Feature
The "glasspane" gives operators a live view of browser automation:
```
Operator (via Tailscale)
├── SSH (port 22) → tmux panes → terminal view
└── VNC (port 5900) → wayvnc → cage → browser → visual browser view
```
**Security model:**
- VNC (5900) blocked on public interface
- VNC only accessible via Tailscale interface
- SSH (22) also moves to Tailscale-only after Tailscale activation
- All protected behind PF with block-all default
This allows operators to watch the agent perform browser automation in real time,
debug issues visually, and verify behavior, all without exposing VNC to the public internet.
> **Architecture note:** Autonomous browser execution is handled by the browser
> jail / task-clone path in Clawdie-AI (see `docs/internal/BROWSER-JAIL.md`).
> Operator credential refresh will use host-side browser sessions via xpra over
> SSH (see `docs/internal/OPERATOR-BROWSER-ARCHITECTURE.md`). The cage/wayvnc
> path above describes the current ISO-shipped visual monitoring capability,
> not the autonomous execution surface.
---
## What firstboot sets up
`shell-pf.sh` runs during firstboot and:
1. **Detects ext_if** via `route -n get default` — no hardcoded interface names
2. **Creates agent bridge** `warden0` at `192.168.100.1/24` (matches the Clawdie-AI bridge naming convention)
3. **Writes `/etc/pf.conf`** with block-all default, SSH protection, jail NAT
4. **Installs `pf_reload`** rc.d service — see cold boot race below
5. **Enables PF** via rc.conf
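
Step 1 can be sketched as follows. This is a minimal illustration, not the code from `shell-pf.sh`: the helper name `parse_ext_if` is hypothetical, the sample text only imitates the shape of FreeBSD's `route -n get default` output, and the gateway address is a documentation placeholder.

```sh
#!/bin/sh
# Hypothetical helper (not taken from shell-pf.sh): read the output of
# `route -n get default` on stdin and print the interface name.
parse_ext_if() {
    awk '$1 == "interface:" { print $2; exit }'
}

# Illustrative sample shaped like FreeBSD `route -n get default` output.
sample='   route to: default
destination: default
    gateway: 203.0.113.1
  interface: vtnet0'

# Prints the interface name found in the sample.
printf '%s\n' "$sample" | parse_ext_if

# On a live host the same helper would be fed by the real command:
#   ext_if=$(route -n get default | parse_ext_if)
```

If the host has no default route, the helper prints nothing, so a caller would need to treat an empty `ext_if` as an error before writing `/etc/pf.conf`.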
---
## Agent bridge naming
The root install bridge is always `warden0`. It is not derived from
`ASSISTANT_NAME`; renaming the assistant must not rename networking
infrastructure.
| Scope | Bridge | Default subnet |
| --------------------- | --------- | ------------------ |
| Root install | `warden0` | `192.168.100.0/24` |
| Future extra bridge 1 | `warden1` | operator-assigned |
| Future extra bridge 2 | `warden2` | operator-assigned |
**Multi-tenant:** PF NAT covers the entire `192.168.0.0/16` supernet. Adding a
second bridge later requires no PF NAT change — just an explicit bridge name and
a new `/24` from the pool.
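
As a rough sketch of what "a new `/24` from the pool" means in practice, the check below accepts only `/24` networks inside `192.168.0.0/16`. The helper name `in_nat_pool` and the example subnet `192.168.101.0/24` are assumptions for illustration; the helper does not detect overlap with a subnet already in use, such as `192.168.100.0/24` on `warden0`.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a /24 inside 192.168.0.0/16,
# i.e. a subnet already covered by the existing NAT rule.
in_nat_pool() {
    case "$1" in
        192.168.*.0/24) ;;
        *) return 1 ;;
    esac
    octet=${1#192.168.}      # strip the fixed prefix
    octet=${octet%.0/24}     # strip the fixed suffix, leaving the third octet
    case "$octet" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric octets
    esac
    [ "$octet" -le 255 ]
}

subnet="192.168.101.0/24"    # illustrative choice, not from the source
if in_nat_pool "$subnet"; then
    echo "$subnet is covered by the 192.168.0.0/16 NAT rule"
fi

# On the host, the bridge itself would then be created with something like:
#   ifconfig bridge create name warden1
#   ifconfig warden1 inet 192.168.101.1/24 up
```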
---
## Brute force SSH protection
PF automatically blocks source IPs that exceed either SSH connection limit:
```
max-src-conn 5 — max 5 simultaneous connections from one IP
max-src-conn-rate 3/60 — max 3 new connections per 60 seconds
overload <bruteforce> — offending IP added to the bruteforce table
flush global — existing connections from that IP killed
```
The `<bruteforce>` table persists across `pfctl -f` reloads (not across reboots).
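
For orientation, the options above typically sit together on one `pass` rule. The sketch below writes such a rule to a scratch file and parse-checks it where `pfctl` is available. It is an assumption about the general shape only: the rules that `shell-pf.sh` actually writes are not shown in this document, and `vtnet0` is a placeholder for the detected interface.

```sh
#!/bin/sh
# Write an illustrative rule fragment to a scratch file (not /etc/pf.conf).
frag="${TMPDIR:-/tmp}/pf-ssh-sketch.conf"

cat > "$frag" <<'EOF'
ext_if = "vtnet0"            # placeholder; firstboot detects the real name
table <bruteforce> persist

block in quick from <bruteforce>
pass in on $ext_if proto tcp to port 22 flags S/SA keep state \
    (max-src-conn 5, max-src-conn-rate 3/60, \
     overload <bruteforce> flush global)
EOF

# Parse-check only (-n loads nothing); skipped where pfctl is unavailable.
if command -v pfctl >/dev/null 2>&1; then
    pfctl -nf "$frag" || echo "parse check failed for $frag" >&2
fi
```

The `block in quick from <bruteforce>` line is what turns table membership into an actual block; without it, overloaded addresses would be recorded but still allowed to connect.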
**To inspect blocked IPs:**
```sh
pfctl -t bruteforce -T show
```
**To unblock an IP (e.g., if you locked yourself out):**
```sh
pfctl -t bruteforce -T delete 1.2.3.4
```
**To flush all blocks:**
```sh
pfctl -t bruteforce -T flush
```
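
When the table is long, it can help to check for one address rather than read the whole list. The following is a small sketch; `is_listed` is a hypothetical helper, and the sample listing uses documentation addresses in the indented one-per-line shape that `pfctl -t bruteforce -T show` prints.

```sh
#!/bin/sh
# Hypothetical helper: exit 0 if address $1 appears in a table listing
# read from stdin (one address per line, leading whitespace allowed).
is_listed() {
    awk -v ip="$1" '$1 == ip { found = 1 } END { exit !found }'
}

# Illustrative listing; the addresses are documentation placeholders.
listing='   198.51.100.7
   203.0.113.42'

if printf '%s\n' "$listing" | is_listed 203.0.113.42; then
    echo "203.0.113.42 is blocked"
fi

# On a live host:
#   pfctl -t bruteforce -T show | is_listed 203.0.113.42
```

The comparison is an exact match on the whole field, so `198.51.100.7` does not match `198.51.100.70`.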
---
## The cold boot race condition
**Problem:** On FreeBSD, PF starts before `tailscaled`:
```
pf: REQUIRE: FILESYSTEMS netif routing ← very early
tailscaled: REQUIRE: NETWORKING ← much later
```
When PF loads rules at boot, `tailscale0` does not exist yet. FreeBSD PF
resolves interface names to kernel interface indexes at rule load time. A
non-existent interface gets index `-1`. When Tailscale comes up later and
creates `tailscale0` with a real index, PF's loaded rule still holds `-1`. Because of
the mismatch, **SSH packets on `tailscale0` never match the pass rule**.
The default `block all` takes over. SSH is blocked even though Tailscale is up.
**Fix: `pf_reload` rc.d service**
`shell-pf.sh` installs `/usr/local/etc/rc.d/pf_reload` on every instance:
```
# PROVIDE: pf_reload
# REQUIRE: tailscaled ← runs after Tailscale is up
```
This service reloads `/etc/pf.conf` after `tailscaled` starts, giving PF a
chance to resolve `tailscale0` to its real kernel index. SSH via Tailscale
then works correctly on every boot.
**Pre-installed even without Tailscale.** `pf_reload` is harmless if
`tailscaled` is not installed — the rc.d script simply has no trigger.
When the agent later installs Tailscale, the race fix is already in place.
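
A minimal sketch of what such an rc.d script could look like is shown below. Only the `PROVIDE` and `REQUIRE` lines come from this document; the `KEYWORD` line, the `pf_reload_enable` knob, and the function name are assumptions, and the script installed by `shell-pf.sh` may differ. The sketch writes the script to a scratch path and syntax-checks it rather than installing it.

```sh
#!/bin/sh
# Write a sketch of the rc.d script to a scratch path for inspection.
out="${TMPDIR:-/tmp}/pf_reload.sketch"

cat > "$out" <<'EOF'
#!/bin/sh
#
# PROVIDE: pf_reload
# REQUIRE: tailscaled
# KEYWORD: nojail

. /etc/rc.subr

name="pf_reload"
rcvar="pf_reload_enable"      # assumed knob name
start_cmd="pf_reload_start"
stop_cmd=":"

pf_reload_start()
{
    # Reload the ruleset now that tailscale0 exists.
    /sbin/pfctl -f /etc/pf.conf
}

load_rc_config $name
: ${pf_reload_enable:="NO"}
run_rc_command "$1"
EOF

# Syntax check only; nothing is installed or executed.
sh -n "$out" && echo "syntax ok: $out"
```

With a script of this shape, the service would be switched on with `sysrc pf_reload_enable="YES"` (again assuming that knob name).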
---
## Adding Tailscale (agent's job)
When the agent installs Tailscale on a deployed instance:
1. `pkg install tailscale`
2. `sysrc tailscaled_enable="YES"`
3. `tailscale up --authkey=<key> --hostname=<name>`
4. Edit `/etc/pf.conf` — uncomment the Tailscale block:
```pf
tailscale_if="tailscale0"
pass in quick on $tailscale_if proto tcp to port 22 keep state # SSH via Tailscale
block in quick on $ext_if proto tcp to port 22 # Block public SSH
pass in quick on $tailscale_if proto tcp to port 5900 keep state # wayvnc glasspane
block in quick on $ext_if proto tcp to port 5900 # Block public VNC
pass in quick on $ext_if proto udp to port 41641 keep state # Tailscale WireGuard
pass in quick on $ext_if inet6 proto udp to port 41641 keep state # Tailscale WireGuard v6
```
5. Reload PF: `pfctl -f /etc/pf.conf`
After this, SSH and VNC on the public interface are blocked. Access is via Tailscale only.
`pf_reload` (already installed) handles the cold boot race automatically.
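
Because a syntax error in the edited `/etc/pf.conf` could cost SSH access, one cautious pattern is to parse-check the file before loading it. The sketch below assumes nothing beyond `pfctl -nf` and `pfctl -f`, both used elsewhere in this document; the helper name `safe_pf_reload` and the `PFCTL` override are hypothetical.

```sh
#!/bin/sh
# PFCTL is overridable so the logic can be exercised without touching PF.
PFCTL="${PFCTL:-pfctl}"

# Hypothetical helper: load the ruleset only if it parses cleanly.
safe_pf_reload() {
    conf="${1:-/etc/pf.conf}"
    if "$PFCTL" -nf "$conf"; then
        "$PFCTL" -f "$conf"
    else
        echo "pf: $conf failed validation, active rules unchanged" >&2
        return 1
    fi
}

# Usage on the host (as root), after editing the Tailscale block:
#   safe_pf_reload /etc/pf.conf
```

If validation fails, the previously loaded ruleset stays active, so an existing session is not affected by the bad edit.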
---
## Troubleshooting SSH lockout
**Symptom:** SSH connections time out or are refused; the instance is reachable only via the provider console.
**Diagnosis checklist:**
```sh
# Is PF running?
pfctl -s info | head -3
# Are you blocked by bruteforce table?
pfctl -t bruteforce -T show
# Is Tailscale up (if using Tailscale SSH)?
tailscale status
# Does tailscale0 exist?
ifconfig tailscale0
# Test pf.conf loads without error
pfctl -nf /etc/pf.conf
```
**Quick recovery (from provider console):**
```sh
# Disable PF temporarily to get SSH access
service pf stop
# Diagnose, fix, then re-enable
service pf start
```
**If locked out due to Tailscale race (cold boot):**
```sh
# Reload PF once tailscale0 is confirmed up
tailscale status # confirm connected
pfctl -f /etc/pf.conf # reload — resolves tailscale0 index
```
---
## Ports reference
| Port | Protocol | Purpose |
| ----- | -------- | ------------------------------------------------------------------- |
| 22    | TCP      | SSH (rate-limited, brute-force protected); Tailscale-only once Tailscale is activated |
| 80 | TCP | HTTP — Let's Encrypt certificate renewals |
| 443 | TCP | HTTPS — agent UI |
| 5900 | TCP | VNC — wayvnc glasspane (Tailscale-only) |
| 41641 | UDP | Tailscale WireGuard (if Tailscale installed) |
All other inbound traffic is blocked by default.