# Syncoid Setup: ZFS Replication to Backup Server

**Purpose:** Replicate Clawdie controlplane ZFS datasets to a remote backup server.

**Tool:** Syncoid (part of the Sanoid package)

**Method:** Incremental `zfs send/recv` over SSH

---

## Architecture

```
┌─────────────────────────┐         SSH          ┌─────────────────────────┐
│     controlplane        │ ───────────────────► │     backup server       │
│       (source)          │                      │       (target)          │
│                         │                      │                         │
│  zroot/                 │                      │  tank/backups/          │
│  ├── jails/             │  ──zfs send/recv──►  │  └── clawdie/           │
│  ├── pg-cache/          │                      │      ├── jails/         │
│  └── ...                │                      │      └── pg-cache/      │
└─────────────────────────┘                      └─────────────────────────┘
        10.0.0.1                                     storage.clawdie.si
```

---

## Prerequisites

### On Backup Server

1. **ZFS pool created** (e.g., `tank`)
2. **FreeBSD or Linux with ZFS**
3. **SSH accessible from controlplane**
4. **Dedicated backup user** (optional but recommended)

### On Controlplane

1. **Sanoid installed**: `pkg install sanoid`
2. **SSH key for backup user**
3. **Network connectivity to backup server**

---

## Step 1: Create Backup User (on backup server)

```bash
# On backup server

# Create dedicated backup user
pw user add -n zfsbackup -c "ZFS Backup" -s /bin/sh -m

# Create SSH directory, owned by the backup user so sshd accepts it
mkdir -p /home/zfsbackup/.ssh
chmod 700 /home/zfsbackup/.ssh
chown zfsbackup:zfsbackup /home/zfsbackup/.ssh

# Allow ZFS commands without sudo password
cat > /usr/local/etc/sudoers.d/zfsbackup << 'EOF'
zfsbackup ALL=(root) NOPASSWD: /sbin/zfs *
zfsbackup ALL=(root) NOPASSWD: /sbin/zpool *
EOF
chmod 440 /usr/local/etc/sudoers.d/zfsbackup

# Create target dataset
zfs create -o mountpoint=/backups tank/backups
zfs create tank/backups/clawdie
chown -R zfsbackup:zfsbackup /backups
```

---

## Step 2: Generate SSH Key (on controlplane)

```bash
# On controlplane

# Generate dedicated key for backup
ssh-keygen -t ed25519 -f /root/.ssh/syncoid_backup -N '' -C "syncoid@controlplane"

# Copy public key to backup server and hand the file to the backup user
cat /root/.ssh/syncoid_backup.pub | ssh root@storage.clawdie.si \
    'cat >> /home/zfsbackup/.ssh/authorized_keys &&
     chmod 600 /home/zfsbackup/.ssh/authorized_keys &&
     chown zfsbackup:zfsbackup /home/zfsbackup/.ssh/authorized_keys'

# Test connection
ssh -i /root/.ssh/syncoid_backup zfsbackup@storage.clawdie.si 'sudo zfs list'
```

### SSH Config (optional but recommended)

```bash
# /root/.ssh/config
Host backup-storage
    HostName storage.clawdie.si
    User zfsbackup
    IdentityFile /root/.ssh/syncoid_backup
    Compression yes
    ServerAliveInterval 60
    ServerAliveCountMax 3
```

Test: `ssh backup-storage 'sudo zfs list'`

---

## Step 3: Initial Full Replication

```bash
# On controlplane

# First, create a clean snapshot to replicate
zfs snapshot -r zroot@initial-backup

# Full replication (this will take a while: all data is sent).
# --skip-parent matches the scheduled job in Step 5 and avoids receiving
# zroot itself into the already existing tank/backups/clawdie.
syncoid -r --skip-parent --sshkey=/root/.ssh/syncoid_backup \
    zroot zfsbackup@storage.clawdie.si:tank/backups/clawdie

# Or using the SSH config alias (user and key come from /root/.ssh/config):
syncoid -r --skip-parent \
    zroot backup-storage:tank/backups/clawdie
```

**Note:** Initial replication sends ALL data. For large pools, this can take hours. Subsequent runs send only incremental changes (typically minutes).

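
To put a rough number on "this can take hours", the arithmetic can be sketched as below. `estimate_minutes` is a hypothetical helper, not part of Syncoid, and the link speed is an assumption you supply; real throughput also depends on disks, compression, and SSH overhead.

```bash
#!/bin/sh
# Rough duration estimate for the initial full send (a sketch, not a benchmark).
# Hypothetical helper: estimate_minutes BYTES MBIT_PER_SEC
#   BYTES         data to send, e.g. from: zfs list -Hp -o used zroot
#   MBIT_PER_SEC  sustained link throughput you expect (an assumption)
estimate_minutes() {
    bytes=$1
    bits_per_sec=$(( $2 * 1000000 ))
    # Round seconds up, then round minutes up
    secs=$(( (bytes * 8 + bits_per_sec - 1) / bits_per_sec ))
    echo $(( (secs + 59) / 60 ))
}

# Example: 500 GB over a 1 Gbit/s link
estimate_minutes 500000000000 1000   # prints 67
```

Treat the result as a lower bound: it assumes the link is the only bottleneck.
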

---

## Step 4: Verify Replication

```bash
# On backup server, verify datasets exist
sudo zfs list -r tank/backups/clawdie

# Check snapshots were replicated
sudo zfs list -t snapshot -r tank/backups/clawdie

# Compare snapshot counts. They should match, except for snapshots of
# zroot itself, which --skip-parent leaves out.
# On controlplane:
zfs list -H -t snapshot -o name -r zroot | wc -l
# On backup server:
sudo zfs list -H -t snapshot -o name -r tank/backups/clawdie | wc -l
```

---

## Step 5: Automated Replication (Cron)

```bash
# /etc/cron.d/zfs-replication
# Replicate every 4 hours
0 */4 * * * root /usr/local/bin/syncoid-backup.sh >> /var/log/syncoid.log 2>&1
```

### syncoid-backup.sh Script

```bash
#!/bin/sh
# /usr/local/bin/syncoid-backup.sh
# Replicate zroot to backup server.
# Output goes to stdout/stderr; the cron entry appends it to /var/log/syncoid.log.

SSH_KEY="/root/.ssh/syncoid_backup"
SOURCE="zroot"
TARGET="zfsbackup@storage.clawdie.si:tank/backups/clawdie"

# Print a message prefixed with a timestamp
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*"
}

log "Starting ZFS replication"

# Run syncoid with:
#   -r                         : recursive (all child datasets)
#   --no-sync-snap             : don't create new snapshots (Sanoid handles that)
#   --delete-target-snapshots  : remove snapshots on target that no longer exist on source
#   --skip-parent              : don't replicate zroot itself, just children
# Calling syncoid directly (no pipe) keeps its exit status visible.
if /usr/local/bin/syncoid \
    -r \
    --sshkey="$SSH_KEY" \
    --no-sync-snap \
    --delete-target-snapshots \
    --skip-parent \
    "$SOURCE" "$TARGET" 2>&1
then
    log "Replication complete"
else
    status=$?
    log "Replication FAILED (exit code $status)"
    exit "$status"
fi
```

```bash
chmod +x /usr/local/bin/syncoid-backup.sh
```

---

## Step 6: Test Incremental Replication

```bash
# Create a new recursive snapshot on controlplane
# (a snapshot of zroot alone would be skipped by --skip-parent)
zfs snapshot -r zroot@test-$(date +%s)

# Run syncoid manually
/usr/local/bin/syncoid-backup.sh

# Verify new snapshot appeared on backup server
ssh backup-storage 'sudo zfs list -t snapshot -r tank/backups/clawdie | tail -5'
```

---

## Step 7: Monitoring & Alerts

### Log Rotation

```bash
# /etc/newsyslog.conf.d/syncoid.conf
# (newsyslog only includes files ending in .conf from this directory)
/var/log/syncoid.log root:wheel 640 7 1024 * JC
```

### Health Check Script

```bash
#!/bin/sh
# /usr/local/bin/check-replication.sh
# Alert if replication hasn't run in 12+ hours.
# Note: the log is also written on failed runs, so this detects a job that
# stopped running, not a job that runs and fails.

LOG="/var/log/syncoid.log"
MAX_AGE_HOURS=12

if [ ! -f "$LOG" ]; then
    echo "ERROR: Syncoid log not found at $LOG"
    exit 2
fi

# Modification time in epoch seconds (BSD stat first, GNU stat as fallback)
LAST_RUN=$(stat -f %m "$LOG" 2>/dev/null || stat -c %Y "$LOG" 2>/dev/null)
NOW=$(date +%s)
AGE_HOURS=$(( (NOW - LAST_RUN) / 3600 ))

if [ "$AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then
    echo "WARNING: Replication last ran $AGE_HOURS hours ago (threshold: $MAX_AGE_HOURS)"
    exit 1
fi

echo "OK: Replication ran $AGE_HOURS hours ago"
exit 0
```

### Cron Check

```bash
# /etc/cron.d/replication-check
# Check replication health every 6 hours
0 */6 * * * root /usr/local/bin/check-replication.sh || echo "Replication check failed" | mail -s "Clawdie Backup Alert" admin@clawdie.si
```

---

## Recovery Procedure

### Full System Recovery

```bash
# On backup server: create a replication stream
SNAP="recovery-$(date +%Y%m%d)"
zfs snapshot -r "tank/backups/clawdie@$SNAP"
zfs send -R "tank/backups/clawdie@$SNAP" | ssh root@new-controlplane 'zfs recv -F zroot'
```

### Single Dataset Recovery

```bash
# On backup server: send a specific dataset
# (replace "latest" with the name of the snapshot to restore)
zfs send tank/backups/clawdie/jails/git@latest | \
    ssh root@controlplane 'zfs recv -F zroot/jails/git'
```

### File Recovery from Snapshot

```bash
# On backup server: clone the snapshot into a temporary dataset and copy the file
sudo zfs clone -o mountpoint=/mnt/recovery \
    tank/backups/clawdie@backup-20260315 tank/backups/recovery
cp /mnt/recovery/home/clawdie/important-file.txt /tmp/
sudo zfs destroy tank/backups/recovery
```

---

## Syncoid Options Reference

| Option                      | Purpose                                          |
| --------------------------- | ------------------------------------------------ |
| `-r`                        | Recursive (include child datasets)               |
| `--sshkey=PATH`             | Use specific SSH key                             |
| `--no-sync-snap`            | Don't create new snapshots (Sanoid handles this) |
| `--delete-target-snapshots` | Remove target snapshots not on source            |
| `--skip-parent`             | Don't replicate parent dataset, only children    |
| `--compress=zstd-fast`      | Use zstd compression in transit                  |
| `--quiet`                   | Suppress progress output                         |
| `--debug`                   | Print the commands Syncoid runs and other detail |

---

## Example: Full Command

```bash
# Production replication command
syncoid -r --sshkey=/root/.ssh/syncoid_backup \
    --no-sync-snap --delete-target-snapshots --skip-parent \
    --compress=zstd-fast \
    zroot backup-storage:tank/backups/clawdie
```

---

## Security Hardening

### Restrict SSH Key (on backup server)

```bash
# /home/zfsbackup/.ssh/authorized_keys
# Prepend the key with restrictions. The whole entry must stay on ONE line;
# authorized_keys does not support line continuations.
command="sudo /sbin/zfs receive -Fdu tank/backups/clawdie",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,from="10.0.0.1" ssh-ed25519 AAAA... syncoid@controlplane
```

This restricts the SSH key to:

- Only run the `zfs receive` command
- Only accept connections from the controlplane IP
- No shell access, no port forwarding

Syncoid issues several remote commands besides `zfs receive` (for example `zfs list` and `zfs get`), so a single forced command like this fits a plain `zfs send | ssh ... zfs receive` pipeline. Test it against your Syncoid job before relying on it; the `from=` and `no-*` options can be used on their own.

---

## Checklist

- [ ] Backup server has ZFS pool `tank` with dataset `tank/backups`
- [ ] `zfsbackup` user created with sudo zfs permissions
- [ ] SSH key generated on controlplane and copied to backup server
- [ ] Initial full replication completed
- [ ] Snapshots verified on backup server
- [ ] `syncoid-backup.sh` script created
- [ ] Cron job configured (every 4 hours)
- [ ] Log rotation configured
- [ ] Health check script in place
- [ ] Recovery procedure documented and tested

---

## Integration with Clawdie-AI

Add to Clawdie-AI setup:

```typescript
// In setup/host.ts or setup/backup.ts

// Create backup cron job
writeCronJob('/etc/cron.d/zfs-replication', `
# Replicate ZFS to backup server every 4 hours
0 */4 * * * root /usr/local/bin/syncoid-backup.sh >> /var/log/syncoid.log 2>&1
`);

// Create syncoid script
writeScript('/usr/local/bin/syncoid-backup.sh', syncoidBackupScript);
```

---

## Troubleshooting

### "cannot receive: destination already exists"

```bash
# Force overwrite (destructive!)
syncoid -r --force-delete zroot target:pool/backup
```

### "cannot send: not a snapshot"

```bash
# Syncoid takes a dataset as its source, not a snapshot.
# Make sure the source has at least one snapshot, then pass the dataset name.
zfs snapshot -r zroot@manual-$(date +%s)
syncoid -r --no-sync-snap zroot target:pool/backup
```

### SSH connection issues

```bash
# Test SSH manually first
ssh -i /root/.ssh/syncoid_backup -v zfsbackup@storage.clawdie.si

# Check firewall allows outbound SSH from controlplane
# Check backup server allows inbound SSH to zfsbackup user
```

### Large initial replication

```bash
# If initial replication is too slow or saturates the link, consider:
# 1. Use --compress=zstd-fast or --compress=lz4
# 2. Run during off-peak hours
# 3. Limit bandwidth: trickle -u 10000 syncoid ... (upload limit in KB/s, about 10 MB/s)
```

---

## Summary

| Item            | Value                                     |
| --------------- | ----------------------------------------- |
| **Source**      | controlplane (zroot)                      |
| **Target**      | storage.clawdie.si (tank/backups/clawdie) |
| **Frequency**   | Every 4 hours                             |
| **Method**      | Incremental zfs send/recv                 |
| **Compression** | zstd (`--compress=zstd-fast`)             |
| **Monitoring**  | Log check every 6 hours                   |

This setup provides:

- Incremental replication (only changes sent)
- Point-in-time recovery (every snapshot still on the source is kept on the target)
- Automated operation (cron-based)
- Simple recovery (zfs send/recv)
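
To spot-check the point-in-time recovery claim above, snapshot names can be compared by name, not only by count. The sketch below assumes two text files holding `zfs list -H -t snapshot -o name -r ...` output from each side; `missing_on_target` and the file paths are hypothetical, not part of Syncoid.

```bash
#!/bin/sh
# Hypothetical helper: missing_on_target SRC_LIST SRC_ROOT DST_LIST DST_ROOT
# Each list file holds one snapshot name per line, for example from:
#   zfs list -H -t snapshot -o name -r zroot
# Prints snapshots (relative to the root) that exist on the source only.
missing_on_target() {
    src_tmp=$(mktemp) || return 1
    dst_tmp=$(mktemp) || return 1
    # Strip the pool-specific prefix so both sides are comparable, then sort for comm
    sed "s|^$2||" "$1" | sort > "$src_tmp"
    sed "s|^$4||" "$3" | sort > "$dst_tmp"
    # comm -23 keeps lines that appear only in the first file
    comm -23 "$src_tmp" "$dst_tmp"
    rm -f "$src_tmp" "$dst_tmp"
}

# Usage (paths are placeholders):
#   missing_on_target /tmp/src.txt zroot /tmp/dst.txt tank/backups/clawdie
```

With `--skip-parent`, snapshots of `zroot` itself show up as lines starting with `@`; those are expected to be missing on the target.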