Syncoid Setup — ZFS Replication to Backup Server

Purpose: Replicate Clawdie controlplane ZFS datasets to a remote backup server.
Tool: Syncoid (part of the Sanoid package)
Method: Incremental zfs send/recv over SSH


Architecture

┌─────────────────────────┐         SSH          ┌─────────────────────────┐
│   controlplane          │ ───────────────────► │   backup server         │
│   (source)              │                      │   (target)              │
│                         │                      │                         │
│   zroot/                │                      │   tank/backups/         │
│   ├── jails/            │  ──zfs send/recv──►  │   └── clawdie/          │
│   ├── pg-cache/         │                      │       ├── jails/        │
│   └── ...               │                      │       └── pg-cache/     │
└─────────────────────────┘                      └─────────────────────────┘

   10.0.0.1                                          storage.clawdie.si
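
The diagram implies that each dataset under zroot lands under tank/backups/clawdie with the same relative path. A minimal sketch of that name mapping in sh follows; target_for is a hypothetical helper used only for illustration, not a Syncoid command.

```shell
#!/bin/sh
# Map a source dataset to the target dataset implied by the diagram.
# SRC_ROOT and DST_ROOT mirror the names used in this guide; target_for is
# a hypothetical helper, not part of Syncoid.
SRC_ROOT="zroot"
DST_ROOT="tank/backups/clawdie"

target_for() {
    case "$1" in
        "$SRC_ROOT")    printf '%s\n' "$DST_ROOT" ;;
        "$SRC_ROOT"/*)  printf '%s/%s\n' "$DST_ROOT" "${1#"$SRC_ROOT"/}" ;;
        *)              echo "not under $SRC_ROOT: $1" >&2; return 1 ;;
    esac
}

target_for zroot/jails       # prints tank/backups/clawdie/jails
target_for zroot/pg-cache    # prints tank/backups/clawdie/pg-cache
```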

Prerequisites

On Backup Server

  1. ZFS pool created (e.g., tank)
  2. FreeBSD or Linux with ZFS
  3. SSH accessible from controlplane
  4. Dedicated backup user (optional but recommended)

On Controlplane

  1. Sanoid installed: pkg install sanoid
  2. SSH key for backup user
  3. Network connectivity to backup server
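
Before starting, it can help to confirm that the required tools are actually on PATH. The sketch below prints whichever of the named commands are missing; missing_tools is a hypothetical helper name, and the tool list in the example comment is an assumption based on the prerequisites above.

```shell
#!/bin/sh
# Print each named command that is NOT found on PATH, one per line.
# missing_tools is a hypothetical helper for a pre-flight check.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (on the controlplane); install whatever it lists:
#   missing_tools zfs ssh syncoid
```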

Step 1: Create Backup User (on backup server)

# On backup server
# Create dedicated backup user
pw user add -n zfsbackup -c "ZFS Backup" -s /bin/sh -m

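# Create SSH directory, owned by the backup user
mkdir -p /home/zfsbackup/.ssh
chmod 700 /home/zfsbackup/.ssh
chown zfsbackup:zfsbackup /home/zfsbackup/.ssh

# Allow ZFS commands via sudo without a password
# (on FreeBSD, sudo is not in the base system: pkg install sudo)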
cat > /usr/local/etc/sudoers.d/zfsbackup << 'EOF'
zfsbackup ALL=(root) NOPASSWD: /sbin/zfs *
zfsbackup ALL=(root) NOPASSWD: /sbin/zpool *
EOF
chmod 440 /usr/local/etc/sudoers.d/zfsbackup

# Create target dataset
zfs create -o mountpoint=/backups tank/backups
zfs create tank/backups/clawdie
chown -R zfsbackup:zfsbackup /backups

Step 2: Generate SSH Key (on controlplane)

# On controlplane
# Generate dedicated key for backup
ssh-keygen -t ed25519 -f /root/.ssh/syncoid_backup -N '' -C "syncoid@controlplane"

# Copy public key to backup server
cat /root/.ssh/syncoid_backup.pub | ssh root@storage.clawdie.si \
    'cat >> /home/zfsbackup/.ssh/authorized_keys && chmod 600 /home/zfsbackup/.ssh/authorized_keys'

# Test connection
ssh -i /root/.ssh/syncoid_backup zfsbackup@storage.clawdie.si 'sudo zfs list'

Optionally, add an alias for the backup server to the SSH config:

# /root/.ssh/config
Host backup-storage
    HostName storage.clawdie.si
    User zfsbackup
    IdentityFile /root/.ssh/syncoid_backup
    Compression yes
    ServerAliveInterval 60
    ServerAliveCountMax 3

Test: ssh backup-storage 'sudo zfs list'
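
If the key is rejected, a common cause is loose permissions on the .ssh directory or the authorized_keys file. The sketch below compares a path's mode string against the expected one; mode_is is a hypothetical helper, and the paths in the example comments are the ones used in this guide.

```shell
#!/bin/sh
# Return success if PATH has exactly the expected symbolic mode as printed
# by ls -l (for example drwx------ for a directory with mode 700).
# mode_is is a hypothetical helper; it reads only the first 10 characters,
# so trailing ACL or SELinux markers (+ or .) are ignored.
mode_is() {
    actual=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    [ "$actual" = "$2" ]
}

# Example (on the backup server):
#   mode_is /home/zfsbackup/.ssh drwx------ || echo ".ssh should be mode 700"
#   mode_is /home/zfsbackup/.ssh/authorized_keys -rw------- \
#       || echo "authorized_keys should be mode 600"
```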


Step 3: Initial Full Replication

# On controlplane

# First, create a clean snapshot to replicate
zfs snapshot -r zroot@initial-backup

# Full replication (this will take a while — all data is sent)
syncoid -r --sshkey=/root/.ssh/syncoid_backup \
    zroot zfsbackup@storage.clawdie.si:tank/backups/clawdie

# Or using SSH config alias:
syncoid -r --sshkey=/root/.ssh/syncoid_backup \
    zroot backup-storage:tank/backups/clawdie

Note: Initial replication sends ALL data. For large pools, this can take hours. Subsequent runs only send incremental changes (typically minutes).
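
To get a feel for "hours" versus "minutes", a rough estimate can be computed from the data size and the sustained throughput of the link. This is a back-of-the-envelope sketch: estimate_minutes is a hypothetical helper, and the sizes and the 50 MiB/s rate are made-up example figures, not measurements.

```shell
#!/bin/sh
# Rough transfer-time estimate: size in GiB, sustained throughput in MiB/s.
# Prints whole minutes, rounded up. estimate_minutes is a hypothetical helper.
estimate_minutes() {
    size_gib=$1
    rate_mib_s=$2
    seconds=$(( size_gib * 1024 / rate_mib_s ))
    echo $(( (seconds + 59) / 60 ))
}

estimate_minutes 500 50    # 500 GiB initial send at 50 MiB/s: prints 171
estimate_minutes 2 50      # 2 GiB incremental at the same rate: prints 1
```

The real figure depends on compression, disk speed, and link load, so treat the result as an order of magnitude only.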


Step 4: Verify Replication

# On backup server, verify datasets exist
sudo zfs list -r tank/backups/clawdie

# Check snapshots were replicated
sudo zfs list -t snapshot -r tank/backups/clawdie

# Compare snapshot counts (should match; -H omits the header line)
# On controlplane:
zfs list -H -t snapshot -r zroot | wc -l

# On backup server:
sudo zfs list -H -t snapshot -r tank/backups/clawdie | wc -l
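
Matching counts do not say which snapshot is missing when the numbers differ. A sketch of a name-level comparison follows, assuming each side's snapshot names have been saved to a file with zfs list -H -t snapshot -o name -r; missing_on_target and its argument order are hypothetical.

```shell
#!/bin/sh
# List snapshots present on the source but missing on the target.
# Inputs are two files of snapshot names, one per line, as produced by
#   zfs list -H -t snapshot -o name -r <root>
# missing_on_target and its argument order are hypothetical.
missing_on_target() {
    src_file=$1; src_root=$2; dst_file=$3; dst_root=$4
    tmp=$(mktemp -d) || return 1
    # Strip each side's root prefix so the names become comparable.
    sed "s|^$src_root||" "$src_file" | sort > "$tmp/src"
    sed "s|^$dst_root||" "$dst_file" | sort > "$tmp/dst"
    # comm -23 keeps lines that appear only in the first file.
    comm -23 "$tmp/src" "$tmp/dst"
    rm -rf "$tmp"
}

# Example:
#   missing_on_target src.txt zroot dst.txt tank/backups/clawdie
```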

Step 5: Automated Replication (Cron)

# /etc/cron.d/zfs-replication

# Replicate every 4 hours
0 */4 * * * root /usr/local/bin/syncoid-backup.sh >> /var/log/syncoid.log 2>&1

syncoid-backup.sh Script

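#!/bin/sh
# /usr/local/bin/syncoid-backup.sh
# Replicate zroot to backup server

set -e

SSH_KEY="/root/.ssh/syncoid_backup"
SOURCE="zroot"
TARGET="zfsbackup@storage.clawdie.si:tank/backups/clawdie"
LOG="/var/log/syncoid.log"

echo "$(date '+%Y-%m-%d %H:%M:%S') - Starting ZFS replication"

# Run syncoid with:
#   -r : recursive (all child datasets)
#   --no-sync-snap : don't create new snapshots (Sanoid handles that)
#   --delete-target-snapshots : remove snapshots on target that no longer exist on source
#   --skip-parent : don't replicate zroot itself, just children
#
# A pipeline's exit status is that of its last command, so syncoid's own
# status is written to a temporary file instead of being lost behind the loop.
STATUS_FILE=$(mktemp)

{
    rc=0
    /usr/local/bin/syncoid \
        -r \
        --sshkey="$SSH_KEY" \
        --no-sync-snap \
        --delete-target-snapshots \
        --skip-parent \
        "$SOURCE" "$TARGET" 2>&1 || rc=$?
    echo "$rc" > "$STATUS_FILE"
} | while IFS= read -r line; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
done >> "$LOG"

STATUS=$(cat "$STATUS_FILE")
rm -f "$STATUS_FILE"

if [ "$STATUS" -ne 0 ]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') - Replication FAILED (syncoid exit $STATUS)"
    exit "$STATUS"
fi

echo "$(date '+%Y-%m-%d %H:%M:%S') - Replication complete"

Make the script executable:

chmod +x /usr/local/bin/syncoid-backup.sh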
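
A cron job that fires every 4 hours can start while a long-running replication is still in progress. One common way to guard against overlapping runs is an atomic mkdir-based lock. The sketch below is an optional addition, not part of the script above; LOCK_DIR and the function names are hypothetical.

```shell
#!/bin/sh
# Guard against overlapping runs with an atomic mkdir-based lock.
# LOCK_DIR and the function names are hypothetical; adjust the path to taste.
LOCK_DIR="${LOCK_DIR:-/tmp/syncoid-backup.lock}"

acquire_lock() {
    # mkdir fails if the directory already exists, so only one caller wins.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Usage inside a replication script:
#   acquire_lock || { echo "previous run still active, skipping"; exit 0; }
#   trap release_lock EXIT
#   ... run syncoid here ...
```

If the script is killed without running its trap, the lock directory has to be removed by hand before the next run can proceed.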

Step 6: Test Incremental Replication

# Create a new recursive snapshot on controlplane
# (-r is needed because --skip-parent leaves zroot itself out of replication)
zfs snapshot -r zroot@test-$(date +%s)

# Run syncoid manually
/usr/local/bin/syncoid-backup.sh

# Verify new snapshot appeared on backup server (newest last)
ssh backup-storage 'sudo zfs list -t snapshot -s creation -r tank/backups/clawdie | tail -5'

Step 7: Monitoring & Alerts

Log Rotation

# /etc/newsyslog.conf.d/syncoid
/var/log/syncoid.log     root:wheel  640  7     1024  *     JC

Health Check Script

#!/bin/sh
# /usr/local/bin/check-replication.sh
# Alert if replication hasn't run in 12+ hours

LOG="/var/log/syncoid.log"
MAX_AGE_HOURS=12

if [ ! -f "$LOG" ]; then
    echo "ERROR: Syncoid log not found at $LOG"
    exit 2
fi

LAST_RUN=$(stat -f %m "$LOG" 2>/dev/null || stat -c %Y "$LOG" 2>/dev/null)
NOW=$(date +%s)
AGE_HOURS=$(( (NOW - LAST_RUN) / 3600 ))

if [ "$AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then
    echo "WARNING: Replication last ran $AGE_HOURS hours ago (threshold: $MAX_AGE_HOURS)"
    exit 1
fi

echo "OK: Replication ran $AGE_HOURS hours ago"
exit 0
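
The log's modification time only shows that something wrote to the file recently. As a complementary check, the log content can be inspected to see whether the most recent run reached its completion message. This sketch assumes the "Starting ZFS replication" and "Replication complete" messages written by syncoid-backup.sh; last_run_status is a hypothetical helper.

```shell
#!/bin/sh
# Report whether the most recent run recorded in the log reached its
# "Replication complete" line. Matches the message text used by
# syncoid-backup.sh in this guide; last_run_status is a hypothetical helper.
last_run_status() {
    awk '
        /Starting ZFS replication/ { state = "incomplete" }
        /Replication complete/     { state = "complete" }
        END { print (state == "" ? "no-runs" : state) }
    ' "$1"
}

# Example:
#   [ "$(last_run_status /var/log/syncoid.log)" = "complete" ] \
#       || echo "WARNING: last replication run did not complete"
```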

Cron Check

# /etc/cron.d/replication-check
# Check replication health every 6 hours
0 */6 * * * root /usr/local/bin/check-replication.sh || echo "Replication check failed" | mail -s "Clawdie Backup Alert" admin@clawdie.si

Recovery Procedure

Full System Recovery

# On backup server: create a replication stream
SNAP="recovery-$(date +%Y%m%d)"
zfs snapshot -r "tank/backups/clawdie@$SNAP"
zfs send -R "tank/backups/clawdie@$SNAP" | ssh root@new-controlplane 'zfs recv -F zroot'

Single Dataset Recovery

# On backup server: send specific dataset
# (@latest is a placeholder; substitute a real snapshot name)
zfs send tank/backups/clawdie/jails/git@latest | \
    ssh root@controlplane 'zfs recv -F zroot/jails/git'
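
ZFS has no built-in "latest" snapshot alias, so the name has to be looked up. A sketch of one way to pick the newest snapshot of a dataset from a creation-ordered listing follows; newest_snapshot is a hypothetical helper that reads the listing on stdin.

```shell
#!/bin/sh
# Pick the newest snapshot of one dataset from a creation-ordered listing.
# Reads snapshot names on stdin (oldest first), as printed by
#   zfs list -H -t snapshot -o name -s creation -r <dataset>
# newest_snapshot is a hypothetical helper.
newest_snapshot() {
    # Keep only snapshots of exactly this dataset, not of its children.
    awk -v ds="$1" 'index($0, ds "@") == 1 { last = $0 }
                    END { if (last != "") print last }'
}

# Example (on the backup server):
#   DS=tank/backups/clawdie/jails/git
#   SNAP=$(sudo zfs list -H -t snapshot -o name -s creation -r "$DS" | newest_snapshot "$DS")
```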

File Recovery from Snapshot

# On backup server: clone the snapshot into a temporary dataset and copy the file
sudo zfs clone -o mountpoint=/mnt/recovery tank/backups/clawdie@backup-20260315 tank/backups/clawdie-recovery
cp /mnt/recovery/home/clawdie/important-file.txt /tmp/
sudo zfs destroy tank/backups/clawdie-recovery

Syncoid Options Reference

Option                       Purpose
-r                           Recursive (include child datasets)
--sshkey=PATH                Use specific SSH key
--no-sync-snap               Don't create new snapshots (Sanoid handles this)
--delete-target-snapshots    Remove target snapshots not on source
--skip-parent                Don't replicate parent dataset, only children
--compress=zstd-fast         Use zstd compression in transit (zstd-slow is also accepted)
--quiet                      Suppress progress output
--dryrun                     Show what would be sent without sending (check syncoid --help for availability)

Example: Full Command

# Production replication command
syncoid -r --sshkey=/root/.ssh/syncoid_backup \
    --no-sync-snap --delete-target-snapshots --skip-parent \
    --compress=zstd-fast \
    zroot backup-storage:tank/backups/clawdie
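
When experimenting with flag combinations, it can be useful to preview the exact command line before anything runs. The sketch below is a shell-level preview that does not rely on any Syncoid option; run and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Print a command instead of running it when DRY_RUN=1.
# run and DRY_RUN are hypothetical names for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Example:
#   DRY_RUN=1 run syncoid -r --no-sync-snap zroot backup-storage:tank/backups/clawdie
```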

Security Hardening

Restrict SSH Key (on backup server)

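# /home/zfsbackup/.ssh/authorized_keys
# Prepend the key with restrictions. The whole entry must be ONE line
# (authorized_keys has no line continuation):
command="sudo /sbin/zfs receive -Fdu tank/backups/clawdie",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,from="10.0.0.1" ssh-ed25519 AAAA... syncoid@controlplane
# Note: syncoid runs several remote commands besides zfs receive (for example
# zfs list and zfs get), so a forced command this narrow suits a plain
# "zfs send | ssh ..." pipeline; for syncoid, omit command= or use a wrapper.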

This restricts the SSH key to:

  • Only run zfs receive command
  • Only accept connections from controlplane IP
  • No shell access, no port forwarding
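
A forced command can also point at a small wrapper script that inspects the command the client asked for, which sshd exposes in the SSH_ORIGINAL_COMMAND environment variable. The sketch below shows only the allow/deny decision. The function name is_allowed and the list of allowed prefixes are assumptions for illustration; a real Syncoid session may send other command shapes (including pipelines, which this sketch rejects), so log what your version sends before relying on such a list.

```shell
#!/bin/sh
# Decide whether a command requested over SSH is allowed.
# The allowed prefixes are assumptions for illustration only.
is_allowed() {
    case "$1" in
        *';'*|*'&'*|*'|'*|*'`'*|*'$('*)  return 1 ;;  # reject chaining and substitution
        'sudo /sbin/zfs receive '*)      return 0 ;;
        'sudo /sbin/zfs list '*)         return 0 ;;
        'sudo /sbin/zfs get '*)          return 0 ;;
        *)                               return 1 ;;
    esac
}

# In a wrapper script referenced by command="...", this would be followed by
# something like:
#   is_allowed "$SSH_ORIGINAL_COMMAND" || { echo "denied" >&2; exit 1; }
#   exec sh -c "$SSH_ORIGINAL_COMMAND"
```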

Checklist

  • Backup server has ZFS pool tank with dataset tank/backups
  • zfsbackup user created with sudo zfs permissions
  • SSH key generated on controlplane and copied to backup server
  • Initial full replication completed
  • Snapshots verified on backup server
  • syncoid-backup.sh script created
  • Cron job configured (every 4 hours)
  • Log rotation configured
  • Health check script in place
  • Recovery procedure documented and tested

Integration with Clawdie-AI

Add to Clawdie-AI setup:

// In setup/host.ts or setup/backup.ts

// Create backup cron job
writeCronJob('/etc/cron.d/zfs-replication', `
# Replicate ZFS to backup server every 4 hours
0 */4 * * * root /usr/local/bin/syncoid-backup.sh >> /var/log/syncoid.log 2>&1
`);

// Create syncoid script
writeScript('/usr/local/bin/syncoid-backup.sh', syncoidBackupScript);

Troubleshooting

"cannot receive: destination already exists"

# Force overwrite (destructive!)
syncoid -r --force-delete zroot target:pool/backup

"cannot send: not a snapshot"

# syncoid takes a dataset as its source and selects the snapshot itself.
# Make sure at least one snapshot exists, then pass the dataset name:
zfs snapshot -r zroot@manual-$(date +%s)
syncoid -r --no-sync-snap zroot target:pool/backup

SSH connection issues

# Test SSH manually first
ssh -i /root/.ssh/syncoid_backup -v zfsbackup@storage.clawdie.si

# Check firewall allows outbound SSH from controlplane
# Check backup server allows inbound SSH to zfsbackup user

Large initial replication

# If initial replication is too slow or saturates the link, consider:
# 1. Use --compress=zstd-fast or --compress=lz4
# 2. Run during off-peak hours
# 3. Limit bandwidth: syncoid --source-bwlimit=10m ... (10 MB/s limit; requires mbuffer)

Summary

Item           Value
Source         controlplane (zroot)
Target         storage.clawdie.si (tank/backups/clawdie)
Frequency      Every 4 hours
Method         Incremental zfs send/recv
Compression    zstd
Monitoring     Log check every 6 hours

This setup provides ZFS backup with:

  • Incremental replication (only changes sent)
  • Point-in-time recovery (snapshots are replicated along with the data)
  • Automated operation (cron-based)
  • Simple recovery (zfs send/recv)