Syncoid Setup — ZFS Replication to Backup Server
Purpose: Replicate Clawdie controlplane ZFS datasets to a remote backup server.
Tool: Syncoid (part of Sanoid package)
Method: Incremental zfs send/recv over SSH
Architecture
┌─────────────────────────┐ SSH ┌─────────────────────────┐
│ controlplane │ ───────────────────► │ backup server │
│ (source) │ │ (target) │
│ │ │ │
│ zroot/ │ │ tank/backups/ │
│ ├── jails/ │ ──zfs send/recv──► │ └── clawdie/ │
│ ├── pg-cache/ │ │ ├── jails/ │
│ └── ... │ │ └── pg-cache/ │
└─────────────────────────┘ └─────────────────────────┘
10.0.0.1 storage.clawdie.si
Prerequisites
On Backup Server
- ZFS pool created (e.g., tank)
- FreeBSD or Linux with ZFS
- SSH accessible from controlplane
- Dedicated backup user (optional but recommended)
On Controlplane
- Sanoid installed: pkg install sanoid
- SSH key for backup user
- Network connectivity to backup server
Step 1: Create Backup User (on backup server)
# On backup server
# Create dedicated backup user
pw user add -n zfsbackup -c "ZFS Backup" -s /bin/sh -m
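# Create SSH directory owned by the backup user (sshd reads it as that user)
mkdir -p /home/zfsbackup/.ssh
chown zfsbackup:zfsbackup /home/zfsbackup/.ssh
chmod 700 /home/zfsbackup/.ssh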
# Allow ZFS commands without sudo password
cat > /usr/local/etc/sudoers.d/zfsbackup << 'EOF'
zfsbackup ALL=(root) NOPASSWD: /sbin/zfs *
zfsbackup ALL=(root) NOPASSWD: /sbin/zpool *
EOF
chmod 440 /usr/local/etc/sudoers.d/zfsbackup
# Create target dataset
zfs create -o mountpoint=/backups tank/backups
zfs create tank/backups/clawdie
chown -R zfsbackup:zfsbackup /backups
Step 2: Generate SSH Key (on controlplane)
# On controlplane
# Generate dedicated key for backup
ssh-keygen -t ed25519 -f /root/.ssh/syncoid_backup -N '' -C "syncoid@controlplane"
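# Copy public key to backup server (written as root, then handed to zfsbackup)
ssh root@storage.clawdie.si \
'cat >> /home/zfsbackup/.ssh/authorized_keys && chown zfsbackup:zfsbackup /home/zfsbackup/.ssh/authorized_keys && chmod 600 /home/zfsbackup/.ssh/authorized_keys' \
< /root/.ssh/syncoid_backup.pub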
# Test connection
ssh -i /root/.ssh/syncoid_backup zfsbackup@storage.clawdie.si 'sudo zfs list'
SSH Config (optional but recommended)
# /root/.ssh/config
Host backup-storage
HostName storage.clawdie.si
User zfsbackup
IdentityFile /root/.ssh/syncoid_backup
Compression yes
ServerAliveInterval 60
ServerAliveCountMax 3
Test: ssh backup-storage 'sudo zfs list'
Step 3: Initial Full Replication
# On controlplane
# First, create a clean snapshot to replicate
zfs snapshot -r zroot@initial-backup
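# Full replication (this will take a while: all data is sent)
# --skip-parent: tank/backups/clawdie already exists from Step 1, so only the
# children of zroot are replicated into it
syncoid -r --skip-parent --sshkey=/root/.ssh/syncoid_backup \
zroot zfsbackup@storage.clawdie.si:tank/backups/clawdie
# Or using SSH config alias:
syncoid -r --skip-parent --sshkey=/root/.ssh/syncoid_backup \
zroot backup-storage:tank/backups/clawdie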
Note: Initial replication sends ALL data. For large pools, this can take hours. Subsequent runs only send incremental changes (typically minutes).
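To get a rough idea of how long the initial run might take, divide the amount of data by the usable link speed. The sketch below does that arithmetic in shell. The function name estimate_seconds and the example figures are illustrative assumptions, and real throughput is usually below the nominal link rate because of SSH and disk overhead.

```sh
#!/bin/sh
# estimate_seconds BYTES MBIT_PER_S
# Rough transfer time in whole seconds for BYTES over a link of MBIT_PER_S
# megabits per second (1 Mbit = 1,000,000 bits).
# Hypothetical helper for illustration; not part of syncoid.
estimate_seconds() {
    bytes=$1
    mbit=$2
    if [ "$mbit" -le 0 ]; then
        echo "link speed must be positive" >&2
        return 1
    fi
    # bits to send divided by bits per second
    echo $(( bytes * 8 / (mbit * 1000000) ))
}
```

For example, estimate_seconds 500000000000 1000 prints 4000, so 500 GB over a 1 Gbit/s link needs a little over an hour at best. The byte count can be read from the pool with zfs list -Hp -o used zroot.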
Step 4: Verify Replication
# On backup server, verify datasets exist
sudo zfs list -r tank/backups/clawdie
# Check snapshots were replicated
sudo zfs list -t snapshot -r tank/backups/clawdie
# Compare snapshot counts (should match; snapshots of zroot itself are absent on the target when --skip-parent is used)
# On controlplane:
zfs list -t snapshot -r zroot | wc -l
# On backup server:
sudo zfs list -t snapshot -r tank/backups/clawdie | wc -l
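Counts can match even when individual snapshots differ, so comparing names is a stricter check. The following is a minimal sketch, assuming each side's snapshot list has been saved to a file with zfs list -H -t snapshot -o name -r. The function name missing_on_target and the file-based layout are illustrative, not part of Sanoid.

```sh
#!/bin/sh
# missing_on_target SRC_LIST SRC_PREFIX DST_LIST DST_PREFIX
# Prints snapshots (relative to the prefix) that appear in SRC_LIST but not in
# DST_LIST. Each list file holds one full snapshot name per line.
missing_on_target() {
    src_tmp=$(mktemp) || return 1
    dst_tmp=$(mktemp) || { rm -f "$src_tmp"; return 1; }
    # Strip the pool/dataset prefix so names from both sides are comparable.
    awk -v p="$2" 'index($0, p) == 1 { print substr($0, length(p) + 1) }' "$1" | sort > "$src_tmp"
    awk -v p="$4" 'index($0, p) == 1 { print substr($0, length(p) + 1) }' "$3" | sort > "$dst_tmp"
    # comm -23 keeps only the lines unique to the first (source) file.
    comm -23 "$src_tmp" "$dst_tmp"
    rm -f "$src_tmp" "$dst_tmp"
}
```

With the lists saved as src.txt and dst.txt, missing_on_target src.txt zroot dst.txt tank/backups/clawdie prints nothing when every source snapshot exists on the target. If the parent dataset is skipped during replication, its own snapshots will be listed as missing, which is expected.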
Step 5: Automated Replication (Cron)
# /etc/cron.d/zfs-replication
# Replicate every 4 hours
0 */4 * * * root /usr/local/bin/syncoid-backup.sh >> /var/log/syncoid.log 2>&1
syncoid-backup.sh Script
#!/bin/sh
# /usr/local/bin/syncoid-backup.sh
# Replicate zroot to backup server
set -e
SSH_KEY="/root/.ssh/syncoid_backup"
SOURCE="zroot"
TARGET="zfsbackup@storage.clawdie.si:tank/backups/clawdie"
LOG="/var/log/syncoid.log"
echo "$(date '+%Y-%m-%d %H:%M:%S') — Starting ZFS replication"
# Run syncoid with:
# -r : recursive (all child datasets)
# --no-sync-snap : don't create new snapshots (Sanoid handles that)
# --delete-target-snapshots : remove snapshots on target that no longer exist on source
# --skip-parent : don't replicate zroot itself, just children
# Note: the pipeline's exit status is that of the while loop, so set -e
# does not catch a syncoid failure here; check the log for errors.
/usr/local/bin/syncoid \
-r \
--sshkey="$SSH_KEY" \
--no-sync-snap \
--delete-target-snapshots \
--skip-parent \
"$SOURCE" "$TARGET" 2>&1 | while IFS= read -r line; do
echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
done >> "$LOG"
echo "$(date '+%Y-%m-%d %H:%M:%S') — Replication complete"
chmod +x /usr/local/bin/syncoid-backup.sh
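One caveat with piping syncoid into a while read loop: in POSIX sh the pipeline returns the loop's exit status, so a failed transfer can go unnoticed. The sketch below shows one possible way to timestamp output while still returning the command's own exit status. run_logged is a hypothetical helper name, and the temp-file approach is only one of several options.

```sh
#!/bin/sh
# run_logged COMMAND [ARGS...]
# Runs COMMAND, prefixes each output line with a timestamp, and returns
# COMMAND's exit status instead of the status of the logging loop.
run_logged() {
    status_file=$(mktemp) || return 1
    {
        rc=0
        "$@" 2>&1 || rc=$?
        # Record the status where the parent shell can read it back.
        echo "$rc" > "$status_file"
    } | while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
    rc=$(cat "$status_file")
    rm -f "$status_file"
    return "$rc"
}
```

In syncoid-backup.sh this could wrap the syncoid call, for example run_logged /usr/local/bin/syncoid -r --sshkey="$SSH_KEY" "$SOURCE" "$TARGET" >> "$LOG". With set -e active, the script would then stop before printing its completion message whenever syncoid fails.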
Step 6: Test Incremental Replication
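# Create a new recursive snapshot on controlplane
# (with --skip-parent only child datasets are replicated, so zroot alone is not enough)
zfs snapshot -r zroot@test-$(date +%s)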
# Run syncoid manually
/usr/local/bin/syncoid-backup.sh
# Verify new snapshot appeared on backup server
ssh backup-storage 'sudo zfs list -t snapshot -r tank/backups/clawdie | tail -5'
Step 7: Monitoring & Alerts
Log Rotation
# /etc/newsyslog.conf.d/syncoid
/var/log/syncoid.log root:wheel 640 7 1024 * JC
Health Check Script
#!/bin/sh
# /usr/local/bin/check-replication.sh
# Alert if replication hasn't run in 12+ hours
LOG="/var/log/syncoid.log"
MAX_AGE_HOURS=12
if [ ! -f "$LOG" ]; then
echo "ERROR: Syncoid log not found at $LOG"
exit 2
fi
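# GNU stat uses -c %Y, BSD stat uses -f %m. Try GNU first: it fails cleanly on
# FreeBSD, whereas "stat -f" on Linux prints filesystem info before failing.
LAST_RUN=$(stat -c %Y "$LOG" 2>/dev/null || stat -f %m "$LOG" 2>/dev/null)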
NOW=$(date +%s)
AGE_HOURS=$(( (NOW - LAST_RUN) / 3600 ))
if [ "$AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then
echo "WARNING: Replication last ran $AGE_HOURS hours ago (threshold: $MAX_AGE_HOURS)"
exit 1
fi
echo "OK: Replication ran $AGE_HOURS hours ago"
exit 0
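The modification time only shows that something wrote to the log, not that the last run finished. A complementary check, sketched below, looks at the most recent start or finish marker written by syncoid-backup.sh. It assumes the script prints "Replication complete" only after a successful run; the function name last_run_completed is illustrative.

```sh
#!/bin/sh
# last_run_completed LOGFILE
# Succeeds only if the newest marker line in LOGFILE is a completion message,
# i.e. the most recent run did not stop between "Starting" and "complete".
# Returns 2 if the log file does not exist.
last_run_completed() {
    [ -f "$1" ] || return 2
    grep -E 'Starting ZFS replication|Replication complete' "$1" \
        | tail -n 1 \
        | grep -q 'Replication complete'
}
```

The health check could call last_run_completed "$LOG" after the age test and exit non-zero if it fails, so that the cron job sends an alert for an interrupted run as well as for a missing one.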
Cron Check
# /etc/cron.d/replication-check
# Check replication health every 6 hours
0 */6 * * * root /usr/local/bin/check-replication.sh || echo "Replication check failed" | mail -s "Clawdie Backup Alert" admin@clawdie.si
Recovery Procedure
Full System Recovery
# On backup server: snapshot the backup tree and stream it to the new host
SNAP="recovery-$(date +%Y%m%d)"
zfs snapshot -r "tank/backups/clawdie@$SNAP"
zfs send -R "tank/backups/clawdie@$SNAP" | ssh root@new-controlplane 'zfs recv -F zroot'
Single Dataset Recovery
# On backup server — send specific dataset
zfs send tank/backups/clawdie/jails/git@latest | \
ssh root@controlplane 'zfs recv -F zroot/jails/git'
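The @latest name above is a placeholder rather than a real ZFS snapshot. One way to find the newest snapshot of a dataset is to list snapshots sorted by creation time and take the last entry. The helper below sketches that idea, with latest_snapshot as a hypothetical name. It reads the list on stdin so it can be tried without touching a pool.

```sh
#!/bin/sh
# latest_snapshot DATASET
# Reads snapshot names (oldest first, one per line) on stdin and prints the
# last one that belongs exactly to DATASET, ignoring child datasets.
latest_snapshot() {
    awk -v p="$1@" 'index($0, p) == 1 { last = $0 } END { if (last != "") print last }'
}
```

On the backup server the input would come from zfs list -H -t snapshot -o name -s creation -r tank/backups/clawdie/jails/git, piped into latest_snapshot tank/backups/clawdie/jails/git. The printed name can then be passed to zfs send in place of the placeholder.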
File Recovery from Snapshot
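# On backup server: clone the snapshot to a temporary dataset, copy the file, then clean up
# (zfs clone takes a dataset name as its target; tank/backups/recovery is an arbitrary name)
sudo zfs clone -o mountpoint=/mnt/recovery tank/backups/clawdie@backup-20260315 tank/backups/recovery
cp /mnt/recovery/home/clawdie/important-file.txt /tmp/
sudo zfs destroy tank/backups/recovery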
Syncoid Options Reference
| Option | Purpose |
|---|---|
| -r | Recursive (include child datasets) |
| --sshkey=PATH | Use specific SSH key |
| --no-sync-snap | Don't create new snapshots (Sanoid handles this) |
| --delete-target-snapshots | Remove target snapshots not on source |
| --skip-parent | Don't replicate parent dataset, only children |
| --compress=zstd-fast | Use zstd compression in transit (syncoid names the zstd modes zstd-fast and zstd-slow) |
| --quiet | Suppress progress output |
| --dryrun | Show what would be sent without sending |
Example: Full Command
# Production replication command
syncoid -r --sshkey=/root/.ssh/syncoid_backup \
--no-sync-snap --delete-target-snapshots --skip-parent \
--compress=zstd-fast \
zroot backup-storage:tank/backups/clawdie
Security Hardening
Restrict SSH Key (on backup server)
# /home/zfsbackup/.ssh/authorized_keys
# Prepend with restrictions:
# (the whole entry must be a single line; authorized_keys has no line continuation)
command="sudo /sbin/zfs receive -Fdu tank/backups/clawdie",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,from="10.0.0.1" ssh-ed25519 AAAA... syncoid@controlplane
This restricts the SSH key to:
- Only run the zfs receive command
- Only accept connections from controlplane IP
- No shell access, no port forwarding
Note: syncoid runs more than zfs receive on the target (for example zfs list and zfs get), so a forced command like this suits a plain zfs send | ssh pipeline. When the key is used by syncoid, keep the from= and no-* restrictions and drop or widen command=.
Checklist
- Backup server has ZFS pool tank/backups
- zfsbackup user created with sudo zfs permissions
- SSH key generated on controlplane and copied to backup server
- Initial full replication completed
- Snapshots verified on backup server
- syncoid-backup.sh script created
- Cron job configured (every 4 hours)
- Log rotation configured
- Health check script in place
- Recovery procedure documented and tested
Integration with Clawdie-AI
Add to Clawdie-AI setup:
// In setup/host.ts or setup/backup.ts
// Create backup cron job
writeCronJob('/etc/cron.d/zfs-replication', `
# Replicate ZFS to backup server every 4 hours
0 */4 * * * root /usr/local/bin/syncoid-backup.sh >> /var/log/syncoid.log 2>&1
`);
// Create syncoid script
writeScript('/usr/local/bin/syncoid-backup.sh', syncoidBackupScript);
Troubleshooting
"cannot receive: destination already exists"
# Force overwrite (destructive!)
syncoid -r --force-delete zroot target:pool/backup
"cannot send: not a snapshot"
# Ensure you're sending from a snapshot, not a dataset
SNAP="manual-$(date +%Y%m%d)"
zfs snapshot -r "zroot@$SNAP"
syncoid "zroot@$SNAP" target:pool/backup
SSH connection issues
# Test SSH manually first
ssh -i /root/.ssh/syncoid_backup -v zfsbackup@storage.clawdie.si
# Check firewall allows outbound SSH from controlplane
# Check backup server allows inbound SSH to zfsbackup user
Large initial replication
# If initial replication is too slow, consider:
# 1. Use --compress=zstd-fast or --compress=lz4
# 2. Run during off-peak hours
# 3. Limit bandwidth so the transfer does not saturate the link:
#    syncoid --source-bwlimit=10m ... (about 10 MB/s; requires mbuffer)
Summary
| Item | Value |
|---|---|
| Source | controlplane (zroot) |
| Target | storage.clawdie.si (tank/backups/clawdie) |
| Frequency | Every 4 hours |
| Method | Incremental zfs send/recv |
| Compression | zstd |
| Monitoring | Log check every 6 hours |
This gives you enterprise-grade ZFS backup with:
- Incremental replication (only changes sent)
- Point-in-time recovery (snapshots on the target mirror those kept on the source)
- Automated operation (cron-based)
- Simple recovery (zfs send/recv)