---
name: zfs-snapshot-audit
description: Audit ZFS snapshots and sanoid config — find orphaned/leaked snapshots, verify dataset coverage, safely destroy dead weight.
category: freebsd
---
# ZFS Snapshot Audit & Sanoid Coverage
Use this when disk space is tight and you suspect ZFS snapshots are holding dead weight, or when you need to verify sanoid coverage across datasets.
## 1. Quick check — pool and dataset overview
```bash
zpool list zroot
zfs get -H -o property,value used,refer,usedbysnapshots,available zroot/home/clawdie
df -h /
```
If `usedbysnapshots` is high (>1G), investigate.
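The 1G rule of thumb can also be checked in a script. A minimal sketch, assuming `zfs get -p` is used to obtain exact byte counts; the function name `snap_usage_high` and its default threshold are illustrative and not part of any tool:

```bash
# Hypothetical helper: succeed (exit 0) when a byte count exceeds a threshold.
# The default threshold is 1 GiB, matching the rule of thumb above.
snap_usage_high() {
    bytes=$1
    limit=${2:-1073741824}
    [ "$bytes" -gt "$limit" ]
}

# Intended use (needs a real pool, so it is left commented out):
#   used=$(zfs get -Hp -o value usedbysnapshots zroot/home/clawdie)
#   snap_usage_high "$used" && echo "investigate zroot/home/clawdie"
```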
## 2. List all snapshots with size
```bash
zfs list -t snapshot -o name,used,creation -S creation -r zroot 2>/dev/null
```
Look for snapshots older than the sanoid retention policy allows. Those are
either orphaned or not being pruned.
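Spotting over-age snapshots by eye gets tedious on a busy pool. The sketch below filters by age instead, assuming `zfs list -Hp` prints tab-separated fields with `creation` as epoch seconds; `older_than_days` is a hypothetical name, and the cutoff of 3 days in the usage line simply mirrors the `daily` value of `operator_home_minimal`:

```bash
# Hypothetical filter: read "name<TAB>creation_epoch" lines on stdin and print
# the names of snapshots older than MAX_DAYS relative to NOW (epoch seconds).
older_than_days() {
    max_days=$1
    now=${2:-$(date +%s)}
    awk -F '\t' -v max="$max_days" -v now="$now" \
        '(now - $2) > max * 86400 { print $1 }'
}

# Intended use (needs a real pool, so it is left commented out):
#   zfs list -Hp -t snapshot -o name,creation -r zroot | older_than_days 3
```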
## 3. Check sanoid coverage
Config lives at `/usr/local/etc/sanoid/sanoid.conf`. Every dataset that has
`autosnap` snapshots should have a matching `[dataset]` entry with a template.
Without one, `sanoid --prune-snapshots` won't touch them — they accumulate
forever.
```bash
# List all datasets with snapshots
zfs list -H -t snapshot -o name -r zroot 2>/dev/null | awk -F@ '{print $1}' | sort -u
# Cross-reference against sanoid config
grep '^\[' /usr/local/etc/sanoid/sanoid.conf
```
Any dataset with snapshots but no sanoid entry is **orphaned**, unless a parent
entry covers it with `recursive = yes`.
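The cross-reference can be automated. This is a rough sketch, not a definitive check: `orphaned_datasets` is a hypothetical helper, it assumes dataset sections are written as `[pool/dataset]` at the start of a line, and it does not account for parent entries that cover children via `recursive = yes`:

```bash
# Hypothetical helper: print datasets from stdin (one per line) that have no
# "[dataset]" section in the given sanoid config. Sections named
# "[template_...]" are skipped because they are not dataset entries.
# Assumes the config file is non-empty.
orphaned_datasets() {
    conf=$1
    awk '
        NR == FNR {
            if ($0 ~ /^\[.*\]/ && $0 !~ /^\[template_/) {
                name = $0
                sub(/^\[/, "", name)
                sub(/\].*$/, "", name)
                covered[name] = 1
            }
            next
        }
        !($0 in covered) { print }
    ' "$conf" -
}

# Intended use (needs a real pool, so it is left commented out):
#   zfs list -H -t snapshot -o name -r zroot | awk -F@ '{print $1}' | sort -u \
#       | orphaned_datasets /usr/local/etc/sanoid/sanoid.conf
```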
## 4. Available templates (sanoid.conf)
| Template | hourly | daily | monthly | Use case |
|----------|--------|-------|---------|----------|
| `operator_home_minimal` | 6 | 3 | 0 | Operator home dir |
| `operator_home_full` | 24 | 14 | 3 | Full home retention |
| `persistent_service` | 12 | 7 | 2 | Jails (cms, git, etc.) |
| `critical_data` | 24 | 14 | 3 | Databases (pgdata, pgwal) |
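A table like this can drift from the live config. The sketch below reads the retention values back out of `sanoid.conf`, assuming sanoid's usual layout of a `[template_NAME]` section followed by `key = value` lines; `template_retention` is a hypothetical helper name:

```bash
# Hypothetical helper: print "hourly daily monthly" for a named template.
# Keys that are missing from the section are reported as 0.
template_retention() {
    conf=$1
    name=$2
    awk -v want="[template_$name]" '
        /^\[/ { in_tpl = ($1 == want) }
        in_tpl && /^[ \t]*(hourly|daily|monthly)[ \t]*=/ {
            key = $0; sub(/[ \t]*=.*/, "", key); gsub(/[ \t]/, "", key)
            val = $0; sub(/^[^=]*=[ \t]*/, "", val)
            v[key] = val
        }
        END { print v["hourly"] + 0, v["daily"] + 0, v["monthly"] + 0 }
    ' "$conf"
}

# Intended use:
#   template_retention /usr/local/etc/sanoid/sanoid.conf operator_home_minimal
```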
## 5. Destroy orphaned snapshots
**Safe:** destroying ZFS snapshots does NOT touch the live filesystem. Only the
unique blocks held by that point-in-time copy are freed. Verify first:
```bash
# Confirm: usedbysnapshots = dead weight, referenced = live data
zfs get -H -o property,value used,refer,usedbysnapshots zroot/home/clawdie
```
Destroy individual snapshots:
```bash
zfs destroy zroot/home/clawdie@autosnap_2026-04-20_00:15:00_daily
```
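Or batch by range. In `zfs destroy`, `%` is a range separator, not a glob:
`first%last` destroys both endpoints and every snapshot created between them,
whatever its name, and a blank endpoint means the dataset's oldest or newest
snapshot. Dry-run with `-n` first:
```bash
zfs destroy -nv zroot/home/clawdie@autosnap_2026-04-20_00:15:00_daily%   # dry run: this snapshot through the newest
zfs destroy -v zroot/home/clawdie@autosnap_2026-04-20_00:15:00_daily%    # drop -n to actually destroy
```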
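To select snapshots explicitly by name prefix, `zfs destroy` also accepts a comma-separated list of snapshot short names for a single dataset. The sketch below builds that argument from a snapshot listing; `destroy_arg` is a hypothetical helper, and the dry-run flag `-n` should stay in place until the output looks right:

```bash
# Hypothetical helper: read full snapshot names (dataset@snap) on stdin, keep
# those of DATASET whose snapshot part starts with PREFIX, and print a single
# "dataset@snap1,snap2,..." argument. Prints nothing if no snapshot matches.
destroy_arg() {
    dataset=$1
    prefix=$2
    awk -F@ -v ds="$dataset" -v pre="$prefix" '
        $1 == ds && index($2, pre) == 1 {
            if (list == "") list = $2
            else list = list "," $2
        }
        END { if (list != "") print ds "@" list }
    '
}

# Intended use (needs a real pool, so it is left commented out):
#   arg=$(zfs list -H -t snapshot -o name -r zroot/home/clawdie \
#       | destroy_arg zroot/home/clawdie autosnap_)
#   [ -n "$arg" ] && zfs destroy -nv "$arg"   # dry run; drop -n to destroy
```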
## 6. Add missing dataset to sanoid config
Append to `/usr/local/etc/sanoid/sanoid.conf` (root-owned, use `sudo tee -a`):
```
[zroot/home/clawdie]
use_template = operator_home_minimal
```
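Appending blindly with `tee -a` can duplicate a section if the step is repeated. One way to guard against that, sketched under the assumption that section headers appear alone on their line with no surrounding whitespace; `sanoid_stanza` is a hypothetical helper name:

```bash
# Hypothetical helper: print a dataset stanza only if the config does not
# already contain a "[dataset]" line, so repeated runs do not duplicate it.
sanoid_stanza() {
    conf=$1
    dataset=$2
    template=$3
    if ! grep -qxF "[$dataset]" "$conf"; then
        printf '\n[%s]\nuse_template = %s\n' "$dataset" "$template"
    fi
}

# Intended use:
#   sanoid_stanza /usr/local/etc/sanoid/sanoid.conf zroot/home/clawdie operator_home_minimal \
#       | sudo tee -a /usr/local/etc/sanoid/sanoid.conf
```

Keeping the check separate from the write means the function itself never needs root; only the `tee -a` at the end of the pipeline does.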
Verify it takes effect (next `sanoid --prune-snapshots` cron run, or manually):
```bash
sudo sanoid --prune-snapshots --verbose 2>&1 | grep home/clawdie
```
## 7. Verify after cleanup
```bash
zfs get -H -o value usedbysnapshots zroot/home/clawdie
# Should be 0B (or low if new autosnaps have been taken)
zpool list zroot
df -h /
```
## Pitfalls
- **Do NOT delete sanoid-managed snapshots by hand.** If `autoprune=yes`, sanoid
handles retention, and manual deletion of recent snapshots can interfere with
that policy. This workflow is for **orphaned** snapshots only: those on
datasets with no matching sanoid `[dataset]` entry.
- **If a managed dataset keeps too many snapshots, fix the policy instead.**
Destroying snapshots on a dataset with `autoprune=yes` just fights the cron
job.
- The config header says "do not edit by hand" but the dataset list is the
operator's domain — adding a dataset is safe.
## Discovery log
2026-06-22: `zroot/home/clawdie` was missing from sanoid config. 10 orphaned
snapshots from April 20-22 held 23.6G of dead weight (`usedbysnapshots`).
Added `operator_home_minimal` template and destroyed all 10. Freed 23.5G.