layered-soul/skills/zfs-snapshot-audit/SKILL.md
Sam & Claude 9872e1d4cf skill(zfs): add snapshot vacuum workflow for disk pressure after large deletions
Covers the case where df unchanged after rm -rf or cargo clean because
sanoid snapshots captured the deleted files. Documents the vacuum
procedure: identify holding snapshots, destroy them to reclaim space
immediately, or use sanoid --prune-snapshots for the gentler path.

Updates Pitfalls to acknowledge this as the exception to "never touch
sanoid-managed snaps."

Discovered 2026-06-24: cargo clean freed 5.5G but df showed 16G unchanged.
usedbysnapshots = 26.6G across 9 sanoid snapshots. Full vacuum freed 13G
(16G → 29G free, pool 80% → 72%).
2026-06-24 20:24:40 +02:00

name: zfs-snapshot-audit
description: Audit ZFS snapshots and sanoid config — find orphaned/leaked snapshots, verify dataset coverage, safely destroy dead weight.
category: freebsd

ZFS Snapshot Audit & Sanoid Coverage

Use this when disk space is tight and you suspect ZFS snapshots are holding dead weight, or when you need to verify sanoid coverage across datasets.

1. Quick check — pool and dataset overview

zpool list zroot
zfs get -H -o property,value used,refer,usedbysnapshots,available zroot/home/clawdie
df -h /

If usedbysnapshots is high (>1G), investigate.
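
The threshold check can be scripted. This is a minimal sketch, assuming the 1G cutoff above and a hypothetical helper name `snap_weight_check`; it relies on `zfs get -Hp` printing exact byte counts so the comparison stays numeric.

```shell
#!/bin/sh
# Hypothetical helper (not part of zfs or sanoid).
# $1 = usedbysnapshots in bytes, $2 = optional limit in bytes (default 1 GiB).
# Prints "investigate" when the value exceeds the limit, otherwise "ok".
snap_weight_check() {
    bytes=$1
    limit=${2:-1073741824}
    if [ "$bytes" -gt "$limit" ]; then
        echo "investigate"
    else
        echo "ok"
    fi
}

# Usage on a live system (dataset name taken from this document):
#   snap_weight_check "$(zfs get -Hp -o value usedbysnapshots zroot/home/clawdie)"
```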

2. List all snapshots with size

zfs list -t snapshot -o name,used,creation -S creation -r zroot 2>/dev/null

Look for snapshots from dates that exceed the sanoid retention policy.
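
One way to make that comparison mechanical is to filter on creation time. A minimal sketch, assuming a hypothetical helper `stale_snapshots` and that `zfs list -Hp` reports creation as epoch seconds; the age limit is an assumption you would pick from the longest retention in the relevant template.

```shell
#!/bin/sh
# Hypothetical filter (not part of zfs or sanoid).
# stdin: tab-separated "name<TAB>creation-epoch" lines, e.g. from
#   zfs list -Hp -t snapshot -o name,creation -r zroot
# $1 = maximum age in days, $2 = optional "now" in epoch seconds (default: current time).
# Prints the names of snapshots older than the maximum age.
stale_snapshots() {
    max_days=$1
    now=${2:-$(date +%s)}
    awk -F'\t' -v now="$now" -v days="$max_days" \
        '(now - $2) > days * 86400 { print $1 }'
}
```

For example, `zfs list -Hp -t snapshot -o name,creation -r zroot | stale_snapshots 14` would list anything older than two weeks.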

3. Check sanoid coverage

Config lives at /usr/local/etc/sanoid/sanoid.conf. Every dataset that has autosnap snapshots should have a matching [dataset] entry with a template. Without one, sanoid --prune-snapshots won't touch them — they accumulate forever.

# List all datasets with snapshots
# -H drops the header row and -o name keeps only the snapshot name column
zfs list -H -t snapshot -o name -r zroot 2>/dev/null | awk -F@ '{print $1}' | sort -u

# Cross-reference against sanoid config
grep '^\[' /usr/local/etc/sanoid/sanoid.conf

Any dataset with snapshots but no sanoid entry = orphaned.
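
The cross-reference can be done in one pass. A minimal sketch, assuming a hypothetical helper `orphaned_datasets` that matches exact `[dataset]` headers only; it does not account for a parent section that covers children via sanoid's `recursive` option, so treat its output as candidates to review.

```shell
#!/bin/sh
# Hypothetical helper (not part of zfs or sanoid).
# $1 = file with one snapshot name per line, e.g. from
#      zfs list -H -t snapshot -o name -r zroot
# $2 = path to sanoid.conf
# Prints datasets that have autosnap_ snapshots but no exact [dataset] header.
orphaned_datasets() {
    snaps=$1
    conf=$2
    grep '@autosnap_' "$snaps" | cut -d@ -f1 | sort -u |
    while IFS= read -r ds; do
        # -F: fixed string, -x: whole line, so "[" and "]" are matched literally
        grep -qxF "[$ds]" "$conf" || echo "$ds"
    done
}
```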

4. Available templates (sanoid.conf)

Template               hourly  daily  monthly  Use case
operator_home_minimal  6       3      0        Operator home dir
operator_home_full     24      14     3        Full home retention
persistent_service     12      7      2        Jails (cms, git, etc.)
critical_data          24      14     3        Databases (pgdata, pgwal)

5. Destroy orphaned snapshots

Safe: destroying ZFS snapshots does NOT touch the live filesystem. Only the unique blocks held by that point-in-time copy are freed. Verify first:

# Confirm: usedbysnapshots = dead weight, referenced = live data
zfs get -H -o property,value used,refer,usedbysnapshots zroot/home/clawdie

Destroy individual snapshots:

zfs destroy zroot/home/clawdie@autosnap_2026-04-20_00:15:00_daily

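Or destroy a range. In zfs destroy, % separates the first and last snapshot of an inclusive range and is not a glob; an empty side means oldest or newest, so a bare @% covers every snapshot on the dataset, not only autosnap_ ones. Preview with -nv first:

zfs destroy -nv zroot/home/clawdie@%
zfs destroy zroot/home/clawdie@%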
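
Where only snapshots matching part of a name should go, one option is to select the names explicitly and review the resulting commands before running anything. A minimal sketch, assuming a hypothetical helper `destroy_cmds` that only prints commands and never executes them:

```shell
#!/bin/sh
# Hypothetical helper (not part of zfs or sanoid).
# stdin: one snapshot name per line, e.g. from
#   zfs list -H -t snapshot -o name -r zroot/home/clawdie
# $1 = fixed string that a snapshot name must contain.
# Prints one "zfs destroy" command per matching snapshot; runs nothing.
destroy_cmds() {
    pattern=$1
    grep -F -- "$pattern" | while IFS= read -r snap; do
        printf 'zfs destroy %s\n' "$snap"
    done
}
```

After reviewing the printed list, the same pipeline can be piped into `sh` as root to carry out the destroys.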

8. Disk pressure after large deletions (snapshot vacuum)

When you delete a large directory (e.g. cargo clean freeing 5.5G of target/), the space is NOT freed if sanoid-managed snapshots captured the files. df shows no change. The deleted blocks are locked in the snapshot chain until every snapshot that captured them is rotated out.

Symptom: df unchanged after cargo clean or rm -rf large-dir.

Check:

zfs get -H -o value usedbysnapshots zroot/home/clawdie
# If high (>5G) and you just did a large deletion → vacuum needed
zfs list -t snapshot -o name,used,creation -r zroot/home/clawdie
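# Find the snapshots taken while the deleted files still existed.
# Per-snapshot "used" counts only blocks unique to that snapshot, so blocks
# shared by several snapshots do not show up in any single row.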

Fix — reclaim space immediately:

# Destroy ALL snapshots on the dataset (aggressive, zero history retained).
# % is a range operator, not a glob; a bare @% spans oldest through newest:
sudo zfs destroy zroot/home/clawdie@%

# Or: destroy specific snapshots that captured pre-deletion state:
sudo zfs destroy zroot/home/clawdie@autosnap_2026-06-24_14:00:00_hourly
# Repeat for each snapshot, then verify:
zfs get -H -o value usedbysnapshots zroot/home/clawdie
# Should approach 0B

After reclaim:

df -h /home/clawdie         # free space should jump
zpool list zroot            # pool capacity drops
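
To put a number on the reclaim, the available space can be captured before and after the destroy and compared. A sketch, assuming a hypothetical helper `reclaimed_gib` fed with byte counts from `zfs get -Hp -o value available`:

```shell
#!/bin/sh
# Hypothetical helper (not part of zfs or sanoid).
# $1 = available bytes before the destroy, $2 = available bytes after, e.g. from
#   zfs get -Hp -o value available zroot/home/clawdie
# Prints the difference in GiB with one decimal place.
reclaimed_gib() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / (1024 * 1024 * 1024) }'
}
```

Called with 16 GiB and 29 GiB expressed in bytes (`reclaimed_gib 17179869184 31138512896`), it prints 13.0.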

Sanoid will begin taking fresh snapshots on its next cron tick.

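Prevent next time: after any large deletion, run sudo sanoid --prune-snapshots so that snapshots already past retention are removed right away instead of at the next cron run. Pruning follows the retention policy, so hourlies and dailies still inside their window keep holding the deleted blocks until they expire; the daily safety net stays intact.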

9. Add missing dataset to sanoid config

Append to /usr/local/etc/sanoid/sanoid.conf (root-owned, use sudo tee -a):

[zroot/home/clawdie]
        use_template = operator_home_minimal

Verify it takes effect (next sanoid --prune-snapshots cron run, or manually):

sudo sanoid --prune-snapshots --verbose 2>&1 | grep home/clawdie
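
The append step above could be made safe to repeat. A minimal sketch, assuming a hypothetical helper `add_sanoid_dataset` that writes directly to the file it is given; on the real root-owned config it would need to run as root, or the redirect could be swapped for `sudo tee -a`.

```shell
#!/bin/sh
# Hypothetical helper (not part of sanoid).
# $1 = path to sanoid.conf, $2 = dataset name, $3 = template name.
# Appends a [dataset] section unless that exact header already exists.
add_sanoid_dataset() {
    conf=$1
    ds=$2
    tmpl=$3
    if grep -qxF "[$ds]" "$conf"; then
        echo "already present: $ds"
        return 0
    fi
    printf '\n[%s]\n        use_template = %s\n' "$ds" "$tmpl" >> "$conf"
}

# Example (as root):
#   add_sanoid_dataset /usr/local/etc/sanoid/sanoid.conf zroot/home/clawdie operator_home_minimal
```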

10. Verify after cleanup

zfs get -H -o value usedbysnapshots zroot/home/clawdie
# Should be 0B (or low if new autosnaps have been taken)
zpool list zroot
df -h /

Pitfalls

  • Prefer sanoid prune over manual destroy. If disk pressure is not urgent, run sudo sanoid --prune-snapshots and let retention policy rotate out old snapshots. This preserves the daily safety net.
  • Exception: snapshot vacuum after large deletions. When you need the space NOW (e.g. cargo clean freed 5.5G but df shows no change), manual destroy of sanoid-managed snapshots is warranted. See §8 above. Destroy all snapshots on the target dataset, then let sanoid rebuild them.
  • Don't destroy snapshots on datasets with autoprune=yes during normal operations — you'll fight the cron job. This is only for the vacuum case.
  • The config header says "do not edit by hand" but the dataset list is the operator's domain — adding a dataset is safe.

Discovery log

2026-06-24: Hit "snapshot vacuum" — cargo clean freed 5.5G but df showed 16G unchanged. usedbysnapshots = 26.6G. Sanoid's hourly snapshots (14:00-20:00) captured the target/ directory before deletion. Destroyed a May boot environment (3.5G), old checkpoints (14M), pre-reinstall snaps (3.2M), then all 9 sanoid hourly+daily snaps on home/clawdie. Final: 16G → 29G free, pool 80% → 72%. Lesson: after any large deletion, either sanoid --prune-snapshots to rotate immediately, or manual destroy if desperate. Added §8 to this skill.

2026-06-22: zroot/home/clawdie was missing from sanoid config. 10 orphaned snapshots from April 20-22 held 23.6G of dead weight (usedbysnapshots). Added operator_home_minimal template and destroyed all 10. Freed 23.5G.