What Is ZFS Scrubbing?
ZFS data scrubbing is a process that verifies the integrity of all data stored in a ZFS pool. It checks every block against its checksum and, if redundancy is available (such as mirrors or RAID-Z), automatically repairs corrupted data using healthy copies.
Purpose: To detect and correct silent data corruption (also known as bit rot) before it causes application failures or data loss.
Why Is Scrubbing Necessary in ZFS?
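Unlike traditional file systems, which assume storage devices return valid data, ZFS assumes nothing. It uses checksums to verify that every block it reads matches what was originally written. That check only happens when a block is actually read, though, so rarely accessed data can stay corrupted without anyone noticing. A scrub closes this gap by reading everything.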
Data Corruption Can Happen Because Of:
- Bit rot on hard drives or SSDs
- Controller or cable failures
- Firmware bugs
- RAM errors
- Cosmic rays (yes, seriously!)
ZFS scrubbing is like a background health scan for your data. It catches corruption early and repairs it while a healthy redundant copy still exists.
How ZFS Scrubbing Works
- Checksum Verification
  - ZFS stores a checksum for every data and metadata block.
  - Scrubbing reads every allocated block, recalculates its checksum, and compares the result against the stored value.
- Automatic Repair
  - If a block is corrupt and redundancy exists, ZFS reads a good copy from another side of the mirror or reconstructs the data from RAID-Z parity.
  - The damaged copy is then rewritten in place with the correct data.
- Non-Disruptive
  - Unlike fsck on traditional file systems, a ZFS scrub does not stop services or require unmounting.
  - Scrubbing is performed while the pool is online.
- Progress Reporting
  - Scrubbing runs in the background and can take hours to days, depending on pool size and I/O speed.
  - Progress and estimated time remaining are visible via zpool status.
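The verify-and-repair loop described above can be illustrated with a toy model. The Bash sketch below is not how ZFS works internally: ZFS keeps checksums in parent block pointers and heals blocks inside the kernel. Instead, it treats small files as blocks, two directories as the two sides of a mirror, and a sha256sum listing as the stored checksums. The toy_scrub and demo functions are hypothetical, and GNU coreutils (sha256sum) is assumed.

```bash
#!/usr/bin/env bash
# Toy model of checksum verification and self-healing on a two-way mirror.
# Illustration only: real ZFS stores checksums in parent block pointers and
# repairs blocks inside the kernel, not with files and sha256sum.
set -euo pipefail

# Print the SHA-256 digest of a file (the "checksum" of one toy block).
checksum() { sha256sum "$1" | cut -d' ' -f1; }

# toy_scrub SIDE_A SIDE_B SUMS
#   SIDE_A, SIDE_B: directories holding a copy of each block file
#   SUMS: lines of "<sha256>  <block-name>", recorded when the data was written
# Verifies both copies of every block, repairs a bad copy from the good one,
# and prints how many copies were repaired.
toy_scrub() {
  local side_a=$1 side_b=$2 sums=$3
  local repaired=0 expected name sum_a sum_b
  while read -r expected name; do
    sum_a=$(checksum "$side_a/$name")
    sum_b=$(checksum "$side_b/$name")
    if [[ $sum_a == "$expected" && $sum_b == "$expected" ]]; then
      continue                                  # both copies healthy
    elif [[ $sum_b == "$expected" ]]; then
      cp "$side_b/$name" "$side_a/$name"        # heal side A from side B
      repaired=$((repaired + 1))
    elif [[ $sum_a == "$expected" ]]; then
      cp "$side_a/$name" "$side_b/$name"        # heal side B from side A
      repaired=$((repaired + 1))
    else
      echo "unrecoverable: $name" >&2           # no good copy left
    fi
  done < "$sums"
  echo "$repaired"
}

# Demo: write two blocks to both sides, corrupt one copy, then scrub.
demo() {
  local dir
  dir=$(mktemp -d)
  mkdir "$dir/a" "$dir/b"
  printf 'block zero' > "$dir/a/blk0"
  printf 'block one' > "$dir/a/blk1"
  cp "$dir/a/blk0" "$dir/a/blk1" "$dir/b/"
  (cd "$dir/a" && sha256sum blk0 blk1) > "$dir/sums"
  printf 'block 0ne' > "$dir/a/blk1"            # simulate bit rot on side A
  echo "repaired copies: $(toy_scrub "$dir/a" "$dir/b" "$dir/sums")"
  rm -rf "$dir"
}

demo
```

Running the script should print repaired copies: 1. Only blk1 on side A was corrupted, and side B still held a good copy. If both copies of a block were bad, toy_scrub would report it as unrecoverable, which is the situation redundancy exists to prevent.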
Running a ZFS Scrub Manually
zpool scrub <poolname>
Example:
zpool scrub tank
Check progress:
zpool status
Example output:
pool: tank
state: ONLINE
scan: scrub in progress since Sun Jun 29 10:00:00 2025
1.23T scanned at 200M/s, 800G issued at 130M/s, 2.40T total
0B repaired, 32.55% done, 03:37:36 to go
errors: No known data errors
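If you want a script to block until a scrub finishes, one option is to poll zpool status. The sketch below assumes the progress wording used by recent OpenZFS releases, a line such as 0B repaired, 32.55% done, 03:37:36 to go. Other versions may phrase it differently. The functions scrub_percent and wait_for_scrub are hypothetical helpers, not part of ZFS.

```bash
#!/usr/bin/env bash
# Sketch: poll a pool until its scrub finishes, printing progress.
# Assumes OpenZFS-style "zpool status" wording ("scrub in progress",
# "<n>% done"); the exact text differs between ZFS versions.
set -euo pipefail

# Read zpool status text on stdin and print the "% done" figure, if any.
scrub_percent() {
  sed -n 's/.* \([0-9.]*\)% done.*/\1/p'
}

# wait_for_scrub POOL [INTERVAL_SECONDS]
wait_for_scrub() {
  local pool=$1 interval=${2:-60} out
  while :; do
    # Capture the output once per iteration instead of piping into grep -q,
    # which can trip pipefail when grep exits early.
    out=$(zpool status "$pool")
    [[ $out == *"scrub in progress"* ]] || break
    printf '%s: %s%% done\n' "$pool" "$(scrub_percent <<< "$out")"
    sleep "$interval"
  done
  echo "$pool: no scrub in progress"
}
```

For example, wait_for_scrub tank 300 checks every five minutes. On OpenZFS 2.0 and later, zpool wait -t scrub tank should achieve the same without any parsing.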
To stop a scrub:
zpool scrub -s <poolname>
Scheduling Automatic Scrubs
It’s recommended to scrub monthly, or more frequently for:
- Critical systems
- High-availability environments
- Pools with older or large-capacity disks
Example cron job (run once a month):
Edit the crontab for root:
crontab -e
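Add the following line. Cron runs jobs with a minimal PATH, so give the full path to zpool (check it with command -v zpool and adjust if it differs):
@monthly /sbin/zpool scrub tank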
Or use systemd timer units if your distro uses systemd.
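For hosts with more than one pool, a small wrapper can start a scrub on every imported pool instead of listing each one in the crontab. This is a sketch under a few assumptions: it runs as root, it enumerates pools with zpool list -H -o name, and starting all scrubs at once is acceptable for your hardware. The function name and the --run flag are illustrative choices, not ZFS conventions.

```bash
#!/usr/bin/env bash
# Sketch: start a scrub on every imported pool, e.g. from a monthly cron job.
# Assumptions: run as root; starting all scrubs together is acceptable.
set -euo pipefail

# Cron's PATH is minimal; make sure the usual zpool locations are included.
export PATH="/usr/sbin:/sbin:$PATH"

scrub_all_pools() {
  local pool failed=0
  # -H: scripted mode (no header); -o name: print only pool names.
  while read -r pool; do
    [[ -n $pool ]] || continue
    if zpool scrub "$pool"; then
      echo "started scrub on $pool"
    else
      # zpool scrub fails, for example, if a scrub is already running.
      echo "could not start scrub on $pool" >&2
      failed=$((failed + 1))
    fi
  done < <(zpool list -H -o name)
  return "$failed"
}

# Only act when explicitly asked, so the file can be sourced safely.
if [[ ${1:-} == --run ]]; then
  scrub_all_pools
fi
```

A matching crontab entry might be @monthly /usr/local/sbin/scrub-all-pools.sh --run, where the script path is hypothetical. A non-zero exit status means at least one pool could not be scrubbed. If you prefer to stagger large pools across different days, use separate crontab entries instead.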
Scrubbing vs. Resilvering vs. fsck
Action | Description |
---|---|
Scrub | Verifies all data and repairs it from redundancy where possible; run routinely on a healthy pool |
Resilver | Rebuilds data on a new or replaced disk |
fsck | Traditional file system checker (not used with ZFS) |
Scrubbing is proactive. Resilvering is reactive.
Best Practices for ZFS Scrubbing
1. Schedule Regular Scrubs
Use cron/systemd to scrub monthly or weekly based on usage.
2. Monitor Results
Always check the output of zpool status after a scrub to:
- Verify that no errors occurred
- Identify silent corruption early
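One low-effort way to act on this is a check that fails whenever any pool reports a problem. The sketch below relies on zpool status -x, which prints only pools with problems, or the single line all pools are healthy when there are none. check_pools is a hypothetical name. Connect its exit status to cron, a monitoring agent, or whatever alerting you already use.

```bash
#!/usr/bin/env bash
# Sketch: exit 0 if every pool is healthy, otherwise print details and exit 1.
# Relies on "zpool status -x", which lists only pools with problems.
set -euo pipefail

check_pools() {
  local out
  out=$(zpool status -x)
  if [[ $out == "all pools are healthy" ]]; then
    return 0
  fi
  # Show the details of the unhealthy pool(s) on stderr and signal failure.
  printf '%s\n' "$out" >&2
  return 1
}
```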
3. Enable Alerts
Set up email or Proxmox notifications so that you are alerted whenever a scrub repairs data or reports errors.
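On a plain ZFS install, email alerts usually come from ZED, the ZFS Event Daemon. As a sketch, the relevant settings live in its configuration file, typically /etc/zfs/zed.d/zed.rc, which ZED reads as a shell fragment. The address below is a placeholder, and a working local mail setup (the mail command) is assumed.

```bash
# Sketch of the relevant lines in /etc/zfs/zed.d/zed.rc (a shell fragment).
# The address is a placeholder; a working local mail setup is assumed.

# Where ZED sends notification emails.
ZED_EMAIL_ADDR="admin@example.com"

# 1 = also notify when the pool is healthy (for example, after every
#     completed scrub); 0 = notify only when something is wrong.
ZED_NOTIFY_VERBOSE=1
```

Restart ZED after editing; on systemd-based installs the unit is usually zfs-zed.service. With ZED_NOTIFY_VERBOSE=1 you should get a message after every scrub, not only after failures.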
4. Scrub During Idle Periods
Scrubs are I/O-intensive. Run them during low-usage hours to avoid performance impact.
5. Check for Aging Drives
Repeated checksum errors on the same device (the CKSUM column in zpool status) may indicate a failing drive, cable, or controller, even if S.M.A.R.T. shows no issues.
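To find the devices behind rising checksum counts, you can filter the config table of zpool status. This awk-based sketch assumes the usual NAME STATE READ WRITE CKSUM layout, with the table ending at a blank line. cksum_errors is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: list devices whose CKSUM counter in "zpool status" is non-zero.
# Assumes the usual config table layout (NAME STATE READ WRITE CKSUM)
# and that the table ends at the first blank line.
set -euo pipefail

# Read zpool status text on stdin; print "<device> <cksum-count>" per hit.
cksum_errors() {
  awk '
    # The header row marks the start of the device table.
    $1 == "NAME" && $5 == "CKSUM" { in_table = 1; next }
    # A blank line ends the table.
    in_table && NF == 0 { in_table = 0; next }
    # Device rows: name, state, read, write, cksum (extra notes may follow).
    in_table && NF >= 5 && $5 != "0" { print $1, $5 }
  '
}
```

For example, zpool status tank | cksum_errors prints one line per device with a non-zero CKSUM count. After investigating, zpool clear tank resets the error counters so that new errors stand out.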
Performance Impact of Scrubbing
Scrubbing is I/O-heavy but CPU-light:
- It reads every allocated block in the pool, which keeps the disks busy for the duration of the scrub.
- It doesn’t block other operations, but may slow down read/write workloads.
Tips to Mitigate Performance Issues:
- Run scrubs during off-peak hours.
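- ZFS’s I/O scheduler already treats scrub reads as a separate, lower-priority I/O class. For further throttling, lower module parameters such as zfs_vdev_scrub_max_active (advanced tuning).
- Pause a long-running scrub during busy periods with zpool scrub -p <poolname>. Running zpool scrub <poolname> again resumes it where it left off.
- If you have several large pools, schedule their scrubs on different days.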
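On Linux, OpenZFS exposes its module parameters under /sys/module/zfs/parameters, so the scrub concurrency limit can be inspected and changed at runtime. Here is a minimal sketch, assuming Linux and root access. ZFS_PARAM_DIR is a hypothetical override added only so the helpers can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch: read and set ZFS module parameters at runtime (Linux only).
# ZFS_PARAM_DIR is a hypothetical override for testing; by default the
# helpers use the real sysfs location.
set -euo pipefail

ZFS_PARAM_DIR=${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}

# Print the current value of a ZFS module parameter.
zfs_param_get() {
  cat "$ZFS_PARAM_DIR/$1"
}

# Set a numeric ZFS module parameter (requires root on a real system;
# the change does not survive a reboot).
zfs_param_set() {
  local name=$1 value=$2
  [[ $value =~ ^[0-9]+$ ]] || { echo "not a number: $value" >&2; return 1; }
  printf '%s\n' "$value" > "$ZFS_PARAM_DIR/$name"
}
```

For example, zfs_param_get zfs_vdev_scrub_max_active shows the current value, and zfs_param_set zfs_vdev_scrub_max_active 1 lowers it until the next reboot. To persist a value, the usual approach is a line such as options zfs zfs_vdev_scrub_max_active=1 in a file under /etc/modprobe.d/. Check the module parameter documentation for your ZFS version before changing defaults.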
Scrub Reporting Tools & Automation
If you’re managing ZFS on a fleet of servers or want visual reports, use:
- zfs-zed (the ZFS Event Daemon) for automatic alerts
- arc_summary.py (ARC cache statistics) and zpool-status-report.sh for human-readable reports
- Proxmox VE’s GUI, which shows scrub status directly in the dashboard
Case Study Example: Silent Corruption Repair
Let’s say a pool has mirrored vdevs. During a scrub, ZFS finds that one block on disk A has a checksum mismatch. Disk B has the correct version.
- ZFS logs the event
- Reads the correct data from Disk B
- Overwrites the corrupted block on Disk A
- Reports the repair in zpool status: the scan line shows the amount of data repaired, and Disk A’s CKSUM counter increases
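Without the scrub, the bad copy on Disk A would go unnoticed until that block happened to be read. If Disk B failed in the meantime, the only good copy would be gone and the block could not be recovered. That loss could surface later as permanent errors in files your backups or applications depend on.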
ZFS Scrub Metrics (from zpool status)
Field | Meaning |
---|---|
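scanned | Amount of data whose metadata has been traversed (the scrub scans metadata first, then issues sorted reads) |
issued | Amount of data for which read I/O has actually been issued to the disks |
% done | Progress, based on issued versus total |
errors | Errors detected during the scrub |
repaired | Amount of data repaired, shown in bytes (ideally 0B) |
to go | Estimated time remaining |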
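In recent OpenZFS versions, the progress percentage is roughly issued divided by total. The awk sketch below recomputes it from a scan line. It assumes the OpenZFS 0.8+ wording and 1024-based unit suffixes. Because zpool status rounds the displayed sizes, the result only approximates the printed percentage. scan_progress is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: recompute "% done" from the issued and total figures of a
# zpool status scan line. Assumes the wording
# "<x> scanned at <rate>, <y> issued at <rate>, <z> total" and
# 1024-based unit suffixes (K, M, G, T, P, E).
set -euo pipefail

scan_progress() {
  awk '
    # Convert a size such as 800G or 2.40T to bytes.
    function bytes(s,    n, i) {
      n = s + 0                                      # numeric prefix: "2.40T" -> 2.40
      i = index("KMGTPE", substr(s, length(s), 1))   # 0 if no unit suffix
      return n * (i ? 1024 ^ i : 1)
    }
    / issued at / && / total/ {
      for (f = 1; f < NF; f++) {
        if ($(f + 1) == "issued") issued = bytes($f)
        if ($(f + 1) == "total")  total  = bytes($f)
      }
    }
    END { if (total > 0) printf "%.2f\n", 100 * issued / total }
  '
}
```

For the line 1.23T scanned at 200M/s, 800G issued at 130M/s, 2.40T total, it prints 32.55, since 800G out of 2,457.6G has been issued.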
Summary
Feature | ZFS Scrubbing Advantage |
---|---|
Detects Silent Corruption | ✅ Yes |
Runs While Online | ✅ Yes |
Auto-Repair with Redundancy | ✅ Yes |
Manual or Scheduled | ✅ Both supported |
Alerts/Logs Supported | ✅ Via zfs-zed, systemd, Proxmox, etc. |
Frequency Recommendation | ✅ Monthly (minimum), weekly for critical data |
Final Thoughts
ZFS scrubbing is one of the key differentiators of ZFS over traditional file systems. It’s a simple yet powerful way to protect your data from the silent corruption that can ruin backups, databases, and user trust.
“Backups protect against disasters. Scrubs protect against decay.”