
TrueNAS Scrubs and SMART Tests: A Disk Health Routine

Snapshots protect your data; scrubs and SMART tests protect your disks. How ZFS scrubs and SMART self-tests work together on TrueNAS, sensible schedules for a home NAS, and how to read the warning signs before a drive dies.

By TrueNASGuide Editorial · 8 min read

Snapshots and replication protect your data against accidents and disasters. Scrubs and SMART tests protect against the slower, quieter threat: disks degrading and bits silently rotting. They’re the maintenance layer most home NAS users set up once and forget — which is fine, because “set up once correctly” is exactly the goal. This guide explains what each does, how they complement each other, and what schedule actually makes sense for a home pool.

Two different jobs

ZFS scrubs and SMART tests are often mentioned together but they check different things, and you want both.

A ZFS scrub verifies your data. It reads every used block in the pool, recomputes its checksum, and compares it to the stored checksum. If a block is corrupt and the pool has redundancy (mirror or RAIDZ), ZFS repairs it from a good copy; this is ZFS “self-healing.” A scrub catches silent data corruption (bit rot, a flaky cable, a drive returning bad data without erroring) while the pool can still repair it, rather than leaving you to discover it mid-resilver, when the redundancy needed for the repair may already be gone.
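To make the verify-and-repair loop concrete, here is a toy model of a scrub over a two-disk mirror. It is a sketch of the idea only, not how ZFS is implemented: the `scrub` function, the in-memory "disks", and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an illustrative choice, not a statement about ZFS defaults.
    return hashlib.sha256(data).hexdigest()


def scrub(copy_a: list, copy_b: list, stored: list) -> tuple:
    """Check every block on both copies against its stored checksum.

    A bad copy is overwritten from the good one (self-healing). If both
    copies are bad, the block is counted as unrecoverable.
    Returns (repaired, unrecoverable).
    """
    repaired = 0
    unrecoverable = 0
    for i, expected in enumerate(stored):
        a_ok = checksum(copy_a[i]) == expected
        b_ok = checksum(copy_b[i]) == expected
        if a_ok and b_ok:
            continue
        if a_ok:
            copy_b[i] = copy_a[i]
            repaired += 1
        elif b_ok:
            copy_a[i] = copy_b[i]
            repaired += 1
        else:
            unrecoverable += 1
    return repaired, unrecoverable


blocks = [b"family photos", b"tax records", b"music library"]
stored_checksums = [checksum(b) for b in blocks]
disk_a = list(blocks)
disk_b = list(blocks)
disk_b[1] = b"tax recorcs"  # simulate silent corruption on one disk

result = scrub(disk_a, disk_b, stored_checksums)
print(result)  # (1, 0)
```

With only one copy damaged, the scrub repairs it and reports nothing unrecoverable. Corrupt the same block on both disks and the count moves to the second number, which is the case a scrub cannot fix.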

A SMART test verifies the drive hardware. SMART (Self-Monitoring, Analysis and Reporting Technology) is built into the drive; a SMART self-test exercises the drive’s own surface and electronics and reports developing problems — reallocated sectors, pending sectors, read errors, temperature, wear. SMART catches a drive that’s failing before it fails completely.

The two are complementary: SMART warns you about degrading drives before they die; ZFS (via scrubs) catches drives that return bad data or fail without warning. Run both.

How they work together on TrueNAS

TrueNAS layers disk protection: ZFS detects sudden failures in real time during normal I/O; the TrueNAS middleware polls SMART data from each drive in the background; and your scheduled scrub and SMART self-tests proactively surface problems on a cadence. You see the results in the pool status (a scrub reports repaired/unrecoverable errors) and in the disk’s SMART report (test pass/fail and the attribute counters).

SMART test types

There are two self-tests you’ll schedule:

  • SHORT test — a quick electrical/mechanical check, completes in minutes, negligible performance impact. Safe to run frequently (nightly or every few days).
  • LONG test — a full surface scan of the entire drive. Thorough, but it takes hours on large drives and has a real performance impact while running. Run it less often (e.g., monthly), at a quiet time.

A sensible home schedule

TrueNAS creates a monthly scrub task by default. That’s acceptable for a home pool; for anything you care about, tightening the scrub to roughly every two weeks is a common, well-regarded recommendation. A reasonable home routine:

Task              | Frequency                                | Notes
ZFS scrub         | Every 2 weeks (or keep monthly default)  | Schedule for an idle window
SMART SHORT test  | Daily or weekly                          | Cheap; nightly is fine
SMART LONG test   | Monthly                                  | Hours-long; schedule off-peak

One scheduling rule matters: don’t let these tasks overlap. A SMART long test, a scrub, and a snapshot-replication run all hitting the disks at once thrash performance and can interfere with each other. Stagger them — e.g., scrub on the 1st and 15th overnight, short SMART nightly at a different hour, long SMART on a weekend morning when nothing else runs.
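One way to sanity-check a plan before entering it is to write each task down as a time window and look for pairs that intersect. The sketch below does that for a worst-case day on which every task is due. The start times and durations are hypothetical; substitute the run times you actually observe on your own pool.

```python
from datetime import datetime, timedelta
from itertools import combinations


def window(name: str, start: datetime, hours: float) -> tuple:
    # A task window is (name, start, end). Durations are estimates.
    return (name, start, start + timedelta(hours=hours))


def find_conflicts(windows: list) -> list:
    # Two windows overlap when each one starts before the other ends.
    return [
        (a[0], b[0])
        for a, b in combinations(windows, 2)
        if a[1] < b[2] and b[1] < a[2]
    ]


# Everything piled into the same night.
naive = [
    window("scrub", datetime(2025, 3, 1, 0, 0), 8),
    window("smart_long", datetime(2025, 3, 1, 2, 0), 10),
    window("smart_short", datetime(2025, 3, 1, 3, 0), 0.1),
]

# Same tasks, spread across the day.
staggered = [
    window("scrub", datetime(2025, 3, 1, 0, 0), 8),
    window("smart_long", datetime(2025, 3, 1, 9, 0), 10),
    window("smart_short", datetime(2025, 3, 1, 21, 0), 0.1),
]

print(find_conflicts(naive))
print(find_conflicts(staggered))
```

The naive plan reports every pair as conflicting, while the staggered plan reports none. If you also run replication, add it as another window.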

You set scrubs under Data Protection → Scrub Tasks and SMART tests under Data Protection → S.M.A.R.T. Tests (and enable the SMART service). TrueNAS uses cron-style schedules for both.
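If cron notation is unfamiliar, the following sketch writes a staggered routine as standard five-field cron expressions (minute, hour, day of month, month, day of week) and checks which tasks fire at a given moment. The expressions and the small matcher are illustrative assumptions. The TrueNAS scheduler UI may expose or restrict fields differently, and the matcher handles only `*`, single numbers, comma lists, and ranges.

```python
from datetime import datetime

# Hypothetical schedule, written as: minute hour day-of-month month day-of-week
SCHEDULES = {
    "scrub": "0 0 1,15 * *",     # 00:00 on the 1st and 15th
    "smart_short": "0 21 * * *",  # 21:00 every day
    "smart_long": "0 9 8 * *",    # 09:00 on the 8th of each month
}


def field_matches(field: str, value: int) -> bool:
    # Supports "*", "5", "1,15" and "1-7"; step syntax is not handled.
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = map(int, part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    # Cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    if not (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(month, when.month)
    ):
        return False
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    # Classic cron rule: if both day fields are restricted, either may match.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


def due(when: datetime) -> list:
    return sorted(name for name, expr in SCHEDULES.items() if cron_matches(expr, when))


print(due(datetime(2025, 3, 15, 0, 0)))  # ['scrub']
print(due(datetime(2025, 3, 8, 9, 0)))   # ['smart_long']
```

Note the day-field rule in `cron_matches`: in classic cron, restricting both day of month and day of week matches on either, which is a common source of tasks that run more often than intended.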

Reading the warning signs

The point of all this is early warning. The attributes that genuinely predict failure on spinning disks:

  • Reallocated Sector Count — sectors the drive has remapped because they went bad. A few may be tolerable; a rising count is a drive on its way out.
  • Current Pending Sector Count — sectors flagged bad but not yet reallocated. Non-zero and climbing is a red flag.
  • Offline Uncorrectable — sectors the drive couldn’t read or fix.
  • UDMA CRC Error Count — often a cable/connection problem, not the drive itself; reseat or replace the SATA cable before condemning the disk.

For SSDs and NVMe drives, watch media wearout (percentage used) and available spare. A scrub that repeatedly reports repaired checksum errors on the same disk is itself a sign that the disk (or its cable) is suspect, even if SMART looks clean.
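Because the trend matters more than any single reading, it helps to compare two sets of raw attribute values taken some time apart. The sketch below does this for the spinning-disk attributes listed above. The attribute names follow the labels `smartctl` commonly prints, but names vary by drive and vendor, and the readings here are made up for illustration.

```python
# Attribute name -> what a problem most likely points to.
WATCHED = {
    "Reallocated_Sector_Ct": "drive",
    "Current_Pending_Sector": "drive",
    "Offline_Uncorrectable": "drive",
    "UDMA_CRC_Error_Count": "cable",
}


def assess(previous: dict, current: dict) -> list:
    """Compare two snapshots of raw SMART values and describe what changed."""
    findings = []
    for attr, suspect in WATCHED.items():
        before = previous.get(attr, 0)
        now = current.get(attr, 0)
        if now > before:
            findings.append((attr, suspect, f"rising: {before} -> {now}"))
        elif now > 0:
            findings.append((attr, suspect, f"non-zero but stable at {now}"))
    return findings


# Hypothetical readings from the same drive, a month apart.
last_month = {
    "Reallocated_Sector_Ct": 8,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 0,
    "UDMA_CRC_Error_Count": 3,
}
today = {
    "Reallocated_Sector_Ct": 24,
    "Current_Pending_Sector": 2,
    "Offline_Uncorrectable": 0,
    "UDMA_CRC_Error_Count": 3,
}

findings = assess(last_month, today)
for attr, suspect, note in findings:
    print(f"{attr}: suspect {suspect} ({note})")
```

In this example the reallocated and pending counts are both rising, which points at the drive, while the CRC count is non-zero but unchanged, which suggests an old cable issue that has not recurred.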

Configure email/alerts so TrueNAS tells you when a test fails or an attribute crosses a threshold — silent monitoring you never look at protects nothing.

When to act

  • SMART test fails, or pending/reallocated sectors are climbing: plan to replace that drive. Order a replacement now rather than waiting for total failure, especially on wide RAIDZ, where a second drive failing mid-resilver puts the whole pool at risk.
  • Scrub reports unrecoverable errors: the pool found corruption it could not repair (no good copy existed). This means you’ve lost redundancy somewhere or have a real data-loss event — investigate immediately and lean on your replication backups.
  • Repeated CRC errors on one disk: suspect the cable/backplane first.
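The three rules above can be written down as a small triage function, which is handy if you script your own checks on top of the built-in alerts. This is a sketch only: the function name, its parameters, and the ordering (unrecoverable scrub errors first) are assumptions for illustration, not anything TrueNAS provides.

```python
def triage(
    smart_test_failed: bool = False,
    sectors_climbing: bool = False,
    scrub_unrecoverable: int = 0,
    crc_errors_repeating: bool = False,
) -> list:
    """Map observed symptoms to the actions described in the list above."""
    actions = []
    if scrub_unrecoverable > 0:
        actions.append("investigate immediately and lean on replication backups")
    if smart_test_failed or sectors_climbing:
        actions.append("order a replacement drive now")
    if crc_errors_repeating:
        actions.append("check the cable/backplane before condemning the disk")
    return actions


print(triage(sectors_climbing=True, crc_errors_repeating=True))
```

A clean drive returns an empty list. Several symptoms at once return several actions, with the data-loss case listed first.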

What this layer does and doesn’t do

Scrubs and SMART are health and integrity checks — they keep your pool honest and warn you before hardware quits. They are not a backup. A scrub can’t recover data the pool no longer has a good copy of, and a passing SMART test won’t save you from an accidental rm, a pool-wide corruption event, ransomware, or a fire. That’s what snapshots and off-box replication are for. Think of it as a stack: SMART and scrubs keep the disks and data verified; snapshots and replication keep you recoverable when something the disks can’t catch goes wrong. You want every layer.
