Detecting a Failing Hard Drive with S.M.A.R.T.


By Jeremy Brock

Your hard drive holds data such as documents, pictures and music. Replacing your hard drive before it completely fails can save you thousands of dollars in data recovery costs. One method to detect hard drive failure is to read the S.M.A.R.T. ("Self-Monitoring, Analysis, and Reporting Technology") values.

Step 1: Download GetSMART
Step 2: Select your hard drive and click on the S.M.A.R.T. tab.

[Image: S.M.A.R.T. sensor list]
*Depending on your hard drive manufacturer, you may not have all of the sensors listed above. Some Fujitsu drives display erroneous event counts in the millions, which makes this test invalid.

Step 3: If any sensor shows a yellow triangle, or any of the five sensors listed above has a non-zero event count, back up your data to another drive and run other drive tests.

Interpreting the Data:
SMART uses thresholds to determine whether a drive is failing. Each manufacturer defines what it considers failing, which is why SMART is open to interpretation and often misunderstood. For each attribute, SMART records the current value, worst value, warning threshold, and event count. Once an attribute's value drops below the warning threshold, the manufacturer considers the drive bad and it should be replaced. However, many manufacturers set unrealistic thresholds, making it almost impossible for a drive to fail a SMART test. By interpreting the values yourself, you can gauge the likelihood of failure.

Attribute values usually start at 100 or 200 and decrease as the drive ages. Event counts are useful for spotting bad sectors, while comparing the current value to the warning threshold works better for read error rates.
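The four numbers SMART keeps for each attribute can be modelled in a few lines of Python. This is only a sketch to make the threshold rule concrete: the class name `SmartAttribute`, its field names, and the sample figures are made up for illustration and do not come from GetSMART. It treats a value at or below the threshold as failed, which is an assumption based on how smartmontools documents the rule.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One SMART attribute as described above (hypothetical structure)."""
    name: str
    current: int    # normalized value right now
    worst: int      # lowest normalized value ever recorded
    threshold: int  # manufacturer's warning threshold
    raw: int        # raw event count

    def has_failed(self) -> bool:
        # Assumption: "failed" means the current value is at or below the threshold.
        return self.current <= self.threshold

    def ever_failed(self) -> bool:
        # The worst value shows whether the attribute dipped to the threshold in the past.
        return self.worst <= self.threshold


# Illustrative numbers only, not read from a real drive.
attr = SmartAttribute("Raw Read Error Rate", current=50, worst=45, threshold=30, raw=0)
print(attr.has_failed(), attr.ever_failed())
```

Because many thresholds are set very low, a drive can be in poor shape while `has_failed()` still returns False, which is why the sections below look at the margin and the event counts directly.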

Raw Read Error Rate:
If your hard drive is slow, check the Raw Read Error Rate value; the drive may be having trouble accessing sectors. Compare the current value to the warning value. If it is close (within 20 or 30), consider replacing the drive. For example, assume the drive had a starting value of 100, now reads 50, and has a warning threshold of 30. The value has dropped by 50%, which you can read as a marked increase in read errors, and it now sits only 20 points above the failing threshold.
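The arithmetic in that example can be written out as a short Python sketch. The function names and the default margin of 30 are illustrative choices taken from the "within 20 or 30" rule of thumb above, not part of any SMART tool.

```python
def threshold_margin(current: int, threshold: int) -> int:
    """How many points the current value sits above the warning threshold."""
    return current - threshold


def percent_drop(start: int, current: int) -> float:
    """How far the value has fallen from its starting value, in percent."""
    return (start - current) / start * 100


def near_threshold(current: int, threshold: int, margin: int = 30) -> bool:
    """True when the value is within `margin` points of the threshold."""
    return threshold_margin(current, threshold) <= margin


# The example from the text: started at 100, now 50, warning at 30.
print(threshold_margin(50, 30))   # 20
print(percent_drop(100, 50))      # 50.0
print(near_threshold(50, 30))     # True
```

Pass `margin=20` for the stricter end of the rule of thumb; the example drive is flagged either way.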

Reallocation Events:
If Windows is blue screening, stalling or freezing, or displaying page file errors, check the event counts for Reallocated Sector Count, Reallocation Event Count, and Current Pending Sector Count. Hard drives keep a spare area of sectors; when a bad sector is encountered, a sector from the spare area is used as a replacement. A non-zero count means the drive has encountered a bad sector and either remapped it to a spare or, in the case of pending sectors, marked it to be remapped. I often find that even when a sector has been remapped there is further damage to the platter, so replacing the drive is a good idea.
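The non-zero rule is simple enough to express as a small Python check. This is a sketch: the attribute names used as dictionary keys are placeholders (each tool labels them slightly differently), and the sample counts are invented. An attribute the drive does not report is treated as zero.

```python
# Placeholder labels for the three bad-sector counters discussed above.
WATCHED = (
    "Reallocated Sector Count",
    "Reallocation Event Count",
    "Current Pending Sector Count",
)


def bad_sector_warnings(event_counts):
    """Return the watched attributes whose event count is non-zero."""
    return [name for name in WATCHED if event_counts.get(name, 0) > 0]


# Invented sample: eight remapped sectors, nothing pending,
# and a drive that does not report Reallocation Event Count at all.
counts = {"Reallocated Sector Count": 8, "Current Pending Sector Count": 0}
warnings = bad_sector_warnings(counts)
print(warnings)  # ['Reallocated Sector Count']
```

Any non-empty result is the cue to back up the drive and run further tests.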

Using SMART values is an excellent way to evaluate the stability of a hard drive. You can debate at what point a hard drive should be considered bad, but if your data is important, why risk losing it, especially when new hard drives are inexpensive?

Other Resources:

1. DiskCheckup (Free) and ActiveSMART (Trial) - Monitor SMART values and alert you to changes over time.
2. SmartMonTools (Linux & Free) - This is my favorite SMART utility and is included on most Linux LiveCDs. Not only does it monitor SMART and report status via console or email, it can also run SMART diagnostic tests (short and long) and view the SMART error logs, which show the last five drive errors.
3. Google Labs: Disk Failure Report
4. Wikipedia: S.M.A.R.T. ("Self-Monitoring, Analysis, and Reporting Technology")
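
For readers using SmartMonTools, the same values can be pulled into a script. The sketch below assumes `smartctl` is installed and prints its usual attribute table for `smartctl -A`; the function names are hypothetical, and the `SAMPLE` text is a hand-written illustration of that table layout, not output from a real drive.

```python
import subprocess


def parse_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and carry VALUE, WORST and
        # THRESH in columns 4 to 6; header and other lines are skipped.
        if (len(parts) >= 10 and parts[0].isdigit()
                and all(p.isdigit() for p in parts[3:6])):
            attrs[parts[1]] = {
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9],  # kept as text; some drives append extra detail
            }
    return attrs


def read_attributes(device):
    """Run smartctl against a device (usually needs administrator rights)."""
    # smartctl's exit status is a bit mask, so a non-zero code is not treated as an error here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_attributes(result.stdout)


# Hand-written sample in the smartctl column layout (illustrative values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   050   045   030    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
"""

sample_attrs = parse_attributes(SAMPLE)
print(sample_attrs["Reallocated_Sector_Ct"]["raw"])
```

On a real system, `read_attributes("/dev/sda")` would replace the sample text; the device path is an example. The diagnostic tests and error log mentioned above are reached with `smartctl -t short`, `smartctl -t long`, and `smartctl -l error`.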


