As some of you may be aware, I don’t put a whole lot of stock in the “SMART” feature of hard drives.
Rarely have I had the SMART capabilities of a hard drive actually tell me that the drive was going to fail.
Recently I had an encounter with SMART errors in a totally different way.
Basically, my ReadyNAS NV+ storage server was telling me that “Disk 1” was having problems and might fail soon … but all my tests indicated that the drives were fine.
After a lot of hassle, and going back and forth with Netgear support, I finally figured out the problem.
It started a few weeks ago with an email the ReadyNAS sent:
Reallocated sector count has increased in the last day.
Disk 1:
Previous count: 671
Current count: 677
Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
The odd thing was that the SMART information available via the ReadyNAS web interface did not indicate that any of the 4 drives currently installed had any reallocated sectors.
Based on my previous experience with SMART, I decided to swap disk 1 and test it independently.
The nice thing about the ReadyNAS is that it supports RAID 5 and hot-swappable drives (which means the data is spread across multiple drives and can be recreated if one of them fails, and the drives can be removed and inserted while the machine is running).
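For the curious, here is a toy illustration of how a lost drive's data can be recreated using XOR parity, which is the general idea behind RAID 5. It is a simplified sketch with made-up data and a hypothetical helper named xor_blocks. Real RAID 5 rotates the parity across all the drives, and this is not meant to show how the ReadyNAS actually lays out blocks on disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives (made-up contents).
data = [b"disk", b"one!", b"data"]

# The parity block is the XOR of all data blocks; it lives on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

XOR-ing the surviving blocks with the parity block cancels out everything except the missing block, which is why a single failed drive can be rebuilt.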
So I pulled disk 1 and replaced it with the spare.
I ran some basic tests on the disk I pulled and no errors showed up.
I decided to let the system run with the new disk and see what happened.
The next day I got the same email message … except the reallocated sector count had grown.
OK, maybe the ReadyNAS enumerates the disks from zero instead of one (many computer functions are zero based instead of one based).
Disk 2 didn’t show any SMART errors in the web interface either, but I pulled it from the chassis and replaced it with the disk I had previously removed.
I ran basic tests on the removed 2nd disk and it too showed no errors.
At this point I was getting really frustrated … so I sent an email to Netgear support for assistance.
In the past I’ve gotten really great support from Netgear on the ReadyNAS.
After a few messages back and forth with Netgear support, they asked me to put the device into “Support” mode and forward the telnet port on my router to it, so one of their engineers could log on to it and do some diagnostics.
I did this … although the idea of opening up port 23 on my router was a bit disquieting (I don’t even have telnet servers installed on any of my Linux machines).
I told Netgear that the setup was done and waited.
I didn’t hear anything back for a day, so I sent them an email. I got a reply indicating that their engineering staff hadn’t had a chance to check my system, could I wait a few more days?
On a whim, I did a Google search on “readynas tech support telnet password” and quickly found out that the password required to log in was trivial to find.
I immediately disabled the telnet port on my router and notified Netgear support that I would not be opening the port back up.
A few days later I got an email from Netgear indicating that they had determined there was a problem with my ReadyNAS’s hardware and they wanted to replace it (it was still under warranty).
They gave me the RMA information and I paid the $20 to get an advance replacement (actually not a bad deal, when you factor in packing & shipping costs).
The new ReadyNAS came, I swapped the drives over, and ran the usual tests … everything seemed to be fine.
That night, I got the email message about SMART detecting increased reallocated sector count.
Well, I was done with Netgear support on this … they clearly didn’t know what the problem was.
Just for grins, I loaded the ReadyNAS plug-in that would let me log in to the device’s Linux OS and operate as root (super user).
I checked the 4 drives’ SMART data using the ‘smartctl’ command and saw no errors … I then noticed something … there was a fifth drive registered in the system. It was the USB drive I use to back up the data.
I ran the ‘smartctl’ command on the USB drive and found it had reallocated sectors!
5 Reallocated_Sector_Ct 0x0033 071 071 036 Pre-fail Always - 1210
I hadn’t even considered that the USB drive would be identified as “Disk 1”. I figured it would be identified as “USB Disk 1” or “Disk 5”.
I removed the USB drive from the system and the email reports about the SMART errors stopped.
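If you want to run the same kind of check by hand across every drive, including any USB ones, here is a rough sketch in Python. It assumes the attribute table looks like the smartctl line shown above (ten whitespace-separated columns with the raw value last). The function names and device paths are my own placeholders, not anything the ReadyNAS ships with. Some USB enclosures may also need an extra device-type option before smartctl will talk to them.

```python
import subprocess


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == "5" and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def check_devices(devices):
    """Run `smartctl -A` on each device path and report the counts keyed by path.

    The paths are supplied by the caller (for example "/dev/sda"); this
    usually has to run as root.
    """
    report = {}
    for dev in devices:
        result = subprocess.run(["smartctl", "-A", dev], capture_output=True, text=True)
        report[dev] = reallocated_sectors(result.stdout)
    return report


# Parse the line from the USB drive shown above, with no hardware needed.
sample = "  5 Reallocated_Sector_Ct 0x0033 071 071 036 Pre-fail Always - 1210"
print(reallocated_sectors(sample))  # 1210
```

Keying the report by device path rather than by a label like “Disk 1” avoids exactly the kind of ambiguity that sent me chasing the wrong drives.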
I informed Netgear support of my discovery … but haven’t heard back from them. I kind of don’t expect to.
My friend. Your experience is almost word-for-word the same path I took over several months to discover what the issue was. This was after RMA’ing and purchasing drives to replace ‘Disk 1’ and not understanding the source of the errors.
I too got ssh running and examined logs, scripts, crons, etc. until I worked out what it was – an attached USB drive generating the alerts. If the check_smart script is going to scan USB drives, then they should also appear in the Frontview interface in the health window.
Thanks for your rambling. I thought I was going crazy there for a while.