Re: smartctl, reallocated sector count question

mathog
Summary so far:  some old Seagate ST340016A disks were found to have
nonzero 'Offline uncorrectable' and 'Current_Pending_Sector' counts
which could not be reset to zero by writing to every block on the disk.
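
For anyone who wants to track these two counts over time, here is a minimal Python sketch that pulls the raw values of attributes 197 and 198 out of text in the layout printed by `smartctl -A`. The sample text and its numbers are made up for illustration (they are not readings from the disks discussed here), and `raw_values` is a hypothetical helper name, not part of smartmontools.

```python
# Hypothetical sample in the column layout printed by `smartctl -A`.
# The numbers are invented for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""


def raw_values(text, wanted=(197, 198)):
    """Return {attribute_id: raw_value} for the wanted attribute IDs."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this also skips the header row, whose first field is "ID#".
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        # The raw value is the tenth column; ignore rows where it is not a plain integer.
        if attr_id in wanted and fields[9].isdigit():
            found[attr_id] = int(fields[9])
    return found


print(raw_values(SAMPLE))
```

On a real system the text would come from running `smartctl -A` against the device instead of from the `SAMPLE` string.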

I contacted Seagate about this issue, and the best I could get out of
them (on the second attempt) was:

| I understand you are getting unclearable SMART 197 and 198
| fields on your drive.  We do not recommend tampering with your
| SMART values on the drive in any way.  We do not have any utility
| ourselves for clearing these fields.  Seatools is the only valid
| diagnostic that we use to test the drives for functionality.  If
| the drive passes both the short and long test of Seatools then
| the drive itself is fine.  If it fails the tests then the drive
| should be replaced.

I did not want to "tamper" with the drives; I only asked whether there
was a way to clear these fields.  (Neither SeaTools nor dd will do so.)

Anyway, no answer on why these particular drives ended up with these
counts "stuck".  The disks all pass both the long and short SMART tests.
I suspect that this is related to the disks having been powered off for
a very long time, well over a year.  My guess is that if these two
counts are nonzero and the disks are then left off for a very long
time, the firmware loses track of the affected blocks.  For instance,
it may associate a time field with these blocks and allow only so long
(6 months?) before it swaps them out even if they have not been
overwritten, neglecting to clear the two fields when it does so.  If
that check ran when the disks were powered back up, it would explain
the observed "stuck" values in those fields.  Whatever this issue is,
according to Seagate, it apparently does not indicate a failing disk.

Regards,

David Mathog
[hidden email]
Manager, Sequence Analysis Facility, Biology Division, Caltech

------------------------------------------------------------------------------
_______________________________________________
Smartmontools-support mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/smartmontools-support
Bruce Allen
David,

Thanks for the update.  I suspect this is due to buggy SMART firmware
on the disk.  When writing and reviewing disk firmware, the disk
vendors seem to be mostly concerned with performance (read/write
speed), since this is what reviewers test and what customers look at
when deciding what to buy.  The SMART part of the firmware is often an
afterthought, and is not written or reviewed with the same attention
to detail.

Cheers,
        Bruce


Tim Small

> David Mathog wrote:

>> Anyway, no answer on why these particular drives ended up with these
>> counts "stuck".

Hmm.  Just one thought - if you have the LBAs of the original errors, I
wonder if it's worth trying to use hdparm's "--make-bad-sector" and then
"--write-sector" commands?  Bit of a long-shot but worth a try...
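
To make the order of operations concrete, here is a small Python sketch that only builds the two hdparm command lines per LBA and prints them; it does not run anything. The device name `/dev/sdX` and the LBA are placeholders, and `hdparm_rewrite_commands` is a hypothetical helper name. Both hdparm operations are destructive for the sector involved and need hdparm's `--yes-i-know-what-i-am-doing` flag, so treat this as something to read through rather than to paste into a shell.

```python
def hdparm_rewrite_commands(device, lbas):
    """Build (but do not run) the hdparm argv lists for each LBA, in order."""
    guard = "--yes-i-know-what-i-am-doing"  # hdparm requires this for both operations
    commands = []
    for lba in lbas:
        if lba < 0:
            raise ValueError("LBA must be non-negative: %r" % (lba,))
        # Step 1: deliberately mark the sector bad.
        commands.append(["hdparm", guard, "--make-bad-sector", str(lba), device])
        # Step 2: overwrite the same sector with zeros.
        commands.append(["hdparm", guard, "--write-sector", str(lba), device])
    return commands


# Placeholder device and LBA, for illustration only.
for argv in hdparm_rewrite_commands("/dev/sdX", [123456]):
    print(" ".join(argv))
```

Whether the firmware then decrements the stuck counts is exactly the open question, so checking attributes 197 and 198 again afterwards is the real test.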

Cheers,

Tim.

Christian Franke
David Mathog wrote:
>
> Summary so far:  some old Seagate ST340016A disks were found to have
> nonzero 'Offline uncorrectable' and 'Current_Pending_Sector' counts
> which could not be reset to zero by writing to every block on the
> disk.
>

Just for info: the current CVS version of smartd provides a workaround
for this issue:

If '-C 197+ -U 198+' is specified in smartd.conf, a warning is issued
only if the 'Current_Pending_Sector' or 'Offline uncorrectable' raw
value increases.  If the new persistence feature (smartd's '-s'
command-line option) is used, this also works across boot cycles.

I will also add '-v' options that allow the drive database to enable
this.
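
The idea of warning only on an increase, with state kept across restarts, can be sketched in a few lines of Python. This is not smartd's implementation: the JSON state file, the `check_counts` name, and the choice to treat the first reading as a silent baseline are all assumptions made for this sketch.

```python
import json
import os
import tempfile


def check_counts(current, state_path):
    """Return the attribute IDs whose raw value rose since the saved state.

    `current` maps attribute ID to raw value.  The new values are saved to
    `state_path` afterwards so the comparison survives a restart.
    """
    previous = {}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            # JSON object keys are strings, so convert them back to ints.
            previous = {int(k): v for k, v in json.load(fh).items()}
    # Assumption of this sketch: an attribute with no saved value is only a
    # baseline and never triggers a warning by itself.
    increased = sorted(
        attr_id
        for attr_id, value in current.items()
        if attr_id in previous and value > previous[attr_id]
    )
    with open(state_path, "w") as fh:
        json.dump({str(k): v for k, v in current.items()}, fh)
    return increased


with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, "counts.json")
    print(check_counts({197: 3, 198: 3}, state))  # first run, baseline only: []
    print(check_counts({197: 3, 198: 3}, state))  # stuck but unchanged: []
    print(check_counts({197: 4, 198: 3}, state))  # 197 rose: [197]
```

With this logic a disk whose counts are stuck at a nonzero value stays quiet, while any new pending or uncorrectable sector still raises a warning.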

Cheers,
    Christian



