Drive errors in boot logs, but NOT in smartctl, and still runs ok after completing boot. What's wrong with this drive?


albion
I've got a secondary (not 'boot' or 'root') drive attached in my system.

It's "complaining" in boot logs, but seems to NOT have an issue reported by 'smartctl', and works OK after booting.

I'd appreciate any ideas as to what the problem IS, or whether there really is one -- and what to do to fix it.  Entirely possible that I just don't understand what I'm seeing here :-/

On boot I see these messages, but the system ends up booted and functional -- including all the data on this drive:

        dmesg | egrep -i "ata2|ata-2"
                [    2.790815] ata2: SATA max UDMA/133 abar m1024@0xf9fff800 port 0xf9fff980 irq 22
                [    3.273185] ata2: softreset failed (device not ready)
                [    3.276322] ata2: applying PMP SRST workaround and retrying
                [    3.456395] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
                [    3.468932] ata2.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133
                [    3.471995] ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
                [    3.488495] ata2.00: configured for UDMA/133
                [   35.922130] ata2.00: exception Emask 0x0 SAct 0x20000000 SErr 0x80000 action 0x6 frozen
                [   35.926153] ata2: SError: { 10B8B }
                [   35.930101] ata2.00: failed command: READ FPDMA QUEUED
                [   35.934077] ata2.00: cmd 60/08:e8:00:00:00/00:00:00:00:00/40 tag 29 ncq dma 4096 in
                [   35.942198] ata2.00: status: { DRDY }
                [   35.946311] ata2: hard resetting link
                [   36.426065] ata2: softreset failed (device not ready)
                [   36.430247] ata2: applying PMP SRST workaround and retrying
                [   36.594063] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
                [   36.603874] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x100)
                [   36.608124] ata2.00: revalidation failed (errno=-5)
                [   41.797584] ata2: hard resetting link
                [   42.277519] ata2: softreset failed (device not ready)
                [   42.281749] ata2: applying PMP SRST workaround and retrying
                [   42.445513] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
                [   42.455368] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x100)
                [   42.459702] ata2.00: revalidation failed (errno=-5)
                [   42.464016] ata2: limiting SATA link speed to 1.5 Gbps
                [   47.685034] ata2: hard resetting link
                [   48.164970] ata2: softreset failed (device not ready)
                [   48.169275] ata2: applying PMP SRST workaround and retrying
                [   48.332964] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
                [   48.348678] ata2.00: configured for UDMA/133
                [   48.352981] ata2.00: device reported invalid CHS sector 0
                [   48.357355] ata2: EH complete

'ata2' corresponds to /dev/sdb on my system

        find /dev/disk | egrep -i "ata2|ata-2"
                /dev/disk/by-path/pci-0000:00:11.0-ata-2-part1
                /dev/disk/by-path/pci-0000:00:11.0-ata-2

        ls -al `find /dev/disk | egrep -i "ata2|ata-2"`
                lrwxrwxrwx 1 root root  9 Apr 29 10:50 /dev/disk/by-path/pci-0000:00:11.0-ata-2 -> ../../sdb
                lrwxrwxrwx 1 root root 10 Apr 29 10:50 /dev/disk/by-path/pci-0000:00:11.0-ata-2-part1 -> ../../sdb1

Checking the smartctl data for that drive:

        smartctl -H /dev/sdb
                smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.13-2.ge5d11ce-default] (SUSE RPM)
                Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

                === START OF READ SMART DATA SECTION ===
                SMART overall-health self-assessment test result: PASSED

        smartctl -x /dev/sdb
                smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.13-2.ge5d11ce-default] (SUSE RPM)
                Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

                === START OF INFORMATION SECTION ===
                Model Family:     SAMSUNG SpinPoint F3
                Device Model:     SAMSUNG HD103SJ
                Serial Number:    S246J9EZC03669
                LU WWN Device Id: 5 0024e9 2042536ec
                Firmware Version: 1AJ10001
                User Capacity:    1,000,204,886,016 bytes [1.00 TB]
                Sector Size:      512 bytes logical/physical
                Rotation Rate:    7200 rpm
                Form Factor:      3.5 inches
                Device is:        In smartctl database [for details use: -P show]
                ATA Version is:   ATA8-ACS T13/1699-D revision 6
                SATA Version is:  SATA 2.6, 3.0 Gb/s
                Local Time is:    Sat Apr 29 10:57:37 2017 PDT
                SMART support is: Available - device has SMART capability.
                SMART support is: Enabled
                AAM feature is:   Disabled
                APM feature is:   Disabled
                Rd look-ahead is: Enabled
                Write cache is:   Enabled
                ATA Security is:  Disabled, NOT FROZEN [SEC1]
                Wt Cache Reorder: Enabled

                === START OF READ SMART DATA SECTION ===
                SMART overall-health self-assessment test result: PASSED

                General SMART Values:
                Offline data collection status:  (0x82) Offline data collection activity
                                                        was completed without error.
                                                        Auto Offline Data Collection: Enabled.
                Self-test execution status:      (   0) The previous self-test routine completed
                                                        without error or no self-test has ever
                                                        been run.
                Total time to complete Offline
                data collection:                ( 9360) seconds.
                Offline data collection
                capabilities:                    (0x5b) SMART execute Offline immediate.
                                                        Auto Offline data collection on/off support.
                                                        Suspend Offline collection upon new
                                                        command.
                                                        Offline surface scan supported.
                                                        Self-test supported.
                                                        No Conveyance Self-test supported.
                                                        Selective Self-test supported.
                SMART capabilities:            (0x0003) Saves SMART data before entering
                                                        power-saving mode.
                                                        Supports SMART auto save timer.
                Error logging capability:        (0x01) Error logging supported.
                                                        General Purpose Logging supported.
                Short self-test routine
                recommended polling time:        (   2) minutes.
                Extended self-test routine
                recommended polling time:        ( 156) minutes.
                SCT capabilities:              (0x003f) SCT Status supported.
                                                        SCT Error Recovery Control supported.
                                                        SCT Feature Control supported.
                                                        SCT Data Table supported.

                SMART Attributes Data Structure revision number: 16
                Vendor Specific SMART Attributes with Thresholds:
                ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
                  1 Raw_Read_Error_Rate     POSR-K   100   100   051    -    0
                  2 Throughput_Performance  -OS--K   056   054   000    -    8531
                  3 Spin_Up_Time            PO---K   075   069   025    -    7623
                  4 Start_Stop_Count        -O--CK   098   098   000    -    2285
                  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
                  7 Seek_Error_Rate         -OSR-K   252   252   051    -    0
                  8 Seek_Time_Performance   --S--K   252   252   015    -    0
                  9 Power_On_Hours          -O--CK   100   100   000    -    27432
                 10 Spin_Retry_Count        -O--CK   252   252   051    -    0
                 11 Calibration_Retry_Count -O--CK   252   252   000    -    0
                 12 Power_Cycle_Count       -O--CK   098   098   000    -    2236
                191 G-Sense_Error_Rate      -O---K   100   100   000    -    24
                192 Power-Off_Retract_Count -O---K   252   252   000    -    0
                194 Temperature_Celsius     -O----   064   048   000    -    33 (Min/Max 15/52)
                195 Hardware_ECC_Recovered  -O-RCK   100   100   000    -    0
                196 Reallocated_Event_Count -O--CK   252   252   000    -    0
                197 Current_Pending_Sector  -O--CK   252   252   000    -    0
                198 Offline_Uncorrectable   ----CK   252   252   000    -    0
                199 UDMA_CRC_Error_Count    -OS-CK   098   098   000    -    1162
                200 Multi_Zone_Error_Rate   -O-R-K   100   100   000    -    0
                223 Load_Retry_Count        -O--CK   252   252   000    -    0
                225 Load_Cycle_Count        -O--CK   100   100   000    -    2301
                                            ||||||_ K auto-keep
                                            |||||__ C event count
                                            ||||___ R error rate
                                            |||____ S speed/performance
                                            ||_____ O updated online
                                            |______ P prefailure warning

                General Purpose Log Directory Version 1
                SMART           Log Directory Version 1 [multi-sector log support]
                Address    Access  R/W   Size  Description
                0x00       GPL,SL  R/O      1  Log Directory
                0x01           SL  R/O      1  Summary SMART error log
                0x02           SL  R/O      2  Comprehensive SMART error log
                0x03       GPL     R/O      2  Ext. Comprehensive SMART error log
                0x06           SL  R/O      1  SMART self-test log
                0x07       GPL     R/O      2  Extended self-test log
                0x08       GPL     R/O      2  Power Conditions log
                0x09           SL  R/W      1  Selective self-test log
                0x10       GPL     R/O      1  SATA NCQ Queued Error log
                0x11       GPL     R/O      1  SATA Phy Event Counters log
                0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
                0xe0       GPL,SL  R/W      1  SCT Command/Status
                0xe1       GPL,SL  R/W      1  SCT Data Transfer

                SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
                Device Error Count: 1162 (device log contains only the most recent 8 errors)
                        CR     = Command Register
                        FEATR  = Features Register
                        COUNT  = Count (was: Sector Count) Register
                        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
                        LH     = LBA High (was: Cylinder High) Register    ]   LBA
                        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
                        LL     = LBA Low (was: Sector Number) Register     ]
                        DV     = Device (was: Device/Head) Register
                        DC     = Device Control Register
                        ER     = Error register
                        ST     = Status register
                Powered_Up_Time is measured from power on, and printed as
                DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
                SS=sec, and sss=millisec. It "wraps" after 49.710 days.

                Error 1162 [1] occurred at disk power-on lifetime: 27432 hours (1143 days + 0 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 01 00 08 00 00 00 00 00 00 40 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  60 00 00 00 08 00 00 00 00 00 00 40 00     00:00:00.000  READ FPDMA QUEUED
                  60 00 10 00 08 00 00 00 00 00 00 40 08     00:00:00.044  READ FPDMA QUEUED
                  ef 00 10 00 02 00 00 00 00 00 00 a0 08     00:00:00.043  SET FEATURES [Enable SATA feature]
                  27 00 00 00 00 00 00 00 00 00 00 e0 08     00:00:00.043  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
                  ec 00 00 00 00 00 00 00 00 00 00 a0 08     00:00:00.043  IDENTIFY DEVICE

                Error 1161 [0] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 01 00 08 00 00 00 00 00 00 40 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  60 00 00 00 08 00 00 00 00 00 00 40 00     00:00:00.000  READ FPDMA QUEUED
                  60 00 10 00 08 00 00 00 00 00 00 40 08     00:00:00.046  READ FPDMA QUEUED
                  ef 00 10 00 02 00 00 00 00 00 00 a0 08     00:00:00.046  SET FEATURES [Enable SATA feature]
                  27 00 00 00 00 00 00 00 00 00 00 e0 08     00:00:00.046  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
                  ec 00 00 00 00 00 00 00 00 00 00 a0 08     00:00:00.046  IDENTIFY DEVICE

                Error 1160 [7] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 51 00 01 00 00 00 00 00 00 e0 00  Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  25 20 20 00 01 00 00 00 00 00 00 e0 00     00:00:00.036  READ DMA EXT
                  c6 00 20 00 10 00 00 00 00 00 00 ef 00     00:00:00.036  SET MULTIPLE MODE
                  91 00 20 00 3f 00 00 00 00 00 00 ef 00     00:00:00.036  INITIALIZE DEVICE PARAMETERS [OBS-6]
                  10 00 20 00 01 00 00 00 00 00 01 e0 00     00:00:00.036  RECALIBRATE [OBS-4]
                  00 00 00 00 01 00 00 00 00 00 01 40 00     00:00:00.036  NOP [Abort queued commands]

                Error 1159 [6] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 51 00 3f 00 00 00 00 00 00 e0 00  Error: ICRC, ABRT 63 sectors at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  25 20 20 00 3f 00 00 00 00 00 00 e0 00     00:00:00.036  READ DMA EXT
                  c6 00 20 00 10 00 00 00 00 00 00 ef 00     00:00:00.036  SET MULTIPLE MODE
                  91 00 20 00 3f 00 00 00 00 00 00 ef 00     00:00:00.036  INITIALIZE DEVICE PARAMETERS [OBS-6]
                  10 00 20 00 01 00 00 00 00 00 01 e0 00     00:00:00.036  RECALIBRATE [OBS-4]
                  00 00 00 00 01 00 00 00 00 00 01 40 00     00:00:00.036  NOP [Abort queued commands]

                Error 1158 [5] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 51 00 3f 00 00 00 00 00 00 e0 00  Error: ICRC, ABRT 63 sectors at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  25 20 20 00 3f 00 00 00 00 00 00 e0 00     00:00:00.036  READ DMA EXT
                  c6 00 20 00 10 00 00 00 00 00 00 ef 00     00:00:00.036  SET MULTIPLE MODE
                  91 00 20 00 3f 00 00 00 00 00 00 ef 00     00:00:00.036  INITIALIZE DEVICE PARAMETERS [OBS-6]
                  10 00 20 00 01 00 00 00 00 00 01 e0 00     00:00:00.036  RECALIBRATE [OBS-4]
                  00 00 00 00 01 00 00 00 00 00 01 40 00     00:00:00.036  NOP [Abort queued commands]

                Error 1157 [4] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 51 00 3f 00 00 00 00 00 00 e0 00  Error: ICRC, ABRT 63 sectors at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  25 20 20 00 3f 00 00 00 00 00 00 e0 00     00:00:00.036  READ DMA EXT
                  25 20 20 00 01 00 00 00 00 00 08 e0 00     00:00:00.036  READ DMA EXT
                  c6 00 20 00 10 00 00 00 00 00 00 ef 00     00:00:00.036  SET MULTIPLE MODE
                  91 00 20 00 3f 00 00 00 00 00 00 ef 00     00:00:00.036  INITIALIZE DEVICE PARAMETERS [OBS-6]
                  10 00 20 00 01 00 00 00 00 00 01 e0 00     00:00:00.036  RECALIBRATE [OBS-4]

                Error 1156 [3] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 51 00 09 00 00 00 00 00 00 e0 00  Error: ICRC, ABRT 9 sectors at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  25 20 20 00 01 00 00 00 00 00 08 e0 00     00:00:00.036  READ DMA EXT
                  c6 00 20 00 10 00 00 00 00 00 00 ef 00     00:00:00.036  SET MULTIPLE MODE
                  91 00 20 00 3f 00 00 00 00 00 00 ef 00     00:00:00.036  INITIALIZE DEVICE PARAMETERS [OBS-6]
                  10 00 20 00 01 00 00 00 00 00 01 e0 00     00:00:00.036  RECALIBRATE [OBS-4]
                  00 00 00 00 01 00 00 00 00 00 01 40 00     00:00:00.036  NOP [Abort queued commands]

                Error 1155 [2] occurred at disk power-on lifetime: 27431 hours (1142 days + 23 hours)
                  When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                  ER -- ST COUNT  LBA_48  LH LM LL DV DC
                  -- -- -- == -- == == == -- -- -- -- --
                  84 -- 51 00 3f 00 00 00 00 00 00 e0 00  Error: ICRC, ABRT 63 sectors at LBA = 0x00000000 = 0

                  Commands leading to the command that caused the error were:
                  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
                  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
                  25 20 20 00 3f 00 00 00 00 00 00 e0 00     00:00:00.036  READ DMA EXT
                  c6 00 20 00 10 00 00 00 00 00 00 ef 00     00:00:00.036  SET MULTIPLE MODE
                  91 00 20 00 3f 00 00 00 00 00 00 ef 00     00:00:00.036  INITIALIZE DEVICE PARAMETERS [OBS-6]
                  10 00 20 00 01 00 00 00 00 00 01 e0 00     00:00:00.036  RECALIBRATE [OBS-4]
                  00 00 00 00 01 00 00 00 00 00 01 40 00     00:00:00.036  NOP [Abort queued commands]

                SMART Extended Self-test Log Version: 1 (2 sectors)
                Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
                # 1  Short offline       Completed without error       00%      4157         -
                # 2  Short offline       Completed without error       00%      4114         -
                # 3  Extended offline    Completed without error       00%      4073         -
                # 4  Short offline       Completed without error       00%      4056         -
                # 5  Short offline       Completed without error       00%      4011         -
                # 6  Extended offline    Completed without error       00%      3968         -
                # 7  Short offline       Completed without error       00%      3949         -
                # 8  Short offline       Completed without error       00%      3905         -
                # 9  Extended offline    Completed without error       00%      3903         -
                #10  Short offline       Completed without error       00%      3846         -
                #11  Short offline       Completed without error       00%      3801         -
                #12  Extended offline    Completed without error       00%      3761         -
                #13  Short offline       Completed without error       00%      3745         -
                #14  Extended offline    Completed without error       00%      3664         -
                #15  Short offline       Completed without error       00%      3646         -
                #16  Short offline       Completed without error       00%      3606         -
                #17  Extended offline    Completed without error       00%      3568         -
                #18  Short offline       Completed without error       00%      3552         -
                #19  Short offline       Completed without error       00%      3507         -
                #20  Extended offline    Completed without error       00%      3468         -
                #21  Short offline       Completed without error       00%      3451         -

                SMART Selective self-test log data structure revision number 0
                Note: revision number not 1 implies that no selective self-test has ever been run
                 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
                    1        0        0  Completed [00% left] (0-65535)
                    2        0        0  Not_testing
                    3        0        0  Not_testing
                    4        0        0  Not_testing
                    5        0        0  Not_testing
                Selective self-test flags (0x0):
                  After scanning selected spans, do NOT read-scan remainder of disk.
                If Selective self-test is pending on power-up, resume after 0 minute delay.

                SCT Status Version:                  2
                SCT Version (vendor specific):       256 (0x0100)
                SCT Support Level:                   1
                Device State:                        Active (0)
                Current Temperature:                    33 Celsius
                Power Cycle Min/Max Temperature:     33/33 Celsius
                Lifetime    Min/Max Temperature:     18/65 Celsius
                Under/Over Temperature Limit Count:   0/0

                SCT Temperature History Version:     2
                Temperature Sampling Period:         5 minutes
                Temperature Logging Interval:        5 minutes
                Min/Max recommended Temperature:     -5/80 Celsius
                Min/Max Temperature Limit:           -10/85 Celsius
                Temperature History Size (Index):    128 (114)

                Index    Estimated Time   Temperature Celsius
                 115    2017-04-29 00:20    32  *************
                 ...    ..(  5 skipped).    ..  *************
                 121    2017-04-29 00:50    32  *************
                 122    2017-04-29 00:55    34  ***************
                 123    2017-04-29 01:00    36  *****************
                 124    2017-04-29 01:05    36  *****************
                 125    2017-04-29 01:10    37  ******************
                 ...    ..( 10 skipped).    ..  ******************
                   8    2017-04-29 02:05    37  ******************
                   9    2017-04-29 02:10    36  *****************
                  10    2017-04-29 02:15    35  ****************
                  11    2017-04-29 02:20    35  ****************
                  12    2017-04-29 02:25    34  ***************
                 ...    ..(  6 skipped).    ..  ***************
                  19    2017-04-29 03:00    34  ***************
                  20    2017-04-29 03:05    33  **************
                 ...    ..(  3 skipped).    ..  **************
                  24    2017-04-29 03:25    33  **************
                  25    2017-04-29 03:30    34  ***************
                 ...    ..( 26 skipped).    ..  ***************
                  52    2017-04-29 05:45    34  ***************
                  53    2017-04-29 05:50    35  ****************
                 ...    ..( 11 skipped).    ..  ****************
                  65    2017-04-29 06:50    35  ****************
                  66    2017-04-29 06:55    36  *****************
                  67    2017-04-29 07:00    35  ****************
                 ...    ..(  5 skipped).    ..  ****************
                  73    2017-04-29 07:30    35  ****************
                  74    2017-04-29 07:35    36  *****************
                  75    2017-04-29 07:40    35  ****************
                 ...    ..( 10 skipped).    ..  ****************
                  86    2017-04-29 08:35    35  ****************
                  87    2017-04-29 08:40    34  ***************
                  88    2017-04-29 08:45    33  **************
                  89    2017-04-29 08:50    33  **************
                  90    2017-04-29 08:55    21  **
                  91    2017-04-29 09:00    24  *****
                  92    2017-04-29 09:05    26  *******
                  93    2017-04-29 09:10    27  ********
                  94    2017-04-29 09:15    28  *********
                  95    2017-04-29 09:20    29  **********
                  96    2017-04-29 09:25    30  ***********
                  97    2017-04-29 09:30    30  ***********
                  98    2017-04-29 09:35    31  ************
                  99    2017-04-29 09:40    31  ************
                 100    2017-04-29 09:45    32  *************
                 ...    ..(  2 skipped).    ..  *************
                 103    2017-04-29 10:00    32  *************
                 104    2017-04-29 10:05    33  **************
                 ...    ..(  9 skipped).    ..  **************
                 114    2017-04-29 10:55    33  **************

                SCT Error Recovery Control:
                           Read: Disabled
                          Write: Disabled

                Device Statistics (GP/SMART Log 0x04) not supported

                SATA Phy Event Counters (GP Log 0x11)
                ID      Size     Value  Description
                0x0001  4            1  Command failed due to ICRC error
                0x0002  4            5  R_ERR response for data FIS
                0x0003  4            5  R_ERR response for device-to-host data FIS
                0x0004  4            0  R_ERR response for host-to-device data FIS
                0x0005  4            1  R_ERR response for non-data FIS
                0x0006  4            1  R_ERR response for device-to-host non-data FIS
                0x0007  4            0  R_ERR response for host-to-device non-data FIS
                0x0008  4            1  Device-to-host non-data FIS retries
                0x0009  4            8  Transition from drive PhyRdy to drive PhyNRdy
                0x000a  4            6  Device-to-host register FISes sent due to a COMRESET
                0x000b  4            0  CRC errors within host-to-device FIS
                0x000d  4            0  Non-CRC errors within host-to-device FIS
                0x000f  4            0  R_ERR response for host-to-device data FIS, CRC
                0x0010  4            0  R_ERR response for host-to-device data FIS, non-CRC
                0x0012  4            0  R_ERR response for host-to-device non-data FIS, CRC
                0x0013  4            0  R_ERR response for host-to-device non-data FIS, non-CRC
                0x8e00  4            1  Vendor specific
                0x8e01  4            6  Vendor specific
                0x8e02  4            0  Vendor specific
                0x8e03  4            0  Vendor specific
                0x8e04  4            0  Vendor specific
                0x8e05  4            0  Vendor specific
                0x8e06  4            1  Vendor specific
                0x8e07  4            2  Vendor specific
                0x8e08  4            6  Vendor specific
                0x8e09  4            0  Vendor specific
                0x8e0a  4           26  Vendor specific
                0x8e0b  4         1792  Vendor specific
                0x8e0c  4          110  Vendor specific
                0x8e0d  4            0  Vendor specific
                0x8e0e  4           26  Vendor specific
                0x8e0f  4            0  Vendor specific
                0x8e10  4          104  Vendor specific
                0x8e11  4            6  Vendor specific


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Smartmontools-support mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/smartmontools-support

Re: Drive errors in boot logs, but NOT in smartctl, and still runs ok after completing boot. What's wrong with this drive?

Robert Spotswood
Number 199 is indicating a problem. *USUALLY* it's cable related. Try
replacing the sata cable and see if the raw value quits climbing. If you
don't feel like playing with it, you can also change the sata port it's
plugged into on the motherboard (or add-in card).
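
To check whether the raw value "quits climbing", one option is to save the smartctl output before and after the cable swap and compare attribute 199. What follows is only a rough sketch: the helper names `raw_value` and `is_climbing` are made up for illustration, and it assumes the 8-column attribute layout printed by `smartctl -x` as shown in this thread (other output layouts put RAW_VALUE in a different column).

```python
def raw_value(smartctl_output, attr_id=199):
    """Return RAW_VALUE for a SMART attribute from 'smartctl -x' text, or None.

    Assumes the column layout seen in this thread:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        # Tolerate mail-style "> " quoting in front of the row.
        fields = line.lstrip("> ").split()
        if len(fields) >= 8 and fields[0] == str(attr_id):
            return int(fields[7])
    return None


def is_climbing(before_text, after_text, attr_id=199):
    """True if the attribute's raw value grew between two saved snapshots."""
    before = raw_value(before_text, attr_id)
    after = raw_value(after_text, attr_id)
    if before is None or after is None:
        raise ValueError("attribute %d not found in one of the snapshots" % attr_id)
    return after > before


if __name__ == "__main__":
    row = "199 UDMA_CRC_Error_Count    -OS-CK   098   098   000    -    1162"
    print(raw_value(row))
```

If the count stays put after some uptime and a few reboots on the new cable, the old cable (or its connector) was the likely culprit. The raw value itself normally does not reset, so only the change matters.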


On 4/29/2017 1:09 PM, [hidden email] wrote:

> [...]
> 199 UDMA_CRC_Error_Count    -OS-CK   098   098   000    -    1162
> [...]



Re: Drive errors in boot logs, but NOT in smartctl, and still runs ok after completing boot. What's wrong with this drive?

albion


On Sat, Apr 29, 2017, at 11:25 AM, Robert S wrote:
> Number 199 is indicating a problem. *USUALLY* it's cable related. Try
> replacing the sata cable and see if the raw value quits climbing. If you
> don't feel like playing with it, you can also change the sata port it's
> plugged into on the motherboard (or add-in card).

Wow. Nice call, I would never have found that!  I never thought that a cable problem would still allow the drive to boot and work afterwards ...

Swapped the cable, and:

dmesg | egrep -i "ata2|ata-2"
        [    2.771065] ata2: SATA max UDMA/133 abar m1024@0xf9fff800 port 0xf9fffa80 irq 22
        [    3.259806] ata2: softreset failed (device not ready)
        [    3.262990] ata2: applying PMP SRST workaround and retrying
        [    3.439709] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
        [    3.455067] ata2.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133
        [    3.458119] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
        [    3.467512] ata2.00: configured for UDMA/133
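
For comparing boot logs before and after a change like this, a small filter can help. This is just a sketch: the function name `link_trouble` is invented here, and the marker strings are simply copied from the failing boot log earlier in this thread, not an exhaustive list of libata error messages. "softreset failed (device not ready)" is deliberately left out because it shows up in the clean boot too.

```python
import re

# Substrings taken from the failing boot log quoted in this thread.
TROUBLE_MARKERS = (
    "exception Emask",
    "failed command",
    "hard resetting link",
    "failed to IDENTIFY",
    "revalidation failed",
    "limiting SATA link speed",
)


def link_trouble(dmesg_text, port="ata2"):
    """Return the kernel log lines for `port` that contain one of the markers."""
    # Matches "] ata2: " and "] ata2.00: " but not "] ata20: ".
    port_prefix = re.compile(r"\]\s+" + re.escape(port) + r"(\.\d+)?: ")
    hits = []
    for line in dmesg_text.splitlines():
        if port_prefix.search(line) and any(m in line for m in TROUBLE_MARKERS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    import sys
    # Example: dmesg | python3 this_script.py
    for hit in link_trouble(sys.stdin.read()):
        print(hit)
```

An empty result for the port after the cable swap matches what the log above shows.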

Thanks a LOT!

p.s. that "applying PMP SRST workaround and retrying" appears to be the kernel successfully working around a problem because of "CONFIG_SATA_PMP=y" in the kernel config.  Not sure that I can do anything about that without compiling the kernel with a different option.  Or whether I need to.


Re: Drive errors in boot logs, but NOT in smartctl, and still runs ok after completing boot. What's wrong with this drive?

Robert Spotswood
The problem is that the data is getting corrupted in transit, some of the time.
Kind of like a phone call with static on the line. Usually you can make out
what they are saying, but not always. The CRC is catching those errors and the
data is being re-transmitted. Hence if the problem is intermittent, the drive
will still work, although slower.
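
A toy model of that "catch it and resend" behaviour, for anyone curious. This is not how the SATA link layer or libata actually implements it; `send_frame` and `transfer` are made-up names, `zlib.crc32` stands in for the link CRC, and the noise is a simulated single bit flip.

```python
import random
import zlib


def send_frame(payload, flip_probability, rng):
    """Return (data, crc) as the receiver sees them.

    The CRC is computed by the sender; with the given probability one bit
    of the (non-empty) payload is flipped "in the cable".
    """
    crc = zlib.crc32(payload)
    data = bytearray(payload)
    if rng.random() < flip_probability:
        bit = rng.randrange(len(data) * 8)
        data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data), crc


def transfer(payload, flip_probability=0.3, max_tries=10, seed=0):
    """Resend until the CRC matches; return (data, number_of_crc_errors)."""
    rng = random.Random(seed)
    crc_errors = 0
    for _ in range(max_tries):
        data, crc = send_frame(payload, flip_probability, rng)
        if zlib.crc32(data) == crc:
            return data, crc_errors
        # Mismatch: count the error and try again, which costs time.
        crc_errors += 1
    raise OSError("link too noisy: every attempt failed the CRC check")


if __name__ == "__main__":
    data, errors = transfer(b"sector data")
    print(data, errors)
```

The data that finally arrives is intact; the only visible effects are the retries (slower transfers) and the growing error counter, which is roughly what attribute 199 was recording here.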
