Trouble using smartctl with LSI megaraid controller

alohaaaron
Hello,  
I am able to run the commands below successfully on the command line but I am not able to get them to work in the /etc/smartd.conf file.  

CentOS 6.8
smartmontools.x86_64  1:5.43-1.el6

#from the shell
Success!  Results at end of email.
smartctl -s on -d sat+megaraid,30 /dev/sda
smartctl -a -d sat+megaraid,30  /dev/sda

A few questions....
1. Does anyone know the correct syntax to get it working in the smartd.conf file?
2. There are two partitions in this system /dev/sda and /dev/sdb.  How do I know which drive Id (DID) belongs to which partition and does it matter if I assign a physical drive to the incorrect partition?
3. Is there a way to check all the drives on a single partition at once instead of having to check them individually with the sat+megaraid,N syntax?  There are 20 physical drives on this system.
4.  Once I enable smartctl on the command line for a drive is it on permanently or does it need to be enabled through the smartd.conf file?

Thanks for the help!

#smartd.conf file
Failure!
/dev/sda -H -d sat+megaraid,30  -s S/../../2/03

#/var/log/messages after restart of smartd service
Feb 11 12:23:16 db011 smartd[3751]: Opened configuration file /etc/smartd.conf
Feb 11 12:23:16 db011 smartd[3751]: Configuration file /etc/smartd.conf parsed.
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30] [SAT], opened
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30] [SAT], WDC WD2500BHTZ-04JCPV0, S/N:WD-WX11E23LZ256, WWN:5-0014ee-6ae7ef044, FW:04.06A00, 250 GB
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30] [SAT], not found in smartd database.
Feb 11 12:23:16 db011 smartd[3751]: Device: /dev/sda [megaraid_disk_30] [SAT], not capable of SMART Health Status check
Feb 11 12:23:16 db011 smartd[3751]: Unable to register ATA device /dev/sda [megaraid_disk_30] [SAT] at line 28 of file /etc/smartd.conf
Feb 11 12:23:16 db011 smartd[3751]: Device /dev/sda [megaraid_disk_30] [SAT] not available
Feb 11 12:23:16 db011 smartd[3751]: Monitoring 0 ATA and 0 SCSI devices
Feb 11 12:23:16 db011 smartd[3753]: smartd has fork()ed into background mode. New PID=3753.

Drive list here.
storcli64 /c0 /eall /sall show
-------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                  Sp
-------------------------------------------------------------------------------
29:0     30 Onln   0 232.375 GB SATA HDD N   N  512B WDC WD2500BHTZ-04JCPV0 U
29:1     31 Onln   0 232.375 GB SATA HDD N   N  512B WDC WD2500BHTZ-04JCPV0 U
29:2     35 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:3     33 GHS    - 372.093 GB SATA SSD N   N  512B SDLFODAM-400G-1HA1     U
29:4     39 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:5     34 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:6     37 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:7     32 Onln   1 372.093 GB SATA SSD N   N  512B SDLFODAM-400G-1HA1     U
29:8     42 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:9     38 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:10    36 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:11    41 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:12    45 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:13    43 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:14    40 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:15    44 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:16    47 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:17    48 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:18    49 GHS    - 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
29:19    46 Onln   1 372.093 GB SATA SSD N   N  512B SDLFGD7R-400G-1HA1     U
-------------------------------------------------------------------------------


#Results of smartctl -a -d sat+megaraid,30  /dev/sda 
=====================================================================
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2500BHTZ-04JCPV0
Serial Number:    WD-WX11E23LZ256
LU WWN Device Id: 5 0014ee 6ae7ef044
Firmware Version: 04.06A00
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Feb 11 12:32:31 2017 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 2400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  31) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x30bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   177   021    Pre-fail  Always       -       2108
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21799
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   119   107   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


#Results of smartctl -s on -d sat+megaraid,30  /dev/sda
======================================================================
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

_______________________________________________
Smartmontools-support mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/smartmontools-support

Re: Trouble using smartctl with LSI megaraid controller

mathog
On 11-Feb-2017 12:48, [hidden email] wrote:
> A few questions....
> 1. Does anyone know the correct syntax to get it working in the
> smartd.conf
> file?

Sorry, no.

> 2. There are two partitions in this system /dev/sda and /dev/sdb.  How
> do I
> know which drive Id (DID) belongs to which partition and does it matter
> if
> I assign a physical drive to the incorrect partition?

Get the "megacli" program and install it.  Then you can run things like
this:

#!/bin/sh
# Collect MegaRAID adapter, battery, virtual disk, physical drive and
# patrol read information into a single report file.
SNAME=$(hostname -s)
NOW=$(date)
OFILE="/root/$SNAME.megaraid.info"
SEP="=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-="

# tr -d '\r' strips any carriage returns from the megacli output.
echo "Megaraid information for $SNAME collected $NOW" > "$OFILE"
echo "$SEP" >> "$OFILE"
echo "General Information" >> "$OFILE"
megacli -AdpAllInfo      -aAll | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Battery backup" >> "$OFILE"
megacli -AdpBbuCmd       -aAll | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Virtual disks" >> "$OFILE"
megacli -LDInfo    -Lall -aALL | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Physical drives" >> "$OFILE"
megacli -PDList          -aALL | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "Patrol read" >> "$OFILE"
megacli -AdpPR     -Info -aALL | tr -d '\r' >> "$OFILE"
echo "$SEP" >> "$OFILE"
echo "DONE" >> "$OFILE"


> 3. Is there a way to check all the drives on a single partition at once
> instead of having to check them individually with the sat+megaraid,N
> syntax?  There are 20 physical drives on this system.

You mean with one "test all disks" smartctl command?  No, I don't think
that is possible.
You can certainly start all of the tests back to back with no delay, so
that they run concurrently, which is pretty much the same thing.  Then
wait however long is necessary for that disk type and read back all the
results.  In theory.  For some reason my test script runs them one at a
time - I may have been worried about what the RAID would do if all the
disks were busy self-testing at the same time.
Running them one at a time in a script (with or without delays) isn't a
big deal, something along the lines of:

/usr/sbin/smartctl -t long /dev/sda -d sat+megaraid,0
sleep 10300
logger "Done smartctl -t long  /dev/sda -d sat+megaraid,0"
/usr/sbin/smartctl -t long /dev/sda -d sat+megaraid,1
sleep 10300
logger "Done smartctl -t long  /dev/sda -d sat+megaraid,1"
etc.

then read all the results.  Use a loop if you don't want to write them
all out. It would also be fine to read the result for a disk immediately
after each long test completes.
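
The loop mentioned above could look roughly like the sketch below. It is
only an illustration: the function name run_long_tests and the SMARTCTL,
DEVICE and WAIT variables are made up for this example, and the
10300-second default is simply the sleep value from the script above, not
a figure taken from any particular drive.

```shell
#!/bin/sh
# Sketch: run a long self-test on each MegaRAID physical drive, one at a
# time, then print that drive's self-test log.
# SMARTCTL, DEVICE and WAIT are illustrative names, not smartmontools settings.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}
DEVICE=${DEVICE:-/dev/sda}
WAIT=${WAIT:-10300}   # seconds to wait per drive (value from the script above)

run_long_tests() {
    # Arguments: the device IDs to pass to -d sat+megaraid,N
    for did in "$@"; do
        "$SMARTCTL" -t long -d "sat+megaraid,$did" "$DEVICE"
        sleep "$WAIT"
        logger "Done smartctl -t long $DEVICE -d sat+megaraid,$did"
        # Read back the self-test log for this drive.
        "$SMARTCTL" -l selftest -d "sat+megaraid,$did" "$DEVICE"
    done
}

# Example call (device IDs are whatever the controller reports):
#   run_long_tests 0 1
```

The function takes the device IDs as arguments, so the same script can be
reused on controllers that number their drives differently.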

> 4.  Once I enable smartctl on the command line for a drive is it on
> permanently or does it need to be enabled through the smartd.conf file?

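Once enabled it should stay on.  That command changes a state
which is stored on the drive.  According to the smartctl man page, SMART
feature settings are in principle preserved over power cycling, but it
doesn't hurt to issue "-s on" again from a start-up script to be sure.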

Regards,

David Mathog
[hidden email]
Manager, Sequence Analysis Facility, Biology Division, Caltech


Re: Trouble using smartctl with LSI megaraid controller

Håkon Alstadheim


On 13 Feb 2017 18:59, mathog wrote:
> On 11-Feb-2017 12:48, [hidden email] wrote:
>> A few questions....
>> 1. Does anyone know the correct syntax to get it working in the
>> smartd.conf
>> file?
>

> Sorry, no.
>
>> 2. There are two partitions in this system /dev/sda and /dev/sdb.  How
>> do I
>> know which drive Id (DID) belongs to which partition and does it matter
>> if
>> I assign a physical drive to the incorrect partition?
>
> Get the "megacli" program and install it.  Then you can run things like
> this:
Second that, get megacli. Interface is atrocious, but there is help in
the program. Wrap the program in a script to reduce typing. The commands
are almost, but not completely, grouped in a logical way, with almost,
but not completely, consistent terminology and abbreviations.


I use:
-----/usr/local/bin/megacli:--------
#!/bin/sh
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/megacli/lib"
if test "$1" = "help" || test "$1" = "-help" ; then
    echo /opt/megacli/megacli "$@"
    exec /opt/megacli/megacli "$@"
else
    echo /opt/megacli/megacli "$@" a0 nolog
    exec /opt/megacli/megacli "$@" a0 nolog
fi
-----------------------

Before I got my head around the options, I hacked up a search facility
for the help texts (see megahelp attached). The most useful way to run
megahelp is:
$ megahelp <substring>
Where <substring> is something like "ld" or "pd". This will output a
(short) help-text on all commands containing that substring. If you run
it like:
$ megahelp -desc <substring>
it will return hits in the descriptions, rather than the commands
themselves. If you run it like:
$ megahelp -all
You get all possible sub-commands, with no help. Page through that and
look at specific commands with "megahelp <command>".


Everything after __DATA__ in the script attached is output from the
command "megacli help". You should replace it with output from YOUR
version. I also noticed that you get different output from "megacli
help" than you get by querying help for each individual command, so if
you search for a specific command, you get the output of "megacli help
<command>" rather than just an excerpt from the output of "megacli help" .

If you don't have a wrapper like this, there is a built-in pager in the
megacli program. Clues to how that works are in the header or footer of
the "megacli help" output. Never used it myself.

Once you know your way around you can find your way from logical disk
("ld") , to pci-id ("megacli adpgetpciinfo a0 nolog"), to
/dev/disk/by-path/pci-<pci-id> .

Perl-snippet from an awful spaghetti mess I use:
------snippet:----
# $megacli (set elsewhere in the script) holds the megacli command to run.
local $pci_id = undef;
{
    local ($bus_number,$device_number,$function_number);
    open(PCIID,"$megacli  adpgetpciinfo a0 nolog|")
        or die "Could not find pci-info";
    while($_=<PCIID>){
        # Pick the bus, device and function numbers out of the output.
        if(m(^Bus Number[ :]*([0-9]+))){ $bus_number = $1; }
        if(m(Device Number[ :]*([0-9]+))){ $device_number = $1; }
        if(m(Function Number[ :]*([0-9]+))){ $function_number = $1; }
    }
    close(PCIID);
    die "Could not find pci-id"
        unless defined($bus_number) && defined($device_number)
            && defined($function_number);
    # Format as bus:device.function, e.g. 03:00.0
    $pci_id = sprintf("%02d:%02d.%d",$bus_number,$device_number,$function_number);
}
-----snippet ends.----
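
For the last hop, from pci-id to /dev/disk/by-path, something like the
shell sketch below may help. It is an illustration only: the function name
devices_for_pci is made up, and it assumes the usual udev naming where
by-path links look like pci-<domain>:<pci-id>-<rest>. Links for partitions
(names ending in -partN) are skipped.

```shell
#!/bin/sh
# Sketch: list the block devices behind a given PCI id by resolving the
# symlinks in /dev/disk/by-path.
# devices_for_pci is an illustrative name; the optional second argument
# overrides the directory that is searched.
devices_for_pci() {
    pci_id=$1
    dir=${2:-/dev/disk/by-path}
    for link in "$dir"/pci-*"$pci_id"-*; do
        [ -e "$link" ] || continue          # no match, or dangling link
        case "$link" in
            *-part[0-9]*) continue ;;       # skip partition links
        esac
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Example call, using a pci-id in the format produced by the Perl snippet:
#   devices_for_pci 03:00.0
```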

megahelp (30K) Download Attachment

Re: Trouble using smartctl with LSI megaraid controller

alohaaaron
Thanks for the info, David. I'll run the tests separately on each drive then.  I thought it could be done on a partition.

I can execute most smartctl commands from the command line, but I would like the drives monitored by the smartd daemon instead of running a script. Going by the documentation at https://linux.die.net/man/5/smartd.conf, the lines below seem to work in smartd.conf once the -H is replaced with -a and smartd is restarted. The output from /var/log/messages is below.

#/etc/smartd.conf
/dev/sda -d sat+megaraid,30 -a -s S/../.././01
/dev/sda -d sat+megaraid,31 -a -s S/../.././02

#/var/log/messages
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_30] [SAT], not found in smartd database.
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_30] [SAT], not capable of SMART Health Status check
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_30] [SAT], is SMART capable. Adding to "monitor" list.
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_31] [SAT], opened
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_31] [SAT], WDC WD2500BHTZ-04JCPV0, S/N:WD-WX11E23LW514, WWN:5-0014ee-65929e323, FW:04.06A00, 250 GB
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_31] [SAT], not found in smartd database.
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_31] [SAT], not capable of SMART Health Status check
Feb 13 16:00:53 db011 smartd[32430]: Device: /dev/sda [megaraid_disk_31] [SAT], is SMART capable. Adding to "monitor" list.
Feb 13 16:00:53 db011 smartd[32430]: Monitoring 2 ATA and 0 SCSI devices
Feb 13 16:00:53 db011 smartd[32444]: smartd has fork()ed into background mode. New PID=32444.
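
With 20 drives, the remaining smartd.conf lines could be generated from the storcli drive list rather than typed by hand. The sketch below is only an illustration: the function name gen_smartd_lines is made up, it assumes the "EID:Slt DID ..." column layout shown in the storcli64 output earlier in this thread, and staggering the short tests by one hour per drive is an arbitrary choice. It also uses the same device node for every drive, on the assumption (worth verifying) that the node only serves to reach the controller.

```shell
#!/bin/sh
# Sketch: turn "storcli64 /c0 /eall /sall show" output into smartd.conf lines.
# gen_smartd_lines is an illustrative name; the device node is the first
# argument and the storcli output is read from standard input.
gen_smartd_lines() {
    awk -v dev="$1" '
        # Drive rows start with EID:Slt (for example 29:0); column 2 is the DID.
        $1 ~ /^[0-9]+:[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            # One short self-test per day, staggered by one hour per drive.
            printf "%s -d sat+megaraid,%s -a -s S/../.././%02d\n", dev, $2, n % 24
            n++
        }'
}

# Example call:
#   storcli64 /c0 /eall /sall show | gen_smartd_lines /dev/sda
```

The output is meant to be reviewed before it goes into /etc/smartd.conf; nothing here writes to that file.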