How to Find Drive Error in RAID Behind LSI RAID Card

Question: How can I see if the drives behind the hardware raid card using LSI has any reported error or not?

Solution

First, to find out if your drive raid arrays are optimal or not, you may run the following command:

[root@bd4 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :dr1
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 931.0 GB
Sector Size         : 512
Mirror Data         : 931.0 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No


Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 465.25 GB
Sector Size         : 512
Mirror Data         : 465.25 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00

This shall result in a key called ‘State’, which would say ‘Optimal’ if the raid is healthy. Although, it is possible that your drives have reported a few errors which might indicate a potential drive failure, which hasn’t been picked up by the RAID state yet. These errors are available under the following command:

/opt/MegaRAID/MegaCli/MegaCli64 pdlist a0

The above command lists the drive details. There are 3 error/failure counts, which are important to notice are ‘Media Error Count’, ‘Other Error Count’, and ‘Predictive Failure Count’. If you are seeing the number is changing quickly a few sets of times, then you should look at the drive status closely, as it seems to be producing a hardware failure soon. I have seen several times in my life, that the raid state saying it is ‘Optimal’, but the Media error was reported, soon after, we found the drive was actually failing.

To find out error counts in one go, you may use the following:

[root@bd4 ~]# /opt/MegaRAID/MegaCli/MegaCli64 pdlist a0 | grep -i "Predictive Failure Count" -B 6
Enclosure position: 1
Device Id: 2
WWN: 5000c5002834a246
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: 1
Device Id: 3
WWN: 5000c500461c9ec6
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: N/A
Device Id: 0
WWN: 4154412020202020
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: N/A
Device Id: 1
WWN: 4154412020202020
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0

Look at the count sections it has returned. Hope this helps.

How To Get Disk Serial Number in Megaraid

Question:

We can use smartctl to get the disk serial ID in case of disk replacement or crashes, with the following:

smartctl -a /dev/sdX

Where X is the device identifier like, for the first disk, this would be sda, second sdb etc. But in case the devices are behind the RAID, this command returns an error:

[root@tampa-lb ~]# smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sda failed: DELL or MegaRaid controller, please try adding '-d megaraid,N'

How to make this work?

Answer:

To get the serial numbers behind the LSI MegaRAID, you would first need to find out the device ID using LSI Megaraid tools. A quick way to install LSI Megaraid tool is available here:

How to: Install LSI Command Line Tool

One you have installed the LSI Megaraid command line tools, now you may use the following command to identify your device:

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | egrep 'Slot\ Number|Device\ Id|Inquiry\ Data|Raw|Firmware\ state' | sed 's/Slot/\nSlot/g'

This would output something like the following:

Slot Number: 1
Device Id: 11
Raw Size: 447.130 GB [0x37e436b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: 50026B72822A7D3A    KINGSTON SEDC500R480G                   SCEKJ2.3

In this server, it has one disk, but you may have multiple disk with different ‘Firemware state’ and ‘Device Id’. To use smartmontools, you need to pick the ‘Device Id’, mentioned here, which is 11. Now you can run the following command to get the device details using smartctl:

smartctl -d megaraid,N -a /dev/sdX

Here, N is the device ID, and X is the device name, you may get the device name using df -h command or fdisk -l. For our case, this command would be like the following:

smartctl -d megaraid,11 -a /dev/sda

This would print a lot of information about your device, but if you are looking to identify the Serial Number only, you may run the following:

~ smartctl -d megaraid,11 -a /dev/sda|grep Serial
Serial Number:    50026B72822A7D3A

One thing to note, we can also get Serial number from the MegaCli tools Inquiry data, you may have already noticed:

[root@tampa-lb ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | grep 'Inquiry Data'
Inquiry Data: 50026B72822A7D3A    KINGSTON SEDC500R480G                   SCEKJ2.3

Here, the first parameter in the return is the same as smartctl returns as Serial number, it’s because it’s the serial number that megacli gets/identifies as well.