How to Find Drive Error in RAID Behind LSI RAID Card

Question: How can I see if the drives behind the hardware raid card using LSI has any reported error or not?

Solution

First, to find out if your drive raid arrays are optimal or not, you may run the following command:

[root@bd4 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :dr1
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 931.0 GB
Sector Size         : 512
Mirror Data         : 931.0 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No


Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 465.25 GB
Sector Size         : 512
Mirror Data         : 465.25 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00

This shall result in a key called ‘State’, which would say ‘Optimal’ if the raid is healthy. Although, it is possible that your drives have reported a few errors which might indicate a potential drive failure, which hasn’t been picked up by the RAID state yet. These errors are available under the following command:

/opt/MegaRAID/MegaCli/MegaCli64 pdlist a0

The above command lists the drive details. There are 3 error/failure counts, which are important to notice are ‘Media Error Count’, ‘Other Error Count’, and ‘Predictive Failure Count’. If you are seeing the number is changing quickly a few sets of times, then you should look at the drive status closely, as it seems to be producing a hardware failure soon. I have seen several times in my life, that the raid state saying it is ‘Optimal’, but the Media error was reported, soon after, we found the drive was actually failing.

To find out error counts in one go, you may use the following:

[root@bd4 ~]# /opt/MegaRAID/MegaCli/MegaCli64 pdlist a0 | grep -i "Predictive Failure Count" -B 6
Enclosure position: 1
Device Id: 2
WWN: 5000c5002834a246
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: 1
Device Id: 3
WWN: 5000c500461c9ec6
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: N/A
Device Id: 0
WWN: 4154412020202020
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: N/A
Device Id: 1
WWN: 4154412020202020
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0

Look at the count sections it has returned. Hope this helps.