Question: How can I check whether the drives behind an LSI hardware RAID card have reported any errors?
Solution
First, to find out whether your RAID arrays are optimal, run the following command:
[root@bd4 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :dr1
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 931.0 GB
Sector Size : 512
Mirror Data : 931.0 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No

Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 465.25 GB
Sector Size : 512
Mirror Data : 465.25 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No

Exit Code: 0x00
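On a host with several virtual drives, the full listing is long when only the health state matters. A minimal sketch that reduces it to one line per virtual drive is below. It assumes the output layout shown above (a "Virtual Drive: N" line followed later by a "State : ..." line), and vd_states is a hypothetical helper name, not something MegaCli provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli): reads -LDInfo output on stdin
# and prints "VD <number>: <state>" for each virtual drive.
# Assumes the layout shown above; spacing around the colon may vary.
vd_states() {
    awk '
        /^Virtual Drive:/ { vd = $3 }            # third field is the drive number
        /^State[ \t]*:/ {
            sub(/^State[ \t]*:[ \t]*/, "")        # strip the "State :" label
            print "VD " vd ": " $0
        }
    '
}

# Usage on a host with MegaCli installed at the path used in this post:
# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll | vd_states
```

Reading from stdin keeps the helper independent of where MegaCli is installed, so the same function can also be run against a saved copy of the output.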
The -LDInfo output includes a ‘State’ field for each virtual drive, which reads ‘Optimal’ if the RAID is healthy. However, your drives may already have reported errors that point to a potential drive failure which the RAID state has not picked up yet. These errors are shown by the following command:
/opt/MegaRAID/MegaCli/MegaCli64 pdlist a0
The above command lists the details of each physical drive. Three counters are important to watch: ‘Media Error Count’, ‘Other Error Count’, and ‘Predictive Failure Count’. If any of these numbers increases quickly over a few consecutive runs, look at that drive closely, as it is likely heading toward a hardware failure. I have seen several cases where the RAID state said ‘Optimal’ while media errors were being reported, and soon afterwards we found the drive was actually failing.
To find out error counts in one go, you may use the following:
[root@bd4 ~]# /opt/MegaRAID/MegaCli/MegaCli64 pdlist a0 | grep -i "Predictive Failure Count" -B 6
Enclosure position: 1
Device Id: 2
WWN: 5000c5002834a246
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: 1
Device Id: 3
WWN: 5000c500461c9ec6
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: N/A
Device Id: 0
WWN: 4154412020202020
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
--
Enclosure position: N/A
Device Id: 1
WWN: 4154412020202020
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
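With many drives, scanning the grep output for a non-zero value by eye gets tedious. The sketch below prints only the counters that are not zero, tagged with the drive's Device Id. It assumes the "Device Id" line appears before the three counters, as in the output above, and pd_errors is a hypothetical helper name, not a MegaCli feature.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli): reads pdlist output on stdin and
# prints one line for every error counter that is greater than zero.
# Assumes "Device Id" precedes the counters for each drive, as shown above.
pd_errors() {
    awk -F': *' '
        /^Device Id:/ { dev = $2 }               # remember the current drive
        /^(Media Error|Other Error|Predictive Failure) Count:/ {
            if ($2 + 0 > 0) print "Device " dev ": " $1 " = " $2
        }
    '
}

# Usage on a host with MegaCli installed at the path used in this post:
# /opt/MegaRAID/MegaCli/MegaCli64 pdlist a0 | pd_errors
```

No output means every counter was zero at that moment. Saving the output of each run to a dated file is one simple way to spot counters that are climbing between runs.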
In the grep output, check the three count values listed for each drive. Hope this helps.