Server Boots to Grub – OVH Servers – How to Fix

Error Details

After running a yum update, you notice the kernel was updated and restart the server to boot into the new kernel. But the server never comes back online. When you open the KVM or Serial Console (SOL) of the system, you see it has booted to a ‘grub>’ prompt instead of booting from disk. How can you fix the system now?

Solution Intro

This issue can appear on any Linux server, for many different reasons. If you are running a server from OVH and have run into it, the approach I am going to show you should get you back on track. In many similar situations elsewhere, the same solution can also fix grub.

What and How the Problem Happened

OVH has an interesting boot strategy. Servers boot through network PXE, even when they are not using ‘netboot’ and simply boot from their local drives. For this to work, PXE needs to pick up the latest grub details, which are pushed once a kernel is updated. This is one reason why OVH also supplies a custom kernel from a custom repo. If you are using the stock kernel instead, you may end up in a situation where the latest grub details have not been pushed to PXE and your system fails to boot from the drives. It then drops you into the network ‘grub’ prompt.

How to Fix the Problem

One thing is now clear: after the kernel update, grub is broken because the booting system does not have up-to-date grub configuration and boot code for the new kernel. You can follow a regular Grub 2 repair method to fix this. Keep two things in mind. First, as your system's grub fails to load, you have to use an independent rescue kernel for the repair; this can come from your own network repository or from a rescue disk offered by your datacenter (OVH provides one). Second, if you are using CentOS 7 or Ubuntu on a UEFI system with mdadm (Linux software RAID), your /boot/efi is very likely on a non-RAID partition, typically the first partition of the first drive. You can always verify this in your fstab file.

So the first job is to boot your system into the rescue disk/CD/kernel. I assume you have done that without difficulty. Next, mount your partitions. On OVH, the rescue system assembles the mdadm arrays automatically; in my case, the root filesystem was on /dev/md2.

mount /dev/md2 /mnt
# check what partition is used for /boot/efi
nano /mnt/etc/fstab
# in my case it is /dev/nvme0n1p1 (an NVMe SSD whose first partition is used for EFI storage)
mount /dev/nvme0n1p1 /mnt/boot/efi
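If you prefer not to open an editor, the /boot/efi device can also be read from fstab non-interactively. The sketch below is a hypothetical helper (efi_device_from_fstab is my own name, not a system tool) and assumes the installed system is already mounted on /mnt as above. Many fstab files list the partition as UUID=... rather than a /dev path; in that case findfs can resolve it to a device.

```shell
# Hypothetical helper: print the device (first field) that fstab mounts on
# /boot/efi. Commented-out lines are ignored. The result may be a /dev path
# or a UUID=/LABEL= tag, depending on how the fstab was written.
efi_device_from_fstab() {
    local fstab="${1:-/mnt/etc/fstab}"
    awk '!/^[ \t]*#/ && $2 == "/boot/efi" { print $1; exit }' "$fstab"
}

# Example, from the rescue system:
#   dev=$(efi_device_from_fstab /mnt/etc/fstab)
#   case "$dev" in UUID=*|LABEL=*) dev=$(findfs "$dev") ;; esac
#   mount "$dev" /mnt/boot/efi
```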

Once the partitions are mounted, you can chroot into the system. Before chrooting, bind-mount the rescue system's /dev, /proc and /sys into /mnt so that tools inside the chroot can see the disks and kernel interfaces:

mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys

If all of these succeed, chroot into the system:

chroot /mnt

The root directory is now the original drive's root instead of the rescue system's. All you need to do now is regenerate the grub configuration (grub.cfg) and reinstall grub on the disks:

# we know grub.cfg is available in /boot/grub2/grub.cfg
grub2-mkconfig -o /boot/grub2/grub.cfg
# once this finishes, reinstall grub on both disks; in my case these are /dev/nvme0n1 and /dev/nvme1n1
grub2-install /dev/nvme0n1
grub2-install /dev/nvme1n1

If grub2-install reports ‘No error reported’, you are good to go. For safety, exit the chroot and unmount all the partitions before rebooting, so pending writes held in the OS page cache are flushed to disk and no data is lost. After rebooting from the hard disk, grub should load the latest kernel you installed:

# exit from chroot
exit
# unmount /mnt/dev, /mnt/proc, /mnt/sys, /mnt/boot/efi and /mnt (not the rescue system's own /dev, /proc, /sys)
umount /mnt/dev
umount /mnt/proc
umount /mnt/sys
umount /mnt/boot/efi
umount /mnt
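The whole sequence above can be collected into a small script. This is a sketch under assumptions: the helper names (run, mount_target, repair_grub, unmount_target) are hypothetical, the default devices are the ones from my server, and it runs the grub commands with chroot /mnt <command> instead of an interactive chroot shell. Setting DRY_RUN=1 only prints the commands so you can review them first.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mount the installed system on /mnt, then bind the rescue /dev, /proc, /sys.
# Defaults are the devices from this article; pass your own as arguments.
mount_target() {
    local root_dev="${1:-/dev/md2}" efi_dev="${2:-/dev/nvme0n1p1}" fs
    run mount "$root_dev" /mnt
    run mount "$efi_dev" /mnt/boot/efi
    for fs in dev proc sys; do
        run mount --bind "/$fs" "/mnt/$fs"
    done
}

# Regenerate grub.cfg and reinstall grub on every disk given as an argument,
# running each command inside the chroot non-interactively.
repair_grub() {
    local disk
    run chroot /mnt grub2-mkconfig -o /boot/grub2/grub.cfg
    for disk in "$@"; do
        run chroot /mnt grub2-install "$disk"
    done
}

# Unmount in reverse order: bind mounts and /boot/efi before the root.
unmount_target() {
    local fs
    for fs in sys proc dev; do
        run umount "/mnt/$fs"
    done
    run umount /mnt/boot/efi
    run umount /mnt
}
```

For example, DRY_RUN=1 mount_target /dev/md2 /dev/nvme0n1p1 previews the mounts; once they look right, run mount_target, repair_grub /dev/nvme0n1 /dev/nvme1n1 and unmount_target without DRY_RUN.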

Happy troubleshooting!

How To Get Disk Serial Number in Megaraid

Question:

When a disk crashes or needs to be replaced, we can use smartctl to get its serial number with the following:

smartctl -a /dev/sdX

Here X is the device identifier: sda for the first disk, sdb for the second, and so on. But when the devices are behind a RAID controller, this command returns an error:

[root@tampa-lb ~]# smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sda failed: DELL or MegaRaid controller, please try adding '-d megaraid,N'

How to make this work?

Answer:

To get the serial numbers of disks behind an LSI MegaRAID controller, you first need to find each disk's device ID using the LSI MegaRAID tools. A quick guide to installing the LSI MegaRAID command line tool is available here:

How to: Install LSI Command Line Tool

Once you have installed the LSI MegaRAID command line tools, use the following command to identify your devices:

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | egrep 'Slot\ Number|Device\ Id|Inquiry\ Data|Raw|Firmware\ state' | sed 's/Slot/\nSlot/g'

This would output something like the following:

Slot Number: 1
Device Id: 11
Raw Size: 447.130 GB [0x37e436b0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: 50026B72822A7D3A    KINGSTON SEDC500R480G                   SCEKJ2.3
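With several disks, reading this output by eye gets tedious. The awk sketch below (megacli_pd_summary is a hypothetical helper name) reads MegaCli output on stdin and prints one line per disk: slot number, device ID and the first token of the Inquiry Data. It assumes the field labels shown above; other MegaCli versions or drive vendors may label or order the fields differently.

```shell
# Hypothetical helper: summarise `MegaCli64 -PDList -aAll` output from stdin
# as "<slot> <device id> <first Inquiry Data token>", one line per disk.
megacli_pd_summary() {
    awk -F': *' '
        /^Slot Number:/  { slot = $2 + 0 }   # numeric conversion trims spaces
        /^Device Id:/    { id = $2 + 0 }
        /^Inquiry Data:/ { split($2, f, " "); print slot, id, f[1] }
    '
}
```

For example: /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | megacli_pd_summary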

This server has only one disk, but you may have multiple disks with different ‘Firmware state’ and ‘Device Id’ values. For smartmontools, you need the ‘Device Id’ shown here, which is 11. Now you can run the following command to get the device details with smartctl:

smartctl -d megaraid,N -a /dev/sdX

Here, N is the device ID and X is the device name; you can find the device name with df -h or fdisk -l. In our case, the command would be:

smartctl -d megaraid,11 -a /dev/sda

This would print a lot of information about your device, but if you are looking to identify the Serial Number only, you may run the following:

~ smartctl -d megaraid,11 -a /dev/sda|grep Serial
Serial Number:    50026B72822A7D3A

Note that the serial number also appears in the MegaCli Inquiry Data, as you may have already noticed:

[root@tampa-lb ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | grep 'Inquiry Data'
Inquiry Data: 50026B72822A7D3A    KINGSTON SEDC500R480G                   SCEKJ2.3

The first field of the Inquiry Data is the same value smartctl reports as the Serial Number, because it is the serial number that MegaCli identifies as well.

How to Recover Innodb Table when ib_logfile / ibdata is/are crashed/deleted/lost without backup

If you are here, you have probably panicked the same way I did around 12 years ago, when I lost ib_logfile0, ib_logfile1 and ibdata1 all at once on a server that relied heavily on InnoDB tables. Today I had to recover vital data in the same situation for someone without backups, so I thought it was worth documenting for the future.

One key reason to use InnoDB tables instead of MyISAM is write performance: thanks to its efficient buffering, InnoDB usually outperforms MyISAM on writes. The trade-off is that InnoDB keeps critical shared data in three specific files (ibdata1, ib_logfile0 and ib_logfile1). Losing them also loses important mappings the engine needs to recognize InnoDB table structures and data.

Who can follow this technique?

You can follow this technique if you have lost ib_logfile0, ib_logfile1, ibdata1 or all of them, but still have the database folder intact with its .frm and .ibd files (which you would, if you accidentally deleted only the log or data files), and ‘innodb_file_per_table’ was NOT disabled in your MySQL configuration. This option is enabled by default unless you explicitly disable it, for example to gain performance. A suggestion: only disable it if you keep real-time backups of your databases; otherwise, it is better to leave it enabled.
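Before starting, it is worth checking that every table you want to recover still has its .ibd file. The sketch below (missing_ibd is a hypothetical helper name) lists tables in a database folder whose .frm file has no matching .ibd. Note that MyISAM tables never have an .ibd file, so they will show up in this list too and are not covered by this technique.

```shell
# Hypothetical helper: list tables in a database folder whose .frm file has
# no matching .ibd file (those cannot be recovered with this technique).
missing_ibd() {
    local dbdir="$1" frm
    for frm in "$dbdir"/*.frm; do
        [ -e "$frm" ] || continue              # glob matched nothing
        [ -e "${frm%.frm}.ibd" ] || basename "$frm" .frm
    done
}
```

For example: missing_ibd /var/lib/mysql/your_database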

What is ‘innodb_file_per_table’?

InnoDB can keep table data in the shared system tablespace (ibdata1). As that makes ibdata and the log files a single point of failure, InnoDB by default stores each table in its own tablespace instead: the table's .ibd file. That means that if I lose the ibdata/log file mappings, I can still use the .ibd file to restore the tablespace and map the schema to the data, but only if InnoDB was allowed to store this information in each table's own .ibd file. You can read more about this parameter in the MySQL documentation:

File-Per-Table Tablespaces at dev.mysql.com

How to Recover an Innodb Table from database files only?

There are two steps to this process. First, recover the table schema from the .frm file. Second, import the tablespace from the .ibd file so the InnoDB engine registers it in its system tablespace.

First Step First: How to get the schema from .frm files?

First, install mysql-utilities to get the mysqlfrm tool. You can find installation instructions here:

Once this is done, you have two options for reading .frm files with mysqlfrm. My favorite is the ‘diagnostic’ mode. To use it, run the following:

mysqlfrm --diagnostic /var/lib/mysql/your_database/assets.frm

Here I assume your database is named ‘your_database’ and the table you are trying to recover is ‘assets’. The command returns the ‘CREATE TABLE’ statement you need. Create a new database (called ‘new_database’ below) and run this statement in the SQL console to create the table in it.
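If the database has many tables, you can run mysqlfrm over every .frm file in one go. This is a sketch assuming the paths from this article (dump_frm_schemas is a hypothetical helper name); it appends each table's output to a single SQL file. Review the file before running it, in particular whether the CREATE TABLE statements are qualified with the old database name.

```shell
# Hypothetical helper: run `mysqlfrm --diagnostic` on every .frm file in a
# database folder and collect the CREATE TABLE output in one SQL file.
dump_frm_schemas() {
    local dbdir="${1:-/var/lib/mysql/your_database}"
    local out="${2:-recovered_schema.sql}"
    local frm
    : > "$out"                                   # start with an empty file
    for frm in "$dbdir"/*.frm; do
        [ -e "$frm" ] || continue                # glob matched nothing
        printf -- '-- %s\n' "$(basename "$frm")" >> "$out"
        mysqlfrm --diagnostic "$frm" >> "$out" \
            || printf 'mysqlfrm failed: %s\n' "$frm" >&2
    done
}
```

For example: dump_frm_schemas /var/lib/mysql/your_database /root/recovered_schema.sql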

Second Step: Get your data and mapping back from .ibd to system tablespace

Once the table exists, MySQL also creates a .frm and an .ibd file for it. Now we need to make MySQL forget the .ibd file it just created, copy in the .ibd file from the crashed database, and then have the InnoDB engine recognize the tablespace in that file and register it in the system tablespace. That may sound complicated, but don't worry; let's do it step by step.

First, run the following so MySQL forgets the .ibd file it has just created:

alter table assets discard tablespace;

Remember that our table name is ‘assets’; if yours differs, replace it accordingly. This command removed the assets.ibd file MySQL had created in the /var/lib/mysql/new_database/ folder, as we asked it to forget the existing .ibd file. Now copy the backup (old) .ibd file to this location with the correct permissions. I use rsync here to make sure ownership and permissions remain intact:

rsync -vrplogDtH /var/lib/mysql/your_database/assets.ibd /var/lib/mysql/new_database/

Once this is done, the .ibd file holds a copy of our original tablespace, and we only need MySQL and InnoDB to recognize it. To achieve this, run the following in the SQL console:

alter table assets import tablespace;

If it throws a warning about not being able to find the .cfg file, you can ignore it. The .cfg file is only used to verify the schema during import, so it is not essential; InnoDB imports the tablespace without that check.

If everything goes well, you should see your rows again, because InnoDB has now registered the tablespace from the .ibd file in its system tablespace and recognizes the mapping to your data. Voilà! All you need to do now is repeat the process for each of your InnoDB tables to recover the whole database.
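Repeating the three steps for every table can be scripted. The sketch below (recover_tables and MYSQL_DATADIR are hypothetical names) assumes the empty tables already exist in the new database, that the mysql client can log in without a password prompt (for example via ~/.my.cnf), and that it runs as a user allowed to write to the MySQL data directory. It stops at the first failing step so you can inspect that table.

```shell
# Hypothetical helper: for each table, discard the new empty tablespace,
# copy the old .ibd file in, and import it. Stops at the first failure.
recover_tables() {
    local old_db="$1" new_db="$2" t
    local datadir="${MYSQL_DATADIR:-/var/lib/mysql}"
    shift 2
    for t in "$@"; do
        mysql "$new_db" -e "ALTER TABLE \`$t\` DISCARD TABLESPACE;" || return 1
        # same rsync flags as above; -p, -o, -g, -t keep perms, owner, group, times
        rsync -vrplogDtH "$datadir/$old_db/$t.ibd" "$datadir/$new_db/" || return 1
        mysql "$new_db" -e "ALTER TABLE \`$t\` IMPORT TABLESPACE;" || return 1
    done
}
```

For example: recover_tables your_database new_database assets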

How to Install Mysqlfrm / Mysql-utilities in CentOS 7

MySQL provides a set of utilities that can be used to recover data from MySQL data files. One of them is ‘mysqlfrm’. This tool is not part of the main MySQL packages; it comes with mysql-utilities.

This package can be installed from the ‘mysql-tools-community’ repo, which is available from the MySQL Yum Repository.

The command is:

yum install mysql-utilities

This also installs a Python package called ‘mysql-connector-python’ from the ‘mysql-connectors-community’ repo automatically. There is one catch: because of Python version dependencies, the mysql-connector-python that mysql-utilities pulls in automatically sometimes does not work with it. You will know this is the case if you see the following error when you run mysqlfrm on the command line:

# mysqlfrm
Traceback (most recent call last):
  File "/usr/bin/mysqlfrm", line 27, in <module>
    from mysql.utilities.common.tools import (check_python_version,
ImportError: No module named utilities.common.tools

In that case, install an older version of the MySQL connector for Python before installing mysql-utilities, using the following:

yum install mysql-connector-python.noarch

This would install an older version of mysql connector that works better with Python 2.7 or similar.

Once that is done, install mysql-utilities again:

yum install mysql-utilities

As the connector is already installed, yum won't try to install a different one as a dependency and will use the one you installed.

Now you can use mysqlfrm to read your .frm files and recover the table structures if required. Here is a great article from 2014 on mysqlfrm use cases that is still valid:

How to recover table structure from .frm files with MySQL Utilities