How to: Use WinMTR to Diagnose Network Issues

MTR is a great tool for understanding whether there is a routing issue. Customers often report that a website or web server is slow, or that they cannot reach the network at all. After the basic checks, if no conclusion is reached, it is important to get an MTR report from the client. As most users are on Windows, WinMTR is the common choice.

To run WinMTR, you first need to download it from here:

https://sourceforge.net/projects/winmtr/

or here

https://winmtr.en.uptodown.com/windows

Once the app is downloaded, double-clicking it will open it. WinMTR is a portable executable binary; it doesn’t require installation.

Once it is open, enter the domain name that is having trouble in the ‘Host’ field and press Start.

Starting WinMTR by entering your domain in the Host field

Once started, it will begin probing the domain you entered, hitting each node the packets pass through along the route and showing the percentage of drops at each node.

WinMTR running – (I have hidden two hops for privacy)

If you see drops above roughly 2-5% at a node, that node may be problematic. However, if a node drops a lot of packets but the nodes after it do not, that node is most likely configured to silently discard or deprioritize ICMP responses for security reasons, and it is not actually a problem. On the other hand, if your packets are not reaching the destination and are being dropped or looping at a particular node, the problem lies within that node. You can then locate the node and see where it belongs. If it is within your territory, the issue is with your ISP or IIG. If it is outside your territory but near the end of the path, the issue is with the host.

In most cases, we ask customers to run the MTR for 5 minutes, export it to text and send it over for us to analyse. You can export the report by stopping the MTR and clicking ‘Export TEXT’ in the WinMTR window.
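If the client is on Linux or macOS instead of Windows, a roughly equivalent report can be generated with the command-line mtr tool. A sketch, where the hostname is a placeholder and the cycle count can be adjusted:

mtr -rw -c 300 example.com > mtr-report.txt

Here -r produces a report instead of the interactive view, -w avoids truncating hostnames and -c sets how many cycles to run before the report is written.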

How to Make Cloudflare Work with HAProxy for TLS Termination

Remember:
This is part of the dirty hack series. It is not the only way to achieve what we want, and it should only be used when you can trust the connections between your HAProxy and the origin servers. Otherwise, you should not use this technique.

One common problem with using HAProxy behind Cloudflare is that the TLS connection Cloudflare hands us gets terminated at the HAProxy L7 load balancer. In that setup Cloudflare cannot verify the origin server and drops the connection, so your HAProxy setup will not work. What can you do in that case? There are two ways to handle it.

The first is that Cloudflare gives you an origin certificate, which you can install at HAProxy. I won’t dig deep into this in this blog post.

But if you can trust the connections between HAProxy and the backend origin servers, as well as the connections between Cloudflare and HAProxy, you can choose the second option. In this case, Cloudflare lets you encrypt only the connections between the visitors and Cloudflare; it won’t matter what you are doing behind Cloudflare. This option is called ‘Flexible’, and you can select it from your Cloudflare >> SSL/TLS tab.

Fix TLS Termination by HAProxy with Flexible Encryption Mode of Cloudflare

Once you set this to Flexible, it should start working almost immediately. Remember, this is not necessarily the best way to do it, only the quickest, and only if load balancing matters more to you than data integrity.
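If you prefer to switch the mode from the command line instead of the dashboard, the zone SSL setting can also be changed through the Cloudflare API. A sketch, where the zone ID and API token are placeholders you must substitute with your own values:

curl -X PATCH "https://api.cloudflare.com/client/v4/zones/ZONE_ID/settings/ssl" \
     -H "Authorization: Bearer YOUR_API_TOKEN" \
     -H "Content-Type: application/json" \
     --data '{"value":"flexible"}'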

How to install ifconfig in CentOS 7

CentOS 7 doesn’t ship with the ifconfig tool; it encourages users to use the ‘ip’ tool for network administration. However, it is still possible to use ifconfig on CentOS 7. ifconfig is part of the net-tools package, so all you have to do is install net-tools using yum.

How to install ifconfig in CentOS 7

Run the following command to install the net-tools package on CentOS 7; this will install ifconfig as well:

# yum install net-tools -y
# ifconfig
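If you would rather follow the distribution’s recommendation, the same information can be read with the ip tool that CentOS 7 ships by default, for example:

# ip addr show
# ip route show

ip addr roughly corresponds to ifconfig -a, while ip route replaces the old route command from net-tools.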

Troubleshooting: Imunify360 database is corrupt. Application cannot run with corrupt database

Error Message:

# service imunify360 start
Starting imunify360: WARNING [+ 3743ms] defence360agent.utils.check_db|DatabaseError detected: database disk image is malformed
WARNING [+ 3766ms] defence360agent.cli.subparsers.common.server|Imunify360 database is corrupt. Application cannot run with corrupt database. Please, contact Imunify360 support team at https://cloudlinux.zendesk.com

Detailed Information & Explanation:

If you are using Imunify360, an application firewall for Linux servers by the CloudLinux team, you might encounter an error saying the database is corrupt. You might first see an ‘Imunify360 is not started’ error in the WHM panel and end up with the error message shown above. Imunify360 uses an SQLite database located at ‘/var/imunify360/imunify360.db’. This database is checked every time Imunify360 tries to start, and if it is malformed, the service will not start. Fortunately, Imunify360 ships with tools to check this database and recover it if it is corrupted.

How to Fix:

First, we start by running a database integrity check. This can be done using the following:

imunify360-agent checkdb

(From Imunify360 Doc: checkdb  – Check database integrity)

Once done, you can use ‘migratedb’ to check and repair the database if it is corrupted.

imunify360-agent migratedb

(From Imunify360 Doc: migratedb – Check and repair database if it is corrupted.)

If migratedb fails, the only way to recover is to reinstall Imunify360.
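For reference, the full sequence we typically run on an affected server looks like the following. A sketch; stopping the service before the check is a precaution on my part rather than an official requirement:

service imunify360 stop
imunify360-agent checkdb
imunify360-agent migratedb
service imunify360 start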

Linux How To: Install IPTABLES in CentOS 7 / RHEL 7 Replacing FirewallD

CentOS 7 / RHEL 7 doesn’t come with the iptables service by default; it uses a fully functional firewall system called ‘firewalld’. I have been a big fan of iptables and its capabilities from the very beginning, and since switching to CentOS 7 I couldn’t stop using it. I had to stop firewalld and install iptables on all of my CentOS 7 installations and keep using the iptables rules I was using before. Here is a small how-to on installing iptables and disabling firewalld on CentOS 7, RHEL 7 or a similar variant distro.

How to Install IPTABLES in CentOS 7

To begin using iptables, you need to download and install the iptables-services package from the repo; it isn’t installed by default on CentOS 7. To do that, run the following command:

# yum install iptables-services -y

How to stop the firewalld service and start the Iptables service

Once the iptables-services package is installed, you can stop firewalld and start iptables. Running both kinds of packet filtering at once can create conflicts, so it is recommended to use only one of the two. To do that, run the following:

# systemctl stop firewalld
# systemctl start iptables

Now, to prevent firewalld from starting again after a reboot, disable it:

# systemctl disable firewalld

To prevent firewalld from being started manually as well, you can mask it:

# systemctl mask firewalld

Now enable iptables to start at boot time using the systemctl command:

# systemctl enable iptables

How to check the status of iptables in CentOS 7

In previous releases, the iptables status could be fetched using the service command, but that option is no longer available in CentOS 7. To list the currently loaded rules, use the following:

# iptables -S

The iptables save command can still be used through the service tool:

# service iptables save

This will save your iptables rules to /etc/sysconfig/iptables as it used to do in previous releases.
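As a quick sanity check of the whole setup, you could load a simple rule and persist it. A sketch; the rule below merely allows inbound SSH and is only an example, not a complete ruleset:

# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# service iptables save
# iptables -S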

Linux: Assertion failed on job for iptables.service.

If you are using CentOS 7 or RHEL 7 or any of their variants, you are probably using ‘firewalld’ by default. However, if you are an iptables fan like me, preferring its simplicity and flexibility over a full-blown firewall, you have probably disabled firewalld on your CentOS 7 instance and are using iptables instead. There are a couple of servers where I use runtime iptables rules for postrouting and masquerading. These rules are dynamically generated by my scripts rather than loaded from the sysconfig file under:

/etc/sysconfig/iptables

This file is generated upon running the iptables save command:

service iptables save

which I rarely do.
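For illustration, the kind of runtime rule those scripts add looks like the following. A sketch; the subnet and interface are placeholders, not my actual values:

iptables -t nat -A POSTROUTING -s 10.10.0.0/24 -o eth0 -j MASQUERADE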

Error Details

That is why I don’t have a /etc/sysconfig/iptables file on those servers, and a common error I see while restarting iptables on those systems is the following:

# systemctl restart iptables.service
Assertion failed on job for iptables.service.

How to Fix The Error

The error appears because there are no rules in /etc/sysconfig/iptables, or the file doesn’t exist at all. You can ignore the error, as iptables will still run. To get rid of it, simply make sure you have some iptables rules loaded on your system, using the listing command:

iptables -S

And then, run:

service iptables save

Once done, restarting iptables shouldn’t show the error any longer.

Linux: Disable On-Access Scanning on Sophos AV

We use Sophos AV instead of ClamAV on a couple of our Linux servers. Sophos comes with on-access scanning, which uses a kernel module to detect which files are being accessed, unlike ClamAV, which by default only ships with signatures and a basic scanning tool. This has its own benefits as well as drawbacks: you have to give Sophos a certain amount of resources. There are times when you may need to disable on-access scanning on Sophos AV to diagnose different issues with the server.

To disable on-access scanning on Sophos AV, run the following from your terminal/SSH console:

/opt/sophos-av/bin/savdctl disable

To re-enable on-access scanning on Sophos AV, run the following:

/opt/sophos-av/bin/savdctl enable
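To confirm whether on-access scanning is currently active, the savdstatus utility shipped in the same directory can be used, assuming your installation includes it:

/opt/sophos-av/bin/savdstatus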

The Sophos logs are located here:

/opt/sophos-av/log

Sophos comes with multiple control binaries. They can be found in the following directory:

/opt/sophos-av/bin

Documentation for savdctl is also available in its man page:

man savdctl

How to: Set up a server for R1Soft CDP backup?

We at Mellowhost have been using R1Soft CDP backup for the last 8 years. R1Soft has been a great backup tool, even though it is immensely resource-hungry. At different times we have gone through different situations while trying to run our backup servers efficiently. After all the hiccups with backup nodes, we ended up with 3 backup servers of 3 different configurations:

  1. backup1 = It contains a 12TB file system on a RAID 0 array. It copies data to a BTRFS compressed drive once a week to keep the data safe in case the RAID 0 dies. This server uses RAID 0 for faster disk safe verification and block scanning by R1Soft. It hosts servers that require frequent backups and can sustain the loss of a week’s data (less important data). As the server performs really fast thanks to RAID 0, we can run multiple R1Soft threads at a time, including disk safe verification and block scans.
  2. backup2 = It contains a 30TB file system on a hardware RAID 6 array. This is used for hosting our VPS backups. It is a seriously large server that keeps the backups of our enterprise VPS clients.
  3. backup3 = It contains a 16TB file system on a hardware RAID 10 array. This server is hosted in an East Coast US location. It is our off-network backup server and keeps backups for our East Coast servers too.

One of the key factors in designing a backup server is its size and location. Keep in mind that CDP 3 takes more space than CDP 2 for unknown reasons, while still being a differential backup solution rather than just an incremental one. The location of the server matters because of network speed. If your backup server is far from the source server’s network, the initial replication may take a long time; due to latency it may fail to perform anywhere near 1Gbps even if both networks support it. As an example, if you are backing up your data at 1MBps, it would take about 12.13 days to complete the backup of 1TB of data [ calculation: (((1024 x 1024) / 60) / 60) / 24 = 12.13 days ]. A 100Mbps port can give you up to roughly 10MBps, while a 1Gbps network can give you roughly 50MBps or more. So why does the speed matter? If your initial backup takes 13 days, that doesn’t mean every backup will; the second run only needs to upload the differences and takes far less time. That is true! But the problem comes when you need to do a bare-metal restore. If your server requires disaster recovery, you would then need 13 days to restore it to its original state, and your customers won’t wait 13 days. While creating backups, it is important to think about disaster recovery too: how fast you will be able to restore the backup is an important concern when designing your disaster recovery solution.
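The same estimate can be reproduced with a quick shell calculation. A sketch; adjust the size in MB and the speed in MBps for your own case:

echo "scale=2; (1024 * 1024) / 1 / 60 / 60 / 24" | bc
12.13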

I always recommend choosing a 1Gbps network with a latency below 2ms if you want a good disaster recovery solution. This can guarantee a faster bare-metal restore when needed.

The second key factor when building an R1Soft backup server is the choice of RAID. If you are thinking of putting R1Soft backups on a non-RAID setup, I think you should drop the idea. RAID isn’t only used to keep your data safe; it can also be used for performance. RAID 0, or striping in general, is a must for an R1Soft server. Otherwise, every so often you are going to see a lot of stalled processes doing ‘disk safe verification’, ‘block scan’ and so on, failing to keep the backups up to date, or processes being cancelled as duplicates because the old one is taking too long to complete. It is better not to choose RAID 5. I didn’t try RAID 5 in particular, but I have used RAID-Z on a ZFS file system, which was seriously slow for my workload. I later switched that server to RAID 0 with BTRFS compression for the weekly copy, which tremendously improved R1Soft performance. Later on we built more backup servers with hardware RAID write-back cache and a battery backup unit to gain more performance when creating and restoring backups. These servers have been performing tremendously well with R1Soft and can rightly be called good disaster recovery nodes.

Last, I recommend understanding that backup isn’t just about keeping a copy of your online data. It is important to design a disaster recovery solution instead of just creating backups. If you simply want copies of your data, you probably don’t need R1Soft or any high-end servers; a simple rsync would work fine. But to build a ‘disaster recovery’ solution, you need high-level planning, good hardware and good cost estimation. If you fall short on any of these, you will probably fail to create a disaster recovery solution that actually works.
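For the simple-copy case mentioned above, a plain rsync job is usually enough. A sketch; the source path and destination host are placeholders:

rsync -aH --delete /data/ backup.example.com:/backups/data/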

Identifying File / Inode by Sector / Block Number in Linux

I had an interesting problem earlier today. While running an R1Soft backup, dmesg was throwing I/O errors like the following:

Dec 28 09:28:43 ssd1 kernel: [36701.752626] end_request: I/O error, dev vda, sector 331781632
Dec 28 09:28:43 ssd1 kernel: [36701.755400] end_request: I/O error, dev vda, sector 331781760
Dec 28 09:28:43 ssd1 kernel: [36701.758286] end_request: I/O error, dev vda, sector 331781888
Dec 28 09:28:43 ssd1 kernel: [36701.760992] end_request: I/O error, dev vda, sector 331780864

They didn’t go away after multiple file system checks. That left me no choice but to find out what actually lives in those sectors. I could see the sector numbers increasing by 128 across 10 sequential log entries, which suggested a specific file or account could be causing the errors.

The EXT family of file systems comes with an interesting tool called debugfs. It can be used on a mounted file system and can help track down I/O-related issues. However, you first need to do some calculation to convert the sector number to a block number within a specific partition before you can use debugfs.

The lowest sector number in the log was ‘331780864’. First I tracked down the partition where this sector lies. This can be done using fdisk -lu /dev/disk (make sure to use -u so that fdisk reports sector numbers instead of cylinder numbers):

# fdisk -lu /dev/vda

Disk /dev/vda: 1342.2 GB, 1342177280000 bytes
16 heads, 63 sectors/track, 2600634 cylinders, total 2621440000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002f013

Device Boot Start End Blocks Id System
/dev/vda1 * 2048 411647 204800 83 Linux
/dev/vda2 411648 205211647 102400000 83 Linux
/dev/vda3 205211648 209307647 2048000 82 Linux swap / Solaris
/dev/vda4 209307648 2621439999 1206066176 83 Linux

Now, find the partition whose start sector is less than (and closest to) our sector number to determine which partition contains the sector. In our case, it is /dev/vda4. Next, we need the sector number relative to this partition, which we get by subtracting the partition’s start sector from our sector number. In our case:

331780864 – 209307648 = 122473216

That means our sector is the 122473216th sector of /dev/vda4.

Now find the block size by tune2fs:

# tune2fs -l /dev/vda4 | grep Block
Block count: 301516544
Block size: 4096
Blocks per group: 32768

In our case, it is 4096.

Now determine the sector size in bytes. This is shown in the fdisk output:

Sector size (logical/physical): 512 bytes / 512 bytes

From the two values (sector size 512 bytes, block size 4096 bytes), find the sector-to-block conversion factor: 512 / 4096 = 0.125

Now, calculate the block number of the 122473216th sector: 122473216 x 0.125 = 15309152
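The whole conversion can also be done in one line of shell arithmetic. A sketch, using the numbers from this example:

echo $(( (331780864 - 209307648) * 512 / 4096 ))
15309152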

We can now use debugfs to determine which file sits on that block number, as follows:

debugfs /dev/vda4

At the debugfs prompt, type:

debugfs: icheck 15309152

Block   Inode number
15309152   2611435

This will show the inode number of the file occupying that block. Use the inode number to run:

debugfs: ncheck 2611435

Inode   Pathname
2611435 /lost+found/#29938847

This will show you the file that is actually causing the issue. In my case, I found files that had been corrupted during some old fsck and stored in lost+found; they had missing magic numbers or were incomplete. Once I deleted all the files from lost+found, the issue was resolved. Voila!

Check File System for Errors with Status/Progress Bar

A file system check can be tedious sometimes. Users may want to see the progress of fsck, which is not shown by default. To do that, add -C (capital C) to the fsck command.

fsck -C /dev/sda1

The full form of the option is:

fsck -C0 /dev/sda1

It also works without the number if you put -C before the other arguments, like -f (force the file system check) and -y (answer yes to auto repair). A usable fsck command could be the following:

fsck -fy -C0 /dev/sda1

or

fsck -C -fy /dev/sda1

Please note that -c (small c) results in a read-only test. This test tries to read all the blocks on the disk and sees whether it can read them or not. It is done through a program called ‘badblocks’. If you are running a badblocks test on a large file system, be ready to spend a large amount of time on it.
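If you want to run that read-only surface test directly rather than through fsck, badblocks itself can be invoked. A sketch; -s shows progress and -v prints a summary of what it finds:

badblocks -sv /dev/sda1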