How To Compare Metrics within Identical Time of Today with Yesterday (or More) in Metabase / SQL / Metabase Trend Query

Scenario Explanation

Suppose you have a live ecommerce system and you want to see how today's revenue is performing compared to yesterday's. Comparing against yesterday's total revenue is not a useful metric until the day is over. In a real scenario, you would rather compare against the same time of day as right now. For example, if it is 7:19 PM now, you want to compare one of the following two pairs and show the trend:

i) Revenue between 00:00 and 7:00 PM yesterday against revenue between 00:00 and 7:00 PM today, or
ii) Revenue between 00:00 and 8:00 PM yesterday against revenue between 00:00 and 8:00 PM today (remember, it is 7:19 PM, so today's figure keeps growing as new orders come in)

Both ways of looking at the trend are useful and the choice is yours, but comparing all of yesterday against a partial today is not very useful.
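To make the two options concrete, here is a small Python sketch that computes both pairs of windows for a given moment. It is only an illustration of the idea and not part of the Metabase query; the function name comparison_windows is made up for this example.

```python
from datetime import datetime, timedelta

def comparison_windows(now):
    """Return the option (i) and option (ii) windows for a given 'now'."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    hour_start = now.replace(minute=0, second=0, microsecond=0)  # start of the current hour
    hour_end = hour_start + timedelta(hours=1)                   # end of the current hour
    one_day = timedelta(days=1)

    def pair(end):
        # Same window on both days: midnight up to 'end'
        return {
            "yesterday": (midnight - one_day, end - one_day),
            "today": (midnight, end),
        }

    return {"i": pair(hour_start), "ii": pair(hour_end)}

# 7:19 PM on 10 June 2020, as in the example above
windows = comparison_windows(datetime(2020, 6, 10, 19, 19))
for option, days in windows.items():
    for day, (start, end) in days.items():
        print(option, day, start, "->", end)
```

Option (i) stops at the last completed hour, while option (ii) includes the hour that is still in progress.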

How To Do This In Metabase?

Please note that I am doing this in Metabase. Metabase is an open source data analytics and visualization tool written in Clojure and running on the JVM. It is a simple, fast and efficient tool for most everyday data analytics use cases. I am using a ‘reformatted’ real-time MySQL database as the data source, so the query can also be run against a plain MySQL database to see the results.

Note: We can’t do this with Metabase's simple query builder, so we have to use Metabase's native query support. If your administrator has limited your Metabase access to simple and custom queries, you cannot do this until your administrator grants you native query access.

Building The Query

Let’s assume our orders live in the ‘orders’ table, the order creation date is in the ‘date_created’ column, and the order status and the total value of the order are stored in the ‘status’ and ‘total’ columns respectively.

First, we sum the total column with the MySQL aggregate function ‘sum’ and define an alias ‘date_created’ based on the date_created column of the orders table, as follows:

SELECT date(`orders`.`date_created`) AS `date_created`, sum(`orders`.`total`) AS `sum`
FROM `orders`

Now, if you want to use the ‘Trend’ visualization in Metabase, the query must ‘GROUP BY’ a time series, because the ‘Trend’ metric is calculated over it. Our group-by time series is going to be ‘date_created’. The row-selection conditions go under the ‘WHERE’ clause, which we leave blank for now. So the query now looks like this:

SELECT date(`orders`.`date_created`) AS `date_created`, sum(`orders`.`total`) AS `sum`
FROM `orders`
WHERE (
# Our conditions will follow here    
    )
    
GROUP BY date(`orders`.`date_created`)
ORDER BY date(`orders`.`date_created`) ASC

We are done with the basics. Now, let’s fill in the WHERE clause. First, you may want to keep only certain order statuses if you have many. Fill that in as follows:

SELECT date(`orders`.`date_created`) AS `date_created`, sum(`orders`.`total`) AS `sum`
FROM `orders`
WHERE (

        (`orders`.`status` = 'completed'
        OR `orders`.`status` = 'confirmed' OR `orders`.`status` = 'processing')

    )
    
GROUP BY date(`orders`.`date_created`)
ORDER BY date(`orders`.`date_created`) ASC

In my case, we keep an order if its status is completed, confirmed or processing, and discard other statuses such as on-hold, pending or abandoned cart.

Now, let’s think for a moment. We have to compare two time series, which means we first have to create two identical time windows. Let me give an example. If we want the revenue between 00:00 and 20:00 yesterday and between 00:00 and 20:00 today, we need one condition in the WHERE clause for each window, and the query should select the rows matching either condition, which means the two are joined with ‘OR’. The status filter applies in both cases, so the skeleton looks like the following:

SELECT date(`orders`.`date_created`) AS `date_created`, sum(`orders`.`total`) AS `sum`
FROM `orders`
WHERE (

        (`orders`.`status` = 'completed'
        OR `orders`.`status` = 'confirmed' OR `orders`.`status` = 'processing')

    OR

        (`orders`.`status` = 'completed'
        OR `orders`.`status` = 'confirmed' OR `orders`.`status` = 'processing')

    )
    
GROUP BY date(`orders`.`date_created`)
ORDER BY date(`orders`.`date_created`) ASC

Does the above make sense? If not, it soon will. Let’s move on to the next step: how can we align the time to the current hour in an SQL query? MySQL's curdate() returns the current date, which in a datetime context is today at 00:00. So if I know how many hours are left in today and subtract that from curdate(), I get the current hour on yesterday's date. Here is an example:

Say: curdate() = 10-6-2020 00:00
now() = 10-6-2020 20:45


This is the 20th hour, so subtracting 20 from 24 gives 4. Now we subtract 4 hours from curdate():

10-6-2020 00:00 - 4 Hour
09-6-2020 20:00

which is the same hour we are in today (10-6-2020 20:00), one day earlier. Simple, isn’t it?
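The arithmetic above can be checked outside the database with a short Python sketch. It only mirrors the calculation with Python's datetime module; the function name same_hour_yesterday is made up for this example.

```python
from datetime import datetime, timedelta

def same_hour_yesterday(now):
    """Midnight of today minus the hours left in the day."""
    curdate = now.replace(hour=0, minute=0, second=0, microsecond=0)  # plays the role of curdate()
    hours_left = 24 - now.hour                                        # 24 - 20 = 4 in the example
    return curdate - timedelta(hours=hours_left)

now = datetime(2020, 6, 10, 20, 45)
print(same_hour_yesterday(now))  # 2020-06-09 20:00:00
```

The result is the same as taking the start of the current hour and going back exactly one day, which is a handy way to double-check the formula.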

Now, how do we write this in an SQL query? Let’s see.

First, we get 09-6-2020 00:00. In MySQL this can be done with a negative interval of 1 day from the current date:

date_add(curdate(), INTERVAL -1 DAY)

Next, we get 09-6-2020 20:00 Hour using the following:

date_add(curdate(), INTERVAL -(24 - hour(now())) HOUR)

Notice how I calculate the interval: I apply the MySQL function 'hour()' to 'now()' to get the current hour, which is 20, and subtract it from 24 (the number of hours in a day), which gives 4 here. So the result is yesterday at 20:00.

Now, for today, we first need 10-6-2020 00:00 Hour, which is simply:

curdate()

then, the target time is basically the now() function:

now()

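We now have all 4 parts. Each value is wrapped in date_format(..., '%Y-%m-%d %H') and then str_to_date(...) so that it is truncated to the start of its hour and the comparison is made hour by hour. Here are the complete time-selection conditions: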

For Yesterday:

str_to_date(date_format(`orders`.`date_created`, '%Y-%m-%d %H'), '%Y-%m-%d %H') 
 BETWEEN str_to_date(date_format(date_add(curdate(), INTERVAL -1 DAY), '%Y-%m-%d %H'), '%Y-%m-%d %H') AND str_to_date(date_format(date_add(curdate(), INTERVAL -(24 - hour(now())) HOUR), '%Y-%m-%d %H'), '%Y-%m-%d %H')

For Today:

str_to_date(date_format(`orders`.`date_created`, '%Y-%m-%d %H'), '%Y-%m-%d %H') 
BETWEEN str_to_date(date_format(curdate(), '%Y-%m-%d %H'), '%Y-%m-%d %H') AND str_to_date(date_format(now(), '%Y-%m-%d %H'), '%Y-%m-%d %H')
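To see which orders these two conditions pick up, here is a Python sketch that mimics them. It assumes that the date_format/str_to_date pair truncates a timestamp to the start of its hour and that BETWEEN includes both ends; the function names are made up for this example.

```python
from datetime import datetime, timedelta

def truncate_to_hour(ts):
    # Mimics str_to_date(date_format(ts, '%Y-%m-%d %H'), '%Y-%m-%d %H')
    return ts.replace(minute=0, second=0, microsecond=0)

def matched_day(created, now):
    """Return 'yesterday', 'today' or None for an order timestamp."""
    curdate = now.replace(hour=0, minute=0, second=0, microsecond=0)
    hour = truncate_to_hour(created)
    y_start = curdate - timedelta(days=1)
    y_end = curdate - timedelta(hours=24 - now.hour)
    if y_start <= hour <= y_end:                  # BETWEEN includes both ends
        return "yesterday"
    if curdate <= hour <= truncate_to_hour(now):
        return "today"
    return None

now = datetime(2020, 6, 10, 20, 45)
print(matched_day(datetime(2020, 6, 9, 20, 59), now))   # yesterday
print(matched_day(datetime(2020, 6, 9, 21, 0), now))    # None
print(matched_day(datetime(2020, 6, 10, 20, 30), now))  # today
```

Because the order time is truncated before comparing, an order placed at 20:59 yesterday still counts. In other words, yesterday's window covers the whole of the current hour, which corresponds to option (ii) from the scenario.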

Adding the above to our status-selection conditions gives the complete query:

SELECT date(`orders`.`date_created`) AS `date_created`, sum(`orders`.`total`) AS `sum`
FROM `orders`
WHERE (

        (`orders`.`status` = 'completed'
        OR `orders`.`status` = 'confirmed' OR `orders`.`status` = 'processing')
        AND str_to_date(date_format(`orders`.`date_created`, '%Y-%m-%d %H'), '%Y-%m-%d %H') 
        BETWEEN str_to_date(date_format(date_add(curdate(), INTERVAL -1 DAY), '%Y-%m-%d %H'), '%Y-%m-%d %H') AND str_to_date(date_format(date_add(curdate(), INTERVAL -(24 - hour(now())) HOUR), '%Y-%m-%d %H'), '%Y-%m-%d %H')
    
    OR
    
        (`orders`.`status` = 'completed'
        OR `orders`.`status` = 'confirmed' OR `orders`.`status` = 'processing')
        AND str_to_date(date_format(`orders`.`date_created`, '%Y-%m-%d %H'), '%Y-%m-%d %H') 
        BETWEEN str_to_date(date_format(curdate(), '%Y-%m-%d %H'), '%Y-%m-%d %H') AND str_to_date(date_format(now(), '%Y-%m-%d %H'), '%Y-%m-%d %H')
    
    )
    
GROUP BY date(`orders`.`date_created`)
ORDER BY date(`orders`.`date_created`) ASC
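The query returns one row per date, so with this WHERE clause you get two rows: yesterday and today. The following Python sketch shows what the GROUP BY and sum do and how a trend percentage could be derived from the two rows. The sample orders are made up, and the percentage formula is a plain relative change that I am assuming here, not something taken from Metabase itself.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample orders that already passed the WHERE clause: (date_created, total)
rows = [
    (datetime(2020, 6, 9, 9, 15), 40.0),
    (datetime(2020, 6, 9, 18, 5), 60.0),
    (datetime(2020, 6, 10, 11, 30), 75.0),
    (datetime(2020, 6, 10, 20, 10), 50.0),
]

def revenue_by_date(rows):
    """Mimics GROUP BY date(date_created) with sum(total), ordered by date."""
    sums = defaultdict(float)
    for created, total in rows:
        sums[created.date()] += total
    return sorted(sums.items())

def percent_change(previous, current):
    """Relative change in percent, or None when there is nothing to compare against."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100

result = revenue_by_date(rows)
(_, yesterday_sum), (_, today_sum) = result
print(yesterday_sum, today_sum)                  # 100.0 125.0
print(percent_change(yesterday_sum, today_sum))  # 25.0
```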

Simple, huh? Indeed, it is. If you run this query in phpMyAdmin or your MySQL console, you will see two revenue amounts: yesterday's up to the current hour, and today's so far. In Metabase, you can turn this into a nice ‘Trend’ number: just go to Visualization >> Trend to display it.

It is cool to see how much better or worse you are doing at this hour compared to yesterday.
