Can You Use Elasticsearch as a Primary Database?

Well, this is an interesting topic. Earlier today, I answered this very question in an Elasticsearch community group on Facebook, and I thought I should document it here as well.

Primarily, if you are aware of how Elasticsearch stores data (as documents), you might think it is a full-fledged NoSQL database, but you need to know that it is not. If you have looked at several NoSQL databases, you might already be thinking that there is probably no standard definition of a NoSQL database. That is partially true, but there is still a definition of a database management system, and Elasticsearch does not fit it. Does that mean we can't use ES as the primary data store? No, it doesn't. You certainly can, but only for certain use cases. Let's look at a couple of points before we conclude.

Database Transactions

ES has no transactions. Lucene, the search library that ES is built on, supports transactions in its original design, but ES does not expose any. When there are no transactions, the first thing you need to remember is that there is no rollback facility. Operations are atomic only at the level of a single document, and there is no way to cancel, abort, or revert a group of them. This also means you have no locking mechanism in the system. If you are doing parallel writes, you need to be very careful, because ES gives you no guarantees for concurrent writes to the same document.
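
That said, ES does offer optimistic concurrency control at the single-document level, which is the closest thing to a write guard it has. Below is a minimal sketch using the official elasticsearch-php client; the 'orders' index and 'status' field are made-up examples. The write is rejected with a 409 conflict if another writer modified the document in between:

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build();

// Read the document along with its sequence number and primary term.
$doc = $client->get([
    'index' => 'orders',
    'id'    => '1',
]);

try {
    // The write succeeds only if nobody else has written in between.
    $client->index([
        'index'           => 'orders',
        'id'              => '1',
        'if_seq_no'       => $doc['_seq_no'],
        'if_primary_term' => $doc['_primary_term'],
        'body'            => ['status' => 'paid'],
    ]);
} catch (\Elasticsearch\Common\Exceptions\Conflict409Exception $e) {
    // Another writer got there first: re-read and retry, or give up.
}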

ES Caching Strategy

Elasticsearch is a 'near real-time' search engine, not a 'real-time' one. By design, newly indexed documents only become searchable after a refresh, which by default happens every 1 second. So, what's the problem with that? The problem is that ES can trick you if you write a document and then immediately search for it within that 1-second window: the read may not see the write. Because ES is rarely written to in most setups (since most people do not use it as primary storage), most people think it is real time. It is not. Under write-heavy workloads, writes won't be immediately available for reads.
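
If you do need read-your-own-write behavior, the index API accepts a refresh parameter. Here is a minimal sketch, again with the elasticsearch-php client and a made-up 'polls' index; 'wait_for' blocks the call until the next refresh makes the document searchable:

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build();

// Do not return until a refresh has made this document visible to search.
$client->index([
    'index'   => 'polls',
    'id'      => '42',
    'refresh' => 'wait_for', // or true to force an immediate refresh
    'body'    => ['votes' => 1],
]);

// A search issued after this call will see the document.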

User Management

ES user management is nowhere near that of a full-fledged DBMS, simply because ES doesn't need it to be. Elastic has never announced ES as a NoSQL DBMS, so they are under no obligation to implement that functionality in full. They only implement the facilities that the ES ecosystem requires.

So … ?

So, what does all this mean for you? If you have an extremely read-heavy application, where updates are rare or occasional, then you can definitely go with ES as your primary document storage engine. But if you have a write-oriented application, require complex user management, or have parallel threads that need write access, then you are better off choosing a standard DBMS for your business data and using ES as a secondary data store for your analytics and searches.

How to Use Sticky Sessions for CSRF Submission on a Highly Scalable Cloud App in HAProxy

HINT: If you are an nginx fan and have used it at scale, you must have done this using ip_hash (see the Nginx documentation). It serves the same purpose in HAProxy. The differences and benefits of using HAProxy over Nginx as an L7 proxy in a highly scalable and reliable cloud app would be a discussion for another day.

Case Discussion:

Suppose you have a cloud app that is load balanced and scaled across multiple servers using HAProxy, for example:

101.101.101.101
202.202.202.202
303.303.303.303

Now, if your app has a submission form, for example a poll submission from your users, then there is an issue with this HAProxy setup.

Let's say a user A requests the app and gets served from 101.101.101.101. The CSRF token his browser receives for the poll submission belongs to the session on 101.101.101.101. But when he presses the submit button, HAProxy routes him to 202.202.202.202, and the app hosted there instantly rejects the token because the session is not registered with that server. For such cases, we need to maintain a 'sticky' session based on the cookie set by the right server. That means if the cookie is set by 101.101.101.101, HAProxy should obey it and keep sending the user to 101.101.101.101 until the cookie or the session is reset or regenerated.

How To Do That:

What we need to do is let HAProxy write the server id into a cookie, and make the 'server' directive follow that cookie. Please remember, there are a couple of other ways to achieve this. One is called 'IP affinity', where you make the session sticky based on the user's IP (sketched below). Another is based on the PHP session value; setting the sticky session based on the PHP session should also work. I preferred the cookie-based sticky session simply as a matter of choice.
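
For reference, a minimal sketch of the IP affinity variant could look like the following; it pins each source IP to a server through a stick table (the table size and expiry here are arbitrary values):

backend app-main
balance roundrobin
# remember which server each source IP was sent to
stick-table type ip size 200k expire 30m
stick on src
server nginx1 101.101.101.101
server nginx2 202.202.202.202
server nginx3 303.303.303.303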

So, to write the server id into the cookie, you need to add the following to the HAProxy 'backend' section:

backend app-main
balance roundrobin
cookie SERVERID insert indirect nocache

In the cookie directive, SERVERID is simply the name of the cookie that HAProxy will insert; 'insert indirect nocache' tells HAProxy to add the cookie to responses, strip it from requests before they reach the backend, and mark it as not cacheable. Now, all you need to do is give each of your balanced servers its own cookie value, like the following:

backend app-main
balance roundrobin
cookie SERVERID insert indirect nocache
server nginx1 101.101.101.101 cookie S1
server nginx2 202.202.202.202 cookie S2
server nginx3 303.303.303.303 cookie S3

S1, S2, and S3 are just three different cookie values for the specific servers. After the above is done, you can restart HAProxy and see that it maintains stickiness based on the session you have.
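
For completeness, the backend above still needs a frontend to hand it traffic. A minimal sketch, where the bind port and the names are assumptions:

frontend app-front
# assumes 'mode http' is set in your defaults section
bind *:80
default_backend app-main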

Just to test it out, if you are using Laravel, try regenerating the session with the session() helper as follows:

session()->regenerate(); // invalidate the session id; the CSRF token changes with it
$token = csrf_token();   // fetch the freshly generated token

You should be able to see the content loading from different web servers when the session regenerates, but it will persist on the same server as long as the regenerate method is not called.

TDD: Date Assertions – Laravel 7, PHP 7.* – Carbon 2 – Changes

If you follow a TDD approach to developing your software, are a Laravel user, and are working with a JSON API, you might have experienced some date assertion issues while asserting JSON with Laravel 7. Previously, when Laravel used Carbon 1 for date management, it would not hand the whole Carbon object to assertJson, which is why the following would work in a sample PHPUnit test:

$contact = factory(Contact::class)->create();
$response = $this->get('/api/contacts/' . $contact->id);

$response->assertJson([
'name' => $contact->name,
'email' => $contact->email,
'meeting_date' => $contact->meeting_date,
'company' => $contact->company,
]);

But now Carbon 2 returns the whole Carbon object for $contact->meeting_date here, so the assertion will fail, because the $response does not contain a Carbon object, just a JSON string.

If you go through the Carbon documentation here, you can see the following under the section ‘Migrate to Carbon 2’:

$date->jsonSerialize() and json_encode($date) no longer returns arrays but simple strings: "2017-06-27T13:14:15.000000Z". This allows to create dates from it easier in JavaScript. 
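
In other words, under Carbon 2 the serialized form is a plain ISO 8601 string, which you can verify quickly (this assumes the app timezone is UTC):

use Carbon\Carbon;

$date = Carbon::parse('2017-06-27 13:14:15');

echo json_encode($date);      // "2017-06-27T13:14:15.000000Z" (quoted JSON string)
echo $date->jsonSerialize();  // 2017-06-27T13:14:15.000000Z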

This is basically what you need to use if you are on Laravel 7 with Carbon 2. You need to serialize the date yourself, as Laravel is not doing it for you automatically here. Just change the 'meeting_date' line to the following:

'meeting_date' => $contact->meeting_date->jsonSerialize(),

and this should let your assertion pass.

Remember, you might not need to do this explicitly in your controller returns unless you are following a TDD-based development flow where getting the assertion to pass is important to continue the process. A Laravel resource would implicitly serialize your dates before returning them from the controller, for example:
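
Here is a minimal sketch of such a resource; ContactResource is a made-up name, and the fields simply mirror the test above. Returning it from the controller serializes the Carbon date to a string on the way out:

namespace App\Http\Resources;

use Illuminate\Http\Resources\Json\JsonResource;

class ContactResource extends JsonResource
{
    public function toArray($request)
    {
        return [
            'name'         => $this->name,
            'email'        => $this->email,
            // The Carbon instance is serialized to an ISO 8601 string here.
            'meeting_date' => $this->meeting_date,
            'company'      => $this->company,
        ];
    }
}

// In the controller: return new ContactResource($contact);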