Historical data: LevelDB versus MySQL

So, I’ve got me a Cubietruck with an 60GB SSD attached to it. On the SSD there’s a LevelDB database with historical data for 121 device values. The total number of keys stored in this database is about 1,050,000. It takes about 8 minutes to fill this database from scratch with a MS SQL Server table as ‘source’.

That means that I can put (write) more than 2,000 entries to the database per second. That will do. Just kidding – that’s more than I’ll ever need of course.. but what about the get (read) performance? Being able to read 2,000 values per second from a database is not that much, so I hope the read performance is even better. A small test showed me that LevelDB could produce about 4500 values per second. Would it be possible to improve this? A lot, please?

So I decided to do some testing, so that I won’t end up with a user interface with charts that take ages to load. Therefore I wrote a small script that would query the database to retrieve everything for all the device values in the database. The result set, retrieved with the createReadStream() function of LevelUP, can have a size in the range of 4 to ~50,000 values. By counting the number of returned values and the time that was needed to get those (and push them to an array for further processing), I could get an indication of how fast LevelDB really is, based on a real database (not just a bunch of “123456abcdef” keys…) and based on result set ranges from ‘very small’ to ‘very large’. As in real life ;-)

And why not do the same test with MySQL and see how that one performs as well? With a MySQL installation ‘out of the box’ (so no performance-enhancing tricks) I created a database with a single table in it. It contains the same data as the LevelDB version: device value ID, timestamp and value (and some more). And an index of course:

CREATE TABLE `data` (
`TIME` datetime NOT NULL,
`LGDEVICEID` varchar(25) NOT NULL,
`START` decimal(16,6) DEFAULT '0.000000',
`VALUE` decimal(16,6) DEFAULT '0.000000',
`COUNT` int(11) DEFAULT '1',
UNIQUE KEY `1` (`LGDEVICEID`,`TIME`),
KEY `time` (`TIME`,`LGDEVICEID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Filling this table with the same data as the LevelDB database took a bit longer:  56 minutes vs. 04:19 for LevelDB..  I hope this isn’t indicative for the read performance ;-)

Well, here are the numbers:

LevelDBvsMySQL

The numbers presented show the average number of rows per second each database can deliver in the specified size range of the result. An example for LevelDB: 100 createReadStream() calls resulted in a total of 62606 values in 13122 milliseconds, i.e. an average of 4771 values/second. MySQL: the same number of query() calls resulted in a total of 62606 values in 5996 milliseconds, i.e. an average of 10141 values/second.

The size ranges with which I tested may seem a bit strange, but that’s because I just didn’t have any historical data for a single device between 5k…10k – the gap between historical data that’s being stored temporarily and the historical data which is stored forever.

The numbers show a clear winner: MySQL. I guess that using a ‘traditional’ RDBMS for my historical data is not such a bad idea after all…

This is not what I expected actually. All those hyper-new database engines must have some benefits – cause what other reason would make them so popular lately? Well, I didn’t find them… at least not with my hardware, my historical data characteristics and my priorities … and since every second counts, I’m going for MySQL!

Bookmark the permalink.

6 Responses to Historical data: LevelDB versus MySQL

  1. Rene Klootwijk says:

    This is a not a fair comparison. Leveldb is a in-process db which in case of node.ja this means utilizing 1 core. Mysql runs as a separate multithreaded process utilizing all cores available. A better comparison would be Leveldb vs Sqlite or Mongodb vs Mysql. The reason these noSql database are that popular is because they do not require a predefined data model and their eventually consistent characteriatics which enablesthe creation of distributed databases without distance constraints.

  2. In my case it is. This is not a contest, it’s about choosing a tool. ;-)

    What’s wrong with comparing 2 different ways to store my historical data, look at the results and pick the one that fits with my priorities best? I should not do that at all? Right. Despite the differences between the 2, for me the goal is the same one: storing historical data. Nothing more, nothing less.

    Comparing tools is never unfair, especially from ‘the user perspective’. Just like it’s fair to compare a hammer with a chainsaw when you’re trying to find the best tool to destroy something. I (“the user”) don’t care how they work, I just need the best tool for the job.

    • Rene Klootwijk says:

      You are absolutely right in your situation, however I was referring to your rather blunt statement “All those hyper-new database engines must have some benefits – cause what other reason would make them so popular lately? Well, I didn’t find them… “.

  3. Vladimir Rodionov says:

    This is typical dilettantish approach. As one commenter already mentioned you compare RDBMS and single-threaded in – process DB. Make a quick google search “leveldb ssd performance” or read facebook report on RocksDB vs LevelDB benchmarking here:
    https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks

    you will discover many interesting facts, Robert.

    • I just knew I was gonna get some unusual responses to this post ;-)

      So you can’t cope with my choice for MySQL? Well, that’s your problem, not mine. Your comment hasn’t convinced me that I’m wrong – it didn’t even made me reconsider my choice.

      The reason for that is that you’ve totally missed the point of my post. Maybe, because I don’t think you did, you should take the time to really read it.

      I don’t care whether it’s in-process or not, cause I’ll use any DB I want to. Even dBase III, if that would perform much better than the rest… ;-) But for some reason I chose the solution that works (surprise, surprise) best in my situation. There is no such thing as OSFA in this, because the choices one can make are there for a reason. I used LevelDB as-is, just like MySQL. BTW, I don’t have 12 CPUs @ 2.67 GHz and I sure don’t have 144 GB of RAM. Talking about comparing? I don’t know why I should have a look at those numbers presented where you linked to, because they don’t tell me anything about my own situation.

      This is Home Automation related, I guess you missed that one too. Please, take the time to place things in perspective before you start judging and posting irrelevant benchmarks. All I need is the fastest DB engine for my setup. And if you’re trying to convince me that LevelDB should be the one I should choose based on the results, well, in that case you’re just ignoring the facts I presented. And I can’t think of any reason why I (or you) should ignore those facts. Cause you don’t question those facts, or do you?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>