SQL or NOSQL

I was working on a project for about 10 months using Oracle Coherence as the object repository. Our goal is storing all our domain objects into the data grid so we can share the objects to the rest of the enterprise in real time. We have used Oracle Coherence as read through cache and it works pretty well (I am sure all the other open sourced non-RDBMS solutions would work very good as well). However, we really did have an issue…

Oracle Coherence does not only works as a key-value paired distributed cache. It does also support distributed data processing (not exactly like map-reduce but it works actually like parallel processing). Also, it can read-through and write-behind Oracle database. It sounds like a perfect solution as our primary domain object storage.

We have about 240GB memory for storage (20 server x 3 JVM x 4GB heap) storage nodes. Due to the number of servers, we decide to have zero backup count. In this scenario, when a node fails, we will lots the data that stored in the node. We assume as long as we have the Oracle database as the permanent storage (thru write behind), we can always read through the missing data.

During our performance testing in the staging environment, Coherence does not performance if we try to random access (add, query, and delete at the same time). To be worse, for each new search index will consume 10-15% more storage. It is really expensive to use RAM for storage!!!

Write behind does not working at all due to our data throughput. Since we have 60 nodes, we have 60 x N JDBC connections. Oracle database basically cannot keep handle all the insert, update, merge, and delete at the same time. The database latency causes the storage nodes evicted from the data grid due to timeout.

After 10 months of development, we decided to get rid of Coherence and use Oracle database directly. Oracle 10g performs pretty good comparing to our legacy system. However, it still cannot provide the performance that we need. We are now looking into other open sourced NOSQL solution that could provide us memory/disk based storage plus Map-Reduced. The current candidates are Cassandra+Hadoop or MongoDB. Cassandra doesn’t support multiple keys indexing but version 0.6 works very well with Hadoop integration (although I have learned from forums that Cassandra has performance issue during high traffic of data loading). MongoDB is very fast and support multiple key indexing. However, the M-R solution comes from MongoDB only supports single thread.

SQl or NOSQL? Still have no answer after 12+ months of development.

Filed Under: Enterprise Computing

Tags:

About the Author: Andy H. Chan has years of enterprise software development and architecture experience. He is also the co-author of the book Pro Spring Integration. He can be reached Twitter @iceycake.

RSSComments (1)

Leave a Reply | Trackback URL

  1. Marco says:

    Here is a nice comparison overview for NOSQL providers:

    http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Leave a Reply