Introduction
For many years, relational databases have been the only option for data persistence. Our only choice has been which database to use. Should we use SQL Server? MySQL? Oracle? Even then, some choices come by default. For example, with .NET we almost always work with SQL Server, with Java we almost always use Oracle, with Ruby we use MySQL, with Python we use MySQL or PostgreSQL, and so on.
The reason is obvious: relational databases have been around for decades and have proved robust for most applications. We can rely on them to take care of concurrency, transactions and so on. But if relational databases are as reliable as I’m saying, why are they losing market share to NoSQL databases? Because relational databases have some problems that NoSQL databases are solving.
Problems with Relational Databases
Impedance Mismatch
We usually write software in Python, Ruby, Java or .NET. What do they have in common? They are object-oriented languages. But we persist the data in MySQL, PostgreSQL, Oracle or SQL Server. What do these have in common? They are relational databases. Can you spot the difference? Impedance mismatch is the name we give to this difference. Our in-memory structures are object-oriented and our databases are relational, so every time we save or retrieve data we need to convert between the two. ORM (Object-Relational Mapping) frameworks like Hibernate and Entity Framework make it easier to map objects to relational tables, but the mismatch is still an issue, especially when we need high-performance queries.
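To make that conversion concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The `Person` class, its phone numbers and the `person`/`phone` tables are hypothetical, made up for illustration. In memory a person is one object holding a list, but the relational side has to split it into two tables on save and reassemble it on every read, which is the work an ORM automates for us.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Person:
    # In memory: one object that owns a list of phone numbers (hypothetical model).
    name: str
    phones: list[str] = field(default_factory=list)


def save(conn: sqlite3.Connection, person: Person) -> int:
    """Split the object across two tables: one row in person, n rows in phone."""
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (person.name,))
    person_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO phone (person_id, number) VALUES (?, ?)",
        [(person_id, number) for number in person.phones],
    )
    return person_id


def load(conn: sqlite3.Connection, person_id: int) -> Person:
    """Reassemble the object from both tables."""
    (name,) = conn.execute(
        "SELECT name FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT number FROM phone WHERE person_id = ? ORDER BY id", (person_id,)
    ).fetchall()
    return Person(name, [number for (number,) in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE phone (id INTEGER PRIMARY KEY, "
    "person_id INTEGER NOT NULL REFERENCES person(id), number TEXT NOT NULL)"
)

pid = save(conn, Person("Alice", ["555-0100", "555-0101"]))
print(load(conn, pid))  # Person(name='Alice', phones=['555-0100', '555-0101'])
```

An ORM generates this kind of mapping code for us, but the two-table layout and the extra queries behind it are still there.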
Applications are getting bigger
Web applications are growing in scale. We have to store more data, serve more users and provide more computing capacity. To handle this, we have to scale, and we can do it in two ways. We can scale up, that is, buy bigger machines with more disk, more memory and so on. Or we can scale out, that is, buy many small machines and use them as a cluster. For big applications, scaling up is not an option. Bigger machines are more expensive and they have a limit: no single machine can handle the traffic of Google or Facebook. In this context we need new databases, since relational databases were not designed to run on clusters. Yes, there are clustered relational databases, but many of them work by sharing a disk, which is not the scenario we want when we build a cluster of small machines. Companies that needed to handle a lot of traffic, like Google, Facebook and Amazon, started to develop databases designed to run on clusters, and this was the beginning of the NoSQL era.
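A common way to scale out is to partition (shard) the data so that each key lives on one machine of the cluster. The sketch below assumes a hypothetical `ShardedStore` class in Python in which plain dictionaries stand in for separate machines; it hashes each key to decide which node stores it.

```python
import hashlib


class ShardedStore:
    """Toy scale-out store: each key lives on exactly one node of a cluster.

    The "nodes" are plain dicts standing in for separate machines
    (a hypothetical setup for illustration only).
    """

    def __init__(self, node_count: int) -> None:
        self.nodes: list[dict] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        # A stable hash (not Python's per-process randomized hash()) gives
        # the same placement on every run and every machine.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: object) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> object:
        return self._node_for(key).get(key)


store = ShardedStore(node_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

print(store.get("user:42"))  # {'id': 42}
print([len(node) for node in store.nodes])  # each node holds a share of the keys
```

Taking the hash modulo the node count keeps the example short, but it moves most keys whenever a node is added or removed. Cluster-oriented databases such as Cassandra and Amazon’s Dynamo use consistent hashing instead, which moves only a fraction of the keys.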
NoSQL Era
Nowadays there are many NoSQL databases, such as MongoDB, Redis, Riak, HBase and Cassandra, and each one has at least one of these characteristics:
- They don’t use SQL, although some of them, like MongoDB and Cassandra, have their own query languages
- They are usually open-source projects
- They were built to run on clusters
- They are schemaless: there is no rigid schema defining the data structure
Types of NoSQL Databases
NoSQL databases can be divided into four types: key-value, document-oriented, column-family and graph-oriented databases. Let’s see what each of these types is, what its characteristics are and where we should use it.
Key-Value Databases
What they are: A key-value store works like the simple hash table we are used to in traditional programming languages. You can add, retrieve and delete data through keys. Since all access goes through the primary key, they tend to perform well and scale easily.
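Here is a minimal sketch of that interface, assuming a hypothetical in-memory `KeyValueStore` class in Python (not the API of any product listed here). Values are opaque bytes: the store never looks inside them, it only maps keys to values.

```python
from typing import Optional


class KeyValueStore:
    """Minimal in-memory key-value store: every operation goes through a key."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Add or overwrite; the store never inspects the value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Average O(1) lookup by key; None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:abc123", b'{"user_id": 42, "cart": ["book"]}')
print(store.get("session:abc123"))
store.delete("session:abc123")
print(store.get("session:abc123"))  # None
```

Real stores expose the same three operations over the network; Redis, for example, calls them `SET`, `GET` and `DEL`.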
Examples: Riak, Redis, Memcached, Amazon’s Dynamo, Project Voldemort
Who’s using: GitHub (Riak), Best Buy (Riak), Twitter (Redis and Memcached), Stack Overflow (Redis), Instagram (Redis), YouTube (Memcached), Wikipedia (Memcached).
When we should use:
- To store user information such as sessions, profiles, preferences, shopping carts and so on. This information is usually looked up by an ID (the key), which is exactly the scenario key-value databases are best at.
When we shouldn’t use:
- If we need to query the data by value instead of by key. Key-value databases generally offer no way to search by value, because the value is opaque to the database.
- If we need to store relationships between data. A key-value database has no built-in way to relate the data stored under two or more keys.
- If we need transactions. In most key-value databases, we can’t roll back an operation that spans several keys if a failure occurs.
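To see why querying by value is a problem, consider this Python sketch, which assumes a hypothetical store of user profiles serialized as JSON. Because the database only understands keys, finding every user in a given city means the client has to read and decode every single value.

```python
import json

# Hypothetical key-value data: the store sees each value as opaque bytes.
profiles: dict[str, bytes] = {
    "user:1": b'{"name": "Ana", "city": "Lisbon"}',
    "user:2": b'{"name": "Ben", "city": "Berlin"}',
    "user:3": b'{"name": "Caio", "city": "Lisbon"}',
}


def find_by_city(store: dict[str, bytes], city: str) -> list[str]:
    # No query-by-value: scan every key and decode every value, O(n).
    matches = []
    for key, raw in store.items():
        if json.loads(raw)["city"] == city:
            matches.append(key)
    return matches


print(find_by_city(profiles, "Lisbon"))  # ['user:1', 'user:3']
```

Applications often work around this by maintaining extra keys as manual indexes (for example, a hypothetical `city:Lisbon` key listing user keys), but keeping those keys consistent then becomes the application’s job, which runs into the missing relationships and transactions listed above.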
Document-Oriented Databases
What they are: Document-oriented databases store data as documents. A document can be seen as a set of maps, collections and scalar values. Documents are similar to rows, but unlike rows, which must all share the same schema, documents can be completely different from one another. Documents can be stored as XML, JSON or BSON.
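Here is a small Python sketch of two documents that could sit in the same collection, assuming a hypothetical `customers` collection and using plain dictionaries serialized to JSON rather than any particular database driver. They share a couple of fields but otherwise have different shapes, and each nests maps and lists that a relational design would spread over several tables.

```python
import json

# Two documents in a hypothetical "customers" collection. They do not share
# a schema: each is its own mix of maps, lists and scalar values.
customers = [
    {
        "_id": 1,
        "name": "Ana",
        "addresses": [{"city": "Lisbon", "zip": "1000-001"}],
    },
    {
        "_id": 2,
        "name": "Ben",
        "company": {"name": "Acme", "vat": "DE123"},
        "tags": ["vip"],
    },
]

# Serialize as JSON, one of the formats document databases use.
stored = [json.dumps(doc) for doc in customers]

# Reading a document back restores the whole nested structure in one step.
loaded = [json.loads(s) for s in stored]
print(loaded[0]["addresses"][0]["city"])  # Lisbon
print(sorted(loaded[1].keys()))  # ['_id', 'company', 'name', 'tags']
```

The `_id` field follows MongoDB’s naming convention for a document’s primary key; here it is just an ordinary field.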
Examples: MongoDB, CouchDB, RavenDB
Who’s using: SAP (MongoDB), Codecademy (MongoDB), Foursquare (MongoDB), NBC News (RavenDB)
When we should use:
- Logging. In an enterprise environment, each application logs different information. Document-oriented databases don’t have a fixed schema, so we can use them to store all of these different entries together.
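As a sketch of the logging case, assume a hypothetical `logs` collection modeled as a Python list of dictionaries. Each application writes the fields it cares about, and a query filters on whichever fields it names; documents that lack a field simply don’t match. The `find` helper is made up for illustration and only loosely resembles the filter documents that MongoDB’s `find` accepts.

```python
# A hypothetical "logs" collection: each application writes its own fields.
logs = [
    {"app": "billing", "level": "ERROR", "invoice_id": 981, "msg": "charge failed"},
    {"app": "auth", "level": "INFO", "user": "ana", "ip": "10.0.0.7"},
    {"app": "billing", "level": "INFO", "invoice_id": 982, "msg": "charged"},
]


def find(collection: list[dict], **criteria: object) -> list[dict]:
    # Keep documents whose fields equal every criterion; a missing field
    # returns None from .get() and therefore does not match.
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(find(logs, app="billing", level="ERROR"))
```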
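When we shouldn’t use:
- If we need transactions across documents. Many document-oriented databases don’t support transactions that span multiple documents (MongoDB, for example, only added them in version 4.0), so if we rely on them, a document database may not be the right choice.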
Column-Family Databases
What they are: Column-family databases store data in column families. A column family is a group of related data that is often accessed together. For example, for a Person we often read the name and age together, but not the salary. In this case, name and age belong to one column family and salary belongs to another.
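One way to picture this layout is as a map of maps: row key, then column family, then column name to value. The Python sketch below assumes a hypothetical `ColumnFamilyTable` class with made-up families `profile` and `payroll`. Keeping each family in its own map mimics how these databases store a family’s data together, so reading a person’s name and age never touches the salary.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column-family layout: family -> row key -> {column: value}."""

    def __init__(self, families: list[str]) -> None:
        # One independent map per family, standing in for separate storage.
        self._families = {name: defaultdict(dict) for name in families}

    def put(self, row_key: str, family: str, column: str, value: object) -> None:
        self._families[family][row_key][column] = value

    def get(self, row_key: str, family: str) -> dict:
        # Reads only the requested family; .get() on a defaultdict does not
        # create an empty entry for a missing row.
        return dict(self._families[family].get(row_key, {}))


people = ColumnFamilyTable(["profile", "payroll"])
people.put("person:1", "profile", "name", "Ana")
people.put("person:1", "profile", "age", 34)
people.put("person:1", "payroll", "salary", 5000)
people.put("person:2", "profile", "name", "Ben")  # no age, no salary

print(people.get("person:1", "profile"))  # {'name': 'Ana', 'age': 34}
print(people.get("person:2", "payroll"))  # {}
```

Notice that rows don’t need the same columns: `person:2` has no salary at all. In HBase, for example, the column families are declared when the table is created, while the columns inside each family can vary from row to row.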
Examples: Cassandra, HBase
Who’s using: eBay (Cassandra), Instagram (Cassandra), NASA (Cassandra), Twitter (Cassandra and HBase), Facebook (HBase), Yahoo! (HBase)
When we should use:
- Logging. Since rows can have different columns, each application can write its own information in its own column families.
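When we shouldn’t use:
- If we need ACID transactions. Cassandra, for example, doesn’t support them; it only offers limited “lightweight transactions” (conditional updates within a single partition).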
Graph-Oriented Databases
What they are: Graph databases store data as graphs. Entities are represented as vertices, and the relationships between entities as edges. For example, we could have three entities, Steve Jobs, Apple and NeXT, and two edges called “Founded by” that relate Apple to Steve Jobs and NeXT to Steve Jobs.
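The Steve Jobs example can be sketched in Python as a minimal graph with labeled, directed edges. The `Graph` class is hypothetical; it indexes every edge from both ends so we can ask the question either way, from a company to its founder or from a founder to the companies.

```python
from collections import defaultdict


class Graph:
    """Minimal graph sketch: named vertices joined by labeled, directed edges."""

    def __init__(self) -> None:
        self.out_edges = defaultdict(list)  # vertex -> [(label, target), ...]
        self.in_edges = defaultdict(list)   # vertex -> [(label, source), ...]

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Index the edge from both ends so it can be followed either way.
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def targets(self, source: str, label: str) -> list[str]:
        return [dst for (lbl, dst) in self.out_edges[source] if lbl == label]

    def sources(self, target: str, label: str) -> list[str]:
        return [src for (lbl, src) in self.in_edges[target] if lbl == label]


g = Graph()
g.add_edge("Apple", "Founded by", "Steve Jobs")
g.add_edge("NeXT", "Founded by", "Steve Jobs")

print(g.targets("Apple", "Founded by"))       # ['Steve Jobs']
print(g.sources("Steve Jobs", "Founded by"))  # ['Apple', 'NeXT']
```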
Examples: Neo4j, InfiniteGraph, OrientDB
Who’s using: Adobe (Neo4j), Cisco (Neo4j), T-Mobile (Neo4j)
When we should use:
- Connected data. If our data is connected through relationships, we have a good case for a graph database: the vertices can be people, cities or companies, and the edges can be “lives in”, “employed by” and so on.
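Traversals are where a graph database shines. This Python sketch assumes hypothetical people, a city and a company connected by “lives in” and “employed by” edges, and uses a breadth-first search to find the shortest chain of relationships linking two vertices, walking edges in either direction. In a relational database, each extra hop would typically cost another join.

```python
from collections import defaultdict, deque

# Hypothetical connected data as (source, relationship, target) edges.
edges = [
    ("Ana", "lives in", "Lisbon"),
    ("Ben", "lives in", "Lisbon"),
    ("Ben", "employed by", "Acme"),
    ("Caio", "employed by", "Acme"),
]

# Adjacency list that lets us walk each relationship in either direction.
neighbors = defaultdict(list)
for src, rel, dst in edges:
    neighbors[src].append((rel, dst))
    neighbors[dst].append((rel, src))


def connection(start: str, goal: str):
    """Breadth-first search: the shortest chain of relationships, or None.

    Each step is reported as (from, relationship, to) in walking order,
    which may be the reverse of the stored edge direction.
    """
    queue = deque([start])
    came_from = {start: None}
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while came_from[vertex] is not None:
                prev, rel = came_from[vertex]
                path.append((prev, rel, vertex))
                vertex = prev
            return list(reversed(path))
        for rel, nxt in neighbors[vertex]:
            if nxt not in came_from:
                came_from[nxt] = (vertex, rel)
                queue.append(nxt)
    return None


print(connection("Ana", "Caio"))
```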
When we shouldn’t use:
- Data model not suitable. Many problems don’t map naturally to a graph, and operations that involve the whole graph, such as updating a property on every vertex, are not trivial in a graph database.