NoSQL 比较 - Cassandra vs MongoDB vs Redis vs ElasticSearch vs HBase

Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

-------

The most popular ones

Redis (V3.2)

  • Written in: C
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like, binary safe
  • Disk-backed in-memory database
  • Master-slave replication, automatic failover
  • Simple values or data structures by keys, but complex operations like ZREVRANGEBYSCORE
  • INCR & co (good for rate limiting or statistics)
  • Bit and bitfield operations (for example to implement bloom filters)
  • Has sets (also union/diff/inter)
  • Has lists (also a queue; blocking pop)
  • Has hashes (objects of multiple fields)
  • Sorted sets (high score table, good for range queries)
  • Lua scripting capabilities
  • Has transactions
  • Values can be set to expire (as in a cache)
  • Pub/Sub lets you implement messaging
  • GEO API to query by radius (!)

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: To store real-time stock prices. Real-time analytics. Leaderboards. Real-time communication. And wherever you used memcached before.
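The "INCR & co" bullet mentions rate limiting. A common Redis pattern for this is a fixed-window counter. You INCR a per-client key, and when the counter has just been created you EXPIRE it so that it resets after the window. The sketch below models that logic in plain Python, using an in-memory dict and an injectable clock so that it runs without a server. `FixedWindowLimiter` and the `rate:<client>` key format are hypothetical. Against a real Redis you would send INCR and EXPIRE yourself, ideally atomically in a MULTI transaction or a Lua script.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Hypothetical in-memory model of the INCR + EXPIRE pattern; it does not talk to Redis."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # key -> (counter, expiry timestamp), standing in for a Redis key with a TTL
        self._store: Dict[str, Tuple[int, float]] = {}

    def _incr(self, key: str) -> int:
        now = self.clock()
        count, expires_at = self._store.get(key, (0, 0.0))
        if now >= expires_at:
            # Missing or expired key: INCR would create it at 1, then EXPIRE sets the window
            count, expires_at = 0, now + self.window_s
        count += 1
        self._store[key] = (count, expires_at)
        return count

    def allow(self, client_id: str) -> bool:
        """True while the client is within `limit` requests in the current window."""
        return self._incr(f"rate:{client_id}") <= self.limit


limiter = FixedWindowLimiter(limit=100, window_s=60)
if limiter.allow("alice"):
    print("request accepted")
```

One design consequence of the fixed window: a client can send up to twice the limit in a short burst that straddles a window boundary. Sliding-window variants avoid this and are often built on sorted sets instead.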

Cassandra (2.0)

  • Written in: Java
  • Main point: Store huge datasets in "almost" SQL
  • License: Apache
  • Protocol: CQL3 & Thrift
  • CQL3 is very similar to SQL, but with some limitations that come from the scalability (most notably: no JOINs, no aggregate functions).
  • CQL3 is now the official interface. Don't look at Thrift unless you're working on a legacy app. This way, you can live without understanding ColumnFamilies, SuperColumns, etc.
  • Querying by key, or key range (secondary indices are also available)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Data can have expiration (set on INSERT).
  • Writes can be much faster than reads (when reads are disk-bound)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Very good and reliable cross-datacenter replication
  • Distributed counter datatype.
  • You can write triggers in Java.

Best used: When you need to store data so huge that it doesn't fit on one server, but still want a friendly, familiar interface to it.

For example: Web analytics, to count hits by hour, by browser, by IP, etc. Transaction logging. Data collection from huge sensor arrays.
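The (N, R, W) trade-off rests on a simple overlap rule. Suppose there are N replicas, a write is acknowledged by W of them, and a read contacts R of them. Whenever R + W > N, the replicas read must include at least one replica that acknowledged the write. The helper below checks that rule for a few settings. The function names are hypothetical. ONE, QUORUM and ALL are Cassandra's names for per-request consistency levels, and N corresponds to the replication factor.

```python
def quorum(n: int) -> int:
    """Number of replicas in a majority, as used by a QUORUM consistency level."""
    return n // 2 + 1


def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when every set of R replicas must intersect every set of W replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


N = 3  # replication factor
for label, r, w in [
    ("read ONE, write ONE", 1, 1),
    ("read QUORUM, write QUORUM", quorum(N), quorum(N)),
    ("read ONE, write ALL", 1, N),
]:
    print(f"N={N}, {label}: R={r}, W={w}, overlap={read_overlaps_write(N, r, w)}")
```

The overlap is paid for in latency and availability. With N = 3 and QUORUM on both sides, every read and every write needs two live replicas.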

MongoDB (3.2)

  • Written in: C++
  • Main point: JSON document store
  • License: AGPL (Drivers: Apache)
  • Protocol: Custom, binary (BSON)
  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are JavaScript expressions
  • Run arbitrary JavaScript functions server-side
  • Geospatial queries
  • Multiple storage engines with different performance characteristics
  • Performance over features
  • Document validation
  • Journaling
  • Powerful aggregation framework
  • On 32-bit systems, limited to ~2.5 GB
  • Text search integrated
  • GridFS to store big data + metadata (not actually an FS)
  • Has geospatial indexing
  • Data center aware

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For most things that you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
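MongoDB expresses dynamic queries as filter documents such as `{"age": {"$gte": 30}, "city": "Berlin"}`. The toy matcher below makes those semantics concrete without a server. It evaluates a small subset of the comparison operators ($eq, $gt, $gte, $lt, $lte, $in, plus implicit equality) against plain Python dicts. It is an illustrative model only. It is not PyMongo and not MongoDB's full matching logic: there are no dotted paths, no array semantics and no cross-type ordering.

```python
import operator
from typing import Any, Dict, List

# Subset of MongoDB comparison operators, mapped to Python predicates
OPS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
}


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """True if doc satisfies every field condition in flt (conditions are ANDed)."""
    for field, cond in flt.items():
        if field not in doc:
            return False  # simplification: a missing field never matches here
        value = doc[field]
        if isinstance(cond, dict):
            # Operator document, e.g. {"$gte": 30, "$lt": 40}; unknown operators raise KeyError
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # A plain value means equality, e.g. {"city": "Berlin"}
            return False
    return True


def find(docs: List[Dict[str, Any]], flt: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [d for d in docs if matches(d, flt)]


people = [
    {"name": "Ann", "age": 34, "city": "Berlin"},
    {"name": "Bob", "age": 27, "city": "Berlin"},
    {"name": "Cid", "age": 41, "city": "Paris"},
]
print(find(people, {"age": {"$gte": 30}, "city": {"$in": ["Berlin", "Rome"]}}))
```

The appeal over predefined columns is that any field can appear in a filter, and an index on that field can be added later without changing the query.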

ElasticSearch (0.20.1)

  • Written in: Java
  • Main point: Advanced Search
  • License: Apache
  • Protocol: JSON over HTTP (Plugins: Thrift, memcached)
  • Stores JSON documents
  • Has versioning
  • Parent and children documents
  • Documents can time out
  • Very versatile and sophisticated querying, scriptable
  • Write consistency: one, quorum or all
  • Sorting by score (!)
  • Geo distance sorting
  • Fuzzy searches (approximate date, etc.) (!)
  • Asynchronous replication
  • Atomic, scripted updates (good for counters, etc.)
  • Can maintain automatic "stats groups" (good for debugging)

Best used: When you have objects with (flexible) fields, and you need "advanced search" functionality.

For example: A dating service that handles age difference, geographic location, tastes and dislikes, etc. Or a leaderboard system that depends on many variables.
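In the dating-service example, geo distance sorting amounts to ordering documents by their great-circle distance from the searcher. The sketch below does that client-side, applying the haversine formula to plain dicts. The `lat`/`lon` field names and the sample coordinates are made up. Elasticsearch itself would perform the sort server-side, on a `geo_point` field via a `_geo_distance` sort.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(docs: List[Dict], lat: float, lon: float) -> List[Dict]:
    """Nearest documents first, like an ascending geo-distance sort."""
    return sorted(docs, key=lambda d: haversine_km(lat, lon, d["lat"], d["lon"]))


profiles = [
    {"name": "Paris user", "lat": 48.8566, "lon": 2.3522},
    {"name": "Berlin user", "lat": 52.5200, "lon": 13.4050},
    {"name": "Brussels user", "lat": 50.8503, "lon": 4.3517},
]
# Searching from Amsterdam
for p in sort_by_distance(profiles, 52.3676, 4.9041):
    print(p["name"])
```

In a real search this distance would be one sort criterion among several, combined with the relevance score and with filters on the other fields.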

Classic document and BigTable stores

CouchDB (V1.2)

  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via '_changes' (!)
  • Attachment handling
  • thus, CouchApps (standalone JS apps)

Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
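CouchDB's MVCC and conflict detection rest on revisions. Every update must name the revision it was based on, and a write against a stale revision is rejected as a conflict instead of silently overwriting newer data. The in-memory model below illustrates that check. `DocStore` and `Conflict` are hypothetical names, and the "N-hash" revision strings only imitate the shape of CouchDB's `_rev` values; they are not computed the way CouchDB computes them.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple


class Conflict(Exception):
    """Raised when an update is based on a stale revision (CouchDB answers HTTP 409)."""


class DocStore:
    def __init__(self) -> None:
        # doc id -> list of (rev, body), oldest first; old revisions stay readable
        # until compaction in CouchDB, and are simply kept forever here
        self._history: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {}

    def get(self, doc_id: str) -> Dict[str, Any]:
        rev, body = self._history[doc_id][-1]
        return {**body, "_id": doc_id, "_rev": rev}

    def put(self, doc_id: str, body: Dict[str, Any], rev: Optional[str] = None) -> str:
        history = self._history.get(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            raise Conflict(f"{doc_id}: expected rev {current}, got {rev}")
        # Drop metadata fields such as _id/_rev if the caller passed a fetched document back in
        clean = {k: v for k, v in body.items() if not k.startswith("_")}
        digest = hashlib.md5(json.dumps(clean, sort_keys=True).encode()).hexdigest()
        new_rev = f"{len(history) + 1}-{digest}"
        self._history[doc_id] = history + [(new_rev, clean)]
        return new_rev


db = DocStore()
r1 = db.put("order-1", {"status": "new"})
r2 = db.put("order-1", {"status": "paid"}, rev=r1)
try:
    db.put("order-1", {"status": "cancelled"}, rev=r1)  # stale: based on r1, not r2
except Conflict as e:
    print("conflict:", e)
```

Because a writer must present the revision it read, readers never need locks. Concurrent edits surface as explicit conflicts that the application resolves.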

Accumulo (1.4)

  • Written in: Java and C++
  • Main point: A BigTable with cell-level security
  • License: Apache
  • Protocol: Thrift
  • Another BigTable clone, also runs on top of Hadoop
  • Originally from the NSA
  • Cell-level security
  • Bigger rows than memory are allowed
  • Keeps a memory map outside Java, in C++ STL
  • Map/reduce using Hadoop's facilities (ZooKeeper & co)
  • Some server-side programming

Best used: If you need to restrict access on the cell level.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
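Cell-level security in Accumulo works by attaching a visibility label to each cell. The label is a boolean expression over authorization tokens, such as `admin|(audit&ops)`, and a scan returns only the cells whose labels the user's authorizations satisfy. The evaluator below is a simplified, hypothetical re-implementation of that idea. It follows Accumulo's rule that `&` and `|` cannot be mixed without parentheses, but it skips quoted terms and escaping. In Accumulo itself, this job belongs to the `ColumnVisibility` and `Authorizations` classes in the Java API.

```python
import re
from typing import List, Set

# Terms are runs of letters, digits and _ . : -; operators are & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:-]+|[&|()])")


def tokenize(expr: str) -> List[str]:
    tokens: List[str] = []
    pos = 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character in {expr!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def can_see(label: str, auths: Set[str]) -> bool:
    """Evaluate a visibility label such as "admin|(audit&ops)" against a user's authorizations."""
    tokens = tokenize(label)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("label ends unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def group() -> bool:
        nonlocal pos
        values = [term()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | needs parentheses")
            op = tokens[pos]
            pos += 1
            values.append(term())
        return all(values) if op == "&" else any(values)

    result = group()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


cells = [  # (row, column, visibility label, value)
    ("row1", "name", "", "Alice"),
    ("row1", "salary", "hr&manager", "90000"),
    ("row1", "ssn", "hr|(audit&legal)", "xxx-xx-1234"),
]
user_auths = {"hr"}
visible = [(col, val) for _, col, label, val in cells if can_see(label, user_auths)]
print(visible)
```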

HBase (V0.92.0)

  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push-down via server-side scan and get filters
  • Optimizations for real-time queries
  • A high-performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • JRuby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes

Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
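HBase keeps rows sorted by row key. As a result, a scan between a start and a stop key touches only a contiguous slice of the table. A filter pushed down to the server then discards non-matching rows before they cross the network. The sketch below models both ideas with a sorted list of row keys and `bisect`. The class, method names and log-style row keys are hypothetical. In real HBase, the same query would be a `Scan` with start and stop rows and a `Filter` set on it.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, str]  # "family:qualifier" -> value


class SortedTable:
    """Toy model of an HBase table: rows kept in lexicographic row-key order."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start: str, stop: str,
             row_filter: Callable[[Row], bool] = lambda r: True) -> Iterator[Tuple[str, Row]]:
        """Yield rows with start <= key < stop (stop is exclusive) that pass the filter."""
        i = bisect.bisect_left(self._keys, start)  # jump straight to the first candidate
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            if row_filter(self._rows[key]):  # the "pushed-down" predicate
                yield key, self._rows[key]
            i += 1


logs = SortedTable()
# Row key = host#timestamp, so one host's entries sit next to each other
logs.put("web01#2024-05-01T10:00", "log:level", "INFO")
logs.put("web01#2024-05-01T10:05", "log:level", "ERROR")
logs.put("web02#2024-05-01T10:01", "log:level", "ERROR")
# "$" sorts right after "#", so [web01#, web01$) covers exactly the web01# prefix
errors = logs.scan("web01#", "web01$", lambda r: r.get("log:level") == "ERROR")
print([k for k, _ in errors])
```

The row-key design carries most of the performance here. Putting the host first makes "all entries for one host" a single range scan, instead of a full-table pass.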

Hypertable (0.9.6.5)

  • Written in: C++
  • Main point: A faster, smaller HBase
  • License: GPL 2.0
  • Protocol: Thrift, C++ library, or HQL shell
  • Implements Google's BigTable design
  • Runs on Hadoop's HDFS
  • Uses its own, "SQL-like" language, HQL
  • Can search by key, by cell, or for values in column families.
  • Search can be limited to key/column ranges.
  • Sponsored by Baidu
  • Retains the last N historical values
  • Tables are in namespaces
  • Map/reduce with Hadoop

Best used: If you need a better HBase.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
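The bullet "Retains the last N historical values" means that each cell is versioned by timestamp and that the store keeps at most N versions, dropping the oldest. Below is a minimal model under the assumption of a single max-versions setting for the whole store. `VersionedCells` and `max_versions` are hypothetical names; HBase column families have a comparable VERSIONS setting.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class VersionedCells:
    """Keeps at most max_versions (timestamp, value) pairs per (row, column), newest first."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._cells: DefaultDict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        versions = self._cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # drop the oldest versions beyond the limit

    def get(self, row: str, column: str, n: int = 1) -> List[Tuple[int, str]]:
        """Up to n newest (timestamp, value) pairs; .get() avoids creating empty entries."""
        return self._cells.get((row, column), [])[:n]

    def latest(self, row: str, column: str) -> Optional[str]:
        versions = self.get(row, column)
        return versions[0][1] if versions else None


prices = VersionedCells(max_versions=2)
for ts, price in [(1, "100"), (2, "101"), (3, "99")]:
    prices.put("ACME", "quote:last", ts, price)
print(prices.get("ACME", "quote:last", n=5))  # only the two newest versions survive
```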

Graph databases

OrientDB (2.0)

  • Written in: Java
  • Main point: Document-based graph database
  • License: Apache 2.0
  • Protocol: binary, HTTP REST/JSON, or Java API for embedding
  • Has transactions, full ACID conformity
  • Can be used both as a document and as a graph database (vertices with properties)
  • Both nodes and relationships can have metadata
  • Multi-master architecture
  • Supports relationships between documents via persistent pointers (LINK, LINKSET, LINKMAP, LINKLIST field types)
  • SQL-like query language (Note: no JOIN, but there are pointers)
  • Web-based GUI (quite good-looking, self-contained)
  • Inheritance between classes. Indexing of nodes and relationships
  • User functions in SQL or JavaScript
  • Sharding
  • Advanced path-finding with multiple algorithms and Gremlin traversal language
  • Advanced monitoring, online backups are commercially licensed

Best used: For graph-style, rich or complex, interconnected data.

For example: For searching routes in social relations, public transport links, road maps, or network topologies.
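Searching routes in social relations comes down to graph traversal. The breadth-first search below finds a fewest-hops path in an unweighted graph stored as an adjacency dict. It is a generic sketch with made-up data, not OrientDB's API. OrientDB would express this kind of query through its SQL-like language or Gremlin, as listed above.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_hops(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a fewest-hops path from start to goal, or None if goal is unreachable."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to start, then reverse
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:  # BFS reaches each node first via a shortest path
                parents[neighbour] = node
                queue.append(neighbour)
    return None


friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan", "eve"],
    "dan": ["bob", "cat", "fay"],
    "eve": ["cat", "fay"],
    "fay": ["dan", "eve"],
}
print(shortest_hops(friends, "ann", "fay"))
```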

Neo4j (V1.5M02)

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Clustering, replication, caching, online backup, advanced monitoring and High Availability are commercially licensed

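Best used: For graph-style, rich or complex, interconnected data.

For example: For searching routes in social relations, public transport links, road maps, or network topologies.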
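Among the path-finding algorithms a graph database offers, Dijkstra's algorithm is the one that handles weighted relationships such as road distances. The sketch below runs it over a made-up road map using `heapq`. It illustrates the algorithm only and is not Neo4j's API; in Neo4j such queries would go through Cypher or Gremlin, as listed above. It assumes non-negative weights.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, non-negative weight)]


def dijkstra(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Return (total_cost, path) for the cheapest route, or None if goal is unreachable."""
    best: Dict[str, float] = {start: 0.0}
    parents: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in parents:
                node = parents[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry: a cheaper route to node was already found
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                parents[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return None


roads: Graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A", "D"))
```

The direct edge A to B costs 4, but the detour through C costs 3, so the cheapest route to D is A, C, B, D rather than the one with the fewest hops.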

The "long tail"

Couchbase (ex-Membase) (2.0)

  • Written in: Erlang & C
  • Main point: Memcache compatible, but with persistence and clustering
  • License: Apache
  • Protocol: memcached + extensions
  • Very fast (200k+/sec) access of data by key
  • Persistence to disk
  • All nodes are identical (master-master replication)
  • Provides memcached-style in-memory caching buckets, too
  • Write de-duplication to reduce IO
  • Friendly cluster-management web GUI
  • Connection proxy for connection pooling and multiplexing (Moxi)
  • Incremental map/reduce
  • Cross-datacenter replication

Best used: Any application where low-latency data access, high concurrency support and high availability are requirements.

For example: Low-latency use cases like ad targeting, or highly concurrent web apps like online gaming (e.g. Zynga).
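"Write de-duplication" means that when a key is updated several times before the disk-write queue is flushed, only its latest value is persisted, which saves IO. Below is a minimal model of such a queue. `DedupWriteQueue` and `persist` are hypothetical names, and Couchbase's actual persistence engine is considerably more involved.

```python
from collections import OrderedDict
from typing import Callable, Dict


class DedupWriteQueue:
    """Buffers writes in memory; repeated writes to one key collapse into one disk write."""

    def __init__(self, persist: Callable[[str, str], None]) -> None:
        self._pending: "OrderedDict[str, str]" = OrderedDict()  # dirty keys, in first-dirtied order
        self._persist = persist
        self.cache: Dict[str, str] = {}  # in-memory copy that reads would be served from

    def set(self, key: str, value: str) -> None:
        self.cache[key] = value
        self._pending[key] = value  # replaces any value already queued for this key

    def flush(self) -> int:
        """Write each dirty key once; return the number of disk writes performed."""
        writes = 0
        while self._pending:
            key, value = self._pending.popitem(last=False)
            self._persist(key, value)
            writes += 1
        return writes


disk: Dict[str, str] = {}
q = DedupWriteQueue(persist=disk.__setitem__)
for i in range(1000):
    q.set("player:42:score", str(i))  # 1000 in-memory updates to a hot key
q.set("player:7:score", "10")
print(q.flush(), disk)  # two disk writes, holding only the final values
```

This is why hot keys, such as a game score updated many times a second, cost little disk IO: they are rewritten in memory and only the last value reaches the disk.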

Scalaris (0.5)

  • Written in: Erlang
  • Main point: Distributed P2P key-value store
  • License: Apache
  • Protocol: Proprietary & JSON-RPC
  • In-memory (disk when using Tokyo Cabinet as a backend)
  • Uses YAWS as a web server
  • Has transactions (an adapted Paxos commit)
  • Consistent, distributed write operations
  • From CAP, values Consistency over Availability (in case of network partitioning, only the bigger partition works)

Best used: If you like Erlang and wanted to use Mnesia or DETS or ETS, but you need something that is accessible from more languages (and scales much better than ETS or DETS).

For example: In an Erlang-based system when you want to give access to the DB to Python, Ruby or Java programmers.

Aerospike (3.4.1)

  • Written in: C
  • Main point: Speed, SSD-optimized storage
  • License: AGPL (Client: Apache)
  • Protocol: Proprietary
  • Cross-datacenter replication is commercially licensed
  • Very fast access of data by key
  • Uses SSD devices as a block device to store data (RAM + persistence also available)
  • Automatic failover and automatic rebalancing of data when nodes are added to or removed from the cluster
  • User Defined Functions in Lua
  • Cluster management with Web GUI
  • Has complex data types (lists and maps) as well as simple ones (integer, string, blob)
  • Secondary indices
  • Aggregation query model
  • Data can be set to expire with a time-to-live (TTL)
  • Large Data Types

For example: Storing massive amounts of profile data in online advertising or retail Web sites.

RethinkDB (2.1)

  • Written in: C++
  • Main point: JSON store that streams updates
  • License: AGPL (Client: Apache)
  • Protocol: Proprietary
  • JSON document store
  • JavaScript-based query language, "ReQL"
  • ReQL is functional; if you use Underscore.js it will be quite familiar
  • Sharded clustering, replication built-in
  • Data is JOIN-able on references
  • Handles BLOBS
  • Geospatial support
  • Multi-datacenter support

Best used: Applications where you need constant real-time updates.

For example: Displaying sports scores on various displays and/or online. Monitoring systems. Fast workflow applications.
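"Streams updates" refers to RethinkDB changefeeds. A client subscribes to a table or query and receives each change as a document with `old_val` and `new_val` fields; `old_val` is null for inserts and `new_val` is null for deletes. The publish/subscribe model below imitates that shape in plain Python. `ChangefeedTable` and its methods are hypothetical; this is not the RethinkDB driver, which exposes changefeeds through ReQL's `changes()` command.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]
Change = Dict[str, Optional[Doc]]


class ChangefeedTable:
    """In-memory table that pushes {"old_val", "new_val"} change documents to subscribers."""

    def __init__(self) -> None:
        self._rows: Dict[str, Doc] = {}
        self._subscribers: List[Callable[[Change], None]] = []

    def changes(self, callback: Callable[[Change], None]) -> None:
        self._subscribers.append(callback)

    def _emit(self, old: Optional[Doc], new: Optional[Doc]) -> None:
        for cb in self._subscribers:
            cb({"old_val": old, "new_val": new})

    def upsert(self, row: Doc) -> None:
        old = self._rows.get(row["id"])  # None for a fresh insert
        self._rows[row["id"]] = dict(row)
        self._emit(old, dict(row))

    def delete(self, row_id: str) -> None:
        old = self._rows.pop(row_id, None)
        if old is not None:
            self._emit(old, None)  # deletions arrive with new_val = None


scores = ChangefeedTable()
received: List[Change] = []
scores.changes(received.append)  # e.g. a scoreboard display subscribing to updates
scores.upsert({"id": "match-1", "home": 0, "away": 0})
scores.upsert({"id": "match-1", "home": 1, "away": 0})
scores.delete("match-1")
for change in received:
    print(change)
```

Because the client holds the subscription, no client needs to poll. Each display updates as soon as the score row changes.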

Original link: https://www.f2er.com/nosql/203671.html
