This article participated in the third "topic writing" track of the Juejin Creators Training Camp. For details, see: Juejin Program | Creators Training Camp season 3 is ongoing, "write" to make a personal impact.

Preface

A chart clearly traces the development of databases from 1962 to 2016. In my own career I have lived through several of these changes. At school I first studied SQL Server; later, working on telecom systems, I used relational databases such as DB2, Oracle, Sybase, and GP (Greenplum). Because I also did research and development on ETL tools, I became familiar with many kinds of relational databases. After that, I used MySQL at Internet companies as part of the "de-IOE" effort (moving away from IBM, Oracle, and EMC). With the rise of Web 2.0 sites and the enormous volume of data on the Internet, traditional relational databases began to fall short and non-relational (NoSQL) databases rose to prominence. Since then I have used systems such as ClickHouse, HBase, MongoDB, and Redis.

1. NoSQL

1.1 A brief history of NoSQL

The term NoSQL first appeared in 1998, as the name of a lightweight, open-source relational database developed by Carlo Strozzi that did not provide an SQL interface.

In 2009, Johan Oskarsson of Last.fm started a discussion about distributed open-source databases, and Eric Evans of Rackspace reintroduced the term NoSQL. At this point, NoSQL mainly referred to a design pattern for non-relational, distributed databases that do not provide ACID guarantees.

One milestone was the 2009 "no:sql(east)" conference in Atlanta, whose slogan was "select fun, profit from real_world where relational=false". As a result, the most common interpretation of NoSQL is "non-relational", emphasizing the benefits of key-value stores and document databases rather than simply opposing RDBMSs.

1.2 What is NoSQL

NoSQL ("Not Only SQL") refers to non-relational databases, which differ from traditional relational database management systems.

NoSQL databases are used to store very large volumes of data. They do not require a fixed schema and can scale horizontally without much extra operational work.
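To make "scale horizontally" concrete, here is a minimal sketch assuming simple hash partitioning: each key is hashed and assigned to one of several nodes, so adding nodes adds capacity. The names `shard_for` and `ShardedStore` are hypothetical, MD5 is used only as a stable example hash, and plain dicts stand in for separate servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (MD5 chosen only for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Hypothetical store that spreads keys across several in-memory 'nodes'."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one server in the cluster.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
print([len(s) for s in store.shards])  # keys spread across the four shards
```

Plain modulo hashing remaps most keys whenever the number of shards changes, which is why production systems use schemes such as consistent hashing (Cassandra) or key-range partitioning (HBase regions).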

1.3 Why use NoSQL

With the rapid growth and spread of the Internet, the data generated by Internet users increases day by day, from gigabytes to terabytes to petabytes. Much of this data is processed by relational database management systems (RDBMSs).

Relational databases are constrained by normal forms, transactional guarantees, and disk I/O. When a server generates very large amounts of data, a traditional relational database can no longer meet the requirements for fast queries and fast inserts. NoSQL emerged to address this problem. It gains performance by relaxing data-safety guarantees, reducing transaction support, and limiting support for complex queries. For the same reason, NoSQL is not the best choice where transactional and data-safety guarantees are strictly required: Internet e-commerce payments, marketing, membership, and many other scenarios still use MySQL.

NoSQL is a revolutionary database movement: the idea was proposed early on and gained momentum in 2009. Its proponents advocate non-relational data storage, which brought new thinking to a field dominated by relational databases.

1.3 NoSQL classification

1.3.1 Key-value databases

Features: A key-value database works like the hash tables found in traditional programming languages: data is added, queried, and deleted by key.

Advantages: Fast query speed.

Disadvantages: Data is unstructured and is usually stored only as strings or binary data.

Application scenarios: content caching, user sessions, configuration data, and shopping carts. Key-value databases are mainly used to handle large volumes of data under high access load. Examples: Redis, Memcached.
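The key-value model above can be illustrated with a minimal in-memory sketch: a dictionary keyed by string, plus an optional expiry time per key, which is what makes this model a natural fit for caches and sessions. `KeyValueStore` and its `set`/`get`/`delete` methods are hypothetical names for illustration; this is not the API of Redis or Memcached.

```python
import time
from typing import Any, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store with optional per-key expiry (TTL)."""

    def __init__(self) -> None:
        # key -> (value, expiry time on the monotonic clock, or None for no expiry)
        self._data: dict = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Lazily evict an expired key when it is read.
            del self._data[key]
            return None
        return value

    def delete(self, key: str) -> bool:
        # Returns True only if the key was present.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.set("session:42", {"user_id": 42, "cart": ["sku-1"]}, ttl=1800)
print(store.get("session:42"))  # {'user_id': 42, 'cart': ['sku-1']}
print(store.get("missing"))     # None
```

Every operation is a single lookup by key, which is where the speed comes from; the store knows nothing about the value's internal structure, which matches the "unstructured data" disadvantage listed above.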

1.3.2 Column-family databases

Features: A column-family database stores data in column families, grouping related columns into a single family. Keys still exist, but each key points to multiple columns. For example, for a Person entity we would usually query name and age together but not salary, so name and age would be placed in one column family and salary in another.

Advantages: Fast lookups, strong scalability, and easy distributed scale-out; well suited to distributed file systems and distributed storage of massive data sets.

Disadvantages: Low performance for queries that do not follow the key-based access pattern, and no unified query syntax.

Application scenarios: logs, distributed file systems (object storage), user profiles for recommendations, spatio-temporal data, messages and orders, etc. Examples: Cassandra, HBase, and Riak.
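The Person example above can be modelled as nested maps: row key, then column family, then column, then value. The sketch below is a hypothetical illustration of that data model only; it leaves out the versions, timestamps, and on-disk layout that real systems add. It mirrors one HBase rule: column families are declared when the table is created, while columns inside a family can be added freely. `ColumnFamilyTable` and its methods are illustrative names, not a real client API.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Hypothetical wide-column table: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Column families are fixed up front; columns within them are not.
        self.families = set(families)
        self.rows = defaultdict(lambda: {f: {} for f in self.families})

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Read a single column family, e.g. 'basic' without touching 'finance'."""
        row = self.rows.get(row_key)  # .get() does not create a missing row
        return dict(row[family]) if row else {}


people = ColumnFamilyTable(families=["basic", "finance"])
people.put("person:1", "basic", "name", "Alice")
people.put("person:1", "basic", "age", 30)
people.put("person:1", "finance", "salary", 10000)
print(people.get_family("person:1", "basic"))  # {'name': 'Alice', 'age': 30}
```

Because name and age live in the same family, the common query reads one family and never touches salary; in a real column-family store, families are also stored separately on disk, which is where the I/O saving comes from.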

1.3.3 Document Database

Features: A document database stores data as documents, similar to JSON. Each document is a collection of data items, and each item has a name and a corresponding value. Values can be simple types such as strings, numbers, and dates, or complex types such as ordered lists and nested objects.

Advantages: Flexible data structure. The schema can change, and unlike in a relational database it does not need to be defined in advance.

Disadvantages: Low query performance and lack of unified query syntax.

Application scenarios: Logs and Web applications.

Examples: MongoDB, CouchDB…
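The flexible-schema point can be shown with plain Python dicts standing in for JSON documents: documents in one collection carry different fields, and a query simply matches whichever fields are present. The helpers `get_path` and `find` are hypothetical, written for this sketch. They are not the MongoDB driver API, although MongoDB filters use a similar dot notation for nested fields (for example `{"address.city": "Beijing"}`).

```python
from typing import Any, Iterable


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested dicts; None if absent."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: Iterable[dict], **criteria: Any) -> list:
    """Return documents whose fields equal the given values.

    Keyword names use '__' for nesting, e.g. address__city='Beijing'.
    """
    return [
        doc
        for doc in collection
        if all(get_path(doc, key.replace("__", ".")) == value
               for key, value in criteria.items())
    ]


# Documents in the same collection do not share a fixed schema.
users = [
    {"_id": 1, "name": "Alice", "tags": ["vip"], "address": {"city": "Beijing"}},
    {"_id": 2, "name": "Bob", "age": 25},
    {"_id": 3, "name": "Carol", "address": {"city": "Shanghai", "zip": "200000"}},
]

print([d["_id"] for d in find(users, address__city="Beijing")])  # [1]
print([d["_id"] for d in find(users, age=25)])                   # [2]
```

Note that `find` scans every document; a real document database would use indexes on the queried fields instead.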

1.3.4 Graph Database

Features: Graph databases store data as a graph: nodes represent entities and edges represent the relationships between them.

Advantages: Efficient graph algorithms, such as shortest-path search and N-degree relationship queries.

Disadvantages: Many queries require computing over the whole graph to obtain the needed information; distributed clustering is hard to implement; super nodes (nodes with a very large number of edges) are handled poorly; there is no sharded storage mechanism; and the community in China is not very active.

Application scenarios: Social networks, recommendation systems, and other scenarios centered on building relationship graphs.

Examples: Neo4j, InfiniteGraph.
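The two graph queries named under Advantages, shortest path and N-degree relationships, can be sketched over an adjacency list with breadth-first search. The graph data and function names below are hypothetical. A graph database would answer these queries in its own query language (for Neo4j, Cypher, which has a `shortestPath` function); this sketch only shows the underlying idea.

```python
from collections import deque

# Undirected "knows" relationships as an adjacency list (hypothetical sample data).
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Bob", "Carol", "Eve"],
    "Eve": ["Dave"],
}


def shortest_path(g, start, goal):
    """Breadth-first search: returns the path with the fewest hops, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in g.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def within_n_degrees(g, start, n):
    """All nodes reachable from start in 1..n hops (friends-of-friends when n=2)."""
    seen = {start}
    frontier = {start}
    for _ in range(n):
        frontier = {nb for node in frontier for nb in g.get(node, []) if nb not in seen}
        seen |= frontier
    return seen - {start}


print(shortest_path(graph, "Alice", "Eve"))            # ['Alice', 'Bob', 'Dave', 'Eve']
print(sorted(within_n_degrees(graph, "Alice", 2)))     # ['Bob', 'Carol', 'Dave']
```

Both functions follow edges outward from one node, which is cheap when the graph fits on one machine. This also hints at the disadvantage above: once the graph is split across servers, each hop may cross the network.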

2. Relational database

A relational database organizes data using the relational model, storing it in rows and columns that are easy for users to understand. A set of rows and columns is called a table, and a group of tables makes up a database. Mainstream relational databases include Oracle, DB2, MySQL, GP, and so on.
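For contrast with the NoSQL sketches, here is the relational model in a few lines. It uses Python's built-in `sqlite3` module; SQLite is a stand-in chosen only because it needs no server, and the `person`/`orders` tables are hypothetical. The schema is declared up front, every row must fit it, and tables are related through keys and combined with a JOIN at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The schema is defined before any data is stored.
cur.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER
);
CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    amount    REAL NOT NULL
);
""")
cur.executemany("INSERT INTO person (id, name, age) VALUES (?, ?, ?)",
                [(1, "Alice", 30), (2, "Bob", 25)])
cur.executemany("INSERT INTO orders (person_id, amount) VALUES (?, ?)",
                [(1, 99.5), (1, 20.0), (2, 15.0)])
conn.commit()

# Rows in different tables are linked by keys and joined when queried.
rows = cur.execute("""
    SELECT p.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
    FROM person p JOIN orders o ON o.person_id = p.id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()
print(rows)  # [('Alice', 2, 119.5), ('Bob', 1, 15.0)]
conn.close()
```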

3. How to choose between NoSQL and relational databases?

Most of us are familiar with relational databases, and their characteristics and underlying principles are common interview topics. Choosing between NoSQL and a relational database is usually straightforward. Choosing a particular NoSQL database, however, depends on the business scenarios described above and on the capabilities of your company's existing big data platform. If I were choosing a NoSQL database, I would consider three aspects:

  1. Business scenario: choose the NoSQL category that fits the scenario.
  2. Existing platform: for example, our company already runs HBase but not Riak, so I would probably just use HBase.
  3. Learning cost and the maturity of the technology ecosystem, such as data monitoring and cluster scalability.
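The three aspects above can be read as a simple decision procedure: filter by category, prefer what the platform already runs, then break ties on ecosystem maturity. The sketch below encodes that reading under loudly hypothetical assumptions. The `Option` fields, the 0-5 `ecosystem_score` scale, and the tie-breaking order are invented for illustration, and the product lists simply copy the examples given in this article.

```python
from dataclasses import dataclass

# Example products per category, copied from the classification in this article.
CANDIDATES = {
    "key-value": ["Redis", "Memcached"],
    "column-family": ["Cassandra", "HBase", "Riak"],
    "document": ["MongoDB", "CouchDB"],
    "graph": ["Neo4j", "InfiniteGraph"],
}


@dataclass
class Option:
    name: str
    already_deployed: bool  # aspect 2: does the existing platform already run it?
    ecosystem_score: int    # aspect 3: hypothetical 0-5 rating (monitoring, scaling, learning cost)


def choose(category: str, options: list) -> str:
    """Aspect 1 filters by category; aspect 2 prefers deployed systems; aspect 3 breaks ties."""
    allowed = set(CANDIDATES[category])
    eligible = [o for o in options if o.name in allowed]
    if not eligible:
        raise ValueError(f"no option fits category {category!r}")
    # Tuples compare element by element, so deployment status outranks the score.
    best = max(eligible, key=lambda o: (o.already_deployed, o.ecosystem_score))
    return best.name


options = [
    Option("HBase", already_deployed=True, ecosystem_score=4),
    Option("Cassandra", already_deployed=False, ecosystem_score=4),
    Option("Riak", already_deployed=False, ecosystem_score=3),
]
print(choose("column-family", options))  # HBase
```

In practice these factors are weighed by judgment rather than a fixed ranking; the code only makes the order of the three questions explicit.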