Redis series:

Liudongdong. Top/categories /…

This article is from:

Liudongdong. Top/archives/re…

Public number: Walk in the rain Sahara


First, the evolution of databases

1. Standalone MySQL era

In the early days of the Web, most websites stored user data in a single standalone MySQL server. User counts and traffic were small, and most sites served static pages with little backend interaction, so a standalone MySQL instance was enough.

However, as the Web developed and the number of users surged, bottlenecks followed. In the standalone MySQL era, the main causes of these bottlenecks were as follows:

  1. The data volume is too large to fit on a single server's hard disk

  2. Reads and writes are mixed on one server, whose performance cannot keep up

  3. The database indexes grow too large to fit in memory

2. Vertical splitting and caching era

Later, as site traffic increased, sites using standalone MySQL began to have performance problems, so programmers shifted their focus from functionality to performance

To solve the problems of the standalone MySQL era, Memcached caching and vertical splitting (read/write separation) were introduced alongside MySQL.

A running website spends most of its time serving user queries, so splitting reads and writes across different databases improves query efficiency. This is the vertical splitting solution: database servers are divided by function into read servers and write servers, master-slave replication keeps their data consistent, and a cache speeds up access.
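The read/write separation described above can be sketched with plain in-memory dictionaries. This is only a toy model under stated assumptions: the class name `ReadWriteRouter`, the number of replicas, and the immediate copy to the replicas are all hypothetical, and real master-slave replication is normally asynchronous.

```python
import random


class ReadWriteRouter:
    """Toy model of read/write splitting with a cache in front (all in memory)."""

    def __init__(self, replica_count=2):
        self.primary = {}                                   # receives every write
        self.replicas = [{} for _ in range(replica_count)]  # serve the reads
        self.cache = {}                                     # stands in for Memcached

    def write(self, key, value):
        self.primary[key] = value
        # Assumption: replication is modelled as an immediate copy.
        for replica in self.replicas:
            replica[key] = value
        # Drop the cached copy so the next read does not return stale data.
        self.cache.pop(key, None)

    def read(self, key):
        if key in self.cache:                   # cache hit: no database access
            return self.cache[key]
        replica = random.choice(self.replicas)  # spread reads over the replicas
        value = replica.get(key)
        if value is not None:
            self.cache[key] = value             # fill the cache for later reads
        return value


router = ReadWriteRouter()
router.write("user:1", "Alice")
print(router.read("user:1"))  # first read goes to a replica and fills the cache
print(router.read("user:1"))  # second read is served from the cache
```

Writes touch only the primary (plus the replication step), while reads never touch it, which is the point of the split.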

3. Horizontal split and cluster era

When read/write separation and splitting databases and tables could no longer hold all the user data, servers were split horizontally: several primary and secondary nodes form one cluster node, and multiple cluster nodes form a cluster.
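One common way to decide which cluster node holds a given record is to hash its key. The sketch below assumes four shards and a hash-modulo rule; the source does not say which routing rule was used, so `SHARD_COUNT`, `shard_for`, `put`, and `get` are illustrative names only.

```python
import hashlib

SHARD_COUNT = 4  # assumed number of cluster nodes
shards = [{} for _ in range(SHARD_COUNT)]  # each dict stands in for one node


def shard_for(key):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for user_id in range(1000):
    put(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in shards])  # rows are spread over the shards
print(get("user:42"))                    # {'id': 42}
```

A plain modulo rule forces most keys to move when the shard count changes, which is why production systems often prefer consistent hashing or fixed hash slots.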

4. Data explosion era

In today's era of information explosion, people demand ever more real-time information, and the number of Internet users keeps growing. MySQL and other relational databases are no longer enough on their own, because the data volume is large and the data changes quickly.

Through third-party platform integration and data crawling, it is easy to obtain users' personal information and social network data, while user-generated data and user operation logs have grown exponentially. The structure of this data is not fixed, so a relational database is no longer practical for mining it in depth.

Second, what is NoSQL

NoSQL = Not Only SQL, that is, non-relational databases

With the advent of the Web 2.0 era, the number of Internet users and the amount of data grew geometrically. Traditional relational databases struggle to cope with the data volume and high concurrency of large websites, which exposed many problems that are hard for relational databases to overcome.

Because a relational database is essentially a set of tables, it has a fixed structure, and all data must be stored in the same way.

A non-relational database, by contrast, does not impose a fixed structure: data can be stored in a variety of ways, such as document-oriented storage, image-oriented storage, or key-value (K-V) storage. A Map in Java is a classic miniature "NoSQL" store, because a value declared as Object can hold an object of any type.
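The contrast between a fixed table structure and free-form key-value storage can be shown in a few lines. This is a minimal sketch, assuming SQLite (from the Python standard library) as the relational side and a plain dictionary as a stand-in for a key-value store; the table and key names are made up for illustration.

```python
import sqlite3

# Relational side: the table structure is fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Alice"))

try:
    # 'hobbies' is not a column of the table, so the insert is rejected.
    conn.execute(
        "INSERT INTO users (id, name, hobbies) VALUES (?, ?, ?)",
        (2, "Bob", "chess"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Key-value side: every value can have its own shape.
kv_store = {}
kv_store["user:1"] = {"name": "Alice"}
kv_store["user:2"] = {"name": "Bob", "hobbies": ["chess", "go"]}
kv_store["page:home:views"] = 1024

print(schema_rejected)                # True
print(kv_store["user:2"]["hobbies"])  # ['chess', 'go']
```

The dictionary plays the same role as the Java Map mentioned above: the store itself does not care what shape each value has.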

1. NoSQL features

  1. Easy to scale, because there are no relationships between data items

  2. Large storage capacity and high performance (Redis can perform about 80,000 writes and 110,000 reads per second)

  3. Diverse data types, with no need to design the database schema in advance

2. The difference between RDBMS and NoSQL

RDBMS

  1. Structured organization of data

  2. SQL as the query language (DDL, DQL, DML)

  3. Data and relationships are stored in separate tables, and data can only be stored in rows and columns

  4. ACID principles, strict consistency

  5. Transaction support

NoSQL

  1. No fixed query language

  2. Diverse storage models: key-value (Redis), column (HBase), document, and graph databases

  3. Eventual consistency: data only needs to become consistent eventually

  4. CAP theorem and BASE theory

  5. High performance, high availability, high scalability

In practice, a company generally needs to use NoSQL and an RDBMS together

3. Requirements of the era of big data

The era of big data brings two sets of concepts: the 3 Vs and the 3 highs

3V is used to describe data problems in the era of big data:

  1. Volume

  2. Variety

  3. Velocity (real time)

3 High refers to the standards that programs need to meet in the era of big data:

  1. High availability

  2. High concurrency

  3. High performance

Third, Alibaba's architecture evolution

Starting at the end of 2010, Alibaba began the transformation to its 5th-generation website architecture, with the following requirements:

  1. Agility: requirements call for agile development, but application bloat and worsening coupling make the architecture more and more complex. How can the business stay agile?

  2. Openness: how to make the website more open and attract third-party developers to help build it

  3. Experience: concurrency pressure is rising rapidly, and users expect an ever better experience

1. Data Layer: The increasing complexity of data architectures

Complex data architecture diagrams

To speed up access, different pieces of information about a product may come from different databases

  1. Basic product information: name, merchant information, and so on, which can be stored in a relational database

  2. Product descriptions and comments: this part contains a lot of text, so performance may degrade if it is stored in a relational database. A document database such as MongoDB is used instead

  3. Images: distributed file systems such as Taobao TFS, Alibaba Cloud OSS, Google GFS, Hadoop HDFS, and FastDFS

  4. Keyword search: Solr and Elasticsearch; Taobao uses Isearch

  5. Hot, high-frequency product information: in-memory databases such as Redis, Tair, and Memcached

  6. Product transactions and external payment interfaces: third-party applications

The diversity of data sources raises many problems

  1. There are many data sources, and changing a data source forces large-scale refactoring of the related applications

  2. Problems are hard to locate across data sources, and caching and performance tuning are hard to implement

  3. The data architecture is complex, and applications have to depend directly on multiple types of data sources

To solve these problems, Alibaba proposed a unified data service layer (UDSL): a proxy layer between the website application cluster and the underlying data sources that unifies the data layer

The data architecture with the addition of UDSL is as follows

UDSL hides the differences between the underlying databases and lets applications operate on different databases through one unified language, while UDSL itself maintains the details
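The idea of one layer hiding several backends can be illustrated with a small adapter sketch. Nothing here reflects Alibaba's actual implementation: the class names `DataSource`, `RelationalSource`, `DocumentSource`, and `UnifiedDataService` and the sample data are all hypothetical, and the backends are plain dictionaries.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Common interface that every backend adapter implements."""

    @abstractmethod
    def fetch(self, item_id):
        """Return the fields this backend knows about for item_id."""


class RelationalSource(DataSource):
    """Stands in for a relational database holding basic product info."""

    def __init__(self):
        self.rows = {1: {"name": "Laptop", "merchant": "Shop A"}}

    def fetch(self, item_id):
        return self.rows.get(item_id, {})


class DocumentSource(DataSource):
    """Stands in for a document database holding descriptions and comments."""

    def __init__(self):
        self.docs = {1: {"description": "A 14-inch laptop", "comments": ["Works well"]}}

    def fetch(self, item_id):
        return self.docs.get(item_id, {})


class UnifiedDataService:
    """Single entry point: callers never talk to a backend directly."""

    def __init__(self, sources):
        self.sources = sources

    def get_item(self, item_id):
        item = {"id": item_id}
        for source in self.sources:
            item.update(source.fetch(item_id))  # merge fields from each backend
        return item


service = UnifiedDataService([RelationalSource(), DocumentSource()])
print(service.get_item(1))
```

Because the application only calls `get_item`, swapping one backend for another means writing a new adapter rather than refactoring every caller.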

2. Hotspot cache

With UDSL, the data architecture was greatly simplified, but performance problems remained serious. The website's data is very large and caching all of it is not cost-effective, so caching only hot data became the best choice

Therefore, Alibaba developed a hot cache platform and provided it to UDSL as a cache system
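A small cache that keeps only recently used entries gives a feel for caching hot data. The source does not describe how the hot cache platform picks hot data, so least-recently-used (LRU) eviction is an assumption here, and `HotCache` and `load_from_database` are hypothetical names.

```python
from collections import OrderedDict


class HotCache:
    """Keep at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # called on a miss to fetch from the backend
        self.entries = OrderedDict()  # ordered from coldest to hottest
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the coldest entry
        return value


def load_from_database(key):
    # Placeholder for a slow backend query.
    return f"row for {key}"


cache = HotCache(capacity=2, loader=load_from_database)
for key in ["item:1", "item:2", "item:1", "item:3", "item:1"]:
    cache.get(key)

print(list(cache.entries))  # ['item:3', 'item:1']
print(cache.misses)         # 3
```

The frequently requested `item:1` stays cached throughout, while `item:2` is evicted as soon as space is needed.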

Fourth, the four types of NoSQL

1. Key-value (KV) pairs

Data is stored as key-value pairs, for example in Redis, Tair, and Memcached

2. Document type

The transmission format is BSON, which is similar to JSON

The most common one is MongoDB, a database based on distributed file storage and written in C++. It is mainly used to process large numbers of documents

Others include CouchDB, RavenDB, etc.
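To show what a document looks like, here is a rough sketch using JSON from the Python standard library. BSON itself is a binary encoding and is not part of the standard library, so JSON stands in for it; the product fields below are invented for illustration.

```python
import json

# One self-contained document: nested comments live inside the product itself.
product_doc = {
    "_id": 1001,
    "title": "Mechanical keyboard",
    "description": "87-key layout with hot-swappable switches.",
    "comments": [
        {"user": "alice", "text": "Great typing feel", "rating": 5},
        {"user": "bob", "text": "A bit loud", "rating": 4},
    ],
}

encoded = json.dumps(product_doc)  # text form that could be sent or stored
decoded = json.loads(encoded)      # back to nested Python structures

ratings = [comment["rating"] for comment in decoded["comments"]]
average = sum(ratings) / len(ratings)

print(decoded["comments"][0]["user"])  # alice
print(average)                         # 4.5
```

In a relational design the comments would usually live in a separate table joined by product id; in a document store they travel with the product.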

3. Column storage

Common examples are HBase and distributed file systems

4. Graph database

Graph databases are not used to store images; they store relationships and build relationship graphs, such as social networks, WeChat Moments, and ad recommendations

Common examples are Neo4j, InfoGrid, etc.
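Storing relationships rather than rows can be sketched with an adjacency map. This is a simplified in-memory illustration, not how Neo4j or InfoGrid store data; `add_friendship` and `friends_of_friends` are hypothetical helpers for a "people you may know" style query.

```python
from collections import defaultdict

# Each person maps to the set of people they are directly connected to.
friends = defaultdict(set)


def add_friendship(a, b):
    """Record an undirected friendship edge between a and b."""
    friends[a].add(b)
    friends[b].add(a)


def friends_of_friends(person):
    """People two hops away who are not already direct friends."""
    direct = friends[person]
    candidates = set()
    for friend in direct:
        candidates |= friends[friend]
    return candidates - direct - {person}


add_friendship("alice", "bob")
add_friendship("bob", "carol")
add_friendship("carol", "dave")

print(sorted(friends_of_friends("alice")))  # ['carol']
```

Answering the same question in a relational database typically needs a self-join on a friendship table, and each extra hop adds another join.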

5. Comparison of the four