0 mongos -> config servers -> shard
To implement a sharded cluster, MongoDB introduces config servers, which store the cluster's metadata, and mongos, which is the entry point for application access. mongos reads the routing information from the config servers and routes each request to the corresponding shard on the back end.
Roles of the components:
- A. Data shards
- Shards store the data and ensure its high availability and consistency. A shard can be a single mongod instance or a replica set. In production, a shard is usually a replica set, so that no shard becomes a single point of failure. Each database also has a primary shard, which holds the collections that have not been sharded.
- B. Config servers
- Saves metadata of the cluster, including routing rules for each Shard.
- Metadata is information about the organization of data, data fields and their relationships. In short, metadata is data about data
- C. Query Routers
- mongos is the access point of the sharded cluster. It does not persist any data itself: all metadata of the cluster is stored on the config servers, and user data is distributed across the shards.
- After mongos starts, it loads the metadata from the config servers and begins serving requests, routing each user request to the correct shard. A sharded cluster can run one mongos or several, to spread the load of client requests.
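The division of labour described above can be sketched with a toy router. This is only an illustration, not how mongos is implemented: the chunk bounds, the shard names, and the `ToyRouter` class are all made up, and the "metadata" is just two in-memory lists standing in for what the config servers would hold.

```python
from bisect import bisect_right

# Toy stand-in for config server metadata: chunk i starts at
# CHUNK_LOWER_BOUNDS[i] and ends where chunk i + 1 starts.
# Bounds and shard names are invented for illustration.
CHUNK_LOWER_BOUNDS = [float("-inf"), 100, 200, 300]
CHUNK_OWNERS = ["shard0", "shard1", "shard0", "shard2"]


class ToyRouter:
    """Plays the part of mongos: caches routing metadata, stores no user data."""

    def __init__(self, lower_bounds, owners):
        # "Load the metadata from the config servers" at startup.
        self.lower_bounds = list(lower_bounds)
        self.owners = list(owners)

    def shard_for(self, shard_key_value):
        # Pick the last chunk whose lower bound is <= the key value.
        index = bisect_right(self.lower_bounds, shard_key_value) - 1
        return self.owners[index]


router = ToyRouter(CHUNK_LOWER_BOUNDS, CHUNK_OWNERS)
print(router.shard_for(42))   # shard0
print(router.shard_for(150))  # shard1
print(router.shard_for(300))  # shard2
```

Because the router only holds a copy of the routing table, several routers can run side by side without coordinating user data, which matches the note that a cluster may have more than one mongos.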
1 What is sharding? Why shard?
- Sharding divides the data of a single MongoDB collection into multiple chunks, so that MongoDB's read and write capacity can be scaled horizontally
- Why does splitting the data into multiple chunks improve read and write performance?
- Multiple chunks are distributed on multiple machines, making full use of the CPU and disk IO of multiple machines
- Because data is assigned to chunks by shard key, a query that filters on the shard key locates the owning chunk directly and only needs to run against that chunk, which is obviously much faster
- Such queries resemble manual table splitting in MySQL (MongoDB automates it). Take a log table split by date: when you query, you first build the table name from the date to locate one table, so you never scan the data of all the tables.
2 How many types of sharding are there?
- Hashed shard key
- How data is distributed: the shard key value is hashed, and documents are assigned to chunks according to ranges of the hash value
- Strengths and Weaknesses
- Advantage: hashing spreads writes across many chunks, which greatly improves write performance
- Disadvantage: range queries are inconvenient, because adjacent key values end up in different chunks
- Range shard key
- How data is distributed: documents are assigned to chunks by ranges of the shard key value itself, so new data keeps being added to the chunk that covers its range. When a chunk reaches its maximum size, it is split
- Strengths and Weaknesses
- Advantages: 1. range queries are convenient; 2. if the shard key is not monotonically increasing, write performance also improves
- Disadvantage: write performance cannot be improved if the shard key is monotonically increasing, because every new document goes to the same chunk
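The trade-off between the two strategies can be seen in a small simulation. This is a sketch under stated assumptions: the chunk bounds are invented, four chunks are used for both strategies, and MD5 truncated to 32 bits is only a stand-in for MongoDB's own hash function.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_CHUNKS = 4
HASH_SPACE = 2 ** 32


def ranged_chunk(key, lower_bounds):
    """Range sharding: the chunk is chosen by the key value itself."""
    return bisect_right(lower_bounds, key) - 1


def hashed_chunk(key):
    """Hashed sharding: the chunk is chosen by ranges of hash(key).

    MD5 truncated to 32 bits is an assumption made for this sketch.
    """
    digest = hashlib.md5(str(key).encode()).digest()
    hashed = int.from_bytes(digest[:4], "big")
    # Split the hash space into NUM_CHUNKS equal ranges.
    return hashed * NUM_CHUNKS // HASH_SPACE


# Existing range chunks start at 0, 1000, 2000 and 3000 (made-up bounds).
range_bounds = [0, 1000, 2000, 3000]

# New documents arrive with a monotonically increasing key.
new_keys = range(5000, 6000)

ranged = Counter(ranged_chunk(k, range_bounds) for k in new_keys)
hashed = Counter(hashed_chunk(k) for k in new_keys)

print("ranged:", dict(ranged))  # every insert lands in the last chunk
print("hashed:", dict(hashed))  # inserts are spread over the chunks
```

With the range key, all 1000 inserts hit chunk 3, which is the hot-spot problem of monotonically increasing keys. With the hashed key the same inserts scatter, but so would a range query over keys 5000 to 5999.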
3 Restrictions on shard keys and how to choose one
- Restrictions on shard keys
- Shard keys are immutable.
- A shard key must have an index.
- The shard key size is limited to 512 bytes.
- The shard key is used to route queries.
- Once a collection has been sharded, MongoDB does not accept inserts of documents that lack the shard key (null values are not supported either).
- Shard key selection logic
- Reads and writes are spread evenly across all shards.
- Data access should be even, but not random. Because new data is created in memory first, try to avoid having to fetch new data from disk.
- Avoid hot data being evicted from memory because other data is loaded from disk into memory.
- A compound shard key that combines several fields may be the ideal sharding solution.
- Sharding formula (range plus hot data):
- Shard key formula: {coarseLocality: 1, search: 1}
- coarseLocality: should be a coarse-grained field with locality, for example a month field in ascending order.
- search: a field that is often used in lookups.
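To make the `{coarseLocality: 1, search: 1}` formula more concrete, here is a minimal sketch that assumes the coarse field is a month string and the search field is a hypothetical `user_id`. Python tuples are used because they compare field by field, which is close enough to compound key ordering for illustration. The chunk bounds are invented.

```python
from bisect import bisect_right

# Chunk lower bounds on the compound key (month, user_id); values are made up.
# The busy month 2023-02 has been split on the second field.
CHUNK_LOWER_BOUNDS = [
    ("", ""),          # chunk 0: everything before 2023-02
    ("2023-02", ""),   # chunk 1: 2023-02, user_id < "m"
    ("2023-02", "m"),  # chunk 2: 2023-02, user_id >= "m"
    ("2023-03", ""),   # chunk 3: 2023-03 and later
]


def chunk_for(month, user_id):
    """Return the index of the chunk that owns (month, user_id)."""
    return bisect_right(CHUNK_LOWER_BOUNDS, (month, user_id)) - 1


def chunks_for_month(month):
    """Chunks that a query on the coarse field alone has to visit."""
    first = chunk_for(month, "")
    last = chunk_for(month, "\uffff")  # sentinel above ordinary user ids
    return list(range(first, last + 1))


print(chunk_for("2023-02", "alice"))  # 1
print(chunk_for("2023-02", "zoe"))    # 2
print(chunks_for_month("2023-02"))    # [1, 2]
print(chunks_for_month("2023-03"))    # [3]
```

The coarse field keeps a month's data in a small, contiguous set of chunks, so recent data stays together and hot, while the search field lets a busy month be split further and lets a lookup on both fields land on exactly one chunk.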
4 What impact does sharding have on queries and writes?
All requests are routed, dispatched, and merged by mongos. These actions are transparent to the client driver: users connect to mongos just as they would connect to mongod. mongos routes each request to the corresponding shard based on the request type and the shard key, so different operations carry different restrictions.
- Query requests: if the query does not contain the shard key, mongos sends it to all shards, then merges the results and returns them to the client. If the query contains the shard key, mongos works out the chunk to query from the shard key and sends the query only to the shard that holds it.
- Insert requests: a write must contain the shard key. mongos works out from the shard key which chunk the document belongs to and sends the write to the shard where that chunk is located.
- Update/delete requests: the query criteria of an update or delete must contain the shard key or the _id. If the criteria contain the shard key, the request is routed directly to the specified chunk. If the criteria contain only the _id, the request is sent to all shards.
- Other commands: some are broadcast. For example, mongos forwards a listDatabases request to every shard and to the config servers, then merges the results.
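The routing rules above can be condensed into one small decision function. This is a simplified model of the rules as stated in this section, not mongos code: the shard names, the `user_id` shard key, and the placement rule inside `shard_for_key` are all hypothetical.

```python
ALL_SHARDS = ["shard0", "shard1", "shard2"]
SHARD_KEY = "user_id"  # hypothetical shard key field


def shard_for_key(value):
    # Made-up placement rule standing in for the real chunk lookup.
    return ALL_SHARDS[sum(str(value).encode()) % len(ALL_SHARDS)]


def target_shards(operation, criteria):
    """Return the shards a request is sent to, following the rules above."""
    has_key = SHARD_KEY in criteria
    if operation == "insert":
        if not has_key:
            raise ValueError("insert must contain the shard key")
        return [shard_for_key(criteria[SHARD_KEY])]
    if operation == "find":
        # Targeted with the shard key, scatter-gather without it.
        if has_key:
            return [shard_for_key(criteria[SHARD_KEY])]
        return list(ALL_SHARDS)
    if operation in ("update", "delete"):
        if has_key:
            return [shard_for_key(criteria[SHARD_KEY])]
        if "_id" in criteria:
            return list(ALL_SHARDS)
        raise ValueError("update/delete needs the shard key or _id")
    raise ValueError(f"unknown operation: {operation}")


print(target_shards("find", {"user_id": "alice"}))  # one shard
print(target_shards("find", {"age": 30}))           # all shards
print(target_shards("update", {"_id": 7}))          # all shards
```

The practical takeaway is that including the shard key in the criteria turns a broadcast into a single-shard request, which is why the shard key should match the fields the application filters on most.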