One of the non-relational (NoSQL) databases
MongoDB Introduction
Introduction
MongoDB is a database based on distributed file storage, written in C++ and designed to provide scalable, high-performance data storage for web applications. MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the one that most resembles a relational database. It supports a very loose data structure in the JSON-like BSON format, so it can store complex data types. MongoDB's biggest feature is its powerful query language, whose syntax is somewhat similar to an object-oriented query language; it can achieve almost all the functionality of single-table queries in a relational database, and it also supports building indexes on the data.
Characteristics
- High performance
- Easy to deploy
- Easy to use
- Convenient data storage
SQL and MongoDB terminology compared
SQL | MongoDB |
---|---|
Table | Collection |
Row | Document |
Column | Field |
Primary Key | ObjectId |
Index | Index |
Embedded table | Embedded Document |
Array | Array |
Comparison with MySQL
 | MongoDB | MySQL |
---|---|---|
Database model | Non-relational | Relational |
Storage | Virtual memory + persistence | Varies by storage engine |
Architecture | High availability via replica sets, plus sharding | Common architectures: single node, M-S, MHA, MMM, Cluster, etc. |
Data processing | Memory-based: hot data is kept in physical memory for high-speed reads and writes | Each engine has its own characteristics |
Maturity | Newer database, lower maturity | Relatively mature ecosystem, high maturity |
Adoption | Among NoSQL databases, MongoDB is one of the most complete, and its user base keeps growing | The open-source database share keeps growing, and MySQL grows with it |
Advantages of MongoDB
- Fast! With a moderate amount of memory, MongoDB performs very well. It keeps hot data in physical memory (not just indexes and a small amount of data), making hot reads and writes very fast and improving overall speed and efficiency.
- Highly scalable! MongoDB's high-availability and cluster architectures are very expandable. As physical machines and shards are added, MongoDB can scale to an impressive degree.
- Built-in failover! In a MongoDB replica set, when the primary runs into problems and can no longer serve requests, the replica set elects a new primary and continues serving.
- JSON storage format! MongoDB's JSON/BSON storage format is very well suited to storing and querying document data.
Disadvantages of MongoDB
- Less application experience. Because NoSQL rose only recently and MongoDB usage in China is still maturing, there is relatively little application experience and a lack of community platforms in China.
- The non-relational data model may feel unfamiliar to new users: after working with relational databases, MongoDB's operation statements can seem strange.
- Locking. Although MongoDB 2.0+ changed the lock from a global lock to a per-database lock, users with only one database are still effectively globally locked, which can cause operations to queue up.
- No transactions! MongoDB (before version 4.0) had no built-in transaction mechanism, so if you needed transactional behavior in Mongo, you had to implement it yourself logically through an additional collection.
Comparison with Redis
www.php.cn/redis/42192…
Database structure
MongoDB is a NoSQL database, so there is naturally no concept of tables. Data is stored in collections, which contain documents (tree-structured data).
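To make the collection/document model concrete, here is a minimal sketch in Python using plain dicts. The field names (`name`, `address`, `tags`) are made up for this illustration; no MongoDB server is involved.

```python
# A MongoDB document is a tree: fields may hold scalars, embedded
# documents, or arrays. A plain Python dict models this directly.
user_doc = {
    "_id": 1,
    "name": "alice",
    "address": {            # embedded document (no join needed)
        "city": "Shanghai",
        "zip": "200000",
    },
    "tags": ["admin", "beta"],  # array field
}

# Navigating the tree is just nested access.
city = user_doc["address"]["city"]
first_tag = user_doc["tags"][0]
```

A collection is then simply a set of such documents, analogous to the rows of a table but without a fixed schema.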
Installation and Connection (omitted)
Basic operations and CRUD (omitted)
Spring Data MongoDB operations
- Add the dependency
- Configure the URI
- Prepare the domain document classes
- Use MongoRepository and MongoTemplate to implement CRUD operations, similar to Redis, Elasticsearch, etc.
Persistence
MongoDB has supported journaling (a redo log used for failover and persistence) since version 1.8.
How MongoDB journaling works
When a write is performed, MongoDB creates a journal entry containing the exact disk location and the changed bytes. If the server crashes suddenly, at startup the journal replays any writes that were not flushed to disk before the crash.
- Batched commits

By default, MongoDB writes journal entries every 100ms, several megabytes at a time. This means MongoDB commits changes in bulk: individual writes are not flushed to disk immediately, but by default you can lose at most the last 100ms of writes in a crash. For some applications this guarantee is not strong enough, so there are several ways to obtain stronger durability. You can confirm that a write has been journaled by passing the j option to getLastError: getLastError waits for the preceding write to reach the journal, and journaling then waits only 30ms (instead of 100ms) before the next journal commit:

```
> db.foo.insert({"x" : 1})
> db.runCommand({"getLastError" : 1, "j" : true})
> // the {"x" : 1} write is now safely journaled on disk
```

Note that using "j" : true on every write limits you to a rate of roughly 33 writes per second: (1 write / 30ms) × (1000ms / second) ≈ 33.3 writes/second. It generally does not take that long to flush writes to disk, so you will see better write performance if you let MongoDB batch most writes instead of committing them one by one. Reserve this option for important writes: if there are 50 important writes, use "normal" getLastError without the j option for the first 49 and "j" : true only on the last one; if it succeeds, all 50 writes have been flushed safely to disk. With multiple write connections, the j option can be used in parallel to increase throughput, even though latency is higher.

- Setting the commit interval

Another way to reduce the intrusiveness of journaling is to shorten or lengthen the interval between journal commits. Run the setParameter command to set journalCommitInterval to a value between 2ms and 500ms:

```
> db.adminCommand({"setParameter" : 1, "journalCommitInterval" : 10})
```

This option can also be set at startup with --journalCommitInterval. Whatever the interval, calling getLastError with "j" : true reduces it to one third.
If clients try to write faster than the journal can be flushed, mongod blocks writes until the journal finishes writing to disk.
Getting back to normal after a crash
- Replacing the data files

This is the best option: delete everything in the data directory, then restore from a backup, take a snapshot from a clean member, or resync from a replica set to obtain a fresh copy of the data. If there is a replica set and the data volume is small, resyncing may be the best choice: stop the member, delete its data directory files, and restart it.

- Repairing the data files

If there is no backup, no copy, and no replica set, do whatever you can to recover as much of the data as possible (which is why keeping usable backups matters; repair is a last resort). In this case you need the repair command, which discards any damaged data. mongod ships with two repair tools: the one built into mongod and the one built into mongodump. The mongodump repair may recover more data, but it takes much longer; also, after a mongodump repair you still need to restore the data before it can be used. You should therefore decide how much recovery time is acceptable. To use mongod's built-in repair, run mongod with the --repair option:

```
# mongod --dbpath /path/to/corrupt/data --repair
```

While repairing, mongod does not listen on port 27017, but you can watch the logs to see what it is doing. Note that the repair process needs a large amount of disk space: the free space must exceed the data size, so 80GB of data requires 80GB of free space. If the current disk is too small, use the --repairpath option to point at a newly mounted disk:

```
# mongod --dbpath /path/to/corrupt/data --repair --repairpath /media/external-hd/data/db
```

This does no harm to the original files: all repair output is written to new files, and the originals are not touched until the very end. To repair with mongodump, run:

```
# mongodump --repair
```

- The mongod.lock file

The mongod.lock file is a special file in the MongoDB data directory. It matters most when running with journaling disabled.
When mongod shuts down cleanly, the mongod.lock file is cleared, so the next startup knows the previous shutdown was clean. Conversely, if the lock file is not cleared, mongod was not shut down properly. If mongod detects an unclean shutdown, it refuses to start and requires you to repair or restore the data. Some people have discovered that this check can be bypassed by deleting the lock file; please do not do that. Deleting the lock file at startup means you neither know nor care whether your data has been corrupted. In every such case, respect the lock file: if it prevents mongod from starting, repair your data instead of deleting the file. Another important reason not to delete the lock file is that you may not even have noticed a hard crash. When a server restarts, the init script should stop mongod before the machine goes down; init scripts first try to shut the process down gracefully, and kill it only if that fails. On a busy system, mongod may take longer to shut down than the init script is willing to wait, resulting in a rough hard shutdown.
Hardware failures
Replication, checkpoints, and single points of failure (omitted)
Questions (very important)
1. What is a NoSQL database?
NoSQL stands for "Not Only SQL": a class of non-relational databases.
2. What is the difference between NoSQL and an RDBMS?
- An RDBMS is a relational database that uses structured data (tables with fixed schemas) and SQL.
- NoSQL databases typically store data as key-value pairs, documents, and other non-relational structures.
3. When should I use (and not use) a NoSQL database?
- Consider a NoSQL database for:
- Structured/semi-structured big data
- Horizontal scaling requirements
- Dynamically growing data items
- Consider a relational database for:
- Database maturity
- Business intelligence (BI)
- Professional tooling and support
- Conclusion:
- For information tables (for example, product information), a NoSQL database works well, because NoSQL tends to provide flexible data types.
- For transaction tables, use a relational database, because transactional guarantees are usually required.
- For BI analysis, use a relational database, because the surrounding tooling is more complete.
4. What are non-relational databases?
Common ones include Redis, MongoDB, Elasticsearch, etc.
5. What is MongoDB?
A non-relational, document-structured database; among non-relational databases, it is the one that most resembles a relational database.
6. What are the characteristics of MongoDB?
- Collection-oriented storage: good for storing objects and JSON data.
This is also the most fundamental difference between MongoDB and MySQL.
- Dynamic query: MongoDB supports rich query expressions. Query instructions use JSON-style tags to easily query objects and arrays embedded in documents.
- Full index support: includes document embedded objects and arrays. Mongo’s query optimizer analyzes query expressions and generates an efficient query plan.
- Query monitoring: MongoDB includes a monitoring tool to analyze the performance of database operations.
- Replication and automatic failover: MongoDB supports data replication between servers in master-slave mode. The primary goal of replication is to provide redundancy and automatic failover.
- Efficient binary storage: supports binary data and large objects (such as photos or images).
- Automatic sharding to support cloud-level scalability: Automatic sharding enables horizontal database clusters with the ability to dynamically add additional machines.
7. Terms and concepts?
MySQL | MongoDB |
---|---|
Database | Database |
Table | Collection |
Row | Document |
Column | Field |
Joins | Embedded documents or links |
- A MongoDB database can be regarded as an electronic filing cabinet: users can add, search, update, and delete the data in it. A database is a container for collections, and each database has an associated physical file in the file system.
- A MongoDB collection is a group of MongoDB documents. It is equivalent to a table in a relational database (RDBMS). A collection exists within a single database. Documents within a collection can have different fields. In general, the documents in a collection serve the same or a related purpose.
- A MongoDB document consists of a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection need not have the same fields and structure. Each record in a relational-database table is equivalent to a document in MongoDB.
In practice, however, the document fields within one collection are usually uniform.
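The "dynamic schema" point above can be sketched in a few lines of Python, modeling a collection as a list of dicts. The documents and field names are made up for this example; no server is involved.

```python
# Documents in one collection need not share a schema ("dynamic schema").
# Here a collection is modeled as a plain Python list of dicts.
collection = [
    {"_id": 1, "name": "alice", "age": 30},
    {"_id": 2, "name": "bob", "email": "bob@example.com"},  # no "age" field
]

# Each document carries its own set of fields.
field_sets = [set(doc) for doc in collection]
```

A relational table would reject the second row for missing `age` and having an extra column; a MongoDB collection accepts both documents as-is.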
8. Feature comparison between MySQL and MongoDB?
Feature | MySQL | MongoDB |
---|---|---|
Rich data model | No | Yes |
Dynamic schema | No | Yes |
Typed data | Yes | Yes |
Data locality | No | Yes |
Field updates | Yes | Yes |
Ease of programming | No | Yes |
Complex transactions | Yes | No |
Auditing | Yes | Yes |
Auto-sharding | No | Yes |
- Note that this is an older comparison.
- Since MySQL 5.7, a JSON data type is also available.
- MongoDB 4.0 also provides transactions.
- Ease of programming is a matter of taste, but MongoDB is fine on that front.
9. What storage engines does MongoDB have?
- WiredTiger Storage Engine default
- In-Memory Storage Engine
- MMAPv1 Storage Engine (Deprecated as of MongoDB 4.0)
For a comparison, see the MongoDB Storage Engine Selection article.
- In production, the WiredTiger storage engine is almost always used, because its performance is superior to the MMAPv1 storage engine.
- If you really need an in-memory storage engine, Redis may be a better choice.
10. What data types does MongoDB support?
- String
- Integer
- Double
- Boolean
- Object
- ObjectId
- Arrays
- Min/Max Keys
- Datetime
- Code
- Regular Expression
- … , etc.
11. Why use “Code” in MongoDB?
The “Code” type is used to store JavaScript Code in a document. We do not use this data type in most business scenarios.
12. Why use "Regular Expression" in MongoDB?
The "Regular Expression" type is used to store regular expressions in a document.
13. Why use the ObjectId data type in MongoDB?
The ObjectId data type stores document IDs. For the components of an ObjectId, see "MongoDB In Depth: ObjectId".
- In addition, ObjectId is a naturally distributed primary-key implementation, so it remains usable even after MongoDB sharding. Of course, for most business scenarios we still want auto-increment IDs; see "Implementing Auto-Increment Primary Key IDs in MongoDB in Java".
- Another approach is to use SnowFlake-style ID generators, etc.
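The "naturally distributed" property comes from the ObjectId layout: a 12-byte value whose first 4 bytes are a big-endian Unix timestamp (the rest is a random value plus a counter), so IDs generated on different machines do not collide and still sort roughly by creation time. A minimal Python sketch extracting that embedded timestamp (the sample hex string is illustrative):

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex: str) -> datetime:
    """Extract the creation time embedded in a 12-byte ObjectId.

    The first 4 bytes (8 hex characters) of an ObjectId are a
    big-endian Unix timestamp in seconds.
    """
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp("507f1f77bcf86cd799439011")
```

This is also why sorting on `_id` approximates sorting by insertion time.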
14. How to understand the GridFS mechanism in MongoDB and why MongoDB uses GridFS to store files?
GridFS is a specification for storing large files in MongoDB. GridFS splits a large file into smaller documents, which lets us store large files efficiently and removes the size limits on BSON objects. Of course, in a production environment you are advised to use a dedicated file server, such as FastDFS or TFS, rather than MongoDB GridFS.
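The splitting step can be sketched in pure Python. This mimics how GridFS breaks a file into numbered chunk documents (255 KB is the default chunk size in current MongoDB versions; the 600 KB "file" is a made-up example):

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KB)

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into numbered chunk documents, GridFS-style."""
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]

blob = b"x" * (600 * 1024)            # a 600 KB "file"
chunks = split_into_chunks(blob)      # 255 KB + 255 KB + 90 KB
```

In real GridFS the chunks land in an `fs.chunks` collection, with file metadata in `fs.files`; reading a file back means concatenating its chunks in `n` order.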
15. Does MongoDB support stored procedures? If so, how to use it?
MongoDB supports stored procedures, which are written in JavaScript and saved in the db.system.js collection. As with MySQL, though, stored procedures are rarely used in real-world scenarios.
16. Why did MongoDB choose a B-tree index?
We have seen that MySQL uses a B+Tree index.
- B+Tree internal nodes do not store data; all data lives in the leaf nodes, so query time complexity is a stable O(log n).
- B-Tree query time complexity varies with the key's position in the tree; the best case is O(1).
We know that minimizing disk I/O is an effective way to improve performance. MongoDB is an aggregated (document) database, and a B-Tree conveniently keeps the key and data fields together. As for why MongoDB uses a B-Tree rather than a B+Tree, consider its design goals: it is not a traditional relational database but a NoSQL store of JSON-formatted documents, aiming for high performance, high availability, and easy scaling.
- MySQL uses a B+Tree, so all data is in the leaf nodes and every query must descend to a leaf. MongoDB uses a B-Tree, where every node has a data field, so a lookup can stop as soon as it finds the indexed key. On average, a single lookup is therefore faster than in MySQL.
For more details, see the article "Why MongoDB uses B-tree indexes and MySQL uses B+Tree indexes". Of course, the author (Nai-Nai) also feels this answer may be a bit far-fetched; the point of listing this question is to make clear that MongoDB does not use a B+Tree for its indexes.
17. With a compound index built as A:{B,C}, will queries in the form A:{B,C} and A:{C,B} both use the index?
Because MongoDB uses B-tree indexes, this works essentially the same as MySQL's B+Tree indexes.
- A query on A:{B,C} can use the full index.
- A query on A:{C,B} uses only part of the index: only the A part.
18. What are MongoDB aggregation operations?
Aggregation operations process data records and return computed results. They can combine values from multiple documents, perform various operations on grouped data, and return a single result. This is equivalent to COUNT(*) combined with GROUP BY in SQL. For aggregation in MongoDB, use the aggregate method; for usage details, see "MongoDB Documentation: Aggregation".
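To connect the SQL and MongoDB sides of that equivalence, here is a sketch using pymongo-style pipeline dicts plus a tiny in-memory simulation of the `$group` stage. The `orders`/`status` names and sample data are made up; no server is involved:

```python
from collections import Counter

# The SQL query
#   SELECT status, COUNT(*) FROM orders GROUP BY status
# corresponds to this aggregation pipeline (pymongo-style dicts):
pipeline = [
    {"$group": {"_id": "$status", "count": {"$sum": 1}}},
]

# A minimal in-memory simulation of that $group stage:
orders = [
    {"status": "shipped"},
    {"status": "pending"},
    {"status": "shipped"},
]
grouped = Counter(doc["status"] for doc in orders)
```

With a real connection, the same pipeline would run as `db.orders.aggregate(pipeline)` and yield one result document per distinct status.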
19. How does MongoDB achieve high availability?
Like MySQL, MongoDB provides its own replication solutions, which form the foundation of high availability. MongoDB currently supports two replication modes:
- Master/Slave: master/slave replication; roles include Master and Slave.
- Replica Set: replica-set replication; roles include Primary, Secondary, and Arbiter.
In production, only Replica Set replication is used to achieve MongoDB high availability, mainly because it provides good automatic failover. For details, see "MongoDB Cluster Construction and Use".
20. What is Primary?
The Primary is the node/member in the current replica set that handles all write operations. In a replica-set cluster, when failover occurs, one of the Secondary members becomes the new Primary.
21. What is Secondary?
A Secondary replicates operations from the current Primary. It does this by tailing and replaying the Primary's oplog (local.oplog.rs).
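To make "tailing the oplog" concrete, here is a sketch of one oplog entry and its application to an in-memory copy of the data. The field names (`ts`, `op`, `ns`, `o`) match the real oplog format, but the values are simplified (a real `ts` is a BSON Timestamp, not a plain int), and the sample namespace/document are made up:

```python
# Secondaries tail the primary's oplog (local.oplog.rs) and re-apply
# each entry in order.
oplog_entry = {
    "ts": 1700000000,     # logical time of the operation (simplified)
    "op": "i",            # "i" = insert, "u" = update, "d" = delete
    "ns": "shop.orders",  # namespace: <database>.<collection>
    "o": {"_id": 42, "item": "book"},  # the document to insert
}

def apply_entry(store, entry):
    """Apply one oplog entry to an in-memory copy of the data."""
    coll = store.setdefault(entry["ns"], {})
    if entry["op"] == "i":
        coll[entry["o"]["_id"]] = entry["o"]

secondary = {}
apply_entry(secondary, oplog_entry)
```

Because entries are applied in oplog order, a Secondary converges to the same state as the Primary, which is exactly what failover relies on.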
22. How does MongoDB implement read/write separation?
In the application, we can configure reads to go to Secondary nodes to achieve read/write separation. For example, nightly T+1 statistics jobs can read their data from a Secondary node. For details, see the article "Read/Write Separation in MongoDB Using Replica Sets".
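In driver-based setups this is usually configured through the connection string's `readPreference` option rather than in application logic. A sketch (the host names and replica-set name are placeholders; the URI is parsed with the standard library just to show its parts):

```python
from urllib.parse import urlparse, parse_qs

# secondaryPreferred: read from a Secondary when one is available,
# fall back to the Primary otherwise.
uri = (
    "mongodb://node1:27017,node2:27017,node3:27017/"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
)

params = parse_qs(urlparse(uri).query)
```

Passing such a URI to a driver (for example `pymongo.MongoClient(uri)`) routes reads accordingly; writes always go to the Primary.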
23. What is a MongoDB delayed node?
See the article "MongoDB Delayed Replication Node Configuration". In addition, MongoDB's delayed nodes do not serve external requests, so running full backups on them is also a very good choice.
24. How does MongoDB implement sharding?
MongoDB sharding horizontally partitions data across different physical nodes. As an application grows, so does its data volume; eventually a single machine may not be able to store the data or provide acceptable read/write throughput. Sharding lets us add more machines to cope with growing data volume and read/write load.
Alternatively, we can think of MongoDB sharding as a built-in sub-database/sub-table capability.
For details, see MongoDB Sharding.
25. Should I start a Sharded or non-sharded MongoDB environment?
For ease of development, we recommend starting with an unsharded MongoDB environment, unless a single server is not enough to hold your initial data set. Upgrading from unsharded to sharded is seamless, so there is no need to plan for sharding while your data set is still small. In addition, MongoDB sharding brings corresponding operational complexity, so do not adopt it prematurely while a replica set can still support the current business.
26. How do Shard and replication work?
Each shard is a logical collection of partitioned data. A shard may consist of a single server or a cluster, and we recommend a cluster for each shard.
27. When does data spread across multiple shards?
MongoDB sharding is range-based, so all the documents in a collection start out in a single chunk. Data is only distributed across shards once there is more than one chunk. Currently, the default chunk size is 64MB, so at least 64MB of data is needed before a migration will occur.
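Range-based sharding means each chunk owns a half-open range of shard-key values. A minimal sketch with hypothetical numeric split points (real split points are chosen by the balancer from actual key distribution):

```python
import bisect

# Boundaries between chunks on a numeric shard key:
# chunk 0: (-inf, 100), chunk 1: [100, 200),
# chunk 2: [200, 300), chunk 3: [300, +inf)
split_points = [100, 200, 300]

def chunk_for(key):
    """Return the index of the chunk whose range contains `key`."""
    return bisect.bisect_right(split_points, key)

low, middle, high = chunk_for(50), chunk_for(150), chunk_for(300)
```

When one chunk grows past the configured size, it is split at a new boundary, and the balancer migrates whole chunks between shards to even out the load.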
28. What happens when you update documents on a Chunk that is being migrated?
Updates occur immediately on the old Chunk, and changes are copied to the new shard before ownership is transferred.
29. What happens if a Shard is stopped or slow and a query is launched?
If a shard stops, the query will return an error unless the Partial option is set. If a shard is slow to respond, MongoDB waits for its response.
30. Can I delete the old files in the moveChunk directory?
No problem. These files are temporary files created during shard balancing. Once those operations complete, the associated temporary files should be deleted; at the moment, however, cleanup is manual, so consider carefully whether to delete them to reclaim the space.
31. If the moveChunk fails, do I need to manually remove some of the transferred documents?
No, moves are consistent and deterministic.
- After a failure, the move operation is constantly retried.
- When finished, the data will only appear in the new shard.
32. Why use the profiler in MongoDB?
The database profiler collects information about the commands executed against a running mongod instance.
- This includes CRUD commands as well as configuration and administration commands.
- The profiler writes all collected data into the system.profile collection, a capped collection in each profiled database.
The profiler is off by default; you can enable it per database or per instance.
33. What about MongoDB backups?
Similar to MySQL backup methods and policies, MongoDB also requires periodic full backup and periodic incremental backup. For details, see MongoDB Incremental Backup Solution and MongoDB Incremental Backup Scripts and Principles.
34. Does journal replay have problems when an entry is incomplete (for example, if the last entry was only half-written)?
No. Each journal (group) write is consistent and will not be replayed during recovery unless it is complete.
35. Does an update operation fsync to disk immediately?
No. Disk writes are lazy by default. A write may reach disk a couple of seconds later (within 60 seconds by default).
- For example, if the database receives a thousand increment operations on one object within a second, the data is flushed to disk only once.
- You can configure this with the syncPeriodSecs startup parameter, which sets the interval at which mongod fsyncs data to disk. The default is 60 (seconds), and it is strongly recommended not to modify this value.
mongod writes changes to the journal before applying them in memory, and periodically flushes memory to disk; deferring disk writes in this way effectively improves disk efficiency.
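The flush and journal intervals discussed above map to mongod configuration options. A minimal sketch of the relevant section of a YAML `mongod.conf`, with the default values mentioned in this section:

```yaml
# mongod.conf: storage and journaling settings discussed above
storage:
  syncPeriodSecs: 60        # fsync data files every 60s (do not change)
  journal:
    enabled: true
    commitIntervalMs: 100   # journal commit interval (range 2-500 ms)
```

`commitIntervalMs` is the configuration-file counterpart of the `--journalCommitInterval` startup option mentioned earlier.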
36. Why is my data file so large?
MongoDB aggressively preallocates reserved space to prevent file-system fragmentation.
37. How do I handle slow MongoDB queries?
- Locate the problem
- Fix it
- New problems appear
- Keep troubleshooting
**Hand it over to operations!!**
Operations
Architecture principles, cluster setup, backup and recovery, distributed reads and writes, monitoring, etc.