Summary: MongoDB 5.0 marks the beginning of a new release cycle that delivers new features to users faster. Versioned APIs combined with online resharding free users from worrying about future database upgrades and business changes; the native time series platform enables MongoDB to support a wider range of workloads and business scenarios; and the new MongoDB Shell improves the user experience. This article focuses on the new features of MongoDB 5.0.
Native time series platform
MongoDB 5.0 makes it faster and cheaper to build and run time series applications by natively supporting the entire time series data lifecycle: collection, storage, query, real-time analysis and visualization, through to online archiving or automatic expiration as the data ages. With the release of MongoDB 5.0, MongoDB extends its general-purpose application data platform, making it easier for developers to process time series data and further expanding its use in the Internet of Things, financial analytics, logistics, and more.
MongoDB’s time series collection automatically stores time series data in a highly optimized and compressed format, reducing storage size and I/O for better performance and larger scale. It also shortens the development cycle, enabling you to quickly build a model tuned for the performance and analysis needs of time series applications.
Example of the command to create a time series collection:
db.createCollection("collection_name", { timeseries: { timeField: "timestamp" } })
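Beyond timeField, a time series collection can also carry metadata and expiration settings. The sketch below shows a richer options object as it would be passed to db.createCollection() in mongosh; the collection name "weather" and the field names ("ts", "sensorId") are illustrative assumptions, not from the original text.

```javascript
// Options for a hypothetical "weather" time series collection, written as a
// plain object so its shape is easy to inspect before passing to createCollection().
const timeseriesOptions = {
  timeseries: {
    timeField: "ts",         // required: field holding the measurement timestamp
    metaField: "sensorId",   // optional: metadata used to group measurements
    granularity: "minutes"   // bucketing hint: "seconds" | "minutes" | "hours"
  },
  // online archiving / automatic expiration: drop measurements older than ~30 days
  expireAfterSeconds: 60 * 60 * 24 * 30
};

// In mongosh: db.createCollection("weather", timeseriesOptions)
```

The expireAfterSeconds option is what implements the "automatic expiration as the data ages" part of the lifecycle described above.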
MongoDB can seamlessly adapt to the ingestion frequency and automatically handle out-of-order measurements based on dynamically generated time partitions. The MongoDB Connector for Apache Kafka supports time series natively: you can create time series collections directly from Kafka topic messages, processing and aggregating the data as needed before writing it to a MongoDB time series collection.
A time series collection automatically creates an index sorted by time to reduce query latency. The MongoDB Query API also adds window functions, so you can run analytical queries such as moving averages and cumulative sums. In relational database systems these are often called SQL analytic functions, and they support windows defined in units of rows (for example, a three-row moving average). MongoDB goes a step further with powerful time series functions such as exponential moving averages, derivatives, and integrals, letting you define windows in units of time (such as a 15-minute moving average). Window functions can be used to query both time series and regular collections, providing a new way to analyze data for many types of applications. In addition, MongoDB 5.0 provides new temporal operators, including $dateAdd, $dateSubtract, $dateDiff, and $dateTrunc, which let you summarize and query data over custom time windows.
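As an illustration of a time-based window, the following sketch builds the aggregation pipeline for a 15-minute moving average using the $setWindowFields stage introduced in MongoDB 5.0. It is written as the plain pipeline array that aggregate() accepts in mongosh; the collection fields (sensorId, ts, temp) are assumptions made for the example.

```javascript
// $setWindowFields (new in MongoDB 5.0) computing a moving average over a
// time-based window: the 15 minutes preceding each document, per sensor.
const pipeline = [
  {
    $setWindowFields: {
      partitionBy: "$sensorId",   // independent window per sensor
      sortBy: { ts: 1 },          // order measurements by timestamp
      output: {
        movingAvgTemp: {
          $avg: "$temp",
          window: { range: [-15, 0], unit: "minute" }  // time-based window
        }
      }
    }
  },
  // $dateTrunc (also new in 5.0) then buckets each result into its hour
  { $set: { hour: { $dateTrunc: { date: "$ts", unit: "hour" } } } }
];

// In mongosh: db.weather.aggregate(pipeline)
```

A row-based window (such as the three-row moving average mentioned above) would instead use `window: { documents: [-2, 0] }` in place of the range/unit pair.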
You can combine MongoDB’s time series data with other enterprise data. Time series collections can be placed alongside regular MongoDB collections in the same database, and you don’t have to choose a dedicated time series database (which cannot serve any other type of application), nor do you need complex integrations to mix time series and other data. MongoDB eliminates the cost and complexity of integrating and running multiple disparate databases by providing a unified platform that allows you to build high-performance and efficient time series applications while also supporting other use cases or workloads.
Online data resharding
MongoDB before 5.0: the resharding process was complicated and had to be done manually, using one of two methods.

- Method 1: Dump the entire collection, then reload the data into a new collection with the new shard key. Because this is an offline process, the application must stay out of service until the reload completes; for example, dumping and reloading a collection larger than 10 TB on a three-shard cluster can take several days.
- Method 2: Create a new sharded cluster and set the new shard key on the collection, then use a custom migration to write the data from the old cluster into the new cluster according to the new shard key. During this process you have to handle query routing and migration logic yourself and constantly check the migration progress to ensure that all data is migrated successfully. Such custom migrations are highly complex, labor-intensive, risky, and time-consuming; for example, one MongoDB user spent three months migrating 10 billion documents.

MongoDB 5.0: run the reshardCollection command to start resharding.

- The resharding process is efficient. Rather than simply rebalancing the data, it copies all the data of the current collection in the background and rewrites it to a new collection, while keeping in sync with new writes from the application.
- Resharding is fully automated, reducing the time required from weeks or months to minutes or hours and eliminating tedious manual data migration.
- With online resharding, you can easily evaluate the effect of different shard keys in a development or test environment, and change the shard key again if needed.
You can change the shard key of a collection as needed while the business is running (and the data is growing), without database downtime or complex in-collection migrations. You simply run the reshardCollection command in the MongoDB Shell, select the database and collection you want to reshard, and specify the new shard key.
db.adminCommand( { reshardCollection: "<database>.<collection>", key: <shardkey> } )
Parameter description:
- <database>: the name of the database containing the collection to be resharded.
- <collection>: the name of the collection to be resharded.
- <shardkey>: the new shard key.
When you call the reshardCollection command, MongoDB clones the existing collection and then applies all oplog entries from the existing collection to the new collection. Once all oplog entries have been applied, MongoDB automatically switches over to the new collection and deletes the old collection in the background.
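Putting the pieces together, a complete resharding invocation might look like the sketch below; the database, collection, and compound key names ("sales.orders", region, orderId) are hypothetical examples, not from the original text.

```javascript
// Full reshardCollection command document, as passed to db.adminCommand()
// in mongosh; "sales.orders" and the key fields are illustrative.
const reshardCmd = {
  reshardCollection: "sales.orders",   // "<database>.<collection>"
  key: { region: 1, orderId: 1 }       // the new shard key
};

// In mongosh: db.adminCommand(reshardCmd)
```

While the command runs, MongoDB performs the clone-and-apply-oplog process described above in the background, so the application keeps reading and writing the collection throughout.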
Versioning API
- Application compatibility: Starting with MongoDB 5.0, the Versioned API defines a set of the commands and parameters most commonly used by applications (these commands do not change across annual major releases or quarterly rapid releases of the database). By decoupling the application life cycle from the database life cycle, you can pin your driver to a specific version of the MongoDB API, and your application can keep running for years without code changes, even as the database is upgraded and improved.
- Flexibility: The Versioned API gives MongoDB the flexibility to add new features and improvements to the database in each release, in a way that keeps new versions compatible with earlier ones. When an API needs to change, a new API version can be added and run on the same server alongside the existing version. As MongoDB's release cadence accelerates, the Versioned API makes it easier and faster to adopt the features of the latest MongoDB release.
Default majority Write Concern level
Starting with MongoDB 5.0, the default Write Concern level is majority: a write operation is committed and acknowledged as successful only after it has been applied on the Primary node and persisted to the journals of a majority of replica set members, providing a stronger out-of-the-box data durability guarantee.
Note: Write Concern is fully tunable. You can customize it to balance your application's database performance requirements against data durability.
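For instance, a write that must reach the on-disk journal of a majority of members, with a bounded acknowledgement wait, could be issued with options like the sketch below; the collection name and values are illustrative assumptions.

```javascript
// Per-operation Write Concern options, as accepted by insertOne() and other
// write methods in mongosh. w: "majority" is the 5.0 default, shown explicitly.
const writeOptions = {
  writeConcern: {
    w: "majority",   // acknowledged by a majority of replica set members
    j: true,         // also require the write to hit the on-disk journal
    wtimeout: 5000   // give up waiting for acknowledgement after 5 seconds
  }
};

// In mongosh: db.orders.insertOne({ item: "abc", qty: 1 }, writeOptions)
```

Lowering w (for example to 1) trades durability for lower write latency, which is the tuning knob the note above refers to.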
Connection management optimization
By default, each client connection corresponds to one thread on the MongoDB server (net.serviceExecutor set to synchronous). Creating, switching, and destroying threads are expensive operations, so when the number of connections is too high, threads consume a large share of the MongoDB server's resources.
Connection storms occur when the number of connections surges or connections are created out of control. They can have many causes and often strike when the service is already under stress.
To address these situations, MongoDB 5.0 takes the following measures:
- Limiting the number of connections the driver tries to create at any one time is a simple and effective way to prevent database server overloads.
- Reduce the frequency of driver checks when monitoring connection pools, giving unresponsive or overloaded server nodes a chance to buffer and recover.
- The driver directs the workload to the fastest server with the healthiest connection pool, rather than randomly selecting from the available servers.
These measures, combined with improvements in the Mongos query routing layer in previous releases, further improve MongoDB’s ability to withstand high concurrent loads.
Long-running snapshot query
Long-running snapshot queries increase the versatility and flexibility of applications. With this feature, a query can run for up to 5 minutes by default (the duration is adjustable) against a consistent snapshot of the data, maintaining snapshot isolation with respect to the live transactional database. Snapshot queries can also run on Secondary nodes, which lets you run different workloads in a single cluster and scale them out across different shards.
MongoDB implements long-running snapshot queries through the Durable History project in the underlying storage engine, which has been in place since MongoDB 4.4. Durable History stores a snapshot of all field values that have changed since a query started. By maintaining snapshot isolation, Durable History helps reduce storage engine cache pressure even while data is changing, achieving higher query throughput under heavy write loads.
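A long-running analytical read over a consistent snapshot could be issued as sketched below; the "orders" collection, its fields, and the pipeline are illustrative assumptions, and routing the read to a Secondary is shown via a read preference.

```javascript
// Aggregation options requesting snapshot read concern, so the query sees a
// single point-in-time view of the data even while writes continue.
const snapshotOptions = { readConcern: { level: "snapshot" } };

// Illustrative analytical pipeline over a hypothetical "orders" collection.
const pipeline = [
  { $group: { _id: "$region", total: { $sum: "$amount" } } }
];

// In mongosh, directed at a Secondary node:
//   db.getMongo().setReadPref("secondary")
//   db.orders.aggregate(pipeline, snapshotOptions)
```

Running such reads on Secondaries is what allows the analytical workload to be isolated from the transactional traffic on the Primary.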
The new MongoDB Shell
To provide a better user experience, MongoDB 5.0 redesigned the MongoDB Shell (mongosh) from the ground up, offering a more modern command-line experience along with usability enhancements and a powerful scripting environment. The new MongoDB Shell is now the default shell of the MongoDB platform. It introduces syntax highlighting, intelligent auto-completion, contextual help, and useful error messages to create an intuitive, interactive experience.
- Enhanced user experience
  - Easier to write queries and aggregations, and easier to read results. The new MongoDB Shell supports syntax highlighting to help you distinguish fields, values, and data types and avoid syntax errors. If errors still occur, the new MongoDB Shell points out the problem and tells you how to fix it.
  - Faster input of queries and commands. The new MongoDB Shell supports intelligent auto-completion: it offers completion options for methods, commands, MQL expressions, and so on, according to the version of MongoDB you are connected to. For example, when you don't remember the syntax of a command, you can quickly look it up from within the MongoDB Shell.
- Advanced scripting environment. The scripting environment of the new MongoDB Shell is built on top of the Node.js REPL (interactive interpreter), so you can use all Node.js APIs and any npm module in your scripts. You can also load and run scripts from the file system (as with the legacy MongoDB shell, you can continue to execute scripts using load and eval).
- Extensibility and plug-ins. The new MongoDB Shell is extensible, so you can use all of MongoDB's capabilities to increase productivity. It allows you to install snippet plug-ins, which are loaded automatically into the MongoDB Shell and can use all Node.js APIs and npm packages. MongoDB also maintains a snippets repository that offers some useful plug-ins (such as one that analyzes the schema of a specified collection), and you are free to configure the MongoDB Shell to use plug-ins of your choice.
Note: Plug-ins are currently an experimental feature of the MongoDB Shell.
PyMongoArrow and Data Science
With the release of the new PyMongoArrow API, you can run complex analytics and machine learning on MongoDB data in Python. PyMongoArrow quickly converts MongoDB query results into popular data formats (such as pandas DataFrames and NumPy arrays), helping to simplify your data science workflows.
Schema validation improvements
Schema validation is a way to apply data governance and control to MongoDB. In MongoDB 5.0, schema validation becomes simpler and friendlier: when an operation fails validation, a descriptive error message is generated that helps you understand which document did not comply with the collection validator's rules and why, so you can quickly identify and correct the offending code.
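As a concrete illustration, a collection validator is an object attached when the collection is created (or later via collMod); the sketch below uses $jsonSchema with a hypothetical "students" collection and field names.

```javascript
// $jsonSchema validator requiring a string "name" and an int "year" >= 2017;
// the collection and its fields are illustrative.
const validator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "year"],
    properties: {
      name: { bsonType: "string", description: "name must be a string" },
      year: { bsonType: "int", minimum: 2017, description: "year must be an int >= 2017" }
    }
  }
};

// In mongosh: db.createCollection("students", { validator })
```

With this in place, an insert that violates a rule is rejected, and in MongoDB 5.0 the error now describes which rule the document broke rather than returning a generic failure.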
Recoverable index creation task
MongoDB 5.0 supports automatically resuming an in-progress index build from where it left off after a node restarts, reducing the impact of planned maintenance on services. For example, when restarting or upgrading a database node, you no longer need to worry that an in-progress index build on a large collection will fail.
Version release adjustment
Because MongoDB supports many versions and platforms, every release has to be verified on more than 20 supported platforms. This verification workload is large and slows the delivery of new MongoDB features. Therefore, starting with MongoDB 5.0, releases are divided into Major Releases and Rapid Releases. Rapid Releases are available for download and evaluation as development releases, but are not recommended for production use.
This article is original content from Alibaba Cloud and may not be reproduced without permission.