Abstract:
Data and account preparation
First, upload the data to your MongoDB database. This example uses the Alibaba Cloud (Aliyun) MongoDB cloud database with the VPC network type (you must apply for a public IP address; otherwise, the instance cannot communicate with the default DataWorks resource group). The test data is as follows:
{
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99}
        ],
        "bicycle": {"color": "red", "price": 19.95}
    },
    "expensive": 10
}
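The source does not say how the test document was uploaded. One possible route, sketched below in Python, is to serialize it as a single line of JSON, which is the one-document-per-line layout that the mongoimport tool accepts by default. The helper name to_import_line is hypothetical and not part of any library.

```python
import json

# Test document from this example.
TEST_DOC = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh",
             "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien",
             "title": "The Lord of the Rings", "isbn": "0-395-19395-8",
             "price": 22.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    },
    "expensive": 10,
}


def to_import_line(doc: dict) -> str:
    """Serialize one document as compact, single-line JSON (hypothetical helper)."""
    return json.dumps(doc, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    # Print the line; redirect it to a file of your choice before importing.
    print(to_import_line(TEST_DOC))
```

The output could be saved to a file and then loaded with something like mongoimport --db admin --collection userlog --file <your file>, together with the host and credential options for your instance.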
Log in to the DMS console of MongoDB. In this example, the database is admin and the collection is userlog. You can run db.userlog.find().limit(10) in the query window to view the uploaded data, as shown in the following figure.
In addition, create a user in the database ahead of time so that DataWorks can use it when adding the data source. This example runs the command
db.createUser({user:"bookuser",pwd:"123456",roles:["root"]})
to create a user named bookuser with the password 123456 and the root role.
Extract data into MaxCompute using DataWorks
- Add MongoDB data source
Go to the DataWorks Data Integration console and add a data source of the MongoDB type.
The specific parameters are as follows. You can click the connectivity test to verify the data source. In this document, MongoDB is deployed in a VPC environment, so the data source type selected must be the one with a public IP address.
You can obtain the address and port number by clicking the instance name in the MongoDB management console, as shown below.
- Create a data synchronization task
Create a new node of the data synchronization type in DataWorks.
At the same time, create a table creation task in DataWorks to store the JSON data. In this example, the new table is named mqdata.
Table parameters can be completed through the graphical interface. In this example, the mqdata table has only one column, of type string, and the column name is mqdata.
After the table is created, you can set the data synchronization parameters on the GUI, as shown in the following figure. Select odps_first as the target data source and the mqdata table just created as the target table. Set the source data source type to MongoDB and select the mongodb_userlog data source we just created. After the configuration is complete, click Convert to Script to switch to script mode.
The following is an example of the script mode code.
{
    "type": "job",
    "steps": [
        {
            "stepType": "mongodb",
            "parameter": {
                "datasource": "mongodb_userlog", // Data source name
                "column": [
                    {
                        "name": "store.bicycle.color", // JSON field path; in this case, extract the color value
                        "type": "document.document.string" // The number of segments in type must match the number in name. If the JSON field you select is a top-level field, such as expensive in this example, simply fill in string.
                    }
                ],
                "collectionName": "userlog" // Collection name
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
                "partition": "",
                "isCompress": false,
                "truncate": true,
                "datasource": "odps_first",
                "column": [
                    "mqdata" // Column name of the MaxCompute table
                ],
                "emptyAsNull": false,
                "table": "mqdata"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": ""
        },
        "speed": {
            "concurrent": 2,
            "throttle": false,
            "dmu": 1
        }
    }
}
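To make the column setting more concrete, the following Python sketch mimics how a dotted name such as store.bicycle.color addresses a nested value in the test document, and how the matching type string carries one document segment per intermediate level. It only illustrates the naming rule from the script comments; it is not how the DataWorks MongoDB reader is implemented, and the helper names resolve_path and type_for_path are hypothetical.

```python
from typing import Any

# Trimmed copy of the test document (the book list is omitted for brevity).
DOC = {
    "store": {"bicycle": {"color": "red", "price": 19.95}},
    "expensive": 10,
}


def resolve_path(doc: dict, path: str) -> Any:
    """Walk a dotted field path through nested dictionaries."""
    value: Any = doc
    for key in path.split("."):
        value = value[key]  # raises KeyError if a level is missing
    return value


def type_for_path(path: str, leaf_type: str = "string") -> str:
    """Build a type string with as many segments as the path has."""
    depth = len(path.split("."))
    return ".".join(["document"] * (depth - 1) + [leaf_type])


if __name__ == "__main__":
    for name in ("store.bicycle.color", "expensive"):
        print(name, "->", resolve_path(DOC, name), "/", type_for_path(name))
```

With the test data, store.bicycle.color resolves to red and pairs with the type document.document.string, while the top-level field expensive pairs with the plain type string.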
After completing the above configuration, click Run. The following shows an example of a successful run log.
Results
Create a new ODPS SQL node in your business process. You can enter the
SELECT * from mqdata;
statement to view the data currently in the mqdata table. Of course, you can also run this command directly in the MaxCompute client.