First, upload the data to your MongoDB database. This example uses an Alibaba Cloud ApsaraDB for MongoDB instance with the network type set to VPC (you need to apply for a public IP address; otherwise the instance cannot communicate with the default DataWorks resource group). The test data is as follows:

```json
{
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99}
        ],
        "bicycle": {"color": "red", "price": 19.95}
    },
    "expensive": 10
}
```

Log in to the DMS console of the MongoDB instance. In this example, the database is admin and the collection is userlog. You can run db.userlog.find().limit(10) in the query window to view the uploaded data, as shown in the following figure.
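If you want to sanity-check the test document before uploading it (for example, that the quotes are plain ASCII and the nesting matches what the sync job will extract later), a minimal standalone Python sketch, using only the standard library and no MongoDB driver:

```python
import json

# The test document from this tutorial, written as plain Python/JSON.
doc = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh",
             "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien",
             "title": "The Lord of the Rings",
             "isbn": "0-395-19395-8", "price": 22.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    },
    "expensive": 10,
}

# Round-trip through JSON to confirm the document is valid JSON.
assert json.loads(json.dumps(doc)) == doc

# The nested field the sync job extracts later: store.bicycle.color
print(doc["store"]["bicycle"]["color"])  # -> red
```

This is only a local check; the actual upload still happens through the DMS console (or any MongoDB client).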

In addition, you need to create a database user ahead of time for DataWorks to use when adding the data source. In this example, run the following command to create a user named bookuser with password 123456 and the root role:

```javascript
db.createUser({user: "bookuser", pwd: "123456", roles: ["root"]})
```

Extract data into MaxCompute using DataWorks

  1. Add a MongoDB data source. Go to the DataWorks Data Integration console and add a MongoDB data source.

The specific parameters are shown below. You can click Test Connectivity to verify the data source. In this example, MongoDB is deployed in a VPC environment, so you must select the data source type that has a public IP address.

The access address and port number can be obtained by clicking the instance name on the MongoDB administration console, as shown in the following figure.


  2. Create a data synchronization task. Create a data synchronization node in DataWorks.

At the same time, create a table creation task in DataWorks to store the JSON data. In this example, the new table is named mqdata.

The table parameters can be set through the graphical interface. In this example, the mqdata table has a single column of type STRING, and the column name is mqdata.

After the configuration is complete, set the data synchronization parameters in the GUI, as shown in the following figure. Select odps_first as the target data source and the newly created mqdata table as the target table. Select the mongodb_userlog data source created earlier as the source. After the configuration is complete, click Convert to Script to switch to script mode.

The following is an example of the script mode code:

```json
{
    "type": "job",
    "steps": [
        {
            "stepType": "mongodb",
            "parameter": {
                "datasource": "mongodb_userlog",
                "column": [
                    {
                        "name": "store.bicycle.color", // JSON field path; in this example, extract the color value
                        "type": "document.document.string" // The number of levels in this field must match name. If the JSON field you select is a top-level field, simply fill in string.
                    }
                ],
                "collectionName": "userlog"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
                "partition": "",
                "isCompress": false,
                "truncate": true,
                "datasource": "odps_first",
                "column": [
                    "mqdata" // MaxCompute column name
                ],
                "emptyAsNull": false,
                "table": "mqdata"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    },
    "setting": {
        "errorLimit": {"record": ""},
        "speed": {"concurrent": 2, "throttle": false, "dmu": 1}
    }
}
```

After completing the above configuration, click Run. The following shows an example of a successful run log.
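To make the column configuration concrete: the reader walks the dotted name path through nested documents, and the type field must list one type per path segment (document for each intermediate level, then the leaf type). A small illustrative Python sketch of that rule, written for this article and not part of DataWorks itself:

```python
# Illustrative only: emulates how a dotted column "name" such as
# "store.bicycle.color", paired with "type" "document.document.string",
# is resolved against a MongoDB document.
def extract_column(doc, name, type_spec):
    path = name.split(".")
    types = type_spec.split(".")
    # One type segment per path segment: document...document, then the leaf type.
    if len(path) != len(types):
        raise ValueError("type must have the same number of segments as name")
    value = doc
    for key in path:
        value = value[key]
    return value

sample = {"store": {"bicycle": {"color": "red", "price": 19.95}},
          "expensive": 10}

print(extract_column(sample, "store.bicycle.color",
                     "document.document.string"))  # -> red
```

A top-level field such as expensive would use a single-segment type, e.g. extract_column(sample, "expensive", "int").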


Verify the results

Create a new ODPS SQL node in your business process.

You can enter the SELECT * FROM mqdata; statement to view the data in the mqdata table. You can also run the command directly from the MaxCompute client.


This article is original content from the Yunqi community (Alibaba Cloud) and may not be reproduced without permission.