Data Integration is a reliable, secure, low-cost, and scalable platform from Alibaba Group for synchronizing data across heterogeneous data storage systems. It provides offline (full and incremental) data access channels for more than 20 kinds of data sources in different network environments, and aims to move and synchronize data quickly and stably between heterogeneous data sources in complex network environments. For a walkthrough, see the Alibaba Cloud Data Integration platform tutorial.

Offline (batch) data synchronization introduction

The offline (batch) data channel uses data sources and data sets to define the source and destination of a transfer, and provides a set of abstract data extraction plug-ins (called Readers) and data writing plug-ins (called Writers). On top of this framework it defines a simplified intermediate data transmission format, which makes it possible to transfer data between arbitrary structured and semi-structured data sources.
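
The Reader/Writer split described above can be sketched roughly as follows. This is a minimal illustration, not Data Integration's actual plug-in interface: the class names, the `sync` helper, and the choice of a tuple of column values as the intermediate record are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical intermediate format: one record is one row of column values.
Record = Tuple[object, ...]


class Reader(ABC):
    """Extracts records from a source (hypothetical interface)."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class Writer(ABC):
    """Writes records to a destination (hypothetical interface)."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        ...


class DelimitedLinesReader(Reader):
    """Reads delimited text lines and emits one record per line."""

    def __init__(self, lines: Sequence[str], delimiter: str = ",") -> None:
        self.lines = lines
        self.delimiter = delimiter

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            yield tuple(line.rstrip("\n").split(self.delimiter))


class ListWriter(Writer):
    """Collects records in memory; stands in for a real destination."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.rows.append(tuple(record))
            count += 1
        return count


def sync(reader: Reader, writer: Writer) -> int:
    """Streams every record from the reader into the writer."""
    return writer.write(reader.read())


reader = DelimitedLinesReader(["1,alice", "2,bob"])
writer = ListWriter()
written = sync(reader, writer)
```

Because both sides only agree on the record shape, any reader in this sketch can be paired with any writer, which is the property the plug-in framework relies on.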

Supported data source types

Data integration provides rich data source support, as follows:

Text storage: FTP, SFTP, OSS, multimedia files, and so on.

Databases: RDS, DRDS, MySQL, PostgreSQL, and so on.

NoSQL: Memcache, Redis, MongoDB, HBase, and so on.

Big data: MaxCompute, AnalyticDB, HDFS, and so on.

MPP databases: HybridDB for MySQL and so on.

See Supporting Data Source Types for more details.

Note:

Because configuration details vary greatly from one data source to another, look up the parameters that apply to your use case. Detailed descriptions are provided on the data source configuration and job configuration pages.

Synchronization Development Instructions

Synchronization tasks can be developed in two modes: wizard mode and script mode.

Wizard mode: A visual interface guides you step by step through configuring a data synchronization task. Wizard mode is easy to learn, but some advanced features are not available in it.

Script mode: You write a JSON script that defines the data synchronization task. This mode suits advanced users and has a steeper learning curve, but it provides rich, flexible capabilities for fine-grained configuration management.
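
To give a feel for what a script-mode job might contain, the sketch below assembles a JSON document with one reader and one writer. The layout is loosely modeled on open-source DataX job files and is an assumption; the schema that Data Integration actually accepts, along with the plug-in names and parameter keys shown here, should be taken from the job configuration pages rather than from this example. The table and column names are placeholders.

```python
import json
from typing import Any, Dict


def build_job(reader_name: str, reader_params: Dict[str, Any],
              writer_name: str, writer_params: Dict[str, Any],
              channels: int = 1) -> Dict[str, Any]:
    """Builds a job description with one reader/writer pair.

    The key names ("job", "setting", "content", ...) are assumptions
    borrowed from DataX-style job files, not a confirmed schema.
    """
    return {
        "job": {
            "setting": {"speed": {"channel": channels}},
            "content": [
                {
                    "reader": {"name": reader_name, "parameter": reader_params},
                    "writer": {"name": writer_name, "parameter": writer_params},
                }
            ],
        }
    }


# Placeholder plug-in names, tables, and columns for illustration only.
job = build_job(
    "mysqlreader", {"table": "orders", "column": ["id", "amount"]},
    "odpswriter", {"table": "ods_orders", "column": ["id", "amount"]},
)
script = json.dumps(job, indent=2)
print(script)
```

Generating the script from code rather than writing it by hand keeps the reader and writer column lists easy to compare, which matters because both sides must describe the same logical row.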

Note:

Code generated in wizard mode can be converted to script mode. Because script mode capabilities are a superset of wizard mode, the conversion is one-way: after converting, you cannot revert to wizard mode.

Before writing the code, complete the data source configuration and create the target table.

Network Type Description

Network types include classic network, private network (VPC), and local IDC (in planning).

Classic network: Deployed uniformly on the public infrastructure network of Alibaba Cloud, which plans and manages the network. It suits customers who want a network that is easy to use.

Private network (VPC): An isolated network environment built on Alibaba Cloud. You fully control your own virtual network, including choosing the IP address range, dividing network segments, and configuring routing tables and gateways.

Local IDC network: The network environment of a data center that you build yourself, isolated from the Alibaba Cloud network.

For details about classic networks and private networks, see FAQs about classic networks and VPC.

Supplementary notes:

Connections over the public network are supported; select classic as the network type. Keep in mind public network bandwidth and the associated network costs, and avoid this option unless there is a specific need for it.

For data synchronization over network connections that are still in planning, you can use a newly added local running resource together with script mode. Alternatively, use the SHELL + DataX scheme; see Executing a DataX Task using a SHELL.

Private network (VPC): A VPC is an isolated network environment in which users can customize IP address ranges, network segments, and gateways. In a private network, Data Integration supports RDS-MySQL, RDS-SQL Server, and RDS-PostgreSQL without requiring you to purchase an ECS instance in the same network as the VPC; the system uses a reverse proxy to detect the data source automatically and ensure network connectivity. Support for other Alibaba Cloud databases such as PPAS, OceanBase, Redis, MongoDB, Memcache, TableStore, and HBase is planned. Until then, to configure a synchronization task for a non-RDS data source in a private network, you need to purchase an ECS instance in the same network so that connectivity is established through it.

Constraints and Limitations

Data Integration synchronizes only structured data (such as RDS and DRDS), semi-structured data, and unstructured data (such as OSS and TXT) that can be abstracted into structured data. In other words, it supports data that can be abstracted into logical two-dimensional tables. Completely unstructured data, such as an MP3 file stored in OSS, cannot currently be synchronized to MaxCompute; support is planned for later.

Supports data synchronization between data stores within a single region and, in some cases, across regions.

Some regions can transmit data to each other over the classic network, but this is not guaranteed. If the classic network is unavailable, you can use the public network instead.

Data Integration only performs data synchronization (transmission); it does not itself provide a way to consume the data stream.
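
The first constraint above, that data must be abstractable into a logical two-dimensional table, can be illustrated with a small sketch. It assumes a tab-delimited text file and a fixed, caller-supplied column list; the function name `to_table` and the pad-or-truncate policy for ragged lines are choices made for this example, not documented Data Integration behavior.

```python
from typing import List, Optional, Sequence


def to_table(lines: Sequence[str], columns: Sequence[str],
             delimiter: str = "\t") -> List[List[Optional[str]]]:
    """Turns delimited text lines into rows with a fixed number of columns.

    Short lines are padded with None and long lines are truncated, so
    every row has exactly len(columns) cells (an illustrative policy).
    """
    width = len(columns)
    table: List[List[Optional[str]]] = []
    for line in lines:
        fields: List[Optional[str]] = list(line.rstrip("\n").split(delimiter))
        # Pad to the expected width, then cut off any extra fields.
        row = (fields + [None] * width)[:width]
        table.append(row)
    return table


rows = to_table(["1\talice\t30", "2\tbob"], ["id", "name", "age"])
# rows == [["1", "alice", "30"], ["2", "bob", None]]
```

A text file that can be mapped this way behaves like a table for synchronization purposes, whereas a binary object such as an MP3 file has no comparable row and column structure.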
