Abstract:

Alibaba Cloud recently launched Data Lake Analytics, a serverless data analysis engine that brings analysis capability to storage services that do not have it.

From everyday shopping transactions to industrial manufacturing, social networks, media, and enterprise decision-making, big data has become one of the most important drivers of economic and social progress. Faced with exponential data growth, more and more enterprises choose storage services as their first choice for keeping data. In an era when everyone proclaims that data is king, data that is only stored and never analyzed delivers none of its value.

The need to embrace analytics is urgent

Many enterprise users choose Alibaba Cloud Object Storage Service (OSS) or Table Store to store their data. Both are low-cost, highly elastic storage platforms for massive data volumes, and cloud customers keep large amounts of data on them: log data, transaction records, monitoring data, and so on. Today, however, this data has no low-cost, flexible, and efficient way to be analyzed. OSS and Table Store not only hold a large amount of historical data, they also take in new data every day.

In the past, customers who needed to analyze OSS data had to import it temporarily into an analysis engine that was purchased or deployed in advance. After the analysis they deleted the data and released the engine's resources. The resulting pipeline was long, slow, and inconvenient, and it did nothing to save costs.

With a traditional MapReduce solution such as Hadoop, storage and computing are separated in principle. However, because resources are shared, both storage nodes and computing nodes must be deployed on purchased ECS instances or physical machines, so storage and computing cannot be expanded separately on demand. Traditional MPP databases, such as the open source Greenplum, couple storage and computing even more tightly and likewise cannot expand them separately on demand.

Data Lake Analytics, a serverless data analysis engine, addresses these pain points. Without any ETL step, customers can use standard SQL and their existing business intelligence (BI) and ETL tools to analyze and integrate data in Alibaba Cloud OSS and Table Store at very low cost and with high efficiency.

Data Lake Analytics has four major features to help analyze data

Data Lake Analytics gives heterogeneous data sources a unified analysis capability: it supports combined analysis across OSS and Table Store and can connect to more data sources. Serverless means that customers do not need to purchase or manage servers to use the analysis service, and upgrades are transparent to them. Data Lake Analytics provides elastic scaling based on ECS and ESS, so businesses can truly expand storage and computing resources on demand and pay for analysis according to usage. When no analysis is running, only storage is charged, which keeps the cost of the whole solution extremely low.

In addition, Data Lake Analytics supports the SQL:2003 standard and a rich set of built-in functions, and it can access OSS files or other data sources as if they were database tables. It supports standard JDBC/ODBC, which makes application integration easy. For interactive workloads, Data Lake Analytics delivers excellent query performance through an efficient intelligent optimizer, the new-generation analysis engine XIHE, and full integration of MPP and DAG technology, giving it genuinely interactive analysis capability.
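
To make "accessing files as if they were database tables" concrete, here is a minimal local sketch in Python. It does not connect to Data Lake Analytics or OSS: the inline CSV text stands in for a file in object storage, SQLite stands in for the SQL engine, and the table name `orders` and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file kept in object storage.
CSV_TEXT = """order_id,region,amount
1,east,120.5
2,west,80.0
3,east,40.0
"""


def load_csv_as_table(conn, csv_text):
    """Expose CSV rows as a SQL table (a local stand-in for an external table)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    # SQLite column affinity converts the numeric-looking CSV strings on insert.
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :region, :amount)", rows
    )


def total_by_region(conn):
    """Run a standard SQL aggregate over the file-backed table."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_csv_as_table(conn, CSV_TEXT)
print(total_by_region(conn))
```

The point of the sketch is only the shape of the workflow: once a file is mapped to a table, ordinary SQL applies. With a real service, a client would send the same kind of query through a JDBC/ODBC connection instead of loading the file locally.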

Break with tradition and apply to multiple scenarios

Scenario 1: Data extraction platform

A customer has a large amount of data on OSS, and every day its developers have to handle many ad hoc data extraction requests. The customer ended up building a cheap and scalable data extraction platform based on OSS and Data Lake Analytics. The overall scheme is as follows:

The customer generates data extraction SQL through a reporting tool. The reporting tool sends the SQL to Data Lake Analytics, which analyzes the OSS data directly and returns the results to the reporting tool. Storing 10 TB on OSS costs about 1,200 yuan per month, while Data Lake Analytics is charged entirely by query usage (currently free during the public beta), which makes for a fully self-service, extremely cheap, and scalable data extraction platform.

Scenario 2: Analyzing and quickly recovering cold database data

To reduce RDS costs, customers back up large amounts of historical data to OSS through DBS from time to time. Occasionally they need to run a small amount of analysis on this archived business data, which Data Lake Analytics handles easily. In addition, if a customer finds that business data in the online database must be corrected using the archived data on OSS, the traditional practice is to purchase a large RDS instance, restore the OSS backup to it, and then query the data needed for the correction. With Data Lake Analytics, the customer can query the OSS data directly to make the correction, which is far more convenient and greatly reduces cost. Data Lake Analytics also provides end-to-end secure data access: it supports secure role-based access to OSS and table-level user authorization to isolate user data.

Scenario 3: Energy battery data analysis platform

The customer has a large amount of battery data and needs to analyze it once a day to assess battery life and decide whether a battery should be scrapped early. The customer uploads battery data to OSS in batches and analyzes it once a day through Data Lake Analytics. OSS storage costs about 1,200 yuan per month for 10 TB on average. Data Lake Analytics charges by the volume of data queried (currently free during the public beta), so the cost is very advantageous.
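
The cost structure described here (a fixed storage fee plus a charge proportional to the data scanned by queries) can be sketched as a small calculation. The storage rate is derived from the article's figure of about 1,200 yuan per month for 10 TB. The per-TB query price is a hypothetical parameter, since the article gives no price beyond saying the service is free during the public beta, and the function name `monthly_cost` is illustrative only.

```python
# Storage rate derived from the article: about 1,200 yuan per month for 10 TB.
STORAGE_YUAN_PER_TB_MONTH = 1200 / 10


def monthly_cost(stored_tb, scanned_tb_per_day, price_per_scanned_tb, days=30):
    """Estimate monthly cost as storage fee plus usage-based query fee.

    price_per_scanned_tb is an assumed placeholder, not a published price.
    """
    storage_fee = stored_tb * STORAGE_YUAN_PER_TB_MONTH
    query_fee = scanned_tb_per_day * days * price_per_scanned_tb
    return storage_fee + query_fee


# During the public beta the query price is zero, so only storage is paid.
beta_cost = monthly_cost(stored_tb=10, scanned_tb_per_day=1, price_per_scanned_tb=0.0)
print(beta_cost)

# With a made-up price of 2 yuan per TB scanned, half a TB scanned per day.
example_cost = monthly_cost(stored_tb=10, scanned_tb_per_day=0.5, price_per_scanned_tb=2.0)
print(example_cost)
```

Because the query fee scales with the data actually scanned, a once-a-day batch analysis adds little on top of the storage fee, which is the advantage the scenario relies on.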



Today a great deal of business data on the cloud is stored in OSS, Table Store, and similar services, and it urgently needs analysis capability. Data Lake Analytics meets this need well. Small and medium-sized enterprises on the cloud can pair the cheapest storage with affordable, flexible analysis capability, which is what Alibaba Cloud Data Lake Analytics sets out to provide. The service is currently free during its public beta, and you are welcome to try it. To try the public beta on a PC, follow this link: click.aliyun.com/m/100000539…
