Abstract:

On November 14, Alibaba Cloud released the exclusive mode for Real-time Computing. In this mode, a user exclusively holds a set of physical resources that are isolated from other users in terms of network, disk, CPU, and memory. It is a major upgrade of Real-time Computing on top of the original shared mode. (Watch the Real-time Computing launch event: yq.aliyun.com/live/591)

1. UDX openness: In the shared mode of Real-time Computing, multiple users share one cluster of physical machines, and complete isolation at the network and disk level is not possible. For security reasons, the more flexible, lower-level UDX and DataStream APIs are therefore not available in shared mode, and some business needs cannot be met. Exclusive mode provides complete isolation at the network and physical machine levels, which makes it possible to open up lower-level APIs such as UDFs to meet your business requirements.

2. Rich hardware: As services become more diverse, so do the requirements on the underlying machines, such as the CPU-to-memory ratio, GPUs, FPGAs, and other hardware. An exclusive Real-time Computing cluster can fully reuse Alibaba Cloud's hardware-level optimizations and take care of hardware adaptation problems for you.

3. User isolation: In an exclusive cluster, you have a batch of computing resources to yourself, and the cluster can access your VPC over the network. This meets the need for a private network and dedicated resources, and the cluster can also connect to your IDC.

4. Richer functions: ETL in data lake scenarios: SQL+UDF makes ETL task development more convenient. Heterogeneous data source computing: data can be read from heterogeneous data sources for analysis. For example, you can read archived logs remotely from OSS, join them with high-risk IP addresses stored in HBase, and perform network attack analysis. Source and result tables are supported for 30+ data sources.
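
The network attack example in item 4 amounts to enriching a stream of log records with a lookup table. Below is a minimal Python sketch of that join. It assumes a hypothetical "ip path status" log format and uses an in-memory set as a stand-in for the HBase table of high-risk IP addresses; no OSS or HBase client is involved, and all names are illustrative only.

```python
from typing import Iterable, Iterator

# Hypothetical stand-in for the HBase table of high-risk IP addresses.
HIGH_RISK_IPS = {"203.0.113.7", "198.51.100.23"}


def parse_log_line(line: str) -> dict:
    """Parse an assumed 'ip path status' log line into a record."""
    ip, path, status = line.split()
    return {"ip": ip, "path": path, "status": int(status)}


def flag_suspicious(lines: Iterable[str], risky_ips: set) -> Iterator[dict]:
    """Join each log record against the risky-IP table and keep only matches."""
    for line in lines:
        record = parse_log_line(line)
        if record["ip"] in risky_ips:
            yield record


# Hypothetical archived log lines, as they might be read from object storage.
archived_logs = [
    "192.0.2.10 /index.html 200",
    "203.0.113.7 /admin/login 401",
    "198.51.100.23 /admin/login 401",
]

suspicious = list(flag_suspicious(archived_logs, HIGH_RISK_IPS))
```

In the product itself this logic would presumably be written as a SQL join between a source table and a dimension table; the sketch only shows the shape of the computation.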

Singles’ Day is a shopping spree and also a “big test” of Alibaba’s technology. It took only 2 minutes and 5 seconds for Tmall’s “Double 11” transaction volume to reach 10 billion yuan, and only 1 hour and 47 minutes to reach 100 billion yuan, more than seven hours faster than in 2017. This frenzy produced the largest Double 11 traffic peak in Alibaba’s history, and the processing capacity of Real-time Computing was equivalent to reading 1.2 million copies of the 2018 edition of the Xinhua Dictionary in one second.

In 2013, more than 10 million people flocked to Tmall in the first minute of the Double 11 shopping carnival. These figures were broadcast in near real time on the data screen at Taobao City in Hangzhou. Each number pulsing on the big screen came from close cooperation among dozens of systems within Alibaba Group. While products were being sold at top speed, these systems completed countless rounds of data collection, transmission, processing, calculation, and feedback to the page. It was also the debut of the technology now known as Alibaba Cloud Real-time Computing.

In Double 11 of 2018, real-time data processing, covering the real-time collection, distribution, and calculation of log data and transaction data, produced the results that were rendered in real time on the big screen of the live media broadcast. The stability of the whole pipeline was under great pressure. Real-time Computing met three world-class challenges: 1. Low latency: from the first transaction at midnight to the statistics shown on the media screen, the end-to-end delay was kept within 3 seconds; 2. High throughput: peak processing reached 1.72 billion records per second, and the overall performance of Real-time Computing increased by N times compared with the previous year; 3. High availability: the service ran all day without degradation or failure and carried all of the peak traffic.

Alibaba Cloud Real-time Computing is a one-stop, high-performance real-time big data processing platform built on Apache Flink. It is widely used in streaming data processing, offline data processing, data lake computing, and other scenarios, helping enterprises upgrade to real-time, intelligent big data computing.

The Apache Flink-based platform at Alibaba officially went live in 2016, starting with two scenarios: Alibaba’s search and recommendation. To run Apache Flink in production at Alibaba, the Alibaba real-time computing team made a large number of optimizations. The product on Alibaba Cloud is named “Real-time Computing”, and Flink SQL is its main API. The team is committed to building a world-leading real-time computing engine.

Alibaba Cloud Real-time Computing grew out of Alibaba Group’s internal Double 11 real-time business. After a long period of exploration and development, the real-time computing architecture, business experience, and products that the Group has accumulated over the years are now offered as a cloud product. Users can take full advantage of the Group’s latest engine capabilities, avoid the years of trial and error that Alibaba Group went through in streaming big data, process real-time big data faster and more easily, and move their business forward.

After years of accumulation, Alibaba Cloud Real-time Computing now ranks among the leading products internationally. In throughput and latency, SQL support, development experience, window support, out-of-order handling, and upstream and downstream integration, it outperforms comparable products from other cloud vendors. Compared with Spark and Storm, Alibaba Cloud Real-time Computing offers lower labor costs, more convenient development, operation, and maintenance, and seamless connection with Alibaba Cloud data storage. Users can draw on these advantages to solve the real-time big data analysis problems of their own business quickly and conveniently.

Alibaba Cloud Real-time Computing provides FlinkSQL so that users can easily express stream processing logic. Because SQL alone cannot meet the business requirements of some specific scenarios, Real-time Computing also offers full-featured UDFs to some authorized users for customized data processing logic. In streaming data analysis, users can complete most analysis and processing logic directly with FlinkSQL+UDF. At present, Real-time Computing is strongest at streaming data analysis, statistics, and processing. It mainly addresses three major pain points of users:

1. Timeliness of streaming data: The business value of data drops quickly as time passes, so data must be calculated and processed as soon as possible after it is generated. Traditional big data processing, however, follows a daily batch model: data is accumulated and then processed on a cycle of hours or even days. Such a model clearly cannot meet the need for real-time calculation. In many business scenarios with strict latency requirements, such as real-time big data analysis, risk control and early warning, real-time prediction, and financial transactions, batch (or offline) processing is not enough. As a real-time computing model for streaming data, Real-time Computing shortens end-to-end data latency, makes the computing logic real-time, and amortizes computing cost, and so meets the business requirements of real-time big data processing.

2. One-stop streaming data processing: Unlike open source or self-built streaming data processing services, Alibaba Cloud Real-time Computing is a fully managed stream computing engine. It integrates data development, data operation and maintenance, monitoring and alerting, and other services, so that users can try out and migrate to the stream computing product at the lowest cost.

3. SQL-based streaming analysis: Real-time Computing supports standard SQL (product name: FlinkSQL) and provides built-in functions for string processing, time, statistics, and other calculations. This replaces the inefficient and complex Flink development common in the industry, so that more BI staff and operations staff can carry out real-time big data analysis and processing with simple FlinkSQL, making real-time big data processing widely accessible.
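
The division of labor between SQL and a UDF can be illustrated with a small, self-contained sketch. It uses Python's built-in SQLite module purely as a stand-in: this is not FlinkSQL, it runs over a static table rather than a stream, and the way functions are registered in Real-time Computing is not shown here. The table, the data, and the `mask_phone` function are all hypothetical.

```python
import sqlite3


def mask_phone(phone: str) -> str:
    """Hypothetical custom logic that plain SQL built-ins do not cover."""
    return phone[:3] + "****" + phone[-4:]


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it by name.
conn.create_function("mask_phone", 1, mask_phone)

conn.execute("CREATE TABLE orders (user_phone TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("13800001111", 20.0), ("13800001111", 35.5), ("13900002222", 12.0)],
)

# Built-in aggregation (SUM, GROUP BY) combined with the user-defined function.
rows = conn.execute(
    "SELECT mask_phone(user_phone) AS phone, SUM(amount) AS total "
    "FROM orders GROUP BY user_phone ORDER BY user_phone"
).fetchall()
conn.close()
```

The aggregation stays declarative in SQL, while the one piece of logic that SQL cannot express conveniently lives in ordinary code.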

It also enables real-time data monitoring and analysis. For example, BI staff can watch visitor data, purchases, and transaction volume on their own website change in real time, without waiting for a statistics and analysis cycle. Work that used to take 150 person-months can now be done comfortably in 3 person-months, a 50-fold gain in efficiency.

Wide range of application scenarios

Real-time Computing is well suited to application scenarios in several areas, including real-time PV and UV statistics for website clicks; the average vehicle flow through traffic checkpoints over 5-minute intervals; statistics and display of pressure data for hydraulic dams; and rule-based alerts on fixed behavior patterns of financial theft in online payments. It is especially suitable for BI staff, big data developers, and similar users.

These scenarios can be roughly grouped into four typical categories: Internet clickstream analysis: analyze website user behavior in real time and capture accurate user profiles; real-time financial risk control: monitor malicious financial behavior in real time and control risk to avoid user losses; IoT risk control: monitor and detect equipment failures in real time and avoid potential business risks; precise e-commerce recommendation: track changes in user behavior in real time and make accurate recommendations to improve product sales.
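
The click statistics scenario (PV and UV per time window) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: events are hypothetical (timestamp, user id) pairs, windows are fixed 5-minute tumbling windows, and the input is a finite, in-order list rather than an unbounded stream.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling windows


def pv_uv_by_window(events):
    """Count page views (PV) and unique visitors (UV) per window.

    `events` is an iterable of (timestamp_seconds, user_id) pairs; this
    field layout is an assumption made for the sketch.
    """
    page_views = defaultdict(int)
    visitors = defaultdict(set)
    for timestamp, user_id in events:
        # Align the event to the start of its window.
        window_start = timestamp - timestamp % WINDOW_SECONDS
        page_views[window_start] += 1
        visitors[window_start].add(user_id)
    return {
        start: {"pv": page_views[start], "uv": len(visitors[start])}
        for start in sorted(page_views)
    }


# Hypothetical click events: (seconds since an arbitrary start, user id).
clicks = [(0, "u1"), (10, "u2"), (120, "u1"), (300, "u3"), (599, "u3"), (600, "u1")]
stats = pv_uv_by_window(clicks)
```

A stream computing engine would additionally emit each window's result as soon as the window closes and deal with late or out-of-order events, which this sketch leaves out.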

Rich user cases

After two years of development, real-time computing technology is widely used in Taobao, Tmall, Ant Financial, Cainiao, Industrial Brain, and many other businesses within the Group. Since the beginning of this year, external customers such as ZhongAn Insurance, Quanmin TV, Qianxun, and Xinhua Zhiyun have also launched many typical real-time computing scenarios and applications.

The emergence of exclusive mode

Since its commercial launch in April, Real-time Computing has been offered in “shared mode”, with SQL as its interface. In batch processing, SQL has been tested for decades and is recognized as a classic. On the other hand, a SQL-only interface leaves users with two major problems:

1. Some business logic is difficult to describe using only SQL; 2. Translating existing business logic from code into SQL is tedious.

Exclusive mode for Real-time Computing was introduced to address this. It supplements the original shared mode: users hold physical resources exclusively, independent of other users in terms of network, disk, CPU, and memory. Its key features are open UDX, rich hardware options, and isolation between users.

The basic unit of measurement for Real-time Computing is the CU (Compute Unit). One CU corresponds to the computing capacity of one CPU in the underlying system. The underlying layer uses virtualization for resource isolation, so that each CU is guaranteed its basic capacity and can consume at most the computing capacity of one CPU.

The product is also easy to use. The steps are as follows:

For more product details, visit promotion.aliyun.com/ntms/act/rc… To watch the Real-time Computing launch event, go to yq.aliyun.com/live/591 To join the roundtable discussion on Real-time Computing exclusive mode, go to yq.aliyun.com/roundtable/…