Introduction: With the opening of Double 11, the logistics industry faces its big annual test. During Double 11 in 2021, 4PX, as a logistics and warehousing service provider, will operate more than 40 warehouses and sorting sites with over 500,000 square meters of working space. Peak orders in a single day will reach the tens of millions. A large number of shopping orders will be delivered to the door by 4PX, and consumers will go from paying the final balance to receiving their goods in seconds.
Author | Plum Sauce    Source | Ali Technology official account
1. Service Introduction
4PX was founded in Shenzhen in 2004. It is the first domestic provider of global international logistics and warehousing supply chain services, mainly offering warehousing and logistics services to cross-border e-commerce platforms, their merchants, and ordinary users. Its two networks, GPN (direct order shipping) and GFN (overseas warehousing), support a higher-quality global cross-border e-commerce ecosystem. 4PX is committed to helping Chinese enterprises go global. It currently has more than 100 branches around the world and serves about 1 million cross-border e-commerce merchants and more than 200 million cross-border e-commerce end users.
2. Business Challenges
To cope with peak orders reaching the tens of millions in a single day on Double 11, 4PX used big data to optimize resources, allocated warehousing manpower, materials, and transport capacity worldwide in advance, and ensured that warehousing processes ran efficiently and in order. Since mid-to-late October this year, the Shanghai and Dongguan transshipment centers have been in operation. To date, 4PX has built and expanded super hubs and acquired more than 40 warehouses across East, North, and South China, and it continues to expand its footprint nationwide. In China it has more than 40 branches and distribution service outlets and more than 500,000 square meters of office and working space.
On the business side, with its self-developed sorting system and cloud technology, 4PX can quickly identify barcodes and sort according to instructions, achieving fully integrated weighing and sorting and ensuring that every batch of goods is automatically identified and accurately sorted out of the warehouse. Weighing and sorting has been upgraded from the traditional manual mode to a 100% machine-controlled mode. In addition, “Red Light”, a cutting-edge hardware product from 4PX Information Technology, made its debut this Double 11. At the same sorting efficiency, 4PX Information Technology uses light curtains and other techniques to verify the parcels dropped by the sorting machine, reducing the in-warehouse error rate to 3‰, an industry-leading level. In its warehouses in particular, 4PX has kept investing in automation, digitalization, and intelligence, developing and upgrading its systems with big data, AI algorithms, and cloud computing, and introducing high-tech equipment to raise capacity and guarantee timeliness.
With orders growing rapidly and applications becoming more complex during Double 11, our business systems also faced severe challenges. The original real-time data warehouse architecture could no longer meet business needs. While looking for a new solution, we compared real-time big data query databases commonly used in the industry, such as HBase, ClickHouse, and Druid, but they hit bottlenecks on multi-table join queries over 100-billion-scale data and could not meet our requirements for real-time performance and service stability.
At 4PX, the real-time data warehouse is mainly applied to early warning and monitoring for collection, in-warehouse operations, warehouse transfer, customs clearance, and delivery, covering every step in the handling of a single shipment. These scenarios need real-time monitoring and real-time decisions to improve the overall timeliness of logistics. Especially at the Double 11 peak, if manpower or resources are under-allocated, a single link can easily become blocked and drag down the timeliness of the whole logistics chain. At the technical level, we have many business systems that are related to one another yet independent. A complex metric involves multiple systems and multiple tables, so our real-time data warehouse must have very strong multi-table join query capability and must sustain high data update and insert speeds.
For this year's Double 11, we upgraded the real-time data warehouse system that supports our business. With the new-generation real-time data warehouse built on Flink + Hologres, we could still monitor the logistics of every order and the operation of every warehouse in real time, even though logistics order volume grew several-fold over last year. The overall cost of the real-time data warehouse also fell by 50%, truly achieving “more, faster, better, cheaper”.
Below we describe how the real-time data warehouse evolved through its upgrades.
3. 4PX's Road to a Real-Time Data Warehouse
Real-time data warehouse 1.0
When we built the first version of the real-time data warehouse, the time window was tight, and we had to put our limited energy into data modeling and business development. Therefore, after comparing the throughput and processing capacity of several databases, we chose ADB. With large data volumes, ADB queries and inserts quickly, and it supports data synchronization tools such as DTS and OTTER with very good synchronization performance.
The data sources are Alibaba Cloud databases such as PolarDB, MySQL, and RDS. Alibaba Cloud DataWorks data synchronization replicates incremental data to ADB in real time, real-time computation is then done in ADB, and task scheduling is handled in DataWorks.
However, a major problem was that ADB's concurrency was limited and computing tasks consumed a lot of resources. When large screens and real-time reports pulled data at the same time, ADB showed high latency under high concurrency, which seriously challenged the stability of our real-time services.
Real-time data warehouse 2.0
After the first version, we identified the two most important properties of a real-time data warehouse: real-time performance and service stability.
The first version could not deliver sufficient stability, so we decided to study new real-time data warehouse options in depth. We saw many applications of Hologres on Alibaba Cloud with excellent performance and good results. After comparing different real-time data warehouse architectures in the industry, we finally chose the combination of Flink + Hologres for our real-time data warehouse.
There are two paths:
- The first path synchronizes Binlog data to DataHub through DTS, then uses Flink to consume the data from DataHub, and stores the computed results in Hologres. This path is mainly used for metrics with high access frequency and large data volumes, such as the number of orders awaiting collection, the number awaiting inbound processing, and the number completed.
- The second path synchronizes the business systems' Binlog data to Hologres through DataWorks. Within Hologres the data is divided into three layers. The ODS layer stores the raw data: original logs and data are loaded directly and kept intact without processing. Typically, data is imported incrementally from the business system to the ODS layer, with the data model and granularity consistent with the business system. The DWD layer is the detail layer and holds cleaned ODS data. The DWS layer is the summary layer and mainly stores wide tables. The main reason for this design is that our tables differ in granularity, which is exactly where Hologres' multi-table join queries are most useful. The entire architecture relies on DataWorks for task scheduling.
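The first path can be pictured as a stream job that keeps running per-status order counts up to date as change events arrive. The following is only a minimal sketch in plain Python rather than Flink code. The event shape (`order_id`, `status`) and the three status names are assumptions made for illustration; the article names the metrics but not the underlying schema.

```python
from collections import Counter

# Hypothetical status names; the article only names the resulting metrics.
TO_COLLECT, TO_INBOUND, COMPLETED = "to_collect", "to_inbound", "completed"


def apply_change(counts, current_status, event):
    """Apply one binlog-style change event to the running per-status counts.

    `event` is a dict with 'order_id' and the new 'status' (an assumed shape).
    """
    order_id, new_status = event["order_id"], event["status"]
    old_status = current_status.get(order_id)
    if old_status == new_status:
        return  # replayed or duplicate event: counts stay unchanged
    if old_status is not None:
        counts[old_status] -= 1  # the order leaves its previous status
    counts[new_status] += 1
    current_status[order_id] = new_status


def run(events):
    """Fold a sequence of change events into the three monitored counts."""
    counts, current_status = Counter(), {}
    for event in events:
        apply_change(counts, current_status, event)
    return {s: counts[s] for s in (TO_COLLECT, TO_INBOUND, COMPLETED)}


events = [
    {"order_id": "A1", "status": TO_COLLECT},
    {"order_id": "A2", "status": TO_COLLECT},
    {"order_id": "A1", "status": TO_INBOUND},
    {"order_id": "A1", "status": TO_INBOUND},  # replayed event
    {"order_id": "A2", "status": COMPLETED},
]
result = run(events)
print(result)
```

Tracking each order's last known status is what lets the counts stay correct when an order moves between stages or when an event is delivered twice. In a real deployment this state would live in the stream processor and the counts would be written to the serving store.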
This hybrid mode, which combines unified batch and stream computation with ad hoc querying, makes full use of both Flink's stream computing and Hologres' powerful join queries. Internet companies commonly use HBase, ClickHouse, Druid, and similar systems as real-time query databases, but our business is several times more complex than typical Internet scenarios, and these databases cannot fully meet our needs.
4. 4PX's Real-Time Data Warehouse with Hologres
1 Why Hologres
So why Hologres? Our investigation found several characteristics that fit our actual situation well.
- First, Hologres' real-time capability meets 4PX's current real-time data warehouse requirements: it supports JOINs between 10-billion-level tables, second-level query response, and both real-time writes and batch imports, with very high import performance and strong concurrency.
- Second, Hologres separates storage from compute. Data is stored in Pangu, Alibaba Cloud's distributed file system (similar to HDFS), so compute or storage can be scaled independently as needed. In the express delivery industry, resource needs during big promotions differ greatly from daily needs, and Hologres can scale out and in quickly to match the business's dynamic demand. Hologres also supports interactive analysis across heterogeneous data sources and federated queries over offline and real-time data. It is seamlessly integrated with MaxCompute and can directly accelerate queries on MaxCompute offline tables.
- Third, maintenance cost is low and operation is stable. As a real-time data warehouse, Hologres' storage cost is about 1/3 that of ADB. Its resources are highly elastic and can be scaled up and down flexibly, like MaxCompute. It is also highly compatible with Alibaba Cloud's big data components, which lowers operation and maintenance costs and improves R&D efficiency without adding a heavy burden to the technical architecture.
2 Hologres application scenarios
Hologres serves queries over real-time and offline data in our analysis-oriented OLAP system. Because it supports high-concurrency writes, timely queries, and OLAP analysis, it shows its full advantage when joining our tables of different granularities. Two scenarios illustrate this in detail.
Scenario 1: In-warehouse operation scenario
Real-time data is parsed from Binlog into the ODS layer. Meanwhile, a micro-batch task computes minute-level statistics (the range is adjustable) into the DWS wide table, inserting into and updating the offline data at the same time, which yields a real-time table of the full data. The task is scheduled by DataWorks and runs once every 5 minutes.
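The micro-batch step described here amounts to aggregating the latest window of ODS rows into minute buckets and upserting them into the wide table. The sketch below shows that idea in plain Python under stated assumptions: the row fields (`warehouse`, `op_time`), the warehouse codes, and the dictionary standing in for the DWS table are all hypothetical, and the real job would run as a scheduled SQL task rather than in-process Python.

```python
from collections import defaultdict
from datetime import datetime


def minute_bucket(ts):
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def run_micro_batch(ods_rows, dws, window_start, window_end):
    """Aggregate ODS rows in [window_start, window_end) and upsert into dws.

    `dws` maps (warehouse, minute) -> operation count. Each run recomputes
    the window and overwrites those keys, so rerunning a window is idempotent.
    """
    batch = defaultdict(int)
    for row in ods_rows:
        if window_start <= row["op_time"] < window_end:
            batch[(row["warehouse"], minute_bucket(row["op_time"]))] += 1
    dws.update(batch)  # upsert: insert new keys, overwrite existing ones
    return dws


ods_rows = [
    {"warehouse": "SZ01", "op_time": datetime(2021, 11, 11, 0, 1, 5)},
    {"warehouse": "SZ01", "op_time": datetime(2021, 11, 11, 0, 1, 40)},
    {"warehouse": "SZ01", "op_time": datetime(2021, 11, 11, 0, 3, 10)},
    {"warehouse": "SH02", "op_time": datetime(2021, 11, 11, 0, 4, 59)},
    {"warehouse": "SH02", "op_time": datetime(2021, 11, 11, 0, 5, 0)},  # belongs to the next window
]
# One row already loaded from offline data (hypothetical value).
dws = {("SZ01", datetime(2021, 11, 10, 23, 59)): 7}

start, end = datetime(2021, 11, 11, 0, 0), datetime(2021, 11, 11, 0, 5)
run_micro_batch(ods_rows, dws, start, end)
run_micro_batch(ods_rows, dws, start, end)  # a rerun leaves the table unchanged
print(len(dws))
```

Overwriting the keys of the recomputed window, instead of adding to them, is one simple way to make a retried 5-minute run safe. The rows loaded from offline data stay in the same table, which is how the wide table ends up holding the full data.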
Scenario 2: Warehouse transfer scenario
For tables with a small amount of data, the DWS intermediate layer is built from views, relying on Hologres' powerful join capability, as shown in the figure below:
DWD is a filtered view over the ODS layer, and DWS is an aggregated wide table (view) over the DWD layer. Every query against the DWS layer therefore queries all the underlying tables again. Such queries are very complex, and in a relational database they can be very slow and hit performance bottlenecks. Hologres, however, completes them at the millisecond level without strain, which gives real-time responses, saves scheduling resources, and improves query flexibility.
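The view-based layering can be sketched as follows. This example uses Python's built-in `sqlite3` purely as a stand-in so that it runs anywhere; it is not Hologres, and the table names, columns, and sample rows are hypothetical. It only illustrates the structure: DWD as a filtered view over ODS, and DWS as an aggregating join view over DWD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ods_transfer (order_id TEXT, warehouse_id TEXT, status TEXT, is_deleted INTEGER);
CREATE TABLE ods_warehouse (warehouse_id TEXT, region TEXT);

-- DWD: a filtered view over the ODS layer (drops logically deleted rows).
CREATE VIEW dwd_transfer AS
SELECT order_id, warehouse_id, status FROM ods_transfer WHERE is_deleted = 0;

-- DWS: an aggregated wide view that joins DWD with a dimension table.
CREATE VIEW dws_transfer_by_region AS
SELECT w.region,
       COUNT(*) AS total_orders,
       SUM(CASE WHEN t.status = 'done' THEN 1 ELSE 0 END) AS done_orders
FROM dwd_transfer AS t
JOIN ods_warehouse AS w ON w.warehouse_id = t.warehouse_id
GROUP BY w.region;
""")
conn.executemany("INSERT INTO ods_warehouse VALUES (?, ?)",
                 [("W1", "East"), ("W2", "South")])
conn.executemany("INSERT INTO ods_transfer VALUES (?, ?, ?, ?)", [
    ("A1", "W1", "done", 0),
    ("A2", "W1", "moving", 0),
    ("A3", "W2", "done", 0),
    ("A4", "W2", "done", 1),  # logically deleted, filtered out by the DWD view
])


def region_stats():
    """Query the DWS view; every call re-reads the underlying ODS tables."""
    return conn.execute(
        "SELECT region, total_orders, done_orders "
        "FROM dws_transfer_by_region ORDER BY region"
    ).fetchall()


before = region_stats()
conn.execute("INSERT INTO ods_transfer VALUES ('A5', 'W2', 'moving', 0)")
after = region_stats()
print(before, after)
```

Because the layers are views, a new ODS row is visible in the DWS result on the very next query with no scheduled job in between. The trade-off is that each query pays for the full join and aggregation, which is why this approach depends on an engine with fast joins.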
3 Current deficiencies of Hologres
While using Hologres we also found some places where it does not meet our actual needs. First, indexes cannot be created on non-empty columns, so when several tables at the hundred-million-row level are joined without indexes, query speed drops. Second, Hologres is compatible with the PostgreSQL ecosystem but does not support many of its functions, so development is harder than with MaxCompute.
5. Business Value
Throughout Double 11, the upgraded real-time data warehouse built on Flink + Hologres supported high-frequency refreshes of the real-time large screens, monitored logistics in real time, kept the business running efficiently, and got parcels to consumers' homes faster. The overall architecture upgrade brought the following value to the business:
Stability: Thanks to Hologres' sustained high and stable throughput, both real-time writes and reads were very stable throughout Double 11, and the failure rate during the period was zero.
Real-time: The real-time large screens for collection, in-warehouse operations, and transfer gave our operations very strong real-time data support. Overall timeliness improved considerably over last year, giving users a good logistics experience and raising the company's service level.
Cloud native: Beyond the two core values above, Double 11 is a traffic peak thousands of times higher than daily traffic, and Hologres can scale capacity out and in dynamically to meet our varying resource needs, which reduces operation and maintenance cost.
This is the seventh Double 11 carnival 4PX has taken part in, and 4PX delivered a satisfactory result in this logistics exam. As its business grows rapidly, 4PX keeps evolving the real-time data warehouse technology behind it to support richer warehousing and logistics scenarios, so that logistics gradually moves from “manual” to “intelligent”.
This article is the original content of Aliyun and shall not be reproduced without permission.