Guangzhou Yidijia Network Technology Co., Ltd. (Yidijia) is a B2B2C e-commerce platform focused on serving women. Its business covers skin care, makeup, nutrition and beauty foods, custom-made clothing, cross-border e-commerce, and other fields. The project was incubated in 2008 and its Tmall store launched in May 2011. The company then successively established eight distribution centers in China and brands such as Yan Shimei and Yan Shan, launched Yidijia's own independent e-commerce platform in 2013, and began a full brand upgrade in 2020. Yidijia practices proactive, internet-based service marketing and builds strong connections between skin care advisors and customers. From top to bottom, the company strictly follows a business philosophy grounded in quality and professionalism, in which connections are built on social trust and recognition is earned through service. Through continuous innovation and accumulation, it has become a leader in social e-commerce.
Business scenario and pain point analysis
Yidijia is a B2B2C e-commerce platform that integrates development, design, operations, and sales. It serves millions of members and supports thousands of dealers and agents. It runs many business applications and handles large data volumes, which places high demands on data query performance and concurrency.
Yidijia's technology department has grown rapidly over the past three years. Throughout, it has put business needs first while carrying out a series of technical upgrades and transformations, including application integration, splitting applications into microservices, and aggregating distributed applications. The current state of the department can be summarized as follows:
- Architecture: multiple languages and multiple data sources; technology upgrades are visibly intrusive to the business.
- Data: splitting applications has created data silos, leading to extensive data replication and rework.
- Application: the business side wants to see sales performance data promptly and accurately, so real-time requirements are high.
- Efficiency: demand for systematic processes and tooling keeps growing.
- Cost: the main problem is that people who understand both big data and the business are hard to recruit, so building a team is expensive.
In recent years, Yidijia's business has grown rapidly, data volumes have surged, and business complexity has increased. Under the existing big data architecture, it became urgent to resolve pain points such as “talent is hard to build up”, “business upgrades are constrained by existing technology”, and “Double 11 campaigns put the system under great pressure”.
Product selection
Yidijia's technology department defined its requirements for the technology upgrade clearly, evaluating candidates mainly on elastic scaling of storage, query performance optimization, OLAP capability, learning cost, query response time, and scalability. The core concerns are the following three questions:
1) How to quickly complete data cleaning
2) How to complete data verification quickly and accurately
3) How to quickly rectify faults
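To make the second question concrete, data verification can be pictured as reconciling a source table against its synchronized copy. The following is only a minimal sketch under stated assumptions, not the team's actual tooling: the row layout `(id, day, amount)`, the function names `summarize` and `verify`, and the sample rows are all hypothetical.

```python
from collections import Counter

def summarize(rows):
    """Return (row_count, total_amount, per-day totals) for (id, day, amount) rows."""
    per_day = Counter()
    for _id, day, amount in rows:
        per_day[day] += amount
    return len(rows), sum(per_day.values()), dict(per_day)

def verify(source_rows, target_rows):
    """List readable mismatches between source and target; an empty list means consistent."""
    s_count, s_total, s_days = summarize(source_rows)
    t_count, t_total, t_days = summarize(target_rows)
    problems = []
    if s_count != t_count:
        problems.append(f"row count: source={s_count} target={t_count}")
    if s_total != t_total:
        problems.append(f"total amount: source={s_total} target={t_total}")
    # Per-day comparison narrows down where a discrepancy comes from.
    for day in sorted(set(s_days) | set(t_days)):
        if s_days.get(day, 0) != t_days.get(day, 0):
            problems.append(f"{day}: source={s_days.get(day, 0)} target={t_days.get(day, 0)}")
    return problems

# Hypothetical sample data: (id, day, amount).
source = [(1, "2021-11-11", 100), (2, "2021-11-11", 250), (3, "2021-11-12", 80)]
target_ok = list(source)
target_missing = source[:2]  # simulate a row lost during synchronization
```

Calling `verify(source, target_missing)` reports the row-count gap, the total gap, and the affected day, which is the kind of quick, precise feedback the second question asks for.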
In technology selection, the team holds to the principle that “technology selection is the first productive force”. It believes that there is no best technology stack, only a better one; that technology choices determine how capabilities differentiate; that the team should keep improving its ability to get things right the first time; and that open sharing and continually upgrading one's understanding matter.
Early on, Yidijia experimented extensively with open-source big data products such as Hadoop, HBase, Kafka, Azkaban, Spark, and Greenplum, and adopted Greenplum after comparing performance. The team later found, however, that Greenplum handled concurrency poorly: it suited analytical scenarios but not high-concurrency query services.
Later, on the advice of the Alibaba Cloud big data computing platform team, Yidijia's technology department carried out a comprehensive architecture upgrade. The new architecture consists of DataWorks, Realtime Compute for Apache Flink, and Hologres. It is simple, has a very low learning cost, and the whole pipeline can be run end to end using only SQL.
The following sections describe best practices from applying these Alibaba Cloud products at Yidijia.
Best practices
I. Customer system practice
Yidijia's customer relationship management (CRM) system was built mainly on MySQL, MQ, Canal, and self-developed applications. To support cutover upgrades of the business systems, the technology department developed its own messaging middleware, which was costly to maintain. The custom data development process based on Binlog, MQ, and OLAP was complicated and also expensive to maintain. In addition, the system required data to arrive in order, which limited the concurrency of data cleaning.
After the upgrade to an architecture based on Hologres, DataWorks, and Realtime Compute for Apache Flink, database data is written directly to Hologres in real time through DataWorks Data Integration. Flink then subscribes to Hologres to perform further real-time cleaning and updates the result table in the database, which can then serve the business directly.
The overall architecture is clear and simple. It delivers accurate data, end-to-end real-time processing, integrated storage and analysis, managed operations and maintenance, and automated tooling. The original system took 15 people 3 months to bring online, whereas the current architecture can be deployed in only 2 days.
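The flow in this section (synchronize changes, clean them in real time, maintain a result table) can be pictured with a small in-memory sketch. It is an illustration under stated assumptions rather than the actual DataWorks or Flink job: the event format, the cleaning rule, and every name below are hypothetical, and plain dictionaries stand in for the synchronized table and the result table.

```python
def apply_change(raw_table, event):
    """Apply one change event to the raw table, keyed by primary key."""
    if event["op"] == "delete":
        raw_table.pop(event["id"], None)
    else:  # "insert" and "update" are both treated as an upsert
        raw_table[event["id"]] = event["row"]

def clean(row):
    """Hypothetical cleaning rule: trim names, keep only digits in phone numbers."""
    return {
        "name": row["name"].strip(),
        "phone": "".join(ch for ch in row["phone"] if ch.isdigit()),
    }

def run_pipeline(events):
    raw_table = {}     # stand-in for the synchronized detail table
    result_table = {}  # stand-in for the cleaned result table
    for event in events:
        apply_change(raw_table, event)
        if event["id"] in raw_table:
            result_table[event["id"]] = clean(raw_table[event["id"]])
        else:
            result_table.pop(event["id"], None)
    return result_table

# Hypothetical change events for a customer table.
events = [
    {"op": "insert", "id": 1, "row": {"name": " Li Na ", "phone": "138-0000-0001"}},
    {"op": "insert", "id": 2, "row": {"name": "Wang Fang", "phone": "139 0000 0002"}},
    {"op": "update", "id": 1, "row": {"name": "Li Na", "phone": "13800000009"}},
    {"op": "delete", "id": 2},
]
result = run_pipeline(events)
```

Each event updates the result table as soon as it arrives, so readers of `result` never wait for a batch run; in the architecture described above, that role is played by the managed services rather than by hand-written code.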
II. BI performance system practice
The BI performance system, which tracks sales performance, can be thought of as a real-time GMV dashboard. The business has two main requirements for its data:
- Real-time
- Accurate: errors in performance calculation are not acceptable.
The original architecture is shown in the figure below. Raw data was written to MQ in real time through Binlog and the Canal toolchain, and business data was then layered and cleaned by business domain. A task scheduling system updated the performance figures in the order “day-month-quarter-year”. This seemingly perfect scheme actually had several problems:
- Real-time problem: the pipeline looked real-time, but in practice there could be a delay of 5 to 10 minutes.
- Concurrency problem: consumption concurrency was limited.
- Operations and maintenance problem: a failure in any link shown in the diagram could bring down the whole system.
- Cleaning timeliness: a single run of a cleaning script could take several minutes, during which the underlying data could change.
The following figure shows the new architecture of the upgraded BI performance system. Detail data is synchronized to Hologres in real time through DataWorks, and a real-time ETL job on the Hologres data then produces the “day-month-quarter-year” figures. Finally, Hologres provides analysis and query services to upper-layer applications. The whole system is scheduled purely in real time, with second-level latency, development entirely in SQL, and efficient data verification.
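The “day-month-quarter-year” processing can be pictured as successive roll-ups of the same detail rows, each expressed in SQL. The following is a minimal sketch, not the team's actual job: it uses Python's built-in sqlite3 module as a stand-in for Hologres, it recomputes totals in batch rather than incrementally, and the table `order_detail`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical order detail rows: (order_id, paid_at, amount in cents).
ROWS = [
    (1, "2021-01-15", 10000),
    (2, "2021-01-15", 5000),
    (3, "2021-02-01", 20000),
    (4, "2021-04-10", 7000),
]

# One SQL expression per time grain (SQLite syntax, standing in for Hologres).
GRAINS = {
    "day": "paid_at",
    "month": "strftime('%Y-%m', paid_at)",
    "quarter": "strftime('%Y', paid_at) || '-Q' || "
               "((CAST(strftime('%m', paid_at) AS INTEGER) + 2) / 3)",
    "year": "strftime('%Y', paid_at)",
}

def rollup(conn, grain):
    """Return {period: total_amount} for the requested time grain."""
    sql = (
        f"SELECT {GRAINS[grain]} AS period, SUM(amount) "
        "FROM order_detail GROUP BY period ORDER BY period"
    )
    return dict(conn.execute(sql).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_detail (order_id INTEGER PRIMARY KEY, paid_at TEXT, amount INTEGER)"
)
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?)", ROWS)

# All four grains are derived from the same detail table.
performance = {grain: rollup(conn, grain) for grain in GRAINS}
```

Because every grain is computed from the same detail rows, the totals at each level agree by construction, which is one reason a SQL-only pipeline makes verification easier.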
III. Application real-time data warehouse architecture practice
Yidijia's technology department has long been thinking about how to equip application developers with big data development skills, so that big data serves not only the big data team but also the application development teams.
The real-time data warehouse architecture based on Flink, Hologres, and DataWorks improves the reusability of the data foundation and makes it easier to adjust data dynamically as the business changes. It also allows the data team and the application teams to jointly build data-driven application systems.
IV. Group data warehouse architecture practice
In addition to the e-commerce business, Yidijia's data warehouse team also supports the group's internal businesses. Like mainstream data warehouse architectures on the market, the group data warehouse platform was originally built on open-source big data systems. It has now been fully upgraded to the real-time data warehouse architecture based on Hologres, Flink, and DataWorks.
Business value and empowerment
The new real-time data warehouse solution based on Hologres, Realtime Compute for Apache Flink, and DataWorks delivers business value mainly in the following ways:
- Unified data: one solution supports the complete process, keeping data such as detail tables and dimension tables uniform and orderly.
- Unified services: Hologres directly provides a variety of online services, including data analysis and data services, which reduces the number of interfaces to build.
- Unified storage: Hologres acts as the single store, and multiple data sources write to it directly, avoiding redundant storage and saving cost.
- Unified governance: DataWorks provides unified standards, operations, and monitoring, giving the big data development platform unified governance.
From the business point of view, the new big data solution truly works out of the box and is WYSIWYG.
Looking to the future
In the big data field, data scale and business complexity together are the key factors that limit query performance. Only when developers keep refining their data models, and those models reach a certain level of maturity, can performance problems be solved easily.
Finally, let us embrace technology and embrace change, win through data models, and have data serve the business and serve applications. Let us live for applications and fight for applications.
Author: Liu Songsen, CTO of Yidijia, senior engineer with associate professor title, and visiting professor at several universities in China