preface

Hello everyone, I am ChinaManor, which literally translates to "Chinese code farmer". I hope to be a pathfinder on the road of national rejuvenation, a ploughman in the field of big data, and an ordinary person who refuses to be mediocre.

This is the mind map for real-time technology. Manor will be posting reading notes on “Alibaba Big Data Practice”; this post covers Chapter 5, Real-Time Technology.

1 introduction

In contrast to offline batch processing, streaming real-time processing is an important complementary technology that is widely used within Alibaba Group. In the big data industry, stream computing has been a very active research topic in recent years. The business need is to get processed data as soon as it is available, so that the current state of the business can be monitored in real time and operational decisions can steer it in a better direction. For example, a high-traffic ad slot on a website needs to be monitored in real time: if the conversion rate is very low, the operators need to swap out the ad promptly to avoid wasting traffic. This calls for real-time statistics on metrics such as the slot's exposures and clicks, which serve as a reference for operational decisions.

By latency, data timeliness is generally divided into three types: offline, quasi-real-time, and real-time. Both offline and quasi-real-time processing can be implemented on batch systems (such as Hadoop, MaxCompute, and Spark), just with different scheduling cycles, while real-time data has to be handled by a streaming system. Put simply, streaming data processing means that every record produced by a business system is collected immediately and sent to a streaming task for processing in real time, with no need to schedule jobs to process it. In general, streaming data processing has the following characteristics.

High timeliness, resident tasks, high performance requirements, and application limitations.
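As a rough illustration of the ad-slot example from the introduction, the sketch below updates exposure and click counters one event at a time, with no scheduled job involved. It is only a sketch under stated assumptions: the event layout (`slot` and `type` keys), the `SlotStats` class, the function names, and the use of clicks per exposure as a stand-in for the conversion rate are all hypothetical and do not come from the book.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SlotStats:
    """Running counters for one ad slot (hypothetical structure)."""
    exposures: int = 0
    clicks: int = 0

    @property
    def click_rate(self) -> float:
        # Clicks per exposure; 0.0 until the first exposure arrives.
        return self.clicks / self.exposures if self.exposures else 0.0


def process_event(stats, event):
    """Update the counters for a single event as soon as it arrives."""
    slot = stats[event["slot"]]
    if event["type"] == "exposure":
        slot.exposures += 1
    elif event["type"] == "click":
        slot.clicks += 1
    return slot


def run_stream(events):
    """Consume events one by one; returns only if the iterator ends."""
    stats = defaultdict(SlotStats)
    for event in events:
        process_event(stats, event)
    return stats


# A short, bounded sample standing in for an unbounded event stream.
sample_events = [
    {"slot": "homepage_banner", "type": "exposure"},
    {"slot": "homepage_banner", "type": "exposure"},
    {"slot": "homepage_banner", "type": "click"},
    {"slot": "homepage_banner", "type": "exposure"},
    {"slot": "homepage_banner", "type": "exposure"},
]

stats = run_stream(sample_events)
```

In a real streaming task the loop would never finish, and the current counters would be read by a dashboard or alert while the task keeps running.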

1. High timeliness

Data is collected and processed in real time, with latency measured in seconds or even milliseconds, so the business gets the processed data as soon as possible.

2. Resident tasks

Unlike offline tasks, which are scheduled periodically, streaming tasks are resident processes. Once started, they keep running until someone stops them manually, so their computing cost is relatively high. This also implies that the data source of a streaming task is unbounded, while that of an offline task is bounded. This is the main difference between real-time and offline processing, and it is the source of the limitations real-time tasks face in data processing.

3. High performance requirements

Real-time computing places very strict demands on data processing performance. If processing throughput cannot keep up with collection throughput, the computed data loses its real-time character. For example, if a real-time task can process only 30 seconds' worth of collected data per minute, the output delay grows longer and longer, the output no longer represents the current state of the business, and the business side may well make wrong operational decisions. In the Internet industry the volume of data to process is huge, and maintaining high throughput and low latency while data grows rapidly is an important challenge today. As a result, performance optimization makes up a large part of real-time task development.

4. Application limitations

Real-time data processing cannot replace offline processing. Besides its high computing cost, it offers insufficient support for scenarios with complex business logic (such as dual-stream joins or data rollback). In addition, because the data source is a stream and records depend on their context, the uncertainty of data arrival times leads to some differences between the results of real-time processing and those of offline processing.
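To make that last point concrete, here is a minimal sketch of how a late-arriving record can make a real-time count differ from the offline count for the same time window. The `(event_time, arrival_time)` tuples, the function names, and the rule that the real-time job publishes its result the moment the window closes are all assumptions made for illustration; real streaming engines usually offer ways to tolerate some lateness.

```python
def realtime_count(events, window_start, window_end):
    """Count the window's events that had already arrived when the window closed."""
    return sum(
        1
        for event_time, arrival_time in events
        if window_start <= event_time < window_end and arrival_time < window_end
    )


def offline_count(events, window_start, window_end):
    """Count all of the window's events, as a batch job run later would see them."""
    return sum(
        1
        for event_time, _arrival_time in events
        if window_start <= event_time < window_end
    )


# (event_time, arrival_time) in seconds; the third record occurs at 58 s
# but only reaches the streaming task at 63 s, after the window has closed.
events = [(5, 6), (20, 21), (58, 63), (70, 71)]

realtime_result = realtime_count(events, 0, 60)  # late record is missed
offline_result = offline_count(events, 0, 60)    # late record is included
```

Here the real-time result for the first minute is 2 while the offline result is 3, purely because one record arrived after the window had closed.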

conclusion

That wraps up Alibaba Big Data Practice | Real-Time Technology (Part 1). The next post will cover Alibaba’s streaming data architecture.

I hope you took something away from this post. If you did, please like, bookmark, and share it. See you next time 👋