Big data computing modes are commonly divided into batch computing, stream computing, interactive computing, and graph computing. Of these, stream computing and batch computing are the two main modes, and each suits different application scenarios.

Currently, there are three mainstream stream computing frameworks: Storm, Spark Streaming, and Flink. Their basic principles are as follows:

Apache Storm

In Storm, you design a real-time computation structure called a topology. The topology is submitted to the cluster, where the master node distributes the code to the worker nodes, which execute it. A topology contains two kinds of components: spouts and bolts. A spout is a source of the stream and emits data as tuples; a bolt consumes tuples, transforms the stream, and can pass its output on to other bolts.
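The spout-to-bolt flow described above can be illustrated with a minimal plain-Python sketch. It does not use the Storm API; the names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical, and Python generators stand in for the distributed stream between components.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout (hypothetical): the source of the stream, emitting one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt (hypothetical): consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Counter:
    """Bolt (hypothetical): updates a running word count as each tuple arrives."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


def run_topology(sentences: Iterable[str]) -> Counter:
    """Wire the components together: spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(sentence_spout(sentences)))


if __name__ == "__main__":
    print(run_topology(["storm processes tuples", "bolts transform tuples"]))
```

In a real Storm cluster each component would run as parallel tasks on the worker nodes; the sketch only shows how tuples move through the topology one at a time.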

Apache Spark

Spark Streaming, an extension of the core Spark API, does not process records one at a time as Storm does. Instead, it splits the data stream into small batches at fixed time intervals and then processes each batch. Spark's abstraction for a continuous data stream is the DStream (Discretized Stream). A DStream is represented as a sequence of small-batch RDDs (Resilient Distributed Datasets); an RDD is a distributed dataset that can be transformed in parallel by arbitrary functions, and DStreams additionally support sliding-window computations.
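The micro-batch idea can be sketched in plain Python without Spark. This is an illustration under stated assumptions, not the Spark Streaming API: the functions `discretize` and `sliding_window_counts` are hypothetical, events carry a timestamp in seconds starting near zero, and ordinary lists stand in for RDDs.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, value); an assumed record shape


def discretize(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Cut a stream into consecutive batches, one per batch_interval seconds."""
    batches: Dict[int, List[str]] = {}
    for timestamp, value in events:
        batches.setdefault(int(timestamp // batch_interval), []).append(value)
    if not batches:
        return []
    # Intervals that received no data still produce an (empty) batch.
    return [batches.get(i, []) for i in range(max(batches) + 1)]


def sliding_window_counts(batches: List[List[str]], window_length: int, slide: int) -> List[int]:
    """Count the records in each window; window_length and slide are measured in batches."""
    counts = []
    for end in range(window_length, len(batches) + 1, slide):
        window = batches[end - window_length:end]
        counts.append(sum(len(batch) for batch in window))
    return counts


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (3.7, "e")]
batches = discretize(events, batch_interval=1.0)
print(batches)                                                 # [['a', 'b'], ['c'], ['d'], ['e']]
print(sliding_window_counts(batches, window_length=2, slide=1))  # [3, 2, 2]
```

Each inner list plays the role of one small-batch RDD, and the window function combines the most recent batches, which is the general shape of a sliding-window computation over a discretized stream.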

Apache Flink

Flink is a computing framework for both stream data and batch data. It treats batch data as a special case of stream data, offers low latency (on the order of milliseconds), and can guarantee that messages are neither lost nor duplicated.

Flink unifies stream processing and batch processing by treating an input data stream as unbounded and treating batch processing as a special kind of stream whose input is bounded. A Flink program consists of two basic building blocks: streams, which carry the data (including intermediate results), and transformations, which are operations that take one or more input streams and produce one or more result streams.
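The "batch is a bounded stream" idea can be shown with a small plain-Python sketch. It is not Flink code: `transformation`, `bounded_source`, and `unbounded_source` are hypothetical names, and Python iterators stand in for Flink streams. The point is that one transformation runs unchanged over both kinds of input.

```python
import itertools
from typing import Iterable, Iterator


def transformation(stream: Iterable[int]) -> Iterator[int]:
    """A transformation (hypothetical): keep even values and square them, one element at a time."""
    for x in stream:
        if x % 2 == 0:
            yield x * x


def bounded_source() -> Iterator[int]:
    """A bounded stream: the input has a defined end, as in batch processing."""
    return iter(range(6))


def unbounded_source() -> Iterator[int]:
    """An unbounded stream: the input never ends, as in stream processing."""
    return itertools.count()


# Bounded input: the job terminates once the input is exhausted.
batch_result = list(transformation(bounded_source()))

# Unbounded input: results are produced continuously, so only a prefix is taken here.
stream_prefix = list(itertools.islice(transformation(unbounded_source()), 3))

print(batch_result)   # [0, 4, 16]
print(stream_prefix)  # [0, 4, 16]
```

Because the transformation never needs to know whether its input ends, the same program logic serves both modes; only the source differs.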

A comparison of the three computing frameworks is as follows:

Reference article:

Streaming Big Data: Storm, Spark and Samza

