Why use the Lambda architecture?

It addresses three requirements that big data makes hard to satisfy at the same time:

  1. Accuracy (good)
  2. Latency (fast)
  3. Throughput (high volume)

For example, consider what happens when you scale the storage of web page-view records the traditional way:

  1. Start with a traditional relational database
  2. Then add a publish/subscribe queue in front of it
  3. Then scale out by horizontal partitioning (sharding)
  4. Fault-tolerance issues begin to appear
  5. Data corruption starts to occur

The key problem is that, in terms of the AKF scale cube, scaling horizontally along the X axis alone is not enough; we also need functional decomposition along the Y axis. The Lambda architecture offers guidance on how to scale a data system in this way.

What is the Lambda architecture?

If we define a data system as follows:

query = function(all data)

Then a Lambda architecture is:

batch view = function(all data at the batch job's execution time)
realtime view = function(realtime view, new data)

query = function(batch view, realtime view)

In short, the Lambda architecture is read/write separation (batch layer + serving layer) plus a real-time processing layer.
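The three definitions above can be sketched in a few lines of Python. This is only a toy illustration for the page-view example, not a reference implementation: the class name LambdaSketch, the helper count_views, and the (url, timestamp) event shape are all assumptions made here. It also assumes the batch job runs instantly, so the realtime view can simply be reset once the batch view has caught up; a real system has to expire realtime data more carefully.

```python
from collections import Counter


def count_views(events):
    """function(all data): count page views per URL from raw events."""
    return Counter(url for url, _ts in events)


class LambdaSketch:
    """Toy model of the batch, realtime, and query functions (hypothetical)."""

    def __init__(self):
        self.master = []                # append-only master dataset
        self.batch_view = Counter()     # recomputed from scratch by the batch job
        self.realtime_view = Counter()  # updated incrementally per event

    def ingest(self, url, ts):
        self.master.append((url, ts))
        # realtime view = function(realtime view, new data)
        self.realtime_view[url] += 1

    def run_batch(self):
        # batch view = function(all data at the batch job's execution time)
        self.batch_view = count_views(self.master)
        # Simplifying assumption: the batch view now covers every event,
        # so the realtime view can start over from empty.
        self.realtime_view = Counter()

    def query(self, url):
        # query = function(batch view, realtime view)
        return self.batch_view[url] + self.realtime_view[url]


system = LambdaSketch()
system.ingest("/home", 1)
system.ingest("/home", 2)
system.ingest("/about", 3)
system.run_batch()          # batch view now holds the first three events
system.ingest("/home", 4)   # arrives after the batch run, only in the realtime view
print(system.query("/home"))  # 3
```

The point of the sketch is that query("/home") gives the same answer as recomputing count_views over all data, while the expensive full recomputation happens only in run_batch.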

This article was originally published by Silicon Valley’s IO