AKF design principles

As a project grows in scale, the traditional single-machine architecture can no longer meet demand, and a distributed system is needed to deliver better performance. When we start thinking about how to structure and split such a system, we need a systematic methodology to cope with increasingly complex distributed systems.

AKF is one such methodology. The AKF Scale Cube can be summarized by three axes.

Scale horizontally based on the X axis

This approach is to replicate services and data on multiple different machines to address service availability issues.

That is, by running multiple instances of the service and exposing them through a cluster with load balancing, we can improve service availability.

Scaling the deployment in this way is also convenient: behind the load balancer, you only need to copy the program onto each server node.
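As a minimal sketch of the idea (independent of any particular load balancer), the snippet below round-robins requests across identical replicas; the backend addresses are made-up placeholders:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection over identical service replicas (X-axis replication).
public class RoundRobinSelector {
    private final List<String> replicas;           // addresses of identical instances
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinSelector(List<String> replicas) {
        this.replicas = replicas;
    }

    // Every replica runs the same code and data, so any of them can serve the request.
    public String next() {
        int index = Math.floorMod(counter.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSelector lb = new RoundRobinSelector(
                List.of("http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"));
        for (int i = 0; i < 6; i++) {
            System.out.println("route request " + i + " -> " + lb.next());
        }
    }
}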

However, once the business volume reaches a certain level and the user request frequency becomes high, the business itself can be split apart, which is Y-axis scaling under the AKF principle.

Function division based on Y axis

When the system hits a performance bottleneck, you can split system functions so that the responsibilities of each component are more finely divided, improving system efficiency. For example, by splitting the application's database reads and writes, we can extend a standalone database into a distributed setup in which the primary database handles both read and write SQL while the secondary database handles only read SQL.
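A minimal sketch of this routing idea using plain JDBC (the way the primary and secondary data sources are wired up here is an assumption for illustration only):

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Route writes to the primary database and reads to the secondary (Y-axis read/write split).
public class ReadWriteRouter {
    private final DataSource primary;    // handles both read and write SQL
    private final DataSource secondary;  // handles read-only SQL

    public ReadWriteRouter(DataSource primary, DataSource secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    // Writes (INSERT/UPDATE/DELETE) must go to the primary so replication stays consistent.
    public Connection connectionFor(boolean readOnly) throws SQLException {
        return readOnly ? secondary.getConnection() : primary.getConnection();
    }
}

In practice, a framework such as ShardingSphere-JDBC can perform this routing transparently, so application code keeps using a single logical data source.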

This is the division of business functions on the Y-axis of AKF; combined with horizontal replication on the X-axis, it can greatly improve the performance of the system.

Data partitioning based on the Z axis

Z-axis scaling usually refers to partitioning the system based on the unique needs of requests and users, so that the resulting subsystems are isolated from each other yet each remains complete.

AKF splits the system along the Z axis.

[Figure: MySQL read-write separation based on ShardingSphere-JDBC]

Two typical examples of Z-axis scaling: the first is selecting different IDCs (data centers) to provide service based on the client's geographical location; the second is grouping users, for example into a passenger user group and a driver user group, so that once user groups are separated at the business level, different services can be provided to each group.
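A rough sketch of that second kind of Z-axis split, routing requests by user group to separate sub-systems (the service endpoints and the UserType enum are hypothetical):

import java.util.Map;

// Z-axis split: the same functionality is deployed per user group, and each request
// is routed to the sub-system that owns that group's data.
public class UserGroupRouter {
    public enum UserType { PASSENGER, DRIVER }

    private static final Map<UserType, String> ENDPOINTS = Map.of(
            UserType.PASSENGER, "http://passenger-service.internal",
            UserType.DRIVER,    "http://driver-service.internal");

    public static String route(UserType type) {
        return ENDPOINTS.get(type);
    }

    public static void main(String[] args) {
        System.out.println(route(UserType.PASSENGER)); // passenger requests stay in the passenger sub-system
        System.out.println(route(UserType.DRIVER));
    }
}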

Kafka is a distributed stream-processing platform. Since it is distributed, we will analyze Kafka's architecture design from the perspective of AKF.

Kafka

Kafka infrastructure

The Kafka cluster classifies records in the form of Topics, with each Record belonging to a Topic.

Under each Topic there is a set of partitioned logs that persist the Records in the Topic. In a Kafka cluster, each log partition of a Topic has one Broker acting as the Leader of that partition and other Brokers acting as Followers of the partition.

The Leader handles reads and writes for its partition, and the Followers replicate the partition's data. If the partition Leader goes down, the remaining Followers of the partition elect a new Leader so that reads and writes to the partition can continue.

The cluster's Leader monitoring information and part of the Topic metadata are stored in ZooKeeper.
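To make the partition and replica layout concrete, here is a small sketch using Kafka's Java AdminClient to create a Topic with several partitions, each replicated across brokers; the broker address and topic name are placeholders:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions (data sharding), each with 3 replicas:
            // one Leader plus two Followers spread over different brokers (replication).
            NewTopic orders = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}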

Topics and Partitions

All messages in Kafka are managed on a Topic basis, and each Topic in Kafka typically has multiple subscribers who subscribe to the data sent to that Topic. Kafka is responsible for managing a set of log partition data for each Topic in the cluster.

The producer publishes data to the corresponding Topic and is responsible for choosing which Partition within that Topic each record is sent to. This can be done in round-robin fashion purely for load balancing, or based on some semantic partitioning function, such as a key in the record.
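For example, with Kafka's Java producer, giving each record a key makes the default partitioner hash that key to pick the Partition, so records with the same key always land in the same partition; the topic name and broker address below are placeholders:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key are hashed to the same partition,
            // so both events below end up in the same ordered log.
            producer.send(new ProducerRecord<>("orders", "order-42", "order created"));
            producer.send(new ProducerRecord<>("orders", "order-42", "order paid"));
        }
    }
}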

Each log partition is an ordered, immutable sequence of Records, and each Record in a partition is assigned a unique sequence number called the offset. The Kafka cluster persists all Records published to a Topic. The retention time is specified in the configuration file and defaults to 168 hours:

log.retention.hours=168

Kafka periodically checks the log files and removes expired data from the log. Since Kafka stores log files on disk, retaining log data for a long time is not a problem.

When consuming data from a Topic, each consumer maintains the offset of the partition it is consuming. After consuming a batch of data, the consumer commits the offset of that batch back to the Kafka cluster, so each consumer can control its own offset as it likes.

So in Kafka, consumers can start reading queue data from any position in a Topic partition, and they are independent of one another because each consumer controls the offset of its own consumption.
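A minimal sketch of that offset control with Kafka's Java consumer: it is assigned a specific partition, seeks to an arbitrary offset, and commits offsets explicitly (the topic, group id, broker address, and starting offset are placeholders):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OffsetControlExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "reporting-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit offsets ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(List.of(partition));
            consumer.seek(partition, 100L);   // start reading from offset 100, not where we left off

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
            consumer.commitSync();            // commit this consumer's own offset back to the cluster
        }
    }
}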

Log partitioning for Topics in Kafka serves the following purposes:

  • First, they allow the log to scale beyond what a single server can accommodate. Each individual partition must fit on the server hosting it, but a Topic can have many partitions, so it can handle an arbitrary amount of data.

  • Second, each server acts as the Leader for some of its partitions and possibly as a Follower for others, so the load in the cluster is well balanced.

In addition, Kafka provides a replication mechanism: each Partition in a Topic has a number of replicas, one Leader and several Followers.

By analogy with the AKF design principles, a Topic corresponds to functional splitting along the Y axis, a Partition corresponds to data sharding along the Z axis, and partition replicas correspond to replication along the X axis.

Summary

  1. The Y-axis in the AKF principle is generally split by function, which is similar to Topics in Kafka; generally, one business subscribes to one Topic.
  2. The Z-axis is typically a data Partition, analogous to a Partition in a Topic.
  3. The X-axis provides high availability: in a Kafka cluster, each partition can have multiple replicas, with reads and writes handled by the partition Leader while the Followers replicate its data.
