Multi-structured database review

Memcached’s consistent hash algorithm

The consistent hash algorithm is used to solve the server load balancing problem.

Compute a hash value for each server node over a circle using the hash function
Compute a hash value for the data in the same way and store the data on the first server node found clockwise
Virtual nodes are adopted to suppress the non-uniformity of node distribution

DynamoDB improves the consistent hash algorithm

The location and number of virtual nodes are not fixed. If new nodes are added, all data objects on all nodes may be scanned, which costs a lot.

DynamoDB Improved method: The size and location of virtual nodes are fixed, and only the management modes of virtual nodes and physical nodes are changed.

Therefore, every time a new physical node is added, a certain number of virtual nodes are allocated from each existing node to the new node. At the same time, when a physical node leaves, all virtual nodes of this node are equally allocated to the remaining nodes.

Clock vector update algorithm

When an event occurs locally on node I, increment Vi[I] by 1
When node I sends data to node J, Vi is stored in the message
Node J updates its clock vector, Vj[k] = math.max (Vi[k], Vj[k]), then Vj[j] + 1

How does NAS read and write data

The NAS client encapsulates I/O requests into TCP/IP packets and transmits them to the remote NAS server over the network
After receiving an I/O request from a customer, the NAS server extracts the I/O request and performs I/O read and write operations
If data needs to be returned, it is also sent back to the client over the network

Advantages and disadvantages of NAS

Advantages: Simple deployment, low cost, and convenient management

Disadvantages: Data security risks, NAS performance is limited by the network transmission capability, and increases network load

OceanBase process for writing data

Write transactions in OceanBase are concentrated in UpdateServer. The UpdateServer always writes to the memory table first. When the threshold of the memory table is reached, the UpdateServer freezes the memory table, generates a new memory table to write data to it, and converts the data in the frozen memory table into a compact data format and writes the data to SSD. Finally, the memory table is reclaimed.

Log-based read and write principles of BigTable

If the request is valid, the write request is committed to the log, and the data is written to memtable in memory. When the memtable limit is exceeded, the memtable is frozen, a new MEMtable is created, and the frozen data is persisted to GFS in the form of an SSTable

GFS data writing process

The client requests the Master node for the location of the chunk server with the lease and other copies
The Master node returns the location of the main Chunk server and other replicas to the client, and the client caches this information locally.
The client transfers data to all replicas
When all replicas acknowledge receipt of data, the client sends a write request to the master Chuck server
The primary Chunk server transmits write requests to all secondary copies
All secondary copies send confirmation to the primary Chunk server
The main Chunk server sends an acknowledgement or error to the client

– Storm programming model

Storm has two Topology programming models: pipelined parallelism and multi-task parallelism. A calculation is called a Topology, which consists of multiple Spouts and Bolts. Once a task is committed, it will always run. Spout gets the emitting Stream message source component from the data source and is the emitter for the Stream. Bolt is a message processing unit used to perform filter/aggregate/query operations.

Paxos algorithm

The Paxos algorithm divides the roles in the system into three phases: Proposer, Acceptor, and Learner.

Prepare phase:

A Proposer generates a proposal number N and then sends a prepare request to a statutory set of acceptors
After an acceptor receives a Prepare request, if the proposal n in the prepare request is smaller than the number to which an acceptor responded, the response is rejected. If the prepare request contains a proposal number greater than the number to which an acceptor has responded, it returns the proposal it accepted last time and promises not to accept any proposal with a number greater than N.

The accept phase:

If an Acceptor accepts a prepare request with a majority of acceptors, it sends an Acceptor request numbered N and value to the Acceptor that responded to the prepare request.
If an acceptor receives a proposal numbered N, it accepts the proposal if n is greater than the prepare request number it responded to. Otherwise, it rejects the proposal.