Distributed system mainly contains a lot of content, I will do an interpretation for two core aspects: distributed application service and object remote call, distributed storage of data. Let’s start with one of the grandfathers of distributed application services and object remote invocation:

EJB/RMI (Enterprise Java Beans/Remote Method Invocation)

Distributed applications and remote object calls

Java engineers at that time were familiar With the name EJB, and there was a time when the EJB VS Spring war (Spring With Not EJB) excited programmers. In fact, the topic of debate is whether it is necessary to implement distributed calls between components and maintain component state in a distributed network environment. What is component state in a distributed environment? The so-called stateless VS stateful of distributed service components

Simply put, stateful is where the back-end service component makes the remote call process look more like a local call, and the client doesn’t have to worry too much about component state holds, making it easier to design purely objected-oriented components. Stateless, on the other hand, back-end service components focus on receiving client requests and handling problems. It is enough to respond to the client, rather than holding on to the state and aggravating it.

For example: shopping cart is an example, stateless shopping cart, the service component is to let the client through cookies to solve the shopping list problem; With stateful shopping, the component service is holding the shopping list state by itself, so the client and server objects look more responsible.

Rod Jonhson started Spring’s battle for the throne with this book.

The final outcome of the war is clear to all – Spring won and EJB went out of business. Stateless call mode has become the mainstream, of course, EJB itself has many problems, although the EJB3 standard was redesigned by Hibernate authors, it is too late. Once I was one of the proponents of EJB, also with EJB faded out! ^_^”

In fact, this victory is essentially the surrender of distributed system architecture to monolithic architecture. EJB (enterprise Javabeans), the flagship of JavaEE, represented a forward-thinking design at the time, a precursor to the current microservices architecture that was left dead in the sand. Let’s look at the EJB VS Spring architecture at the time:

The diagram above shows the JavaEE component services communication diagram. There are Remote EJB containers deployed independently, and Local EJB containers deployed with the WEB. They communicate with each other (RMI) as components, so that WEB applications can access EJBs and EJBs can access EJBs. Through JTA (JAVA Transaction interface) unified management of database transaction operations, to achieve a real distributed transaction. Like the micro services now, like the common RPC call now, as a distributed system architecture, isn’t it standard, isn’t it beautiful architecture!!

Let’s look at the architecture of Spring’s early counterattack:

The diagram above shows Spring’s SSH architecture, a typical single-instance architecture that uses an AOP aspect interception technique to hide and transparently implement transaction calls at the programmer’s code level, allowing developers to focus on their services. This is very approachable, the architecture doesn’t look very simple.

Yes, it’s this simple architecture that beats a beautiful but complex architecture. So you ask me, what are the advantages of distributed systems? It is beautiful. It implements a purer object-oriented model through the componentization of web services, giving programmers a unified programming pattern to deal with the complexity of distributed services on the network.

What are the disadvantages of distributed systems?

The number one disadvantage, deployment complexity, by itself, is very bad for testing and seriously affects Agility. This really isn’t a single EJB deployment is complex, as long as it is should be distributed system, it must deploy complex, we can think about the current service, lightweight design again, still starts the deployment of complex problems, while production deployment separate is good, but 90% of the deployment have occurred during the development phase, so frequent test, development, debugging and restart the deployment, For individual programmers, this is a much heavier burden than single-instance services.

The second weakness is concentrated in the original database SQL access patterns into a web service interface between calls, whether EJB, micro service or invocation pattern essence of this distributed system is the human, because people like to focus on a center to solve the problem, simple in form, appear easier to locate and adapt to change, Services scattered in different places, whether from service choreography, interface changes, or bug tracking, are too taxing for people to handle.

** Thus, Spring’s single services beat EJB with their simple deployment, agile adaptation, and human nature, the first time a distributed system failed. ** But why did the microservices architecture evolve anyway? In fact, I think it is a rebirth of EJB popularity, and SpringCloud has also started the EJB road of the same year. In the process of architecture dismantling, microservices even started to package databases and microservices independently. Here is the architecture of microservices:

The figure above shows the scheduling of multiple microservices by the API gateway. Each microservice has its own independent database, which is segmented from a traditional Master library. The Master library is gradually becoming a data center library, providing basic data exchange and data analysis (OLAP).

If we look at how similar the microservices architecture is to the EJB distributed architecture above, it is actually inherited from the same blood, that is, the distributed system. But is it really good for microservices to evolve into derepositories? I don’t think so. We can compare different data tables of a database to multiple members of a family. Is it necessarily good to forcibly separate family members? In the home originally public use a TV, now split a member, be about to match a TV, do you see TV even run back to see? That’s the analogy of whether a data field is redundant, or is it called through an interface? This will make the designer too painful, nature is against humanity.

Why do we have to do this? * * at the end of the day just want to let the relational database to implement vertical segmentation, forming better performance level, that is to say, root out the problem in a relational database, relational database is naturally does not support distributed, and unable to better realize the database level horizontal scaling, * * so in the future must be the era of distributed data storage systems instead of a relational database. Because if we have a cheap and powerful database system that can scale horizontally to solve performance problems, why bother with vertical segmentation of business data? This is the key role of distributed storage.

Distributed data storage

The specific form of distributed storage of data is distributed database. Distributed database is different from distributed application services. There is no laborious release and restart in the development and testing stage, so it does not affect agility. In addition, data calls are concentrated on an access agent, so there is no need to consider interface management as anti-human as distributed application services. The horizontal scaling force of distributed database just solves the awkward problem that microservices must be divided into libraries, allowing programmers to focus on the business problems of data access. With all these benefits, why can’t it be applied on a large scale? The root of the problem is the transaction!

Why do relational databases have the transactional advantage of ACID? To unlock a row set, I (transaction) need to process a row of data. I (transaction) apply for a key and lock the row. Others (transaction) can access the row until I unlock it. This operation can be easily handled in a relational database with a single-machine design, but in today’s distributed database environment, it is particularly troublesome because the data is divided into slices and distributed in different machine node libraries. Transaction locking has to be located on all relevant nodes. ** Transaction problems incisively and vividly reflects the distributed system in the network environment of multi-node collaboration complexity, collaboration closer things, distributed system dry up more difficult, ** why Key/Value database, such as Redis to use the most comfortable, is also very popular, is that there is no collaboration between data.

For example, TIDB inherits from Google Percolator and implements distributed transactions using Percolator transaction model. Is now a new term: NewSQL, traditional relational database ACID properties +NoSQL.

So what’s the key to a distributed database? Is carried out on a chunk of data fragmentation (block/partition), a full peace on different physical nodes, keep the volume of each node is almost the best, so that the throughput performance is best, if the node is some more, some less, this is a lean, then the data more nodes data access will carry greater pressure.

Different data nodes are managed by managers. Some managers are centrally appointed, ** namenode for HDFS, ** Some managers are elected, ** Elasticsearch master node. ** In general, a distributed database has management nodes to schedule data nodes and data nodes to serve data reads and writes. That’s the thing.

The figure above shows two different data management modes. ** The first is the master-slave model, such as HDFS blocks. ** Namenode is responsible for the allocation and scheduling of data block nodes. ** The second is the peer model, such as GlusterFS, where each node is master and slave, depending on whether its data matches the request.

To see how complex the architecture of distributed transactions can be, consider the architecture of TIDB.

The image above shows TIDB as a whole, and as an expert in distributed databases, this architecture would be a head-scratch.tiDB.

TIDB is the client-facing SQL request and logic handler, PD is the manager of the cluster, TiKV persists data in the form of key/value, TIDB, PD and TiKV are distributed clusters respectively, so PD is responsible for transaction initiation and data routing. TiDB is responsible for receiving SQL and controlling transaction process, and TiKV is responsible for data landing. Through such a system with a clear division of responsibilities, a truly distributed transaction is formed.

The advantage is that transaction locking is finally decentralized, reaching the farthest point where distributed database technology has traveled, and this was Google’s Percolator paper published in 2011. Google really is the land of the protoss.

Can pay the price is still not small, the first is obvious, the physical machine is indispensable, but to this business does not care about so many resources. The second is the transaction of TiDB process control was conducted in memory, such as transactional consistency synchronization is good, will enter the TiKV persistence, so the consumption of memory must, meet delay class bug, it may because the data memory flood burst its Banks, and the final question or network interaction is too frequent, guarantee a good network environment is very important.

All right, so much for that. Hopefully, this article has given us a deeper understanding of distributed systems. In a word, distributed systems are too complex to control.

I’m a “Read Bytes” creator who delves into big data technology and interprets distributed architectures

Head over to Byte Read Zhihu — learn more about big data

Public id Read-byte “Read bytes” Distributed, big data, depth of software architecture, professional interpretation