Some time ago, I came across an article online listing the skills an architect needs. It boiled down to the following points:

1. A solid grasp of, and hands-on experience with, the core Java technology stack (JVM, class loading, multithreading and concurrency, I/O and networking);

2. A clear understanding of object-oriented software development and familiarity with common design patterns;

3. Familiarity with popular open-source frameworks (Spring/Spring MVC/iBATIS) and some understanding of their core ideas and implementation principles;

4. Familiarity with designing and developing on Oracle, MySQL and other databases, as well as with systems built on Redis or Memcached;

5. Familiarity with underlying middleware and distributed technologies (caching, messaging systems, hot deployment, JMX, etc.);

6. Familiarity with at least one Java application server, such as Tomcat;

7. Proficiency in shell programming and familiarity with common commands such as awk, sed, grep, strace, tcpdump and gdb;

8. Experience designing and developing large distributed systems with high concurrency, high load (large data volumes) and high availability;

9. An understanding of configuration management and agile development;

10. Business acumen.

Of course, these ten points are abilities a senior architect should master or already have. But knowing that an architect needs them still leaves you far from actually being one. It is like the college entrance examination: knowing that you will be tested on math, physics, politics, history and geography is not the same as knowing how to master each subject and handle each knowledge point. Likewise, once we know which technical directions to pursue, the real work is learning the specific knowledge inside each of them.

In this post I will take distributed architecture as the topic and explain what specific knowledge it requires and how you can acquire it.

One, communication

Since we are dealing with distributed systems, mastering inter-system communication is unavoidable.

First, you need the fundamentals: network protocols (TCP, UDP, etc.), I/O models (blocking I/O, non-blocking I/O, asynchronous I/O) and network adapters (multi-queue NICs, etc.). At the application level, you need to understand connection reuse, serialization/deserialization, RPC, load balancing and so on.
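To make serialization/deserialization concrete, here is a minimal sketch of a hand-written binary encoding using the JDK's `DataOutputStream`/`DataInputStream`. The `UserCodec` class and its `User` record are hypothetical, invented for illustration; real systems usually rely on a framework (Protobuf, Hessian, Kryo, etc.), but the principle is the same: both sides agree on field order and on how variable-length data is delimited.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class UserCodec {
    // Hypothetical message type, used only for illustration.
    public record User(long id, String name) {}

    // Encode as: 8-byte id, 4-byte name length, then the UTF-8 name bytes.
    public static byte[] serialize(User user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(user.id());
            byte[] name = user.name().getBytes(StandardCharsets.UTF_8);
            out.writeInt(name.length); // length prefix so the reader knows where the string ends
            out.write(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Decode by reading the fields back in exactly the same order.
    public static User deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long id = in.readLong();
            byte[] name = new byte[in.readInt()];
            in.readFully(name);
            return new User(id, new String(name, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. truncated input
        }
    }

    public static void main(String[] args) {
        User original = new User(42L, "alice");
        byte[] wire = serialize(original);
        System.out.println(wire.length);                        // prints 17 (8 + 4 + 5)
        System.out.println(deserialize(wire).equals(original)); // prints true
    }
}
```

The length prefix is the key design choice: without it, the receiver of a byte stream cannot tell where one string (or one message) ends and the next begins.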

Connection patterns:

Large numbers of connections usually arise in one of two ways:

1. A large number of clients connecting to one server

Now that non-blocking I/O is so mature, writing a server that supports a large number of clients is not that difficult.

One important point: when the server fails, the clients must not all reconnect at the same moment; the resulting reconnection storm would be a disaster.

The usual remedies are to have each client sleep for a random interval before reconnecting, or to apply a backoff algorithm to the reconnection interval.
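The two remedies are often combined. The sketch below implements exponential backoff with "full jitter": the upper bound doubles with each failed attempt (up to a cap), and the actual sleep is drawn uniformly at random below that bound. The class name `ReconnectBackoff` and the example values (100 ms base, 30 s cap) are assumptions for illustration, not prescribed by the text.

```java
import java.util.Random;

public class ReconnectBackoff {
    private final long baseMillis;
    private final long maxMillis;
    private final Random random;

    public ReconnectBackoff(long baseMillis, long maxMillis, Random random) {
        if (baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.random = random;
    }

    // Upper bound for this attempt: base * 2^attempt, capped at maxMillis.
    public long ceilingMillis(int attempt) {
        long ceiling = baseMillis;
        for (int i = 0; i < attempt && ceiling < maxMillis; i++) {
            ceiling *= 2; // doubling stops once the cap is reached, so sane caps cannot overflow
        }
        return Math.min(ceiling, maxMillis);
    }

    // Full jitter: a uniformly random delay in [0, ceiling], so clients that lost
    // the server at the same moment spread out their reconnects instead of retrying together.
    public long delayMillis(int attempt) {
        long ceiling = ceilingMillis(attempt);
        return (long) (random.nextDouble() * (ceiling + 1));
    }

    public static void main(String[] args) {
        ReconnectBackoff backoff = new ReconnectBackoff(100, 30_000, new Random());
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": sleep " + backoff.delayMillis(attempt) + " ms");
        }
    }
}
```

In a client, you would call `Thread.sleep(backoff.delayMillis(attempt))` before each reconnection attempt and reset `attempt` to 0 after a successful connection.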

Two, scalability

Large distributed systems of this kind must take scalability into account at design time: for any point in the architecture diagram, if the request volume or data volume keeps growing, you should be able to cope by adding machines. Of course, the design does not need to anticipate unlimited growth. An architect who has personally taken a system from relatively small to very large has a considerable advantage, and such architects are increasingly scarce.

The scalability problem revolves around two scenarios:

1. Stateless scenario

In stateless scenarios, the state usually lives in the database. Once the volume reaches a certain level, you need to introduce servitization (putting a service layer in front of the database) to relieve the problem of too many database connections.

2. Stateful scenario

The "state" here is really data, and scalability is usually achieved through sharding. Sharding can be implemented in several ways; common ones are:

2.1 Rule-based sharding

2.2 Consistent hashing

2.3 Auto sharding

The advantage of auto sharding is that you basically do not need to worry about data migration: you simply add machines as the volume grows. However, auto sharding usually places fairly high demands on how it is used, which often leads to limitations, as with HBase.

2.4 Replication

Replication is common when reads far outnumber writes. Implementations fall into eventually consistent and globally consistent schemes. Eventual consistency can mostly be achieved through messaging mechanisms and the like, while with global consistency, as in ZooKeeper/etcd, it is hard to also support high write throughput.
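As a rough illustration of rule-based sharding versus consistent hashing, the following Java sketch contrasts a simple modulo rule with a hash ring. The class `ConsistentHashRing`, the use of MD5 to place points on the ring, and the 100 virtual nodes per machine are assumptions chosen for the example; virtual nodes are a common refinement that smooths the key distribution, not something the text above prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Position on the ring: the first 8 bytes of the MD5 digest as a long.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each physical node is placed on the ring many times ("node#0", "node#1", ...).
    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // Walk clockwise to the first virtual node at or after the key's position, wrapping around.
    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes on the ring");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Rule-based (modulo) sharding, for comparison.
    static int moduloShard(String key, int shards) {
        return (int) Math.floorMod(hash(key), (long) shards);
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        for (String n : List.of("A", "B", "C", "D")) ring.addNode(n);
        Map<String, String> before = new HashMap<>();
        int moduloMoved = 0;
        for (int i = 0; i < 10_000; i++) {
            String k = "key-" + i;
            before.put(k, ring.nodeFor(k));
            if (moduloShard(k, 4) != moduloShard(k, 5)) moduloMoved++;
        }
        ring.addNode("E");
        int ringMoved = 0;
        for (Map.Entry<String, String> e : before.entrySet()) {
            if (!ring.nodeFor(e.getKey()).equals(e.getValue())) ringMoved++;
        }
        System.out.println("modulo 4->5 moved " + moduloMoved + " keys, ring moved " + ringMoved);
    }
}
```

With the modulo rule, going from 4 to 5 shards relocates most keys. With the ring, a key either keeps its old owner or moves to the newly added node, so only roughly the new node's share of the data has to be migrated.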

Three, stability

A distributed system must consider how to handle the failure of any point in the system (once the fleet reaches a certain size, it is normal for some machines to fail every day). Again, the cases divide mainly into stateless and stateful:

1. Stateless scenarios

Stateless scenarios are usually easy to handle: a node discovery mechanism combined with heartbeat detection is enough. Experience shows that relying purely on layer-4 (TCP-level) detection is not enough for the business; you usually need layer-7 (application-level) detection, and layer-7 detection in turn brings its own problems at large scale.

2. Stateful scenarios

In globally consistent scenarios, when one machine fails there is usually an election mechanism that decides which of the remaining machines takes over as leader; Paxos-based implementations are a common example.
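For the stateless case, heartbeat detection can be sketched as a simple timeout-based failure detector. The class name `HeartbeatMonitor`, the timeout value and the choice to pass the current time in explicitly (which keeps the logic testable) are all illustrative assumptions. The sketch also assumes that a node's heartbeat is recorded only after an application-level (layer-7) health check passes, not merely because its TCP connection is still open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    public HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat; assumed to be called only when the node's layer-7 health check succeeded.
    public synchronized void heartbeat(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    // Remove every node whose last heartbeat is older than the timeout and return them (sorted).
    public synchronized List<String> evictDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
                it.remove(); // drop from discovery so no more traffic is routed to it
            }
        }
        Collections.sort(dead);
        return dead;
    }

    // Nodes currently considered alive, i.e. eligible to receive requests.
    public synchronized Set<String> liveNodes() {
        return new TreeSet<>(lastSeen.keySet());
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(3_000);
        monitor.heartbeat("node-a", 0);
        monitor.heartbeat("node-b", 0);
        monitor.heartbeat("node-a", 2_500);
        System.out.println("evicted: " + monitor.evictDead(4_000)); // prints evicted: [node-b]
        System.out.println("live: " + monitor.liveNodes());        // prints live: [node-a]
    }
}
```

In practice, `evictDead` would run periodically on a scheduler, and the timeout trades detection speed against false positives caused by brief network hiccups or GC pauses.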

Four, maintainability

Maintainability is easy to overlook, but it is actually a very important part of a distributed system: how the whole system environment is built and deployed, the supporting maintenance tools, monitoring points, alerting points, problem diagnosis, problem-handling strategies and so on.



This covers only what you need to know about distributed systems; the architect's full body of knowledge spans seven major areas.