Let’s get rid of the bad before we talk about the best

All of the companies I know that have failed in technology have one thing in common: they don’t have a unified technology system. Each team is different, with different languages, different architectures, different deployment methods, and varying levels of proficiency.

What about collaboration between teams? In most cases, a service exposes a REST interface; in some cases, teams exchange data through a shared database, or, more dramatically, over FTP.

As a result, services cannot be monitored uniformly, link tracing cannot be built, and log formats cannot be unified, let alone mined for valuable data. In this situation, service governance is almost impossible, and so is building a basic observability platform: the cost is simply too high.

Unity is more important than optimality

This situation arises because early compromises and indulgence harden into a “historical burden” that cannot be shed. That burden has to be maintained by more and more people, and the further it goes, the less feasible a rebuild becomes, until it is no longer possible at all.

This is usually caused by CTOs who lack technical confidence. Afraid that what they know is not the best solution, they would rather let people experiment on their own, even though a single shared approach is what works best at scale. By the time they realize things have gone too far, there is no turning back.

As a CTO (or technical lead), you do not need to know the optimal architecture or solution, but you do need to make sure the technical teams do not each improvise on their own.

In my experience with architecture, any bad architecture can be evolved or replaced with a new solution. The real trouble is a collection of different architectures, each with its own problems. Unifying them at that point is so hard that you might as well start a new company from scratch.

One suboptimal choice is better than many optimal choices: 1 > N.

The value of unity

Just to name a few things to unify:

1. Unified language

In my view, Java has the world’s best observability and debugging tooling, and it will remain the language of choice for Internet companies.

Based on a unified language, you can build unified link tracing, monitoring metrics, and a dynamic tracing platform. Everyone using Java can use BTrace or Arthas to quickly locate online problems. When something goes wrong, a NullPointerException is something everyone can understand. And once a technical optimization is made, it can be quickly applied to all services.

2. Unified logs

This covers a unified output format, output mode, and log framework. With these in place, you can build a log platform and develop log analysis tools later. By embedding user IDs, trace IDs, and similar fields in logs, problems can be traced across services.
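As a minimal sketch of what a unified format could look like, the helper below renders every log line with the same fixed fields, including a trace ID and a user ID. The class name `UnifiedLogLine`, the field order, and the `key=value` layout are assumptions made for illustration; they are not taken from any particular log framework.

```java
import java.time.Instant;

// Hypothetical helper: every service formats log lines the same way,
// so a log platform can parse all of them with a single rule.
final class UnifiedLogLine {
    private UnifiedLogLine() {}

    // Layout (an assumption for this sketch):
    // <ISO-8601 time> <LEVEL> service=<name> traceId=<id> userId=<id> msg="<text>"
    static String format(Instant time, String level, String service,
                         String traceId, String userId, String message) {
        // Replace double quotes so the msg field stays easy to parse.
        String safeMessage = message.replace("\"", "'");
        return String.format("%s %s service=%s traceId=%s userId=%s msg=\"%s\"",
                time, level, service, traceId, userId, safeMessage);
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.now(), "INFO", "order-service",
                "trace-7f3a", "user-42", "order created"));
    }
}
```

Because the trace ID sits in the same position in every service's output, one search for `traceId=trace-7f3a` would collect the whole request path across services.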

3. Unified RPC call framework

For Java, you can choose Dubbo with your eyes closed. Dubbo is the most Java-friendly framework, with an active community, high maturity, and easy operation. It comfortably covers systems of up to three to five hundred services.

Spring Cloud is definitely not recommended.

First of all, I know of no large company that uses Spring Cloud on a large scale. Second, the Spring Cloud stack has so many components that it is difficult to understand, and troubleshooting it at scale is even harder.

Most articles online compare RPC frameworks on performance. I think performance is the least important consideration when choosing one. More important things are often overlooked:

3.1 Unified RPC Interface Definition

Dubbo’s interface definitions are very friendly. Within a single company, all interfaces can be defined in one shared project, where every service’s interfaces go through the same review and release process. This prevents teams with inconsistent understandings of Dubbo from defining RPC interfaces incorrectly or releasing incompatible interface changes.
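A rough sketch of what such a shared interface project might contain: a plain Java interface plus a serializable data object that both provider and caller compile against. The names `UserService`, `UserDTO`, and `InMemoryUserService` are hypothetical, and no framework annotations or configuration are shown; the in-memory implementation only stands in for a real provider.

```java
import java.io.Serializable;
import java.util.Map;

final class SharedApiSketch {
    // --- Shared API project (hypothetical): the contract both sides depend on ---
    static final class UserDTO implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String name;

        UserDTO(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    interface UserService {
        // Returns the user with this id, or null if there is none.
        UserDTO findById(long id);
    }

    // --- Provider side: a stand-in implementation for this sketch ---
    static final class InMemoryUserService implements UserService {
        private final Map<Long, UserDTO> users = Map.of(1L, new UserDTO(1L, "alice"));

        @Override
        public UserDTO findById(long id) {
            return users.get(id);
        }
    }

    // --- Caller side: codes against the interface only ---
    public static void main(String[] args) {
        // In a real deployment the caller would receive a remote proxy here.
        UserService service = new InMemoryUserService();
        System.out.println(service.findById(1L).name);
    }
}
```

Since the caller only sees `UserService`, any change to the method signature has to pass through the shared project, which is where a review step can catch incompatible changes.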

3.2 Development Efficiency

When REST interfaces were first used for RPC calls, the hardest parts were documenting the interface and agreeing on the response format. Even if everyone uses JSON, teams may have different ideas about the date format, or they may use different JSON libraries, resulting in a lot of confusing problems and wasted time.
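To make the date problem concrete, the sketch below renders one and the same instant in three ways that different teams might each consider the obvious choice. The class and method names are made up for this illustration; only standard `java.time` calls are used.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustration only: three plausible encodings of the same moment in time.
final class DateFormatMismatch {
    private DateFormatMismatch() {}

    // Team A: milliseconds since the Unix epoch.
    static String asEpochMillis(Instant t) {
        return Long.toString(t.toEpochMilli());
    }

    // Team B: ISO-8601 in UTC.
    static String asIso8601(Instant t) {
        return DateTimeFormatter.ISO_INSTANT.format(t);
    }

    // Team C: a custom pattern with no zone information in the output.
    static String asCustomPattern(Instant t) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(t);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println("{\"createdAt\": " + asEpochMillis(t) + "}");
        System.out.println("{\"createdAt\": \"" + asIso8601(t) + "\"}");
        System.out.println("{\"createdAt\": \"" + asCustomPattern(t) + "\"}");
    }
}
```

All three payloads are valid JSON, yet a caller written against one of them will fail or silently misread the other two.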

In Dubbo’s case, once the interface is defined and the code is written, neither the caller nor the provider has to worry about serialization. There is also an escape hatch: you can switch to gRPC, in which case the interfaces are defined in proto files.

As for cross-language RPC, it is not worth considering at all. There is no need for it: once the company is large and its infrastructure has gradually matured, introducing another language is actually a kind of destruction.

3.3 RPC Interface Management

With Dubbo, the providers and callers of every interface are known. Call counts can be collected per interface and per caller, which is exactly the information you need before taking an interface offline or upgrading it.
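The kind of bookkeeping described here can be sketched with a small in-memory counter. The class `InterfaceCallStats` and its methods are hypothetical; how the call records are fed in (for example from a framework hook or a registry) is deliberately left out, and this is not Dubbo's own API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-interface call statistics, keyed by caller application.
final class InterfaceCallStats {
    // interface name -> (caller application -> number of calls)
    private final Map<String, Map<String, LongAdder>> counts = new ConcurrentHashMap<>();

    // Record one call from callerApp to interfaceName. Safe to call concurrently.
    void record(String interfaceName, String callerApp) {
        counts.computeIfAbsent(interfaceName, k -> new ConcurrentHashMap<>())
              .computeIfAbsent(callerApp, k -> new LongAdder())
              .increment();
    }

    // Snapshot of the callers of one interface, sorted by caller name.
    Map<String, Long> callersOf(String interfaceName) {
        Map<String, Long> result = new TreeMap<>();
        counts.getOrDefault(interfaceName, Map.of())
              .forEach((caller, n) -> result.put(caller, n.sum()));
        return result;
    }

    // An interface with no recorded callers is a candidate for going offline.
    boolean hasNoCallers(String interfaceName) {
        return callersOf(interfaceName).isEmpty();
    }

    public static void main(String[] args) {
        InterfaceCallStats stats = new InterfaceCallStats();
        stats.record("UserService.findById", "order-app");
        stats.record("UserService.findById", "order-app");
        stats.record("UserService.findById", "pay-app");
        System.out.println(stats.callersOf("UserService.findById"));
        System.out.println(stats.hasNoCallers("UserService.legacyLookup"));
    }
}
```

Before retiring or changing an interface, the list from `callersOf` tells you which teams have to be notified first.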

Dubbo does have problems, and they are performance problems at large scale. If an interface provider has dozens of nodes, or a service depends on dozens of interfaces, Dubbo’s model starts to struggle.

For example, if a system has 100 interdependent nodes, each node needs at least one connection to each of the other 99 nodes, so about 100 * 99 connections must be established. This is certainly a high load on the system. A better architecture, of course, is to connect nodes through middleware that forwards the traffic, so that in theory the whole system needs only about 100 * M connections, where M is the number of middleware instances. This is an N x N -> N x M optimization, and the connection count no longer grows without bound.
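The arithmetic can be checked with a few lines of code. This sketch counts each node's outbound connections, so a full mesh of n nodes needs n * (n - 1) connections, while routing through m middleware instances needs n * m. The class and method names are illustrative, and the choice of 3 middleware instances is an arbitrary assumption.

```java
// Back-of-the-envelope connection counts for the two topologies.
final class ConnectionCount {
    private ConnectionCount() {}

    // Full mesh: each of the n nodes connects to the other n - 1 nodes.
    static long directMesh(long nodes) {
        return nodes * (nodes - 1);
    }

    // Via middleware: each node connects to each of the m middleware instances.
    static long viaMiddleware(long nodes, long middlewareInstances) {
        return nodes * middlewareInstances;
    }

    public static void main(String[] args) {
        System.out.println(directMesh(100));       // 9900
        System.out.println(viaMiddleware(100, 3)); // 300
    }
}
```

Doubling the node count to 200 roughly quadruples the mesh figure (39,800) but only doubles the middleware figure (600), which is the whole point of the optimization.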

This architecture brings its own problems, most notably the doubling of intranet traffic. The middleware carries all the east-west (service-to-service) traffic, so it must be highly performant, and the overall stability of the system declines. Operating and monitoring the middleware also comes at a cost. In short, there is no best architecture: solving one problem creates more problems to solve later.

4. Unified link tracing

Link tracing is an important means of observation, and a lot of valuable data can be collected from the traces.

For example, our company standardized on SkyWalking. We modified SkyWalking to support dynamic delivery and continuous tracing outside of sampling, and added collection of thread-level memory and CPU usage. We then analyzed all application behavior from the traces, so that the root cause of a problem could be identified at a glance in the monitoring system.

If you want to dig deep into a technology stack, the premise must be unity.

The evolution of the unified architecture

Unifying the technical architecture does not make the architecture rigid; it makes evolution easier. There should be a standing architecture committee within the company that, on an ongoing or regular basis, keeps the technology stack and architecture alive and abreast of industry trends.