Middleware covers a wide range of components, including development frameworks, registries, API gateways, configuration centers, distributed transactions, and distributed messaging.
The current development of middleware centers on three main directions. First, with the popularity of cloud native technology, business applications are gradually moving to containers and microservices, and middleware must adapt to cloud native usage scenarios and support large-scale service governance. Second, most middleware is not standardized, which complicates users' selection and increases the cost of learning and adoption. Finally, middleware itself faces a cloud native upgrade: how to separate compute and storage, deploy peer nodes, and scale out in parallel on the server side.
In recent years, the industry has explored each of these directions. For example, the service mesh provides standardized service discovery and governance capabilities for multi-language technology stacks, but it brings additional performance and resource consumption and increases architectural complexity and operation and maintenance costs, which still need further optimization.
On January 5, InfoQ invited Wang Hongzhi, technical director of Tencent Cloud Micro Service Center, to talk about middleware technology trends in 2022.
Wang Hongzhi is the technical director of Tencent Cloud microservice products, and the head of the microservice engine TSE and the open source project PolarisMesh. He has long been engaged in cloud native middleware R&D at Tencent, where he has been responsible for Tencent's service discovery and governance platform, RPC framework, application PaaS platform, and other projects.
Video playback address: www.infoq.cn/video/RXHG7… (This article is reprinted from InfoQ with permission)
Middleware evolution trends
InfoQ: We know that middleware is very important. In your opinion, what is the role of middleware in architecture? What is special about the middleware team and what they do compared to other teams?
Mr. Wang: We often see a layered diagram of the technology architecture, usually a four-tier model. At the bottom is the IaaS layer, the infrastructure; above that are databases and middleware; and at the top are the business applications. Put simply, the reusable technical components used in application-layer development, such as web servers, RPC frameworks, and message queues, can all fall into the category of middleware.
Compared with other areas such as databases, compute and storage, or networking, middleware work is special in a few ways.
The first, and most important, is that middleware sits between the application layer and the database or infrastructure layer, close to the business scenarios. For example, the development framework is a typical piece of middleware, yet there are many development frameworks in the industry, each with its own characteristics to fit different business scenarios.
This requires middleware developers to understand these business scenarios deeply and to abstract over them so that different scenarios can be served by one universal piece of middleware, which is a real challenge. Take routing and load balancing as an example: every business scenario routes differently. Simple cases use random or hash-based balancing, but some scenarios need to balance according to the load on the servers; a game scenario may route to machines based on the number of players on each; and some businesses may want to pin requests to a server based on a business-specific ID. A seemingly simple component ends up involving many business scenarios and adaptations, so designing a common solution is a big challenge for a middleware team.
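The variety of routing strategies described above can be sketched as a pluggable load balancer. This is an illustrative Python sketch, not any specific middleware's API; the strategy names and the `load` field are assumptions for the example:

```python
import hashlib
import random

class Instance:
    def __init__(self, address, load=0.0):
        self.address = address
        self.load = load  # e.g. reported CPU load or current player count

def pick_random(instances, key=None):
    # Simple random balancing: any instance may serve any request.
    return random.choice(instances)

def pick_by_hash(instances, key):
    # Hash-based balancing: the same business key (e.g. a user ID)
    # always lands on the same instance.
    digest = hashlib.md5(key.encode()).hexdigest()
    return instances[int(digest, 16) % len(instances)]

def pick_least_loaded(instances, key=None):
    # Load-based balancing: route to the least busy server,
    # e.g. the game server with the fewest players.
    return min(instances, key=lambda i: i.load)

# The "universal" middleware exposes one entry point; each business
# scenario just selects (or registers) a strategy.
STRATEGIES = {"random": pick_random, "hash": pick_by_hash, "least_load": pick_least_loaded}

def route(strategy, instances, key=None):
    return STRATEGIES[strategy](instances, key)

pool = [Instance("10.0.0.1", load=0.7), Instance("10.0.0.2", load=0.2)]
```

The point of the design is that the abstraction (pick one instance from a pool, optionally using a business key) stays fixed while scenario-specific policies plug in underneath.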
The second point is that much of middleware is coupled with business development. Whether it is a framework or a cache, middleware is embedded in business code, so businesses demand high performance and stability from it, and upgrades must not routinely introduce bugs.
The third point is that the middleware team's technology stack must be comprehensive. Because middleware is close to the business layer, and businesses may use different languages and development stacks, middleware has to cover all of those stacks to meet different businesses' needs.
InfoQ: For enterprises of different sizes — large, medium, and small — what middleware must they use when making their selection?
Wang Hongzhi: In the broad sense, every enterprise uses middleware. For example, developing a mini program requires a web server for its backend, and web servers like Tomcat are middleware too. Then, as users grow and business complexity increases, a distributed cache is added between the database and the application to improve access performance, introducing cache middleware like Redis. If requests come in traffic peaks, you can use a message queue to shave the peaks and fill the valleys.
As an enterprise grows a little larger, it may move to service-oriented development, using distributed services or microservice architectures. Some traditional enterprises in the past introduced an ESB, or enterprise service bus, which is a typical piece of middleware. Internet companies rarely use an enterprise service bus; instead they use a service registry plus a service framework to implement a distributed service architecture. All of this is the domain of middleware.
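The "peak shaving" role of a message queue can be illustrated with a bounded in-process queue — a toy Python sketch standing in for a real broker such as Kafka, not a production pattern:

```python
import queue
import threading

# A bounded queue absorbs a burst of requests (the "peak") while a
# single consumer drains them at its own steady rate (the "valley").
buffer = queue.Queue(maxsize=1000)
processed = []

def consumer():
    while True:
        item = buffer.get()
        if item is None:          # shutdown signal
            break
        processed.append(item)    # steady-rate processing happens here
        buffer.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# A traffic spike: 100 requests arrive almost at once. The producer
# only blocks if the buffer is full, so the backend behind the
# consumer never sees the raw burst.
for request_id in range(100):
    buffer.put(request_id)

buffer.join()      # wait until the backlog is drained
buffer.put(None)   # stop the worker
worker.join()
```

The database behind the consumer sees a smooth stream instead of the spike; a real broker adds persistence and fan-out on top of the same idea.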
In general, middleware is designed to help enterprises solve common technical problems encountered at different business scales. As your business grows from small to large, you run into more technical problems and progressively use more middleware.
InfoQ: What do you see as the overall direction of middleware evolution in recent years?
Wang Hongzhi: I think there are three major points in the evolution direction.
The first is a change in the way middleware is used, driven by cloud native. With the popularity of cloud native technology, businesses are gradually being containerized and broken into microservices, and this transformation requires middleware to adapt to how it is used in cloud native scenarios.
Different areas, such as development frameworks and service governance platforms, need to support large-scale service governance. Previously an architecture might not have been microservices but rather larger distributed modules with fewer services. After the microservice transformation, the service scale is much larger, and governance and distributed tracing analysis are needed, which puts new requirements on middleware.
The second is that open source products are attempting to standardize middleware. There is no unified standard for most middleware. Unlike container scheduling, where everyone treats Kubernetes as the default standard and selection is a non-issue — if you want containers, you use Kubernetes — middleware such as API gateways, service governance components, development frameworks, and message queues still has many implementations in the industry, both company-internal and open source, which makes selection difficult for users and increases learning and usage costs. In recent years a number of open source products have tried to standardize this; in service governance, for example, the service mesh, a hot topic in recent years, is an attempt at standardization.
The third is the cloud native evolution of the technical architecture itself. Middleware is also facing its own cloud native upgrade: with the advancement of cloud native technology, middleware needs to be transformed, for example to separate compute and storage on the server side, support stateless peer-to-peer deployment, and scale out quickly. This is where a lot of middleware has been evolving over the years.
InfoQ: What are the essential differences between cloud-native middleware and on-premise middleware?
Wang Hongzhi: Essentially, there is no difference. The functions middleware provides to users are the same whether it is deployed locally or cloud natively; if those do not change, there is essentially no difference.
If there are differences, there are two main ones.
The first point is that after the cloud native transformation, the deployment mode changes. Cloud native mainly means container-based deployment, which enables fast elastic scaling. This can be challenging for some middleware: stateless middleware is easy to transform, but some middleware is stateful. ZooKeeper, for example, uses a strongly consistent protocol and every node is stateful, so it cannot be moved to container deployment smoothly.
Second, after cloud native deployment, we can go further to a Serverless deployment architecture, so that users do not need to be aware of the physical resources behind the middleware and simply pay for what they use. In general, these are differences in deployment and usage, not differences in substance.
InfoQ: You mentioned earlier that the technology stack of middleware covers a wide range of areas. Can you briefly explain what the technology stack of middleware covers? What do you think it can be divided into?
Wang Hongzhi: Middleware currently has two cores, and this is how we divide Tencent Cloud's middleware products. One is middleware used for distributed or microservice development. This covers a lot — service development frameworks, registries, API gateways, distributed transactions — all used to solve problems in a distributed service architecture; we classify these as microservice middleware. The second category is message middleware, such as Kafka and Pulsar. These two categories currently have the highest recognition as middleware.
There is also some middleware that people no longer think of as middleware. The distributed cache Redis, for example, used to be considered middleware, but many enterprises now classify it as a NoSQL database handled by the database team. There are also database proxies used to implement sharding across databases and tables, and database transactions; many distributed database access layers can now do this. Other scenarios like big data also have a lot of middleware, which is generally categorized under big data.
Technical review and prospect of microservices
InfoQ: Starting from those two pieces, can you talk more specifically about the progress of microservice middleware in 2021? For example, is Istio still the most popular framework, what progress did it make in 2021, and what problems remain?
Wang Hongzhi: One of the most popular concepts in microservices in recent years is the service mesh, and Istio is one of its representatives.
In 2021, it went through a steady iteration overall. Istio underwent an architectural revision in 2020, such as scrapping the Mixer and merging some backend components; in 2021 the architecture was mainly stabilizing and getting ready for production, but some problems remain.
An inherent problem is that Istio takes a proxy approach to service discovery and governance, which intrudes on business traffic. It hijacks business traffic, which costs performance, and the proxy also consumes extra CPU and memory. Each service invocation effectively gains two proxy hops, which increases the complexity of the architecture and the cost of operation and maintenance. These problems slow Istio's adoption in production, so we still see few large-scale cases. In the landing cases we do know of, Istio was mostly heavily modified before it could work well in a large-scale production environment.
Now there are gradually some production cases, and some cloud services have begun to provide stable, standard offerings that people can adopt quickly, which will greatly help its later adoption.
InfoQ: After talking about Istio, we would like to talk about the changes to Service Mesh in 2021 in your view. Are there any new frameworks that are interesting to watch? Like does Dapr count?
Wang Hongzhi: Let’s start with Dapr. Dapr is a framework, but it is fundamentally different from a service mesh. First of all, Dapr defines a much broader scope. A service mesh addresses service discovery and governance; Dapr addresses the multi-runtime problem. In addition to service discovery and governance, Dapr tackles issues such as standardizing access to databases, message queues, and more.
Dapr is pretty much a standard middleware form: it is positioned to standardize almost all the middleware we use in application development. It can also be used together with a service mesh. For example, Dapr did not do service governance before, but now we see plans for it to work alongside a service mesh. This was indeed a hot direction last year, though it is still in an early stage. Tencent is also following it; we have an internal interest group for Dapr and are running gray-release experiments in some small businesses, but large-scale use is still some time away.
In terms of changes to the service mesh itself, there are two major ones. The first problem is mainly caused by the traffic proxying mentioned above. In 2021 the community tried to solve it with two main approaches: one is improving forwarding performance based on eBPF; the other is the Proxyless model proposed by the industry, the proxy-free mesh.
As mentioned above, the performance and resource loss of the mesh solution mainly comes from proxying traffic. In my opinion, proxying traffic is not the core of the mesh's positioning; its main purpose is to provide a standardized service discovery and governance solution. Since proxies introduce these losses, there is now talk of a proxyless service mesh, which provides standardized service discovery and governance capabilities through a lightweight SDK or agent.
In this case, requests do not need to go through proxies. One big development I saw in October is that gRPC plans to implement service discovery and governance directly, so requests can connect directly without going through a proxy like Envoy. Realizing service discovery and governance directly in the RPC framework is also the direction Tencent has been pursuing internally.
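What a proxyless, SDK-based approach means mechanically can be sketched as follows — an illustrative Python sketch under stated assumptions, not the Polaris or gRPC API; the registry contents, `DiscoveryClient` name, and `transport` callback are all hypothetical:

```python
import random

# Toy in-memory "registry"; in practice this is a server such as a
# Polaris-style control plane, with the SDK caching instances and
# watching for changes.
REGISTRY = {
    "order-service": ["10.0.0.1:8080", "10.0.0.2:8080"],
}

class DiscoveryClient:
    """Minimal proxyless-style SDK: the caller resolves and load-balances
    in-process, so the request goes directly to the target instance with
    no sidecar hop in the data path."""

    def __init__(self, registry):
        self.registry = registry

    def resolve(self, service):
        instances = self.registry.get(service)
        if not instances:
            raise LookupError(f"no healthy instances for {service}")
        return random.choice(instances)

    def call(self, service, request, transport):
        # transport is whatever RPC stack the framework embeds (HTTP,
        # gRPC, a private protocol); governance hooks such as retries
        # or circuit breaking would wrap this call.
        return transport(self.resolve(service), request)

client = DiscoveryClient(REGISTRY)
response = client.call("order-service", "ping",
                       transport=lambda addr, req: f"{req} -> {addr}")
```

The trade-off discussed in the interview is visible here: the SDK removes the extra network hops, but it must be embedded in (and upgraded with) every business framework.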
When designing its internal service discovery and governance scheme, Tencent also weighed a Sidecar proxy against a lightweight SDK. Given the performance and resource loss of passing through a Sidecar, the added latency is unacceptable under heavy traffic; and if the solution were used company-wide, the number of proxies would be huge and the extra resources consumed considerable.
So when we built the mesh solution inside Tencent, we gave priority to a multi-language, lightweight SDK, letting different business frameworks integrate the SDK directly to realize standardized service discovery and governance. This is the same idea as last year's proxyless service mesh approach. The scheme was open-sourced last September as the Polaris project; if you are interested, you can follow it on GitHub.
InfoQ: What is the status of service mesh development in China? Large companies like Tencent and Alibaba develop their own microservice frameworks; what is the product thinking behind them?
Wang Hongzhi: I am mainly familiar with Tencent, so let me start there. As just mentioned, we developed the Polaris service discovery and governance platform, which is similar to a service mesh. It offers two modes: a proxy-free mode in which the business framework integrates a lightweight SDK, and a Sidecar mode that supports Istio. The former is now used internally at large scale because it provides higher performance and does not require deploying Sidecars.
Tencent basically has its own development framework for each business line. Besides self-developed ones, some businesses also use open source gRPC or Spring Cloud. The company now hopes to gradually unify on tRPC, Tencent's new-generation framework.
If each business line's chosen development framework integrates the lightweight SDK, the cost is low: business developers do not touch the SDK directly. Since the business is already using the framework, with the SDK embedded inside it, the access cost is not much different from Sidecar, but without Sidecar's drawbacks.
For others, such as Alibaba, we have seen some public information: they also use both methods. Alibaba Cloud provides Istio-based products as well as Dubbo-based solutions for service discovery and governance, and Dubbo 3.0 plans to support the service mesh model directly.
InfoQ: A viewer asked via bullet comments whether Tencent’s internal microservice stack is unified.
Wang Hongzhi: This splits in two. The microservice stack includes many things, such as the development framework, registry, API gateway, and distributed transactions. Some of these components are unified, and some are not yet fully unified.
The registry is now largely unified. Registration and governance centers are unified on the Polaris platform, which should have had more than five million services by the second half of last year. Configuration management is likewise unified on our "Seven-Color Stone" (Qicaishi) configuration platform.
The development framework is not yet unified. In 2019 we built the tRPC framework, similar to gRPC with some optimizations of our own, and we have been promoting it within the company for the past two years. In the long term we hope to unify on this service framework, but it is not there yet, and many existing frameworks still coexist.
Because this part is coupled to business development, the more tightly coupled components are the harder they are to unify or migrate. The tightest is the development or service framework: even if a better, more powerful framework is created, replacing the old ones takes a long time.
InfoQ: What do you think about Proxyless?
Wang Hongzhi: There have been some community exchanges on this direction before. We started from the idea of sinking capabilities into the infrastructure, but a separate proxy per application has its own costs and overhead. Any architectural evolution must weigh costs against benefits. I think the service mesh's value is a standard solution for service discovery and governance; whether it uses a proxy is not the core point, just one way of doing it.
Let's look at the pros and cons of the proxy approach. Its benefit is being non-intrusive to development and able to evolve and upgrade independently, without business awareness. In practice, though, introducing a Sidecar does not make the whole process completely non-intrusive: there are still many things you need to embed into the framework, whether it is call-chain propagation or routing based on traffic-coloring (dye) keys.
Another point is that in large companies, business-layer applications are not written in isolation: whether teams use self-developed or open source frameworks, there are internal CI processes that can upgrade the framework automatically, so the framework and the SDK it depends on are released along with the business. This differs from upgrading a Sidecar: one is solved in the CI process, the other by operations.
If a business needs low latency and low performance loss, our selection leans toward the SDK solution, but we also provide a Sidecar solution internally. During the three years of Polaris adoption, businesses made their own choices, and in the end most chose to access via the framework and SDK. Sometimes an idea feels advanced and new but in fact imposes an extra migration burden on the business. As technical solutions evolve, the shortcomings of the Sidecar approach may be eliminated, and we can then look at it again.
InfoQ: We still have a lot of questions from the bullet comments. One viewer commented that "Mesh has a long way to go." Can you talk more about usability?
Wang Hongzhi: We have studied Istio carefully too. I understand the ease-of-use concern probably means that some of its rules are difficult to understand and manage. That part is relatively easy to solve: most businesses will not use it bare; you can add an encapsulation layer on top and provide visual configuration management on the platform for your business scenarios.
On the other hand, there may be problems with access, such as supporting multiple languages and technology stacks. Multi-protocol access is more troublesome: it only supports HTTP, gRPC, and raw TCP. Tencent has many other self-developed protocols, such as Protobuf-based or even binary protocols, so the access cost there is very high.
Another area is post-deployment operation and maintenance, including problem diagnosis, which is still lacking, so businesses need better observability and monitoring of the service mesh itself.
InfoQ: One more question for Service Mesh: How do you view the current technology maturity of Service Mesh?
Wang Hongzhi: Generally speaking, in terms of existing functionality, the technology has reached a usable level. In 2020 it was still very unstable, undergoing major structural adjustment: the Mixer was removed, which was a big transformation, multiple server-side components were unified into a single istiod, and these changes introduced incompatibilities.
But entering 2021, its versions are stable. If the inherent defects do not affect your business scenarios, it can be used in production. For example, for services that are not so latency-sensitive, or whose request invocation chains are not very long, going through a Sidecar adds only modest latency, so they can use it.
Technical review and outlook of message queues
InfoQ: Next let's talk about messaging systems. For the major ones — Kafka, Pulsar, RocketMQ — what were the key technology developments in 2021 in your view?
Wang Hongzhi: Kafka first. As you probably know, Kafka made a big change last year to remove its dependency on ZooKeeper, which added cost to Kafka's deployment and operations. Removing ZooKeeper simplifies the architecture and makes deployment easier, especially in the cloud. If you look at Kafka's plans, they also include compute-storage separation and hot-cold data separation.
RocketMQ also had a larger release last year, version 5.0, which likewise proposes a storage-separated architecture to make storage more dynamic. The client side was made more lightweight, moving some complex functions to the server side so that client updates are less likely to introduce bugs for the users who depend on them.
Pulsar has been the hottest messaging system of the past two years. Its main feature from the beginning was a compute-storage-separated architecture; now Kafka, too, is pursuing storage separation, perhaps inspired by Pulsar. Pulsar is architecturally sound but relatively new as message middleware. Last year was mostly about functional enhancement, such as support for transactional messages, and about ecosystem: Kafka is strong in the big data ecosystem, and Pulsar can likewise build upstream and downstream connections.
InfoQ: We already have a lot of bullet comments. One question: is there any large-scale practice within Tencent based on the Pulsar message queue?
Wang Hongzhi: Yes, there is. Tencent should be the first company in China to use Pulsar at large scale.
At present, Pulsar is widely used in our self-developed businesses and in big data. At the same time, Tencent Cloud also provides Pulsar as a cloud product, and many external enterprises have started to use it.
InfoQ: We were just talking about the evolution of middleware technology, what do you see as the major issues for messaging systems to adapt to cloud native?
Wang Hongzhi: We encountered this ourselves: whatever we provide on the cloud, Kafka or Pulsar, requires some cloud native deployment work. Kafka integrates compute and storage — its storage lives on the server nodes, so it is stateful and has to use Kubernetes's stateful deployment solution. Pulsar separates storage from compute, so the serving layer is basically stateless and can be scaled at will, which makes it best suited for container deployment. On the whole, the work is about the deployment plan.
InfoQ: We have collected community questions from developers asking what is the future of messaging middleware? Is there still room for message-oriented middleware?
Wang Hongzhi: There is definitely a lot of room for development. Message queues look simple, but they have a history of 20 or 30 years. I remember IBM had MQ in the 80s and 90s, deployed on hardware-software all-in-one machines, and some companies, like WeBank, have used those solutions more recently for their stability and application-level reliability. It was only around the 2000s that distributed MQs such as RabbitMQ and ActiveMQ appeared, followed more recently by Kafka and Pulsar.
Now these middleware face the question of sustainable development. One part is the cloud native adaptation just mentioned. The second is standardization: we have talked about several types of MQ — which scenario calls for which? What are the advantages of each? Or could one MQ fit many scenarios? Several solutions attempt this. Pulsar, for example, provides access plugins for multiple protocols at the access layer, so users can connect with Kafka, RocketMQ, or RabbitMQ clients, which amounts to standardization.
Beyond the relative core, there is ecosystem expansion, for example lightweight computing in big data scenarios, like Kafka's Streams and similar Functions-style stream computing.
There are also products going up toward the application layer, like EventBridge, an event bus. Its core is a message queue, and in the direction of event notification it can be made simpler, easier to use, and closer to the scenario. This is how message queues can be extended.
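The multi-protocol access layer described here — one broker speaking several MQ wire protocols over shared storage — can be sketched as a handler registry keyed by protocol. This is an illustrative Python sketch; real Pulsar protocol handlers are broker plugins (e.g. for the Kafka protocol), and the frame formats below are invented for the example:

```python
class Broker:
    """Toy broker core: one storage/dispatch engine, many protocol frontends."""

    def __init__(self):
        self.topics = {}    # topic name -> list of stored payloads
        self.handlers = {}  # protocol name -> frame translator

    def register_protocol(self, name, handler):
        self.handlers[name] = handler

    def publish(self, protocol, raw_frame):
        # Each handler translates its wire format into the broker's
        # internal (topic, payload) model; storage is shared, so clients
        # of different protocols interoperate on the same topics.
        topic, payload = self.handlers[protocol](raw_frame)
        self.topics.setdefault(topic, []).append(payload)

broker = Broker()
# Hypothetical translators: parse each protocol's frame into (topic, payload).
broker.register_protocol("kafka", lambda frame: (frame["topic"], frame["value"]))
broker.register_protocol("amqp", lambda frame: (frame["routing_key"], frame["body"]))

broker.publish("kafka", {"topic": "orders", "value": b"k1"})
broker.publish("amqp", {"routing_key": "orders", "body": b"a1"})
```

After both publishes, the two messages sit in the same underlying topic, which is the sense in which a multi-protocol access layer "standardizes" across client ecosystems.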
InfoQ: We have a question from the community about whether middleware will become systematized and standardized in the future. You mentioned that standardization is one future direction — what is the long-term development trend of middleware technology? The viewer adds that, as he understands it, middleware will certainly not disappear; he is asking about the long-term direction. What do you think?
Wang Hongzhi: First, on whether middleware can be completely systematized and standardized: I would personally love to see that happen. Standardization would be good news both for users making selections and for middleware R&D teams.
There are a lot of open source message queues out there — Kafka, RocketMQ, Pulsar — with several very active communities and strong user followings. And every message queue, whether Pulsar or Kafka, is planning to support access via different protocols, so each can be compatible with the others. Microservice components are even more numerous: across different languages, development frameworks may come to dozens. Tencent once counted 20 to 30 kinds of development frameworks inside the company, and there are many common open source ones in the industry as well. Because frameworks are relatively coupled with business development, standardizing this part is hard, so we may not see such standardization in the short term.
As for disappearing — middleware is definitely not going away. The many components we have just discussed are things we must use when developing applications; only if development no longer needed them would they disappear.
There are three main long-term trends. The first is cloud native: adapting to cloud native scenarios to support cloud native application development. The second is that each community wants its project to become the standardized component in its area. The third is how middleware itself achieves cloud native, including Serverless, so it can serve users more conveniently and at lower cost.
Career planning and advice
InfoQ: Do you have a recommended path for those who want to get started as middleware engineers?
Wang Hongzhi: There are many kinds of middleware. To get started, first sort out a mind map: summarize the components within the scope of middleware to get a general picture of the overall system, then choose one you are interested in, or start with the most commonly used. Open source is particularly hot now, and most middleware has relatively good open source implementations; many organizations use and evolve open source solutions, so look for well-known open source implementations to learn from. You can also read large companies' practice articles on these middleware to see the problems they encountered during adoption.
InfoQ: Also on the growth path: what is the career path or development direction for middleware engineers? This is also a question from the community.
Wang Hongzhi: There are three main directions. The first is to keep going deep in middleware technology and become a technical expert in the area. Second, middleware is closely coupled with business architecture, and doing middleware design well requires a deep understanding of business architecture built up over a long time, which can lead to becoming an architect. Third, after long-term R&D and with enough experience, one can move into R&D management positions.
Q&A session
InfoQ: We also have the issue of “bullet screen”, which aspects does Tencent standardize service governance?
Wang Hongzhi: This has something to do with the current situation. Before Tencent standardized, there were many micro-service systems within the company, with different registration centers, different configuration centers and different development frameworks, which were all separated.
This brings a problem. With the adjustment of some internal business structures, including business cooperation, different microservice systems will often use different registries, different frameworks and protocols. In this case, to do interoperability, some compatibility development work must be done. This work, quite meaningless, will waste a lot of time, reduce the efficiency of research and development. Some business side in addition to the registration of the way to provide services, but also a separate domain name to provide services; Or you may need to interconnect with multiple registries, and an application may interconnect with several registries. So in this case, we need to start standardization.
The biggest problem in this work is how to migrate existing businesses seamlessly. Business teams expect migration to be non-intrusive: they hope standardization can proceed without changes at the business-code layer, because if the code-level cost is too high, they might as well keep doing things the old way.
Given that the existing stock of systems could not be touched, the most critical piece of standardization was the registry. We built a new registry, "Polaris," with a two-way sync to the original registries: services can migrate over without code changes and still reach services in the old registries, and existing services registered in the old registries are also visible in Polaris. This two-way connection solved the registry-consolidation problem and established a standard on top of which more governance could be done. The second point is that as we moved further into distributed services, we added discovery and governance functions that the old registries never had.
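The two-way connection described above can be sketched as a simple synchronization loop that makes each registry see the union of instances. This is a toy illustration of the idea only, not the actual Polaris implementation; the registry shapes, service names, and addresses are all invented for demonstration.

```python
# Toy sketch of two-way registry synchronization. All names and data
# structures here are assumptions for illustration, not the real design.

def sync_bidirectional(legacy: dict, polaris: dict) -> None:
    """Copy service instances missing on either side to the other, so
    services registered in the legacy registry are discoverable through
    Polaris and vice versa, without any business-code changes."""
    for service, instances in legacy.items():
        polaris.setdefault(service, set()).update(instances)
    for service, instances in polaris.items():
        legacy.setdefault(service, set()).update(instances)

# Hypothetical example data: one service split across both registries.
legacy_registry = {"order-svc": {"10.0.0.1:8080"}}
polaris_registry = {
    "order-svc": {"10.0.0.2:8080"},
    "user-svc": {"10.0.1.1:9000"},
}
sync_bidirectional(legacy_registry, polaris_registry)
# After syncing, both registries hold the union of instances, so callers
# on either side can resolve each other during a gradual migration.
```

In practice such a sync also has to handle health status, deregistration, and conflict resolution; the sketch only shows why a two-way connection lets old and new registries coexist during migration.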
Once consolidated into Polaris, we packaged these capabilities as a lightweight SDK. If a business has a real pain point and needs a governance feature, the team embeds the feature into the framework they use and releases a new framework version; services pick it up by upgrading the framework, and new services get these capabilities when they go live. For other existing businesses that have been running stably for a long time and have no reason to change, nothing needs to move. The whole process achieves smooth migration, and ultimately standardization.
InfoQ: Does it make sense to interconnect registries if different systems are running on different networks rather than in the same VPC?
Wang Hongzhi: There are two aspects to this. On one hand, if the systems were not originally in the same VPC, their services may not be able to call each other at all, because the networks are not connected; from a service-architecture perspective, interconnecting the registries then has little point. It depends on whether such an evolution is expected: if there will be a need to call across different VPCs, open it up; if not, there is no need. On the other hand, look at it from an operations perspective across different business architectures: is maintaining separate data centers expensive for you? Do you want to manage all services from one unified management console? If you have that kind of management demand, interconnecting also makes sense.
InfoQ: The next question is also from the bullet-screen comments: for distributed tracing and monitoring of microservices, does Tencent develop its own, or use something else?
Wang Hongzhi: Tencent's monitoring system is not unified company-wide; it is unified at the BG (business group) level. Monitoring data, including log data, is sensitive; for example, some business monitoring data should not be visible to other departments or other BGs, so monitoring is unified per business group, and each group's solution differs slightly. Tencent Cloud, as one BG, builds on open source and supports Prometheus and SkyWalking. The monitoring standard supported on the client side is OpenTelemetry, which covers access including log monitoring and is compatible with the open-source ecosystem. Users on the cloud can use open-source access solutions, though some solutions are self-developed: the backend may be self-developed, but the access layer increasingly supports standard monitoring access like Prometheus and SkyWalking.
InfoQ: Another question from the bullet-screen comments: without a Sidecar, how can the SDK stay lightweight?
Wang Hongzhi: Let's look at it from two dimensions. One is the physical footprint: if the functionality is very complicated and the package is large, you might consider it heavy. The other is usability: if implementing a governance function means calling very complex APIs, say seven or eight of them for one feature, it may feel heavy to use.
So the lightness of Polaris is along these same two dimensions. The first is the complexity of the overall functionality, including the packages it references, which is relatively low. Governance is not heavyweight: it is mainly routing, load balancing, circuit breaking and rate limiting, which are not very complicated. Unlike a Sidecar, the SDK does not have to take over traffic, so there is no extra layer of traffic forwarding. A pure governance function is not that complex in itself, so the implementation, and the SDK system as a whole, stays relatively lightweight.
In the API design, we try to make standardized routing and governance easy to use. Load balancing and circuit breaking, for example, are provided through a single API such as getOneInstance. By default, it picks the most appropriate instance of the target service to call: if routing rules are set, it first applies routing, such as rule-based routing or nearest-zone routing, and then performs load balancing according to your configuration. By the time an IP address is selected, failed instances have already been excluded, because a circuit-breaker policy runs behind it: it probes and removes failed nodes and returns a healthy, usable IP. From the user's point of view, it is just one API call, so it is relatively lightweight to use. Overall, the SDK is relatively lightweight.
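The "one call does everything" pipeline described above can be illustrated with a toy client. This is a simplified sketch of the pattern (health filtering, then routing, then load balancing), not the real Polaris SDK; the class name, instance fields, and rules are all assumptions.

```python
import random

# Toy illustration of a getOneInstance-style API: one call that combines
# circuit breaking, routing, and load balancing. Simplified assumptions,
# not the actual Polaris SDK interface.

class ToyDiscoveryClient:
    def __init__(self, instances):
        # Each instance is a dict: {"ip": str, "zone": str, "healthy": bool}.
        self.instances = instances

    def get_one_instance(self, caller_zone=None):
        # 1. Circuit breaking: drop instances already marked unhealthy,
        #    so the caller never sees a failed node.
        candidates = [i for i in self.instances if i["healthy"]]
        # 2. Routing: prefer instances in the caller's zone ("nearest route"),
        #    falling back to all healthy instances if the zone is empty.
        if caller_zone:
            nearby = [i for i in candidates if i["zone"] == caller_zone]
            if nearby:
                candidates = nearby
        # 3. Load balancing: pick one candidate (random here; a real SDK
        #    would support weighted or round-robin policies).
        return random.choice(candidates) if candidates else None

client = ToyDiscoveryClient([
    {"ip": "10.0.0.1", "zone": "gz", "healthy": False},
    {"ip": "10.0.0.2", "zone": "gz", "healthy": True},
    {"ip": "10.0.0.3", "zone": "sh", "healthy": True},
])
chosen = client.get_one_instance(caller_zone="gz")
```

With the hypothetical data above, the unhealthy `10.0.0.1` is filtered out first, zone routing narrows the pool to `gz`, and load balancing then has only `10.0.0.2` left, so the caller gets a healthy, nearby instance from a single call.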
InfoQ: A new question from the bullet-screen comments: if the company does not have an SDK and services are called directly via HTTP domain names, is it recommended to go straight to the Sidecar (Service Mesh) mode?
Wang Hongzhi: You need to evaluate your business characteristics. Load testing will tell you whether you are sensitive to latency. In a version we tested in Q3 last year, a proxy hop added a few milliseconds at P99 and P90. Whether that matters depends on how long your call chain is and how sensitive you are to added latency; if you are not sensitive, it is fine to adopt, at least from the business-architecture perspective. However, for high-throughput streaming scenarios such as live video, it may not be suitable, because the added latency has a big impact there; outside that kind of scenario it may be fine.
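As a rough illustration of that kind of evaluation, one can compute percentile latencies from load-test samples and compare the direct and proxied paths. The samples and the assumed 2 ms sidecar overhead below are invented for demonstration; real numbers come from your own load tests.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered)) - 1  # nearest-rank method
    return ordered[max(0, rank)]

# Hypothetical latency samples (ms) from a load test of direct calls.
direct = [1.0, 1.1, 1.2, 1.3, 5.0]
# Assumption for illustration: the sidecar proxy adds ~2 ms per hop.
proxied = [d + 2.0 for d in direct]

p99_direct = percentile(direct, 99)
p99_proxied = percentile(proxied, 99)
p99_overhead = p99_proxied - p99_direct
```

If the measured P99 overhead is small relative to your end-to-end latency budget, and your call chains are short, the mesh's per-hop cost is likely acceptable; for latency-critical streaming paths it may not be.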
Second, it depends on your company's business scale. If the scale is large, Service Mesh mode requires the operational capability to run Sidecars: with tens of thousands or hundreds of thousands of nodes, that means a comparable number of Envoy sidecars on them. If your operations infrastructure can take on running Sidecars at that scale, then it is OK.
InfoQ: Thank you very much for your answers. Due to time constraints, today's live broadcast is coming to an end. Thank you all for watching, and thank you for the wonderful sharing.