Beyond classic frameworks such as Spring Cloud, Service Mesh technology has been quietly emerging while microservices are in full swing. What exactly is a Service Mesh, what does it bring, and what does it change? This article is compiled from a talk given by Ao Xiaojian, senior architect at Shurenyun, at QCon Shanghai 2017.
Let’s quickly review how microservices have evolved over the past three years. In that time, microservices have become a hot topic in our industry, and we have seen a large number of Internet companies adopt microservice architectures. Many traditional enterprises are also going through an Internet technology transformation, with microservices and containers at its core.
One big trend in this transition is that the Spring Cloud development framework has become very popular along with the microservices boom. Beyond Spring Cloud, a new generation of microservice technology is quietly emerging: the Service Mesh (literally “service grid” in its Chinese translation).
Let me do a quick survey: if you had heard of Service Mesh before today, please raise your hand. (Note: only 3 hands went up among several hundred people.)
Today’s talk has three parts. First, what is a Service Mesh? Then I will walk through the evolution of Service Mesh. Finally, I will explain why we chose Service Mesh and why I call it the next generation of microservices.
What is a Service Mesh?
To begin with, Service Mesh is a very, very new term, and most of you have not heard of it.
The term was first used internally at Buoyant, the company that develops Linkerd, and was first used publicly on September 29, 2016. In 2017, as Linkerd was introduced here, the term entered the Chinese technical community. Its first Chinese translation, roughly “service engagement layer”, was a mouthful; a few months later the translation changed to “service grid”. I will show you later why a mesh is the right image.
Take a look at the definition of Service Mesh given by William Morgan, CEO of Buoyant. Linkerd was the industry’s first Service Mesh, and its creators coined the term, so this definition is fairly official and authoritative.
A service mesh is an infrastructure layer that handles communication between services and is responsible for the reliable delivery of requests. In practice, it is typically implemented as a set of lightweight network proxies that are deployed alongside applications but are transparent to them.
This definition may seem a little empty when you look at the text, and it’s not easy to understand what it is. Let’s look at something concrete.
Take a single request. The client application instance, as the initiator, first sends the request in a simple way to its local Service Mesh instance. The two are separate processes, and the call between them is a remote call.
The Service Mesh instance then handles the whole service-to-service call, including service discovery and load balancing, and finally sends the request to the target service. This deployment form is known as a Sidecar.
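The flow just described can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Sidecar` class, its in-memory registry, and the round-robin policy are all hypothetical illustrations, not the API of Linkerd or any other real proxy.

```python
import itertools


class Sidecar:
    """Toy stand-in for a local Service Mesh instance (hypothetical)."""

    def __init__(self, registry):
        # registry maps a service name to its known instance addresses.
        # A real mesh would obtain this from a service-discovery system.
        self._registry = registry
        self._cursors = {}

    def pick_instance(self, service):
        """Service discovery plus round-robin load balancing."""
        instances = self._registry.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        cursor = self._cursors.setdefault(service, itertools.cycle(instances))
        return next(cursor)

    def forward(self, service, path):
        """Return the URL the request would be forwarded to."""
        return f"http://{self.pick_instance(service)}{path}"


sidecar = Sidecar({"user-service": ["10.0.0.1:8080", "10.0.0.2:8080"]})
print(sidecar.forward("user-service", "/users/42"))
print(sidecar.forward("user-service", "/users/42"))
```

The application only names the service it wants; choosing an instance is entirely the sidecar's job.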
The word Sidecar has more than one Chinese translation, including a rather rustic one. The Sidecar pattern itself has been around for a long time: it adds a proxy between an existing client and server.
When many services call one another, we can see the Service Mesh sitting underneath all of them. This layer is the dedicated infrastructure layer for inter-service communication. The Service Mesh takes over the network and forwards all requests between services. The services above are no longer responsible for the mechanics of delivering requests; they only handle business logic. Inter-service communication is stripped out of the application and presented as an abstraction layer.
With a large number of services, the mesh becomes visible. In the figure, the green boxes on the left are applications, the blue boxes on the right are Service Mesh instances, and the blue lines are the calls between services. The connections between Sidecars form a network, which is where the name comes from. At this point the proxies look different from isolated Sidecars: together they form a mesh.
To review the definition, go back to its four key phrases. First, the service mesh is an abstraction: it pulls an infrastructure layer out of the application. Second, its function is the reliable delivery of requests. Third, it is deployed as lightweight network proxies. Fourth, it is transparent to the application.
In the diagram above, the network may not be particularly obvious. But if you hide the applications on the left and show only the Service Mesh instances and the calls between them, the picture becomes very clear: it is a complete network. This is a key point in the definition and the difference from Sidecar. Proxies are no longer viewed as separate components; what matters is the network that their connections form. Service Mesh emphasizes the network of connected proxies, whereas Sidecar emphasizes the individual proxy.
Now that we have defined a Service Mesh, you should have a general idea of what a Service Mesh is.
Evolution history of Service Mesh
The second part traces the evolution of Service Mesh. Note that although the term Service Mesh didn’t exist until September 2016, it describes something that has been around for a long time.
Let’s start with “ancient times”, the first generation of networked computer systems. In the earliest days, developers had to handle the details of network communication in their own code, such as packet ordering and flow control, so network logic and business logic were mixed together. Then came TCP/IP, which solved problems such as flow control. As the diagram on the right shows, the functionality itself did not change: it all still exists, and the code still has to be written somewhere. The important difference is that flow control has been pulled out of the application. Compare the left and right pictures: it has become part of the operating system’s network layer, namely TCP/IP, and the structure of the application becomes simpler.
Today, when we write an application, we do not have to think about how data gets sent through the network card. Since TCP/IP, this has been taken completely for granted. All of this happened a long time ago, roughly 50 years.
The microservice era faces something similar. When we build microservices we have to deal with a series of fairly basic concerns: service registration, service discovery, load balancing once the server instances are known, circuit breaking and retries to protect the server, and so on. No microservice can escape these concerns, so what do we do? We write code that puts all of them in. The earliest microservices looked just like those early network programs, with a lot of non-functional code added to the application. To simplify development we started using class libraries, typically the Netflix OSS suite. With a library, the developer’s coding problem is largely solved: only a little code is needed to get these functions. This is why, in recent years, Spring Cloud has spread so quickly in the Java community that it has become almost synonymous with microservices.
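To give a feel for the kind of non-functional code that ends up inside an application, here is a minimal sketch of retries combined with a circuit breaker. It is an illustration only: `CircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical names and values, and real libraries such as those in the Netflix OSS suite are considerably more elaborate.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls fail fast."""


class CircuitBreaker:
    """Minimal retry-plus-circuit-breaker wrapper (illustrative only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, retries=2):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; failing fast")
            # Cool-down has passed: close the breaker and try again.
            self._opened_at = None
            self._failures = 0
        last_error = None
        for _ in range(retries + 1):
            try:
                result = func(*args)
            except ConnectionError as exc:
                last_error = exc
                self._failures += 1
                if self._failures >= self.max_failures:
                    # Too many consecutive failures: open the breaker.
                    self._opened_at = self._clock()
                    break
            else:
                self._failures = 0
                return result
        raise last_error
```

Every service that talks to another service needs something like this, in every language the company uses, which is exactly the burden the rest of the talk is about.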
Having got this far, is everything perfect? Of course not. If it were, I would not be standing here today.
Let’s look at the pain points. The first is that there is a lot to learn and the barrier to entry is high. How long does it take to master Spring Cloud well enough to use it in your products and solve the problems you run into? Is one week enough? For most people it is not; three to six months is more typical, because when you actually put it into production all kinds of problems appear, and solving them yourself takes a long time. Listed here are the common Spring Cloud subprojects, and only the most common ones. Spring Cloud Netflix alone also contains many parts of the Netflix OSS suite. To really master Spring Cloud you need to understand all of these, or you will still be miserable when problems come up.
That is a lot of material. A quick learner might get through it in a month, but the real question is how long it takes your whole development team, especially a business development team. This is a critical issue, because business teams often include many junior colleagues.
But it’s not as simple as that. We have to face a lot of reality.
First, what is our business development team strongest at? Is it technology? No. Usually its greatest strength is understanding the business and knowing the whole business system well.
Second, what is the core value of a business application? We work so hard to write all these microservices, but is the goal to have microservices? Microservices are only a means. What we ultimately need to deliver is the business, and that is the real goal.
Third, even taking microservices as a means, there are bigger challenges than learning a microservice framework. You appreciate this more deeply once you put microservices into production: how to split a system into microservices, how to design a good API that is stable and easy to extend, and how to keep data consistent across multiple services are all headaches for most teams. Finally, there is Conway’s law. Everyone who does microservices eventually runs into this ultimate problem, and most of the time there is little you can do but despair.
But that is not all. The only thing more painful than writing a new microservice system is migrating an old system to microservices.
If all of this is not enough, add one more, and this one is even more deadly: business development teams are usually under business pressure and never have enough time or people. If the launch is set for next month, it launches next month; a Double Eleven promotion will not be pushed back to Double Twelve. Your boss does not care whether you have time to learn Spring Cloud, or whether your business team can handle every aspect of microservices. The business only cares about results.
The second pain point is insufficient functionality. Listed here are the common service governance features. Spring Cloud’s governance capabilities are not strong enough: if you want to do all of these well, what Spring Cloud provides out of the box is far from sufficient, and many of the features have to be built on top of Spring Cloud yourself.
The question is how much time and how many people you are going to devote to that. Some people say they will simply skip certain features, such as gray release, and go straight to production, but the cost of doing that is quite high.
The third pain point is cross-language support. When microservices first appeared, they came with an important promise: different microservices could be written in whatever programming language the team knew best, liked best, or found most appropriate. That promise has been only half kept. In practice microservices are usually implemented on top of a class library or framework, and once you start writing code in a specific programming language, things go wrong. Why? On the left I have listed the mainstream languages from a programming language ranking, with the more familiar ones at the top; dozens more are not listed. In the middle are emerging languages, which are a bit more niche.
The question now is how many languages do we need to provide libraries and frameworks for?
The problem is acute, and there are usually only two ways to solve it:
One is to unify the programming language, so that the entire company uses a single language.
The other is to write the class library once for every programming language in use.
I’m sure that any of you who do infrastructure have encountered this problem.
But the trouble does not end there. Suppose the framework has been written, and you can even afford to write a copy for each language. Then comes the fourth pain point: version upgrades.
Your framework cannot be perfect from the start, with every feature and no bugs, never needing to change. There will be a 1.0, a 2.0 and a 3.0, each adding features and fixing bugs. But once a version has been distributed to users, will they upgrade immediately? In practice they will not.
This leads to mismatched client and server versions, so you have to maintain compatibility very carefully and keep urging your users: I am already on 3.0, please stop using 1.0 and upgrade. If they do not, you just have to put up with it and keep working on version compatibility.
How complex is version compatibility? There may be hundreds of server applications and thousands of clients, each possibly on a different version. That is a Cartesian product. And remember the programming language problem: you have to multiply by N, the number of languages, as well.
Imagine a Java client on version 1.0 of the framework calling a Node.js server on 3.0, or a C++ client on 2.0 calling a Go server on 1.0. Do you want to run every compatibility test? It is almost impossible to say how many cases such a test suite would need.
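A small worked example makes the size of this matrix concrete. The four languages and three versions below are assumptions chosen only to match the examples in the text; real numbers would be larger.

```python
from itertools import product

# Assumed for illustration: four languages, three framework versions.
languages = ["Java", "Node.js", "C++", "Go"]
versions = ["1.0", "2.0", "3.0"]


def compatibility_pairs(languages, versions):
    """Every (client, server) pairing of language and framework version."""
    endpoints = list(product(languages, versions))
    return list(product(endpoints, endpoints))


pairs = compatibility_pairs(languages, versions)
print(len(pairs))  # (4 * 3) ** 2 = 144
```

Even this tiny setup yields 144 client/server combinations, and each added language or version grows the count quadratically.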
So what can we do? These are real problems, and we have to face them.
Let’s think it through:
First, what is the root cause of these problems? All this pain, all these problems and challenges: do they have anything to do with the service itself? If we write a user service that does CRUD operations on users, does that have anything to do with any of this? The problem is not in the service itself but in the communication between services, and that is what we need to fix.
Second, what is our goal? All of the effort above is there to make sure that a business request from the client is sent to the right place. What is the right place? For example, when versions differ, should the request go to version 2.0 or 1.0? What kind of load balancing is needed? Should a gray release be applied? All of these considerations come down to getting the request to the place it needs to go.
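As a rough illustration of "sending the request to the right place", the sketch below routes a fixed share of users to a new version, which is one simple way to do a gray release. The function name, the version labels, and the choice of a CRC32 hash for bucketing are all assumptions made for this example.

```python
import zlib


def choose_version(user_id, gray_percent):
    """Send roughly gray_percent% of users to 2.0 and the rest to 1.0.

    Hashing the user id keeps each user on the same version across
    requests, so the business semantics of the request never change.
    """
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return "2.0" if bucket < gray_percent else "1.0"


# With 10% gray traffic, most users stay on 1.0.
for user in ["alice", "bob", "carol"]:
    print(user, choose_version(user, gray_percent=10))
```

Note that the request itself is untouched; only its destination is decided here.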
Third, the nature of the matter. The request itself never changes during the whole process. Take the user service again: it does CRUD on users, and no matter how the request is routed, its business semantics stay the same. That is the invariant.
Fourth, the problem is highly universal: it is the same for every language, every framework, every organization and every microservice.
At this point you may have a feeling that this problem resembles another one.
What problem did our predecessors have to solve 50 years ago? Why does TCP exist, what problem does it solve, and how?
TCP solves a similar problem: delivering data to the right place. The four points above apply to all network communication that uses TCP.
What happened once we had TCP? When we develop an application on top of TCP, does the application need to care about how the link layer below TCP is implemented? It does not. Similarly, when we develop an application on top of HTTP, does it need to care about the TCP layer?
Then why do we care so much about the communication layer when we develop microservices? We learn everything about the service communication layer and implement it ourselves. Why do we take on so much?
Naturally, another thought arises: since the network access stack could be pushed down into TCP, can the microservice stack be pushed down in a similar way?
Ideally, we would add a microservice layer to the network protocol stack to do this. However, because of standardization issues, no such layer exists today, so this is not realistic for now. A microservice network layer may of course appear in the future.
Before that, some pioneers tried proxy-based solutions, commonly Nginx, HAProxy, Apache and similar proxies. These proxies have little to do with microservices, but they offered an idea: insert something between the server and the client that performs certain functions, so the two do not communicate directly. The functions of such a proxy are very basic, though. Developers looked at it and thought: the idea is good, but the functionality is not enough, so what now?
This is how the first generation of Sidecar emerged. A Sidecar plays a role similar to a proxy, but its functionality is far more complete: basically, whatever the original microservice framework implemented on the client side, the Sidecar implements as well.
The first generation of Sidecars mainly came from the companies listed here, the most famous of which is Netflix.
I want to mention the fourth one in particular. The first three are all foreign companies, but Sidecar is not something only foreigners were doing. When I worked in the infrastructure department at Vipshop, in the first half of 2015 we made a major architectural change to our OSP service framework and added a Sidecar called Local Proxy. Note the date, the first half of 2015, which is about the same time as the foreign companies. I believe there were other similar products in China that simply were not known to the outside world.
Sidecars of this era had limitations. They were designed for a specific infrastructure, usually built directly on the existing infrastructure and framework of the company that developed them. Among the many limitations, the biggest was that they were not general-purpose: you could not take one out and give it to someone else. For example, Airbnb’s required ZooKeeper, Netflix’s required Eureka, and Vipshop’s Local Proxy was bound to the OSP framework and other infrastructure.
These bindings come mainly from the original motivation behind each Sidecar. Netflix wanted non-JVM applications to plug into Netflix OSS; SoundCloud wanted legacy Ruby applications to use its JVM infrastructure. Vipshop’s Local Proxy for the OSP framework was meant to solve access from non-Java languages and the business departments’ reluctance to upgrade mentioned earlier. These problems were troublesome but had to be solved, and Sidecar emerged as the solution.
Because of this background and these requirements, the first generation of Sidecar could not be general-purpose: it was built on top of an existing system. And although it could not be extracted, it worked well inside that system, so there was no motivation to separate it. As a result, although quite a few companies had Sidecars early on, they never spread widely, because even when released, others could not use them.
One more thing: in mid-2015 we had the idea of separating Local Proxy from OSP and turning it into a general-purpose Sidecar. The plan was to support HTTP/1.1 and operate only on HTTP headers, leaving the body opaque to the proxy, which would make it easy to adopt. Unfortunately it was never built, for priority reasons among others: there was a lot of other work, such as various business migrations, and this need was not pressing enough.
It is a pity we did not realize the idea. That was 2015, which was very early. Had we built it, we might well have created the industry’s first Service Mesh ourselves. Looking back, I still regret it.
But we were not alone. Other people had the same idea, and fortunately they had the opportunity to build it. The result was the first generation of Service Mesh: the general-purpose Sidecar.
The first one, on the left, is Linkerd, the industry’s first Service Mesh and the project that coined the term. Version 0.0.7 was released on January 15, 2016; it is the earliest version visible on GitHub, and it is very close in time to when we had our own idea. Version 1.0 was released in April 2017, only six months ago. So Service Mesh really is a very new term, and it is perfectly normal if you have not heard of it.
Next is Envoy, whose version 1.0 was released in 2016.
It is worth noting that both Linkerd and Envoy have joined the CNCF: Linkerd in January this year and Envoy in September, just a month ago. Everyone here understands the CNCF’s standing in Cloud Native, right? You could say it is comparable to the standing of the United Nations in the international order after World War II.
The third Service Mesh is Nginmesh, from the familiar Nginx. Its first version was released in September 2017. Because it is so new and only just getting started, there is not much to say about it yet.
Let’s look at the differences between Service Mesh and Sidecar.
First, Service Mesh no longer views proxies as individual components; it emphasizes the network formed by their connections.
Second, Service Mesh is a general-purpose component.
Third, a Sidecar is optional and direct connections are allowed. In a typical development framework, clients written in the framework’s native language connect directly, while other languages go through the Sidecar. For example, with a framework written in Java, Java clients connect directly and PHP clients go through the Sidecar. A framework can also choose to route everything through the Sidecar, as Vipshop’s OSP does with Local Proxy for all languages, but in the Sidecar model that remains a choice. Service Mesh, by contrast, requires full control of all traffic: every request must go through the Service Mesh.
Now let me introduce Istio, which I would describe as having the bearing of a king. It comes from Google, IBM and Lyft, and it brings together the best of Service Mesh so far.
Its icon is a sailboat. Istio is a Greek word meaning “sail”. What does the name have to do with the icon? Kubernetes (K8s), Google’s other phenomenal product of the cloud era, also takes its name from Greek, meaning captain, pilot or helmsman, and its icon is a ship’s wheel.
So Istio’s name and icon follow the same line as K8s. Istio released 0.1 in May 2017 and 0.2 just two weeks ago, on October 4. Everyone here knows software development and understands what 0.1 and 0.2 mean in an iteration: 0.1 is roughly a newborn baby, and 0.2 is not yet weaned. Yet even at this early stage, I already rate it as the culmination of the field, with the bearing of a king. Why?
Why the bearing of a king? Most importantly, Istio brings unprecedented control to Service Mesh. A Service Mesh deployed in Sidecar mode handles all traffic between services, so whoever controls the Service Mesh controls all traffic, and therefore every request in the system. For this, Istio adds a centralized control plane.
On the left is the single-instance view: a control plane is added alongside the Sidecar to control it. That view does not make the point very clearly, but the diagram on the right, with many services, gives a better feel for it. Across the whole network, all traffic is under the control of the Service Mesh, and every Service Mesh instance is under the control of the control plane. Istio’s biggest innovation is that the entire system can be controlled from the control plane.
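The idea of one central point controlling every proxy can be sketched as follows. This is a deliberately simplified model: `ControlPlane`, `SidecarProxy`, and the rule format are hypothetical and do not reflect Istio's actual components or configuration schema.

```python
class SidecarProxy:
    """A data-plane proxy that simply holds the rules it was given."""

    def __init__(self, name):
        self.name = name
        self.rules = {}

    def apply(self, rules):
        # Copy so that later changes only arrive via an explicit push.
        self.rules = dict(rules)


class ControlPlane:
    """Central place where rules are set and pushed to every sidecar."""

    def __init__(self):
        self._sidecars = []
        self._rules = {}

    def register(self, sidecar):
        self._sidecars.append(sidecar)
        sidecar.apply(self._rules)  # new sidecars start with current rules

    def set_rule(self, service, rule):
        self._rules[service] = rule
        for sidecar in self._sidecars:
            sidecar.apply(self._rules)


control_plane = ControlPlane()
proxies = [SidecarProxy("orders"), SidecarProxy("payments")]
for proxy in proxies:
    control_plane.register(proxy)

# One change at the control plane reaches every proxy in the mesh.
control_plane.set_rule("user-service", {"timeout_seconds": 2})
print([proxy.rules for proxy in proxies])
```

Because every request passes through a proxy, changing a rule in one place changes how the whole system behaves, without touching any application.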
Istio is developed by three companies. The first two are formidable: Google and IBM, both of which run cloud platforms. Google’s cloud platform, GCP, in particular needs no introduction. This is what a distinguished pedigree looks like.
Istio is very strong, and I have given it a lot of praise here: its design is innovative, creative, bold, ambitious and broad in vision. The strength of the Istio team is also striking; if you have time, look through the list of Istio committers and see for yourself. Istio is also Google’s new heavyweight product and could become the next phenomenon. What is Google’s current phenomenal product? K8s. Istio may well be the next product at the K8s level.
As for timing, what is the trend of the times? What era are we in today? It is the era in which Internet technology is being adopted on a large scale, the heyday of microservices and containers, and the era of Cloud Native. It is also the era in which traditional enterprises transform themselves with Internet technology. Enterprise users all want to transform; the trend is obvious, and everyone is either transforming or preparing to. But they suffer from congenital deficiencies: no genes for it, no capability, no experience, no talent, and all the pain points we just discussed. So Istio has arrived at exactly the right time. And do not forget that Istio also has the CNCF and K8s, which is about to dominate, behind it.
After Istio was released, the community responded enthusiastically and many players followed. Of the few Service Meshes on the market, Envoy willingly serves as Istio’s underlying layer, while the other two, Linkerd and Nginmesh, have given up competing with Istio head-on and chosen to cooperate and actively integrate with it. Many big names in the community, listed here, responded immediately, integrating with Istio or building their own products on it. Why do I say immediately? Because Istio had only just released 0.1 when they picked a side.
Istio’s architecture has two main parts. At the bottom is the data plane, which is the traditional Service Mesh. Today it is Envoy, but as mentioned earlier, Linkerd and Nginmesh are both integrating with Istio and can take Envoy’s place as the data plane.
The other major part is the control plane at the top, which is what Istio really adds. It consists of three main components; in the figure I list their responsibilities and the functions they provide.
Istio Official Documentation Chinese translation (
To sum up, Service Mesh has evolved step by step from the original proxy, to the limited Sidecar, to the general-purpose Service Mesh, and then to Istio with enhanced management capabilities, which will become the next generation of microservices in the future.
Note that it is only a year since the term Service Mesh was coined.
Why Service Mesh?
With Service Mesh, the first three pain points are no longer a problem. What about the pain point of upgrades? The Service Mesh runs as a separate process, so it can be upgraded independently of the application.
With Service Mesh, a client reaches a service through a remote call to the mesh: as long as the client can send a request to the Service Mesh, the mesh takes care of delivering it. The client side is simplified to the extreme, and almost every language has excellent support for plain REST requests. The server side has only one thing to do, which is service registration. This makes multi-language support very comfortable, and you really can choose your programming language freely.
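To show how thin the client side can become, here is a sketch of a client that does nothing but build a plain HTTP request addressed to its local mesh instance. The sidecar address, the port, and the convention of naming the target service in the `Host` header are assumptions for illustration; a real mesh defines its own addressing rules.

```python
from urllib.request import Request

# Hypothetical address of the local Service Mesh instance.
SIDECAR_ADDRESS = "127.0.0.1:4140"


def build_mesh_request(service, path):
    """Build a plain HTTP request aimed at the local sidecar.

    The client names the logical service only; discovery, load
    balancing and retries are left entirely to the mesh.
    """
    return Request(
        url=f"http://{SIDECAR_ADDRESS}{path}",
        headers={"Host": service},
        method="GET",
    )


request = build_mesh_request("user-service", "/users/42")
print(request.full_url, request.get_header("Host"))
```

Anything that can issue an HTTP request can do this, which is why no per-language client library is needed.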
And here is a small miracle that lets you have your cake and eat it: the barrier to entry goes down while the functionality goes up. Those who believe in conservation of mass may find this unscientific, but both improvements are possible at the same time because the Service Mesh does most of the hard work, and the Service Mesh is generic and reusable.
The first change Service Mesh brings is for business development teams: it lowers the barrier to entry and provides a stable foundation for technical transformation. Ultimately, the goal is to free the business development team from the technical details of implementing microservices so it can get back to the business.
The second change is that it strengthens the operations team. If you work in operations, think seriously about how much control over the system Service Mesh gives you. Many functions are no longer implemented in the application; they have moved down into the Service Mesh, and the Service Mesh is usually controlled by operations.
Service Mesh is also great news for emerging niche languages. What hurts most when a new language competes with the traditional mainstream languages? The ecosystem: the libraries and the frameworks. In microservices it is very hard for a new niche language to compete with the likes of Java, because that means pitting your weakness against someone else’s strength. With Service Mesh, niche languages have a chance to sidestep this disadvantage: instead of competing with the Java ecosystem, they can focus on what they do best.
Author: Ao Xiaojian
Service Mesh: The next generation of microservices