Blog.csdn.net/mindfloatin…

Distributed applications, cloud computing, and microservices are all the rage today. How much do you know about RPC, one of their technical cornerstones? This is a technical summary of RPC. By my count it runs to 5k+ words, which is a bit long for reading in scattered spare moments, so you may want to save it and read it later 🙂

The full text is as follows:

Definition
Origin
Goal
Classification
Structure
  Model
  Breakdown
  Components
Implementation
  Export
  Import
  Protocol
    Codec
    Message header
    Message body
  Transmission
  Execution
  Exceptions
Summary
References

Two years ago I wrote two articles about RPC. Looking back, their structure and logic were a bit messy, so I have reorganized and merged them into this one article. Anyone who wants to understand RPC can take a look.

In recent years, service-oriented and microservice architectures have gradually become the mainstream for medium and large distributed systems, and RPC plays a key role in them. In day-to-day development we all use RPC, implicitly or explicitly. Programmers who are new to RPC may find it mysterious, while some who have used it for years have plenty of practical experience but little understanding of its principles. That lack of understanding often leads to misuse in development.

Definition

Remote Procedure Call (RPC) is an interprocess communication method. It allows a program to call a procedure or function in another address space, usually on another machine on a shared network, without the programmer explicitly coding the details of this remote call. That is, programmers write essentially the same calling code whether they call local or remote functions.

Origin

The term RPC was coined in the 1980s by Bruce Jay Nelson (see [1]). What was the original motivation for developing RPC? In the paper Implementing Remote Procedure Calls (see [2]), Nelson mentions several points:

Simplicity: the semantics of the RPC concept are clear and simple, which makes it easier to build distributed computing.
Efficiency: procedure calls look simple and are efficient.
Generality: in single-machine computing, procedures are often the most important mechanism for communication between the parts of an algorithm.

Put simply, the average programmer is already familiar with local procedure calls, so making RPC look just like a local call makes it easier to accept and removes obstacles to using it. Nelson's paper was published 30 years ago, and its ideas still look visionary today. The RPC frameworks we use now are largely built toward that same goal.

Goal

The primary goal of RPC is to make it easier to build distributed computing (applications): to provide the power of remote calls without losing the semantic simplicity of local calls. To achieve this, an RPC framework needs to provide a transparent call mechanism so that users do not have to explicitly distinguish between local and remote calls.

Classification

RPC calls fall into the following two categories:

Synchronous call: the client waits for the call to complete and return its result.
Asynchronous call: the client does not wait for the result after making the call, but can still obtain it through a callback notification. If the client does not care about the result, the call becomes a one-way asynchronous call, which returns no result.

The difference between asynchronous and synchronous calls is whether the client waits for the server to finish executing and return the result.
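To make the three call styles concrete, here is a minimal sketch. It assumes a hypothetical CallStyles class in which a local thread pool stands in for the remote server; none of these names come from a real framework.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a remote service: a local thread pool plays the
// role of the server. A real framework would send the request over the network.
class CallStyles {
    private final ExecutorService remote = Executors.newFixedThreadPool(2);

    // The "remote" procedure.
    private String hi(String name) {
        return "hi, " + name;
    }

    // Synchronous call: the caller blocks until the result is available.
    String callSync(String name) {
        return callAsync(name).join();
    }

    // Asynchronous call: returns immediately; the caller can register a
    // callback on the returned future to be notified of the result.
    CompletableFuture<String> callAsync(String name) {
        return CompletableFuture.supplyAsync(() -> hi(name), remote);
    }

    // One-way call: fire and forget, no result ever reaches the caller.
    void callOneWay(String name) {
        remote.execute(() -> hi(name));
    }

    void shutdown() {
        remote.shutdown();
    }
}
```

A caller would use callAsync("x").thenAccept(result -> ...) to receive the result by callback, which is the "callback notification" mentioned above.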

Structure

Below we look at the structure of RPC step by step, from the theoretical model to real components.

Model

Nelson's paper was the first to point out that a program implementing RPC consists of five parts in the theoretical model:

User 

User-stub 

RPCRuntime 

Server-stub 

Server

The relationship between these five parts is shown in the figure below:

Here the User is the client. When the User wants to make a remote call, it actually makes a local call to the User-stub. The User-stub encodes the interface, method, and parameters being called according to the agreed protocol and transmits them to the remote side through the local RPCRuntime instance. The remote RPCRuntime instance receives the request and hands it to the Server-stub, which decodes it and makes a local call to the Server. The call result is then returned to the User.
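The flow through the five parts can be sketched in a single process. This is only an illustration under my own assumptions: the class names (GreetingServer, ServerStub, InMemoryRuntime, UserStub) and the "method\nargument" wire format are invented here, and the runtime is a plain function call rather than a network.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Server: the real implementation in the "remote" address space.
class GreetingServer {
    String hi(String name) {
        return "hi, " + name;
    }
}

// Server-stub: decodes the request, calls the Server locally, encodes the result.
class ServerStub {
    private final GreetingServer server = new GreetingServer();

    byte[] handle(byte[] request) {
        // Assumed wire format: "<method>\n<argument>" in UTF-8.
        String[] parts = new String(request, StandardCharsets.UTF_8).split("\n", 2);
        if (!parts[0].equals("hi")) {
            throw new IllegalArgumentException("unknown method: " + parts[0]);
        }
        return server.hi(parts[1]).getBytes(StandardCharsets.UTF_8);
    }
}

// RPCRuntime: carries bytes between the two sides. Here it is an in-process
// function call; in a real system it would be a network transport.
class InMemoryRuntime {
    private final Function<byte[], byte[]> remoteEndpoint;

    InMemoryRuntime(Function<byte[], byte[]> remoteEndpoint) {
        this.remoteEndpoint = remoteEndpoint;
    }

    byte[] send(byte[] request) {
        return remoteEndpoint.apply(request);
    }
}

// User-stub: looks like a local object to the User, but encodes the call
// and hands it to the runtime.
class UserStub {
    private final InMemoryRuntime runtime;

    UserStub(InMemoryRuntime runtime) {
        this.runtime = runtime;
    }

    String hi(String name) {
        byte[] request = ("hi\n" + name).getBytes(StandardCharsets.UTF_8);
        return new String(runtime.send(request), StandardCharsets.UTF_8);
    }
}
```

The User then simply writes new UserStub(new InMemoryRuntime(new ServerStub()::handle)).hi("Nelson"), exactly as it would call a local object.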

Breakdown

The theoretical model above gives a coarse-grained conceptual structure for an RPC implementation. Here we refine it further into the components it should consist of, as shown in the figure below.

The RPC server exports remote interface methods through RpcServer, and the client imports them through RpcClient. The client calls remote interface methods as if they were local methods. The RPC framework provides a proxy implementation of the interface, and the actual call is delegated to the proxy, RpcProxy. The proxy wraps up the call information and hands the call to RpcInvoker for actual execution. The client-side RpcInvoker maintains an RpcChannel to the server through RpcConnector, uses RpcProtocol to encode the request, and sends the encoded request message to the server over the channel.

On the server side, the receiver RpcAcceptor receives call requests from clients and likewise uses RpcProtocol to decode them. The decoded call information is passed to RpcProcessor, which controls the call process, and the call is finally delegated to RpcInvoker, which actually executes it and returns the result.

Components

Above we broke the RPC implementation structure down into components. The responsibilities of each component are explained below.

RpcServer: responsible for exporting remote interfaces.
RpcClient: responsible for importing the proxy implementation of remote interfaces.
RpcProxy: the proxy implementation of a remote interface.
RpcInvoker: on the client side, responsible for encoding the call information, sending the call request to the server, and waiting for the call result to return. On the server side, responsible for performing the actual call and returning the result.
RpcProtocol: responsible for protocol encoding and decoding.
RpcConnector: responsible for maintaining the connection channel between client and server and sending data to the server.
RpcAcceptor: responsible for receiving client requests and returning the request results.
RpcProcessor: responsible for controlling the call process on the server side, including managing the call thread pool, timeouts, and so on.
RpcChannel: the data transmission channel.

Implementation

The conceptual model given in Nelson's paper became the standard model that later implementations referred to. CORBA (see [3]), which I first encountered when I started working with distributed computing more than a decade ago, had a similar structure. To solve RPC across heterogeneous platforms, CORBA uses an Interface Definition Language (IDL) to define remote interfaces and maps them to specific platform languages.

Later, most cross-language RPC frameworks adopted this approach, including the familiar Web Service (SOAP) and, in recent years, the open source Thrift. Most of them define interfaces through an IDL, provide tools that generate the User-stub and Server-stub for different language platforms, and supply RPCRuntime support through framework libraries. However, each RPC framework seems to define its own IDL format, which further raises the learning cost for programmers. Web Service did attempt to establish an industry standard, but unfortunately the standard is complex and inefficient. Otherwise there would have been no need for more efficient RPC frameworks such as Thrift.

IDL is a last resort for implementing RPC across platforms and languages: solving a broader problem naturally leads to a more complex solution. For RPC within a single platform, there is clearly no need for an intermediate language. Java's native RMI, for example, is more direct and simple for Java programmers and lowers the learning cost.

Having broken down the components and divided their responsibilities, the following sections take an implementation of this conceptual RPC framework model on the Java platform as an example and analyze in detail the factors to consider.

Export

Exporting means exposing a remote interface. Only exported interfaces can be called remotely, and unexported interfaces cannot. A code snippet that exports an interface in Java might look like this:

    DemoService demo = new …;
    RpcServer server = new …;
    server.export(DemoService.class, demo, options);

We can export the entire interface, or, at a finer granularity, export only certain methods of the interface, as follows:

    // Export only the hi(String) method of the DemoService interface
    server.export(DemoService.class, demo, "hi", new Class<?>[] { String.class }, options);

Java also has a special kind of call known as a polymorphic call: an interface may have several implementations, so which one does a remote call reach? For local calls this semantics is provided implicitly by the JVM through reference polymorphism, but for RPC a cross-process call cannot be resolved implicitly. If there are two implementations of the DemoService interface above, we need to mark the different implementations specially when exporting the interface, as follows:

    DemoService demo = new …;
    DemoService demo2 = new …;
    RpcServer server = new …;
    server.export(DemoService.class, demo, options);
    // Export the second implementation under the mark "demo2"
    server.export("demo2", DemoService.class, demo2, options);

Here demo2 is another implementation, which we export under the mark "demo2". A remote call must then also pass this mark to reach the correct implementation class, which resolves the semantics of polymorphic calls.
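One way the server side could keep track of marked exports is a registry keyed by interface name plus mark. The sketch below is an assumption of mine, not the author's RpcServer: ExportRegistry, its lookup method, and the "#" key format are all hypothetical, and the options parameter is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo interface standing in for the DemoService used above.
interface DemoService {
    String hi(String msg);
}

// Hypothetical registry: maps (interface, mark) to the exported implementation.
class ExportRegistry {
    private final Map<String, Object> exported = new ConcurrentHashMap<>();

    private static String key(String mark, Class<?> iface) {
        return iface.getName() + "#" + mark;
    }

    // Export the default (unmarked) implementation of an interface.
    <T> void export(Class<T> iface, T impl) {
        export("", iface, impl);
    }

    // Export an additional implementation under an explicit mark.
    <T> void export(String mark, Class<T> iface, T impl) {
        if (exported.putIfAbsent(key(mark, iface), impl) != null) {
            throw new IllegalStateException("already exported: " + key(mark, iface));
        }
    }

    // Resolve an incoming call to the implementation selected by its mark.
    <T> T lookup(String mark, Class<T> iface) {
        Object impl = exported.get(key(mark, iface));
        if (impl == null) {
            throw new IllegalArgumentException("not exported: " + key(mark, iface));
        }
        return iface.cast(impl);
    }
}
```

An incoming request that carries the mark "demo2" would then be resolved with lookup("demo2", DemoService.class), while a request without a mark resolves to the default export.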

Import

Import is the counterpart of export: to make a call, client code must obtain the method or procedure definitions of the remote interface. Most cross-language RPC frameworks currently use a code generator to produce User-stub code from the IDL definition, so the actual import is completed at compile time by the code generator. The cross-language RPC frameworks I have used, such as CORBA, Web Service, ICE, and Thrift, all work this way.

Code generation is the inevitable choice for cross-language RPC frameworks, whereas RPC within a single language platform can be implemented by sharing interface definitions. A code snippet that imports an interface in Java might look like this:

    RpcClient client = new …;
    DemoService demo = client.refer(DemoService.class);
    demo.hi("how are you?");

Because import is a keyword in Java, the snippet uses refer to express importing an interface. Importing this way is essentially still a code generation technique, only the code is generated at run time, which looks cleaner than code generation at static compile time. Java offers at least two techniques for dynamic code generation: JDK dynamic proxies and bytecode generation. Dynamic proxies are more convenient to use than bytecode generation, but they perform worse than direct bytecode generation, while bytecode generation is worse in terms of code readability. On balance, for a general-purpose framework at the bottom of the stack, I personally lean toward putting performance first.
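For the JDK dynamic proxy route, a minimal sketch of what refer could do looks like this. ProxyImporter and RemoteInvoker are hypothetical names of mine; in a real client the invoker would encode the call and send it over the channel rather than run a local lambda.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical demo interface standing in for the DemoService used above.
interface DemoService {
    String hi(String msg);
}

// Hypothetical hook that performs the remote call (encode, send, wait for result).
interface RemoteInvoker {
    Object invoke(String interfaceName, String methodName, Object[] args) throws Exception;
}

class ProxyImporter {
    private final RemoteInvoker invoker;

    ProxyImporter(RemoteInvoker invoker) {
        this.invoker = invoker;
    }

    // Creates, at run time, an object implementing iface whose method calls
    // are all forwarded to the invoker.
    <T> T refer(Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            // toString, hashCode and equals are handled locally, not remotely.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "toString": return "RpcProxy(" + iface.getName() + ")";
                    case "hashCode": return System.identityHashCode(proxy);
                    default:         return proxy == args[0]; // equals
                }
            }
            return invoker.invoke(iface.getName(), method.getName(), args);
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}
```

With this in place, client.refer(DemoService.class).hi("how are you?") compiles against the shared interface and never touches a generated stub class.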

Protocol

The protocol is the way data is encapsulated for network transmission in an RPC call. It consists of three parts: codec, message header, and message body.

Codec

The client proxy needs to encode the call information before making the call. This means deciding what information to encode and in what format to transmit it so that the server can complete the call. For efficiency, the less information encoded the better (less data to transmit), and the simpler the encoding rules the better (faster execution).

Let’s first look at what we need to encode:

Call encoding
1. Interface method: the interface name and method name
2. Method parameters: the parameter types and parameter values
3. Call attributes: attribute information for the call, such as implicit parameters attached to the call and the call timeout

Return encoding
1. Return result: the return value defined by the interface method
2. Return code: the exception return code
3. Return exception information: information about the exception raised by the call
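As an illustration of the call encoding list, here is a small sketch that writes those three groups of fields with DataOutputStream and reads them back. CallInfo and CallCodec are hypothetical names, and to keep it short I assume every argument is a string. A real framework would delegate argument values to a serialization library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container for the call information listed above.
class CallInfo {
    final String interfaceName;
    final String methodName;
    final String[] paramTypes;
    final String[] args;                  // assumption: string arguments only
    final Map<String, String> attributes; // e.g. implicit parameters, timeout

    CallInfo(String interfaceName, String methodName, String[] paramTypes,
             String[] args, Map<String, String> attributes) {
        this.interfaceName = interfaceName;
        this.methodName = methodName;
        this.paramTypes = paramTypes;
        this.args = args;
        this.attributes = attributes;
    }
}

class CallCodec {
    static byte[] encode(CallInfo call) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(call.interfaceName);          // 1. interface method
            out.writeUTF(call.methodName);
            out.writeInt(call.args.length);            // 2. method parameters
            for (int i = 0; i < call.args.length; i++) {
                out.writeUTF(call.paramTypes[i]);
                out.writeUTF(call.args[i]);
            }
            out.writeInt(call.attributes.size());      // 3. call attributes
            for (Map.Entry<String, String> e : call.attributes.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static CallInfo decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String iface = in.readUTF();
            String method = in.readUTF();
            int argCount = in.readInt();
            String[] types = new String[argCount];
            String[] args = new String[argCount];
            for (int i = 0; i < argCount; i++) {
                types[i] = in.readUTF();
                args[i] = in.readUTF();
            }
            int attrCount = in.readInt();
            Map<String, String> attrs = new LinkedHashMap<>();
            for (int i = 0; i < attrCount; i++) {
                String key = in.readUTF();
                attrs.put(key, in.readUTF());
            }
            return new CallInfo(iface, method, types, args, attrs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The return encoding could follow the same pattern with a result value, a return code, and an exception message.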

Message header

In addition to the necessary call information above, we may also need some meta information to make encoding and decoding easier and to allow for future extension. The encoded message is therefore divided into two parts: the meta information and the information necessary for the call. When designing an RPC protocol message, we put the meta information in the protocol header and the necessary information in the protocol body. Here is a conceptual design of an RPC protocol header:

magic: protocol magic number, designed for decoding
header size: protocol header length, designed for extension
version: protocol version, designed for compatibility
st: message body serialization type
hb: heartbeat message flag, designed for transport-layer heartbeats on long-lived connections
ow: one-way message flag
rp: response message flag; when not set, the message is a request by default
status code: response message status code
reserved: reserved for byte alignment
message id: message ID
body size: message body length

Message body

The message body is usually encoded by serialization. The following serialization methods are common:

XML, for example Web Service SOAP
JSON, for example JSON-RPC
Binary, for example Thrift, Hessian, Kryo, and so on

Once the format is determined, encoding and decoding are straightforward. Because the header has a fixed length, what we care about more is how the message body is serialized. For serialization we care about three things:

Efficiency: the speed of serialization and deserialization. The faster the better.
Length: the number of bytes after serialization. The smaller the better.
Compatibility: compatibility between serialization and deserialization, for example whether things still work after a field is added to an interface parameter object.

These three goals sometimes cannot all be met at once. That comes down to the implementation details of specific serialization libraries and is not analyzed further in this article.
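Since the header has a fixed length, encoding it is mostly a matter of writing fields at fixed offsets. The sketch below uses java.nio.ByteBuffer. Note that the text names the header fields but not their widths, so the 20-byte layout, the bit positions of st/hb/ow/rp, and the magic value are all my own assumptions for illustration.

```java
import java.nio.ByteBuffer;

// Assumed layout (20 bytes, big-endian):
//   magic(2) headerSize(2) version(1) flags(1) statusCode(1) reserved(1)
//   messageId(8) bodySize(4)
// Assumed flags byte: bits 0-4 = st, bit 5 = hb, bit 6 = ow, bit 7 = rp.
class RpcHeader {
    static final short MAGIC = (short) 0xCAFE; // hypothetical magic number
    static final int SIZE = 20;

    byte version = 1;
    byte serializationType; // st, 0..31
    boolean heartbeat;      // hb
    boolean oneWay;         // ow
    boolean response;       // rp, unset means request
    byte statusCode;
    long messageId;
    int bodySize;

    byte[] encode() {
        byte flags = (byte) ((serializationType & 0x1F)
                | (heartbeat ? 0x20 : 0)
                | (oneWay ? 0x40 : 0)
                | (response ? 0x80 : 0));
        return ByteBuffer.allocate(SIZE)
                .putShort(MAGIC)
                .putShort((short) SIZE)
                .put(version)
                .put(flags)
                .put(statusCode)
                .put((byte) 0) // reserved, keeps the following long aligned
                .putLong(messageId)
                .putInt(bodySize)
                .array();
    }

    static RpcHeader decode(byte[] bytes) {
        if (bytes.length < SIZE) {
            throw new IllegalArgumentException("header too short: " + bytes.length);
        }
        ByteBuffer in = ByteBuffer.wrap(bytes);
        if (in.getShort() != MAGIC) {
            throw new IllegalArgumentException("bad magic number");
        }
        if (in.getShort() < SIZE) {
            throw new IllegalArgumentException("bad header size");
        }
        RpcHeader h = new RpcHeader();
        h.version = in.get();
        int flags = in.get();
        h.serializationType = (byte) (flags & 0x1F);
        h.heartbeat = (flags & 0x20) != 0;
        h.oneWay = (flags & 0x40) != 0;
        h.response = (flags & 0x80) != 0;
        h.statusCode = in.get();
        in.get(); // skip reserved byte
        h.messageId = in.getLong();
        h.bodySize = in.getInt();
        return h;
    }
}
```

A decoder can check the magic number first to reject garbage early, then use body size to know how many more bytes to read before handing the body to the serializer selected by st.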

Transmission

After protocol encoding, the natural next step is to transmit the encoded RPC request message to the server. After executing the call, the server returns a result message or acknowledgement message to the client. RPC's application scenario is essentially a reliable request-response message flow, similar to HTTP, so a long-lived TCP connection is more efficient. Unlike HTTP/1.x, we define a unique ID for each message at the protocol level, which makes it easier to reuse a connection.
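The per-message ID is what lets several in-flight calls share one connection: the client remembers each pending call by ID and matches responses as they arrive, in whatever order. A minimal sketch, assuming a hypothetical PendingCalls class and string results:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side table of calls waiting for a response.
class PendingCalls {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Allocates a message ID that is unique on this connection.
    long nextMessageId() {
        return nextId.incrementAndGet();
    }

    // Called just before the request is written to the channel.
    CompletableFuture<String> register(long messageId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(messageId, future);
        return future;
    }

    // Called by the channel reader when a response with this ID arrives.
    // Returns false if nobody is waiting (for example, the call already timed out).
    boolean complete(long messageId, String result) {
        CompletableFuture<String> future = pending.remove(messageId);
        return future != null && future.complete(result);
    }

    int inFlight() {
        return pending.size();
    }
}
```

Without such an ID, the client would have to wait for each response before sending the next request on the same connection, or open one connection per concurrent call.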

Since long-lived connections are used, the first question is how many connections are needed between the client and the server. In terms of usage, a single connection and multiple connections make no difference, and for applications that transfer small amounts of data a single connection is basically enough. The biggest difference is that each connection has its own private send and receive buffers, so when large amounts of data are transferred, spreading the data over the buffers of several connections gives better efficiency.

So if the volume of data is not enough to keep a single connection's buffer saturated all the time, using multiple connections brings no noticeable improvement and only adds connection management overhead.

The connection is initiated and maintained by the client. If the client and server are directly connected, the connection is generally not interrupted (barring physical link faults). If the connection passes through load-balancing or relay devices, an inactive connection may be dropped by these intermediate devices after a while. To keep the connection alive, it is necessary to send heartbeat data periodically on each connection. Heartbeat messages are internal messages of the RPC framework library. The protocol header structure above includes a dedicated heartbeat bit to mark them, and they are transparent to business applications.
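A simple way to decide when to send a heartbeat is to track the last activity time per connection and send one only after the connection has been idle for a while. This is a sketch under my own assumptions: HeartbeatTracker is a hypothetical name, and the idle threshold would have to be chosen below whatever timeout the intermediate devices apply.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical per-connection idle tracker.
class HeartbeatTracker {
    private final long idleMillis;
    private long lastActivityMillis;

    HeartbeatTracker(long idleMillis, long nowMillis) {
        this.idleMillis = idleMillis;
        this.lastActivityMillis = nowMillis;
    }

    // Call whenever real traffic is sent or received on the connection.
    synchronized void onTraffic(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    // True when the connection has been idle long enough to need a heartbeat now.
    synchronized boolean heartbeatDue(long nowMillis) {
        if (nowMillis - lastActivityMillis < idleMillis) {
            return false;
        }
        lastActivityMillis = nowMillis; // the heartbeat itself counts as activity
        return true;
    }

    // Polls the tracker periodically and runs sendHeartbeat when the connection is idle.
    ScheduledFuture<?> start(ScheduledExecutorService timer, Runnable sendHeartbeat,
                             long periodMillis) {
        return timer.scheduleAtFixedRate(() -> {
            if (heartbeatDue(System.currentTimeMillis())) {
                sendHeartbeat.run();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

The sendHeartbeat callback would write a header-only message with the heartbeat bit set, so business code never sees it.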

Execution

All the client stub does is encode the message and transmit it to the server. The actual call takes place on the server side. The server stub is subdivided here into two components, RpcProcessor and RpcInvoker: one is responsible for controlling the call process and the other for performing the actual call. Taking Java as an example, let's look at what these two components need to do.
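A rough sketch of that split, with hypothetical names (CallInvoker, CallProcessor) rather than the author's actual classes: the invoker only performs the call, while the processor decides on which thread it runs and how long it may take. Here I assume one fixed-size thread pool per exported interface and a per-call timeout as the control policy.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical invoker: performs the actual call and nothing else.
interface CallInvoker {
    Object invoke(String interfaceName, String methodName, Object[] args) throws Exception;
}

// Hypothetical processor: controls how calls run (threads, isolation, timeout).
class CallProcessor {
    private final CallInvoker invoker;
    private final int threadsPerInterface;
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    CallProcessor(CallInvoker invoker, int threadsPerInterface) {
        this.invoker = invoker;
        this.threadsPerInterface = threadsPerInterface;
    }

    Object process(String interfaceName, String methodName, Object[] args,
                   long timeoutMillis) throws Exception {
        // One bounded pool per interface, so a slow interface cannot take every thread.
        ExecutorService pool = pools.computeIfAbsent(
                interfaceName, k -> Executors.newFixedThreadPool(threadsPerInterface));
        Callable<Object> task = () -> invoker.invoke(interfaceName, methodName, args);
        Future<Object> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller has stopped waiting, so interrupt the worker thread.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the exception thrown by the call itself.
            Throwable cause = e.getCause();
            throw cause instanceof Exception ? (Exception) cause : e;
        }
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}
```

Note that cancel(true) only interrupts the worker. Code that ignores interruption will keep running, so a production framework needs more than this to reclaim threads reliably.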

In Java, dynamically calling the implementation code of an interface is currently typically done through reflection. Besides the reflection built into the JDK, some third-party libraries offer better-performing reflective calls, so RpcInvoker encapsulates the implementation details of reflective calls.
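With plain JDK reflection, the core of such an invoker is only a few lines. The sketch below uses a hypothetical ReflectiveInvoker helper and looks the method up on the exported interface rather than on the implementation class.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical demo interface standing in for the DemoService used above.
interface DemoService {
    String hi(String msg);
}

class ReflectiveInvoker {
    // Calls the named interface method on target and returns its result.
    static Object invoke(Class<?> iface, Object target, String methodName,
                         Class<?>[] paramTypes, Object[] args) throws Exception {
        Method method = iface.getMethod(methodName, paramTypes);
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Unwrap, so callers see the exception the implementation threw.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

A real invoker would cache the Method lookup per interface and signature, since getMethod is relatively expensive to repeat on every call.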

What factors should be considered in controlling the call process, and what call-control services should RpcProcessor provide? The following points are offered as food for thought:

Efficiency: each request should be executed as quickly as possible, so we cannot create a new thread for every request and need to provide a thread pool service.
Resource isolation: when multiple remote interfaces are exported, how do we prevent calls to a single interface from occupying all thread resources and blocking the execution of other interfaces?
Timeout control: when an interface executes slowly and the client has already timed out and given up waiting, it is pointless for the server-side thread to keep executing.

Exceptions

No matter how hard RPC tries to disguise remote calls as local calls, the two are still quite different, and some exceptions never occur in local calls. Before discussing exception handling, let's compare some differences between local calls and RPC calls:

A local call is certain to be executed, whereas a remote call is not: the call message may never reach the server for network reasons.
A local call only throws exceptions declared by the interface, whereas a remote call can also throw other exceptions from the RPC framework runtime.
The performance of local and remote calls can differ greatly, depending on how large a share RPC's inherent overhead takes.

It is these differences that make using RPC require more consideration. When a call to a remote interface throws an exception, it may be a business exception or a runtime exception thrown by the RPC framework (such as a network interruption). A business exception indicates that the server did execute the call, though it may not have completed normally for some reason. An RPC runtime exception means the server may not have executed the call at all. The caller's exception handling strategy naturally needs to distinguish between the two.
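One possible caller-side strategy that follows from this distinction is sketched below. It is my own illustration, not something the model above prescribes: the exception types and RetryingCaller are hypothetical, and retrying is only one of several reasonable reactions to a framework failure.

```java
import java.util.function.Supplier;

// Hypothetical: thrown by the RPC framework itself (network failure, timeout...).
// The server may or may not have executed the call.
class RpcException extends RuntimeException {
    RpcException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical: thrown by the service implementation. The server did execute the call.
class BusinessException extends RuntimeException {
    BusinessException(String message) {
        super(message);
    }
}

class RetryingCaller {
    // Retries only framework failures, and only when the call is idempotent,
    // because the server may already have executed a call that appeared to fail.
    static <T> T call(Supplier<T> remoteCall, boolean idempotent, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RpcException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return remoteCall.get();
            } catch (RpcException e) {
                if (!idempotent) {
                    throw e; // unsafe to repeat: surface the failure to the caller
                }
                last = e;
            }
            // BusinessException is not caught here: the server ran the call,
            // so repeating it would not help and it propagates immediately.
        }
        throw last;
    }
}
```

The key point is that the two exception families reach different code paths, so the caller never blindly retries a call whose business logic already ran and failed.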

RPC's inherent overhead is several orders of magnitude higher than that of a local call: the inherent overhead of a local call is on the order of nanoseconds, while that of RPC is on the order of milliseconds. Overly lightweight computing tasks are therefore not suitable for export as remote interfaces served by a separate process. Exporting a remote interface is worthwhile only when the time spent on the computation is much higher than RPC's inherent overhead.

Summary

So far we have proposed a conceptual framework for implementing RPC and analyzed in detail some of the implementation details that need to be considered. No matter how elegant the concept of RPC is, "there are still several snakes hidden in the grass", and only a deep understanding of the nature of RPC makes it possible to apply it well.

Having read this far, you may wonder: by following this conceptual model and implementation analysis, can one really develop and implement an RPC framework library? I can answer in the affirmative: yes, you can. I developed and implemented a minimal RPC framework library according to this model in order to learn and verify it, and the code is on GitHub for interested readers to browse. It is my own experimental, learning-oriented open source project, at github.com/mindwind/cr… craft-atom-rpc is a miniature RPC framework library implemented according to this model. It has far less code than an industrial-grade RPC framework library, which makes it easy to read and learn from.

Thank you for your time, which makes my writing more meaningful.

References

[1] Bruce Jay Nelson. Bruce Jay Nelson
[2] BIRRELL, NELSON. Implementing Remote Procedure Calls. 1983
[3] CORBA. CORBA
[4] DUBBO.

mindwind, source: CSDN, blog.csdn.net/mindfloatin… Copyright notice: this article is the blogger's original work. When reprinting, please include a link to the blog post!