1. What is Dubbo?
Dubbo is a high-performance, lightweight, open source RPC framework that provides efficient service governance solutions such as automatic service registration and automatic discovery. It can be seamlessly integrated with the Spring framework.
2. What are the usage scenarios of Dubbo?
1. Transparent remote method calls: remote methods are called just like local methods, with simple configuration and no API intrusion.
2. Soft load balancing and fault tolerance: on an intranet it can replace F5 and other hardware load balancers, reducing cost and eliminating single points of failure.
3. Automatic service registration and discovery: service provider addresses no longer need to be hard-coded. The registry looks up provider IP addresses by interface name, so providers can be added or removed smoothly.
3. What are the core components of Dubbo?
1. Provider: the service provider that exposes services
2. Consumer: the service consumer that invokes remote services
3. Registry: the center for service registration and discovery
4. Monitor: the monitoring center that collects call counts and call times
5. Container: the container in which the service runs
4. Process of Dubbo service registration and discovery
1. The Container is responsible for starting, loading, and running the service Provider.
2. When the Provider starts, it registers the services it offers with the Registry.
3. When the Consumer starts, it subscribes to the Registry for the services it needs.
4. The Registry returns the list of provider addresses to the Consumer. If the list changes, the Registry pushes the change to the Consumer over a long-lived connection.
5. The Consumer selects one provider from the address list using a soft load balancing algorithm and calls it. If the call fails, it selects another provider and calls that one.
6. The Consumer and Provider accumulate call counts and call times in memory and send the statistics to the Monitor once a minute.
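The register, subscribe, and push steps above can be sketched in a few lines of Python. This is only a toy in-memory model, not Dubbo's implementation: the class names `Registry` and `Consumer`, the service name, and the addresses are all made up for illustration.

```python
from collections import defaultdict


class Registry:
    """Toy in-memory registry: maps a service name to provider addresses."""

    def __init__(self):
        self.providers = defaultdict(list)    # service name -> [address, ...]
        self.subscribers = defaultdict(list)  # service name -> [callback, ...]

    def register(self, service, address):
        # A provider registers itself at startup.
        self.providers[service].append(address)
        self._notify(service)

    def unregister(self, service, address):
        self.providers[service].remove(address)
        self._notify(service)

    def subscribe(self, service, callback):
        # A consumer subscribes and immediately receives the current list.
        self.subscribers[service].append(callback)
        callback(list(self.providers[service]))

    def _notify(self, service):
        # Push the changed address list to every subscriber.
        for callback in self.subscribers[service]:
            callback(list(self.providers[service]))


class Consumer:
    def __init__(self, registry, service):
        self.addresses = []  # local cache of provider addresses
        registry.subscribe(service, self._on_change)

    def _on_change(self, addresses):
        self.addresses = addresses


registry = Registry()
registry.register("com.example.DemoService", "10.0.0.1:20880")
consumer = Consumer(registry, "com.example.DemoService")
registry.register("com.example.DemoService", "10.0.0.2:20880")
print(consumer.addresses)  # ['10.0.0.1:20880', '10.0.0.2:20880']
```

Because the consumer keeps its own copy of the address list, it can keep calling providers from that copy even if the registry object becomes unreachable.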
5. What registries does Dubbo support?
1. Multicast registry: needs no central node. Knowing the broadcast address is enough to register and discover services, which is done via multicast on the network.
2. Zookeeper registry: based on the distributed coordination system Zookeeper. It uses Zookeeper’s watch mechanism to detect data changes.
3. Redis registry: based on Redis, using a key/map structure. The key stores the service name and type. Inside the map, each key stores a service URL and its value stores the service expiration time. Data changes are notified through Redis publish/subscribe.
4. Simple registry.
Zookeeper is the recommended registry.
6. Can publishers and subscribers communicate when Dubbo’s registry cluster is down?
Yes. When it starts, a Dubbo consumer pulls data such as the addresses and interfaces of registered providers from Zookeeper and caches it locally. Each invocation uses the addresses in the local cache.
7. What load balancing strategies does the Dubbo cluster provide?
1. Random LoadBalance: selects a provider at random, which makes it easy to adjust provider weights dynamically. Within a short window the chance of collisions is high, but the more calls there are, the more even the distribution becomes. Weights can be set per provider.
2. RoundRobin LoadBalance: selects providers evenly in turn, but slow providers can accumulate requests.
3. LeastActive LoadBalance: prefers the provider with the fewest active calls, so that slower providers receive fewer requests. Each provider keeps an active-call counter that is incremented by 1 when it starts processing a request and decremented by 1 when processing completes. Suppose machine A has started a request and not yet finished, while machine B received a request and finished it quickly: the active counts for A and B are then 1 and 0. When a new request arrives, machine B is chosen because it has the smallest active count, so the slower machine A receives fewer requests.
4. ConsistentHash LoadBalance: a consistent hashing policy that ensures requests with the same parameters are always sent to the same provider. When a provider goes down, the requests it used to receive are spread across the other providers by means of virtual nodes, which avoids drastic changes.
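As a rough illustration of three of these strategies, here is a minimal Python sketch. It is not Dubbo's Java code: the function and class names, the choice of SHA-256 as the hash, and the default of 160 virtual nodes per provider are assumptions made for the example.

```python
import bisect
import hashlib
import random


def weighted_random(providers, weights, rng=random):
    """Random strategy: pick a provider with probability proportional to its weight."""
    return rng.choices(providers, weights=weights, k=1)[0]


def least_active(active_counts):
    """LeastActive strategy: pick the provider with the fewest in-flight calls."""
    lowest = min(active_counts.values())
    candidates = [p for p, n in active_counts.items() if n == lowest]
    return random.choice(candidates)  # break ties randomly


class ConsistentHashRing:
    """ConsistentHash strategy: the same key maps to the same provider."""

    def __init__(self, providers, virtual_nodes=160):
        self.ring = []  # sorted list of (hash, provider)
        for provider in providers:
            for i in range(virtual_nodes):
                self.ring.append((self._hash(f"{provider}#{i}"), provider))
        self.ring.sort()
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def select(self, key):
        # First virtual node clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[index][1]


print(least_active({"A": 1, "B": 0}))  # B
ring = ConsistentHashRing(["A", "B", "C"])
print(ring.select("user-42") == ring.select("user-42"))  # True
```

With virtual nodes, removing one provider only moves the keys that were mapped to it; keys mapped to the remaining providers stay where they were.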
8. What are Dubbo’s cluster fault tolerance schemes?
1. Failover Cluster: fails over automatically. When a call fails, it retries on another server. Typically used for read operations, but retries introduce longer delays.
2. Failfast Cluster: fails fast. An error is reported immediately after a call fails. Typically used for non-idempotent writes, such as inserting new records.
3. Failsafe Cluster: fail-safe. If an exception occurs, it is ignored. Typically used for operations such as writing audit logs.
4. Failback Cluster: recovers from failures automatically. Failed requests are recorded in the background and resent periodically. Typically used for message notification operations.
5. Forking Cluster: calls multiple servers in parallel and returns as soon as one succeeds. Typically used for read operations with strict real-time requirements, at the cost of more service resources. The maximum parallelism can be set with forks="2".
6. Broadcast Cluster: calls all providers one by one and reports an error if any call fails. Typically used to notify all providers to update local resources such as caches or logs.
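The difference between the first three schemes can be sketched as follows. This is a simplified Python illustration, not Dubbo's implementation: `failover`, `failfast`, `failsafe`, and the `flaky_call` helper are hypothetical names, and a "provider" is just a string.

```python
def failover(providers, call, retries=2):
    """Failover: on failure, try other providers (up to `retries` extra attempts)."""
    last_error = None
    for provider in providers[: retries + 1]:
        try:
            return call(provider)
        except Exception as error:
            last_error = error
    if last_error is None:
        raise ValueError("no providers available")
    raise last_error


def failfast(providers, call):
    """Failfast: call once; any failure is raised immediately."""
    return call(providers[0])


def failsafe(providers, call):
    """Failsafe: swallow the exception and return nothing."""
    try:
        return call(providers[0])
    except Exception:
        return None


def flaky_call(provider):
    # Hypothetical remote call: provider "A" is down, the others answer.
    if provider == "A":
        raise ConnectionError("A is unreachable")
    return f"ok from {provider}"


print(failover(["A", "B", "C"], flaky_call))  # ok from B
print(failsafe(["A"], flaky_call))            # None
```

Failover hides the failure of A at the price of a second call, which is why it suits reads but is risky for non-idempotent writes.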
9. What are the core configurations of Dubbo?
dubbo:service: service configuration. Used to expose a service and define its metadata. A service can be exposed over multiple protocols and registered with multiple registries.
dubbo:reference: reference configuration. Used to create a remote service proxy. A reference can point to multiple registries.
dubbo:protocol: protocol configuration. Used to configure the protocol over which services are provided. The protocol is specified by the provider and passively accepted by the consumer.
dubbo:application: application configuration. Used to configure information about the current application, regardless of whether it is a provider or a consumer.
dubbo:module: module configuration. Optional.
dubbo:registry: registry configuration. Used to configure how to connect to the registry.
dubbo:monitor: monitoring center configuration. Optional.
dubbo:provider: provider configuration. When an attribute of ProtocolConfig or ServiceConfig is not set, the value given here is used as the default. Optional.
dubbo:consumer: consumer configuration. When an attribute of ReferenceConfig is not set, the value given here is used as the default. Optional.
dubbo:method: method configuration. Specifies method-level settings for ServiceConfig and ReferenceConfig.
dubbo:argument: argument configuration. Specifies settings for individual method parameters.
10. Dubbo timeouts can be set in two ways:
1. On the service provider. Dubbo’s user documentation recommends configuring as much as possible on the provider side, because the provider knows the characteristics of its service better than the consumer does.
2. On the service consumer. If a timeout is also set on the consumer, the consumer’s setting takes precedence over the provider’s, because the caller has more flexibility in controlling timeouts. If the call times out on the consumer side, the server-side thread is not stopped; it only generates a warning.
By default, a failed call is retried twice.
11. What protocols does Dubbo support, and what are their strengths and weaknesses?
Dubbo: a single long-lived connection with NIO asynchronous communication, suitable for service calls with high concurrency and small payloads, and for cases where consumers far outnumber providers. Transport is TCP, asynchronous, with Hessian serialization. The Dubbo protocol is the recommended choice.
RMI: implemented with the JDK’s standard RMI protocol. Parameters and return values must implement the Serializable interface and are serialized with Java’s standard serialization mechanism. It uses multiple blocking short connections over TCP with synchronous transport. It suits mixed packet sizes, roughly equal numbers of consumers and providers, file transfer, ordinary remote service calls, and interoperation with native RMI. Java serialization is exposed to security vulnerabilities when the application depends on earlier versions of the Apache Commons Collections package.
WebService: a remote call protocol based on WebService, implemented by integrating CXF, providing interoperability with native WebService. Multiple short connections, HTTP-based synchronous transport; suitable for system integration and cross-language invocation.
HTTP: a remote invocation protocol based on HTTP form submission, implemented with Spring’s HttpInvoker. Multiple short connections over HTTP. It suits mixed parameter sizes, more providers than consumers, and calls from both applications and browser JavaScript.
Hessian: integrates the Hessian service. It communicates over HTTP, exposes services through a Servlet, and uses Dubbo’s embedded Jetty as the default server implementation, providing interoperability with native Hessian services. Multiple short connections, synchronous HTTP transport, Hessian serialization. It suits large parameters and more providers than consumers, puts more pressure on providers, and can transfer files.
Memcached: an RPC protocol implemented on top of Memcached.
Redis: an RPC protocol implemented on top of Redis.
12. What is RPC?
Remote Procedure Call (RPC) is a protocol for requesting services from a program on a remote computer over the network without having to understand the underlying network technology. In short, RPC lets programs access remote system resources just as they access local ones. Key aspects include the communication protocol, serialization, resource (interface) description, service framework, performance, and language support.
Simply put, RPC lets one machine (the client) call a function or method (collectively called a service) on another machine (the server), passing arguments and receiving the result.
13. RPC architecture components
A basic RPC architecture contains at least four components:
1. Client: the service caller (service consumer)
2. Client Stub: stores the address of the server and packages the Client’s request parameters into network messages
3. Server Stub: receives the request message sent by the client, unpacks it, and invokes the local service to process it
4. Server: the real service provider
Specific call process:
1. The service consumer invokes the service as if it were a local call.
2. After receiving the call, the client stub serializes (assembles) the method, input parameters, and other information into a message body that can be transmitted over the network.
3. The client stub finds the remote service address and sends the message to the server over the network.
4. The server stub decodes (deserializes) the message after receiving it.
5. The server stub invokes the local service according to the decoded result.
6. The local service executes the business logic and returns the result to the server stub.
7. The server stub packs (serializes) the result into a message and sends it to the consumer over the network.
8. The client stub receives the message and decodes (deserializes) it.
9. The service consumer gets the final result.
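The call process can be traced in a small Python sketch with hand-written stubs. Everything here is illustrative: the `Calculator` service, the JSON message format, and passing `server_stub.handle` directly as the "network" are assumptions made to keep the example self-contained.

```python
import json


class Calculator:
    """The real service on the server side (hypothetical example service)."""

    def add(self, a, b):
        return a + b


class ServerStub:
    def __init__(self, service):
        self.service = service

    def handle(self, request_bytes):
        # Decode the request, invoke the local service, encode the reply.
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self.service, request["method"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    def __init__(self, transport):
        self.transport = transport  # callable: request bytes -> response bytes

    def add(self, a, b):
        # Serialize the call, send it, then deserialize the reply.
        request = json.dumps({"method": "add", "args": [a, b]}).encode("utf-8")
        response = self.transport(request)
        return json.loads(response.decode("utf-8"))["result"]


server_stub = ServerStub(Calculator())
client = ClientStub(transport=server_stub.handle)  # in-process stand-in for the network
print(client.add(2, 3))  # 5
```

From the caller's point of view `client.add(2, 3)` looks like a local call; only bytes cross the boundary between the two stubs.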
14. What problems does an RPC framework need to solve?
1. How to determine the communication protocol between the client and the server? 2. How to make network communication more efficient? 3. How are the services provided by the server exposed to the client? 4. How does the client discover these exposed services? 5. How to serialize and deserialize request objects and response results more efficiently?
15. What is the implementation basis of RPC?
1. Efficient network communication is required; Netty, for example, is commonly chosen as the network communication framework. 2. An efficient serialization framework is required, such as Google’s Protobuf. 3. Reliable addressing (mainly service discovery) is required, for example registering services with Zookeeper. 4. For RPC calls that carry a session (state), session and state retention is also required.
16. What key technologies does RPC use?
1. Dynamic proxy: Client stubs and Server stubs are generated with Java dynamic proxy technology. You can use the JDK’s native dynamic proxy mechanism, the open source CGLib proxy, or Javassist bytecode generation.
2. Serialization and deserialization: on a network, all data is transmitted as bytes, so parameters must be serialized before they are sent and deserialized after they are received.
Serialization is the process of converting an object into a sequence of bytes, that is, encoding. Deserialization is the process of restoring a sequence of bytes to an object, that is, decoding. Efficient open source serialization frameworks currently include Kryo, FastJson, and Protobuf.
3. NIO communication: for the sake of concurrency performance, traditional blocking IO is not suitable, so non-blocking IO (NIO) is needed. Java ships its own NIO API, and Java 7 added NIO.2. You can use Netty or MINA to handle NIO data transfer.
4. Service registry: options include Redis, Zookeeper, Consul, and Etcd. Zookeeper is commonly used to register and discover services and to address single points of failure and distributed deployment of the registry.
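To show what a dynamic proxy buys you, the sketch below uses Python's `__getattr__` hook as a stand-in for JDK dynamic proxies: the client-side stub methods are created on demand rather than written by hand. The `RpcProxy` class, the `fake_server` function, and the JSON wire format are hypothetical and exist only for this example.

```python
import json


class RpcProxy:
    """Builds client-side stub methods on demand instead of writing them by hand."""

    def __init__(self, service_name, transport):
        self._service_name = service_name
        self._transport = transport  # callable: request bytes -> response bytes

    def __getattr__(self, method_name):
        # Called only for attributes that do not exist, i.e. the remote methods.
        def remote_method(*args):
            request = json.dumps(
                {"service": self._service_name, "method": method_name, "args": list(args)}
            ).encode("utf-8")
            response = json.loads(self._transport(request).decode("utf-8"))
            return response["result"]

        return remote_method


def fake_server(request_bytes):
    # Hypothetical server: deserializes the request and dispatches to a local function.
    request = json.loads(request_bytes.decode("utf-8"))
    handlers = {"greet": lambda name: f"hello, {name}", "add": lambda a, b: a + b}
    result = handlers[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


demo = RpcProxy("com.example.DemoService", fake_server)
print(demo.greet("dubbo"))  # hello, dubbo
print(demo.add(1, 2))       # 3
```

The proxy knows nothing about `greet` or `add` in advance; every method name is intercepted, serialized, and forwarded, which is the same idea a Java dynamic proxy applies to an interface.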
17. What are the mainstream RPC frameworks?
1. RMI: implemented with the java.rmi package, based on the Java Remote Method Protocol (JRMP) and Java’s native serialization.
2. Hessian: a lightweight remoting-over-HTTP tool that provides RMI-like functionality in a simple way. It is based on HTTP and uses a binary codec.
3. Dubbo: an open source, high-performance service framework from Alibaba that lets applications export and import services through high-performance RPC and integrates seamlessly with the Spring framework.
18. Implementation principle architecture diagram of RPC
Suppose there are two servers, A and B, each with an application deployed. The application on server A wants to call a function or method provided by the application on server B. Because the two do not share the same memory space, it cannot call it directly; the call semantics and the call data must be expressed and transferred over the network. For example, server A wants to call a method on server B:
1. Establish communication. To solve the communication problem, machine A must first establish a connection to machine B before it can call it. This is mainly done by establishing a TCP connection between the client and the server, over which all data exchanged by the remote procedure call is transmitted. The connection can be on-demand (established when a call is needed and closed immediately after the call) or long-lived (kept open whether or not there are packets to send, optionally with a heartbeat mechanism that periodically checks that the connection is still alive). A long-lived connection can be shared by multiple remote procedure calls.
2. Service addressing. To solve the addressing problem, the application on server A must tell the underlying RPC framework how to connect to server B (for example, a host name or IP address) and which method to call. Usually we provide machine B’s host name or IP address and a specific port, and then specify the name of the method or function to call together with its input and output parameters. Reliable addressing (primarily service discovery) is a cornerstone of an RPC implementation, for example registering services with Redis or Zookeeper.
2.1 From the service provider’s perspective: when a service provider starts, it registers its services with the designated registry so that service consumers can find them there. When the provider stops providing a service for any reason, it must unregister that service from the registry. The provider also sends heartbeats to the registry periodically. If the registry receives no heartbeat from a provider for a period of time, it considers the provider stopped and removes its services from the registry.
2.2 From the service caller’s perspective: at startup, the caller looks up the addresses and other information of service providers in the registry according to the services it subscribes to. When a service it consumes goes online or offline, the registry notifies the caller. When the caller itself goes offline, it cancels its subscriptions.
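The heartbeat rule described above can be sketched like this. It is a toy model, not how Zookeeper or Redis implement expiry: the `HeartbeatRegistry` class, the 30-second TTL, and passing the current time in as `now` (to keep the example deterministic) are all assumptions.

```python
class HeartbeatRegistry:
    """Toy registry that evicts providers whose heartbeat is older than `ttl` seconds."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.last_seen = {}  # (service, address) -> time of last heartbeat

    def register(self, service, address, now):
        self.last_seen[(service, address)] = now

    heartbeat = register  # a heartbeat simply refreshes the timestamp

    def unregister(self, service, address):
        self.last_seen.pop((service, address), None)

    def lookup(self, service, now):
        # Drop providers that have been silent for longer than the TTL.
        expired = [k for k, seen in self.last_seen.items() if now - seen > self.ttl]
        for key in expired:
            del self.last_seen[key]
        return sorted(addr for (svc, addr) in self.last_seen if svc == service)


registry = HeartbeatRegistry(ttl=30.0)
registry.register("DemoService", "10.0.0.1:20880", now=0.0)
registry.register("DemoService", "10.0.0.2:20880", now=0.0)
registry.heartbeat("DemoService", "10.0.0.1:20880", now=25.0)
print(registry.lookup("DemoService", now=40.0))  # ['10.0.0.1:20880']
```

The second provider never sent a heartbeat after registering, so by time 40 it has been silent for longer than the TTL and is removed.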
3. Network transmission
3.1 Serialization: when an application on machine A initiates an RPC call, the method being called, its input parameters, and other information must be sent to machine B over an underlying network protocol such as TCP. Because network protocols carry binary data, everything we transmit must first be serialized (marshalled) into binary. The serialized binary data is then sent to machine B through addressing and network transport.
3.2 Deserialization: after machine B receives the request from the application on machine A, it deserializes the received parameters and other information (the reverse of serialization), restoring the binary data to its in-memory representation. It then finds the corresponding method (part of addressing) and makes the local call, usually through a generated proxy (JDK dynamic proxy, CGLIB dynamic proxy, Javassist bytecode generation, and so on), and obtains the return value.
4. Service invocation. After machine B makes the local call (through a proxy and reflection) and obtains the return value, the value must be sent back to machine A. It is serialized again and the binary data is sent back over the network. When machine A receives it, it deserializes the data back into its in-memory representation and hands it to the application on machine A for further processing (generally business logic).
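The whole round trip (serialize, send over a connection, deserialize, call locally, send the result back) fits in a short Python sketch. This is a simplified illustration under several assumptions: `socket.socketpair()` stands in for the TCP connection between A and B, messages are JSON with a 4-byte length prefix, and the `multiply` function is made up.

```python
import json
import socket
import struct
import threading


def send_frame(sock, payload):
    # Length-prefixed framing: 4-byte big-endian size, then the bytes.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, size):
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("connection closed")
        data += chunk
    return data


def recv_frame(sock):
    (size,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, size)


def serve_one(sock, functions):
    # Machine B: deserialize, find the method, call it, serialize the result.
    request = json.loads(recv_frame(sock).decode("utf-8"))
    result = functions[request["method"]](*request["args"])
    send_frame(sock, json.dumps({"result": result}).encode("utf-8"))


def call(sock, method, *args):
    # Machine A: serialize the call, send it, wait for and deserialize the reply.
    send_frame(sock, json.dumps({"method": method, "args": list(args)}).encode("utf-8"))
    return json.loads(recv_frame(sock).decode("utf-8"))["result"]


a_side, b_side = socket.socketpair()  # stands in for the connection between A and B
server = threading.Thread(target=serve_one, args=(b_side, {"multiply": lambda x, y: x * y}))
server.start()
answer = call(a_side, "multiply", 6, 7)
server.join()
a_side.close()
b_side.close()
print(answer)  # 42
```

The length prefix is needed because a byte stream has no message boundaries; the receiver must know how many bytes make up one serialized request or response.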
19. CAP theory
The CAP theorem states that a distributed system can satisfy at most two of Consistency, Availability, and Partition tolerance at the same time.
(1) CA, giving up P: consistency and availability are satisfied and partition tolerance is given up. But this also means that your system is not distributed, because the whole idea of distribution is to split functions and deploy them on different machines.
(2) CP, giving up A: consistency and partition tolerance are satisfied and availability is given up. This is acceptable if your system can tolerate access failures for a period of time. It is like many people buying tickets at the same time when the back-end network fails, and the system is down when you try to buy.
(3) AP, giving up C: availability and partition tolerance are satisfied and consistency is given up. This means your system may show inconsistent data under concurrent access. In practice, consistency is what is sacrificed most often, as on 12306 and Taobao.com. When buying a train ticket, you may see that a ticket is still available although it has in fact just been bought; when you fill in your details, the system tells you there are no tickets left. That is consistency being sacrificed. But sacrificing consistency is not always acceptable. Take the transaction mechanism in MySQL: when $100 is transferred from one account to another, you must make sure that the sender loses $100 and the receiver gains $100, so consistency of the data is required. Being able to transfer money at any moment would require availability, but a transfer that fails is acceptable.
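The CP versus AP trade-off can be made concrete with a deliberately tiny simulation. It is only a thought experiment in Python, not a model of any real database: the `Replica` class, the two-replica setup, and the `write` function with its `mode` flag are invented for illustration.

```python
class Replica:
    def __init__(self):
        self.value = 0


def write(replicas, partitioned, mode, new_value):
    """Write arriving at replicas[0]; `partitioned` means it cannot reach replicas[1]."""
    primary, secondary = replicas
    if not partitioned:
        primary.value = secondary.value = new_value
        return True
    if mode == "CP":
        return False  # refuse the write: consistent but unavailable
    primary.value = new_value  # "AP": accept locally, replicas now disagree
    return True


cp = [Replica(), Replica()]
ap = [Replica(), Replica()]
print(write(cp, True, "CP", 100), cp[0].value, cp[1].value)  # False 0 0
print(write(ap, True, "AP", 100), ap[0].value, ap[1].value)  # True 100 0
```

During the partition the CP system rejects the request (the failed money transfer), while the AP system answers but leaves the two replicas with different values (the ticket that appears to be available but is not).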