Welcome to pay attention to the public account “JAVA Front” to view more wonderful sharing articles, mainly including source code analysis, practical application, architecture thinking, workplace sharing, product thinking and so on, at the same time, welcome to add my wechat “JAVA_front” to communicate and learn together


1 Service avalanche

Before analyzing service downgrades, let’s talk about what a service avalanche is. Now we assume that there are four systems A, B, C and D, and there are the following call links between them:

In normal cases, the calls between systems are normal and the system runs smoothly. However, the traffic of users accessing system A surges and is transmitted to system B, C, and D in an instant. A large number of server nodes in system B and C resist these traffic, while a small number of server nodes in system D do not resist these traffic, leading to the gradual depletion of system D’s resources and only slow service, resulting in a long delay in response to users.

In this case, the user finds that the response is slow and tries again thinking that the network is bad. In this case, multiple traffic is sent to the system. As a result, the resources of the upstream system are gradually exhausted, and the whole access link becomes unavailable.

In the service avalanche scenario described above, we find that it is unacceptable for a node in a link to fail, resulting in the entire link eventually becoming unavailable.


Two nonlinear

Let’s understand the service avalanche from another concept: nonlinearity. This concept is ubiquitous in our lives.

You have an 8 o ‘clock train to catch. If you leave at 6:30 you can arrive at the station at 7:00, so you conclude that it will only take you 30 minutes to get there.

You want to stay up a little later in the morning and expect to leave at 7:10, thinking you can get to the station at 7:40. But the most likely outcome is that you will miss the train. It takes at least an hour to get to the station because of the traffic jam during the morning rush hour.

A small snow ball weighs 100 grams. If you get hit 100 times in a snowball fight, it won’t affect you at all.

But if you get hit by a 10kg snowball once, it could cause serious injury.

That’s nonlinearity. Things are not simply superimposed, and when they reach a certain threshold they produce a completely different result.

Let’s analyze a second kill scenario on the Internet. Suppose you design a seckill system with a response time of 10 milliseconds for 30 people per second. From the time the user clicks the button to the result, it takes about 10 milliseconds. This time lapse is largely imperceptible and the performance is good. You feel good, go ahead and design:

30 traffic per second response time 10 ms 300 traffic per second response time 100 ms 3000 traffic per second response time 1000 msCopy the code

If you follow this approach to system design, major mistakes will occur. When 3000 visits per second occur, the system response time may not be 1000 ms, and the system may crash and not be able to handle any more requests. The most common scenario is a system avalanche when the cache system fails:

(1) When the cache layer with low time consumption fails, the traffic is directly transferred to the database layer with high time consumption, and the user’s waiting time will increase

(2) The increase of waiting time leads to more frequent visits by users, and more traffic will hit the database layer

(3) This leads to further increases in user wait times, which again leads to more frequent visits

(4) When the traffic reaches a limit, the system crashes and can no longer process any requests

Traffic and response time are not simply superimposed, and when a critical value is reached, the technical system simply collapses.


3 Service avalanche response plan

To ensure the stability and high availability of the system, we need to adopt some high availability strategies to build a stable high availability engineering system. The following schemes are generally adopted.


3.1 Redundancy + Automatic failover

The most basic redundancy strategy is the master-slave mode. The principle is to prepare two machines, deploy the same code, in the functional level is the same, can provide the same service externally.

A machine is started to provide services, and this is the master server. Another machine starts up and stands by, not providing service, listening to the status of the master server, which is the slave server. If the primary server is faulty, the secondary server immediately replaces the primary server and continues to provide services for users.

An automatic failover policy means that when an exception occurs on the primary system, it should be automatically detected and switched to the standby system. Do not only rely on manual switchover, otherwise the troubleshooting time will significantly increase.


3.2 Degradation Policy

The so-called downgrade strategy is that when the system encounters unbearable pressure, it chooses to temporarily shut down some non-critical functions, or delay the provision of some functions, so that all the resources at the moment are provided to the most critical services now.

Placing an order in a seckill scenario is the core and most critical function. When the system pressure is about to reach the critical value, you can temporarily shut down some non-core functions such as the query function.

When the seckill activity is over, the temporarily closed function will be turned on. This not only ensures the smooth progress of the second kill activities, but also protects the system from collapse.

There is also a downgrading strategy, where the downstream service on which the system depends has an error or is no longer available at all, and then the downstream service cannot be invoked, otherwise an avalanche may occur. So go straight back to the bottom, downgrading the downstream service.

Compare two concepts: service degradation and service outage. I think service circuit breaker is one way to degrade service, and there are many other ways to degrade service, such as switch degrade, traffic degrade, etc.


3.3 Delay Policy

When the user places the order successfully, they need to pay. Let’s say the second kill system places an order with 3000 visits per second. Let’s think about the question, is it necessary to pass the pressure of 3000 visits per second on to the payment server?

The answer is no. For example, users can jump to a payment page that prompts them to complete the payment within 10 minutes.

That’s 3,000 visits per second divided into a few minutes, effectively protecting the system. The technical architecture can also use message queues as buffers to let the payment service process the business as it can.


3.4 Isolation Policy

Physical isolation: Applications are deployed on different physical machines and in different equipment rooms, and resources do not affect each other.

Thread isolation: Different types of requests are classified and assigned to different thread pools for processing. If one type of request is time-consuming or abnormal, the access of another type of request is not affected.


4 Service Degradation

In this paper, we focus on the Dubbo framework to talk about service degradation. Now we have service providers providing the following services:

public interface HelloService {
	public String sayHello(String name) throws Exception;
}

public class HelloServiceImpl implements HelloService {
	public String sayHello(String name) throws Exception {
		String result = "hello[" + name + "]";
		returnresult; }}Copy the code

The configuration file declares the service interface:

<dubbo:service interface="com.java.front.demo.provider.HelloService" ref="helloService" />
Copy the code


4.1 Configuring a Degrade Policy

The Dubbo framework comes with a service degradation policy. It provides three common degradation policies. Let’s take a look at how to configure them.

(1) Forced demotion strategy

<dubbo:reference id="helloService" mock="force:return 1" interface="com.java.front.demo.provider.HelloService" />
Copy the code

(2) Abnormal demotion strategy

<dubbo:reference id="helloService" mock="throw com.java.front.BizException" interface="com.java.front.dubbo.demo.provider.HelloService" />
Copy the code

(3) Customize the degrade policy

package com.java.front.dubbo.demo.consumer;
import com.java.front.demo.provider.HelloService;

public class HelloServiceMock implements HelloService {

    @Override
    public String sayHello(String name) throws Exception {
        return "mock"; }}Copy the code

The configuration file specifies a custom degrade policy:

<dubbo:reference id="helloService" mock="com.java.front.dubbo.demo.consumer.HelloServiceMock" interface="com.java.front.demo.provider.HelloService" />
Copy the code


4.2 Source Code Analysis

public class MockClusterInvoker<T> implements Invoker<T> {

    @Override
    public Result invoke(Invocation invocation) throws RpcException {
        Result result = null;

        // Check for mock attributes
        String value = directory.getUrl().getMethodParameter(invocation.getMethodName(), Constants.MOCK_KEY, Boolean.FALSE.toString()).trim();

        // Execute the consumption logic directly without mock attributes
        if (value.length() == 0 || value.equalsIgnoreCase("false")) {

            // Service consumption FailoverClusterInvoker is executed by default
            result = this.invoker.invoke(invocation);
        }

        // Return directly without executing the consumption logic
        else if (value.startsWith("force")) {
            if (logger.isWarnEnabled()) {
                logger.warn("force-mock: " + invocation.getMethodName() + " force-mock enabled , url : " + directory.getUrl());
            }
            // Execute mock logic directly
            result = doMockInvoke(invocation, null);
        } else {
            try {
                // Service consumption FailoverClusterInvoker is executed by default
                result = this.invoker.invoke(invocation);
            } catch (RpcException e) {
                if (e.isBiz()) {
                    throw e;
                }
                if (logger.isWarnEnabled()) {
                    logger.warn("fail-mock: " + invocation.getMethodName() + " fail-mock enabled , url : " + directory.getUrl(), e);
                }
                // Service consumption fails to execute mock logicresult = doMockInvoke(invocation, e); }}returnresult; }}public class MockInvoker<T> implements Invoker<T> {

    @Override
    public Result invoke(Invocation invocation) throws RpcException {
        String mock = getUrl().getParameter(invocation.getMethodName() + "." + Constants.MOCK_KEY);
        if (invocation instanceof RpcInvocation) {
            ((RpcInvocation) invocation).setInvoker(this);
        }
        if (StringUtils.isBlank(mock)) {
            mock = getUrl().getParameter(Constants.MOCK_KEY);
        }

        if (StringUtils.isBlank(mock)) {
            throw new RpcException(new IllegalAccessException("mock can not be null. url :" + url));
        }
        mock = normalizeMock(URL.decode(mock));

        // 
        if (mock.startsWith(Constants.RETURN_PREFIX)) {
            mock = mock.substring(Constants.RETURN_PREFIX.length()).trim();
            try {
                Type[] returnTypes = RpcUtils.getReturnTypes(invocation);
                Object value = parseMockValue(mock, returnTypes);
                return new RpcResult(value);
            } catch (Exception ew) {
                throw new RpcException("mock return invoke error. method :" + invocation.getMethodName() + ", mock:" + mock + ", url: "+ url, ew); }}// 
      
        throw an exception
      ="throw">
        else if (mock.startsWith(Constants.THROW_PREFIX)) {
            mock = mock.substring(Constants.THROW_PREFIX.length()).trim();
            if (StringUtils.isBlank(mock)) {
                throw new RpcException("mocked exception for service degradation.");
            } else {
                // Get the custom exception
                Throwable t = getThrowable(mock);
                throw newRpcException(RpcException.BIZ_EXCEPTION, t); }}/ / < mock = "com. Java. Front. HelloServiceMock" > custom mock strategy
        else {
            try {
                Invoker<T> invoker = getInvoker(mock);
                return invoker.invoke(invocation);
            } catch (Throwable t) {
                throw new RpcException("Failed to create mock implementation class "+ mock, t); }}}}Copy the code


5 Raise questions

If force is configured in the mock attribute, no real business logic is executed, but only mock logic is executed, which is a bit easier to understand:

// Return directly without executing the consumption logic
else if (value.startsWith("force")) {
    if (logger.isWarnEnabled()) {
        logger.warn("force-mock: " + invocation.getMethodName() + " force-mock enabled , url : " + directory.getUrl());
    }
    // Execute mock logic directly
    result = doMockInvoke(invocation, null);
}
Copy the code

But for other mock configurations, the business code is executed first, and the mock logic is executed if the business code fails:

try {
    // Service consumption FailoverClusterInvoker is executed by default
    result = this.invoker.invoke(invocation);
} catch (RpcException e) {
    if (e.isBiz()) {
        throw e;
    }
    if (logger.isWarnEnabled()) {
        logger.warn("fail-mock: " + invocation.getMethodName() + " fail-mock enabled , url : " + directory.getUrl(), e);
    }
    // Service consumption fails to execute mock logic
    result = doMockInvoke(invocation, e);
}
Copy the code

This code catches RpcException, so the question is what kind of exception is RpcException? We experimented with a custom downgrade strategy, with the following consumer code:

package com.java.front.dubbo.demo.consumer;
import com.java.front.demo.provider.HelloService;

public class HelloServiceMock implements HelloService {

    @Override
    public String sayHello(String name) throws Exception {
        return "mock"; }}Copy the code

The configuration file specifies the custom policy and sets the service timeout to 2 seconds:

<dubbo:reference id="helloService" mock="com.java.front.dubbo.demo.consumer.HelloServiceMock" interface="com.java.front.demo.provider.HelloService" timeOut="2000" />
Copy the code

The consumer test code is as follows:

public static void testMock(a) {
    ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(new String[] { "classpath*:META-INF/spring/dubbo-consumer1.xml" });
    context.start();
    HelloService helloServiceMock = (HelloService) context.getBean("helloService");
    String result = helloServiceMock.sayHello("The JAVA front");
    System.out.println("Consumer receives result =" + result);
}
Copy the code


5.1 Timeout Exception

5.1.1 Code examples

The producer business code causes a block for 5 seconds, simulating a slow service:

public class HelloServiceImpl implements HelloService {

    public String sayHello(String name) throws Exception {
        String result = "hello[" + name + "]";
        // The simulation takes 5 seconds
        Thread.sleep(5000L);
        returnresult; }}Copy the code

The mock result returned by the consumer execution indicates that the timeout exception is a RpcException exception and can be caught by a degrade policy:

The consumer receives the result =mockCopy the code


5.1.2 Source code analysis

To analyze why timeout exceptions can be caught by a degrade policy, we look at the following two classes. The defaultFuture. get method uses the classic multithreaded protective pause pattern and implements asynchronous to synchronous, and throws a TimeoutException if a timeout occurs:

public class DefaultFuture implements ResponseFuture {

    @Override
    public Object get(int timeout) throws RemotingException {
        if (timeout <= 0) {
            timeout = Constants.DEFAULT_TIMEOUT;
        }
        // The response object is empty
        if(! isDone()) {long start = System.currentTimeMillis();
            lock.lock();
            try {
                // loop
                while(! isDone()) {// Waives the lock and blocks the current thread until it is signaled or interrupted or a timeout is reached
                    done.await(timeout, TimeUnit.MILLISECONDS);

                    // Check whether the block is complete
                    if (isDone()) {
                        break;
                    }
                    // Determine if the timeout period exceeds after the block is complete
                    if(System.currentTimeMillis() - start > timeout) {
                        break; }}}catch (InterruptedException e) {
                throw new RuntimeException(e);
            } finally {
                lock.unlock();
            }
            // If the response object remains empty, a timeout exception is thrown
            if(! isDone()) {throw new TimeoutException(sent > 0, channel, getTimeoutMessage(false)); }}returnreturnFromResponse(); }}Copy the code

DubboInvoker calls defaultFuture. get and throws a RpcException if the TimeoutException above is caught:

public class DubboInvoker<T> extends AbstractInvoker<T> {

    @Override
    protected Result doInvoke(final Invocation invocation) throws Throwable {
        try {
            // The request method initiates a remote call -> get asynchronizes to synchronization and performs timeout verification
            RpcContext.getContext().setFuture(null);
            Result result = (Result) currentClient.request(inv, timeout).get();
            return result;
        } catch (TimeoutException e) {
            throw new RpcException(RpcException.TIMEOUT_EXCEPTION, "Invoke remote method timeout. method: " + invocation.getMethodName() + ", provider: " + getUrl() + ", cause: " + e.getMessage(), e);
        } catch (RemotingException e) {
            throw new RpcException(RpcException.NETWORK_EXCEPTION, "Failed to invoke remote method: " + invocation.getMethodName() + ", provider: " + getUrl() + ", cause: "+ e.getMessage(), e); }}}Copy the code

It is clear from the source code analysis at this point that rpcExceptions are exceptions that can be caught by a service degradation strategy, so timeout exceptions can be degraded.


5.2 Service Exceptions

In this article we refer to non-timeout exceptions collectively as business exceptions, such as run-time exceptions occurring during producer business execution, as demonstrated below.

5.2.1 Code examples

Producer execution throws a runtime exception:

public class HelloServiceImpl implements HelloService {

    public String sayHello(String name) throws Exception {
        throw new RuntimeException("BizException")}}Copy the code

The consumer call throws an exception directly:

java.lang.RuntimeException: BizException
	at com.java.front.dubbo.demo.provider.HelloServiceImpl.sayHello(HelloServiceImpl.java:35)
	at org.apache.dubbo.common.bytecode.Wrapper1.invokeMethod(Wrapper1.java)
	at org.apache.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:56)
	at org.apache.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:85)
Copy the code


5.2.2 Source code analysis

We found that service degradation did not take effect on business anomalies, and we need to analyze the reasons. I think the analysis can be carried out from the following two points:

(1) What message does the consumer receive

public class DefaultFuture implements ResponseFuture {
    public static void received(Channel channel, Response response) {
        try {
            DefaultFuture future = FUTURES.remove(response.getId());
            if(future ! =null) {
                future.doReceived(response);
            } else {
                logger.warn("The timeout response finally returned at "
                            + (new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date()))
                            + ", response " + response
                            + (channel == null ? "" : ", channel: " + channel.getLocalAddress()
                               + "- >"+ channel.getRemoteAddress())); }}finally{ CHANNELS.remove(response.getId()); }}}Copy the code

Response is used to receive messages sent by the server. We can see that the exception information is stored in the exception property of Response:

Response [id=0, version=null, status=20, event=false, error=null, result=RpcResult [result=null, exception=java.lang.RuntimeException: BizException]]
Copy the code


(2) Where is the exception thrown

We know that the consumer object is a proxy object that first executes to InvokerInvocationHandler:

public class InvokerInvocationHandler implements InvocationHandler {
	private finalInvoker<? > invoker;public InvokerInvocationHandler(Invoker
        handler) {
		this.invoker = handler;
	}

	@Override
	public Object invoke(Object proxy, Method method, Object[] args) throws Throwable { String methodName = method.getName(); Class<? >[] parameterTypes = method.getParameterTypes();if (method.getDeclaringClass() == Object.class) {
			return method.invoke(invoker, args);
		}
		if ("toString".equals(methodName) && parameterTypes.length == 0) {
			return invoker.toString();
		}
		if ("hashCode".equals(methodName) && parameterTypes.length == 0) {
			return invoker.hashCode();
		}
		if ("equals".equals(methodName) && parameterTypes.length == 1) {
			return invoker.equals(args[0]);
		}
		
		// RpcInvocation [methodName=sayHello, parameterTypes=[Class java.lang.string], arguments=[Java front], attachments={}]
		RpcInvocation rpcInvocation = createInvocation(method, args);
		
		MockClusterInvoker(FailoverClusterInvoker(RegistryDirectory(invokers)))
		Result result = invoker.invoke(rpcInvocation);
		
		/ / results contain abnormal information will throw an exception - > such as abnormal result object RpcResult [result = null, exception = Java. Lang. RuntimeException: sayHelloError1 error]
		returnresult.recreate(); }}Copy the code

The RPCresult. set method handles exceptions and throws an exception if the exception object is not found to be empty:

public class RpcResult extends AbstractResult {

    @Override
    public Object recreate(a) throws Throwable {
        if(exception ! =null) {
            try {
                Class clazz = exception.getClass();
                while(! clazz.getName().equals(Throwable.class.getName())) { clazz = clazz.getSuperclass(); } Field stackTraceField = clazz.getDeclaredField("stackTrace");
                stackTraceField.setAccessible(true);
                Object stackTrace = stackTraceField.get(exception);
                if (stackTrace == null) {
                    exception.setStackTrace(new StackTraceElement[0]); }}catch (Exception e) {
            }
            throw exception;
        }
        returnresult; }}Copy the code


5.2.3 How can A Service Exception be Degraded

Based on the above examples, we know that Dubbo’s built-in service degradation policy can degrade only timeout exceptions, but not service exceptions.

So how should business exceptions be degraded? We can integrate Dubbo and Hystrix for abnormal service circuit breaker, and the relevant configuration is not complicated. You can check relevant information online.


6 Article Summary

In this article, we first introduced the service avalanche scenario and again understood it from a nonlinear perspective. Then we summarized the service avalanche response plan, in which service degradation is one of the important ways to deal with the service avalanche. We aimed at the timeout exception and business exception two fields, combined with the source code in-depth analysis of Dubbo service degradation use scenario, I hope this article is helpful to you.


Welcome to pay attention to the public account “JAVA Front” to view more wonderful sharing articles, mainly including source code analysis, practical application, architecture thinking, workplace sharing, product thinking and so on, at the same time, welcome to add my wechat “JAVA_front” to communicate and learn together