Microservices architecture solves many problems, but it introduces just as many new ones. This article explores how to address the following questions.

With a large number of synchronous RPC dependencies, how do I keep my own service reliable?

When a call to a dependent microservice fails, should my own request fail or succeed? How do I stay stable while depending on many external services? If I am only successful when every service I depend on succeeds, my own stability is a real concern.

How do I make data repairable after an RPC call fails?

If a failed call is simply skipped, how do we repair the data inconsistency it leaves behind? Small amounts can usually be ignored, but cleaning up by hand after a large outage is extremely expensive. The biggest benefit of a message queue is that messages can pile up during a failure and be processed slowly after recovery, reducing the cost of human intervention.

The message queue is a bypass of the main RPC flow. How do we make it reliable?

When we rely on a message queue to decouple systems, how do we ensure the messages themselves are reliably enqueued? Must a message be durably written to the queue, say Kafka, before the database transaction commits? And if Kafka goes down, does my online business get dragged down by an offline queue?

How does the message queue stay consistent with database transactions?

If the message is written to the queue first and the database transaction is committed afterwards, the commit can fail because of a concurrent modification while the message is already in the queue. If a business process such as granting rewards hangs off that queue, the consumer will either act on a record that was never committed or have to re-check the database state. If, on the other hand, the transaction is committed first and the message is written afterwards, there is no strict guarantee that the message will not be lost.
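A hedged illustration of the two orderings described above; the `MessageQueue` interface, topic name and use of a raw JDBC connection are stand-ins for whatever the real system uses, not a specific API.

```java
import java.sql.Connection;
import java.sql.SQLException;

public class OrderingDilemma {

    // Hypothetical queue abstraction, declared here only to make the sketch self-contained.
    interface MessageQueue {
        void send(String topic, String payload);
    }

    // Ordering A: enqueue first, then commit.
    // If commit() fails because of a concurrent modification, the message is already
    // in the queue and the reward consumer acts on data that was never persisted.
    void enqueueThenCommit(Connection conn, MessageQueue queue, String event) throws SQLException {
        queue.send("reward-events", event);
        conn.commit();                       // may throw -> "phantom" message in the queue
    }

    // Ordering B: commit first, then enqueue.
    // If the process dies between the two calls, the row is durable but the message is lost.
    void commitThenEnqueue(Connection conn, MessageQueue queue, String event) throws SQLException {
        conn.commit();
        queue.send("reward-events", event);  // may never run -> lost message
    }
}
```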

These problems are common to any business that mixes RPC with asynchronous queues. What follows is a proposal that addresses all of them.

Synchronous to asynchronous: solving the stability problem

Under normal conditions, RPC calls are synchronous. If a call fails, it is automatically degraded to asynchronous: the request is written to a queue and retried asynchronously (see the sketch after this list). That leaves three ways of treating a downstream dependency:

  • Fully strong dependency: the downstream must not go down.
  • Synchronous because my return value depends on some of the downstream's results, but not strongly dependent and therefore degradable. When degraded, the synchronous call becomes asynchronous and that part of the data is simply not returned.
  • Fully asynchronous: the downstream service just consumes the queue I write to, and I never talk to it over RPC.
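A minimal sketch of the sync-to-async degradation for the second case; `RpcClient`, `RetryQueue` and the service name "reward-service" are assumptions for illustration, not a real framework API.

```java
public class RewardCaller {

    interface RpcClient { String call(String service, String payload) throws Exception; }
    interface RetryQueue { void enqueue(String payload); }

    private final RpcClient rpcClient;
    private final RetryQueue retryQueue;

    public RewardCaller(RpcClient rpcClient, RetryQueue retryQueue) {
        this.rpcClient = rpcClient;
        this.retryQueue = retryQueue;
    }

    // Normal path: synchronous RPC. Degraded path: enqueue the request for asynchronous
    // retry and return a partial result without the downstream-dependent fields.
    public String grantReward(String request) {
        try {
            return rpcClient.call("reward-service", request);
        } catch (Exception rpcFailure) {
            retryQueue.enqueue(request);   // an async worker retries this later
            return "{\"degraded\":true}";  // caller still succeeds, minus the downstream data
        }
    }
}
```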


Put the message queue into the main flow

If you want to hang important business logic behind a message queue, the integrity of the data in the queue must be guaranteed. It therefore makes no sense to write messages to the queue as a bypass: if a write to the queue fails or times out, the request should return an error rather than be allowed to continue.

Kafka’s stability and latency are often inadequate for online services. For example, if a reliable write to three replicas requires Kafka to wait for acknowledgements from multiple brokers, the latency can fluctuate wildly. We need a local file as a buffer for the cases where we cannot write in time.

In effect, combining a local file queue with a remote distributed queue gives us a composite queue with higher availability and lower latency. Ideally this local queue is wrapped in a Kafka agent that acts as a proxy for local writes.
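A hedged sketch of the combined-queue idea: try the remote queue with a short timeout, and fall back to an append-only local file that a separate agent later replays into Kafka. The `RemoteQueue` interface, the 50 ms timeout and the spool path are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BufferedQueueWriter {

    interface RemoteQueue { void send(String payload, long timeoutMillis) throws Exception; }

    private final RemoteQueue remote;
    private final Path localSpool;

    public BufferedQueueWriter(RemoteQueue remote, Path localSpool) {
        this.remote = remote;
        this.localSpool = localSpool;
    }

    // Returns only once the message is durable somewhere: either acknowledged by the
    // remote queue or appended (and synced) to the local spool file.
    public void write(String payload) throws Exception {
        try {
            remote.send(payload, 50);   // short timeout keeps online latency bounded
        } catch (Exception remoteFailure) {
            Files.writeString(localSpool, payload + System.lineSeparator(),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                    StandardOpenOption.SYNC);   // durable local buffer, replayed by the agent
        }
    }
}
```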

Ensure distributed transaction consistency

We have talked about using and relying on queues, but that does not solve the problem that the database and the message queue are two separate transactions and so cannot guarantee eventual consistency on their own.

Suppose RPC1 and RPC2 succeed but the process is then interrupted (by a power failure, say). RPC3 never executes and the data is left inconsistent. RPC1 here could be a database operation and RPC3 an enqueue to a message queue; it makes no difference.

The solution is to hand the responsibility to a third party.

We need a delay queue: at the entry point of the flow we register a delayed job, and we cancel it once the flow completes. If it is never cancelled, the delay queue fires the delayed task and re-executes the entire business flow, over and over, until it succeeds.

In this way we can guarantee eventual consistency for any flow of RPC operations; writing to a Kafka message queue is covered as just another form of RPC operation.
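A minimal sketch of the delay-queue idea, using an in-process `ScheduledExecutorService` as a stand-in for a real distributed delay queue (in a real system the delayed task would be persisted so it survives a crash). The flow registers a retry before touching any downstream service and cancels it on success; every step must therefore be idempotent.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DelayedRetrySupervisor {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void runWithSupervision(Runnable idempotentFlow, long delaySeconds) {
        // 1. Register the delayed re-execution before starting the flow.
        ScheduledFuture<?> retry = scheduler.schedule(idempotentFlow, delaySeconds, TimeUnit.SECONDS);

        // 2. Run the flow: RPC1 -> RPC2 -> RPC3 (an enqueue to Kafka counts as just another RPC).
        idempotentFlow.run();

        // 3. Everything succeeded: cancel the pending retry. If the process dies before this
        //    line, the delayed task fires and re-runs the whole flow.
        retry.cancel(false);
    }
}
```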

Conclusion

Three independent technical solutions have been presented:

  • Synchronous-to-asynchronous degradation improves the availability of synchronous RPC and data consistency.
  • A local queue as a backstop improves the overall availability of the message queue and reduces latency.
  • A delay queue supervises the execution of a chain of RPC calls.

Combining these three independent solutions lets us apply queue technology to a microservice cluster built purely on synchronous RPC, improving availability and data consistency. At the same time, the message data itself is guaranteed to be reliable, which is the prerequisite for other business logic to hang itself behind the queue.


Source: https://zhuanlan.zhihu.com/p/25346771