Fault tolerance
Fault-tolerant techniques matter in any framework: no one can guarantee that a system will be error-free 100% of the time, yet we still want it to be available 100% of the time. In reality, server software often fails or becomes unusable, sometimes because of hardware failures. In the telecom industry, with its huge amount of equipment, hardware failure is all too common; without good countermeasures, effective service cannot be provided. This is where the idea of "Let it Crash" comes from: since faults cannot be completely prevented, they should be handled promptly when they occur. For example:
- The system needs to be fault-tolerant: it should remain available and continue to operate when a failure occurs, and a recoverable failure should not escalate into a catastrophic one.
- In some cases it is enough that the core functions of the system remain available. The failing parts should then be isolated from the core of the system to prevent unexpected results.
- A partial failure must not bring down the entire system, so there needs to be a way to isolate specific failures for later handling.
At the end of this chapter, we’ll use a framework based on Akka TestKit + ScalaTest to verify some of the fault-tolerance mechanisms of actors that can be confusing or hard to grasp intuitively. See: Akka Combat: Test-driven Development TDD – Nuggets (juejin.cn)
Let it crash
Akka takes a separation-of-concerns approach, isolating normal business processing from failure handling. An Actor performs its own tasks in one flow without worrying about what to do when an exception occurs; meanwhile, a supervisor stands ready in a separate failure-handling flow to deal with any errors that might arise.
Where do the supervisors come from? As mentioned in the first chapter of this column, Akka adopts parental supervision: when Actor A creates another Actor B, A assumes the responsibility of supervising B. The supervisor does not catch the exception itself; it only makes a decision based on the cause of the crash, one of:
- Restart: the Actor is recreated from its registered Props, and the new instance continues processing from the next message, while the ActorRef reference stays the same.
- Resume: the same Actor instance ignores the error and moves on to the next message.
- Stop: the Actor terminates and no longer processes messages.
- Escalate: if the supervisor does not know what to do, it reports the problem to its own parent Actor, which is also a supervisor.
These four decisions are discussed in detail in the supervision section below. In general, the following are the properties that Akka’s "Let it Crash" approach provides for building systems:
- Fault isolation — the supervisor can decide to stop an Actor, removing it from the system altogether.
- Fault-tolerant structure — Akka can replace an Actor instance without affecting other actors.
- Redundancy — one Actor can be replaced by another. The supervisor can decide to weed out a failing Actor and create a different one to take its place.
- Recovery — a failed Actor can be brought back simply by restarting it.
- Component lifecycle — an Actor is an active component that can be started, stopped, and restarted.
- Suspension — when an Actor crashes, its mailbox is suspended until the supervisor makes a decision.
- Separation of concerns — normal Actor message processing and error recovery are two orthogonal, distinct flows.
Actor life cycle
There are three important events throughout the life cycle of an Actor: a start event, a stop event, and a restart event.
In each event, the Actor exposes some hook methods. You can insert your own code into these hooks, for example to restore specific state into the new Actor instance, to reprocess messages that previously failed, or to release resources when the Actor goes down. Although Akka calls the hooks asynchronously, the order in which they are called is guaranteed.
Start event
An Actor is created by the actorOf method and starts automatically. A top-level Actor can be created with system.actorOf(); it can then call context.actorOf() in its own context to create child actors, as illustrated in the supervision section below. The preStart method is called after the Actor's constructor, and you can override preStart to do some pre-initialization for the Actor.
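Here is a minimal sketch of both creation paths and a preStart override; the Worker and Parent class names are invented for illustration and do not appear elsewhere in this article:

import akka.actor.{Actor, ActorRef, ActorSystem, Props}

class Worker extends Actor {
  // Called once after the constructor, before the first message is processed.
  override def preStart(): Unit = println(s"worker started at ${self.path}")

  override def receive: Receive = {
    case msg => println(s"received: $msg")
  }
}

class Parent extends Actor {
  // A child Actor is created in the parent's own context.
  private val child: ActorRef = context.actorOf(Props[Worker], "worker")
  override def receive: Receive = { case msg => child.forward(msg) }
}

object StartDemo extends App {
  val system = ActorSystem("demo")
  // A top-level Actor is created on the ActorSystem itself.
  val parent: ActorRef = system.actorOf(Props[Parent], "parent")
  parent ! "hello"
}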
Stop event
The stop event is discussed before the restart event. A stop event means the Actor is stopped and its life cycle has ended; it happens only once. There are three situations in which an Actor stops:
- It is stopped through the stop() method of a context (ActorSystem or ActorContext).
- A PoisonPill is sent to its ActorRef.
- It is stopped by its supervisor's Stop policy.

You can override the postStop hook method to free valuable system resources before the Actor is removed and recycled, or to save the Actor's last state somewhere outside the Actor for the rest of the system to use. Note that a stopped Actor is disconnected from its ActorRef: once the Actor has completely stopped, all subsequent messages sent to that ActorRef become dead letters.
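The sketch below (the Resource class and the messages are invented for illustration) shows a postStop override together with stopping via PoisonPill and the resulting dead letters:

import akka.actor.{Actor, ActorSystem, PoisonPill, Props}

class Resource extends Actor {
  override def receive: Receive = { case msg => println(s"handling $msg") }

  // Last chance to release resources or persist the Actor's final state.
  override def postStop(): Unit = println("releasing resources")
}

object StopDemo extends App {
  val system = ActorSystem("demo")
  val ref = system.actorOf(Props[Resource], "resource")

  ref ! "work"       // processed normally
  ref ! PoisonPill   // stop via PoisonPill; it is queued like an ordinary message
  // system.stop(ref) would stop it through the system/context instead
  ref ! "too late"   // arrives behind the PoisonPill, so it ends up as a dead letter
}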
Restart event
When an Actor is restarted, the ActorRef pointing to it remains the same, but the internal instance is replaced (the Actor can be rebuilt in the ActorSystem from its Props object). A restart event involves two Actor instances, so the process is more complex than the stop and start events. The Actor provides two hook methods for the restart event: preRestart and postRestart.

preRestart lets a crashed Actor save its last state before the new instance is built: reason is the exception that was encountered, and message holds the message that was being processed when the exception occurred.
override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
  super.preRestart(reason, message)
}
There are two points to note:
- The message that raised the exception (message) is discarded by default. This avoids the situation where an invalid message repeatedly causes restarts and prevents other, normal messages from being processed (mailbox poisoning for short). Discarding is not mandatory, but it should only be skipped if the developer fully understands the consequences of reprocessing the offending message, for example when it is certain that the exception was caused by an accidental external error rather than by the message itself.
- In general, an overridden preRestart method should call super.preRestart(reason, message). This makes the system stop the current Actor's children and call its postStop method, cleaning up the resources they occupy. This is not mandatory either, but the developer must fully understand the consequences of not recycling resources immediately, for example when the remaining children are explicitly ordered to finish their work before being stopped.
Before it is stopped and destroyed, the previous instance can also "leave" information for its next instance through the preRestart method. For example:
override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
  super.preRestart(reason, message)
  // This message will be delivered to the next instance.
  self ! Map("port" -> 8080)
}
This message is sent via self to the end of the queue in its mailbox, so it will be received and processed later by the next instance. For example, if the user insists on processing the message that raised the exception again, that message can be re-posted to the successor in the preRestart method. The postRestart method is called after the new Actor instance has been constructed. See Experiment 4 below.
The hooks across the entire life cycle of an Actor can be strung together in a single diagram; Experiment 3 below reproduces the same sequence in code.
Important: as mentioned earlier, if super.preRestart is called when overriding preRestart, then the postStop method of this Actor (and of its children, which are stopped first) is additionally invoked to clean up resources before the new instance takes over. Similarly, if super.postRestart is called when overriding postRestart, preStart is invoked first to re-acquire resources before the new instance starts working. See Experiment 3 below.
Watching
In Akka's error handling, watching and supervision are two different concepts, but they can be used together, and both are closely tied to the Actor life cycle.
Akka Series (3) : Monitoring and Fault Tolerance – Jianshu.com
As long as you can obtain an Actor reference (ActorRef), you can actively establish or remove a watch relationship with it; the two actors do not need to be in a parent-child relationship. For example:

// Establish the watch relationship
context.watch(otherActorRef)
// Remove the watch relationship
context.unwatch(otherActorRef)
The watched Actor can be stopped for any of the following reasons (each corresponding to the stop event of the Actor life cycle):
- It is stopped by its parent's Stop policy (see the supervision section below).
- It receives a PoisonPill message.
- It is stopped by the parent Actor via the context.stop() method.

In any of these cases, the watcher receives a Terminated(actorRef) message and can do some processing with it, as in the sketch below. Note that when an Actor is restarted, this message is not sent to the watcher, because its ActorRef itself is not affected. See Experiment 2 below.
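A minimal sketch of the watch-and-Terminated pattern; the Watcher and Child classes here are hypothetical and separate from the experiments later in this section:

import akka.actor.{Actor, ActorRef, Props, Terminated}

class Child extends Actor {
  override def receive: Receive = { case _ => }
}

class Watcher extends Actor {
  private val child: ActorRef = context.actorOf(Props[Child], "child")
  context.watch(child) // any ActorRef can be watched, not only a child

  override def receive: Receive = {
    case "stopChild" => context.stop(child)
    case Terminated(ref) =>
      // Delivered only when the watched Actor stops, not when it restarts.
      println(s"${ref.path} has terminated")
  }
}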
Supervision
All actors created by users share a common ancestor, the user guardian.

A user creates an Actor with either the context.actorOf() or the system.actorOf() method. Actors created with system.actorOf() are called top-level actors; within an ActorSystem there is usually only one, or very few, of them.

The actors of an entire Akka system thus form a parent-child family tree, and developers only need to focus on the user space. Unlike watching, which must be established explicitly, parent and child actors naturally form a supervision relationship. For example, if Actor A creates Actor B in its own context, then A is B's parent Actor, and A is also B's supervisor. The supervisor decides what to do when a child encounters an exception, or it can pass a problem it cannot handle on its own up to a higher-level supervisor.
Part I: Actor Architecture _ Wang_WbQ’s blog -CSDN Blog
Predefined policy
Even if the supervisor responsibility is not implemented explicitly, every Actor carries the default policy, defaultStrategy, shown in the source code below. Apart from the first three Akka-internal exceptions, the supervisor will keep restarting the child an unlimited number of times in response to application exceptions, which can cause blocking in some cases.

Exceptions that are not handled keep being escalated upward toward the user guardian. As the default policy shows, however, only system-level errors (Throwable but not Exception, i.e. Error) are actually passed all the way to the user guardian; reaching that point means the program has hit a serious, unrecoverable error, and it is wise to shut the whole system down gracefully.
/**
 * When supervisorStrategy is not specified for an actor this
 * `Decider` is used by default in the supervisor strategy.
 * The child will be stopped when [[akka.actor.ActorInitializationException]],
 * [[akka.actor.ActorKilledException]], or [[akka.actor.DeathPactException]] is
 * thrown. It will be restarted for other `Exception` types.
 * The error is escalated if it's a `Throwable`, i.e. `Error`.
 */
final val defaultDecider: Decider = {
  case _: ActorInitializationException ⇒ Stop
  case _: ActorKilledException         ⇒ Stop
  case _: DeathPactException           ⇒ Stop
  case _: Exception                    ⇒ Restart
}
When one of a parent Actor's children crashes at runtime, the parent has two kinds of strategy, as sketched below:
- Handle only the broken child: OneForOneStrategy. This applies when children perform independent tasks and do not share resources.
- Handle all children together: AllForOneStrategy. This applies when children perform related tasks and share resources.
Akka provides another built-in stop strategy, stoppingStrategy, which will stop any crashed Actor directly. It is a one-for-one strategy.
// The following two policies are equivalent.
override def supervisorStrategy: SupervisorStrategy =
  OneForOneStrategy() { case _: Exception => Stop }

override def supervisorStrategy: SupervisorStrategy = stoppingStrategy
Custom Policies
Each supervisor can adopt a different strategy according to the actual situation. For a crashed Actor, the supervisor can choose among four directives:
- Resume: the least costly, simplest option. Ignore the error and continue with the same Actor instance.
- Restart: remove the previous Actor instance and replace it with a new one. The ActorRef stays the same throughout, and the mailbox is suspended briefly until the restart completes.
- Stop: deactivate the child Actor, permanently discarding its ActorRef as well.
- Escalate: throw the error upward and let the parent Actor decide how to handle it.
Exceptions that the strategy does not match are escalated by default, so they keep moving up the hierarchy, potentially all the way to the user guardian. A strategy that combines all four directives is sketched below.
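A minimal sketch of one strategy using all four directives; the exception types are hypothetical and invented only for illustration:

import akka.actor.SupervisorStrategy.{Escalate, Restart, Resume, Stop}
import akka.actor.{Actor, OneForOneStrategy, SupervisorStrategy}

// Hypothetical exception types, used only to show the mapping.
class MinorException       extends Exception
class RecoverableException extends Exception
class FatalTaskException   extends Exception

class CustomSupervisor extends Actor {
  override def supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy() {
      case _: MinorException       => Resume   // ignore the error, keep the same instance
      case _: RecoverableException => Restart  // replace the instance, keep the ActorRef
      case _: FatalTaskException   => Stop     // terminate the child for good
      case _: Exception            => Escalate // let this Actor's own supervisor decide
    }

  override def receive: Receive = { case _ => }
}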
The Restart directive is tied to the Actor restart event. The number of restarts can be limited explicitly with the maxNrOfRetries parameter: once the retries exceed this threshold, the Actor is stopped instead of the exception being thrown upward. withinTimeRange limits the time window for the retries. For example, the following policy restarts the Actor at most five times within 10 seconds, otherwise it is stopped.
override def supervisorStrategy: SupervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 10.seconds) {
    case _: Exception => Restart
  }
Without any restrictions on Restart, the system can become stuck in a blocking loop due to mailbox poisoning, as shown in Experiment 5 below.
The ActorSystem always restarts actors as quickly as possible. Akka also provides an alternative "backoff restart" approach that is more useful in some cases, but it is not based on the Restart directive; see the BackoffSupervisor section below.

After Resume/Restart, an Actor by default continues processing from the next message. If the Actor is terminated by the Stop policy, all subsequent asynchronous messages to that Actor become dead letters.

Akka does not allow orphan actors, so stopping or restarting any Actor also stops or restarts its children, recursively from the bottom up.
BackoffSupervisor
This section is referenced from Classic Fault Tolerance • Akka Documentation.
The backoff restart policy is suited to errors caused by occasional external events, such as a database connection failure that makes a write fail. It is wiser to have the Actor wait a little while and then try again, rather than restarting repeatedly in a near-blocking manner.

A BackoffSupervisor is itself an Actor: at initialization it receives the Props of the child Actor to be supervised and establishes the supervision relationship. It acts as a message proxy for this child: the BackoffSupervisor receives messages from the outside, but the actual processing logic lives in the child Actor built from those Props.

Instead of using the Restart mechanism, the BackoffSupervisor stops the crashed Actor outright and starts a new instance at an appropriate later time. Therefore the postStop and preStart methods are triggered during a backoff restart, while preRestart and postRestart are not. While waiting for the restart, all messages sent on through the BackoffSupervisor become dead letters.
BackoffSupervisor has two trigger options, each with strict usage scenarios:
- Backoff.onFailure: a backoff restart is triggered only when the supervised Actor throws an exception, not when it is stopped normally via context.stop(), PoisonPill, and so on.
- Backoff.onStop: a backoff restart is triggered whenever the supervised Actor stops, by any means. This option is meant for actors that signal failure by stopping themselves instead of throwing exceptions.
In scenarios that do not involve persistent actors, developers typically only need onFailure. The following code gives an example of TestedActor05 being supervised by a BackoffSupervisor:
val childProps: Props = Props[TestedActor05]

val supervisorProps: Props = BackoffSupervisor.props {
  Backoff.onFailure(
    childProps   = childProps,   // Props of the child supervised with backoff
    childName    = "child",      // name of the child Actor
    minBackoff   = 3.seconds,    // minimum backoff time
    maxBackoff   = 30.seconds,   // maximum backoff time
    randomFactor = 0.2           // random jitter factor, 0.2
  )
}

// `ref` is the ActorRef of the BackoffSupervisor itself;
// the actual processing is done by the internal child ActorRef.
val ref: ActorRef = context.actorOf(supervisorProps, "backoff-supervisor")
The first backoff interval is minBackoff, and each subsequent interval doubles. With the code above, the interval grows through 3, 6, 12, 24, 30 (unit: s) and never exceeds maxBackoff.

Backoff algorithms are very common in the lower layers of computer networking. They generally add jitter (a random delay) to avoid repeated collisions, and BackoffSupervisor is no exception: for example, jitter prevents a large number of actors restarting at the same moment from making the load on an external database surge. The randomFactor parameter controls the amount of random jitter; a rough sketch of the resulting schedule follows.
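The following is a rough sketch, not Akka's internal code, of how the delay before the nth restart can be derived from minBackoff, maxBackoff, and randomFactor:

import scala.concurrent.duration._
import scala.util.Random

// Rough sketch of the backoff schedule: double the delay each time,
// cap it at maxBackoff, then add up to `randomFactor` of random jitter.
def backoffDelay(restartCount: Int,
                 minBackoff: FiniteDuration = 3.seconds,
                 maxBackoff: FiniteDuration = 30.seconds,
                 randomFactor: Double = 0.2): FiniteDuration = {
  val doubled = minBackoff.toMillis * math.pow(2, restartCount) // 3s, 6s, 12s, 24s, 48s, ...
  val capped  = math.min(doubled, maxBackoff.toMillis.toDouble) // never exceeds maxBackoff
  val jitter  = 1.0 + Random.nextDouble() * randomFactor        // e.g. up to +20% extra delay
  (capped * jitter).toLong.millis
}

// restartCount 0..4 gives roughly 3 s, 6 s, 12 s, 24 s, 30 s before jitter.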
Sometimes developers need more configuration, for example stopping outright instead of backing off when a particular error occurs, or resetting the backoff interval after the Actor has been running normally for a while. These are handled by withSupervisorStrategy() and withAutoReset(), respectively:
val backoffSupervisor: Props = BackoffSupervisor.props {
  Backoff.onFailure(
    childProps   = Props[TestedActor05],
    childName    = "child-actor",
    minBackoff   = 3.seconds,
    maxBackoff   = 30.seconds,
    randomFactor = 0.2
  )
  // Reset the backoff interval after 10 s of normal operation.
  .withAutoReset(10.seconds)
  // Attach a predefined supervision strategy.
  .withSupervisorStrategy(
    OneForOneStrategy() { case FatalException => Stop }
  )
}
See Experiment 6 for the complete test code.
In most situations, Actor hook methods, watching, and supervision are combined to build a complete and effective fault-tolerance mechanism. Below are all the unit tests covered in this section.
Experiment 1: Capture the Terminated message through a watcher
The overall logic of the unit test:
- Have TestActor02Supervisor watch TestedActor02; the former is also the supervisor of the latter (as mentioned earlier, watching and supervision are not mutually exclusive).
- Create a stop event.
- Verify that the watcher receives the Terminated(child) message.

The entire event flow is numbered in the comments; TestActor02Supervisor is both watcher and supervisor in this test case.
class StopStrategyTest extends TestKit(ActorSystem("testSystem"))
  with WordSpecLike
  with MustMatchers
  with StopSystemAfterAll {

  "The TestedActor02" must {
    "send 'Terminated' to its supervisor when it is broken." in {
      val ref: ActorRef = system.actorOf(Props(new TestActor02Supervisor(testActor)), "supervisor-01")
      // 1. Create the child.
      ref ! NewActor
      // 3. Order the child to throw an exception.
      ref ! ThrowEx
      // 9. Verify that the test succeeded.
      expectMsg(TestOk)
    }
  }
}

class TestActor02Supervisor(out: ActorRef) extends Actor with ActorLogging {
  // 6. The supervisor catches the exception and applies the Stop policy.
  override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Stop }

  override def receive: Receive = {
    // 2. Create the child and establish the watch relationship.
    case NewActor =>
      val child = context.actorOf(Props[TestedActor02], "child-1")
      context.watch(child)
    // 4. Make the child throw an exception.
    case ThrowEx => context.child("child-1").get ! ThrowEx
    // 8. The watcher receives Terminated and reports success to the test's testActor.
    case Terminated(child) =>
      log.info("the child actor[{}] is terminated.", child.path)
      out ! TestOk
  }
}

class TestedActor02 extends Actor with ActorLogging {
  // 5. Throw an exception.
  override def receive: Receive = { case ThrowEx => throw new Exception("Designed Exception") }
  // 7. postStop runs before the Actor is destroyed; the Terminated message is then delivered to the watcher.
  override def postStop(): Unit = log.info("TestedActor02 will shut down.")
}
Experiment 2: Verify that the Restart policy does not trigger Terminated
The idea behind this unit test is simple: generate a restart event; if the watcher receives Terminated(), it sends a test-failure message to the unit test's testActor. To see the result more clearly, a few side effects are placed in TestedActor04's hook methods.
class TerminatedTest extends TestKit(ActorSystem("testSystem"))
  with MustMatchers
  with WordSpecLike
  with StopSystemAfterAll {

  "The TestActor04supervisor" must {
    "get no message like `Terminated(child)`" in {
      val ref: ActorRef = system.actorOf(Props(new TestActor04Supervisor(testActor)), "supervisor-1")
      // 1. Create the child.
      ref ! NewActor
      // 3. Order the child to throw an exception.
      ref ! ThrowEx
      // testActor receives nothing: the supervisor did not get a Terminated() message during the restart.
      expectNoMsg()
    }
  }
}

class TestActor04Supervisor(out: ActorRef) extends Actor with ActorLogging {
  // 6. Catch the exception and restart.
  override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Restart }

  override def receive: Receive = {
    // 2. Create the child and establish the watch relationship.
    case NewActor =>
      val child: ActorRef = context.actorOf(Props[TestedActor04], "child-1")
      context.watch(child)
    // 4. Make the child throw an exception.
    case ThrowEx => context.child("child-1").get ! ThrowEx
    // If Terminated is received during the restart, report failure to testActor.
    case Terminated(child) =>
      log.info("the child:{} was crashed.", child)
      out ! TestFailed
    // 11. When this message is received, the flow ends.
    case Restart => log.info("the child restarted.")
  }
}

class TestedActor04 extends Actor with ActorLogging {
  // 5. Throw an exception.
  override def receive: Receive = { case ThrowEx => throw new Exception("Designed Exception.") }

  // preRestart runs on the old instance.
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    // Call super.preRestart first.
    super.preRestart(reason, message)
    log.info("invoke preRestart")
  }

  // 8. postStop runs on the old instance.
  override def postStop(): Unit = log.info("invoke postStop")

  // 10. After the restart, send a Restart message to the supervisor.
  override def postRestart(reason: Throwable): Unit = context.parent ! Restart
}
Experiment 3: Test the full Actor life cycle
The overall idea of the unit test:
- Override the preStart, preRestart, postRestart, and postStop methods, inserting a small side effect in each.
- Call super.preRestart and super.postRestart.
- Generate a restart event and observe the order in which the logs are printed.
class LifecycleTest extends TestKit(ActorSystem("testSystem"))
  with WordSpecLike
  with MustMatchers
  with StopSystemAfterAll {

  // Do not leave spaces at the end of this string, otherwise it will cause a bug.
  "The Tested Actor" must {
    "go through: <constructor>, preStart, postStop, preRestart, <constructor>, preStart, postRestart, postStop" in {
      val ref: ActorRef = system.actorOf(Props(new TestedActorSupervisor(testActor)), "supervisor-01")
      ref ! NewActor
      ref ! ThrowEx
    }
  }
}

class TestedActorSupervisor(out: ActorRef) extends Actor {
  override def receive: Receive = {
    case NewActor => context.actorOf(Props[TestedActor01], "child-01")
    case ThrowEx  => context.child("child-01").get ! ThrowEx
  }
  override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Restart }
}

class TestedActor01 extends Actor with ActorLogging {
  override def receive: Receive = {
    case ThrowEx => throw new Exception("Designed Exception")
  }

  // This statement runs during construction, like a field initializer of a Scala class.
  log.info("invoke constructor<TestedActor01>:{}", this.hashCode())

  override def preStart(): Unit = log.info("invoke preStart, the hashcode of this instance:{}", this.hashCode())

  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    super.preRestart(reason, message)
    log.info("invoke preRestart, the hashcode of this instance:{}", this.hashCode())
  }

  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason)
    log.info("invoke postRestart, the hashcode of this instance:{}", this.hashCode())
  }

  override def postStop(): Unit = log.info("invoke postStop, the hashcode of this instance:{}", this.hashCode())
}
To verify that the two actors are not the same instance after the restart, each hook method prints this.hashCode() to the log. This unit test needs no assertions; just watch the order of the log output (it matches the Actor life cycle described earlier).
Experiment 4: Send information via preRestart to the next instance
The general idea of the unit test is as follows:
- TestActorSupervisor makes TestedActor throw an exception and crash.
- A restart event is created.
- Before the old instance is removed, the time of the crash is recorded and sent to the next instance via self.
- The next instance receives the message and records the time of the previous crash in lastMsg.
Note that when the supervisorStrategy method is not overridden, the ActorSystem's default strategy will keep restarting a crashed Actor an unlimited number of times.
class RestartTest extends TestKit(ActorSystem("testSystem"))
  with WordSpecLike
  with MustMatchers
  with StopSystemAfterAll {

  "A Tested Actor" must {
    "send crashTime to next Actor instance by `preRestart` method when it brakes." in {
      val ref: ActorRef = system.actorOf(Props(new TestActorSupervisor(testActor)))
      // 1. Create the child.
      ref ! NewActor
      // 3. Order the child to throw an exception.
      ref ! ThrowEx
      // 10. testActor receives TestOk; the test succeeds.
      expectMsg(TestOk)
    }
  }
}

class TestActorSupervisor(out: ActorRef) extends Actor with ActorLogging {
  // The default policy will restart TestedActor automatically.
  // 6. Restart is applied by default.
  // override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy(){}...

  override def receive: Receive = {
    // 2. Create the child.
    case NewActor => context.actorOf(Props[TestedActor], "child-1")
    // 4. Make the child throw an exception.
    case ThrowEx => context.child("child-1").get ! ThrowEx
    // 9. Forward the success message to the test's testActor.
    case TestOk => out ! TestOk
  }
}

class TestedActor extends Actor with ActorLogging {
  var lastMsg: String = "ok"

  override def receive: Receive = {
    // 8. The new instance receives the crash time left by the previous instance.
    case ExInfo(exMessage) =>
      lastMsg = exMessage
      log.info(s"this actor has crashed in ${lastMsg} yet.")
      context.parent ! TestOk
    // 5. Throw an exception.
    case ThrowEx => throw new Exception("Designed Exception")
  }

  // 7. Before the old instance is destroyed, send an ExInfo message to the next instance.
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    super.preRestart(reason, message)
    val crashTime: String = new SimpleDateFormat("yyyy-MM-dd").format(new Date)
    self ! ExInfo(crashTime)
  }
}
Experiment 5: Recreate mailbox poisoning
This unit test makes a small change to Experiment 4 to create a mailbox-poisoning scenario: when the previous Actor instance is restarted, it re-sends the ThrowEx message to the next Actor instance.
class PoisonMailboxTest extends TestKit(ActorSystem("testSystem"))
  with MustMatchers
  with WordSpecLike
  with StopSystemAfterAll {

  "A TestActor03" must {
    "struggles in loop exception with default strategy" in {
      val ref: ActorRef = system.actorOf(Props[TestActor03Supervisor])
      // 1. Create the child.
      ref ! NewActor
      // 3. Order the child to throw an exception.
      ref ! ThrowEx
      // There is no assertion here; let the main thread enter the TIMED_WAITING state
      // and observe the work of the other threads.
      Thread.sleep(10000)
    }
  }
}

class TestActor03Supervisor extends Actor with ActorLogging {
  // 6. The default policy is Restart.
  override def receive: Receive = {
    // 2. Create the child.
    case NewActor => context.actorOf(Props[TestedActor03], "child-1")
    // 4. Make the child throw an exception.
    case ThrowEx => context.child("child-1").get ! ThrowEx
  }
}

class TestedActor03 extends Actor with ActorLogging {
  // 5. Throw an exception.
  // 8. The re-sent message throws again, causing mailbox poisoning; steps 6-8 loop forever.
  override def receive: Receive = { case ThrowEx => throw new Exception("Designed Exception") }

  // 7. Pass the exception-raising message on to the next instance.
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    super.preRestart(reason, message)
    log.info("send message to next instance:{}", message)
    self ! message.get
  }
}
As mentioned earlier, the system by default applies the Restart policy an unlimited number of times, so without limiting the restart strategy the system risks getting clogged. Conversely, be careful when handling message, the message that caused the Actor to crash, to avoid the bug of looping exceptions across Actor restarts.
Experiment 6: Test BackoffSupervisor
This unit test is the complete code for the BackoffSupervisor section above. Pay attention to the following:
- TestedActor05's preRestart and postRestart are never invoked, which shows that the BackoffSupervisor restart mechanism is different from the Restart directive.
- Because withAutoReset(10.seconds) is set, the log timestamps show that both backoff restarts take about 3 s.
- Because withSupervisorStrategy is additionally set, when the Actor throws FatalException the BackoffSupervisor chooses to stop it rather than back off and restart.
class BackoffSupervisorTest extends TestKit(ActorSystem("system"))
  with WordSpecLike
  with MustMatchers
  with StopSystemAfterAll {

  "A BackoffSupervisor" must {
    "waiting for a moment after crashed." in {
      val backoffSupervisor: Props = BackoffSupervisor.props {
        Backoff.onFailure(
          childProps   = Props[TestedActor05],
          childName    = "child-actor",
          minBackoff   = 3.seconds,
          maxBackoff   = 30.seconds,
          randomFactor = 0.2
        )
        .withAutoReset(10.seconds)
        .withSupervisorStrategy(
          OneForOneStrategy() { case FatalException => Stop }
        )
      }
      // Declared as a top-level Actor just for testing purposes.
      val ref: ActorRef = system.actorOf(backoffSupervisor, "backoff-supervisor")

      // Send a message to the BackoffSupervisor's ref; it actually forwards it to the internal child Actor.
      ref ! ThrowEx
      // Wait patiently for the BackoffSupervisor to reset the backoff interval.
      Thread.sleep(15000)
      // Throw an exception again.
      ref ! ThrowEx
      // Allow for a bit of jitter, making the wait slightly longer than 3 s.
      // If the backoff interval had not been reset (4 s < 6 s), this message would be lost.
      Thread.sleep(4000)
      ref ! ThrowFatalEx
      // Give the program some time to run.
      Thread.sleep(10000)
    }
  }
}

class TestedActor05 extends Actor with ActorLogging {
  override def receive: Receive = {
    case ThrowEx      => throw new Exception("Designed Exception")
    case ThrowFatalEx => throw FatalException
  }
  override def postStop(): Unit = log.info("TestedActor shuts down.")
  override def preStart(): Unit = log.info("TestActor starts.")

  // These two hook methods will not be called.
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = log.info("pre-Starting.")
  override def postRestart(reason: Throwable): Unit = log.info("post-Starting.")
}

object FatalException extends Exception("Fatal Exception")