Introduction
Bugs are a constant companion in our lives as programmers. Although we aspire to write perfect programs, even the best programmers cannot guarantee that they will never write bugs. We therefore write tests to find bugs ahead of time and improve the quality of the final delivery. From my work at PingCAP, I have come to feel that a good database and good database tests are inseparable. In this article, building on the transaction isolation levels covered in the first lecture, we will look at how database transactions are tested, and in particular how we guarantee transaction correctness at PingCAP.
We have many methods for ensuring transaction correctness; this time the focus will be on Jepsen and Elle, with the other methods playing a complementary role. I will also briefly explain how each method works along with its advantages and disadvantages. I have grouped transaction testing methods into the following categories:
- Validation of theoretical correctness
- Invariant-based correctness verification
- Validation that checks the execution history
- Auxiliary testing methods
Review the Percolator commit protocol
Percolator
Before we start talking about testing methodology, let's review the Percolator commit protocol to get a feel for the complexity underneath it. The Percolator commit protocol uses two-phase commit (2PC) to ensure the atomicity of transactions. However, under a shared-nothing architecture, no single node has global transaction information; transaction state is scattered across the keys involved, which makes handling key states more complicated.
Figure 1 – Two-phase commit under Percolator
Figure 1 shows the two-phase commit process under Percolator. In the Prewrite phase, data is written to the storage engine. These key-value pairs may be stored on different instances. The primary key (PK) is the first key of the transaction and acts as the atomic indicator of whether the transaction succeeded. The data written in the Prewrite phase contains both the data the transaction actually writes and the transaction's PK (x in the figure), so once the Prewrite phase completes, the transaction's data has been written successfully. However, we still need the Commit phase to make the successful transaction visible to the outside world. In the Commit phase, the transaction is marked as successful by committing the PK; the other keys are then committed so that the entire transaction can be read directly.
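To make the two phases concrete, below is a minimal Go-style sketch of the commit flow described above. All the helper functions and types (prewrite, commitKey, rollback, getTSO and so on) are illustrative stand-ins for this sketch, not TiKV's actual API.

package percolator

type Key []byte

// The helpers below stand in for storage-engine operations; their signatures
// are assumptions made for this sketch rather than TiKV's real interfaces.
func prewrite(k, primary Key, startTS uint64) error   { return nil }
func commitKey(k Key, startTS, commitTS uint64) error { return nil }
func rollback(keys []Key, startTS uint64)             {}
func getTSO() uint64                                  { return 0 }

// commitTxn sketches the Percolator-style two-phase commit described above.
func commitTxn(keys []Key, startTS uint64) error {
	primary := keys[0] // the first key acts as the atomic commit indicator

	// Phase 1: Prewrite. Every key is locked and its data written, with each
	// lock pointing back at the primary key.
	for _, k := range keys {
		if err := prewrite(k, primary, startTS); err != nil {
			rollback(keys, startTS) // any failure aborts the whole transaction
			return err
		}
	}

	// Phase 2: Commit. Committing the primary key is the single atomic point
	// at which the transaction becomes successful.
	commitTS := getTSO()
	if err := commitKey(primary, startTS, commitTS); err != nil {
		return err
	}
	// The secondary keys can be committed lazily: a reader that finds one of
	// their locks will consult the primary key to learn the outcome.
	for _, k := range keys[1:] {
		_ = commitKey(k, startTS, commitTS)
	}
	return nil
}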
Figure 2 – Read/write relationship processing in Percolator
Under this commit protocol, however, another transaction may read the transaction being committed at any point in time; Figure 2 shows four such points. At read1, the transaction has not prewritten yet, so the old value is read. At read2, the reading transaction sees the Prewrite result, but the corresponding key has not been committed yet; it queries the transaction's PK to confirm the status, finds that the PK has not been committed either, and then checks whether the lock has expired: if not, it waits for a while and retries; if the lock has expired, the transaction is rolled back. read3 is similar to read2, except that the PK has already been committed, so the reader commits the uncommitted key it encountered and reads the transaction's content. At read4, the keys are already committed and the data is read directly.
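Continuing the sketch above, the read path can be outlined as follows. Again, the helpers (getLock, checkPrimary, getCommitted, rollbackKey) and the lock and status types are assumptions for illustration, not TiKV's real code.

// (continuing the same file; add `import "errors"` at the top)

type Value []byte

type Lock struct {
	primary Key
	startTS uint64
}

func (l *Lock) expired() bool { return false } // TTL check, stubbed out

type txnStatus int

const (
	committed txnStatus = iota
	rolledBack
	undecided
)

func getLock(k Key) *Lock                                          { return nil }
func checkPrimary(primary Key, startTS uint64) (txnStatus, uint64) { return committed, 0 }
func getCommitted(k Key, readTS uint64) Value                      { return nil }
func rollbackKey(k Key, startTS uint64)                            {}

var errRetryLater = errors.New("lock not expired, retry later")

// readKey sketches the four cases of Figure 2 for a read at readTS.
func readKey(k Key, readTS uint64) (Value, error) {
	lock := getLock(k)
	if lock == nil || lock.startTS > readTS {
		// read1 / read4: no relevant lock; return the newest committed
		// version visible at readTS.
		return getCommitted(k, readTS), nil
	}
	// read2 / read3: a lock from an in-flight transaction is visible,
	// so consult its primary key to learn that transaction's fate.
	status, commitTS := checkPrimary(lock.primary, lock.startTS)
	switch status {
	case committed:
		// read3: the primary is committed; roll this key forward, then read.
		_ = commitKey(k, lock.startTS, commitTS)
		return getCommitted(k, readTS), nil
	case rolledBack:
		rollbackKey(k, lock.startTS) // clean up the orphaned lock, then read
		return getCommitted(k, readTS), nil
	default:
		// read2: the primary is still undecided.
		if lock.expired() {
			rollbackKey(k, lock.startTS) // the lock's TTL ran out: roll it back
			return getCommitted(k, readTS), nil
		}
		return nil, errRetryLater // wait a while and try again
	}
}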
A bug that breaks atomicity
Atomicity is one of the important guarantees of transaction correctness. In a transfer transaction, for example, if only half of the transaction succeeds, one account may be debited while the other never receives the money.
Figure 3 – Atomicity broken bug
Figure 3 shows a bug that broke atomicity when three transactions ran concurrently. In Txn1, the first statement (stmt1) tries to lock the three keys x through z, but the statement fails, so the locks it has already placed on x and y need to be revoked asynchronously. The second statement (stmt2) then re-selects a primary key; assume the new primary key is a. stmt2 also locks y, which blocks the asynchronous cleanup from proceeding. When Txn3 locks y, it reads the lock left by stmt1 and, while resolving it, finds that its PK has already been rolled back. Because Txn2 incorrectly reads the resolve cache generated by Txn3, the lock that stmt2 placed on y is mistakenly rolled back, and the atomicity of the transaction is broken.
The root cause is that we failed to consider that a failed statement can lead a transaction to pick a new primary key. The original Percolator commit protocol has no primary-key replacement logic at all; we added this optimization so that a transaction does not have to restart when locking fails, and it made the implemented transaction model more complex.
Validation of theoretical correctness
We use TLA+ to verify theoretical correctness. TLA+ is a modeling language designed for parallel and distributed systems; it can exhaustively explore all possible situations to confirm that the design is correct in theory.
Figure 4 – Formalizing the validation process
Formal verification of a model with TLA+ requires defining the initial state, then defining the Next action and the THEOREM used to verify correctness. Next describes a step that may occur. In a serial system, steps happen one after another, but the order of steps in a parallel system is unpredictable. Figure 4 shows a run of the formal verification: the Next action is executed once, and the defined THEOREM is then used to check whether the resulting state is problematic. Because Next is any step that may happen, repeated applications of Next lead to many possible execution paths, and TLA+ searches all of them to ensure that no ordering in the parallel system violates our defined constraints.
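TLA+ has its own language, but the search it performs is essentially the exhaustive exploration sketched below in Go: start from the initial state, repeatedly apply every possible Next step, and check the invariant at each reachable state. The State, next, and invariant definitions here are a toy example for illustration, not our actual TLA+ specification.

package main

import "fmt"

// State is whatever the model tracks; here just one integer mutated by the
// "processes", purely for illustration.
type State struct{ X int }

// next enumerates every state reachable from s in one step -- the analogue of
// TLA+'s Next relation. Here a step may add 1 or 2 to X.
func next(s State) []State {
	return []State{{s.X + 1}, {s.X + 2}}
}

// invariant is the analogue of the THEOREM / invariant we ask the checker to verify.
func invariant(s State) bool { return s.X <= 10 }

// check explores every state reachable from init within the given depth and
// reports the first state that violates the invariant. The number of states
// grows quickly with depth, which is exactly the limitation noted below.
func check(init State, depth int) {
	if !invariant(init) {
		fmt.Println("invariant violated at", init)
		return
	}
	frontier := []State{init}
	seen := map[State]bool{init: true}
	for d := 0; d < depth; d++ {
		var nextFrontier []State
		for _, s := range frontier {
			for _, t := range next(s) {
				if seen[t] {
					continue
				}
				if !invariant(t) {
					fmt.Println("invariant violated at", t)
					return
				}
				seen[t] = true
				nextFrontier = append(nextFrontier, t)
			}
		}
		frontier = nextFrontier
	}
	fmt.Println("no violation within depth", depth)
}

func main() { check(State{X: 0}, 8) }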
While TLA+ can verify correctness in theory, the approach has limitations:
- Complexity grows exponentially with the number of processes; if the search space is too large, it may take a very long time to complete the search at the desired depth.
- Being correct in theory does not prevent errors in the implementation.
Linearizability and Snapshot Isolation
First we need to understand that linearizability and transaction isolation are independent concepts.
Figure 5 – Non-linearizable
Linearizability originated as a concept for multiple processors operating on memory in parallel. It was later introduced into databases, where it imposes two requirements on single-element transactions:
- Transactions on a single element can be serialized;
- On a single element, if Txn2 starts later than the time at which Txn1 commits, then Txn2 must come after Txn1 in the serialized order.
Figure 5 is an example of an execution that is serializable but not linearizable, with serial order Txn2 -> Txn1. Although Txn2 starts after Txn1 commits, Txn2 comes before Txn1 in the serializable order. Spanner proposed the concept of external consistency and considered it stronger than linearizability, because its constraint on transaction order extends to multi-element operations. External consistency can also be understood as a definition of linearizability for transactions; their constraints have roughly the same effect. Once we consider isolation and consistency together, serializability alone is not the ideal level. In Figure 5, for example, Txn1 is a transaction that spends money; after it commits, another transaction still reads the balance from before the purchase, which is clearly unacceptable in many scenarios. In Jepsen's consistency model, a level is defined on top of serializability: when a database is both serializable and linearizable, it is called Strict Serializable.
Figure 6 – Linearization and external consistency
As shown in Figure 6, under a linearizable execution, Txn3 rewrites x from 1 to 2, and its commit spans from TS1 to TS2. For the client, the state is certain at any time earlier than TS1 or later than TS2; between TS1 and TS2, since it is not certain when Txn3 actually takes effect, the value of x is in an indeterminate state, and reading either 1 or 2 is allowed.
Snapshot Isolation (SI) is a widely adopted isolation level, and TiDB is an SI-level database, so before we talk about transaction testing it is important to understand how the SI isolation level is defined in terms of dependencies between transactions. Note that SI is an isolation level whose implementations came before its formal definition, so our goal here is to avoid vague language such as "SI means a transaction reads from a snapshot" and instead give relatively objective criteria.
Figure 7 – Partial order dependence
In order to define SI, we need to introduce a new transaction dependency, called the transaction start dependency, which reflects the partial order between one transaction's commit time and another transaction's start time. This partial order often arises together with other dependencies and is transitive. As shown in Figure 7, Txn2 reads Txn1's write, which indicates that Txn2's start time is later than the time at which Txn1's commit took effect. Note that the partial order here refers to points in time inside the database; it can only be inferred from the order observed externally when combined with the linearizability level the database provides. That is, unlike in Figure 7, if Txn2 does not read Txn1's write, then even if the client connected to the database is certain that Txn2 started after Txn1 fully committed, c1 ≺t s2 cannot be concluded, because the database may not provide linearizability.
Adya calls SI's isolation level PL-SI. On top of PL-2 it adds constraints on these internal points in time, namely the transaction start dependencies and some read/write dependencies.
Figure 8 – G-SIA anomaly
SI means a transaction takes a snapshot and reads, from that snapshot, all writes committed at or before the snapshot's point in time. Therefore, for two transactions with the start dependency c1 ≺t s2, Txn2 needs to read Txn1's modifications, and Txn1 must not read Txn2's writes (s1 ≺t c1 ≺t s2 ≺t c2 can be deduced from the transitivity of the partial order). However, Txn1 in Figure 8 reads what Txn2 writes, breaking the semantics of SI. From another perspective, transaction start dependencies, like WR and WW dependencies, reflect an ordering between transactions; when these relationships form a cycle, the order can no longer be determined. The G-SIA anomaly (Interference) is a cycle formed from WR, WW, and S (start) dependencies.
Figure 9 – G-SIB anomaly
Figure 9 shows the G-SIB (Missed Effects) anomaly, in which WR, WW, RW, and S dependencies form a cycle that contains exactly one RW dependency. The phenomenon can be understood as one transaction failing to read all of another transaction's writes. In the figure, Txn2 writes two values, but Txn1 reads only one of them; the other read returns the old version, producing the RW dependency. Only one RW dependency is allowed in the cycle because a single RW dependency is enough to detect this problem, while two RW dependencies would lead to misjudgments such as flagging Write Skew.
PL-SI must prevent G-SIA and G-SIB on top of PL-2. This is slightly different from PL-2.99, the standard Repeatable Read, so take care not to confuse them.
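Summarizing the two phenomena as they are used above (a paraphrase of Adya's formulation; ww/wr/rw are the write-write, write-read, and read-write dependencies, and S is the start dependency):

\begin{align*}
\text{G-SIA (Interference)}:&\ \text{a dependency cycle made only of } ww,\ wr \text{ and } S \text{ edges};\\
\text{G-SIB (Missed Effects)}:&\ \text{a cycle with exactly one } rw \text{ edge, all other edges in } \{ww,\ wr,\ S\};\\
\text{PL-SI}:&\ \text{PL-2} \ \wedge\ \neg\,\text{G-SIA} \ \wedge\ \neg\,\text{G-SIB}.
\end{align*}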
Jepsen
When it comes to transaction testing, Jepsen has to be mentioned. Jepsen is an important part of TiDB's quality assurance: besides being run for every release, it also runs continuously in daily testing.
Figure 10 – The idea of constraint checking
What is Jepsen, and why is it effective and efficient? Figure 10 shows the idea. If we want to verify whether a solution to the system of equations composed of ① and ② is correct, we can either carefully check the solving process or substitute the result back into the original equations and check that the given conditions hold. I believe most people, when checking the solution of an equation, choose the latter, because it detects errors easily and effectively. Jepsen's idea is the same: design some "equations", let the database produce the "solution", and finally check through some constraints whether that solution is correct.
Bank
Figure 10 – Jepsen Bank
Jepsen Bank is a classic test that simulates a very simple scenario; Figure 10 shows how this case works. A table holds many users and their balances, and many transactions concurrently perform transfer operations. The check condition of Bank is also very simple: the total balance across all accounts must remain unchanged. Under the SI isolation level, every snapshot must satisfy this constraint; if a snapshot does not, a G-SIB (Missed Effects) anomaly may have occurred (you can work out why). While running, Jepsen periodically starts a transaction, queries the total balance, and checks whether the constraint has been broken.
Bank is a test scenario close to real business, with logic that is easy to understand. However, because it is built on concurrency, it can cause a large number of transaction conflicts during actual execution. Bank does not care how the database handles these conflicts or whether transactions fail; most errors will eventually show up in the balances.
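For a concrete picture, here is a minimal sketch of the Bank workload's two operations (a transfer and the total-balance check) using Go's database/sql. The table and column names are assumptions for illustration; the real Jepsen test is written in Clojure.

package banktest

import (
	"database/sql"
	"fmt"
)

// transfer moves amount from one account to another inside one transaction.
// Concurrent transfers may conflict and fail, which is fine; what must never
// happen is a change in the total balance.
func transfer(db *sql.DB, from, to, amount int) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if Commit succeeds

	var balance int
	if err := tx.QueryRow("SELECT balance FROM accounts WHERE id = ?", from).Scan(&balance); err != nil {
		return err
	}
	if balance < amount {
		return nil // not enough money; giving up is a valid outcome
	}
	if _, err := tx.Exec("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, from); err != nil {
		return err
	}
	if _, err := tx.Exec("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, to); err != nil {
		return err
	}
	return tx.Commit()
}

// checkTotal is the invariant check: the sum over any snapshot must equal the
// initial total, otherwise a Missed Effects (G-SIB) style anomaly occurred.
func checkTotal(db *sql.DB, expected int) error {
	var total int
	if err := db.QueryRow("SELECT SUM(balance) FROM accounts").Scan(&total); err != nil {
		return err
	}
	if total != expected {
		return fmt.Errorf("invariant broken: total balance %d, expected %d", total, expected)
	}
	return nil
}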
Long Fork
Figure 11 – Jepsen Long Fork
Long Fork is a transaction test scenario designed for the SI isolation level. There are two types of transactions: write transactions, each assigning a value to one register, and read transactions, which query multiple registers. We then analyze whether the read transactions observed a violation of PL-SI. In Figure 11, Txn1 and Txn2 are write transactions and Txn3 and Txn4 are read transactions. The presence of G-SIA in Figure 11 breaks PL-SI, but we need to make some assumptions in order to find the cycle.
Figure 12 – Jepsen Long Fork
Figure 12 analyzes the example in Figure 11. From the WR dependencies we can determine c2 ≺t s3 and c1 ≺t s4. However, since we do not know the start order of Txn3 and Txn4, we need to consider the possibilities. If s3 ≺t s4, then as shown on the left of the figure, c2 ≺t s4 can be deduced from the transitivity of the partial order, and we can find a cycle made of S and RW dependencies. If s4 ≺t s3, as shown on the right of the figure, a G-SIB anomaly is likewise detected. If s3 and s4 are equal, Txn3 and Txn4 should read the same content, but what they actually read is contradictory; the concrete steps for finding the cycle are left to the reader.
Summary
Jepsen provides methods for finding anomalies through constraint checking and designs a series of test scenarios with good coverage. Its advantages are:
- By focusing on constraints, correctness verification is simplified.
- Test efficiency is high.
Figure 13 – A missed Lost Update
However, Jepsen also has its shortcomings. Figure 13 shows a Lost Update anomaly under the Bank workload: the transfer in T2 is lost, but the anomaly cannot be detected from the result, because the total balance did not change.
History Check
Detection through BFS
One type of test is the history check, which is designed to find as many anomalies as possible by digging as much information as possible out of the execution history. The paper "On the Complexity of Checking Transactional Consistency" examines transactional consistency and its checking complexity from the standpoint of execution histories.
Figure 14 – Serializable detection
Serializability detection follows its literal meaning: we just need to find an execution order in which all the transactions can be executed serially, so it is natural to use a breadth-first search (BFS) to check whether such a serial order exists. The problem is the complexity: assuming Sequential Consistency, this method is O(deep^(N+3)), where deep is the number of transactions per thread and N is the number of threads.
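As a rough illustration of that brute-force idea (not the paper's exact algorithm), the Go sketch below searches the interleavings of the per-thread transaction sequences for a serial order whose reads all match the history. The Op and Txn types are assumptions made for this sketch.

package checker

// Op is one read or write inside a transaction, as recorded in the history.
type Op struct {
	IsWrite bool
	Key     string
	Val     int // value written, or the value the history says was read
}

// Txn is one transaction; each thread's history keeps its transactions in
// program order (the Sequential Consistency assumption mentioned above).
type Txn struct{ Ops []Op }

// applies reports whether txn's reads are consistent with db and, if so,
// returns the database state after txn's writes.
func applies(db map[string]int, txn Txn) (map[string]int, bool) {
	next := make(map[string]int, len(db))
	for k, v := range db {
		next[k] = v
	}
	for _, op := range txn.Ops {
		if op.IsWrite {
			next[op.Key] = op.Val
		} else if next[op.Key] != op.Val {
			return nil, false // this serial position contradicts the recorded read
		}
	}
	return next, true
}

// serializable searches the interleavings of the per-thread transaction
// sequences for one serial order in which every read matches the history.
// The number of interleavings explored grows exponentially with the number
// of threads and transactions, which is why the cost explodes.
// Usage: serializable(threads, make([]int, len(threads)), map[string]int{})
func serializable(threads [][]Txn, pos []int, db map[string]int) bool {
	done := true
	for i := range threads {
		if pos[i] < len(threads[i]) {
			done = false
			if next, ok := applies(db, threads[i][pos[i]]); ok {
				pos[i]++
				if serializable(threads, pos, next) {
					return true
				}
				pos[i]-- // backtrack and try the next thread
			}
		}
	}
	return done
}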
Figure 15 – SI detection by guard variable conversion
Detecting serializability is simple, but it cannot be used directly to check whether a history conforms to the SI isolation level. As shown in Figure 15, by adding guard variables the paper reduces the question of whether SI's read/write relationships are reasonable to the serializability check described above. In the Lost Update case, both transactions modify x, writing the first and second versions respectively, so two guard variables are inserted that chain the two changes to x across the two transactions and forbid any other change to x from being inserted between them. As long as the execution history in (b) of the figure satisfies serializability, the execution history in (a) satisfies SI.
Elle
To address the difficulty of understanding and the complexity, Jepsen's author Kyle Kingsbury published a transaction consistency checking method called Elle.
Elle verifies transaction consistency and isolation by detecting anomalies: given the required level, it tries as hard as possible to find anomalies in the execution history rather than trying to work out a serial execution order. This gives it an obvious advantage in efficiency; the price is that the checking process itself is more complicated.
Figure 16 – Simple G1c test
Figure 16 is an example of detecting G1c. According to the theory discussed in lecture 1, we do not need to look for an execution order that matches the literal definition of serializability; we only need to detect whether something that is not allowed has occurred. Although "the absence of certain anomalies" does not intuitively read as "serializable", research on isolation level definitions shows that checking for these anomalies is a sound way to determine serializability. Moreover, finding anomalies is exactly the outcome we want from transaction testing.
Figure 17 – Model designed by Elle
As shown in Figure 17, Elle designs four models: register, counter, set, and list. The methods for checking the execution history are almost the same for all of them; List Append will be used as the example in what follows.
An Elle transaction has two stages: generation and execution. In the generation stage, Elle randomly generates the reads and writes the transaction needs to perform; these pre-generated operations get their results in the execution stage. That is, a transaction has two records in the history: :invoke in the generation phase, and :ok / :fail / :info in the execution phase.
- :invoke, the transaction has been generated and is about to be executed.
- :ok, the transaction executed and the commit is confirmed to have succeeded.
- :fail, the transaction is confirmed not to have been committed.
- :info, the transaction status is uncertain (for example, a connection error occurred while committing).
To allow linear consistency to be analyzed, Elle considers not only the read/write dependencies between transactions defined by Adya, but also time-related dependencies. Provided the corresponding consistency level holds, a time dependency also reflects the order between transactions.
- Process depend: the order in which transactions execute within one thread; it can be used in systems that support Sequential Consistency.
- Realtime depend: the order in which transactions execute across all threads strictly follows real time; it can be used in systems that support Linearizability.
Figure 18 – Time-dependent loop detection
Figure 18 shows an example that satisfies Sequential Consistency but not Linearizability. The client executes transactions in two threads and knows the strict order of the transactions across them. Note that in a distributed database, these threads may be connected to different database nodes. If the system only guarantees Sequential Consistency, the corresponding dependency graph is the one at the bottom left, with no cycle. But if the system claims Linearizability, the dependency graph becomes the one at the bottom right: Txn3 and Txn4 form a cycle; in other words, Txn4 occurs before Txn3 yet reads Txn3's write. This anomaly cannot be seen from Adya's read/write dependencies alone.
Let's walk through several examples of how Elle works and how it detects transaction anomalies in a database.
{:type :invoke, :f :txn, :value [[:a :x [1 2]] [:a :y [1]]], :process 0, :time 10, :index 1}
{:type :invoke, :f :txn, :value [[:r :x nil] [:a :y [2]]], :process 1, :time 20, :index 2}
{:type :ok, :f :txn, :value [[:a :x [1 2]] [:a :y [1]]], :process 0, :time 30, :index 3}
{:type :ok, :f :txn, :value [[:r :x []] [:a :y [2]]], :process 1, :time 40, :index 4}
{:type :invoke, :f :txn, :value [[:r :y nil]], :process 0, :time 50, :index 5}
{:type :ok, :f :txn, :value [[:r :y [1 2]]], :process 0, :time 60, :index 6}
Example 1 – An execution history containing G-SIB
Figure 19 – G-SIB dependency diagram
Example 1 is a history containing a G-SIB anomaly, represented here in Clojure's EDN format. From the :process attribute we can infer that the first and third lines are Txn1, the second and fourth lines are Txn2, and the fifth and sixth lines are Txn3. From the :time attributes of Txn1's and Txn2's :invoke and :ok records, they probably ran in parallel. Txn3 reads [:r :y [1 2]]; because a list is arranged in append order, we can conclude that Txn1's append to y happened before Txn2's, which gives a WW dependency from Txn1 to Txn2. However, Txn2 read the x from before Txn1's write (it read the empty list), which produces an RW dependency back to Txn1, and a cycle appears. In this example, Elle uses the properties of the list type to recover a WW dependency that would otherwise be hard to determine.
{:type :invoke, :f :txn, :value [[:a :x [1 2]] [:a :y [1]]], :process 0, :time 10, :index 1}
{:type :invoke, :f :txn, :value [[:a :x [3]] [:a :y [2]]], :process 1, :time 20, :index 2}
{:type :ok, :f :txn, :value [[:a :x [1 2]] [:a :y [1]]], :process 0, :time 30, :index 3}
{:type :ok, :f :txn, :value [[:a :x [3]] [:a :y [2]]], :process 1, :time 40, :index 4}
{:type :invoke, :f :txn, :value [[:r :x nil]], :process 2, :time 50, :index 5}
{:type :ok, :f :txn, :value [[:r :x [1 2 3]]], :process 2, :time 60, :index 6}
{:type :invoke, :f :txn, :value [[:r :x nil]], :process 3, :time 70, :index 7}
{:type :ok, :f :txn, :value [[:r :x [1 2]]], :process 3, :time 80, :index 8}
Example 2 – An execution history that may break Linearizability
Figure 20 – Dependency diagram that may break Linearizability
Example 2 is a history that may break Linearizability. In the dependency graph at the top of Figure 20, a cycle appears composed of RW, WR, and Realtime dependencies, that is, a G-SIB phenomenon; but since the cycle contains a Realtime dependency, the system may instead have created it by breaking Linearizability. The lower diagram shows the Linearizability violation more clearly. In cases like this, Elle reports all the possible anomaly types, because it cannot determine which one actually occurred.
Figure 21 – Example from Elle paper
Figure 21 is an example from Elle's paper, where the first operation is Txn1, the second is Txn2, and the third is Txn3. There is a Realtime relationship between Txn1 and Txn2; Txn2 reads the list under key 255 without seeing Txn3's write, giving an RW dependency, while Txn1 reads Txn3's write. Something similar to Figure 20 happens, and we will not expand on the analysis here.
MIKADZUKI
Elle showed how powerful dependency graphs can be in testing, and inside PingCAP we tried another way of testing databases with dependency graphs. Recall that Elle analyzes the execution history after execution, converts it into a dependency graph, and determines whether it satisfies a given isolation or consistency level. MIKADZUKI does the opposite: starting from a dependency graph, it generates the execution history that should result. By comparing the expected execution with the database's actual behavior, we can tell whether the database behaves correctly.
Figure 22 – MIKADZUKI dependency diagram hierarchy
Figure 22 shows the dependency graph hierarchy inside MIKADZUKI, where transactions within a Process execute serially and transactions across Processes execute in parallel. Both Depend and Realtime edges represent the order in which transactions execute; therefore, when transactions are generated, these two kinds of dependencies are never allowed to form a cycle.
Figure 23 – MIKADZUKI execution process
Figure 23 shows MIKADZUKI's execution flow; each round has four stages (a simplified sketch follows the list):
- Generate a Graph without cycles;
- Fill the write requests in the Graph with randomly generated data expressed in KV form, where the Key is the primary key or a unique index and the Value is the whole row;
- From the write requests, infer the results that each read request should return, according to the dependencies between transactions;
- Execute the transactions in parallel according to the dependencies described in the graph; if a read returns a result different from the prediction in step three, something has gone wrong.
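Here is a much-simplified sketch of steps two to four, just to show the shape of the idea. The types and helper names are illustrative, and in particular this simplification assumes the dependency edges fix, for every read, which write precedes it; the real tool handles concurrent cases with more care.

package mikadzuki

// TxnNode is one transaction in the dependency graph.
type TxnNode struct {
	ID       int
	Writes   map[string]string // key -> row value this transaction writes
	Reads    map[string]string // key -> value we predict this transaction will read
	DependOn []*TxnNode        // transactions that must commit before this one
}

// predictReads walks a topological order of the graph and, for every key a
// transaction reads, records the value written by the latest preceding
// transaction (step three above).
func predictReads(order []*TxnNode) {
	latest := map[string]string{} // key -> last value committed so far
	for _, t := range order {
		for k := range t.Reads {
			t.Reads[k] = latest[k]
		}
		for k, v := range t.Writes {
			latest[k] = v
		}
	}
}

// verify runs each transaction (execute returns what its reads actually saw)
// and compares the results with the prediction; any mismatch means the
// database did not behave as the graph requires (step four above).
func verify(order []*TxnNode, execute func(*TxnNode) map[string]string) bool {
	for _, t := range order {
		got := execute(t)
		for k, want := range t.Reads {
			if got[k] != want {
				return false
			}
		}
	}
	return true
}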
This test method helped us find some problems. In the later stage of the experiment, we also tried adding dependencies that deliberately create cycles. A cycle created by WW dependencies leads to deadlocks under normal execution, and problems in deadlock detection were not easy to find with the previous test methods, because a missed deadlock detection does not surface as any anomaly.
Summary
By checking the execution history, we can find as many anomalies as possible, and thanks to academic research on isolation levels and consistency, the complexity of checking the history is greatly reduced. Elle additionally designs models that provide clues for analyzing the relationships between transactions, making it feasible and efficient to examine the complete history.
Fault injection
Murphy's law states that whatever can go wrong will go wrong: even the smallest probability of error will occur one day. On the other hand, the scale and traffic of a test environment are much smaller than those of the production environment. Under normal circumstances, most errors would therefore first appear in production, and so would the rare system bugs they trigger, which is obviously not what we want.
To keep bugs inside the test environment, we use a number of fault-simulation methods, including:
- Failpoint, which injects errors into the process.
- Chaos Test, which simulates external failures and is closer to real situations.
Failpoint
The Failpoint test injects problems into the process. It can be enabled or disabled at compile time; in normal releases, failpoints are disabled. TiDB enables failpoints by rewriting the marked code, while TiKV uses macros and compile-time environment variables. With Failpoint we can efficiently simulate rare but possible situations:
- exercising code paths that are hard to reach but may be important guarantees of correctness;
- killing the program at any point;
- hanging code execution at any point.
// disable
failpoint.Inject("getMinCommitTSFromTSO", nil)
// enable
failpoint.Eval(_curpkg_("getMinCommitTSFromTSO"))
Example 3 – Enabling Failpoint
Example 3 is a simple example of enabling a failpoint. In the disabled state, the Inject function does nothing; when failpoints are enabled, the Inject call becomes an Eval call. HTTP requests can then be used to control the failpoint's behavior, including:
- adding an artificial sleep;
- making the goroutine panic;
- suspending execution of the goroutine;
- breaking into the debugger (GDB).
// disable
failpoint.Inject("beforeSchemaCheck", func() {
	c.ttlManager.close()
	failpoint.Return()
})

// enable
if _, _err_ := failpoint.Eval(_curpkg_("beforeSchemaCheck")); _err_ == nil {
	c.ttlManager.close()
	return
}
Example 4 – Inject variables using Failpoint
In Example 4, TiDB does more with the injection: inside it the TTL manager is shut down, which makes the pessimistic lock expire quickly and interrupts the transaction commit. Failpoint can also be used to modify variables in the current scope. Without Failpoint these failures would be very rare; with it we can quickly test whether anomalies such as broken consistency occur when the failures do happen.
Figure 24 – Injection of Failpoint submitting Secondary Keys
Figure 24 shows how, during the Commit phase of the two-phase commit, injection is used to delay or skip committing the Secondary Keys, a situation rarely seen under normal circumstances.
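As an illustration of what such an injection can look like with the failpoint library: the failpoint name, the surrounding function, and its arguments below are hypothetical, not TiDB's actual code.

// (reusing the Key and commitKey stand-ins from the Percolator sketch above;
// requires `import "time"` and `import "github.com/pingcap/failpoint"`)
func commitSecondaries(keys []Key, startTS, commitTS uint64) {
	failpoint.Inject("beforeCommitSecondaries", func(val failpoint.Value) {
		if val.(string) == "skip" {
			failpoint.Return() // rewritten into a real return: the secondaries are never committed
		}
		time.Sleep(3 * time.Second) // otherwise just commit them late
	})
	for _, k := range keys {
		_ = commitKey(k, startTS, commitTS)
	}
}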
Chaos Test
In a distributed system, it is hard to expect developers to always write correct code, and in fact most of the time we do not get things completely right. If Failpoint is fine-grained control over what a piece of code might do, a drill, then Chaos Test is indiscriminate destruction of the system: the real battlefield.
Figure 25 – Chaos Test concept diagram
Figure 25 is a concept diagram of Chaos Test. The biggest difference from Failpoint is that a failpoint is still placed where the developer assumes an exception may occur, and it is impossible to design failpoints exhaustively; Chaos Test adds an extra layer of insurance against this blind spot.
During the development of Chaos Mesh, we also made many attempts, such as:
- killing nodes
- power-off tests on physical machines
- network latency and packet loss
- machine time drift
- I/O performance limitation
These chaos tests proved their worth at PingCAP before the official release of Chaos Mesh, and we now share these capabilities with the community through Chaos Mesh.
Conclusion
Before I wrote this article, I thought a lot about what I could share about transaction testing, and for a while I felt I had nothing to say. However, when I tried to explain some of the testing methods, I realized that testing is a profound and easily overlooked body of knowledge. We have spent a great deal of thought on designing and running tests during database development, and this article is only the tip of the iceberg of the transaction testing system. All tests exist for better product quality, and transactions, as one of the core features of a database, deserve even more attention. Through the fire and the flames, we carry on.