Abstract: Builders of today’s demanding distributed systems face challenges that traditional object-oriented programming (OOP) models cannot fully solve, but that the Actor model can address.
Why do modern systems need a new programming model?
The Actor model was proposed by Carl Hewitt several decades ago as a way to handle parallel processing in high-performance networks, at a time when such networks were not yet available. Today, hardware and infrastructure capabilities have caught up with and surpassed Hewitt’s vision. As a result, builders of demanding distributed systems encounter challenges that traditional object-oriented programming (OOP) models cannot fully address, but that the Actor model can. The Actor model is not only recognized as an effective solution; it has been proven by some of the most demanding applications in the world. To highlight the problems the Actor model solves, this article discusses the following mismatches between traditional programming assumptions and modern multi-threaded, multi-CPU architectures:
- The challenge of encapsulation
- The illusion of shared memory in modern computer architectures
- The illusion of a call stack
The challenge of encapsulation
A core pillar of OOP is encapsulation. Encapsulation dictates that the internal state of an object cannot be accessed directly from the outside; it can only be modified through a set of curated methods. Objects are responsible for exposing safe operations that protect the invariants of the data they encapsulate. For example, operations on an ordered binary tree must never violate the ordering of the tree. Callers expect the ordering to be intact, and they need to be able to rely on this invariant when querying the tree for a particular piece of data. When analyzing OOP runtime behavior, we sometimes draw a message sequence diagram to show the interactions of method calls. For example:
Unfortunately, the diagram above does not accurately represent the lifelines of the objects during execution. In reality, a single thread performs all of these calls, and the invariants of all the objects are enforced on the same thread from which the method was called. Updating the diagram with the thread of execution, it looks like this:
The significance of this becomes apparent when we try to model what happens with multiple threads. Suddenly, our neat diagram is no longer adequate. We can try to illustrate multiple threads accessing the same object:
There is now a section of execution in which two threads enter the same method. Unfortunately, the object’s encapsulation model guarantees nothing about what happens in that section. Without some form of coordination between the two threads, the instructions of the two invocations can interleave in arbitrary ways that leave the invariants unprotected. Now, imagine this problem compounded by the existence of many threads.
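A minimal sketch of this hazard in plain Java (the class and numbers are illustrative, not from the original article): two threads increment a shared counter without coordination, and updates get lost because `value++` is a three-step read-modify-write.

```java
// Illustrative only: a counter whose encapsulation cannot protect its
// invariant ("value equals the number of increment() calls") under
// concurrent access.
public class UnsafeCounter {
    private int value = 0;

    // value++ compiles to read, add, write: three steps that two
    // threads can interleave, losing updates.
    public void increment() { value++; }

    public int get() { return value; }

    public static void main(String[] args) throws InterruptedException {
        UnsafeCounter counter = new UnsafeCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000, but typically prints less: updates were lost.
        System.out.println(counter.get());
    }
}
```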
A common solution to this problem is to add a lock around these methods. While a lock guarantees that at most one thread will execute the method at any given time, this is a very costly strategy:
- Locks severely limit concurrency and are costly on modern CPU architectures, because they require the operating system to suspend the thread and restore it later.
- The caller thread is now blocked and cannot do any other meaningful work. Even in desktop applications this is unacceptable: we want the application’s user interface (UI) to stay responsive even while a long background job is running. On the backend, blocking is outright wasteful. One might think this can be remedied by starting a new thread, but threads are also a costly abstraction.
- Locks introduce a new menace: deadlocks (a minimal sketch follows this list).
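As an illustrative sketch in plain Java (not from the original article): two threads acquire the same pair of locks in opposite order, and each ends up holding one lock while waiting forever for the other.

```java
// Illustrative only: two threads take locks in opposite order, so each
// holds one lock and blocks forever waiting for the other.
public class DeadlockSketch {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (lockA) {
                sleep(100); // widen the window so the bad interleaving is likely
                synchronized (lockB) { System.out.println("t1 done"); }
            }
        }).start();
        new Thread(() -> {
            synchronized (lockB) {
                sleep(100);
                synchronized (lockA) { System.out.println("t2 done"); }
            }
        }).start();
        // Neither "done" line is ever printed: the program hangs.
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```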
These facts lead to a no-win situation:
- Without sufficient locks, the state gets corrupted
- With many locks in place, performance suffers and deadlocks easily result
In addition, locks only really work locally. When coordination across machines is involved, the only alternative is distributed locking. Unfortunately, distributed locks are several orders of magnitude less efficient than local locks, and they usually impose a hard limit on scaling out. Distributed locking protocols require multiple communication round-trips between machines over the network, so latency skyrockets.
In object-oriented languages, we rarely think about threads or linear execution paths in general. We often envision a system as a network of object instances that react to method calls, modify their internal state, and then communicate with each other via method calls, driving the whole application state forward:
However, in a multi-threaded distributed environment, what actually happens is that threads "weave" through this network of object instances by way of method calls. As a result, threads are the real drivers of execution:
Summary
- An object can only guarantee encapsulation (protection of its invariants) in the face of single-threaded access; multi-threaded execution almost always leads to corrupted internal object state. Every invariant can be violated by two contending threads in the same code segment.
- While locks may seem like the natural remedy for upholding encapsulation with multiple threads, in practice they are inefficient and easily lead to deadlocks in any application of real-world scale.
- Locks work locally; attempts to make them distributed exist, but they offer limited potential for scaling out.
The illusion of shared memory in modern computer architectures
The programming model of the ’80s and ’90s held that writing to a variable meant writing directly to a memory location (which somewhat muddles the fact that local variables might exist only in registers). On modern architectures, simplifying a bit, CPUs write to cache lines rather than directly to memory. Most of these caches are local to the CPU core; that is, writes by one core are not visible to other cores. For a local change to become visible to another core, the cache line needs to be shipped to that core’s cache.
On the JVM, we must explicitly denote memory locations that are shared across threads by marking them volatile or using Atomic wrappers. Otherwise, we can safely access them only inside a locked section. So why don’t we just mark all variables volatile? Because shipping cache lines across cores is a very costly operation! Doing so would implicitly stall the cores involved from doing additional work, and would create a bottleneck on the cache coherence protocol (the protocol CPUs use to transfer cache lines between main memory and other CPUs). The result is execution that is orders of magnitude slower.
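As an illustrative sketch with plain JDK threads (not from the original article): without the `volatile` marker on the flag below, the reader thread may spin forever, because nothing forces the writer’s update to become visible to it.

```java
// Illustrative only: without the volatile keyword on "stop", the JVM and
// the hardware are free to keep serving the reader's stale cached copy,
// and the loop below may never terminate.
public class VisibilitySketch {
    // volatile forces writes to become visible to other threads, at the
    // cost of cache-line traffic between cores.
    private static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) { /* spin until the write becomes visible */ }
            System.out.println("observed stop = true");
        });
        reader.start();
        Thread.sleep(100);
        stop = true; // without volatile, this write might never be observed
        reader.join();
    }
}
```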
Even for developers who are aware of all this, working out which memory locations should be volatile or which atomic structures to use is a dark art.
Summary
- There is no real shared memory anymore; CPU cores explicitly pass chunks of data (cache lines) to each other, just like computers on a network do. Inter-CPU communication and network communication have more in common than many realize. Passing messages is now the norm, whether across CPUs or across networked computers.
- Instead of hiding message passing behind variables marked as shared or behind atomic data structures, a more disciplined and principled approach is to keep state local to a concurrent entity and to propagate data or events between concurrent entities explicitly via messages (a minimal sketch follows this list).
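A hedged sketch of this principle using only JDK primitives (the class and method names are illustrative): a single worker thread owns the state, and other threads influence it only by sending messages through a queue, so no locks, volatile markers, or atomics are needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: a concurrent entity that keeps its state local and
// accepts changes only via explicit messages. Only the worker thread ever
// reads or writes "total".
public class MessageDrivenCounter {
    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
    private long total = 0; // local state, touched by one thread only

    public void send(int amount) { mailbox.add(amount); }

    public void runLoop() throws InterruptedException {
        while (true) { // minimal sketch: runs until the JVM is killed
            int amount = mailbox.take(); // explicit message transfer
            total += amount;
            System.out.println("total = " + total);
        }
    }

    public static void main(String[] args) {
        MessageDrivenCounter counter = new MessageDrivenCounter();
        new Thread(() -> {
            try { counter.runLoop(); } catch (InterruptedException ignored) { }
        }).start();
        counter.send(1);
        counter.send(41);
    }
}
```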
The illusion of a call stack
Today, we tend to take call stacks for granted. However, the call stack was invented in an era when concurrent programming was far less important, because multi-CPU systems were not yet commonplace. The call stack does not cross threads, and therefore cannot model asynchronous call chains.
The problem arises when a thread intends to delegate a task to the "background". In practice, this means delegating to another thread. This cannot be a simple method/function call, because calls are strictly local to a thread. What usually happens is that the caller puts an object into a memory location shared with a worker thread (the "callee"), which in turn picks it up in some event loop. This allows the caller thread to move on and perform other tasks. The first question is: how can the caller thread be notified when the task completes? A more serious problem arises when a task fails with an exception. Where does the exception propagate? It will propagate to the exception handler of the worker thread, completely ignoring who the actual caller was:
This is a serious problem. How can the worker thread handle this situation? It usually cannot, because it typically does not know the purpose of the failed task. The caller thread needs to be notified somehow, but there is no call stack along which an exception could travel back. Failure notification can only happen through a side channel, for example by placing an error code where the caller thread would otherwise expect the result. If this notification is not in place, the caller is never notified of the failure and the task is lost! This is strikingly similar to how networked systems work, where messages and requests can get lost or fail without any notice.
This bad situation gets worse when something goes really wrong and a worker thread encounters a bug it cannot recover from. For example, an internal exception caused by a bug bubbles up to the root of the worker thread and makes the thread shut down. This immediately raises the question: who should restart the normal operation of the service the thread was providing, and how can it be restored to a known-good state? At first glance this might seem manageable, but then we are confronted with a new, unexpected phenomenon: the task the thread was working on is no longer in the shared memory location (usually a queue) it was taken from. In fact, once the exception reaches the top, the call stack unwinds and the task state is completely lost! We have lost a message, even though this is local communication with no networking involved (where message losses are to be expected).
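A minimal JDK sketch of the lost-failure problem (names illustrative, not from the original article): a task handed to a worker pool throws, but because nobody inspects the returned `Future`, the exception silently disappears inside the worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: the caller delegates a task and moves on. The task's
// exception does not travel up any caller-side call stack; it is captured
// inside the Future, and since nobody calls get(), it is simply lost.
public class LostFailureSketch {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        pool.submit((Runnable) () -> {
            throw new IllegalStateException("task failed");
        });
        // No stack trace is printed: the caller was never notified.

        System.out.println("caller keeps running, unaware of the failure");
        pool.shutdown();
    }
}
```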
Summary
- To achieve any meaningful concurrency and performance in today’s systems, threads must delegate tasks to one another in an efficient, non-blocking way. With this style of task-delegating concurrency (and even more so in networked/distributed computing), call stack-based error handling breaks down, and new, explicit error-signaling mechanisms need to be introduced. Failures become part of the domain model.
- Concurrent systems with task delegation need to handle service faults and have principled means of recovering from them. Clients of such services need to be aware that tasks/messages may get lost during restarts. Even when no loss occurs, a response may be delayed arbitrarily because of previously queued tasks (a very long queue), garbage-collection pauses, and so on. In the face of these, concurrent systems should handle response deadlines in the form of timeouts, just like networked/distributed systems do (a minimal sketch follows this list).
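One hedged illustration with plain JDK futures (names are illustrative): the caller bounds how long it will wait for a delegated task, just as a networked client would bound a request.

```java
import java.util.concurrent.*;

// Illustrative only: the caller imposes a deadline on a delegated task,
// treating a slow response exactly like a networked client would.
public class DeadlineSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> response = pool.submit(() -> {
            Thread.sleep(5_000); // a slow task, e.g. stuck behind a long queue
            return "result";
        });
        try {
            System.out.println(response.get(1, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            System.out.println("deadline exceeded: treat as failure");
        } catch (ExecutionException e) {
            System.out.println("task failed: " + e.getCause());
        }
        pool.shutdownNow();
    }
}
```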
This article is shared from the Huawei Cloud community post “[Akka series] Why do modern systems need a new programming model?”, original author: Lizi.