Main points of this article:
- Erlang offers lightweight processes, immutability, location-transparent distribution, message passing, supervision behaviours, and many other advanced dynamic features that make it ideal for fault-tolerant, highly available, and scalable systems.
- Unfortunately, Erlang is not well suited to low-level tasks such as XML parsing, because dealing with anything outside the Erlang virtual machine is cumbersome.
- In such cases, consider using a different language. Rust, in particular, has recently come to the forefront with a hybrid feature set that makes promises similar to Erlang's in many respects, with additional benefits in low-level performance and safety.
- Because Rust and Erlang take radically different approaches to many key aspects of language design, including memory management, mutability, and sharing, they differ considerably at the level of Erlang's BEAM virtual machine. BEAM provides the basic support for fault tolerance, scalability, and other fundamental Erlang features that are not available in Rust.
- Therefore, while Rust cannot be seen as a replacement for Erlang, it is possible to mix the two languages in the same project and take advantage of the strengths of each.
During my two-year stint as a telecom network simulator programmer, I took advantage of Erlang’s concurrency, fault tolerance, and distributed computing features for many CPU-intensive applications.
Erlang is a high-level, dynamic, functional language that provides lightweight processes, immutability, location-transparent distribution, message passing, supervision behaviours, and more. Unfortunately, it is not well suited to low-level work, which is clearly not its primary intent. A typical example is XML parsing, which Erlang is not good at: XML chunks have to be read from the command line or the network, and dealing with anything outside the Erlang virtual machine is tedious. You probably know the problem. In such cases, consider using a different language. Rust, in particular, has recently come to the forefront with a hybrid feature set that makes promises similar to Erlang's in many respects, with additional benefits in low-level performance and safety.
Rust compiles to a binary and runs directly on the hardware, just like your C/C++ programs. How is it different from C/C++? In many ways. Its motto says it all: "Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety."
This article compares Erlang and Rust, emphasizing their similarities and differences, which may interest both Erlang developers studying Rust and Rust developers studying Erlang. The final section weighs the capabilities and drawbacks of each language.
Immutability
Erlang: Variables in Erlang are immutable; once bound, they cannot be changed or rebound to a different value.
Rust: Variables in Rust are also immutable by default, but can easily be made mutable by adding the mut keyword. Rust also introduces the concepts of ownership and borrowing to manage memory allocation effectively. For example, string literals are stored in the executable and a String is moved when assigned to another variable, whereas integers (i32, i64, u32, ...), floats (f32, f64), and other primitive data types are stored directly on the stack.
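To make this concrete, here is a minimal, illustrative sketch (not taken from any particular codebase) of default immutability, opt-in mutability with mut, and the difference between moving a heap-allocated String and copying a stack-allocated integer:

```rust
fn main() {
    let x = 5;
    // x = 6;                      // error: `x` is immutable by default
    let mut y = x;
    y += 1;                        // fine: `y` was declared with `mut`

    let s1 = String::from("hello"); // heap-allocated, owned by s1
    let s2 = s1;                    // ownership moves to s2; s1 is no longer valid
    // println!("{}", s1);          // error: use of moved value
    println!("{} {}", y, s2);

    let a: i32 = 42;                // primitive types live on the stack
    let b = a;                      // and are copied, so both stay usable
    println!("{} {}", a, b);
}
```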
Pattern matching
Erlang: Much of the simplicity of Erlang code comes from its pattern-matching capability. Case expressions and "=" (the match operator) can be used anywhere, including in function heads, matching on the function name, the number of arguments, and the arguments themselves.
Rust: In a let binding, the = symbol performs both binding and pattern matching. In addition, Rust's match is similar to the case expression in Erlang and the switch statement in most other languages: it tries to match a value against multiple patterns and branches to the one that matches. Function/method overloading is not built into Rust, but similar effects can be achieved with traits. Irrefutable patterns match anything and always succeed; for example, in let x = 5, x is always bound to the value 5. Conversely, refutable patterns may fail to match in some cases; for example, if let Some(x) = somevalue matches only when somevalue holds a value other than None. Irrefutable patterns can be used directly in a let binding, while refutable patterns are used in if let, while let, or match constructs.
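A small illustrative example of match and of refutable versus irrefutable patterns (the variable names are made up for the example):

```rust
fn main() {
    // Irrefutable pattern: always matches.
    let x = 5;

    // `match` must cover every case, similar to Erlang's `case`.
    let description = match x {
        0 => "zero",
        n if n > 0 => "positive",
        _ => "negative",
    };
    println!("{} is {}", x, description);

    // Refutable pattern: the body only runs when `somevalue` is `Some(..)`.
    let somevalue: Option<i32> = Some(7);
    if let Some(v) = somevalue {
        println!("got {}", v);
    }
}
```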
Loops
Erlang: Loops are expressed using recursion or list comprehensions in Erlang.
Rust: Rust offers the looping constructs familiar from imperative languages, such as for, while, and the basic loop. In addition, there are iterators.
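For instance, the three looping constructs can be sketched as follows:

```rust
fn main() {
    for i in 0..3 {
        println!("for: {}", i);
    }

    let mut n = 0;
    while n < 3 {
        n += 1;
    }

    // `loop` runs until an explicit `break`.
    let mut count = 0;
    loop {
        count += 1;
        if count == 3 {
            break;
        }
    }
    println!("while ended at {}, loop ended at {}", n, count);
}
```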
Closures and anonymous functions
Erlang: Erlang has anonymous functions, declared by enclosing a code block between the fun and end keywords. Anonymous functions close over the current context and can be passed across processes on the same node or on other connected nodes. Anonymous functions add great value to Erlang's distribution mechanism.
Rust: Rust also supports closures in the form of anonymous functions. These can "capture" their environment and be executed elsewhere (in a different method or thread context). Closures can be stored in variables, passed as function parameters, and sent across threads.
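A minimal sketch of a closure that captures its environment and is moved into another thread, using only the standard library's std::thread:

```rust
use std::thread;

fn main() {
    let greeting = String::from("hello from the parent thread");

    // The closure captures `greeting`; `move` transfers ownership
    // so the value can safely outlive the parent stack frame.
    let handle = thread::spawn(move || {
        println!("{}", greeting);
    });

    handle.join().unwrap();

    // Closures can also be stored in variables and passed as parameters.
    let add = |a: i32, b: i32| a + b;
    println!("{}", add(2, 3));
}
```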
Lists and tuples
Erlang: Lists are dynamic, singly linked lists that can store any Erlang data type as an element. Elements in a list cannot be retrieved by index and must be traversed from the head (unlike arrays in Rust). Tuples are fixed in size and cannot be changed at run time. They can be pattern matched.
Rust: As counterparts to lists in Erlang, Rust has vectors and arrays. Arrays are fixed-size and can be used when the number of elements is known at compile time. Vectors are growable, heap-allocated sequences used when the size changes dynamically, and they come in plain and double-ended variants: a plain vector (Vec) grows at one end, whereas a double-ended vector (VecDeque) can grow at both ends. Rust also has tuples, which cannot be changed at run time. If a function needs to return multiple values, you can use a tuple. Tuples can also be pattern matched.
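The following illustrative snippet shows an array, a Vec, a VecDeque, and a tuple used to return multiple values; the helper min_max is a made-up example:

```rust
use std::collections::VecDeque;

fn min_max(values: &[i32]) -> (i32, i32) {
    // A tuple is a convenient way to return multiple values.
    let min = *values.iter().min().unwrap();
    let max = *values.iter().max().unwrap();
    (min, max)
}

fn main() {
    let arr = [1, 2, 3];            // fixed size, known at compile time
    let mut vec = vec![4, 5];       // grows dynamically
    vec.push(6);

    let mut deque: VecDeque<i32> = VecDeque::new();
    deque.push_back(1);             // double-ended: grows at both ends
    deque.push_front(0);

    let (lo, hi) = min_max(&vec);   // tuples can be pattern matched
    println!("{:?} {:?} {:?} {} {}", arr, vec, deque, lo, hi);
}
```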
Iterators
Erlang: Iteration in Erlang centres on lists. The lists module provides a variety of iteration functions, such as map, filter, zip, dropwhile, and so on. In addition, Erlang supports list comprehensions, which take generators over lists and apply an operation to each element that satisfies the given predicates. The result is another list.
Rust: Iterators can be used with vectors, double-ended vectors, and arrays. Iterators in Rust are lazy by default: the source is not consumed unless there is a consumer, such as collect, at the end. Iterators provide a more natural way to work with any list-like data type than traditional loop constructs, because they can never go out of bounds.
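A small example of lazy iterator adapters with collect acting as the consumer:

```rust
fn main() {
    let numbers = vec![1, 2, 3, 4, 5];

    // Nothing is computed yet: iterator adapters are lazy.
    let doubled_evens = numbers.into_iter().filter(|n| n % 2 == 0).map(|n| n * 2);

    // `collect` is the consumer that actually drives the iteration.
    let result: Vec<i32> = doubled_evens.collect();
    println!("{:?}", result); // [4, 8]
}
```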
Records and maps
Erlang: Records are fixed-size structures defined at compile time, whereas maps are dynamic and their structure can be declared or modified at run time. Maps are similar to hash maps in other languages and are used as key-value stores.
Rust: Rust supports structs declared at compile time. Structs cannot be modified at run time; for example, members cannot be added or removed. Because Rust is a low-level language, structs can store references, and references require lifetime parameters to prevent dangling references. Rust has a standard collections library that supports many other data structures, such as maps, sets, sequences, and so on. All of these data structures can also be iterated lazily.
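As an illustration, a struct holding a reference needs a lifetime parameter, and HashMap from the standard collections can serve as a key-value store (the Excerpt type is invented for this example):

```rust
use std::collections::HashMap;

// A struct that stores a reference needs a lifetime parameter,
// which lets the compiler rule out dangling references.
struct Excerpt<'a> {
    text: &'a str,
}

fn main() {
    let sentence = String::from("structs are defined at compile time");
    let excerpt = Excerpt { text: &sentence[..7] };
    println!("{}", excerpt.text);

    // HashMap from the standard collections library as a key-value store.
    let mut scores: HashMap<String, i32> = HashMap::new();
    scores.insert(String::from("alice"), 10);
    scores.insert(String::from("bob"), 20);
    for (name, score) in &scores {
        println!("{}: {}", name, score);
    }
}
```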
String, Binary, and Bitstring
Erlang: A string in Erlang is simply a list of the ASCII values of its characters, stored in a singly linked list. Consequently, it is always cheaper to prepend characters to a string than to append them at its end. Binaries are special in Erlang: they are contiguous arrays of bytes (8-bit sequences). A bitstring is a generalization of a binary that stores sequences of bits of arbitrary sizes, such as a 1-bit sequence followed by a 4-bit sequence; the total length need not be a multiple of 8. Strings, binaries, and bitstrings support convenient high-level syntax that makes pattern matching easy, so if you are doing network programming, packing and unpacking network protocol packets is simple.
Rust: In Rust, there are two kinds of strings. String literals are stored neither on the heap nor on the stack, but directly in the executable, and they are immutable. Strings (String) can have a dynamic size; in that case they are stored on the heap, with a reference kept on the stack. Strings known at compile time are stored as literals, while strings unknown until run time live on the heap. This is an efficient way to decide the memory allocation strategy at compile time and apply it at run time.
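A brief sketch contrasting a string literal (&str) with a heap-allocated String:

```rust
fn main() {
    // A string literal: immutable, baked into the executable,
    // referenced through a `&str` slice.
    let literal: &str = "known at compile time";

    // A `String`: dynamically sized, stored on the heap,
    // with its pointer, length, and capacity kept on the stack.
    let mut dynamic = String::from("built at ");
    dynamic.push_str("run time");

    println!("{} / {}", literal, dynamic);
}
```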
Lifetimes
Erlang: Variables are bound only within the scope of a function and are released by the garbage collector specific to the current process. Thus, each variable lives as long as the function that uses it, which means programs should be modularized into functions as much as possible to use memory efficiently. In addition, you can trigger garbage collection explicitly by calling erlang:garbage_collect() when needed.
Rust: Rust has no garbage collection; Rust uses lifetimes to manage memory. Each variable in a scope (delimited by curly braces or the body of a function) is given a new lifetime unless it is borrowed or referenced from a parent scope. The lifetime of a borrowed variable does not end at the end of the scope that borrows it; it ends only at the end of the owning parent scope. Thus, the lifetime of each variable is managed either by the current scope or by the parent scope, and the compiler ensures this. During compilation, Rust injects code to drop the value associated with a variable when its lifetime ends. This approach avoids using a garbage collector to determine which variables can be released, and it gives fine-grained control over memory within functions. Unlike Erlang, where reclamation is left to the per-process garbage collector, in Rust you can divide your code into multiple scopes using {}, and the compiler places the drop code at the end of each scope.
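For example, values are dropped at the end of the scope that owns them; the braces below introduce an inner scope purely for illustration:

```rust
fn main() {
    let outer = String::from("lives until the end of main");

    {
        // `inner` is owned by this block; its lifetime ends at the
        // closing brace, where the compiler inserts the drop code.
        let inner = String::from("lives until the end of this block");
        println!("{}", inner);
    } // `inner` is dropped here, without a garbage collector

    println!("{}", outer);
} // `outer` is dropped here
```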
Variable binding, ownership, and borrowing
Erlang: Erlang has a simple binding rule: if a variable was previously unbound, its occurrence is bound to the value on the right; otherwise it is pattern matched. Any Erlang term can be bound to a variable. Variables are bound only in the context of the function in which they appear and are released by the per-process garbage collector when no longer in use. Ownership of data cannot be transferred to a different variable: if another variable in the same function context wants the same data, the data must be copied. This is in line with Erlang's share-nothing philosophy, and it allows copied values to be sent safely to different nodes or processes without data races. In Erlang there are no references and therefore no borrowing; all data is allocated on the heap.
Rust: Ownership and borrowing are two powerful concepts in Rust that set the language apart from the mainstream. They are why Rust stands out as a low-level language that provides memory safety without a garbage collector, thereby keeping runtime overhead minimal. Ownership of data belongs to exactly one variable, meaning no other variable can share ownership of that data. If necessary, ownership is transferred to a different variable on assignment, and the old variable is no longer valid. If a variable is passed to a function as an argument, ownership is transferred as well. This operation is called a move, because ownership of the data moves. Ownership helps manage memory efficiently.
The ownership rules: each value has exactly one owner at any point in time, and when the owner goes out of scope, the value is dropped.
Borrowing occurs when a function or another variable temporarily borrows a value, either mutably or immutably, from the variable that owns it. Once the borrow goes out of the scope of the function or the {}-delimited block, the owner regains full use of the value. While the borrow is active, the parent function or scope cannot move or freely use the variable until the borrowing function or scope ends.
The borrowing rules: for a given value there can be any number of immutable references, but only one mutable reference within a scope. In addition, mutable and immutable references cannot coexist in the same scope.
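The following illustrative functions (length, shout, and consume are made-up names) show an immutable borrow, a mutable borrow, and a move:

```rust
fn length(s: &String) -> usize {
    // Immutable borrow: reads the value without taking ownership.
    s.len()
}

fn shout(s: &mut String) {
    // Mutable borrow: the single writer allowed at a time.
    s.push('!');
}

fn consume(s: String) {
    // Ownership moves into this function; the value is dropped here.
    println!("consumed: {}", s);
}

fn main() {
    let mut msg = String::from("hello");
    println!("{}", length(&msg)); // borrow ends, msg is still usable
    shout(&mut msg);              // exclusive mutable borrow
    consume(msg);                 // ownership moves; msg is no longer valid
    // println!("{}", msg);       // error: value moved
}
```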
Reference counting
Reference counting is used to track the use of variables by other processes/threads. The reference count is increased when a new process/thread holds the variable and decreased when a process/thread exits. When the count reaches 0, the value is deleted.
Erlang: When data is passed between processes in Erlang, it travels as a message. This means it is copied into the other process's heap rather than referenced. Data copied within a process is reclaimed by the per-process garbage collector at the end of its life. However, binaries larger than 64 bytes are stored in a shared area and reference-counted when passed between Erlang processes.
Rust: When data is shared between threads, it is not copied; for efficiency it is instead wrapped in a reference counter. If the shared data must also be mutated, access is additionally guarded by a mutex for synchronization; references to immutable data do not need a mutex. All the relevant checks are done at compile time, which helps prevent data races in Rust.
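A minimal sketch of shared, mutable state across threads using the standard library's Arc and Mutex:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc = atomic reference counter, Mutex = mutually exclusive access.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..4 {
        let counter = Arc::clone(&counter); // bump the reference count
        handles.push(thread::spawn(move || {
            let mut value = counter.lock().unwrap();
            *value += 1;
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // The value is freed when the last reference goes out of scope.
    println!("count = {}", *counter.lock().unwrap());
}
```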
Message passing
Erlang: Message passing in Erlang is asynchronous. Suppose a process sends a message to another process: if the lock on the receiving process's mailbox is immediately available, the message is copied straight into that mailbox; otherwise it is copied into a heap fragment that the receiving process picks up later. This enables truly asynchronous, race-free behaviour, albeit at the cost of copying the message into the other process's heap.
Rust: Rust has channels, which work like water flowing between two points: if you put something into the stream, it flows to the other end. Each time a channel is created, a transmitter and a receiver handle are created with it. The transmitter is used to put messages onto the channel and the receiver reads them. Once the transmitter places a value on a channel, ownership of the value is transferred to the channel, and when another thread reads the value from the channel, ownership is transferred to that thread. With channels, the ownership principle still holds: only one owner per value. The value is dropped when its final owner goes out of scope.
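A small example with the standard library's mpsc channel, in which ownership of the message moves from the sending thread to the receiving one:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // `channel` returns a transmitter and a receiver handle.
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let message = String::from("hello over the channel");
        // Ownership of `message` moves into the channel here.
        tx.send(message).unwrap();
    });

    // Ownership is transferred to the receiving thread.
    let received = rx.recv().unwrap();
    println!("{}", received);
}
```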
Shared mutation
Erlang: Sharing is a sin in Erlang, but Erlang allows controlled mutation through Erlang Term Storage (ETS). ETS tables can be shared across multiple processes and are synchronized internally to prevent races. ETS can be tuned for high read concurrency or high write concurrency. A table is attached to its owning process, and when that process exits, the table is deleted.
Rust: As a low-level language, Rust provides a way to share and mutate resources. With reference counting plus a mutex, access to a resource shared and mutated by multiple threads is synchronized. When all the threads sharing the resource have exited, the resource is dropped by the last one to release it. This provides a clean and efficient way to share, mutate, and clean up resources.
Behaviours
Erlang: A behaviour is the formalization of a common pattern. The idea is to split the code of a process into a generic part (the behaviour module) and a specific part (the callback module). You just need to implement a few callbacks and call specific APIs to use the behaviour. There are several standard behaviours, such as gen_server, gen_fsm/gen_statem, supervisor, and so on. For example, if you want a standalone process that runs continuously like a server, handling asynchronous and synchronous calls or messages, you implement the gen_server callbacks. You can also define custom behaviours.
Rust: If you have a set of methods that are common across multiple data types, they can be declared as a trait. Traits are Rust's take on interfaces and are extensible. Traits remove the need for traditional method overloading and provide a simple pattern for operator overloading.
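As an illustration, a trait can play a role loosely comparable to an Erlang behaviour: the trait declares the callbacks, and each type supplies its own implementation (Shape, Circle, and Square are invented for the example):

```rust
trait Shape {
    fn area(&self) -> f64;
    fn describe(&self) -> String {
        // Default method, like a behaviour's optional callback.
        format!("shape with area {:.2}", self.area())
    }
}

struct Circle { radius: f64 }
struct Square { side: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
}

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { radius: 1.0 }),
        Box::new(Square { side: 2.0 }),
    ];
    for s in &shapes {
        println!("{}", s.describe());
    }
}
```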
Memory allocation
Erlang: Erlang is dynamically but strongly typed. Types are not declared, and implicit run-time type conversions are kept to a minimum, which prevents type errors. When the program runs, variables are allocated dynamically on the heap managed by the underlying OS thread and released by garbage collection.
Rust: Rust is a statically and strongly typed language with type inference. Static means that the Rust compiler checks types at compile time to prevent type errors at run time. Many types are inferred during compilation: for example, when a value originally declared as a String is assigned to another variable, there is no need to declare the new variable's type explicitly; the compiler infers it. The compiler also determines which variables can be allocated on the stack and which must go on the heap, so Rust's memory allocation is efficient and fast. Unlike Erlang, Rust largely uses the stack for all data types whose sizes are known at compile time, while dynamically sized data types (String, vectors, and so on) are allocated on the heap at run time.
Scalability, fault tolerance, and distribution
Erlang's BEAM virtual machine is what makes Erlang unique. BEAM is built in a way that ensures scalability, fault tolerance, distribution, and concurrency.
How does Erlang scale? Unlike native operating system threads, BEAM supports lightweight processes known as green threads, which are multiplexed onto a small number of native OS threads; literally a million or more Erlang processes can be run on top of a single native thread. This is made possible by allocating large chunks of memory to each native thread and sharing them across many Erlang processes. Each Erlang process gets a block in which to store all its variables, and since that block can be as small as 233 words, the heap of a native OS thread can easily accommodate a million processes. Furthermore, communication between processes is rarely a bottleneck, thanks to Erlang's built-in asynchronous message passing. A process is never blocked in order to send a message to another process: it either acquires the lock on the other process's mailbox and puts the message directly into it, or it puts the message into a separate heap fragment and attaches that fragment to the other process's heap. The Erlang virtual machine also has built-in distribution, so processes can run and interact transparently across machines.
How does Erlang's concurrency work? When you use native operating system threads, they are scheduled by the operating system scheduler; if you are running Linux, for example, scheduling efficiency drops as the number of threads grows. Erlang's BEAM instead multiplexes and manages many green threads on top of each native OS thread. By default, each process is allotted 2000 reductions (every operation in Erlang has a cost in reductions, where one reduction is roughly equivalent to a function call) and is allowed to run until its reductions are exhausted, after which it is preempted. On preemption, the next Erlang process in the run queue is scheduled. This is how every Erlang process gets its turn to run.
How does BEAM manage memory? As mentioned, the heap of each native OS thread is shared among many Erlang processes. Whenever an Erlang process needs more memory, it looks for available memory in the native thread's heap and takes it if available; otherwise, depending on the data type requested, a dedicated memory allocator obtains a chunk of memory from the OS using malloc or mmap. BEAM uses a memory block efficiently across multiple processes by dividing it into carriers (containers of memory blocks managed by the allocators) and giving each Erlang process the right carrier. Based on current demand, such as reading a large number of XML chunks from a network socket, BEAM dynamically calculates how much memory to allocate, how many carriers to spread it across, how many carriers to retain after GC cycles free them, and so on. Freed memory blocks are coalesced almost immediately, making the next allocation faster.
How does Erlang's garbage collection work? Erlang uses a per-process garbage collector with a generational collection algorithm. Combined with Erlang's built-in share-nothing design, collecting garbage in one process does not disturb other processes in any way. Each process has a young heap and an old heap; the young heap is collected more frequently, and data that survives two consecutive young collections is moved to the old heap. The old heap is collected only when it reaches a certain size.
How does Erlang's fault tolerance work? Erlang treats failure as inevitable and tries to prepare for it. Any well-formed Erlang application follows a supervision hierarchy, in which every Erlang process is supervised by a supervisor. Supervisors are responsible for restarting the worker processes under their control based on the type of failure, and restart strategies can be configured per supervisor, such as one_for_one (if a worker exits, only that worker is restarted) or one_for_all (if one worker exits, all workers are restarted). BEAM provides links to propagate exit signals between processes, as well as monitors to observe exit signals, both within the same BEAM VM and, transparently and location-independently, across distributed BEAM VMs. Erlang's BEAM can also load code dynamically on one or all nodes at a time: BEAM takes care of loading the code changes into memory and applying them, while the programmer is responsible for the order in which modules are loaded, for state management, and for the extra effort needed to avoid leaving any process in an undefined state.
In contrast to Erlang, Rust does most of its work at compile time and very little at run time. Since most systems programming languages lack memory safety at run time, Rust does its best to guarantee at compile time that the code will run without memory problems. BEAM ensures memory safety at run time, but that safety comes with runtime overhead, which is exactly what Rust avoids by choosing compile time.
A core goal of Rust is to keep the language itself as lean as possible. One example: Rust once had built-in lightweight green threads (similar to Erlang processes). At some point this feature was deliberately removed, because it was not seen as a requirement common to every application and it carried a runtime cost; instead, the capability can be pulled in through a crate when needed. Erlang can also import external libraries, but its core features (such as green threads) are embedded in the VM and cannot be switched off or swapped for native threads. However, the green-thread efficiency of the Erlang VM is very high, as decades of use have proven, and turning it off is not a common requirement for people who choose Erlang.
How does Rust scale? Scaling limits usually depend on the communication and distribution mechanisms available. As far as communication is concerned, it is debatable whether Erlang's model of message passing, per-process garbage collection, and ETS is more efficient than Rust's model of channels, single ownership, and shared mutation.
In Erlang, every message is copied into the receiving process's heap, and the garbage collector has cleanup work to do on both the sending and receiving sides. Rust's standard channel, on the other hand, is multiple-producer, single-consumer: when a message is sent to the consumer, it is not copied; its ownership is transferred to the consumer, and the compiler inserts cleanup code at the end of the consumer's scope to reclaim the value. By cloning the value for each channel, you can send the same message to multiple consumers. In some cases, Rust's ownership model combined with predictable memory cleanup may do better than Erlang's garbage collection.
Another important aspect of communication is shared mutation. In principle, Erlang's ETS is comparable to shared mutation in Rust using mutexes combined with reference counting. However, while Rust offers very fine-grained units of mutation, as small as a single Rust variable, the unit of mutation in Erlang's ETS is the ETS table. Another significant difference is Rust's lack of a built-in distribution mechanism.
How does concurrency work in Rust? Rust threads are native threads by default; the operating system schedules them with its own mechanisms, so scheduling is a property of the OS, not of the language. Having native OS threads can significantly improve performance for networking, file IO, cryptography, and other OS-backed libraries. Alternatively, you can use a green-thread or coroutine library with its own scheduler; there are plenty of options, although, unfortunately, no stable crate yet. Rayon is a data-parallelism library that implements a work-stealing algorithm to balance load across native threads.
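As a rough sketch of data parallelism with Rayon (assuming the rayon crate is added as a dependency; the computation itself is arbitrary):

```rust
// Cargo.toml (assumed): rayon = "1"
use rayon::prelude::*;

fn main() {
    // Rayon spreads the range across a pool of native OS threads,
    // using a work-stealing scheduler to balance the load.
    let sum_of_squares: u64 = (1..10_001u64).into_par_iter().map(|n| n * n).sum();
    println!("{}", sum_of_squares);
}
```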
How does Rust manage memory? As discussed, it performs extensive static analysis using ownership and lifetimes to determine which variables can be allocated on the stack and which on the heap. One thing Rust does well here is to put as much data as possible on the stack rather than on the heap, which greatly improves memory read/write speed.
How is garbage collection done? As mentioned above, Rust determines the lifetime of every variable at compile time, and most Rust values tend to live on the stack, which is easier to manage. In Erlang, the garbage collector must periodically scan the heap for unused data and release it. This is harder still in languages that allow shared references, such as Java, where the duration of a garbage collection pause is difficult to predict: Java is less predictable than Erlang, and Rust is more predictable than Erlang.
How does fault tolerance work? Rust itself has no built-in mechanism for detecting run-time failures and recovering from them. Rust provides basic error handling through the Result and Option types, but it cannot guarantee that every unexpected case is always handled unless a run-time error-management framework is built on top. Erlang has the upper hand here, providing "five nines" of uptime through consistent use of its supervision framework and hot code loading. Rust would have to work much harder to achieve that.
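A small example of Rust's Result and Option error handling, where the caller, not a supervisor, decides what to do with a failure (parse_port is a made-up helper):

```rust
use std::num::ParseIntError;

fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    // The `?` operator propagates the error to the caller.
    let port: u16 = input.trim().parse()?;
    Ok(port)
}

fn main() {
    match parse_port("8080") {
        Ok(port) => println!("listening on {}", port),
        Err(err) => eprintln!("invalid port: {}", err),
    }

    // Option models a value that may be absent.
    let maybe_port: Option<u16> = None;
    println!("{}", maybe_port.unwrap_or(80));
}
```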
Conclusion
Erlang and Rust are each strong in their own domain. Erlang has been around for a long time and has proven to be a robust, industry-ready ecosystem for scalability, concurrency, distribution, and fault tolerance. Rust has its own well-defined strengths: high-level language features that run at a low level with native performance, safe programming, and general-purpose facilities such as concurrency support and error handling.
In my opinion, if a very complex use case requires all of the features above, an interesting option is to combine Rust with Erlang, either as a shared library or as a natively implemented function (NIF). All the data processing, IO operations, and operating-system calls can be offloaded to Rust and the results handed back to the Erlang virtual machine. The goal is to make things easier.
Is Rust a replacement for Erlang? My answer is no. Erlang's BEAM has proven itself highly scalable, concurrent, distributed, and fault tolerant over decades. Erlang tackles these problems in BEAM itself, factoring out the common concerns so that programmers do not have to worry about them and can focus on the problem at hand. With Rust, by contrast, there are many options available through community-created crates, but as a programmer I have to combine them in the right way. Rust's other challenge is its steep learning curve, which is a much bigger leap for those just starting out or coming from dynamic languages. In short, the two languages target different audiences and solve different problems, and it is probably best to mash up what each does best.
About the author
Krishna Kumar Thokala currently works as an application developer at ThoughtWorks. Previously, he worked as a developer on an Erlang telecom network simulator and, as an architect, built a configuration management system using YANG modeling over NETCONF. Besides building software systems, robotics/electronics and industrial automation are his areas of interest. You can follow him on Medium, LinkedIn, and Twitter.