Catching bugs before even running the code – PART 2: Safe memory management and concurrency in Rust
ABOUT THE AUTHOR
My name is Otto, and I am a software developer who has worked in different areas of the field, such as mobile phone software development, back-end development and embedded systems development. In addition to Rust, I have mostly been programming in C and different flavors of C++ on different operating systems and hardware platforms. I joined Symbio in January 2021.
ABOUT THIS BLOG
In this blog I will go through a couple of features of the Rust programming language and explain how the features encourage writing safe code and avoid some pitfalls that cause headaches even to experienced programmers in other programming languages.
DYNAMIC MEMORY MANAGEMENT AND BORROW CHECKER
Reserving memory and releasing it when it is not needed anymore are operations that almost all computer programs do frequently. In some programs, usually in embedded systems, it is possible to allocate all the needed memory in the beginning and hold onto it for the duration of the program, but this is often not feasible when a lot of memory is needed, so the memory is allocated dynamically.
There are different approaches in programming languages to manage dynamically allocated memory. In some languages the programmer must carefully match each allocation with a deallocation. Mistakes in doing so lead to problems that are very difficult to investigate. In some languages the memory is managed by a garbage collector that sweeps through the memory occasionally. Sometimes this slows the program down. Reference counting is another approach to do the same thing.
Rust has a novel feature known as the borrow checker that helps with memory management. Every dynamically allocated value has an owner, and the memory is released when the owner goes out of scope. The Rust compiler tracks borrows of the data and rejects programs in which a reference could outlive the data it points to. This analysis happens only once, when the program is being compiled, so there is no performance penalty when running the program. There are rules for how data can be borrowed (for example, a value can have either any number of shared references or exactly one mutable reference at a time), and this is usually something that new Rust programmers spend some time getting adjusted to. When writing Rust programs, the programmer must pay special attention to which entity owns the data and whether the data is being passed forward as a reference, a mutable reference, or a clone. It is also possible to put data into reference counting wrappers or containers that allow interior mutability.
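To make the difference between a reference, a mutable reference, a clone and a move more concrete, here is a minimal sketch. The function names are made up for this illustration and do not come from any library.

```rust
// Hypothetical helper names, chosen only for this illustration.

// Shared borrow: the caller keeps ownership and can keep reading the data.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Mutable borrow: the caller keeps ownership but lends out exclusive access.
fn append_total(values: &mut Vec<i32>) {
    let sum: i32 = values.iter().sum();
    values.push(sum);
}

// Move: the function takes ownership, and the vector is freed when it returns.
fn consume(values: Vec<i32>) -> usize {
    values.len()
}

fn main() {
    let mut data = vec![1, 2, 3];
    println!("total = {}", total(&data));

    append_total(&mut data);
    println!("after append = {:?}", data);

    // A clone is an independent copy with its own owner.
    let copy = data.clone();

    let length = consume(data);
    println!("consumed {} items", length);

    // `data` was moved into `consume`; uncommenting the next line
    // makes the compiler reject the program.
    // println!("{:?}", data);

    println!("the clone is still usable: {:?}", copy);
}
```

The commented-out line is the interesting part: the mistake is caught during compilation, not while the program is running.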
It is possible to explicitly leak memory if for some reason the memory must remain allocated for the duration of the program. The functions “std::mem::forget” and “Box::leak” can be used for that purpose. Also, in some specific corner cases, like calling “std::process::exit”, there is no guarantee that memory will be deallocated or that destructors will be run for file descriptors or other system resource handles.
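As a small sketch of intentional leaking, the example below uses the two standard library functions mentioned above. The “Config” type and the “leak_config” helper are hypothetical names invented for this illustration.

```rust
// Hypothetical configuration type, used only for this illustration.
struct Config {
    name: String,
}

// Leaks the allocation on purpose. The returned reference stays valid
// for the rest of the program because the memory is never freed.
fn leak_config(name: &str) -> &'static Config {
    Box::leak(Box::new(Config {
        name: name.to_string(),
    }))
}

fn main() {
    let config: &'static Config = leak_config("demo");
    println!("config name: {}", config.name);

    // `std::mem::forget` takes ownership of a value and skips its
    // destructor, so the buffer's heap memory is never released.
    let buffer = vec![0u8; 16];
    std::mem::forget(buffer);
}
```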
Since borrow checking runs during compilation, there are no performance penalties when running the program. In comparison to garbage-collected languages such as Java and Go, where the programmer can forget about freeing the memory, the borrow checker approach offers predictable run-time performance: there will be no pauses when the unused data is being disposed of. Reference counting wrappers do add a small run-time cost, and the cost is higher for wrappers that can be shared between threads because they have to update the count atomically.
In comparison to languages like C, where the programmer must explicitly free the memory that is not used anymore, the borrow checker eliminates use-after-free errors, which happen when the data has been disposed of but is still being referenced and used in some part of the program. Another typical problem with languages without garbage collection is that all references to a piece of dynamically allocated memory are lost, but the memory itself is never freed. These are called memory leaks. With Rust it is actually very hard to write code that results in memory leaks by accident, but in some corner cases it is possible to create cycles between reference counted variables that prevent the memory from being freed.
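The reference cycle corner case can be sketched with a small parent and child structure. If both directions used “Rc”, the two counts would never reach zero. The usual remedy in the standard library is to make one direction a “Weak” reference, as below. The “Node” type and “build_family” function are hypothetical names for this illustration.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: children are owned with `Rc`, while the link
// back to the parent is `Weak` so that it does not keep the parent alive.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

// Builds a parent with one child and links them in both directions.
fn build_family() -> (Rc<Node>, Rc<Node>) {
    let parent = new_node();
    let child = new_node();
    parent.children.borrow_mut().push(Rc::clone(&child));
    *child.parent.borrow_mut() = Rc::downgrade(&parent);
    (parent, child)
}

fn main() {
    let (parent, child) = build_family();
    println!("parent strong count: {}", Rc::strong_count(&parent));
    println!("child strong count: {}", Rc::strong_count(&child));

    // Dropping the only strong reference frees the parent even though
    // the child still points back at it through the weak link.
    drop(parent);
    println!(
        "parent still alive: {}",
        child.parent.borrow().upgrade().is_some()
    );
}
```

Had the “parent” field been an “Rc” as well, dropping the parent variable would have freed nothing, which is exactly the kind of leak described above.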
SAFETY GUARANTEES IN MULTITHREAD PROGRAMS
The way the Rust compiler tracks ownership in programs also helps in designing concurrent and parallel programs. In concurrent programs, multiple separate tasks can be in progress at the same time, and the task that is being worked on currently can change suddenly. For example, if a higher priority work item comes in, the less important work item can be paused and continued after the more important task is done. In parallel programs there are also multiple ongoing tasks, but in addition, more than one task is being carried out at the same time. For example, there might be multiple operators working on items of different priority at the exact same time.
In many programming languages, shared data is usually protected with guards such as mutexes or semaphores provided by the operating system or the language core. However, if the guards are not used properly, this may lead to data races or deadlocks. In a data race, threads access the same data without synchronization and at least one of them writes to it, so the result of the computation depends on the order in which the separate threads access the data. A deadlock occurs when multiple threads are waiting on each other, like four vehicles waiting for each other in an intersection of two streets.
In Rust the borrow checker prevents data from being shared between threads in an unsafe manner. Let us consider a program that spawns one thread from the main thread: if the data is not needed anymore in the main thread, the data can be moved to the new thread, which will take ownership of the data. If both the main thread and the spawned thread need access to the data, the data must be wrapped in thread-safe containers. If a mutex is used, the guard returned by locking it takes care of releasing the mutex when exclusive access to the data is not needed anymore.
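Both cases can be sketched with the standard library alone. The first function moves a vector into a thread, and the second one shares a counter between several threads through “Arc” and “Mutex”. The function names are hypothetical and exist only for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// The vector is moved into the spawned thread, which becomes its owner.
// The main thread cannot touch `values` after the `move` closure is created.
fn sum_in_thread(values: Vec<u64>) -> u64 {
    let handle = thread::spawn(move || values.iter().sum::<u64>());
    handle.join().expect("worker thread panicked")
}

// The counter is shared: `Arc` provides shared ownership across threads
// and `Mutex` provides exclusive access to the value inside.
fn count_in_parallel(workers: usize, increments: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    let mut guard = counter.lock().expect("mutex poisoned");
                    *guard += 1;
                    // The guard goes out of scope here and releases the lock.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let total = *counter.lock().expect("mutex poisoned");
    total
}

fn main() {
    println!("sum = {}", sum_in_thread(vec![1, 2, 3]));
    println!("count = {}", count_in_parallel(4, 1000));
}
```

Removing the “Mutex” and trying to increment a plain integer from several threads does not compile, which is the safety guarantee in practice.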
It is worth noting that it is still possible to create deadlocks in Rust programs with the standard wrappers. Data races, however, are not possible in safe Rust, so you can count on shared data being in a coherent state whenever it is accessed.
Data structures that provide safety in multithreaded programs come with a performance penalty. If you are writing a single-threaded program you can use more lightweight and faster structures in Rust, like “Rc” instead of “Arc”.
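As a sketch of the single-threaded alternative, the following example shares a log between two handles with “Rc” and “RefCell”. The “shared_log” function is a hypothetical name for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Two handles to the same vector inside one thread. `Rc` counts references
// without atomic operations and `RefCell` checks borrows at run time.
fn shared_log() -> Vec<String> {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let writer = Rc::clone(&log);

    writer.borrow_mut().push(String::from("first"));
    log.borrow_mut().push(String::from("second"));

    // Moving `writer` into `std::thread::spawn` would be rejected by the
    // compiler because `Rc` is not safe to send to another thread.
    // `Arc<Mutex<...>>` would be needed for that.

    let snapshot = log.borrow().clone();
    snapshot
}

fn main() {
    println!("{:?}", shared_log());
}
```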
HOW TO TAKE RUST IN USE IN YOUR PROJECT
Often it is not feasible to completely rewrite working code from scratch. A lot of work has been done to carefully balance between different corner cases, and a rewrite without a comprehensive regression test suite can cause some old problems to resurface.
Completely new features are one possible place to start experimenting with whether Rust would be the right choice for your software project. For interoperability with existing code, you can use a C API, third-party packages for other languages, Rust compiled to WebAssembly, or communication over UNIX sockets or Internet protocols. There are a lot of options to choose from for different types of software projects, from embedded to web development.
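As a rough sketch of the C API option, a Rust function can be exported with the C calling convention so that existing C or C++ code can call it. The function name “demo_checksum” is hypothetical, and the sketch assumes a reasonably recent Rust toolchain. Older toolchains spell the attribute as plain “#[no_mangle]”.

```rust
// Hypothetical exported function. On the C side the assumed declaration is:
//   uint32_t demo_checksum(const uint8_t *data, size_t len);
#[unsafe(no_mangle)]
pub extern "C" fn demo_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller must guarantee that `data` points to `len`
    // readable bytes. The compiler cannot check this across the boundary.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes
        .iter()
        .fold(0u32, |acc, &byte| acc.wrapping_add(u32::from(byte)))
}

fn main() {
    // Calling the function from Rust only to show that it works.
    let bytes = [1u8, 2, 3];
    println!("checksum = {}", demo_checksum(bytes.as_ptr(), bytes.len()));
}
```

The unsafe block marks the one place where the guarantees depend on the caller, which keeps the boundary with the existing code easy to audit.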
If there is a known stability problem with just one part of the program, it might be worthwhile to study whether it is possible to rewrite that part of the program in Rust. Especially when writing code that deals with concurrency, it is in fact harder or sometimes outright impossible to write thread-unsafe code in Rust, as the compiler mandates that the safety guarantees must be in place before it allows the program to compile.
With regard to performance, there are a lot of benchmarks, and usually they show that Rust gets close to C and C++. Rust can even be the fastest of the group because its strict aliasing rules allow optimizations that might not be possible in C or C++. Before replacing production code in performance critical parts of the program, it is wise to do careful benchmarks between the production code and any proposed alternative.
One example of introducing Rust to an existing codebase is Google’s effort to use Rust for writing Linux kernel drivers. [Google Online Security Blog: Rust in the Linux kernel (googleblog.com)]. While there is no official support for Rust as a language to write kernel drivers, Linux kernel maintainer Linus Torvalds has stated “drivers are probably the first place for an attempt like this”. [Linus Torvalds weighs in on Rust language in the Linux kernel | Ars Technica]
This code example demonstrates the usage of reference counting wrappers in multithreaded applications. The module implements an ephemeral data storage that can be used to store values that are indexed by a path, like a URL. In the insert function there is a loop that goes deeper into the tree as it goes through the given path. The borrow checker can make it difficult to write loops like this, where a data structure is changed inside the loop. The Entry API provided by the standard library makes the job easier here. You can view the code on GitHub.
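The original module is not reproduced here. The following is an independent minimal sketch of the same idea, and all names in it (“Node”, “insert”, “get”) are assumptions made for this illustration. It shows how the Entry API lets the loop walk down the tree and create missing nodes on the way.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical tree node: an optional value plus children keyed by path segment.
#[derive(Default)]
struct Node {
    value: Option<String>,
    children: HashMap<String, Node>,
}

impl Node {
    // Walks the path segment by segment. `entry(...).or_default()` returns a
    // mutable reference to the child, creating it first if it is missing.
    fn insert(&mut self, path: &str, value: String) {
        let mut node = self;
        for segment in path.split('/').filter(|s| !s.is_empty()) {
            node = node.children.entry(segment.to_string()).or_default();
        }
        node.value = Some(value);
    }

    // Read-only lookup that stops as soon as a segment is missing.
    fn get(&self, path: &str) -> Option<&str> {
        let mut node = self;
        for segment in path.split('/').filter(|s| !s.is_empty()) {
            node = node.children.get(segment)?;
        }
        node.value.as_deref()
    }
}

fn main() {
    // The storage is shared between threads through `Arc<Mutex<...>>`.
    let store = Arc::new(Mutex::new(Node::default()));

    let writer = Arc::clone(&store);
    let handle = thread::spawn(move || {
        let mut root = writer.lock().expect("mutex poisoned");
        root.insert("/users/otto", String::from("engineer"));
    });
    handle.join().expect("writer thread panicked");

    let root = store.lock().expect("mutex poisoned");
    println!("{:?}", root.get("/users/otto"));
}
```

Without the Entry API, the loop would need a separate lookup and insert for each segment, which is where the borrow checker tends to object.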
Otto Harju, Software Engineer
01.06.2021 | Articles