Programming with State: Concurrency
- Jaakko KangasharjuService Technologist
Threads are a common abstraction for concurrency. Since each thread has access to the same program memory, the problems of mutable state get exacerbated. Not only can state now change between function calls, it can be changed by another thread while a function is executing. This problem is typically handled by using locks to prevent multiple threads from operating on the same piece of state at the same time, but locks can be difficult to get right.
Looking at the value-entity model from the first post of this series, there are additional problems with multithreading. I mentioned invariants as a useful tool to keep the state of an entity manageable. But since an invariant can be temporarily violated while an entity is modifying itself, a concurrent execution of a state query might observe the inconsistent state. And this leads back to all the problems with mutable state that we were trying to avoid by adopting the value-entity model.
Locks internal to an entity can of course be used to ensure that other threads do not see temporary invariant violations. An additional problem to the usual ones in the entity model is that often we want to allow concurrent invocations on an entity that do not conflict with each other, and designing locking to support this can get complicated. Furthermore, a lock-based system only supports those operations that were thought of by the entity implementor. For instance, a common operation needed for an associative array is insert-if-not-present, but if the array only implements operations for checking the presence and unconditional insertion, this combined operation cannot be safely performed by the caller.
Software Transactional Memory (STM) is an interesting concept that solves many of the problems in lock-based entity-style programming. The optimistic execution makes non-conflicting operations succeed. The ability to wrap atomic operations into larger atomic operations solves the composability of operations. And finally, when programming with the observable-state model that emits modification events, STM prevents such an event from being emitted before all state modification has completed. STM is not very mainstream, though, so even though libraries exist for many programming languages, it would be difficult to justify its adoption.
From the word “entity”, one might expect a different kind of execution. An entity brings to mind something autonomous and separate, but this is not how threaded execution sees things. There are no enforced entity boundaries. Any thread can just come and execute an entity’s code at any time. What was architecturally good encapsulation in a single-threaded system completely disappears in a multi-threaded system.
My preferred system following this thinking would have each entity “living” in a single thread. All of the code belonging to an entity would always be executed on that entity’s thread. This is similar to many UI frameworks that require UI component manipulation to happen in the UI thread, except that I would prefer the system to take care of thread switching instead of having to do it manually. Such a system would make programming much more asynchronous, since each entity invocation could involve a thread switch, but I have written code in this style, and it is not too difficult with, say, the async/await mechanism.
This kind of view of objects as independent entities that communicate with each other is not new. It resembles Alan Kay’s definition of object-oriented programming, which is focused on messages between objects. When looking from the concurrency perspective, processes in Erlang function semantically as independent entities with internal state. To me, it is more difficult to conceptualize observable entity state in the message passing model, especially the dependency network of derived state as described in the first post. Still, when concurrency gets involved, I find seeing entity invocation as message passing a better way of comprehending the system in a way that helps make it work correctly.
There are some patterns and ways of thinking that can mitigate the problems of concurrency with mutable state. Unlike in the single-threaded case, these tend to be more difficult to set up so that they would work automatically. Therefore more discipline and careful documentation of the used patterns is needed, since the programming environment cannot always enforce it for us.
One way to avoid the issues with other threads observing temporarily-broken invariants is to postpone internal state modification. Instead of directly modifying an entity’s state during function execution, use local variables to capture all the modified pieces of state, and only at the end of execution set the entity’s state. This also helps when an invocation depends on operations that can fail, since no actual modifications have yet been made so the execution can simply be abandoned instead of having to implement complex rollback logic.
This style of state update becomes interesting when the state to update is some form of collection, like an associative array. Fully copying a big structure for local modifications may be too inefficient (but it might not; remember to measure before assuming where your bottlenecks are). A good option would be to use a persistent data structure, which allows the modifications to be made locally in a natural way, swapping the original for the updated one at the end, but persistent data structures are not commonly used in many programming languages. The ultimate expression of this style would be to keep all (mutable) state of an entity in a single persistent data structure, which would eliminate any issues with breaking invariants, since the state would then only get modified all at once in a single operation.
When an entity gets more complex, it can be helpful to structure its internals as a state machine, with the state and allowed transitions defining which invocations are permissible. This can also help in concurrent and asynchronous situations. Namely, when the state machine implementation is thread-safe, the check on permitted state transitions can take care that conflicting operations are not executed, even when requested concurrently.
Surprisingly, I’ve found that automated tests can be helpful. In general, concurrency issues are non-deterministic, depending on specific timing conditions that won’t be reproduced exactly. What I do, when applicable, is to run a huge number of threads in the automated test, all hammering on the entity that is supposed to be thread-safe, and verify that the results are consistent with some correct execution. This is not always possible to do, usually the expected result is not as clear-cut as with a traditional test, and the test may not always catch failures. But if the expectations can be defined, I’ve found that running enough threads flushes out many problems reliably enough.
Dealing with mutable state in a concurrent program can get very difficult. Concurrency makes it ever more important to architect the program so that state mutation happens only in specific, controlled ways. Building the program directly on low-level concurrency abstractions, as often provided by popular programming platforms, will usually lead to trouble. Whenever possible, try to extract higher-level concepts into reusable components, such as task executors or operation queues. Be even stricter in controlling how state is mutated. And when it is not possible to ensure correct operation through language-enforced mechanisms, be very disciplined and document carefully what is being done and why.