Shared States in Method Signatures

Paulo Zemek

6 Jan 2015, CPOL, 13 min read

Thread-Safety Guarantees

When I finished the post Constrained C# (see also Part 2), I was talking about thread-safety. For Constrained C#, I proposed the idea of SynchronizationRequired, and I know that idea is not perfect. Yet, I really don't know if we can keep the spirit of C# with the ideas I am going to present now. So, these ideas are only a reflection on possible solutions, as I don't believe they will be put into practice immediately, even if Constrained C# is made.

Immutability

Immutability is frequently said to be the solution to multi-threading issues. Immutable objects are naturally accessible by many different threads without problems. After all, they aren't going to change, so no locks are required and there's no risk of deadlocks, state corruption, etc.

This is all good except for two details:

- Many "immutable objects" are actually referenced by mutable fields somewhere else. For example, you don't change an immutable list; you create a new instance that has the new value. Yet some variable that was referencing the old instance may be mutated to reference the new instance (and if two threads are adding items through that "variable", some kind of synchronization is still needed);
- The creation of an immutable instance is still a memory mutation, so CPU caches must be correctly flushed to the main memory when the creation finishes in one CPU if the object can be used by other threads. Then the main memory must be read when another CPU sees a reference to the new instance for the first time. Such synchronization can be done under the hood and is not required in some processors, but the newest and fastest processors are the ones that suffer the most from it. If we never know whether a new instance is going to be used by another CPU, it is better to flush the CPU caches (put a write barrier) at every write. If a reference we just read is new to the current CPU (or if we have no idea whether it is new), we must clear the CPU caches (put a read barrier). If we don't do this, we risk reading a new reference for the first time and accessing garbage, either because this CPU already had old data cached or because the other CPU hasn't flushed the instance contents to the main memory yet.
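The first detail can be sketched in today's C#. This is only an illustration under stated assumptions: the class SharedImmutableList and its members are hypothetical names, and the retry loop built on Interlocked.CompareExchange is just one way to synchronize the mutable field that references the immutable list.

```csharp
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: the list instances never change,
// but the field that references them is mutable and shared.
public static class SharedImmutableList
{
    private static ImmutableList<int> _items = ImmutableList<int>.Empty;

    public static ImmutableList<int> Items => Volatile.Read(ref _items);

    // Builds a new list and publishes it only if no other thread
    // replaced the field in the meantime; otherwise it retries.
    public static void Add(int value)
    {
        while (true)
        {
            ImmutableList<int> seen = Volatile.Read(ref _items);
            ImmutableList<int> updated = seen.Add(value);
            if (ReferenceEquals(Interlocked.CompareExchange(ref _items, updated, seen), seen))
                return;
        }
    }

    // Two tasks add 1000 items each; no item is lost.
    public static int AddFromTwoThreads()
    {
        Interlocked.Exchange(ref _items, ImmutableList<int>.Empty);
        Task first = Task.Run(() => { for (int i = 0; i < 1000; i++) Add(i); });
        Task second = Task.Run(() => { for (int i = 0; i < 1000; i++) Add(i); });
        Task.WaitAll(first, second);
        return Items.Count;
    }
}
```

If Add were written as a plain "_items = _items.Add(value)", two threads could read the same old instance and one of the additions could be lost, even though every list involved is immutable.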

So, in my opinion, immutability is not the final solution. It is certainly a solution to some of the problems, but other solutions are still welcome. And that's what I will try to explore.

Understanding the Problems

Those who embrace immutability can say that the problems I just presented only exist because I was mixing mutable and immutable objects. If all objects are immutable, then the only way for another thread to see a new object is if we call a specific function to send this new object to another thread. So it becomes the responsibility of that function to correctly write data to the main memory, as well as the responsibility of the receiving function to correctly read data from the main memory. There's no direct field manipulation making an instance accessible to other processors and that extra memory synchronization can be ignored because the object is not going to change at all.

I am not trying to convince anyone to use immutable objects. Actually, what I see is that shared states aren't a problem. Mutable objects aren't a problem either. The combination of both becomes a problem. By using everything as immutable, we solve the problem because we avoid that bad combination. By not sharing any object with another thread, we also solve the problem, as each thread can do its own job and mutate objects without interfering with the job of the other threads.

But the reality is that we do need to combine both at some moment, for many different reasons, memory utilization and performance being the most common ones. So, why not make a clear distinction between shared states and unshared states and between mutable and immutable objects?

Example of a Possible Solution

What I am going to present is not tested at all. It is simply a verbose version of what it could look like. So, take a look at some random method signatures (consider them to be all static methods for this example):

void RemoveDuplicates<T>(required nonstored mutable List<T> list);
required long Factorial(required int value) pure;
unshared mutable required T Create<T>();
optional immutable string Concat(optional nonstored immutable string a, optional nonstored immutable string b);
void AddToCache<T>(required stored readlockable T item);
void AddToLocalCache<T>(required stored threadowned T item);

I know, I used many unknown keywords and the signatures are much bigger than we are used to seeing in C#. Actually, the signatures could be smaller if I opted for some automatic behavior (like required and immutable by default) or if I used some symbols (like ? for optional variables). Yet I wanted anyone to be able to get an idea of what's happening without previous knowledge of automatic behaviors or explanations of the symbols, as I want to avoid losing focus on what I really want to talk about.

So, the special keywords I use have the following meanings:

required: Parameters with the required modifier will never receive null. Note that this is not the same as putting an if inside the method: null constants should generate compile-time errors, and any validation that is needed is done by the caller, not by the callee, and must be compiler-enforced;
optional: Parameters with the optional modifier can receive null. The method body is required to test for null before using such a value or before passing it as a parameter to another method that uses the variable as required, yet it can pass it freely to another method that also receives optional values.

nonstored: The received input parameter will not be stored for future use by this method, meaning that such object can be modified without problems just after the call returns;
stored: The received input parameter can be stored for future use. This means that both the method being called and the caller must agree how to use it in the future to avoid problems. This is why we have those readlockable and threadowned modifiers. In fact, the full list is:
- threadowned: Only a single thread will own this object. To be able to call a method that receives a threadowned variable, the caller must either become the owner of the object (if it was not owned yet) or must verify that it is on the right thread. To be stored, a field of the same type (threadowned) must be used and every method that uses the object must verify the thread-ownership if it wasn't already done before using the object;
- readlockable: To be able to use this object, it must be locked for reading. The object may be writeable, but the method receiving it is forbidden from using the write methods or from acquiring a write-lock;
- writelockable: To be able to use this object, it must be locked. The method is free to use a write-lock as well as to use only a read-lock;
- monitorlockable: To be able to use this object, a Monitor lock should be used (that is always a full lock);
- immutable: The received parameter will be treated as immutable: neither the caller nor the callee will ever modify the object, even if its type is mutable. To be able to pass a parameter as immutable to another method, the variable must be either immutable or unshared. If the variable is already shared or stored in some way, it can't be seen as immutable anymore.

pure: This is a special situation very common in functional languages. Not only will all the received parameters be treated as read-only, but no "side-effects" will be caused. This means that the method is not going to write to a file, update static variables or anything else, and calling the method thousands of times with a given set of input values will always generate the same result. So caching the results is a valid optimization, as there's no risk of getting different results from doing it.
unshared: All new objects without a constructor are unshared by default (and may be unshared with a constructor as long as the constructor follows the rule). As long as unshared objects are only given as input parameters to methods that don't store them and are kept in a single local variable, they remain unshared. An object in this situation doesn't require any kind of thread validation, locking or anything else and can become immutable, thread-checked, locked-shared, etc. simply by being given to a method that requires it to work that way or by being assigned to a variable of that kind. An unshared object can be passed to another method as unshared, meaning that the receiving method gets the ownership of the object and the caller simply can't use that object anymore (the variable becomes unassigned).
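To make the caching remark about pure concrete, here is a minimal sketch in today's C#, where nothing enforces purity and it is only a convention. The class PureFunctions and the method CachedFactorial are hypothetical names; Factorial mirrors the signature shown earlier.

```csharp
using System;
using System.Collections.Concurrent;

public static class PureFunctions
{
    // Pure by convention: the result depends only on the input,
    // and nothing outside the method is read or written.
    public static long Factorial(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        long result = 1;
        for (int i = 2; i <= value; i++)
            result = checked(result * i); // throws on overflow instead of wrapping
        return result;
    }

    // Because Factorial is pure, returning a cached result is
    // indistinguishable from computing it again.
    private static readonly ConcurrentDictionary<int, long> Cache =
        new ConcurrentDictionary<int, long>();

    public static long CachedFactorial(int value) =>
        Cache.GetOrAdd(value, Factorial);
}
```

With a compiler-checked pure modifier, this kind of caching could be applied without having to trust the author of the method.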

As an extra bit of information, as structs are always copied when calling methods, there's no need to use any sharing modifier. The data will be a copy and if the struct contains references, the struct fields will need to have the right sharing information.

Understanding the Solution

Actually, the required and optional modifiers are only presented in the signatures because I don't want to lose what I presented in the previous posts, yet they are useless for the purpose of this post.

The solution is all about knowing what is shared and what's not. As long as we don't store an object in static variables or other objects' fields, it can stay as unshared. This means that the compiler can avoid any under-the-hood synchronization and developers can avoid any explicit synchronization because the object is guaranteed to be only seen by a single thread. Any changes made to the object will only happen when we set fields directly or when calling methods that clearly want to modify the object.

In current C#, we never know if an object given to a method is going to be stored in another object, in a static variable or shared with other threads. We can usually figure that out from the purpose of the method and, ironically, this rarely happens by accident in unmanaged languages, as storing a pointer to an object doesn't guarantee it will be alive at any moment in the future, so some kind of "ownership" or "reference counting" is done to avoid the problem. In managed languages, any parameter given as input can be stored and used later (even if it shouldn't be) and nothing in the C# signature tells whether that's going to happen.

The proposed signature looks ugly, I know, but that's because I made it much more verbose than needed. The purpose is that shared and unshared objects are known from the signature, even when using interfaces.

I could declare a method in an interface telling that it returns an immutable array. When implementing it, I would probably write a method that creates a mutable array, populates it and then returns it as immutable without doing any intermediary copies. By default, a new array will be mutable and unshared, so the method could set many of its indexes with real data or referencing other objects (in this case the other objects will become shared, not the array). Returning the array as immutable doesn't require new allocations, as the only thing needed is the guarantee that no-one else will ever try to change the array.
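Today's .NET has a library-level analog of this "populate as mutable, return as immutable without copying" pattern in ImmutableArray<T>.Builder from System.Collections.Immutable. The sketch below uses a hypothetical Squares.Create method; the guarantee comes from the library design, not from the compiler as proposed here.

```csharp
using System.Collections.Immutable;

public static class Squares
{
    public static ImmutableArray<int> Create(int count)
    {
        // The builder is the mutable, not yet shared view of the array.
        ImmutableArray<int>.Builder builder = ImmutableArray.CreateBuilder<int>(count);
        for (int i = 0; i < count; i++)
            builder.Add(i * i);

        // MoveToImmutable hands the internal array over without copying it
        // and leaves the builder empty. It requires Count == Capacity.
        return builder.MoveToImmutable();
    }
}
```

After MoveToImmutable, the builder no longer references the array, which plays the role of "no one else will ever try to change it".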

Considering the compiler will never break its own rules, an object received as immutable can never be used as mutable, regardless of whether the type itself is mutable. This is similar to the const modifier put into methods and variable declarations in C++, but without any unsafe cast that can change the constness of an instance and without the possibility of having a non-const and a const variable at the same time. That is, from the first moment an object is seen as immutable, all the variables that can reference it are limited to seeing the object as immutable. As only an unshared object can become immutable, there's no risk of having an object seen as immutable by one variable when there are other variables still seeing it as mutable.

When an object needs to be stored or shared, it is the first moment in which it is stored or passed as a storable parameter that sets its sharing mode. An unshared local variable used to fill a threadowned method parameter will set the owner as the current thread, so the local variable will not be considered unshared anymore and will not have the possibility to be used in calls that require an unshared or immutable variable anymore.
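A run-time approximation of the threadowned behavior can be sketched in current C#. Everything here is an assumption made for illustration: ThreadOwned<T> is a hypothetical wrapper, and the proposal itself is about compile-time rules, which this sketch cannot provide.

```csharp
using System;
using System.Text;
using System.Threading;

// Hypothetical wrapper: the thread that creates it becomes the owner,
// and every access verifies that it happens on that thread.
public sealed class ThreadOwned<T> where T : class
{
    private readonly T _value;
    private readonly int _ownerThreadId;

    public ThreadOwned(T value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
        _ownerThreadId = Environment.CurrentManagedThreadId;
    }

    public T Value
    {
        get
        {
            if (Environment.CurrentManagedThreadId != _ownerThreadId)
                throw new InvalidOperationException("Accessed from a thread that is not the owner.");
            return _value;
        }
    }
}

public static class ThreadOwnedDemo
{
    // Returns true when an access from a second thread is rejected.
    public static bool OtherThreadIsRejected()
    {
        var owned = new ThreadOwned<StringBuilder>(new StringBuilder("x"));
        owned.Value.Append("y"); // the owner thread is allowed

        bool rejected = false;
        var other = new Thread(() =>
        {
            try { _ = owned.Value; }
            catch (InvalidOperationException) { rejected = true; }
        });
        other.Start();
        other.Join();
        return rejected;
    }
}
```

The wrapper only checks the thread; nothing stops code from keeping the inner reference around, which is exactly the kind of leak a compiler-enforced modifier would forbid.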

A method can actually request an unshared variable. In this case, the method being invoked is considered to become the single owner of the value and the variable used to pass that parameter (if any) will be considered unassigned after the call.
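The ownership transfer described here can also be imitated at run time. In this sketch, Unshared<T>, Take and UnsharedDemo.Consume are hypothetical names, and the "unassigned after the call" rule becomes an exception instead of a compile-time error.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for an object that has a single owner.
// It is meant to be used from one thread only, like the object it holds.
public sealed class Unshared<T> where T : class
{
    private T _value;

    public Unshared(T value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public bool IsAssigned => _value != null;

    // Hands the object to the caller and leaves this holder unassigned.
    public T Take()
    {
        T value = _value ?? throw new InvalidOperationException("The object was already transferred.");
        _value = null;
        return value;
    }
}

public static class UnsharedDemo
{
    // Plays the role of a method whose parameter is declared unshared:
    // it becomes the single owner of the list.
    public static int Consume(Unshared<List<int>> list)
    {
        List<int> owned = list.Take();
        owned.Add(42);
        return owned.Count;
    }
}
```

After Consume returns, the caller's holder reports IsAssigned as false and a second Take throws, mirroring the idea that the variable used to pass the parameter is considered unassigned.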

It is important to note that static fields can only be immutable or of some lockable kind. They can't be considered unshared, and making a static field threadowned is nonsense to me. Obviously, a thread-static variable can be set as threadowned, and the thread-check is only needed when assigning the variable, not on every read, as the variable is exclusive to the thread.

What's Missing

At this point, the weakest point of this solution is the possibility of deadlocks. It is already guaranteed that we will not lose performance by excessively synchronizing memory and that we will never read partial states by forgetting to lock. Yet any mutable variable shared among threads will require a lock type. If both objects A and B are shared and they reference each other, it is possible that a lock over B is needed while accessing a method of A, and a lock over A is needed while accessing a method of B. This is a classic source of deadlocks.

I really don't have any idea of how to completely avoid that situation. One idea is that inner fields of objects can simply "share the parent's lock" (that would probably require another keyword). Similarly, when making a variable shared, it could be possible to tell which object to use as a lock, so many different objects could share the same lock to avoid problems.
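The "many objects share the same lock" idea can be sketched with plain Monitor locks in current C#. Account, TransferTo and SharedLockDemo are hypothetical names chosen for the example; the lock object is passed explicitly, since there is no keyword for it.

```csharp
using System;
using System.Threading.Tasks;

public sealed class Account
{
    private readonly object _sharedLock; // the "parent's lock", shared by related objects
    private int _balance;

    public Account(object sharedLock, int balance)
    {
        _sharedLock = sharedLock ?? throw new ArgumentNullException(nameof(sharedLock));
        _balance = balance;
    }

    public int Balance { get { lock (_sharedLock) { return _balance; } } }

    public void TransferTo(Account target, int amount)
    {
        if (!ReferenceEquals(_sharedLock, target._sharedLock))
            throw new InvalidOperationException("Both accounts must share the same lock.");

        // Only one lock exists for both objects, so the A-then-B versus
        // B-then-A ordering problem cannot occur.
        lock (_sharedLock)
        {
            _balance -= amount;
            target._balance += amount;
        }
    }
}

public static class SharedLockDemo
{
    // Two tasks transfer in opposite directions at the same time.
    public static bool OppositeTransfersComplete()
    {
        object parentLock = new object();
        var a = new Account(parentLock, 100);
        var b = new Account(parentLock, 100);
        Task first = Task.Run(() => { for (int i = 0; i < 1000; i++) a.TransferTo(b, 1); });
        Task second = Task.Run(() => { for (int i = 0; i < 1000; i++) b.TransferTo(a, 1); });
        Task.WaitAll(first, second);
        return a.Balance == 100 && b.Balance == 100;
    }
}
```

The price is coarser locking: every object that shares the lock is serialized with the others, which is why fine-grained locks remain attractive.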

Yet, fine-grained locks may be really useful when we have many shared objects, and I really don't see a 100% guaranteed kind of static analysis that will prevent developers from getting into deadlock situations.

Conclusion

I believe that it is possible to create a language with these traits in a less verbose manner, making it much more usable. If such a language doesn't target IL, which already does excessive work under the hood, the performance can actually be better because most of the unnecessary memory synchronizations will be gone (and they happen a lot today).

Performance aside, these traits will be useful even if we target .NET IL, as they could help reduce errors related to shared states and state corruption. They will not help with deadlocks, yet they will not increase the chance of deadlocks happening. So, it is a win, even if it is not perfect yet.

This article was originally posted at http://blogs.msdn.com/b/paulozemek/archive/2015/01/05/shared-states-in-method-signatures.aspx

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Paulo Zemek

Software Developer (Senior) Microsoft

United States

I started to program computers when I was 11 years old, as a hobbyist, programming in AMOS Basic and Blitz Basic for Amiga.
At 12 I had my first try with assembler, but it was too difficult at the time. Then, in the same year, I learned C and, after learning C, I was finally able to learn assembler (for Motorola 680x0).
Not sure, but probably between 12 and 13, I started to learn C++. I always programmed "in an object oriented way", but using function pointers instead of virtual methods.

At 15 I started to learn Pascal at school and to use Delphi. At 16 I started my first internship (using Delphi). At 18 I started to work professionally using C++ and since then I've developed my programming skills as a professional developer in C++ and C#, generally creating libraries that help other developers do their work easier, faster and with fewer errors.

Want more info or simply want to contact me?
Take a look at: http://paulozemek.azurewebsites.net/
Or e-mail me at: paulozemek@outlook.com

Codeproject MVP 2012, 2015 & 2016
Microsoft MVP 2013-2014 (in October 2014 I started working at Microsoft, so I can't be a Microsoft MVP anymore).