Multithreading Programming with C++11 - Part B
(Sharing Data - mutexes, race conditions, and deadlock)
The issues around sharing data between threads are mostly due to the consequences of modifying data.
If the data we share is read-only, there is no problem, because the data read by one thread is unaffected by whether or not another thread is reading the same data. However, once data is shared between threads and one or more threads start modifying it, the problems begin. In this case, we must take care to make sure that everything works out fine.
Some of the key words used in this chapter, such as race condition, invariant, mutex, and deadlock, are explained in Multi-Threaded Programming - Terminology.
Here is an example of three threads attempting to access the same list, myList:
#include <iostream>
#include <thread>
#include <list>
#include <algorithm>
using namespace std;

// a global variable
std::list<int> myList;

void addToList(int max, int interval)
{
    for (int i = 0; i < max; i++) {
        if ((i % interval) == 0)
            myList.push_back(i);
    }
}

void printList()
{
    for (auto itr = myList.begin(), end_itr = myList.end(); itr != end_itr; ++itr) {
        cout << *itr << ",";
    }
}

int main()
{
    int max = 100;
    std::thread t1(addToList, max, 1);
    std::thread t2(addToList, max, 10);
    std::thread t3(printList);

    t1.join();
    t2.join();
    t3.join();

    return 0;
}
One possible output looks like this:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,0,10,20,30,40,50,60,70,80,90,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,
While adding elements to the list, thread t1, which pushes every integer from 0 to 99, competes with thread t2, which pushes every 10th integer. There is also a third competing thread, t3, which accesses the same list to print it. As the output shows, the list failed to receive its elements in order, because the three threads accessed the list at the same time. So, to get the right list, we need some protection mechanism such as a mutex.
Though mutexes are probably the most widely used data-protection mechanism in C++, it's important to structure our code to protect the right data and avoid race conditions inherent in our interfaces. Mutexes also come with their own problems, in the form of deadlock and of protecting either too much or too little data.
In C++, we create a mutex by constructing an instance of std::mutex, lock it with a call to the member function lock(), and unlock it with a call to the member function unlock(). However, it is not good practice to call these member functions directly, because then we have to remember to call unlock() on every code path out of a function, including those due to exceptions.
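A minimal sketch of the hazard (the riskyIncrement function and sharedValue variable are hypothetical names for illustration):

#include <mutex>

std::mutex m;          // protects sharedValue
int sharedValue = 0;

void riskyIncrement()
{
    m.lock();
    ++sharedValue;     // if anything here threw or returned early,
                       // the mutex would stay locked forever
    m.unlock();        // must be reached on every code path
}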
Instead, the Standard C++ Library provides the std::lock_guard class template, which implements the RAII idiom for a mutex. It locks the supplied mutex on construction and unlocks it on destruction, thus ensuring a locked mutex is always correctly unlocked.
The example below shows how to protect a list that can be accessed by multiple threads using a std::mutex, along with std::lock_guard. Both of these are declared in the <mutex> header.
#include <iostream>
#include <thread>
#include <list>
#include <algorithm>
#include <mutex>
using namespace std;

// a global variable
std::list<int> myList;

// a global instance of std::mutex to protect the global variable
std::mutex myMutex;

void addToList(int max, int interval)
{
    // the access to this function is mutually exclusive
    std::lock_guard<std::mutex> guard(myMutex);
    for (int i = 0; i < max; i++) {
        if ((i % interval) == 0)
            myList.push_back(i);
    }
}

void printList()
{
    // the access to this function is mutually exclusive
    std::lock_guard<std::mutex> guard(myMutex);
    for (auto itr = myList.begin(), end_itr = myList.end(); itr != end_itr; ++itr) {
        cout << *itr << ",";
    }
}

int main()
{
    int max = 100;
    std::thread t1(addToList, max, 1);
    std::thread t2(addToList, max, 10);
    std::thread t3(printList);

    t1.join();
    t2.join();
    t3.join();

    return 0;
}
The new output shows that each operation on the list now completes without interference from the others:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,0,10,20,30,40,50,60,70,80,90,
Protecting data with a mutex is not quite as easy as just putting a std::lock_guard object in every member function: one stray pointer or reference renders all that protection useless. At one level, checking for stray pointers or references is easy: as long as none of the member functions return a pointer or a reference to the protected data to the caller, either via a return value or via an out parameter, the data is safe.
However, it's not that straightforward. As well as checking that the member functions don't pass out pointers or references to their callers, it's also important to check that they don't pass such pointers or references in to functions they call that aren't under our control. This is just as dangerous, because those functions might store the pointer or reference in a place where it can later be used without the protection of the mutex. Particularly dangerous in this regard are functions that are supplied at runtime via a function argument or other means. The next code example demonstrates code that accidentally passes out a reference to protected data, which is a common mistake when trying to use mutexes to protect shared data:

#include <iostream>
#include <mutex>
#include <string>
#include <thread>
using namespace std;

class MyData
{
    int i;
    string s;
public:
    void doTask(int ii, string ss)
    {
        i = ii;
        s = ss;
        cout << " i = " << i << " s = " << s << endl;
    }
};

class DataWrapper
{
private:
    MyData data;
    std::mutex m;
public:
    template<typename Function>
    void processData(Function userFunc)
    {
        std::lock_guard<std::mutex> lock(m);
        // passing protected data to a user-supplied function
        userFunc(data);
    }
};

void goodFunction(MyData& protectedData)
{
    protectedData.doTask(1, "my string");
}

MyData* unprotected;

void badFunction(MyData& protectedData)
{
    unprotected = &protectedData;
}

DataWrapper w;

void foo()
{
    w.processData(goodFunction);

    // passing in a bad function
    w.processData(badFunction);

    // unprotected access to protected data
    unprotected->doTask(99, "hacked string");
}

int main()
{
    std::thread t(foo);
    t.join();
    return 0;
}
The output should look something like this:
i = 1 s = my string
i = 99 s = hacked string
The code in processData() looks harmless enough, protected with std::lock_guard, but the call to the user-supplied function userFunc means that foo() can pass in badFunction() to bypass the protection and then call doTask() without the mutex being locked.
The code violates the rule:
"Don't pass pointers and references to protected data outside the scope of the lock, whether by
returning them from a function, storing them in externally visible memory, or passing them as
arguments to user-supplied functions."
Just because we're using a mutex or other mechanisms to protect shared data, we're not necessarily protected from race conditions. We still have to ensure that the appropriate data is protected.
Suppose we need code to handle a doubly linked list. In order for a thread to safely delete a node from a doubly linked list, we need to ensure that we're preventing concurrent accesses to three nodes: the node being deleted and the nodes on either side. If we protected accesses to the pointers of each node individually, we'd be no better off than with code that used no mutexes, because the race condition could still happen. It's not the individual nodes that need protecting for the individual steps but the whole data structure, for the whole delete operation. The easiest solution in this case is to have a single mutex that protects the entire list, as in the sketch below.
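Here is a minimal sketch of that idea (the Node and DoublyLinkedList types are hypothetical, not from the text): a single list-wide mutex guards the whole delete operation, so all three affected nodes are updated as one atomic step with respect to other threads:

#include <mutex>

struct Node
{
    Node* prev;
    Node* next;
    int value;
};

class DoublyLinkedList
{
    Node* head = nullptr;
    std::mutex listMutex; // one mutex protects the entire structure

public:
    void erase(Node* n)
    {
        // the whole delete operation happens under a single lock,
        // so no other thread can observe the three nodes half-updated
        std::lock_guard<std::mutex> guard(listMutex);
        if (n->prev) n->prev->next = n->next;
        else         head = n->next;
        if (n->next) n->next->prev = n->prev;
        delete n;
    }
};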
Just because individual operations on the list are safe, we're not out of the woods yet. We can still get race conditions, even with a really simple interface. Consider a stack data structure like the std::stack container adapter shown in the example below.
Aside from the constructors and swap(), there are only five things we can do to a std::stack:
- push() a new element onto the stack
- pop() an element off the stack
- read the top() element
- check whether it's empty()
- read the number of elements, the size() of the stack
If we change top() so that it returns a copy rather than a reference and protect the internal data with a mutex, this interface is still inherently subject to race conditions. This problem is not unique to a mutex-based implementation; it's an interface problem, so the race conditions would still occur with a lock-free implementation.
#include <mutex>
#include <deque>
using namespace std;

template<typename T, typename Container = std::deque<T> >
class stack
{
public:
    explicit stack(const Container&);
    explicit stack(Container&& = Container());
    template <typename Alloc> explicit stack(const Alloc&);
    template <typename Alloc> stack(const Container&, const Alloc&);
    template <typename Alloc> stack(Container&&, const Alloc&);
    template <typename Alloc> stack(stack&&, const Alloc&);

    // not reliable
    bool empty() const;

    // not reliable
    size_t size() const;

    T& top();
    T const& top() const;
    void push(T const&);
    void push(T&&);
    void pop();
    void swap(stack&&);
};
The problem here is that the results of empty() and size() can't be relied on. Although they might be correct at the time of the call, once they've returned, other threads are free to access the stack and might push() new elements onto or pop() the existing ones off of the stack before the thread that called empty() or size() could use that information.
In particular, if the stack instance is not shared, it's safe to check for empty() and then call top() to access the top element if the stack is not empty, as in the code below:
stack<int> s;
if (!s.empty())
{
    int const value = s.top();
    s.pop();
    do_task(value);
}
Note that calling top() on an empty stack is undefined behavior. With a shared stack object, this call sequence (empty() -> top() -> pop()) is no longer safe, because there might be a call to pop() from another thread that removes the last element in between the call to empty() and the call to top().
So, this is a classic race condition, and the use of a mutex internally to protect the stack contents doesn't prevent it. That's because it's a consequence of the interface rather than a consequence of not protecting the underlying data properly.
What's the solution?
Well, this problem happens as a consequence of the design of the interface, so the solution is to change the interface.
What changes need to be made?
In the simplest case, we could just declare that top() will throw an exception if there aren't any elements in the stack when it's called. Though this directly addresses the issue, it makes for more cumbersome programming, because now we need to be able to catch an exception even if the call to empty() returned false. This essentially makes the call to empty() completely redundant. A sketch of what the calling code would look like under this design is shown below.
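This sketch assumes a hypothetical throwingStack whose top() throws, say, std::range_error when the stack is empty, and reuses the do_task() placeholder from the earlier snippet:

#include <stdexcept>

void consumeOne(throwingStack<int>& s)
{
    try
    {
        // top() may throw even though empty() just returned false,
        // because another thread may have emptied the stack in between
        int const value = s.top();
        s.pop();
        do_task(value);
    }
    catch (const std::range_error&)
    {
        // the stack turned out to be empty after all; recover or retry
    }
}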
If we look closely at the previous snippet, there's also potential for another race condition but this time between the call to top() and the call to pop(). Consider two threads running the previous snippet of code and both referencing the same stack object, s. This isn't an unusual situation; when using threads for performance, it's quite common to have several threads running the same code on different data, and a shared stack object is ideal for dividing work between them. Suppose that initially the stack has two elements, so we don't have to worry about the race between empty() and top() on either thread, and consider the potential execution patterns.
If the stack is protected by a mutex internally, only one thread can be running a stack member function at any one time, so the calls get nicely interleaved, while the calls to do_task() can run concurrently. One possible execution is shown in table below.
Thread 1 | Thread 2 |
---|---|
if(!s.empty()) | |
 | if(!s.empty()) |
int const value=s.top(); | |
 | int const value=s.top(); |
s.pop(); | |
do_task(value); | s.pop(); |
 | do_task(value); |
As we can see, if these are all the threads running, there's nothing in between the two calls to top() to modify the stack, so both threads will see the same value. Not only that, but there are no calls to top() between the calls to pop(). Consequently, one of the two values on the stack is discarded without ever having been read, whereas the other is processed twice. This is yet another race condition and far more insidious than the undefined behavior of the empty()/top() race; there's never anything obviously wrong going on, and the consequences of the bug are likely far removed from the cause, although they obviously depend on exactly what do_task() really does.
This calls for a more radical change to the interface, one that combines the calls to top() and pop() under the protection of the mutex. A combined call can lead to issues if the copy constructor for the objects on the stack can throw an exception. This problem was dealt with fairly comprehensively from an exception-safety point of view by Herb Sutter, but the potential for race conditions brings something new to the mix.
For those of us who aren't aware of the issue, consider a stack<vector<int>>. Now, a vector is a dynamically sized container, so when we copy a vector the library has to allocate some more memory from the heap in order to copy the contents. If the system is heavily loaded, or there are significant resource constraints, this memory allocation can fail, so the copy constructor for vector might throw a std::bad_alloc exception. This is especially likely if the vector contains a lot of elements. If the pop() function was defined to return the value popped, as well as remove it from the stack, we have a potential problem: the value being popped is returned to the caller only after the stack has been modified, but the process of copying the data to return to the caller might throw an exception. If this happens, the data just popped is lost; it has been removed from the stack, but the copy was unsuccessful!
The designers of the std::stack interface helpfully split the operation in two:
- get the top element (top())
- remove it from the stack (pop())
So, if we can't safely copy the data, it stays on the stack. If the problem was lack of heap memory, maybe the application can free some memory and try again.
Unfortunately, it's precisely this split that we're trying to avoid in eliminating the race condition!
However, we have four options, though none of them is without cost.
The first option is to pass a reference to a variable in which we want to receive the popped value as an argument in the call to pop():
vector<int> v;
myStack.pop(v);
But this option has distinct disadvantages:
- It requires the calling code to construct an instance of the stack's value type (in our case, int) prior to the call, in order to pass it in as the target. For some types this is impractical, because constructing an instance is expensive in terms of time or resources. For other types it isn't always possible, because the constructors require parameters (in our case, the typename Alloc) that aren't necessarily available at this point in the code.
- It requires that the stored type is assignable. This is an important restriction: many user-defined types do not support assignment, though they may support move construction or even copy construction, and thus allow return by value.
The second option exploits the fact that there's only an exception-safety problem with a value-returning pop() if the return by value can throw an exception. Many types have copy constructors that don't throw exceptions, and with the new rvalue-reference support in the C++ standard, many more types will have a move constructor that doesn't throw exceptions, even if their copy constructor does. So one valid option is to restrict the use of our thread-safe stack to those types that can safely be returned by value without throwing an exception.
Although this is safe, it's not ideal. Even though we can detect at compile time the existence of a copy or move constructor that doesn't throw an exception using the std::is_nothrow_copy_constructible and std::is_nothrow_move_constructible type traits, it's quite limiting. There are many more user-defined types with copy constructors that can throw and no move constructors than there are types with copy and/or move constructors that can't throw, although this might change as people get used to the rvalue-reference support in C++11. It would be unfortunate if such types couldn't be stored in our thread-safe stack.
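A minimal sketch of how option 2 could be enforced at compile time (the nothrowStack name is hypothetical), using the type traits mentioned above:

#include <type_traits>

template<typename T>
class nothrowStack
{
    static_assert(std::is_nothrow_copy_constructible<T>::value ||
                  std::is_nothrow_move_constructible<T>::value,
                  "T must be no-throw copy- or move-constructible "
                  "to be returned safely by value");
    // ... a mutex-protected stack and a value-returning pop() would go here
};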
The third option is to return a pointer to the popped item rather than return the item by value. The advantage here is that pointers can be freely copied without throwing an exception, so we've avoided Cargill's exception problem.
The disadvantage is that returning a pointer requires a means of managing the memory allocated to the object, and for simple types such as integers, the overhead of such memory management can exceed the cost of just returning the type by value. For any interface that uses this option, std::shared_ptr would be a good choice of pointer type; not only does it avoid memory leaks, because the object is destroyed once the last pointer is destroyed, but the library is in full control of the memory allocation scheme and doesn't have to use new and delete. This can be important for optimization purposes: requiring that each object in the stack be allocated separately with new would impose quite an overhead compared to the original non-thread-safe version.
The fourth option is option 1 plus either option 2 or option 3: if we've chosen option 2 or 3, it's relatively easy to provide option 1 as well, and this gives users of our code the ability to choose whichever option is most appropriate for them at very little additional cost.
The code below shows the class definition for a stack with no race conditions in the interface, implementing options 1 and 3: there are two overloads of pop(), one that takes a reference to a location in which to store the value and one that returns a std::shared_ptr<>. It has a simple interface, with only two functions: push() and pop().
#include <exception>
#include <memory> // std::shared_ptr<>

struct empty_stack : std::exception
{
    const char* what() const throw();
};

template<typename T>
class threadsafeStack
{
public:
    threadsafeStack();
    threadsafeStack(const threadsafeStack&);
    threadsafeStack& operator=(const threadsafeStack&) = delete;

    void push(T new_value);
    std::shared_ptr<T> pop();
    void pop(T& value);
    bool empty() const;
};
By paring down the interface we allow for maximum safety; even operations on the whole stack are restricted. The stack itself can't be assigned, because the assignment operator is deleted, and there's no swap() function.
It can, however, be copied, assuming the stack elements can be copied. The pop() functions throw an empty_stack exception if the stack is empty, so everything still works even if the stack is modified after a call to empty(). As mentioned in the description of option 3, the use of std::shared_ptr allows the stack to take care of the memory-allocation issues and avoid excessive calls to new and delete if desired.
Our five stack operations have now become three: push(), pop(), and empty(). Even empty() is superfluous. This simplification of the interface allows for better control over the data; we can ensure that the mutex is locked for the entirety of an operation. The following listing shows a simple implementation that's a wrapper around std::stack<>.
#include <exception>
#include <memory>
#include <mutex>
#include <stack>

struct empty_stack : std::exception
{
    const char* what() const throw()
    {
        return "empty stack";
    }
};

template<typename T>
class threadsafeStack
{
private:
    std::stack<T> data;
    mutable std::mutex m;

public:
    threadsafeStack() {}

    threadsafeStack(const threadsafeStack& other)
    {
        std::lock_guard<std::mutex> lock(other.m);
        // copy in the constructor body rather than the member initializer list
        // in order to ensure that the mutex is held across the copy
        data = other.data;
    }

    threadsafeStack& operator=(const threadsafeStack&) = delete;

    void push(T new_value)
    {
        std::lock_guard<std::mutex> lock(m);
        data.push(new_value);
    }

    std::shared_ptr<T> pop()
    {
        std::lock_guard<std::mutex> lock(m);
        // check for empty before trying to pop a value
        if (data.empty()) throw empty_stack();
        // allocate the return value before modifying the stack
        std::shared_ptr<T> const res(std::make_shared<T>(data.top()));
        data.pop();
        return res;
    }

    void pop(T& value)
    {
        std::lock_guard<std::mutex> lock(m);
        if (data.empty()) throw empty_stack();
        value = data.top();
        data.pop();
    }

    bool empty() const
    {
        std::lock_guard<std::mutex> lock(m);
        return data.empty();
    }
};
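A short usage sketch of this class (the producer/consumer split here is hypothetical, purely for illustration): several threads can push and pop concurrently, with empty_stack caught whenever a pop() loses the race:

#include <iostream>
#include <thread>

int main()
{
    threadsafeStack<int> s;

    std::thread producer([&s] {
        for (int i = 0; i < 100; ++i)
            s.push(i);
    });

    std::thread consumer([&s] {
        int received = 0;
        while (received < 100)
        {
            try
            {
                std::shared_ptr<int> value = s.pop(); // throws if empty right now
                std::cout << *value << ",";
                ++received;
            }
            catch (const empty_stack&)
            {
                std::this_thread::yield(); // let the producer catch up
            }
        }
    });

    producer.join();
    consumer.join();
    return 0;
}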
This stack implementation is actually copyable: the copy constructor locks the mutex in the source object and then copies the internal stack. We do the copy in the constructor body rather than in the member initializer list in order to ensure that the mutex is held across the copy.
As the discussion of top() and pop() shows, problematic race conditions in interfaces essentially arise because of locking at too small a granularity; the protection doesn't cover the entirety of the desired operation.
Problems with mutexes can also arise from locking at too large a granularity; the extreme situation is a single global mutex that protects all shared data. In a system where there's a significant amount of shared data, this can eliminate any performance benefits of concurrency, because the threads are forced to run one at a time, even when they're accessing different bits of data. The first versions of the Linux kernel that were designed to handle multiprocessor systems used a single global kernel lock. Although this worked, it meant that a two-processor system typically had much worse performance than two single-processor systems, and performance on a four-processor system was nowhere near that of four single-processor systems. There was too much contention for the kernel, so the threads running on the additional processors were unable to perform useful work. Later revisions of the Linux kernel have moved to a more fine-grained locking scheme, so the performance of a four-processor system is much nearer the ideal of four times that of a single-processor system, because there's far less contention.
One issue with fine-grained locking schemes is that sometimes we need more than one mutex locked in order to protect all the data in an operation. As described previously, sometimes the right thing to do is increase the granularity of the data covered by the mutexes, so that only one mutex needs to be locked. However, sometimes that's undesirable, such as when the mutexes are protecting separate instances of a class. In this case, locking at the next level up would mean either leaving the locking to the user or having a single mutex that protected all instances of that class, neither of which is particularly desirable.
If we end up having to lock two or more mutexes for a given operation, there's another potential problem lurking in the wings: deadlock. This is almost the opposite of a race condition: rather than two threads racing to be first, each one is waiting for the other, so neither makes any progress.
Deadlock is the biggest problem with having to lock two or more mutexes in order to perform an operation.
One of the most common ways of avoiding deadlock is to always lock the two mutexes in the same order: if we always lock mutex A before mutex B, then we'll never deadlock. Sometimes this is straightforward, because the mutexes are serving different purposes, but other times it's not so simple, such as when the mutexes are each protecting a separate instance of the same class.
As an example, consider an operation that exchanges data between two instances of the same class. In order to ensure that the data is exchanged correctly, without being affected by concurrent modifications, the mutexes on both instances must be locked. Suppose a fixed order is chosen: say, the mutex for the instance supplied as the first parameter, then the mutex for the instance supplied as the second parameter. This can backfire: all it takes is for two threads to try to exchange data between the same two instances with the parameters swapped, and we have deadlock!
However, the C++ Standard Library has a cure for this in the form of std::lock, a function that can lock two or more mutexes at once without risk of deadlock.
The example below shows how to use std::lock for a simple swap operation.
#include <mutex>
using namespace std;

class MyObjectClass {};

// a stub so the example links; a real version would exchange the objects' internals
void swap(MyObjectClass&, MyObjectClass&) {}

class X
{
private:
    MyObjectClass myObj;
    std::mutex m;

public:
    X(MyObjectClass const& obj) : myObj(obj) {}

    friend void swap(X& lhs, X& rhs)
    {
        // the arguments are checked to ensure they are different instances,
        // because attempting to acquire a lock on a std::mutex
        // we already hold is undefined behavior
        if (&lhs == &rhs)
            return;

        // the call to std::lock() locks the two mutexes
        std::lock(lhs.m, rhs.m);

        // two std::lock_guard instances are constructed, one for each mutex
        std::lock_guard<std::mutex> lock_a(lhs.m, std::adopt_lock);
        std::lock_guard<std::mutex> lock_b(rhs.m, std::adopt_lock);

        swap(lhs.myObj, rhs.myObj);
    }
};
The code checks the arguments to ensure they are different instances, because attempting to acquire a lock on a std::mutex when we already hold it is undefined behavior. (A mutex that does permit multiple locks by the same thread is provided in the form of std::recursive_mutex.) Then the call to std::lock() locks the two mutexes, and two std::lock_guard instances are constructed, one for each mutex. The std::adopt_lock parameter is supplied in addition to the mutex to indicate to the std::lock_guard objects that the mutexes are already locked, and they should adopt ownership of the existing lock on the mutex rather than attempt to lock the mutex in the constructor.
This ensures that the mutexes are correctly unlocked on function exit in the general case where the protected operation might throw an exception; it also allows for a simple return. It's also worth noting that locking either lhs.m or rhs.m inside the call to std::lock can throw an exception; in that case, the exception is propagated out of std::lock. If std::lock has successfully acquired a lock on one mutex and an exception is thrown when it tries to acquire a lock on the other mutex, the first lock is released automatically: std::lock provides all-or-nothing semantics with regard to locking the supplied mutexes.
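As an aside, std::unique_lock (also declared in <mutex>) offers an alternative to the adopt_lock pattern. A sketch of the same friend swap in class X above, written with std::defer_lock so the guards are constructed unlocked and std::lock then locks them both together:

friend void swap(X& lhs, X& rhs)
{
    if (&lhs == &rhs)
        return;

    // construct the guards without locking the mutexes yet
    std::unique_lock<std::mutex> lock_a(lhs.m, std::defer_lock);
    std::unique_lock<std::mutex> lock_b(rhs.m, std::defer_lock);

    // lock both together, still with all-or-nothing semantics
    std::lock(lock_a, lock_b);

    swap(lhs.myObj, rhs.myObj);
}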
Although std::lock can help us to avoid deadlock in those cases where we need to acquire two or more locks together, it doesn't help if they're acquired separately. In that case we have to rely on our discipline as developers to ensure we don't get deadlock. This isn't easy: deadlocks are one of the nastiest problems to encounter in multithreaded code and are often unpredictable, with everything working fine the majority of the time. There are, however, some relatively simple rules that can help us to write deadlock-free code.
Though locks are the most frequent cause of deadlock, it does not just occur with locks. We can create deadlock with two threads and no locks just by having each thread call join() on the std::thread object for the other. In this case, neither thread can make progress because it's waiting for the other to finish.
This simple cycle can occur anywhere that a thread can wait for another thread to perform some action if the other thread can simultaneously be waiting for the first thread, and it isn't limited to two threads: a cycle of three or more threads will still cause deadlock.
The guidelines for avoiding deadlock all boil down to one idea: don't wait for another thread if there's a chance it's waiting for you. The individual guidelines provide ways of identifying and eliminating the possibility that the other thread is waiting for you.
The first idea is the simplest: don't acquire a lock if we already hold one. If we stick to this guideline, it's impossible to get a deadlock from the lock usage alone because each thread only ever holds a single lock. We could still get deadlock from other things such as the threads waiting for each other. However, mutex locks are probably the most common cause of deadlock. If we need to acquire multiple locks, do it as a single action with std::lock in order to acquire them without deadlock.
In cases where we must acquire two or more locks and can't acquire them as a single operation with std::lock, we should acquire them in the same order in every thread. The key is to define the order in a way that's consistent between threads, as in the sketch below.
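One simple convention (a sketch of one possibility, not from the text; the lockInOrder helper is hypothetical) is to order the locks by the addresses of the mutexes; std::less on pointers yields a total order, so every thread agrees which mutex comes first:

#include <functional>
#include <mutex>

// lock two mutexes in a globally consistent order: lower address first
// (assumes a and b are different mutexes)
void lockInOrder(std::mutex& a, std::mutex& b)
{
    if (std::less<std::mutex*>()(&a, &b))
    {
        a.lock();
        b.lock();
    }
    else
    {
        b.lock();
        a.lock();
    }
}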
In some cases, this is relatively easy. For example, look at the thread-safe stack from the race-condition discussion above (option 4): the mutex is internal to each stack instance, but the operations on the data items stored in a stack require calling user-supplied code. We can, however, add the constraint that none of the operations on the data items stored in the stack should perform any operation on the stack itself. This puts the burden on the user of the stack, but it's rather uncommon for the data stored in a container to access that container, and it's quite apparent when this is happening, so it's not a particularly difficult burden to carry.
In other cases, this isn't so straightforward, as with the swap operation in the deadlock discussion. At least in that case we could lock the mutexes simultaneously, but that's not always possible. If we look back at the linked list example, we'll see that one possibility for protecting the list is to have a mutex per node. Then, in order to access the list, threads must acquire a lock on every node they're interested in.
For a thread to delete an item, it must then acquire the lock on three nodes: the node being deleted and the nodes on either side, because they're all being modified in some way. Likewise, to traverse the list a thread must keep hold of the lock on the current node while it acquires the lock on the next one in the sequence, in order to ensure that the next pointer isn't modified in the meantime. Once the lock on the next node has been acquired, the lock on the first can be released because it's no longer necessary.
This hand-over-hand locking style allows multiple threads to access the list, provided each is accessing a different node. However, in order to prevent deadlock, the nodes must always be locked in the same order: if two threads tried to traverse the list in opposite directions using hand-over-hand locking, they could deadlock with each other in the middle of the list. If nodes A and B are adjacent in the list, the thread going one way will hold the lock on node A and try to acquire the lock on node B. A thread going the other way would be holding the lock on node B and trying to acquire the lock on node A: a classic scenario for deadlock. A sketch of hand-over-hand traversal follows.
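A minimal sketch of hand-over-hand traversal (the ListNode type with a per-node mutex is a hypothetical illustration): each next node is locked before the current node's lock is released, and every thread traverses in the same forward direction:

#include <mutex>

struct ListNode
{
    std::mutex m;      // one mutex per node
    int value;
    ListNode* next;
};

void traverse(ListNode* head)
{
    if (!head) return;
    std::unique_lock<std::mutex> lk(head->m);
    for (ListNode* current = head; current; )
    {
        // ... process current->value here, under current's lock ...
        ListNode* next = current->next;  // read the next pointer under the lock
        if (!next) break;
        std::unique_lock<std::mutex> next_lk(next->m); // lock the next node first...
        lk.unlock();                                   // ...then release the current one
        lk = std::move(next_lk);
        current = next;
    }
}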
When deleting node B that lies between nodes A and C, if that thread acquires the lock on B before the locks on A and C, it has the potential to deadlock with a thread traversing the list. Depending on the direction of traversal, such a thread would try to lock either A or C first but would then find that it couldn't obtain a lock on B because the thread doing the deleting was holding the lock on B and trying to acquire the locks on A and C.
One way to prevent deadlock here is to define an order of traversal, so a thread must always lock A before B and B before C. This would eliminate the possibility of deadlock at the expense of disallowing reverse traversal. Similar conventions can often be established for other data structures.
Another guideline: avoid calling user-supplied code while holding a lock. If the code is user supplied, we have no idea what it could do; it could do anything, including acquiring a lock. If we call user-supplied code while holding a lock and that code acquires a lock, we've violated the guideline on avoiding nested locks and could get deadlock. Sometimes this is unavoidable: if we're writing generic code such as the stack in the previous sample, every operation on the parameter type or types is user-supplied code. In this case, we need a new guideline.
Although this is really a particular case of defining lock ordering, a lock hierarchy can provide a means of checking that the convention is adhered to at runtime. The idea is that we divide our application into layers and identify all the mutexes that may be locked in any given layer. When code tries to lock a mutex, it isn't permitted to lock that mutex if it already holds a lock from a lower layer. We can check this at runtime by assigning layer numbers to each mutex and keeping a record of which mutexes are locked by each thread, as in the sketch below.
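A minimal sketch of such a hierarchical mutex (a hand-rolled class, not part of the standard library; this design is modeled on the well-known hierarchical_mutex example from C++ Concurrency in Action). Because it provides lock(), unlock(), and try_lock(), it can be used with std::lock_guard just like std::mutex:

#include <climits>    // ULONG_MAX
#include <mutex>
#include <stdexcept>

class hierarchical_mutex
{
    std::mutex internal_mutex;
    unsigned long const hierarchy_value;     // this mutex's layer
    unsigned long previous_hierarchy_value;
    static thread_local unsigned long this_thread_hierarchy_value;

    void check_for_hierarchy_violation()
    {
        // refuse to lock if this thread already holds a lock
        // from the same or a lower layer
        if (this_thread_hierarchy_value <= hierarchy_value)
            throw std::logic_error("mutex hierarchy violated");
    }
    void update_hierarchy_value()
    {
        previous_hierarchy_value = this_thread_hierarchy_value;
        this_thread_hierarchy_value = hierarchy_value;
    }

public:
    explicit hierarchical_mutex(unsigned long value)
        : hierarchy_value(value), previous_hierarchy_value(0) {}

    void lock()
    {
        check_for_hierarchy_violation();
        internal_mutex.lock();
        update_hierarchy_value();
    }
    void unlock()
    {
        this_thread_hierarchy_value = previous_hierarchy_value;
        internal_mutex.unlock();
    }
    bool try_lock()
    {
        check_for_hierarchy_violation();
        if (!internal_mutex.try_lock()) return false;
        update_hierarchy_value();
        return true;
    }
};

// each thread starts at the top, so it may lock any mutex first
thread_local unsigned long hierarchical_mutex::this_thread_hierarchy_value(ULONG_MAX);

With, say, hierarchical_mutex high(10000) and hierarchical_mutex low(5000), a thread may lock high and then low, but locking low first and then trying to lock high throws std::logic_error at runtime, exposing the ordering violation immediately.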