What's Wrong With This Code (#10)

Tuesday, January 9, 2007

This time, Joe Developer thinks he found a bug in the .NET framework. Joe read that the volatile modifier is "usually used for a field that is accessed by multiple threads without using the lock Statement". Joe wants to try this out, and writes the following class.

class Worker
{
    
public void Start()
    {
        queue =
new Queue<string>();

        
Thread[] threads = new Thread[maxThreads];
        
for (int i = 0; i < threads.Length; i++)
            threads[i] =
new Thread(PopulateQueue);

        
Array.ForEach(threads, delegate(Thread t) { t.Start(); });
        
Array.ForEach(threads, delegate(Thread t) { t.Join(); });
        
        
Debug.Assert(queue.Count == maxThreads * maxIterations);
    }

    
void PopulateQueue()
    {
        
for (int i = 0; i < maxIterations; i++)
        {
            queue.Enqueue(
"foo");
        }
    }

    
volatile Queue<string> queue;

    
const int maxThreads = 5;
    
const int maxIterations = 1000000;
}

This code ran successfully at least a dozen times, then suddenly blew up with the following exception:

System.ArgumentException was unhandled
Message="Destination array was not long enough.
Check destIndex and length, and the array's lower bounds."
Source="mscorlib"
ParamName=""
StackTrace:
at System.Array.Copy ...
at System.Collections.Generic.Queue`1.SetCapacity ...
at System.Collections.Generic.Queue`1.Enqueue ...
at Worker.PopulateQueue ...
...

Joe thinks something has gone terribly wrong in the CLR, and for once, Joe would like to show his boss a problem in someone else's software. What should his boss think?


Comments
Haacked Tuesday, January 9, 2007
His boss should think that Joe is an idiot. ;)

I kid.

The volatile keyword just tells the compiler to not perform any optimizations such as reordering reads and writes.

It still doesn't prevent concurrency problems such as deadlocks and race conditions from happening.

For reference type, access to the reference is made volatile, but its internal access (such as the internal array encapsulated by the Queue) is not accessed in a volatile manner.
Patrick Tuesday, January 9, 2007
It looks a concurrency problem occuring in the internal implementation of the queue object. Looking at msdn we can see that queue is not thread safe unless the member is public (and in this case it isn't).
[From MSDN: Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe.]

Possible solutions:

[a] lock the queue object at each enque (a lot of lock are used here but all workers can add simultaneously)
[b] lock the queu object in each worker for the whole for loop that enqueues elements (very few locks but only one worker at a time enqueues objects)

since there is no slow operation in the enqueue operation the quickest solution for this example should be [b]

[a] for (int i = 0; i < maxIterations; i++)
lock(queue.SyncRoot) {
queue.Enqueue("foo");
}

[b] lock(queue.SyncRoot) {
for (int i = 0; i < maxIterations; i++)
queue.Enqueue("foo");
}


McBerrs Tuesday, January 9, 2007
His boss should tell Joe to learn more about multithreading. The docs clearly say that Queue isn't thread safe in this context. Volatile isn't needed here since the variable is written only once, before the threads even start.

Multithreaded programming is really cool, because it's so difficult to understand all aspects of it. And the fact that multithreading is being pushed into the public now scares me...
Laurent :) Tuesday, January 9, 2007
Taking a look in msdn documentation about the Queue generic object learnt me that it is not thread safe when the collection is modified.

Furthermode, reading a little about using Volatile variable told me that it does not ensure
"magic" safe-threading. It simply change the way the compiler optimize it, knowing it wont be accessed by a single thread. It must be use "when single atomic operation can be performed to modify data member. If however this data member is a class, struct or array, accessing it with multiple thread would likely result in intermittent data corruption."

That's why I think the PopulateQueue function should contains a "lock" statement that enclose the for-next loop.

I love those "what wrong with code". Even if I am wrong often, I am learning a lot anyway, thank you very much !
zproxy Tuesday, January 9, 2007
No, you are wrong.

volatile: This ensures that the most up-to-date value is present in the field at all times.

What you did is you violated thread safety.

The Queue<T> is not thread safe.

See: msdn2.microsoft.com/en-us/library/7977ey2c.aspx
msdn2.microsoft.com/en-us/library/x13ttww7.aspx
Skup Tuesday, January 9, 2007
Joe did not understand what the documentation stated with ""usually used for a field that is accessed by multiple threads without using the lock Statement".


The volatil keyword just apply to the queue reference in this case. It means that the queue reference will not be cached to processor registers (by the optimizer), so that every read/write of this reference from different threads are effective.

But it does not induce that it provides any sync for the Queue object methods.

Without synchronization mechanism, you can expect this exception. It's not a bug, just read the doc better, joe.
Michel Tuesday, January 9, 2007
volatile is only valid for basic types.
Mike Tuesday, January 9, 2007
In lack of better words, this is taken from the MSDN Library:

"The use of volatile with the _shouldStop data member allows us to safely access this member from multiple threads without the use of formal thread synchronization techniques, but only because _shouldStop is a bool. This means that only single, atomic operations are necessary to modify _shouldStop. If, however, this data member were a class, struct, or array, accessing it from multiple threads would likely result in intermittent data corruption. Consider a thread that changes the values in an array. Windows regularly interrupts threads in order to allow other threads to execute, so this thread could be halted after assigning some array elements but before assigning others. This means the array now has a state that the programmer never intended, and another thread reading this array may fail as a result."
scott Tuesday, January 9, 2007
I think everyone nailed it pretty well.

The volatile modifier doesn't prevent a race condition in the queue. The queue was calling Array.Copy to "expand" into a bigger array and allow more items to fit inside. At the same time, another thread changed the queue by trying to insert a string, and the queue's internal state became corrupt.

A lock statement inside PopulateQueue would fix the problem by only permitting a single thread at a time to modify the queue.

Grim Tuesday, January 9, 2007
As stated by everyone else.

But, in this case it should also be pointed out that since it's a reference type that's been declared volatile, it's actually the reference that is volatile, not the queue object itself.
Comments are now closed.
by K. Scott Allen K.Scott Allen
My Pluralsight Courses
The Podcast!