Language Lawyers (and Nasal Demons)

Friday, July 30, 2004 by scott

As soon as a development language has an official specification behind it, the first language lawyer appears. If you’ve never seen one in action, they are the type who can rattle off section and paragraph numbers from a specification while explaining the bizarre behavior of some code snippet. You might see them write something along the lines of:

The values of a and b are automatically promoted back to int [6.2.1.1/ 3.2.1.1], which is then the type of the result of the / operator [6.2.1.5/3.2.1.5]. Since -32768 is exactly divisible by -1, and the result of 32768 is representable in the type int, the division must yield this value.

Nearly every language with a specification will describe situations with “undefined behavior”. Undefined behavior means there is something so horribly screwed up in the code, or in the data, that the program is probably going to crash spectacularly. Except specifications never use the word ‘crash’. Many of them, including the C# spec, will use the more dignified phrase: “unpredictable results”.

Now language lawyers, not really being lawyers, have a sense of humor. Since the phrase “unpredictable results” leaves the outcome so open-ended, they will say undefined behavior could, in fact, lead to demons flying out of your nostrils.

The first apparent reference to demons and noses came in a 1992 post to the comp.std.c newsgroup. The post contains many paragraph and section number references to the C language specification, but ends with this little gem:

In short, you can't use sizeof() on a structure whose elements haven't been defined, and if you do, demons may fly out of your nose.

The phrase caught on.

In the mid-90s I was maintaining a C code base intended for both an 8-bit Hitachi chip and the PC, so I was interested in writing portable ANSI C programs and followed the comp.std.c and comp.lang.c newsgroups. I don’t know how many times someone would come along and post code which started like this:

void main (void)

As soon as the above line appeared, the language lawyers appeared in force. You could look at the code, and see it was only one semicolon away from curing world hunger, but it didn’t matter, because the response was invariably something like:

The Standard says that declaring main as returning void invokes undefined behavior. Beware the nasal demons.

Although sometimes the lawyers would make other amusing comments:

if you write:
void main (void) 
it's *possible* (under the rules of C) that your computer will pogo round the room whistling the star spangled banner. It's very unlikely, but C isn't interested in "likely".


Whenever I was having a really bad day, I’d write a program with main returning void just to see what would happen. I tried it with the Microsoft C compiler, the Borland C compiler, the Intel C compiler, and the now-defunct Boston Systems Office C compiler. I was hoping to create a horrible, slimy demon that I could train to stand at my office door and terrify people trying to enter with requirements documentation in hand. Not only did I never get a demon, the programs always worked without any errors. Shows you what the language lawyers know.

In any case, I’m hoping to find some good examples of undefined behavior in C# or VB so I can try to invoke nasal demons from .NET. I imagine it will be much harder to do with a managed runtime in place, but if you know of any opportunities, please let me know.

Host Header Tips

Thursday, July 29, 2004 by otcnews

Bernard Cheah answers questions about using host headers in IIS.

Static Constructors Are Not Miracle Workers

Thursday, July 29, 2004 by scott

The CLR guarantees a type initializer (static constructor) will execute only once for any given type. This simplifies my life, because I don’t need to worry about multiple threads inside of a static constructor. Take for example the following program:

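using System;
using System.Threading;
 
class Program
{
   static void Main(string[] args)
   {
      // Both threads run the same method, so both will try to touch Foo.
      Thread threadA = new Thread(new ThreadStart(TouchFoo));
      threadA.Name = "Thread A";
 
      Thread threadB = new Thread(new ThreadStart(TouchFoo));
      threadB.Name = "Thread B";
 
      threadA.Start();
      threadB.Start();
 
      // Wait for both threads to finish before exiting.
      threadA.Join();
      threadB.Join();
   }
 
   static void TouchFoo()
   {
      // The first call to a static member triggers Foo's static constructor.
      Foo.SayHello();
   }
}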
 
class Foo
{
   static Foo()
   {
      Thread.Sleep(1000);
      Console.WriteLine("Foo .cctor on thread {0}", Thread.CurrentThread.Name);
   }
 
   static public void SayHello()
   {
      Console.WriteLine("Hello From Foo on thread {0}", Thread.CurrentThread.Name);
   }
}

In this program, Thread A will probably get the first chance to execute TouchFoo. In doing so, the CLR recognizes the static constructor for Foo has not been executed and sends Thread A into the constructor, which puts the thread to sleep for 1000 ms. The sleeping period allows Thread B to run. Thread B arrives inside of TouchFoo, and again the CLR recognizes the static constructor for Foo has not completed executing. This time however, the CLR knows somebody is working inside of the constructor, and it blocks Thread B until Thread A finishes Foo’s constructor. Typically then, this program would produce the following output:

Foo .cctor on thread Thread A
Hello From Foo on thread Thread A
Hello From Foo on thread Thread B

No matter how many threads we throw into the mix, the CLR will only allow one thread inside of the static constructor. Of course, this isn’t magic: the CLR has to take some thread synchronization (a lock) before entering the static constructor.
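A rough way to check the "only once" claim is to count how many times a static constructor body actually runs while several threads race to touch the type. This is only a sketch: the names Counted, CctorOnceDemo, and Run are made up for illustration, and the Sleep is there just to give the other threads time to pile up.

```csharp
using System;
using System.Threading;

static class Counted
{
    // Incremented inside the static constructor; it stays at 1 if the
    // CLR runs the type initializer only once.
    public static int InitCount;

    static Counted()
    {
        Interlocked.Increment(ref InitCount);
        Thread.Sleep(100); // widen the window in which other threads arrive
    }

    // Calling any static member forces the type to be initialized first.
    public static void Touch() { }
}

static class CctorOnceDemo
{
    // Starts threadCount threads that all touch Counted, waits for them,
    // and returns how many times the static constructor ran.
    public static int Run(int threadCount)
    {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(new ThreadStart(Counted.Touch));
            threads[i].Start();
        }
        foreach (Thread t in threads)
        {
            t.Join();
        }
        return Counted.InitCount;
    }

    static void Main()
    {
        Console.WriteLine("Static constructor ran {0} time(s)", Run(8));
    }
}
```

With eight threads the count should still come back as 1, since every thread after the first has to wait for the initializer to finish.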

Where there are locks, there is the possibility of a deadlock. So what happens if we try to put the runtime into a bind by feeding it code that should deadlock? Do you think the following program will execute, or will I need to kill it from task manager?

using System;
using System.Threading;
 
class Class1
{
   [STAThread]
   static void Main(string[] args)
   {
      Thread threadA = new Thread(new ThreadStart(TouchFoo));
      threadA.Name = "Thread A";
      
      Thread threadB = new Thread(new ThreadStart(TouchBar));
      threadB.Name = "Thread B";
 
      threadA.Start();
      threadB.Start();
 
      threadA.Join();
      threadB.Join();
   }
 
   static void TouchFoo()
   {
      string s = Foo.Message;
   }
 
   static void TouchBar()
   {
      string s = Bar.Message;
   }
}
 
class Foo
{
   static Foo()
   {
      Console.WriteLine("Begin Foo .cctor on thread {0}", Thread.CurrentThread.Name);
      Thread.Sleep(5000);
      Console.WriteLine("Foo has a message from Bar: {0}", Bar.Message);         
      message = "Hello From Foo";
      Console.WriteLine("Exit Foo .cctor on thread {0}", Thread.CurrentThread.Name);
   }
 
   static public string Message
   {
      get { return message; }
   }
 
   static string message = "blank";
}
 
class Bar
{
   static Bar()
   {
      Console.WriteLine("Begin Bar .cctor on thread {0}", Thread.CurrentThread.Name);
      Thread.Sleep(5000);
      Console.WriteLine("Bar has a message from Foo: {0}", Foo.Message);         
      message = "Hello From Bar";
      Console.WriteLine("Exit Bar .cctor on thread {0}", Thread.CurrentThread.Name);
   }
 
   static public string Message
   {
      get { return message; }
   }
 
   static string message = "empty";    
}

Notice in this example, the static constructor for Foo references Bar’s Message property, and vice versa. This produces the following scenario:

Thread A starts and eventually enters the static constructor for Foo. After writing to the screen, the thread goes to sleep for 5000 ms. Thread B now has a chance to run and eventually enters the constructor for Bar, prints a message and goes to sleep.

Next, Thread A wakes up and finds it needs information from an un-initialized Bar, but it cannot run the constructor because Thread B has it locked. Thread B awakes and needs information from Foo, but it cannot run the Foo constructor because Thread A has it locked. Thread A needs the lock held by Thread B, and Thread B needs the lock held by Thread A. Classic deadlock!

As it turns out, the CLI specification addresses this issue in section 9.5.3.3 of Partition II, entitled “Races and Deadlocks”:

Type initialization alone shall not create a deadlock unless some code called from a type initializer (directly or indirectly) explicitly invokes blocking operations.

So if we don’t have a deadlock, what happens? Here is the output on my machine:

Begin Foo .cctor on thread Thread A
Begin Bar .cctor on thread Thread B
Bar has a message from Foo: blank
Exit Bar .cctor on thread Thread B
Foo has a message from Bar: Hello From Bar
Exit Foo .cctor on thread Thread A

The message Bar retrieves from Foo is “blank”, but Foo was supposed to initialize the message to “Hello From Foo”. The runtime allowed Thread B to reference Foo before Foo’s static constructor completed! It could have just as easily allowed Thread A to reference Bar before Bar’s static constructor completed, but by letting just one of the threads through, a lock became free and the deadlock was avoided. The runtime cannot perform miracles and let both the constructors run to completion.
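The same early peek at a half-built type can be reproduced without any threads, which makes it easier to experiment with. The sketch below uses two made-up types, Ping and Pong, whose static constructors read each other's Message. On a single thread the runtime does not block when a type's initializer is already in progress on that same thread, so the second constructor sees whatever value the field initializer left behind.

```csharp
using System;

class Ping
{
    static string message = "blank";   // field initializer runs before the constructor body
    public static string SeenFromPong; // what Ping read from Pong during initialization

    static Ping()
    {
        SeenFromPong = Pong.Message;    // triggers Pong's static constructor
        message = "Hello From Ping";
    }

    public static string Message
    {
        get { return message; }
    }
}

class Pong
{
    static string message = "empty";
    public static string SeenFromPing; // what Pong read from Ping during initialization

    static Pong()
    {
        // Ping's constructor is already running on this thread, so the
        // runtime lets us through and we read the not-yet-final value.
        SeenFromPing = Ping.Message;
        message = "Hello From Pong";
    }

    public static string Message
    {
        get { return message; }
    }
}

static class PartialInitDemo
{
    static void Main()
    {
        string first = Ping.Message;   // touching Ping first starts the chain
        Console.WriteLine("Ping.Message      = {0}", first);
        Console.WriteLine("Pong saw Ping as  = {0}", Pong.SeenFromPing);
        Console.WriteLine("Ping saw Pong as  = {0}", Ping.SeenFromPong);
    }
}
```

When Ping is touched first, Pong records “blank” for Ping's message while Ping sees the finished “Hello From Pong”. Touch Pong first and the roles flip.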

The moral of the story is: never touch the static member of another type inside of a static constructor.
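One possible way to follow that advice is to push the cross-type read out of type initialization and into something that runs later, on first use. The sketch below does this with Lazy<T>, which only arrived in .NET 4, well after this post was written, so treat it as one option rather than the fix. Alpha, Beta, and Greeting are hypothetical names.

```csharp
using System;

class Alpha
{
    static readonly string message = "Hello From Alpha";

    // Deferred: the lambda runs on the first read of Greeting,
    // not while Alpha itself is being initialized.
    static readonly Lazy<string> greeting =
        new Lazy<string>(() => "Alpha heard: " + Beta.Message);

    public static string Message
    {
        get { return message; }
    }

    public static string Greeting
    {
        get { return greeting.Value; }
    }
}

class Beta
{
    static readonly string message = "Hello From Beta";

    // Same idea in the other direction.
    static readonly Lazy<string> greeting =
        new Lazy<string>(() => "Beta heard: " + Alpha.Message);

    public static string Message
    {
        get { return message; }
    }

    public static string Greeting
    {
        get { return greeting.Value; }
    }
}

static class DeferredReadDemo
{
    static void Main()
    {
        Console.WriteLine(Alpha.Greeting);
        Console.WriteLine(Beta.Greeting);
    }
}
```

Neither type reads the other while it is initializing, so each one is fully built by the time its Message is requested. This only works because Message itself does not depend on the other type; two lazy values that call each other would just move the cycle somewhere else.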

C++ programmers knew this was a bad scenario, although for slightly different reasons. See the C++ FAQ for the “static initialization order fiasco”. The punishment for C++ programmers who do this is to take a job at a fast food restaurant, but then hard-core C++ devs are pretty heavy-handed (the topic for a future post).