Every time I write multithreaded code I worry. Perhaps it is because I read stories like the final report on the U.S. / Canadian blackout of August 2003. The problem was a race condition:
"There was a couple of processes that were in contention for a common data structure, and through a software coding error in one of the application processes, they were both able to get write access to a data structure at the same time," says Unum. "And that corruption led to the alarm event application getting into an infinite loop and spinning."
Seems simple, but consider:
"We test exhaustively, we test with third parties, and we had in excess of three million online operational hours in which nothing had ever exercised that bug," says Unum. "I'm not sure that more testing would have revealed that."
Stories like the above, and about the Therac-25 machine which killed three people, always make me worry I’m missing something. I’ve never worked on radiation therapy machines or power grid software, but years ago I did some development for a machine that would take your blood pressure using a real cuff, and I’ll admit to being apprehensive about sticking my arm in for the squeeze.
The other reason I worry is that it is nearly impossible to test for race conditions.
Functional testing and stress testing may catch race conditions with some luck, but these bugs are hard to bring to the surface. Black box testing in general, with no knowledge of the implementation, has little chance to uncover a race condition except as a side-effect.
Unit testing is also of little help. By definition, unit testing focuses on a small and specific piece of code, with no interaction from other areas of the software. In addition, the exact timing required to trigger a race condition has little chance of occurring during a single-run, pass-or-fail test.
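To make the "two writers, one data structure" problem from the blackout quote concrete, here is a minimal Python sketch. The Counter class and the hammer helper are my own hypothetical names, not anything from the report. The unsafe increment splits the read and the write, so a thread switch in between can lose an update; the safe one wraps both steps in a lock.

```python
import threading

class Counter:
    """A shared counter with one racy and one lock-protected increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value       # read
        self.value = current + 1   # write: another thread may have written in between

    def safe_increment(self):
        with self._lock:           # only one thread at a time may read-modify-write
            self.value += 1

def hammer(increment, threads=4, iterations=50_000):
    """Call `increment` from several threads at once and wait for them all."""
    def worker():
        for _ in range(iterations):
            increment()
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()

racy, guarded = Counter(), Counter()
hammer(racy.unsafe_increment)
hammer(guarded.safe_increment)
print("unsafe:", racy.value, "(may or may not reach 200000)")
print("safe:  ", guarded.value)
```

The unsafe total may well come out right on any given run, which is exactly the point: a passing test here proves very little.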
In the end, a thorough code analysis (more sets of eyeballs) is the best tool to uncover race conditions. I remember a recent jaybaz blog entry showing code I’ve seen and written many times now in .NET and I’ve never thought about the potential race condition there. Sigh. Just goes to show race conditions are the most subtle bugs I know of.
I’ve been playing around with the web service API of Reporting Services again and I seem to have stumbled across a little bug. I plan on posting some code soon to show how to programmatically create a subscription – it took much longer to get something to work than I had originally thought and only part of that time was wasted on the bug.
While working on this I downloaded the Reporting Services Documentation Update to help out. I’ve noticed a number of documentation updates for 2004 products released this month (BizTalk 2004 has 1, 2, 3 updates this month). I’m wondering when every product will move to the MSDN style quarterly updates, or something akin to the Office 2003 Online help. Help is easy to update, unlike anti-lock brakes….
If you could pick one brand of vehicle on the road with a “software anomaly” in the braking system, would it be:
a) The Mini Cooper (like Rory Blyth drives)
b) The Cadillac Escalade (like Queen Latifah drives)
My choice would be A, because of course, a Mini would probably sustain more damage in a collision with my mailbox than my mailbox would, but the real vehicle with a brake bug is the Escalade, which could probably drive through the side of my house with no damage to the occupants. That’s 12,329 SUVs to recall.
Ah well, sleep beckons.
Maybe 10 months ago I downloaded Orca to tinker with an MSI file on a setup and installation project. While it was nice to twiddle inside of the MSI tables with Orca (ever read the story behind Orca?), the experience is nothing like Wix, which can decompile an MSI file into XML. This is great, because I can now check installations into a source repository as a text file and do DIFFs if needed.
Of course, you can also compile an XML Wix file into an MSI, which means no more fiddling around in a GUI to make changes in an opaque file for you. For some reason Visual Studio setup projects always worry me because I don’t know exactly what I’m changing when I do a save. I always feel there is something sinister at work in the background.
Even more interesting: the project has been open sourced by Microsoft. It is definitely a strange feeling to go to SourceForge to download MS bits.
That leaves DTS packages as the only other binary pieces under source control that I'd rather see as XML.
At some point, I hope the traditional help viewers, like dexplorer.exe for MSDN local help, can evolve beyond the traditional tree view of categories into a view with the ability to display more information.
For example, it might be useful to have a 3D look where each node could contain more information than just a category or item title. In the diagram below, each node contains some information about what is contained below: category counts, item counts, content ratings (pulled from the online site), and a star if you have a favorite underneath the node. The color coding could indicate where you spend most of your time, or where you have been most recently.
I know this could use a tremendous amount of screen real estate, but judging from Scott Swanson’s blog, most developers have moved well past 1024x768. In fact, I’m starting to wonder if I’ll be the last developer to start using dual 21 inch monitors.
Two great keynotes today. Pat Helland's “Metropolis” talk about SOAs had one of the most thorough analogies comparing the growth of the cities with the growth and future of IT.
We finally got the Whidbey bits during the morning keynote. I hear they are also up for MSDN subscribers to download.
Nothing kicks off a conference like Krispy Kreme doughnuts, Bill Gates, and models dressed as airline stewardesses.
Starting to wind down a bit now. Whidbey has had a few hiccups during this day of presentations but has also gotten lots of applause - particularly for edit and continue, the hands-off HTML designer (no code munging when you switch to designer view), and the self-contained web server (no more FPSE). Everyone is anxious to get a copy of the latest bits tomorrow. Tons of productivity improvements.
The coolest part of BillG's keynote (besides the commercial - hilarious) was the overview of Speech Server. It's hard to believe integrating speech will be this easy.
Eric Lippert on stage now talking about VSTO2.
Been a long day - but looking forward to going to the ballpark tonight and chilling.
Getting Whidbey is great, but who do you have to sleep with to get a Yukon beta???
It is a little late to discuss leap years, since February 29th has already passed, but you never see what goes awry in any given leap year until after the magical date has passed.
CNN has an article about a leap year related glitch in the Pontiac Grand Prix. The display shows the wrong day of the week now, and the possible solutions range from resetting the software to completely replacing the display unit.
Have you ever wondered how many hours modern civilization has spent wrestling with the concept of a leap year? It’s the classic case of a business rule with a twist. Your unit tests look pretty simple when the analyst says “always place the order with the supplier who has the lowest price”. Then suddenly, almost as an afterthought, “oh, except on the last Friday of each month, then we need to place the order with the preferred supplier”. Suddenly, the test cases quadruple.
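Here is a rough Python sketch of that supplier rule, just to show how the "afterthought" multiplies the cases you have to test. The function names, supplier names, and prices are all made up for illustration.

```python
from datetime import date, timedelta

def is_last_friday_of_month(day):
    """True when `day` is a Friday and the next Friday lands in another month."""
    return day.weekday() == 4 and (day + timedelta(days=7)).month != day.month

def choose_supplier(day, quotes, preferred):
    """Pick the cheapest supplier, except on the last Friday of the month.

    `quotes` maps a supplier name to its price; `preferred` is a supplier name.
    """
    if is_last_friday_of_month(day):
        return preferred
    return min(quotes, key=quotes.get)

quotes = {"Acme": 9.50, "Globex": 10.25}  # hypothetical suppliers and prices
print(choose_supplier(date(2004, 3, 19), quotes, preferred="Globex"))  # an ordinary Friday
print(choose_supplier(date(2004, 3, 26), quotes, preferred="Globex"))  # last Friday of March 2004
```

The first version of the rule needed a couple of tests around price. Now every one of those needs a date, and the dates need their own edge cases: months with four Fridays, months with five, and February in a leap year.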
So in coming up with an estimate for hours lost to Leap Year, I thought of all the coders over the years in RPG, COBOL, Fortran, and assembly, and the myriad of languages where the leap year algorithm has been implemented time and time again by everyone from seasoned professional to college student.
Then consider the number of times a software leap year algorithm has been defective. We know older versions of Lotus 1-2-3 and Excel both had the bug, but even new stuff like Grand Prix cars and personal video recorders appears to hiccup. Sometimes the leap year bug poses problems in obscure places (why is my Kerberos authentication failing this year?). Other times, just trying to deal with the quirky date in a thorough manner gives you unexpected side effects, like blogging software with only 29 days in January.
Working in .NET, we all know there is a static method on the System.DateTime class with the name IsLeapYear. Problem solved? Hardly – but the mention of DateTime keeps the post relevant to .NET.
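For the record, the rule itself fits in one line. The sketch below is in Python rather than .NET, with the standard library's calendar.isleap standing in for DateTime.IsLeapYear as a cross-check; the is_leap_year function is my own hand-rolled version. The century years are where the hand-rolled versions tend to go wrong, and 1900 is the one the old spreadsheets are known for.

```python
import calendar

def is_leap_year(year):
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for year in (1900, 2000, 2003, 2004, 2100):
    # Cross-check the hand-written rule against the standard library.
    assert is_leap_year(year) == calendar.isleap(year)
    print(year, is_leap_year(year))
```

A naive "divisible by 4" check would pass for 2000, 2003, and 2004, and quietly get 1900 and 2100 wrong.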
Leap year isn’t just a software issue. Consider the company that pays salaried employees every Friday. This year, thanks to the leap day, there are 53 Fridays instead of the typical 52, a leap-day side effect we won’t see again until 2032. As a well compensated CEO, you need to spend many hours with other executives to answer the following multiple choice question:
In order to reconcile the extra pay day in 2004, do we
a) Decrease the weekly pay of all salaried employees to compensate for the extra pay period, with a possible loss of morale.
b) Keep the weekly pay the same for all 53 weeks, in effect giving ~ 2% bonus.
c) Switch to bi-monthly paychecks.
d) Outsource all salaried employees below the VP level and sublease the office space.
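Before the executives book the conference room, the Friday count and the size of the "bonus" in option (b) are easy to check. This is a quick Python sketch; fridays_in_year is my own helper name. Note that any year starting on a Friday gets 53 Fridays with or without a leap day, so the extra payday turns up more often than the leap-day case alone.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() counts from Monday = 0

def fridays_in_year(year):
    """Count the Fridays that fall within the given calendar year."""
    day = date(year, 1, 1)
    # Move forward to the first Friday of the year.
    day += timedelta(days=(FRIDAY - day.weekday()) % 7)
    count = 0
    while day.year == year:
        count += 1
        day += timedelta(weeks=1)
    return count

years_with_53 = [y for y in range(2004, 2033) if fridays_in_year(y) == 53]
print(years_with_53)
print(f"Extra pay if the weekly amount stays the same: {1 / 52:.1%}")
```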
Even PhD candidates sometimes forget about leap years, and this leads to increased global warming.
Finally, when considering an estimate, think of the 4.1 million people worldwide who have been born on Leap Day. Think of the hours they and their parents have agonized over when to celebrate the birthday during non leap years. Consider the poor fellow in Ann Arbor who was born on leap day, and carries a driver’s license that expired on February 29, 2003 – try entering that date into a rental car system.
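The driver's license case is a nice one to try. In this Python sketch (parse_date and birthday_in are hypothetical helpers of mine), the date library simply refuses February 29, 2003, which is presumably what the rental car system does too. The birthday fallback to February 28 is only one convention; March 1 is the other common choice.

```python
from datetime import date

def parse_date(year, month, day):
    """Return a date, or None when the calendar has no such day."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

def birthday_in(year, born):
    """Birthday observed in `year`; leap-day babies fall back to Feb 28 (an assumption)."""
    return parse_date(year, born.month, born.day) or date(year, 2, 28)

print(parse_date(2004, 2, 29))                  # a real day
print(parse_date(2003, 2, 29))                  # None: 2003 had no February 29
print(birthday_in(2005, date(2004, 2, 29)))     # falls back to February 28
```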
So let’s make a rough estimate. 6 billion people in the world. On average, let’s say 15 minutes of each person’s life is devoted to resolving Leap Year issues. That’s 1.5 billion hours, or roughly 170,000 years, the current population has lost. Not that we can really do anything about the situation, because changing the calendar usually requires an act from either the Pope or an emperor, so your local congressperson stands little chance. In software it will just be one of those business rules in the “hard requirement” category.
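Working the estimate through in Python, using only the numbers above (6 billion people, 15 minutes each) plus the assumption that one day in every 1,461 is a leap day:

```python
PEOPLE = 6_000_000_000
MINUTES_EACH = 15

hours = PEOPLE * MINUTES_EACH / 60
years = hours / (24 * 365.25)          # average year length, leap days included

# One day in every four-year cycle (4 * 365 + 1 = 1461 days) is a leap day.
leap_day_births = PEOPLE / (4 * 365 + 1)

print(f"{hours:,.0f} hours lost")
print(f"about {years:,.0f} years")
print(f"about {leap_day_births / 1e6:.1f} million leap-day birthdays")
```

The hours come to 1.5 billion, which works out to something over 170,000 years, and the birthday figure lands close to the 4.1 million quoted earlier.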