Four years later, a retrospective

Welcome back!  I’ve joined a few of my coworkers in a weekly blog challenge, to write a new post each week.  To start, I figured I should finally write the post I meant to write four years ago, about a small piece of code that saved our rear ends from a huge issue plaguing our production environment. 

Right about this time four years ago, our busy season was just beginning, and we had recently deployed our custom-built phone call distributor, Complemax.  We started seeing an issue so severe that Complemax effectively stopped routing calls in our call center.  Complemax works by building a grid of phone calls and agents and scoring each match.  Then the calls are routed to the best-matching agents.  After digging through logs, we found that Complemax was processing messages fine: calls were being added and removed as they went on hold or hung up, and employees were updating as they came online or changed their status.  But the grid wasn’t being scored!

So now we knew what was failing, but what was the cause?  After analyzing memory dumps, digging through the code, and learning more about low-level debugging than I ever had before, we finally found that the issue was caused by our threading.  The grid makes use of a ReaderWriterLockSlim as its state is updated.  Every five seconds, a read lock would be taken so the current matches could be scored, during which pending updates would wait a short time.  Unfortunately, that lock prioritizes writes over reads, so our all-important grid calculation thread was starved by the large number of updates coming in as we received more and more phone calls.

How would we prioritize the read lock, to freeze the grid while we looped through it?  The solution needed to be simple, so we could fix the issue in production as fast as possible.  That’s when I came up with the idea of a “dual” reader-writer lock: use two locks on top of each other to give the less common read operation higher priority.  When an update is needed, take a read lock on the first lock and then a write lock on the second lock.  When the grid needs to be frozen, take a write lock on the first lock.  Because that lock favors writers, pending “reads” on it will wait, preventing more updates to the grid.
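Here is a minimal sketch of the idea, not our production code.  It is written in Java, with ReentrantReadWriteLock standing in for .NET’s ReaderWriterLockSlim, and the DualLock class and its update and freeze methods are names I made up for illustration.  I’m also assuming fair mode on the first lock, so that newly arriving updaters queue behind a waiting freeze.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical "dual" lock: two reader-writer locks layered so that the
// rare "freeze" operation takes priority over the frequent updates.
final class DualLock {
    // First lock (the gate). Updaters share it as readers; the freezer takes
    // it as a writer. Fair mode (an assumption of this sketch) makes newly
    // arriving readers block while a writer is waiting.
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock(true);

    // Second lock. Each update takes it as a writer, so updates still
    // run one at a time against the shared state.
    private final ReentrantReadWriteLock state = new ReentrantReadWriteLock();

    // Frequent operation: read lock on the gate, then write lock on the state.
    void update(Runnable change) {
        gate.readLock().lock();
        try {
            state.writeLock().lock();
            try {
                change.run();
            } finally {
                state.writeLock().unlock();
            }
        } finally {
            gate.readLock().unlock();
        }
    }

    // Rare operation: write lock on the gate. Once it is granted, no update
    // is in progress and none can start until the scan finishes.
    <T> T freeze(Supplier<T> scan) {
        gate.writeLock().lock();
        try {
            return scan.get();
        } finally {
            gate.writeLock().unlock();
        }
    }
}
```

Updates would call update(...) with the change to apply, and the scoring thread would wrap its loop over the grid in freeze(...).  The trick is that the scorer is now the “writer” on the first lock, so the lock’s own writer preference works for it instead of against it.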

I’m sure if we had caught the issue before it reached production, we would’ve ended up using something other than a ReaderWriterLockSlim in the first place, but surprisingly in the four years since we added the “dual” lock, we haven’t changed it.  It’s simple, it works, and it handles the load thrown at it.  It was certainly a good learning experience, one I’m glad I had once, but not one I’d ask for.

Book Review: CLR via C#, Second Edition

I recently volunteered to write a book review for Neumont’s quarterly newsletter, so I figured I’d post it here as well:

I don’t read many technical books. In fact, CLR via C#, Second Edition is the first one I’ve read completely from start to finish. It was recommended to me by my coworkers as one of the most useful reference books to have on hand. The book’s author, Jeffrey Richter, is co-founder of the consulting firm Wintellect, and has consulted for Microsoft on various projects, including development of the .NET Framework. He covers all of the core parts of the CLR (Common Language Runtime), on which the framework operates, while providing examples all along the way. The book itself is focused on detailing how the CLR functions, rather than explaining all of the features of C#. Instead, C# is used simply to illustrate how the CLR works.

Richter covers the organization of the type system into assemblies and modules, explaining the metadata and IL that is generated when code is compiled. He devotes several chapters to all aspects of classes, including fields, methods, properties, events, and more. The rest of the book covers all of the main parts of the .NET Framework, including generics, exception handling, garbage collection, reflection, and threading. In every chapter Richter offers advice about best practices to help developers avoid common programming mistakes and write fast, efficient code.

Unfortunately, Richter recently announced that he won’t be updating the book to account for the releases of .NET 3.0 and 3.5. His reason is that .NET 3.0 and 3.5 still run on the same version of the CLR, which is the main focus of the book, but there are places in the book that would benefit from an update. For example, in his chapter on threading Richter explains the problems in the ReaderWriterLock class, and why it should never be used. .NET 3.5 includes a new ReaderWriterLockSlim class that fixes those problems. It would be worth mentioning the new class in the book. At least Richter plans to release a future edition when a new version of the CLR is released.

Despite this drawback, it is very easy to see why so many people think this is one of the most important books for every .NET developer. Having this knowledge of the CLR has certainly helped me write better code, and I still refer to it all the time. If you haven’t read this book yet, and have any plans at all to develop with the .NET Framework, this book should be the first one you read.