Rhino.ServiceBus 3.0 Delivery Options

This post delves into the details of a feature I contributed to Rhino.ServiceBus, available in the recently released version 3.0.  Previously, you were able to add custom headers to outgoing messages by registering an implementation of ICustomizeMessageHeaders with the container.  That interface has been renamed to ICustomizeOutgoingMessages, and now allows you to also set a DeliverBy and/or MaxAttempts.  You can register any number of implementations with the container, and the bus will invoke each one as it builds outgoing messages.
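
A customization is just a class implementing ICustomizeOutgoingMessages that you register with the container.  Roughly, a minimal sketch (the Customize method signature, the namespaces, and the Windsor-style registration are written from memory, so treat them as assumptions):

    using System;
    using Rhino.ServiceBus;   // exact namespaces may differ

    // Stamps every outgoing message batch with an extra header.
    public class StampMachineName : ICustomizeOutgoingMessages
    {
        public void Customize(OutgoingMessageInformation messageInformation)
        {
            messageInformation.Headers["x-sent-from"] = Environment.MachineName;
        }
    }

    // With Castle Windsor (any supported container works):
    // container.Register(Component.For<ICustomizeOutgoingMessages>()
    //                             .ImplementedBy<StampMachineName>());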

Rhino.Queues has full support for DeliverBy and MaxAttempts.  If either option is set and the message can’t be delivered before the deadline or within the allowed number of attempts, it is moved to the OutgoingHistory and marked as a failed transmission.

MSMQ supports DeliverBy by setting the TimeToReachQueue property on the MSMQ message.  MaxAttempts is only supported when set to 1, which sets TimeToReachQueue to TimeSpan.Zero, meaning MSMQ only makes one attempt to deliver the message.  If MaxAttempts is set to a value other than 1, an InvalidUsageException is thrown.
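
In other words, the MSMQ transport’s mapping boils down to something like the following.  This is a paraphrase of the behavior described above, not the actual transport source, and the parameter shapes are assumptions:

    using System;
    using System.Messaging;   // classic MSMQ API: System.Messaging.Message

    static class MsmqDeliveryOptions
    {
        // The real transport throws Rhino.ServiceBus's InvalidUsageException;
        // InvalidOperationException stands in for it here.
        public static void Apply(Message msmqMessage, DateTime? deliverBy, int? maxAttempts)
        {
            if (deliverBy.HasValue)
                msmqMessage.TimeToReachQueue = deliverBy.Value - DateTime.Now;

            if (maxAttempts.HasValue)
            {
                if (maxAttempts.Value != 1)
                    throw new InvalidOperationException("MSMQ only supports MaxAttempts = 1");

                msmqMessage.TimeToReachQueue = TimeSpan.Zero;   // exactly one delivery attempt
            }
        }
    }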

Because it’s up to you to create the implementation of ICustomizeOutgoingMessages, you can set delivery options in whatever way works best for you, whether that’s setting options for specific message types or specific destinations, or reflecting over messages to look for particular attributes or interfaces.

Examples:

If you want to limit how many times the bus tries delivering messages to endpoints that aren’t always online, you could add a query parameter to the endpoint and then customize using the Destination on the OutgoingMessageInformation.
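
A minimal sketch of that idea (the maxDeliveryAttempts parameter, the endpoint URI, and the class name are made up for illustration, and I’m assuming Destination exposes the endpoint’s Uri):

    using System.Web;            // HttpUtility.ParseQueryString
    using Rhino.ServiceBus;      // exact namespaces may differ

    // Endpoints that aren't always online carry a query parameter, e.g.
    //   rhino.queues://branch-kiosk:2200/orders?maxDeliveryAttempts=3
    public class LimitAttemptsForFlakyEndpoints : ICustomizeOutgoingMessages
    {
        public void Customize(OutgoingMessageInformation messageInformation)
        {
            var query = HttpUtility.ParseQueryString(messageInformation.Destination.Uri.Query);

            int maxAttempts;
            if (int.TryParse(query["maxDeliveryAttempts"], out maxAttempts))
                messageInformation.MaxAttempts = maxAttempts;
        }
    }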


Maybe you have a message that’s only useful for a short period of time.
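
Again a sketch; the IExpireQuickly marker interface is hypothetical, standing in for however you identify your short-lived message types:

    using System;
    using System.Linq;
    using Rhino.ServiceBus;      // exact namespaces may differ

    // Hypothetical marker for messages that are useless after a few seconds.
    public interface IExpireQuickly { }

    public class ExpireShortLivedMessages : ICustomizeOutgoingMessages
    {
        public void Customize(OutgoingMessageInformation messageInformation)
        {
            // If anything in this send is short-lived, don't bother delivering it late.
            if (messageInformation.Messages.OfType<IExpireQuickly>().Any())
                messageInformation.DeliverBy = DateTime.Now.AddSeconds(30);
        }
    }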


Using the OutgoingMessageInformation instance, you can inspect the Messages being sent, the Destination endpoint, and the Source endpoint to make any customizations you need.  You can modify the Headers collection on that instance, set MaxAttempts, or set DeliverBy.


Four years later, a retrospective

Welcome back!  I’ve joined a few of my coworkers in a weekly blog challenge, to write a new post each week.  To start, I figured I should finally write the post I meant to write four years ago, about a small piece of code that saved our rear ends from a huge issue plaguing our production environment. 

Right about this time four years ago, our busy season was just beginning, and we had recently deployed our custom-built phone call distributor, Complemax.  We started seeing an issue crop up that was so severe that Complemax effectively stopped routing calls in our call center.  Complemax works by building a grid of phone calls and agents and scoring each match; the calls are then routed to the best-matching agents.  After digging through logs, we found that Complemax was processing messages fine.  Calls were being added and removed as they went on hold or hung up, and employees were updating as they came online or changed their status, but the grid wasn’t being scored!

So we knew what was failing, but not yet why.  After analyzing memory dumps, digging through the code, and learning more about low-level debugging than I ever had, we finally found that the issue was in our threading.  The grid uses a ReaderWriterLockSlim to protect its state as it’s updated.  Every five seconds, a read lock would be taken so the current matches could be scored, during which pending updates would wait a short time.  Unfortunately, that lock prioritizes writes over reads, so our all-important grid calculation thread was starved by the large number of updates coming in as we received more and more phone calls.

How would we prioritize the read lock, to freeze the grid while we looped through it?  The solution needed to be simple so we could fix the issue in production as fast as possible.  That’s when I came up with the idea of a “dual” reader-writer lock: stack two locks on top of each other to give the less common read operation higher priority.  When an update is needed, take a read lock on the first lock and then a write lock on the second lock.  When the grid needs to be frozen, take a write lock on the first lock; pending “reads” on that lock will wait, preventing further updates to the grid.
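
The production class belongs to Complemax, so here is only a minimal sketch of the shape of the idea, with names of my own choosing:

    using System;
    using System.Threading;

    // "Dual" reader-writer lock: the outer lock exists purely so the infrequent
    // freeze-and-score pass can get in ahead of the flood of updates.
    public class DualReaderWriterLock
    {
        private readonly ReaderWriterLockSlim outer = new ReaderWriterLockSlim();
        private readonly ReaderWriterLockSlim inner = new ReaderWriterLockSlim();

        // Called by the many threads applying call/agent updates to the grid.
        public void Update(Action applyUpdate)
        {
            outer.EnterReadLock();            // many updates may hold this together
            try
            {
                inner.EnterWriteLock();       // but only one mutates the grid at a time
                try { applyUpdate(); }
                finally { inner.ExitWriteLock(); }
            }
            finally { outer.ExitReadLock(); }
        }

        // Called every few seconds by the scoring thread.
        public void FreezeAndScore(Action scoreGrid)
        {
            // Because the lock favors writers, this is granted ahead of queued
            // updates, so the scoring pass is no longer starved.
            outer.EnterWriteLock();
            try { scoreGrid(); }
            finally { outer.ExitWriteLock(); }
        }
    }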

I’m sure if we had caught the issue before it reached production, we would’ve ended up using something other than a ReaderWriterLockSlim in the first place, but surprisingly in the four years since we added the “dual” lock, we haven’t changed it.  It’s simple, it works, and it handles the load thrown at it.  It was certainly a good learning experience, one I’m glad I had once, but not one I’d ask for.