Developer’s tool belt: debugging production issues–part 3

Before I dive into the next post in this series, I want to recommend reading Debugging Microsoft .NET 2.0 Applications.  I rely on this book whenever I can’t remember how to do something in WinDBG, and much of what I’ll be blogging about for investigating memory issues and deadlocks I learned from it.  In addition to its great detail about using WinDBG, it has a lot of other great tips for debugging with Visual Studio.

There are a few commands I run on a dump when I first open it in WinDBG to get a quick look at the state of the process.  The first command must always be run in order to analyze .NET dumps:

.loadby sos clr
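
One thing worth noting: the command above applies to .NET 4 and later, where the runtime module is clr.dll.  For a dump from a .NET 2.0/3.5 process, the runtime module is mscorwks instead, so the equivalent command would be:

.loadby sos mscorwks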

This loads SOS.dll into WinDBG and enables all of the .NET related commands you’ll need to use.  If the dump was created due to a crash, something like the following might be displayed when it’s first loaded:

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(80c.3fc): CLR exception - code e0434352 (first/second chance not available)
00000000`771718ca c3              ret

You can execute .ecxr, but it isn’t aware of .NET, so it will just display the registers and unmanaged stack trace at the time of the exception.  Instead, !analyze -v will display the Windows Error Reporting information related to the exception, including the stack trace of the thread the exception occurred on.  Typically, I want to see what all of the threads looked like when the dump was made.  First, I use !threads:


Oh look, there’s the OutOfMemoryException that crashed the process.  Threads with XXXX as the thread number in the leftmost column are dead threads that haven’t been cleaned up yet.  You won’t be able to switch to those threads or view their stacks, but if a lot of them are listed, that could be an indication of the issue.  If a particular thread doesn’t catch my eye, I’ll usually use !EEStack next.  It works the same as !DumpStack, but gives the full stack trace (managed and unmanaged) for all threads, which helps you get a good idea of what was going on at the time of the dump.  !ClrStack is similar but only displays the managed portion of the stack; it’s useful when you want to see the parameters passed to methods and the local variables once you get deeper into the dump.  In the case of !EEStack, seeing the unmanaged stack is useful because it makes it easy to see which threads are currently sleeping, whether because they have nothing to do, they’re waiting for a lock, or some other action is causing them to block.  This is usually indicated by a WaitFor… at the top of the stack trace:


For example, here’s the top of a stack trace of a thread waiting for a SQL query to complete:


You can see the difference between how managed and native methods are displayed, and at the top there’s a WaitForSingleObject in this case.
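
If one of the threads from !threads looks interesting, you can switch to it and dump its managed stack along with the method arguments.  Assuming thread 5 is the one to dig into, it would look something like this:

~5s
!ClrStack -p

The -p switch tells !ClrStack to include method parameters; -a includes local variables as well.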

If my intent was to try to track down a memory issue, my next step would be to look at the heap with !DumpHeap -stat.  I’ll go into the details of debugging memory issues in my next post.
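
As a quick preview, the typical progression from there looks something like the following, where the method table and object addresses are placeholders you’d take from the previous command’s output:

!DumpHeap -stat
!DumpHeap -mt <method table address>
!GCRoot <object address>

!DumpHeap -stat summarizes the heap by type, !DumpHeap -mt lists the instances of a particular type, and !GCRoot shows what’s keeping a given object alive.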

Developer’s tool belt: debugging production issues–part 2

So your shiny new application is ready to deploy. You’ve written unit tests and have great code coverage.  What could go wrong?  Then after running for some time in production an issue crops up.  Maybe it freezes and you suspect there might be a deadlock, or it crashes without any error logged, or there’s a memory leak you can’t track down.  If the issue can’t be reproduced and the logs don’t show the issue, then digging into a memory dump might be the last resort.

This post will cover setting up WinDBG and taking memory dumps.  I’ll dive into details about analyzing memory dumps in my next post.  The first step is to install WinDBG.  This used to be a separate install but I guess someone thought that was too easy, so now it’s bundled into the Windows SDK.  Download the web installer and choose Debugging Tools for Windows from the installation options:


Once WinDBG is installed, an environment variable named “_NT_SYMBOL_PATH” needs to be created to specify where WinDBG can find debugging symbols for modules.  Set this variable to “SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols”.  This instructs WinDBG to download symbols from Microsoft’s public symbol server and to save them to C:\Symbols, from which they’ll be loaded by WinDBG when needed.
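
If you prefer the command line over the System Properties dialog, the variable can also be set with setx (it takes effect for newly started processes):

setx _NT_SYMBOL_PATH "SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols"

The part after the second asterisk is Microsoft’s public symbol server.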

Creating Dumps

There are a few ways to create memory dumps, and which to use basically boils down to user preference.  The easiest way to create a dump is by using Task Manager, which has had this ability since Windows Vista.  Right click on the process you want to create the dump of and select Create Dump File:


This will take a minute or two and display the location of the dump:


One thing to keep in mind, which I just learned when creating the dump of Visual Studio for the screenshot above, is that Task Manager can’t create dumps of 32-bit processes by default.  When trying to open the resulting dump file it errors with the message “Debugging a 64-bit dump of a 32-bit process is not supported, please collect a 32-bit dump of a 32-bit process.”  A 32-bit dump can be created by running the 32-bit version of Task Manager from C:\Windows\SysWOW64.  This blog post has more information.  Since I’ve always been creating dumps from 64-bit processes, I haven’t been affected by this before, thanks to .NET targeting Any CPU when compiling by default.
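
The 32-bit Task Manager can be launched directly from that folder:

%windir%\SysWOW64\taskmgr.exe

Dumps created from it will be 32-bit and can be opened normally.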

The traditional method for creating dumps is by using ADPlus, which will attach CDB (the console debugger that’s installed with WinDBG) and create a dump file.  The great thing about ADPlus is that you can use it to create a dump immediately when the process has hung (known as a “hang” dump) or have it wait until the process crashes and create a crash dump.  To use ADPlus, it’s easiest to add “C:\Program Files\Debugging Tools for Windows (x64)” to the Path environment variable and open an administrator command prompt.  Just type “adplus” to view a summary of the options available.  Details on the various options can be found here.

It might look complicated, but I usually only need to use three options.  First, specify -hang or -crash to indicate whether to create a dump immediately or to just attach CDB now and wait for a crash.  Second, specify where to save the dump with -o.  And third, specify which process to attach to, which is most easily done by using -p and entering the process ID (you can get this from Task Manager using View > Select Columns… > PID) or by using -pn and entering the process file name if there’s only one instance running.  So, if I wanted to create a hang dump for process 7900, I’d type “adplus -hang -o C:\Dumps -p 7900”.  This opens a separate console window that displays output from CDB while the dump is being created, and it will close when it’s finished.
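
As another example, to attach to a process named MyService.exe (a made-up name for illustration) and wait for it to crash, saving the dump to C:\Dumps:

adplus -crash -o C:\Dumps -pn MyService.exe

In crash mode, CDB stays attached and writes the dump automatically when an unhandled exception brings the process down.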

The last method of obtaining a dump is by using the dump created by Windows Error Reporting when the process crashes, if you were lucky enough that one was created.  Look through the Application event log for an Information event from Windows Error Reporting logged just after the Error logged for the crash.  This log entry will have details about the crash, like the following example:

Fault bucket , type 0
Event Name: CLR20r3
Response: Not available
Cab Id: 0

Problem signature:
P3: 507c3954
P4: mscorlib
P6: 4f1967ce
P7: 1f45
P8: 10
P9: System.OutOfMemoryException

Attached files:

These files may be available here:

Analysis symbol: 
Rechecking for solution: 0
Report Id: e566d53e-24ab-11e2-bc24-005056990056
Report Status: 4

The AppCrash folder may have an .hdmp file in it, which can be copied to another folder and opened with WinDBG.  You’ll want to copy it so you can use it after the report queue is cleared.
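
However the dump was obtained, it can be opened in WinDBG either through File > Open Crash Dump or from the command line with the -z option:

windbg -z C:\Dumps\MyApp.hdmp

(MyApp.hdmp is just a placeholder file name here.)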

In my next post, I’ll detail the techniques I use to triage a memory dump taken from a crash or a hang. Coming up after that, I’ll cover how to search for memory leaks in a memory dump.

Developer’s tool belt: debugging production issues–part 1

Following my post about the bus factor, I thought it’d be a good idea to focus on a topic that hasn’t been shared much with my coworkers but is a great tool to have in your tool belt: how to triage production issues as they happen, when the stakes are high and you need to get things working quickly and figure out what went wrong as fast as possible.  This series will primarily cover gathering and analyzing memory dumps with WinDBG, but before I dive into the details, I want to cover something that’s very important for any application that’s been deployed to production.


Logging is your first line of defense when things start going wrong.  With effective logging, you can avoid digging through a memory dump in all but the worst cases.  Why is that important?  Because debugging through a memory dump is time consuming, very time consuming, and all of that time could be better spent on other things.  It also means you can fix production issues faster, minimizing the system’s downtime.  In some cases, from what you see happening in the log you can proactively take steps before an issue becomes serious enough to take the system down.

Make logging a first class citizen in your software architecture, not an afterthought.  There are many mature logging frameworks to choose from, so you just need to pick one that covers your needs.  Keep in mind the things that might need to be logged in the system.  Do different components need to log separately?  Does the logging need to be asynchronous for performance?  Do different logs need to be created for each execution?  For example, if you’re creating an application that imports files, does it make sense to log each file import separately or is it fine to combine everything into one file?  One of the most important things to keep in mind, though, is don’t log so much that you have to dig through a pile of junk to find the useful parts.  It takes some time and trial and error to get it right, but it really makes a difference.

Logs are useless if no one looks at them

In general it’s good to review logs periodically to make sure there’s nothing out of the ordinary going on.  On the other hand, errors that occur shouldn’t be buried in a log file waiting to be found days after they’ve occurred.  While it’s still a good idea to log them with everything else, they also need to be more in your face.  I’ve mostly done this with email alerts, but there are other ways to create alerts too, such as IM, text message, or third party monitoring software.  You have to be careful to limit how many alerts get created, or you’ll start to ignore all of the error emails that come in.  Keep that in mind as you’re writing error handling code to make sure you don’t “cry wolf”.  Reserve sending emails for the exceptions that are critical.  Context is also essential when emailing an error.  The more details included, the greater the chance that you won’t even need to open the full log.  For example, if you catch a SqlException, include the parameters used in the SQL statement.  Maybe you’ll immediately spot the edge case you hadn’t tested for.

Determining a good error handling strategy has to be done on a case by case basis, and will vary from application to application.  There are a lot of factors to consider.  Is this a client application or a backend mission critical system?  Are the integration points going to be prone to errors?  Is it ok for the process to crash or does that need to be avoided as much as possible?  The frequency of errors and how they get handled will affect how they get logged.  In some cases emailing each error works great, while in others it will make more sense to only send an alert if errors occur repeatedly.  It might take a few iterations to get things right, to find the places that need more detail logged, and decide what works best for the application.

In my next post, I’ll cover the different ways of creating memory dumps and opening them in WinDbg.  After that I’ll be posting about using WinDbg to dig through the mountains of data within the dump.