Monday, August 17, 2009

Debugging win32 API errors

This blog post documents the methodology I used to diagnose a Win32 API exception that came with a generic error message.  If you are in the same situation, hopefully you'll find this helpful.

In April, Tess Ferrandez blogged about Visualizing virtual memory usage and GC Heap.  I thought it would be cool to modify her code to call WinDbg programmatically and give it a WPF frontend.  I found out WinDbg uses dbgeng.dll!  Awesome!  I'll just write a managed wrapper around that.  While researching how to write the wrapper, I found that CodePlex already had such a project, called mdbglib.  Great!  So I grabbed the code, compiled it, and... unhandled exception!  No... "First-chance exception at 0x7c9666c6 (ntdll.dll) in ASDumpAnalyzer.exe: 0xC0000139: Entry Point Not Found."

My call stack looked like this (the _LdrpSnapThunk@32 and _LoadLibraryExW@12 frames come up again later):

  ntdll.dll!_RtlRaiseStatus@4()  + 0x26 bytes
  ntdll.dll!_LdrpSnapThunk@32()  + 0x2a2b2 bytes
  ntdll.dll!_LdrpSnapIAT@16()  + 0xd9 bytes
  ntdll.dll!_LdrpHandleOneOldFormatImportDescriptor@16()  + 0x7a bytes
  ntdll.dll!_LdrpHandleOldFormatImportDescriptors@16()  + 0x2e bytes
  ntdll.dll!_LdrpWalkImportDescriptor@8()  + 0x11d bytes
  ntdll.dll!_LdrpLoadDll@24()  - 0x26c bytes
  ntdll.dll!_LdrLoadDll@16()  + 0x110 bytes
  kernel32.dll!_LoadLibraryExW@12()  + 0xc8 bytes
  mscorjit.dll!Compiler::impImportBlockCode()  + 0x5661 bytes
  mscorjit.dll!Compiler::impImportBlock()  + 0x59 bytes
  mscorjit.dll!Compiler::impImport()  + 0xb2 bytes
  mscorjit.dll!Compiler::fgImport()  + 0x20 bytes
  mscorjit.dll!Compiler::compCompile()  + 0xc bytes
  mscorjit.dll!Compiler::compCompile()  + 0x270 bytes
  mscorjit.dll!jitNativeCode()  + 0xa0 bytes
  mscorjit.dll!CILJit::compileMethod()  + 0x25 bytes
> ASDumpAnalyzer.exe!ASDumpAnalyzer.MainForm.MainForm_Load(object sender = {ASDumpAnalyzer.MainForm}, System.EventArgs e = {System.EventArgs}) Line 66 + 0x8 bytes C#

Based on the call stack, I figured the code was throwing during JIT compilation.  Looking at the method being jitted, I was able to determine that the Debuggee class was causing the exception.  I figured the project had been written on Windows Vista and was using a function that didn't exist in XP.  Looking around the C++/CLI project, nothing jumped out at me.  So I bought a C++/CLI book, read about stdcall (the calling convention the Win32 API uses), and read about how to get at the parameters passed to LoadLibraryExW.  I also found out that Visual Studio has memory windows and a registers window, and that the ntdll.dll functions with a p in their name prefix (Ldrp rather than Ldr) are private, so I wouldn't be able to get their signatures from MSDN like I could for kernel32's LoadLibraryExW.
Armed with this knowledge I dug back in.
I put a breakpoint (bp) at {,,ntdll.dll}_LdrpSnapThunk@32.  It got hit all the time, so I put a hit-count bp on {,,kernel32}_LoadLibraryExW@12 with a count of 59, because that is the call right before the app explodes.  I disabled my SnapThunk bp and re-enabled it once the LoadLibrary bp was hit.  The @32 suffix means the function takes 32 bytes of arguments, so I looked at them on the stack using offsets from the esp register (esp, esp+4, esp+8, ...) and resolved each of the eight 4-byte values to a memory location, but I didn't see a function name like I was hoping.
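As a side note on the decorated names used in those breakpoints, here is a minimal Python sketch of how a 32-bit stdcall name such as _LoadLibraryExW@12 can be decoded.  The helper names parse_stdcall and arg_offsets are my own, not from any library.  It assumes every argument occupies one 4-byte stack slot (an 8-byte argument would take two), and that you are stopped at the function's very first instruction, where [esp] holds the return address and the arguments start at esp+4.

```python
import re

# 32-bit stdcall decoration: leading underscore, the name, '@', then the
# number of bytes of arguments the callee pops off the stack.
_STDCALL = re.compile(r"^_(?P<name>\w+)@(?P<bytes>\d+)$")


def parse_stdcall(decorated):
    """Split '_Name@N' into (name, argument bytes, 4-byte stack slots)."""
    match = _STDCALL.match(decorated)
    if match is None:
        raise ValueError("not a stdcall-decorated name: %r" % decorated)
    nbytes = int(match.group("bytes"))
    # Assumption: one 4-byte slot per argument on x86.
    return match.group("name"), nbytes, nbytes // 4


def arg_offsets(decorated):
    """Watch expressions for each argument slot at function entry.

    At the first instruction of the callee, [esp] is the return address,
    so the first argument lives at esp+4.
    """
    _, _, slots = parse_stdcall(decorated)
    return ["esp+%d" % (4 * (i + 1)) for i in range(slots)]


if __name__ == "__main__":
    for symbol in ("_LoadLibraryExW@12", "_LdrpSnapThunk@32"):
        print(parse_stdcall(symbol), arg_offsets(symbol))
```

For _LdrpSnapThunk@32 this gives eight slots, esp+4 through esp+32, which is the range to walk in the memory window.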
So I started debugging the assembly line by line.  After a few minutes I saw:
7C917EE5  push        edi  
7C917EE6  push        eax  
7C917EE7  push        dword ptr [ebp+8] 
7C917EEA  push        dword ptr [esi+18h] 
7C917EED  push        ebx  
7C917EEE  call        _LdrpNameToOrdinal@20 (7C917EFDh)
I got very excited, because when 7C917EEE executes, the top of the stack (the value esp points at) is a pointer to a function name!  This loop gets called a lot, so instead of breaking I put a bp on 7C917EEE that prints the function name each time it is hit, using this expression: {*(char**)esp}.  I hit F5 and watched function names fill up my output window.  Right about the time I got my exception message, I saw DebugConnectWide in the output.  Searching through the mdbglib project, I found that Debuggee.cpp line 56 has this call: Tools::CheckHR(DebugConnectWide(pwRemoteOptions, __uuidof(DbgClient), (void **)&debugClient )).  I commented out that line, rebuilt, and no exception!  That fits my original guess that the project was using a function that doesn't exist in XP.
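Once you have a suspect name, there is a quicker way to confirm whether a DLL actually exports it.  The following is a minimal Python sketch using ctypes; it is not what I did at the time.  The helpers exports_symbol and missing_exports are hypothetical names of my own, and the Windows-only part assumes that dbgeng.dll is the library the import comes from and that DebugCreate and DebugConnect are reasonable names to compare against.

```python
import ctypes
import sys


def exports_symbol(library, name):
    """Return True if the loaded library can resolve `name`.

    ctypes resolves the symbol by name on attribute access and raises
    AttributeError when the library does not export it.
    """
    try:
        getattr(library, name)
    except AttributeError:
        return False
    return True


def missing_exports(library, names):
    """Return the subset of `names` the library does not export."""
    return [name for name in names if not exports_symbol(library, name)]


if __name__ == "__main__" and sys.platform == "win32":
    # Assumption: the dbgeng.dll found on the normal DLL search path is
    # the same one the failing application would load.
    dbgeng = ctypes.WinDLL("dbgeng.dll")
    print(missing_exports(dbgeng, ["DebugCreate", "DebugConnect", "DebugConnectWide"]))
```

If DebugConnectWide shows up in the printed list, the installed DLL is too old for the wrapper, which would explain the Entry Point Not Found error.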


