Search This Blog

Showing posts with label windbg. Show all posts
Showing posts with label windbg. Show all posts

Friday, 2 April 2021

WinDbg : Parsing Arrays In the Debugger

 WinDbg : Parsing Arrays In The Debugger


Many a times data structs like arrays need parsing in windbg. These don't lend themselves as well as the LIST_ENTRY based structs do, because the debugger can parse such lists. Here are 2 simple ways of parsing arrays.


typedef struct _RECORD
{
    ULONG Foo;
    ULONG Bar;
} RECORD;

typedef struct _STATE
{
    RECORD Records[100];
} STATE;

int main()
{
    STATE localXyz;
}
Commands used:
  1. for
  2. dx
1: kd>.for (r @$t0 = 0; @$t0 < @@(#RTL_NUMBER_OF(localXyz.Records)); r @$t0 = @$t0 + 1) { ?? localXyz.Records[@$t0] }

struct _RECORD
   +0x000 Foo          : 
   +0x004 Bar        : 

There is another way of doing this in the latest version of Windbg, and that is using the dx command.

1: kd>dx -r2 -g localXyz.Records


Note : This example uses stack based objects, and hence uses the dot operator in the commands, however, if you are using a pointer to the struct you want to display, you will have to use the arrow operator instead of the dot.


Thursday, 24 September 2015

WinDbg : Deferred Symbol Loading

WinDbg : Deferred Symbol Loading



Sometimes WinDbg won't load the symbols for a particular module. If you want to debug that module or process call stacks containing that module's symbols you are going to be in for a surprise. So lets try and see what can be done to resolve the situation.

16.0: kd> lmv m mydll.dll
start             end                 module name

Unable to enumerate user-mode unloaded modules, NTSTATUS 0xC0000147

So looks like the module was not found. Lets try and see if the module is actually there.

16.0: kd> lm
start             end                 module name
00000000`00b30000 00000000`00b55000   MySvc   (deferred)             
00000000`704f0000 00000000`70513000   MyDll   (deferred)             
00000000`710b0000 00000000`711e3000   msxml3     (deferred)             
00000000`711f0000 00000000`711f7000   WSOCK32    (deferred)             
00000000`71200000 00000000`7135f000   sis        (deferred)             
00000000`73a10000 00000000`73a6c000   wow64win   (deferred)             
00000000`73a70000 00000000`73aaf000   wow64      (pdb symbols)          d:\websyms\wow64.pdb\7818D2D03E1C4936B815636B5EC869B21\wow64.pdb
00000000`747e0000 00000000`7497e000   COMCTL32   (deferred)             
00000000`74e20000 00000000`74e28000   wow64cpu   (pdb symbols)          d:\websyms\wow64cpu.pdb\C8D4F6038D014F62AB0C6A28A7FAFEF21\wow64cpu.pdb
00000000`74ed0000 00000000`74edc000   CRYPTBASE   (deferred)             
00000000`771a0000 00000000`77349000   ntdll      (pdb symbols)          d:\websyms\ntdll.pdb\6192BFDB9F04442995FFCB0BE95172E12\ntdll.pdb

Unloaded modules:
fffff880`01846000 fffff880`01854000   crashdmp.sys
fffff880`01854000 fffff880`0185e000   dump_storport.sys
fffff880`0185e000 fffff880`0187b000   dump_LSI_SAS.sys
fffff880`0187b000 fffff880`0188e000   dump_dumpfve.sys

Unable to enumerate user-mode unloaded modules, NTSTATUS 0xC0000147

Aaah, so this the problem, the module is present but the symbols are not. For the more keen observer, this looks to be a dump from a 64 bit machine. Hence the 64 bit addresses and the presence of wow64.

Okay, lets try and see what is there at the address where the module is loaded.

16.0: kd> db 00000000`704f0000 
0000:0000  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0010  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0020  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0030  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0040  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0050  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0060  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0000:0070  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????

Ouch!!!!

Now lets try to load the symbols. However, to do so, we need to calculate the size of the module.

70513000 - 704f0000 = 0x23000

Lets use reload to force load the symbols.

16.0: kd>.reload /i MyDll.Dll= 704f0000 

This should load the symbols for the Dll and map it to the address given. At this point any reffernces to this address zone will lead WinDbg to use the symbols mapped to it.

It is important to map the symbols to the right address, else the stack dumps will be gibberish.

Wednesday, 12 August 2015

WinDbg : Getting The Last Error in WinDbg

WinDbg : Getting The Last Error


One of the most used Windows APIs is perhaps GetLastError(). It returns us the error code of the last error encountered. However, many a times, we do not query for the last error while writing the code, and while debugging later on, we have no idea why it failed, we just know that it did. Here is an example:

HANDLE hVol = INVALID_HANDLE_VALUE;
hVol = CreateFile( csDrvName,
                           0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL,
                           OPEN_EXISTING,
                           0,                           // FILE_FLAG_NO_BUFFERING | FILE_FLAG_RANDOM_ACCESS
                           NULL);

if( hVol == INVALID_HANDLE_VALUE )
{
        return E_FAIL;
}

Now this function never queried the last error and if it had logging, then it never printed it to the logs. 

Which brings us to the question, if we had called GetLastError(), where does it get the info from, and if it did, then can we also get the same information from the debugger in cases when we wrote code like the above. The answer is Yes, we can, and here is how... 

Commands Used:
  1. !teb
  2. !error
Before we proceed, all error codes are thread specific, so being in the context of the thread you are debugging is important. To know know more about changing thread contexts this link might be helpful.  

Note : The fields of !teb sometimes vary between operating system versions. So if your output is slightly different from mine, there is no need to be concerned.

0:002> !teb
TEB at 7efd7000
    ExceptionList:        00b8fdf0
    StackBase:            00b90000
    StackLimit:           00b8e000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7efd7000
    EnvironmentPointer:   00000000
    ClientId:             0000099c . 00001d48
    RpcHandle:            00000000
    Tls Storage:          7efd702c
    PEB Address:          7efde000
    LastErrorValue:       2
    LastStatusValue:      0
    Count Owned Locks:    0
    HardErrorMode:        0


The value in LastErrorValue field is what GetLastError reports to us. It lives in the thread level, and that is why we can be multi threaded, and each thread can be calling a different API and the error codes don't get mangled.

Coming to the interpretation part. The !error command can interpret all errors and status messages in Windows.

0:001> !error 0n2
Error code: (Win32) 0x2 (2) - The system cannot find the file specified.

Note : Please be careful to use the right numeric base for !error. That is, if the value shown in !teb is in decimal, we should append a 0n to ensure it is not interpreted as hexadecimal.

Similarly, the LastStatusValue field will have the NTSTATUS codes.

Thursday, 30 October 2014

WinDbg : Debugging A Critical Section Based Dead Lock Scenario

WinDbg : Debugging A Critical Section Based Dead Lock Scenario


Dead locks can be very nasty and sometimes very hard to debug and root cause. This post is an attempt to walk you through one such scenario, where a critical section dead locks a thread. As with most of the previous posts, we will refrain from having source level help, that is, we will try and debug the faulty application without any source code and only symbols. 

Commands Used:

We start by attaching the debugger to he hung application, or we can also start the application inside the debugger if that is an alternative. In this example the name of the application is FaultyApp.Exe.

Once the application hangs, we start by taking a stack dump of all the threads currently running. Since this is an user land debugging scenario, the ~* command will give us the thread wise stack dump.

kd>~*kvn
0:004> ~*kvn

   0  Id: 6a0.5f8 Suspend: 1 Teb: 7efdd000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 0016f7b8 770315f7 00000003 0016f808 00000001 ntdll!NtWaitForMultipleObjects+0x15 (FPO: [5,0,0])
01 0016f854 76e719f8 0016f808 0016f87c 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100 (FPO: [Non-Fpo])
02 0016f89c 76e74200 00000003 7efde000 00000000 kernel32!WaitForMultipleObjectsExImplementation+0xe0 (FPO: [Non-Fpo])
03 0016f8b8 00bd11b5 00000003 0016f8d4 00000000 kernel32!WaitForMultipleObjects+0x18 (FPO: [Non-Fpo])
04 0016f8e4 00bd143d 00000001 00699318 00699368 FaultyApp!main+0x95 (FPO: [Non-Fpo]) (CONV: cdecl) 
05 0016f92c 76e7338a 7efde000 0016f978 77d79f72 FaultyApp!__tmainCRTStartup+0xfe (FPO: [Non-Fpo]) (CONV: cdecl) 
06 0016f938 77d79f72 7efde000 735ac1c9 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
07 0016f978 77d79f45 00bd1500 7efde000 ffffffff ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
08 0016f990 00000000 00bd1500 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

   1  Id: 6a0.bc4 Suspend: 1 Teb: 7efda000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 0089feb4 77d78e44 00000048 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15 (FPO: [3,0,0])
01 0089ff18 77d78d28 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
02 0089ff40 00bd105c 6ce392e0 00000000 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
03 0089ff54 76e7338a 00000000 0089ffa0 77d79f72 FaultyApp!FnThread1+0x2c (FPO: [Non-Fpo]) (CONV: cdecl) 
04 0089ff60 77d79f72 00000000 73c5c711 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
05 0089ffa0 77d79f45 00bd1005 00000000 ffffffff ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
06 0089ffb8 00000000 00bd1005 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

   2  Id: 6a0.12fc Suspend: 1 Teb: 7efaf000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 00dffa50 77d78e44 00000048 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15 (FPO: [3,0,0])
01 00dffab4 77d78d28 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
02 00dffadc 00bd10fc 6ce392e0 00000000 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
03 00dffaf0 76e7338a 00000000 00dffb3c 77d79f72 FaultyApp!FnThread3+0x2c (FPO: [Non-Fpo]) (CONV: cdecl) 
04 00dffafc 77d79f72 00000000 7393c38d 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
05 00dffb3c 77d79f45 00bd100f 00000000 ffffffff ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
06 00dffb54 00000000 00bd100f 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

   3  Id: 6a0.11f0 Suspend: 1 Teb: 7efd7000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 00a6fa34 77d78e44 00000048 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15 (FPO: [3,0,0])
01 00a6fa98 77d78d28 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
02 00a6fac0 00bd10ac 6ce392e0 00000000 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
03 00a6fad4 76e7338a 00000000 00a6fb20 77d79f72 FaultyApp!FnThread2+0x2c (FPO: [Non-Fpo]) (CONV: cdecl) 
04 00a6fae0 77d79f72 00000000 73eac391 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
05 00a6fb20 77d79f45 00bd100a 00000000 ffffffff ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
06 00a6fb38 00000000 00bd100a 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

#  4  Id: 6a0.11c8 Suspend: 1 Teb: 7efac000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 00f0fd3c 77ddf926 73bcc5dd 00000000 00000000 ntdll!DbgBreakPoint (FPO: [0,0,0])
01 00f0fd6c 76e7338a 00000000 00f0fdb8 77d79f72 ntdll!DbgUiRemoteBreakin+0x3c (FPO: [Non-Fpo])
02 00f0fd78 77d79f72 00000000 73bcc509 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
03 00f0fdb8 77d79f45 77ddf8ea 00000000 ffffffff ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
04 00f0fdd0 00000000 77ddf8ea 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

The clue here is to look for the Windows API call RtlEnterCriticalSection. There might be  more than one thread in the stack making this function call, we can use any such thread.

Looking up on MSDN we see that the API is defined as follows:

NTSTATUS RtlEnterCriticalSection
 (
  RTL_CRITICAL_SECTION* crit
 )

In the above stack dump, we have several such threads, we take any one and them from the parameters list take the address of the RTL_CRITICAL_SECTION object pointer and type cast it to see the contents.

02 0089ff40 00bd105c 6ce392e0 00000000 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])

0:004> dt ntdll!_RTL_CRITICAL_SECTION 6ce392e0
   +0x000 DebugInfo        : 0x0069acf8 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : 0n-14
   +0x008 RecursionCount   : 0n1
   +0x00c OwningThread     : 0x000005f8 Void
   +0x010 LockSemaphore    : 0x00000048 Void
   +0x014 SpinCount        : 0

We see that one of the members of this object gives us the Owning thread ID. 0x000005f8.

Lets try and find out more details about this thread. In user mode, the double tilde (~~) command will do it for us.

0:004> ~~[0x000005f8 ]
   0  Id: 6a0.5f8 Suspend: 1 Teb: 7efdd000 Unfrozen
      Start: FaultyApp!mainCRTStartup (00bd1500)
      Priority: 0  Priority class: 32  Affinity: ff

Ok so the thread offending is the main thread. This is good. Now we know where in the code of FaultyApp we need to start digging.

Lets start with disassembling the Main function. I will add the code in small sections explaining the relevant portions in between.

0:004> uF FaultyApp!main
FaultyApp!main:
00bd1120 55              push    ebp
00bd1121 8bec           mov    ebp,esp 

This is standard function prologue code, that we see in the above two lines.


00bd1123 83ec14        sub    esp,14h                                           

The above line is allocating space for local variables in the stack 

00bd1126 a11070bf00  mov   eax,dword ptr [FaultyApp!__security_cookie (00bf7010)]
00bd112b 33c5            xor     eax,ebp                                           


These two lines above is the code for stack guard.


00bd112d 8945fc          mov     dword ptr [ebp-4],eax
00bd1130 a100b0bf00   mov     eax,dword ptr [FaultyApp!_imp__g_CritSec (00bfb000)]
00bd1135 50                 push    eax
00bd1136 ff1534b0bf00 call    dword ptr [FaultyApp!_imp__InitializeCriticalSection (00bfb034)]    

the four lines above  is the code to initialize the critical section

00bd113c c745ec00000000  mov     dword ptr [ebp-14h],0
00bd1143 eb09                   jmp     FaultyApp!main+0x2e (00bd114e)

FaultyApp!main+0x25:
00bd1145 8b4dec          mov     ecx,dword ptr [ebp-14h]
00bd1148 83c101          add      ecx,1
00bd114b 894dec          mov     dword ptr [ebp-14h],ecx

FaultyApp!main+0x2e:
00bd114e 837dec03        cmp     dword ptr [ebp-14h],3                             

The line just above is comparing a local variable with a constant '3'

00bd1152 7333            jae     FaultyApp!main+0x67 (00bd1187)                  

If the comparison result is equal or greater than 3 then then jump out, else loop                                                    
FaultyApp!main+0x34:
00bd1154 6a00                    push    0
00bd1156 6a00                    push    0
00bd1158 6a00                    push    0
00bd115a 8b55ec                mov     edx,dword ptr [ebp-14h]
00bd115d 8b04950070bf00  mov     eax,dword ptr FaultyApp!pFnarry (00bf7000)[edx*4]   

This above sections seems to be initializing a function pointer array for thread functions.  We know this because of the call to CreateThread API a few lines below....
                                
00bd1164 50              push    eax
00bd1165 6a00            push    0
00bd1167 6a00            push    0
00bd1169 ff154cb0bf00    call    dword ptr [FaultyApp!_imp__CreateThread (00bfb04c)]

(the reader is encouraged to use MSDN to see the signature for CreateThread and all the parameters pushed into it inside this function to decrypt what all was passed)

00bd116f 8b4dec          mov     ecx,dword ptr [ebp-14h]
00bd1172 89448df0       mov     dword ptr [ebp+ecx*4-10h],eax
00bd1176 8b55ec          mov     edx,dword ptr [ebp-14h]
00bd1179 837c95f000    cmp     dword ptr [ebp+edx*4-10h],0
00bd117e 7505              jne     FaultyApp!main+0x65 (00bd1185)

FaultyApp!main+0x60:
00bd1180 83c8ff          or      eax,0FFFFFFFFh
00bd1183 eb32            jmp     FaultyApp!main+0x97 (00bd11b7)

FaultyApp!main+0x65:
00bd1185 ebbe            jmp     FaultyApp!main+0x25 (00bd1145)

FaultyApp!main+0x67:
00bd1187 a1b892bf00      mov     eax,dword ptr [FaultyApp!hStartThreads (00bf92b8)]
00bd118c 50                    push    eax
00bd118d ff1540b0bf00   call    dword ptr [FaultyApp!_imp__SetEvent (00bfb040)]
00bd1193 6a00                push    0
00bd1195 68f4010000      push    1F4h
00bd119a ff1544b0bf00   call    dword ptr [FaultyApp!_imp__SleepEx (00bfb044)]
00bd11a0 e885010000     call    FaultyApp!DllFunction (00bd132a)                      

After setting an event and sleeping for a while, it finally jumps to this function in the above line. DllFunction. We need to disassemble it to see where it leads us. I am not pasting the rest of the disassembly of the Main function as it is irrelevant to the rest of the context.

Lets start with the disassembly of DllFunction.


0:004> u FaultyApp!DllFunction
FaultyApp!DllFunction:
00bd132a ff2504b0bf00    jmp     dword ptr [FaultyApp!_imp__DllFunction (00bfb004)]
FaultyApp!__security_check_cookie:
00bd1330 3b0d1070bf00    cmp     ecx,dword ptr [FaultyApp!__security_cookie (00bf7010)]
00bd1336 7502                  jne     FaultyApp!__security_check_cookie+0xa (00bd133a)
00bd1338 f3c3                  rep ret
00bd133a e908020000       jmp     FaultyApp!__report_gsfailure (00bd1547)
FaultyApp!__tmainCRTStartup:
00bd133f 6a14                  push    14h
00bd1341 68d05dbf00       push    offset FaultyApp!__rtc_tzz+0x104 (00bf5dd0)
00bd1346 e8551a0000       call    FaultyApp!__SEH_prolog4 (00bd2da0)

Whats going on? This doesn't look like a function at all....so look closely at the first line copied again in the line below for your convenience...

00bd132a ff2504b0bf00    jmp     dword ptr [FaultyApp!_imp__DllFunction (00bfb004)]

This is actually a function pointer which is being de-referenced. So if it is a pointer, we need to see what it is pointing at:

0:004> dd 00bfb004
00bfb004  6ce11005 00000000 00000000 00000000
00bfb014  00000000 00000000 00000000 00000000
00bfb024  00000000 00000000 00000000 00000000
00bfb034  77d72c8a 77d622c0 77d62280 76e71691
00bfb044  76e71215 76e7324c 76e73495 76e741e8
00bfb054  76e75169 76e74a25 76e751fd 76e711c0
00bfb064  76e711a9 76e717b8 76e71420 77d8107b
00bfb074  77d79dd5 76e779d8 76e74a37 76e71222

Lets have a look at what is actually at this address.

0:004> u 6ce11005
FaultyDLL!ILT+0(_DllFunction):
6ce11005 e916000000      jmp     FaultyDLL!DllFunction (6ce11020)
FaultyDLL!ILT+5(_DllMain:
6ce1100a e931000000      jmp     FaultyDLL!DllMain (6ce11040)
6ce1100f cc              int     3
6ce11010 cc              int     3
6ce11011 cc              int     3
6ce11012 cc              int     3
6ce11013 cc              int     3
6ce11014 cc              int     3

It shows the name of a different module. FaultyDll. Looks like it is a DLL function being used, through the import address table (I know the name of the module has the word DLL in it, but that is not why i reached this conclusion).

Now lets disassemble the actual function the address of which we have been able to decipher.

0:004> u 6ce11020
FaultyDLL!DllFunction:
6ce11020 55                    push    ebp
6ce11021 8bec                mov     ebp,esp
6ce11023 68e092e36c      push    offset FaultyDLL!g_CritSec (6ce392e0)
6ce11028 ff1500b0e36c   call    dword ptr [FaultyDLL!_imp__EnterCriticalSection (6ce3b000)]
6ce1102e b801000000      mov     eax,1
6ce11033 5d                    pop     ebp
6ce11034 c3                    ret
6ce11035 cc                    int     3

It is a small function and what do we see in it? It enter's the critical section, but never leaves it. This is the bug!!

Main is shown as the owner of the critical section because it calls a faulty function which enters but never leaves the critical section.

Note: This is a very simple test application tailor made to produce the hang, real life scenarios might be much harder to debug and locate. But I do hope that this gives a starting point to folks wanting to chase such bugs!


Trivia : !cs command also gives info about all critical sections. !cs <address of crit> will dump info about that particular critical section. I deliberately avoided using it, as this exercise helps one understand such debugging scenarios in depth, rather than being dependent on a debugger extension.