Thursday 25 September 2014

WinDbg : How To Debug Memory Leaks With The !heap Command


Memory and resource leaks are best debugged on a live system, and several user-mode and kernel-mode tools are available to help us. Sometimes, however, all we get is a process or kernel crash dump file, and the reason shown is that the entire virtual memory was consumed! These types of crashes are more likely to be seen on older 32-bit operating systems: the 64-bit address space is huge and hence hard to exhaust, and newer operating systems have more efficient heap managers and memory managers. Before we go on, I would recommend that you read this article here about the introductory concepts of heaps.

Throughout this article we will focus on a user-mode service crash on a 32-bit Windows 2003 OS.

Commands used: 
  1. !heap (introduced for the first time in this article)
  2. dds (this command is used in this article here). For more details of the d command, please read this.
  3. $
  4. ||
  5. .expr
  6. ??

The dump file used here is a full user mode dump.


0:004> $ Lets start with seeing what kind of dump file we are dealing with...

0:004> ||
.  0 Full memory user mini dump: E:\Public\PID-18800__MYFAULTYSERVICE.EXE__2nd_chance_CPlusPlusEH__full_19ac_2014-09-09_15-54-23-328_4970.dmp

0:004> $ Okay, so it is a full memory user dump. now lets see that state of the heap

0:004> !heap -s
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                    (k)     (k)    (k)     (k) length      blocks cont. heap 
-----------------------------------------------------------------------------
00150000 00000002    4096    644   1244     74     8    10    0      0   L  
00250000 00008000      64     12     12     10     1     1    0      0      
007e0000 00001002      64     52     52      5     1     1    0      0   L  
00800000 00000002    1024     20     20      3     1     1    0      0   L  
00a00000 00001002    4096   2388   2388     13     5     3    0      0   L  
Virtual block: 01040000 - 01040000 (size 00000000)
00b00000 00001002 1960256 1960256 1960256     70     0     0    1      7   L  
00f30000 00001002      64     28     28      7     4     1    0      0   L  
01520000 00001002      64     16     16      4     1     1    0      0   L  
06330000 00001002    7232   4060   5824   1077   226    24    0      0   L  
    External fragmentation  26 % (226 free blocks)
06350000 00001003    1280    220    268     37     7     7    0    N/A   
065c0000 00001003     256      4      4      2     1     1    0    N/A   
06a90000 00001003     256      4      4      2     1     1    0    N/A   
06ad0000 00001003     256      4      4      2     1     1    0    N/A   
06b10000 00001003     256      4      4      2     1     1    0    N/A   
06390000 00001002      64     16     16      3     1     1    0      0   L  
-----------------------------------------------------------------------------

0:004> $ The largest allocations seem to be happening in the heap 00b00000
0:004> $ That doesn't mean it is leaking, but it is a good starting point
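To put the 1960256 KB reserved by heap 00b00000 in perspective, the sketch below compares it with the user-mode address space. It assumes the default 2 GB user-mode range of a 32-bit Windows process (no /3GB boot option); the constant and function names are purely illustrative.

```cpp
#include <cstdint>

// Assumption: default 32-bit Windows layout with 2 GB of user-mode address space.
constexpr std::uint64_t kUserAddressSpaceKb = 2ull * 1024 * 1024; // 2 GB expressed in KB

// Returns the share (0.0 to 1.0) of the user address space that a heap has reserved.
// reservedKb is the value from the "Reserv (k)" column of !heap -s.
double reservedShare(std::uint64_t reservedKb) {
    return static_cast<double>(reservedKb) / static_cast<double>(kUserAddressSpaceKb);
}
```

For this dump, reservedShare(1960256) comes out at roughly 0.93, so under that assumption this single heap accounts for about 93 percent of the address space available to the process.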

0:004> $ using the heap's statistics, we can see the types of allocations happening in it

0:004> !heap -stat -h 00b00000
 heap @ 00b00000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    70 84a897 - 3a09c210  (53.53)
    30 aa8149 - 1ff83db0  (29.49)
    60 25da12 - e31c6c0  (13.09)
    38 12ecf1 - 423d4b8  (3.82)
    a8980 1 - a8980  (0.04)
    40 97a - 25e80  (0.01)
    200 106 - 20c00  (0.01)
    47 3d9 - 1112f  (0.00)
    18 725 - ab78  (0.00)
    d0 80 - 6800  (0.00)
    90 96 - 5460  (0.00)
    50 fa - 4e20  (0.00)
    20 1fa - 3f40  (0.00)
    8c 4e - 2aa8  (0.00)
    80 54 - 2a00  (0.00)
    24 129 - 29c4  (0.00)
    78 3e - 1d10  (0.00)
    800 3 - 1800  (0.00)
    28 74 - 1220  (0.00)
    1000 1 - 1000  (0.00)

0:004> $ In this case, there seem to be an allocation unit of 0x70 bytes,
0:004> $ which is consuming 53.53 percent of the heap allocations

0:004> $ This allocation is definitely a candidate for suspicion 
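Each row of the statistics output is simple arithmetic: the total is the block size multiplied by the number of blocks, all in hexadecimal. A minimal sketch of that check (the function name is hypothetical):

```cpp
#include <cstdint>

// Total bytes consumed by one row of "!heap -stat -h": block size times block count.
std::uint64_t rowTotalBytes(std::uint64_t blockSize, std::uint64_t blockCount) {
    return blockSize * blockCount;
}
```

For the first row, rowTotalBytes(0x70, 0x84a897) gives 0x3a09c210, which is roughly 928 MB held by 0x70-byte allocations alone.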

0:004> $ Lets try to get all information about this heap (this command can take a really long time)
    
0:004> !heap -a 00b00000
Index   Address  Name      Debugging options enabled
  6:   00b00000 
    Segment at 00b00000 to 00b10000 (00010000 bytes committed)
    Segment at 00b10000 to 00c10000 (00100000 bytes committed)

<Output trimmed to save space>

    Segment at 77980000 to 77b5e000 (001de000 bytes committed)
    Segment at 7c4e0000 to 7c6be000 (001de000 bytes committed)
    Segment at 7dd60000 to 7df3f000 (001df000 bytes committed)
    Flags:                00001002
    ForceFlags:           00000000
    Granularity:          8 bytes
    Segment Reserve:      77b40000
    Segment Commit:       00002000
    DeCommit Block Thres: 00000200
    DeCommit Total Thres: 00002000
    Total Free Size:      00002365
    Max. Allocation Size: 7ffdefff
    Lock Variable at:     00b00608
    Next TagIndex:        0000
    Maximum TagIndex:     0000
    Tag Entries:          00000000
    PsuedoTag Entries:    00000000
    Virtual Alloc List:   00b00050
    Unable to read nt!_HEAP_VIRTUAL_ALLOC_ENTRY structure at 01040000
    UCR FreeList:        010f0d70
    UCRSegment - 010f0000: 00003000 . 00010000
    FreeList Usage:      0000027c 00000000 00000000 00000000
    FreeList[ 02 ] at 00b00188: 7df0e970 . 00b26798  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 00b26798
    FreeList[ 03 ] at 00b00190: 5766a650 . 0189e728  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 0189e728
    FreeList[ 04 ] at 00b00198: 5b228800 . 0178cee0  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 0178cee0
    FreeList[ 05 ] at 00b001a0: 64bb7cc0 . 00b4cc70  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 00b4cc70
    FreeList[ 06 ] at 00b001a8: 016a1418 . 00b286d0  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 00b286d0
    FreeList[ 09 ] at 00b001c0: 7defe050 . 7defe050  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 7defe050
    Segment00 at 00b00640:
        Flags:           00000000
        Base:            00b00000
        First Entry:     00b00680
        Last Entry:      00b10000
        Total Pages:     00000010
        Total UnCommit:  00000000
        Largest UnCommit:00000000
        UnCommitted Ranges: (0)

    Heap entries for Segment00 in Heap 00b00000
         address: psize . size  flags   state (requested size)
        00b00640: 00640 . 00040 [01] - busy (40)

<Output trimmed to save space>

        00b04dc8: 00018 . 00038 [01] - busy (30)
        00b04e00: 00038 . 00078 [01] - busy (70)
        00b04e78: 00078 . 00078 [01] - busy (70)
        00b04ef0: 00078 . 00078 [01] - busy (70)
        00b04f68: 00078 . 00078 [01] - busy (70)
        00b04fe0: 00078 . 00078 [01] - busy (70)
        00b05058: 00078 . 00038 [01] - busy (30)
        00b05090: 00038 . 00038 [01] - busy (30)
        00b050c8: 00038 . 00078 [01] - busy (70)
        00b05140: 00078 . 00028 [01] - busy (20)
        00b05168: 00028 . 00078 [01] - busy (70)
        00b051e0: 00078 . 00038 [01] - busy (30)
        00b05218: 00038 . 00078 [01] - busy (70)
        00b05290: 00078 . 00028 [01] - busy (20)
        00b052b8: 00028 . 00078 [01] - busy (70)
        00b05330: 00078 . 00028 [01] - busy (20)
        00b05358: 00028 . 00078 [01] - busy (70)

<Output truncated to save space>
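The entry listing can be read with a little arithmetic. The sketch below assumes what this dump suggests for the 32-bit heap: an 8-byte entry header in front of every block and the 8-byte granularity reported above. The helper names are hypothetical.

```cpp
#include <cstdint>

// Assumptions (32-bit heap, as seen in this dump): 8-byte entry header, 8-byte granularity.
constexpr std::uint32_t kHeaderBytes = 8;
constexpr std::uint32_t kGranularity = 8;

// Value shown in the "size" column for a given requested size:
// requested bytes plus the header, rounded up to the granularity.
std::uint32_t blockSize(std::uint32_t requested) {
    std::uint32_t raw = requested + kHeaderBytes;
    return (raw + kGranularity - 1) / kGranularity * kGranularity;
}

// Address of the entry that follows the entry at 'entry'.
std::uint32_t nextEntry(std::uint32_t entry, std::uint32_t size) {
    return entry + size;
}

// Address handed back to the program, just past the entry header.
std::uint32_t userPointer(std::uint32_t entry) {
    return entry + kHeaderBytes;
}
```

With these, blockSize(0x70) is 0x78, nextEntry(0x00b04e00, 0x78) is 0x00b04e78 (the next line of the listing), and userPointer(0x00b04e00) is 0x00b04e08, which is where the object itself begins.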

0:004> $ Since we suspect the 70 byte allocation, lets start dumping the memory of these objects for any clues

0:004> db 00b04e00
00b04e00  0f 00 07 00 15 01 08 00-3c 5a 5b 00 78 03 b0 00  ........<Z[.x...
00b04e10  00 00 00 00 00 00 00 00-80 4e b0 00 00 00 00 00  .........N......
00b04e20  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00b04e30  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00b04e40  0f 00 00 00 00 00 00 00-41 70 70 20 54 6f 74 61  ........App Tota
00b04e50  6c 00 00 00 00 00 00 00-09 00 00 00 0f 00 00 00  l...............
00b04e60  00 00 00 00 d0 4d b0 00-5c fb d3 e4 e9 c3 02 00  .....M..\.......
00b04e70  01 00 00 00 00 00 00 00-0f 00 0f 00 1a 01 08 00  ................

0:004> $ Looks like an ASCII string in it, perhaps an eye catcher field in a structure or object...

0:004> $ For C++ objects, the first DWORD is the VTABLE pointer, and sometimes these pointers have class names in symbols
0:004> $ Lets try the dds command to see if it can identify any symbol

0:004> dds 00b04e00
00b04e00  0007000f
00b04e04  00080115
00b04e08  005b5a3c MyFaultyService!CTimer::`vftable'
00b04e0c  00b00378
00b04e10  00000000
00b04e14  00000000
00b04e18  00b04e80
00b04e1c  00000000
<Output truncated>

0:004> $ Lets try the dds command on another 70 byte entry to see if it can identify any symbol

0:004> dds 00b04f68
00b04f68  000f000f
00b04f6c  00080138
00b04f70  005b5a3c MyFaultyService!CTimer::`vftable'
00b04f74  00b00210
00b04f78  0038ce92
00b04f7c  00000000
<Output truncated>
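Note that dds was started at the heap entry address, so the vftable pointer shows up 8 bytes in, right after the entry header (00b04e08 and 00b04f70). Both objects carry the same pointer because all objects of one class share one vtable. The sketch below illustrates that with a hypothetical stand-in class; it relies on the vtable pointer being the first pointer-sized word of the object, which common compilers do for a simple class like this but the C++ standard does not guarantee.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a polymorphic class such as CTimer.
class Timer {
public:
    virtual ~Timer() = default;
    virtual void fire() {}
    int interval = 0;
};

// Reads the first pointer-sized word of an object. On common ABIs this is the
// vtable pointer; this layout is an assumption, not a language guarantee.
std::uintptr_t firstWord(const Timer& t) {
    std::uintptr_t word = 0;
    std::memcpy(&word, static_cast<const void*>(&t), sizeof(word));
    return word;
}

// True when two objects carry the same vtable pointer.
bool sameVtable(const Timer& a, const Timer& b) {
    return firstWord(a) == firstWord(b);
}
```

Two Timer objects with different member values still report the same first word, which is the pattern we see repeated across the 0x70-byte blocks in the dump.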

Looks to be a problem with the class CTimer inside the binary. At this point, we are in a position to examine the class and its objects to see whether a possible problem can be detected.

As an additional check, we can try to see whether our timer object indeed takes up 0x70 bytes.

0:004> .expr
Current expression evaluator: MASM - Microsoft Assembler expressions
0:004> .expr /s c++
Current expression evaluator: C++ - C++ source expressions
0:004> ??sizeof(myfaultyservice!Ctimer)

unsigned int 0x70

Which confirms our hunch.

We were lucky, since this was a C++ object and it had a VTABLE. We were also lucky that it is our own code and we had the symbols for it. Finding such objects in memory without symbols can be an even more daunting task.

Also note that having many instances of an object does not, by itself, mean that it leaks. The debugger is trying to help you here, but you are the best judge of your code.



Tuesday 23 September 2014

x86 : Function Calling Conventions


What are calling conventions?

Calling conventions are a standardized protocol for invoking subroutines (also called functions). Each platform expects compilers to follow certain rules while generating code: how arguments are passed to a subroutine, how the stack is set up, where to return to once the subroutine completes, who cleans up the stack, and so on. Such rules make up a calling convention. In short, a calling convention specifies how a compiler sets up access to a subroutine. Different compilers can choose to generate code in different ways, and in theory, code from any compiler can be interfaced together, so long as the functions all use the same calling conventions.

While debugging, this knowledge sometimes becomes crucial for identifying certain defects. For example, a stack overflow might require us to know which calling convention was followed and thus how the overflow happened. I would encourage the reader to follow this article and the code examples given here thoroughly, since we will be using this knowledge in many of the future crash dump analysis scenarios.

Calling conventions specify how arguments are passed to a function, how return values are passed back out of a function, how the function is called, and how the function manages the stack and its stack frame. We will discuss all of these in greater detail shortly. In short, the calling convention specifies how a function call in C or C++ is converted into assembly language. 

There are many ways for this translation to occur, and each compiler can choose to do it slightly differently, which is why it is so important to specify certain standard methods. If these standard conventions did not exist, it would be nearly impossible for programs created using different compilers to communicate and interact with one another.

Major calling conventions that are used with the C language are: 
  • STDCALL 
  • CDECL 
  • FASTCALL

In addition, there is another calling convention typically used with C++: 
  • THISCALL 

Older programming languages used other conventions, for example:
  • PASCAL
  • FORTRAN 
These are not widely used any more, so I will limit the discussion to the C and C++ ones.

Argument passing

A function can have more than one argument, and the order of those arguments matters for the execution of the function/subroutine. Example:

int foo(int a, int b);

A function with this signature takes two arguments, so the assembly equivalent must pass these two arguments one after the other in the right order. If the arguments are passed in left-to-right sequence, the assembly would look like:

push a
push b
call foo

However, if they are passed right to left, the assembly would look like:

push b
push a
call foo

This is crucial. Suppose foo is part of a library compiled on the assumption that arguments will be passed to it in left-to-right order, and this library is used by another compiler and linker pair that assumes the right-to-left convention. The function foo will then get its arguments in the wrong order, which could be catastrophic.
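The effect of such a mismatch can be sketched with a toy model of the stack. This is only an illustration: the vector stands in for the stack (its last element is the top), the return address is left out, and all names are hypothetical.

```cpp
#include <vector>

// A toy model of the stack: the last element is the top of the stack.
using Stack = std::vector<int>;

// Caller that pushes right to left (b first, then a).
Stack pushRightToLeft(int a, int b) { return Stack{b, a}; }

// Caller that pushes left to right (a first, then b).
Stack pushLeftToRight(int a, int b) { return Stack{a, b}; }

// Callee compiled to expect right-to-left pushes: the first argument is on top.
int subtractCallee(const Stack& s) {
    int a = s[s.size() - 1]; // top of the stack
    int b = s[s.size() - 2]; // one slot below the top
    return a - b;
}
```

When caller and callee agree, subtractCallee(pushRightToLeft(5, 3)) yields 2. When the caller pushes left to right instead, the same callee computes 3 - 5 and returns -2 without any error being reported.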

Return Value


Some functions return a value, and that value must be received reliably by the function's caller. The called function places its return value in a place where the calling function can get it when execution returns. The called function stores the return value before executing the assembly ret instruction.

Caller - The Calling function

The "parent" function that calls the subroutine. Execution resumes in the calling function directly after the subroutine call, unless the program terminates inside the subroutine.

Callee - Called function

The "child" function that gets called by the "parent."

Stack Cleanup

Arguments are passed to a function by putting them on the stack, an operation called 'pushing'. Arguments that are pushed onto the stack must eventually be removed from it, an operation called 'popping'. Whichever function is responsible for cleaning the stack, the caller or the callee, must also reset the stack pointer to eliminate the passed arguments.

Name Decoration 

When C code is translated to assembly code, the compiler will often "decorate" the function name by adding extra information that the linker will use to find and link to the correct functions. For most calling conventions, the decoration is very simple (often only an extra symbol or two to denote the calling convention), but in some extreme cases (notably C++ member functions, which use the THISCALL convention), the names are "mangled" severely. This is done because C++ has function overloading, so two functions with the same name can have different parameters and implementations. Each compiler has its own mangling rules, and it is often possible to identify the compiler used to generate the code by looking at the mangled names. To understand the different decoration methods, please refer to this article here.
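As a rough illustration of how simple the C decorations are, the sketch below builds the decorated names that 32-bit Microsoft compilers emit for the three C conventions covered in this article. The enum and function names are hypothetical, and argBytes is assumed to be the total size of the arguments with each one already rounded up to a multiple of 4.

```cpp
#include <string>

enum class Convention { Cdecl, Stdcall, Fastcall };

// Decorated name of a C function under the 32-bit Microsoft scheme.
// argBytes: total argument size in bytes (ignored for CDECL).
std::string decorate(const std::string& name, Convention cc, unsigned argBytes) {
    switch (cc) {
    case Convention::Cdecl:    return "_" + name;
    case Convention::Stdcall:  return "_" + name + "@" + std::to_string(argBytes);
    case Convention::Fastcall: return "@" + name + "@" + std::to_string(argBytes);
    }
    return name; // not reached for valid enum values
}
```

For a function foo taking two ints, this gives _foo, _foo@8 and @foo@8 respectively.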


Prologues - Entry sequence

The prologue is the few instructions at the beginning of a function, which prepare the stack and registers for use within the function.

Epilogues - Exit sequence

A set of a few instructions at the end of a function, which restore the stack and registers to the state expected by the caller, and return to the caller. Some calling conventions clean the stack in the exit sequence.

Call sequence

A few instructions in the middle of a function (the caller) which pass the arguments and call the called function. After the called function has returned, some calling conventions have one more instruction in the call sequence to clean the stack.

Let us now examine how the assembly looks when a function is defined using these different conventions. As mentioned earlier, we will only look into the C and C++ conventions here.


Standard C Calling Conventions

Note: All code examples and assembly listings assume 32-bit x86, which means that the pointer size is 4 bytes.

CDECL

This is the default calling convention used by C compilers on x86. Other calling conventions are also used, but the keywords that select them are compiler extensions and not part of standard ANSI C.

Assume this simple function below: 

int __cdecl foo(int a, int b)
{
    return (a+b);
}

Let's also assume that it is called from another function (the caller) as follows:

nRet = foo(1, 2);

This will produce the following assembly listings (more or less).


_foo:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    mov edx, [ebp + 12]
    add eax, edx 
    pop ebp
    ret

For the caller it will look like :

push 2
push 1
call _foo
add esp, 8

When translated to assembly code, CDECL functions are almost always prepended with an underscore (that's why all previous examples have used "_" in the assembly code). To learn more about decorations please refer to this article here.


So we see that in the CDECL calling convention the following holds:
  • Arguments are passed to functions by pushing them onto the stack in right-to-left order.
  • Return values are passed back in eax.
  • The calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions), since only the caller knows how many arguments it pushed. For this reason the number of argument bytes is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used. Variadic functions such as printf, which are implemented with va_start() and va_arg(), use this convention.
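A small variadic function shows why caller cleanup is the natural fit: the callee cannot know at compile time how many arguments each call site pushes. This is a minimal sketch; sumInts is a hypothetical name, and the count parameter is how this particular function learns how many values follow.

```cpp
#include <cstdarg>

// Sums 'count' int arguments. Only the caller knows how many arguments it
// passed, which is why caller cleanup suits variadic functions.
int sumInts(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call such as sumInts(3, 1, 2, 3) passes four values while sumInts(0) passes one, yet both reach the same compiled function body.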

We see that the caller adjusts the stack by adding 8 to ESP. This is because two 4-byte arguments were pushed onto the stack. The following might make it easier to understand:

Once the prologue has run, the contents of the stack frame are addressed relative to the EBP register:

EBP + 0  : saved EBP of the caller
EBP + 4  : return address
EBP + 8  : first parameter
EBP - 4  : first local variable

That is why, in the above assembly, the values pushed onto the stack are accessed with:
    mov eax, [ebp + 8]
    mov edx, [ebp + 12]
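The offsets follow a simple pattern, sketched below under the assumptions used throughout this article: 32-bit x86, a standard EBP-based frame, and arguments that each occupy one 4-byte slot. The names are hypothetical.

```cpp
// Assumptions: 32-bit x86, standard EBP frame, every argument occupies 4 bytes.
constexpr int kSlotBytes = 4;

// Offset from EBP of the argument at zero-based position 'index'.
// The two slots skipped are the saved EBP and the return address.
constexpr int argOffsetFromEbp(int index) {
    return 2 * kSlotBytes + index * kSlotBytes;
}

// Bytes a CDECL caller must add to ESP after the call returns.
constexpr int callerCleanupBytes(int argCount) {
    return argCount * kSlotBytes;
}
```

For foo(1, 2) this gives offsets 8 and 12 for the two arguments and a cleanup of 8 bytes, matching the add esp, 8 in the caller.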

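EBP (the frame pointer in the frame-based model) marks the start of the current stack frame. The locals live below it, at lower addresses, while the saved EBP, the return address and the arguments live at and above it.

The other register used for stack operations is ESP, the stack pointer, which always points to the top of the stack, that is, the most recently pushed item. A stack overflow shows up as a fault when ESP moves past the end of the memory set aside for the stack.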

For the above function, which takes two arguments, the stack looks like this just after the prologue (higher addresses at the top):

       2
       1
       RET ADDR
EBP -> saved EBP  <- ESP

That brings us to the question: why do we not see a push for the return address when we examine a disassembly? This is because the CALL instruction itself pushes the return address. The caller's EBP, on the other hand, is saved by the push ebp in the callee's prologue.

Now let's have a look at the same function and its call when we use STDCALL instead.

STDCALL

STDCALL, also known as "WINAPI" (and a few other names, depending on where you are reading it) is used almost exclusively by Microsoft as the standard calling convention for the Win32 API. Since STDCALL is strictly defined by Microsoft, all compilers that implement it do it the same way.

Consider the function:

int __stdcall foo(int a, int b)
{
    return (a+b);
}

...and it's called as:

nRet = foo(1, 2);

The corresponding assembly code might look similar to this:

_foo@8:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    mov edx, [ebp + 12]
    add eax, edx
    pop ebp
    ret 8

and the caller looks like:

push 2
push 1
call _foo@8

So we see that in the STDCALL calling convention the following holds:
  • The called function cleans the stack. In the function body, the ret instruction has an (optional) operand that indicates how many bytes to pop off the stack when the function returns, which is why the caller has no add esp, 8 after the call.
  • Once again, like the CDECL, the arguments are passed by pushing them onto the stack in right-to-left order.
  • STDCALL functions are name-decorated with a leading underscore before the function name, followed by an @ and then the size (in bytes) of the arguments passed on the stack. This number will always be a multiple of 4 on a 32-bit aligned machine. More about name decorations can be found here.

FASTCALL

The FASTCALL calling convention is not completely standard across all compilers, so it should be used with caution. In FASTCALL, the first 2 or 3 32-bit (or smaller) arguments are passed in registers. Which registers are used depends on the compiler: Microsoft compilers pass the first two such arguments in ecx and edx, while some other compilers use eax, edx, and ecx. Additional arguments, or arguments larger than 4 bytes, are passed on the stack, usually in right-to-left order (similar to CDECL). With Microsoft compilers the called function is responsible for cleaning the stack, if needed.

Because of the ambiguities, it is recommended that FASTCALL be used only in situations with 1, 2, or 3 32-bit arguments, where speed is essential.

Consider the function:

int __fastcall foo(int a, int b)
{
    return (a + b);
}

and the caller looks like :

nRet = foo(1, 2);

The assembly for this pair might look similar to this (using the Microsoft register assignment of ecx and edx):

@foo@8:
    push ebp
    mov ebp, esp ; many compilers create a stack frame even if it isn't used
    mov eax, ecx ; a is in ecx
    add eax, edx ; b is in edx
    pop ebp
    ret

and the caller's code would look like:

mov ecx, 1
mov edx, 2
call @foo@8

Many compilers still produce a stack frame for FASTCALL functions, especially in situations where the FASTCALL function itself calls another subroutine. However, if a FASTCALL function doesn't need a stack frame, optimizing compilers are free to omit it.

So we see that in the FASTCALL calling convention the following holds:
  • When some arguments are passed on the stack, the ret instruction in the function body has an operand that indicates how many bytes to pop off the stack when the function returns. In the example above both arguments fit in registers, so a plain ret is enough.
  • Unlike the CDECL or STDCALL, the arguments are passed through the registers ecx and edx. This means that they don't need to be loaded from the stack inside the function before being accessed, which is what we see in the case of CDECL or STDCALL. But what happens when there are more arguments than there are registers? In that case, after running out of registers, the rest of the arguments are in fact pushed onto the stack like in the other calling conventions.
  • The name decoration for FASTCALL prepends an @ to the function name, and follows the function name with @x, where x is the size (in bytes) of the arguments passed to the function. More details can be found here.
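The register assignment can be sketched as a small routine. It models the Microsoft flavour of FASTCALL only, under the simplifying assumption that every argument of 4 bytes or less is an integer or pointer and therefore a register candidate; the function name is hypothetical.

```cpp
#include <string>
#include <vector>

// Where each argument travels under Microsoft __fastcall, given its size in bytes.
// The first two arguments of 4 bytes or less, scanning left to right, take ecx
// and edx; everything else goes on the stack.
std::vector<std::string> fastcallLocations(const std::vector<unsigned>& argSizes) {
    const char* registers[] = {"ecx", "edx"};
    unsigned used = 0;
    std::vector<std::string> where;
    for (unsigned size : argSizes) {
        if (size <= 4 && used < 2) {
            where.push_back(registers[used++]);
        } else {
            where.push_back("stack");
        }
    }
    return where;
}
```

For three 4-byte arguments the result is ecx, edx, stack. An 8-byte first argument is skipped over, so the sizes {8, 4} give stack, ecx.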

Standard C++ Calling Conventions

THISCALL

C++ requires that non-static methods of a class be called on an instance of the class. Thus there must be a mechanism in place to ensure that a pointer to the object is passed to the function. In THISCALL, the pointer to the class object is passed in ecx, the arguments are passed right to left on the stack, and the return value is passed in eax.

Assuming we have a class bar with the non-static member function foo(), a call to foo might look like:

barObj.foo(a, b);

Ignoring name mangling for the moment (which C++ applies by default), the function call would look like:

lea ecx, [barObj] ; ecx = this, the address of barObj
push b
push a
call _foo
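One way to picture THISCALL is that the object pointer is a hidden first argument. The sketch below writes the same logic twice, once as a member function and once as a free function with the pointer made explicit. The data member, the constructor and bar_foo are hypothetical additions for illustration.

```cpp
// Hypothetical version of the article's class 'bar' with a small data member.
class bar {
public:
    explicit bar(int base) : base_(base) {}
    int foo(int a, int b) { return base_ + a + b; } // 'this' is passed implicitly
    int base() const { return base_; }
private:
    int base_;
};

// The same logic as a free function: the object pointer becomes an explicit
// first parameter, which is the value THISCALL passes in ecx.
int bar_foo(bar* self, int a, int b) {
    return self->base() + a + b;
}
```

Given an object constructed as bar obj(10), both obj.foo(1, 2) and bar_foo(&obj, 1, 2) compute the same result from the same three inputs.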
To understand C++ name mangling please refer to this.

Here is an example of a C++ class with the above function declared, and how the name might get mangled:

class bar {
public:
    int foo(int a, int b);
};

barObj.foo(1, 2);

The resultant mangled name, as produced by the Microsoft compiler, might look like:

?foo@bar@@QAEHHH@Z

As mentioned in this article here, when we actually debug a binary without symbols, we won't get any help regarding the calling convention from the function's name. Such symbols are stripped off the binary, unless public symbols for them are available. In such cases we would need to use the other hints shown above to understand the true nature of a call. We will cover such symbol-less debugging in another article.