Jump$: September 2014

Thursday, 25 September 2014

WinDbg : How To Debug Memory Leaks With The !heap Command

Memory and resource leaks are best debugged on a live system, and there are several user-mode and kernel-mode tools available to help us. But there are times when all we get is a process or kernel crash dump file, and the reason shown is that the entire virtual address space was consumed! These types of crashes are more likely to be seen on older 32-bit operating systems: the 64-bit address space is huge and hence hard to exhaust, and the newer operating systems have more efficient heap managers and memory managers. Before we go on, I would recommend that you read this article here about the introductory concepts of heaps.

Throughout this article we will be focusing on a user-mode service crash on a 32-bit Windows 2003 OS.

Commands used:

!heap (introduced for the first time in this article)
dds (this command is used in this article here). For more details of the d command, please read this.
$
||
.expr
??

The dump file used here is a full user mode dump.

0:004> $ Let's start by seeing what kind of dump file we are dealing with...

0:004> ||

. 0 Full memory user mini dump: E:\Public\PID-18800__MYFAULTYSERVICE.EXE__2nd_chance_CPlusPlusEH__full_19ac_2014-09-09_15-54-23-328_4970.dmp

0:004> $ Okay, so it is a full memory user dump. Now let's see the state of the heap

0:004> !heap -s

Heap     Flags      Reserv   Commit     Virt  Free   List  UCR   Virt  Lock Fast
                       (k)      (k)      (k)   (k) length      blocks cont. heap
-----------------------------------------------------------------------------
00150000 00000002     4096      644     1244    74      8   10      0     0    L
00250000 00008000       64       12       12    10      1    1      0     0
007e0000 00001002       64       52       52     5      1    1      0     0    L
00800000 00000002     1024       20       20     3      1    1      0     0    L
00a00000 00001002     4096     2388     2388    13      5    3      0     0    L
Virtual block: 01040000 - 01040000 (size 00000000)
00b00000 00001002  1960256  1960256  1960256    70      0    0      1     7    L
00f30000 00001002       64       28       28     7      4    1      0     0    L
01520000 00001002       64       16       16     4      1    1      0     0    L
06330000 00001002     7232     4060     5824  1077    226   24      0     0    L
External fragmentation 26 % (226 free blocks)
06350000 00001003     1280      220      268    37      7    7      0   N/A
065c0000 00001003      256        4        4     2      1    1      0   N/A
06a90000 00001003      256        4        4     2      1    1      0   N/A
06ad0000 00001003      256        4        4     2      1    1      0   N/A
06b10000 00001003      256        4        4     2      1    1      0   N/A
06390000 00001002       64       16       16     3      1    1      0     0    L
-----------------------------------------------------------------------------

0:004> $ The largest allocations seem to be happening in the heap 00b00000.

0:004> $ That doesn't mean it is leaking, but it is a good starting point

0:004> $ using the heap's statistics, we can see the types of allocations happening in it

0:004> !heap -stat -h 00b00000

heap @ 00b00000

group-by: TOTSIZE max-display: 20

size #blocks total ( %) (percent of total busy bytes)

70 84a897 - 3a09c210 (53.53)

30 aa8149 - 1ff83db0 (29.49)

60 25da12 - e31c6c0 (13.09)

38 12ecf1 - 423d4b8 (3.82)

a8980 1 - a8980 (0.04)

40 97a - 25e80 (0.01)

200 106 - 20c00 (0.01)

47 3d9 - 1112f (0.00)

18 725 - ab78 (0.00)

d0 80 - 6800 (0.00)

90 96 - 5460 (0.00)

50 fa - 4e20 (0.00)

20 1fa - 3f40 (0.00)

8c 4e - 2aa8 (0.00)

80 54 - 2a00 (0.00)

24 129 - 29c4 (0.00)

78 3e - 1d10 (0.00)

800 3 - 1800 (0.00)

28 74 - 1220 (0.00)

1000 1 - 1000 (0.00)

0:004> $ In this case, there seems to be an allocation size of 0x70 bytes,

0:004> $ which is consuming 53.53 percent of the busy bytes in this heap

0:004> $ This allocation is definitely a candidate for suspicion

0:004> $ Let's try to get all information about this heap (this command can take a really long time)

0:004> !heap -a 00b00000

Index Address Name Debugging options enabled

6: 00b00000

Segment at 00b00000 to 00b10000 (00010000 bytes committed)

Segment at 00b10000 to 00c10000 (00100000 bytes committed)

Segment at 77980000 to 77b5e000 (001de000 bytes committed)

Segment at 7c4e0000 to 7c6be000 (001de000 bytes committed)

Segment at 7dd60000 to 7df3f000 (001df000 bytes committed)

Flags: 00001002

ForceFlags: 00000000

Granularity: 8 bytes

Segment Reserve: 77b40000

Segment Commit: 00002000

DeCommit Block Thres: 00000200

DeCommit Total Thres: 00002000

Total Free Size: 00002365

Max. Allocation Size: 7ffdefff

Lock Variable at: 00b00608

Next TagIndex: 0000

Maximum TagIndex: 0000

Tag Entries: 00000000

PsuedoTag Entries: 00000000

Virtual Alloc List: 00b00050

Unable to read nt!_HEAP_VIRTUAL_ALLOC_ENTRY structure at 01040000

UCR FreeList: 010f0d70

UCRSegment - 010f0000: 00003000 . 00010000

FreeList Usage: 0000027c 00000000 00000000 00000000

FreeList[ 02 ] at 00b00188: 7df0e970 . 00b26798

Unable to read nt!_HEAP_FREE_ENTRY structure at 00b26798

FreeList[ 03 ] at 00b00190: 5766a650 . 0189e728

Unable to read nt!_HEAP_FREE_ENTRY structure at 0189e728

FreeList[ 04 ] at 00b00198: 5b228800 . 0178cee0

Unable to read nt!_HEAP_FREE_ENTRY structure at 0178cee0

FreeList[ 05 ] at 00b001a0: 64bb7cc0 . 00b4cc70

Unable to read nt!_HEAP_FREE_ENTRY structure at 00b4cc70

FreeList[ 06 ] at 00b001a8: 016a1418 . 00b286d0

Unable to read nt!_HEAP_FREE_ENTRY structure at 00b286d0

FreeList[ 09 ] at 00b001c0: 7defe050 . 7defe050

Unable to read nt!_HEAP_FREE_ENTRY structure at 7defe050

Segment00 at 00b00640:

Flags: 00000000

Base: 00b00000

First Entry: 00b00680

Last Entry: 00b10000

Total Pages: 00000010

Total UnCommit: 00000000

Largest UnCommit:00000000

UnCommitted Ranges: (0)

Heap entries for Segment00 in Heap 00b00000

address: psize . size flags state (requested size)

00b00640: 00640 . 00040 [01] - busy (40)

00b04dc8: 00018 . 00038 [01] - busy (30)

00b04e00: 00038 . 00078 [01] - busy (70)

00b04e78: 00078 . 00078 [01] - busy (70)

00b04ef0: 00078 . 00078 [01] - busy (70)

00b04f68: 00078 . 00078 [01] - busy (70)

00b04fe0: 00078 . 00078 [01] - busy (70)

00b05058: 00078 . 00038 [01] - busy (30)

00b05090: 00038 . 00038 [01] - busy (30)

00b050c8: 00038 . 00078 [01] - busy (70)

00b05140: 00078 . 00028 [01] - busy (20)

00b05168: 00028 . 00078 [01] - busy (70)

00b051e0: 00078 . 00038 [01] - busy (30)

00b05218: 00038 . 00078 [01] - busy (70)

00b05290: 00078 . 00028 [01] - busy (20)

00b052b8: 00028 . 00078 [01] - busy (70)

00b05330: 00078 . 00028 [01] - busy (20)

00b05358: 00028 . 00078 [01] - busy (70)

0:004> $ Since we suspect the 0x70 byte allocation, let's start dumping the memory of these objects for any clues

0:004> db 00b04e00

00b04e00 0f 00 07 00 15 01 08 00-3c 5a 5b 00 78 03 b0 00 ........<Z[.x...

00b04e10 00 00 00 00 00 00 00 00-80 4e b0 00 00 00 00 00 .........N......

00b04e20 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

00b04e30 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

00b04e40 0f 00 00 00 00 00 00 00-41 70 70 20 54 6f 74 61 ........App Tota

00b04e50 6c 00 00 00 00 00 00 00-09 00 00 00 0f 00 00 00 l...............

00b04e60 00 00 00 00 d0 4d b0 00-5c fb d3 e4 e9 c3 02 00 .....M..\.......

00b04e70 01 00 00 00 00 00 00 00-0f 00 0f 00 1a 01 08 00 ................

0:004> $ Looks like there is an ASCII string in it, perhaps an eye catcher field in a structure or object...

0:004> $ For C++ objects with virtual functions, the first DWORD of the object is the vtable pointer, and these pointers often resolve to class names in symbols

0:004> $ Let's try the dds command to see if it can identify any symbol

0:004> dds 00b04e00

00b04e00 0007000f

00b04e04 00080115

00b04e08 005b5a3c MyFaultyService!CTimer::`vftable'

00b04e0c 00b00378

00b04e10 00000000

00b04e14 00000000

00b04e18 00b04e80

00b04e1c 00000000

0:004> $ Let's try the dds command on another 0x70 byte entry to see if it can identify any symbol

0:004> dds 00b04f68

00b04f68 000f000f

00b04f6c 00080138

00b04f70 005b5a3c MyFaultyService!CTimer::`vftable'

00b04f74 00b00210

00b04f78 0038ce92

00b04f7c 00000000

Both 0x70 byte entries hold a CTimer vtable pointer, so this looks to be a problem with the class CTimer inside the binary. At this point, we are in a position to examine the class and its objects to see if a possible problem can be detected.
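The relationship between the entry addresses printed by !heap -a and the objects seen with dds can be sketched in a few lines of C. This is only an illustration under stated assumptions: it assumes the legacy 32-bit heap layout seen in this dump, where every busy block starts with an 8-byte header whose first 16-bit word is the block size in 8-byte units and whose second 16-bit word is the previous block's size. The function names are made up for this sketch.

```c
#include <stdint.h>

/* Assumptions taken from the listing above (32-bit Windows 2003 heap):
 * an 8-byte header precedes the user data, and !heap -a reports a
 * granularity of 8 bytes. */
#define HEAP_ENTRY_HEADER 8u
#define HEAP_GRANULARITY  8u

/* Address the application received for a given heap entry address. */
uint32_t user_pointer(uint32_t entry_address)
{
    return entry_address + HEAP_ENTRY_HEADER;
}

/* Block size the heap needs for a requested size: add the header,
 * then round up to the granularity. */
uint32_t block_size(uint32_t requested)
{
    uint32_t raw = requested + HEAP_ENTRY_HEADER;
    return (raw + HEAP_GRANULARITY - 1u) & ~(HEAP_GRANULARITY - 1u);
}

/* Size of this block in bytes, decoded from the first DWORD of the header
 * (low 16 bits, counted in granularity units). */
uint32_t size_from_header(uint32_t first_dword)
{
    return (first_dword & 0xffffu) * HEAP_GRANULARITY;
}

/* Size of the previous block in bytes (high 16 bits of the same DWORD). */
uint32_t prev_size_from_header(uint32_t first_dword)
{
    return (first_dword >> 16) * HEAP_GRANULARITY;
}
```

With the values from the dump, user_pointer(0x00b04e00) gives 0x00b04e08, which is exactly where dds shows the CTimer vftable pointer, and block_size(0x70) gives the 0x78 printed in the size column. The first DWORD at 00b04e00, 0x0007000f, decodes to a size of 0x78 and a previous size of 0x38, matching the psize and size columns for that entry.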

As an additional check, we can try to see whether our timer object indeed takes up 0x70 bytes.

0:004> .expr
Current expression evaluator: MASM - Microsoft Assembler expressions
0:004> .expr /s c++
Current expression evaluator: C++ - C++ source expressions
0:004> ??sizeof(myfaultyservice!Ctimer)

unsigned int 0x70

Which confirms our hunch.

We were lucky, since this was a C++ object and it had a vtable. We were also lucky that it is our own code and we had the symbols for it. Finding such objects in memory without symbols can be an even more daunting task.

Also note that just because there are many instances of an object, it doesn't mean that it leaks. The debugger is trying to help you here, but you are the best judge of your code.
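The figures in the !heap -stat output earlier can be cross-checked by hand, since every value in that table is hexadecimal: the total column is simply size times #blocks. A minimal sketch in C (the function name is made up for this example):

```c
#include <stdint.h>

/* One row of !heap -stat: allocation size and block count, both as
 * printed by the debugger in hexadecimal. Returns the total bytes. */
uint64_t total_bytes(uint32_t size, uint32_t blocks)
{
    /* Widen before multiplying so large heaps cannot overflow 32 bits. */
    return (uint64_t)size * blocks;
}
```

For the suspect row, total_bytes(0x70, 0x84a897) yields 0x3a09c210, the value shown in the total column. That is 973,718,032 bytes, or roughly 929 MB held by 0x70 byte blocks alone, in a process that only has a 32-bit address space to work with.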

Tuesday, 23 September 2014

x86 : Function Calling Conventions

What are calling conventions?

Calling conventions are a standardized protocol for invoking subroutines (also called functions). Each platform expects compilers to follow certain rules while generating code: how arguments are passed to a subroutine, how the stack is set up, where to return to once the function/subroutine execution completes, who cleans up the stack, etc. Such rules make up a calling convention. So in short, a calling convention specifies the method that a compiler sets up to access a subroutine. Different compilers can choose to generate code in different fashions, and in theory, code from any compiler can be interfaced together, so long as the functions all use the same calling convention.

While debugging, this knowledge sometimes becomes crucial for identifying certain defects. For example, analysing a stack overflow might require us to know which calling convention was followed and thus how the overflow happened. I would encourage the reader to follow this article and the code examples given here thoroughly, since we will be using this knowledge in many of the future crash dump analysis scenarios.

Calling conventions specify how arguments are passed to a function, how return values are passed back out of a function, how the function is called, and how the function manages the stack and its stack frame. We will discuss all of these in greater detail shortly. In short, the calling convention specifies how a function call in C or C++ is converted into assembly language.

There are many ways for this translation to occur, and each compiler can choose to do it slightly differently, which is why it's so important to specify certain standard methods. If these standard conventions did not exist, it would be nearly impossible for programs created using different compilers to communicate and interact with one another.

Major calling conventions that are used with the C language are:

STDCALL
CDECL
FASTCALL

In addition, there is another calling convention typically used with C++:

THISCALL

Older programming languages used other conventions, for example:

PASCAL
FORTRAN

These are not very widely used any more, so I will limit the scope of this discussion to the C and C++ ones.

Argument passing

A function can have more than one argument, and the order of those arguments matters for the execution of the function/subroutine. Example:

int foo(int a, int b);

A function with this signature takes two arguments, so the assembly equivalent needs to pass these two arguments one after the other in the right order. If the arguments are passed in the left to right sequence then the assembly would look like :

push a

push b

call foo

however, if they are passed right to left, then the assembly would look like:

push b

push a

call foo

This is crucial. Suppose foo is part of a library compiled with the assumption that arguments will be passed to it in left to right order, and this library is used by another compiler and linker pair which assumes that the convention is right to left. The function foo will then receive its arguments in the wrong order, which could be catastrophic.
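The effect of such a mismatch can be simulated in portable C. The sketch below is not real calling-convention code: it models the argument area of the stack as a two-element array, and all names in it are invented for the illustration. The callee is written for right-to-left pushing, so it expects a at the lower address and b above it.

```c
#include <stdint.h>

typedef enum { PUSH_RIGHT_TO_LEFT, PUSH_LEFT_TO_RIGHT } PushOrder;

/* Callee built for right-to-left pushing: the first argument is the
 * last one pushed, so it sits at the lowest address (index 0). */
static int32_t callee_sub(const int32_t *args)
{
    int32_t a = args[0];
    int32_t b = args[1];
    return a - b;
}

/* Simulates a caller that pushes (a, b) in the given order and then
 * calls callee_sub. The stack grows downwards, so whatever is pushed
 * last ends up at index 0. */
int32_t call_sub(int32_t a, int32_t b, PushOrder caller_order)
{
    int32_t stack[2];
    if (caller_order == PUSH_RIGHT_TO_LEFT) {
        stack[1] = b;   /* push b */
        stack[0] = a;   /* push a */
    } else {
        stack[1] = a;   /* push a */
        stack[0] = b;   /* push b */
    }
    return callee_sub(stack);
}
```

When both sides agree, call_sub(5, 3, PUSH_RIGHT_TO_LEFT) returns 2. With a caller that pushes left to right, the same call returns -2, because the callee silently reads the arguments swapped. Nothing crashes, which is what makes this class of defect hard to spot.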

Return Value

Some functions return a value, and that value must be received reliably by the function's caller. The called function places its return value in a place where the calling function can get it when execution returns. The called function stores the return value before executing the assembly ret instruction.

Caller - The Calling function

The "parent" function that calls the subroutine. Execution resumes in the calling function directly after the subroutine call, unless the program terminates inside the subroutine.

Callee - Called function

The "child" function that gets called by the "parent."

Stack Cleanup

Arguments are passed to a function by putting them on the stack. This is called 'pushing'. Arguments pushed onto the stack must eventually be removed from it, an operation called 'popping'. Whichever function is responsible for cleaning the stack, the caller or the callee, must reset the stack pointer to discard the passed arguments.

Name Decoration

When C code is translated to assembly code, the compiler will often "decorate" the function name by adding extra information that the linker will use to find and link to the correct functions. For most calling conventions, the decoration is very simple (often only an extra symbol or two to denote the calling convention), but in some extreme cases (notably the C++ THISCALL convention), the names are "mangled" severely. This is done because C++ has function overloading, so two functions with the same name might actually have different implementations. Each compiler has its own mangling rules, and it is possible to identify the compiler used to generate the assembly by looking at the mangled names. To understand the different decoration methods, please refer to this article here.

Prologues - Entry sequence

The prologue is the few instructions at the beginning of a function, which prepare the stack and registers for use within the function.

Epilogues - Exit sequence

A set of a few instructions at the end of a function, which restore the stack and registers to the state expected by the caller, and return to the caller. Some calling conventions clean the stack in the exit sequence.

Call sequence

A few instructions in the middle of a function (the caller) which pass the arguments and call the called function. After the called function has returned, some calling conventions have one more instruction in the call sequence to clean the stack.

Let us now examine how the assembly looks when a function is defined using these different conventions. As mentioned earlier, we will only look into the C and C++ conventions here.

Standard C Calling Conventions

Note: All code examples and their assembly equivalents assume 32-bit x86, which means that the pointer size is 4 bytes.

CDECL

This is the default calling convention for the C programming language on x86. Other calling conventions are also used, but they must be requested explicitly with compiler-specific keywords that are not part of standard ANSI C.

Assume this simple function below:

int __cdecl foo(int a, int b)

{

return (a+b);

}

Also let's assume that it was called from another function (the caller) as follows:

nRet = foo(1, 2);

This will produce the following assembly listings (more or less).

_foo:

push ebp

mov ebp, esp

mov eax, [ebp + 8]

mov edx, [ebp + 12]

add eax, edx

pop ebp

ret

For the caller it will look like :

push 2

push 1

call _foo

add esp, 8

When translated to assembly code, CDECL functions are almost always prepended with an underscore (that's why all previous examples have used "_" in the assembly code). To learn more about decorations please refer to this article here.

So we see that in the CDECL calling convention the following holds:

Arguments are passed on the stack: the caller pushes them in Right-to-Left order.

The return value is passed back in eax.

The calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions). For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used. Variadic functions such as printf, which walk their arguments with va_start() and va_arg(), use this convention.
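A small variadic function shows why caller cleanup is needed. The sketch below is plain standard C; sum_ints is a name chosen for this example. The function itself only knows how many arguments to read because the caller tells it through count, so it has no way to emit a fixed ret N. Only the caller knows how many bytes it pushed.

```c
#include <stdarg.h>

/* Sums `count` int arguments that follow the count itself.
 * The number of pushed bytes differs from call site to call site,
 * which is why the caller, not the callee, has to clean the stack. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);   /* fetch the next int argument */
    va_end(args);

    return total;
}
```

A call such as sum_ints(2, 1, 2) pushes 12 bytes on 32-bit x86, while sum_ints(4, 1, 2, 3, 4) pushes 20, and in a CDECL build each call site would be followed by its own matching add esp.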

We see that the caller adjusts the stack by adding 8 to ESP. This is because two 4-byte arguments were pushed onto the stack. The following might make it easier to understand:

Inside a function that has set up a stack frame, the offsets from the EBP register hold the following:

EBP + 0 : is the caller's saved EBP

EBP + 4 : is the return address

EBP + 8 : is the first parameter

EBP - 4 : is the first local variable

That is why we see that in the above assembly, the values pushed into the stack are accessed by :

mov eax, [ebp + 8]

mov edx, [ebp + 12]

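EBP (the frame pointer, in the frame based model) marks the base of the current stack frame. Local variables live below it, at negative offsets. Above it are the caller's saved EBP, the return address and the arguments.

Another register used for stack operations is ESP, the stack pointer. ESP always points to the top of the stack, that is, the most recently pushed value: a push decrements it and a pop increments it. A stack overflow shows up as a fault, much like a page fault, when the stack grows beyond the memory reserved for it.

For the above function, which takes two arguments, the stack looks like this once the prologue has run (higher addresses at the top):

EBP + 12 -> 2 (argument b)

EBP + 8 -> 1 (argument a)

EBP + 4 -> RET ADDR

EBP, ESP -> saved EBP of the caller

That brings us to the question: why do we not see a push for the return address when we examine the caller's disassembly? This is because the CALL instruction itself pushes the return address onto the stack before jumping to the function. The caller's EBP is then saved by the push ebp at the start of the callee's prologue.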
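The whole CDECL sequence for nRet = foo(1, 2) can be replayed on a toy stack to check the EBP offsets. This is a simulation in portable C, not real machine state: the Machine and CallTrace types, the 64-byte stack and the return address 0x00401000 are all invented for the sketch.

```c
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* simulated stack, one slot per 4 bytes */
    uint32_t esp;              /* byte offset of the top of the stack */
    uint32_t ebp;              /* byte offset of the current frame base */
} Machine;

typedef struct {
    uint32_t ret_addr, first_arg, second_arg, result;
    uint32_t esp_before, esp_after;
} CallTrace;

static void push(Machine *m, uint32_t value)
{
    m->esp -= 4;                    /* the stack grows downwards */
    m->mem[m->esp / 4] = value;
}

static uint32_t load(const Machine *m, uint32_t addr)
{
    return m->mem[addr / 4];
}

CallTrace simulate_cdecl_call(uint32_t a, uint32_t b)
{
    Machine m = { {0}, STACK_WORDS * 4u, STACK_WORDS * 4u };
    CallTrace t;

    t.esp_before = m.esp;
    push(&m, b);                    /* caller: push 2 */
    push(&m, a);                    /* caller: push 1 */
    push(&m, 0x00401000u);          /* CALL pushes the return address */

    push(&m, m.ebp);                /* callee: push ebp */
    m.ebp = m.esp;                  /* callee: mov ebp, esp */

    t.ret_addr   = load(&m, m.ebp + 4);
    t.first_arg  = load(&m, m.ebp + 8);   /* mov eax, [ebp + 8]  */
    t.second_arg = load(&m, m.ebp + 12);  /* mov edx, [ebp + 12] */
    t.result     = t.first_arg + t.second_arg;

    m.ebp = load(&m, m.esp);        /* callee: pop ebp */
    m.esp += 4;
    m.esp += 4;                     /* callee: ret pops the return address */
    m.esp += 8;                     /* caller: add esp, 8 */
    t.esp_after = m.esp;
    return t;
}
```

Calling simulate_cdecl_call(1, 2) finds the return address at EBP + 4, the value 1 at EBP + 8 and the value 2 at EBP + 12, and ESP ends up where it started only after the caller's add esp, 8.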

Now let's have a look at the same function and its call when we use STDCALL instead.

STDCALL

STDCALL, also known as "WINAPI" (and a few other names, depending on where you are reading it) is used almost exclusively by Microsoft as the standard calling convention for the Win32 API. Since STDCALL is strictly defined by Microsoft, all compilers that implement it do it the same way.

Consider the function:

int __stdcall foo(int a, int b)

{

return (a+b);

}

...and it's called as:

nRet = foo(1, 2);

The corresponding assembly code might look similar to this:

_foo@8:

push ebp

mov ebp, esp

mov eax, [ebp + 8]

mov edx, [ebp + 12]

add eax, edx

pop ebp

ret 8

and the caller looks like:

push 2

push 1

call _foo@8

So we see that in the STDCALL calling convention the following holds:

The called function cleans the stack: in the function body, the ret instruction has an (optional) argument that indicates how many bytes to pop off the stack when the function returns (ret 8 above). This is why the caller has no add esp, 8 after the call.
Once again, like CDECL, the arguments are passed by pushing them onto the stack in Right-to-Left order.
STDCALL functions are name-decorated with a leading underscore, followed by an @, and then the number (in bytes) of arguments passed on the stack. This number will always be a multiple of 4, on a 32-bit aligned machine. More about name decorations can be found here.
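The difference between the two cleanup styles comes down to simple ESP arithmetic, which the following C sketch tracks. It is a model only: the function and enum names are invented, and the starting ESP value used in the usage note is an arbitrary example.

```c
#include <stdint.h>

typedef enum { CONV_CDECL, CONV_STDCALL } Convention;

/* Follows ESP through one call that passes `arg_bytes` on the stack.
 * Returns the final ESP and stores in *caller_cleanup how many bytes
 * the caller itself had to remove after the call returned. */
uint32_t esp_after_call(uint32_t esp, uint32_t arg_bytes,
                        Convention conv, uint32_t *caller_cleanup)
{
    esp -= arg_bytes;              /* caller pushes the arguments */
    esp -= 4;                      /* CALL pushes the return address */

    if (conv == CONV_STDCALL) {
        esp += 4 + arg_bytes;      /* callee: ret N pops both */
        *caller_cleanup = 0;
    } else {
        esp += 4;                  /* callee: plain ret */
        *caller_cleanup = arg_bytes;
        esp += *caller_cleanup;    /* caller: add esp, N */
    }
    return esp;
}
```

Starting from an ESP of 0x0012ff00 with 8 bytes of arguments, both conventions bring ESP back to 0x0012ff00. The only difference is who does the work: the caller removes 8 bytes under CDECL and 0 under STDCALL, which is the missing add esp, 8 we noticed in the STDCALL listing.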

FASTCALL

The FASTCALL calling convention is not completely standard across all compilers, so it should be used with caution. In FASTCALL, the first 2 or 3 32-bit (or smaller) arguments are passed in registers. Which registers are used depends on the compiler (Microsoft's __fastcall passes the first two in ecx and edx). Additional arguments, or arguments larger than 4 bytes, are passed on the stack, often in Right-to-Left order (similar to CDECL). Who cleans the stack also depends on the compiler; with Microsoft's __fastcall it is the called function, as in STDCALL.

Because of the ambiguities, it is recommended that FASTCALL be used only in situations with 1, 2, or 3 32-bit arguments, where speed is essential.

int __fastcall foo(int a, int b)

{

return(a + b);

}

and the caller looks like :

nRet = foo(1, 2);

The assembly for this pair might look similar to:

@foo@8:

push ebp

mov ebp, esp ;many compilers create a stack frame even if it isn't used

mov eax, ecx ;a arrives in ecx (Microsoft __fastcall)

add eax, edx ;b arrives in edx, the sum is returned in eax

pop ebp

ret

and the caller's code would look like :

mov ecx, 1

mov edx, 2

call @foo@8

Many compilers still produce a stack frame for FASTCALL functions, especially in situations where the FASTCALL function itself calls another subroutine. However, if a FASTCALL function doesn't need a stack frame, optimizing compilers are free to omit it.

So we see that in the FASTCALL calling convention the following holds:

In the function body, the ret instruction has an (optional) argument that indicates how many bytes to pop off the stack when the function returns. In the example above nothing was passed on the stack, so a plain ret is enough.
Unlike CDECL or STDCALL, the first arguments are passed through the registers ecx and edx, which means that they don't need to be loaded from the stack inside the function before being accessed, as we saw in the case of CDECL and STDCALL. But what happens when there are more arguments than there are registers? Well, in that case, after running out of registers, the rest of the arguments are in fact pushed onto the stack like in the other calling conventions.
The name decoration for FASTCALL prepends an @ to the function name, and follows the function name with @x, where x is the number (in bytes) of arguments passed to the function. More details can be found here.
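The three C decoration schemes described so far are regular enough to be written down as a small helper. The sketch below only reproduces the rules stated in this article for 32-bit x86; the decorate function and the Decoration enum are names made up for the example and are not part of any toolchain.

```c
#include <stdio.h>
#include <stddef.h>

typedef enum { DECOR_CDECL, DECOR_STDCALL, DECOR_FASTCALL } Decoration;

/* Writes the decorated form of `name` into `out` and returns `out`.
 * `arg_bytes` is the total size of the arguments in bytes; it is
 * ignored for CDECL, which does not encode it in the name. */
const char *decorate(char *out, size_t out_size, const char *name,
                     unsigned arg_bytes, Decoration style)
{
    switch (style) {
    case DECOR_CDECL:
        snprintf(out, out_size, "_%s", name);                /* _foo   */
        break;
    case DECOR_STDCALL:
        snprintf(out, out_size, "_%s@%u", name, arg_bytes);  /* _foo@8 */
        break;
    case DECOR_FASTCALL:
        snprintf(out, out_size, "@%s@%u", name, arg_bytes);  /* @foo@8 */
        break;
    }
    return out;
}
```

For foo(int, int), which takes 8 bytes of arguments, this produces _foo, _foo@8 and @foo@8, the same labels used in the assembly listings above. Reading the rule backwards is the useful direction when debugging: a symbol such as _foo@8 tells us the function is STDCALL and expects 8 bytes of arguments.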

Standard C++ Calling Conventions

THISCALL

C++ requires that non-static methods of a class be called on an instance of the class. Thus there must be a mechanism in place to ensure that a pointer to the object is passed to the function. In THISCALL, the pointer to the class object is passed in ecx, the arguments are passed Right-to-Left on the stack, and the return value is passed in eax.

Assuming we have a class bar with the non-static member function foo(), a call to foo might look like:

barObj.foo(a, b);

Ignoring name mangling for the moment (which C++ compilers always apply), the function call would look like:

mov ecx, barObj

push b

push a

call _foo

To understand C++ name mangling please refer to this.

Here is an example of a C++ class with the above function defined, and how the name might get mangled.

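class bar {

public:

int foo(int a, int b);

};

barObj.foo(1, 2);

The resultant mangled name (here as the Microsoft compiler would produce it for a public member function returning int) might look like:

?foo@bar@@QAEHHH@Z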

As mentioned in this article here, actual debugging of a binary without symbols means that we won't get any help regarding the calling convention from the function's name. Such symbols are stripped off the binary, unless public symbols for them are available. In such cases we would need to use the other hints shown above to understand the true nature of a call. We will cover such symbol-less debugging in another article.
