Search This Blog

Thursday 25 September 2014

WinDbg : How To Debug Memory Leaks With The !heap Command

WinDbg : How To Debug Memory Leaks With The !heap Command


Memory and resource leaks are best debugged on a live system. There are several user and kernel mode tools available to help us. But there are times when we get a process/kernel crash dump file, and the reason shown is that the entire virtual memory was consumed! These type of crashes are more likely to be seen on older 32 bit operating systems, since the 64 bit address space is huge and hence hard to overflow, also, the newer operating systems have more efficient heap managers and memory managers. Before we go on, I would recommend that you read this article here about the introductory concepts of heaps. 

All through out this article we will be focusing on a user mode service crash on a 32 bit Windows 2003 OS. 

Commands used: 
  1. !heap (introduced for the first time in this article)
  2. dds (this command is used in this article here). For more details of the d command, please read this.
  3. $
  4. ||
  5. .expr
  6. ??

The dump file used here is a full user mode dump.


0:004> $ Lets start with seeing what kind of dump file we are dealing with...

0:004> ||
.  0 Full memory user mini dump: E:\Public\PID-18800__MYFAULTYSERVICE.EXE__2nd_chance_CPlusPlusEH__full_19ac_2014-09-09_15-54-23-328_4970.dmp

0:004> $ Okay, so it is a full memory user dump. now lets see that state of the heap

0:004> !heap -s
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                    (k)     (k)    (k)     (k) length      blocks cont. heap 
-----------------------------------------------------------------------------
00150000 00000002    4096    644   1244     74     8    10    0      0   L  
00250000 00008000      64     12     12     10     1     1    0      0      
007e0000 00001002      64     52     52      5     1     1    0      0   L  
00800000 00000002    1024     20     20      3     1     1    0      0   L  
00a00000 00001002    4096   2388   2388     13     5     3    0      0   L  
Virtual block: 01040000 - 01040000 (size 00000000)
00b00000 00001002 1960256 1960256 1960256     70     0     0    1      7   L  
00f30000 00001002      64     28     28      7     4     1    0      0   L  
01520000 00001002      64     16     16      4     1     1    0      0   L  
06330000 00001002    7232   4060   5824   1077   226    24    0      0   L  
    External fragmentation  26 % (226 free blocks)
06350000 00001003    1280    220    268     37     7     7    0    N/A   
065c0000 00001003     256      4      4      2     1     1    0    N/A   
06a90000 00001003     256      4      4      2     1     1    0    N/A   
06ad0000 00001003     256      4      4      2     1     1    0    N/A   
06b10000 00001003     256      4      4      2     1     1    0    N/A   
06390000 00001002      64     16     16      3     1     1    0      0   L  
-----------------------------------------------------------------------------

0:004> $ The largest allocations seem to be happening in the heap 00b00000
0:004> $ That doesn't mean it is leaking, but it is a good starting point

0:004> $ using the heap's statistics, we can see the types of allocations happening in it

0:004> !heap -stat -h 00b00000
 heap @ 00b00000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    70 84a897 - 3a09c210  (53.53)
    30 aa8149 - 1ff83db0  (29.49)
    60 25da12 - e31c6c0  (13.09)
    38 12ecf1 - 423d4b8  (3.82)
    a8980 1 - a8980  (0.04)
    40 97a - 25e80  (0.01)
    200 106 - 20c00  (0.01)
    47 3d9 - 1112f  (0.00)
    18 725 - ab78  (0.00)
    d0 80 - 6800  (0.00)
    90 96 - 5460  (0.00)
    50 fa - 4e20  (0.00)
    20 1fa - 3f40  (0.00)
    8c 4e - 2aa8  (0.00)
    80 54 - 2a00  (0.00)
    24 129 - 29c4  (0.00)
    78 3e - 1d10  (0.00)
    800 3 - 1800  (0.00)
    28 74 - 1220  (0.00)
    1000 1 - 1000  (0.00)

0:004> $ In this case, there seem to be an allocation unit of 0x70 bytes,
0:004> $ which is consuming 53.53 percent of the heap allocations

0:004> $ This allocation is definitely a candidate for suspicion 

0:004> $ Lets try to get all information about this heap (this command can take a really long time)
    
0:004> !heap -a 00b00000
Index   Address  Name      Debugging options enabled
  6:   00b00000 
    Segment at 00b00000 to 00b10000 (00010000 bytes committed)
    Segment at 00b10000 to 00c10000 (00100000 bytes committed)

<Output trimmed to save space>

    Segment at 77980000 to 77b5e000 (001de000 bytes committed)
    Segment at 7c4e0000 to 7c6be000 (001de000 bytes committed)
    Segment at 7dd60000 to 7df3f000 (001df000 bytes committed)
    Flags:                00001002
    ForceFlags:           00000000
    Granularity:          8 bytes
    Segment Reserve:      77b40000
    Segment Commit:       00002000
    DeCommit Block Thres: 00000200
    DeCommit Total Thres: 00002000
    Total Free Size:      00002365
    Max. Allocation Size: 7ffdefff
    Lock Variable at:     00b00608
    Next TagIndex:        0000
    Maximum TagIndex:     0000
    Tag Entries:          00000000
    PsuedoTag Entries:    00000000
    Virtual Alloc List:   00b00050
    Unable to read nt!_HEAP_VIRTUAL_ALLOC_ENTRY structure at 01040000
    UCR FreeList:        010f0d70
    UCRSegment - 010f0000: 00003000 . 00010000
    FreeList Usage:      0000027c 00000000 00000000 00000000
    FreeList[ 02 ] at 00b00188: 7df0e970 . 00b26798  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 00b26798
    FreeList[ 03 ] at 00b00190: 5766a650 . 0189e728  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 0189e728
    FreeList[ 04 ] at 00b00198: 5b228800 . 0178cee0  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 0178cee0
    FreeList[ 05 ] at 00b001a0: 64bb7cc0 . 00b4cc70  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 00b4cc70
    FreeList[ 06 ] at 00b001a8: 016a1418 . 00b286d0  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 00b286d0
    FreeList[ 09 ] at 00b001c0: 7defe050 . 7defe050  
    Unable to read nt!_HEAP_FREE_ENTRY structure at 7defe050
    Segment00 at 00b00640:
        Flags:           00000000
        Base:            00b00000
        First Entry:     00b00680
        Last Entry:      00b10000
        Total Pages:     00000010
        Total UnCommit:  00000000
        Largest UnCommit:00000000
        UnCommitted Ranges: (0)

    Heap entries for Segment00 in Heap 00b00000
         address: psize . size  flags   state (requested size)
        00b00640: 00640 . 00040 [01] - busy (40)

<Output trimmed to save space>

        00b04dc8: 00018 . 00038 [01] - busy (30)
        00b04e00: 00038 . 00078 [01] - busy (70)
        00b04e78: 00078 . 00078 [01] - busy (70)
        00b04ef0: 00078 . 00078 [01] - busy (70)
        00b04f68: 00078 . 00078 [01] - busy (70)
        00b04fe0: 00078 . 00078 [01] - busy (70)
        00b05058: 00078 . 00038 [01] - busy (30)
        00b05090: 00038 . 00038 [01] - busy (30)
        00b050c8: 00038 . 00078 [01] - busy (70)
        00b05140: 00078 . 00028 [01] - busy (20)
        00b05168: 00028 . 00078 [01] - busy (70)
        00b051e0: 00078 . 00038 [01] - busy (30)
        00b05218: 00038 . 00078 [01] - busy (70)
        00b05290: 00078 . 00028 [01] - busy (20)
        00b052b8: 00028 . 00078 [01] - busy (70)
        00b05330: 00078 . 00028 [01] - busy (20)
        00b05358: 00028 . 00078 [01] - busy (70)

<Output truncated to save space>

0:004> $ Since we suspect the 70 byte allocation, lets start dumping the memory of these objects for any clues

0:004> db 00b04e00
00b04e00  0f 00 07 00 15 01 08 00-3c 5a 5b 00 78 03 b0 00  ........<Z[.x...
00b04e10  00 00 00 00 00 00 00 00-80 4e b0 00 00 00 00 00  .........N......
00b04e20  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00b04e30  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00b04e40  0f 00 00 00 00 00 00 00-41 70 70 20 54 6f 74 61  ........App Tota
00b04e50  6c 00 00 00 00 00 00 00-09 00 00 00 0f 00 00 00  l...............
00b04e60  00 00 00 00 d0 4d b0 00-5c fb d3 e4 e9 c3 02 00  .....M..\.......
00b04e70  01 00 00 00 00 00 00 00-0f 00 0f 00 1a 01 08 00  ................

0:004> $ Looks like an ASCII string in it, perhaps an eye catcher field in a structure or object...

0:004> $ For C++ objects, the first DWORD is the VTABLE pointer, and sometimes these pointers have class names in symbols
0:004> $ Lets try the dds command to see if it can identify any symbol

0:004> dds 00b04e00
00b04e00  0007000f
00b04e04  00080115
00b04e08  005b5a3c MyFaultyService!CTimer::`vftable'
00b04e0c  00b00378
00b04e10  00000000
00b04e14  00000000
00b04e18  00b04e80
00b04e1c  00000000
<Output truncated>

0:004> $ Lets try the dds command on another 70 byte entry to see if it can identify any symbol

0:004> dds 00b04f68
00b04f68  000f000f
00b04f6c  00080138
00b04f70  005b5a3c MyFaultyService!CTimer::`vftable'
00b04f74  00b00210
00b04f78  0038ce92
00b04f7c  00000000
<Output truncated>

Looks to be a problem with the class CTimer inside the binary. At this point, we are in a position to examine the class and it's objects to see if a possible problem can be detected. 

As as additional check, we can try to see whether our timer object indeed takes up 0x70 bytes.

0:004> .expr
Current expression evaluator: MASM - Microsoft Assembler expressions
0:004> .expr /s c++
Current expression evaluator: C++ - C++ source expressions
0:004> ??sizeof(myfaultyservice!Ctimer)

unsigned int 0x70

Which confirms our hunch.

We were lucky, since this was a C++ object and it had a VTABLE. We are also lucky, that it is our code and we had the symbols for it. The case of finding such objects in memory without symbols might be an even daunting task.

Also to be noted is that just because there are too many instances of an object, doesn't mean that it leaks. The debugger is trying to help you here, but you are the best judge of your code.



No comments:

Post a Comment