Tuesday 23 September 2014

x86 : Function Naming Conventions For C & C++


Compilers usually follow a convention when they compile functions, and each calling convention typically has its own naming method. The scope of this article is limited to the most popular C and C++ function calling methods. These modifications to function names are called name decoration in C and name mangling in C++.

This article looks in detail at how the naming differs from one calling convention to the next.


Standard C Naming Conventions


CDECL

CDECL is the default calling convention that x86 C compilers use. Other calling conventions exist, but they rely on compiler-specific keywords that are not part of standard ANSI C.

Consider the simple function below:

int __cdecl foo(int a, int b)
{
    return (a+b);
}

When translated to assembly code, CDECL function names are almost always prefixed with an underscore. A call to the function above would therefore look like:

call _foo



STDCALL

Consider the same function as above, this time with the __stdcall keyword:

int __stdcall foo(int a, int b)
{
    return (a+b);
}

STDCALL function names are decorated with a leading underscore before the name and, after it, an @ followed by the number of bytes of arguments passed on the stack. On a 32-bit machine, where stack slots are 4-byte aligned, this number is always a multiple of 4.
For the function above, which takes two 4-byte ints, the call looks like:

call _foo@8
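
The rule above can be sketched in a few lines of C++. This is only an illustration of the decoration scheme, not code taken from any compiler: the helper names stackBytes and decorateStdcall are hypothetical, and the sketch assumes every argument is widened to a multiple of 4 bytes, as on a 32-bit x86 stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes an argument list occupies on a 32-bit x86 stack (assumption:
// each argument is widened to the next multiple of 4 bytes).
std::size_t stackBytes(const std::vector<std::size_t>& argSizes) {
    std::size_t total = 0;
    for (std::size_t size : argSizes) {
        total += (size + 3) / 4 * 4;  // round up to a multiple of 4
    }
    return total;
}

// Hypothetical helper: builds the STDCALL-decorated form of a plain C name,
// i.e. "_" + name + "@" + byte count.
std::string decorateStdcall(const std::string& name,
                            const std::vector<std::size_t>& argSizes) {
    assert(!name.empty());  // a function must have a name to decorate
    return "_" + name + "@" + std::to_string(stackBytes(argSizes));
}
```

Calling decorateStdcall("foo", {4, 4}) for the two int arguments of foo gives the _foo@8 seen in the call above.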

FASTCALL

Now let's look at how the same function might look when it is defined as a FASTCALL:

int __fastcall foo(int a, int b)
{
    return(a + b);
}

call @foo@8

The name decoration for FASTCALL prepends an @ to the function name and appends @x, where x is the number (in bytes) of arguments passed to the function, including any that travel in registers rather than on the stack. The three major C calling conventions can therefore be identified from the function names alone.
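
To make that identification concrete, here is a minimal C++ sketch that guesses the convention from a decorated name. It assumes the three decoration patterns described in this article and nothing else; the names Convention, classify and conventionName are hypothetical, and real symbol tables contain many names that fit none of these patterns.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class Convention { Cdecl, Stdcall, Fastcall, Unknown };

// True if `s` looks like "name@digits", e.g. "foo@8".
static bool hasByteCountSuffix(const std::string& s) {
    std::size_t at = s.rfind('@');
    if (at == std::string::npos || at == 0 || at + 1 == s.size()) return false;
    for (std::size_t i = at + 1; i < s.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    }
    return true;
}

// Guesses the calling convention from the decoration alone.
Convention classify(const std::string& symbol) {
    if (symbol.size() < 2) return Convention::Unknown;
    bool suffix = hasByteCountSuffix(symbol.substr(1));
    if (symbol[0] == '@' && suffix) return Convention::Fastcall;  // @foo@8
    if (symbol[0] == '_' && suffix) return Convention::Stdcall;   // _foo@8
    if (symbol[0] == '_') return Convention::Cdecl;               // _foo
    return Convention::Unknown;
}

// Printable label for a Convention value.
const char* conventionName(Convention c) {
    switch (c) {
        case Convention::Cdecl:    return "cdecl";
        case Convention::Stdcall:  return "stdcall";
        case Convention::Fastcall: return "fastcall";
        case Convention::Unknown:  return "unknown";
    }
    assert(false && "unhandled Convention value");
    return "unknown";
}
```

The order of the checks matters: a leading underscore alone means CDECL only after the STDCALL pattern has been ruled out.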

Standard C++ Calling Conventions

C++ function names are heavily decorated because of the complexities inherent in function overloading; this decoration is called "name mangling".
Unfortunately, C++ compilers are free to mangle names differently, because the standard does not enforce a convention. Other details, such as exception handling, are not standardized either.
Even though the algorithms are not standardized, each compiler has its own signature method, and in many cases it is possible to determine which compiler created an executable by examining the specifics of the name-mangling format.

THISCALL

Here are a few general remarks about THISCALL name-mangled functions:
  • They are recognizable on sight because of their complexity when compared to CDECL, FASTCALL, and STDCALL function name decorations.
  • They sometimes include the name of the function's class.
  • They almost always include the number and type of the arguments, so that overloaded functions can be differentiated by the arguments passed to them.
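Here is an example of a C++ class declaration and a call to its member function:
class bar {
public:
    int foo(int a, int b);
};

bar b;
b.foo(1, 2);
With a 32-bit Microsoft compiler, the resulting mangled name might look like:
?foo@bar@@QAEHHH@Z
Here ?foo@bar@@ encodes bar::foo, Q marks a public member function, E marks THISCALL, and the three H codes stand for the int return type followed by the two int parameters.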
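
As a rough illustration of how much can be read back out of such a name, the C++ sketch below recovers the class and function name from a Microsoft-style mangled symbol. It assumes only the simple shape shown above (a leading ?, name components separated by @, and @@ closing the name); the function qualifiedName is hypothetical, and constructors, operators, templates and abbreviated repeated names are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Extracts "class::function" from a simple Microsoft-style mangled name.
// Returns an empty string if the name does not have the expected shape.
std::string qualifiedName(const std::string& mangled) {
    if (mangled.empty() || mangled[0] != '?') return "";
    std::size_t end = mangled.find("@@");
    if (end == std::string::npos) return "";

    // Between '?' and "@@" the components are listed innermost first,
    // so each new component is prepended to the result.
    std::string result;
    std::size_t start = 1;
    for (;;) {
        std::size_t at = mangled.find('@', start);
        assert(at != std::string::npos && at <= end);  // "@@" sits at `end`
        std::string part = mangled.substr(start, at - start);
        if (part.empty()) return "";
        result = result.empty() ? part : part + "::" + result;
        if (at == end) break;
        start = at + 1;
    }
    return result;
}
```

For a symbol that starts with ?foo@bar@@, this returns bar::foo. The type codes after @@ are ignored here.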

Extern "C"

In a C++ source file, functions placed in an extern "C" block are guaranteed not to receive C++ name mangling. This is done frequently when libraries are written in C++ and their functions need to be exported under unmangled names. Even though the program is written in C++ and compiled with a C++ compiler, some of its functions might therefore not be mangled; they will carry the name decoration of one of the ordinary C calling conventions instead (typically CDECL).
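
A small C++ sketch of the difference follows. The function names are only examples, and the symbol names mentioned in the comments are what a 32-bit compiler using the conventions described in this article would be expected to emit, not something the code itself can show.

```cpp
// C++ linkage: both overloads can coexist because the compiler mangles
// the argument types into each symbol name.
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }

// C linkage: no C++ mangling is applied, so overloading foo is not
// possible. Under 32-bit CDECL the symbol would be expected to be _foo.
extern "C" int foo(int a, int b) { return a + b; }
```

The functions behave identically at the source level; only the names that reach the object file and the export table differ.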

Note on Name Decorations

We've been discussing name decorations in this post, but the fact is that in pure disassembled code there typically are no names whatsoever, especially not names with fancy decorations. The assembly stage removes all these readable identifiers, and replaces them with the binary locations instead. Function names really only appear in two places:


  • Listing files produced during compilation
  • Export tables, if functions are exported

When disassembling raw machine code, there will be no function names and no name decorations to examine. For this reason, you will need to pay more attention to the way parameters are passed, the way the stack is cleaned up, and other similar details, which are covered in detail in a separate article on calling conventions.

Another word of caution concerns optimizations. While we haven't covered code optimizations yet, suffice it to say that optimizing compilers can make a mess of these details. Functions that are not exported do not necessarily need to maintain standard interfaces, and if the compiler determines that a particular function does not need to follow a standard convention, some of the details will be optimized away. In these cases, it can be difficult to determine which calling convention was used (if any), and even where a function begins and ends. This article cannot account for all possibilities, so we try to show as much information as possible, with the knowledge that much of the information provided here will not be available in a true disassembly situation.
