Sep 19, 2012

Fastest memcpy

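An optimized memory copy routine, approximately 30-70% faster than memcpy in Microsoft Visual Studio 2005. It requires SSE2 and 16-byte aligned source and destination pointers (movdqa and movntdq fault on unaligned addresses), and it copies only whole 128-byte blocks, so the size should be a multiple of 128.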



Code:

void memcpy_sse2(void* dest, const void* src, const unsigned long nBytes)
{
  __asm
{
mov esi, src;    //src pointer
mov edi, dest;   //dest pointer

mov ebx, nBytes; //ebx is our counter
shr ebx, 7;      //divide by 128 (8 * 128bit registers)
jz loop_copy_end; //fewer than 128 bytes: no whole block to copy

loop_copy:
prefetchnta 128[ESI]; //non-temporal prefetch of the next block (an SSE instruction)
prefetchnta 160[ESI];
prefetchnta 192[ESI];
prefetchnta 224[ESI];

movdqa xmm0, 0[ESI]; //move data from src to registers
movdqa xmm1, 16[ESI];
movdqa xmm2, 32[ESI];
movdqa xmm3, 48[ESI];
movdqa xmm4, 64[ESI];
movdqa xmm5, 80[ESI];
movdqa xmm6, 96[ESI];
movdqa xmm7, 112[ESI];

movntdq 0[EDI], xmm0; //move data from registers to dest
movntdq 16[EDI], xmm1;
movntdq 32[EDI], xmm2;
movntdq 48[EDI], xmm3;
movntdq 64[EDI], xmm4;
movntdq 80[EDI], xmm5;
movntdq 96[EDI], xmm6;
movntdq 112[EDI], xmm7;

add esi, 128;
add edi, 128;
dec ebx;

jnz loop_copy; //loop please
sfence;        //order the non-temporal stores before any later stores
loop_copy_end:
}
}
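The same idea can be sketched with compiler intrinsics instead of inline assembly, which also makes it easy to handle the cases the routine above leaves out. This is only an illustration under stated assumptions: copy_streaming and COPY_HAS_SSE2 are hypothetical names, the function falls back to std::memcpy when SSE2 is unavailable or the pointers are not 16-byte aligned, and no speed-up is claimed for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define COPY_HAS_SSE2 1
#else
#define COPY_HAS_SSE2 0
#endif

// Hypothetical helper, not part of the original routine.
void* copy_streaming(void* dest, const void* src, std::size_t n)
{
#if COPY_HAS_SSE2
    const bool aligned =
        (reinterpret_cast<std::uintptr_t>(dest) % 16 == 0) &&
        (reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    if (aligned && n >= 128)
    {
        const __m128i* s = static_cast<const __m128i*>(src);
        __m128i* d = static_cast<__m128i*>(dest);
        const std::size_t blocks = n / 128;   // whole 128-byte blocks
        for (std::size_t b = 0; b < blocks; ++b, s += 8, d += 8)
        {
            // Hint the next block; prefetching never faults.
            _mm_prefetch(reinterpret_cast<const char*>(s + 8), _MM_HINT_NTA);
            for (int i = 0; i < 8; ++i)
                _mm_stream_si128(d + i, _mm_load_si128(s + i));
        }
        _mm_sfence();   // order the non-temporal stores

        // Copy the tail (n % 128 bytes) the ordinary way.
        const std::size_t done = blocks * 128;
        std::memcpy(static_cast<char*>(dest) + done,
                    static_cast<const char*>(src) + done, n - done);
        return dest;
    }
#endif
    return std::memcpy(dest, src, n);
}
```

Non-temporal stores bypass the cache, so this style tends to pay off only for large buffers that will not be read again soon; for small copies the plain memcpy path is usually the better choice.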

Courtesy of William Chan

Aug 29, 2012

Calling Convention – Part IV (__thiscall)

Make sure you have read “Calling Convention – Part I”, “Calling Convention – Part II” & “Calling Convention – Part III” of this article.

This calling convention ( __thiscall )

__thiscall is the default calling convention for calling member functions of C++ classes (except for those with a variable number of arguments).

The main characteristics of __thiscall calling convention are:

  1. Arguments are passed from right to left, and placed on the stack. this is placed in ECX.
  2. Stack cleanup is performed by the called function.

C++ Name Decoration/Mangling For thiscall

Please click here for a detailed overview of C++ name decoration.

The example for this calling convention had to be a little different. First, the code is compiled as C++, and not C. Second, we have a class/struct with a member function, instead of a global function.

class CSum
{
public:
      int Add ( int nValue1, int nValue2)
      {
           return nValue1+nValue2;
      }
};
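For reference, a call site that matches the names used in the assembly listing (sumObj and nResult) might look like the sketch below. The CallAdd wrapper is hypothetical and exists only so that the call sits inside a complete function.

```cpp
class CSum
{
public:
      int Add ( int nValue1, int nValue2)
      {
           return nValue1+nValue2;
      }
};

// Hypothetical wrapper around the call site.
int CallAdd()
{
      CSum sumObj;                        // its address is what goes into ECX
      int nResult = sumObj.Add( 2, 3 );   // 3 is pushed first, then 2
      return nResult;
}
```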

The assembly code for the function call looks like this:

push 3
push 2
lea ecx,[sumObj]                 ; Object of CSum (this pointer)
call ?Add@CSum@@QAEHHH@Z         ; CSum::Add
mov DWORD PTR [nResult],eax

The function itself is given below:

; // function prolog
push ebp
mov ebp, esp
push ebx
push esi
push edi
; // return nValue1 + nValue2;
mov eax, DWORD PTR [nValue1]
add eax, DWORD PTR [nValue2]
; // function epilog
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
;//Stack cleanup and return
ret 8

Now, what happens if we have a member function with a variable number of arguments? In that case, __cdecl is used, and this is pushed onto the stack last.
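A member function with a variable number of arguments could be sketched as follows. CSumAll and AddAll are hypothetical names; on 32-bit VC++ such a member would be compiled as __cdecl, because only the caller knows how many values it pushed.

```cpp
#include <cstdarg>

class CSumAll
{
public:
      // nCount tells the function how many int arguments follow.
      int AddAll( int nCount, ... )
      {
            va_list args;
            va_start( args, nCount );
            int nTotal = 0;
            for ( int i = 0; i < nCount; ++i )
                  nTotal += va_arg( args, int );
            va_end( args );
            return nTotal;
      }
};
```

A call such as sumObj.AddAll( 3, 1, 2, 3 ) pushes four values plus this, and the caller removes all of them after the call returns.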

Conclusion

The __thiscall calling convention is the default for C++ member functions that do not take a variable number of arguments: this is passed in ECX, and the called function cleans up the stack.

Jul 28, 2012

Calling Convention – Part III (__fastcall)

Make sure you have read “Calling Convention – Part I” & “Calling Convention – Part II” of this article.

Fast calling convention ( __fastcall )

The fast calling convention places the first two arguments in registers (ECX and EDX) and pushes the rest onto the stack. This reduces the cost of a function call, because passing values in registers avoids the memory writes and reads that stack arguments require.

We can explicitly declare a function to use the __fastcall convention as shown:

int __fastcall Add( int nValue1, int nValue2 );

The main characteristics of __fastcall calling convention are:

  1. The first two function arguments that require 32 bits or less are placed into registers ECX and EDX. The rest of them are pushed on the stack from right to left.
  2. Arguments are popped from the stack by the called function.
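The first rule can be sketched as a small helper that, given the size of each scalar argument in declaration order, reports where it would travel. FastcallPlacement is a hypothetical name, and the sketch assumes scalar arguments only (it does not model structs or classes).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Input: size in bytes of each argument, left to right.
// Output: "ECX", "EDX" or "stack" for each argument.
std::vector<std::string> FastcallPlacement( const std::vector<std::size_t>& argSizes )
{
    const char* regs[] = { "ECX", "EDX" };
    std::size_t nextReg = 0;
    std::vector<std::string> placement;
    for ( std::size_t size : argSizes )
    {
        // Only the first two arguments of 32 bits or less get a register.
        if ( size <= 4 && nextReg < 2 )
            placement.push_back( regs[nextReg++] );
        else
            placement.push_back( "stack" );
    }
    return placement;
}
```

For a function taking ( int, double, int, int ), the sizes { 4, 8, 4, 4 } give ECX, stack, EDX, stack: the double is skipped because it needs more than 32 bits, and the third int has no register left.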

Function Name Decoration For fastcall

The function name is decorated by prefixing a ‘@’ character and appending a ‘@’ followed by the number of bytes taken by the arguments, counting the ones passed in registers.

@Add@8 //@ added before & after function name, then the number of bytes of arguments
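The C decoration rules from this series (__cdecl, __stdcall and __fastcall) are simple enough to sketch as a helper. DecorateC, ArgumentBytes and the Convention enum are hypothetical names, and the sketch assumes every argument is rounded up to a multiple of 4 bytes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Convention { Cdecl, Stdcall, Fastcall };

// Total bytes of arguments, each rounded up to 4 bytes.
std::size_t ArgumentBytes( const std::vector<std::size_t>& argSizes )
{
    std::size_t total = 0;
    for ( std::size_t size : argSizes )
        total += ( size + 3 ) / 4 * 4;
    return total;
}

// Builds the decorated C name for a 32-bit function.
std::string DecorateC( const std::string& name, Convention conv,
                       const std::vector<std::size_t>& argSizes )
{
    const std::string bytes = std::to_string( ArgumentBytes( argSizes ) );
    switch ( conv )
    {
    case Convention::Cdecl:    return "_" + name;
    case Convention::Stdcall:  return "_" + name + "@" + bytes;
    case Convention::Fastcall: return "@" + name + "@" + bytes;
    }
    return name;
}
```

With two int arguments, DecorateC( "Add", Convention::Fastcall, { 4, 4 } ) yields @Add@8, while the same call with Convention::Stdcall yields _Add@8 and with Convention::Cdecl yields _Add.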

Note: Microsoft have reserved the right to change the registers for passing the arguments in future compiler versions.

Here goes an example:

; // put the arguments in the registers EDX and ECX
mov         edx,3
mov         ecx,2
; // call the function
call @Add@8
; // copy the return value from EAX to a local variable (int nResult)
mov DWORD PTR [nResult], eax

The called function is shown below:

; // function prolog
push ebp
mov ebp, esp
push ebx
push esi
push edi
; // return nValue1 + nValue2; (nValue1 arrived in ECX, nValue2 in EDX)
mov eax, ecx
add eax, edx
; // function epilog
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
;// Both arguments were passed in registers, so there is no stack to clean up
ret

Conclusion

Advantage

The advantage of the __fastcall calling convention is that it attempts to put arguments in registers, rather than on the stack, thus making function calls faster.

Disadvantage

The disadvantage of the __fastcall calling convention is that functions with a variable number of arguments (like printf()) can’t use it. They must use __cdecl instead, because only the caller knows how many arguments were passed in a given call, so only the caller can perform the stack cleanup.

Final Part: Cont. Calling Convention – Part IV

Jul 27, 2012

Calling Convention – Part II (__stdcall)

Make sure you have read “Calling Convention – Part I” of this article.

Standard calling convention ( __stdcall )

This convention is usually used to call Win32 API functions.

Note: WINAPI is nothing but another name for __stdcall:

#define WINAPI __stdcall

We can explicitly declare a function to use the __stdcall convention:

int __stdcall Add( int nValue1, int nValue2 );

The main characteristics of __stdcall calling convention are:

  1. Arguments are passed from right to left, and placed on the stack.
  2. Stack cleanup is performed by the called function.
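Because the caller and the called function must agree on who cleans up the stack, the convention is part of a function's type, including the type of any pointer to it. The sketch below shows this; EXAMPLE_STDCALL, ADDFUNC and CallThroughPointer are hypothetical names, and the macro expands to nothing outside 32-bit VC++ so the code still compiles elsewhere.

```cpp
// __stdcall only matters on 32-bit x86 builds with VC++.
#if defined(_MSC_VER) && defined(_M_IX86)
#define EXAMPLE_STDCALL __stdcall
#else
#define EXAMPLE_STDCALL
#endif

int EXAMPLE_STDCALL Add( int nValue1, int nValue2 )
{
    return nValue1 + nValue2;
}

// The pointer type carries the convention too, so the compiler
// knows not to emit "add esp, 8" after calling through it.
typedef int ( EXAMPLE_STDCALL *ADDFUNC )( int, int );

int CallThroughPointer( ADDFUNC pfnAdd )
{
    return pfnAdd( 2, 3 );
}
```

Casting a __cdecl function to a __stdcall pointer type (or the reverse) would compile, but the stack would then be cleaned up twice or not at all.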

Function Name Decoration For stdcall

Function name is decorated by prefixing an underscore character ‘_’ and postfixing a ‘@’ character and number of bytes of stack space required by the arguments at end of the function name.

_Add@8 //underscore before function name & @ and number of bytes space required on stack

Now, take a look at an example of a __stdcall call:

; // push arguments to the stack, from right to left
push 3
push 2
; // call the function
call _Add@8
; // copy the return value from EAX to a local variable (int nResult)
mov DWORD PTR [nResult], eax

The called function is shown below:

; // function prolog
push ebp
mov ebp, esp
push ebx
push esi
push edi
; // return nValue1 + nValue2;
mov eax, DWORD PTR [nValue1]
add eax, DWORD PTR [nValue2]
; // function epilog
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
;//Stack cleanup and return
ret 8

Conclusion

__stdcall is the default calling convention for the Win32 API.

Advantage

The advantage of the __stdcall calling convention is that it creates smaller executables than __cdecl: the stack cleanup code exists once, inside the called function, instead of being repeated after every call.
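A rough back-of-the-envelope estimate of that size difference can be sketched with the x86 encodings involved: ret is 1 byte, ret 8 is 3 bytes and add esp, 8 is 3 bytes. The function names are hypothetical, and the model assumes every __cdecl call site carries its own add esp, which a real optimizer may merge or drop.

```cpp
const unsigned kRetBytes        = 1;   // ret          (C3)
const unsigned kRetImm16Bytes   = 3;   // ret 8        (C2 08 00)
const unsigned kAddEspImm8Bytes = 3;   // add esp, 8   (83 C4 08)

// __cdecl: a plain ret per function, plus cleanup at every call site.
unsigned CdeclCleanupBytes( unsigned nFunctions, unsigned nCallSites )
{
    return nFunctions * kRetBytes + nCallSites * kAddEspImm8Bytes;
}

// __stdcall: cleanup is folded into the single ret of each function.
unsigned StdcallCleanupBytes( unsigned nFunctions )
{
    return nFunctions * kRetImm16Bytes;
}
```

For one function called from 100 places, this gives 301 bytes for __cdecl against 3 bytes for __stdcall.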

Disadvantage

The disadvantage of the __stdcall calling convention is that functions with a variable number of arguments (like printf()) can’t use it. They must use __cdecl instead, because only the caller knows how many arguments were passed in a given call, so only the caller can perform the stack cleanup.

Cont. Calling Convention – Part III

Jul 26, 2012

Calling Convention – Part I (__cdecl)

This post assumes that you have sufficient knowledge of C/C++ and assembly.

Introduction

While learning C++ programming for Windows, you must have come across strange specifiers that sometimes appear in front of function declarations, like __cdecl, __stdcall (WINAPI, CALLBACK), __fastcall, etc. These specifiers are called “calling conventions”.

What are the calling conventions?

When a function is called, the arguments are typically passed to it, and the return value is retrieved. A calling convention describes how the arguments are passed and values returned by functions. It also specifies how the function names are decorated.

Is it really necessary to understand the calling conventions to write good C programs?

No, not at all. However, it may be helpful with debugging. Also, it is necessary for linking C/C++ with assembly code.

How does it work?

No matter which calling convention is used, the following things will happen:

  1. All arguments are typically saved (pushed) on the stack, but some may also be passed in registers (__fastcall will be discussed in a later post).
  2. Program execution jumps to the address of the called function.
  3. Inside the function, registers ESI, EDI, EBX, and EBP are saved on the stack. The part of the code that performs these operations is called the ‘Function Prolog’ and is usually generated by the compiler.
  4. The function-specific code is executed, and the return value is placed into the EAX register.
  5. Registers ESI, EDI, EBX, and EBP are restored from the stack. The piece of code that does this is called the ‘Function Epilog’, and in most cases the compiler generates it.
  6. Arguments are removed (popped) from the stack. This operation is called stack cleanup and may be performed either inside the called function or by the caller, depending on the calling convention used.
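Step 6 is the part that differs between conventions, and it can be sketched with a toy model of the stack. Everything here is hypothetical (ToyStack, AddCallerCleans, AddCalleeCleans), and the model deliberately ignores the return address and saved registers that a real call also pushes.

```cpp
#include <cstddef>
#include <vector>

struct ToyStack
{
    std::vector<int> slots;   // slots.back() plays the role of [ESP]
    void Push( int v ) { slots.push_back( v ); }
    void Drop( std::size_t n ) { slots.resize( slots.size() - n ); }
    std::size_t Depth() const { return slots.size(); }
};

// Callee that leaves its arguments on the stack (a plain "ret").
int AddCallerCleans( ToyStack& stack )
{
    int nValue1 = stack.slots[stack.Depth() - 1];   // pushed last
    int nValue2 = stack.slots[stack.Depth() - 2];   // pushed first
    return nValue1 + nValue2;
}

// Callee that removes its own arguments (like "ret 8").
int AddCalleeCleans( ToyStack& stack )
{
    int nResult = AddCallerCleans( stack );
    stack.Drop( 2 );
    return nResult;
}
```

After pushing 3 and then 2, calling AddCallerCleans leaves both values on the stack, so the caller must call Drop( 2 ) itself (the equivalent of add esp, 8). Calling AddCalleeCleans returns with the stack already empty.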

As an example for the calling conventions (except for __thiscall), we are going to use a simple function:

int Add( int nValue1, int nValue2 )
{
    return nValue1 + nValue2;
}

The call to this function will look like this:

int nResult = Add( 2, 3 );

Note: Remember to compile this example code as C. If you are compiling it as C++, use the code below to avoid C++ name decoration, which is beyond the scope of this post and will be discussed in later posts. This post explains C name decoration only.

#ifdef __cplusplus
extern "C" {
#endif
int Add( int nValue1, int nValue2 )
{
    return nValue1 + nValue2;
}
#ifdef __cplusplus
}
#endif

C calling convention ( __cdecl )

This convention is the default for C/C++ programs. If a project is set to use some other calling convention, we can still declare a function to use __cdecl:

int __cdecl Add( int nValue1, int nValue2 );

The main characteristics of __cdecl calling convention are:

  1. Arguments are passed from right to left, and placed on the stack.
  2. Stack cleanup is performed by the caller.

Function Name Decoration For cdecl 
Function name is decorated by prefixing it with an underscore character ‘_’.

_Add //underscore before function name

Now, take a look at an example of a __cdecl call:

; // push arguments to the stack, from right to left
push 3
push 2
; // call the function
call _Add
; // cleanup the stack by adding the size of the arguments to ESP register
add esp, 8
; // copy the return value from EAX to a local variable (int nResult)
mov DWORD PTR [nResult], eax

The called function is shown below:

; // function prolog
push ebp
mov ebp, esp
push ebx
push esi
push edi
; // return nValue1 + nValue2;
mov eax, DWORD PTR [nValue1]
add eax, DWORD PTR [nValue2]
; // function epilog
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret

Conclusion

__cdecl is the default calling convention for C and C++ programs.

Advantage

The advantage of this calling convention is that it allows functions with a variable number of arguments to be used (e.g. printf).

Disadvantage

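The disadvantage is that it creates larger executables, because the stack cleanup code is repeated after every call instead of appearing once in the called function.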

Cont. Calling Convention – Part II

Jun 21, 2012

Directly Accessing Virtual Table

Before reading this post, make sure you have a firm grasp of C and C++ and know a little assembly language. To understand this topic, you’ll also need a basic understanding of what a “Virtual Function Table” is.

A pointer is 32 bits (4 bytes) on a 32-bit architecture and 64 bits (8 bytes) on a 64-bit architecture. So every object of a class that has a virtual table (because the class, or one of its base classes, declares at least one virtual method) carries one additional pointer: 4 extra bytes on a 32-bit architecture and 8 on a 64-bit one, plus any alignment padding the compiler adds.
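The extra pointer is easy to observe with sizeof. In this sketch Plain, WithVirtual and VirtualOverhead are hypothetical names, and the exact numbers depend on the compiler and target.

```cpp
#include <cstddef>

struct Plain
{
    int m_nValue;
};

struct WithVirtual
{
    int m_nValue;
    virtual ~WithVirtual() {}
};

// Extra bytes an object pays for carrying a virtual table pointer.
std::size_t VirtualOverhead()
{
    return sizeof( WithVirtual ) - sizeof( Plain );
}
```

On common compilers the overhead is at least sizeof( void* ); on 64-bit targets it is often larger than 8, because the object is padded so that the pointer stays aligned.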

This pointer is called virtual table pointer, sometimes ‘vptr’. In VC++ compiler, the objects will have a pointer named ‘__vfptr’ in them and in some other compiler it’s ‘__vptr_X’, where X is the class name.

Now, __vfptr is not directly accessible from your code. For example, if you write the following code you’ll get a compiler error as the __vfptr is not available for your use.

      //Creating object of class X and passing integer as parameter.
      X objX( 123 );

      //Trying to get __vfptr directly.
      objX.__vfptr;

However, if you comment out this line, then compile and debug the code:

      objX.__vfptr;

you’ll see ‘objX.__vfptr’ in the variable watch window.

vfptr in watch window

Okay, now we’d like to see how we can access the virtual table even if the compiler doesn’t want us to. The following code does that.

#include <iostream>

//A simple class
class X
{
private:
      //Integer data member
      int m_nValue;
public:
      //Constructor with type integer as parameter
      X( int nVal )
      {
            m_nValue = nVal;
      }

      //virtual function with no parameter & no return type
      virtual void Func1( void )
      {
            std::cout << "Virtual Func1 Called!" << std::endl;
      }

      //virtual function with one integer parameter & no return type
      virtual void Func2( int nParam )
      {
            std::cout << "Virtual Func2 Called With Parameter Value: " << nParam << std::endl;
      }

      //virtual function with no parameter & integer return type
      virtual int Func3( void )
      {
            return this->m_nValue;
      }

      //Destructor
      virtual ~X( )
      {
      }
};

int main( int argc, char* argv[] )
{
      //Creating object of class X as pointer and passing integer as parameter.
      X *pObj = new X( 123 );

      //Getting virtual table pointer of object pObj and assigning it to vfptr
      size_t *vfptr = *( reinterpret_cast< size_t** >( pObj ) );

      //Virtual Table is filled sequentially (from top to bottom), like
      //I have first declared Func1, so it will be on position '0'.
      //After that Func2, so it will be on position '1' on Virtual Table. And so on...
      //Now we will assign Function Addresses to its respective Function Pointers
      //that matches their declared function signature.
      //You can also use typedef for function pointers:
      //typedef void (__stdcall *FUNC1)( void );
      //FUNC1 Func1 = reinterpret_cast<FUNC1>(vfptr[0]);
      void (__stdcall *Func1)( void ) = (void (__stdcall *)(void))vfptr[0];
      void (__stdcall *Func2)( int nParam ) = (void (__stdcall *)(int))vfptr[1];
      int  (__stdcall *Func3)( void ) = (int (__stdcall *)(void))vfptr[2];

      //Before directly calling methods from virtual table, we first have to pass
      //this pointer.
      //Assigning this pointer to ECX register. VC++ compiler uses ECX register
      //to pass this pointer to its methods instead of pushing it on stack.
      __asm
      {
            mov ecx, pObj
      }

      //Calling Func1
      Func1( );

      //Again assigning this pointer. Hmm... Again assigning this pointer. Why?
      //Reason is EBX, ESI, EDI and EBP are preserved inside function call
      //and EAX, ECX, EDX are freely allowed to be overwritten. So there is
      //a fair chance that ECX register may have been overwritten so we are
      //again assigning this pointer to ECX register.
      __asm
      {
            mov ecx, pObj
      }

      //Calling Func2 and passing an integer as parameter.
      Func2( 123 );

      //Assigning this pointer. Same reason given above.
      __asm
      {
            mov ecx, pObj
      }

      //Calling Func3 and saving its return value in nRetVal.
      int nRetVal = Func3();

      std::cout << "Value returned by Func3: " << nRetVal << std::endl;

      //Releasing the object created with new.
      delete pObj;
      return 0;
}

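Note: This code will work only in 32-bit (x86) VC++, because it relies on inline assembly and on this being passed in ECX. It is best tried in a debug build, since an optimizer is free to reuse ECX between the __asm block and the call.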

All the explaining is done in the code. The only question that may arise is why I have used ‘stdcall’ instead of ‘cdecl’ in the following code:

      void (__stdcall *Func1)( void ) = (void (__stdcall *)(void))vfptr[0];
      void (__stdcall *Func2)( int nParam ) = (void (__stdcall *)(int))vfptr[1];
      int  (__stdcall *Func3)( void ) = (int (__stdcall *)(void))vfptr[2];

Those who know calling conventions and how ‘thiscall’ works will already have figured out why I used the ‘stdcall’ calling convention here.
For others, here is the explanation:
GCC’s ‘thiscall’ is similar to ‘cdecl’, but the this pointer is pushed onto the stack last, as if it were the first argument to the function. VC++’s ‘thiscall’ is similar to ‘stdcall’, but the this pointer is passed in ECX. The called member function therefore cleans up its own stack arguments, exactly as a ‘stdcall’ function does, which is why I used ‘stdcall’. Calling a function with the “wrong” convention will corrupt the stack.
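On targets where this is passed like an ordinary first argument (for example 64-bit builds with VC++, GCC or Clang), the same experiment needs no inline assembly: the table entry can be called with the object pointer as its first parameter. The sketch below assumes exactly that, plus a table pointer stored at the start of the object and entries laid out in declaration order. None of this is guaranteed by the C++ standard, so treat it as an experiment and not as portable code. X here is a trimmed copy of the class above that records calls in a hypothetical m_nLast member instead of printing, and CallThroughVTable is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

class X
{
public:
    int m_nValue;
    int m_nLast;   // records the last call, for demonstration only
    explicit X( int nVal ) : m_nValue( nVal ), m_nLast( 0 ) {}
    virtual void Func1() { m_nLast = 1; }
    virtual void Func2( int nParam ) { m_nLast = nParam; }
    virtual int Func3() { return m_nValue; }
    virtual ~X() {}
};

int CallThroughVTable( X* pObj )
{
    // Read the hidden table pointer stored at the start of the object.
    std::uintptr_t* vfptr;
    std::memcpy( &vfptr, pObj, sizeof( vfptr ) );

    // Assumption: each entry can be called as a free function
    // that takes the this pointer as its first argument.
    void (*Func1)( X* )      = reinterpret_cast<void (*)( X* )>( vfptr[0] );
    void (*Func2)( X*, int ) = reinterpret_cast<void (*)( X*, int )>( vfptr[1] );
    int  (*Func3)( X* )      = reinterpret_cast<int (*)( X* )>( vfptr[2] );

    Func1( pObj );
    Func2( pObj, 123 );
    return Func3( pObj );
}
```

Because this travels as a normal argument here, there is no register to reload between calls, which removes the ECX juggling needed in the 32-bit version.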
