How much faster is C code compiled with fastcall?

Setting fastcall in VS 2019Something I read about the other day was the __fastcall convention. In Visual Studio you enable this with the /Gr flag and in gcc (it’s __attribute__((fastcall)). For clang it’s fastcall but see this.

So what does fastcall do? It changes the calling convention, so instead of pushing parameters to a function on the stack, it passes them in the registers starting with ECX then EDX and so on.  Let’s look at an example.

#include <stdio.h>

int add(int a, int b, int c) {
	return a + b * 2 + c * 3;
}

int main() {
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
}

This is the disassembly code from VS 2019. I pressed F10 to start debugging then Debug => Windows => Disassembly to get the listing. Note thei is x86, ie 32-bit.

005B18DC  lea         edi,[ebp-0C0h]  
005B18E2  mov         ecx,30h  
005B18E7  mov         eax,0CCCCCCCCh  
005B18EC  rep stos    dword ptr es:[edi]  
005B18EE  mov         ecx,offset _9831A1D6_test@c (05BC003h)  
005B18F3  call        @__CheckForDebuggerJustMyCode@4 (05B131Bh)  
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
005B18F8  push        6  
005B18FA  push        5  
005B18FC  push        4  
005B18FE  call        _add (05B1023h)  
005B1903  add         esp,0Ch  
005B1906  push        eax  
005B1907  push        offset string "Add(4,5,6)=%d\n" (05B7B30h)  
005B190C  call        _printf (05B10D2h)  

Now if I build it after setting the /Gr flag. In Vs 2019, on the project property pages, click advanced then the Calling Convention and switch from cdecl (/Gd) to –fastcall (/Gr).

008118EC  lea         edi,[ebp-0C0h]  
008118F2  mov         ecx,30h  
008118F7  mov         eax,0CCCCCCCCh  
008118FC  rep stos    dword ptr es:[edi]  
008118FE  mov         ecx,offset _9831A1D6_test@c (081C003h)  
00811903  call        @__CheckForDebuggerJustMyCode@4 (081131Bh)  
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
00811908  push        6  
0081190A  mov         edx,5  
0081190F  mov         ecx,4  
00811914  call        @add@12 (0811276h)  
00811919  push        eax  
0081191A  push        offset string "Add(4,5,6)=%d\n" (0817B30h)  
0081191F  call        _printf (08110CDh) 

I’ve highlighted the differences in bold. However the function Add is also different as the fastcall version doesn’t have to pop parameters off the stack.

Note, to get this to compile I had to prefix main with __cdecl. Using /Gr means that every function in the program uses registers and that’s not allowed with main as it’s called from Windows end must use the cdecl (default stack passing) convention.

This is what main looks like now.

int __cdecl main() {

Notes

This is only for 32-bit. 64-bit code is done somewhat differently so possibly wouldn’t be that different. Next I have to write a program that does lots of function calls and use high precision timing to see how much of a difference it makes. To be continued.