How much faster is C code compiled with fastcall?
Something I read about the other day was the __fastcall convention. In Visual Studio you enable this for a whole project with the /Gr flag, and in gcc it’s __attribute__((fastcall)). For clang it’s fastcall but see this.
So what does fastcall do? It changes the calling convention: instead of pushing every parameter onto the stack, it passes the first two (those that fit in 32 bits) in registers, ECX then EDX, and pushes any remaining ones on the stack as before. Let’s look at an example.
#include <stdio.h>

int add(int a, int b, int c) {
    return a + b * 2 + c * 3;
}

int main() {
    printf("Add(4,5,6)=%d\n", add(4, 5, 6));
}
This is the disassembly from VS 2019. I pressed F10 to start debugging, then Debug => Windows => Disassembly to get the listing. Note this is x86, i.e. 32-bit.
005B18DC  lea   edi,[ebp-0C0h]
005B18E2  mov   ecx,30h
005B18E7  mov   eax,0CCCCCCCCh
005B18EC  rep stos dword ptr es:[edi]
005B18EE  mov   ecx,offset _9831A1D6_test@c (05BC003h)
005B18F3  call  @__CheckForDebuggerJustMyCode@4 (05B131Bh)
    printf("Add(4,5,6)=%d\n", add(4, 5, 6));
005B18F8  push  6
005B18FA  push  5
005B18FC  push  4
005B18FE  call  _add (05B1023h)
005B1903  add   esp,0Ch
005B1906  push  eax
005B1907  push  offset string "Add(4,5,6)=%d\n" (05B7B30h)
005B190C  call  _printf (05B10D2h)
Now I build it again after setting the /Gr flag. In VS 2019, on the project property pages, click C/C++, then Advanced, then Calling Convention, and switch from __cdecl (/Gd) to __fastcall (/Gr).
008118EC  lea   edi,[ebp-0C0h]
008118F2  mov   ecx,30h
008118F7  mov   eax,0CCCCCCCCh
008118FC  rep stos dword ptr es:[edi]
008118FE  mov   ecx,offset _9831A1D6_test@c (081C003h)
00811903  call  @__CheckForDebuggerJustMyCode@4 (081131Bh)
    printf("Add(4,5,6)=%d\n", add(4, 5, 6));
00811908  push  6
0081190A  mov   edx,5
0081190F  mov   ecx,4
00811914  call  @add@12 (0811276h)
00811919  push  eax
0081191A  push  offset string "Add(4,5,6)=%d\n" (0817B30h)
0081191F  call  _printf (08110CDh)
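The differences are all in the call sequence. The values 5 and 4 are now loaded into EDX and ECX instead of being pushed, the call target is the decorated name @add@12 rather than _add, and the add esp,0Ch after the call is gone, because under fastcall the called function removes the remaining stack argument itself. The function add is also different, as the fastcall version receives its first two parameters in registers rather than on the stack.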
Note, to get this to compile I had to prefix main with __cdecl. Using /Gr means that every function in the program that isn’t explicitly marked otherwise uses the fastcall convention, and that’s not allowed with main, as it’s called by the runtime startup code and must use the cdecl (default stack passing) convention.
This is what main looks like now.
int __cdecl main() {
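An alternative to switching the whole project with /Gr is to mark only the functions you care about, which leaves main on the default convention. The following is a minimal sketch, assuming a 32-bit x86 build. FASTCALL is a made-up macro name used here for illustration, not something either compiler provides; on any other target it expands to nothing, so the function simply uses the default convention.

```c
/* FASTCALL is a hypothetical helper macro (an assumption for this sketch). */
#if defined(_MSC_VER) && defined(_M_IX86)
#define FASTCALL __fastcall                 /* MSVC, 32-bit x86 */
#elif defined(__GNUC__) && defined(__i386__)
#define FASTCALL __attribute__((fastcall))  /* gcc or clang, 32-bit x86 */
#else
#define FASTCALL                            /* other targets: default convention */
#endif

/* The same add function as above, marked fastcall on its own.
   a arrives in ECX, b in EDX, and c is still pushed on the stack. */
int FASTCALL add(int a, int b, int c) {
    return a + b * 2 + c * 3;
}
```

Callers need no changes: add(4, 5, 6) is written exactly as before, and the compiler picks the register-based call sequence from the declaration.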
Notes
This is only for 32-bit. 64-bit Windows code uses a single calling convention that already passes the first four arguments in registers, so fastcall wouldn’t make a difference there. Next I have to write a program that makes lots of function calls and uses high-precision timing to see how much of a difference it makes. To be continued.
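As a rough idea of what such a timing program might look like, here is a sketch under several assumptions. The names CALL_CDECL, CALL_FAST, NOINLINE, add_cdecl, add_fast and run_benchmark are all invented for this example. It uses the standard clock() function for portability, which is coarser than a high-precision timer such as QueryPerformanceCounter on Windows. The functions are marked noinline so the optimizer keeps the calls in place. On anything other than 32-bit x86 the two macros expand to nothing, so both loops measure the same convention.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper macros (assumptions for this sketch). */
#if defined(_MSC_VER) && defined(_M_IX86)
#define CALL_CDECL __cdecl
#define CALL_FAST  __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#define CALL_CDECL __attribute__((cdecl))
#define CALL_FAST  __attribute__((fastcall))
#else
#define CALL_CDECL  /* not 32-bit x86: both use the default convention */
#define CALL_FAST
#endif

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Two copies of the article's add function, one per convention. */
NOINLINE int CALL_CDECL add_cdecl(int a, int b, int c) {
    return a + b * 2 + c * 3;
}

NOINLINE int CALL_FAST add_fast(int a, int b, int c) {
    return a + b * 2 + c * 3;
}

static double seconds_since(clock_t start) {
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Calls each variant `iterations` times, prints the elapsed time for
   each, and returns 1 if both loops produced the same checksum. */
int run_benchmark(long iterations) {
    unsigned sum_cdecl = 0, sum_fast = 0;
    double t_cdecl, t_fast;
    clock_t start;
    long i;

    start = clock();
    for (i = 0; i < iterations; i++) {
        int n = (int)(i & 0xFF);  /* varying arguments so calls can't be hoisted */
        sum_cdecl += (unsigned)add_cdecl(n, n + 1, n + 2);
    }
    t_cdecl = seconds_since(start);

    start = clock();
    for (i = 0; i < iterations; i++) {
        int n = (int)(i & 0xFF);
        sum_fast += (unsigned)add_fast(n, n + 1, n + 2);
    }
    t_fast = seconds_since(start);

    printf("cdecl:    %.3f s\n", t_cdecl);
    printf("fastcall: %.3f s\n", t_fast);
    return sum_cdecl == sum_fast;
}
```

Calling run_benchmark(100000000L) from main would run a hundred million calls per variant. The checksums are accumulated and compared so the results of the calls are actually used. Any numbers it prints will depend heavily on the compiler, optimization settings and machine.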