Simple but effective optimisation in C
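Here’s a short program. It runs the same nested loop twice: the outer loop runs a million times, and the inner loop indexes through a text string adding up the character values. The first version calls strlen() in the inner loop condition; the second calls it once beforehand and stores the result in len. Instead of starting the total at 0, I set it to the outer loop index, to try to reduce the scope for immediate optimisation.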
#include <stdio.h>
#include <string.h>
#include "hr_time.h"

stopWatch s;
char* testString = "This is a rather long string just to prove a point";

int main(void)
{
    int total = 0;

    /* Version 1: strlen() is called in the inner loop condition. */
    startTimer(&s);
    for (int i = 0; i < 1000000; i++) {
        total = i;
        for (int index = 0; index < (int)strlen(testString); index++) {
            total += testString[index];
        }
    }
    stopTimer(&s);
    printf("Total =%d Took %8.5f\n", total, getElapsedTime(&s));

    /* Version 2: strlen() is called once and the result reused. */
    startTimer(&s);
    int len = (int)strlen(testString);
    for (int i = 0; i < 1000000; i++) {
        total = i;
        for (int index = 0; index < len; index++) {
            total += testString[index];
        }
    }
    stopTimer(&s);
    printf("Total =%d Took %8.5f\n", total, getElapsedTime(&s));
    return 0;
}
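The listing depends on hr_time.h, a local header that is not listed in this post. For anyone who wants to try the experiment without it, here is a minimal sketch of a stand-in built on the standard clock() function. The names cpuTimer, cpuTimerStart, cpuTimerStop and cpuTimerSeconds are hypothetical and are not the API of hr_time.h; clock() reports processor time, which for a single-threaded loop like this one should be close to elapsed time, but its resolution is likely coarser than a dedicated high-resolution timer.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the stopWatch type used in the listing. */
typedef struct {
    clock_t start;
    clock_t stop;
} cpuTimer;

/* Record the processor time at the start of the measured region. */
void cpuTimerStart(cpuTimer *t)
{
    assert(t != NULL);
    t->start = clock();
}

/* Record the processor time at the end of the measured region. */
void cpuTimerStop(cpuTimer *t)
{
    assert(t != NULL);
    t->stop = clock();
}

/* Convert the recorded tick difference to seconds. */
double cpuTimerSeconds(const cpuTimer *t)
{
    assert(t != NULL);
    return (double)(t->stop - t->start) / CLOCKS_PER_SEC;
}
```

With this in place, calling cpuTimerStart(&t) before a loop and cpuTimerStop(&t) after it, then printing cpuTimerSeconds(&t) with %8.5f, would give output in the same shape as the listing.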
I compiled and ran it twice, once in Debug and once in Release mode on Windows using MSVC.
Debug:
Total =1004673 Took 0.55710
Total =1004673 Took 0.11465
Release:
Total =1004673 Took 0.00762
Total =1004673 Took 0.00765
Clearly, in the Release build the compiler is smart enough to realise that it can hoist strlen(testString) out of the loop, so there’s no difference between the two times. But the Debug build shows that calling strlen() in a for loop condition is relatively slow: the first loop takes nearly five times as long as the second.
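To make the two loop shapes easier to compare side by side, here is a small sketch that pulls the inner loop out into functions, plus a third variant that skips strlen() altogether by walking the string until it reaches the terminating NUL. The function names are my own for illustration and do not appear in the program above.

```c
#include <assert.h>
#include <string.h>

/* strlen() sits in the loop condition, so it is re-evaluated on every
   pass unless the compiler manages to hoist it. */
int sumStrlenInCondition(const char *s)
{
    assert(s != NULL);
    int total = 0;
    for (int index = 0; index < (int)strlen(s); index++) {
        total += s[index];
    }
    return total;
}

/* strlen() is called once and the result is reused. */
int sumHoistedLength(const char *s)
{
    assert(s != NULL);
    int total = 0;
    int len = (int)strlen(s);
    for (int index = 0; index < len; index++) {
        total += s[index];
    }
    return total;
}

/* No strlen() at all: stop when the terminating NUL is reached. */
int sumUntilTerminator(const char *s)
{
    assert(s != NULL);
    int total = 0;
    for (const char *p = s; *p != '\0'; p++) {
        total += *p;
    }
    return total;
}
```

All three return the same sum for the same string. For the test string used here that sum is 4674, which added to the final outer index of 999999 gives the 1004673 printed in the results.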
I also compiled and ran it with clang on Ubuntu 20.04 in the Hyper-V VM. The times with the default optimisation level were
0.18370 0.10644
and with “-O3” added to the compile line for maximum optimisation, this changed to
0.0762 0.0745
which is almost identical to the Windows release compile times.
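The optimised builds can presumably close the gap only because nothing inside the loop changes the string. When the loop writes to the string, the compiler generally has to assume the length might change, so it may not be able to hoist strlen() for you, and doing it by hand matters even in an optimised build. Here is a sketch of that situation using an in-place upper-casing loop; both function names are hypothetical and I have not timed them.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Writes to s inside the loop while strlen(s) is in the condition.
   The compiler may have to keep the call on every pass. */
void upperCaseStrlenInCondition(char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < strlen(s); i++) {
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}

/* Same result, but the length is computed once up front. */
void upperCaseHoisted(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```

Both functions need a writable buffer such as char buf[] = "hello"; passing a string literal would be undefined behaviour because they modify the string in place.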