Tag: Timings

Is variable++ faster than ++variable?

Is variable++ faster than ++variable?

TimingsOne of the things I as told when I learnt C++ and then later C was that a post-inc (i.e. variable++) was faster than a pre-inc i.e. ++variable. Frankly I’m not sure if it is really true but its not a difficult thing to test.

Here’s a short program

#include <stdio.h>
#include "hr_time.h"

#define NUMLOOPS 100000000

int main() {
  stopWatch s;
  startTimer(&s);
  int j=0;
  for (int i=0;i<NUMLOOPS;i++){
    ++j;
  }
  stopTimer(&s);
  printf("PreInc = %10.5f\n",diff(&s));

  startTimer(&s);   
  j=0;
  for (int i=0;i<NUMLOOPS;i++){
    j--;
  }
  stopTimer(&s);
  printf("PostInc = %10.5f\n",diff(&s));  
}

You can get the siurce code including hr_time.h and .c from the timings.zip file on GitHub. I used VS Code with clang to build this on Ubuntu. Here is the tasks.json file to build it. It assumes that the file is in your workspace folder and creates a file called ex1. The timings.zip file contains the json files as well.

{
    "version": "2.0.0",
    "tasks": [
        {
            "type": "shell",
            "label": "clang build active file",
            "command": "/usr/bin/clang",
            "args": [
                "-g",
                "${file}","${workspaceFolder}/hr_time.c",              
                "-o",
                "${fileDirname}/ex1",                
                "-lm"
            ],
            "options": {
                "cwd": "/usr/bin"
            },
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

Ignore the first three runs which were for 10 million not 100 million. All do indeed show that post-inc is indeed faster. Not by a great margin but each of the last 100 million loops takes between 94% and 96% of the preinc time.

So I timed a short program with /Gd and /Gr

So I timed a short program with /Gd and /Gr

StopWatch timings
Image by Michal Jarmoluk from Pixabay

This was the follow up to yesterday’s post about seeing if changing the function calling convention, switching from stacked parameters to passing them in registers made a difference in execution time.

This was the program I used.

#include <stdio.h>
#include "hr_time.h"

int add(int a, int b, int c,int d,int e) {
	return a - b * 2 + c * 3 + d * 3 + e * 5;
}

int __cdecl main() {
	int total=0;
	stopWatch s;
	startTimer(&s);
	for (int i = 0; i < 10000000; i++) {
		total += add(i, 5, 6, i, 8);
	}
	stopTimer(&s);
	printf("Value = %d Time = %7f.5\n",total, getElapsedTime(&s));
}

Pretty similar to the one I did yesterday except with two more parameters in the add function and my Windows high-res timing code. I’ve extracted the two timing files (hr_time.h/.c) from the asteroids and it’s in the LearnC folder on GiHhub.

As before this was compiled as x86. Also I tried it first compiled as release. This means the optimizing compiler has its way and I got virtually identical for cdecl (/Gd), fastcall (/Gr) and even safecall (/Gz).

Disassembly of the machine code revealed that the optimizer had moved the function code inline in the for loop and this negated the call code. So I did it again in debug mode. Here there was a clear difference. The times for fastcall were 0.259 while the cdecl (the default) was 0.239 which is about an 8% speed increase. Safecall was roughly the same execution as cdecl. So the lesson seem to be don’t use fastcall.

I think I need a more complicated program which should be compiled in release mode but where optimization doesn’t transform the function into inline code. Perhaps making the function longer would do it so the function machine code would be too long to fit in a L1 cache.

Interestingly the release code execution time was 0.005557 seconds, almost 50 x faster than the debug time.

A test of resource loading in MonoGame

A test of resource loading in MonoGame

Ace of ClubsAce of DiamondsAce of HeartsAce of SpadesLoading resources into RAM on a smartphone can be a relatively slowish process. I’d been working on a card game and had both a set of 52 individual playing cards plus a sheet of 52 in one image. I have two Android phones so used the opportunity to test which is quicker loading 52 small images or one larger image. MonoGame ‘s Draw routines can copy from part of an image; you just specify the source.

In this case I was comparing loading 52 images, each 100 x 153 pixels (the four aces shown are the same images) or one image that is 1500 x 633. On the face of it, loading one image should be faster- there’s a certain overhead with loading each image so you get that x 52. But larger images can be problematic especially on devices like smartphones with less smaller amounts of RAM than desktops.

The two smartphones are both older, one is a BLU STUDIO ONE; the other a Chinese beast which shows up in Visual Studio as an Alps X27 Plus. It was bought on Ebay and claims to have 6 GB of RAm and 128 GB storage. In practice, when I tested the storage, it ran out after 12 GB! The phone only cost me £29 so I cn’t really complain…Although it claims to be Android 9.1 in the Developer information in Settings, it like the BLU seem to be running Android Lollipop. (5,1 – level 22) when plugged into my PC/Visual Studio.

I first ran the load code on the 52 images in Debug mode. Possibly not the best way to test code.

        private string LoadCards()

        {
            Stopwatch stopWatch = new Stopwatch();
            const string suits = "HCDS";
            const string ranks = "A23456789TJQK";
            stopWatch.Start();
            int i = 0;
            foreach (char c in suits)
            {
                int j = 0;
                foreach (char r in ranks)
                {
                    string s = @"Cards\"+r + c;
                    cards[i, j++] = Content.Load(s);
                }
                i++;
            }
            stopWatch.Stop();
            TimeSpan ts = stopWatch.Elapsed;

            string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
                ts.Hours, ts.Minutes, ts.Seconds,
                ts.Milliseconds / 10);
            return elapsedTime;
        }

That is the C# code to load the 52 images. The Alps which is a newer phone took 0.28 seconds to load them. The older BLU took 0.18. Neither are great times. To load the single image I commented the code out between the two stopWatch calls (Start() and Stop()) and then added this one line.

 allcards = Content.Load<Texture2D>("allcards");

The single image took 0.10 seconds on the Alps and 0.06 seconds on the BLU, both roughly three times faster loading one big image than loading all 52 images.  Information like this is useful and I’ll now create my own tools for building sprite sheets of multiple images.