Category: C

So I timed a short program with /Gd and /Gr

StopWatch timings
Image by Michal Jarmoluk from Pixabay

This was the follow-up to yesterday’s post, to see whether changing the function calling convention, from passing parameters on the stack to passing them in registers, made a difference in execution time.

This was the program I used.

#include <stdio.h>
#include "hr_time.h"

int add(int a, int b, int c,int d,int e) {
	return a - b * 2 + c * 3 + d * 3 + e * 5;
}

int __cdecl main() {
	int total=0;
	stopWatch s;
	startTimer(&s);
	for (int i = 0; i < 10000000; i++) {
		total += add(i, 5, 6, i, 8);
	}
	stopTimer(&s);
	printf("Value = %d Time = %7.5f\n", total, getElapsedTime(&s));
}

It’s pretty similar to the one I did yesterday, except with two more parameters in the add function and my Windows high-res timing code. I’ve extracted the two timing files (hr_time.h/.c) from the asteroids code and they’re in the LearnC folder on GitHub.

As before this was compiled as x86. I tried it first as a release build. This means the optimizing compiler has its way, and I got virtually identical times for cdecl (/Gd), fastcall (/Gr) and even stdcall (/Gz).

Disassembly of the machine code revealed that the optimizer had moved the function code inline in the for loop, which removed the call altogether. So I did it again in debug mode. Here there was a clear difference. The time for fastcall was 0.259 seconds while cdecl (the default) was 0.239, so fastcall was about 8% slower. Stdcall took roughly the same time as cdecl. So the lesson seems to be don’t use fastcall.

I think I need a more complicated program which should be compiled in release mode but where optimization doesn’t transform the function into inline code. Perhaps making the function longer would do it, as compilers are less willing to inline large functions.

Interestingly the release code execution time was 0.005557 seconds, over 40 times faster than the debug time.

How much faster is C code compiled with fastcall?

Setting fastcall in VS 2019

Something I read about the other day was the __fastcall convention. In Visual Studio you enable this with the /Gr flag, and in gcc it’s __attribute__((fastcall)). For clang it’s fastcall but see this.

So what does fastcall do? It changes the calling convention, so instead of pushing all the parameters to a function on the stack, it passes the first two in the ECX and EDX registers and pushes any others on the stack as before. Let’s look at an example.

#include <stdio.h>

int add(int a, int b, int c) {
	return a + b * 2 + c * 3;
}

int main() {
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
}

This is the disassembly from VS 2019. I pressed F10 to start debugging then Debug => Windows => Disassembly to get the listing. Note this is x86, i.e. 32-bit.

005B18DC  lea         edi,[ebp-0C0h]  
005B18E2  mov         ecx,30h  
005B18E7  mov         eax,0CCCCCCCCh  
005B18EC  rep stos    dword ptr es:[edi]  
005B18EE  mov         ecx,offset _9831A1D6_test@c (05BC003h)  
005B18F3  call        @__CheckForDebuggerJustMyCode@4 (05B131Bh)  
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
005B18F8  push        6  
005B18FA  push        5  
005B18FC  push        4  
005B18FE  call        _add (05B1023h)  
005B1903  add         esp,0Ch  
005B1906  push        eax  
005B1907  push        offset string "Add(4,5,6)=%d\n" (05B7B30h)  
005B190C  call        _printf (05B10D2h)  

Now I build it after setting the /Gr flag. In VS 2019, on the project property pages, click C/C++, then Advanced, then Calling Convention, and switch from __cdecl (/Gd) to __fastcall (/Gr).

008118EC  lea         edi,[ebp-0C0h]  
008118F2  mov         ecx,30h  
008118F7  mov         eax,0CCCCCCCCh  
008118FC  rep stos    dword ptr es:[edi]  
008118FE  mov         ecx,offset _9831A1D6_test@c (081C003h)  
00811903  call        @__CheckForDebuggerJustMyCode@4 (081131Bh)  
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
00811908  push        6  
0081190A  mov         edx,5  
0081190F  mov         ecx,4  
00811914  call        @add@12 (0811276h)  
00811919  push        eax  
0081191A  push        offset string "Add(4,5,6)=%d\n" (0817B30h)  
0081191F  call        _printf (08110CDh) 

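The differences are in the call itself: push 5 and push 4 have become mov edx,5 and mov ecx,4, the function is now named @add@12 instead of _add, and the add esp,0Ch after the call has gone. The function add is also different, as the fastcall version reads its first two parameters from ECX and EDX, and it, rather than the caller, removes the one stacked parameter when it returns.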

Note, to get this to compile I had to prefix main with __cdecl. Using /Gr means that every function in the program uses registers, and that’s not allowed with main as it’s called from outside the program and must use the cdecl (default stack passing) convention.

This is what main looks like now.

int __cdecl main() {

Notes

This is only for 32-bit. 64-bit Windows code already passes the first four parameters in registers by default, so fastcall shouldn’t make a difference there. Next I have to write a program that does lots of function calls and use high precision timing to see how much of a difference it makes. To be continued.

C Tutorial thirteen published on allocating memory

Memory chips
Image by PublicDomainPictures from Pixabay

I’ve restricted the thirteenth tutorial to malloc, as it’s the main way you allocate and use dynamic memory.

Pointers hold an address (of somewhere in RAM) and this can be an existing variable, a function or even data like a text string. But if you want to reserve a block of RAM and get a pointer to it, you have to call the stdlib function malloc(). (Or calloc, but I will return to that in a future tutorial).

Of course once you’ve finished using a block of RAM, it’s only polite to return it by calling free(). Don’t forget either that malloc always returns a void * pointer. In C this converts to whatever pointer type you assign it to, so a cast to something appropriate is optional (C++ requires one).

 

C Tutorial twelve on function pointers published

Lots of pointers
Image by pencil parker from Pixabay

While function pointers are important, I don’t think they’re quite as important as pointers. C would just not be C without pointers. There are so many things that you would not be able to do if the language lacked pointers, such as most data structures (try doing a linked list without pointers!).

However function pointers give additional flexibility. You can pass them as parameters in functions and store them in variables.

These are the earlier tutorials on pointers:

And this is the new one:

 

Santa Paravia en Fiumaccio

Santa Paravia en Fiumaccio

I could understand you thinking “have I flipped my lid?”. What kind of a title is “Santa Paravia en Fiumaccio”? It’s actually the name of an old game written in C. A much more sophisticated Hammurabi if that makes sense. You can read about it here on Wikipedia.

I was digging around the web looking for games in C with source code and came across a reference to it. I found it on archive.org, though that version is a bit bashed up. All the < and > are displayed as their HTML equivalents, &lt; and &gt;, so you need to do a search and replace on those. Plus a few of them have spaces between the & and the lt/gt!

That was about a thousand lines long. However I thought, why not do a search on the web specifically for it and found a cleaned up and slightly longer version on GitHub by DNSGeek. If you dig into his repository list you’ll see he’s also done a Python port with graphics! It also includes the instructions for playing it in a PDF.

If you try to compile the C source code, you’ll find that a curses library is missing. I’m not sure that it actually needs it so I commented out the include line and it didn’t seem to mind. Under Visual Studio, it complains that the strcpy calls should be strcpy_s.

So I’ll fix the little things and see how it plays. You just have to love a game that has functions with names like SerfsDecomposing() and SerfsProcreating()!

 

BBC Basic with SDL

BBC Micro
From Wikipedia

Back in my games programming days I never had a BBC Micro, though I knew several people who did, including one bloke who let people run their ROMs on his micro. Sneakily it made a copy of the ROM contents and saved it to disk. I remember wasting a lot of hours playing Elite on a Beeb back in 1984.

BBC BASIC for SDL 2.0 is a cross-platform implementation of the BBC BASIC programming language for Windows, Linux (x86), MacOS, Raspbian (Raspberry Pi), Android, iOS and Emscripten / WebAssembly by developer R.T. Russell. If you ever fancied writing BASIC programs and running them on a BBC Micro then you now can with this.

It is programmed in C (and I’ve added a link into the C code library) and you’ll find the multi-platform source on GitHub. A visit to the Complete BBC Games archive might also be in order!

 

Slay tutorial three published

Onslaught map

This is a typical map produced by the generator. One large continent with coloured hexagons from 8 players arranged in clumps and individual hexes. It’s not quite perfect (in the top right corner there is a single blue hex) but it’s not bad.

I’ve just published Slay tutorial three with the source code in the file onslaught2.zip on GitHub. I’m quite pleased with the map generator, which is based on the one I devised for Empire and which I covered in an earlier blog post. It does a lot, and quickly enough that when you press the N key it can generate a new map in a fraction of a second. C + SDL2 is very fast even when drawing nearly a thousand hexagons every frame. It’s mostly in just one file (for now) with timing code and a data file for generating maps in separate files. The main file is just over 600 lines long.

As it needed a fair bit of debugging, I made it compile on both Windows and Linux (and probably Mac OS but that’s not tested). You can load the solution file in Windows with Visual Studio or put the files into a folder with Visual Studio Code on Linux. Included in the zip file is the assets folder which has all the individual hex graphics and a .vscode folder with JSON files for doing the build with clang on Linux. I’ve compiled and run it on both Windows and Linux. The SDL2 window is 1300 x 768 pixels.

A look at a Raspberry Pi Pico

Raspberry Pi Pico
From Raspberrypi.org

As you probably know I do like my Raspberry Pi. But the RPi Pico is a different kettle of fish. I’m only mentioning it here because it is programmable in C/C++ and some may find it a less overwhelming place to learn C than, say, a traditional Raspberry Pi.

What’s different between a Pico and a Pi 4B? A Pico uses a microcontroller, basically a CPU with built-in RAM, a bit of flash RAM and a real time clock. RAM is tiny compared to any Pi: just 264 KB (that’s still much more RAM than my CBM Vic-20 in 1981 with 3.5 KB of RAM!) and 2 MB of Flash RAM. The CPU, an ARM CPU designed in the UK, runs at clock speeds up to 133 MHz. A Pi 4B runs at 1.5 GHz, a clock speed over 11x faster.

The biggest difference is that a Raspberry Pi runs any operating system you want. Microcontrollers are different. To run a program on a Pico you have to program it into Flash RAM first. You can do this with drag and drop. See here for C/C++. The Pico is an embedded system. RAM is used for data, stack etc but not the program which runs out of Flash RAM.

But if you like hardware then this is an excellent place to get started. You get all these (see here for Specifications).

  • 26 × multi-function GPIO pins
  • 2 × SPI, 2 × I2C, 2 × UART, 3 × 12-bit ADC, 16 × controllable PWM (pulse-width modulation) channels
  • Accurate clock and timer on-chip
  • Temperature sensor
  • Accelerated floating-point libraries on-chip
  • 8 × Programmable I/O (PIO) state machines for custom peripheral support

So what about games? Not really. Or at best very simple games using the single LED. No, this is about learning C (or C++ or even, shock, Python) and interfacing hardware. You might for example put one of these inside a drone to provide control software.

Kilo- a thousand lines text editor in C

Antirez kilo text editor

Developed by Salvatore Sanfilippo aka antirez and licensed under the BSD 2-Clause licence, kilo is a simple text editor in one file.

If you are learning C and want to see how to write a utility, this might be a good example to follow. A warning though: he does use pointers, so make sure you’ve learnt them first!

I had a stab at writing one quite a few years ago but it wasn’t very good. I have a suspicion that writing a good text editor depends upon you first creating a good implementation of the text storage. Solve that and it’s downhill for the rest.

I’ve added this to my curated library of C code, on the C Code link on the top menu.

 

Onslaught (aka Slay on Linux) tutorial two published

Onslaught linux hexagons screen

I’m quite pleased with this. It took about six hours in total to create including the time to create the graphics. Running in Hyper-V under Ubuntu 20.04, it draws a screenful of graphics in about 65 microseconds.

I took the hexagon drawing code from the AboutEmpire.zip code on GitHub and modernised it for SDL2. The Empire code uses Surfaces from SDL1 while this uses Textures from SDL2.

Orange hexagon
Blue hexagon

There are nine hexagons with all but the dark one having an internal border.

I think the orange and salmon hex look a bit too close, so I’ll change one of them.

The tutorial goes into a bit more depth about the program (which is just over 200 lines long) and can be found on GitHub in the file Onslaught1.zip.