Category: Techniques

There’s always someone inventing something better

There’s always someone inventing something better

JSON Logo
JSON Graphical Logo

In this case, I’m talking about better than JSON. JSON was invented as an alternative to XML which was invented as a way for computers to send data in a human readable form. Unfortunately XML is a pretty bloated format. Putting data into XML can make it five or six times larger.

Plus by the time XML was coming into common use in the mid-late 90s, there was a horrible letter soup of associated acronyms. It sort of appeared at the same time as Java and the whole XML ecosystem was lapped up by big business. JSON was born as an alternative method and it became very popular as JavaScript was growing at the same time, JSON is short for JavaScript Object Notation.

Now there’s MessagePack an alternative to JSON. It’s supposed to be faster and smaller than JSON. I came across it while looking at SignalR, a way to send data between clients and servers and used in Blazor and .NET websites.  It must have been around for a while as it was used back in 2011.

There’s now 50 programming language implementations including C at the bottom of the MessagePack home page although there’s over 100 but that includes several implementations for the same language.

If you have to move data between two computers, or maybe two processes on the same computer, you want it to be as small as possible and that’s what MessagePack allows.

 

Learning a programming language is not so easy

Learning a programming language is not so easy

Never stop learning
Image by Gerd Altmann from Pixabay

When I started, back in the dark ages there weren’t many programming languages about. It was a choice of BASIC, Fortran, C, Pascal and Cobol or some obscure languages apl, lisp, snobol etc. At Uni I learnt BASIC in first year then Pascal. We touched on Cobol in one course, enough to put me off it for life. We also did one semester on assembly language for a 6800 CPU. That was fun and probably helped me to learn 6502 and Z80 a few years later.,

Back then once I started learning other languages, it was long before the web existed. Programming languages came with manuals – user guides, reference guides. So you could learn enough to get started and then dip into the reference guide as and when you needed.

But since the Web appeared, the manuals no longer exist as printed books. But what I’ve noticed is, it gets harder to acquire a new technology if you are not working in it fulltime. I learnt PHP and HTML twenty years ago and reinforced that learning by creating websites. But now technologies like Blazor and ASP.NET MVC are quite a bit more complicated.  I’m doing  Udemy course on Blazor with over 200 lessons (most about 5 minutes long) and have only got up to lesson 55. Finding the time is probably one of the hardest things.

C is probably one of the easiest programming languages to learn but any other language or complicated technology is going to take a lot longer. I think It should be easier. There is an immense amount of free material on the web including sites to pose questions (StackOverflow), low cost courses (Udemy), free videos (Youtube) and yet it doesn’t seem easier.  Back pre-web you had to pay for programming languages. But unless you are a student or have a lot of spare time to study, it can be slow learning new stuff.

 

So I timed a short program with /Gd and /Gr

So I timed a short program with /Gd and /Gr

StopWatch timings
Image by Michal Jarmoluk from Pixabay

This was the follow up to yesterday’s post about seeing if changing the function calling convention, switching from stacked parameters to passing them in registers made a difference in execution time.

This was the program I used.

#include <stdio.h>
#include "hr_time.h"

int add(int a, int b, int c,int d,int e) {
	return a - b * 2 + c * 3 + d * 3 + e * 5;
}

int __cdecl main() {
	int total=0;
	stopWatch s;
	startTimer(&s);
	for (int i = 0; i < 10000000; i++) {
		total += add(i, 5, 6, i, 8);
	}
	stopTimer(&s);
	printf("Value = %d Time = %7f.5\n",total, getElapsedTime(&s));
}

Pretty similar to the one I did yesterday except with two more parameters in the add function and my Windows high-res timing code. I’ve extracted the two timing files (hr_time.h/.c) from the asteroids and it’s in the LearnC folder on GiHhub.

As before this was compiled as x86. Also I tried it first compiled as release. This means the optimizing compiler has its way and I got virtually identical for cdecl (/Gd), fastcall (/Gr) and even safecall (/Gz).

Disassembly of the machine code revealed that the optimizer had moved the function code inline in the for loop and this negated the call code. So I did it again in debug mode. Here there was a clear difference. The times for fastcall were 0.259 while the cdecl (the default) was 0.239 which is about an 8% speed increase. Safecall was roughly the same execution as cdecl. So the lesson seem to be don’t use fastcall.

I think I need a more complicated program which should be compiled in release mode but where optimization doesn’t transform the function into inline code. Perhaps making the function longer would do it so the function machine code would be too long to fit in a L1 cache.

Interestingly the release code execution time was 0.005557 seconds, almost 50 x faster than the debug time.

How much faster is C code compiled with fastcall?

How much faster is C code compiled with fastcall?

Setting fastcall in VS 2019Something I read about the other day was the __fastcall convention. In Visual Studio you enable this with the /Gr flag and in gcc (it’s __attribute__((fastcall)). For clang it’s fastcall but see this.

So what does fastcall do? It changes the calling convention, so instead of pushing parameters to a function on the stack, it passes them in the registers starting with ECX then EDX and so on.  Let’s look at an example.

#include <stdio.h>

int add(int a, int b, int c) {
	return a + b * 2 + c * 3;
}

int main() {
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
}

This is the disassembly code from VS 2019. I pressed F10 to start debugging then Debug => Windows => Disassembly to get the listing. Note thei is x86, ie 32-bit.

005B18DC  lea         edi,[ebp-0C0h]  
005B18E2  mov         ecx,30h  
005B18E7  mov         eax,0CCCCCCCCh  
005B18EC  rep stos    dword ptr es:[edi]  
005B18EE  mov         ecx,offset _9831A1D6_test@c (05BC003h)  
005B18F3  call        @__CheckForDebuggerJustMyCode@4 (05B131Bh)  
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
005B18F8  push        6  
005B18FA  push        5  
005B18FC  push        4  
005B18FE  call        _add (05B1023h)  
005B1903  add         esp,0Ch  
005B1906  push        eax  
005B1907  push        offset string "Add(4,5,6)=%d\n" (05B7B30h)  
005B190C  call        _printf (05B10D2h)  

Now if I build it after setting the /Gr flag. In Vs 2019, on the project property pages, click advanced then the Calling Convention and switch from cdecl (/Gd) to –fastcall (/Gr).

008118EC  lea         edi,[ebp-0C0h]  
008118F2  mov         ecx,30h  
008118F7  mov         eax,0CCCCCCCCh  
008118FC  rep stos    dword ptr es:[edi]  
008118FE  mov         ecx,offset _9831A1D6_test@c (081C003h)  
00811903  call        @__CheckForDebuggerJustMyCode@4 (081131Bh)  
	printf("Add(4,5,6)=%d\n", add(4, 5, 6));
00811908  push        6  
0081190A  mov         edx,5  
0081190F  mov         ecx,4  
00811914  call        @add@12 (0811276h)  
00811919  push        eax  
0081191A  push        offset string "Add(4,5,6)=%d\n" (0817B30h)  
0081191F  call        _printf (08110CDh) 

I’ve highlighted the differences in bold. However the function Add is also different as the fastcall version doesn’t have to pop parameters off the stack.

Note, to get this to compile I had to prefix main with __cdecl. Using /Gr means that every function in the program uses registers and that’s not allowed with main as it’s called from Windows end must use the cdecl (default stack passing) convention.

This is what main looks like now.

int __cdecl main() {

Notes

This is only for 32-bit. 64-bit code is done somewhat differently so possibly wouldn’t be that different. Next I have to write a program that does lots of function calls and use high precision timing to see how much of a difference it makes. To be continued.

An intriguing Matchbox based learning machine

An intriguing Matchbox based learning machine

Matchbox
Image by Horst Winkler from Pixabay

I mentioned Menace the other day. This was a “Learning Machine” implemented entirely with Matchboxes. This was back in the 1960s. The idea is that a simple game like noughts and crosses (aka Tic-tac-toe) can be played by a crude form of learning computer.

You need enough matchboxes for every possible position in the game. That’s 304 matchboxes and you start but putting four coloured beads into each box for every possible turn. These represent the possible moves. To make a move, you pick the box corresponding to the position shake it and draw a bead out and play that move.

And so on until the game is finished. Then if Menace (as it was called- Machine Educable Noughts And Crosses Engine) won, you add three beads to each winning Matchbox, put one back in if its a draw and remove all of that colour if it lost.  Gradually it becomes better at winning.

There’s a far detailed write up here. It’s an intriguing technique though possibly not the easiest to apply as it requires you to have a matchbox for every position in a game. Fine with noughts and crosses but impossible with chess. Instead of Matchboxes, it could be done way faster in software.

It needs games with determinate moves and not too many, though I suppose with RAM in modern PCs, a few million moves would be manageable. One possibility would be the L game. I’ve never heard of anyone doing that, and it would make for an interesting project. I’ll add it to my list of future projects. Now I wonder how many possible moves there are in a L- Game? It’s made more complex because you can optionally move one of the single pieces as well as the L-Shape which can be rotated and flipped and it’s a 4 x 4 board not 3 x 3 as in noughts and crosses. .

Doing 3D Math(s) in C

Doing 3D Math(s) in C

Donuts.cIf you’re interested in seeing how 3D math(s) are done, this article by ex Googler Andy Sloane is about a program he wrote called donut.c. The name is doubly appropriate as not only is the source code shaped like a Donut (doughnut for us Brits), when run on Linux, it outputs an animated 3D donut. This was an attempt at obfuscated C back in 2006.

Now 15 years later he’s updated it to work without using the math library. Interesting is the explanation of how he did all the mathematics without using sin and cos directly. Also it includes a version that uses 10 bit integer arithmetic; no floats at all. If you’re interested in understanding how Donuts.c works, read this post from 2011.

I like articles like this- they’re a bit different and add to programming knowledge. and you can never get enough of that

 

 

 

How to read a text file into memory in C

How to read a text file into memory in C

Words readAs part of the crossword grid packing utility, the first stage is building the list of words in a structure in RAM. To keep thing simple I’m going to use an array of char * pointers. The array is limited to a maximum 20 words. If any more are read then they will be discarded.

Just to prove that it worked, I wrote this in Visual Studio and ran it in the debugger. You can see elements 0-9 in the screenshot with the source code in the background.

This is the listing of the program. Note, it doesn’t have a function to free up the memory dynamically allocated for each of the words.

You might wonder about these lines (25 and 26).

	if (line[len-1] == '\n')
		line[--len] = '
	if (line[len-1] == '\n')
line[--len] = '\0';
';

This is needed because when you use fgets to read a line of text it includes a line feed at the end. So the first line in the file Engine is actually 7 characters long with a terminating \n. The malloc on line 27 allocated enough RAM for len + 1 to allow for the terminating 0 at the end of each string. In line 33, if you are on Linux replace strcpy_s with strncpy. Both are safe versions of strcpy. Here’s the full listing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAXWORDS 20
#define MAXWORDLENGTH 32

typedef char* pWord;
pWord allwords[MAXWORDS];

int ReadWords(char * filename) {
	memset(allwords, 0, sizeof(allwords)); // clear it
	char line[MAXWORDLENGTH*2]; // double max size just in case
	FILE* fwords;
	int errnum = fopen_s(&fwords,filename, "rt");
	if (!fwords) {
		printf("Missing word file %s", filename);
	}
	int wordIndex = 0;
	while (fgets(line, MAXWORDLENGTH, fwords)) {
		int len = strlen(line);
		if (!len)
			break;
		if (line[len-1] == '\n')
			line[--len] = '
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#define MAXWORDS 20
#define MAXWORDLENGTH 32
typedef char* pWord;
pWord allwords[MAXWORDS];
int ReadWords(char * filename) {
memset(allwords, 0, sizeof(allwords)); // clear it
char line[MAXWORDLENGTH*2]; // double max size just in case
FILE* fwords;
int errnum = fopen_s(&fwords,filename, "rt");
if (!fwords) {
printf("Missing word file %s", filename);
}
int wordIndex = 0;
while (fgets(line, MAXWORDLENGTH, fwords)) {
int len = strlen(line);
if (!len)
break;
if (line[len-1] == '\n')
line[--len] = '\0';
pWord pNewWord = (pWord)malloc(len + 1);    // line 27
if (!pNewWord) {
printf("Error allocating memory");
break;
}
if (wordIndex < MAXWORDS) { // can add to list
strcpy_s(pNewWord, len+1, line);  // line 33
allwords[wordIndex++] = pNewWord;
}
}
fclose(fwords);
return wordIndex; // = Count of words
}
int main() {
int numWordsread = ReadWords("words.txt");
}
'; pWord pNewWord = (pWord)malloc(len + 1); // line 27 if (!pNewWord) { printf("Error allocating memory"); break; } if (wordIndex < MAXWORDS) { // can add to list strcpy_s(pNewWord, len+1, line); // line 33 allwords[wordIndex++] = pNewWord; } } fclose(fwords); return wordIndex; // = Count of words } int main() { int numWordsread = ReadWords("words.txt"); }
Fun with optimisation in C

Fun with optimisation in C

Inner cogs
Image by ar130405 from Pixabay

Bit twiddling with C is quite a common thing but sometimes it can be hard to understand something quite simple.  Here’s a line of code. Can you say what it does? Note that a,b and c are all ints.

c = b ^ ((a ^ b) & -(a < b));

Or how about this which looks quite similar

c = a ^ ((a ^ b) & -(a < b));

The first line puts the minimum of a and b into c, while the second line puts the max of a and b. Both came from a page of bit twiddling hacks by Sean Eron Anderson at Stanford university. The significant thing about these is not that they provide two clever if incomprehensible ways of calculating min and max but they do it without branches.

If you look at the source of min and max in stdlib.h they are defined like this:

#define max(a,b) (((a) > (b)) ? (a) : (b))
#define min(a,b) (((a) < (b)) ? (a) : (b))

Because modern CPUs use instruction caches (pipelines), they try to keep them full through a technique called branch prediction. If a branch happens then the CPU has to read instructions from the wherever the branch goes to and this causes a pipeline stall which slightly slows down processing. So branchless min and max should run faster than branching code, at least in theory.

I tried it on Windows using a small Visual C++ (Well C) program doing the four in a loop 10 million times. I also ran it in 32-bit and 64-bit.  All runs were done in release mode which lets the compiler optimise.

Curiously the 32-bit took almost no time at all while the 64-bit took a lot lot longer. There wasn’t much difference between branchless and branch. Possibly the code is too small and the benefits of branchless are less significant because the instructions fall inside the instruction cache for 32-bit while 64-bit may be too big and so is affected by instruction cache misses.

This is the 32-bit time.

fastMin 0.0000000
fastMax 0.1000000
Min 0.1000000
Max 0.1000000

And this is the 64-bit. Quite a lot of difference!

fastMin 1757.6000000
fastMax 1677.9000000
Min 2039.4000000
Max 2134.4000000

This is the code which compiles in Visual C++. You can get the hr_time timing code (for WIndows) from the sdlttf.zip file. I increment and decrement a and b to stop the compiler possibly spotting that the values would be invariant otherwise and doing too-clever optimisation stuff.

#include <math.h>
#include <stdio.h>
#include "hr_time.h"

#define NUMLOOPS 10000000
	int a = 1000;
	int b = 67;
	int c;
	double t;
	stopWatch s;

int main() {

	startTimer(&s);
	for (int i = 0; i < NUMLOOPS; i++) {
		a++;
		b++;
		c = b ^ ((a ^ b) & -(a < b));
	}
	stopTimer(&s);
	t = getElapsedTime(&s) * 1000000.0;
	printf( "fastMin %8.7f\n", t);
	startTimer(&s);
	for (int i = 0; i < NUMLOOPS; i++) {
		a--;
		b--;
		c = a ^ ((a ^ b) & -(a < b));
	}
	stopTimer(&s);
	t = getElapsedTime(&s) * 1000000.0;
	printf("fastMax %8.7f\n", t);
	startTimer(&s);
	for (int i = 0; i < NUMLOOPS; i++) {
		a++;
		b++;
		c = min(a, b);
	}
	stopTimer(&s);
	t = getElapsedTime(&s) * 1000000.0;
	printf("Min %8.7f\n", t);
	startTimer(&s);
	for (int i = 0; i < NUMLOOPS; i++) {
		a--;
		b--;
		c = max(a, b);
	}
	stopTimer(&s);
	t = getElapsedTime(&s) * 1000000.0;
	printf("Max %8.7f\n", t);
}

Learning C and SDL

Learning C and SDL

Learning
Image by Gerd Altmann from Pixabay

I frequent the Reddit C subreddit and Quora.com and quite often see people asking what they should learn first. So I’ve updated the C tutorials page (link at top) with details of future tutorials as well as a new one on functions. This includes the entire course (though some are not yet written).  It has more details about each lesson and shows the whole course,

There’s considerable overlap with material in my first EBook though that was for Windows only and this is for Windows, Linux and Raspberry Pi. The Pi of course is Linux with some hardware specific stuff and I’ll include links back to previous blog entries which have explained how to get and display the Pi’s temperature in a C program or use game pads.

Unlike other C courses it will include SDL2 code and how to use it. I think C + SDL2 is a great way to make 2D games, and is probably the easiest and simplest to learn.

 

The joys of SSL certs

The joys of SSL certs

Certificate ass u meThere’s a saying “Don’t assume- it makes an ass of U and me“, and I er fell foul of this a month ago. A couple of months ago I setup a cheap VPS. It was one of those that you pay every month. What I didn’t realise was you are explicitly meant to renew the hosting every month and they send you an email with a link. Of course what did I do?, I er forgot to renew it. Annoyingly, I’d installed Virtualmin, redirected a domain and bought a cheap SSL certificate. All lost.

Now I actually did something right and there’s a lesson here. When you setup a SSL certificate, you create a CSR (Certificate Signing Request) and a Private key. You upload the CSR, pay your money (£20 for four years) and get a certificate back. The hosting companies I’ve used provide a SCR creation facility and somewhere to paste the private key and certificate when you get it. Then you click a button and your website now has a working SSL. It couldn’t be easier.

So luckily for me I had made a backup copy of my private key and was able to download the certificate. I’ve setup a completely new VPS, redirected the domain and very nervously pasted in the cert and private key. It worked. I had been bothered that the CSR was generated on a different server but it doesn’t seem to matter. So long as you have the private key the certificate works on a different server.