Category: C

Mle – a small text editor in under 10,000 lines of code

If you’re into C, one of the most interesting applications you can write is a text editor. It demands the ability to use pointers to store the text efficiently, along with command handling and features such as searching and Unicode support.

It can be equally instructive to read code that someone else has written. In this case Mle is a text editor in fewer than 10,000 lines of C. It’s also apparently cross-platform, though you’ll have to build it on the relevant platform.

It uses three other open source libraries, linked at the foot of the main page. They are uthash, termbox2 and PCRE2.

Everything and fsearch

I’m currently operating on a Linux laptop as the M.2 SSD on my new desktop PC decided to stop working the other day. But I thought I’d mention a couple of utilities that I’ve recently started using.

My new PC, just three months old, has (when working) a 1 TB M.2 SSD with Windows on it. Most software is installed here. All the data, backups and everything else is stored on a 10 TB hard disk. This is my 10th PC since 1989 so it has the name PC10. Yes, original I know.

I have been copying everything I’ve created and written since 1989 and, like a snowball rolling down a hill, it has grown. When I finished copying everything from my old PC, a process that took a large chunk of four days, I’d used 2.3 TB out of the 10 TB hard disk. I just copied across the Gigabit network connection and it maxed out at 113 MB/s. That’s even with anti-virus running. Of course Windows has an overhead so sometimes the transfer speed drops down to KB/s for small files. I worked out that over the 48 hours (4 days at 12 hours a day), it copied at an average of 13 MB/s.

In future it might make sense to zip up folders with small files in them. In the meantime I need to do a bit of pruning of files, removing duplicates etc.

Everything

I’d been looking for a utility to let me find files quickly and somebody had suggested Everything. It’s brilliant. It tells me that I have just over 4 million files. I can filter on document types, images, music files or just search on matching names. And it is very very fast. If you want to find the massive file that’s eating up disk space or duplicates, it does it. But it’s only for Windows. Lacking a Windows system at the mo, I’ve just borrowed the image from the Everything home page.

Fsearch

FSearch window

Developer Christian Boxdörfer also liked Everything and decided to create a Unix clone of it which is FSearch. And it’s written in C. That’s it on the left.

I set it to find all C files and sorted by size.

 

Why I hate Assembly language…

I spent several years in the 1980s programming games.

I have a memory of 26 year old me sat hunched over a computer late at night back in 1985. I was working a 60-70 hour week as a partner in a games company. My current game was an American Civil War tactical wargame called Johnny Reb II. I was struggling with some ‘artificial intelligence’ code for the attackers (Confederate troops) to cross a bridge over a river. On the other side the defenders (Union) were trying to defend the bridge.

Johnny Reb II

Artificial Intelligence in games is a completely different thing from ML and Data Science nowadays. Back then it was just a control algorithm for troops reacting to the presence of enemy troops and working out the best routes, targets to attack, whether to retreat and so on.

What made it worse was that the whole thing was written in 6502 assembly language (and later converted to Z80). Back then you had two choices: Basic, which was to be honest slow and clunky for writing games, or assembly language. If I was doing it now, without a moment’s hesitation I’d program it in C. But C compilers for 6502 didn’t exist back then.

The Problem with assembly language

The problem with assembly language is (a) it’s slow to write. You can write ten lines of C in the same time as ten lines of assembly code, but those ten lines of C will do far more. In 6502 all you are doing is moving values between registers or between a register and memory. Maybe you add a number or increment one of the three available registers A, X or Y. These were all 8-bit registers so you couldn’t even index easily through the 64 KB of memory. To do 16-bit indexing you stored the 16-bit address in two successive page-0 locations (addresses 0-255) and then used Y as an 8-bit index. You could do the same within page-0 memory with the X register.

(b) It takes a lot of code to do anything in assembly language. You want floating point arithmetic in 6502? Take a look. I think Steve Wozniak wrote those for the Apple I/II. What we take for granted in languages like C# or Java or JavaScript is code for high level data structures like dictionaries. You don’t have those in assembly language; all you have to use are simple and not very long arrays. I’m sure a dictionary could be done but it would take a fair bit of programming.

In C# I wrote a program to read a 46 MB text file and produce a sorted count of all words in the file. It used a Dictionary, took me 30 minutes to write and it ran in 5 seconds. It would take weeks to do the same in assembler.

6502 Page 0 locations were valuable because they made your code both shorter and faster.

I wrote a cross-assembler for 6502 in Z80 as a way to learn Z80. Assemblers use labels (L20, L30, L31 etc. in the screenshot) and I needed a way to hold them efficiently in memory. I ended up with a 26 x 26 index table of 2 byte pointers. If you had a label ‘ROUTE’ then there would be a pointer to a chain at the location for [‘R’][‘O’]. Each entry in the chain was laid out like this:

  • 1 byte length of rest of label (i.e. 3 for ‘UTE’) – 0 marks the end of the chain.
  • 3 bytes to hold ‘UTE’.
  • 2 byte address value

There’s no need to hold the whole word as you know the first two letters. It also made comparing a label against one in the table faster because it only needed to match len(label)-2 characters.

So the next value in the chain would start after that or be a 0 for the end of the chain. Yes most of the index table might be empty (all 26x26x2= 1352 bytes) but every label in a chain used 2 bytes less than the full label text. So with more than 676 labels you saved memory. Searching for a label was just a matter of walking a chain. Labels were just addresses; so a location could hold a value like a count. You’d identify it with a label and use that label in 6502 instructions. No variables in assembler; it’s all addresses…

With 6502 you need to do two passes to generate code. If you have a label in the first page of memory (0-255) then instructions that use it are only two bytes long and are faster to execute than the three byte instructions. So on the first pass you don’t know if a LDA label will be 2 or 3 bytes long. After the first pass, though, you do know, so on the second pass it can output the correct size instructions.

Programming in assembler means you have to write a lot of code. In the early days, before I had a development machine, that meant I had to save the source code to tape and assemble it using a cartridge assembler. The CBM-64 could take cartridges and one of them stored assembly language in RAM just like Basic. If the game did something wrong then the CBM-64 would reset and you’d lose your source and have to reload it from the slow tape. Let’s hope you didn’t forget to save changes before you ran it. I spent a few hours gnashing my teeth over a persistent crash. I was calling a CLR routine when it should have been CLS! d’oh…

Note, from memory it was the Mikro cartridge assembler. See screenshot below.

Mikro Assembler startup screen, from GitHub

Jump Tables

So a game back then might be 5,000 lines of code or longer. That’s quite a bit to hold in memory, given that you need space for the game machine code, sprites, graphics etc. as well. Plus it’s wasteful having to recompile the same code over and over again. My cross-assembler did 250 lines per second but divide that by two for the two passes.

So I split up long files into smaller ones and created a jump table at the start. There was no linker so the code was loaded into RAM at fixed addresses. If you had five subroutines in one file then there’d be five jumps at the start to the actual functions. And the files that called those functions just had a block of five calls at the start.

That way you didn’t have to worry exactly where the function was located in RAM so long as that file was always loaded at the same address.

 

Switching to Development Machines

It got easier when we switched to development machines. The CBM-64 had a parallel port, as did the development machine (a Tatung Einstein, a CP/M computer), so a little bit of handler code in the CBM-64 set up the CIA chip to wait for data sent down the parallel cable and put the code directly in RAM. It took no time to load the handler from tape after a crash and then send down the whole file.

Modern CPUs do all sorts of optimizing tricks and that’s even before you use vectorization. Compiler writers know how to generate code that uses these tricks but it would take quite a while to learn them so you could use them in hand-written assembly.

Conclusion

Writing in assembler in the 80s was easy to learn. Nowadays I wouldn’t know where to start. Intel and AMD have a lot of different chips in their CPU families, so there are variations in what instructions are available. Oh and don’t forget there are ARM CPUs as well.

Writing in C (or even C++) is a lot easier to get into and I very much doubt if you’d get any better performance in writing things in assembly. Also, it would take a lot longer.

Another maze generator and solver in C

I liked this one; it compiled without any changes and ran perfectly. It produces a maze of the specified size with a route. That’s not bad for a program written over 20 years ago by developer Joe Wingbermuehle. You can view the source code here.

It runs in a terminal; just supply the width and height in characters like this. I compiled it into a file ex1.

./ex1 15 15 s

If you provide the s parameter, it will solve the maze, as the screenshot shows, using <> for the solved route. Leave it off for just the maze.

How to encrypt text using Xor

Binary
Image by Gerd Altmann from Pixabay

This is not meant to offer protection, but if you want to, say, hide text by disguising it, then using Xor for reversible encryption will do the trick. It relies on the principle that if you Xor A and B to get C then you can Xor C and A to get B, or Xor C and B to get A.

I wrote a short program and tutorial to demonstrate taking a single piece of text and then disguising it. To make it more challenging, I only used Xor values from the range 0-255 that had four or more bits set to 1, for example 15 which is 00001111 in binary.

You can find the tutorial at How to do Xor encryption in C. Please note this is only a very lightweight encryption method so don’t use it for anything too important!

How to extend C (99) with a library

I’m always looking to improve my C code and one way to do this is through others’ efforts. Today I came across Zpl, a cross-platform header-only library.

The zpl.h file is a whopping 17,495 lines long! It has code for macro helpers, memory, collections, string, hashtable, file, memory streamer, print, time, random, sorting and miscellaneous.

Given the length, it would be difficult to make sense of, but the authors (Vladyslav Hrytsenko and Dominik Madarász from Ukraine and Slovakia respectively) have provided a folder of example applications that use the library.

It looks a very impressive library and well worth a look.

Is variable++ faster than ++variable?

One of the things I was told when I learnt C++ and then later C was that a post-inc (i.e. variable++) was faster than a pre-inc (i.e. ++variable). Frankly I’m not sure if it is really true but it’s not a difficult thing to test.

Here’s a short program:

#include <stdio.h>
#include "hr_time.h"

#define NUMLOOPS 100000000

int main() {
  stopWatch s;
  startTimer(&s);
  int j=0;
  for (int i=0;i<NUMLOOPS;i++){
    ++j;
  }
  stopTimer(&s);
  printf("PreInc = %10.5f\n",diff(&s));

  startTimer(&s);   
  j=0;
  for (int i=0;i<NUMLOOPS;i++){
    j++;
  }
  stopTimer(&s);
  printf("PostInc = %10.5f\n",diff(&s));  
}

You can get the source code including hr_time.h and .c from the timings.zip file on GitHub. I used VS Code with clang to build this on Ubuntu. Here is the tasks.json file to build it. It assumes that the file is in your workspace folder and creates a file called ex1. The timings.zip file contains the json files as well.

{
    "version": "2.0.0",
    "tasks": [
        {
            "type": "shell",
            "label": "clang build active file",
            "command": "/usr/bin/clang",
            "args": [
                "-g",
                "${file}","${workspaceFolder}/hr_time.c",              
                "-o",
                "${fileDirname}/ex1",                
                "-lm"
            ],
            "options": {
                "cwd": "/usr/bin"
            },
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}

Ignore the first three runs, which were for 10 million loops, not 100 million. All the runs show that post-inc is indeed faster. Not by a great margin, but each of the later 100 million loop runs takes between 94% and 96% of the pre-inc time.

Interesting gcc/clang extensions to C

Both gcc and clang support extensions to C and while I normally try to make things I write about work on Windows (i.e. Visual Studio), these are useful enough that I thought they deserved a mention. Yes I know you can run gcc/clang on Windows using Cygwin or MinGW, but for various reasons I prefer Visual Studio.

You can add constructor and destructor functions to a C program; the constructor function runs before main() and the destructor runs after main() returns.

The syntax is not exactly clean or obvious (those are double underscores before and after the word attribute, like Python dunders!) but I got this program to compile and run with clang 10 on Ubuntu as the screenshot shows. Here’s a listing. I called the two functions ctor and dtor but you can use anything.

#include <stdio.h>

__attribute__((constructor)) void ctor(void)
{
  printf("Constructor runs first\n");
}

__attribute__((destructor)) void dtor(void)
{
  printf("Destructor runs last\n");
}

int main() {
    printf("Main\n");
}

The output is:

david@DavidPC:~/Projects/Examples$ ./ex1
Constructor runs first
Main
Destructor runs last

Looking at C/C++ extensions for VS Code.

I was curious as to how many C extensions there are for VS Code. If you visit the marketplace (not a great name as everything is free; some market!) in a browser, you can search through the (currently) 24,779 available extensions.

Finding C extensions is not easy. A search for C returns almost 16,000 results. C++ is a better thing to search on and gives 207 results, many of which are for both C and C++. You can also search in VS Code but it’s easier in a web browser.

Even that’s probably too many, but you can use the Showing pull-down to see how many extensions are in the various categories. If for instance you select Debuggers, then you will only see 18 extensions.

VS Code Extensions showing

Note: As I’m still on my old creaky Ubuntu laptop, I had to use scrot for screen capture and gthumb for editing the image. Also, the scrot project is looking for a programmer to look after it. Here is how to contribute to the project.

Another Minecraft game in C

If you remember, back in November I mentioned a Minecraft server that was written in C. Well now another one has appeared. Just like the other one it uses SDL2 and OpenGL and includes full source code. This one uses clang.

It’s cross-platform for Windows and Mac and there are two different binaries, one for creative mode and one for survival mode.

It’s still a work in progress and needs sound effects and music, saving and loading levels and multiplayer to complete it. If you are learning C and want to see how a game like this is programmed, download the source code from GitHub and start studying it.