Why I hate Assembly language…

6502 assembler listingI spent several years in the 1980s programming games.

I have a memory of 26 year old me sat hunched over a computer late at night back in 1985. I was working a 60-70 hour week as a partner in a games company. My current game was an American Civil War tactical wargame called Johnny Reb II. I was struggling with some ‘artificial intelligence’ code for the attackers (Confederate troops) to cross a bridge over a river. On the other side the defenders (Union) were trying to defend the bridge.

Johnny Reb II

Artificial Intelligence in games is a completely different thing from ML and Data Science nowadays. Back then it was just a control algorithm for troops reacting to the presence of enemy troops and working out the best routes, targets to attack, whether to retreat and so on.

What made it worse was that the whole thing was written in 6502 assembly language (and later converted to Z80). Back then you had two choices: Basic which was to be honest slow and clunky for writing games or assembly language. If I was doing it now, without a moments hesitation I’d program it in C. But C compilers for 6502 didn’t exist back then.

The Problem with assembly language

The problem with assembly language is (a) it’s slow to write. You can write 10 lines of C in the same time as ten lines of assembly code. Those ten lines of C will do far more than ten lines of assembly code. In 6502 all you are doing is moving values between registers or register <> memory. Maybe add a number or increment one of the three available registers A, X or Y. These were all 8-bit registers so you couldn’t even index easily through 64-bit memory. To do 16-bit indexing you stored the 16-bit address in two successive page-0 locations (addresses 0-255) and then used Y as an 8-bit index. You could do the same in page 0- memory with the X register.

(b). It takes a lot of code to do anything in assembly language. You want floating point arithmetic in 6502? Take a look.  I think Steve Wozniak wrote those for the Apple I/II. What we take for granted in languages like C# or Java or JavaScript is code for high level data structures like dictionaries. I’m sure it could be done but it takes a fair bit of programming. You don’t have those in assembly language; all you have to use is simple and not very long arrays.

In C# I wrote a program to read a 46 MB text file and produce a sorted count of all words in the file.  It used a Dictionary, took me 30 minutes to write and it ran in 5 seconds.  It would take weeks to do the same in assembler. 

6502 Page 0 locations were valuable because they made your code both shorter and faster.

I wrote a cross-assembler for 6502 in Z80 as a way to learn Z80. Assemblers use labels (L20, L30, L31 etc. in the screenshot) and I needed a way to hold them efficiently in memory. I ended up with a 26 x 26 index table of 2 byte pointers. If you had a label ‘ROUTE’ then there would be a pointer to a chain at the location for [‘R’][‘O’]. Each entry in the chain was like this

  • 1 byte length of rest of label (i.e. 3 for ‘UTE’) – 0 marks the end of the chain.
  • 3 bytes to hold ‘UTE’.
  • 2 byte address value

No need to hold the whole word as you know the first two letters. It also makes comparing a label against one in the table was faster because it only needed to match against len(label)-2 characters.

So the next value in the chain would start after that or be a 0 for the end of the chain. Yes most of the index table might be empty (all 26x26x2= 1352 bytes) but every label in a chain used 2 bytes less than the full label text. So with more than 676 labels you saved memory. Searching for a label was just a matter of walking a chain. Labels were just addresses; so a location could hold a value like a count. You’d identify it with a label and use that label in 6502 instructions. No variables in assembler; it’s all addresses…

With 6502 you need to do two passes to generate code. If you have a label in the first page of memory (0-255) then instructions are only two bytes long and are faster to execute than the three byte instructions. So on the first pass you don’t know if a LDA label will be 2 or 3 bytes long. After the first pass through though you do know now, so on the 2nd pass it can output the correct size instructions.

Programming in assembler means you have to write a lot of code and in the early days before I had a development machine that meant I had to save the source code to tape and compile it using a cartridge assembler. The CBM-64 could take cartridges and one of them stored assembly language in RAM just like Basic. If the game did something wrong then the CBM-64 would reset and you’d lose your source and have to reload it from the slow tape. Let’s hope you didn’t forget to save changes before you ran it. I spent a few hours gnashing my teeth over a persistent crash. I was calling a CLR routine when it should have been CLS! d’oh…

Note, from memory it was the Mikro cartridge assembler. See screenshot below.

Miro assembler start up screen.
Mikro Assembler Startup From Github

Jump Tables

So a game back then might be 5,000 lines of code or longer. That’s quite a bit to hold in memory, given that you need space for the game machine code, sprites, graphics etc. as well. Plus it’s wasteful having to recompile the same code over and over again. My cross-assembler did 250 lines per second but divide that by two for the two passes.

So I split up long files into smaller ones and created a jump table at the start. There was no linker so the code was loaded into RAM at fixed addresses. If you had five subroutines in one file then there’d be five jumps at the start to the actual function. And the files that called those functions just had a block of five calls at the start.

That way you didn’t have to worry exactly where the function was located in RAM so long as that file was always loaded at the same address.


Switching to Development Machines

It got easier when we switched to development machines. The CBM-64 had a parallel port as did the development machine (Tatung Einstein-a CP/M computer) so a little bit of handler code in the CBM-64 set up the CIA chip to wait for data sent down the parallel cable and put the code directly in RAM. It took no time to load the handler from tape after a crash and then send down the whole file.

Modern CPUs do all sort of optimizing tricks and that’s even before you use vectorization. Compiler writers know how to generate code that uses these tricks but it would take quite a while to learn them so you could use them in hand-written assembly.


Writing in assembler in the 80s was easy to learn. Nowadays I wouldn’t know where to start- the Intel and AMD CPUs have a lot of different chips in their families so there are variations in what instructions are available. Oh and don’t forget there’s ARM CPUs as well.

Writing in C (or even C++) is a lot easier to get into and I very much doubt if you’d get any better performance in writing things in assembly. Also, it would take a lot longer.