C64 assembly programming in 2020
Mostly for myself, since this machine is super well documented. I had some false starts, wanting to do everything the hard way, first wrote a text-editor in basic, on a C64, and then started writing an assembler, but realized that my basic skills are too weak and I gave up.. My plan had been to bootstrap everything, from editor, to assembler, to sprite editor, and THEN use those to program some simple game "because".. but, yeah, I don't have that kind of time anymore.
So instead, I'm going to learn how to program in a bit more modern environment, I'll try not to take TOO much advantage of the emulator capabilities, but, to be honest, having an Internet connection and targeting what is probably the most well-documented machine ever, EZ mode is perma-enabled.. I'd have liked to have the experience of figuring out this stuff "the authentic way" but how do you deprive yourself of just enough information while still making progress?
So for my learning experience, I'm choosing to make use of the documentation available, and to use a modern editor and graphics tools, and then cross-assemble and run in the emulator (EZ mode indeed)
Index
Memory map
If you're used to modern development, think of the memory map as a kind of API overview, it shows a lot about what functionality is available in the machine, and gives a nice entrypoint for investigating further.
There are plenty of them, and they all seem to have their strength and weaknesses, I've not yet found "the perfect one" if it exists.. I'll maybe try to make my own. What I would really like, is a memory map where I can visually see which block a given address is inside, and how relatively big or small that block is, and what it is often used for by people who know what they're doing. If there is IO or roms, I'd like to see what they are. Ideally, if something is a pointer, for example, the "sprite pointers", I'd like to be able to expand and get some explaination of what the default values (if any) are, and how the pointers are used. If it's a register, I'd like to know what the bits do and how they're initially set.
By far the best memory map I've found so far is https://sta.c64.org/cbm64mem.html there is a nice graphic on C64-Wiki.com but it lacks some of the details.
Instruction set
So, the memory map is kind of this nice overview of what stuff is in the machine, but the instruction set is a listing of all the cool stuff you can actually write.
Development environment
I'm using Kick Assembler for cross-assembly. It's a java program and can be easily run from the command line. I chose to make a small script called "kickass" and place in my $PATH so that calling the assembler is as easy as possible.
kickass: #!/bin/bash java -jar /home/dusted/code/c64/kickass/KickAss.jar $@
And that's that, to assemble a file, I just do:
~$ kickass awesome.asm
For editor, I use vim with this kickassembler syntax file, there's also a kickassembler extension for Visual Studio Code.
I could assemble the code, copy the PRG file to my Ultimate64 or to a floppy or tape, but that takes a long time, and instead, I'll just use x64 from VICE, as it can even take the prg file as an argument:
~$ kickass awesome.asm && x64 awesome.prg
This tries to assemble the program, and if the assembler didn't return error, runs the program in VICE.
Hello World 1.0
Before writing hello world, you need to know a few things about the C64 specifically. First, the machine is not as primitive as the Atari2600, you don't need to control the raster beam directly to get something on the screen. The C64 has multiple screen modes, the default one is text mode, which is perfect for hello world. In text-mode, the VIC (Video Interface Chip) looks somewhere ($0400 to $07ff)) in memory for what characters to put on screen, somewhere else (the character ROM at $D000 to $DFFF) for the pixel pattern defining those characters and a third place ($D800 to $DBFF)for the color of those pixels for that character (probably not in that order, but that doesn't matter).
So, in order to put hello world on the screen, even before figuring out which instructions to use, you must consult the memory map, figure out the (default) memory location of the screen and of the colors. Note that at boot, the character colors for the blank part of the screen is actually set to the background color, so you can't even see the letters you're putting there without changing their color too (unless you change the background color). Below is a simple hello world program I wrote:
//world.asm BasicUpstart2(start) // Macro that outputs a bit of BASIC program to SYS into our assembly code // Define some constants .const bgcolor = $D021 .const screen = $0400 .const color = $D800 // Store our program in "upper memory" above BASIC (there is a 4k unused block) *= $C000 start: // So far, no actual instructions/output has been produced (except for the BASIC program) // but this label gets the value of the next outputted byte, and that byte is going to // be located at $C000, so, this label gets $C000. // Set background color black lda #0 // Load 0 into register A (0 = black) sta bgcolor // Store value of register a (the 0) into the memory location // that holds the background color ($D021) // clear screen (set every character to a space) ldx #0 lda #' ' loop: // There are 1000 bytes of characters sta screen,x sta screen+256,x sta screen+512,x sta screen+768-24,x// start this a bit earlier so we don't overwrite those 24 bytes not on screen inx bne loop // Print the string to the screen ldx #0 // Let's use the X register as the index into our text string. stringLoop: lda txt,x // Load the character at txt+x bytes beq exit // If lda set zero bit we loaded the "null" byte after the string and we exit. sta screen,x // Else put it on the screen. lda #2 // Load 2 (the color red) into the A register sta color,x // And put it at the corrosponding color-memory. inx // Increase the index jmp stringLoop // and repet //return to basic exit: lda #1 // But first, move cursor to line 1 (the line after the one we wrote) sta $00d6 // We just do this so the READY prompt appears after our own message.. rts // Store a zero terminated string txt: .text "hello world" .byte 0
And here's a hexdump of the resulting world.prg file:
00000000 01 08 0c 08 0a 00 9e 34 39 31 35 32 00 00 00 00 |.......49152....| 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0000b800 00 a9 00 8d 21 d0 a2 00 a9 20 9d 00 04 9d 00 05 |....!.... ......| 0000b810 9d 00 06 9d e8 06 e8 d0 f1 a2 00 bd 30 c0 f0 0c |............0...| 0000b820 9d 00 04 a9 02 9d 00 d8 e8 4c 1a c0 a9 01 85 d6 |.........L......| 0000b830 60 08 05 0c 0c 0f 20 17 0f 12 0c 04 00 |`..... ......| A closer examination of the bytes in the PRG file. Note that C64 uses little-endian byte-ordering. 01 08 = $0801 When a prg file is loaded with ",1" at the end of the LOAD command ( load "world",8,1 ) the the first two bytes are used by the KERNAL as the first address of memory at which to place the rest of the file. That means that the first two bytes are not descibing anything to be but in memory, but rather, where to put the rest of the file. So, when loading this program, the KERNAL will place the first byte ( 0c ) in memory location $0801, and the next byte ( 08 ) in location $0802, and so on. $0801 is the first usable byte of BASIC memory, so, the bytes will be loaded into basic memory. That is because the BasicUpstart2 macro at the beginning of our assembler listing, generated this load address and a little BASIC program for us, the content of the BASIC program is: "10 SYS 49152" Let's take a look at the next two bytes of our prg file, these are the bytes loaded into $0801 and $0802. 0c 08 = $080c BASIC programs are stored in memory in the following format: PP PP LL LL TT NN PP PP ... | | | | | | | | | +--------------------------------------------+ | | | | | | | | + Null byte, indicating end of basic line | | | + Token/ASCII/Data, multiple tokens/data bytes allowed. | | + Line number (16 bit) | + Memory address of the NEXT line (so, the byte right after NN) ----+ 0c 08 is thus the pointer to the NEXT BASIC line after the one that comes now. 0a 00 is the BASIC line number for the lines that comes now ( $000a = 10 ). 9e is the BASIC token for the SYS command, so, in BASIC, the letters for each command is not saved, but rather converted to and from single-byte tokens, much faster and takes up a lot less memory and storage. 34 39 31 35 32 = The PETSCII string "49152", which is the argument to the SYS command. It is the decimal notation for $c000, the address of our first assembler instruction. Note that this is treated as TEXT and not an integer, which is why it takes up so many bytes, and why it is not in little-endian order. 00 This is the terminator for the BASIC line. 00 00 This is the pointer to the next basic line, it having a value of zero means end of BASIC program. These two zeroes, are what the pointer 0c 08 points to. Try, from $0801, add the number of bytes of our basic program, you end up at $080c. The following zeroes are simply padding until address $C000. Note that the hexdump shows the first byte as being at b801, but recall that the prg file was loaded into memory beginnning at $0801. Well, the first two bytes of the file is not loaded into memory, so if we do 0x0801 + 0x0b801 - 2 we get the value: 0xc000 Time has come for our assembler program, the first byte located at $0c000 thanks to being offset by all that zero padding. a9 = OPcode for immediate-addressing-mode LDA, produced by LDA #0 00 = The single-byte operand for that instruction. 8d = OPcode for absolute-addressing-mode STA 21 d0 = $d021 = The two-byte operand for that instruction. You can do the rest if you want to.
hello world 1.5
The next iteration of the hello-world program makes a routine out of the code that printed the characters. That was done by adding some code to calculate the offset into screen memory and a bit of code to handle taking "arguments" in different ways.
Useful subroutines often take one or more arguments, and there are different ways of provding those arguments, depending on whether the routine needs to be re-entrant. The print routine does not. So I use in-place modification of some of the parameters, this can be done by placing a label at the instruction, and jumping the over the instrunction (thus reaching the operand address) by adding 1 to the label address.
Find the relevant parts below: (or download full hello1_5.asm program here)
// Put our texts a few places on screen lda #txt sta strPtr+2 lda #1 sta col+1 lda #0 ldy #0 jsr puts lda #2 sta col+1 lda #10 ldy #8 jsr puts lda #3 sta col+1 lda #17 ldy #15 jsr puts lda #<txt2 sta strPtr+1 lda #>txt2 sta strPtr+2 lda #4 sta col+1 lda #20 ldy #24 jsr puts // exit to basic, for example rts // puts: Print a string to the screen // strPtr+1 = 16 bit pointer to the string to show // col +1 = the color // a = column, y = row puts: // First, calculate offsets into screen and color memory sta scrOffs+1 // screen lowbyte is 00, so we just put the column directly. sta colOffs+1 // same goes for color lowbyte lda #>screen sta scrOffs+2 lda #>color sta colOffs+2 iny // add one to y so we can have y = 0 on first iteration. rloop: dey beq putnext // skip when y reaches zero lda #40 // there are 40 columns to a row adc scrOffs+1 clc // Clear the carry flag for the next add instruction. sta scrOffs+1 lda #40 adc colOffs+1 sta colOffs+1 bcc rloop // if the carry bit was not set, just loop more clc // if it was set, clear it and handle high-bytes inc scrOffs+2 inc colOffs+2 jmp rloop putnext: ldx #0 // Let's use the X register as the index into our text string. strPtr: lda $ffff,x // Load the character at txt+x bytes beq exit // If lda set zero bit we loaded the "null" byte after the string and we exit. scrOffs: sta $ffff,x // Else put it on the screen. col: lda #00 // Load 2 (the color red) into the A register colOffs: sta $ffff,x // And put it at the corrosponding color-memory. inx // Increase the index jmp strPtr // and repeat exit: rts
hello numbers!
I spent a fair deal of time wrapping my head around how to implement a simple "itoa" (integer to string) routine, and finally decided that for my uses, dealing with binary coded decimal is waaaahy easier.. I'm pretty satisfied that I was able to come up with these small routines myself, after failing to get the math behind the ones I've read online. Sure, they have disadvantages, and they probably run slow, but they're mine and I understand how they work. Integrating them into the hello-world program, is left as an exercise for the reader. I've not implemented a nice way of passing parameters to them yet, I will have to read up on zero-page addressing modes for that I'm afraid.
// Convert the number in the 10 bcd bytes to a string // What's neat about decimal mode, is that instead of a byte holding from 0 to 255 // It holds from 0-99, and the way the bits are packed, is that the first // nibble is the "tens" and the last nibble is the "ones" // So the byte 0101 0011 in hex is $53 right, the decimal value when // interpreting the number as binary, is 83, but look at those nibbles! // 0101 = 5, and 0011 = 3.. so if we interpret it as BCD, the decimal value // is actually 53, just as the hex representation. // So now, to convert to strings, we simply go through the bytes in our decimal // number one nibble at a time, and emit a character code for each. And find the // character codes is easy as well: // The reason we can OR with '$30' is because the last 4 bytes // of the character code is zero, and the BCD number is 4 bytes long, so it // "just so happens" to fit there. Otherwise, we could have added instead. // The logic is: $30 = 0011 0000 = '0' so if we had.. A nibble of decimal value 5 // then we'd have for example a byte '0000 0101' in binary, if we OR those values: // 0011 0000 O // 0000 0101 R // --------- // 0011 0101 = 35 = '5' // But wait! There are two decimal numbers in each byte? // Easy, we just take them one at a time. First we mask out the high nibble, // and emit our character code for that number, then we mask out the low nibble // and shift the remainding bits down into the low nibble and use the same method // to emit the next byte. // Just note that the bytes come out "in reverse", so we store them from // the end of the string towards the start, so it comes out in the right order. bcdtos: ldx #$00 // Counter into the bcd number ldy #19 // counter into the string, starting from rightmost byte bcdtosLoop: lda score,x // Load a byte from the number and #$0f // Keep only last 4 bits ora #$30 // OR them with the character code for '0' sta scoreStr,y // Store the character in the string dey // Move one left in the string lda score,x // Load the same byte again and #$f0 // Keep only the first 4 bits ror // Rotate those bits so they are the last 4 bits. ror ror ror ora #$30 // OR with the '0' character again. sta scoreStr,y // and store it in the next location dey // and move one more to the left in the string inx // Next byte in the BCD number cpx #10 // Check if that was the last (our number is 10 bytes) bne bcdtosLoop // if now, repeat. rts // Add number (from 0 to 99) to the 10 byte bcd number. // Note, to add the decimal number 53, do lda #$53 and then jsr bcdAdd // Parameter: A - Number to add bcdAdd: sei // Disable interrupts, it may be bad when in BCD mode ldx #0 // Start from first byte in number (little endian sed // Set Decimal mode bcdCarryLoop: clc // Clear the carry adc score,x // Add whatever was in A with the byte sta score,x // Store the result bcc bcdAddEnd // If no carry, we're done. lda #1 // Otherwise, we'll add 1 to the NEXT byte inx // So we target the next byte jmp bcdCarryLoop // to add it (it's in A, remember!) bcdAddEnd: // When we're done cld // Clear Decimal mode cli // Enable interrupts rts // Strips leading zeroes from the string stripZero: ldx #0 // Start from first byte (leftmost) in string stripZLoop: lda scoreStr,x // Load that byte cmp #'0' // See if it's the 0 character bne stripZEnd // If it is NOT a zero, we're done lda #' ' // Else, a space is sta scoreStr,x // saved instead. inx // Target next byte jmp stripZLoop stripZEnd: rts // In the "constants" part of the program, which I chose to be after all the // instructions, I added two labels, one for the destination string, and one // for storing a giant bcd number. scoreStr: .text " " .byte 0 score: .byte $00, $00, $00, $00, $00, $00, $00, $00, $00, $00 // 10 byte little endian bcd number, // Max = 99999999999999999999.
Hello Macros
So, recall how to use puts.. First, we load and store lowbyte and highbyte and color, then we load two registers, a with row and x with column where we want the text placed and THEN we jsr to the subroutine..
lda #<txt // 1 sta strPtr+1 // 2 lda #>txt // ... sta strPtr+2 lda #1 sta col+1 lda #10 ldx #15 jsr puts // 9 instructions!
And if that's not enough, the subroutine uses another 21 instructions on calculating and storing the offsets.. Now, how many instruction does it take to copy the string into screen memory? It takes 8.. and a 9th to return from the subroutine. I'm _SURE_ there are shorther ways to do this, (that is, to write a subroutine that allows you to place any string anywhere in any color), but this was what I came up with and it's not terrible as such, but it is wasteful for all those cases where I don't need that kind of flexibility.. For instance, what if it's always the same string to the same position, maybe even in the same color? Then I'm actually only needing those 8 instructions, and I don't need a subroutine at all!
enter the macro
So, a less flexible, but faster and shorter way of placing "the same string in the same place" is to use a macro to generate the 8 instructions for us, let's call it sputs (static puts):
.macro sputs(strPtr, row, column, colorNum) { .var offset = row * 40 + column ldx #0 // Let's use the X register as the index into our text string. putsLoop: lda strPtr,x // Load the character at txt+x bytes beq exit // If lda set zero bit we loaded the "null" byte after the string and we exit. sta screen+ offset,x // Else put it on the screen. lda #colorNum // Load color into the A register sta color+offset,x // And put it at the corrosponding color-memory. inx // Increase the index jmp putsLoop // and repeat exit: // label points to "whatever comes after these instructions" }
Now, look at that! it looks like a function! And we get to do math in there, like multiplication and stuff! Like everywhere else where we do "label+something" the assembler does the calculations on those labels for us, and inserts the new result. So, this is NOT a function even if it looks like it, the assembler will simply inser the code wherever the macro is called. And this is how we'd call it:
sputs(txt, 10, 15, 1)
Note that we cannot make the text move around with this, and we cannot change which string it points to (but we can chance the contents of the string), we can also chance the color, but to do that, we'd hack a bit, we can count that the color lda is just around the 12th byte into the macro, so if we labelled the macro, we can easily poke another number in there:
lda colorPtr sta colorChangingString+12 colorChangingString: sputs(txt, 10, 15, 1)
And those BCD routines, they're a bit longer than the number of instructions it takes to set them up, but not much, and if you think about it, then unless you're running out of memory, maybe it'd be nicer to not spend so many instructions on "doing nothing" and instead spend a bit more memory on duplicating the code where it's needed. This is something I'm finding difficult to get my head around, because I come from C with all its fancy functions any types and other modern conveniences.
If you're interested, I've put some of my macros into this file: hello_macro.asm
Stay tuned for Hello Sprite...
Hello Raster Interrupt
For many kinds of real-time interactive software, like games and demos, it's desirable to have some control over the speed that our program execute. For example, being able to run some code "every frame" and also know that the time between each time the code runs (assuming the code is fast enough to finish within the time a frame takes, which at PAL speed is 1/50 of a second, or 20 milliseconds.
The C64 supports us configuring an interrupt to occur when the VIC chip reaches a raster-line of our choice. We can then have our code be executed when that happens, in effect giving us control over when our code is run.
Another happy side effect of this, is that it enables some neat trick to measure CPU usage. But more about that in the next section.
// Setup raster interrupt lda #%01111111 sta $DC0D // Disable interrupt on timer-a underflow and $D011 sta $D011 // Bit 7 is the "msb" of the "interrupt on raster-line", because there are over 255 raster lines. lda #20 // We just want our interrupt at line 20. sta $D012 // So we store that in the lowbyte of the "interrupt on raster line" lda #<Irq // Highbyte of the code we want to have run on interrupt sta $0314 lda #>Irq // Lowbyte of that code sta $0315 // Store the pointer to the code (it's called an ISR, Interrupt Service Routine) lda #%00000001 // Then set bit 0, which enables raster interrupt from the VIC sta $D01A // Loop forever forever: jmp forever // Executed each time the VIC chip hits line 20. Irq: // Do whatever we want.. nop asl $D019 // Acknowledge the interrupt jmp $EA31 // And jump to the kernal interrupt service routine, so we can still // use kernal functions, like the keyboard scanning.
Hello CPU usage
Now that we know our code starts once per frame, we can exploit the stability to "see" how much time our code takes. It's really neat, visually, we can get a good idea about how much "frame time" we're using. The idea is super simple: Before running some piece of code, we change the border color.. Then after the piece of code, we change it back.
.const border = $D020 .const black = 0 .const white = 1 .const red = 2 .const cyan = 3 .macro borCol(col) { lda #col sta border } //setup interrupt as before.. ... // Let's loop forever, changing the color of the border to black forever: borCol(black) jmp forever Irq: // Change border color to white to "measure" how much time this block takes. borCol(white) jsr someRoutine // Let's see how long the next thing takes borCol(red) jsr somethingElse // Then, since we do let the kernal stuff run, let's also measure that borCol(cyan) asl $D019 // ack the interrupt jmp $EA31 // and let kernal run
Hello Color Memory
The color memory is normally located at $D800, note how it says "only bits 0-3", that's important. The memory for the colors is a 4 bit chip, so only the lower 4 bits of the byte is valid. Therefore, if you read from color memory for any reason, you must mask off the upper 4 bits. Otherwise, you may read trash bits along with your 4 color bits, this is important if you want to for example compare a value you read from color memory.
lda $D810 // read the color of character 11. and #$0F // AND the value, so only the lower 4 bits are kept in register A // now A is ready to use, containing the correct value of what was at that memory location.
Tools and references
- KickAssembler Official website (V5.16 locally archived)
- KickAssembler Manual Official website (V5.16 pdf locally archived)
- Memory Map https://sta.c64.org/cbm64mem.html (locally archived 2020-09-01)
- STAs C64 documentations: https://sta.c64.org/cbmdocs.html (locally archived 2020-09-01)
- Instruction Set https://www.masswerk.at/6502/6502_instruction_set.html (locally archived 2020-09-01)
- Ultimate Documentation (not really, but nice) Official github (locally archived and rendered 2020-09-01)
- VIM Syntax highlight Official website (locally archived 2020-09-01)
- Visual Studio Code KickAssembler extension Official website
- PETSCII Editor (is awesome) petscii.krissz.hu
- Online Sprite Editor https://spritemate.com/