This is an intro to the disassembly of Atari's 1979 Lunar Lander vector arcade game. The full listing is also available as llander.asm.
It's helpful to keep the 6502 instruction set reference open in another tab to answer any questions about the semantics of each instruction and the addressing modes.
Some common idioms in the code:
LDA variable    ; Read the global variable into the A register
BMI if_neg      ; If A < 0, take this branch
BPL if_pos      ; If A >= 0, take this branch
BEQ if_zero     ; If A == 0, take this branch
CMP #$25        ; Compute A - 0x25, but do not store the result. Set the flags N, Z, and C
BEQ if_eq       ; If A == 0x25, the result was zero, so take this branch
BNE if_ne       ; If A != 0x25, the result was non-zero, so take this one
BCS if_gt       ; If A >= 0x25, take this branch (see note above)
BCC if_lt       ; If A < 0x25, take this branch (see note above)

BIT variable    ; Read the global variable, set the N and V status flags (also Z, but it's complicated)
BMI if_bit7     ; If the 7th bit is set, take this branch
BPL if_not_bit7 ; If the 7th bit is not set, take this branch
BVS if_bit6     ; If the 6th bit is set, take this one
BVC if_not_bit6 ; If the 6th bit is not set, take this one

LDA variable    ; Read the global variable into the A register, set the N and Z flags
BNE if_any_bits ; If any bits are set, take this branch
BEQ if_no_bits  ; If no bits are set, take this branch
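
INC variable        ; increment the low byte; INC sets N and Z, but not the carry flag
BNE skip_inc        ; if the low byte did not wrap around to zero, we're done
INC variable_high   ; otherwise carry the increment into the high byte
skip_inc:
/* more stuff */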

CLC             ; start with the carry flag clear
LDA v1_low
ADC v2_low      ; if the result overflows, the carry flag will be set
STA v2_low
LDA v1_high     ; LDA does not change the carry flag
ADC v2_high     ; carry flag added to the high bytes
STA v2_high

LDX #$03        ; for x = 3, 2, 1, 0 == four passes through the loop
loop:
CLC             ; don't carry
ADC array,X     ; A += array[X]
DEX             ; x--
BPL loop        ; if x >= 0 go again
Since the registers are 8 bits wide, passing a 16-bit value to a function requires two of them. Most of the time they are passed in A and X, but this is not consistent across all of the code in Lunar Lander. A 16-by-16 multiply needs more registers, however, so some temporary zero page locations are used. The results are also left in zero page locations and can be used for chaining operations together.
As a warmup, here's a function that returns the minimum of two 16-bit values that are stored in global variables on the zero page:
Translating this into C with the same logic flow is not very idiomatic, but hopefully makes it easier to see how it maps to the 8-bit math of the assembly:
#include <stdint.h>

uint8_t a_high, a_low;
uint8_t b_high, b_low;

uint16_t min16(void)
{
    /* start by assuming that a is the smaller value */
    uint8_t a = a_high;
    uint8_t x = a_low;

    if (a_high < b_high)
        goto ret;

    if (a_high > b_high)
    {
        a = b_high;
        x = b_low;
        goto ret;
    }

    /* high bytes are equal, so the low bytes decide */
    if (a_low > b_low)
        x = b_low;
ret:
    return a << 8 | x;
}
I'm not certain about some of these operations; it seems that the mult_acc field is never used after being zeroed and I wonder if it is left over from a prior implementation. This code also causes a problem with the tracing disassembler since it appears that there is a subroutine call to 0x711c, which is in the middle of an instruction. If the multiply does overflow, a BRK instruction is triggered that should halt the game.
In any event, this algorithm could be translated roughly to the C code:
uint16_t mult16(uint8_t a, uint8_t y)
{
    uint8_t mult_acc = 0, mult_acc_high = 0;
    uint8_t inv_a = ~a;

    for (int8_t x = 8; x != 0; x--)
    {
        if (inv_a & 0x80)
        {
            mult_acc += y;
            if (mult_acc + y > 0xFF)
                mult_acc_high += 1;
        }
        mult_acc_high <<= 1;
        if (mult_acc & 0x80)
            mult_acc_high |= 1;
        mult_acc <<= 1;
    }
    return mult_acc_high << 8 | mult_acc;
}
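For comparison, here is a minimal sketch of the textbook shift-and-add multiply that routines like this are usually built on. It is not a transcription of the routine in the ROM: the name mul8x8 and the single 16-bit accumulator are assumptions made for illustration, and the BRK-on-overflow behaviour is left out.

```c
#include <stdint.h>

/* Hypothetical helper, not from llander.asm: multiply two 8-bit
 * values into a 16-bit result using only shifts and adds. */
uint16_t mul8x8(uint8_t a, uint8_t y)
{
    uint16_t acc = 0;

    for (int i = 0; i < 8; i++) {
        acc <<= 1;          /* make room for the next partial product */
        if (a & 0x80)       /* test the multiplier, top bit first */
            acc += y;       /* a set bit adds one copy of the multiplicand */
        a <<= 1;            /* move the next multiplier bit to the top */
    }
    return acc;
}
```

On the 6502 the same loop would be built from ASL/ROL on a two-byte accumulator and an ADC with the carry propagated into the high byte, which is the pattern from the 16-bit addition idiom earlier.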
The core game update routine uses the 16-bit signed magnitude XY accelerations to compute the ship's velocity in the XY coordinate frame, which are then used to update the XY positions. Adding the values as part of the timestep update is implemented in this set of functions:
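As a rough illustration of what signed magnitude addition involves, here is a sketch in C. The layout (sign in bit 15, magnitude in the low 15 bits), the name sm_add, and the clamping on overflow are all assumptions for the sake of the example, not details taken from the listing.

```c
#include <stdint.h>

#define SM_SIGN 0x8000u  /* assumed: bit 15 holds the sign */
#define SM_MAG  0x7FFFu  /* assumed: low 15 bits hold the magnitude */

/* Hypothetical helper: add two 16-bit signed magnitude values. */
uint16_t sm_add(uint16_t a, uint16_t b)
{
    uint16_t sa = a & SM_SIGN, ma = a & SM_MAG;
    uint16_t sb = b & SM_SIGN, mb = b & SM_MAG;

    if (sa == sb) {
        /* same sign: add the magnitudes and keep the sign */
        uint32_t sum = (uint32_t)ma + mb;
        if (sum > SM_MAG)
            sum = SM_MAG;           /* assumed: clamp instead of wrapping */
        return (uint16_t)(sa | sum);
    }

    /* different signs: subtract the smaller magnitude from the larger
     * one and take the sign of the larger */
    if (ma > mb)
        return (uint16_t)(sa | (ma - mb));
    if (mb > ma)
        return (uint16_t)(sb | (mb - ma));
    return 0;                       /* equal magnitudes cancel to +0 */
}
```

Unlike two's complement, the sign has to be handled explicitly, which is why this takes a few branches rather than a single ADC chain.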
The 6502 has a "Binary Coded Decimal" mode that only allows the values 0 - 9 for each four bits in a byte. This means that one byte can represent 00 to 99, and it is frequently used by games to track scores or resources that are displayed to the player in base-10. On a modern system programmers would just use printf() or something to convert from binary to base-10, but that requires multiply and divide operations that the 6502 did not have.
Most of the fuel and score calculations in Lunar Lander are done in BCD, but there are other parts that are all done in binary, so occasionally it is necessary to convert between them. For these few times there is an interesting algorithm called Double Dabble that relatively efficiently produces a result with no multiplies or divides. If space is available, a lookup table is also an option.
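Here is a small sketch of Double Dabble in C for a single byte, to show why it needs no multiplies or divides. This is the generic textbook form, not the routine from the ROM, and the name bin8_to_bcd is made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: convert 0..255 to three packed BCD digits,
 * so 255 comes back as 0x255. */
uint16_t bin8_to_bcd(uint8_t bin)
{
    /* bits 0-7 hold the binary input, bits 8-19 collect the BCD digits */
    uint32_t scratch = bin;

    for (int i = 0; i < 8; i++) {
        /* "dabble": any BCD digit that is 5 or more gets 3 added,
         * so that the doubling below carries into the next digit */
        for (int shift = 8; shift <= 16; shift += 4) {
            if (((scratch >> shift) & 0xF) >= 5)
                scratch += (uint32_t)3 << shift;
        }
        /* "double": shift the next input bit into the BCD digits */
        scratch <<= 1;
    }
    return (uint16_t)(scratch >> 8) & 0x0FFF;
}
```

Everything in the loop is a compare, an add of a constant, or a shift, all of which are cheap on the 6502.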
The Lakeside Arcade - Lunar Lander PCB Repair Logs has a helpful excerpt from the service manual showing the memory map for the arcade console:
There are five buttons on the controls: Rotate Right, Rotate Left, Abort, Start and Select. These are all memory mapped into the 6502's address space and set bit 7 in the byte when they are pressed.
There are also three coin slots mapped into the same memory region:
The lamps and sounds are also memory-mapped peripherals. Eight output pins are mapped to a single address.
To avoid read-modify-write cycles when updating the lamps or audio devices, the game keeps track of the last value written in a global variable and uses that as its cache. The update functions take two parameters, which act as SET and RESET values for the output bits.
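The cached-write pattern can be sketched in C like this. All of the names are hypothetical, a plain variable stands in for the output latch, and applying SET before RESET is an assumption rather than something read out of the listing.

```c
#include <stdint.h>

/* Hypothetical stand-ins: io_out_latch plays the role of the
 * write-only output address, io_out_shadow is the cached copy. */
static volatile uint8_t io_out_latch;
static uint8_t io_out_shadow;

/* Turn on the bits in set_mask, turn off the bits in reset_mask,
 * and write the result to the hardware. Returns the new value. */
uint8_t io_update(uint8_t set_mask, uint8_t reset_mask)
{
    /* work from the cached copy, never from a read of the hardware */
    io_out_shadow = (uint8_t)((io_out_shadow | set_mask) & ~reset_mask);
    io_out_latch = io_out_shadow;
    return io_out_shadow;
}
```

Because the hardware is never read back, the latch only has to support writes, and the lamp code and the sound code can each change their own bits without disturbing the other's.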
To keep things simple, the first function we're going to look at is the one that reads the player's rotation buttons.
This could also be rewritten into something like C:
#include <stdint.h>

volatile uint8_t * const IO_button_right = (void*) 0x2406;
volatile uint8_t * const IO_button_left = (void*) 0x2407;

int IO_read_rotate_buttons(void)
{
    int yaw = 0;

    /* bit 7 is set while a button is held down */
    if (*IO_button_right & 0x80)
        yaw--;
    if (*IO_button_left & 0x80)
        yaw++;

    return yaw;
}
We now know something about the way the game tracks orientation and that it uses a reference frame where positive rotation is to the left. Based on the cross references, we can see that this function is called by the ship_command_yaw and ship_command_yaw_easy functions. Let's look at that first one since it's "easier":
Note that the last instruction in the function is a BNE, even though a constant non-zero value has just been loaded into Y, so it becomes an always-taken relative jump. This is one byte shorter than the equivalent JMP instruction.
Another way that programmers saved memory was by reusing instructions across different functions. The "function" at 63d3 is a single RTS instruction that other nearby functions use instead of having their own RTS. This complicates the control-flow analysis of tools like Ghidra and sometimes requires manual annotation to decompile.
Most of the ship's state is stored in zero page global variables. These include the XY acceleration (stored as 17-bit signed magnitude) and the velocity and position (stored as 9-bit signed magnitude?).
The mult16 function is used to compute the ship's X and Y acceleration: the current thrust is converted to a force through a thrust-to-force lookup table, and the force is then multiplied by the sine and cosine of the ship's angle to rotate it into the screen coordinate frame.
Now that we have the math functions for computing signed magnitude addition and transforming the thrust vectors into the XY screen coordinate frame, we can finally update the ship's position.
The player has a large lever that controls the thrust from the ship's engine. More thrust burns more fuel and the game rewards good landings with more fuel.
The fuel is stored in BCD format, which means that each byte can represent 00 to 99, so a three-byte value can represent 000000 to 999999. The 6502 has a special mode in the ALU that causes addition and subtraction to produce results in this format. Games often used it for scores since they wanted to display a base-10 value for the player.
Various functions in the code will spend fuel, and they either call fuel_drain_16 to drain a two-byte amount, or the full fuel_drain that takes a three-byte amount. The amount is in BCD, and the A:X:Y calling convention is different from some other functions that work on multi-byte arguments.
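To make the BCD arithmetic concrete, here is a sketch of a three-byte BCD subtraction in C that mimics what a chain of decimal-mode SBC instructions does. The names, the most-significant-byte-first array layout, and the clamping to zero when the tank runs dry are all assumptions for illustration; this is not the fuel_drain routine itself.

```c
#include <stdint.h>

/* Subtract one packed BCD byte (and an incoming borrow) from another.
 * *borrow is updated to 1 if the result went below 00. Note that the
 * 6502 carry flag is the inverse of this borrow. */
static uint8_t bcd_sub_byte(uint8_t a, uint8_t b, int *borrow)
{
    int lo = (a & 0x0F) - (b & 0x0F) - *borrow;
    int hi = (a >> 4) - (b >> 4);

    if (lo < 0) { lo += 10; hi -= 1; }   /* borrow from the tens digit */
    if (hi < 0) { hi += 10; *borrow = 1; } else { *borrow = 0; }
    return (uint8_t)((hi << 4) | lo);
}

/* Hypothetical helper: fuel[0] is the most significant byte.
 * Returns 1 if the tank ran dry and the result was clamped to zero. */
int bcd_drain(uint8_t fuel[3], const uint8_t amount[3])
{
    int borrow = 0;

    /* least significant byte first, like the 16-bit add idiom */
    for (int i = 2; i >= 0; i--)
        fuel[i] = bcd_sub_byte(fuel[i], amount[i], &borrow);

    if (borrow) {
        fuel[0] = fuel[1] = fuel[2] = 0x00;
        return 1;
    }
    return 0;
}
```

For example, draining 000001 from 001000 leaves the bytes 00 09 99, which can be displayed digit by digit without any conversion.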
The yaw routines burn a little bit of fuel, but most of the burn comes from the main engine. This is handled here:
This is what makes the Atari arcade games from this era so special -- the vectors instead of pixels! The actual drawing is done by specialized vector generator hardware, which is well documented in Jed Margolin's "Secret Life of Vector Generators" and comes in two varieties: analog and digital. Lunar Lander has the digital vector generator (DVG), which is built out of DACs that steer the CRT's electron beam around.
The DVG implements its own little programming language with scaling, subroutines, and brightness control. The Hitch-Hacker's guide to the Atari DVG by Philip Pemberton has a good description of how it works, some of which is excerpted here. The main features of the DVG are:
Commands are 16 bits long, with the exception of VCTR and LABS, which take two words. The first nibble is the opcode so it is easy to visually tell what is going on.
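Since the opcode is just the top nibble, a decoder is only a few lines. This sketch assumes the opcode assignments as I understand them from Pemberton's guide (0x0 - 0x9 VCTR, 0xA LABS, 0xB HALT, 0xC JSRL, 0xD RTSL, 0xE JMPL, 0xF SVEC); the function names are made up for the example.

```c
#include <stdint.h>

/* The opcode is the top nibble of the first 16-bit word. */
static inline unsigned dvg_opcode(uint16_t word)
{
    return word >> 12;
}

/* Length of a command in 16-bit words: VCTR (0x0-0x9) and LABS (0xA)
 * carry a second word, everything else fits in one. */
unsigned dvg_command_words(uint16_t word)
{
    return dvg_opcode(word) <= 0xA ? 2 : 1;
}

/* Assumed mnemonic table, see the note above the code. */
const char *dvg_mnemonic(uint16_t word)
{
    switch (dvg_opcode(word)) {
    case 0xA: return "LABS";
    case 0xB: return "HALT";
    case 0xC: return "JSRL";
    case 0xD: return "RTSL";
    case 0xE: return "JMPL";
    case 0xF: return "SVEC";
    default:  return "VCTR";   /* 0x0 - 0x9 */
    }
}
```

A disassembler loop would read a word, print dvg_mnemonic(), and then skip ahead by dvg_command_words().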
The main commands that are used in Lunar Lander are:
The vector generator shares memory with the 6502. It has RAM from 0x4000 - 0x47FF and two ROMs from 0x4800 - 0x4FFF and 0x5000 - 0x5FFF (in the 6502's address space).
Once the 6502 has written a "frame" to the vector generator's RAM, it drives IO_DMAGO which tells the vector generator to start executing from the RAM until it hits a HLT opcode.
The font table is stored in the vector generator's ROM. Note that all of the addresses are offset by 0x4000, since the DVG's address 0x0000 is mapped into the 6502's address space at 0x4000, and that all of the addresses are *words*, not bytes. So the CADF command is a subroutine call to word 0xADF, which is DVG byte address 0x15be or 6502 address 0x55be.
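The address arithmetic is easy to get wrong by a factor of two, so here it is spelled out in C. The helper names are hypothetical, and treating the low 12 bits of the command as the word address is an assumption that matches the CADF example.

```c
#include <stdint.h>

#define DVG_BASE_6502 0x4000u  /* where DVG address 0 appears to the 6502 */

/* Word address held in the operand of a JSRL or JMPL command. */
uint16_t dvg_word_addr(uint16_t command)
{
    return command & 0x0FFF;
}

/* The same location as a byte offset inside the DVG's own memory. */
uint16_t dvg_byte_addr(uint16_t command)
{
    return (uint16_t)(dvg_word_addr(command) * 2);
}

/* The same location as seen from the 6502. */
uint16_t dvg_to_6502(uint16_t command)
{
    return (uint16_t)(dvg_byte_addr(command) + DVG_BASE_6502);
}
```

For the CADF command this gives word 0x0ADF, byte address 0x15BE, and 6502 address 0x55BE.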
Using our emulator for the DVG, we can render out this font table as an SVG:
Each character is its own DVG subroutine. For example, here is the routine that draws A -- you can see that it consists of a few short vectors (command F) and then an RTSL (command D) to return:
Some of the displays are also generated this way.
which calls these subroutines:
Text strings are drawn as a sequence of DVG subroutine calls; each character is a DVG JSRL subroutine call copied from the CharPtrTbl to the vector generator RAM. The strings are not stored in ASCII, but as offsets into the font table of character subroutine calls, and the last character in the string has the high bit set as a terminator.
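The string format can be sketched in C as follows. This is only an illustration of the high-bit terminator idea: the function name is hypothetical, and it assumes that each string byte is a plain index into a table of 16-bit JSRL words, which may not match the exact offset encoding used by CharPtrTbl.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy one JSRL word per character from the
 * character table into a vector RAM buffer, stopping after the byte
 * that has its high bit set. Returns the number of words written. */
size_t copy_string_calls(const uint8_t *str,
                         const uint16_t *char_ptr_tbl,
                         uint16_t *vram, size_t vram_words)
{
    size_t n = 0;
    uint8_t code;

    do {
        if (n == vram_words)
            break;                            /* out of room in the buffer */
        code = *str++;
        vram[n++] = char_ptr_tbl[code & 0x7F]; /* strip the terminator bit */
    } while (!(code & 0x80));                 /* high bit marks the last one */

    return n;
}
```

Marking the end with the high bit saves one byte per string compared to a separate terminator, at the cost of limiting the table to 128 entries.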
The fixed strings are written to the screen by the WriteText function, which also handles localization and some sort of fixup that I haven't figured out yet:
There are 33 strings in the game and they are all indexed by number. For English, the table has the pointers; note that string number 17 is an index into string 16 to reuse the word DESTROYED.
On the 6502, global variables are often stored on the "zero page", since the bottom 256 bytes of memory can be referenced with shorter instruction sequences. The zero page is initialized to zero or to static values in the RESET code.
This is interesting because it has to deal with potential (physical) attacks on the coin slots, which apparently were a problem with some other arcade cabinets. The memory-mapped coin detectors are discussed in the button handling section.
The mainboard is configured with an external timer that runs at 250 Hz and triggers an NMI, which is used to drive the main timing loop. When an NMI is received, the 6502 pushes some state onto the stack, reads the pointer from 0xFFFA, and jumps to that address.
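For anyone writing an emulator or a tracing disassembler, the vector fetch looks like this in C. The 6502 stores the NMI handler address little-endian at 0xFFFA and 0xFFFB; the function name and the flat 64 KiB memory array are assumptions for the sketch.

```c
#include <stdint.h>

#define NMI_VECTOR 0xFFFAu

/* Hypothetical helper: mem must cover the full 64 KiB address space.
 * Returns the address that the CPU jumps to when an NMI arrives. */
uint16_t nmi_handler_address(const uint8_t *mem)
{
    /* low byte first, then high byte */
    return (uint16_t)(mem[NMI_VECTOR] | (mem[NMI_VECTOR + 1] << 8));
}
```

This is also a convenient entry point to seed a disassembler with, alongside the RESET vector.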