What I Wish I Knew About Machine Language
Dan’s MEGA65 Digest for February 2023.
I spent six years of my early childhood on my Commodore 64, between ages 6 and 12. I wrote many programs, all using the built-in BASIC language, learning everything I could from reading the Commodore 64 Programmer’s Reference Guide and typing in program listings from Compute!’s Gazette magazine. I learned a lot from writing BASIC programs, but the one lesson I took away over and over again was this: to make anything really cool, like Marble Madness or Skyfox, I’d need to know machine language.
Machine language felt like dark magic. In Compute!’s Gazette, all the games with the cool graphics and high speed action were printed as columns of numbers, a pages-long incantation that if you typed it correctly would conjure a video game. The Programmer’s Reference had a chapter on machine language that I stared at endlessly trying to make sense of it, but it only made a passing reference to the fact that I needed additional software to write it. If I were older or had a friend with more experience, I would have known what utility cartridge to ask my parents for as a birthday gift, but it was too much for my kid self to figure out all by myself in a basement.
In fairness, Compute!’s Gazette did publish the occasional machine language coding tool, like Fast Assembler in the January 1986 issue. The companion disk to the issue even included the source code for Fast Assembler itself, the first time I had ever seen a complete machine language program listing. I remember making a small change and running the assembler on its own code, which produced a new version of the assembler that printed my name instead of its own. But that’s as far as I got.
In this issue of the Digest, we’re going to answer the questions I had when I was a kid. What is machine language? What is an assembler? And what in the name of Chuck E. Cheese is hexadecimal? We’ll also look at ways you can start learning machine language right now with your MEGA65.
Featured Files
It’s time once again for Featured Files, where we highlight stuff you can download from the Filehost and try with your MEGA65 today!
Mega Wizards, by RBJeffrey. A game for one to four players, using the joysticks. Use your wizardly wiles to bounce the magic ball, take out your enemies, and defend your crystal. Three and four players require a four-player joystick interface.
WORDUP by geehaf. A faithful variant of the popular daily word game Wordle, playable on your MEGA65. Guess the secret five-letter word in six or fewer tries, using hints produced by each guess.
xmas65 by MirageBD. A holiday greeting for the MEGA65 community with an impressive graphical effect and a rockin’ MOD-style backing track.
New Intro Disk!
The intro disk is the first thing you see when you turn on the MEGA65 for the first time, a bundle of games, demos, and music ready to play and show off your new computer to your friends. Thanks to Gurce, there’s now a second intro collection for your enjoyment!
Intro Disk #02 goes beyond the bounds of a single D81 disk image with dozens of titles and music files, all browsable from a colorful menu. Unpack the zip file and copy the files to your SD card, then boot the INTRO2.D81 disk image.
There are a lot of files for this one. I had success keeping the files in a subfolder on the SD card. You may want to move the MODS/ subfolder to the root to make these music files easier to play with the new version of the Manche MOD player, included on the intro disk.
What’s new with the ROM
The MEGA65 ROM is the built-in program code that powers the kernel, the BASIC programming language, and the operating environment. It’s what Commodore would have burned into a physical ROM chip had they released the Commodore 65 as a product. In the case of the MEGA65, the ROM is actually a file on the SD card, loaded during boot by the firmware.
Commodore canceled the C65 project before finishing the original C65 ROM. The MEGA65 team has made significant investments into finishing planned features, fixing bugs, and generally making improvements in the spirit of the original Commodore design. The latest MEGA65 ROM is much more useful thanks to these efforts. Release package version 0.9, which shipped with the first batch of MEGA65s in May of last year, bundled the ROM with the version ID of 920287. The most recent release package as of this writing is version 0.95, which shipped with the second batch last October, with ROM 920377.
The work continues on improving all aspects of the MEGA65 platform, including the ROM. You are invited to help test new “beta” releases of the ROM, starting with version 920378, now available. If you own a MEGA65 and are signed in to Filehost with your account, you can download the latest beta ROM and install it on your SD card. If you do not own a MEGA65, you can put the ROM together for use with the Xemu MEGA65 emulator, using the original C65 ROM, the MEGA65 ROM patch file, and the M65Connect app (also available on Filehost). (The C65 ROM is available for free for personal use as part of C64 Forever Free Express Edition from Cloanto.)
If you discover an issue with any version of the ROM, please report it using the Issues tab of the mega65-rom-public GitHub repo. Check out the new MEGA65 ROM FAQ article with answers to common questions and instructions on how to download and use the latest ROM. Also, be sure to join the #closed-roms channel on the Discord for updates.
Addressing the computer
One of the first things I learned to do with the Commodore 64 was this command:
POKE 53280,4
This command still works today on your MEGA65. (Try it!) If you haven’t memorized what these numbers mean, it’s difficult to tell what this command does just from looking at it. Newer Commodores added an equivalent BASIC command that better describes its function, also available on the MEGA65:
BORDER 4
The POKE command takes two numbers: an address and a value. It instructs the CPU to send the value to whatever device inside the computer is wired to that address. The effect this has depends entirely on what’s there. In the case of 53280, the address is connected to the VIC video chip, specifically the part that controls the color of the screen border. The different values represent different colors.
In many cases, the device at an address remembers the last sent value, and the CPU can retrieve it with another command:
PRINT PEEK(53280)
The POKE command and PEEK() function are most often associated with computer memory, a device whose sole purpose is to remember the most recent value sent to an address for as long as the computer is powered on. Here’s an address on the MEGA65 connected to memory:
POKE 6144,255
PRINT PEEK(6144)
Most addresses are connected to memory chips. Some are connected to other devices, like video and sound generators, disk drives, the keyboard, joysticks, and so on. Non-memory addresses are known as registers, or more specifically I/O registers for their purpose of reading input from and writing output to the devices that live there. Not all registers accept new values. For example, the CPU can read a register connected to a joystick port to see which way the joystick is being pushed, but there is no way to send a value to the joystick.
There are 256 possible values you can send to or read from an address, numbered 0 to 255. The meaning of the value depends entirely on how it is used. It can be a color, a character of text, a set of pixels on the screen, the pitch or volume of a sound, a piece of a larger number, or anything else that can be represented by a value.
Bits, bytes, and binary
Internally, the computer represents a value as one or more digital electronic signals. This fundamental unit of digital data is known as a bit, with two possible values, typically represented by the numbers 0 and 1. Bits can be combined to represent more possible values: two bits can represent four values, 00, 01, 10, or 11; three bits can represent eight values. The value you send to or read from an address is eight bits, also known as a byte, with 256 possible values. The MEGA65 CPU mostly deals with byte-sized values, which is why it’s called an “8-bit computer.”
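Each extra bit doubles the number of possible patterns, so n bits give 2 to the power n values. Here is a quick check in Python, used purely as a desktop calculator (none of this runs on the MEGA65). The function names are made up for the illustration:

```python
def value_count(bits: int) -> int:
    """How many distinct patterns a group of `bits` binary digits can hold."""
    return 2 ** bits

def all_patterns(bits: int) -> list:
    """Every pattern of the given width, listed in counting order."""
    return [format(n, f"0{bits}b") for n in range(value_count(bits))]

print(all_patterns(2))   # ['00', '01', '10', '11']
print(value_count(3))    # 8
print(value_count(8))    # 256: one byte
```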
It’s very common to describe bit patterns as numbers (specifically, integers counting up from 0), even when they don’t represent numbers in the data. The binary number system puts these bit patterns in order. The first six binary numbers are 0, 1, 10, 11, 100, and 101, equal to the decimal numbers 0 through 5.
Just as decimal place values are powers of ten (1, 10, 100, 1000), each binary place value is a power of two (1, 2, 4, 8). You can convert a decimal number to its binary equivalent by setting to 1 the bits whose place values add up to the number. To convert from binary to decimal, add the place values where bits are set.
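The two conversion recipes translate directly into code. In this minimal Python sketch (a desktop calculator only, nothing MEGA65-specific), the hypothetical `to_binary` walks the place values from largest to smallest, and `from_binary` adds up the place values of the set bits:

```python
def to_binary(n: int) -> str:
    """Decimal to binary: set a 1 for each place value the number contains."""
    if n == 0:
        return "0"
    place = 1
    while place * 2 <= n:      # find the largest power of two that fits
        place *= 2
    digits = []
    while place >= 1:
        if n >= place:         # this place value is part of the number
            digits.append("1")
            n -= place
        else:
            digits.append("0")
        place //= 2
    return "".join(digits)

def from_binary(bits: str) -> int:
    """Binary to decimal: add the place values where a bit is set."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total

print(to_binary(5), from_binary("101"))   # 101 5
print(to_binary(53280))                   # 1101000000100000
```

Python's built-in `bin()` and `int(text, 2)` do the same jobs. The hand-written versions just make the place-value arithmetic visible.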
Addresses are numbered starting from 0 and counting up, and it should be no surprise that the computer uses bits to represent addresses. A Commodore 64 uses 16 bits (two bytes) for addresses, for 65,536 possible addresses. The MEGA65 uses 28 bits for addresses.
Because the MEGA65’s 45GS02 CPU is an evolution of the Commodore 64’s 6510 CPU, MEGA65 machine language programs mostly work with 16 address bits, and use other features of the 45GS02 CPU to complete the address. It’s kind of like telling a taxi driver where to go using just the house number and street name, and assuming the driver knows the city, state, and country. See the chapter on memory in the manual for more information about how this works.
Hexadecimal to the rescue!
We use decimal numbers in our daily lives, but they get unwieldy when working with computer programs. Computers are designed around binary, so all of the useful value ranges are convenient in binary but confusing in decimal. 255 feels like a weird stopping point for a byte, but in binary it’s just the largest eight-digit value: 11111111. As more bits get involved, the possibilities get even more difficult to track. Why is 65,536 an important number? How much is 1,048,576? What is at address 268,251,168?
POKE 268251168,3
Writing programs using binary numbers would quickly drive us mad. A 28-bit address would literally be twenty-eight 1’s and 0’s. We need a concise way to represent these numbers that doesn’t make them more confusing. That’s where hexadecimal comes in.
Hexadecimal is a numbering system that uses sixteen possible values per digit, so a single hex digit completely represents all possible values of four bits. A byte value can be represented by two hex digits instead of eight binary digits.
Earlier we saw that the border color address is 53280
in decimal. It looks like a random number, if you haven’t memorized it. In binary, a hint of a pattern emerges, but it’s still difficult to manage: 1101000000100000
To convert this to hexadecimal, split the bits into four-bit groups, then find the hex digit that represents each group. 53,280 in hexadecimal is D020.
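The four-bit grouping trick is mechanical enough to write down as code. In this Python sketch, the hypothetical `bits_to_hex` pads the binary string on the left to a multiple of four bits, then looks up one hex digit per group. It also shows that the unwieldy 268,251,168 from earlier is a much tidier $FFD3020 in hexadecimal:

```python
HEX_DIGITS = "0123456789ABCDEF"

def bits_to_hex(bits: str) -> str:
    """Convert a binary string to hex by translating each 4-bit group."""
    width = -(-len(bits) // 4) * 4          # round the length up to a multiple of 4
    bits = bits.zfill(width)                # pad with leading zeros
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

print(bits_to_hex("1101000000100000"))      # D020
print(bits_to_hex(bin(268251168)[2:]))      # FFD3020
```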
To avoid confusing a hex number with a decimal number, it is often written with a dollar sign in front: $D020. This convention is common to microcomputer programming. (Modern languages use different conventions for hex numbers, such as 0xd020 in Python.)
Commodore 64 BASIC only knows decimal notation for numbers, so C64 programmers are well practiced at converting between bit values and decimal notation. MEGA65 BASIC can handle hexadecimal numbers directly, so you don’t need to bother. Just use the dollar sign notation:
POKE $D020,$04
This makes it easy to use MEGA65 BASIC as a hexadecimal calculator. To convert a hexadecimal number to decimal, simply print the value:
PRINT $D020
To see the hexadecimal representation of any number or expression, use the HEX$() function:
PRINT HEX$(53280)
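If you're reading this at a modern computer, Python can play the same hex-calculator role. These lines are only a desktop analog of the MEGA65 commands above, and the exact formatting of the MEGA65's HEX$() output is not modeled here:

```python
border = 0xD020                     # Python's spelling of $D020
print(border)                       # 53280, like PRINT $D020
print(format(53280, "X"))           # D020, comparable to PRINT HEX$(53280)
print(int("D020", 16))              # 53280, parsing hex text
print(format(0xD020 + 0x1A, "X"))   # D03A: hex arithmetic without converting by hand
```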
The MEGA65 I/O registers are in the address range $D000 – $DFFF. In hexadecimal, all of these addresses start with a D. This is easier to understand than using decimal notation for the range: 53248 – 57343.
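Because the whole I/O range shares its top hex digit, checking whether an address falls in that range is easy to express. This Python sketch takes the $D000 – $DFFF range from the text at face value and ignores MEGA65 memory banking. The function names are illustrative only:

```python
IO_START, IO_END = 0xD000, 0xDFFF

def is_io_register(address: int) -> bool:
    """True if a 16-bit address falls in the $D000-$DFFF I/O range."""
    return IO_START <= address <= IO_END

def top_hex_digit(address: int) -> str:
    """The first of the four hex digits of a 16-bit address."""
    return format(address >> 12, "X")

print(IO_START, IO_END)                                  # 53248 57343
print(is_io_register(0xD020), top_hex_digit(0xD020))     # True D
print(is_io_register(6144))                              # False: the memory address used earlier
```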
Most books on this subject discuss at length how to convert between hexadecimal and decimal. This is fine for getting accustomed to the idea. In practice, as long as your programming tools support hexadecimal notation, you rarely need to convert between hexadecimal and decimal. It’s more convenient to leave addresses and values in hexadecimal, and understand how hex digits represent bit patterns in groups of four. Adding one hexadecimal number to another takes some practice—but you always have your handy MEGA65 to help you.
The CPU always follows instructions
The CPU’s job is to perform the instructions of a machine language program. When it’s not running the instructions of your program, it is running the instructions of the MEGA65 kernel to blink the cursor, perform BASIC commands, access disk drives, and so forth. If it doesn’t look busy, it’s busy waiting.
Machine language instructions manipulate memory, interact with I/O registers, perform simple calculations, and make decisions about what instructions to perform next. There are surprisingly few things the CPU knows how to do. Most of its power comes from being able to perform very many instructions very quickly.
The CPU reads its instructions from bytes in memory. It keeps track of which address has its next instruction using an internal register called the program counter (PC). The CPU reads the instruction from the address stored in the program counter, then does it. By the end of the instruction, the program counter contains the address of the next instruction to perform, and the process repeats.
Those columns of numbers in a Compute!’s Gazette magazine program listing are machine language instructions (and data) for the program. As you type them in, Compute!’s “MLX” data entry program stores those values at their corresponding addresses. (MLX magazine listings also include a checksum value as the last number in the line, so it can catch your typing mistakes.)
To start a typical Compute!’s Gazette program, you use the BASIC SYS command with the address of the first instruction. SYS sets the CPU’s program counter to that address, and the program takes it from there.
You can write your own machine language routine in a similar way, setting values in memory that represent instructions, then calling the routine with SYS. Try entering these commands on your MEGA65:
POKE $1800,$EE
POKE $1801,$20
POKE $1802,$D0
POKE $1803,$60
SYS $1800
If you mis-type any of these numbers, your computer might do something strange. You can always restart the computer and try again.
A human-friendly machine language
Yes, EE 20 D0 60 is a machine language program. You’re not expected to know what it means, nor are you expected to write programs that way. It’s possible to do so, in the same way that I used to sit in that basement drawing pixel art on graph paper and converting it to byte values for my DATA statements, but it’s slow going. (I tried this recently as an exercise for a different microcomputer. It’s fun, the first time.)
When people say they’re writing a machine language program, they’re usually using assembly language, a programming language that corresponds closely to the computer’s machine language, along with a tool called an assembler that converts it to machine code. Assembly language consists of the same instructions that the CPU knows how to perform, just spelled out in a way that’s easier to read and write.
Here’s that short machine language routine again, this time written as assembly language instructions:
INC $D020
RTS
The first instruction tells the CPU to increase the value at address $D020 by one. The second instruction returns from the subroutine. As we’ve seen, address $D020 controls the border color. Each time you SYS to the first instruction, the border color changes, and control returns to BASIC.
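You can trace those four bytes by hand with a toy fetch-decode-execute loop. This Python sketch models only the two opcodes in the routine: $EE, which is INC with a full 16-bit address, and $60, which is RTS. It treats $D020 as ordinary memory. It also shows why $D020 appears as 20 D0 in the POKEs: the 6502 family stores addresses low byte first. The `run` function is hypothetical:

```python
def run(memory: dict, pc: int) -> None:
    """Toy fetch-decode-execute loop that knows two opcodes: $EE (INC) and $60 (RTS)."""
    while True:
        opcode = memory.get(pc, 0)            # fetch the byte the program counter points at
        if opcode == 0xEE:                    # INC with a full 16-bit address: 3 bytes
            # The address follows the opcode, low byte first: 20 D0 means $D020.
            address = memory.get(pc + 1, 0) | (memory.get(pc + 2, 0) << 8)
            memory[address] = (memory.get(address, 0) + 1) & 0xFF   # 255 wraps to 0
            pc += 3                           # advance to the next instruction
        elif opcode == 0x60:                  # RTS: hand control back to the caller
            return
        else:
            raise ValueError(f"opcode ${opcode:02X} is not modeled in this sketch")

# The four POKEd bytes, plus a starting border color of 4 (treated as plain memory).
memory = {0x1800: 0xEE, 0x1801: 0x20, 0x1802: 0xD0, 0x1803: 0x60, 0xD020: 4}
run(memory, 0x1800)        # roughly what SYS $1800 does
print(memory[0xD020])      # 5
```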
Registers, instructions, and addressing modes
The CPU contains a little bit of memory of its own to use as scratch paper while performing computations. The MEGA65’s 45GS02 CPU has four general purpose registers, each the size of a byte: an accumulator (often referred to as just A), and registers X, Y, and Z. The program counter (PC) is an example of a special purpose register, holding the address of the current instruction.
The CPU maintains eight one-bit flags, stored together in the status register (SR), that indicate useful aspects of the machine’s operation. For example, the Zero flag is set after a math operation results in zero, which is useful for countdowns or comparing whether two numbers are equal.
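Here's a rough Python model of the Zero flag driving a countdown. It assumes an 8-bit register that wraps around. The loop mimics DEX (decrement X, a standard 6502 instruction not covered in this article) followed by BNE, which keeps branching back until the Zero flag is set. The `decrement` helper is hypothetical:

```python
def decrement(value: int):
    """Subtract one from an 8-bit value; return (result, zero_flag)."""
    result = (value - 1) & 0xFF     # 8-bit wraparound: 0 - 1 becomes 255
    zero_flag = result == 0         # Z is set only when the result is zero
    return result, zero_flag

x, steps = 3, 0
while True:
    x, z = decrement(x)             # like DEX
    steps += 1
    if z:                           # BNE falls through once Z is set
        break
print(steps)   # 3
```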
The CPU reserves a small area of memory to use as a data structure called a stack. Much like a stack of cards, you can push new values onto the stack, and pull the last pushed value off of it. The CPU uses the stack to remember its place when calling a subroutine, like the SYS command does, so it can return to that place when the subroutine performs the RTS instruction.
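The last-in, first-out behavior is what makes nested subroutine calls return in the right order. This Python sketch uses a list as the stack, with hypothetical `jsr` and `rts` helpers. It is a simplification: a real 6502-family CPU stores each return address as two bytes in memory and has its own conventions for exactly which address it saves.

```python
stack = []   # a Python list standing in for the CPU's stack area

def jsr(return_address: int, target: int) -> int:
    """Push the place to come back to, then jump to the subroutine."""
    stack.append(return_address)
    return target              # the new program counter

def rts() -> int:
    """Pull the most recently pushed return address."""
    return stack.pop()

pc = jsr(return_address=0x1803, target=0x1900)   # main program calls $1900
pc = jsr(return_address=0x1905, target=0x1A00)   # $1900 calls another subroutine
pc = rts()
print(hex(pc))   # 0x1905: the inner call returns first
pc = rts()
print(hex(pc))   # 0x1803
```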
The CPU gives special treatment to a single page of memory, 256 bytes starting at an address that ends in $00. It can access this memory faster than other locations, so it’s useful for storing variables that change frequently. In the Commodore 64 (and the 6502/6510 CPU), this is known as the zero page because it is always located at address $0000. In the MEGA65, the 45GS02 CPU can change which page it uses for this purpose, so it is known as the base page. The location of the base page is stored in a register (B).
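The 45GS02 descends from the 65CE02 family. Assuming the B register works as it does there, B supplies the high byte of the base page address, and the one-byte operand of a base page instruction fills in the low byte. Here is a Python sketch of that combination (the helper name is made up):

```python
def base_page_address(b_register: int, offset: int) -> int:
    """Combine the base page register (high byte) with a one-byte operand (low byte)."""
    return ((b_register & 0xFF) << 8) | (offset & 0xFF)

print(hex(base_page_address(0x00, 0x42)))   # 0x42: with B = $00, the classic zero page
print(hex(base_page_address(0x18, 0x42)))   # 0x1842: same operand, base page at $1800
```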
Nearly all CPU instructions manipulate a value in working memory, a value at an address, or both. Here are just a few examples:
- LDA $1900: load the value from address $1900 into the accumulator
- LDX #$FF: load the value $FF into the X register
- STA $D020: store the value in the accumulator to address $D020
- ADC #$1A: add the value $1A (plus the Carry flag) to the value in the accumulator
- AND $C901: set the accumulator to just the 1 bits that are present in both the accumulator and the value at address $C901
- BNE $180C: set the program counter to $180C (“branch to”) if the last math operation resulted in something other than zero
- JMP $18FF: set the program counter to $18FF (“jump to”)
- JSR $1900: remember the address of the next instruction, start executing the subroutine at address $1900, then return to this location when the subroutine performs RTS
- PHA: push the accumulator value onto the stack
- PLA: pull the most recently pushed value from the stack, and set the accumulator to that value
- TXA: copy the value in the X register into the accumulator
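Several of these instructions are just small pieces of arithmetic on 8-bit values. This Python sketch models ADC, AND, and TXA so you can check results by hand. It assumes binary mode and ignores the CPU's decimal mode. The function names are illustrative, not part of any real tool:

```python
def adc(a: int, value: int, carry: int):
    """ADC: add value and the Carry flag to A; return (new A, carry out)."""
    total = a + value + carry
    return total & 0xFF, 1 if total > 0xFF else 0

def and_(a: int, value: int) -> int:
    """AND: keep only the 1 bits present in both numbers."""
    return a & value

a, carry = adc(0x10, 0x1A, 0)    # ADC #$1A with Carry clear
print(hex(a), carry)             # 0x2a 0
a, carry = adc(0xF0, 0x1A, 0)
print(hex(a), carry)             # 0xa 1   (the sum overflowed 8 bits)
print(bin(and_(0b1100_1010, 0b1010_0110)))   # 0b10000010

x = 0x07
a = x                            # TXA copies X into A; X keeps its value
print(a, x)                      # 7 7
```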
Instructions that access addresses often have variants for different addressing modes that describe how the address is calculated. As shown in the examples above, an instruction can read a value from an address provided with the instruction, or it can read a value provided as part of the instruction itself (which is just reading it from the memory that contains the program code). In assembly language, the # symbol marks this second case, as in LDX #$FF.
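The difference between addressing modes comes down to where the value comes from. In this Python sketch, a dictionary stands in for memory with made-up contents. The hypothetical `operand_value` resolves three modes: immediate (LDA #$01), absolute (LDA $1900), and absolute indexed by X (LDA $1900,X). The last one is a common 6502 mode that adds the X register to the address before reading:

```python
memory = {0x1900: 0x99, 0x1903: 0x42}   # made-up memory contents

def operand_value(mode: str, operand: int, x: int = 0) -> int:
    """Return the value an instruction such as LDA would load in each mode."""
    if mode == "immediate":          # LDA #$01: the value is part of the instruction
        return operand
    if mode == "absolute":           # LDA $1900: the operand is an address to read
        return memory.get(operand, 0)
    if mode == "absolute,x":         # LDA $1900,X: add X to the address, then read
        return memory.get((operand + x) & 0xFFFF, 0)
    raise ValueError(f"mode {mode!r} is not modeled in this sketch")

print(hex(operand_value("immediate", 0x01)))           # 0x1
print(hex(operand_value("absolute", 0x1900)))          # 0x99
print(hex(operand_value("absolute,x", 0x1900, x=3)))   # 0x42
```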
Addressing modes are a powerful concept, and understanding them is critical for performing certain tasks effectively. The MEGA65 45GS02 CPU has many addressing modes. I recommend learning a few common ones to get started, then reading about the rest once you’ve tried writing a few short programs.
The MEGA65 manuals describe the 45GS02 instruction set and addressing modes in detail. The 45GS02 supports all of the instructions of the 6502, so descriptions of the 6502 instruction set that you can find online and in books also apply to the MEGA65.
The MEGA65 machine language monitor
The chapter on machine language in the Commodore 64 Programmer’s Reference Guide makes a brief mention of a software tool called “64MON,” and uses it to introduce machine language concepts. I know they did not intend this to be cruel, but it was an insurmountable hurdle for me as a kid that Commodore did not include 64MON or anything like it with the computer.
64MON was a machine language monitor, a tool for examining and interacting directly with the CPU and memory of a microcomputer. An ML monitor is a gateway to accessing the raw power of a machine, so useful that many microcomputers had one built in. The earliest models of the Apple II even booted directly into an ML monitor, instead of BASIC. Commodore added a built-in monitor to their computers starting with the C16. The MEGA65 includes an all-new monitor, written by Bit Shifter.
To start the MEGA65 ML monitor from the READY. prompt, type the MONITOR command:
MONITOR
The monitor accepts typed commands, much like the BASIC environment. For example, to exit back to BASIC, type X then press Return. The monitor uses the BASIC screen editor, so you can cursor up to previous lines to repeat or modify commands.
The first thing the monitor displays is the contents of the CPU’s working memory: the program counter, the status register, the accumulator, and so on. These values aren’t useful when the monitor first starts with the MONITOR command, but you can examine them at any time with the R command.
I won’t describe all of the monitor’s uses and features here, but I do want to mention two of its primary functions: examining and changing memory values, and assembling and disassembling machine language instructions.
Try this command:
M1800
The M command takes an address in hexadecimal (with or without the $), then displays a block of memory values in hexadecimal. If you POKE’d that short program into memory earlier, you’ll see the byte values at the beginning of this list, along with a bunch of junk values:
>1800 EE 20 D0 60 8D 41 6A A9 01 8D 40 6A 20 32 52 20 ................
>1810 ...
The command to change values in memory is the > that appears at the beginning of one of these lines. This is not a coincidence: you can move the cursor up to any line printed by the M command, change a value, then press Return to modify the memory.
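The edit-in-place trick works because the monitor reads a > line the same way it reads any command you type. Here is a Python sketch of that round trip, under simplifying assumptions: 64 KB of simulated memory, no character column on the right, and hypothetical helpers `dump_line` and `apply_line`. Changing $EE to $CE turns INC into DEC (decrement), another standard 6502 opcode:

```python
def dump_line(address: int, memory: bytearray) -> str:
    """Format 16 bytes starting at address, like one line of the M command."""
    values = memory[address:address + 16]
    return f">{address:04X} " + " ".join(f"{b:02X}" for b in values)

def apply_line(line: str, memory: bytearray) -> None:
    """Parse an edited '>' line and store its byte values back into memory."""
    fields = line.lstrip(">").split()
    address = int(fields[0], 16)
    for i, field in enumerate(fields[1:]):
        memory[address + i] = int(field, 16)

memory = bytearray(0x10000)                       # 64 KB of zeroed, simulated memory
memory[0x1800:0x1804] = bytes([0xEE, 0x20, 0xD0, 0x60])
line = dump_line(0x1800, memory)
print(line)   # >1800 EE 20 D0 60 00 00 00 00 00 00 00 00 00 00 00 00
apply_line(line.replace("EE", "CE", 1), memory)   # "edit" the first byte, press Return
print(hex(memory[0x1800]))                        # 0xce
```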
The monitor knows how to assemble instructions directly into memory, using their assembly language mnemonics. To start the assembly process, use the A command followed by a starting address and the first instruction, then press Return. It converts the assembly instruction to machine code, stores it in memory, then prompts for the next instruction. Press Return without an instruction to end assembly.
Try typing A 1800 INC $D020, followed by Return. Then enter RTS on the next line, and finally enter a blank line. Notice that the assembler displays the machine code byte values that it came up with as you type: EE 20 D0 are the bytes for INC $D020, and the 60 byte means RTS.
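Notice that the address $D020 comes out as 20 D0, because the 6502 family stores 16-bit addresses low byte first (little-endian). Here is a Python sketch of how a monitor-style assembler might encode one instruction. It knows only three opcodes: INC and JMP with a full address, plus RTS. The table and function are illustrative, not the monitor's actual code:

```python
OPCODES = {("INC", "abs"): 0xEE, ("JMP", "abs"): 0x4C, ("RTS", None): 0x60}

def assemble(mnemonic: str, address=None) -> bytes:
    """Assemble one instruction; a 16-bit operand is stored low byte first."""
    if address is None:
        return bytes([OPCODES[(mnemonic, None)]])
    low, high = address & 0xFF, address >> 8     # $D020 -> $20, $D0
    return bytes([OPCODES[(mnemonic, "abs")], low, high])

print(assemble("INC", 0xD020).hex(" ").upper())   # EE 20 D0
print(assemble("RTS").hex(" ").upper())           # 60
print(assemble("JMP", 0x1800).hex(" ").upper())   # 4C 00 18
```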
To see the assembly instructions stored at an address, use the D command (for “disassembly”) followed by the address. Try this now: D 1800
. 1800 EE 20 D0 INC $D020
. 1803 60 RTS
. 1804 8D 41 6A STA $6A41
. 1807 A9 01 LDA #$01
...
The instructions you entered appear at the top. Notice that the disassembly continues into the junk data region. The disassembler doesn’t know that those values are meaningless, so it just displays what those bytes would be as assembly instructions, even though it’s nonsense.
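A disassembler is essentially the assembler run backwards. It reads an opcode, looks up how many bytes the instruction uses, formats it, and advances by that many bytes to find the next one. This Python sketch knows only the four opcodes in the listing above. Unlike the real monitor, it raises KeyError on anything else. The names are hypothetical:

```python
# opcode -> (length in bytes, mnemonic, operand format); only four opcodes modeled
OPCODES = {
    0xEE: (3, "INC", "${:04X}"),
    0x8D: (3, "STA", "${:04X}"),
    0xA9: (2, "LDA", "#${:02X}"),
    0x60: (1, "RTS", ""),
}

def disassemble(code: bytes, start: int, count: int) -> list:
    """Decode `count` instructions from `code`, which is loaded at address `start`."""
    lines, offset = [], 0
    for _ in range(count):
        length, mnemonic, fmt = OPCODES[code[offset]]   # KeyError for unknown opcodes
        raw = code[offset:offset + length]
        if length == 3:
            operand = fmt.format(raw[1] | (raw[2] << 8))   # address stored low byte first
        elif length == 2:
            operand = fmt.format(raw[1])
        else:
            operand = ""
        hex_bytes = " ".join(f"{b:02X}" for b in raw)
        lines.append(f"{start + offset:04X} {hex_bytes:<8} {mnemonic} {operand}".rstrip())
        offset += length      # the next instruction begins right after this one
    return lines

code = bytes([0xEE, 0x20, 0xD0, 0x60, 0x8D, 0x41, 0x6A, 0xA9, 0x01])
for line in disassemble(code, 0x1800, 4):
    print(line)
# 1800 EE 20 D0 INC $D020
# 1803 60       RTS
# 1804 8D 41 6A STA $6A41
# 1807 A9 01    LDA #$01
```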
To execute a subroutine from the monitor, use the J command (for “jump”) followed by an address. When the CPU encounters the RTS instruction, it returns control back to the monitor, just as it does with SYS in BASIC. Note that if you write a subroutine that never reaches the RTS instruction, control will never return to the monitor. In most cases, you can hold Run/Stop and press Restore to return to the READY. prompt without losing your program in memory.
Here’s a variation of the border color subroutine without an RTS instruction. Use the monitor to assemble it at address $1800, then run it:
INC $D020
JMP $1800
For more information on the ML monitor’s features, see the manual. I also wrote a tutorial on the MEGA65 Monitor as part of a series on introductory assembly language.
Mega Assembler
The machine language monitor is super useful for understanding how machine language programs work, and for troubleshooting programs and examining memory. Even so, you would not write large programs this way. A full-fledged assembler application gives you essential tools for managing large amounts of code.
Mega Assembler by grubi is an assembler for the MEGA65. It includes a source code editor with built-in help features, and it can assemble your programs, run them, and save them to disk. Check out the example programs included on the disk image to get a feel for how it works.
One of the most useful features of an assembler is its ability to assign labels to the many numbers that appear in a machine language program. Here’s a version of the border color program for Mega Assembler that is easier to read than the raw instructions you entered into the monitor. (Mega Assembler needs the lines indented as shown.)
BORDER=$D020
*=$1800
START
    INC BORDER
    JMP START
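Labels work because an assembler can read the source twice. The first pass learns what each name stands for, including labels defined after they're used. The second pass emits the bytes. This Python sketch applies that two-pass idea to the example above. Its syntax rules are simplifying assumptions, not Mega Assembler's actual grammar: labels and constants sit in the first column, instructions are indented, and every instruction is three bytes long.

```python
SOURCE = """\
BORDER=$D020
*=$1800
START
 INC BORDER
 JMP START
"""

OPCODES = {"INC": 0xEE, "JMP": 0x4C}   # both used here with a full 16-bit address

def parse_number(text: str) -> int:
    return int(text[1:], 16) if text.startswith("$") else int(text)

def assemble(source: str):
    symbols, origin, pc = {}, 0, 0
    # Pass 1: record constants and the address each label stands for.
    for line in source.splitlines():
        if not line.strip():
            continue
        if line.startswith("*="):
            origin = pc = parse_number(line[2:])
        elif "=" in line:
            name, value = line.split("=")
            symbols[name.strip()] = parse_number(value.strip())
        elif not line[0].isspace():
            symbols[line.strip()] = pc        # a label marks the current address
        else:
            pc += 3                           # every instruction here is 3 bytes
    # Pass 2: emit bytes, replacing names with their values (low byte first).
    output = bytearray()
    for line in source.splitlines():
        if line and line[0].isspace() and line.strip():
            mnemonic, operand = line.split()
            address = symbols[operand] if operand in symbols else parse_number(operand)
            output += bytes([OPCODES[mnemonic], address & 0xFF, address >> 8])
    return origin, symbols, bytes(output)

origin, symbols, code = assemble(SOURCE)
print(hex(origin), code.hex(" ").upper())   # 0x1800 EE 20 D0 4C 00 18
```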
Next steps
The MEGA65 manuals are the best and most complete reference for the MEGA65, but they do not yet contain introductory material for machine language programming. Thankfully, many books about machine language programming for the 6502/6510 CPU and the Commodore 64 apply to the MEGA65. The memory locations are different, and the MEGA65’s 45GS02 has more advanced features, but C64 programming books are still a valuable resource.
Check Archive.org for scans of popular assembly language books, such as:
- Machine Language for Beginners by Richard Mansfield
- Machine Language for the Commodore 64 and Other Commodore Computers by Jim Butterfield
- Compute!’s Machine Language Routines for the Commodore 64
Despite my struggles with it as a small child without the benefit of the Internet, the machine language chapter of the Commodore 64 Programmer’s Reference Guide is a good introduction to the topic.
With a built-in machine language monitor, a freely available assembler, and access to hundreds of websites and PDFs, it’s so much easier to learn how to write machine language programs for the MEGA65 today than it was for the Commodore 64 in 1986. I hope you give it a try, if only to get a sense of how the computer works, on its own terms.
See you next month!
— Dan