Cross Development for Fun and Profit, part 2
Dan’s MEGA65 Digest for July 2023.
In last month’s Digest, we introduced cross development, the practice of writing MEGA65 programs using a modern PC. We looked at several tools for writing programs in the BASIC 65 language and in 45GS02 assembly language, and we welcomed a new BASIC-like compiled language with MEGA65 support called XC=BASIC.
Let us take one step further into the world of compiled languages. The C programming language is one of the most widely used programming languages in computing history, suitable for everything from microcomputers to mainframes. It was the language that built the Unix operating system—or maybe it was the other way around. Fifty years later, C is still in widespread use, for better and for worse.
As fun as it is to write in assembly language, larger programs benefit from a language with a bit more structure to manage the inherent complexity of software. There are several cross-development tool chains for writing Commodore programs in C, and these can also be used to write programs for the MEGA65. In this issue, we will try using one of these tools to write our first C program, and walk through the concepts involved. We’ll look briefly at other tools and resources for getting started with C programming for microcomputers. And we’ll consider a much newer language vying to be the C of the next fifty years, called Rust.
Featured Files
We’ve been talking about coding tools lately, so let’s do an all-tools edition of Featured Files!
The Coffeebreak Compiler by TOS22. Matthias wanted a comfortable way to hack on assembly language code directly on his MEGA65 that he could extend with modern conveniences. The result is a lightly structured custom language and IDE that accepts inline assembly code and produces machine code programs, similar to ubik’s Eleven for BASIC. Press the Help key to browse the built-in documentation and tutorials.
MEGA65-Forth alpha by carthibar. MEGA65-Forth is a project to build an interpreter and interactive environment of the Forth programming language for the MEGA65. This is an early release with some features still in progress, but enough has been implemented to start learning the language. It is based on FIG Forth 1.1. Check out the source code and release notes on Github, as well as the Getting Started documentation.
bf65 by dddaaannn (that’s me!). The experimental minimalist programming language known as Brainfuck was invented by Urban Müller in 1993 as an attempt to design a language with as small a compiler as possible that is still Turing complete. The language is famous for its funny name and for having only eight parameterless instructions. I was motivated to write bf65 when I had the idea to take advantage of the MEGA65’s built-in BASIC editor as the editor for bf65 programs, including the ability to combine BASIC and bf65 code in the same program. The D81 disk image includes the bf65 interpreter and several demo programs. See documentation and source code on Github.
C64 for MEGA65 V5 is official!
We’ve mentioned the Commodore 64 core for the MEGA65 a few times in this Digest, and with good reason. For many owners, the potential of the MEGA65 to be an FPGA-based multi-Commodore platform is a major attraction, a chip-level recreation of an entire line of vintage computers that share a non-PC keyboard layout, made from all-new components and modern hardware conveniences. The Commodore 64 core by MJoergen and sy2002 is a huge step toward this potential.
C64 for MEGA65 version 5 is now officially released. For installation instructions, see the May issue of the Digest, and the excellent documentation.
Don’t miss the epic video trailer for the new release!
MEGAVision video now available
The first ever MEGAVision live streaming event took place on July 1st and was a huge success! Hosted by RetroCombs, Gurce, and lydon, MEGAVision featured nine short talks about MEGA65 projects. Watch the replay online, then visit the MEGAVision wiki page for links to resources for everything discussed in the presentation. I was happy to present my bf65 toy language interpreter at the event.
This was so successful that we’re excited to do another one sometime soon. Start thinking about topics you’d like to know more about, or topics you’d maybe like to talk about at the next MEGAVision. If you have an idea for a talk, send a message to Gurce or lydon on Discord.
Huge thanks to the organizers, to everyone who gave a talk, and to everyone who attended!
R4 mainboard announced
The MEGA65 project continues to invest in improving every aspect of our favorite computer. On June 30, Paul posted an in-depth article to his developer blog describing some of the recent improvements made to the design of the MEGA65 main board. The next revision of the main board, known as R4, will start shipping with the next delivery batch later this year.
Several of the improvements in the R4 board are electrical. There’s a fix for an HDMI back-power issue, improved audio quality from the 3.5 mm jack, a new Real-Time Clock (RTC) unit, and improved RF shielding for the major components. The design has also been reworked to use one fewer FPGA chip. These changes resolve some minor issues, but they otherwise do not affect the operation of the computer.
Two changes enhance the capability of the system slightly. There is now additional RAM hardware that can be used by FPGA cores, enabling porting of a larger variety of MiSTer cores in the future. Also, the joystick ports are now bi-directional, a capability used by a few peripherals like the Protovision Protopad and the Commodore version of the AtariLab science kit that I had as a kid.
Nothing about the MEGA65 platform changes with this revision. Any program written for the MEGA65 will run on both R3 and R4 boards. While the RAM hardware is intended to enable broader development of alternate cores, no core currently exists that needs it, and it’ll be a while before one does. Moreover, future cores that don’t need the extra RAM are unlikely to use it, so that they will work on all MEGA65s.
Nevertheless, it’s natural to wonder: will MEGA65 owners with the R3 main board be able to buy the R4 board as a replacement? Hopefully someday! The first priority is to fulfill all of the remaining preorders for complete MEGA65s. Beyond that, it’s not obvious that Trenz Electronic can feasibly stock spare parts, especially pre-assembled main boards with expensive FPGAs. So far, preorders have been fulfilled in batches, and it’ll take time to figure out the costs, risks, and logistics of switching to an in-stock distribution model. I know I’d love to be able to someday buy MEGA65 parts for upgrades, repairs, and building new machines with custom configurations.
For now, I’m excited by how the R4 main board is an investment in the MEGA65’s continued success. More than anything else, it means we will all get more out of our computers, no matter which version we have.
A single-file C program
A compiler is a tool that takes a computer program written in some programming language and generates an equivalent program for the target platform’s machine code. You can write a program in the C programming language, then use a C compiler to generate the equivalent machine code program for the MEGA65.
As with other development tools, a compiler could be a program that runs on the target device directly, like how Mega Assembler runs on the MEGA65. There is not yet a C compiler that runs natively on the MEGA65, but it is possible to use a cross development workflow as we did in the previous issue of the Digest. You create a source file on your PC, then use a cross compiler to generate a MEGA65 PRG file.
As much as I want to, I can’t fit a full C language tutorial in this Digest. Nevertheless, let’s start with a small C program, so we have something to play with:
#include <stdio.h>

int main() {
    puts("IT'S A C PROGRAM!\n");
}
Put this in a file named hello.c. This is our source file.
Most C programs consist of multiple source files, with filenames ending in .c for definitions and .h for declarations (“header” files). Exactly one .c file defines a function named main(), which is where the program begins. A source file can #include header files to learn about functions provided by other modules. In this case, #include <stdio.h> brings in the declarations from a module of the C standard library, including a declaration for the puts() function used by the main() function. Refer to any good book on C programming for an explanation of how C programs are organized.
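To make the split between declarations and definitions concrete, here is a minimal sketch. The function next_border_color() and the file names in the comments are hypothetical, invented for illustration. In a real project the declaration would sit in a header file and the definition in a source file; they appear together here so the example fits in one listing.

```c
#include <stdint.h>

/* What would live in a header file such as border.h (hypothetical name).
   A declaration tells callers the function's name, arguments, and result. */
uint8_t next_border_color(uint8_t current);

/* What would live in a source file such as border.c (hypothetical name).
   The definition supplies the actual code. Colors wrap around after 15. */
uint8_t next_border_color(uint8_t current) {
    return (uint8_t)((current + 1) & 0x0F);
}
```

Any other source file that includes the header can call next_border_color() without seeing its definition. The linker connects the call to the code.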
An ideal compiler targeting the MEGA65 would know all about the inner workings of the MEGA65’s CPU, so it can use all of the CPU’s features in its machine code. So far, nobody has written a C compiler that targets the MEGA65 specifically. Instead, we’ll take advantage of the MEGA65’s Commodore lineage and use a cross compiler that targets earlier Commodore computers. Several Commodore cross compilers can be adjusted to generate programs that run on the MEGA65.
Installing llvm-mos
llvm-mos by Daniel Thornburgh and John Byrd is a cross compiler toolchain for target platforms based on the 6502 CPU. It provides a version of Clang for compiling C programs. Thanks to Mikael Lund (wombat on the Discord), llvm-mos supports building programs for the MEGA65. It supports generating code for CPUs in the 6502 lineage up to the 65C02, which will run fine on a MEGA65.
Visit the llvm-mos Github page, then scroll down to find the download link for your PC’s operating system. As usual, macOS requires an additional step to allow unsigned command line tools to run. A complete procedure for macOS users:
- Download llvm-mos for macOS.
- Unpack the archive and remove the macOS quarantine attribute from the tools:
  tar xJf llvm-mos-macos.tar.xz
  xattr -d com.apple.quarantine llvm-mos/bin/*
  Ignore the list of “No such xattr” warnings.
- Verify that the mos-mega65-clang command line tool works:
  ./llvm-mos/bin/mos-mega65-clang --help
As noted in the llvm-mos documentation, you can add the bin/ folder to your command path for convenient access. macOS users, heed the warning that doing so may conflict with other versions of the Clang compiler you may have installed. The Apple Xcode Command Line Tools provide another version of Clang, and you may have this installed if you use Homebrew even if you don’t otherwise use Xcode. You do not need to have llvm-mos on your command path to use it: simply provide the full path to the llvm-mos commands when using them.
You can compile the hello.c example program with the following command:
./llvm-mos/bin/mos-mega65-clang -Wall -Os -o hello.prg hello.c
This produces the file hello.prg that you can run in Xemu or on your MEGA65, as we discussed in the previous issue. It loads to the start of BASIC memory with a BASIC bootstrap header, similar to the one we used for our assembly language program.
The equivalent program
So far, a compiler sounds a lot like an assembler: it takes a source file and produces a PRG file that does what we want. What’s different is how the tool decides what machine code to generate. When we write a program in assembly language, we must say exactly which machine code instructions the target machine’s CPU should execute. Assembly language mnemonics like LDA represent machine code instructions in a somewhat-human-readable syntax, and the assembler translates the assembly language to machine code, one instruction at a time.
When you write a program in a compiled language, the compiler reads through your program, then decides which machine code instructions will do what you intend. You can ask llvm-mos to show you the assembly language program that it came up with for your C program with this command:
./llvm-mos/bin/mos-mega65-clang -Os -o hello.s -Wl,--lto-emit-asm hello.c
This generates an assembly language listing as the file hello.s. Open this in a text editor to see the result. It is quite large because it includes the code for the puts() function from the standard library.
To get a better idea of what compilers can do, consider a simpler program that doesn’t involve a call to a large function:
int square(int num) {
    return num * num;
}

int main(void) {
    return square(5);
}
There’s a website called Compiler Explorer, aka “godbolt.org,” that can show C code and the compiled assembly code side by side directly in your browser. It supports many compilers and target CPUs, including llvm-mos and the MEGA65, so you can see the result in 6502 assembly language.
Follow this link to see one possible version of square() and main() produced from this C code.
In this example, you can see how llvm-mos converts the square() function into several dozen instructions. The labels that begin with two underscores, such as __rc0, refer to addresses in the base page that llvm-mos sets up to be used as temporary variables by the generated code. Later in the listing is the implementation for the main() function that sets up the argument (5), calls the square() function, and ends with the result in a variable. The C language requires that the main() routine returns a number, which is used by some operating systems as a result code from a program. The MEGA65 just ignores this return value.
But that’s just one possible assembly language program. Check out this version with the -Os flag enabled.
Given the -Os flag, llvm-mos generates different assembly language code for the same C program. It’s much shorter and much faster than the previous version. How can a compiler look at the same program and produce two drastically different results?
In the first version, llvm-mos took our C program literally: we wrote it as two functions, so it generated code for two functions. In this new version, llvm-mos studied the C source code, realized that this program always returns the number 25, so it produced machine code that returns the number 25. Both of these assembly language programs are equivalent to the original C program: they both return 25.
An optimizing compiler uses many complex techniques to calculate the smallest and fastest machine code possible that is equivalent to your original program. The -Os flag asks llvm-mos to optimize with a preference for small code size. Optimization is typically smart enough that you can leave it enabled for nearly all purposes. The result isn’t always minimal: the optimized version still contains a few instructions for the square() function even though main() never calls them, most likely because a function that isn’t declared static has to remain available to code in other source files. Even so, it’s a huge benefit.
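One experiment worth trying in Compiler Explorer is to mark the helper function static. This is a sketch of the idea, not a statement about llvm-mos internals, and I have not verified the exact listing llvm-mos produces for it. In standard C, a static function is visible only within its own source file, so once every call has been folded into a constant the optimizer is free to drop the stand-alone copy. The name compute() is a hypothetical stand-in for main() so the fragment can be pasted into a larger program.

```c
/* static: visible only inside this source file, so the optimizer may
   discard the stand-alone copy once every call has been folded away. */
static int square(int num) {
    return num * num;
}

/* Hypothetical stand-in for main(). With optimization enabled, a compiler
   can reduce this to "return 25" without calling square() at all. */
int compute(void) {
    return square(5);
}
```

Whatever code the compiler chooses to generate, the result must be equivalent to the source: compute() returns 25.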
Just as modern chess computers can play chess better than any human, modern compilers can write machine code so efficient that it’s almost not worth writing in assembly language at all. We still write assembly language programs for the same reason we still play chess: for fun.
CMake, mega65-libc, and mega65-llvm-template
Running the mos-mega65-clang command is fine enough for simple one-file experiments. When your project grows to multiple source files or involves third-party libraries, you will want a build management tool. CMake and Ninja are popular with the modern C crowd. (I still use GNU Make, but I’m old school.)
One third-party library you’ll want to consider early on is mega65-libc, the MEGA65 project’s own set of utility functions. This library provides functions that access files from a disk or the SD card, access memory, power a console-like user interface, and more. The library is useful enough for things like debugging that you pretty much always want to include it in your project, even if you don’t intend to use most of it. The compiler will only generate code for the functions that you use.
Thanks to kibo, there’s an easy way to get everything you need to set up your MEGA65 C project, including CMake and mega65-libc. mega65-llvm-template is a Github project template that you can either use to create a new Github repository, or just download to try it out. See the template’s documentation for instructions.
MEGA65 C tips and tricks
Before you dive into a good book on the C language, it’s important to note that llvm-mos has a few limitations when it comes to standard C. For starters, it does not support the float or double fractional number types. The 6502 CPU does not have built-in support for floating point math, and llvm-mos does not provide a floating point math library. If you’re ambitious, you could consider third-party assembly language routines for this purpose, such as “Floating Point Routines for the 6502” by Roy Rankin and Steve Wozniak, from the magazine Dr. Dobb’s Journal in August 1976.
llvm-mos provides a stripped-down version of the C standard library implemented for Commodore computers, using C64-style KERNAL calls for things like printing messages. The implementation is incomplete, and some functions may not work correctly on the MEGA65. It’s worth trying, just don’t expect everything to work.
If your C program returns from main(), either with the return statement or by allowing control to reach the end of the function, the program llvm-mos generates will attempt to return to BASIC. This doesn’t always work with the MEGA65 KERNAL. I sometimes get weird display artifacts, false BASIC errors, or crashes when returning from main(). Consider ending your program with an infinite loop (while(1);) and not relying on exiting to BASIC.
In C, character values ('x') and string values ("xyz") are treated as ASCII byte values. These are neither PETSCII values nor Commodore screen codes, so you will need to account for this when defining and printing text messages to the screen. Using uppercase ASCII letters in your C source code along with the MEGA65’s default uppercase character mode will print uppercase letters.
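If you write characters directly into screen memory instead of printing them, you need screen codes, not ASCII. The following is a minimal sketch of one way to convert, assuming the default uppercase character set. The function name ascii_to_screencode() is hypothetical and not part of llvm-mos or mega65-libc. It relies on the standard Commodore layout in which screen codes 1 to 26 are the letters A to Z, and in which space, digits, and most punctuation share their ASCII values.

```c
#include <stdint.h>

/* Convert one ASCII character to a Commodore screen code for the default
   uppercase character set. Hypothetical helper, not part of any library. */
uint8_t ascii_to_screencode(char c) {
    uint8_t b = (uint8_t)c;
    if (b >= 'a' && b <= 'z') {
        b = (uint8_t)(b - 0x20);    /* fold lowercase to uppercase */
    }
    if (b >= 0x40 && b <= 0x5F) {
        return (uint8_t)(b - 0x40); /* '@' -> 0, 'A'..'Z' -> 1..26 */
    }
    if (b >= 0x20 && b <= 0x3F) {
        return b;                   /* space, digits, punctuation match */
    }
    return 0x20;                    /* anything else: show a space */
}
```

A few ASCII symbols in the 0x5B to 0x5F range land on different glyphs in the Commodore character set (a backslash becomes a pound sign, for example), so treat this as a starting point, not a complete mapping.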
If you’re writing a game (or pretty much anything, honestly), your program will need to access registers and memory addresses directly, similar to POKE and PEEK from BASIC, or lda and sta in assembly language. This is possible in C, taking care to use C’s pointer dereferencing features correctly. Use an appropriate value type for bytes (such as uint8_t from stdint.h), and be sure to refer to an I/O register as volatile to prevent the optimizer from taking shortcuts. mega65-libc’s memory.h provides useful macros and functions for accessing registers, so it looks cleaner in your code and you don’t have to remember the details:
#include <stdint.h>
// From mega65-libc:
#include <memory.h>
...
POKE(0xD020, 5);
uint8_t joy2 = PEEK(0xDC00);
// Without mega65-libc:
(*(volatile uint8_t *)(0xD020)) = 5;
uint8_t joy2 = (*(volatile uint8_t *)(0xDC00));
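Registers often pack several settings into one byte, so a common next step is changing individual bits without disturbing the rest. Here is a minimal sketch of that read-modify-write pattern using volatile pointers. The helper names are hypothetical and are not part of mega65-libc.

```c
#include <stdint.h>

/* Set every bit of mask in the byte at reg, leaving other bits alone. */
void reg_set_bits(volatile uint8_t *reg, uint8_t mask) {
    *reg = (uint8_t)(*reg | mask);
}

/* Clear every bit of mask in the byte at reg, leaving other bits alone. */
void reg_clear_bits(volatile uint8_t *reg, uint8_t mask) {
    *reg = (uint8_t)(*reg & (uint8_t)~mask);
}

/* Return nonzero if any bit of mask is set in the byte at reg. */
uint8_t reg_test_bits(volatile uint8_t *reg, uint8_t mask) {
    return (uint8_t)(*reg & mask);
}
```

On the MEGA65, the first argument would be a cast address such as (volatile uint8_t *)0xDC00. Because the pointer is volatile, the compiler performs every read and write as written and does not cache the value or skip an access that looks redundant.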
Programs generated by llvm-mos start by unmapping the BASIC ROM to make more room for program memory. The program code starts at address $2001, including the BASIC bootstrap program. The space beyond the end of the program up to address $CFFF is used for memory managed by the C language, a total of 44 kilobytes for both program and memory. Memory allocated by the program using the C standard library malloc() function, a mechanism known as the heap, starts just after the program and “grows upward” toward higher addresses. The stack, used for keeping track of function calls and local variables, ends at address $CFFF and “grows downward” toward lower addresses. The I/O registers remain in $D000-$DFFF, and the KERNAL ROM in $E000-$FFFF.
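The 44 kilobyte figure follows directly from the two addresses. As a small worked example, the arithmetic can be written in C. The constant and function names here are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

#define PROGRAM_START 0x2001u  /* first byte of the program, per the text */
#define MANAGED_END   0xCFFFu  /* last byte below the I/O registers */

/* Bytes available for program, heap, and stack combined. The range is
   inclusive at both ends, hence the + 1. */
uint16_t managed_bytes(void) {
    return (uint16_t)(MANAGED_END - PROGRAM_START + 1u);
}

/* Bytes left over for heap and stack, given the address of the last byte
   occupied by the program. */
uint16_t free_bytes(uint16_t program_end) {
    return (uint16_t)(MANAGED_END - program_end);
}
```

managed_bytes() works out to 45,055 bytes, one byte short of 44 × 1024. A program that ends at $2FFF would leave $A000 bytes (40 kilobytes) for the heap and stack to share.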
Sometimes you’ll want the ability to write assembly language code inside your C code. Clang supports the GCC inline assembly syntax, expecting to output 6502 assembly language for llvm-mos’s assembler. This requires giving the optimizer some hints about what the assembly code does, so it can make decisions about using CPU registers and relocating code. The following example from mega65-libc takes an unsigned char variable from the C code and calls a “Hypervisor trap” with a bit of assembly language, informing the optimizer that this code must be executed in this location (volatile) and that it uses the accumulator ("a"). Notice how the assembly language text is a single C string with newline delimiters (\n).
unsigned char the_char;

void debug_msg(char* m)
{
    // Write debug message to serial monitor
    while (*m) {
        the_char = *m;
        asm volatile("lda the_char\n"
                     "sta $d643\n"
                     "clv" ::: "a");
        m++;
    }
    // ...
}
A sophisticated program might want to install interrupt handlers that call C functions, such as a raster interrupt handler that is called once per frame. This is an advanced technique that requires disabling interrupts, setting the address of the function, and updating the MAP register to disable the KERNAL. I won’t include a complete example here, but here’s one way to get the address of a C function as two bytes. The used attribute asks the compiler to keep the function even if it isn’t called by the rest of the program.
#include <stdint.h>

__attribute__((used)) void do_frame(void) {
    // ...
}
// ...
unsigned char addrlow = (uintptr_t)do_frame & 0xff;
unsigned char addrhigh = ((uintptr_t)do_frame >> 8) & 0xff;
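The masking and shifting in that snippet is a general pattern for any 16-bit address, since the 6502 family stores addresses as a low byte followed by a high byte. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Low 8 bits of a 16-bit address. */
uint8_t addr_low(uint16_t addr) {
    return (uint8_t)(addr & 0xFF);
}

/* High 8 bits of a 16-bit address. */
uint8_t addr_high(uint16_t addr) {
    return (uint8_t)(addr >> 8);
}

/* Rebuild the address from its two bytes. */
uint16_t addr_join(uint8_t low, uint8_t high) {
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

For the address $2001, the low byte is $01 and the high byte is $20. A vector in memory holds them in that order.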
Enable compiler warnings. The C language gives you plenty of opportunity to make mistakes, and Clang is very smart about noticing potential issues in your code. The command line flags -Wall -Wextra -Wpedantic tell the compiler to go crazy with the constructive feedback. To enable this using the mega65-llvm-template, add this to the file CMakeLists.txt:
target_compile_options(hello.prg PRIVATE
-Wall -Wextra -Wpedantic
)
Learning more about C
The most famous book for learning the C programming language is The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie. Ritchie invented the C language, and this concise book is a computer science classic.
My favorite book for learning C is C Programming: A Modern Approach, 2nd edition, by K. N. King. It’s excellent for self-learning with high quality and well tested examples and exercises. I’ve literally read this book cover to cover and have done nearly all of the exercises, and I still go back to it to remind myself of language intricacies.
I wouldn’t be surprised if many of the examples from these books worked on the MEGA65, but I would be surprised if they all did. If you’re new to the C language, I would recommend starting with a C compiler for a modern computer first: Xcode for macOS, GNU C for Linux, Visual Studio Community for Windows. You will only need a subset of C’s features to write substantial programs for the MEGA65.
There are several other C compilers that work reasonably well for MEGA65 development. cc65 is popular with C64 programmers and has a long history. It is used by the MEGA65 project in many places. KickC is another, using the venerable Kick Assembler for generating machine code. Calypsi by hth313 is a C and assembly development toolkit for several vintage computers. Calypsi’s MEGA65 support is in early stages, but it already supports 45GS02 instructions, 64-bit integers (long long), and floating point math.
I chose llvm-mos for this Digest because it takes advantage of the LLVM ecosystem to have the most complete implementation of the C language and the best optimizer. With a built-in MEGA65 configuration, it’s also just the easiest to get running for MEGA65 projects.
Other llvm-mos tools
The LLVM project reinvents the way cross compilers are built: instead of a single compiler dedicated to a single language and target system, LLVM is divided into separate layers for the language, the optimizer, and the target. llvm-mos is a target backend for 6502-based computers. Because it plugs into LLVM, it can power compilers for multiple languages, use the state-of-the-art optimizer from the LLVM core, and take advantage of a suite of cutting edge tools.
The llvm-mos package you just installed includes not only the C compiler called Clang, but also a C++ compiler Clang++. Start writing C++ programs for your MEGA65 today with the mos-mega65-clang++ command!
The clang-tidy command is a tool known as a static analyzer. It can examine your C code, find bugs, and recommend improvements. Programming IDEs like Visual Studio Code can be configured to run such tools automatically to highlight problems as you are coding. Tell VSCode to use the language server included with llvm-mos, called clangd.
You can find all of these and more in the bin/ folder of llvm-mos.
Rust
Take everything you’ve just learned about C and throw it in the bin! The new hotness that all the kids are on about is Rust.
#[start]
fn _main(_argc: isize, _argv: *const *const u8) -> isize {
    let mut rng = sid::SIDRng::new(c64::sid());
    for offset in 0..80 * 25 {
        let character = [77u8, 78u8].choose(&mut rng).copied().unwrap();
        unsafe {
            mega65::DEFAULT_SCREEN
                .add(offset)
                .write_volatile(character)
        };
    }
    println!("HELLO MEGA65 FROM RUST");
    0
}
Rust is an important new language that intends to be just as powerful as C while making it easier to write programs that are more secure and have fewer bugs. I won’t go into detail on Rust for this Digest—I’m still learning it myself—but I wanted to take a moment to highlight the work wombat has done to bring Rust programming to the MEGA65. See wombat’s MEGAVision talk for a brief video overview.
rust-mos by Mariusz Krynski is a version of the Rust compiler customized for Commodore microcomputers. It requires llvm-mos, with modified installation instructions. Setting up the tool chain is somewhat involved, so read the README file carefully, or consider using a pre-made Docker image if you’re familiar with how to do that. wombat’s mos-hardware is a Rust “crate” with libraries and examples for interfacing with MEGA65 hardware. There’s also a project template: mos-hardware-template.
For learning the Rust language, start with Learn Rust, the official online book and other resources. We’re still figuring out the best way to write Rust programs for the MEGA65, so be sure to join us in the #rust channel on the MEGA65 Discord.
Compared to assembly language, languages like C and Rust provide abstractions that make programs easier to write and understand. They are essential tools for producing large programs whose complexity can be difficult or expensive to manage. Even without perfect compilers that take full advantage of the MEGA65 CPU and memory system, the tools currently available provide plenty of power for MEGA65 development. I still try to write in assembly language occasionally to feel a sense of camaraderie with early programmers that didn’t have these tools, but it’s good to know that they’re just an arm’s reach away.
Happy coding!
— Dan