Monday 22 January 2024

Switching from CC65 to LLVM-MOS for the MEGA65 browser

Up until now, I have been using CC65 for writing the MEGA65 browser. But while CC65 has been a thing of tremendous value for the 6502 community, as one of the only freely available and robust C compilers that can target the 6502, it produces code that is quite large, and quite slow. This was becoming critical for GRAZE, the browser-like thing I am writing for the MEGA65.  I've already turned to using overlays, but even so, the TCP/IP stack alone is about 35KB when compiled using CC65, which doesn't leave a great deal of free space in the first 64KB for the rest of the code, and CC65 doesn't natively support splitting code over multiple segments, so far as I can tell.

Recently, however, a competent port of the LLVM compiler system to target the 6502 has been created. This has really changed the lay of the land for three key reasons in my view:

1. It supports almost all C, C++ and Rust syntax and features in a totally "normal" way, e.g., int is 16 bits, long is 32 bits, and so on;

2. It almost always produces much better and smaller code than CC65 did, which means I can fit more into a given program;

and perhaps almost as importantly as (2):

3. It brings LLVM's fantastic error and warning messages and static analysis to the table. This has already let me trivially fix four sneaky bugs in GRAZE, and that was just during initial porting!

So let's talk about the initial porting process.

I started by modifying the Makefile of GRAZE to support either CC65 or LLVM-mos:

#CC65=  cc65
#LD65=  ld65 -t none
#CL65=  cl65 --config src/tests/vicii.cfg
#MAPFILE=       --mapfile $*.map
#HELPERS=       src/helper-cc65.s

CC65=   llvm-mos/bin/mos-c64-clang -mcpu=mos45gs02
LD65=   llvm-mos/bin/ld.lld
CL65=   llvm-mos/bin/mos-c64-clang -DLLVM -mcpu=mos45gs02
HELPERS=        src/helper-llvm.c

As the MEGA65 libc has also advanced considerably since I last worked on GRAZE, I also reworked how I pull that in:

M65LIBC_INC=-I $(SRCDIR)/mega65-libc/include
M65LIBC_SRCS=$(wildcard $(SRCDIR)/mega65-libc/src/*.c) $(wildcard $(SRCDIR)/mega65-libc/src/$(COMPILER)/*.c) $(wildcard $(SRCDIR)/mega65-libc/src/$(COMPILER)/*.s)
CL65+=-I include $(M65LIBC_INC)

This let me refactor the Makefile rules to be much simpler and shorter:

fetchh65.prg:       $(TCPSRCS) src/fetchh65.c $(HELPERS) include/shared_state.h
        git submodule init
        git submodule update
        $(CL65) -O -o $@ $(MAPFILE) $(TCPSRCS) src/fetchh65.c  $(M65LIBC_SRCS) $(HELPERS)

Apart from that, it's just been a process of fixing a few places where CC65 would automatically promote types (or perhaps wasn't, and was generating silent bugs), making sure that the main() function in each program returns int instead of void, and a few other "normalisations" of the code. This was not particularly hard, but it did take a little while to go through all the warnings and errors that LLVM was (quite correctly) throwing, and to figure out how to fix them in a way that CC65 would still be able to handle.
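As a hedged illustration of the kind of "normalisation" involved (this is not taken from the GRAZE source): when int is only 16 bits wide, as it is on CC65, shifting a byte by 16 or 24 places needs an explicit cast to a 32-bit type first. The helper name make_u32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from four bytes.
 * Each byte is cast to uint32_t *before* shifting, so the result does
 * not depend on how wide the compiler's int happens to be. */
static uint32_t make_u32(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8)  |  (uint32_t)b0;
}
```

Written this way, the same source should behave identically under both compilers, which is the property the porting work was aiming for.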

In the process, I had to fork off the helper assembly routines that are used to make some hypervisor calls, all used to load the overlays.  These are a bit more involved to port, because I don't yet know how to handle C calling convention argument passing into assembly routines.  Probably what I will do is make them C functions with inline assembly blocks for the bits that can't be done in C.

This leads into the related area of all the POKEs and PEEKs that the GRAZE code does to IO registers.  Because LLVM is so smart about optimising, it is liable to optimise these (and any in-line assembly) away, unless it understands that they are vital.  For POKE and PEEK, I can probably deal with this by making them volatile. POKE and PEEK have already been fixed upstream in mega65-libc, which is good.
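A minimal sketch of what volatile accessors can look like. The macro names POKE8 and PEEK8 are made up here to avoid implying anything about the actual mega65-libc definitions, which may differ.

```c
#include <stdint.h>

/* Hypothetical stand-ins for POKE/PEEK. Going through a pointer to
 * volatile tells the compiler that every access has a side effect, so
 * it may not drop, merge or reorder these reads and writes relative to
 * each other. */
#define POKE8(addr, val) \
    (*(volatile uint8_t *)(uintptr_t)(addr) = (uint8_t)(val))
#define PEEK8(addr) \
    (*(volatile uint8_t *)(uintptr_t)(addr))
```

On the MEGA65 the address would be an IO register such as $D640; on a host machine the macros can be exercised by passing the address of an ordinary variable.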

Actually, lcopy() and lfill() might also need some special treatment, as they modify memory contents in ways that LLVM doesn't know are happening.  I'm not yet sure what will be needed, if anything.
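One possible treatment, sketched under assumptions: a compiler barrier after the copy, using the GCC/Clang inline-assembly "memory" clobber. Whether lcopy() and lfill() actually need this is not established here. The function dma_copy_stub is a hypothetical stand-in so that the sketch runs on a host machine; it is not the real lcopy().

```c
#include <stdint.h>
#include <string.h>

/* An empty asm statement with a "memory" clobber: it emits no code, but
 * the compiler must assume that any memory may have changed, so values
 * cached in registers are re-read afterwards. GCC/Clang extension. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Hypothetical stand-in for a DMA copy such as lcopy(). */
static void dma_copy_stub(const uint8_t *src, uint8_t *dst, uint16_t count)
{
    memcpy(dst, src, count);
}

/* Copy, then force the compiler to re-read dst from memory. */
static uint8_t copy_then_read(const uint8_t *src, uint8_t *dst, uint16_t count)
{
    dma_copy_stub(src, dst, count);
    COMPILER_BARRIER();
    return dst[0];
}
```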

For the helper routines, it's a bit more complex. There is some helpful information on the LLVM-mos website about this, which in turn does document the C calling convention in LLVM-mos. So it looks like the arguments are passed using imaginary registers in zero-page, which I assume I can just refer to by name. Specifically rc2 and rc3 should hold the pointer to the first argument, if it is a pointer to char.

A bit of a set-back: I thought I had pulled the latest changes to the development branch, but hadn't, so I had to reapply them all, which took a little while. In the process, I can see that the head of development now properly compiles the mega65 libc as a library, rather than just compiling the source files.  It turns out that CC65 produces smaller binaries this way, because it won't exclude unused functions that are compiled directly from source files, but when they come from a library it only includes the functions that are actually used.  With that change, CC65 now typically produces smaller binaries -- although I am suspecting that it might be a bit of a mirage. I think LLVM might include global symbols in the binary, while CC65 does not.

Yes, this is indeed what happens: If I declare a global array in a program, when I compile it with LLVM, the binary gets bigger.  All in all, this means that LLVM is probably still producing code that is smaller than CC65's, but it's a bit harder to actually measure the difference. And LLVM still has way better warnings and errors and things, that make development easier, and bugs less likely to go uncaught. So I'm still planning to continue with LLVM.

Now, onto implementing the helper functions.  Having found the calling convention documentation for LLVM-mos, I've swung back to deciding to keep the helpers purely in assembly.  With a little work, I have them compiling -- the question is whether they work correctly, and whether I have interpreted the calling convention correctly.  I can check that by modifying the helper routines to infinite-loop on entry, so that I can check that the arguments are being passed in how I expect.

So let's start by doing that to the read_file_from_sdcard(char *filename, uint32_t load_address) function. This one has the most complicated call signature, and one that does not have a direct equivalent in the LLVM-mos calling convention documentation.  I believe that it will put the pointer to filename in rc2 and rc3, and the 32-bit value in A, X, rc4 and rc5. So let's see if that's the case.  Yes, that is what I am seeing: A and X have the bottom 16 bits. After fixing a couple of bugs, I am now able to use read_file_from_sdcard() to load the font file.  This routine is also the core of the mega65_dos_exechelper() function, which is the next one to test.
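To make the expected register contents concrete, here is a small sketch of how a 32-bit argument would be split under the reading of the calling convention described above (low byte in A, then X, rc4 and rc5). The struct and function names are made up for illustration, and the rc4/rc5 placement is the assumption under test, not a quote from the LLVM-mos documentation.

```c
#include <stdint.h>

/* Hypothetical view of where each byte of a uint32_t argument would
 * land, following the assumption stated in the text. */
struct arg_bytes {
    uint8_t a;    /* bits 0-7,   expected in the A register */
    uint8_t x;    /* bits 8-15,  expected in the X register */
    uint8_t rc4;  /* bits 16-23, assumed imaginary register rc4 */
    uint8_t rc5;  /* bits 24-31, assumed imaginary register rc5 */
};

static struct arg_bytes split_u32(uint32_t v)
{
    struct arg_bytes r;
    r.a   = (uint8_t)(v & 0xff);
    r.x   = (uint8_t)((v >> 8) & 0xff);
    r.rc4 = (uint8_t)((v >> 16) & 0xff);
    r.rc5 = (uint8_t)((v >> 24) & 0xff);
    return r;
}
```

Calling the helper with a recognisable load address such as 0x12345678 and then inspecting the registers from an infinite loop at the routine's entry makes any mismatch obvious.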

That might be working, but I've been waylaid by a problem with DHCP, which is probably due to changing our home internet router between the last time I worked on this and now.  The new router is a fancy Telstra 5G one, and I've seen it be a bit picky in the past, so I'm guessing that's what is going on now, as well.

Actually, it might be an indirect result of that, as the way my MEGA65 connects to the WiFi is, shall we say, a little odd: It goes through a dumb ethernet switch (so far, so sensible), and then to a 40cm Ubiquiti PowerBeam to make the jump from the office to the lounge room, where the router is.  That's a total distance of about 8 metres, somewhat less than the approximately 80,000 metres that the PowerBeam is rated to ;) As a result, the signal strength is about -23dB, instead of the more typical -65dB to -90dB you would see on a wireless link. On the up-side, I should never see a slow link due to WiFi signal strength, since I have 18dB or better of gain.  

What I think is going wrong, is that the Ubiquiti dish is probably locked to the MAC address of our old WiFi router. So I will need to reconfigure it.  I've logged in, and pointed it at the access point, and it seems to connect briefly, before disconnecting. I'm not yet sure which end is disconnecting the other.  This is apparently a known issue with these units, and the solutions seem to largely consist of a lot of rebooting and reconfiguring the AP and PowerBeam until it magically starts working again.  I'm going to start by updating the firmware on the PowerBeam, which will also reboot it, and see if that doesn't solve it. Nope. Next stop, 15 seconds without power.

Well, that's 4 hours of my life I'll never get back.  Basically the PowerBeam won't talk to the Telstra router, so far as I can tell. After considering and trying a whole pile of alternative approaches, I have connected the MEGA65 to the ethernet port of my build box, and set that up to share internet to the ethernet port from its WiFi connection.  That all works, and ethtest.prg is now able to get an IP address by DHCP, so it looks like that's working.

However the mega65_dos_loadhelper() routine doesn't seem to be working properly.  So on to debugging that. I think the problem here is that LLVM-mos doesn't use a stable SYS entry point for programs it compiles. The range I have seen is like this:

bbs-client.prg:      CBM BASIC, SYS 2065
ethtest.prg:         CBM BASIC, SYS 2068
graze.prg:           CBM BASIC, SYS 2070
grazeerr.prg:        CBM BASIC, SYS 2072
grazeh65.prg:        CBM BASIC, SYS 2072
grazem.prg:          CBM BASIC, SYS 2072
haustierbegriff.prg: CBM BASIC, SYS 2065

So between 2065 and 2072, a range of 8 bytes, inclusive.  I'm guessing LLVM-mos is being clever and making use of the bytes of the BASIC header to match initial values of global variables or some other skullduggery.  

The frustrating part is that I can't for the life of me find where LLVM-mos generates the BASIC header. I can see where the configuration for C64 binaries is when generating a binary. This would suggest that there is a REGION_INIT section that gets built containing the SYS etc. However, that file is in a directory of tests, rather than wherever LLVM-mos generates it.  Ah, found it! It's in the llvm-mos-sdk repository instead. This suggests that the symbol _start gets set with the start address.

Anyway, I could chase my way further down this rabbit hole, or just pragmatically deal with it, by reading the last digit of the SYS address, and using that to determine the correct start address. Ideally I'd parse the entire address, but that's even more annoying, as then I will need a multiply-by-10 routine for the decimal and do all the shuffling about required for that.  But, it's probably still the best way. This is the routine that does it in the end, which is a bit easier on the MEGA65 since the 65CE02 added nice instructions like ASW that shift a 16-bit value left in one instruction.

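process_digit:
    ;; Fetch the next character of the SYS address from the BASIC header
    lda $0800,x
    ;; Stop at the first character that is not an ASCII digit ($30-$39)
    cmp #$3a
    bcs got_digits
    cmp #$30
    bcc got_digits

    ;; Multiply accumulated value by 10

    ;; multiply by 2
    asw $0101
    ;; stash in $0103-$0104
    lda $0101
    sta $0103
    lda $0102
    sta $0104
    ;; multiply by 4, to get x8
    asw $0101
    asw $0101

    ;; Now add the x2 value
    clc
    lda $0101
    adc $0103
    sta $0101
    lda $0102
    adc $0104
    sta $0102

    ;; Now add the digit
    clc
    lda $0800,x
    and #$0f
    adc $0101
    sta $0101
    lda $0102
    adc #0
    sta $0102

    ;; Advance to the next character. X stays small here, so it never
    ;; wraps to zero and the branch is always taken.
    inx
    bne process_digit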
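
For readers who don't speak 6502, the same arithmetic can be sketched in C: x10 is computed as x8 plus x2 using shifts only, and the low nibble of each ASCII digit is added in. The function name parse_sys_address is hypothetical and this is an illustration of the technique, not code from GRAZE.

```c
#include <stdint.h>

/* Parse a run of ASCII decimal digits into a 16-bit value, stopping at
 * the first non-digit. Mirrors the shift-and-add approach: value * 10
 * is formed as (value * 8) + (value * 2). */
static uint16_t parse_sys_address(const uint8_t *p)
{
    uint16_t value = 0;
    while (*p >= '0' && *p <= '9') {
        uint16_t x2 = (uint16_t)(value << 1);  /* multiply by 2 */
        uint16_t x8 = (uint16_t)(x2 << 2);     /* multiply by 4 more */
        value = (uint16_t)(x8 + x2 + (*p & 0x0f));
        p++;
    }
    return value;
}
```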

With that in place, I can now start helper programs, so that's the key functionality of the assembly language helper routines re-established.

The next step is to get it actually loading and displaying H65 "web" pages again. I have seen it go through the DHCP and TCP/IP connection process once or twice now, but it's quite unreliable.  What is frustrating, is that I don't know if it's the router or the compiler, or a bit of both.  I guess I can disambiguate those by compiling it again with CC65, and seeing how it goes with that.

Okay, with CC65 I get it making the TCP/IP connection 3 times in a row, while with LLVM-mos, it's a pretty rare event. Thus there is a compiler-related issue. There is also something else going on, perhaps an incorrect fix for one of the LLVM-compatibility problems I had to tackle to get it compiling with LLVM-mos, that is affecting the CC65 builds.

But first, let's get the LLVM-mos built binary going as far as the CC65-built one.  A quick bit of probing reveals that the LLVM one is getting stuck in some hypervisor routine, possibly waiting for the SD card to complete some job. Specifically, the call to load GRAZEH65.M65 from disk. The filename has been set up correctly, so it's not that.

Hmmm... I wonder if LLVM is optimising away the follow-up hypervisor trap that closes all open files, since it would look like two writes to $D640 in quick succession? Nope, that's all intact. This is really weird: I'd expect the code to be all tied in knots outside of the hypervisor, but not inside it.

It might not be inside the hypervisor that the problem is happening, as it looks like the same sectors are being read again and again. It looks more like the loading of the file is happening over and over. That makes a bit more sense.

Yup! Found the stupid error where I was jumping back into the load file routine instead of actually starting the newly loaded file.  Now it reliably gets to the same place as with CC65, so that's a good bit of progress.

It looks like the TCP/IP connection doesn't actually get established: In fact, no packets flow back from the host that is being connected to.  Is this a problem with the internet sharing, or with the cantankerous Telstra 5G router, or something else? Looks like the internet sharing on the build box isn't forwarding the IP packets to the internet.  How annoying! Fortunately, rebooting the build box sorted it. Most likely the Linux kernel setting for packet forwarding wasn't previously enabled.  

So now it gets a bit further, as in packets come back. But now I am getting HTTP error 400s from the remote webserver.  Suspiciously, I am not seeing the hostname in the HTTP GET request. This might be the first instance of a bug in my corrections, or LLVM optimising something away. Again, let's try with CC65 built binaries and see if it's the same.

With CC65, I get a 404, and the HTTP request looks like this:

GET /showdown65.h65 HTTP/1.1
Accept: */*
User-Agent: FETCH MEGA65-WeeIP/20220703

Hmm... and now I can't get the LLVM-built one to load a helper program anymore.  Two steps forward, one step back. I wonder if when it was working before, I had a mix of CC65 and LLVM binaries? Anyway, it looks to be the loading of the helper programs that is failing, so I'll just go back to adding some debug facilities into that helper routine, and see what shows up... and magically it worked *sigh*.

One thing I did spot, though: LLVM-mos is using ASCII by default, while CC65 uses PETSCII. This meant that the case of the strings in the HTTP request was being inverted. Maybe that can cause the 400 errors? Nope.
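A minimal sketch of the case inversion described above, for ASCII letters only. The real PETSCII mapping has more cases than this, so treat it as an approximation of the visible effect rather than a full converter. The function name swap_ascii_case is hypothetical.

```c
/* Flip the case of an ASCII letter and leave everything else alone.
 * This approximates what happens to letters when a string encoded for
 * one character set is interpreted in the other. */
static char swap_ascii_case(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - 'a' + 'A');
    if (c >= 'A' && c <= 'Z')
        return (char)(c - 'A' + 'a');
    return c;
}
```

Applied to every character of "User-Agent", this gives "uSER-aGENT", which is the sort of thing that shows up in a packet capture when the two encodings get mixed.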

But it does look like the request is being mangled in the LLVM-mos builds, with part of the request string being repeated. Tcpdump of the packet in question looks like this:

10:45:24.531798 IP (tos 0x8, ttl 63, id 5050, offset 0, flags [none], proto TCP (6), length 177) > Flags [P.], cksum 0x869b (correct), seq 1:114, ack 1, win 65535, options [eol], length 113: HTTP, length: 113
    GET /showdown65.h65 HTTP/1.1
    0x0000:  4508 00b1 13ba 0000 3f06 67e2 c0a8 00fe  E.......?.g.....
    0x0010:  4d6f f08d 0440 0050 3a77 306b 1909 98ed  Mo...@.P:w0k....
    0x0020:  b018 ffff 869b 0000 0000 0000 0000 0000  ................
    0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
    0x0040:  4745 5420 2f73 686f 7764 6f77 6e36 352e  GET./showdown65.
    0x0050:  6836 3520 4854 5450 2f31 2e31 0a0d 486f  h65.HTTP/1.1..Ho
    0x0060:  7374 3a20 7777 772e 6261 6467 6572 7075  st:.www.badgerpu
    0x0070:  6e63 682e 636f 6d0a 0d41 6363 6570 743a  nch.com..Accept:
    0x0080:  202a 2f2a 0a0d 5573 6572 2d41 6765 6e74  .*/*..User-Agent
    0x0090:  3a20 4752 415a 4520 4d45 4741 3635 2d57  :.GRAZE.MEGA65-W
    0x00a0:  6565 4950 2f32 3032 3430 3132 310a 0d0a  eeIP/20240121...
    0x00b0:  0d     

Which doesn't show the mangling... so maybe it was just a tcpdump display quirk. Anyway, if we write out the ASCII lines of that request, it looks like this:

GET /showdown65.h65 HTTP/1.1
Host: www.badgerpunch.com
Accept: */*
User-Agent: GRAZE MEGA65-WeeIP/20240121

Which looks pretty fine to me... Indeed, if I feed that in via telnet on Linux to the webserver, I get a 404, rather than a 400.

So what exactly is going on here? Are the carriage-return/line-feed bytes in the wrong order perhaps? Yes, that seems to have been the problem.
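HTTP/1.1 lines end with carriage return followed by line feed (bytes 0x0d 0x0a), whereas the dump above shows 0a0d. A hedged sketch of building a request with the terminators in the right order follows. The function name, the placeholder host and the placeholder path are all made up for illustration, and this is not the WeeIP code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a minimal GET request into buf.
 * Every line ends with "\r\n" (CR then LF), and an empty line ends the
 * header block. Returns snprintf's result: the length that was (or
 * would have been) written. */
static int build_get_request(char *buf, size_t buflen,
                             const char *host, const char *path)
{
    return snprintf(buf, buflen,
                    "GET %s HTTP/1.1\r\n"
                    "Host: %s\r\n"
                    "Accept: */*\r\n"
                    "\r\n",
                    path, host);
}
```

Spelling the terminator as an explicit "\r\n" in one place makes it harder to get the byte order backwards than emitting the two bytes separately.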

Okay, so now let's remove my helper loader debug code, and see if it still works. And it does... Weird. Maybe that previous build was wonky in some way.

Anyway, unfortunately the H65 page that the folks at badgerpunch were hosting isn't there any more, so I'll set up a local web server on the LAN, and point it at pages on there, to confirm that I can actually load pages.

Then I think I will focus on having ASCII vs PETSCII strings for all the messages that matter, and start re-factoring stuff out that is common in multiple of the helper programs, and moving the network initialisation code out, so that it happens only once on initial launch, so that loading subsequent pages becomes much faster.

Well, pages are now loading again, which is great :) There is something funny going on with proportional text. But I'm fairly sure that that problem is in the md2h65 generation of the page data, as plain-text pages look fine, and embedded images in pages are also loading fine, and without corruption.

It looks like I got part-way through adding support for NCM characters in md2h65, but that it still has problems. It looks specifically as though the number of pixels to trim is too great, resulting in bit 3 of the pixel trim being set when it shouldn't be. Found and fixed the bug. Now pages display properly again.

While there are still a bunch of things to be fixed in the H65 page generation (like reducing vertical space between lines of proportional text, and moving the underline a pixel down, perhaps), it's now working more or less as it did under CC65. So now let's deal with some of the more general issues that we should tackle.

First, I want to improve page loading speed by moving the network initialisation code to the loader, which would do ethernet hardware initialisation and DHCP setup exactly once, so that successive pages can be loaded without big delays.

To do this, I have started refactoring out some of the common routines as well.  Quite a bit of fiddling things around, so that there isn't too much junk visible between page loads, and that the messages that appear during that time have inverted PETSCII/ASCII upper/lower case. It's still not perfect, but much less bad than it was.

Testing the browser I discovered that links that are behind text are now not working, while those behind images are still working.  I'll have to look at the cause of that.  But that will have to wait for the next blog post.


  1. Hey, as an alternative compiler you might check Oscar64 which is built for 6502 in first place.
    I'm building a debugging IDE for it based on VICE

    1. I hadn't heard of that one. You might like to share about it on the MEGA65 discord server.

  2. Have you tried to use LTO (Link-Time Optimization)?

    1. Yes, LTO helps, and is much easier with LLVM-mos.