Making a C64/C65 compatible computer: May 2019

Tuesday 28 May 2019

Laser-cutting a spacer for the MEGAphone prototype

Together with my students we have continued to make progress on the prototype MEGAphone. I've spoken previously about the current state of the PCB, which as a first revision has a few corrections and bodge-fixes required. This means we have some funny lumps and bumps on it. Also, we know we need to lift the screen 2mm off the PCB because the red LEDs are too close to each other. So I decided to make a laser-cut spacer to lift the screen. Actually, I will make a set of laser-cut layers that I will stack on top of each other to hold all the top-side parts in place, mainly the screen and buttons and speaker. This will suffice as a quick-and-dirty case for the prototype, since there is little point making a more sophisticated case for a one-off.

I spent a couple of hours with a ruler and the PCB to work out where all the holes in the plastic spacer should be, and was ready to hit the laser cutter. However, caution says one should test first, before cutting plastic, so I did several test runs just cutting a sheet of paper. Here is the laser cutter doing its thing. Sorry for the poor quality images, I only had my phone as camera, and its camera is getting a bit sad these days.

This turned out to be a great strategy, as I could layer the paper over the PCB, and check the fit of all the cut-outs and holes:

There were quite a few little fiddly bits to get right, as well as some helpful insights from the paper templates. For example, the very thin bridge between the hole where I have the buttons resting here and the collection of bodges was clearly too thin to work in the plastic:

That was fairly easily fixed with a bit of creative line drawing:

Then a bit more fit testing with the buttons in place:

It's starting to look right :)

Then it was finally time to cut the template from clear 2mm thick acrylic, lay it in place, confident that it would fit, and put the screen and buttons in place so that I could see how it would look:

For a quick-and-dirty one-off this is looking pretty nice, I think. Next step is to cut the layer that will go on top of this, and hold the buttons in place with smaller holes only big enough for the plastic parts, like the blue joy pad here. It will also hold the rear of the screen in position. Then another layer will go on top of that, that will have the screen cut out big enough for the black frame around it, and slightly smaller holes again, so that the joypad and fire buttons can't fall out. I might yet do that one in beige, as I think that will look really nice. Then I will probably put a couple of clear strips down either side of the screen, so that the screen itself can't fall out, and that will be the top-side of the prototype case.

I'll then repeat much the same process for the rear-side, so that there is an enclosed compartment for the battery. I'll likely use some spacer bushes on the rear-side, so that I don't need to have too many layers. Three should be enough, so that the battery is safely held in place. Then the whole thing will be held together with some screws that go through several 3mm holes that will be present in all the laser-cut sheets, and that line up with matching holes in the PCB.

MEGAphone screen testing

Just a quick post about recent testing of the interface for the screen on the MEGAphone rev1 PCB.

First, we knew once the PCBs had been ordered that the screen connector was in the wrong place, and will have to be shifted sideways. Then a bit of further exploration revealed we had the display enable line tied low instead of high, which had been causing the display to remain blank. With that fixed (see the thin blue wires near the screen connector), and a bit of the usual jiggery-pokery, we finally had an image on the screen:

We then also found that the viewing angle of these particular screens is all wrong for us: It expects the screen to be used upside-down. When viewed from the wrong angle it is quite dim, and there is blurring and doubling of the image, as the various anti-glare and polarisation filters are all misaligned from that perspective, as the following images show.

However, as the MEGAphone and MEGA65 have no frame buffer, we can't just easily invert the image. So we are looking for another supplier of compatible screens. This is not too hard, as these 800x480 4.3" screens are a dime a dozen, and are manufactured by a whole bunch of Chinese companies. We are talking with a supplier now about one that not only claims full 180 degree viewing angle, but also 2x brightness. This will be very welcome.

Just for fun, we also enabled the on-screen-keyboard to see how it looks. We might still need to darken the background behind the keyboard (it is composited over the top of the regular display) a bit more. I'll make a video of this when we get a bit further, as the animation of it appearing looks quite nice already. I also hooked up one of the buttons so that pressing the button automatically makes the keyboard appear or disappear: No software cooperation required!

Anyway, that's it for now. I did say it would be a short one :)

Wednesday 8 May 2019

Working to reduce the attack surface for copyright infringement claims against our open-source C64 ROMs

As I explained yesterday, we have begun making an open-source set of ROMs that can be run on a C64 or compatible computer.

Today, I want to explain a little about some of the specific measures we are taking to avoid any possibility of copyright infringement. In particular, we are going above and beyond what we believe is required, to ensure that our alternate ROMs are free of any potential claim of copyright infringement.

There are two reasons for this, that rest in the primary reasons for establishing this project in the first place:

1. To ensure that the rights holders of the original C64 ROMs can quickly determine that they don't need to worry about us infringing on their proprietary rights. We don't want to waste their time or effort.

2. To ensure that users of our alternate ROMs can do so with maximum confidence. I say maximum and not complete confidence, because with anything legal, nothing is truly certain.

In short, the project exists to protect the interests of the rights holders of the original C64 ROMs, by giving the community a clear option for running emulators and C64 compatible computers, without potentially infringing on any proprietary rights.

So, with this in view, our approach is one of an "abundance of caution". That is, in many places and ways, we are being way more cautious than we believe the law would require us to be. This is, again, so that we can establish a clear moat around our work, so that as far as is possible, all doubts can be excluded, both for the original C64 ROM rights holders, and for us.

Simply having an argument that our work is free of infringements is not enough for this approach, however sound that argument might be. Rather, we want to remove any possibility of arguments that we have infringed, and where possible erect multiple unassailable arguments that no infringement has occurred. Put another way, we want to have multiple walls around our castle, and at the same time, work carefully to make sure that there are no secret passages, caves, wells leading to underground rivers or any other thing that would undermine the intellectual property fortress we are building. So let's have a look at some of those defences:

Use source control!

The first is simply to use source control. The reason for this is that it creates a history of the creation of our ROMs, complete with all the mess-ups and steps along the way. That history goes back to the first lines of code written, and provides a strong line of evidence for the creation of our ROMs from scratch.

Thus, should any segment of the ROMs end up being similar or identical to that of the original C64 ROMs, we can demonstrate that such similarity occurred through coincidence or necessity (there are only so many ways to do certain things).

To make this defence as strong as possible, the mantra of "commit early, commit often" is vital, so that the in-between steps as code is written are recorded. If we only commit finished routines, then there is no evidence of working that could be used to substantiate the claim of originality.

Commit messages should also indicate the reason for implementing new functionality or locating routines at specific addresses, e.g., "Implement routine XXX at location YYY required by game ZZZ". Thus we end up with commit messages like:

commit 810dfb75afc59a1349a4fc83e962ef8357bc1ee1
Author: Paul Gardner-Stephen <paul@servalproject.org>
Date: Sun May 5 16:31:24 2019 +0930

put setup VIC-II routine at $E5A0 for Advanced Pinball Simulator. Issue #23

This is a good message, because it explains what was done in terms of an action that would naturally increase similarity with the original, in this case by putting a routine at the same position as in the original ROM, but then justifying that by identifying a piece of software ("Advanced Pinball Simulator" in this case) that requires it to be there, typically because the software directly calls that address, and tags an issue where the compatibility problem was reported.

That is, we start with a simple ROM re-implementation that is so manifestly different and incomplete that it is obviously not an infringing work, and then refine it over time based on clear compatibility improvement justifications, so that similarities with the original ROMs over time will be explained at every step along the way.
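The commit-message convention described above could be checked mechanically. The following is only a sketch of one way to do that, and is not part of the project's tooling: the function name `check_commit_message` and the two rules it applies (an address of the form `$XXXX` should be accompanied by an issue reference and by a "for" or "required by" justification) are assumptions made for illustration.

```python
import re

# Hypothetical convention check, not the project's own tool.
ADDRESS = re.compile(r"\$[0-9A-Fa-f]{4}\b")               # e.g. $E5A0
ISSUE = re.compile(r"#\d+\b")                             # e.g. #23
REASON = re.compile(r"\b(for|required by)\b", re.IGNORECASE)


def check_commit_message(message):
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    if ADDRESS.search(message):
        if not ISSUE.search(message):
            problems.append("fixed address given without an issue reference")
        if not REASON.search(message):
            problems.append("fixed address given without naming what needs it")
    return problems


good = "put setup VIC-II routine at $E5A0 for Advanced Pinball Simulator. Issue #23"
bad = "move routine to $E5A0"
print(check_commit_message(good))  # []
print(check_commit_message(bad))   # two problems reported
```

A check like this could run as a commit hook, but it can only confirm that a justification is present, not that it is a good one.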

Indicate sources, causes and reasons in comments in the source

Also, all through the source we should indicate the source of information that led to the writing of a particular snippet of code. That might be a page out of the C64 Programmer's Reference Guide, Compute's Mapping the 64, or evidence gained from running an emulator or instrumenting a real C64 to discover the entry points into the ROMs used by software that we wish to be compatible with. This helps to remove any claim that we have simply copied functionality from the C64 ROMs by copying code.

Instead, by demonstrating that there is a need for a screen-clear routine at $E544 by referencing such sources, we are showing that this is a functional requirement of the C64 ROM, and must be implemented. Using that routine as an example, look at the number of references in this single routine:

    ;; Clear screen and initialise line link table
    ;; (Compute's Mapping the 64 p215-216)

clear_screen:

    ;; Clear line link table
    ;; (Compute's Mapping the 64 p215)

    lda #$00
    ldy #24
clearscreen_l1:
    sta screen_line_link_table,y
    dey
    bpl clearscreen_l1

    ;; Y now = #$FF


    ;; Clear screen RAM.
    ;; We should do this at HIBASE, which annoyingly
    ;; is no ZP, so we need to make a vector
    ;; (Compute's Mapping the 64 p216)
    ;; Get pointer to the screen into current_screen_line_ptr
    ;; as it is the first appropriate place for it found when
    ;; searching through the ZP allocations listed in
    ;; Compute's Mapping the 64
    sta current_screen_line_ptr+0
    lda hibase
    sta current_screen_line_ptr+1
    ldx #$03        ; countdown for pages to update
    iny             ; Y now = #$00
    lda #$20        ; space character
clearscreen_l2:
    sta (current_screen_line_ptr),y
    iny
    bne clearscreen_l2
    ;; To draw only 1000 bytes, add 250 to address each time
    lda current_screen_line_ptr
    clc
    adc #<250
    sta current_screen_line_ptr
    lda current_screen_line_ptr+1
    adc #>250
    sta current_screen_line_ptr+1
    lda #$20        ; get space character again
    dex
    bpl clearscreen_l2

    ;; Clear colour RAM
    ;; (Compute's Mapping the 64 p216)
    lda text_colour
clearscreen_l3:
    sta $d800,y
    sta $d900,y
    sta $da00,y
    sta $db00-24,y        ; so we only erase 1000 bytes
    iny
    bne clearscreen_l3

    ;; (Compute's Mapping the 64 p216)
    jmp home_cursor

As can be seen, almost every single action in the routine is justified in terms of public information about what this routine must do.
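To see which bytes the "add 250 each time" trick in the listing actually touches, here is a small model of the address arithmetic only. It does not emulate a 6502; it assumes my reading of the loop is right, namely that `ldx #$03` with `dex` / `bpl` gives four passes of 256 bytes each, and the function names are made up for this sketch.

```python
SCREEN_CELLS = 40 * 25  # 1000 visible character cells


def screen_offsets_written():
    """Offsets from the screen base touched by the clearscreen_l2 loop."""
    written = set()
    base = 0
    x = 3                       # ldx #$03
    while True:
        for y in range(256):    # sta (ptr),y / iny / bne
            written.add(base + y)
        base += 250             # 16-bit add of 250 to the pointer
        x -= 1                  # dex
        if x < 0:               # bpl falls through once X wraps to $FF
            break
    return written


def colour_addresses_written():
    """Addresses touched by the clearscreen_l3 loop over colour RAM."""
    written = set()
    for y in range(256):
        for page_base in (0xD800, 0xD900, 0xDA00, 0xDB00 - 24):
            written.add(page_base + y)
    return written


screen = screen_offsets_written()
colour = colour_addresses_written()
print(len(screen), max(screen))                        # 1006 1005
print(len(colour), hex(min(colour)), hex(max(colour))) # 1000 0xd800 0xdbe7
```

Under this model the colour RAM pass covers exactly 1000 bytes, while the screen RAM pass covers offsets 0 to 1005. That is six bytes past the visible cells, but still short of the sprite pointers, which start at offset 1016 from the screen base.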

Automatically scan for any identical byte sequences of > 2 bytes

Finally, we have created a tool that looks for any string of at least 3 bytes in length that matches between two files. We use this to find any byte sequences that match with the original ROMs, even if the matches are not in the original location. Then for each such match, we provide an explanation in a file in the strings/ directory, where the name of the file is the sequence of matching bytes written in hexadecimal.

Once an explanation has been provided in one of these files, the match is no longer reported by the tool, unless run with --verbose, in which case all the string explanations are shown. This allows for a quick report to be generated that provides an explanation of every significant match.
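The idea can be sketched in a few lines of Python. This is not the project's actual tool: the function names, the brute-force search, the upper-case hex file naming and the "UNEXPLAINED" wording are all assumptions made for illustration, and the report format is only loosely modelled on the output shown further down.

```python
import tempfile
from pathlib import Path

MIN_MATCH = 3  # report runs of at least three identical bytes


def find_matches(original, candidate, min_len=MIN_MATCH):
    """Yield (candidate_offset, original_offset, length) for maximal common runs.

    Brute force, which is acceptable for inputs the size of a ROM.
    """
    for i in range(len(candidate)):
        for j in range(len(original)):
            if candidate[i] != original[j]:
                continue
            # Skip positions that merely continue a run that started earlier.
            if i > 0 and j > 0 and candidate[i - 1] == original[j - 1]:
                continue
            n = 0
            while (i + n < len(candidate) and j + n < len(original)
                   and candidate[i + n] == original[j + n]):
                n += 1
            if n >= min_len:
                yield i, j, n


def explanation_for(run, strings_dir):
    """First line of strings_dir/<hex of run>, or None if no such file exists."""
    path = Path(strings_dir) / run.hex().upper()  # file naming is an assumption
    if not path.is_file():
        return None
    lines = path.read_text().splitlines()
    return lines[0] if lines else ""


def report(original, candidate, strings_dir, verbose=False):
    """Unexplained matches are always listed; explained ones only if verbose."""
    lines = []
    for i, j, n in find_matches(original, candidate):
        why = explanation_for(candidate[i:i + n], strings_dir)
        if why is None:
            lines.append(f"UNEXPLAINED ${i:04X} = ${j:04X} + {n}")
        elif verbose:
            lines.append(f"Ignoring ${i:04X} = ${j:04X} + {n} ({why})")
    return lines


# Tiny synthetic inputs sharing PHA / TXA / PHA / TYA / PHA (48 8A 48 98 48).
demo_original = bytes.fromhex("00488A489848FF")
demo_candidate = bytes.fromhex("EA488A48984860")
with tempfile.TemporaryDirectory() as tmp:
    print(report(demo_original, demo_candidate, tmp))
    (Path(tmp) / "488A489848").write_text("Preserve registers on stack\n")
    print(report(demo_original, demo_candidate, tmp, verbose=True))
```

Keying the explanation on the matching bytes themselves, rather than on an address, means one explanation covers every place the same sequence turns up.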

There are strong arguments why considering matches of only length 3 bytes is excessive, as it is practically impossible to imagine a three byte sequence in the C64 ROMs that could be copyrightable. But again, out of an abundance of caution we explain even those matches, including when it is just random fragments of CPU instructions, so that a cursory examination of our software can in just a few minutes hopefully satisfy any rights-holder that no infringement has occurred.

This library of reasons also provides a strong defence in the event that any claim of infringement were nonetheless to be made: First, it demonstrates our candour, in that we are not hiding the matches, but making them public -- including to any rights holders. Second, when a claim of infringement is made, it makes it very easy for us to respond by asking which bytes are infringing, and to then point the claimant at the reasoning why those bytes do not constitute an infringement. The onus is then on them to justify why the explanation is not adequate, and it is likely that any court would accept an argument from our side for any claim to be immediately dismissed, and perhaps with prejudice (which means that they can't raise the same claim again), because we will have ready-at-hand prima-facie evidence that there is in fact no infringement.

As an example of how comprehensive even the short-form report is, which displays only the first line of information about why there is no infringement, here is the current output when comparing the original KERNAL and our ROMs at the time of writing:

$ src/similarity kernal newrom verbose
Searching files for similarities...
Ignoring $0012 = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $0017 = $47A6 + 3 (fragment / SEC / SBC #$xx)
Ignoring $0018 = $02F2 + 3 (SEC / SBC #$01 - subtract 1 from A.)
Ignoring $0112 = $0BDE + 4 (JSR $FFCF / branch based on C flag)
Ignoring $0122 = $4926 + 3 (Fragment / RTS / JSR)
Ignoring $0136 = $1018 + 4 (Push memory location to stack)
Ignoring $013C = $101E + 4 (Fragment + Load X from memory location.)
Ignoring $01A7 = $0B82 + 4 (Store X and Y in ZP location pair)
Ignoring $020A = $4CBB + 3 (fragment / PLA / PLA)
Ignoring $0299 = $059F + 3 (EOR #$FF / STA $xx)
Ignoring $02ED = $09AD + 3 ($05 byte after some leading $00s)
Ignoring $037D = $0A23 + 4 (Instruction fragment, put zero into ZP location somewhere.)
Ignoring $038C = $029C + 3 (Conditionally JMP somewhere.)
Ignoring $03A2 = $02E9 + 6 (increment a pointer in ZP)
Ignoring $03B3 = $0D6A + 3 (substract $30 from A. Not copyrightable.)
Ignoring $040A = $0F9A + 4 (SEC / JSR $FF99 - Read top of memory using public KERNAL API)
Ignoring $0417 = $0F93 + 4 (fragment + TYA + STA ($nn),Y)
Ignoring $0430 = $0531 + 4 (subtract something from a ZP location)
Ignoring $0435 = $0536 + 4 (Subtract the contents of one ZP location from the contents of another.)
Ignoring $0461 = $0008 + 3 (First three letters of BASIC)
Ignoring $0462 = $055A + 5 ("ASIC ")
Ignoring $048B = $0008 + 3 (First three letters of BASIC)
Ignoring $048C = $055A + 7 (BASIC V2 string)
Ignoring $04DA = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $04DB = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $04DC = $4520 + 3 (Fragment / STA ($F3),Y)
Ignoring $0518 = $4A3A + 5 (Setup VIC-II registers and load A with 0)
Ignoring $0574 = $46B0 + 5 (CLC / ADC #40 / STA $D3)
Ignoring $0575 = $48CB + 4 (ADC #40 / STA $D3)
Ignoring $0594 = $4B9B + 3 (Fragment of JMP instruction.)
Ignoring $0599 = $4A39 + 4 (instruction fragment followed by JSR to VIC-II register setup.)
Ignoring $05BB = $4A03 + 5 (fragment / STA $0277,X / INX)
Ignoring $05C8 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $0606 = $45EC + 4 (LDA ($D1),Y / CMP #$20)
Ignoring $060E = $49D7 + 4 (fragment / INY / STY $C8)
Ignoring $0632 = $4E29 + 3 (Fragment of sequence to save registers on the stack. Not copyrightable.)
Ignoring $0633 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $0634 = $498B + 5 (TXA / PHA / LDA $D0 / BRANCH somewhere)
Ignoring $063A = $44FA + 5 (LDY $D3 / LDA ($D1),Y / STA $xx)
Ignoring $0654 = $46C2 + 3 (INC $D3 / JSR somewhere)
Ignoring $0676 = $4A83 + 3 (PLA / TAX / PLA)
Ignoring $0680 = $5193 + 4 (LDA #$FF / CLC / RTS)
Ignoring $0682 = $4C8B + 3 (CLC / RTS / CMP #$xx)
Ignoring $069A = $472F + 5 ( FRagment / BRANCH <skip next instruction> / DEC $D8)
Ignoring $06AA = $472E + 4 (LDA $D8 / BEQ <skip 2 byte instruction>)
Ignoring $06B0 = $4A83 + 3 (PLA / TAX / PLA)
Ignoring $06B3 = $4EBA + 3 (CLC / CLI / RTS)
Ignoring $06E8 = $46B0 + 4 (CLC / ADC #40 / STA $nn)
Ignoring $06F6 = $480E + 3 (fragment / DEC $D6)
Ignoring $06F7 = $4631 + 3 (DEC $D6 / JSR $nnnn)
Ignoring $0717 = $45B7 + 6 (STA $D7 / TXA / PHA / TYA / PHA)
Ignoring $0719 = $5119 + 4 (Preserve registers on stack)
Ignoring $071A = $0A4B + 4 (Push Y and A onto the stack. Load A with something.)
Ignoring $0729 = $45D2 + 4 (fragment + branch based on comparison of A with constant.)
Ignoring $072E = $45DD + 4 (JSR (output carriage return) + fragment)
Ignoring $0762 = $464A + 17 (Copy screen + colour RAM to the left one place)
Ignoring $0773 = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $0776 = $0B80 + 3 (Fragments of instructions. Not copyrightable.)
Ignoring $0777 = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $0778 = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $0779 = $47EB + 3 (Fragment / STA ($F3),Y)
Ignoring $07A1 = $4631 + 3 (DEC $D6 / JSR $nnnn)
Ignoring $07F4 = $45EC + 4 (LDA ($D1),Y / CMP #$20)
Ignoring $080A = $460A + 7 (Copy screen + colour RAM to the right one place)
Ignoring $0810 = $4604 + 7 (Copy screen + colour RAM to the right one place)
Ignoring $0816 = $4610 + 5 (DEY / CPY $D3 / BNE <backwards>)
Ignoring $081B = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $081E = $0B80 + 3 (Fragments of instructions. Not copyrightable.)
Ignoring $081F = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $0820 = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $0821 = $47EB + 3 (Fragment / STA ($F3),Y)
Ignoring $083C = $46CE + 5 (LDA $D3 / SEC / SBC #40)
Ignoring $083D = $48FA + 4 (Fragment / SEC / SBC #40)
Ignoring $083E = $47A7 + 3 (SEC / SBC #40)
Ignoring $084F = $48D7 + 4 (Fragment followed by LDA #$00 / STA $xx)
Ignoring $086D = $46FA + 5 (JSR $E544 (clear screen) surrounded by fragments)
Ignoring $0880 = $491B + 4 (INX / CPX #$19 / BRANCH somewhere)
Ignoring $08A8 = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $08B5 = $48F3 + 4 (LDA #$27 / CMP $D3)
Ignoring $08BA = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $08DA = $497B + 16 (List of C64 colour codes)
Ignoring $0916 = $4914 + 4 (LDX #$00 / LDA $D9,X)
Ignoring $09CB = $4551 + 3 (SOMETHING $0288 / LDA )
Ignoring $09D4 = $47B4 + 4 (LDA ($AC),Y / STA ($D1),Y)
Ignoring $09D8 = $47B0 + 4 (LDA ($AE),Y / STA ($F3),Y)
Ignoring $09E5 = $47D0 + 4 (LDA $AE / STA $AD)
Ignoring $09FA = $4551 + 4 (Most likely LDA $0288 / STA $D2 - Set upper half of screen RAM address)
Ignoring $0A0A = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $0A1C = $451A + 4 (LDY $D3 / STA ($D1),Y)
Ignoring $0A24 = $4950 + 6 (STA $D1 / LDA $F3 / STA $D2 - manipulate screen pointers)
Ignoring $0A52 = $4504 + 5 (LDA ($F3),Y / STA $0287)
Ignoring $0A7F = $4A7F + 10 (Tail end of standard CIA-triggered interrupt, followed by start of another routine)
Ignoring $0A81 = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $0A82 = $4762 + 3 (PLA / TAY / PLA / TAX sequence)
Ignoring $0A88 = $4B09 + 4 (Fragment + STA $028D)
Ignoring $0AAA = $4B87 + 4 (PHA / LDA $DC01)
Ignoring $0AC1 = $4BBB + 6 (ORA $028D / STA $028D)
Ignoring $0AC3 = $4BCE + 4 (Fragment + STA $028D)
Ignoring $0B35 = $4ABF + 5 (Use X register to compare contents of a ZP and absolute address location)
Ignoring $0B3C = $4A04 + 4 (STA somewhere offset by X, increment X -- Loop fragment)
Ignoring $0B47 = $4B01 + 4 (RTS + LDA $028D)
Ignoring $0B49 = $4BEE + 5 (something with $028D / CMP #$03 / BNE somewhere)
Ignoring $0B59 = $4C00 + 8 (LDA $D018 / EOR #$02 / STA $D018)
Ignoring $0B80 = $4CC4 + 16 (List of key codes generated when certain key combinations are pressed)
Ignoring $0B91 = $4CD5 + 36 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BB6 = $4CFA + 5 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BC2 = $4D05 + 9 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BC3 = $4D46 + 7 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BED = $4D30 + 9 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BF7 = $4D3A + 5 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C03 = $4D05 + 8 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C04 = $4D46 + 14 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C13 = $4D55 + 16 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C24 = $4D66 + 19 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C39 = $4D7B + 4 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C80 = $4D8D + 7 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C88 = $4D95 + 24 (List of key codes generated when certain key combinations are pressed)
Ignoring $0CCD = $0999 + 4 (A single $08 byte in a field of $00's)
Ignoring $0CDF = $0033 + 3 (3 bytes of ascending value)
Ignoring $0CE7 = $4E74 + 4 (The string "LOAD". Not copyrightable)
Ignoring $0D2E = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0D2F = $51B2 + 7 (Set a bit in $DD00)
Ignoring $0D85 = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0D89 = $51AB + 5 (set a bit in $DD00, Return from subroutine, read $DD00)
Ignoring $0D9B = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0D9E = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0DBF = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0DC0 = $5180 + 6 (clear a bit in $DD00)
Ignoring $0DC4 = $51A3 + 3 (Do something with $DD00 and return from sub-routine.)
Ignoring $0DEE = $516D + 3 (RTS / SEI / JSR somewhere)
Ignoring $0DF3 = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0DF4 = $51B2 + 7 (Set a bit in $DD00)
Ignoring $0E2F = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0E83 = $50F5 + 3 (CLC / RTS / JSR $xxxx)
Ignoring $0E84 = $50D3 + 6 (Return from sub-routine, clear a bit in $DD00)
Ignoring $0E86 = $50DF + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E89 = $4FF6 + 3 (instruction fragments.)
Ignoring $0E8B = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E8C = $51AF + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E8D = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0E8E = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0E8F = $51BB + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E91 = $51AB + 9 (set a bit in $DD00, Return from subroutine, read $DD00)
Ignoring $0E94 = $51A3 + 7 (Do something with $DD00, return from subroutine. Do something else with $DD00)
Ignoring $0E95 = $51CA + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E96 = $50D3 + 5 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0E98 = $50DF + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E9D = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E9E = $51AF + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E9F = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0EA0 = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0EA3 = $50D9 + 5 (Set a bit in $DD00.)
Ignoring $0EA6 = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0EA7 = $51AF + 5 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0EA8 = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $1011 = $0295 + 3 (Instruction fragment + CLC + RTS)
Ignoring $1012 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1047 = $5DBF + 4 ($nn -> $nnnn)
Ignoring $1084 = $50F5 + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1091 = $5181 + 3 (clear a bit and store result some where.)
Ignoring $109C = $51B4 + 3 (Or A register with $08 and store result somewhere.)
Ignoring $10C5 = $4DE5 + 3 ("OR " string, being part of "ERROR ")
Ignoring $10CA = $4DDA + 9 (The string "SEARCHING". Not copyrightable.)
Ignoring $10D0 = $4E78 + 3 ("ING", fragment of "SEARCHING FOR". Not copyrightable.)
Ignoring $10D4 = $4DE4 + 3 (The string "FOR". Not copyrightable.)
Ignoring $1107 = $4E74 + 6 (The string "LOADIN", part of LOADING. Not copyrightable.)
Ignoring $1112 = $4DE0 + 3 ("ING", fragment of "SEARCHING FOR". Not copyrightable.)
Ignoring $1136 = $0988 + 3 (Part of JSR $FFD2 / INX)
Ignoring $113C = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $1155 = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $11AB = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $11D0 = $500B + 3 (Instruction fragment, PLA, instruction fragment. Single instruction. Not copyrightable.)
Ignoring $11D6 = $500B + 3 (Instruction fragment, PLA, instruction fragment. Single instruction. Not copyrightable.)
Ignoring $11DF = $45B9 + 4 (Preserve registers on stack)
Ignoring $11E0 = $0A4B + 3 (PHA / TYA / PHA )
Ignoring $11FD = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $11FE = $4762 + 4 (Restore Y and X from stack, load A from somewhere.)
Ignoring $1294 = $0294 + 4 (conditionally execute CLC + RTS, i.e., conditionally return success from a routine.)
Ignoring $1295 = $0ADA + 3 (Instruction fragment + CLC + RTS)
Ignoring $1296 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $129B = $498B + 3 (TXA / PHA / LDA $nn)
Ignoring $130C = $0295 + 3 (Instruction fragment + CLC + RTS)
Ignoring $130D = $0D22 + 5 (Return from routine with success, Store $00 somewhere in ZP.)
Ignoring $130E = $0758 + 4 (End of routine followed by $00 -> ZP location)
Ignoring $132E = $0758 + 4 (End of routine followed by $00 -> ZP location)
Ignoring $13D3 = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $13F1 = $4CBC + 3 (PLA / PLA / JMP)
Ignoring $1418 = $097D + 4 (branch based on whether X register holds the number 4.)
Ignoring $1483 = $5DAB + 4 (Put $7F -> $xx0D)
Ignoring $14FB = $0301 + 4 (fragment / BNE <skip following JMP instruction> / JMP somewhere)
Ignoring $15CA = $0988 + 3 (Part of JSR $FFD2 / INX)
Ignoring $164D = $50D7 + 3 (Clear a bit in A and do something with it.)
Ignoring $168D = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $16CE = $4B24 + 5 (Fragment, CPX $DC01, BRANCH)
Ignoring $16CF = $4B7B + 4 (Fragment, CPX $DC01, BRANCH)
Ignoring $1724 = $0985 + 5 (Or A with $30 and print result)
Ignoring $1727 = $0690 + 3 (Probably JSR $FFD2 followed by PLA)
Ignoring $172A = $0298 + 3 (SEC + RTS + fragment of next routine)
Ignoring $175A = $0988 + 4 (fragment of string printing loop (JSR $FFD2 / INX / CPX ... ))
Ignoring $17A7 = $4DEB + 4 (take a branch based on comparison of Y register and memory.)
Ignoring $180B = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1836 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1947 = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $1A1A = $029C + 3 (Conditionally JMP somewhere.)
Ignoring $1A49 = $45D6 + 4 (Fragment + store $00 somewhere into ZP.)
Ignoring $1A85 = $48D7 + 4 (Fragment followed by LDA #$00 / STA $xx)
Ignoring $1BB6 = $5DC3 + 5 (Read from $DC0D to clear CIA interrupts, surrounded by instruction fragments.)
Ignoring $1C63 = $45BD + 3 (fragment + SEI + JSR fragment)
Ignoring $1CF2 = $458E + 4 (JSR $FDA3 (SCAN KEYBOARD) , JSR somewhere else)
Ignoring $1CFE = $459A + 4 (Clear C, jump into BASIC ROM to start BASIC.)
Ignoring $1D0C = $514A + 4 (End of loop fragment (branch backwards based on X, then return when done).)
Ignoring $1D10 = $4A73 + 5 (CBM80 cartridge signature)
Ignoring $1D55 = $080E + 3 (instruction fragments. not copyrightable.)
Ignoring $1D57 = $05DE + 4 (fragments of instructions. Not copyrightable.)
Ignoring $1D58 = $4573 + 3 (part of two instructions $02 STA $xx00,Y)
Ignoring $1D8A = $4EB8 + 3 (Load Y from ZP. Clear carry flag. Simple register manipulations.)
Ignoring $1DE4 = $5DB2 + 4 (STA $DC04 / LDA #$xx sequence)
Ignoring $1DEE = $5DB2 + 4 (STA $DC04 / LDA #$xx sequence)
Ignoring $1E2B = $5168 + 5 (Do something to $0284 followed by STX $0283)
Ignoring $1E2C = $4FD8 + 4 (something followed by STX $0283)
Ignoring $1E31 = $4FFD + 3 (tail end of access to $0284, followed by RTS)
Ignoring $1E36 = $5150 + 6 (Load X and Y from pointer at $0281)
Ignoring $1E40 = $5154 + 3 (tail end of access to $0282, followed by RTS)
Ignoring $1E41 = $516C + 3 (Fragment, RTS, Disable interrupts. Fragment of end and start of routines.)
Ignoring $1E47 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $1E48 = $45B9 + 4 (Preserve registers on stack)
Ignoring $1E49 = $0A4B + 4 (Push Y and A onto the stack. Load A with something.)
Ignoring $1E4C = $5DAB + 4 (Put $7F -> $xx0D)
Ignoring $1E69 = $458E + 4 (JSR $FDA3 (SCAN KEYBOARD) , JSR somewhere else)
Ignoring $1E7C = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $1E83 = $50DC + 3 (Do something with $DD00, then read something from memory. Only instruction fragments.)
Ignoring $1EBC = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $1EBD = $4762 + 3 (PLA / TAY / PLA / TAX sequence)
Ignoring $1EBE = $4A83 + 4 (PLA / TAX / PLA / RTI)
Ignoring $1F48 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $1F49 = $45B9 + 4 (Preserve registers on stack)
Ignoring $1F4A = $0A4B + 3 (PHA / TYA / PHA )
Ignoring $1F79 = $5DC0 + 4 (Do something with $11, then write to $DC0E)
Ignoring $1F84 = $5F84 + 4 (Jump to IOINIT routine. Not copyrightable.)
Ignoring $1F9F = $5F9F + 4 (Jump to keyboard scan routine ($EA87) + instruction fragment.)
Ignoring $1FA0 = $4A35 + 3 (Fragments of instructions. Not copyrightable.)

Then within each of those files, more detailed information can be found, often with references, for example:

Preserve registers on stack

This is the standard form of saving the A X and Y registers on the stack

PHA
TXA
PHA
TYA
PHA

See, for example:

http://6502.org/tutorials/register_preservation.html

Not copyrightable.

We see an explanation as to why this is just boilerplate, and then a reference to a 3rd party source that indicates that this is common practice, and therefore cannot be the proprietary property of the rights holders of the C64 ROMs.

These are the defences we have right now, but we are also planning others:

Comparison with non-Commodore 6502 Microsoft BASIC

The original C64 BASIC was actually derived from a BASIC interpreter written by Microsoft and licensed by Commodore. This means that Commodore and its successors do not own the copyright in those parts that are Microsoft BASIC. We can easily test this by also searching for matching strings in "negative libraries" of files that were not written by Commodore.
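As a rough sketch of how such a negative-library check might look, assuming the library is simply a mapping from a file name to that file's bytes: the function name `also_found_in` is made up for this example, and the library contents below are synthetic placeholder bytes, not data from any real ROM.

```python
from typing import Dict, List


def also_found_in(run: bytes, negative_library: Dict[str, bytes]) -> List[str]:
    """Names of non-Commodore files that also contain this byte run."""
    return [name for name, data in negative_library.items() if run in data]


# Placeholder library: one made-up "file" containing SEC / SBC #$30 (38 E9 30).
negative_library = {
    "placeholder_non_commodore_basic.bin": bytes.fromhex("EA38E93060"),
}

run = bytes.fromhex("38E930")
print(also_found_in(run, negative_library))
print(also_found_in(bytes.fromhex("010203"), negative_library))  # []
```

Any match that also turns up in the negative library could then have the name of that file recorded in its explanation, as evidence that the sequence is not unique to the Commodore ROMs.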

Automatic internet searching for byte sequences to find other instances

We can also generalise this approach by implementing automatic internet searches, to find 3rd party instances of matching byte sequences, again as evidence that the matches are not the result of infringement of the C64 ROM's rights owner's copyrights.

Can you think of any other techniques that we can apply to add even more defence-through-depth?

Tuesday 7 May 2019

Free and Open-Source Replacement ROMs for the C64

While this blog is usually about things for the MEGA65, this post is actually about something for stock standard C64s, and more to the point, for emulators and all re-creations: free and open-source replacement ROMs that can be used, modified and distributed by the general public, so that, for example, emulators can ship with fully legal ROMs, without having to be troubled by costs or legal complexities in terms of licensing.

But first, let's step back a bit, and look at the current situation.

The Commodore 64, as we all know, uses three ROM parts: the KERNAL, BASIC and the character ROM. These are all different sizes, but together make up the 20KiB of total ROM that a C64 needs to operate. Some of you will at this point be saying to yourselves, "no, the KERNAL and BASIC ROMs are the same size". That is only true of the 8KiB chips: the KERNAL code itself is only about 6.5KiB, while BASIC is about 9.5KiB and uses the bottom 1.5KiB or so of the "KERNAL" ROM.

Anyway, this means that there are these three parts that have to be replaced in order to make a C64 or compatible computer come to life.

The character ROM I have already talked about. Basically it is highly doubtful that a copyright infringement suit could be brought against a user of the font. For a start, in countries like the USA, it simply isn't possible to copyright a bitmap font. Then given the 8x8 size, there aren't many options for implementing most of the symbols, especially the line and block ones. Add to that that the symbols have now been added to Unicode, and the long-standing lack of enforcement against distribution of any C64 ROMs, and it really looks like the character ROM isn't a big drama. Of course, we have also effectively solved this problem by making our own complete char ROM based on a combination of hand-drawn symbols and hand-touched characters from the public domain VGA 8x8 font. It isn't perfect, but it works. So we have the 4KiB character ROM already under control.

Now, the KERNAL and BASIC are much more interesting beasts. The KERNAL implements the screen editor, keyboard scanning logic and IEC serial communications protocol, along with a few other bits and pieces. Then BASIC uses the KERNAL's APIs to provide the familiar BASIC interpreter, which itself has quite a lot of complexity, with the line tokeniser and de-tokeniser, expression parser, variable management, commands, functions and operators.

Also, to have even a minimally working system, that would let you load and run a game or other program that was written in assembly language, you still need the BASIC tokeniser, LOAD, RUN and SYS commands at a bare minimum, with LIST also being practically essential, so that you can actually see what is on a disk.

Then, like the character ROM, we have the problem of how to create new ROMs that are non-infringing on the intellectual property rights of the rights-holders of the C64 ROMs. This requires considerable care and thought.

The gold-standard for such endeavours is to have one team produce detailed specifications of the software being recreated, and another team implementing it. Fortunately, with books like Compute's Mapping the 64, we actually have the specification effectively written for us back in the 80s.

This means that we can potentially implement the KERNAL and BASIC ROM functionality using such resources as a guide, and here is the important part, without looking at the C64's ROMs while writing them.

There is a residual risk that because the C64 ROMs are everywhere, and anyone likely to be inclined to write their own ROMs will have been exposed to them, it is very hard to enforce a true "clean room" reimplementation. However, I think that it is still possible, provided that sufficient care is taken.

Basically the challenge is to have a development process that is transparent and makes it unambiguously obvious to any observer, that no infringement is being made of the original ROMs, and that all code being written is being freshly produced. Here in many ways our audience is the rights holders to the original ROMs -- we want to make their job of assessing whether we are infringing their rights or not super easy. We don't want anyone having to waste time and effort on lawyers that will only make everyone poor and sad. Thus it makes sense to take an approach that integrates an "abundance of caution" at every stage, so that all mess can be avoided. This will hopefully also be clear from the outset, since the whole point of this project is to respect the intellectual property rights of the copyright owners of the C64 ROMs. That is, if we didn't care about their rights, we would just use the original C64 ROMs that are available for free download all over the internet like everyone else.

So, back to planning a process, here is the general process that we have come up with:

Begin with the immutable starting point of the 6502 reset entries, IRQ entry and NMI entries, and the rest of the ROM being empty. This starting point can have no copyright problems.
Based on the public calling interface of the C64 KERNAL as documented in the C64 Programmer's Reference Guide, make stub routines for the jump table.
All routines begin at the lowest address in the KERNAL, sorted by routine name. Thus the order of the routines is deterministic, and not the result of any creative process.
Implement publicly documented routines, using secondary sources, such as books about the C64, but without referring to the C64 ROM contents themselves.
Run test programs using the C64 KERNAL, and collect entry points into the ROMs.
Where an entry point does not correspond to a public API of the KERNAL, research the function by searching for it in Google. Implement it according to those references.
Where an entry point means that previously implemented routines have to be moved to make space at a specific address, move only those routines required to do so, to the next available address.
Where understanding of the inner workings of a routine is required to replicate it, secondary sources, such as the "Mapping the C64" or "C64 Programmer's Reference Guide" should be used. When those do not provide the answer, internet searches based on the name of the routine should be done, and failing that, based on the routine's address if it has no well known name or insufficient material is turned up. Reference to actual disassemblies of the ROMs is not to be made, to ensure that we have strong defences against any claim of copyright infringement.

A similar process should be followed for the BASIC ROM.
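To make the first step of the process concrete, here is a minimal sketch of an otherwise empty 8KiB KERNAL image that contains nothing but the three 6502 hardware vectors. The vector locations ($FFFA for NMI, $FFFC for reset, $FFFE for IRQ) are fixed by the CPU; the handler addresses passed in at the bottom are placeholders chosen for this example, and the function name is likewise hypothetical.

```python
KERNAL_BASE = 0xE000   # the KERNAL ROM occupies $E000-$FFFF
KERNAL_SIZE = 0x2000   # 8 KiB

# Fixed by the 6502 itself: where the CPU fetches its vectors from.
NMI_VECTOR, RESET_VECTOR, IRQ_VECTOR = 0xFFFA, 0xFFFC, 0xFFFE


def blank_rom_with_vectors(nmi: int, reset: int, irq: int, fill: int = 0x00) -> bytearray:
    """Build an empty ROM image containing only the three hardware vectors."""
    rom = bytearray([fill] * KERNAL_SIZE)
    for vector, target in ((NMI_VECTOR, nmi), (RESET_VECTOR, reset), (IRQ_VECTOR, irq)):
        offset = vector - KERNAL_BASE
        rom[offset] = target & 0xFF             # the 6502 stores the low byte first
        rom[offset + 1] = (target >> 8) & 0xFF  # then the high byte
    return rom


# Placeholder handler addresses, for illustration only.
rom = blank_rom_with_vectors(nmi=0xE100, reset=0xE000, irq=0xE200)
```

Everything else in the image is filler at this point, which is why this starting state carries no creative content from the original ROMs.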

To help with this, I have created a framework that allows a ROM to be compiled from a collection of assembly files, which get linked together to produce the final ROM. This helps to compartmentalise the work, and with careful design of the framework, makes it very easy to move routines around and assign them fixed locations as research into the secondary sources progresses and as entry points are discovered from running programs and tracing their entry into the ROMs.

This framework turned out to be quite simple. I used the Ophis assembler, as I am already quite familiar with it, and it has a handy pair of pragmas that make it quite convenient to fix the location of a routine, .checkpc and .advance. These can be used together to make sure that a routine will be located at an exact address, and will complain if there isn't enough space. To help pack the remaining routines into the free space around the fixed-location routines, the framework implements a greedy packing algorithm that places the largest un-placed routine into each free space, until the free space is full. There is room to improve this, for example by placing exactly the right sized routines into spaces, but that can wait until necessitated by the ROM filling up as we implement the last few features at the end.
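The greedy packing idea can be sketched roughly as follows. This is not the framework's actual code: the function name, the data layout and the routine names and sizes are invented for illustration, and the sketch assumes that "largest un-placed routine" means the largest one that still fits in the remaining space of the current gap.

```python
def greedy_pack(gaps, routines):
    """Place routines into free gaps, largest first.

    gaps:     list of (start_address, size_in_bytes) free regions
    routines: dict mapping routine name -> size in bytes
    Returns ({name: address}, [names that did not fit anywhere]).
    """
    # Largest first; ties broken by name so the result is deterministic.
    unplaced = sorted(routines.items(), key=lambda item: (-item[1], item[0]))
    placement = {}
    for start, size in gaps:
        address, remaining = start, size
        still_unplaced = []
        for name, length in unplaced:
            if length <= remaining:
                placement[name] = address
                address += length
                remaining -= length
            else:
                still_unplaced.append((name, length))
        unplaced = still_unplaced
    return placement, [name for name, _ in unplaced]


# Hypothetical gaps and routine sizes.
gaps = [(0xE000, 40), (0xE100, 25)]
routines = {"routine_a": 30, "routine_b": 20, "routine_c": 12, "routine_d": 8}
placement, leftover = greedy_pack(gaps, routines)
```

In this toy run routine_c is left over even though a few bytes remain free in both gaps, which is the sort of inefficiency that a smarter fit could reduce once the ROM starts to fill up.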

The adage of "commit early and commit often" is especially true for this project, because we want the source control history to be strong evidence that we have developed each routine ourselves from scratch, and not copied from the C64 ROMs. Thus commits when things are half-working and half-baked are especially important, as they document this implementation process.

We are also purposely using quite different algorithms and methods for some key parts of the system, so that there is even stronger evidence against infringement. So for example, the BASIC keyword list and tokeniser are implemented using a simple compression scheme for the BASIC keywords. This not only saves a bit of space, it also means that the BASIC keyword list is not present in the ROM in the same format as the original (even though as a list of facts, it is not copyrightable), and the algorithm for searching for keywords in the compressed list is by necessity an entirely new work: There would be no point in deriving it from the C64 ROM's tokeniser.
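The compression scheme itself is not described here, so the following is purely an illustration of the general idea and not the scheme used in the project. It assumes keywords made only of the letters A to Z, packs each letter into 5 bits with a zero code as terminator, and looks a keyword up by decoding the packed table on the fly. All names are hypothetical.

```python
def pack_keywords(keywords):
    """Pack A-Z keywords as 5-bit codes (1-26), each terminated by code 0."""
    bits = 0
    nbits = 0
    for word in keywords:
        for code in [ord(c) - ord("A") + 1 for c in word] + [0]:
            bits = (bits << 5) | code
            nbits += 5
    pad = (-nbits) % 8  # pad with zero bits up to a whole number of bytes
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_codes(packed):
    """Yield successive 5-bit codes from the packed table."""
    total = int.from_bytes(packed, "big")
    nbits = len(packed) * 8
    for shift in range(nbits - 5, -1, -5):
        yield (total >> shift) & 0x1F


def find_keyword(packed, count, text):
    """Return the index of `text` in the packed list, or None if absent."""
    index, current = 0, ""
    for code in unpack_codes(packed):
        if index == count:  # ignore any padding after the last keyword
            break
        if code == 0:
            if current == text:
                return index
            index, current = index + 1, ""
        else:
            current += chr(code - 1 + ord("A"))
    return None


keywords = ["LOAD", "RUN", "SYS", "LIST"]
packed = pack_keywords(keywords)
```

With these four keywords the packed table is 12 bytes, against 14 bytes for the plain letters, and the lookup has to walk a bit stream rather than compare bytes, so it necessarily looks nothing like a byte-oriented keyword search.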

Similarly, the keyboard scanner in the KERNAL is based on a publicly documented improved keyboard scanner, that supports multi-key roll-over and rejection of spurious joystick input. In this way, once again, we end up with a routine that has a demonstrably independent ancestry, and offers some nice improvements. We even expanded it slightly, so that the joystick can be used to move the cursor.

For the BASIC interpreter, we also decided to implement banking support from the outset, so that more than 38KiB would be available for BASIC. The KERNAL LOAD routine was also improved to support loading files bigger than 202 blocks, without writing over the IO area. Just like the improved keyboard scanner, the result is clearly a new and fresh implementation, and one that brings advantages along with it.
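A quick back-of-the-envelope check suggests where a figure of about 202 blocks comes from. The sketch assumes the usual 1541-style disk layout (254 data bytes per block, with the first two bytes of a program file holding its load address), a load address of $0801, and the IO area starting at $D000; the function names are made up for this example.

```python
BLOCK_DATA_BYTES = 254    # 256-byte sector minus the 2-byte track/sector link
LOAD_ADDRESS_BYTES = 2    # a program file starts with its 2-byte load address
BASIC_START = 0x0801      # usual load address for BASIC-launched programs
IO_START = 0xD000         # start of the IO area


def max_end_address(blocks, load_address=BASIC_START):
    """Address just past the last byte loaded, if every block is full."""
    return load_address + blocks * BLOCK_DATA_BYTES - LOAD_ADDRESS_BYTES


def first_block_count_reaching_io(load_address=BASIC_START):
    """Smallest block count at which a full file would run into the IO area."""
    blocks = 1
    while max_end_address(blocks, load_address) <= IO_START:
        blocks += 1
    return blocks


limit = first_block_count_reaching_io()
```

Under these assumptions a 201-block file always ends below $D000, while a 202-block file can run past it, which is consistent with the block count mentioned above.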

That is, our goal is not to create a 100% identical C64 ROM set, but rather a highly compatible and pleasant to use set of alternate ROMs for C64-compatible computers, and that are free for inclusion in emulators, FPGA-based computers and other projects that would like a C64-style environment, without the legal hazards that come from using the C64's own original ROMs.

So where are we up to?

Well, we have been sneakily working on this in the background for a few weeks now, as we wanted to hold off until the project had clearly advanced to a point that proved its feasibility, and provided some minimal level of utility. As hinted at above, our idea of minimum utility is the ability to LOAD and RUN assembly-language based software in a manner that feels totally familiar and functional.

And this we have achieved. There are lots of things still missing, like expression parsing and almost all BASIC commands, and a surprising number of bits and pieces in both BASIC and the KERNAL that are not required by a reasonable range of software. Also, things like RS232 and cassette support are very low on our priority list, as any real C64 has its original ROMs, and any emulator or FPGA-based C64-compatible computer worth its salt will have some kind of bulk storage on hand.

But this is perhaps best explained visually. The following videos and images show the progress we have made so far, and a number of old and new software titles that can already run using our ROMs. Also, as a reminder, this is all running on a stock C64 (well, in VICE's C64 emulator). It does not need the MEGA65 in any way (although of course being able to include the ROMs in the MEGA65 is one of the many reasons for creating them).

The source code is at https://github.com/MEGA65/open-roms.

If you want to try the ROMs out yourself in your favourite emulator, you can get the files from here.

In many ways the hardest work is already done: getting this project off the ground, and getting a minimally functioning KERNAL and BASIC interpreter. However, there is still much to do and much to be implemented. We are thus looking for contributors who would be willing to help us implement the missing functionality and improve compatibility.

The next post in this series is here - reducing the attack surface for legal attacks.
