Wednesday, May 8, 2019

Working to reduce the attack surface for copyright infringement claims against our open-source C64 ROMs

As I explained yesterday, we have begun making an open-source set of ROMs that can be run on a C64 or compatible computer.

Today, I want to explain a little about some of the specific measures we are taking to avoid any possibility of copyright infringement. In particular, we are going what we believe to be above-and-beyond to ensure that our alternate ROMs are free of any potential claim of copyright infringement.

There are two reasons for this, that rest in the primary reasons for establishing this project in the first place:

1. To ensure that the rights holders of the original C64 ROMs can quickly determine that they don't need to worry about us infringing on their proprietary rights. We don't want to waste their time or effort.

2. To ensure that users of our alternate ROMs can do so with maximum confidence.  I say maximum and not complete confidence, because with anything legal, nothing is truly certain.

In short, the project exists to protect the rights holders of the original C64 ROMs interests, by giving the community a clear option for running emulators and C64 compatible computers, without potentially infringing on any proprietary rights.

So, with this in view, our approach is one of an "abundance of caution".  That is, in many places and ways, we are being way more cautious than we believe the law would require us to be.  This is, again, so that we can establish a clear moat around our work, so that as far as is possible, all doubts can be excluded, both for the original C64 ROM rights holders, and for us. 

Simply having an argument that our work is free of infringements is not enough for this approach, however sound that argument might be. Rather, we want to remove any possibility of arguments that we have infringed, and where possible erect multiple unasaillable arguments that no infringement has occurred. Put another way, we want to have multiple walls around our castle, and at the same time, work carefully to make sure that there are no secret passages, caves, wells leading to underground rivers or any other thing that would undermine the intellectual property fortress we are building.  So lets have a look at some of those defences:

Use source control!

The first, is to simply use source control.  The reason for this is that it creates a history of the creation of our ROMs, complete with all the mess-ups and steps along the way. That history goes back to the first lines of code written, and provides a strong line of evidence for the creation of our ROMs from scratch. 

Thus should any segment of the ROMs end up being similar or identical to that of the original C64 ROMs, we can demonstrate that such similarity occurred through cooincidence or necessity (there are only so many ways to do certain things).

To make this defence as strong as possible, the mantra of "commit early, commit often" is vital, so that the in-between steps as code is written are recorded.  If we only commit finished routines, then there is no evidence of working that could be used to substantiate the claim of originality.

Commit messages should also indicate the reason for implementing new functionality or locating routines at specific addresses, e.g., "Implement routine XXX at location YYY required by game ZZZ".  Thus we end up with commit messages like:

commit 810dfb75afc59a1349a4fc83e962ef8357bc1ee1
Author: Paul Gardner-Stephen <paul@servalproject.org>
Date:   Sun May 5 16:31:24 2019 +0930

    put setup VIC-II routine at $E5A0 for Advanced Pinball Simulator. Issue #23
This is a good message, because it explains what was done in terms of an action that would natrually increase similarity with the original, in this case by putting a routine at the same position as in the original ROM, but then justifying that by identifying a piece of software ("Advanced Pinball Simulator" in this case) that requires it to be there, typically because the software directly calls that address, and tags an issue where the compatibility problem was reported.

That is, we start with a simple ROM re-implementation has is so manifestly different and incomplete that it is obviously not an infringing work, and then refine it over time based on clear compatibility improvement justifications, so that similarities with the original ROMs over time will be explained at every step along the way.

Indicate sources, causes and reasons in comments in the source

Also, all through the source we should indicate the source of information that led to the writing of a particular snippet of code.  That might be a page out of the C64 Programmer's Reference Guide, Compute's Mapping the 64, or evidence gained from running an emulator or instrumenting a real C64 to discover the entry points into the ROMs used by software that we wish to be compatible with.  This helps to remove any claim that we have simply copied functionality from the C64 ROMs by copying code. 

Instead, by demonstrating that there is a need for a screen-clear routine at $E544 by referencing such sources, we are showing that this is a functional requirement of the C64 ROM, and must be implemented.  Using that routine as an example, look at the number of references in this single routine:

    ;; Clear screen and initialise line link table
    ;; (Compute's Mapping the 64 p215-216)

clear_screen:   
   
    ;; Clear line link table
    ;; (Compute's Mapping the 64 p215)

    lda #$00
    ldy #24
clearscreen_l1:   
    sta screen_line_link_table,y
    dey
    bpl clearscreen_l1

    ;; Y now = #$FF

   
    ;; Clear screen RAM.
    ;; We should do this at HIBASE, which annoyingly
    ;; is no ZP, so we need to make a vector
    ;; (Compute's Mapping the 64 p216)
    ;; Get pointer to the screen into current_screen_line_ptr
    ;; as it is the first appropriate place for it found when
    ;; searching through the ZP allocations listed in
    ;; Compute's Mapping the 64
    sta current_screen_line_ptr+0
    lda hibase
    sta current_screen_line_ptr+1
    ldx #$03        ; countdown for pages to update
    iny             ; Y now = #$00
    lda #$20        ; space character
clearscreen_l2:
    sta (current_screen_line_ptr),y
    iny
    bne clearscreen_l2
    ;; To draw only 1000 bytes, add 250 to address each time
    lda current_screen_line_ptr
    clc
    adc #<250
    sta current_screen_line_ptr
    lda current_screen_line_ptr+1
    adc #>250
    sta current_screen_line_ptr+1
    lda #$20        ; get space character again
    dex
    bpl clearscreen_l2

    ;; Clear colour RAM
    ;; (Compute's Mapping the 64 p216)
    lda text_colour
clearscreen_l3:   
    sta $d800,y
    sta $d900,y
    sta $da00,y
    sta $db00-24,y        ; so we only erase 1000 bytes
    iny
    bne clearscreen_l3

    ;; (Compute's Mapping the 64 p216)
    jmp home_cursor

As can be seen, almost every single action in the routine is justified in terms of public information about what this routine must do.

Automatically scan for any identical byte sequences of > 2 bytes

Finally, we have created a tool that looks for any string of at least 3 bytes length that matches between two files.  We use this to find any byte sequences that match with the original ROMs, even if the matches are not in the original location.  Then for each such match, we provide an explanation in a file in the strings/ directory, where the name of the file is the sequence of matching bytes written in hexadecimal. 

Once an explanation has been provided in one of these files, the match is no longer reported by the tool, unless run with --verbose, in which case all the string explanations are shown. This allows for a quick report to be generated that provides an explanation of every significant match. 

There are strong arguments why considering matches of only length 3 bytes is excessive, as it is practically impossible to imagine a three byte sequence in the C64 ROMs that could be copyright.  But again, out of an abundance of caution we explain even those matches, including when it is just random fragments of CPU instructions, so that a cursory examination of our software can in just a few minutes hopefully satisfy any rights-holder that no infringement has occurred.

This library of reasons also provide a strong defence in the event that any claim of infringement were nonetheless to be made:  First, it demonstrates our candour, in that we are not hiding the matches, but making them public -- including to any rights holders. Second, when a claim of infringement is made, it makes it very easy for us to respond by asking which bytes are infringing, and to then point the claimant at the reasoning why those bytes do not constitute an infringement.  The onus is then on them to justify why the explanation is not adequate, and it is likely that any court would accept an argument from our side for any claim to be immediately dismissed, and perhaps with prejudice (which means that they can't raise the same claim again), because we will have ready-at-hand prima-facie evidence that there is in fact no infringement.

As an example of how comprehensive even the short-form report is, that displays only the first line of information about why there is no infringement is, here is the current output when comparing the original KERNAL and our ROMs at the time of writing:

$ src/similarity kernal newrom verbose
Searching files for similarities...
Ignoring $0012 = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $0017 = $47A6 + 3 (fragment / SEC / SBC #$xx)
Ignoring $0018 = $02F2 + 3 (SEC / SBC #$01 - subtract 1 from A.)
Ignoring $0112 = $0BDE + 4 (JSR $FFCF / branch based on C flag)
Ignoring $0122 = $4926 + 3 (Fragment / RTS / JSR)
Ignoring $0136 = $1018 + 4 (Push memory location to stack)
Ignoring $013C = $101E + 4 (Fragment + Load X from memory location.)
Ignoring $01A7 = $0B82 + 4 (Store X and Y in ZP location pair)
Ignoring $020A = $4CBB + 3 (fragment / PLA / PLA)
Ignoring $0299 = $059F + 3 (EOR #$FF / STA $xx)
Ignoring $02ED = $09AD + 3 ($05 byte after some leading $00s)
Ignoring $037D = $0A23 + 4 (Instruction fragment, put zero into ZP location somewhere.)
Ignoring $038C = $029C + 3 (Conditionally JMP somewhere.)
Ignoring $03A2 = $02E9 + 6 (increment a pointer in ZP)
Ignoring $03B3 = $0D6A + 3 (substract $30 from A. Not copyrightable.)
Ignoring $040A = $0F9A + 4 (SEC / JSR $FF99 - Read top of memory using public KERNAL API)
Ignoring $0417 = $0F93 + 4 (fragment + TYA + STA ($nn),Y)
Ignoring $0430 = $0531 + 4 (subtract something from a ZP location)
Ignoring $0435 = $0536 + 4 (Subtract the contents of one ZP location from the contents of another.)
Ignoring $0461 = $0008 + 3 (First three letters of BASIC)
Ignoring $0462 = $055A + 5 ("ASIC ")
Ignoring $048B = $0008 + 3 (First three letters of BASIC)
Ignoring $048C = $055A + 7 (BASIC V2 string)
Ignoring $04DA = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $04DB = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $04DC = $4520 + 3 (Fragment / STA ($F3),Y)
Ignoring $0518 = $4A3A + 5 (Setup VIC-II registers and load A with 0)
Ignoring $0574 = $46B0 + 5 (CLC / ADC #40 / STA $D3)
Ignoring $0575 = $48CB + 4 (ADC #40 / STA $D3)
Ignoring $0594 = $4B9B + 3 (Fragment of JMP instruction.)
Ignoring $0599 = $4A39 + 4 (instruction fragment followed by JSR to VIC-II register setup.)
Ignoring $05BB = $4A03 + 5 (fragment / STA $0277,X / INX)
Ignoring $05C8 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $0606 = $45EC + 4 (LDA ($D1),Y / CMP #$20)
Ignoring $060E = $49D7 + 4 (fragment / INY / STY $C8)
Ignoring $0632 = $4E29 + 3 (Fragment of sequence to save registers on the stack. Not copyrightable.)
Ignoring $0633 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $0634 = $498B + 5 (TXA / PHA / LDA $D0 / BRANCH somewhere)
Ignoring $063A = $44FA + 5 (LDY $D3 / LDA ($D1),Y / STA $xx)
Ignoring $0654 = $46C2 + 3 (INC $D3 / JSR somewhere)
Ignoring $0676 = $4A83 + 3 (PLA / TAX / PLA)
Ignoring $0680 = $5193 + 4 (LDA #$FF / CLC / RTS)
Ignoring $0682 = $4C8B + 3 (CLC / RTS / CMP #$xx)
Ignoring $069A = $472F + 5 ( FRagment / BRANCH <skip next instruction> / DEC $D8)
Ignoring $06AA = $472E + 4 (LDA $D8 / BEQ <skip 2 byte instruction>)
Ignoring $06B0 = $4A83 + 3 (PLA / TAX / PLA)
Ignoring $06B3 = $4EBA + 3 (CLC / CLI / RTS)
Ignoring $06E8 = $46B0 + 4 (CLC / ADC #40 / STA $nn)
Ignoring $06F6 = $480E + 3 (fragment / DEC $D6)
Ignoring $06F7 = $4631 + 3 (DEC $D6 / JSR $nnnn)
Ignoring $0717 = $45B7 + 6 (STA $D7 / TXA / PHA / TYA / PHA)
Ignoring $0719 = $5119 + 4 (Preserve registers on stack)
Ignoring $071A = $0A4B + 4 (Push Y and A onto the stack. Load A with something.)
Ignoring $0729 = $45D2 + 4 (fragment + branch based on comparison of A with constant.)
Ignoring $072E = $45DD + 4 (JSR (output carriage return) + fragment)
Ignoring $0762 = $464A + 17 (Copy screen + colour RAM to the left one place)
Ignoring $0773 = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $0776 = $0B80 + 3 (Fragments of instructions. Not copyrightable.)
Ignoring $0777 = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $0778 = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $0779 = $47EB + 3 (Fragment / STA ($F3),Y)
Ignoring $07A1 = $4631 + 3 (DEC $D6 / JSR $nnnn)
Ignoring $07F4 = $45EC + 4 (LDA ($D1),Y / CMP #$20)
Ignoring $080A = $460A + 7 (Copy screen + colour RAM to the right one place)
Ignoring $0810 = $4604 + 7 (Copy screen + colour RAM to the right one place)
Ignoring $0816 = $4610 + 5 (DEY / CPY $D3 / BNE <backwards>)
Ignoring $081B = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $081E = $0B80 + 3 (Fragments of instructions. Not copyrightable.)
Ignoring $081F = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $0820 = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $0821 = $47EB + 3 (Fragment / STA ($F3),Y)
Ignoring $083C = $46CE + 5 (LDA $D3 / SEC / SBC #40)
Ignoring $083D = $48FA + 4 (Fragment / SEC / SBC #40)
Ignoring $083E = $47A7 + 3 (SEC / SBC #40)
Ignoring $084F = $48D7 + 4 (Fragment followed by LDA #$00 / STA $xx)
Ignoring $086D = $46FA + 5 (JSR $E544 (clear screen) surrounded by fragments)
Ignoring $0880 = $491B + 4 (INX / CPX #$19 / BRANCH somewhere)
Ignoring $08A8 = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $08B5 = $48F3 + 4 (LDA #$27 / CMP $D3)
Ignoring $08BA = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $08DA = $497B + 16 (List of C64 colour codes)
Ignoring $0916 = $4914 + 4 (LDX #$00 / LDA $D9,X)
Ignoring $09CB = $4551 + 3 (SOMETHING $0288 / LDA )
Ignoring $09D4 = $47B4 + 4 (LDA ($AC),Y / STA ($D1),Y)
Ignoring $09D8 = $47B0 + 4 (LDA ($AE),Y / STA ($F3),Y)
Ignoring $09E5 = $47D0 + 4 (LDA $AE / STA $AD)
Ignoring $09FA = $4551 + 4 (Most likely LDA $0288 / STA $D2 - Set upper half of screen RAM address)
Ignoring $0A0A = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $0A1C = $451A + 4 (LDY $D3 /  STA ($D1),Y)
Ignoring $0A24 = $4950 + 6 (STA $D1 / LDA $F3 / STA $D2 - manipulate screen pointers)
Ignoring $0A52 = $4504 + 5 (LDA ($F3),Y / STA $0287)
Ignoring $0A7F = $4A7F + 10 (Tail end of standard CIA-triggered interrupt, followed by start of another routine)
Ignoring $0A81 = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $0A82 = $4762 + 3 (PLA / TAY / PLA / TAX sequence)
Ignoring $0A88 = $4B09 + 4 (Fragment + STA $028D)
Ignoring $0AAA = $4B87 + 4 (PHA / LDA $DC01)
Ignoring $0AC1 = $4BBB + 6 (ORA $028D / STA $028D)
Ignoring $0AC3 = $4BCE + 4 (Fragment + STA $028D)
Ignoring $0B35 = $4ABF + 5 (Use X register to compare contents of a ZP and absolute address location)
Ignoring $0B3C = $4A04 + 4 (STA somewhere offset by X, increment X -- Loop fragment)
Ignoring $0B47 = $4B01 + 4 (RTS + LDA $028D)
Ignoring $0B49 = $4BEE + 5 (something with $028D / CMP #$03 / BNE somewhere)
Ignoring $0B59 = $4C00 + 8 (LDA $D018 / EOR #$02 / STA $D018)
Ignoring $0B80 = $4CC4 + 16 (List of key codes generated when certain key combinations are pressed)
Ignoring $0B91 = $4CD5 + 36 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BB6 = $4CFA + 5 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BC2 = $4D05 + 9 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BC3 = $4D46 + 7 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BED = $4D30 + 9 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BF7 = $4D3A + 5 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C03 = $4D05 + 8 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C04 = $4D46 + 14 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C13 = $4D55 + 16 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C24 = $4D66 + 19 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C39 = $4D7B + 4 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C80 = $4D8D + 7 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C88 = $4D95 + 24 (List of key codes generated when certain key combinations are pressed)
Ignoring $0CCD = $0999 + 4 (A single $08 byte in a field of $00's)
Ignoring $0CDF = $0033 + 3 (3 bytes of ascending value)
Ignoring $0CE7 = $4E74 + 4 (The string "LOAD". Not copyrightable)
Ignoring $0D2E = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0D2F = $51B2 + 7 (Set a bit in $DD00)
Ignoring $0D85 = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0D89 = $51AB + 5 (set a bit in $DD00, Return from subroutine, read $DD00)
Ignoring $0D9B = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0D9E = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0DBF = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0DC0 = $5180 + 6 (clear a bit in $DD00)
Ignoring $0DC4 = $51A3 + 3 (Do something with $DD00 and return from sub-routine.)
Ignoring $0DEE = $516D + 3 (RTS / SEI / JSR somewhere)
Ignoring $0DF3 = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0DF4 = $51B2 + 7 (Set a bit in $DD00)
Ignoring $0E2F = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0E83 = $50F5 + 3 (CLC / RTS / JSR $xxxx)
Ignoring $0E84 = $50D3 + 6 (Return from sub-routine, clear a bit in $DD00)
Ignoring $0E86 = $50DF + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E89 = $4FF6 + 3 (instruction fragments.)
Ignoring $0E8B = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E8C = $51AF + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E8D = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0E8E = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0E8F = $51BB + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E91 = $51AB + 9 (set a bit in $DD00, Return from subroutine, read $DD00)
Ignoring $0E94 = $51A3 + 7 (Do something with $DD00, return from subroutine. Do something else with $DD00)
Ignoring $0E95 = $51CA + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E96 = $50D3 + 5 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0E98 = $50DF + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E9D = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E9E = $51AF + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E9F = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0EA0 = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0EA3 = $50D9 + 5 (Set a bit in $DD00.)
Ignoring $0EA6 = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0EA7 = $51AF + 5 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0EA8 = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $1011 = $0295 + 3 (Instruction fragment + CLC + RTS)
Ignoring $1012 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1047 = $5DBF + 4 ($nn -> $nnnn)
Ignoring $1084 = $50F5 + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1091 = $5181 + 3 (clear a bit and store result some where.)
Ignoring $109C = $51B4 + 3 (Or A register with $08 and store result somewhere.)
Ignoring $10C5 = $4DE5 + 3 ("OR " string, being part of "ERROR ")
Ignoring $10CA = $4DDA + 9 (The string "SEARCHING". Not copyrightable.)
Ignoring $10D0 = $4E78 + 3 ("ING", fragment of "SEARCHING FOR". Not copyrightable.)
Ignoring $10D4 = $4DE4 + 3 (The string "FOR". Not copyrightable.)
Ignoring $1107 = $4E74 + 6 (The string "LOADIN", part of LOADING. Not copyrightable.)
Ignoring $1112 = $4DE0 + 3 ("ING", fragment of "SEARCHING FOR". Not copyrightable.)
Ignoring $1136 = $0988 + 3 (Part of JSR $FFD2 / INX)
Ignoring $113C = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $1155 = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $11AB = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $11D0 = $500B + 3 (Instruction fragment, PLA, instruction fragment. Single instruction. Not copyrightable.)
Ignoring $11D6 = $500B + 3 (Instruction fragment, PLA, instruction fragment. Single instruction. Not copyrightable.)
Ignoring $11DF = $45B9 + 4 (Preserve registers on stack)
Ignoring $11E0 = $0A4B + 3 (PHA / TYA / PHA )
Ignoring $11FD = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $11FE = $4762 + 4 (Restore Y and X from stack, load A from somewhere.)
Ignoring $1294 = $0294 + 4 (conditionally execute CLC + RTS, i.e., conditionally return success from a routine.)
Ignoring $1295 = $0ADA + 3 (Instruction fragment + CLC + RTS)
Ignoring $1296 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $129B = $498B + 3 (TXA / PHA / LDA $nn)
Ignoring $130C = $0295 + 3 (Instruction fragment + CLC + RTS)
Ignoring $130D = $0D22 + 5 (Return from routine with success, Store $00 somewhere in ZP.)
Ignoring $130E = $0758 + 4 (End of routine followed by $00 -> ZP location)
Ignoring $132E = $0758 + 4 (End of routine followed by $00 -> ZP location)
Ignoring $13D3 = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $13F1 = $4CBC + 3 (PLA / PLA / JMP)
Ignoring $1418 = $097D + 4 (branch based on whether X register holds the number 4.)
Ignoring $1483 = $5DAB + 4 (Put $7F -> $xx0D)
Ignoring $14FB = $0301 + 4 (fragment / BNE <skip following JMP instruction> / JMP somewhere)
Ignoring $15CA = $0988 + 3 (Part of JSR $FFD2 / INX)
Ignoring $164D = $50D7 + 3 (Clear a bit in A and do something with it.)
Ignoring $168D = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $16CE = $4B24 + 5 (Fragment, CPX $DC01, BRANCH)
Ignoring $16CF = $4B7B + 4 (Fragment, CPX $DC01, BRANCH)
Ignoring $1724 = $0985 + 5 (Or A with $30 and print result)
Ignoring $1727 = $0690 + 3 (Probably JSR $FFD2 followed by PLA)
Ignoring $172A = $0298 + 3 (SEC + RTS + fragment of next routine)
Ignoring $175A = $0988 + 4 (fragment of string printing loop (JSR $FFD2 / INX / CPX ... ))
Ignoring $17A7 = $4DEB + 4 (take a branch based on comparison of Y register and memory.)
Ignoring $180B = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1836 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1947 = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $1A1A = $029C + 3 (Conditionally JMP somewhere.)
Ignoring $1A49 = $45D6 + 4 (Fragment + store $00 somewhere into ZP.)
Ignoring $1A85 = $48D7 + 4 (Fragment followed by LDA #$00 / STA $xx)
Ignoring $1BB6 = $5DC3 + 5 (Read from $DC0D to clear CIA interrupts, surrounded by instruction fragments.)
Ignoring $1C63 = $45BD + 3 (fragment + SEI + JSR fragment)
Ignoring $1CF2 = $458E + 4 (JSR $FDA3 (SCAN KEYBOARD) , JSR somewhere else)
Ignoring $1CFE = $459A + 4 (Clear C, jump into BASIC ROM to start BASIC.)
Ignoring $1D0C = $514A + 4 (End of loop fragment (branch backwards based on X, then return when done).)
Ignoring $1D10 = $4A73 + 5 (CBM80 cartridge signature)
Ignoring $1D55 = $080E + 3 (instruction fragments. not copyrightable.)
Ignoring $1D57 = $05DE + 4 (fragments of instructions. Not copyrightable.)
Ignoring $1D58 = $4573 + 3 (part of two instructions $02 STA $xx00,Y)
Ignoring $1D8A = $4EB8 + 3 (Load Y from ZP. Clear carry flag. Simple register manipulations.)
Ignoring $1DE4 = $5DB2 + 4 (STA $DC04 / LDA #$xx sequence)
Ignoring $1DEE = $5DB2 + 4 (STA $DC04 / LDA #$xx sequence)
Ignoring $1E2B = $5168 + 5 (Do something to $0284 followed by STX $0283)
Ignoring $1E2C = $4FD8 + 4 (something followed by STX $0283)
Ignoring $1E31 = $4FFD + 3 (tail end of access to $0284, followed by RTS)
Ignoring $1E36 = $5150 + 6 (Load X and Y from pointer at $0281)
Ignoring $1E40 = $5154 + 3 (tail end of access to $0282, followed by RTS)
Ignoring $1E41 = $516C + 3 (Fragment, RTS, Disable interrupts. Fragment of end and start of routines.)
Ignoring $1E47 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $1E48 = $45B9 + 4 (Preserve registers on stack)
Ignoring $1E49 = $0A4B + 4 (Push Y and A onto the stack. Load A with something.)
Ignoring $1E4C = $5DAB + 4 (Put $7F -> $xx0D)
Ignoring $1E69 = $458E + 4 (JSR $FDA3 (SCAN KEYBOARD) , JSR somewhere else)
Ignoring $1E7C = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $1E83 = $50DC + 3 (Do something with $DD00, then read something from memory. Only instruction fragments.)
Ignoring $1EBC = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $1EBD = $4762 + 3 (PLA / TAY / PLA / TAX sequence)
Ignoring $1EBE = $4A83 + 4 (PLA / TAX / PLA / RTI)
Ignoring $1F48 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $1F49 = $45B9 + 4 (Preserve registers on stack)
Ignoring $1F4A = $0A4B + 3 (PHA / TYA / PHA )
Ignoring $1F79 = $5DC0 + 4 (Do something with $11, then write to $DC0E)
Ignoring $1F84 = $5F84 + 4 (Jump to IOINIT routine. Not copyrightable.)
Ignoring $1F9F = $5F9F + 4 (Jump to keyboard scan routine ($EA87) + instruction fragment.)
Ignoring $1FA0 = $4A35 + 3 (Fragments of instructions. Not copyrightable.)


Then within each of those files, more detailed information can be found, often with references, for example:

Preserve registers on stack

This is the standard form of saving the A X and Y registers on the stack

PHA
TXA
PHA
TYA
PHA

See, for example:

http://6502.org/tutorials/register_preservation.html

Not copyrightable.

We see an explanation as to why this is just boiler plate, and then a reference to a 3rd party source that indicates that this is common practice, and therefore cannot be the proprietary property of the rights holders of the C64 ROMs.

These are the defences we have right now, but we are also planning others:

Comparison with non-Commodore 6502 Microsoft BASIC

The original C64 BASIC was actually derived from a BASIC interpretor written by Microsoft and licensed by Commodore.  This means that Commodore and its successors do not own the copyright in those parts that are Microsoft BASIC. We can easily test this, by searching for matching strings also in "negative libraries" of files that were not written by Commodore.

Automatic internet searching for byte sequences to find other instances

We can also generalise this approach by implementing automatic internet searches, to find 3rd party instances of matching byte sequences, again as evidence that the matches are not the result of infringement of the C64 ROM's rights owner's copyrights.

Can you think of any other techniques that we can apply to add even more defence-through-depth?


Tuesday, May 7, 2019

Free and Open-Source Replacement ROMs for the C64



While this blog is usually about things for the MEGA65, this post is actually about something for stock standard C64s, and more the point, for emulators, and all re-creations: Free and open-source replacement ROMs, that can be used, modified and distributed by the general public, so that, for example, emulators can ship with fully legal ROMs, without having to be troubled by costs or legal complexities in terms of licensing.

But first, let's step back a bit, and look at the current situation.

The Commodore 64 as we all know uses three ROM parts: The KERNAL, BASIC and the character ROM.  These are all different sizes, but together make up the 20KB of total ROM that a C64 needs to operate.  Some of you will at this point be saying to yourselves, "no, the KERNAL and BASIC ROMs are the same size".  This is actually only a generalisation, because the KERNAL is actually only about 6.5KiB, and BASIC is about 9.5KiB, and uses the bottom 1.5KiB or so of the "KERNAL" ROM.

Anyway, this means that there are these three parts that have to be replaced in order to make a C64 or compatible computer come to life.

The character ROM I have already talked about. Basically it is highly doubtful that a copyright infringement suit could be bought against a user of the font. For a start, in countries like the USA, it simply isn't possible to copyright a bitmap font.  Then given the 8x8 size, there aren't many options for implementing most of the symbols, specially the line and block ones.  Add to that that the symbols have now been added to Unicode, and the long-standing lack of enforcement against distribution of any C64 ROMs, and it really looks like the character ROM isn't a big drama.  Of course, we have also effectively solved this problem by making our own complete char ROM based on a combination of hand-drawn symbols and hand-touched characters from the public domain VGA 8x8 font.  It isn't perfect, but it works.   So we have the 4KB character ROM already under control.

Now, the KERNAL and BASIC are much more interesting beasts.  The KERNAL implements the screen editor, keyboard scanning logic and IEC serial communications protocol, along with a few other bits and pieces.  Then BASIC uses the KERNAL's APIs to provide the familiar BASIC interpretor, which itself has quite a lot of complexity, with the line tokeniser and de-tokeniser, expression parser, variable management, commands, functions and operators.

Also, to have even a minimally working system, that would let you load and run a game or other program that was written in assembly language, you still need the BASIC tokeniser, LOAD, RUN and SYS commands at a bare minimum, with LIST also being practically essential, so that you can actually see what is on a disk.

Then, like the character ROM, we have the problem of how to create new ROMs that are non-infringing on the intellectual property rights of the rights-holders of the C64 ROMs.  This requires considerable care and thought.

The gold-standard for such endeavours is to have one team produce detailed specifications of the software being recreated, and another team implementing it.  Fortunately, with books like Compute's Mapping the 64, we actually have the specification effectively written for us back in the 80s.

This means that we can potentially implement the KERNAL and BASIC ROM functionality using such resources as a guide, and here is the important part, without looking at the C64's ROMs while writing them.

There is a residual risk that because the C64 ROMs are everywhere, and anyone likely to be inclined to write their own ROMs will have been exposed to them, it is very hard to enforce a true "clean room" reimplementation.  However, I think that it is still possible, provided that sufficient care is taken.

Basically the challenge is to have a development process that is transparent and makes it unambiguously obvious to any observer, that no infringement is being made of the original ROMs, and that all code being written is being freshly produced.  Here in many ways our audience is the rights holders to the original ROMs -- we want to make their job of assessing whether we are infringing their rights or not super easy.  We don't want anyone having to waste time and effort on lawyers that will only make everyone poor and sad. Thus it makes sense to take an approach that integrates an "abundance of caution" at every stage, so that all mess can be avoided.  This will hopefully also be clear from the outset, since the whole point of this project is to respect the intellectual property rights of the copyright owners of the C64 ROMs. That is, if we didn't care about their rights, we would just use the original C64 ROMs that are available for free download all over the internet like everyone else.

So, back to planning a process, here is the general process that we have come up with:
  1. Begin with the immutable starting point of the 6502 reset entries, IRQ entry and NMI entries, and the rest of the ROM being empty.  This starting point can have no copyright problems.
  2. Based on the public calling interface of the C64 KERNAL as documented in the C64 Programmer's Reference Guide, make stub routines for the jump table.
  3. All routines begin at the lowest address in the KERNAL, sorted by routine name.  Thus the order of the routines is deterministic, and not the result of any creative process.
  4. Implement publicly documented routines, using secondary sources, such as books about the C64, but without refering to the 64 ROM contents themselves. 
  5. Run test programs using the C64 KERNAL, and collect entry points into the ROMs.
  6. Where an entry point does not correspond to a public API of the KERNAL, research the function by searching for it in Google. Implement it according to those references.
  7. Where an entry point means that previously implemented routines have to be moved to make space at a specific address, move only those routines required to do so, to the next available address.
  8. Where understanding of the inner workings of a routine are required to replicate it, secondary sources, such as the "Mapping the C64" or "C64 Programmer's Reference Guide" should be used. When those do not provide the answer, internet searches based on the name of the routine should be done, and failing that, based on the routine's address if it has no well known name or insufficient material is turned up.  Reference to actual disassemblies of the ROMs is not to be made, to ensure that we have strong defences against any claim of copyright infringement.
A similar process should be followed for the BASIC ROM.

To help with this, I have created a framework that allows a ROM to be compiled from a collection of assembly files, which get linked together to produce the final ROM. This helps to compartmentalise the work, and with careful design of the framework, makes it very easy to move routines around and assign them fixed locations as the research of the secondary sources and the entry points are discovered from running programs and tracing their entry into the ROMs.

This framework turned out to be quite simple. I used the Ophis assembler, as I am already quite familiar with it, and it has a handy pair of pragmas that make it quite convenient to fix the location of a routine, .checkpc and .advance.  These can be used together to make sure that a routine will be located at an exact address, and will complain if there isn't enough space.  To help pack the routines into the free space around the routines, the framework implements a greedy packing algorithm that places the largest un-placed routine into each free space, until the free space is full.  There is room to improve this, for example by placing exactly the right sized routines into spaces, but that can wait until necessitated by the ROM filling up as we implement the last few features at the end.

The adage of "commit early and commit often" is especially true for this project, because we want the source control history to be strong evidence that we have developed each routine ourselves from scratch, and not copied from the C64 ROMs.  Thus commits when things are half-working and half-baked are especially important, as they document this implementation process.

We are also purposely using quite different algorithms and methods for some key parts of the system, so that there is even stronger evidence against infringement. So for example, the BASIC keyword list and tokeniser are implemented using a simple compression scheme for the BASIC keywords.  This not only saves a bit of space, it also means that the BASIC keyword list is not present in the ROM in the same format as the original (even though as a list of facts, it is not copyrightable), and the algorithm for searching for keywords in the compressed list is by necessity an entirely new work: There would be no point in deriving it from the C64 ROM's tokeniser.

Similarly, the keyboard scanner in the KERNAL is based on a publicly documented improved keyboard scanner, that supports multi-key roll-over and rejection of spurious joystick input. In this way, once again, we end up with a routine that has a demonstrably independent ancestory, and offers some nice improvements. We even expanded it slightly, so that the joystick can be used to move the cursor.

For the BASIC interpreter, we also decided to implement banking support from the outset, so that more than 38KiB would be available for BASIC.  The KERNAL LOAD routine was also improved to support loading files bigger than 202 blocks, without writing over the IO area.  Just like the improved keyboard scanner, the result is clearly a new and fresh implementation, and one that brings advantages along with it.

That is, our goal is not to create a 100% identical C64 ROM set, but rather a highly compatible and pleasant to use set of alternate ROMs for C64-compatible computers, and that are free for inclusion in emulators, FPGA-based computers and other projects that would like a C64-style environment, without the legal hazards that come from using the C64's own original ROMs.

So where are we up to?

Well, we have been sneakily working on this in the background for a few weeks now, as we wanted to hold-off until the project had clearly advanced to a point that proved its feasibility, and provided some minimal level of utility. As hinted at above, our idea of minimum utility is the ability to LOAD and RUN assembly-language based software in a manner that feels totally familiar and functional.

And this we have achieved. There are lots of things still missing, like expression parsing and almost all BASIC commands, and a surprising number of bits and pieces in both BASIC and the KERNAL that are not required by a reasonable range of software.  Also, things like RS232 and cassette support are very low on our priority list, as any real C64 has its original ROMs, and any emulator or FPGA-based C64-compatible computer worth its salt will have some kind of bulk storage on hand.

But this is perhaps best explained visually.  The following videos and images show the current progress we have achieved, and shows a number of old and new software titles that can already run using our ROMs.  Also, as a reminder, this is all running on a stock C64 (well, in VICE's C64 emulator). It does not need the MEGA65 in any way (although of course being able to include the ROMs in the MEGA65 is one of the many reasons for creating them).






The source code is at https://github.com/MEGA65/open-roms.

If you want to try the ROMs out yourself in your favourite emulator, you can get the files from here.

In many ways the hardest work is already done, to get this project off the ground, and get minimally functioning KERNAL and BASIC interpreter.  However, there is still much to do and much to be implemented.  We are thus looking for contributors who would be willing to help us implement the missing functionality and improve compatibility.

The next post in this series is here - reducing the attack surface for legal attacks.

Saturday, March 30, 2019

Name Plates for the MEGA65 pre-production units have arrived!

So, the next fun part of the pre-production run for the MEGA65 computers has arrived - the name plates!  We thought a lot about how we would like these to look, and are generally quite happy with the result. We hope that you will all like them, too:



These are not stickers, but are 0.5mm thick aluminium plates, with nice rounded corners, and 3M adhesive to hold them in place.  Thee writing on the top is actually raised up as well, as we really wanted something that is simply a delight to behold.  We were very pleased that they fit perfectly in the slots on the SLA 3D printed cases we produced a while back, and look very much forward to fit-testing them on the new pre-production cases when those arrive :)


Friday, March 29, 2019

MEGAphone First Prototype PCB remediation and first light

We began by testing that the voltage levels on the board were correct. This is important, because the MEGAphone is designed to run on battery, and thus has step-up/step-down regulators to provide the various power supplies.

I say the various power supplies, because the MEGA65 has six switchable power supplies, so that you can selectively power down sub-systems based on the security level of what you are doing. For example, you can power down the bluetooth and wifi, so that nothing can attack you that way. Similarly, you can power down the microphones, so that there is no way at all of someone listening in on you from afar.  The cellular modems and LoRa radios can also be separately powered down.

So, anyway, this means we have 7 voltage regulators in total, as the FPGA also has its own power supply, but no switch to control it -- that is instead done via soft control from the FPGA, via a power control line.

In our initial testing, we found that the rail for the FPGA board was 6.45V instead of 3.3V.  Fortunately I had the presence of mind to not plug it in first time.  We eventually tracked the problem down to a resistor that had come loose during the hand-soldering process. This won't be an issue  in production, because all assembly will be fully automated.

In the process of trying to fix it, I did accidentally lift one of the pads of the resistor from the board.  Needless to say, I wasn't real pleased. I was able to bodge it up with a couple of fly wires, and some lovely full-size resistors to create the correct overall resistance, as we didn't have the exact value resistor on hand.  Bodge is indeed the correct word here. I meant to add a photo, but forgot to upload it.


With that solved, it was time to make sure we can program the FPGA board.  This took a little fiddling, as I was trying it not-plugged-into the MEGAphone board, and it turns out you have to set a dip-switch setting for the little programming board to get power from the USB port.  With that figured out, I was able to fairly quickly get a working blinking-LED bitstream running.

Then from there, reset the dipswitch, it was time to insert into the MEGAphone board, and program the FPGA.  Success first time: blinking LED on the FPGA board.

Ok.  So, from here it gets a bit more interesting, as we need to map the FPGA pins to the TE0725 FPGA module pins, and from there to the MEGAphone schematic.  The TE0725 is designed to be rotateable, in that the connectors are symmetric, allowing us to slecxt the preferred orientation of the board.  I have opted to have it oriented so that the programming cable can stick out the side.

Now for a little detective work to match up the pin numbers. This would be a whole lot easier, if we had a directly connected LED on the MEGAphone board, but we don't.

We do have some fairly easily probable lines to the WiFi adapter, though. So that might be the best way to work out which end of the connectors is pin 1, and match everything through.  Once we have the pin order more or less under control, I can try synthesising the entire MEGA65 design, with the VGA output enabled.  That will be a happy moment, to see the MEGA65 boot screen.

So the WiFi UART lines are on connector B pins 44 and 48, according to the MEGAphone PCB schematic. According to the TE0725 schematic, this should be FPGA pins L1 and L4, assuming the orientations are around the right way.  So let's try to make them toggle between high and low. To tell them apart, I will make them toggle at different rates: 0.5Hz for L1 = WIFIRX, and 2.5Hz for L4 = WIFITX.

Fortunately synthesising such a simple design doesn't take too long, but it does still somewhat inexplicably take around 5 minutes.  It feels like I could build the circuit by hand faster!

Ok. Bitstream synthesised.  Now we have the extra fun that the power control circuitry on the MEGAphone has to be reckoned with: There is a line connected to the FPGA board that allows the power rail to the FPGA to be shutdown.  To turn it on, you have to press the power button, or wait for an interrupt from one of the support chips.  Of course, the power on button is on the front of the board, which is on the back from the perspective of how I have it sitting on the desk.

In trying to flip the board over to get to the button pads, I discovered that the power supply GND line had come loose. Fixed that, and I had power again.  Excellent!

While waiting for synthesis, one other annoying thing I have at the moment, is that the fpgajtag program I use for conveniently programming the FPGA from the command line doesn't seem to work with this particular FPGA JTAG adapter. Oddly, the one I have at home does work in this mode. I don't really have any idea why as yet.

Okay, the new bitstream is finally running, and after fighting with a bad ground on the oscilloscope probe, I can find the signal on pin 48, but on the opposite side and end as expected - so our coordinates are rotated.  No problem, find the pins from the schematic, and try again.

So pin 48, which is one in from the end should become pin 3, so that it is on the opposite side, one pin from the end.  Pin 44 should then become pin 7.  And of course, we have to switch between the two connectors.  So WIFIRX should be L15, and WIFITX should be L23.  But now that we know where L1 and L4 are, we can check continuity on the connector to the WIFI port, and work out that it should actually be pins 4 and 8 instead of 3 and 7 to be on the correct side.

Ok. Now another bit of fun, is that the labels for the header pins on the TE0725 FPGA board aren't actually FPGA pins. So a bit of searching and hair-pulling was required. Actually, quite a bit of both.  But in the end I made a chart that lists the pin assignments:


Then it was time to help some students in another lab, but as luck would have it, there were only a few, so I had some more time to work on this.  So everything went on the trolley, and came down with me.  This is not exactly the intention of a portable MEGA65 to require a trolley, however ;)


But as you can see, I managed to get the VGA output, and also the UART monitor interface correctly connected.  However, the SD card doesn't yet work, so booting to BASIC requires the use of the monitor_load utility to load everything.  We can see in the shot below that the hypervisor is not having much luck finding an SD card:


This monitor also wasn't that thrilled about the HSYNC timing, which I will deal with later.  Here is another shot of the screen after booting to BASIC:





Thursday, March 28, 2019

MEGAphone First Prototype PCB assembly complete(ish)

So, I am back in Australia now, and the workshop folks have ordered and received the main missing parts, and have assembled the board. So let's start by taking a look at it.

First, the front: Nothing too much to see here, except that of course we don't yet have the joystick direction and buttons populated, nor the speaker, volume thumb-wheels or system buttons at the bottom:


Then the rear side, where there is a lot more happening:
 First, we have the yellow insulating tape applied below where the cellular modems will fit. Let's continue the tour, by going through the photos where I have noted problems or missing components.

 Here we can see a couple of ICs are mmissing, which we have noted.  The big one with lots of space between the pins is the ADC for the headphone microphone.  The little 8-pin job near the middle of the image, I don't specifically recall what it is, just that it isn't important for initial testing.

Here we can see the power supply for the FPGA and two other sub-systems near the middle (small black chip with cluster of little components around them).  One of those had a loose resistor, which we had to fix. Also, on the upper left, we can see an 8-pin header. This should be female, and is for the ESP8266 Wi-Fi module.
Immediately above that is the bluetooth chipset, which is quite flexible, and should allow the MEGAphone to use headsets, as well as to act as a set of bluetooth speakers, if you ever had the need.  To the right of that the big white thing is a full sized SIM/SmartCard reader.  I've got plans for that, that I will write about in another post.  Then below that is the wide connector for the LCD panel, and the little white thing somewhat to the left of that is the connector for the touch screen digitiser. We already know that the LCD connector needs too be about 1cm to the right, and that both should be a couple mm lower down on the board. You can also see the pair of LoRa radios, which give tri-band UHF functionality at 434MHz, 868MHz and 915MHz. Those are there because I like resilient telecommunications solutions. It also means it should be possible to play multiplayer games upto several km distance between players :)

 I mentioned the problem with the LCD connector placement, you can see that here in a more practical way:


 Also, it turns out the SIM card slots we got are too tall to sit beneath the modem slots, so we will need to do something about that. The big green thing to the right is the battery connector.






 And we accidentally ordered the wrong gender for the joystick port. So no two-player action for us until we replace it, or make a gender-changer:


Now, back to the LCD screen, we spotted another issue: the 5mm LEDs don't leave enough space for the digitiser to sit flat:

 You can see the problem better here: There is about 1/2 an LED too little space:
 Fortunately, if we switch to 3mm LEDs, we save 2 x ( 5 - 3 ) = 2 x 2 mm, which is enough.  Also, nice red LEDs will look better, anyway. So let's do a fit test with one of the unpopulated boards:






Looks like it should work :)

Okay, so that's a quick walk through the hardware.  Now to try to get it to power up and do something.

MEGAphone First Prototype PCB assembly underway

As many of you know, we have a side-project of the MEGA65 to make a hand-held/phone version of the MEGA65.  This project works using project students at the University, and gives them the opportunity to be engaged in the project, without us risking our critical path by depending on student labour, which due to a variety of factors can be complicated.

So what we have here, is the re-worked R1 PCB layout for the MEGAphone hand-held, based on work by some students last year, and then a number of problems fixed and final layout by our engineering workshop folks.

I'm currently in Germany as I write this, so I have only the photos that the workshop folks have taken during the assembly process:



As you can see, the assembly is already well underway, with lots of the surface mount components already in place.  Otherwise, clock-wise from top left we have the VGA connector (for testing and connecting to projectors etc), the Bluetooth module, full-size smart-card reader, which can also be re-purposed to support full-size SIM cards), then a bunch of power components and the headphone jack.

Then down the right side we have the dual sim cards that sit under one of the two cellular modem / alternate radio slots.  Bottom right, we have the joystick port, for if you want to play two-player while underway, or for when the D-PAD joystick (which is on the opposite side) isn't enough.

Then bottom left we have the SD card slot and the dual headers for the FPGA module from Trenz Electronics.

Otherwise in the middle we have the two unpopulated LoRa radio chips for mesh networking experiments, and the white connectors for the touch screen display and sense cables, and the slots cut in the board for those cables to go through.

The current hope is that the board will be finished being populated by the middle of March or so -- very exciting, and just in time to bring in more students to work on the project for the new academic year (which begins in March in the Southern Hemisphere).

Tuesday, March 12, 2019

Speeding up formatting and freezing with multi-sector SD card writes

The MEGA65 format utility can take a terribly long time to run, making you feel rather much more like you are in the 1980s than you might like to, especially when formatting a large capacity 64GB SDXC card or similar.   This is because until now we have not supported multi-sector writes to the SD card.  As a result, the SD card controller must read an entire flash block (often 64KB or larger), modify the 512 byte sector we have written, erase the entire 64KB block in the flash, and then put the new data back down.  The end result is that it is about 10x slower or worse than it could be.

This problem also affects the freeze menu that has to save ~512KB of data every time you freeze.  Because the SD cards are smart internally, they tend to ignore writes with unchanged data, but even then, you have the penalty of the ~64KB block read internally in the card.  Thus freezing a problem that had very different memory contents than the last frozen program could take a number of seconds.  This was quite annoying when all you might want to do in the freeze menu is toggle between PAL and NTSC, for example.

Thus I figured the time had come to attack the root cause of these problems and implement multi-sector writes on the SD card.  But first I had to figure out exactly how they work. This was not as easy as it sounded, as there is a lot of contradictory information out there.

The only certain thing, was that CMD25 should be used instead of CMD24 to initiate a multi-block write. Some sources said you should also use a CMD18 to say how many blocks you are going to write, and others said you should use a CMD23 after to flush the SD card's internal cache.  Then everyone seemed to disagree on the actual mechanics of multi-sector writes.

What I figured out was:  You don't need CMD18 or CMD23, just CMD25.  Then use the 0xFC data start token instead of 0xFE when writing the first sector. Then just keep writing extra sectors, this time with an 0xFC token until done, after which you should write an 0xFD token (without a sector of data following it) to mark the end of the multi-sector write.  If you try to write data after the 0xFD, then really bad/weird things happen that require you to physically power things off and on again.

In the end, the main part of the change to the low-level SD card controller was quite small:

if write_multi = '0' then
+                -- Normal CMD24 write
                 txCmd_v := WRITE_BLK_CMD_C & addr_i & FAKE_CRC_C;  -- Use address supplied by host.
-                addr_v  := unsigned(addr_i);  -- Store address for multi-block operations.
+                state_v    := START_TX;  -- Go to this FSM subroutine to send the command ...
+                rtnState_v := WR_BLK;  -- then go to this state to write the data block.
+              elsif write_multi = '1' and write_multi_first='1' then
+                -- First block of multi-block write
+                -- We should begin things with CMD25 instead of CMD24, but
+                -- then we don't need to repeat it for each extra block
+                txCmd_v := WRITE_MULTI_BLK_CMD_C & addr_i & FAKE_CRC_C;  -- Use address supplied by host.
+                state_v    := START_TX;  -- Go to this FSM subroutine to send the command ...
+                rtnState_v := WR_BLK;  -- then go to this state to write the data block.
+              else
+                -- We are in a multi-block write.  So just go direct to WR_BLK
+                state_v    := WR_BLK;       -- Go to this FSM subroutine to send the command ...               
               end if;

Essentially the change was just to write the correct command for starting a multi-block write, and to write subsequent blocks with the correct data token at the start, and without re-sending any CMD25 or anything else. I actually managed to get that part correct on almost the first go, once I had figured the mechanics out, which rather surprised me.  But it took quite a bit more effort to get FDISK/FORMAT and the hypervisor freeze routines to work properly with it.  Indeed there are still a few wrinkles to work out. For example, when copying the frozen program to a new freeze slot, trying to use multi-sector writes causes things to go haywire. So for now, I have just disabled that.

That said, the changes to the various programs are actually quite small as well.  It's just the usual problem of writing assembly language and low-level C code that works reliably.  Add in the fun of CC65 not telling you when you have run out of memory during compilation, and life remains "interesting".  For example, here are the changes to the freeze routine:

       jsr copy_sdcard_regs_to_scratch

        ; Save current SD card sector buffer contents
-       jsr freeze_write_sector_and_wait
+       jsr freeze_write_first_sector_and_wait

        ; Save each region in the list
        ldx #$00
@@ -31,6 +31,8 @@ freeze_next_region:
        cmp #$ff
        bne freeze_next_region

+       jsr freeze_end_multi_block_write
+
        rts


        jsr sd_wait_for_ready_reset_if_required

-       ; Trigger the write
-       lda #$03
+       ; Trigger the write (subsequent sector of multi-sector write)
+       lda #$05
        sta $d680

        jsr sd_wait_for_ready
@@ -168,6 +206,13 @@ freeze_write_sector_and_wait:
        sec
        rts

+freeze_end_multi_block_write:
+       jsr sd_wait_for_ready
+       lda #$06
+       sta $d680
+       jsr sd_wait_for_ready
+       rts
+

In short, use the new commands $04 (begin multi-sector write), $05 (write next sector of multi-sector write) and $06 (finish multi-sector write) when writing to $D680, the SD card control register. The changes to FDISK/FORMAT were similarly simple, again, just changing the bulk-erase function to use these new commands.  While I was there, I also implemented a little SD card reading speed test, because I was curious how fast (or slow) the SD card was going.  My 8GB class 4 card can read more than 700KB/second. One day I will implement the 4-wire interface, which will allow speeds 8x faster, but for now 700KB/sec is ample.




All up, the result is already very pleasing:  Formatting an SD card is easily 10x faster, taking less than 1 minute for an 8GB card now, instead of 10 minutes or more.  You can see how fast it is in this video:



Freezing is now also MUCH faster, reliably now only about 1 second to get to the menu on my machine here.  The biggest delay when freezing is now waiting for the monitor to resync if the frozen program was PAL (as the freeze menu is always 60Hz NTSC for better monitor compatibility).  You can see how fast the freeze menu is in the videos below:

This time, we will have the English video first, and the German one below:


Und auf Deutsch: