Tuesday, 11 June 2019

Making the internal speaker work on the MEGA65 phone/handheld

This is another of those rather long "whodunit" type posts, where I basically document the process of solving a particular problem.

When we started designing the MEGA65 phone, we did a lot of searching to find a way to give it really nice sound on the internal speaker, both so that games and demos would sound great, and so that it can ring really loud. There is nothing worse than a phone that rings too quietly to be easily heard.

So we were pretty happy when we found a 40mm diameter and <5mm thick 2W speaker that claims peak levels of close to 100dB.  We coupled this with a nice amplifier chip that can in theory deliver enough power to make good use of this speaker. Provision was also made for stereo, although the first prototype device will have only one speaker installed, to keep life simple.

The amplifier is an SSM2518 I2C controlled digital amplifier, which means we have easy digital control via the I2C bus, both for settings, and also for setting the volume level.  We already have the I2C bus working, and can read and write its registers at $FFD7030-$FFD7042. Also, the audio cross-bar mixer has outputs set up to feed this amplifier.  So in principle, we have all the ingredients we need to make it work.  Now is the time to actually get it working.

First, let's look at the I2C configuration that we need.  There are 19 registers, only a few of which are important to us, and of those, only certain bits are important:

$FFD7030 - bit 0 = Software master power-down, and must be 0 for normal operation.
$FFD7030 - bit 5 = "no BCLK". If 1, then MCLK is used instead of BCLK to generate the sample clock. Thus we want this 0, so that we can just have BCLK, and, hopefully, require one less pin.
$FFD7030 - bit 7 = software reset, and must be 0 for normal operation.
$FFD7032 - bits 5 - 6 = Serial Data Format. 01 = left-justified samples, which is what we want.
$FFD7032 - bits 2 - 4 = Serial Audio Interface Format. 000 = I2S, with left or right justification set by bits 5 and 6 of the same register.
$FFD7032 - bits 0 - 1 = Sample rate range. 10 = 32 - 48KHz, 11 = 64-96KHz. I'm not really sure what this does.  I'll also have to work out what our actual real sample rate is, as I have a suspicion that we are providing the audio at ~200KHz.
$FFD7033 - bit 7 = Generate (1) or use external (0) BCLK signal. We want 0.
$FFD7033 - bit 6 = LRCLK shape selection: 0=50% duty cycle, 1= single clock pulse. We want 0.
$FFD7033 - bit 4 = MSB first (0) or LSB first (1) in samples. We want MSB first.
$FFD7035 - Left channel volume. $00 = loudest, $FF = muted.
$FFD7036 - Right channel volume. $00 = loudest, $FF = muted.
$FFD7037 - bit 0 = master mute. 0 = unmuted, which is what we want.
$FFD7037 - bit 1 = left channel mute, as above.
$FFD7037 - bit 2 = right channel mute, as above.

Thus we want, keeping the other bits as their default values from the data sheet:

$FFD7030 = $04
$FFD7032 = $23
$FFD7033 = $00
$FFD7035 = $00
$FFD7036 = $00
$FFD7037 = $00
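
As a cross-check of the values above, here is a small Python sketch that packs the bit fields described for $FFD7032 and $FFD7033 into bytes. The function and field names (sdata_fmt, sai_fmt, fs_range and so on) are labels invented for illustration, not names from the datasheet, and only the bits listed above are modelled. The fs_range value of binary 11 is simply what the $23 above implies.

```python
def pack_ffd7032(sdata_fmt, sai_fmt, fs_range):
    """Pack $FFD7032: bits 5-6 = serial data format, bits 2-4 = serial
    audio interface format, bits 0-1 = sample rate range."""
    assert 0 <= sdata_fmt <= 3 and 0 <= sai_fmt <= 7 and 0 <= fs_range <= 3
    return (sdata_fmt << 5) | (sai_fmt << 2) | fs_range


def pack_ffd7033(bclk_gen, lrclk_pulse, lsb_first):
    """Pack $FFD7033: bit 7 = generate BCLK, bit 6 = LRCLK pulse mode,
    bit 4 = LSB first. All other bits are left at 0 in this sketch."""
    return (bclk_gen << 7) | (lrclk_pulse << 6) | (lsb_first << 4)


reg_32 = pack_ffd7032(sdata_fmt=0b01, sai_fmt=0b000, fs_range=0b11)
reg_33 = pack_ffd7033(bclk_gen=0, lrclk_pulse=0, lsb_first=0)
print(f"$FFD7032 = ${reg_32:02X}")
print(f"$FFD7033 = ${reg_33:02X}")
```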

To test, I have Commando loaded, since it plays a tune while waiting for the game to start, and I can hear it on the headphone jack, but not from the internal speaker, even when I set the above register values.  Time to probe pins...

The audio should be on pin 10, the SDATA pin of the SSM2518, but when I poke it with the oscilloscope, there seems to be nothing there.  Am I generating the audio on the correct pin?  We can use a special bitstream I produced to test this, that plays a unique binary pattern on every pin of the FPGA, so that I can quickly verify this sort of thing. It already proved invaluable when getting other subsystems like the LCD panel and touch interface working.

Ah, interesting! When I run that bitstream with the I2C settings as above, I can hear noise on the speaker, which makes sense, since all the input lines to the SSM2518 are being driven with various wave-forms as part of this identification feature I just described. So good news, we know things are physically wired correctly, in that there is some way to get sound out of it, and that the speaker itself is working as well.

Ok, so let's find out which pin SDATA really is, and whether I have it correctly mapped. The waveform on each pin is a series of narrow spikes to indicate the time-base, and then 8 time steps with the signal high, followed by the pin number encoded in binary.  Thus the SDATA pin's waveform below means it is pin 1+2+4+64 = pin 71.
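
Reading the pin number off the scope is just a matter of summing the weights of the bits that are set. A minimal Python sketch of that decoding follows. The order in which the bits appear on the scope trace is an assumption here (least significant bit first), and the bit list is written out by hand from the 1+2+4+64 above rather than captured from the waveform.

```python
def pin_from_bits(bits_lsb_first):
    """Turn a list of 0/1 values (least significant bit first) into a pin number."""
    return sum(bit << position for position, bit in enumerate(bits_lsb_first))


# Bits with weights 1, 2, 4 and 64 set, as read for the SDATA pin.
sdata_bits = [1, 1, 1, 0, 0, 0, 1]
print(pin_from_bits(sdata_bits))
```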



Pin #71 = FPGA pin U4, which I can confirm is connected to the i2s_speaker signal in megaphoner1.vhdl.  Now to find out where that goes, and why it is not showing any signal. It connects to i2s_speaker_data_out in machine.vhdl. This connects it to i2s_speaker_data_out in iomapper.vhdl, which connects it to the signal of the same name in audio_complex.vhdl, where it is... connected to ground.  Right. That would be a problem.

Looking through audio_complex.vhdl, there are actually quite a few problems to sort out:

1. When I wrote it, we were expecting that the headphone output would be using a similar I2S audio amplifier, and signals are being generated for that. But the headphones are in fact being fed with a circuit that is more or less identical to that of the Nexys4 boards, i.e., directly feeding a single pin for left and another for right at very high speed, and using a 3-stage low-pass filter to produce the actual audio.
2. The audio going to the headphones is actually the audio marked for the speaker.
3. As noted above, the actual speaker output is not connected to anything.

Thus we need to (1) rename the headphone I2S output to speaker output, and (2) the speaker output to headphones, and (3) connect the freshly renamed speaker output to the actual speaker.  We can also (4) remove the duplicated output signal for the speakers that we are not using, since what was the headphone i2s output is actually what we need.  Okay, those changes weren't too hard. Now to wait the ~30 minutes for synthesis to run, to see if it has worked. Hopefully at least I will see the audio on pin U4, and if I am really lucky, the other audio control signals will all be good, and we should have audio.  I'll be able to tell you in half an hour...

Well, that doesn't seem to have changed anything.  This is quite frustrating, because I can no longer see any obvious reason why this would be the case.  The speaker output in the audio mixer must have audio, because it is what was driving the headphones before the change.  Also, the default mixer configuration from the hypervisor on powerup has both headphone and speaker output configured, hence how the headphones were working when they were actually using the speaker channel. Thus I am confident that it is not that the audio input is zero.

But if the channel being used has real audio, why are we seeing the SDATA line stay low the whole time?  An instance of i2s_transceiver is used to actually produce the signal that is plumbed through to SDATA.  It is being fed the spkr_left and spkr_right channels (both of which have both SIDs mixed in) as inputs, and the only other thing it needs to work is the i2s CLK and SYNC signals. Those two signals are also routed to the SSM2518's corresponding pins, so I can probe those in real-time, and confirm that they have sensible signals on them.  More to the point, they both have regular edges, which means that the sample shifting logic in i2s_transceiver should be clocking the samples out without problem.

So some assumption in the above must be false, as otherwise we would be seeing something on the pin.  The question is what, and more to the point, how can I tell which of these two parts is wrong?

Probably the first thing to try, is to feed some known waveform out on the U4 pin, but from within the audio_complex.vhdl file, so that the plumbing through to the pin can be verified.  That will at least narrow things down.  The MEMS microphone clock is handily available there, so I'll try feeding that through, and see if we then get a nice pulse-train on the pin.  Either way we will have narrowed the problem down.

Okay, so the pulse-train is visible, so the plumbing is fine.  So now the question is whether the sample data being fed to the i2s transceiver is all zeroes, or whether the i2s transceiver isn't working properly.  Another synthesis run, and I am still seeing flat-line ground output on the SDATA pin, so I presume that the i2s transceiver is not working properly for some reason.

Now, the i2s transceiver is not particularly complex: It takes i2s_sync and i2s_clk signals as timing inputs, and then the samples to be transmitted.  I was about to describe how the thing works, when I spotted what I think is the problem:  It checks for edges on the i2s_sync line to work out when to load the next sample for transmission.  However, the edge detection happens only on the detection of an edge of the i2s_clk signal -- but part of the i2s_sync edge detection was happening outside of that, which means that sync edges could get missed, resulting in the transceiver never knowing when to transmit the next sample, which would cause it to shift out zeroes forever -- which is exactly what I am seeing.  So, I'll try moving that single line of code to the right place, and see if that works...
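
To illustrate why a misplaced line can hide every sync edge, here is a small Python model of that kind of bug. It is not the actual VHDL from i2s_transceiver. The signal timings, the function names and the exact placement of the "remember the last sync value" step are all assumptions made for the sketch. The comparison only happens on rising edges of the bit clock, and in the buggy variant the remembered sync value is refreshed on every system clock tick, so by the time the comparison runs there is nothing left to notice.

```python
def make_signals(ticks, clk_period=8, clks_per_sync_toggle=16):
    """Build a bit clock and a sync line that toggles on falling clock edges."""
    half = clk_period // 2
    toggle = clk_period * clks_per_sync_toggle
    clk = [1 if (t % clk_period) < half else 0 for t in range(ticks)]
    sync = [((t + toggle - half) // toggle) % 2 for t in range(ticks)]
    return clk, sync


def count_sync_edges(clk, sync, update_inside_clk_edge):
    """Count sync edges noticed by logic that only compares on rising clk edges."""
    last_clk = 0
    last_sync = 0
    edges_seen = 0
    for c, s in zip(clk, sync):
        if c == 1 and last_clk == 0:          # rising edge of the bit clock
            if s != last_sync:
                edges_seen += 1               # time to load the next sample
            if update_inside_clk_edge:
                last_sync = s                 # correct placement
        if not update_inside_clk_edge:
            last_sync = s                     # misplaced: runs on every tick
        last_clk = c
    return edges_seen


clk, sync = make_signals(1024)
print("fixed:", count_sync_edges(clk, sync, update_inside_clk_edge=True))
print("buggy:", count_sync_edges(clk, sync, update_inside_clk_edge=False))
```

In this model the buggy variant never sees an edge, which would leave a transmitter shifting out zeroes forever, matching the flat line on SDATA.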

Okay, so that fixed that problem -- we now have samples visible on the SDATA pin... But still no sound.  Just in case it was the I2C settings had been reset, I checked that, and they look fine. Indeed, running the bitstream that plays unique wave-forms on each pin, I still get noise from the speaker, so everything seems to be generally in order.  I just need to double-check that the settings are all right.

One thing that comes to mind, is that the test bitstream has a waveform on the MCLK pin as well as the BCLK pin, whereas my bitstream doesn't, instead having only a signal on BCLK.  Reading again through the datasheet, it looks like we need to have MCLK regardless, but can have no BCLK, if we configure MCLK as the BCLK source.  This likely explains the silence.  So we need to (1): Configure the I2C registers for MCLK as the BCLK source; and (2) route the BCLK to the MCLK pin in the VHDL.

Finally, I am getting some sound out after having rerouted to the MCLK line, with the BCLK line idle -- although it sounds like high-frequency white-noise.  This is a good sign, and as discussed above, not unexpected after having re-read the documentation.

Now I am hoping that by adjusting the registers of the SSM2518, I might be able to get proper sound out, since it is now presumably only a matter of the sample format and frequency. But I might also need to adjust the MCLK signal, because it seems that the SSM2518 is not really designed to just receive a bit clock, but expects many more clock-ticks per sample, than there are bits in a sample.

First, $FFD7030 needs bit 5 set, to tell the SSM2518 that there is no BCLK, just MCLK.

Next comes the problem with the bits per sample:  Bits 1 to 4 of $FFD7030 set the clock:sample ratio, but the lowest ratio available is 64:1, whereas we are using something lower. In fact, I need to go through how I am generating the clock again, so that I can figure out what the current ratio is.  In i2s_clock, I generate these signals based on a target sample rate of 44.1KHz, which results in a rather irregular interval.  It seems to me that this simply can't work.

The datasheet tells us that MCLK must be between 2.048 MHz and 6.144 MHz, if we are going to use it as the source of the BCLK line.  Given that we are expected to have at least 64 BCLK cycles per sample, this gives us a sample rate of between 32 KHz and 96 KHz. 2.822 MHz would be required for 44.1 KHz sample rate, which would be rather difficult to generate from the 40 MHz input clock we have.  This would require 14.1723356 cycles per BCLK, which would be rather annoying to calculate.
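
The arithmetic in the paragraph above can be checked with a few lines of Python. The constant names are invented for the sketch, and the numbers are the ones quoted in the text (the MCLK limits, 64 BCLK cycles per sample, and the 40 MHz input clock).

```python
BCLKS_PER_SAMPLE = 64
MCLK_MIN_HZ = 2_048_000
MCLK_MAX_HZ = 6_144_000
FPGA_CLOCK_HZ = 40_000_000

# Sample rates reachable when MCLK doubles as BCLK at 64 cycles per sample.
min_rate = MCLK_MIN_HZ / BCLKS_PER_SAMPLE
max_rate = MCLK_MAX_HZ / BCLKS_PER_SAMPLE

# Clock needed for a 44.1 KHz sample rate, and how many 40 MHz cycles
# each BCLK cycle would have to last.
mclk_for_44k1 = 44_100 * BCLKS_PER_SAMPLE
cycles_per_bclk = FPGA_CLOCK_HZ / mclk_for_44k1

print(f"sample rate range: {min_rate:.0f} to {max_rate:.0f} Hz")
print(f"MCLK for 44.1 KHz: {mclk_for_44k1} Hz")
print(f"40 MHz cycles per BCLK: {cycles_per_bclk:.7f}")
```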

Frankly, this part of the operation of the SSM2518 I am finding rather confusing and contradictory. For example, the timing diagrams for digital audio formats indicate that any number of BCLK pulses can be used per sample, which is probably what I built the VHDL implementation assuming.  To add to my confusion, the white noise I am hearing doesn't change if I change the volume settings of the SSM2518.  In fact, I can't seem to find any way to vary the sound level.  Debugging is of course hindered by the ~30 minutes it takes to synthesise.

So maybe it is time to make a simple custom bitstream that just controls the SSM2518, and tries to play some simple low-frequency signal, so that I can try to debug things.  I just found this delightful site: https://www.doulos.com/knowhow/vhdl_designers_guide/models/sine_wave_generator/, that makes it very easy to generate a sine-wave generator in VHDL.  So let's modify the pin probing bitstream to try to play a nice sine-wave tone, and see what progress we can make there, and then when we have it hopefully working without too much trouble, back port the control settings into the main bitstream.
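
For reference, the values such a sine-wave table holds can be sketched in Python. This is not the code from the linked page, and the table length (64 entries) and the signed 16-bit sample width are assumptions picked for illustration.

```python
import math


def sine_table(entries=64, amplitude=32767):
    """One full sine period as signed integer samples."""
    return [round(amplitude * math.sin(2 * math.pi * i / entries))
            for i in range(entries)]


def to_twos_complement(value, bits=16):
    """Encode a signed sample as an unsigned bit pattern of the given width."""
    return value & ((1 << bits) - 1)


table = sine_table()
for sample in table[:4]:
    print(f"{sample:6d} -> ${to_twos_complement(sample):04X}")
```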

First cut of that is done, and produces a different white noise to the regular bitstream, but indeed produces some noise, so that's a start.  Unfortunately, it seems to bear no relation to whatever I feed on the SDATA line. In fact, I can leave the SDATA line tied low, and still get the white noise.  Frustrating.  I'll have to sleep on this, to see if I can think about what might be going on.

It's now tomorrow.  My first thought is that the white noise I am hearing is some kind of artefact of the sample rate.  To test this, I am resynthesising my little test bitstream with half the sample rate of before.  If this results in a lower tone, then it will be a good clue.

While waiting for that, the other thing that I have discovered is that the audio signal being fed to the speaker is actually a square-wave signal with a time-base of ~200ns = ~5MHz.  There doesn't seem to be any filtering on it, however, to shape the noise out of the audible band.  Interestingly, if I put an oscilloscope probe on pin 6 of the SSM2518, which should be the MCLK signal, noise is introduced on the speaker.  Most curious...  What this does tell me, however, is that this thing is going to produce so much EMI noise, that it isn't funny.  The leads to the speakers will have to be shielded, at a bare minimum, and likely need ferrite beads on them to stop the EMI noise.  Probably we will need some kind of low-pass filter, similar to that on the headphone output as well, so that the acoustic noise can be removed.

Anyway, changing the sample rate doesn't seem to change the sound.  But the MCLK frequency didn't change from ~1MHz, which is probably much of the problem. We should be able to increase this quite a bit, which might be enough to push the acoustic noise well up into the ultrasonic range.  The SSM2518 can take a MCLK of up to ~38 MHz. This is a bit sad, because if it could take 40 MHz, we could just pass the 40MHz clock out.  But we can easily use 25 MHz, being half of the 50 MHz clock we have for ethernet. Let's see if that increases the time-base of the speaker output square wave, and/or pushes the white noise out of the acoustic range.

While waiting for that to synthesise, I did finally find the schematic of the SSM2518 evaluation board at https://ez.analog.com/audio/f/q-a/4096/ssm2518-evb-issue, which tellingly has a pile of filtering coils and capacitors on the speaker outputs on sheet 6.  We'll have to take a closer look at that, and potentially incorporate it onto our rev2 pcb.

Anyway, pushing the frequency up to 25MHz has changed the white-noise. It is now much quieter, but still present.  Oddly I can't pick up any clock on the MCLK pin now, although the SYNC (left/right select) signal is still running at the correct sample rate.  I am not sure if it is my oscilloscope that is the problem here, not being able to pick up the narrow pulses of the 25 MHz clock, although it hasn't been a problem in the past.  It could also be that I need to make these high-speed pins use the fast slew option of the FPGA to get good enough signal integrity.  It's certainly worth a try.

Ok, so using fast slew and 24mA drive strength has made MCLK visible, and also stopped the funny sound artefacts when I probe it, which confirms that it is probably what the problem was.  The noise is still there, but relatively quiet.  Probing the speaker output line confirms that the time-base of the audio signal is now much higher, which explains the reduced volume of the white-noise.  This pretty much confirms that we need some acoustic and EMI-rejection filtering between the SSM2518 and the speaker.

It might be that the same filter circuit we use for the headphones output will work fine, as previously mentioned. Because the speakers connect via a header, we can try some different things here, without having to re-spin the pcb.  We could even make a little daughter-board that fits onto those connectors, and also has a couple of the other bodge fixes that we have implemented lately, so that the prototype device can be a bit more robust, until we make the rev2 device(s) later in the year.

Now, back to trying to get some sensible sound out, I have re-enabled the sine-wave generator, but am still just getting the high-frequency noise. At this point, it is possible that the I2C configuration is wrong again, as I have powered everything down again, and only set the bit to clear the mute flag. To change this, I have to load (but thankfully not synthesise) the normal bitstream, so that I can talk to the I2C bus via its memory-mapped registers.  Loading that up, I was immediately hit by how much worse and high-pitched the acoustic noise is without the increased MCLK frequency.  So I am at least achieving something.

So now the question is whether we need filtering before we can even get any useful sound out, or whether it is only needed to get rid of the noise.  My feeling is the latter.  What I really want to do, is to somehow quantify whether the SSM2518 is taking any notice of my samples, or whether it is just putting random samples out.

On that topic, the FPGA is certainly outputting what looks like valid samples, with the correct 1 cycle delay after the SYNC signal toggles, as this shot shows (apologies for the poor quality, trying to get the probes to hold on, and hold the camera at the same time requires more appendages than I possess, and while I have been known to pull my socks on without using my hands, there are limits):


What was interesting, is that in the process of trying to get this shot, I accidentally touched MCLK and the SYNC lines together, and then there was some different noise -- so the SSM2518 is clearly listening for something.

Anyway, let's try to revise what settings we need to accept this sample format:  It is "standard i2s", i.e., the sample occurs just after the SYNC line toggles, not just before it.  We have the most significant bit of the sample first.
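
To make the format concrete, here is a Python sketch of how one channel's sample would be laid out on SDATA in "standard i2s": one idle bit clock after the SYNC toggle, then the sample MSB first, then padding. The 16-bit sample width and the 32 bit clocks per channel slot are assumptions for the sketch, and the function name is made up.

```python
def i2s_slot_bits(sample, sample_bits=16, slot_bits=32, delay=1):
    """Return the SDATA level for each BCLK of one channel slot.

    delay=1 models standard I2S (data starts one BCLK after SYNC toggles);
    delay=0 would model left-justified data.
    """
    value = sample & ((1 << sample_bits) - 1)          # two's complement
    msb_first = [(value >> (sample_bits - 1 - i)) & 1
                 for i in range(sample_bits)]
    padding = slot_bits - delay - sample_bits
    return [0] * delay + msb_first + [0] * padding


bits = i2s_slot_bits(0x4001)
print("".join(str(b) for b in bits))
```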

$FFD7030 = $20 (use MCLK as BCLK, don't mute, ignore BCLK/sample ratio, since we will specify I2S format later)
$FFD7031 = $00 (no EMI reduction/sound quality trade-off, enable automatic sample  rate detection)
$FFD7032 = $02 (I2S audio format, 32-48KHz sample rate)
$FFD7033 = $00 or $80 (either using real (0) or internally generated BCLK(1) signal, 50% duty cycle expected on SYNC line, MSB comes first in serial data).  Here it is not clear to me if we should be using the "real" BCLK, if we are telling it to use MCLK as BCLK.  My gut feeling is that, yes, we should, as otherwise BCLK will be generated using the BCLKs/sample frequency ratio.
$FFD7034 = $10 (left and right channel mappings as default)
$FFD7035 = $00 (left channel maximum volume)
$FFD7036 = $00 (right channel maximum volume)
$FFD7037 = $80 (not muted, no fancy filters)
$FFD7038 = $0C (auto-restart on over-current and related conditions)
$FFD7039 = $80 (set high-performance mode, and don't automatically power down)

Okay, in trying those out, I have discovered that the BCLK/sample ratio is being used.  Choosing a larger value results in louder and lower-frequency white-noise.
Also, discovering the example driver for the SSM2518 from microchip, I was led to the values for $FFD7038 and $FFD7039. The latter in particular sets the high-performance mode, which seems to get rid of the acoustic noise, so that's a good thing.

However, there is still no sound to be heard, even though there is clearly sample data being fed to it in the I2S format, with a working SYNC/LRCLK signal.  In short, I am now fairly confident that the audio signals I am feeding it are correct, and the I2C settings are also correct -- but still no sound.


So, now I am trying to set the I2S clock generation to exactly match the 64 cycles per sample mode that it explicitly supports, in the hope that this might get it working.  Again, I can see a nice clear SYNC/LRCLK signal, and I can see the SDATA lines, with the MSB first, and the sine table values cycling through.  But still no sound.

More hunting around on the internet. Found this: https://analogdevices.telligenthosting.net/audio/f/q-a/4147/ssm2518-test/3695. This at least has a table that shows how to get the MCLK line to be used to provide BCLK directly.   This confirms that $FFD7033 should be $00, not $80 (i.e., BCLK_GEN=0), so that BCLK is simply a copy of MCLK.

More hunting through the datasheet: It turns out that in this mode, MCLK must be between ~2 and 6 MHz, so I will now modify the I2S clock generator to generate a 5MHz clock, and use 64 cycles per sample, giving a sample rate of ~78 KHz.
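
A quick Python check of that plan, assuming the 5 MHz clock is derived by dividing the 40 MHz clock mentioned earlier (the post does not say which source clock is actually divided, and the constant names are invented here):

```python
FPGA_CLOCK_HZ = 40_000_000      # assumed source clock
TARGET_MCLK_HZ = 5_000_000
BCLKS_PER_SAMPLE = 64

divider = FPGA_CLOCK_HZ // TARGET_MCLK_HZ      # source cycles per MCLK cycle
mclk_hz = FPGA_CLOCK_HZ / divider
sample_rate = mclk_hz / BCLKS_PER_SAMPLE
in_range = 2_048_000 <= mclk_hz <= 6_144_000   # limits quoted from the datasheet

print(f"divider = {divider}, MCLK = {mclk_hz:.0f} Hz")
print(f"sample rate = {sample_rate:.0f} Hz, MCLK in range: {in_range}")
```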

Again, silence (not even static noise now, which is nice), unless I bridge the MCLK and SDATA pins, in which case there is nice loud static.  Most weird, but I feel that I am getting closer to a solution.

Is it something stupid like incorrect pin assignment?  Well, first, let's see if it is the MCLK or the SDATA line that needs the signal from the other, by first connecting the MCLK line to the SDATA line internally, so that the ~5MHz MCLK clock also appears on the SDATA line.

First attempt at this is causing a quite loud click, and then the FPGA de-programs, presumably because the power rail sags too low. This is probably a good sign that it is trying to drive the speaker loudly.  I'll turn the speaker volume down a bit, to avoid that, which is just done via the I2C registers.

Ok, so by putting the clock on the SDATA pin, I can make an absolute racket, so that even at reduced volume level, it is really loud.  It's no wonder that it was making the power rail sag at full volume.

The question is now exactly what format it is expecting the audio in, to get it to play something intelligible.  Anyway, as much for my remembrance as anything, here are the current register settings:

:0FFD7030: 20 00 02 00 10 50 FF 80 0C 80

The 50 is the volume for the channel with the speaker on it, and at that level it is already plenty loud enough if I run that bitstream that puts MCLK on SDATA.
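
For anyone wanting to read such a dump line without counting bytes by hand, here is a small Python sketch that turns it into an address-to-value map. The layout (a colon, the start address in hex, another colon, then the bytes) is inferred from the line itself, and the function name is made up.

```python
def parse_dump(line):
    """Map each address in a ':ADDRESS: bytes...' dump line to its byte value."""
    address_text, _, data_text = line.lstrip(":").partition(":")
    base = int(address_text, 16)
    data = bytes.fromhex(data_text.replace(" ", "").strip())
    return {base + offset: value for offset, value in enumerate(data)}


regs = parse_dump(":0FFD7030: 20 00 02 00 10 50 FF 80 0C 80")
for address, value in regs.items():
    print(f"${address:07X} = ${value:02X}")
```

The same function also copes with dumps where the bytes are run together without spaces.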

So, now we know that the only problem is with the format of the audio, not anything else.

Trying the four settings for SDATA_FMT, I2S standard and left-justified are both silent, although there is an audible pop between them, suggesting that they are interpreting the signals differently.  Right-justified formats on the other hand, produce static.  16-bit right-justified is quite a bit louder than 24-bit right-justified.

Now if I enable the "LSB first" bit, the behaviour is different: Now the left-justified modes (I2S standard and true left-justified) make some sound, with true left-justified louder. The right-justified modes are now silent.

This makes me think that there is something funny with the interpretation of either the LSB/MSB-first setting or the left/right justification.  What would be really handy right now, would be to be able to see someone else's example waveform that they use to feed an SSM2518, as I am sure it must now be some stupid simple error.

Well, I guess the next step is to work out which part of the 32 BCLK counts in each sample is being used.  To test this, I produced a bitstream that moves a single bit through all the possible positions, and there was no noticeable difference in the sound.  So now I am trying to vary the number of bits set in each encoded sample, to see if that makes any noticeable difference.  Actually, there is some subtle difference in the background noise when the single set bit was at the start, but I can't make anything else of it.

Basically the chip seems to be behaving rather randomly.  Which just reminded me: This board did get fried with 6.45V on VCC_FPGA early in its life, and it is possible that this chip might have got damaged in the process.  In fact, it is quite possible.  Okay. On that note, it is time to give up for the night, and try to replace the chip in the morning, as it will need the SMT reflow facilities at work to do (and someone who is skilled in driving them).

Okay, we replaced the chip, loaded Commando to test, and then started setting the SSM2518 I2C registers, and suddenly, the sounds of success!

Before I lose them, here are the register settings that have working sound:

:0FFD7030:C00002001080FF800C997C5B57898C77

It was a very pleasant and welcome surprise that everything started working once I had the chip replaced.  I'm not sure what I would have tried, had it not worked.  I'd put a picture of it working, but that doesn't really work for sound...

Now the main remaining problem is that if I make the volume too loud, the whole FPGA resets, presumably because the amplifier suddenly sucks too much current, and the VCC to the FPGA sags too low.  I'll need to think of a way to confirm and fix that, if that is the problem. But for now, I am happy with the progress.

Edit: Here is a short video I made at home of it playing the music in Nebulus:

This was just captured using my phone, so don't go expecting hifi audio quality, but it is pretty obvious that it is working with acceptable audio quality.

Saturday, 8 June 2019

Manufacture of pre-series keyboards

As we have previously reported, the beautiful mechanical keyboards for the MEGA65 are being designed and manufactured by GMK, Germany.  Well, the batch of 25 pre-series keyboards is being manufactured as I type! For those who are worried about Fake News, here is the evidence:


Ah, the sound of a chicken-picker warms the soul.  As do excessive photos of MEGA65 parts being made. Again, we are super-grateful for GMK's support of the MEGA65, and I am very much looking forward to holding these latest keyboards in my hot little hands in a couple of weeks' time, when I will again meet with the M.E.G.A. folks in Germany.













Wednesday, 5 June 2019

MEGAphone prototype is taking shape

We are currently waiting for various things to arrive to be able to continue work on the MEGA65 desktop computer. This has let us spend a bit of time working on the handheld, which is largely being driven by student projects here at the University.  If you are hanging out for the desktop MEGA65, don't fear! As soon as we have the hardware we are waiting on, we will be back to making progress on it!

We have been progressively figuring out the problems with the first PCB we had built for the MEGAphone, and worked out a few more things we had wrong, including:

1. RGB pins are mis-wired on the LCD cable connector, resulting in 1/4 brightness, because the two most significant bits of each colour channel are tied to ground. This is REALLY annoying, but we think we can work around it.
2. The viewing angle on the LCD is not that great, so I am talking to another Chinese supplier to get one that is at least 2x brighter than the current one.  Combined with fixing the 1/4 brightness problem above, we should end up 8x brighter than in the shots here.
3. The touch interface is now up and running, although we don't currently have any nice software to show you using it, until we get the cellular modem in, and can run the telephony software from the previous bench prototype.
4. Lots of little bodges and fixes with power supplies and other things.
5. More work on the temporary laser-cut case.

The net result is that we have something that is beginning to look like a handheld console.  The screen is about the same size as on an SX64, but you can actually read text on ours ;)  Anyway, here is some eye-candy showing it booted and running a couple of games:






Also, because the touch screen and buttons are working, it is possible to press the right-most black button below the screen to bring up the on-screen keyboard, and then use it to type!

We will likely make the boxes around the characters on the on-screen keyboard a bit thicker so that they are clearer, and I'll also have a think about making the symbols on the keys double-width to make them easier to read from a distance, although they are already quite clear.  You can also see how the key you are pressing highlights, so you know you got the right (or wrong) one.  There are still some calibration issues to work through, but it already works well enough for the most part.  The main hassle is it is currently easy to end up pressing HOME instead of DEL.  But this is just a matter of tweaking the various coefficients in the touch screen translator. That's all in our own VHDL, and thus freely editable by us -- one of the many joys of open-source.

Anyway, hopefully this has given you something nice to look at, while we all wait for the desktop MEGA65 to be finished (which will be as fast as we can do it, have no fear!)

Tuesday, 28 May 2019

Laser-cutting a spacer for the MEGAphone prototype

Together with my students we have continued to make progress on the prototype MEGAphone. I've spoken previously about the current state of the PCB, which as a first revision has a few corrections and bodge-fixes required. This means we have some funny lumps and bumps on it.  Also, we know we need to lift the screen 2mm off the PCB because the red LEDs are too close to each other. So I decided to make a laser-cut spacer to lift the screen. Actually, I will make a set of laser-cut layers that I will stack on top of each other to hold all the top-side parts in place, mainly the screen and buttons and speaker.  This will suffice as a quick-and-dirty case for the prototype, since there is little point making a more sophisticated case for a one-off.

I spent a couple of hours with a ruler and the PCB to work out where all the holes in the plastic spacer should be, and was ready to hit the laser cutter. However, caution says one should test first, before cutting plastic, so I did several test runs just cutting a sheet of paper. Here is the laser cutter doing its thing. Sorry for the poor quality images, I only had my phone as camera, and its camera is getting a bit sad these days.


This turned out to be a great strategy, as I could layer the paper over the PCB, and check the fit of all the cut-outs and holes:



There were quite a few little fiddly bits to get right, as well as some insights from the paper templates that were helpful. For example, the very thin bridge between the hole where I have the buttons resting here and the collection of bodges was clearly too thin to work in the plastic:

That was fairly easily fixed with a bit of creative line drawing:


Then a bit more fit testing with the buttons in place:


It's starting to look right :)


Then it was finally time to cut the template from clear 2mm thick acrylic, lay it in place, confident that it would fit, and put the screen and buttons in place so that I could see how it would look:

For a quick-and-dirty one-off this is looking pretty nice, I think.  Next step is to cut the layer that will go on top of this, and hold the buttons in place with smaller holes only big enough for the plastic parts, like the blue joy pad here.  It will also hold the rear of the screen in position.  Then another layer will go on top of that, that will have the screen cut out big enough for the black frame around it, and slightly smaller holes again, so that the joypad and fire buttons can't fall out.  I might yet do that one in beige, as I think that will look really nice.  Then I will probably put a couple of clear strips down either side of the screen, so that the screen itself can't fall out, and that will be the top-side of the prototype case.

I'll then repeat much the same process for the rear-side, so that there is an enclosed compartment for the battery.  I'll likely use some spacer bushes on the rear-side, so that I don't need to have too many layers.  Three should be enough, so that the battery is safely held in place.  Then the whole thing will be held together with some screws that go through several 3mm holes that will be present in all the laser-cut sheets, and that line up with matching holes in the PCB.

MEGAphone screen testing

Just a quick post about recent testing of the interface for the screen on the MEGAphone rev1 PCB.  

First, we knew once the PCBs had been ordered that the screen connector was in the wrong place, and will have to be shifted sideways.  A bit of further exploration then revealed that we had the display enable line tied low instead of high, which had been causing the display to remain blank. With that fixed (see the thin blue wires near the screen connector), and a bit of the usual jiggery-pokery, we finally had an image on the screen:




We then also found that the viewing angle of these particular screens is all wrong for us: they expect to be used upside-down.  When viewed from the wrong angle the image is quite dim, and there is blurring and doubling of the image, as the various anti-glare and polarisation filters are all misaligned from that perspective, as the following images show.

However, as the MEGAphone and MEGA65 have no frame buffer, we can't just easily invert the image.  So we are looking for another supplier of compatible screens.  This is not too hard, as these 800x480 4.3" screens are a dime a dozen, and are manufactured by a whole bunch of Chinese companies.  We are talking with a supplier now about one that not only claims full 180 degree viewing angle, but also 2x brightness. This will be very welcome.


Just for fun, we also enabled the on-screen-keyboard to see how it looks. We might still need to darken the background behind the keyboard (it is composited over the top of the regular display) a bit more.  I'll make a video of this when we get a bit further, as the animation of it appearing looks quite nice already. I also hooked up one of the buttons so that pressing the button automatically makes the keyboard appear or disappear: No software cooperation required!

Anyway, that's it for now. I did say it would be a short one :)


Wednesday, 8 May 2019

Working to reduce the attack surface for copyright infringement claims against our open-source C64 ROMs

As I explained yesterday, we have begun making an open-source set of ROMs that can be run on a C64 or compatible computer.

Today, I want to explain a little about some of the specific measures we are taking to avoid any possibility of copyright infringement. In particular, we are going to what we believe are above-and-beyond lengths to ensure that our alternate ROMs are free of any potential claim of copyright infringement.

There are two reasons for this, both of which come from the primary reasons for establishing this project in the first place:

1. To ensure that the rights holders of the original C64 ROMs can quickly determine that they don't need to worry about us infringing on their proprietary rights. We don't want to waste their time or effort.

2. To ensure that users of our alternate ROMs can do so with maximum confidence.  I say maximum and not complete confidence, because with anything legal, nothing is truly certain.

In short, the project exists to protect the interests of the rights holders of the original C64 ROMs, by giving the community a clear option for running emulators and C64 compatible computers, without potentially infringing on any proprietary rights.

So, with this in view, our approach is one of an "abundance of caution".  That is, in many places and ways, we are being way more cautious than we believe the law would require us to be.  This is, again, so that we can establish a clear moat around our work, so that as far as is possible, all doubts can be excluded, both for the original C64 ROM rights holders, and for us. 

Simply having an argument that our work is free of infringements is not enough for this approach, however sound that argument might be. Rather, we want to remove any possibility of arguments that we have infringed, and where possible erect multiple unassailable arguments that no infringement has occurred. Put another way, we want to have multiple walls around our castle, and at the same time, work carefully to make sure that there are no secret passages, caves, wells leading to underground rivers or any other thing that would undermine the intellectual property fortress we are building.  So let's have a look at some of those defences:

Use source control!

The first is simply to use source control.  The reason for this is that it creates a history of the creation of our ROMs, complete with all the mess-ups and steps along the way. That history goes back to the first lines of code written, and provides a strong line of evidence for the creation of our ROMs from scratch. 

Thus should any segment of the ROMs end up being similar or identical to that of the original C64 ROMs, we can demonstrate that such similarity occurred through coincidence or necessity (there are only so many ways to do certain things).

To make this defence as strong as possible, the mantra of "commit early, commit often" is vital, so that the in-between steps as code is written are recorded.  If we only commit finished routines, then there is no evidence of working that could be used to substantiate the claim of originality.

Commit messages should also indicate the reason for implementing new functionality or locating routines at specific addresses, e.g., "Implement routine XXX at location YYY required by game ZZZ".  Thus we end up with commit messages like:

commit 810dfb75afc59a1349a4fc83e962ef8357bc1ee1
Author: Paul Gardner-Stephen <paul@servalproject.org>
Date:   Sun May 5 16:31:24 2019 +0930

    put setup VIC-II routine at $E5A0 for Advanced Pinball Simulator. Issue #23

This is a good message, because it explains what was done in terms of an action that would naturally increase similarity with the original, in this case by putting a routine at the same position as in the original ROM, but then justifies that by identifying a piece of software ("Advanced Pinball Simulator" in this case) that requires it to be there, typically because the software directly calls that address, and tags an issue where the compatibility problem was reported.

That is, we start with a simple ROM re-implementation that is so manifestly different and incomplete that it is obviously not an infringing work, and then refine it over time based on clear compatibility improvement justifications, so that similarities with the original ROMs over time will be explained at every step along the way.
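
The commit message convention described above could, in principle, be checked mechanically. The following is only a minimal sketch of such a check, not something the project is known to use: the function name `missing_justification` and the rule it applies (any message that names a four-digit hex address such as $E5A0 should also tag an issue such as #23) are assumptions made for illustration.

```python
import re

# Hypothetical rule: a message that names a ROM address should also tag an issue.
ADDRESS = re.compile(r"\$[0-9A-Fa-f]{4}\b")   # e.g. $E5A0
ISSUE = re.compile(r"#\d+\b")                 # e.g. #23


def missing_justification(message: str) -> list[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    if ADDRESS.search(message) and not ISSUE.search(message):
        problems.append("mentions a ROM address but no issue number")
    return problems


good = "put setup VIC-II routine at $E5A0 for Advanced Pinball Simulator. Issue #23"
bad = "move clear screen routine to $E544"
print(missing_justification(good))
print(missing_justification(bad))
```

A check like this cannot judge whether the stated reason is a good one; it would only catch messages where the justification has been forgotten entirely.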

Indicate sources, causes and reasons in comments in the source

Also, all through the source we should indicate the source of information that led to the writing of a particular snippet of code.  That might be a page out of the C64 Programmer's Reference Guide, Compute's Mapping the 64, or evidence gained from running an emulator or instrumenting a real C64 to discover the entry points into the ROMs used by software that we wish to be compatible with.  This helps to remove any claim that we have simply copied functionality from the C64 ROMs by copying code. 

Instead, by demonstrating that there is a need for a screen-clear routine at $E544 by referencing such sources, we are showing that this is a functional requirement of the C64 ROM, and must be implemented.  Using that routine as an example, look at the number of references in this single routine:

    ;; Clear screen and initialise line link table
    ;; (Compute's Mapping the 64 p215-216)

clear_screen:   
   
    ;; Clear line link table
    ;; (Compute's Mapping the 64 p215)

    lda #$00
    ldy #24
clearscreen_l1:   
    sta screen_line_link_table,y
    dey
    bpl clearscreen_l1

    ;; Y now = #$FF

   
    ;; Clear screen RAM.
    ;; We should do this at HIBASE, which annoyingly
    ;; is no ZP, so we need to make a vector
    ;; (Compute's Mapping the 64 p216)
    ;; Get pointer to the screen into current_screen_line_ptr
    ;; as it is the first appropriate place for it found when
    ;; searching through the ZP allocations listed in
    ;; Compute's Mapping the 64
    sta current_screen_line_ptr+0
    lda hibase
    sta current_screen_line_ptr+1
    ldx #$03        ; countdown for pages to update
    iny             ; Y now = #$00
    lda #$20        ; space character
clearscreen_l2:
    sta (current_screen_line_ptr),y
    iny
    bne clearscreen_l2
    ;; To draw only 1000 bytes, add 250 to address each time
    lda current_screen_line_ptr
    clc
    adc #<250
    sta current_screen_line_ptr
    lda current_screen_line_ptr+1
    adc #>250
    sta current_screen_line_ptr+1
    lda #$20        ; get space character again
    dex
    bpl clearscreen_l2

    ;; Clear colour RAM
    ;; (Compute's Mapping the 64 p216)
    lda text_colour
clearscreen_l3:   
    sta $d800,y
    sta $d900,y
    sta $da00,y
    sta $db00-24,y        ; so we only erase 1000 bytes
    iny
    bne clearscreen_l3

    ;; (Compute's Mapping the 64 p216)
    jmp home_cursor

As can be seen, almost every single action in the routine is justified in terms of public information about what this routine must do.
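
As a side check on the address arithmetic in the listing above, the two clearing loops can be traced in a few lines of Python. This is only a model of the loops as they are printed in this post (the function names are mine, and it is not part of the ROM source); it records which offsets each loop writes, relative to the start of screen RAM and colour RAM respectively.

```python
def screen_offsets() -> set[int]:
    """Offsets written by the clearscreen_l2 loop, relative to the screen base."""
    written = set()
    base = 0
    x = 3                       # ldx #$03
    while x >= 0:               # bpl keeps looping while X is not negative
        for y in range(256):    # iny / bne: one full 256-byte pass
            written.add(base + y)
        base += 250             # clc / adc #<250 / adc #>250
        x -= 1                  # dex
    return written


def colour_offsets() -> set[int]:
    """Offsets written by the clearscreen_l3 loop, relative to $D800."""
    written = set()
    for y in range(256):        # Y runs 0..255 once
        for addr in (0xD800, 0xD900, 0xDA00, 0xDB00 - 24):
            written.add(addr + y - 0xD800)
    return written


screen = screen_offsets()
colour = colour_offsets()
print(len(screen), max(screen))   # 1006 1005
print(len(colour), max(colour))   # 1000 999
```

If this trace is right, the colour RAM loop touches exactly 1000 bytes, while the screen RAM loop as listed makes four overlapping 256-byte passes and so also writes offsets 1000 to 1005. Those bytes sit below the sprite pointers, which start at offset 1016 of the screen page.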

Automatically scan for any identical byte sequences of > 2 bytes

Finally, we have created a tool that looks for any string of at least 3 bytes in length that matches between two files.  We use this to find any byte sequences that match with the original ROMs, even if the matches are not in the original location.  Then for each such match, we provide an explanation in a file in the strings/ directory, where the name of the file is the sequence of matching bytes written in hexadecimal. 

Once an explanation has been provided in one of these files, the match is no longer reported by the tool, unless run with --verbose, in which case all the string explanations are shown. This allows for a quick report to be generated that provides an explanation of every significant match. 
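
To make the idea concrete, here is a minimal Python sketch of this kind of scan. It is not the project's actual tool: the function names, the lower-case hex file naming, the "Unexplained" wording and the choice of which file's offset is printed first are all assumptions for illustration. It reports, for each offset in the first file, the longest run of at least 3 bytes that also appears in the second file, and looks for an explanation file named after the matching bytes.

```python
from pathlib import Path

MIN_LEN = 3  # shortest match worth reporting, as described in the post


def find_matches(a: bytes, b: bytes, min_len: int = MIN_LEN):
    """Return (offset_a, offset_b, length): longest shared run starting at each offset of a."""
    # Index every min_len-byte window of b by its contents.
    index = {}
    for j in range(len(b) - min_len + 1):
        index.setdefault(b[j:j + min_len], []).append(j)

    matches = []
    for i in range(len(a) - min_len + 1):
        best = None
        for j in index.get(a[i:i + min_len], []):
            n = min_len
            # Extend the match for as long as both files agree.
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if best is None or n > best[2]:
                best = (i, j, n)
        if best:
            matches.append(best)
    return matches


def explain(strings_dir: Path, seq: bytes):
    """Return the first line of the explanation file for seq, or None if there is none."""
    path = strings_dir / seq.hex()  # assumed naming: bytes as lower-case hex
    if path.is_file():
        lines = path.read_text().splitlines()
        return lines[0] if lines else ""
    return None


def report(a: bytes, b: bytes, strings_dir: Path, verbose: bool = False):
    """Report unexplained matches; explained ones are listed only when verbose."""
    lines = []
    for i, j, n in find_matches(a, b):
        why = explain(strings_dir, a[i:i + n])
        if why is None:
            lines.append(f"Unexplained ${i:04X} = ${j:04X} + {n}")
        elif verbose:
            lines.append(f"Ignoring ${i:04X} = ${j:04X} + {n} ({why})")
    return lines
```

Because a match is reported at every starting offset, overlapping matches appear as separate entries, each needing its own explanation file. This brute-force approach is adequate for files of a few kilobytes, such as 8KB ROM images.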

There are strong arguments that considering matches of only 3 bytes in length is excessive, as it is practically impossible to imagine a three byte sequence in the C64 ROMs that could be copyrightable.  But again, out of an abundance of caution we explain even those matches, including when it is just random fragments of CPU instructions, so that a cursory examination of our software can in just a few minutes hopefully satisfy any rights-holder that no infringement has occurred.

This library of reasons also provides a strong defence in the event that any claim of infringement were nonetheless to be made:  First, it demonstrates our candour, in that we are not hiding the matches, but making them public -- including to any rights holders. Second, when a claim of infringement is made, it makes it very easy for us to respond by asking which bytes are infringing, and to then point the claimant at the reasoning why those bytes do not constitute an infringement.  The onus is then on them to justify why the explanation is not adequate, and it is likely that any court would accept an argument from our side for any claim to be immediately dismissed, and perhaps with prejudice (which means that they can't raise the same claim again), because we will have ready-at-hand prima-facie evidence that there is in fact no infringement.

As an example of how comprehensive even the short-form report is, which displays only the first line of each explanation of why there is no infringement, here is the current output when comparing the original KERNAL and our ROMs at the time of writing:

$ src/similarity kernal newrom verbose
Searching files for similarities...
Ignoring $0012 = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $0017 = $47A6 + 3 (fragment / SEC / SBC #$xx)
Ignoring $0018 = $02F2 + 3 (SEC / SBC #$01 - subtract 1 from A.)
Ignoring $0112 = $0BDE + 4 (JSR $FFCF / branch based on C flag)
Ignoring $0122 = $4926 + 3 (Fragment / RTS / JSR)
Ignoring $0136 = $1018 + 4 (Push memory location to stack)
Ignoring $013C = $101E + 4 (Fragment + Load X from memory location.)
Ignoring $01A7 = $0B82 + 4 (Store X and Y in ZP location pair)
Ignoring $020A = $4CBB + 3 (fragment / PLA / PLA)
Ignoring $0299 = $059F + 3 (EOR #$FF / STA $xx)
Ignoring $02ED = $09AD + 3 ($05 byte after some leading $00s)
Ignoring $037D = $0A23 + 4 (Instruction fragment, put zero into ZP location somewhere.)
Ignoring $038C = $029C + 3 (Conditionally JMP somewhere.)
Ignoring $03A2 = $02E9 + 6 (increment a pointer in ZP)
Ignoring $03B3 = $0D6A + 3 (substract $30 from A. Not copyrightable.)
Ignoring $040A = $0F9A + 4 (SEC / JSR $FF99 - Read top of memory using public KERNAL API)
Ignoring $0417 = $0F93 + 4 (fragment + TYA + STA ($nn),Y)
Ignoring $0430 = $0531 + 4 (subtract something from a ZP location)
Ignoring $0435 = $0536 + 4 (Subtract the contents of one ZP location from the contents of another.)
Ignoring $0461 = $0008 + 3 (First three letters of BASIC)
Ignoring $0462 = $055A + 5 ("ASIC ")
Ignoring $048B = $0008 + 3 (First three letters of BASIC)
Ignoring $048C = $055A + 7 (BASIC V2 string)
Ignoring $04DA = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $04DB = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $04DC = $4520 + 3 (Fragment / STA ($F3),Y)
Ignoring $0518 = $4A3A + 5 (Setup VIC-II registers and load A with 0)
Ignoring $0574 = $46B0 + 5 (CLC / ADC #40 / STA $D3)
Ignoring $0575 = $48CB + 4 (ADC #40 / STA $D3)
Ignoring $0594 = $4B9B + 3 (Fragment of JMP instruction.)
Ignoring $0599 = $4A39 + 4 (instruction fragment followed by JSR to VIC-II register setup.)
Ignoring $05BB = $4A03 + 5 (fragment / STA $0277,X / INX)
Ignoring $05C8 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $0606 = $45EC + 4 (LDA ($D1),Y / CMP #$20)
Ignoring $060E = $49D7 + 4 (fragment / INY / STY $C8)
Ignoring $0632 = $4E29 + 3 (Fragment of sequence to save registers on the stack. Not copyrightable.)
Ignoring $0633 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $0634 = $498B + 5 (TXA / PHA / LDA $D0 / BRANCH somewhere)
Ignoring $063A = $44FA + 5 (LDY $D3 / LDA ($D1),Y / STA $xx)
Ignoring $0654 = $46C2 + 3 (INC $D3 / JSR somewhere)
Ignoring $0676 = $4A83 + 3 (PLA / TAX / PLA)
Ignoring $0680 = $5193 + 4 (LDA #$FF / CLC / RTS)
Ignoring $0682 = $4C8B + 3 (CLC / RTS / CMP #$xx)
Ignoring $069A = $472F + 5 ( FRagment / BRANCH <skip next instruction> / DEC $D8)
Ignoring $06AA = $472E + 4 (LDA $D8 / BEQ <skip 2 byte instruction>)
Ignoring $06B0 = $4A83 + 3 (PLA / TAX / PLA)
Ignoring $06B3 = $4EBA + 3 (CLC / CLI / RTS)
Ignoring $06E8 = $46B0 + 4 (CLC / ADC #40 / STA $nn)
Ignoring $06F6 = $480E + 3 (fragment / DEC $D6)
Ignoring $06F7 = $4631 + 3 (DEC $D6 / JSR $nnnn)
Ignoring $0717 = $45B7 + 6 (STA $D7 / TXA / PHA / TYA / PHA)
Ignoring $0719 = $5119 + 4 (Preserve registers on stack)
Ignoring $071A = $0A4B + 4 (Push Y and A onto the stack. Load A with something.)
Ignoring $0729 = $45D2 + 4 (fragment + branch based on comparison of A with constant.)
Ignoring $072E = $45DD + 4 (JSR (output carriage return) + fragment)
Ignoring $0762 = $464A + 17 (Copy screen + colour RAM to the left one place)
Ignoring $0773 = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $0776 = $0B80 + 3 (Fragments of instructions. Not copyrightable.)
Ignoring $0777 = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $0778 = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $0779 = $47EB + 3 (Fragment / STA ($F3),Y)
Ignoring $07A1 = $4631 + 3 (DEC $D6 / JSR $nnnn)
Ignoring $07F4 = $45EC + 4 (LDA ($D1),Y / CMP #$20)
Ignoring $080A = $460A + 7 (Copy screen + colour RAM to the right one place)
Ignoring $0810 = $4604 + 7 (Copy screen + colour RAM to the right one place)
Ignoring $0816 = $4610 + 5 (DEY / CPY $D3 / BNE <backwards>)
Ignoring $081B = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $081E = $0B80 + 3 (Fragments of instructions. Not copyrightable.)
Ignoring $081F = $4509 + 5 (LDA $0286 / STA $F3)
Ignoring $0820 = $4740 + 4 (something with $0286 / STA ($F3),Y)
Ignoring $0821 = $47EB + 3 (Fragment / STA ($F3),Y)
Ignoring $083C = $46CE + 5 (LDA $D3 / SEC / SBC #40)
Ignoring $083D = $48FA + 4 (Fragment / SEC / SBC #40)
Ignoring $083E = $47A7 + 3 (SEC / SBC #40)
Ignoring $084F = $48D7 + 4 (Fragment followed by LDA #$00 / STA $xx)
Ignoring $086D = $46FA + 5 (JSR $E544 (clear screen) surrounded by fragments)
Ignoring $0880 = $491B + 4 (INX / CPX #$19 / BRANCH somewhere)
Ignoring $08A8 = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $08B5 = $48F3 + 4 (LDA #$27 / CMP $D3)
Ignoring $08BA = $0EEF + 3 (Fragment + CLC + ADC fragment.)
Ignoring $08DA = $497B + 16 (List of C64 colour codes)
Ignoring $0916 = $4914 + 4 (LDX #$00 / LDA $D9,X)
Ignoring $09CB = $4551 + 3 (SOMETHING $0288 / LDA )
Ignoring $09D4 = $47B4 + 4 (LDA ($AC),Y / STA ($D1),Y)
Ignoring $09D8 = $47B0 + 4 (LDA ($AE),Y / STA ($F3),Y)
Ignoring $09E5 = $47D0 + 4 (LDA $AE / STA $AD)
Ignoring $09FA = $4551 + 4 (Most likely LDA $0288 / STA $D2 - Set upper half of screen RAM address)
Ignoring $0A0A = $4558 + 4 (LDA #$20 / STA ($D1),Y - Write a space onto screen memory)
Ignoring $0A1C = $451A + 4 (LDY $D3 /  STA ($D1),Y)
Ignoring $0A24 = $4950 + 6 (STA $D1 / LDA $F3 / STA $D2 - manipulate screen pointers)
Ignoring $0A52 = $4504 + 5 (LDA ($F3),Y / STA $0287)
Ignoring $0A7F = $4A7F + 10 (Tail end of standard CIA-triggered interrupt, followed by start of another routine)
Ignoring $0A81 = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $0A82 = $4762 + 3 (PLA / TAY / PLA / TAX sequence)
Ignoring $0A88 = $4B09 + 4 (Fragment + STA $028D)
Ignoring $0AAA = $4B87 + 4 (PHA / LDA $DC01)
Ignoring $0AC1 = $4BBB + 6 (ORA $028D / STA $028D)
Ignoring $0AC3 = $4BCE + 4 (Fragment + STA $028D)
Ignoring $0B35 = $4ABF + 5 (Use X register to compare contents of a ZP and absolute address location)
Ignoring $0B3C = $4A04 + 4 (STA somewhere offset by X, increment X -- Loop fragment)
Ignoring $0B47 = $4B01 + 4 (RTS + LDA $028D)
Ignoring $0B49 = $4BEE + 5 (something with $028D / CMP #$03 / BNE somewhere)
Ignoring $0B59 = $4C00 + 8 (LDA $D018 / EOR #$02 / STA $D018)
Ignoring $0B80 = $4CC4 + 16 (List of key codes generated when certain key combinations are pressed)
Ignoring $0B91 = $4CD5 + 36 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BB6 = $4CFA + 5 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BC2 = $4D05 + 9 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BC3 = $4D46 + 7 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BED = $4D30 + 9 (List of key codes generated when certain key combinations are pressed)
Ignoring $0BF7 = $4D3A + 5 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C03 = $4D05 + 8 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C04 = $4D46 + 14 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C13 = $4D55 + 16 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C24 = $4D66 + 19 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C39 = $4D7B + 4 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C80 = $4D8D + 7 (List of key codes generated when certain key combinations are pressed)
Ignoring $0C88 = $4D95 + 24 (List of key codes generated when certain key combinations are pressed)
Ignoring $0CCD = $0999 + 4 (A single $08 byte in a field of $00's)
Ignoring $0CDF = $0033 + 3 (3 bytes of ascending value)
Ignoring $0CE7 = $4E74 + 4 (The string "LOAD". Not copyrightable)
Ignoring $0D2E = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0D2F = $51B2 + 7 (Set a bit in $DD00)
Ignoring $0D85 = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0D89 = $51AB + 5 (set a bit in $DD00, Return from subroutine, read $DD00)
Ignoring $0D9B = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0D9E = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0DBF = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0DC0 = $5180 + 6 (clear a bit in $DD00)
Ignoring $0DC4 = $51A3 + 3 (Do something with $DD00 and return from sub-routine.)
Ignoring $0DEE = $516D + 3 (RTS / SEI / JSR somewhere)
Ignoring $0DF3 = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0DF4 = $51B2 + 7 (Set a bit in $DD00)
Ignoring $0E2F = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $0E83 = $50F5 + 3 (CLC / RTS / JSR $xxxx)
Ignoring $0E84 = $50D3 + 6 (Return from sub-routine, clear a bit in $DD00)
Ignoring $0E86 = $50DF + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E89 = $4FF6 + 3 (instruction fragments.)
Ignoring $0E8B = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E8C = $51AF + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E8D = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0E8E = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0E8F = $51BB + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E91 = $51AB + 9 (set a bit in $DD00, Return from subroutine, read $DD00)
Ignoring $0E94 = $51A3 + 7 (Do something with $DD00, return from subroutine. Do something else with $DD00)
Ignoring $0E95 = $51CA + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E96 = $50D3 + 5 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0E98 = $50DF + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $0E9D = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E9E = $51AF + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0E9F = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $0EA0 = $4FC2 + 4 (Load A from $DD0D, then OR it with something)
Ignoring $0EA3 = $50D9 + 5 (Set a bit in $DD00.)
Ignoring $0EA6 = $51A3 + 6 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0EA7 = $51AF + 5 (do something with $DD00, then RTS, then read $DD00)
Ignoring $0EA8 = $50D3 + 4 (Return from subroutine, then read $DD00 , and do something with it.)
Ignoring $1011 = $0295 + 3 (Instruction fragment + CLC + RTS)
Ignoring $1012 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1047 = $5DBF + 4 ($nn -> $nnnn)
Ignoring $1084 = $50F5 + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1091 = $5181 + 3 (clear a bit and store result some where.)
Ignoring $109C = $51B4 + 3 (Or A register with $08 and store result somewhere.)
Ignoring $10C5 = $4DE5 + 3 ("OR " string, being part of "ERROR ")
Ignoring $10CA = $4DDA + 9 (The string "SEARCHING". Not copyrightable.)
Ignoring $10D0 = $4E78 + 3 ("ING", fragment of "SEARCHING FOR". Not copyrightable.)
Ignoring $10D4 = $4DE4 + 3 (The string "FOR". Not copyrightable.)
Ignoring $1107 = $4E74 + 6 (The string "LOADIN", part of LOADING. Not copyrightable.)
Ignoring $1112 = $4DE0 + 3 ("ING", fragment of "SEARCHING FOR". Not copyrightable.)
Ignoring $1136 = $0988 + 3 (Part of JSR $FFD2 / INX)
Ignoring $113C = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $1155 = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $11AB = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $11D0 = $500B + 3 (Instruction fragment, PLA, instruction fragment. Single instruction. Not copyrightable.)
Ignoring $11D6 = $500B + 3 (Instruction fragment, PLA, instruction fragment. Single instruction. Not copyrightable.)
Ignoring $11DF = $45B9 + 4 (Preserve registers on stack)
Ignoring $11E0 = $0A4B + 3 (PHA / TYA / PHA )
Ignoring $11FD = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $11FE = $4762 + 4 (Restore Y and X from stack, load A from somewhere.)
Ignoring $1294 = $0294 + 4 (conditionally execute CLC + RTS, i.e., conditionally return success from a routine.)
Ignoring $1295 = $0ADA + 3 (Instruction fragment + CLC + RTS)
Ignoring $1296 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $129B = $498B + 3 (TXA / PHA / LDA $nn)
Ignoring $130C = $0295 + 3 (Instruction fragment + CLC + RTS)
Ignoring $130D = $0D22 + 5 (Return from routine with success, Store $00 somewhere in ZP.)
Ignoring $130E = $0758 + 4 (End of routine followed by $00 -> ZP location)
Ignoring $132E = $0758 + 4 (End of routine followed by $00 -> ZP location)
Ignoring $13D3 = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $13F1 = $4CBC + 3 (PLA / PLA / JMP)
Ignoring $1418 = $097D + 4 (branch based on whether X register holds the number 4.)
Ignoring $1483 = $5DAB + 4 (Put $7F -> $xx0D)
Ignoring $14FB = $0301 + 4 (fragment / BNE <skip following JMP instruction> / JMP somewhere)
Ignoring $15CA = $0988 + 3 (Part of JSR $FFD2 / INX)
Ignoring $164D = $50D7 + 3 (Clear a bit in A and do something with it.)
Ignoring $168D = $0F30 + 3 (CLC RTS + instruction fragment.)
Ignoring $16CE = $4B24 + 5 (Fragment, CPX $DC01, BRANCH)
Ignoring $16CF = $4B7B + 4 (Fragment, CPX $DC01, BRANCH)
Ignoring $1724 = $0985 + 5 (Or A with $30 and print result)
Ignoring $1727 = $0690 + 3 (Probably JSR $FFD2 followed by PLA)
Ignoring $172A = $0298 + 3 (SEC + RTS + fragment of next routine)
Ignoring $175A = $0988 + 4 (fragment of string printing loop (JSR $FFD2 / INX / CPX ... ))
Ignoring $17A7 = $4DEB + 4 (take a branch based on comparison of Y register and memory.)
Ignoring $180B = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1836 = $0B0F + 3 (CLC / RTS / JSR $xxxx)
Ignoring $1947 = $5DC3 + 4 (Fragment followed by LDA $DC0D)
Ignoring $1A1A = $029C + 3 (Conditionally JMP somewhere.)
Ignoring $1A49 = $45D6 + 4 (Fragment + store $00 somewhere into ZP.)
Ignoring $1A85 = $48D7 + 4 (Fragment followed by LDA #$00 / STA $xx)
Ignoring $1BB6 = $5DC3 + 5 (Read from $DC0D to clear CIA interrupts, surrounded by instruction fragments.)
Ignoring $1C63 = $45BD + 3 (fragment + SEI + JSR fragment)
Ignoring $1CF2 = $458E + 4 (JSR $FDA3 (SCAN KEYBOARD) , JSR somewhere else)
Ignoring $1CFE = $459A + 4 (Clear C, jump into BASIC ROM to start BASIC.)
Ignoring $1D0C = $514A + 4 (End of loop fragment (branch backwards based on X, then return when done).)
Ignoring $1D10 = $4A73 + 5 (CBM80 cartridge signature)
Ignoring $1D55 = $080E + 3 (instruction fragments. not copyrightable.)
Ignoring $1D57 = $05DE + 4 (fragments of instructions. Not copyrightable.)
Ignoring $1D58 = $4573 + 3 (part of two instructions $02 STA $xx00,Y)
Ignoring $1D8A = $4EB8 + 3 (Load Y from ZP. Clear carry flag. Simple register manipulations.)
Ignoring $1DE4 = $5DB2 + 4 (STA $DC04 / LDA #$xx sequence)
Ignoring $1DEE = $5DB2 + 4 (STA $DC04 / LDA #$xx sequence)
Ignoring $1E2B = $5168 + 5 (Do something to $0284 followed by STX $0283)
Ignoring $1E2C = $4FD8 + 4 (something followed by STX $0283)
Ignoring $1E31 = $4FFD + 3 (tail end of access to $0284, followed by RTS)
Ignoring $1E36 = $5150 + 6 (Load X and Y from pointer at $0281)
Ignoring $1E40 = $5154 + 3 (tail end of access to $0282, followed by RTS)
Ignoring $1E41 = $516C + 3 (Fragment, RTS, Disable interrupts. Fragment of end and start of routines.)
Ignoring $1E47 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $1E48 = $45B9 + 4 (Preserve registers on stack)
Ignoring $1E49 = $0A4B + 4 (Push Y and A onto the stack. Load A with something.)
Ignoring $1E4C = $5DAB + 4 (Put $7F -> $xx0D)
Ignoring $1E69 = $458E + 4 (JSR $FDA3 (SCAN KEYBOARD) , JSR somewhere else)
Ignoring $1E7C = $50D5 + 3 (Do something with $DD00 and compare it with some value.)
Ignoring $1E83 = $50DC + 3 (Do something with $DD00, then read something from memory. Only instruction fragments.)
Ignoring $1EBC = $0A5F + 3 (PLA / TAY / PLA)
Ignoring $1EBD = $4762 + 3 (PLA / TAY / PLA / TAX sequence)
Ignoring $1EBE = $4A83 + 4 (PLA / TAX / PLA / RTI)
Ignoring $1F48 = $0907 + 3 (PHA / TXA / PHA)
Ignoring $1F49 = $45B9 + 4 (Preserve registers on stack)
Ignoring $1F4A = $0A4B + 3 (PHA / TYA / PHA )
Ignoring $1F79 = $5DC0 + 4 (Do something with $11, then write to $DC0E)
Ignoring $1F84 = $5F84 + 4 (Jump to IOINIT routine. Not copyrightable.)
Ignoring $1F9F = $5F9F + 4 (Jump to keyboard scan routine ($EA87) + instruction fragment.)
Ignoring $1FA0 = $4A35 + 3 (Fragments of instructions. Not copyrightable.)


Then within each of those files, more detailed information can be found, often with references, for example:

Preserve registers on stack

This is the standard form of saving the A X and Y registers on the stack

PHA
TXA
PHA
TYA
PHA

See, for example:

http://6502.org/tutorials/register_preservation.html

Not copyrightable.

We see an explanation as to why this is just boilerplate, and then a reference to a 3rd party source that indicates that this is common practice, and therefore cannot be the proprietary property of the rights holders of the C64 ROMs.

These are the defences we have right now, but we are also planning others:

Comparison with non-Commodore 6502 Microsoft BASIC

The original C64 BASIC was actually derived from a BASIC interpreter written by Microsoft and licensed by Commodore.  This means that Commodore and its successors do not own the copyright in those parts that are Microsoft BASIC. We can easily test for this, by also searching for the matching strings in "negative libraries" of files that were not written by Commodore.
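
Since this defence is still only planned, the following is just a sketch of how such a check might look. The directory layout and the function name `occurs_in_negative_library` are hypothetical: it assumes a directory of binary files known not to have been written by Commodore, and lists which of them contain a given byte sequence.

```python
from pathlib import Path


def occurs_in_negative_library(seq: bytes, library_dir: Path) -> list[str]:
    """Return the names of files in library_dir that also contain seq."""
    return sorted(
        p.name
        for p in library_dir.iterdir()
        if p.is_file() and seq in p.read_bytes()
    )
```

A non-empty result for a match reported by the similarity tool would be one more piece of evidence that the bytes in question are not unique to the C64 ROMs, and the file names could be recorded in the corresponding explanation file.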

Automatic internet searching for byte sequences to find other instances

We can also generalise this approach by implementing automatic internet searches, to find 3rd party instances of matching byte sequences, again as evidence that the matches are not the result of infringement of the copyrights of the C64 ROMs' rights holders.

Can you think of any other techniques that we can apply to add even more defence-in-depth?


Tuesday, 7 May 2019

Free and Open-Source Replacement ROMs for the C64



While this blog is usually about things for the MEGA65, this post is actually about something for stock standard C64s, and, more to the point, for emulators and all re-creations: Free and open-source replacement ROMs, that can be used, modified and distributed by the general public, so that, for example, emulators can ship with fully legal ROMs, without having to be troubled by costs or legal complexities in terms of licensing.

But first, let's step back a bit, and look at the current situation.

The Commodore 64, as we all know, uses three ROM parts: The KERNAL, BASIC and the character ROM.  These are all different sizes, but together make up the 20KB of total ROM that a C64 needs to operate.  Some of you will at this point be saying to yourselves, "no, the KERNAL and BASIC ROMs are the same size".  That is only true of the ROM chips themselves: the KERNAL code is actually only about 6.5KiB, while BASIC is about 9.5KiB, because it also uses the bottom 1.5KiB or so of the "KERNAL" ROM.
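
The arithmetic can be laid out explicitly. The sketch below uses the standard chip sizes of 8KiB for each of the BASIC and KERNAL ROMs and 4KiB for the character ROM; the 1.5KiB spill-over is the approximate figure given above, not an exact measurement.

```python
KIB = 1024

# Physical ROM sizes on a C64.
physical = {"BASIC": 8 * KIB, "KERNAL": 8 * KIB, "CHARGEN": 4 * KIB}

# Approximate amount of BASIC code living at the bottom of the KERNAL ROM.
spill = int(1.5 * KIB)

# Logical sizes: how much of the ROM space each component actually uses.
logical = {
    "BASIC": physical["BASIC"] + spill,
    "KERNAL": physical["KERNAL"] - spill,
    "CHARGEN": physical["CHARGEN"],
}

for name in physical:
    print(f"{name:8} physical {physical[name] / KIB:4.1f}KiB  logical {logical[name] / KIB:4.1f}KiB")
print("total", sum(physical.values()) // KIB, "KiB")
```

Either way of counting gives the same 20KiB total; only the boundary between BASIC and the KERNAL moves.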

Anyway, this means that there are these three parts that have to be replaced in order to make a C64 or compatible computer come to life.

The character ROM I have already talked about. Basically it is highly doubtful that a copyright infringement suit could be brought against a user of the font. For a start, in countries like the USA, it simply isn't possible to copyright a bitmap font.  Then given the 8x8 size, there aren't many options for implementing most of the symbols, especially the line and block ones.  Add to that that the symbols have now been added to Unicode, and the long-standing lack of enforcement against distribution of any C64 ROMs, and it really looks like the character ROM isn't a big drama.  Of course, we have also effectively solved this problem by making our own complete char ROM based on a combination of hand-drawn symbols and hand-touched characters from the public domain VGA 8x8 font.  It isn't perfect, but it works.   So we have the 4KB character ROM already under control.

Now, the KERNAL and BASIC are much more interesting beasts.  The KERNAL implements the screen editor, keyboard scanning logic and IEC serial communications protocol, along with a few other bits and pieces.  Then BASIC uses the KERNAL's APIs to provide the familiar BASIC interpreter, which itself has quite a lot of complexity, with the line tokeniser and de-tokeniser, expression parser, variable management, commands, functions and operators.

Also, to have even a minimally working system, that would let you load and run a game or other program that was written in assembly language, you still need the BASIC tokeniser, LOAD, RUN and SYS commands at a bare minimum, with LIST also being practically essential, so that you can actually see what is on a disk.

Then, like the character ROM, we have the problem of how to create new ROMs that are non-infringing on the intellectual property rights of the rights-holders of the C64 ROMs.  This requires considerable care and thought.

The gold standard for such endeavours is to have one team produce detailed specifications of the software being recreated, and another team implement it.  Fortunately, with books like Compute!'s Mapping the Commodore 64, we actually have the specification effectively written for us back in the 80s.

This means that we can potentially implement the KERNAL and BASIC ROM functionality using such resources as a guide, and here is the important part, without looking at the C64's ROMs while writing them.

There is a residual risk that because the C64 ROMs are everywhere, and anyone likely to be inclined to write their own ROMs will have been exposed to them, it is very hard to enforce a true "clean room" reimplementation.  However, I think that it is still possible, provided that sufficient care is taken.

Basically the challenge is to have a development process that is transparent, and that makes it unambiguously obvious to any observer that no infringement is being made of the original ROMs, and that all code being written is being freshly produced.  Here in many ways our audience is the rights holders of the original ROMs -- we want to make their job of assessing whether we are infringing their rights or not super easy.  We don't want anyone having to waste time and effort on lawyers that will only make everyone poor and sad. Thus it makes sense to take an approach that integrates an "abundance of caution" at every stage, so that all mess can be avoided.  This will hopefully also be clear from the outset, since the whole point of this project is to respect the intellectual property rights of the copyright owners of the C64 ROMs. That is, if we didn't care about their rights, we would just use the original C64 ROMs that are available for free download all over the internet like everyone else.

So, back to planning a process, here is the general process that we have come up with:
  1. Begin with the immutable starting point of the 6502 reset, IRQ and NMI entry points, with the rest of the ROM being empty.  This starting point can have no copyright problems.
  2. Based on the public calling interface of the C64 KERNAL as documented in the C64 Programmer's Reference Guide, make stub routines for the jump table.
  3. All routines begin at the lowest address in the KERNAL, sorted by routine name.  Thus the order of the routines is deterministic, and not the result of any creative process.
  4. Implement publicly documented routines, using secondary sources, such as books about the C64, but without referring to the C64 ROM contents themselves.
  5. Run test programs using the C64 KERNAL, and collect entry points into the ROMs.
  6. Where an entry point does not correspond to a public API of the KERNAL, research the function by searching for it in Google. Implement it according to those references.
  7. Where an entry point means that previously implemented routines have to be moved to make space at a specific address, move only those routines required to do so, to the next available address.
  8. Where understanding of the inner workings of a routine is required to replicate it, secondary sources, such as "Mapping the C64" or the "C64 Programmer's Reference Guide", should be used. When those do not provide the answer, internet searches based on the name of the routine should be done, and failing that, based on the routine's address if it has no well known name or insufficient material is turned up.  Reference to actual disassemblies of the ROMs is not to be made, to ensure that we have strong defences against any claim of copyright infringement.
A similar process should be followed for the BASIC ROM.

To help with this, I have created a framework that allows a ROM to be compiled from a collection of assembly files, which get linked together to produce the final ROM. This helps to compartmentalise the work, and with careful design of the framework, makes it very easy to move routines around and assign them fixed locations, as research of the secondary sources progresses and as entry points are discovered from running programs and tracing their entry into the ROMs.

This framework turned out to be quite simple. I used the Ophis assembler, as I am already quite familiar with it, and it has a handy pair of pragmas, .checkpc and .advance, that make it quite convenient to fix the location of a routine.  These can be used together to make sure that a routine will be located at an exact address, and the assembler will complain if there isn't enough space.  To help pack the remaining routines into the free space around the fixed-location routines, the framework implements a greedy packing algorithm that places the largest un-placed routine that fits into each free space, until that free space is full.  There is room to improve this, for example by placing exactly the right sized routines into spaces, but that can wait until necessitated by the ROM filling up as we implement the last few features at the end.
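The greedy packing idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not the framework's actual code: the function name `pack_routines`, the data layout (gaps as address/size pairs, routines as a name-to-size mapping) and the example addresses are all assumptions made for the sketch.

```python
def pack_routines(gaps, routines):
    """Greedily place routines into free gaps (hypothetical sketch).

    gaps:     list of (start_address, size_in_bytes) free regions
    routines: dict mapping routine name -> size in bytes
    Returns (placements, unplaced), where placements maps each placed
    routine name to its assigned address.
    """
    # Largest first; ties broken by name so the result is deterministic.
    unplaced = sorted(routines.items(), key=lambda item: (-item[1], item[0]))
    placements = {}
    for start, size in gaps:
        cursor, remaining = start, size
        leftover = []
        for name, length in unplaced:
            if length <= remaining:
                # Largest routine that still fits goes in next.
                placements[name] = cursor
                cursor += length
                remaining -= length
            else:
                leftover.append((name, length))
        unplaced = leftover
    return placements, [name for name, _ in unplaced]


# Made-up example: two gaps of 10 and 6 bytes, four routines.
placements, unplaced = pack_routines(
    [(0xE000, 10), (0xE100, 6)],
    {"a": 7, "b": 5, "c": 3, "d": 4},
)
print({name: hex(addr) for name, addr in placements.items()}, unplaced)
```

In this made-up example, routine "d" is left over even though the second gap has a byte to spare, which is exactly the kind of waste that a smarter best-fit strategy could reduce.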

The adage of "commit early and commit often" is especially true for this project, because we want the source control history to be strong evidence that we have developed each routine ourselves from scratch, and not copied from the C64 ROMs.  Thus commits when things are half-working and half-baked are especially important, as they document this implementation process.

We are also purposely using quite different algorithms and methods for some key parts of the system, so that there is even stronger evidence against infringement. So for example, the BASIC keyword list and tokeniser are implemented using a simple compression scheme for the BASIC keywords.  This not only saves a bit of space, it also means that the BASIC keyword list is not present in the ROM in the same format as the original (even though as a list of facts, it is not copyrightable), and the algorithm for searching for keywords in the compressed list is by necessity an entirely new work: There would be no point in deriving it from the C64 ROM's tokeniser.

Similarly, the keyboard scanner in the KERNAL is based on a publicly documented improved keyboard scanner that supports multi-key roll-over and rejection of spurious joystick input. In this way, once again, we end up with a routine that has a demonstrably independent ancestry, and offers some nice improvements. We even expanded it slightly, so that the joystick can be used to move the cursor.

For the BASIC interpreter, we also decided to implement banking support from the outset, so that more than 38KiB would be available for BASIC.  The KERNAL LOAD routine was also improved to support loading files bigger than 202 blocks, without writing over the IO area.  Just like the improved keyboard scanner, the result is clearly a new and fresh implementation, and one that brings advantages along with it.
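For readers wondering where figures like 38KiB and 202 blocks come from, here is one plausible reading, worked through in Python. It assumes the standard C64 memory map (BASIC programs start at $0801, the BASIC ROM at $A000, the IO area at $D000) and 254 data bytes per disk block, as on a 1541. The variable and function names are just for illustration.

```python
BASIC_START = 0x0801      # default start of BASIC program memory
BASIC_ROM_START = 0xA000  # BASIC ROM is mapped here on a stock C64
IO_START = 0xD000         # start of the IO area
BLOCK_PAYLOAD = 254       # data bytes per 256-byte disk block (2 are the link)


def blocks_needed(file_bytes):
    """Number of disk blocks needed to hold file_bytes (rounded up)."""
    return -(-file_bytes // BLOCK_PAYLOAD)


# Memory normally available to BASIC: the familiar "38911 BASIC BYTES FREE".
basic_bytes = BASIC_ROM_START - BASIC_START

# Largest program that loads at $0801 and stops just short of the IO area.
below_io = IO_START - BASIC_START

# On disk, a PRG file also carries a 2-byte load address.
blocks = blocks_needed(below_io + 2)

print(basic_bytes, below_io, blocks)  # 38911 51199 202
```

Under these assumptions, a file that needs more than 202 blocks cannot fit between $0801 and the IO area, so a loader that simply writes bytes sequentially would end up writing into IO space.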

That is, our goal is not to create a 100% identical C64 ROM set, but rather a highly compatible and pleasant to use set of alternate ROMs for C64-compatible computers, and that are free for inclusion in emulators, FPGA-based computers and other projects that would like a C64-style environment, without the legal hazards that come from using the C64's own original ROMs.

So where are we up to?

Well, we have been sneakily working on this in the background for a few weeks now, as we wanted to hold off until the project had clearly advanced to a point that proved its feasibility, and provided some minimal level of utility. As hinted at above, our idea of minimum utility is the ability to LOAD and RUN assembly-language based software in a manner that feels totally familiar and functional.

And this we have achieved. There are lots of things still missing, like expression parsing and almost all BASIC commands, and a surprising number of bits and pieces in both BASIC and the KERNAL that are not required by a reasonable range of software.  Also, things like RS232 and cassette support are very low on our priority list, as any real C64 has its original ROMs, and any emulator or FPGA-based C64-compatible computer worth its salt will have some kind of bulk storage on hand.

But this is perhaps best explained visually.  The following videos and images show the current progress we have achieved, and show a number of old and new software titles that can already run using our ROMs.  Also, as a reminder, this is all running on a stock C64 (well, in VICE's C64 emulator). It does not need the MEGA65 in any way (although of course being able to include the ROMs in the MEGA65 is one of the many reasons for creating them).






The source code is at https://github.com/MEGA65/open-roms.

If you want to try the ROMs out yourself in your favourite emulator, you can get the files from here.

In many ways the hardest work, getting this project off the ground and getting a minimally functioning KERNAL and BASIC interpreter, is already done.  However, there is still much to do and much to be implemented.  We are thus looking for contributors who would be willing to help us implement the missing functionality and improve compatibility.

The next post in this series is here - reducing the attack surface for legal attacks.