Sunday, 20 June 2021

Working on floppy writing

Tonight I am trying to make some progress again on writing to floppies in the MEGA65.  Reading has been more or less working for a long time now, but writing has been stuck on the TODO list.  I'm now working to fix this.

I have already made a bits_to_mfm.vhdl file that takes a byte and clock byte and from those produces the combined 16 MFM bits that should be written, and writes them out.  That module has been tested under simulation, and produces valid bits.

I also pulled that into the development branch of the MEGA65 source, and have built bitstreams that include it, and in theory, the ability to command the floppy controller to do unformatted track writes for formatting disks.  Once I have that working, I'll update the code for writing sectors to do buffered writes at the correct place on the track where the sector should be written. But before we get to that point, let's review how writing to a floppy disk works, and then look at how this is done on a C65, before returning to how I am implementing and testing it on the MEGA65.

How floppies store data

Floppy disks are a form of magnetic media. For our purposes, the important implication of this is that data is stored by writing magnetic field orientation reversals along a track.  How those reversals are interpreted depends on the "format" and "encoding" of the disk. Two common challenges for all such encodings are: (1) the disk doesn't spin at a constant or well-calibrated rate, which means that the encoding must be self-synchronising, that is, the data clock must be retrievable from the data stream itself; and (2) the magnetic field inversions must not occur more frequently than what we shall call the "magnetic resolution" of the disk and drive.

Early disk encodings were pretty horribly inefficient. For example, Frequency Modulation was an early method that basically wrote clock bits between which the data bits were indicated by whether a magnetic field inversion occurred at the beginning of the bit time frame or in the middle of it.

FM was great in that it ensured that every data bit was cocooned by clock bits on either side of it, thus ensuring the self-synchronising property.  However, by inserting those clock bits -- and thus the extra magnetic field reversals -- it means that effectively two field inversions are required for every data bit written, thus causing problems with exceeding the "magnetic resolution" of disks. For this reason, FM data was written at half of the "magnetic resolution" of disks, so that this would not be a problem.

To improve on this situation, Modified Frequency Modulation (MFM) was created that retained the positive properties of FM, but reduced the average rate at which magnetic field inversions were required to only 75% of that required for FM, thus allowing data to be encoded at a higher data rate for a given "magnetic resolution", with each data bit taking up only 1.5 "magnetic bits" of space on average on disk, down from FM's 2. A nice improvement.

Interestingly, the 1541 and 1571 used a substantially more advanced method, Group Code Recording (GCR), which encoded groups of 4 bits using only "5 magnetic bits", i.e., requiring only 1.25 "magnetic bits" per data bit, and without MFM's problem that particular data sequences take up more space than others.  This is one of the reasons why the 1541 and 1571 were able to pack more data onto a Double Density 5.25" disk than PCs could at the time, even though these drives used only 35 tracks vs PCs using 40 tracks. 

(Another of the reasons was varying the data rate based on the length of the track, so that the data rate more closely tracked the "magnetic resolution" of the disk, which spins at a constant rate, and thus has a higher apparent "magnetic resolution" on the outer tracks, because they are longer, and thus more material passes under the head per unit time).

The 1581, and by implication, the C65's internal 3.5" drive don't use GCR, though. Commodore opted for MFM, presumably because it allowed the use of cheap off-the-shelf floppy controller chips. 

An interesting side note is that the Amiga fit more on its disks (880KB), not because it used GCR, but because it treated each track as one huge sector, and thus avoided inter-sector gaps (which we will meet again soon), thus allowing increased capacity, at the cost that writing had to be done track at a time, rather than being able to update individual sectors.  Had the Amiga also used GCR, the capacity would have been further increased (and some software did such things). If the 1541's GCR scheme were used on the Amiga, then we would have seen the standard disk format there holding 880KB x 1.5 magnetic intervals per bit / 1.25 magnetic intervals per bit = 880KB x 1.2 = 1,056KB, assuming I haven't messed up the maths, or otherwise made an error of fact in the above discussion.

How the C65 Writes to Floppies

So let's look now at how the C65 writes to floppies. The best way to examine this is to look at the relevant section of the C65 Specifications, which has the following to say about formatting tracks (which is what we care about for now) in section 2.5.3:


Track Writes

     Full-track  writes  can  be done,  either buffered or unbuffered,
however,  the CLOCK pattern register has no buffer, and writes to this
register must be done "one on one".

     Write track Buffered

           issue "clear buffer" command
           write FF hex to clock register
           issue "write track buffered" command
           write FF hex to data register
           wait for first DRQ flag
           write A1 hex to data register
           write FB hex to clock register
           wait for next DRQ flag
           write A1 hex to data register
           wait for next DRQ flag
           write A1 hex to data register
           wait for next DRQ flag
           write FF hex to clock register
           write your first data byte to the data register
             you may now use fully buffered operation.

     Write Track Unbuffered

           write FF hex to clock register
           issue "write track unbuffered" command
           write FF hex to data register
           wait for first DRQ flag
           write A1 hex to data register
           write FB hex to clock register
           wait for next DRQ flag
           write A1 hex to data register
           wait for next DRQ flag
           write A1 hex to data register
           wait for next DRQ flag
           write FF hex to clock register
     loop: write data byte to the data register
           check BUSY flag for completion
           wait for next DRQ flag
           go to loop


Formatting a track

     In order to be able to read or write sectored data on a diskette,
the diskette MUST be properly formatted. If, for any reason, marks are
missing  or  have  improper  clocks,  track,  sector,  side, or length
information are incorrect,  or the CRC bytes are in error, any attempt
to  perform  a  sectored read or write operation will terminate with a
RNF error.

     Formatting  a  track  is  simply  writing a track with a strictly
specified  series  of  bytes.  A  given  track must be divided into an
integer number of sectors,  which are 128,  256,  512,  or  1024 bytes
long.  Each  sector  must  consist  of  the following information. All
clocks, are FF hex, where not specified.  Data and clock values are in
hexadecimal  notation.  Fill  any left-over bytes in the track with 4E
data.

  quan      data/clock      description
  ----      ----------      -----------
    12      00              gap 3*
    3       A1/FB           Marks
            FE              Header mark
            (track)         Track number
            (side)          Side number
            (sector)        Sector number
            (length)        Sector Length (0=128,1=256,2=512,3=1024)

    2       (crc)           CRC bytes
    23      4E              gap 2
    12      00              gap 2
    3       A1/FB           Marks
            FB              Data mark
    128,
    256,
    512, or
    1024    00              Data bytes (consistent with length)
    2       (crc)           CRC bytes
    24      4E              gap 3*

    * you may reduce the size of gap 3 to increase diskette capacity,
      however the sizes shown are suggested.


So we can see that we just command the controller to format a track, and then wait for the controller to start asking for bytes, and writing them, together with a clock byte, and it writes them out onto the disk.  The clock byte thing is related to what we talked about above about FM using clock bits to provide the self-synchronisation. MFM does something similar, but is able to skip them in certain situations.  

The clock byte allows masking of the clock bits after each data bit.  Thus using a clock byte of $FF means that normal data will be written.  If a different value is used, then some clock bits will be missing, which would normally cause problems.  But MFM disk formatting normally uses a special data byte written with a different clock as a synchronisation marker, to help know where to start decoding data.  The convention is to use data byte $A1 written with clock byte $FB, i.e., the byte $A1 written with one missing clock bit.  This combination results in an on-disk sequence of magnetic field inversions that can never happen as part of normal data, thus allowing it to safely provide the synchronisation function.
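
To make the clock-byte masking concrete, here is a minimal Python sketch of MFM-encoding a single byte. It assumes the textbook MFM rule (a clock pulse is written only between two 0 data bits), and it assumes that bit N of the clock byte masks the clock cell written just before data bit N. The function name and that bit mapping are illustrative assumptions, not taken from bits_to_mfm.vhdl.

```python
def mfm_encode_byte(data, clock=0xFF, prev_bit=0):
    """Return the 16 MFM cells for one byte, MSB first, as an integer.

    Assumption: bit N of `clock` masks the clock cell written just before
    data bit N. `prev_bit` is the last data bit of the preceding byte.
    """
    cells = 0
    for n in range(7, -1, -1):
        d = (data >> n) & 1
        # Textbook MFM: a clock pulse only goes between two 0 data bits.
        c = 1 if (prev_bit == 0 and d == 0) else 0
        # The clock byte can suppress that pulse.
        c &= (clock >> n) & 1
        cells = (cells << 2) | (c << 1) | d
        prev_bit = d
    return cells


print(hex(mfm_encode_byte(0xA1, 0xFF)))  # $A1 with all clock bits allowed
print(hex(mfm_encode_byte(0xA1, 0xFB)))  # $A1 with one clock bit removed
```

Under these assumptions, $A1 with clock $FF encodes as $44A9, while $A1 with clock $FB encodes as $4489, which is the sync word usually quoted for MFM disks.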

MEGA65 Floppy Controller 

This is all handled by the F011 floppy controller on the C65. On the MEGA65, it is part of the MEGA65's enhanced F011 functionality that lives in the SD card controller (so that the SD card can be used to emulate floppy disks for the MEGA65). 

The F011 controller has, among others, a command register, a data register and a clock register.  Various commands such as read a sector, write a sector or format a track can be issued to the command register. In the case of the "format a track" command, we implement behaviour almost identical to that of the C65, i.e., we simply start writing magnetic field inversions to the floppy, based on clock and data bytes provided by the respective registers.

The lowest layer of this is done in the bits_to_mfm.vhdl file, which basically takes in a data byte and clock byte, and writes out valid magnetic field inversion signals to the floppy drive based on those.  That module has already been tested.

Current work

The current coal face of work lies at tying this into the overall controller, and the connection to the real floppy drive interface hardware.  I have connected the logic together, and started writing a test program, https://github.com/MEGA65/mega65-tools/blob/master/src/tests/floppytest.c, that attempts to write to a track, and gives me various bits of debug output to see what is going on.

To properly test this, I also need to see what is happening on the floppy drive's WGATE line, which is the "write gate", i.e., "write enable" line, and WDATA, which is the "write data" line.  Whenever the WDATA line is toggled from high to low, or low to high, it causes a magnetic field inversion to be written to the disk.

To monitor those lines, I need access to pins 22 (WDATA) and 24 (WGATE) on the floppy cable interface. The easiest way for me to access those was to put an old dual floppy drive cable in my MEGA65, so that I have a spare connector on the cable that I can tap into with probes:

Using this method of probing, I can see, for example, the index hole pulses on the SYNC line, occurring every 200ms or so (300rpm = 5 revolutions per second = 200ms per revolution):

Looking at the WDATA line, I saw that our pulses are only ~50ns wide, though, which is probably too narrow:



According to Figure 7 of this datasheet, the pulses should instead be half the width of each "magnetic bit" on disk.  So I will modify this in the VHDL, and resynthesise, and revisit in the morning.

Well, it's now tomorrow morning, well tomorrow evening, really.  The bitstream has synthesised, so I can see if I have fixed the pulse-widths:

 


Much better :) The spiking at the edges of the clock pulses is just because of the very fast edges, relatively high voltage (5V) and long oscilloscope probes. This should not be problematic for actual operation, however.

So now I need to see if the WGATE line is getting pulsed low when it starts writing... And, yes, after attaching a 2nd probe so that I can watch both WDATA (top) and WGATE (bottom) at the same time, we can see that the idle pulse train changes once the WGATE line is pulled low:

So, this is all looking good right now.  Probably the next step is for me to improve the floppytest.c programme, to actually write an entire track with 10 sectors worth of data, and see if it is readable back as real sectors. I'll be pleasantly surprised if I get that right first time. Preparing for the likely situation that I won't get it right, I'll start having a think about how I can efficiently and reliably read a full track of data from the drive, so that I can see exactly what I have written.

But first, let's go through the process of writing the track, which consists of:

1. Write 12 gap bytes (data=$00, clock=$FF)

Then for each of the 10 sectors on the track:

2. Write three mark (sync) bytes (data=$A1, clock=$FB)

3. Write Header marker (data=$FE, clock=$FF)

4. Write Track number (data=<track number>, clock=$FF)

5. Write Side number (data=<side number>, clock=$FF)

6. Write sector number (data=<sector number>, clock=$FF)

7. Write sector length (data=$02 (meaning 512 bytes), clock=$FF)

8. Write header CRC byte 0 (data=<crc byte>, clock=$FF)

9. Write header CRC byte 1 (data=<crc byte>, clock=$FF)

10. Write 23 gap bytes (data=$4E, clock=$FF)

11. Write 12 gap bytes (data=$00, clock=$FF)

12. Write three mark bytes (data=$A1, clock=$FB)

13. Write Data Marker (data=$FB, clock=$FF)

14. Write 512 data bytes (data=$00, clock=$FF)

15. Write data CRC byte 0 (data=<crc byte>, clock=$FF)

16. Write data CRC byte 1 (data=<crc byte>, clock=$FF)

17. Write 24 gap bytes (data=$4E, clock=$FF)

After which, we have written the whole track.
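
The steps above can be sketched as a small Python generator of (data, clock) byte pairs. This is only an illustration of the sequence, not the code in floppytest.c: the function name, the placeholder CRC bytes, and numbering sectors from 1 are assumptions made for the example.

```python
def build_track(track, side, sectors=10, crc=(0x00, 0x00)):
    """Return a list of (data, clock) pairs for one unformatted track write.

    Follows the 17 steps listed above. `crc` is a placeholder pair, because
    the CRC calculation has not been worked out at this point.
    """
    out = []

    def emit(data, clock=0xFF, count=1):
        out.extend([(data, clock)] * count)

    emit(0x00, count=12)                 # step 1: leading gap
    for sector in range(1, sectors + 1):
        emit(0xA1, 0xFB, 3)              # step 2: header sync marks
        emit(0xFE)                       # step 3: header marker
        emit(track)                      # step 4
        emit(side)                       # step 5
        emit(sector)                     # step 6
        emit(0x02)                       # step 7: 512-byte sectors
        emit(crc[0])                     # step 8
        emit(crc[1])                     # step 9
        emit(0x4E, count=23)             # step 10
        emit(0x00, count=12)             # step 11
        emit(0xA1, 0xFB, 3)              # step 12: data sync marks
        emit(0xFB)                       # step 13: data marker
        emit(0x00, count=512)            # step 14: sector body
        emit(crc[0])                     # step 15
        emit(crc[1])                     # step 16
        emit(0x4E, count=24)             # step 17
    return out


pairs = build_track(track=39, side=0)
print(len(pairs), "bytes to write")
```

Each sector costs 587 bytes in this layout, so ten sectors plus the leading gap come to 5,882 bytes, with the remainder of the track filled with $4E as the C65 specification suggests.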

But before we can do this, we have to work out how to correctly calculate the CRC values.  

Well, actually, the CRC value can wait just a little while, while I first confirm if any data is being written to the track.  This leads us down the path of working out how to easily read a track of raw data from the MEGA65.  Presently we don't have an automated way to do this, unlike on the Amiga that natively handles floppy data at the raw magnetic level.  What we do have is a register that provides us with the quantised MFM gap size data, i.e., whether the MFM decoder thinks it has seen a short, medium or long gap between magnetic flux inversions. That is sufficient to decode an MFM-formatted track, as is the case with what we need.

However, this is just a bare unbuffered register, which means that we need to have a tight loop that samples it, and writes the successive values out into memory.  As short gaps can occur at a rate of up to 500KHz, this means the loop must take less than 80 cycles per value read.  I tried doing this in C, but CC65 generates code that is a bit too big and slow, so I will need to make an assembly routine to do this. It won't be complicated, however. Basically we have to check if we have reached the end of the track (negative edge of index sensor, in bit 7 of $D6A0), and save every different value that appears on $D6AC, which contains the quantised gap information, as well as a counter so we know if we have missed any gaps, and whether the MFM decoder has detected a SYNC byte.

So something like this should do the trick:

_readtrackgaps:   

    SEI
    
    ;; Backup some ZP space for our 32-bit pointer
    LDX #$00
save:   
    LDA $FA,X
    STA temp,X
    INX
    CPX #4
    BNE save   
    
    ;; Initialise 32-bit ZP pointer to bank 5 ($50000)
    LDA #$00
    STA $FA
    STA $FB
    STA $FD
    LDA #$05
    STA $FC
    
waitforindexhigh:
    LDX $D6A0
    BPL waitforindexhigh
waitforfirstindexedge:
    LDX $D6A0
    BMI waitforfirstindexedge
    ;; Pre-load value of $D6AC so we can wait for it to change
    LDA $D6AC
    STA $FF
loop:
waitfornextfluxevent:
    LDA $D6AC
    CMP $FF
    BEQ waitfornextfluxevent
    INC $D020
    ;; Store byte in bank 5   
    ;; STA [$FA],Z
    .byte $EA,$92,$FA    ; STA [$FA],Z to save value
    STA $FF            ; Update comparison value

    ;; Show some activity while doing it
    .byte $9C,$10,$C0     ; STZ $C010
    LDY $FB
    STA $C012

    ;; Are we done yet?
    ;; INZ
    .byte $1B          ; INZ
    BNE loop
    INC $FB
    LDY $FB
    BNE loop

done:
    ;;  Done reading track or we already filled 64KB
    
    LDX #$00
restore:   
    LDA temp,X
    STA $FA,X
    INX
    CPX #4
    BNE restore
    
    CLI
    RTS


temp:    .byte 0,0,0,0

That should be less than 50 cycles per loop, so is probably even just about fast enough to also handle HD disks, but that's not our concern right now.  

With that routine in place, I can now read raw data from the disk, which looks like this:

00000000: e7 eb e8 ec f0 f6 f8 fc 00 04 08 0c 10 14 18 1c
00000010: 20 24 28 2c 30 34 38 3c 40 44 48 4c 50 54 58 5c
00000020: 60 64 68 6c 70 74 78 7c 80 84 88 8c 90 94 98 9c
00000030: a0 a4 a8 ac b0 b4 b8 bc c0 c4 c8 cc d0 d4 d8 dc
00000040: e0 e4 e8 ec f0 f4 f8 fc 00 04 08 0c 10 14 18 1c
00000050: 20 24 28 2c 30 34 38 3c 40 44 48 4c 50 54 58 5c
00000060: 60 64 68 6d 72 75 76 7a 7d 80 86 89 8e 91 94 9a
00000070: 9d a2 a5 a8 ac b0 b4 b8 bc c3 c7 c4 c8 cc d0 d7
00000080: db df e3 e4 e8 ef f3 f4 f8 ff 03 00 04 08 0f 13
00000090: 14 18 1f 23 24 28 2f 33 34 38 3f 43 44 48 4f 53
000000a0: 54 58 5f 63 64 68 6f 73 74 78 7f 83 84 88 8f 93
000000b0: 94 98 9f a3 a4 a8 af b3 b4 b8 bf c3 c4 c8 cb cf

Only the bottom 2 bits of each byte are MFM data, the remaining six bits are the counter that is used to tell when the next set of bottom 2 bits are available for reading.  In the process of doing this, I realised that the SYNC byte detection reporting in this register was broken, and am synthesising a fix for that.  However, in the meantime, it should be possible to analyse the data that we have, to see what we have, and if it looks like data is being written properly or not.
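
As a rough illustration of how such a capture can be checked, the following Python sketch splits each byte into its 6-bit counter and 2-bit gap code, and then looks for counter values that were skipped or that appear twice. The function names are made up for this example, and the meaning of the individual 2-bit codes is deliberately left uninterpreted, since it is not spelled out here.

```python
def split_samples(raw):
    """Split each captured byte into (6-bit counter, 2-bit gap code)."""
    return [(b >> 2, b & 0x03) for b in raw]


def check_counter(samples):
    """Return (missed, repeated): indexes where the counter did not step by 1.

    `missed` lists samples where one or more gaps were skipped; `repeated`
    lists samples where the low bits changed without the counter moving.
    """
    missed, repeated = [], []
    for i in range(1, len(samples)):
        step = (samples[i][0] - samples[i - 1][0]) & 0x3F  # 6-bit wrap-around
        if step == 0:
            repeated.append(i)
        elif step != 1:
            missed.append(i)
    return missed, repeated


# First row of the dump above
row = bytes.fromhex("e7ebe8ecf0f6f8fc0004080c1014181c")
missed, repeated = check_counter(split_samples(row))
print("missed:", missed, "repeated:", repeated)
```

On that first row no counter values are skipped, but the third byte ($e8) carries the same counter value as the second ($eb) with different low bits, so even this short sample contains one value that changed without the counter advancing.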

With 64KB of data, we should have around 64Kbit of data, i.e., around 8KB, or about 3/4 of a complete track.  When it ran, it seemed to take longer than the 200ms that a single rotation of the floppy would imply, so I want to try to figure out what is going on there, and in particular, if any gaps are being missed, which I can check with the upper bits that contain the counter.

I already have a utility for decoding raw MFM streams into bytes and sync values in mega65-tools:src/tests/mfm-decode.c, but it doesn't expect pre-quantised gaps.  It shouldn't be too hard to modify it to support this, though.  

In the process of doing that, I realised that the gap size sometimes gets updated between gap count updates which was resulting in somewhat messed up data.  I'm synthesising a fix to the VHDL for that problem now, but will continue trying to get the MFM decoder utility working on the data in the meantime.

It turns out that this problem with changing values is noticed for over 3% of all values.  This is probably enough to completely mess up the captured data in terms of decoding it.  So time to wait for the synthesis to run, or do a really low-level flux capture via $D6A0, which allows reading of the raw floppy RDATA line.  That involves reading that line via DMA, thus resulting in 20,250,000 samples per second, so a 64KB buffer covers only about 65536/20250000 = ~3.2ms, or ~1.6% of a track's worth of data.  (This is why I made the $D6AC capture method to begin with).

Although only 1.6% of a track is not much, it's probably still helpful for revealing what we are doing with the writing, and what particular error modes we are inducing in the written data, if any, since they are likely to be at the bit and byte level.  Thinking more about it, it should also be possible to switch the CPU to 1MHz mode or 2MHz mode, which also slows down DMA jobs, to capture more of a track. 

In any case, I need to try something, as the $D6AC capture method seems to be quite unreliable, and reading even a valid track is resulting in gibberish -- or alternatively, the quantised gaps have a different meaning than expected. That said, it seems to detect the SYNC bytes without great trouble, which means that the 1.5x and 2.0x gap lengths must be okay.

Sampling $D6A0 at 1MHz or even 3.5MHz seems to also be problematic, because the pulses on the read line of the floppy drive seem to be very narrow, with most resolving to only a single sample at 3.5MHz.  Reading the data sheet for a TEAC 3.5" floppy drive, apparently the RDATA line can go low for as little as 0.15 microseconds, which is much too narrow to be reliably detected at 3.5MHz when doing a DMA copy, which results in a 3.5MHz / 2 = ~1.75MHz = ~0.57 usec effective minimum pulse width, given that it takes one read and one write cycle for each byte copied. So back to the drawing board yet again.

In short, any adequate sample rate will result in an unacceptably short capture length. So back to using $D6AC.  That register does have a sister register that lets us get the unquantised length, so I might modify the routine I made to read the actual length, instead of the quantised length, and see if that produces any more sensible data.
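
The trade-off can be put into numbers with a short Python sketch. It uses the figures quoted above (200ms per revolution, a 64KB buffer, a 0.15 microsecond minimum RDATA pulse), and assumes, as in the 3.5MHz case, that a DMA copy yields one sample per two CPU cycles; the 1MHz row is an extrapolation on that assumption.

```python
TRACK_SECONDS = 0.2           # one revolution at 300rpm
BUFFER_SAMPLES = 65536        # 64KB capture, one byte per sample
MIN_PULSE_SECONDS = 0.15e-6   # narrowest RDATA pulse quoted from the data sheet


def capture_fraction(samples_per_second):
    """Fraction of one revolution that fits in the capture buffer."""
    return BUFFER_SAMPLES / samples_per_second / TRACK_SECONDS


def can_see_pulse(samples_per_second):
    """True if the sample period is no longer than the narrowest pulse."""
    return 1.0 / samples_per_second <= MIN_PULSE_SECONDS


for label, rate in [("40.5MHz CPU", 20_250_000),
                    ("3.5MHz CPU", 1_750_000),
                    ("1MHz CPU", 500_000)]:
    print(f"{label}: {capture_fraction(rate) * 100:.1f}% of a track, "
          f"pulses reliably visible: {can_see_pulse(rate)}")
```

Only the fastest rate samples often enough to be sure of catching a 0.15 microsecond pulse, and at that rate the buffer holds under 2% of a revolution.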

After a bit of fiddling, I am now confident that I am reading the flux transitions correctly, and without any gaps, as $D699/$D69A also includes a counter in the upper 4 bits, which I check on the captured data for any skipped values. There are no skipped values, so we can be pretty sure that we are collecting all of the flux transitions, and getting the number of 40.5MHz clock cycles between them.

Feeding this into my mfm-decode.c programme, which I have now enhanced to support capture files in this format, I see the SYNC bytes being correctly parsed out, e.g.:


  ...
  gap=2.0 (325)
  gap=1.5 (245)
  gap=2.0 (327)
  gap=1.5 (242)
Sync $A1
  gap=1.0 (165)
  gap=2.0 (318)
  gap=1.5 (244)
  gap=2.0 (322)
  gap=1.5 (240)
Sync $A1
  gap=1.0 (167)
  gap=2.0 (327)
  gap=1.5 (244)
  gap=2.0 (324)
  gap=1.5 (245)
Sync $A1
  gap=1.0 (163)
  gap=1.0 (161)
  gap=1.0 (159)
  gap=1.0 (157)
  gap=1.0 (162)
  ...
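
The relationship between the raw counts and the printed gap sizes in this listing can be reproduced with a few lines of Python. This sketch assumes that a gap of 1.0 corresponds to 162 cycles of the 40.5MHz clock (4 microseconds, the shortest flux interval on a double-density MFM disk), which is consistent with the values shown; the constant, the tolerance and the function names are assumptions, not taken from mfm-decode.c.

```python
CYCLES_PER_UNIT = 162  # assumed: 4 microseconds at 40.5MHz


def normalise(cycles):
    """Convert a raw 40.5MHz cycle count into multiples of the shortest gap."""
    return cycles / CYCLES_PER_UNIT


def quantise(cycles, tolerance=0.25):
    """Snap a raw count to 1.0, 1.5 or 2.0, or return None if it fits none."""
    units = normalise(cycles)
    nearest = min((1.0, 1.5, 2.0), key=lambda g: abs(units - g))
    return nearest if abs(units - nearest) < tolerance else None


for cycles in (325, 245, 165):
    print(f"gap={normalise(cycles):.1f} ({cycles}) -> {quantise(cycles)}")
```

Anything that does not land near one of the three legal MFM intervals comes back as None, which makes implausible gaps easy to spot in a capture.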

However, I am not getting particularly sensible byte data. I am seeing some short runs of bytes between SYNC bytes, and then longer slabs of bytes, which looks like the general structure of sector headers and sector bodies. However, the number of bytes being output is quite wrong, e.g., 1,616 bytes for a sector that should have 512 bytes.  To avoid it being my badly formatted data that is the problem here, I am reading track 38 on a disk that was formatted in another machine.

One part of the mystery that worries me is the gaps of length 1.0 in the SYNC bytes. They should just be 2.0, 1.5, 2.0, 1.5 for each SYNC byte, with no extra flux transition in between, if I understand things correctly.
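
One way to check this expectation is to compute the flux gaps that the textbook MFM model predicts for three consecutive $A1 bytes written with clock $FB. The Python sketch below assumes the standard rule (a clock pulse only between two 0 data bits) and assumes that bit N of the clock byte masks the clock cell before data bit N; both the helper names and that mapping are illustrative, and the result only says what the model predicts, not what the hardware does.

```python
def mfm_cells(data_bytes, clock_bytes, prev_bit=0):
    """Return the MFM cell stream (list of 0/1) for a run of bytes."""
    cells = []
    for data, clock in zip(data_bytes, clock_bytes):
        for n in range(7, -1, -1):
            d = (data >> n) & 1
            # Clock pulse only between two 0 data bits, unless masked out.
            c = int(prev_bit == 0 and d == 0) & (clock >> n) & 1
            cells += [c, d]
            prev_bit = d
    return cells


def flux_gaps(cells):
    """Gaps between successive flux transitions, where 2 cells = 1.0."""
    ones = [i for i, cell in enumerate(cells) if cell]
    return [(b - a) / 2 for a, b in zip(ones, ones[1:])]


print(flux_gaps(mfm_cells([0xA1] * 3, [0xFB] * 3)))
```

Under these assumptions each sync byte ends with a flux transition in its last cell and the next one begins with a transition two cells later, so the model itself predicts a 1.0 gap between the 2.0, 1.5, 2.0, 1.5 groups, matching the pattern in the capture above.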

But the bigger concern for now, is that we are seeing reported gaps between magnetic flux inversions that are _very_ long, e.g.:

Sync $A1
  gap=1.0 (160)
  gap=1.0 (157)
  gap=1.0 (159)
  gap=1.0 (167)
  gap=2.1 (348)
  gap=0.9 (140)
  gap=25.3 (4095)

This gap of 4095 cycles should never happen. In fact, 4095 is the largest it can report, so it is possible that it is in fact reporting 65535 cycles or longer.  

The only cause I can think of here, is that the head was still stepping between tracks, or the WGATE was still open or something like that. But even that doesn't explain why it detected the SYNC bytes immediately beforehand. The head certainly isn't stepping, because I trigger the track read on a key press in a busy loop. So several seconds have typically elapsed since the last head step.  And if the WGATE was still open, then we wouldn't read anything, and thus we shouldn't expect to see the SYNC bytes -- but we do see them, and the correct number of them.

Also adding to the mystery, the timing of the flux inversions for the SYNC bytes are quite accurate, to within 10% or so of the expected time, but the data following those is then all over the shop, in terms of the gap lengths.  But it is also very common to see gap lengths of 3.0x, which is very weird, as we should be seeing 1.0x, 1.5x and 2.0x, as we do in the SYNC bytes.

If I didn't know better, I would suspect that this disk has a messed up track 38.  So to eliminate that possibility, I am running a full disk read test on it using the MEGA65. That is, using the same MFM decoder implementation that is turning up these funny gap lengths.  Maybe it will turn out that the disk is more messed up than I suspect... and it looks like that may well be the case:

 


So maybe I should be trying to read from track 0 for my MFM testing, since that track seems to be good...

Ah, that's much better, and I am indeed seeing correct sector structures:

 ...
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $51 $39 $39 $39 $39
 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39
 $39 $39 $39 $39 $39 $39 $39 $39 $39 $39 $38 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $02
(73 bytes since last sync)
Sync $A1
Sync $A1
Sync $A1
 $fe $00 $00 $01 $02 $ca $6f $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00
(41 bytes since last sync)
Sync $A1
Sync $A1
Sync $A1
 $fb $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 ...
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $da $6e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00
(562 bytes since last sync)

And the capture completes faster, and actually reads several complete sectors, in accordance with expectation.

So after all that chasing my tail, I now have a nice reliable way to read multiple sectors from a track, if not quite the entire track, and then to decode it to see what is actually there.

Now to try to read the track that I have attempted to format, and see what we see there, and whether it looks sensible or not. I know the CRC bytes will still be wrong, but we should, in theory at least, see the SYNC bytes and other bytes.

The SYNC bytes are indeed visible, but the rest of the data is rubbish, like this:

Sync $A1
Sync $A1
Sync $A1
 $fd $7a $ab $ae $ba $eb $ae $ba $eb $ae $ba $eb $ae $ba $eb $ae
 $ba $eb $ae $bb
(20 bytes since last sync)
Sync $A1
Sync $A1
Sync $A1
 $f6 $ba $eb $ae $ba $eb $ae $ba $eb $ae $ba $eb $ae $ba $eb $ae
 $ba $eb $b4
(19 bytes since last sync)
Sync $A1

Looking closer, I realised I was writing the data value into the MFM clock register, and the clock value into the MFM data register. Naturally this will result in a garbled mess.  After fixing that and a few other little bits and pieces, it's now looking _much_ better:

Sync $A1
Sync $A1
Sync $A1
 $fe $27 $00 $01 $02 $00 $00 $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00
(42 bytes since last sync)
Sync $A1
Sync $A1
Sync $A1
 $fb $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 ...
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
(536 bytes since last sync)
Sync $A1
Sync $A1
Sync $A1
 $fe $27 $00 $02 $02 $00 $00 $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00
(42 bytes since last sync)
Sync $A1
Sync $A1
Sync $A1
 $fb $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 ...

In short, we are writing essentially valid data, just with incorrect checksum bytes. Yay!

(There is another minor wrinkle I have to sort out at some point, which is that I need to be bug-compatible with the MFM clock byte latching behaviour of the C65's floppy controller, which basically doesn't latch at all, thus allowing the clock byte to be changed after setting the value of the byte to write.  But I'll worry about that when I have the rest of the formatting system working, and producing valid tracks from my test programme.  The bug compatibility is required so that the C65 DOS ROM routines will work to format disks.  I'll also have to make the format command disable the auto-tracking feature, so that the head doesn't accidentally seek while writing.)

The checksum algorithm for floppy sectors is not too bad, but I do hate writing CRC code, because it is a bit fiddly. Basically to do it properly, you have to make a test harness with a bunch of known correct inputs and outputs, and verify that it is indeed doing what it should. Then the fun comes when it doesn't produce correct output, and you have to work out what minor (or major) detail you have wrong. But that can wait for another day, because I'm going to go to bed happy now, knowing that I can write bytes to the disk.
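
As a starting point for such a test harness, here is a minimal Python sketch. It assumes the usual IBM-style MFM convention: CRC-16/CCITT with polynomial $1021, initial value $FFFF, processed MSB first, and computed over the three $A1 mark bytes, the header or data marker, and the bytes that follow. Whether the F011 behaves exactly like this is an assumption, but the sector header read from track 0 earlier ($fe $00 $00 $01 $02 $ca $6f) provides a known-good test vector.

```python
def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-16/CCITT: polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Three sync marks, header marker, track 0, side 0, sector 1, length code 2
header = bytes([0xA1, 0xA1, 0xA1, 0xFE, 0x00, 0x00, 0x01, 0x02])
crc = crc16_ccitt(header)
print(f"header CRC bytes: ${crc >> 8:02x} ${crc & 0xFF:02x}")
```

With these parameters the header above yields $ca $6f, the same two bytes that follow the header in the capture from the known-good disk, which suggests the parameters are at least right for sector headers.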


Wednesday, 16 June 2021

MEGAphone PCB Re-spin

In amongst everything else, we have had an engineer, Goran, working on the MEGAphone PCB design under the NLnet Foundation grant that they awarded us.  


 

This is the fourth milestone from NLnet on this project:

4. Either Complete MEGAphone r3 PCB design or Design simple near ultrasound communications protocol

This is the first of two sub-tasks that will be decided based on the outcome of sub-task 1, the prevailing conditions of the COVID19 crisis, and the progress of other projects working in this area. Essentially, if the MEGAphone is capable of near ultrasound communications, and there continues to be a need for this, and no other project has obsoleted the need for advancing this field, then it is anticipated that the near ultrasound pathway will be pursued, and that otherwise the revision 3 PCB design will be pursued.

Milestone(s)

  • This milestone shall be considered complete as soon as any of the two following options are complete.

  • Complete MEGAphone r3 PCB

  • Transform the schematic and PCB changes necessitated by sub-task 2 into a PCB design that is ready for test-fabrication.

  • The milestone shall be considered complete when the four preceding dotpoints have been completed and their results communicated via the MEGA65 blog (https://c65gs.blogspot.com.au), specifically announcing revised schematics and PCB layouts.

  • Design simple near ultrasound communications protocol

  • Taking into account the capabilities of the low-cost commodity MEMS microphones and readily available speakers that can be included in the MEGAphone, and the centre frequency and communications bandwidth that these entail, design a simple near ultrasound communications protocol.

  • Creating an optimised protocol is beyond the scope of this milestone. Rather, the focus shall remain on creating a protocol with sufficient bandwidth and appropriate properties that would allow its use in an application like COVID19 contact tracing, i.e., a bandwidth of 10 – 10,000 bits per second, and a range of 1 – 10 metres.

  • The primary activity of this sub-task is to outline the physical and link-layers of a potential protocol that would satisfy the requirements outlined above. It is possible that more than one protocol option will be described, which are likely to contain performance/complexity of implementation trade-offs.

  • It is possible that a negative result will occur, i.e., that after careful consideration, it is not believed to be possible, for one or more reasons that may or may not be surmountable.

  • The milestone shall be considered complete when one or more such protocols have been designed and sufficiently described as to be implementable, and communicated via the MEGA65 blog (https://c65gs.blogspot.com.au), or in the negative, that the reasons why such a protocol is not possible at the current time, together with suggestions as to how this might be remedied. (€ 10000)

Reading the above, there were two different options that we were allowed to pursue.  In an earlier post we explored the ultrasonic communications capabilities of the MEGAphone design as part of NLnet's desire to consider non-Bluetooth options for automatic contact tracing.  While it turned out that ultrasonic communications would indeed be possible, we also came to the conclusion that it was not really feasible for that use-case.  QR codes and manual logging of entry into businesses have taken over that role, here in Australia at least.  So anyway, we've all had enough of COVID19 I am sure, so no one will be sad that the rest of this post focuses on the more fun option of doing the PCB redesign for fabrication... and that design is now complete, ready for the next iteration of the PCB to get fabricated.

So let's start with the set of desired changes that we had on our list to resolve, and work our way through those:

1 Make assembly easier 

The goal here was to use fewer distinct resistor values and similar parts, to make hand assembly simpler.  As the other changes ended up being quite major, as you will read below, we came to the unfortunate conclusion that it is still a bit too early to optimise by trying to use the same resistor values in more places.  Thus only some relatively small optimisations were possible at this time.

2 Remove all intentional transmitters from the main PCB

This has been accomplished, with RFD900-compatible connectors in place of the LoRa radios, for example, and WiFi and BT reserved for a future little mezzanine PCB, thus making the main PCB devoid of all intentional transmitters, as planned.  The following 3D model of the underside of the PCB shows how these modules can be fitted:

In this image, from left to right, we can see the blue JTAG adapter for the FPGA board, the green FPGA board, then the two grey RFD900 radio modules, which can instead be other radios (including ones we might design in the future) that use the same fairly common pinout. Then on the right we have the two M.2 bays with cellular modems (light blue) fitted, above which sit the D-SUB 9 joystick port, headphone jack, battery connector and DC barrel power connector.

3 PCB Outline Changes

The PCB outline has been returned to the original rectangle, with no excursions for the big D-SUB connectors.

4 Remove Smart-Card Connector

This has been removed, and made space for a Raspberry Pi Compute Module, as we will discuss further down.

5 Speakers / Headphone Jack / BT Circuit Changes

The old incorrect BT module has been removed from the PCB, and the new WiFi/BT mezzanine connector has the necessary connections for the new BT module to be wired in, when we get to that point.  The speaker and headphone circuits have been moved to accommodate this and other changes on the PCB.

6 MEGAphone Power Management Circuit Ideas

The long-term goal is to have power management capable of charging a LiFePO4 battery from a solar panel on the rear of the device, or from a USB or other DC power source.  The existing power supply could not run from a 1S battery without a bulky boost converter.  Fixing all these things to make a solar chargeable device is well beyond the scope of this milestone.  But what we were able to do was to fix the soft power-on and power-off functions, so that the existing power supply design can work properly with a 2S battery configuration.

7 Power Switch changes

The power switches are now all on one side of the device, which is much better and simpler. We also fixed problems with incorrect wiring of one of the power switches, so that these will now all work.

8 NOR OTP Flash + IO Expander U12

The corrections to the interface for the NOR flash for one-time pad storage have been deferred due to all the other changes made.

9 LCD & Panel Touch Connector Relocation

These connectors have been moved to the correct location.

10 Touch-screen 6-pin connector footprint wrong

This has been resolved as part of the connector relocation above.

11 LoRa Modules

These have been removed and replaced with RFD900-compatible connectors to allow the use of a wide variety of existing and future radio modules, giving better flexibility, and improving the modularity of the design, allowing those modules to be worked on in future, without having to build whole new PCBs.

12 Q15 Fix

This problematic power circuit section has been reworked and corrected.

13 Replace VGA with HDMI-compatible interface

Done. We now have an HDMI-compatible digital video + audio connector, which is fed directly from the FPGA using work that Goran had done for another open-source HDMI-compatible device. 

14 Relocate and verify orientation of ESP8266 Connector

Fixed as part of moving to the WiFi+BT mezzanine PCB.

15 Use Opto-Isolators for improved power isolation

The power isolation is to ensure that when we power down the cellular modem or other untrustworthy peripherals, that there is no back-current leakage allowing such devices to retain some power.  The previous revision of the PCB was horrible in this area, and there is now much better separation of the power domains between the major components, however we expect that there will still be some more work to achieve total power separation.  This new revision of the PCB will allow us to determine where such remaining back-current flows are occurring, now that we will have removed the major known ones.

16 Verify microSD card connectivity

Done -- this is now wired to match the working reference design.

17 M.2 Expansion Bays

The change required here was to make the UARTs have 1.8V level converters, so that the M.2 cellular modems can have the right voltage for the UARTs, instead of 3.3V which we were using previously.  3.3V seemed to work, but possibly only through luck.  We now have level converters in the UART path to solve this.

18 Real Time Clock

The original plan here was to switch to using the same RTC chip as on the MEGA65 R3 PCB.  However, as we looked into it, there seemed to be little point in doing this, as there are in fact a number of target-specific components and registers on the MEGA65 already, and a system for managing this. Given all the other major changes, we felt that achieving commonality for the RTC was no longer a priority.

19 D-SUB9 Joystick Port

This connector is one of the ones that was sticking out from the rectangle outline of the PCB on the previous revision. This has been corrected, and the D-SUB connector moved to accommodate all the other changes.

20 D-Pad

The D-Pad connector is now the correct size.

21 Thumb-wheel knobs

Trim-pots and improved resistor ladders have been put in to replace the fixed resistor ladders, with which it was difficult to get the correct voltage range on the thumbwheels. The new trim-pot arrangement will allow us to determine the exact resistor values required.

22 Indicator LEDs

The indicator LEDs illuminate with uneven brightness, and require changing of the resistor values to reflect the different voltages being fed to them.  This requires only changing the resistor values, not the PCB layout, and thus has been deferred until we optimise the various resistor values.

23 Infrared LED Receiver

The infrared transceiver changes have been deferred due to being reassessed as lower priority than adding provisional support for a Raspberry Pi slave processor.

24 Raspberry Pi Slave Processor

Originally we were not expecting to do much on this front, however using Goran's experience with digital video circuitry, we have managed to make space for the Pi Compute Module, and connected the video output from the CM connector, as well as the UART lines to the FPGA on the MEGAphone. That is, we have the facility on this new PCB revision to test integration with a Pi, that can in principle be used to run Android, and have its video output relayed to the LCD panel in real-time via the FPGA.  This is an exciting development, and helps to address the anticipated objection that people will make about the MEGAphone being "too simple" and "can't run Android apps", by making this possible in a controlled manner.  

The following image shows how the Pi CM4 would sit between the PCB and RFD900 (or other) radio modules:



Summary and Next Steps

So, overall we now have the new PCB revision done, it addresses the issues that we documented in the previous milestone, and we are ready to send the boards off for manufacture.  The few small items we have not addressed have been assessed as being able to be deferred, to make room for higher-priority work, primarily around making it possible to incorporate a Raspberry Pi Compute Module that can be used to run Android, and the shift from VGA to HDMI-compatible video output, to free up sufficient FPGA pins to make this possible.
 


Sunday, 2 May 2021

Debugging the 32-bit virtual-register instructions

The MEGA65 has a function where you can use A,X,Y and Z together as a 32-bit virtual register, so that 32-bit operations can be done much less painfully.

For example, to add two 32-bit values, on a 6502 you need:

CLC
LDA val1+0
ADC val2+0
STA out+0
LDA val1+1
ADC val2+1
STA out+1
LDA val1+2
ADC val2+2
STA out+2
LDA val1+3
ADC val2+3
STA out+3

That's a lot of instructions and CPU cycles, and plenty of chance to get copy-paste errors as you do the carry through the various bytes.
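To make the carry chain concrete, here is a minimal sketch in host-side C (not MEGA65 code) that performs the same addition one byte at a time, passing the carry from each byte to the next, just as the CLC/LDA/ADC/STA sequence does. The function name add32_bytewise is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit values one byte at a time, low byte first,
   mirroring the CLC / LDA / ADC / STA sequence on a 6502. */
uint32_t add32_bytewise(uint32_t val1, uint32_t val2)
{
    uint32_t out = 0;
    unsigned carry = 0;                               /* CLC */
    int i;
    for (i = 0; i < 4; i++) {
        unsigned a = (val1 >> (8 * i)) & 0xFFu;       /* LDA val1+i */
        unsigned b = (val2 >> (8 * i)) & 0xFFu;
        unsigned sum = a + b + carry;                 /* ADC val2+i */
        out |= (uint32_t)(sum & 0xFFu) << (8 * i);    /* STA out+i */
        carry = sum >> 8;                             /* carry into next byte */
    }
    return out;
}

/* A carry that ripples through all four bytes. */
void check_add32(void)
{
    assert(add32_bytewise(0x12345678u, 0x0DCBA989u) == 0x20000001u);
}
```

Forgetting to carry between any two bytes gives a result that is only wrong for some inputs, which is exactly the kind of copy-paste error that is easy to miss.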

What would be nice, is to be able to do:

CLC
LDQ val1
ADQ val2
STQ out

And the MEGA65 makes this possible, by using special prefixes on various instructions. So to do the above, you put the "next instruction is a Q instruction" prefix (two NEG instructions) on the front of the normal version of the instruction, so LDQ val becomes:

NEG
NEG
LDA val


So our whole little 32-bit addition using Q would look like this fully expanded:

CLC
NEG
NEG
LDA val1
NEG
NEG
ADC val2
NEG
NEG
STA out

But you don't need to do this, because most C64 assemblers now support MEGA65's 45GS02 CPU, and will let you just do "ADQ $1234" etc.
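For anyone generating machine code by hand rather than through an assembler, the expansion can be sketched as a small C helper. This is only an illustration: the name emit_q_abs is made up, it covers only absolute-mode instructions, and it relies on $42 being the opcode for NEG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_NEG 0x42u   /* NEG; two in a row mark the next instruction as a Q instruction */

/* Emit NEG, NEG, <opcode>, <addr low>, <addr high> into buf.
   Returns the number of bytes written (always 5). */
size_t emit_q_abs(uint8_t *buf, size_t cap, uint8_t opcode, uint16_t addr)
{
    assert(buf != NULL && cap >= 5);
    buf[0] = OP_NEG;
    buf[1] = OP_NEG;
    buf[2] = opcode;                    /* e.g. 0xAD for LDA absolute */
    buf[3] = (uint8_t)(addr & 0xFFu);   /* little-endian address */
    buf[4] = (uint8_t)(addr >> 8);
    return 5;
}
```

So emit_q_abs(buf, sizeof buf, 0xAD, 0x0380) would produce the five bytes for LDQ $0380.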

So that's all great, except that the instruction implementation on the MEGA65 had some timing closure problems, as it took too long to get the A,X,Y and Z registers, potentially do some 32-bit operation on them with a long carry-chain, and then get the results back to the A,X,Y and Z registers again.

I started hacking away at fixing those problems, which then led to the need for a convenient test harness for verifying that the instructions work correctly.

I ended up writing this using CC65, with a little helper routine in assembly language that tests the instruction.  The helper routine looks like this:

  /* Setup our code snippet:
     SEI
     ; LDQ $0380
     NEG
     NEG
     LDA $0380
     ; Do some Q instruction
     CLC
     NEG
     NEG
     XXX $0384
     ; Store result back
     ; STQ $0388
     NEG
     NEG
     STA $0388
     ; And store back safely as well
     STA $038C
     STX $038D
     STY $038E
     STZ $038F
     CLI
     RTS
   */
unsigned char code_snippet[31]=
  {
   0x78,0x42,0x42,0xAD,0x80,0x03,0x18,0x42,0x42,0x6D,0x84,0x03,0x42,0x42,0x8d,0x88,
   0x03,0x8d,0x8c,0x03,0x8e,0x8d,0x03,0x8c,0x8e,0x03,0x9c,0x8f,0x03,0x60,0x00
  };
#define INSTRUCTION_OFFSET  9                
 

 Then to run a test, we can just mash the right values into $0380-$0387, and check the results in $0388-$038F (or $0384-$0387, if testing an RMW instruction):

  // Run each test
  for(i=0;tests[i].opcode;i++) {
    expected= tests[i].expected;
    // Setup input values
    *(unsigned long*)0x380 = tests[i].val1;
    *(unsigned long*)0x384 = tests[i].val2;
    
    code_buf[INSTRUCTION_OFFSET]=tests[i].opcode;
    __asm__ ( "jsr $0340");
    if (tests[i].rmw) result_q= *(unsigned long*)0x384;
    else result_q= *(unsigned long*)0x388;
    if (result_q!=expected) {
      snprintf(msg,64,"FAIL:#%d:$%02X:%s",
           (int)i,(int)tests[i].opcode,tests[i].instruction);
      print_text(0,line_num++,2,msg);
      snprintf(msg,64,"     Expect=$%08lx, Saw=$%08lx",expected,result_q);
      print_text(0,line_num++,2,msg);
      errors++;
      if (line_num>=23) {
        print_text(0,line_num,8,"TOO MANY ERRORS: Aborting");
        while(1) continue;
      }
    }
  }
  snprintf(msg,64,"%d tests complete, with %d errors.",
       i,errors);
  print_text(0,24,7,msg);

Then the last key part was to make a simple way to define the tests. I do this using a struct in C, which makes it much easier to add new tests: just add the appropriate single line to the tests block:

struct test tests[]=
  {
   // ADC - Check carry chain works properly
   {0,0x6d,"ADC",0x12345678,0x00000000,0x12345678},
   {0,0x6d,"ADC",0x12345678,0x00000001,0x12345679},
   {0,0x6d,"ADC",0x12345678,0x00000100,0x12345778},
   {0,0x6d,"ADC",0x12345678,0x00000101,0x12345779},
   {0,0x6d,"ADC",0x12345678,0x000000FF,0x12345777},
   {0,0x6d,"ADC",0x12345678,0x0000FF00,0x12355578},
   {0,0x6d,"ADC",0x12345678,0x0DCBA989,0x20000001},
   // EOR
   {0,0x4d,"EOR",0x12345678,0x12340000,0x00005678},
   {0,0x4d,"EOR",0x12345678,0x00005678,0x12340000},
   // AND
   {0,0x2d,"AND",0x12345678,0x0000FFFF,0x00005678},
   {0,0x2d,"AND",0x12345678,0xFFFF0000,0x12340000},
   // AND - disjoint and partially overlapping operands
   {0,0x2d,"AND",0x12340000,0x00005678,0x00000000},
   {0,0x2d,"AND",0x12345600,0x00005678,0x00005600},
   // INC
   {1,0xEE,"INC",0,0x12345678,0x12345679},
   {1,0xEE,"INC",0,0x00000000,0x00000001},
   {1,0xEE,"INC",0,0x00FFFFFF,0x01000000},
   // DEC
   {1,0xCE,"DEC",0,0x12345678,0x12345677},
   {1,0xCE,"DEC",0,0x00000000,0xFFFFFFFF},
   {1,0xCE,"DEC",0,0x00FFFFFF,0x00FFFFFE},
   
   {0,0x00,"END",0,0,0}
  };
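The struct itself is not shown above. Judging from the initialisers and the fields the test loop uses, it would look something like the following sketch. The field order and types are assumptions, and uint32_t is used so that the sketch also behaves on a PC (the CC65 code uses unsigned long, which is 32 bits there). The model_q function is a made-up host-side reference model that can be used to double-check the expected column before a table entry is trusted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, inferred from the table entries and the test loop. */
struct test {
    unsigned char rmw;          /* 1 = read-modify-write: result is read back from val2's location */
    unsigned char opcode;       /* opcode patched into the code snippet */
    const char *instruction;    /* name shown in failure messages */
    uint32_t val1;              /* loaded into Q before the instruction */
    uint32_t val2;              /* memory operand */
    uint32_t expected;
};

/* Hypothetical reference model for the opcodes used in the table,
   assuming carry is clear on entry (the snippet executes CLC first). */
uint32_t model_q(uint8_t opcode, uint32_t val1, uint32_t val2)
{
    switch (opcode) {
    case 0x6D: return val1 + val2;   /* ADC */
    case 0x4D: return val1 ^ val2;   /* EOR */
    case 0x2D: return val1 & val2;   /* AND */
    case 0xEE: return val2 + 1u;     /* INC on the memory operand */
    case 0xCE: return val2 - 1u;     /* DEC on the memory operand */
    default:
        assert(!"opcode not modelled");
        return 0;
    }
}
```

Running each table row through such a model on a PC is a cheap way to catch typos in the expected values, so that a failure on real hardware points at the bitstream rather than at the test table.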

This made it all very nice and comfortable to test that the latest bitstream had fixed the known problems with those instructions (more tests for others need to be written still):

And to make sure I wasn't imagining things, I tried it out on an older bitstream that didn't have the corrections in it, and confirmed that it fails horribly, as expected:

So now we can write more tests for the rest of the Q instructions, and make sure that they are all fine.


Saturday, 24 April 2021

Working on a simple ethernet-enabled terminal programme for the MEGA65

There are still some nice C64 BBSs around that can be accessed via the internet.  So the topic came up in conversation about making a PETSCII terminal programme for the MEGA65.  This struck me as a good test for the MEGA65 port of WeeIP I have been grinding away on in the background.

I already had WeeIP to the point where it mostly works, including implementing ARP, DHCP, and DNS facilities, which all seem to work well enough that I can plug the MEGA65 into my Fritzbox router via a pair of ethernet-over-powerline adaptors to reach between my office and router.

The core of a PETSCII terminal programme is really just piping the keyboard input over a TCP/IP socket, and printing out whatever comes back over the socket.  Thus with a little bit of work, I can _sometimes_ get a display like this:

Very exciting and tantalising!

Except that it doesn't always display that, and often displays what we are lovingly calling "Haustiermist" (literally, "pet manure"), in the form of junk characters at the bottom.  This name comes from the cheesy name I came up with for the terminal programme, "Haustierbegriff", which is the word-for-word translation of "PET TERM", but purposely using both words in the wrong sense, as a fine example of pseudo "Denglish", because we were talking about Denglish on the MEGA65 Discord server at the time.

So, now I need to investigate the "pet manure" and work out what is going wrong. LGB had previously pointed out this problem with WeeIP, and we both believe it is some faulty piece of packet length handling or similar.  So it's time to investigate...

We will start by looking at the most recently received ethernet frame in a connection where the Haustiermist was particularly bad, which will hopefully be the last TCP packet containing the data + manure.  

Digging around in this, I found that the offset to the first data byte in the TCP packet was not being calculated properly. It was fixed at 68, rather than being computed by adding together the lengths of the Ethernet, IP and TCP headers.  Fixing that, we now have increased happiness, with no Haustiermist visible :)
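As a rough illustration of that calculation (this is not the WeeIP code, just a sketch), the payload offset can be derived from the header length fields in the frame itself. It assumes an untagged Ethernet II frame carrying IPv4, with the offset measured from the first byte of the Ethernet header; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14u   /* destination MAC, source MAC, EtherType */

/* Return the offset of the first TCP payload byte within the frame,
   or 0 if the frame is too short or the header lengths are invalid. */
size_t tcp_payload_offset(const uint8_t *frame, size_t len)
{
    size_t ip_len, tcp_start, tcp_len;

    assert(frame != NULL);
    if (len < ETH_HEADER_LEN + 20u) return 0;

    /* IPv4 header length: low nibble of the first byte, in 32-bit words */
    ip_len = (size_t)(frame[ETH_HEADER_LEN] & 0x0Fu) * 4u;
    if (ip_len < 20u) return 0;

    tcp_start = ETH_HEADER_LEN + ip_len;
    if (len < tcp_start + 20u) return 0;

    /* TCP data offset: high nibble of byte 12 of the TCP header, in 32-bit words */
    tcp_len = (size_t)(frame[tcp_start + 12u] >> 4) * 4u;
    if (tcp_len < 20u || len < tcp_start + tcp_len) return 0;

    return tcp_start + tcp_len;
}
```

With no IP or TCP options this gives 54, and it grows whenever the peer includes TCP options such as timestamps, which is why a fixed constant will pick up junk or skip data on some packets.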

 


This is very nice progress :)

But it doesn't matter if I press the delete key or anything else, nothing seems to happen.  There is also a problem with closing the TCP connection, but that can wait until I get actual interaction working...

I'm currently using the ASCII hardware accelerated keyboard scanner on the MEGA65, i.e., reading $D610 to get each new character. This is nice when you want ASCII, but less ideal for PETSCII, where using the ROM-supplied character input routine is probably better.

Also, it looks like we might still have some problem with the TCP data reception stuff, because when the BBS times out and disconnects, it displays part of the disconnection display, but then shows gibberish.  I'm not really sure of the cause of that yet, and whether it is on my side or something else.  

This is the kind of thing I am seeing:


The junk looks rather repetitive, like some kind of escape code. Or it could just be more "pet poo".

So I installed cgterm and tried that, and that let me connect, register and do everything just fine.  So I know that I can connect to the BBS if all is well... Just not from the MEGA65 yet.

Looking at how it works, the protocol doesn't seem to support many strange escape sequences that would be causing that kind of rubbish on the screen.

What would be nice, is if I could connect to my laptop from the MEGA65, as I could then feed various things and see how it behaves.  However my home router doesn't seem to want to allow connections between wired and wireless clients -- or else there is something else fishy going on.

While I think about potential solutions to that, I might try to work on why sending backspace doesn't work, and try calling the KERNAL to check for keyboard input, to get PETSCII keyboard input working.

In trying to do that, I hit a weird problem where just including conio.h in CC65 would stop the TCP connection from being established.  So I ended up just hammering the C64's keyboard buffer directly, which seems to work, in that it gets the characters.  Here is the little bit of horror:

     // Directly read from C64's keyboard buffer
     if (PEEK(198)) {
       buf[0]=PEEK(631);
       socket_select(s);
       socket_send(buf, 1);
       POKE(0xD020,PEEK(0xD020)+1);
       POKE(198,0);
     }

However the characters still don't actually seem to get received by the BBS.  I'm also seeing the logon banner for the BBS being truncated.  If I remove the above code, it magically works again.  Something VERY weird is going on here... No, actually it just works _sometimes_.  

After several attempts, I did get to the point where the BBS was asking me for my ID:

 

However, no matter what I tried, I could not get it to show any further signs of interaction.

At this point, I am beginning to suspect that my port of WeeIP has some interesting bugs left in it, that I need to fix.  They are very likely my fault, as I did hack things around quite a bit to make it work with the MEGA65's Ethernet controller.

Update: I did manage to get it to accept my ID and start letting me enter my password, but after the first character of the password, it stopped responding:


 

This doesn't really change my view on the most probable cause, which is that the TCP/IP state machine is not handling things correctly. In particular, I suspect that packets are being lost in one direction or the other (possibly due to problems in the MEGA65's Ethernet controller, but that's pure speculation right now), and that the connection never recovers, because the correct ACK is not sent or something similar.

I thus started digging through to find what is going on.  

First up, I found why I couldn't make it connect to my laptop via the local wifi: There was a bug in weeIP's handling of sending TCP/UDP packets when there is no ARP entry: It marked the packet as sent, even though it couldn't yet be sent.  I fixed this, so that it would retry to send the packet after a short interval, to give the ARP resolution time to run (it was being correctly triggered).  With that done, I can now connect to sockets on my local network. 

That makes debugging MUCH easier from here, as I can just use netcat to listen, and try typing stuff from each end, and seeing what happens.

This revealed that typing on the netcat side always results in output on the MEGA65, although sometimes with delays of several seconds for it to appear, for no reason I can currently figure out.  Maybe some Ethernet frames do get dropped on the MEGA65 side due to CRC errors or something, although I'm pretty sure that is not a problem any more.  

Anyway, that's less of an issue than in the opposite direction, where only sometimes does what is typed get through, and it doesn't back-log, but rather gets lost, if it didn't get through at first. So there is some fairly major bug in weeIP that needs fixing here.

On the plus side, though, once that is fixed, it should work. Also, it means I might be able to try to interact with the BBS by just typing the same key over and over until it does get recognised.

I think part of the problem is that when there is still unacknowledged data, the socket_send() function just replaces the buffer of what needs sending, instead of either: (a) reporting "not yet ready"; or (b) adding new data to the end of the queued data. Either of those would be helpful improvements on the current state of affairs.

It turned out that (a) was the easier option for me to implement. So now if socket_send() realises that there is pending data to be sent, it returns failure. It is up to the caller to then asynchronously retry.  This gets typed text to arrive in the correct order now, for the most part, and more to the point, without losing typed text. Of course, if acknowledging takes too long, and you have over-filled the C64's 10 character keyboard buffer, you will still lose key strokes. But that's easy enough to fix by adding our own intermediate buffer which is much larger.

So... this has made a BIG difference, and I am now basically able to use the BBS to some degree, as we can see here:

However, if the BBS is engaged, then I don't see the engaged message.  I think this is because weeIP doesn't correctly handle the case when there is a FIN or RST flag, but the packet contains data.

This is going to be a little tricky to fix, as we need to simultaneously return a WEEIP_EV_DISCONNECT and WEEIP_EV_DATA.  I might just have to make a new event that covers both, or see if the event codes are bits in a bit field, in which case we can handle it that way.
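If the events were bits in a bit field, the combined case falls out naturally. Here is a sketch of that approach. The EV_DATA and EV_DISCONNECT values are made up for illustration and are not WeeIP's actual event codes; the FIN and RST masks are the standard TCP flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event flags, one bit each, so they can be combined. */
#define EV_DATA       0x01u
#define EV_DISCONNECT 0x02u

/* Standard TCP header flag bits. */
#define TCP_FIN 0x01u
#define TCP_RST 0x04u

/* Work out which events a received segment should raise. */
uint8_t classify_segment(uint8_t tcp_flags, uint16_t payload_len)
{
    uint8_t events = 0;
    assert((EV_DATA & EV_DISCONNECT) == 0);   /* flags must not share bits */
    if (payload_len > 0) events |= EV_DATA;
    if (tcp_flags & (TCP_FIN | TCP_RST)) events |= EV_DISCONNECT;
    return events;
}
```

The callback would then test EV_DATA first and EV_DISCONNECT second, so that a final "engaged" message is displayed before the connection is torn down.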

I also still need to find out why there are patches of rubbish data that are received.  I still don't know if this is some special character codes that I need to support, or whether some graphics characters are shifted when sent, or something like that.

But both of these will have to wait for another day.

I had a little time tonight to investigate, and added support for receiving data with a RST packet, but it still doesn't solve the problem.  I also added a very simple menu to choose from a pre-determined list of BBSs to help testers have some fun, but the latency bug is still there, which makes things a bit annoying.

But still, it led to some fun testing, connecting to some other BBSs:

I think I'll forget about the data on disconnect problem for now, and focus on tracking down the latency bug, so that it becomes much more pleasant and responsive to use.  If nothing else, I don't want people getting the false idea that the MEGA65's Ethernet interface is slow, when it is in reality super fast.

So I have had a little bit of time to think about the latency bug, and begin investigating it. I recall that there was a problem with the Ethernet controller not returning packets immediately, but rather, only releasing a packet when another packet had been received.  That would cause this problem we are seeing, but I need to be sure that that is what is happening.

To check that, I need a simple packet analyser that can run on the MEGA65. WireShark would be nice, but is much too big, so I have started writing "WireKrill" as something a bit smaller.  Basically it just displays the first part of each received packet as hex for now:

I'll add some further pretties to it, like being able to correctly decode ICMP PING packets, and display info about the pinger. That way I can then use ping on my laptop to prod the MEGA65, and see if the MEGA65 receives the packets immediately, or delayed by one packet.
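Recognising a ping only needs a handful of byte comparisons. The following is a sketch of the idea rather than the WireKrill source: it assumes an untagged Ethernet II frame starting at the destination MAC address, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_LEN 14u

/* Return 1 if the frame is an IPv4 ICMP echo request (a "ping"), else 0. */
int is_icmp_echo_request(const uint8_t *frame, size_t len)
{
    size_t ihl;
    assert(frame != NULL);
    if (len < ETH_LEN + 20u + 8u) return 0;
    if (frame[12] != 0x08 || frame[13] != 0x00) return 0;   /* EtherType 0x0800 = IPv4 */
    if ((frame[ETH_LEN] >> 4) != 4) return 0;               /* IP version */
    ihl = (size_t)(frame[ETH_LEN] & 0x0Fu) * 4u;            /* IP header length in bytes */
    if (ihl < 20u || len < ETH_LEN + ihl + 8u) return 0;
    if (frame[ETH_LEN + 9u] != 1) return 0;                 /* protocol 1 = ICMP */
    return frame[ETH_LEN + ihl] == 8;                       /* ICMP type 8 = echo request */
}

/* Pointer to the four bytes of the sender's IPv4 address (the pinger). */
const uint8_t *ipv4_source(const uint8_t *frame)
{
    return frame + ETH_LEN + 12u;
}
```

The four bytes returned by ipv4_source() can be printed as dotted decimal to show who is pinging.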

Okay, I can now display ICMP ping packets:

That display was caught mid-scroll, thus the funny colours at the bottom etc, but you get the general idea.

I did have a bug where each packet was being received 3 times, which I have figured out how to work around.

But more importantly, having this tool has let me confirm that the Ethernet controller seems to always be one frame late in announcing the reception of frames. This means that the last received frame isn't actually available to the computer until another frame arrives.  This could very easily cause the very laggy behaviour for the BBS interaction.

I was able to confirm that this is the problem, and that by feeding a regular stream of packets to the network that were totally unrelated, I was able to greatly reduce the lag when using the BBS client.

So now I need to work out why the Ethernet controller does this funny delay of one packet before releasing the packets to the CPU-side, or how exactly I am mis-handling the rotation of the ring buffers.  But that will need to wait for another day.

Tuesday, 30 March 2021

Guest Post from Bitshifter: Fixing the Oldest and Nastiest Bug in Commodore BASIC

We again are able to enjoy another guest-post from Bitshifter, as he fixes and enhances the C65 ROMs for use on the MEGA65:

---  

Probably the oldest and nastiest bug in Commodore BASIC

A huge part of working on the ROM for the MEGA65, which contains the resident part of the operating system and BASIC interpreter, is debugging. Not counting the character sets, the source code, written in 45GS02 assembler, has about 30000 lines, separated into the modules kernel, editor, DOS, BASIC and graphics.

The debugging is necessary mainly for two reasons:
1) Programmers, developers, software engineers are humans.
2) A source code, that is not your own, is full of traps, side effects and assumptions, that one is not aware of.
Of course my own code is full of traps and side effects too, but I know them (at least for the next few months).

While working on the ROM, fixing Commodore bugs, optimising code to get free space for extensions and introducing new features (and sometimes new bugs), I often write a few hundred lines of assembly code and make changes to existing code. Sometimes these changes seem to work perfectly, some result in crashes and freezes because of errors, and some only seem to work fine in my own tests, but fail when a developer named „ubik“ demonstrates situations which the code cannot handle.

Well, I’ll not tell a long story about how I debug, but come directly to the bug mentioned in the title. I tracked its existence down to BASIC 2.0 as used in the VIC-20, C64 and the early PET/CBM series, and it seems that it was never detected, documented or fixed.

It is related to temporary strings, the stack of descriptors for temporary strings, that has a size of 3, and the so called „garbage collection“, which in reality doesn’t collect garbage, but does a defragmentation of string storage.

Let’s look at an example:

10 V1 = 12345: V2 = 6789

20 A$ = RIGHT$(MID$(STR$(V1),2)) + RIGHT$(MID$(STR$(V2),2))


and let’s look into string memory, after the execution of these two statements:



The interpreter needed 6 temporary strings, these are the strings, which are followed by the two byte „garbage“ word in magenta, to create the resulting permanent string, marked here with the „back link“ word in yellow.


Each time a string is modified or newly assigned, it gets a new allocation in string memory and the old string is flagged as „garbage“ by replacing its back link, which points to the string descriptor, with the garbage word, which is the length of the string followed by $FF. (The high byte $FF can never appear as part of a back link, because the highest address in string memory is $F6FF, therefore it is safe to use this value as a flag.)
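The two-byte word after each string can be modelled in C as follows. This is only a sketch of the scheme as described here, not a translation of the ROM code: the function names are made up, and it assumes the back link is stored low byte first, as addresses usually are on these machines.

```c
#include <assert.h>
#include <stdint.h>

/* 'link' points at the two bytes that follow a string in string memory. */

/* Flag a string as garbage: length in the low byte, $FF in the high byte. */
void mark_garbage(uint8_t *link, uint8_t length)
{
    link[0] = length;
    link[1] = 0xFF;
}

/* A high byte of $FF can never be part of a real back link. */
int is_garbage(const uint8_t *link)
{
    return link[1] == 0xFF;
}

/* Address of the descriptor that owns a live string. */
uint16_t back_link(const uint8_t *link)
{
    assert(!is_garbage(link));
    return (uint16_t)(link[0] | (link[1] << 8));
}
```

Because the length is kept in the garbage word, a routine walking through string memory can still step over a dead string without consulting any descriptor.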


In programs with a lot of string activity, the occupied string space, which grows from the top ($F6FF) towards the bottom (the top of array space), fills up very quickly, and if there is no space left for creating a new string, the defragmentation routine is called. This starts at the top of string memory and copies all strings that do not carry the garbage flag to adjacent addresses, skipping the garbage strings. While doing this, it is necessary to update the string pointers of the descriptors. That’s the purpose of the back link: after copying a string to its new address, the back link is used to find the descriptor, and the two pointer bytes of the descriptor are set to the new address. The descriptor can reside in scalar space, e.g. for a variable like AB$, or in array space, e.g. XY$(I,J), or it can be a temporary descriptor on the descriptor stack in the zero page. And this is where the bug lurks.

The descriptor stack is 9 bytes long in the zero page and therefore has room for 3 descriptors. Additionally we have a stack pointer, which has four valid values, for 0, 1, 2 or 3 descriptors in use. And we have a variable called LASTPT, which points to the last temporary descriptor that was used to allocate a string. This is an optimisation tool. When a string is no longer needed and it is the last string that was allocated, the pointer to the used string memory can be adjusted by the length (plus 2 for the back link), instead of just flagging the string as garbage. This method slows down the filling of string memory somewhat and lets the defragmentation happen less often.

This LASTPT is initialised at the execution start, alas only the high byte!




The low byte therefore has the value, that was either initialised at power up or has the value from a previous use. This use can for example be the loading of a program, because this involves string handling for the filename and therefore the use of the descriptor stack.


So it can happen that the routine that pops a value from the descriptor stack compares the current pointer with LASTPT, sees equality and decides not to flag the string as garbage, but to update the pointer to free string memory instead. This is an error if the value of LASTPT is from a previous usage and was not set in the current statement. Alas, it will have no consequences in 99.9 % of cases, because at the end of a statement all temporary descriptors are freed anyway.

The bug can cause mischief if the following conditions are met:


1) A program with heavy usage of strings, which triggers the garbage collection / defragmentation frequently.


2) Statements with complex string operations, which need more than one temporary descriptor.


Then a really rare event can happen:

The garbage collection is triggered in the middle of a complex string formula with temporary descriptors on the stack. The garbage collection is aware of temporary descriptors, copies their strings too, and updates the descriptors. But if one of the descriptors has freed its string due to a false LASTPT comparison, its string is outside the garbage collection area and will not be updated. And this causes strange things to happen. The descriptor now has a pointer that does not point to a valid string, so the whole system of links and back links is corrupt. To make the bug hunting more interesting, this corrupt pointer often does no obvious harm until the next garbage collection, or the collection after it, but each one makes the error worse, until back links appear in screen memory, or strings disappear, or the computer freezes.


And all this because the setting of LASTPT was not done with:

STA LASTPT

STA LASTPT+1


Instead, the first part, setting the low byte, was forgotten.

BTW, the high byte is always zero, because the descriptor stack resides in the zero page, so this usage of the high byte is redundant.
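The effect of the missing store can be shown with a tiny C model. This is an illustration only, not the ROM code: the struct and function names are made up, and the address $001A used in the usage note is just an arbitrary zero-page value standing in for a descriptor slot.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the two bytes of LASTPT. */
struct lastpt {
    uint8_t lo;
    uint8_t hi;
};

/* What the original code does at execution start: high byte only. */
void init_lastpt_buggy(struct lastpt *p)
{
    p->hi = 0;          /* STA LASTPT+1 */
}

/* With the forgotten store added. */
void init_lastpt_fixed(struct lastpt *p)
{
    p->lo = 0;          /* STA LASTPT */
    p->hi = 0;          /* STA LASTPT+1 */
}

/* The comparison made when a temporary descriptor is popped. */
int matches_lastpt(const struct lastpt *p, uint16_t descriptor_addr)
{
    assert(p != 0);
    return descriptor_addr == (uint16_t)(p->lo | (p->hi << 8));
}
```

If the low byte still holds $1A from an earlier use, then after init_lastpt_buggy() a descriptor at $001A compares equal to LASTPT, although no string has been allocated through it in the current statement. After init_lastpt_fixed() the same comparison fails, as it should.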

I found this bug in all versions of Commodore BASIC, that I investigated, VIC-20, C64, C128, C65, MEGA65.

But it needed the „11 BASIC“ preprocessor/compiler of UBIK, to let the bug appear.

And it was very difficult to detect, because only huge programs with heavy string activity could activate it.


So the old programmer’s talk is true:

You can only find the penultimate error in a program!
There is always the ultimate error, that remained undetected.