Sunday 15 August 2021

Write and format support for the internal floppy drive

In the last blog post where I was attacking the floppy stuff, i reported how we can now almost format floppies using the C65 ROM, with two outstanding problems:

1. The CPU has to be at 40MHz due to some problem, which I suspect is a CPU instruction timing problem.

2. Actually writing sectors on the real floppy has not been tested and confirmed working as yet.

 [Ed: And 3., Reading sectors sometimes fails, as I discovered part way through writing this post.]

The first problem can be easily worked around for now, by switching to 40MHz before starting to format a disk, so my focus for the moment is on the 2nd problem.

The first step is to add sector reading and writing functions to my floppytest.c programme, so that I can ask for a sector to be written over, and then see what is in there.

I already have the various bits parts and pieces of sector reading routines in floppytest.c, so its just a case of pulling them all together.  I have had to go through the logic in the read/write test harness, as there was continuous stepper chatter happening after trying to read a sector. It turns out that the problem there was that I was not masking bit 7 off the "last track seen under the head" register of the MEGA65's floppy controller.  That bit gets set when a track number matches the requested track, and so track 2 becomes $82 when it matches the request, and that was then not == 2, and so my little auto-stepper routine (not to be confused with the auto-tune autonomous track stepping feature of the MEGA65's VHDL) would think it didn't match, and that it needed to go to a lower track (since it thought it was on track 129 and wanted track 2), but then it would see track 1, and step back again, which would result in $82 appearing there again, and thus rinse repeat.

With that fixed, I now have simple keyboard cursor key controls to select the track to read, and then pressing R will cause it to try to read sector 0 of side 0 of that track.  This now more or less works.  But so far, I am not displaying the sector contents anywhere. 

I am using an 80x50 text mode, but with a funny setup using nybl colour mode so that I need only 4 bits per graphics pixel, which means 16 pixel wide chars, and thus only 40 chars per row.  What I would like is to be able to display 80 columns of text.  I want to keep that funny mode for the other functions in floppytest.c, and reuse as many of the support routines for that as possible, so that the compiled code doesn't end up too large. 

But I might just need to make a new text mode and simple printing routines nonetheless, as it would be great to be able to show a whole 512 byte sector.  With 80 columns we can show 16 chars per row, if we want to show both hex and a glyph.  512 / 16 = 32. So an 80x50 text mode would be ideal here.  That would need 4,000 bytes of screen RAM, which might cause us a bit of grief, as we are using $C000 upwards for that, and I have a vague recollection that putting stuff in $C800-$CFFF with a CC65 compiled programme causes problems.  It might actually just be $CF00-$CFFF that's the problem (yes, its because CC65 uses $Cxxx as the C stack, so the top part is quite unsafe, but the rest is okay, if you don't nest function calls too deeply, or use too many stack-allocated variables).  We can work around that by making a video mode that is less than 50 rows high.

Next step is to write the bit of code to display the contents of the floppy sector buffer.  Together with the new video mode, this gives us a display like this:

Again, excuse the glitched characters due to bugs in the screen capture workflow. But you can see the general idea of displaying the 512 bytes of data nicely.

The current problem is that when I read a sector, the data is not appearing in the sector buffer.  I pre-fill the buffer with $BD bytes, and then after reading the sector, that same data is read back out, as shown above, thus confirming that the sector from the disk is not being read properly, or that I am reading from the wrong buffer memory or something along those lines.

So by booting to BASIC 10 and accessing a D81 disk image there, I have been able to confirm that the buffer is indeed where I expect, at $FFD6C00-DFF.  And if I command a read after C65 DOS has been reading things, then I can indeed read the sector.

The problem here was that I wasn't resetting the buffer pointers for the FDC with a command $01 to $D081.  With that, I can now read sectors, but there is something fishy going on with the flags: The BUSY flag in particular seems to clear well before the sector has finished reading in. This means I am frequently trying to display the sector contents before they have loaded in. The question is what is clearing the BUSY flag early.

The FDCReadingSector state in the state machine has no path that clears the f011_busy signal, without also stopping the sector reading process. So that seems unlikely to be the problem.   The main loop does check for the busy_countdown to reach zero, and clears f011_busy at that time. However, the busy_countdown does get set when stepping tracks -- and this was the cause of that problem: When you issue a command $10 or $18 to $D081 to step the head in or out a track, you have to wait for the BUSY flag to clear, rather than wait a fixed time.

That now has it reliably reading whole sectors, however not always the correct sector.  This might in fact be the root cause of the mystery hanging of the C65 DOS when using real disks sometimes, in that if it reads the wrong sector data, it might cause strange things to happen. But this will require further investigation.  But first, lets see what we get read into the sector buffer when repeatedly trying to read track 39 sector 1 side 0, which should be the header sector of the disk.

Let's start by mapping out what we *should* see in each of the 10 sectors on side 0  of track 39. The disk I am using has a couple of files, followed by 255 files numbered 1 to 255, which makes it easy to see where directory material has come from.  So here are the ten 512 byte sectors from the disk:



 


And so on.

Some of the bad reads have seeming utter junk in them, like this read of sector 10:

It has lots of $BD bytes at the end, which tells me that the sector read probably failed part-way through.  My best guess here is that the sector read gets desynchronised with the bytes of MFM data on the disk, since it still seems to show a periodicity of 32 bytes, matching the size of the 1581 directory entries. In fact, it is possible it is shifted by some number of bits. The correct sector should look like this:

So where we have $05s in our bad sector, we have similar numbers of $A0s in the proper data. Thus if we shift the bad sector data right by 3 bits, it should line up correctly, and we would then get:

0000: X0 33 32 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0
0010: 00 00 00 00 00 00 00 00 00 01 00 00 00 81 27 23

Which actually matches part of track 39 side 0 sector 4 in the image above, from offset $0125 in that sector.

This is quite, quite weird, as it would seem to suggest that our MFM decoding logic is detecting sync marks and a seemingly valid sector header and data section marker in this random part of the disk.

Look at other cases where bad reads are occurring, I can see much the same strange results, where it looks like the disk is just being read from some random point, rather than the correct starting point.

I'll have to stop for now and try to think about what could be causing this: Is it that the MFM decode logic is getting false positives for sync markers, and not properly checking the track, side and sector values, and thus starts decoding some wrong section of the track as a sector. Or is it that it begins, but then somehow freezes or something, causing a delay before it actually starts reading the data, and thus the disk has rotated in the meantime, causing wrong data to be read.

One thing that might be useful is to time when the read occurs versus the index hole or the sector to be read, to see if it isn't some kind of race condition based on where we are around the disk.

Similarly, I could introduce a requirement that the data section be located within a very short period of the header section being found, so that the controller can't miss the body of the sector, and then just keep going until it hits the data section of the following sector.

Anyway, I'm back at looking at this after almost three weeks of other activity.  Where I might actually start is by modifying my disk formatting code to write predictable material in all the sectors, so that when I try to read them, I know exactly where the material will have come from.  I will then also know the interleave order of the tracks, as well, so if I read material from the wrong sector, I will know where that is in relation to the correct data. I should even be able to automate characterisation of what is going wrong, perhaps.

Right-oh. I now write "Track N, Sector M, Offset $X." repeatedly in each sector, where N and M are the correct track and sector numbers, and X is the offset of the start of the string in the sector. So in this way I can determine whether we are just reading part of the sector, or reading some of the wrong sector.

I have also improved my mfm-decode.c program, so that I can see if the floppy controller is reading the correct MFM data for the sectors. And the good news, is that it is reading the correct data, and because of how I collected the data, I can also get a pretty good idea of whether the SYNC flag is being correctly detected, and that looks fine, too.

In the process of doing all that, I discovered I was calculating the CRC of data sectors incorrectly when formatting the disk, so I have corrected that, so it's now all behaving as expected. I can now get nice displays like this, when it reads a sector correctly:

And then when it gets it wrong, I am seeing something like this:

So now rather than having to guess what has happened, I can clearly see that it is Track 0, Sector 1 that is being read, but that its missing the first 2/3rds of the sector in this case.  These sorts of errors are occurring with the actual reading of the sector beginning at seemingly random offsets, so I don't think it is a 

I'm keen to get an idea of how often this happens, so I am instrumenting the program to keep reading the sector until it spots an error, as it only seems to happen about 10% or so of the time.  However, if I make the program repeatedly read the sector until an error occurs, it seems that an error never occurs: It tries 63 times to get a bad read, and doesn't hit one.   I can even re-trigger it several times, and it still doesn't happening. 

So with this extra check, the failure rate is dropping to <1%, which makes no sense at all, since it is only retrying the read to get a failure if the first read succeeded, which is exactly the same action -- doing a single read -- that it was doing before.

Ah, finally I have it failing now on one of those runs, and I am not at all surprised to discover that it was on the first try.  So there is something about asking it to start to read a sector, but which is neatly avoided when doing the retries.

I already have the head on the track, and the motor running and the drive selected, so there shouldn't be any functional difference, but clearly there is.

I am forming a theory: I think it has to do with the rotational position of the disk under the head somehow.  With some testing, I was able to confirm that this is indeed the problem.  What is happening, is that if the head is already over the target sector, the "last sector under the head" flags correctly indicate the target sector, and so the FDC starts reading the data. But it is possible that the head has already advanced over some fraction of the sector, so we only read the tail-end of the sector.  This has about a one-in-the-number-of-sectors-on-a-track chance of happening, so about a 10% chance, which is what I was seeing.

This should be fairly easy to fix, as I just make sure that the floppy read logic waits for the "last sector under the head" flag to change to the requested sector, rather than doing a simple level triggered action.

I have a fix for that synthesising now. That will take a while, so I might have a look back at writing of sectors. In theory, that should be more or less working, although I have spotted a problem with the CRC calculation that I have just fixed, and will try synthesising after the read fix has synthesised: It was not including the three $A1 sync mark bytes in the CRC.

But since we have the means to write a sector, and to read a track of raw data and decode it, I can do some before and after comparisons, to see if the sector writing is really happening.  So let's give that a whirl.

Here is how track 0, sector 1, side 0 looks after formatting it with the contents described previously:

(539 bytes since last sync)
SYNC MARK (3x $A1)
SECTOR HEADER: Track=0, Side=1, Sector=1, Size=2 (512 bytes) CRC ok
(42 bytes since last sync)
SYNC MARK (3x $A1)
SECTOR DATA:
  0000 : cf 46 46 53 45 54 20 24 30 2c 20 d4 52 41 43 4b    Offset $0, Track
  0010 : 20 30 2c 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46     0, Sector 1. Of
  0020 : 46 53 45 54 20 24 31 45 2c 20 d4 52 41 43 4b 20    fset $1e, Track
  0030 : 30 2c 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46 46    0, Sector 1. Off
  0040 : 53 45 54 20 24 33 44 2c 20 d4 52 41 43 4b 20 30    set $3d, Track 0
  0050 : 2c 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46 46 53    , Sector 1. Offs
  0060 : 45 54 20 24 35 43 2c 20 d4 52 41 43 4b 20 30 2c    et $5c, Track 0,
  0070 : 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46 46 53 45     Sector 1. Offse
  0080 : 54 20 24 37 42 2c 20 d4 52 41 43 4b 20 30 2c 20    t $7b, Track 0,
  0090 : d3 45 43 54 4f 52 20 31 2e 20 cf 46 46 53 45 54    Sector 1. Offset
  00a0 : 20 24 39 41 2c 20 d4 52 41 43 4b 20 30 2c 20 d3     $9a, Track 0, S
  00b0 : 45 43 54 4f 52 20 31 2e 20 cf 46 46 53 45 54 20    ector 1. Offset
  00c0 : 24 42 39 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45    $b9, Track 0, Se
  00d0 : 43 54 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24    ctor 1. Offset $
  00e0 : 44 38 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43    d8, Track 0, Sec
  00f0 : 54 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 46    tor 1. Offset $f
  0100 : 37 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    7, Track 0, Sect
  0110 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 31    or 1. Offset $11
  0120 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0130 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 33    or 1. Offset $13
  0140 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0150 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 35    or 1. Offset $15
  0160 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0170 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 37    or 1. Offset $17
  0180 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0190 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 39    or 1. Offset $19
  01a0 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  01b0 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 42    or 1. Offset $1b
  01c0 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
 01d0 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 44    or 1. Offset $1d
  01e0 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  01f0 : 4f 52 20 31 2e 20 00 00 00 00 00 00 00 00 00 00    or 1. ..........
CRC ok
(539 bytes since last sync)
SYNC MARK (3x $A1)
SECTOR HEADER: Track=0, Side=1, Sector=2, Size=2 (512 bytes) CRC ok
(42 bytes since last sync)
SYNC MARK (3x $A1)

In short, we have a perfectly good sector with correct CRCs on the header and body following the formatting. 

But after we write the sector, things don't look so good. I have increased the output of the program, so that I can get some better clues as to what is going on. In particular, I have it printing out every byte -- sync mark or not -- so that I can see what's happening. This is what I see:

Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $00 $00 $01 $02 $ca $6f $13 $93 $93 $93 $93 $93 $93 $93 $93 $93 $93 $80 $00 $00 $00 $00 $00 $28SECTOR HEADER: Track=0, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(25 bytes since last sync)
Sync $A1 x #1
Sync $A1 x #2
 $d7 $52 $54 $45 $20 $4f $54 $41 $4b $30 $20 $45 $54 $52 $31 $20 $49 $45 $30 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00

So we see the sector header ok. That's expected, because we don't start writing until after that has passed under the head.  Then we see the $4E padding bytes following the sector header are messed up, being $93 instead. I'm guessing that this is because we are writing slightly out of phase. That's okay, actually, because the sync bytes should get it back into phase when the come. That's their purpose, after all. But we are only seeing two sync mark bytes, and then we are seeing $D7 instead of $FE to mark the start of the data sector.

Despite that, there is actually pretty good news in this, because (a) it is writing something, and (b) I can see the string I wrote, or rather, I can see parts of it, having been written.  So our write path is close to working.

Let's look at those data bytes, and compare them to what I try to write to them, which is "Written to track %d, sector %d, side %d".  The data is written as a PETSCII string, so lower case letters will be in the range $41 to $5A, so our data bytes:

$52 $54 $45 $20 $4f $54 $41 $4b $30 $20 $45 $54 $52 $31 $20 $49 $45 $30

correspond to the string "rte otak0 etr1 ie0". That looks suspiciously to me like every other character of the string... That could also explain why we are seeing only two sync mark bytes, if we are advancing two bytes at a time, we might be writing sync mark #1 and sync mark #3, and skipping sync mark #2.

I have a fix for that synthesising now.

While that builds, I am testing the fix for sector reading, and that seems to have done the trick, so that's a great relief. I can now even run the dreaded "sequential program load test disk" that Deft supplied me with, which used to crash after reading only 5 or 6 of the couple of hundred files on the disk. As I type it is up to file 168 and counting, and indeed it now completes successfully!

So now to just wait for the bug fixes to the sector writing to synthesise, and then see how that looks.

And it doesn't work. It looks like it is totally butchering up the entire track.  It also takes a long time to think it has written the sector, as in long enough for several rotations of the disk.  Thus I'm suspecting that the counter that keeps track of when the MFM writer is ready for the next byte is messing up. So I'll add a couple of debug registers that let me watch that during the writing.  I might also try simulating it as well, as that might get a faster answer as to what is going wrong. It's a bit frustrating, as it now feels super close to having working sector writing.

Another day, and fresh eyes to look at the problem.  I found and fixed a couple of bugs that were responsible for this mis-behaviour.  I now have it to the point where is writing what looks more or less like a valid sector, such that I can at least read it back:

A raw read of the track reveals that we have things more or less under control:

Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $00 $00 $01 $02 $ca $6f $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $00 $00 $00 $00 $00 $00 $
00 $00 $00 $00 $00 $00 $51
SECTOR HEADER: Track=0, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(43 bytes since last sync)
Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fb
 $d7 $52 $49 $54 $54 $45 $4e $20 $54 $4f $20 $54 $52 $41 $43 $4b $20 $30 $2c $20 $53 $45 $43 $54 $4f $52 $20 $31 $2c $20 $53 $49 $44 $45 $20 $
30 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
...
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $55 $83 $4e $4e $4e $4e $4e $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $
24 $24 $24 $24 $24 $14
SECTOR DATA:
  0000 : d7 52 49 54 54 45 4e 20 54 4f 20 54 52 41 43 4b    Written to track
  0010 : 20 30 2c 20 53 45 43 54 4f 52 20 31 2c 20 53 49     0, sector 1, si
  0020 : 44 45 20 30 00 00 00 00 00 00 00 00 00 00 00 00    de 0............
  0030 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
  ...
  01f0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
CRC FAIL!  (included field = $5583, calculated as $4f0f)
(539 bytes since last sync)
Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe


So we can see a few things here:

1. The gap bytes are $27 instead of $4E, i.e., they are shifted one bit to the right. That is, we are skipping one bit before writing them, and so end up out of phase.  This really doesn't matter, and I don't care.

2. The sector data is correctly written, and can be correctly decoded! Yay!

3. BUT it has the incorrect CRC.  That I should fix up.

3. The gap bytes after change from $4E to $24 part way through, as we switch from the sync bytes we wrote out, to the ones provided on the formatted track already. Again, not a problem that I'm worried about.  The gaps are there to get rubbish in them.

4. The following sector is correctly identified without problem.

So of all that, I really just need to fix the CRC calculation stuff.  

The first step there, is to figure out what is actually doing with the CRC. Is it doing the CRC one one too many bytes?  Is it skipping the first byte or one or more of the sync marks etc?

Taking a close look at the VHDL, I realised it was not including the $A1 sync mark bytes or the $FB data marker in the CRC calculation. I was also writing the two bytes of the CRC out in reverse order. I'm now synthesising a fix for those problems.

But while I wait for that, I'm really tempted to try to use the HEADER command of BASIC 10 to format a disk, as it should now work if I run the CPU at 40MHz to work around the bug there.  It chugs through formatting the disk, but then fails with a 27,READ ERROR,40,0.  That error indicates a CRC error in the sector header.  

Digging through the C65 DOS source code, this could in fact be triggered by the CRC of a sector being bad. It checks $D082, and if bit 3 is set, then it fails with a 27 READ ERROR:

;* 'CheckEr'   Check for CRC/RNF status following read of currently selected drive
;*
;* If an error is detected the error handler is JuMPed to.  Preserve Z register.

CheckEr
        jsr BusyWait            ;Busywait for op to complete
        lda stata
        and #bit4+bit3          ;Check RECORD NOT FOUND and CRC
        beq noerror             ;    branch if no errors
        cmp #bit4               ;    what is the error?
        beq 10$
        bcs 20$

        ldx #5                  ;       DOS error # 23 (data CRC)
                .byte $2c
10$     ldx #9                  ;       DOS error # 27 (header CRC)
                .byte $2c

So I probably just have to be patient, and wait for that new bitstream to finish synthesising.  Which it now has, and the CRC calculation is still wrong. Now to try to work out how it is wrong, again.

Turns out when I did that fix above, I told the CRC logic to expect a byte, but didn't actually give it the value, so it is reading rubbish instead, and thus generating a bogus CRC value.  So we shall have to wait for yet another synthesis run, which means another day or two delay, as I have to be up in 8.5 hours time for work.

It's now tomorrow, and the synthesis has completed.  So first thing is to check whether we are now generating valid CRCs when writing the sectors, which we seem to be:

SECTOR DATA:
  0000 : d7 52 49 54 54 45 4e 20 54 4f 20 54 52 41 43 4b    Written to track
  0010 : 20 30 2c 20 53 45 43 54 4f 52 20 31 2c 20 53 49     0, sector 1, si
  0020 : 44 45 20 30 00 00 00 00 00 00 00 00 00 00 00 00    de 0............
  0030 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
  ...
  01f0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
CRC ok


But if I try to format a disk, I am still getting the 27 READ ERRORs:



So let's see if we are setting the CRC error flag when reading the sector: Yes, we are correctly setting it.  On closer examination of the DOS code above, I can now see that the 27 READ ERROR is if the "RNF" = Request Not Found bit is set, which happens if the sector header can't be found, or equivalently, has a CRC error, as matches the error code according to Commodore.

Prodding about a bit more, I am noticing something funny: Reading from track 0 in my test program seems to work, but reading from track 39, the directory track, is resulting in this kind of error.  Oddly, it looks like the sector is found, even though the error indicates that it is not found.  I'll do a low-level read of that track to see if its been butchered by writing sectors to it.

Nothing looks amiss with it, which is a bit odd.  I'm formatting the disk again with my test utility, and will try reading the sectors on track 39 again, and see if that works... and it does. I can read sectors on that track fine now.  

But trying to use the HEADER command to do a quick-format, that preserves the low-level formatting on the disk doesn't seem to work properly, either.  The fresh BAM etc don't get written. Writing sectors and formatting to D81 images works fine, so I assume the floppy controller is getting the correct information it needs to write them.  But I don't know if the quick format code tries to read before writing or not.  

So let's focus on reading sectors from disks formatted using the C65 HEADER command.  Reading from track 0 works fine, but not from any other track. But if I try a disk that was formatted in a catweazle -- or one I have formatted using my test utility -- then they read fine.  So I'm guessing that there is something subtle difference with the way the C65 ROM formats the disks, and that this is tickling our floppy controller into not accepting the sector header. 

The fact that track 0 can be read fine feels like it should be a significant clue, but I'm not yet sure what it tells me.

So let's compare what we have in the sector header for track 1, sector 1, side 0 on a disk formatted by the C65 DOS, and thus doesn't work, vs the catweazle formatted disk that does work:

C65 DOS:

Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $00 $00 $01 $02 $ca $6f $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00SECTOR HEADER: Track=0, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(45 bytes since last sync)

Catweazle disk:

Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $01 $00 $01 $02 $bc $db $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00SECTOR HEADER: Track=1, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(41 bytes since last sync)

Well, the obvious thing here, is that the C65 DOS has written track 0, not track 1.

Assuming that I have correctly seeked to the track, and that I aren't just seeking one track wrong. But if that were the case, then I'd expect to see the catweazle disk not working. But I want to be 100% sure.

Ah... I think I have an idea of what is going on now: Track zero has not been formatted. I think the C65 ROM somehow ends up starting to format from track 1, instead of track 0, so it looks like the track numbers are wrong. 

I have indeed confirmed that it starts formatting from track 1, instead of track 0, and thus all the track numbers are out by one, and track 0 doesn't actually get formatted. The challenge is now to work out how on earth this happens.

The format code in the C65 DOS looks like this:

FormatDisk
    jsr cleardchange
    lda #2
    sta sz            ;sector size = 512
    ldx cdrive        ;drive number
    lda #$80        ;force INIT to bump
    sta curtrk,x
    sta last_drive        ;forget previous accesses            [910129] FAB
    jsr InitCtlr        ;init controller
    jsr CheckProt        ;check write protect status
    bcs wtabort        ;   can't continue- protected

    lda $d011        ;Disable VIC                    [910815] FAB
    sta rc8            ;    to be restored later
    and #%01101111
    sta $d011        ;    blank to suspend VIC DMA's

    lda #0            ;Set starting track (0-79, initially zero)
    sta tt1

fmttrk    lda #0            ;Set starting side (0-1, initially zero)
    sta sd
    lda #side
    tsb control        ;  (1581 has physical/logical swapped)

fmtside    jsr BusyWait
    lda #1            ;Set starting sector (1-10, initially one)
    jsr docrc        ;CRC track
    jsr wttrk        ;Format track
    lda sd
    bne 10$
    inc sd            ;Flip to second side
    lda #side
    trb control        ;  (1581 has physical/logical swapped)
    bra fmtside

10$    lda tt1
    cmp #79            ;Check max track
    bcs fmtdone        ;  yes- Done
    inc tt1            ;  no-    Move to the next track

20$    jsr CheckDC        ;Check for disk change
    beq 30$            ;  no-    Continue
    jsr setnodrv        ;  yes- set no drive flag        [900822] FAB
;    lda #0
;    sta command        ;???? to get rid of "nobuf"        [910203]
    jsr hed2ts
    lda rc8            ;Reenable VIC                [910815] FAB
    and #%01111111        ;    must keep RC8 low
    sta $d011
    ldx #12            ;DOS error #75 format error        [910420]
    bra nderror

30$    lda #stout        ;Step head out to next track
    sta command
    jsr SettleHead        ;Delay 24ms
    bra fmttrk        ;  continue


fmtdone
    lda rc8            ;Reenable VIC                [910815] FAB
    and #%01111111        ;    must keep RC8 low
    sta $d011
;    lda #0
;    sta command        ;???? to get rid of "nobuf"        [910203]
    ldx cdrive
    lda #bit7        ;????force a bump
    sta curtrk,x        ;(have to- head is now at track 80, DOS thinks it's at 1)

    bit iobyte        ;Verify?                [911008] FAB
    bpl done        ;    no
    jsr VerifyDisk        ;    yes


done    ldx jobnum        ;Flag job complete
    lda #0            ;                    [901127]
    sta jobs,x
    clc
wtabort    rts

The important bit right now is InitCtlr, which seeks to track 0 as part of its function, but that seems to work fine.  I tried to find the matching code in the ROM binary, so that I could put a break point on it, and then single step through the track stepping code. However, it seems that the latest MEGA65 patched C65 ROMs have refactored that part of the code.  So I will try it with one of the unpatched ROMs, and see if that works. It might be that the patched DOS code, which has only been tested using D81 disk images, might possibly have a bug with formatting or seeking on real disks.

Nope, the problem was the auto-stepper I added to the FDC ages ago to work around a bug we had with stepping was to blame. So I have removed that, and will resynthesise. But I can disable the auto-stepper temporarily via a register for testing, which results in a disk that is is formatted from track 0, instead of track 1, and thus doesn't have two track 0s. But formatting still fails for some reason. 

Maybe it is the disk side flag being inverted or something like that?  But if it were that simple, then if I formatted a disk using my test program, it would work for a quick format after, and it doesn't.

I did spot one error I had, which is that my test program was formatting with sector numbers 0 to 9 instead of 1 to 10, which I have fixed. But even with that, doing a quick format fails.  What is interesting, is that the quick format *thinks* it has worked, but ends up not actually writing the various sectors to the disk.  So that's what I'll have to investigate tomorrow.

I woke up a bit early, so have squeezed in a bit of investigation:  It looks like the floppy controller was calling the virtualisation hypervisor trap when writing to real disks as well as doing the actual write. This very likely causes the RNF signal to pulse as the hypervisor says it doesn't know about any disk image or virtualised access for that disk, which the C65 DOS notices, and thus fails.  So I have fixed that, and will synthesise a fix, and check it when I get back from work.

Well, that might have been _a_ problem, but its not the root cause problem.  I have taken to making an instrumented C65 DOS build, and I can confirm that attempting to write sectors in that is what is causing these errors. 

Further, whatever it is that the C65 DOS does before trying to write, it causes later attempts to write to also fail.

So we finally have a clue of sorts, and now my challenge is to follow it. So let's try to collect some evidence to help us infer what is happening:

1. Both the CRC Error and Request Not Found bits get set in this situation.

2. The head is on the correct track, and I can use the $D6A3-5 registers to see the target sector appearing periodically under the head.

3. I can write to the exact sector (Physical track 39, sector 1, side 0) from my floppy test program, so its not the disk, and its not that the floppy controller can't write to sectors, or can't write to that exact sector.

And I think I might have just found the root cause:

My test program sets the "use real floppy drive" bit, but doesn't clear the "use disk image" bit, and there is a piece of logic in the sector write code that doesn't correctly deal with the situation when the "use real floppy drive" bit is set: It returns an immediate Request Not Found if the "use disk image" bit is clear, erroneously thinking that someone is trying to write to a drive that has no disk image, and thus has nothing to write to.

The good thing is that I can immediately test this by setting the "use disk image" bit while keeping the "use real floppy drive" bit, and then trying to format a disk.

And, it does indeed enable sector writes to happen, so I need to fix that logic.

The sector completes, and I get a nice friendly READY back, without any errors, but when I try to look at the directory of the disk, something is borked. I'm suspecting the "swap" bit of our floppy controller is not behaving in a bug compatible way with the C65's floppy controller.

To check this, I have read out the header and BAM sectors from the disk I just formatted, so that I can compare them with those of the known-good disk:

Remember that the physical sectors are 512 bytes, so there are two 256 byte logical sectors in each: Thus we see the header in the first half of the first sector, and then we are seeing two BAM sectors, and a $FF $FF track sector indicator in the 4th logical sector, which should be the first directory sector, which I am suspecting is wrong. So lets look at what a good disk has in that one.

And we see that it has $28 $04 in the track/sector link bytes, and because that disk is not empty, we see the directory entries in there, as well.

So I am guessing that something is going wrong with writing the two $FF bytes into the sector buffer.  To get some more clues, I have just checked what appears on a D81 image when I format it using the HEADER command, and it looks like this:

There we have $00 $FF as the track/sector link bytes, and what I think is an end of directory marker.

Now this is rather mysterious, as the same code path is used to populate the floppy sector buffer in both cases, and so if there was a problem with populating the buffer, it should show up on both, but it doesn't.

Trying to reproduce the problem, it seems to not be being run -- unless my patch to instrument the code has broken it.  And it is now producing a half-valid disk, on which I can run a DIR command, and it correctly reports 3160 BLOCKS FREE, but doesn't display the header line:


So there is obviously still something wrong. But I think I'll leave it for the moment, until I have confirmed that I have the logic error in sector writing fixed, so that I can do a clean format, confident that no sector writes are being missed or messed up.

I am loading up the new bitstream at the moment, and will give that a try.  At the same time, I am thinking about the funny thing that formatting D81 images works fine, but not real disks. This says to me that there should not be any problem with the format code preparing incorrect contents of sectors.  Rather, my feeling is that some sector write must still not be happening, or something like that. So I might format a D81 and then compare those track 39 sectors.

Ah, that's interesting: Even with that last fix to try to fix the "am I using a disk image or a real drive" logic, if I only set the "I am using a real drive" bit, formatting still fails.  BUT if I do as before, and set both, then it seems to write to the real disk properly.  I'm just running a complete format again, to verify that it is really happening. If that works, then it means I have some other spot where the logic is messed up for real drive vs disk image. 

Looking at the logic, I think I can see the problem. Here is what I had:

if f011_ds="000" and ((diskimage1_enable or use_real_floppy0)='0'
   or (f011_disk1_present and use_real_floppy0) ='0'
   or (f011_disk1_present ='1' and use_real_floppy0 = '0' and f011_disk1_write_protected='1'))
then

  -- Fail with Request Not Found
end if;

What we want is when diskimage1_enable=0, f011_disk1_present=0 and use_real_floppy0=1 to NOT cause a request not found.  

Can you spot the problem?

The line or (f011_disk1_present and use_real_floppy0) ='0' is the problem: If either of those signals are zero, then it will be activated, and cause the Request Not Found to be triggered, so it should instead be or (f011_disk1_present or use_real_floppy0) ='0'

So I'll resynthesise that. 

But in the meantime, I can set both bits as I did yesterday, and it should work, which it _mostly_ does. What I mean by that, is that I can format the disk, and no error is reported, but when displaying the directory listing of a disk, the header line is not shown, or has some corruption in it. I'm also wondering if it isn't behaving differently based on whether the CPU was set to fast or slow during the format.

Yes, confirmed that if the CPU is at 40MHz, then we get a disk no header line, and if we format it with the CPU slow, then we get a mostly valid disk, but with the ID field borked up.  I'll grab a copy of the header sector in both cases, and see play spot the differences.

This is the header sector that was from a "fast CPU" format, and doesn't display the header line:

 0000: 28 03 44 00 46 41 53 54 43 50 55 a0 a0 a0 a0 a0  (cD`FASTCPU               
 0010: a0 a0 a0 a0 a0 a0 d0 39 a0 1b 44 a0 a0 00 00 00        P9 {D  ```          
 0020: 01 51 00 00 00 00 00 00 00 00 00 00 00 00 00 00  aQ``````````````          
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````
 ...
 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````          
 

Then the same for slow CPU, with the disk ID set to "89", except that this has also messed up in the same way this time. Let's grab an image of that sector as well:

 0000: 28 03 44 00 53 4c 4f 57 43 50 55 a0 a0 a0 a0 a0  (cD`SLOWCPU               
 0010: a0 a0 a0 a0 a0 a0 d0 39 a0 1b 44 a0 a0 00 00 00        P9 {D  ```          
 0020: 01 51 00 00 00 00 00 00 00 00 00 00 00 00 00 00  aQ``````````````          
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````          
 ...
 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````    

 So I'll try again to get one where the header appears... which it's now refusing to do. So I'll just format the D81 image, and compare the above to that.

 0000: 28 03 44 00 4d 45 47 41 36 35 a0 a0 a0 a0 a0 a0  (cD`MEGA65                
 0010: a0 a0 a0 a0 a0 a0 44 4a a0 31 44 a0 a0 00 00 00        DJ 1D  ```          
 0020: 01 51 00 00 00 00 00 00 00 00 00 00 00 00 00 00  aQ``````````````          
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````          
 ...      
 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````   

I have underlined the key differences: The DOS version and disk ID fields are different. In particular, the first byte of the disk ID is getting borked when we are writing to a real disk: I asked for 89 = $38 $39, but we got $D0 $39. Also, the first byte of the DOS version is $1B instead of $31. 

Those are almost certainly the cause of the problems, but the big question is how on earth this happens: The code that prepares the sector doesn't know if it is writing to a real disk or a D81 file.  We know that writing to sectors itself also works, so it seems like all the necessary steps are working.  So I might have to resort to instrumenting the sector write code in the C65 DOS, so that I can take a look at the sector buffer when it goes to write, and see if the contents there really are correct or not, and then whether the write succeeds or not.  I'll attack that after the new bitstream has finished synthesising, and I have run a few errands.

Bitstream is cooked, and I can indeed now select the real drive, and format a disk as above, without having to set the D81 image present flag, so that's a good step forward.  And now when I format, I am getting a header line, just with the borked disk ID and DOS version fields.  Again, its the first byte of those two byte fields that is the problem:

The code that builds the header sector looks like this, including the bit of debug code I added that should flash the border and wait for me to press a key before proceeding:

20$ jsr intdsk  ;init disk
 jsr initdr  ;clear directory only
 lda dskver  ;Get the disk's format type
 beq 30$   ;No version is okay
 cmp fmttyp  ;Is it our disk formating method?
 bne vnerr  ;wrong version #

30$
* PGS DEBUG XXX
zoop
 inc $d020
 lda $d016
 beq zoop
 sta $d016
 
 lda jobnum
 tay
 asl a
 tax
 lda #dsknam  ;set ptr to disk name
 sta buftab,x
 ldx filtbl
 ldz #27
 jsr trname  ;transfer cmd buf to bam
 ldy #0
 sty dirbuf  ;reset lsb
 ldx drvnum  ;      [900415]
 lda dirtrk,x
 sta (dirbuf),y  ;directory track
 iny
 lda #sysdirsec
 sta dirst,x  ;      [900415]
 sta (dirbuf),y  ;link to first dir blk
 iny
 lda fmttyp  ;Get our formating method
 sta dskver  ;Make it the current one
 sta (dirbuf),y  ;And this disk's format method
 iny
 lda #0   ;null
 sta (dirbuf),y
 ldy #22   ;skip name
 txa   ;      [900415]
 asl a
 tax
 lda dskid,x  ;Just in case it was never formatted
 bne 40$
 lda #'D'
40$ sta (dirbuf),y
 iny
 lda dskid+1,x
 bne 45$
 lda #'J'
45$ sta (dirbuf),y
 iny
 lda #160  ;shifted space
 sta (dirbuf),y
 iny
 lda dosver  ;Send out this dos's version number
 sta (dirbuf),y
 iny
 lda dskver  ;Get the current disk's formatting
 bne 31$   ;Method, if no number then okay

 lda fmttyp  ;Else use our version instead
31$ sta (dirbuf),y
 iny
 lda #160  ;shifted space
 sta (dirbuf),y
 iny
 sta (dirbuf),y
 iny
 lda #0   ;Pad remainder of the directory
32$ sta (dirbuf),y  ;Sector with zeros
 iny
 bne 32$

It all looks pretty straight-forward. 

Single-stepping through, the bogus format type byte for offset $16 of the header sector is coming from the lda dskid,x instruction.  That location has $D0 in it in this test run, thus causing the first byte of the disk ID in the header sector to be set to $D0.  My guess is that the disk ID is not being fully copied into place, but I will have to confirm this by trying to format a D81 image. In short, it is possible that it is a bug with the C65 DOS, although I don't remember it being a problem on the C65 I had -- but maybe its only present in certain versions of the C65 ROM. But, anyway, let's continue to the other format type field that also has the first byte borked.

So for the format ID, we get $31 $44 = "1D", which is in fact the correct values for the C65, so that's a red herring -- its only the disk ID that is the problem.

And investigating that, it looks like the C65 DOS command buffer has the $D0 byte in it:

:00010200:4E303A504F5441544F2CD03500352A00  N0:POTATO,<$D0>5<NULL>

My suspicion is that the BASIC 10 HEADER command is not copying the ID field in to place correctly, which I can verify by using the form HEADER "NAME,ID" instead of HEADER "NAME",IID, because in the second form BASIC parses the ID field, while in the 1st form it just passes the string intact.  So I am formatting a disk that way, and we will soon discover if it solves the problem.

... and of course the corruption of the DOS version field has chosen this moment to reappear, and thus I get an empty directory listing:


Most frustrating. But I can at least use the floppy test program to read the header sector and see whether the disk ID got written properly or not, which it does:

So at least we know the cause of that one.

Digging through BASIC, I can see that the dosdid bytes get set correctly while parsing the command initially, but then the first one seems to get stomped on at some point before the command is built for dispatch

The BASIC ROM stores the disk ID in $118D and $118E, so I'll keep an eye on those for unexpected modification. The values are still intact when BASIC asks the user if they are sure they want to format the disk, so far, so good.... And in fact, they are still intact when the header command is running. So this looks like the command building stage is where it breaks.

Ok, that's weird: Its fine before and after it builds the DOS command, but the DOS command still gets built wrongly.  Maybe something in the DOS command building breaks it after it has been written correctly? 

Bug found. Here is the faulty code-snippet:


***
rid
*** lda dosdid       ;include id
    sta $0850
    sta dosstr,x
    inx
    lda dosdid+1
    sta $0851
    jsr pgs_debug_key_wait
    bra sdp5         ;always

Can you spot the bug?  Post a comment when you spot it :)

It turns out the other bug was also due to an error in the C65 DOS ROM, which BitShifter kindly found for me quite quickly, once I had used the CPU watch point to trigger when it was being corrupted.  I'm very glad I fixed that facility again :)

So now I am running a HEADER command on a real disk, with disk ID set, at 3.5MHz, and let's see how it goes.  I shall be very pleased if it has the correct disk ID and DOS version info. You know, if I get an output something like this:


Yay! I shall now stop, before anything can go wrong :)

Seriously though, this is a really nice milestone, as it means that we have working read and write and format support for the internal floppy drive.

4 comments:

  1. Nice way over my head but I liked that chase! Is this a C= retro thing as well Mega65? I have 1581. Many don't just a thought....

    ReplyDelete
  2. The bug: Missing a "sta dosstr,x"?

    ReplyDelete
  3. Fantastic work! (As always)
    Can't wait for the pre order.

    I assume the bug is a missing "sta dosstr,x".

    ReplyDelete