Sunday, 12 September 2021

Adding transparent support for HD floppies

The internal floppy drive in the MEGA65 is actually a standard PC-compatible HD / 1.44MB floppy drive, so while the C65 DOS only understands DD disks, the hardware is capable of more.

While we could just stick to DD media, there are some good reasons to support HD media.  One of the key ones is that the MEGA65's advanced features make it quite conceivable that a game would want more than 800KB of data on a disk.  In fact, this thought was triggered exactly by @Shallan50k wanting to fit 1MB maps for his kart-racer game for the MEGA65 onto a disk.

There are several challenges that I would like to attack in doing this:

1. The resulting disks should work with the C65 DOS, without modification, at least to the extent of being able to get a directory listing, and load one modest size program from it.

2. The disk format should allow creating disks that can be written to using the MEGA65's floppy controller, if that is the user's intention.

3. For mastering disks for games/software distribution, we don't care about write-ability (at least not on all tracks), but we would really like to be able to cram as much data as possible onto a disk.  By reducing the inter-sector gaps, it is possible to fit more sectors on a track. That's how the Amiga gets 880KB on a DD disk, compared to the 1581's 800KB.

4. We need a common disk image format that can be used for all variants of the above, to ease software development, and make it possible to run these images from SD card.

(1) and (2) are the easiest ones to solve. In fact, I have solved them already, by implementing a 2nd parallel MFM decoder in the MEGA65's floppy controller, that runs at 2x the data rate of the main decoder. As HD disks run at 2x the data rate, this means that we are automatically able to read (but not write) HD formatted disks, without having to modify the C65 DOS -- but the C65 DOS will only see the first 10 sectors on each track, which is totally fine for goal (1), as the directory listing can appear, and we can load up to ~800KB of files from it, by using only the DD-compatible sector numbers.

The floppytest.prg program in the https://github.com/mega65/mega65-tools repository now includes options to format and test HD-formatted disks created in this way, with 20 sectors per track per side instead of 10, giving a 1600KB fully-writeable disk.

To be able to write to such disks, you just have to poke $28 into $D6A2, to set the floppy controller to the HD data rate, so that the MFM encoder (of which there is still only one) is looking for HD-formatted sectors.

So that's all solved.

It's with (3) that it gets more interesting. We could just go Amiga style, and settle for 22 sectors per side per track, and thus get 1760KB per disk. But I know that we can fit more.  For a start, we can go 1541 style, and vary the data encoding rate on different tracks, and fit more sectors onto the outer tracks, since the normal data rate is good enough for the innermost track.  But before we do that, let's talk for a bit about how a floppy works, and some of the important aspects of magnetic recording that affect us.

Magnetic grain size refers to the size of the individual magnetic domains on the floppy.  For the innermost tracks, fewer domains will pass under the head per second, so we have to use longer magnetic pulses, i.e., store less.  For the outer tracks, more magnetic domains will pass under the head per second, so we can have more data on each track out there.  I'll talk about the relative length of the inner and outer tracks shortly.

Magnetic signal strength is how strong the raw signal from the floppy drive is. Its strength scales with the square of the velocity of the magnetic transitions going past the head. So on the inner tracks, the media is passing by more slowly, and the signal will be quite a bit weaker than on those lovely longer outer tracks, where that longer circumference goes past in the same time, i.e., at higher speed.

So in short, we should absolutely be able to cram more sectors on those outer tracks by increasing the data rate.  

By way of comparison, the 1541, which used only 35 tracks, varies between 17 and 21 sectors per track. That is, on the outer tracks, it crams in 21/17 = 123% as much data as on the inner tracks. And fully half the disk uses that maximum number of sectors per track, as it really is only the innermost few tracks that are so short as to be a problem. In fact, only the innermost 5 tracks on the 1541 (tracks 31 to 35) have the minimum 17 sectors per track.  Thus compared with a naive 17 x 35 = 595 sectors, the 1541 actually fits 683 sectors, i.e., almost 115% of the constant-sector-count capacity.
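The 683-sector figure can be checked with a few lines of Python. This is only a sketch: the zone boundaries below are the standard 1541 layout (tracks 1 to 17 with 21 sectors, 18 to 24 with 19, 25 to 30 with 18, and 31 to 35 with 17), and the names are made up for illustration.

```python
# Speed zones of a standard 35-track 1541 disk.
# Each entry is (first_track, last_track, sectors_per_track), tracks 1-based.
ZONES_1541 = [
    (1, 17, 21),
    (18, 24, 19),
    (25, 30, 18),
    (31, 35, 17),
]

def total_sectors(zones):
    """Sum the sectors over every track in every zone."""
    return sum((last - first + 1) * sectors for first, last, sectors in zones)

tracks = sum(last - first + 1 for first, last, _ in ZONES_1541)
actual = total_sectors(ZONES_1541)
naive = 17 * tracks  # every track at the innermost-track sector count

print(actual, naive, round(actual / naive, 3))  # 683 595 1.148
```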

So let's think about what should be possible on an 80 track 3.5" floppy disk. We know that the floppy drive can read a standard 11 sectors per inner-track format, because that's what the Amiga did. So we will scale up from there. I'll come back to how we actually produce such tracks later.

First up, we need to know that track 0 is actually the outermost, i.e., longest track, so we will fit more sectors on lower-numbered tracks, and fewer sectors on the higher-numbered tracks.

This site claims the following about 3.5" disks:

        track spacing: .0074 inches
        drive track radius from center (inner to outer)
                 side 0 .9719 inches to 1.5551 inches
                 side 1 .9129 inches to 1.4961 inches
        track width: .115 mm (.0045 inch) after "trim erase" on either side (not confirmed for 4MB format)
                also see erase notes

If this is true, and the difference really is 8 tracks, I find that quite amazing, as it means that we could have had an extra 8 tracks on side 1, and that side 1 should have much worse properties near the inner-most track than side 0 has -- a property that we might be able to exploit.

But for now, let's just assume the smallest benefit, which comes from the largest diameters, i.e., that of side 0: The outermost track is 1.5551/0.9719 = 160% the length of the innermost track.  I was expecting some nice benefit, but fully 160% is even more than I had anticipated.  This means that we should be able to fit 160% x 22 = 35 sectors on the outermost track (22 being the HD-rate equivalent of the Amiga's 11 DD sectors), instead of the 18 that a PC fits. Even the Amiga HD format's "cramming" of 22 sectors onto every track starts to look quite lame.
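As a quick sanity check of those figures, here is a small sketch that derives both the side-to-side offset in tracks and the outer/inner length ratio from the quoted radii. The constant names are made up for this example, and 22 is taken as the Amiga HD sector count mentioned above.

```python
# Figures quoted from the site above (inches).
TRACK_SPACING = 0.0074
SIDE0_INNER, SIDE0_OUTER = 0.9719, 1.5551
SIDE1_INNER, SIDE1_OUTER = 0.9129, 1.4961

# Track length is proportional to radius, so the radius ratio is the length ratio.
ratio = SIDE0_OUTER / SIDE0_INNER

# How many track pitches side 1 is shifted inwards relative to side 0.
offset_tracks = (SIDE0_INNER - SIDE1_INNER) / TRACK_SPACING

print(f"outer/inner length ratio (side 0): {ratio:.3f}")
print(f"side 1 offset: {offset_tracks:.2f} tracks")
print(f"22 sectors scaled to the outermost track: {int(22 * ratio)}")
```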

So let me do a bit of a calculation here as to how many sectors we can fit, using Amiga style track-at-once, and also 1581-style sector-at-once fitting. Basically we work out the relative length of the tracks versus the inner-most track, and then scale up the number of sectors to the largest integer that fits.

Let's look at how many sectors each track would fit, and the cumulative number of sectors on the disk to that point using 1581 and Amiga style track writing:

              1581-style (sector)   Amiga-style (track)
Track #00 :   32/  64 (  32 KB) :   35/  70 (  35 KB)
Track #01 :   31/ 126 (  63 KB) :   35/ 140 (  70 KB)
Track #02 :   31/ 188 (  94 KB) :   34/ 208 ( 104 KB)
Track #03 :   31/ 250 ( 125 KB) :   34/ 276 ( 138 KB)
Track #04 :   31/ 312 ( 156 KB) :   34/ 344 ( 172 KB)
Track #05 :   31/ 374 ( 187 KB) :   34/ 412 ( 206 KB)
Track #06 :   31/ 436 ( 218 KB) :   34/ 480 ( 240 KB)
Track #07 :   30/ 496 ( 248 KB) :   34/ 548 ( 274 KB)
Track #08 :   30/ 556 ( 278 KB) :   33/ 614 ( 307 KB)
Track #09 :   30/ 616 ( 308 KB) :   33/ 680 ( 340 KB)
Track #10 :   30/ 676 ( 338 KB) :   33/ 746 ( 373 KB)
Track #11 :   30/ 736 ( 368 KB) :   33/ 812 ( 406 KB)
Track #12 :   30/ 796 ( 398 KB) :   33/ 878 ( 439 KB)
Track #13 :   30/ 856 ( 428 KB) :   33/ 944 ( 472 KB)
Track #14 :   29/ 914 ( 457 KB) :   32/1008 ( 504 KB)
Track #15 :   29/ 972 ( 486 KB) :   32/1072 ( 536 KB)
Track #16 :   29/1030 ( 515 KB) :   32/1136 ( 568 KB)
Track #17 :   29/1088 ( 544 KB) :   32/1200 ( 600 KB)
Track #18 :   29/1146 ( 573 KB) :   32/1264 ( 632 KB)
Track #19 :   29/1204 ( 602 KB) :   32/1328 ( 664 KB)
Track #20 :   29/1262 ( 631 KB) :   31/1390 ( 695 KB)
Track #21 :   28/1318 ( 659 KB) :   31/1452 ( 726 KB)
Track #22 :   28/1374 ( 687 KB) :   31/1514 ( 757 KB)
Track #23 :   28/1430 ( 715 KB) :   31/1576 ( 788 KB)
Track #24 :   28/1486 ( 743 KB) :   31/1638 ( 819 KB)
Track #25 :   28/1542 ( 771 KB) :   31/1700 ( 850 KB)
Track #26 :   28/1598 ( 799 KB) :   30/1760 ( 880 KB)
Track #27 :   27/1652 ( 826 KB) :   30/1820 ( 910 KB)
Track #28 :   27/1706 ( 853 KB) :   30/1880 ( 940 KB)
Track #29 :   27/1760 ( 880 KB) :   30/1940 ( 970 KB)
Track #30 :   27/1814 ( 907 KB) :   30/2000 (1000 KB)
Track #31 :   27/1868 ( 934 KB) :   30/2060 (1030 KB)
Track #32 :   27/1922 ( 961 KB) :   29/2118 (1059 KB)
Track #33 :   27/1976 ( 988 KB) :   29/2176 (1088 KB)
Track #34 :   26/2028 (1014 KB) :   29/2234 (1117 KB)
Track #35 :   26/2080 (1040 KB) :   29/2292 (1146 KB)
Track #36 :   26/2132 (1066 KB) :   29/2350 (1175 KB)
Track #37 :   26/2184 (1092 KB) :   29/2408 (1204 KB)
Track #38 :   26/2236 (1118 KB) :   28/2464 (1232 KB)
Track #39 :   26/2288 (1144 KB) :   28/2520 (1260 KB)
Track #40 :   26/2340 (1170 KB) :   28/2576 (1288 KB)
Track #41 :   25/2390 (1195 KB) :   28/2632 (1316 KB)
Track #42 :   25/2440 (1220 KB) :   28/2688 (1344 KB)
Track #43 :   25/2490 (1245 KB) :   28/2744 (1372 KB)
Track #44 :   25/2540 (1270 KB) :   27/2798 (1399 KB)
Track #45 :   25/2590 (1295 KB) :   27/2852 (1426 KB)
Track #46 :   25/2640 (1320 KB) :   27/2906 (1453 KB)
Track #47 :   24/2688 (1344 KB) :   27/2960 (1480 KB)
Track #48 :   24/2736 (1368 KB) :   27/3014 (1507 KB)
Track #49 :   24/2784 (1392 KB) :   27/3068 (1534 KB)
Track #50 :   24/2832 (1416 KB) :   26/3120 (1560 KB)
Track #51 :   24/2880 (1440 KB) :   26/3172 (1586 KB)
Track #52 :   24/2928 (1464 KB) :   26/3224 (1612 KB)
Track #53 :   24/2976 (1488 KB) :   26/3276 (1638 KB)
Track #54 :   23/3022 (1511 KB) :   26/3328 (1664 KB)
Track #55 :   23/3068 (1534 KB) :   26/3380 (1690 KB)
Track #56 :   23/3114 (1557 KB) :   25/3430 (1715 KB)
Track #57 :   23/3160 (1580 KB) :   25/3480 (1740 KB)
Track #58 :   23/3206 (1603 KB) :   25/3530 (1765 KB)
Track #59 :   23/3252 (1626 KB) :   25/3580 (1790 KB)
Track #60 :   23/3298 (1649 KB) :   25/3630 (1815 KB)
Track #61 :   22/3342 (1671 KB) :   25/3680 (1840 KB)
Track #62 :   22/3386 (1693 KB) :   24/3728 (1864 KB)
Track #63 :   22/3430 (1715 KB) :   24/3776 (1888 KB)
Track #64 :   22/3474 (1737 KB) :   24/3824 (1912 KB)
Track #65 :   22/3518 (1759 KB) :   24/3872 (1936 KB)
Track #66 :   22/3562 (1781 KB) :   24/3920 (1960 KB)
Track #67 :   21/3604 (1802 KB) :   24/3968 (1984 KB)
Track #68 :   21/3646 (1823 KB) :   23/4014 (2007 KB)
Track #69 :   21/3688 (1844 KB) :   23/4060 (2030 KB)
Track #70 :   21/3730 (1865 KB) :   23/4106 (2053 KB)
Track #71 :   21/3772 (1886 KB) :   23/4152 (2076 KB)
Track #72 :   21/3814 (1907 KB) :   23/4198 (2099 KB)
Track #73 :   21/3856 (1928 KB) :   23/4244 (2122 KB)
Track #74 :   20/3896 (1948 KB) :   22/4288 (2144 KB)
Track #75 :   20/3936 (1968 KB) :   22/4332 (2166 KB)
Track #76 :   20/3976 (1988 KB) :   22/4376 (2188 KB)
Track #77 :   20/4016 (2008 KB) :   22/4420 (2210 KB)
Track #78 :   20/4056 (2028 KB) :   22/4464 (2232 KB)
Track #79 :   20/4096 (2048 KB) :   22/4508 (2254 KB)
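For anyone wanting to regenerate the table, the following is a minimal sketch rather than the actual program used. It assumes that track 0 sits at the side 0 outer radius and that the radius shrinks in 80 equal steps towards the inner radius; that interpolation is my guess, chosen because it gives the same per-track counts and totals as the table. The function names are hypothetical.

```python
import math

INNER_RADIUS = 0.9719  # inches, side 0, from the figures quoted earlier
OUTER_RADIUS = 1.5551
TRACKS = 80

def track_radius(track):
    # Assumption: track 0 is at the outer radius, and the radius shrinks
    # in 80 equal steps. The post does not state the exact interpolation.
    step = (OUTER_RADIUS - INNER_RADIUS) / TRACKS
    return OUTER_RADIUS - track * step

def sectors_for_track(track, inner_sectors):
    """Scale the innermost-track sector count by relative track length."""
    return math.floor(inner_sectors * track_radius(track) / INNER_RADIUS)

def layout(inner_sectors):
    return [sectors_for_track(t, inner_sectors) for t in range(TRACKS)]

sector_style = layout(20)  # 1581-style: 20 sectors fit at the nominal HD rate
track_style = layout(22)   # Amiga-style: 22 sectors fit at the nominal HD rate

cum_s = cum_t = 0
for t in range(TRACKS):
    cum_s += 2 * sector_style[t]  # two sides per track
    cum_t += 2 * track_style[t]
    print(f"Track #{t:02d} : {sector_style[t]:4d}/{cum_s:4d} ({cum_s // 2:4d} KB)"
          f" : {track_style[t]:4d}/{cum_t:4d} ({cum_t // 2:4d} KB)")
```

Each sector is 512 bytes, so the KB column is simply the cumulative sector count divided by two.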


So remember that this is on a nominal "1.44MB" floppy, and using only 80 tracks. With 82 or 84 tracks, we can squeeze a bit more out. But remember those high-numbered tracks are inner-tracks, so the benefit will only be quite small.

What is interesting with the 1581-style approach is that we end up with exactly 2MiB. I have no idea if that is coincidence or part of the "2MB unformatted capacity" that is touted around 3.5" HD disks.  It might well be the latter, as this capacity calculation is based on constant bits per inch.

Now, coming back to "practical land", we can see 32 sectors per track is the most with a 1581-style format, that would allow disks to be written to using normal sector operations, or for about a 10% capacity increase, we need to deal with up to 35 sectors per track.  For those not familiar with floppy formatting, the Amiga squeezes its 10% extra capacity out of disks by having much shorter gaps between the sectors, because it doesn't need to tolerate variation in rotational speed between the drive that formatted the disk, and the drive that is writing to the disk right now.

If you have been reading these blog posts recently, you will know that the C65 DOS writes 71 gap bytes in addition to the 512 data bytes for every sector.  There is also an overhead of another 13 bytes per sector that is unavoidable, to mark where the sector starts and ends, and which sector it is. So, in short, the 1581 uses 13 + 512 + 71 = 596 bytes per sector written.  The Amiga reduces the number of gap bytes, so that it can safely fit an extra sector on.

Think about it like this: The 1581 requires 596 bytes to write a single sector, and fits 10 on a track, so needs a track to fit 5,960 bytes on it. If the Amiga wants to fit 11 sectors, it needs to reduce that down to 5,960 / 11 = ~541 bytes per sector.  We know we can get away with as little as 13 + 512 = 525 bytes per sector, and 525 x 11 = 5,775 < 5,960, so the Amiga can fit the extra sector in. But 525 x 12 = 6,300 bytes, which is a bit too much, so this is why the Amiga couldn't fit 12 sectors per track.
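The same byte budget, written out as a small sketch using only the numbers quoted above (the variable names are made up):

```python
HEADER_BYTES = 13  # unavoidable per-sector overhead quoted above
DATA_BYTES = 512
GAP_BYTES = 71     # gap bytes the C65 DOS writes per sector

sector_with_gap = HEADER_BYTES + DATA_BYTES + GAP_BYTES  # 596
sector_no_gap = HEADER_BYTES + DATA_BYTES                # 525
track_bytes = 10 * sector_with_gap                       # 5960 on a DD track

max_with_gap = track_bytes // sector_with_gap   # sector-at-once
max_without_gap = track_bytes // sector_no_gap  # track-at-once
spare = track_bytes - max_without_gap * sector_no_gap

print(max_with_gap, max_without_gap, spare)  # 10 11 185
```

The 185 spare bytes are well short of the 525 needed for a twelfth sector.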

Now, back to our situation, if we use that refined information, and scale the 5,960 bytes on the innermost track, and needing 525 bytes per sector, it's possible we can squeeze an extra sector per track here and there.  But it's probably not worth flying quite that close to the wire.

If we were going to do that, we could just add support for some kind of bizarre "super sector" that fills a whole track with as many bytes as we can.  The CRC16 would then probably not be strong enough anymore, and we would probably want to consider using an even higher data rate and using some appropriate error correction code to handle the kinds of errors that happen on floppy media.  I might do such a thing in the future, but for now, I think it's overkill.

The real question is whether we think it's worth fitting an extra 10% on a disk in return for not being able to write to it sector-at-once. Or, more to the point, whether I should make the corresponding disk image format allow for this or not.

I'm really tempted to stick with 2048KB, as it means 32 sectors per track, which is easy to implement in the hardware for track offset calculation, and is just a pleasant round number. 

There is also a kind of half-way house, where we could have 32 sectors on the first 20 tracks, which would be track-at-once, and then have normally writeable tracks after that, to squeeze a few extra KB out of the disk.  This wouldn't require a different disk image format, as I am planning on just making the disk image format be 80 tracks x 32 sectors per track x 2 sides x 512 bytes = 2,560KB, and when on SD card, you would have that full capacity available, and if you write it to a real disk, there is some dead space. But maybe I will just stop whinging about multiplying by 35 in hardware, and make the image format allow for, say, 40 sectors per track = 3200KB total size, in case I come up with future improvements that allow reaching that (like finding out how much we can creep the data rate up, over all ;)

This would mean that the programmer has the responsibility to know which sectors are safe to write to, but I think that's not unreasonable, since there will already need to be some mechanism for changing the data rate based on which track you are on, which leads me back to a particular problem... the directory track.

If we are going to support these disks, we still need to have some magic to make reading the directory track work, at the very least. The loader program could be required to live on that track, or more easily, on track 79, which would be at the normal data rate.

We also have to look at whether we can specify the data rate accurately enough to get all of these track sizes.  The data rate is specified in "40.5MHz cycles per magnetic interval", with normal HD disks at a setting of $28 = 40.  For the outer-most tracks we want 160% of the data rate, so we need 40 / 1.6 = 25 = $19.  So we have 15 different steps along the way, although because they are based on different divisors, they are not equally spaced.
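To make the uneven spacing concrete, here is a small sketch that lists every divisor between $28 and $19 with its absolute rate and its rate relative to the nominal HD setting. Only the 40.5MHz clock and the $28 setting come from the text above; the helper names are made up.

```python
CLOCK_MHZ = 40.5
NOMINAL_DIVISOR = 0x28  # 40 cycles per magnetic interval for normal HD disks

def rate_mhz(divisor):
    """Magnetic interval rate for a given divisor setting."""
    return CLOCK_MHZ / divisor

def relative_rate(divisor):
    """Data rate relative to the nominal HD setting."""
    return NOMINAL_DIVISOR / divisor

# Divisor needed for 160% of the nominal rate.
target = round(NOMINAL_DIVISOR / 1.6)

for d in range(NOMINAL_DIVISOR, target - 1, -1):
    print(f"${d:02X} ({d}) : {rate_mhz(d):.3f} MHz : {relative_rate(d) * 100:.1f}%")
```

Each step down in the divisor raises the rate by a larger amount than the step before it, which is why the available rates are denser near the nominal setting than near the 160% end.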

I'll tweak my little program to work out the data rates that are possible on each track from the real ones available, and map the track fitting to those.  Hopefully it won't result in the loss of too many actual sectors on the disk format.  If it does, I could look at changing the way the floppy data rate is calculated from a simple divisor to an accumulator approach, that would allow much more accurate specification. But we will see if it is necessary, first.

So by using the rates we currently have available, we need to allow a tolerance of only 1.7% to get exactly 2048KB on a disk.  If we are strict and require 0% maximum excess data rate, then it drops to 2011KB.  If we were to allow 5% over-rate, then we could fit 2116KB, which says very much diminishing returns to me. These are all for sector-by-sector capacities.  For Amiga-style track-at-once writing, then the capacity would be 2403KB at 5%, 2328KB at 1.7% and 2287KB with strict 0% excess data rate.

If we really want to get that bit extra, then it probably makes much more sense to just use 82 or 84 tracks, which almost all drives can read, which at 1.7% speed tolerance allows up to 2,416KB on an Amiga-style track-at-once disk, and 2,128KB on a 1581-style sector-at-once disk.

So, back to specifying a convenient disk image format, I am going to go for 40 sectors x 85 tracks x 2 sides maximum = 3,400KiB = 3,481,600 bytes. Of course, as described above, only about 2MB of that will be usable on a real floppy, but this inefficiency is the cost of having the MEGA65's SD card floppy emulation logic being able to efficiently handle them, and at the same time, allowing for us to extract some future improved capacity out of the real floppy drive -- although I think 2MB -- 2.2MB on nominally 1.44MB media is still a pretty nice result.
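To illustrate why a fixed number of sector slots per track keeps the offset calculation simple, here is a hypothetical sketch. The ordering of tracks, sides and sectors within the image is purely my assumption for this example (track-major, then side, then sector, with a 0-based sector slot index), and is not a description of the actual image layout.

```python
SECTOR_BYTES = 512
SIDES = 2
MAX_TRACKS = 85
MAX_SECTORS = 40  # sector slots reserved per track per side

IMAGE_BYTES = MAX_TRACKS * SIDES * MAX_SECTORS * SECTOR_BYTES

def sector_offset(track, side, sector):
    """Byte offset of a sector slot in the image.

    Assumed (hypothetical) layout: track-major, then side, then sector,
    with all three indices 0-based.
    """
    if not (0 <= track < MAX_TRACKS and 0 <= side < SIDES
            and 0 <= sector < MAX_SECTORS):
        raise ValueError("sector address out of range")
    return ((track * SIDES + side) * MAX_SECTORS + sector) * SECTOR_BYTES

print(IMAGE_BYTES)                          # 3481600
print(sector_offset(1, 0, 0))               # 40960
print(IMAGE_BYTES - sector_offset(84, 1, 39))  # 512
```

Slots beyond what a real track can hold would simply be dead space when the image is written to a physical disk.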

Thus I will now turn my attention to testing the feasibility of all of this by writing some code to actually master such variable data-rate disks, and making sure that the sectors fit, and that they can be reliably read back.  This will probably also require getting some other folks to do the same, to test repeatability on different drives -- although the fact that we are keeping within 1.7% of the fairly conservative officially supported data rates of the media gives me a fair bit of confidence.

What I am more interested in finding out, is just how close we actually come to filling tracks with these extra sectors at those rates: Does it all fit at all? Is there enough spare space to try cramming an extra sector or two onto some of the tracks?

To answer these questions, I am refactoring the floppytest.c program, so that I can request the writing of individual tracks, so that I can then try reading the sectors on them back, to find out how many sectors at which data rates fit on a track, to confirm if my back-of-envelope calculations above are correct.

In the process of doing that, I hit a funny bug, which I think is in CC65, where adding a little bit of extra code was causing some unrelated stuff to crash.  In particular, as I refactored out the track formatting code, the track reading code would break, even if I never called the formatting code first.  I've seen funny things like this with CC65 compiled programs before, and don't really know the cause.  But in this case, I could at least see that the generated code was incorrect.

In the end, I worked around it by reducing the amount of code a bit. The code I removed was incidentally writing to $C0xx, which can end up on the CC65 C stack, but that is at the bottom of the 4KB stack, and shouldn't change what code the generator produced for the track reading code. Anyway, it's a bit of a mystery, and I might have to keep an eye on it, to see if it happens more.

Anyway, now that I have code to format a track factored out, I'll start work on code that tries various data rates and counts how many sectors it can fit -- both with gaps for sector-at-once writing, and without them, for Amiga-style track-at-once writing, and the ~10% more sectors it should allow us to fit on each track.

That was easy to get working, and I have confirmed that it is writing tracks without the gaps. I can tell this, because the track read testing program is no longer able to keep up with an interleave of 2, because the sectors come around a bit quicker.

So next step is to make a routine that tries formatting a single track multiple times at multiple data-rates, and then checks which sectors can be read back.

Thinking ahead to a denser coding, I was reminded that what I want to use is RLL2,7 coding, not GCR coding, which actually is no more efficient than MFM.  RLL2,7 coding is a bit funny, because it uses variable length codes for different bit patterns, such as the following:

Input    Encoded

11       1000
10       0100
000      100100
010      000100
011      001000
0011     00001000
0010     00100100 

I also spent a long time trying to find out what the sync mark is for RLL2,7 encoding, and eventually found out from here, that it might be:

 1000100010001000100001001000100010001000

For it to be correct, it has to be impossible to build this by concatenating the pieces above. So let's try, beginning at all possible bit offsets into the sequence:

Starting at bit 0: it would decode as: 

1000 1000 1000 1000 1000 0100 1000 1000 1000 1000

 11   11   11   11   11   10   11   11   11   11

So that decodes, which means it shouldn't be the sync mark.

I eventually found the answer in this thesis:

RLL2,7 cannot generate 100000001001, because the only sequence with four leading zeroes is 00001000, thus it is not possible to get 100000001001 -- super simple :)
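Both checks are easy to automate. The sketch below uses the code table exactly as listed above, decodes the first candidate to show that it is ordinary data, and then brute-forces every concatenation of four code words to see whether 100000001001 can ever appear. Four words are enough because every code word is at least 4 bits long, so a 12-bit window can touch at most four of them. The function names are mine, and treating every word sequence as legal is a simplifying assumption that can only over-estimate what the encoder can produce.

```python
from itertools import product

# RLL2,7 code table as listed in this post (input bits -> encoded bits).
RLL27 = {
    "11": "1000",
    "10": "0100",
    "000": "100100",
    "010": "000100",
    "011": "001000",
    "0011": "00001000",
    "0010": "00100100",
}
DECODE = {encoded: data for data, encoded in RLL27.items()}

def decode(bits):
    """Decode a bit string; return None if it is not a run of whole code words."""
    out, pos = [], 0
    while pos < len(bits):
        # The code words are prefix-free, so at most one length can match.
        for length in (4, 6, 8):
            word = bits[pos:pos + length]
            if word in DECODE:
                out.append(DECODE[word])
                pos += length
                break
        else:
            return None
    return "".join(out)

def can_appear(pattern, max_words=4):
    """True if pattern occurs in some concatenation of max_words code words."""
    return any(pattern in "".join(words)
               for words in product(RLL27.values(), repeat=max_words))

candidate = "1000100010001000100001001000100010001000"
print(decode(candidate))           # 11111111111011111111
print(can_appear("100000001001"))  # False
```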

Meanwhile, back in the world of MFM encoded disks, I did some more work on working out just how much I can cram on each track using MFM coding with variable data rate for each track.  In fact, I live streamed for a couple of hours this morning working on it.

So now I have a tool that will let me try different data rates on every track, with or without inter-sector gaps, and report the highest number of sectors that could be written and read back.  This takes quite a while to run, as it of course hits bad sectors which have several seconds of timeout.  Writing this, I just realised an efficiency problem with this code, where it would try all sectors, even after one had failed, which would slow things down on the later tracks where we know for sure it can't have all sectors. More to the point, if we have any error on a track, we stop at that point, so there is no point continuing after the first error. So I have set that running again now.

What I am also doing now is synthesising VHDL that will allow a complete track format command, without this need to do software feeding of every byte as it goes along.  The reason for this is that I was unable to increase the data rate to the level where I think it should go, and I think the actual problem is that the CC65 compiled code is just too slow to reliably feed the bytes.  So it's possible that by moving to hardware-assisted formatting I will be able to recover those extra potential sectors.  For example, I think I should be able to fit 36 or more sectors on track 0, but can't get above 31.  So the difference is potentially quite large.

In fact, as I discovered on the stream, it's possible to cram an extra couple of sectors on a track, so it is probably more like 38 sectors as the maximum compared to the 31 I am seeing now. If I can pull that off, we should be able to get 2,176KiB on a sector-by-sector disk and 2,471KiB on an Amiga-style track-at-once disk.  All of this is also equally relevant for the RLL2,7 encoding, as it still means +50% over whatever we can do on MFM, plus potentially the odd extra sector here or there.

So a sector-by-sector disk with 3,264KiB (which is 3.26MB using floppy "marketing MB" of 1,024,000 bytes like the 1.44MB standard does), or an Amiga-style track-at-once disk of 3,742KB (= 3.74MB in floppy marketing terms) should be possible.  Thus my dream of outclassing a 2.88MB ED floppy drive with a standard HD drive and media really does look to be in reach -- assuming a lot of things yet to be proven.

But first, it's sit back and wait while I synthesise the hardware formatter, and hope that I haven't got too many bugs in it, so that I can know for sure if I have formatted disks properly when testing if they can be reliably read back.

I will probably also pull out the floppy histogram display code again, too, to see how close to overlapping the peaks for the 1.0, 1.5 and 2.0 period buckets of bits are, as a further guide as to whether this will all have a chance of working reliably.

So the hardware formatter is indeed writing things to the disk, but it is mis-calculating the CRCs for the sector headers and bodies.  I'll have to have a think about the best way to test this via simulation, as it's likely that one or more bytes are not being included in the CRC, or being counted twice or something.  Past experience tells me it's hard to work that out from just staring at the source.  The challenge in simulating this is that it will take a long time, and requires the sdcardio.vhdl as part of the simulation, rather than just the MFM encoder.

Update: I realised that I probably wasn't feeding the CRC engine the bytes I was writing, having mistakenly thought that I had it automatically plumbed, which I didn't. So I'm hopeful that this will work now.

While I wait for that to finish synthesising, let's take another aside in the history of floppy storage, and consider the SFD1001 drive from Commodore: This was a beast in its day: 1MB on DD 5.25" disks, compared to the 1541's 170KB.  Of course it was double sided, so the fairer comparison would have been 340KB.  Also, it was designed for quad-density media, so really 680KB, but DD media tended to work just fine. It achieved this capacity by doubling the number of tracks, and I presume, increasing the data rate of the standard Commodore GCR encoding.  So with ~80 tracks, they got ~500KB per side. We should be able to double that on a 3.5" HD disk -- which we are, with >1MB per side -- but not by much.  So in a sense, what we are doing now with HD 3.5" floppies is not too different to what Commodore did with the SFD1001, except that the SFD1001 officially used media with double the density.

Meanwhile, I finally have the hardware auto-formatter generating correct CRC values, and can now format disks using it. One of the nice things, is that it reduces the time between tracks, allowing the format to complete faster.

So now it's time to update floppycapacity.c, so that it uses the hardware-assisted formatting, so that we can see if we can't actually get to the maximum number of sectors we think we should -- somewhere around 36 with Amiga-style track-at-once.

Unfortunately, it looks like the floppy drive hardware refuses to behave properly if the magnetic interval is less than 30 cycles, i.e., about 40.5MHz/30 = 1.35MHz, compared to the nominal 1MHz that HD floppies use. This is probably because the filtering circuits in the floppy drive itself think that anything that fast is noise, not a signal, and so suppress it.  This is a bit of a blow for our desire for maximum density on the longer tracks, as we can only get 35% more on those tracks, not the 60% that we should be able to get.

Running the test program, these are the rates and numbers of sectors that fit on each track, with sector-gaps, like a 1581:

Tracks 0 -- 12 : Rate = 30 cycles (1.35MHz) : 28 sectors

Tracks 13 -- 25 : Rate = 31 cycles (1.31MHz) : 27 sectors

Tracks 26 -- 43 : Rate = 32 cycles (1.27MHz) : 27 sectors

Tracks 44 -- 47 : Rate = 33 cycles (1.23MHz) : 26 sectors

Tracks 48 -- 68 : Rate = 32 cycles (1.27MHz) : 27 sectors

Tracks 69 -- 75 : Rate = 33 cycles (1.23MHz) : 26 sectors

Tracks 76 -- 84 : Rate = 34 cycles (1.19MHz) : 25 sectors

First up, note that there is something weird with four tracks near the middle of the disk: It's possible that those tracks are just a bit more worn out, as I can't think of any other reason for those four consecutive tracks being worse -- unless it's a bit of luck as to what was written there before, but I do wipe the tracks before writing to them.

Second, notice that for some intervals, the number of sectors we can cram on a track doesn't change, even if we drop the bitrate a bit. That's because it might be that one bit rate can fit 27.8 sectors, while the next slower bitrate can fit, say, 27.1 sectors. Those differences might become important when we try again without gaps, as it might just be enough space to fit another sector in.

But if we assume we have to drop to 26 sectors per track from track 44, that gives us:

Tracks 0 -- 12 (13 tracks) @ 28 sectors per side

Tracks 13 -- 43 (31 tracks) @ 27 sectors per side

Tracks 44 -- 75 (32 tracks) @ 26 sectors per side

Tracks 76 -- 84 (9 tracks) @ 25 sectors per side

Remember that PCs put 18 sectors per track for 1.44MB, so this is quite a bit more.  And with those 5 extra tracks, that all adds up to 2,258KB, i.e., 2.26MB "storage industry megabytes", or 1.56x the PC HD standard storage. Whether we are flying too close to the wind with any of these densities, I'm not sure, and only time will tell.

So now let's try it Amiga-style, without gaps between sectors. But first, I have to fix a bug with the hardware-assisted track formatting without inter-sector gaps. I was messing up the CRC calculation again. We should get at least 10% extra, and maybe a few "leap sectors" where there was not quite enough space to put an extra sector on with sector-gaps, but the 2.5 to 2.8 saved sectors helps us cram an extra one in.  

So I'm expecting between 170KB and 255KB extra, pulling us up into the 2.4MB -- 2.5MB range -- funnily enough about where I originally expected, just not for the exact reasons expected: We are limited with our maximum data rate to 1.35MHz, instead of 1.6MHz, but we are fitting a few extra sectors per track, regardless. But let's see what the reality is, after that synthesis completes.

A quick note while that runs, as I think about RLL2,7 encoding, though: We will still be limited to the 1.35MHz maximum pulse rate, so can still only hope for 50% greater density than MFM.  But it does make me think about longer RLL codes that have longer minimum distances between pulses, e.g., RLL4,13, that would allow for doubling the MFM data rate, but will require more accurate timing of pulses. That might let us cram more on the disk.

That has also just reminded me: It is possible that the problem we are hitting at rates above 1.35MHz is not in fact the floppy electronics, but rather the need for aggressive write pre-compensation, so that the gaps come out correctly when they are placed so closely together on the media.  The way to verify this is to read a raw track after formatting it, and see how it looks in terms of raw flux.  If I do that at various speeds, and see how the various transitions are detected (or not), and how early (or late) they appear, I should be able to get some interesting intelligence on this: It might end up being possible to push towards 1.6MHz after all, which would get us a further 18% or so on top of our 2,258KB, which would get us towards 2.65MB while still keeping sector gaps.  The thought is tantalising... But first some sleep.

The quick summary is: 2,493KB without sector gaps, so an extra 235KB, which is near the upper-end of what I was hoping for.  Now for the track-zone break-down:

Tracks 0 -- 10 (11 tracks) : Rate 30 : 31 sectors per side

Tracks 11 -- 24 (14 tracks) : Rate 31 : 30 sectors per side

Tracks 25 -- 76 (52 tracks) : Rate 33 : 29 sectors per side

Tracks 77 -- 84 (8 tracks) : Rate 34 : 28 sectors per side
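As a sanity check on that break-down, here is a minimal sketch that adds up the zones and confirms they come to the 2,493KB quoted above. It assumes 512-byte sectors and two sides, as everywhere else in this post; the names `ZONES` and `disk_capacity_kb` are just made up for the illustration.

```python
# Zone table transcribed from the break-down above:
# (first track, last track, rate divisor, sectors per side)
ZONES = [
    (0, 10, 30, 31),
    (11, 24, 31, 30),
    (25, 76, 33, 29),
    (77, 84, 34, 28),
]

SECTOR_BYTES = 512  # assumption: standard 512-byte sectors
SIDES = 2           # assumption: both sides use the same zone layout


def disk_capacity_kb(zones):
    """Total formatted capacity in KB for a list of track zones."""
    total_sectors = 0
    for first, last, _rate, sectors_per_side in zones:
        tracks = last - first + 1
        total_sectors += tracks * sectors_per_side * SIDES
    return total_sectors * SECTOR_BYTES // 1024


total_tracks = sum(last - first + 1 for first, last, _r, _s in ZONES)
print(total_tracks, "tracks,", disk_capacity_kb(ZONES), "KB")
```

The four zones cover 85 tracks in total, which also matches the 85-track disk image size mentioned later.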

So now to think about how to test whether lack of write-precompensation is the problem, or whether it is the floppy drive's analog electronics.  Ideally I would look at a waveform of the data when read-back, to see if all the pulses are there, and if they have moved.  If pulses are missing, then it's the floppy analog electronics, and if it's that pulses have moved, then it is magnetic physics, and write pre-compensation should be able to fix it.

The trick to viewing the waveforms is that capturing them on the MEGA65 itself is a bit tricky using a program, because we are talking about pulses that can occur every 30 cycles, which means we need a very tight loop -- tighter than our current loop.  

What we can do, is go back to using DMA to read the FDC debug register $D6A0, which lets us directly read the floppy RDATA line. The DMA will alternate between reading that, and writing to the memory buffer, resulting in a sampling rate of 40.5MHz / 3 cycles (the register takes 2 cycles to read) = 13.5MHz, which is a healthy ~10x the expected pulse rate.  We will be limited to a single DMA job of 65536 samples, for ~192K cycles = ~4.9 milliseconds. But that will be more than enough data to see what is happening.  If I run this over tracks recorded at various rates, we should get a clear picture of what is going on.
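The arithmetic for the DMA capture can be sketched as follows. This is only a back-of-envelope calculation using the figures from the paragraph above (40.5MHz clock, 3 cycles per sample, one 65536-sample DMA job); the constant and function names are invented for the example.

```python
CLOCK_HZ = 40_500_000     # MEGA65 clock, as quoted above
CYCLES_PER_SAMPLE = 3     # 2 cycles to read the register + 1 to write the buffer
DMA_SAMPLES = 65536       # largest single DMA job

sample_rate_hz = CLOCK_HZ // CYCLES_PER_SAMPLE
capture_cycles = DMA_SAMPLES * CYCLES_PER_SAMPLE
capture_ms = capture_cycles / CLOCK_HZ * 1000


def samples_per_bit_cell(divisor):
    """How many DMA samples land in one bit cell at a given rate divisor."""
    return divisor / CYCLES_PER_SAMPLE


print(sample_rate_hz, capture_cycles, round(capture_ms, 2))
print(samples_per_bit_cell(30))
```

At divisor 30 that works out to 10 samples per bit cell, which is where the "~10x the expected pulse rate" comes from.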

After chasing my tail on a faulty DMA job definition for a while, I am now collecting some data. For track 0, with a rate of 40.5MHz/30 = 1.35MHz, and using the DMA capture method, I am seeing the data pulses being 8 samples wide.  That means 8 samples x 3 cycles = 24 cycles / 40.5MHz = ~0.6 microsecond pulse duration, which sets a hard upper limit on the data rate, as otherwise the data pulse will just be continuous, presumably. That means we should be able to detect pulses at up to about 1.69MHz minus a bit, well above the 1.35MHz we have managed.

Let's try formatting a track at rate 26, which should yield 1.56MHz, and thus should have a short gap between each of the data pulses. In fact, we still seem to be getting quite large gaps. Ah: I have of course confused myself a bit here: Because we are using MFM with an MFM rate of ~1MHz, this means an actual maximum flux inversion rate of ~0.5MHz.  Thus we should be able to safely reach well above 2MHz MFM rate.  Because I can see the pulses, this makes me think that it might well be the lack of write pre-compensation after all.  So back to drawing those waveforms from the data I read.
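To keep the two rates straight, here is a rough sketch of the relationship between the rate divisor, the MFM bit-cell rate, and the shortest spacing between flux inversions. It follows the reading in the paragraph above, namely that the divisor sets the MFM bit-cell rate and that inversions in MFM are never closer than two bit cells. The 8-sample pulse width is the measured figure from above; the function names are illustrative only.

```python
CLOCK_MHZ = 40.5          # MEGA65 clock
CYCLES_PER_SAMPLE = 3     # DMA capture takes one sample every 3 cycles


def mfm_rate_mhz(divisor):
    """MFM bit-cell rate for a given data rate divisor."""
    return CLOCK_MHZ / divisor


def min_flux_spacing_us(divisor):
    """Shortest spacing between flux inversions, assumed to be two bit cells."""
    return 2 * divisor / CLOCK_MHZ


# Measured read pulse width: 8 samples of 3 cycles each
PULSE_WIDTH_US = 8 * CYCLES_PER_SAMPLE / CLOCK_MHZ

for divisor in (30, 28, 26):
    spacing = min_flux_spacing_us(divisor)
    print(divisor,
          round(mfm_rate_mhz(divisor), 2), "MHz,",
          round(spacing, 2), "us between inversions,",
          "pulses distinct" if spacing > PULSE_WIDTH_US else "pulses merge")
```

Under those assumptions the pulses only start to run together somewhere around divisor 12, so pulse width alone does not explain the trouble at divisors in the high 20s.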

We are getting increasingly shifted data pulses, as the pulse frequency increases, i.e., the distance between the pulses reduces. In short, we need to implement write pre-compensation, if we want to support faster data rates.

I dug around on the internet for a while, but could not find any clear explanations of the write-precompensation algorithms used on MFM floppies.  I think it might be because these algorithms were held as trade-secrets by the floppy controller manufacturers, so we will need to do some reverse-engineering.  What I was able to discover, is that the write-precompensation shift can be positive or negative in time, and is dependent on the last several magnetic bits written. 

I also read somewhere that the need for write-precompensation isn't because the magnetic domains shift during writing, but rather the result of the magnetic fields on the head during reading causing an apparent shift in bit position. This probably means that the required shift depends on past and future bits to be written.

What I think I will do, is create a way to write various known bit patterns to the floppy, and then read them back, and from that, work out how to shift the pulses to be properly lined up. 

I did a bit of this on the live-stream this morning, and basically came to the conclusion that I would need a look-up table to make any sense of it.  I did eventually find the following interesting simple set of rules:

https://marc.info/?l=classiccmp&m=137609524004633

That then led me to the software for controlling a CatWeasel floppy controller, like this here:

https://github.com/qbarnes/cw2dmk/blob/master/dmk2cw.c

The important bit is here:

if (len == 2 && nextlen > 2) {
   adj = -(precomp * CWHZ/1000000000.0);
} else if (len > 2 && nextlen == 2) {
   adj = (precomp * CWHZ/1000000000.0);
} else {
   adj = 0.0;
}

In other words, place a pulse early if the last pulse was short, and the next one is long, or place the pulse late, if it's the other way around.  That looks like a very simple rule.  Because we are doing variable data rates, we will need to vary the amount of precompensation based on the track number, but finally I have a bit of a starting point.
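For playing with that rule outside of C, a direct transliteration might look like the sketch below. It mirrors only the snippet quoted above; the function name and argument order are my own, and the clock frequency is passed in as a parameter rather than assuming any particular value for CWHZ.

```python
def precomp_adjust(length, next_length, precomp_ns, clock_hz):
    """Timing adjustment in clock ticks for one pulse.

    length and next_length are the current and following intervals in the
    same units as the quoted C code, where 2 is the shortest interval.
    Negative means place the pulse early, positive means place it late.
    """
    ticks = precomp_ns * clock_hz / 1_000_000_000.0
    if length == 2 and next_length > 2:
        return -ticks          # short then longer: write early
    if length > 2 and next_length == 2:
        return ticks           # longer then short: write late
    return 0.0                 # symmetric neighbours: leave alone


# Example with a hypothetical 1GHz clock, so that ticks equal nanoseconds
print(precomp_adjust(2, 3, 100, 1_000_000_000))
print(precomp_adjust(4, 2, 100, 1_000_000_000))
print(precomp_adjust(3, 3, 100, 1_000_000_000))
```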

So I might have a quick go at modifying my model to try implementing that, and see how well it matches with the measured values.  I can make the model match the first 180 or so clocked pulses that I logged. After that it goes to rubbish for a while, because the data seems to have rubbish in it after that, and is not well aligned between the various sample logs I made at each speed.  Basically there isn't enough data there to be sure that I have properly generalised out the rules.

In the end, I have gone back to the first link and the table-based approach, but allowing for "small" and "big" corrections, depending on the difference in the duration between inversions. The code looks like this:

          case f_write_buf is
            when "0101000" =>
              -- short pulse before, long one after : pulse will be pushed
              -- early, so write it a bit late              
              f_write_time_adj <= to_integer(write_precomp_magnitude_b);
            when "1001000" =>
              -- medium pulse before, long one after : pulse will be pushed
              -- early, so write it a bit late
              f_write_time_adj <= to_integer(write_precomp_magnitude);              
            when "0001000" =>
              -- equal length pulses either side
              f_write_time_adj <= 0;
              
            when "0101010" =>
              -- equal length pulses either side
              f_write_time_adj <= 0;
            when "1001010" =>
              -- Medium pulse before, short one after : pulse will be pushed late,
              -- so write it a bit early
              f_write_time_adj <= - to_integer(write_precomp_magnitude);              
            when "0001010" =>
              -- Long pulse before, short one after : pulse will be pushed late,
              -- so write it a bit early
              f_write_time_adj <= - to_integer(write_precomp_magnitude_b);

            when "0101001" =>
              -- Short pulse before, medium after
              f_write_time_adj <= to_integer(write_precomp_magnitude);
            when "1001001" =>
              -- equal length pulses either side
              f_write_time_adj <= 0;
            when "0001001" =>
              -- Long pulse before, medium after
              f_write_time_adj <= - to_integer(write_precomp_magnitude);

            when others =>
              -- All other combinations are invalid for MFM encoding, so do no
              -- write precompensation
              f_write_time_adj <= 0;                
          end case;

We only need cases with a 1 in the middle, as it is the middle bit we are writing, where 1 means a magnetic inversion, and 0 means no magnetic inversion, and we only need precompensation if the pattern either side is asymmetric. I just hope that I have the sign on the corrections correct! If not, I should be able to test with putting negative numbers in the corrections.  We shall see after synthesis has finished, probably in about 24 hours time in reality, as I have a long day with my 60km round trip bike ride to work tomorrow, which I am really looking forward to, as I didn't get to ride in earlier in the week, so it's too long between rides.
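For anyone wanting to experiment with the table in software, here is a small sketch that mirrors the case statement above as a lookup table. The bit patterns and signs are copied from the VHDL; everything else (the names, and treating the two magnitudes as plain integers) is an assumption made for the illustration.

```python
SMALL, BIG = "small", "big"

# 7-bit window with the bit being written in the middle.
# Value: (sign, which magnitude), copied from the VHDL case statement.
PRECOMP_TABLE = {
    "0101000": (+1, BIG),    # short before, long after
    "1001000": (+1, SMALL),  # medium before, long after
    "0001000": (0, None),    # long either side
    "0101010": (0, None),    # short either side
    "1001010": (-1, SMALL),  # medium before, short after
    "0001010": (-1, BIG),    # long before, short after
    "0101001": (+1, SMALL),  # short before, medium after
    "1001001": (0, None),    # medium either side
    "0001001": (-1, SMALL),  # long before, medium after
}


def write_time_adjust(window, small, big):
    """Return the signed write-time adjustment for a 7-bit window string."""
    sign, size = PRECOMP_TABLE.get(window, (0, None))
    if sign == 0:
        return 0  # symmetric, or not a pattern the table handles
    return sign * (big if size == BIG else small)


for pattern in ("0101000", "1001010", "0001010", "1111111"):
    print(pattern, write_time_adjust(pattern, small=4, big=8))
```

Written this way, it is easy to see that the sign always follows "gap after minus gap before", and the magnitude depends on whether the two gaps differ by one step or two.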

Back at the desk, and had time to look further into this, and record another stream, where I got the pre-compensation stuff all working, and was able to drop the data rate divisor from 30 to 29, but more importantly, to sustain using divisor 29 over the first 40 tracks, and divisor 30 all the way to track 60.  Compare that with the prior effort without pre-compensation, where we had to drop to divisor 32 by track 26.

What was interesting is that a constant write-precompensation of ~100ns and 200ns respectively for short and long differences in gaps between pulses was pretty much ideal across the whole range of tracks and data rates I tested.

However, I couldn't get anything below divisor 29 to work, and all rates below 32 were sometimes a bit funny.  It occurred to me during the stream that this is most likely because the start-of-track gap bytes are now too short at the higher data rates, and thus the start of the first sector is not actually getting written, thus stuffing the whole track up -- even though the peaks in the histograms indicate that going down to rate 28 or even 26 should be possible.

To resolve this, I am adding more gaps to the start of the track.  But rather than it being wasted, I am going to write a short "Track Information Block" at the start of each track together with the normal 12 start-of-track gap bytes -- but these will always be written at the DD / 720KB data rate, so that it will be a fixed length prefix to the track, thus avoiding any variability caused by differing data rates.  This Track Info Block will contain the divisor used for the rest of the track, as well as flags that indicate if it contains sector gaps, or conversely, is a track-at-once and thus read-only track. I have also reserved a flag that indicates if the encoding is MFM or RLL2,7, and for good measure, I have included the track number.  

So now I just need to get that all working... I have implemented the code to write the Track Info Block, and also to read it back, and use it to set the data rate divisor, but it isn't getting detected.  One of many good reasons to blog about code you are writing, or to do live-streams, for that matter, is that you spend more time thinking about what you are doing and getting your thinking clear.  The revelation just now, is that it might be stuffing up, because I am using multiple data rates on each track, and that the timer loop in the inner MFM encoder might mess up if the divisor is reduced mid-bit, as it might have already counted past that divisor, and will thus have to wrap around.

I can verify if this is the problem, and generally get a better idea of what is going wrong, if I make a test harness that includes the whole sdcardio.vhdl, and feeds it synthetic register writes to command it to format a sector, and then decode what it writes out, to see what it looks like.

As usual, making a test harness turns out to be a very good idea: I almost immediately spotted that I was using $FB clock bytes instead of $FF clock bytes for the Track Info Block bytes. That would certainly have caused problems.  So I have a fix for that synthesising, but while that runs, I might run further in the test harness, and see what I can see. Which I should have done, before synthesising, as I would have picked up the next bug that way, which is that I was switching writing speed before the CRC bytes had a chance to be written out, but I was tired, and had a nap while it was synthesising, instead. But this time I did check, and of course there was no further problem with it ;)

After synthesising I am still seeing a problem, that only one sector is being found, even though there are multiple sectors being written.  In simulation, I can see the various sector headers and sector data fields, and they all have CRCs and are being correctly numbered, so this one is being a bit annoying to debug -- which is a bit frustrating, because it is possibly the last barrier before I can test writing tracks with the safe constant-length lead-in.  Most frustrating!

A bit more digging, and I can read DD format disks just fine -- so it's possible the problem lies with the switch to enable the use of variable rate recording, which I did "clean up" a bit before.  That was part of the problem, the part that was stopping it from seeing the sectors. But what remains is that trying to read any of the sectors still just hangs.  I'll first synthesise the fix for selecting the variable rate recording properly, and then revisit it a bit later.

And that has it back to being able to read disks that I have formatted at high density again, but unfortunately it hasn't enabled any higher data rates to work.  It is possible that I can get one or two more, by making all 1.5x pulses slightly longer, as they all seem to be shifted towards the 1.0x pulses, and away from the 2.0x pulses.  I'm not sure of the precise mechanism by which this would be occurring, nor if simply delaying all 1.5x pulse signals a bit will really help, although it seems that it should be worth a try.  Otherwise, I suspect that it's time to start working on the RLL encoding, and see what we can wring out of that.

Righty oh. I have just implemented variable delay of the 1.5x pulses, to see if that doesn't help us improve peak resolution at the faster data rates. I suspect we really are now starting to get into the area where other magnetic effects and electronics filter effects in the floppies are coming into play, so this will be the last attempt to increase the data rate with MFM, after which I will switch to working on the RLL stuff, and see how much benefit that gets us in terms of increased capacity. I'm not feeling super confident about it right now, because of the various magnetic effects I am seeing at the current data rate, but I might just be being pessimistic, since the whole point of RLL is that it lowers the effective data rate in the magnetic domain for the same real data rate.  We shall see after it all synthesises.  

The other little fix I have made in this run is to include the number of sectors in the Track Info Block.  I have also rolled in fixes for the hypervisor code and freezer to support the 85 track x 64 sector = ~5.5MB ".D65" HD disk images.

Right, that's built, so now to modify floppytest.c again, to allow fiddling with the setting to allow shifting the 1.5x pulses, and see if that works. Interestingly it helps only if I shift the pulses in the opposite direction -- but the gain is very marginal. To give an idea, here is rate divisor 28 (=$1C), i.e., about 1.45MHz MFM rate, with sensible write pre-compensation settings, and no adjustment to the position of the 1.5x pulses:

We see three peaks for 1.0x, 1.5x and 2.0x MFM intervals, with the 1.5x peak in the middle somewhat nearer the 1.0x peak on the left, which is the problem I have been describing. The WPC 04/04/00 means write pre-compensation of 4 and 4 for small and big gap differences, and 1.5x pulse delay of 0.  If we now advance the pulse delay in the expected direction, we get something like this:

We can now see the 1.5x peak has split in two, but the extra peak is now further left, not moved to the right as we hoped. So if we reverse the direction by putting a negative correction on the 1.5x pulse position, we get the following, which is quite a bit better:

It is now quite nicely spaced out.  If I move it any further to the right, not only is it worse, but all sorts of other bad effects are happening. It's interesting that there are still quite clearly double peaks in the 1.0x and 1.5x peaks, so there is something else there that can in theory be found and improved, but how much further benefit is possible is hard to tell, and it is a bit of diminishing returns at this point, although each further divisor drop we can achieve at this point does result in progressively larger amounts of data per track.

But I am starting to chase my tail a bit here, and conclude (for the time being at least) that we are at the limit of what we can get out of this drive and media using MFM encoding, and that further work should be focused on RLL encoding, to see if that can't get us a substantial further boost for less effort.  For that, you will have to wait for the next blog-post, however.

What I do want to close out in this post though, is making sure that normal DD disk activity still works correctly.  Specifically, when the C65 DOS ROM issues a format command, we don't want the Track Info Block data overriding and causing the tracks to be written at the wrong rate.  

The logic here is a little subtle: We want write operations to the drive to switch to the TIB indicated rate and encoding settings, so that sector writes succeed. And there is something going a bit odd with this, but it might have been a problem with a disk I borked up when it was writing the wrong speed.  A HEADER command with ID supplied, which should do a full format was failing with "75, FORMAT ERROR", but a quick format would succeed.  Trying to save a file on the disk after that would result in messed up sectors, most likely from writing at the wrong data rate. But if I instead do a quick format, then a full format, the format succeeds.  So it was probably just that I had made a complete mess of the disk formatting some tracks at funny rates with incorrect TIBs etc, as I was testing things.

But even after the clean low-level re-format, if I then try to save a file on the disk, the directory is all messed up, so something is still wrong. So I'll have to fix that up, too.  But that will have to also wait for the next blog post, or you will all be stuck reading this one long after the pre-orders have sold-out, because it has got so long ;)


Tuesday, 31 August 2021

Fixing some CPU timing and floppy formatting bugs

In recent posts I reported the progress on getting formatting and write support working for the internal 3.5" floppy drive. However, formatting had to be done at 40MHz to work reliably. The cause was that the tight little loop that the C65 DOS does when checking when to write the next byte to the floppy when formatting a track is right on the edge of the speed required at 3.5MHz, but we had some errors in cycle timing for the CPU, which were tipping it over the edge.

Let's start by looking at the DOS's loop when formatting. The same basic loop is used in a few places, and looks like this:

10$ ldy #16          ;write post index gap 12 sync
15$ lda secdat-1,y
    ldx secclk-1,y
20$ bit stata
    bpl wtabort      ;oops
    bvc 20$
    sta data1        ;Always write data before clock
    stx clock
    dey
    bne 15$

So we basically do a BIT on the stata register to check if an error has occurred (bit 7 set) or the drive is ready for the next byte (bit 6 set), which is what the BVC does.  So while waiting to see if we need to write the next byte, we do the BIT/BPL/BVC sequence.  If we check the 65CE02 datasheet, we see that those should take 5, 2 and 2 cycles each, for an inner loop of just 9 cycles.

I was mistakenly charging 3 cycles for branches taken, and thus the BVC was taking one more cycle, meaning our inner loop was taking 10 cycles instead of 9.  At 3.5MHz, this means the inner loop was taking about 2.86 microseconds.  This is much faster than the time between bytes on a 720K floppy, where the effective data rate is about 250kbit/sec, or about 32 microseconds, or about 112 cycles, assuming the C65's clock is exactly 3.5MHz (which it isn't, but doesn't matter for our current purposes). 

But we have to pay the cost of the LDA, LDX, STA, STX, DEY and BNE in there as well, which cost 4, 4, 4, 4, 1 and 2 cycles each, for a total extra cost of 19 cycles.  So it should take 19 + 9n cycles for that loop to detect when the next byte has to be written. So really we mean that we have a jitter of about 9 cycles from when the last bit starts to be written, to when we will detect that the next byte is required, and then it costs us another 4 + 4 cycles = 8 cycles to actually write the byte.  
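Putting those cycle counts side by side makes the budget easier to see. The sketch below just redoes the sums from the last few paragraphs, using the per-instruction figures quoted there and a nominal 3.5MHz clock; it is a bookkeeping aid, not a cycle-exact model of the CPU.

```python
CPU_HZ = 3_500_000              # nominal C65 clock, as assumed above
DATA_BITS_PER_SECOND = 250_000  # effective data rate of a 720K disk

# Cycle counts as quoted in the text
POLL_LOOP = {"BIT": 5, "BPL": 2, "BVC": 2}
PER_BYTE = {"LDA": 4, "LDX": 4, "STA": 4, "STX": 4, "DEY": 1, "BNE": 2}

poll_cycles = sum(POLL_LOOP.values())
per_byte_cycles = sum(PER_BYTE.values())
cycles_per_disk_byte = 8 * CPU_HZ // DATA_BITS_PER_SECOND


def poll_time_us(cycles):
    """Duration of one pass of the polling loop in microseconds."""
    return cycles * 1_000_000 / CPU_HZ


print("poll loop:", poll_cycles, "cycles =", round(poll_time_us(poll_cycles), 2), "us")
print("with the extra branch cycle:", round(poll_time_us(poll_cycles + 1), 2), "us")
print("per-byte overhead:", per_byte_cycles, "cycles")
print("budget per disk byte:", cycles_per_disk_byte, "cycles")
```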

There is also a bit of a fun race condition when providing the next byte, if the clock byte has changed since the previous byte, which means if we are a bit too late, the floppy controller might latch the wrong clock value and mess everything up.  But even without that, it is conceivable that we can end up a tiny bit too late and miss the deadline for the next byte, causing everything to slip by one bit, and making the resulting part of the track unreadable.  And this was happening quite a lot.

Once I worked out that this was the problem, I fixed the branch instructions to not charge the extra cycles when the CPU is at >=3.5 MHz, thus reducing our loop down to the correct 9 cycles. 

This reduced the number of dud sectors being written, but it didn't actually eliminate them. So I had to investigate what was going on with this, as I was pretty sure I had the cycle times correct in the CPU.  Pulling up the 65CE02 datasheet, I double-checked things, and apart from spotting a couple of unrelated timing errors (I was only charging 2 cycles for JSR due to a typo, and there was one of the ROR instructions I also had the timing wrong for, both of which I have fixed now), the BIT and branch instructions were all looking spot-on.  Most strange.

Then it occurred to me that it was a bit weird that BIT $nnnn cost 5 cycles, as it only requires 3 instruction byte cycles, and then one memory read cycle, since it doesn't need to write anything back to memory after.  After all, one of the 65CE02's stated marketing claims was the removal of almost all "dead cycles", giving it about a 25% speed-up vs the 6502 at the same clock speed.

But the Commodore datasheet does indeed claim that the BIT instruction in all modes requires an extra cycle.  Even the VICE emulator implements it this way. But I dug around further to see if there were any experimentally derived instruction timings for the 65CE02, and found this page, which suggested that the BIT $nnnn instruction indeed requires only 4 cycles instead of 5. I also found another page that had different timing again for a lot of instructions, such as indicating that TRB/TSB require 6 instead of 5 cycles, but again, indicated just 4 cycles for the BIT $nnnn instruction.

So I am deeply suspicious that the correct timing for the BIT instructions should indeed be one cycle less than the Commodore datasheet indicates.  If fixing this results in perfect formatting at 3.5MHz, it will be very strong circumstantial evidence for this, until we can borrow a real C65 (or one of those Amiga 8-way serial cards that includes a 65CE02 on it), to check.

While I wait for that, there is one other piece of circumstantial evidence in favour of these changes, which is that I had previously established that the CPU had to be about 15% faster than 3.5MHz to reliably format disks correctly.  10 cycles for the inner format loop divided by 115% = 8.7 cycles, i.e., just below the 9 cycles that the Commodore datasheet indicates that this loop should take.  So I'm crossing my fingers that this will fix the problem.  I'll check in the morning when the synthesis has finished.

After some mishaps with the synthesis process, I have a bit stream now, but it is still resulting in a few dud sectors when formatting at 3.5MHz.  So there must still be something a bit fruity going on. I still feel it has to be the CPU timing, as that is the only thing that can cause the sector writing to fail in this way. I know it's late bytes, because I have seen the slipped bits when I examine the disks.

Maybe what I have to do, is to buffer 2 bytes instead of 1 in the track format logic, so that if one byte is a bit late, it's not a big drama, and the jitter in getting the bytes delivered is very unlikely to result in two consecutive bytes being late. It's only getting ~5 late bytes when formatting an entire disk, i.e., a probability somewhere around 5 bytes in a million, or p(late) = 0.000005, so the chance of two consecutive late bytes would be something like 0.000000000025 per byte, i.e., about once every 40,000 disks.
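The estimate can be redone in a couple of lines. This sketch takes the "about 5 late bytes per million" figure at face value, assumes a formatted disk is roughly a million bytes, and assumes late bytes are independent of each other, which is almost certainly too generous but is good enough for a ballpark.

```python
LATE_BYTES_PER_DISK = 5       # observed, roughly
BYTES_PER_DISK = 1_000_000    # assumption: about a million bytes written per format

p_late = LATE_BYTES_PER_DISK / BYTES_PER_DISK
p_two_in_a_row = p_late * p_late            # independence assumed
expected_pairs_per_disk = BYTES_PER_DISK * p_two_in_a_row
disks_per_double_late = 1 / expected_pairs_per_disk

print(p_late, p_two_in_a_row, round(disks_per_double_late))
```

Either way the conclusion is the same: with two bytes of buffering, a pair of consecutive late bytes should be a once-in-tens-of-thousands-of-disks event.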

Actually, looking at the logic, there is also the slim chance that it is a race-condition where a byte gets ignored if it arrives exactly when it is needed. Fixing that may well be enough, but since I have the bonnet up, I'll add the extra byte of buffering, as well, so that it should all be 100% stable next time.

The race-condition means that I need to latch the byte_valid signal coming into the MFM encoder, in case that is exactly the cycle when we remove the byte from the buffer, as otherwise removing the byte from the buffer was taking priority over storing the new byte in the buffer, and thus the new byte would be ignored, and its replacement byte would seem to be late, because a byte had been ignored.

It will be a bit funny if that turns out to be the problem, because it will still have caused me to find and fix some CPU instruction timing bugs that may well have been upsetting other things.  In particular, BIT taking an extra cycle may have been causing C64 disk fast loaders problems, because BIT is often used to check the state of lines on the IEC bus.  But anyway, the main thing is that it will hopefully just be fixed.

Well, the synthesis completed, but now it isn't writing anything to disk.  A bit of poking around in simulation revealed several logic errors in my addition of an extra byte of buffer, which I have now fixed, such that simulation of MFM writing is working again.  So it's time for another synthesis run while I sleep.  If I get up on time, I might have the chance for a quick peek to see if it has worked.  Fingers crossed I have it right this time!

And it has. I'd include a screen-shot here, but for some reason it's refusing to grab one using the m65 tool, which I will have to investigate separately.  

I'm going to have to investigate why the m65 tool has broken, so in the meantime, I'll include this image I took earlier, which shows what it should look like, since every blog post must contain at least one image:



Sunday, 15 August 2021

Write and format support for the internal floppy drive

In the last blog post where I was attacking the floppy stuff, I reported how we can now almost format floppies using the C65 ROM, with two outstanding problems:

1. The CPU has to be at 40MHz due to some problem, which I suspect is a CPU instruction timing problem.

2. Actually writing sectors on the real floppy has not been tested and confirmed working as yet.

 [Ed: And 3., Reading sectors sometimes fails, as I discovered part way through writing this post.]

The first problem can be easily worked around for now, by switching to 40MHz before starting to format a disk, so my focus for the moment is on the 2nd problem.

The first step is to add sector reading and writing functions to my floppytest.c programme, so that I can ask for a sector to be written over, and then see what is in there.

I already have the various bits and pieces of sector reading routines in floppytest.c, so it's just a case of pulling them all together.  I have had to go through the logic in the read/write test harness, as there was continuous stepper chatter happening after trying to read a sector. It turns out that the problem there was that I was not masking bit 7 off the "last track seen under the head" register of the MEGA65's floppy controller.  That bit gets set when a track number matches the requested track, and so track 2 becomes $82 when it matches the request, and that was then not == 2, and so my little auto-stepper routine (not to be confused with the auto-tune autonomous track stepping feature of the MEGA65's VHDL) would think it didn't match, and that it needed to go to a lower track (since it thought it was on track 130 and wanted track 2), but then it would see track 1, and step back again, which would result in $82 appearing there again, and thus rinse, repeat.
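The fix amounts to one mask. As a rough illustration of the auto-stepper decision (not the actual floppytest.c code, and with made-up names), the logic looks something like this:

```python
TRACK_MATCH_FLAG = 0x80   # bit 7: set when the track matches the requested one
TRACK_MASK = 0x7F         # low 7 bits: the track number itself


def track_under_head(raw_value):
    """Strip the match flag from the 'last track seen' register value."""
    return raw_value & TRACK_MASK


def step_direction(raw_value, wanted_track):
    """Return -1 to step to a lower track, +1 to a higher one, 0 if on target."""
    current = track_under_head(raw_value)
    if current > wanted_track:
        return -1
    if current < wanted_track:
        return +1
    return 0


# Without the mask, $82 compares as 130 and the stepper heads off the wrong way
print(0x82, track_under_head(0x82), step_direction(0x82, 2))
```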

With that fixed, I now have simple keyboard cursor key controls to select the track to read, and then pressing R will cause it to try to read sector 0 of side 0 of that track.  This now more or less works.  But so far, I am not displaying the sector contents anywhere. 

I am using an 80x50 text mode, but with a funny setup using nybl colour mode so that I need only 4 bits per graphics pixel, which means 16 pixel wide chars, and thus only 40 chars per row.  What I would like is to be able to display 80 columns of text.  I want to keep that funny mode for the other functions in floppytest.c, and reuse as many of the support routines for that as possible, so that the compiled code doesn't end up too large. 

But I might just need to make a new text mode and simple printing routines nonetheless, as it would be great to be able to show a whole 512 byte sector.  With 80 columns we can show 16 chars per row, if we want to show both hex and a glyph.  512 / 16 = 32. So an 80x50 text mode would be ideal here.  That would need 4,000 bytes of screen RAM, which might cause us a bit of grief, as we are using $C000 upwards for that, and I have a vague recollection that putting stuff in $C800-$CFFF with a CC65 compiled programme causes problems.  It might actually just be $CF00-$CFFF that's the problem (yes, it's because CC65 uses $Cxxx as the C stack, so the top part is quite unsafe, but the rest is okay, if you don't nest function calls too deeply, or use too many stack-allocated variables).  We can work around that by making a video mode that is less than 50 rows high.
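Here is a quick sketch of the layout sums, plus a hex-and-glyph row formatter of the kind described. It is not the floppytest.c routine: it uses printable ASCII as a stand-in for the MEGA65 screen codes, and all of the names are invented for the example.

```python
COLUMNS, ROWS = 80, 50
SCREEN_BASE = 0xC000
BYTES_PER_ROW = 16
SECTOR_BYTES = 512

screen_bytes = COLUMNS * ROWS
screen_end = SCREEN_BASE + screen_bytes - 1
dump_rows = SECTOR_BYTES // BYTES_PER_ROW


def dump_row(offset, chunk):
    """Format one row as 'OFFS: hex bytes  glyphs'."""
    hex_part = " ".join(f"{b:02X}" for b in chunk)
    glyphs = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
    return f"{offset:04X}: {hex_part}  {glyphs}"


def dump_sector(data):
    """Split a sector buffer into display rows of BYTES_PER_ROW bytes."""
    return [dump_row(i, data[i:i + BYTES_PER_ROW])
            for i in range(0, len(data), BYTES_PER_ROW)]


rows = dump_sector(bytes([0xBD] * SECTOR_BYTES))
print(hex(screen_end), dump_rows, len(rows[0]))
print(rows[0])
```

A full 80x50 screen at $C000 runs into the $CFxx page, which is exactly the region in question, while the 32 rows needed for a sector dump leave plenty of room to use a shorter mode.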

Next step is to write the bit of code to display the contents of the floppy sector buffer.  Together with the new video mode, this gives us a display like this:

Again, excuse the glitched characters due to bugs in the screen capture workflow. But you can see the general idea of displaying the 512 bytes of data nicely.

The current problem is that when I read a sector, the data is not appearing in the sector buffer.  I pre-fill the buffer with $BD bytes, and then after reading the sector, that same data is read back out, as shown above, thus confirming that the sector from the disk is not being read properly, or that I am reading from the wrong buffer memory or something along those lines.

So by booting to BASIC 10 and accessing a D81 disk image there, I have been able to confirm that the buffer is indeed where I expect, at $FFD6C00-DFF.  And if I command a read after C65 DOS has been reading things, then I can indeed read the sector.

The problem here was that I wasn't resetting the buffer pointers for the FDC with a command $01 to $D081.  With that, I can now read sectors, but there is something fishy going on with the flags: The BUSY flag in particular seems to clear well before the sector has finished reading in. This means I am frequently trying to display the sector contents before they have loaded in. The question is what is clearing the BUSY flag early.

The FDCReadingSector state in the state machine has no path that clears the f011_busy signal, without also stopping the sector reading process. So that seems unlikely to be the problem.   The main loop does check for the busy_countdown to reach zero, and clears f011_busy at that time. However, the busy_countdown does get set when stepping tracks -- and this was the cause of that problem: When you issue a command $10 or $18 to $D081 to step the head in or out a track, you have to wait for the BUSY flag to clear, rather than wait a fixed time.

That now has it reliably reading whole sectors, however not always the correct sector.  This might in fact be the root cause of the mystery hanging of the C65 DOS when using real disks sometimes, in that if it reads the wrong sector data, it might cause strange things to happen. But this will require further investigation.  But first, let's see what we get read into the sector buffer when repeatedly trying to read track 39 sector 1 side 0, which should be the header sector of the disk.

Let's start by mapping out what we *should* see in each of the 10 sectors on side 0  of track 39. The disk I am using has a couple of files, followed by 255 files numbered 1 to 255, which makes it easy to see where directory material has come from.  So here are the ten 512 byte sectors from the disk:



 


And so on.

Some of the bad reads have seemingly utter junk in them, like this read of sector 10:

It has lots of $BD bytes at the end, which tells me that the sector read probably failed part-way through.  My best guess here is that the sector read gets desynchronised with the bytes of MFM data on the disk, since it still seems to show a periodicity of 32 bytes, matching the size of the 1581 directory entries. In fact, it is possible it is shifted by some number of bits. The correct sector should look like this:

So where we have $05s in our bad sector, we have similar numbers of $A0s in the proper data. Thus if we shift the bad sector data right by 3 bits, it should line up correctly, and we would then get:

0000: X0 33 32 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0 A0
0010: 00 00 00 00 00 00 00 00 00 01 00 00 00 81 27 23

Which actually matches part of track 39 side 0 sector 4 in the image above, from offset $0125 in that sector.
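The bit-shift idea is easy to check mechanically. Here is a small Python sketch (the function name is my own, and it is not part of mfm-decode.c) that treats a run of bytes as one long bit stream and shifts it right by a given number of bits, so that the low bits of each byte spill into the top of the next one:

```python
def shift_stream_right(data: bytes, nbits: int) -> bytes:
    """Shift a byte stream right by nbits (1..7), carrying bits between bytes.

    The top nbits of the first output byte are unknown in reality (they would
    come from the byte before the buffer); here they are simply left as zero.
    """
    if not 0 < nbits < 8:
        raise ValueError("nbits must be between 1 and 7")
    out = bytearray()
    prev = 0
    for b in data:
        out.append(((prev << (8 - nbits)) | (b >> nbits)) & 0xFF)
        prev = b
    return bytes(out)


bad = bytes([0x05] * 4)                   # the $05 filler seen in the bad read
print(shift_stream_right(bad, 3).hex())   # first byte has unknown top bits
```

After the first byte, every $05 comes out as $A0, which is the shifted-space padding a 1581 directory entry uses for file names. The unknown top bits of the first byte are why the first value in the shifted dump above can only be written as X0.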

This is quite, quite weird, as it would seem to suggest that our MFM decoding logic is detecting sync marks and a seemingly valid sector header and data section marker in this random part of the disk.

Looking at other cases where bad reads are occurring, I can see much the same strange results, where it looks like the disk is just being read from some random point, rather than the correct starting point.

I'll have to stop for now and try to think about what could be causing this. Is it that the MFM decode logic is getting false positives for sync markers, and not properly checking the track, side and sector values, and thus starts decoding some wrong section of the track as a sector? Or is it that it begins, but then somehow freezes or something, causing a delay before it actually starts reading the data, so that the disk has rotated in the meantime, causing wrong data to be read?

One thing that might be useful is to time when the read occurs versus the index hole or the sector to be read, to see if it isn't some kind of race condition based on where we are around the disk.

Similarly, I could introduce a requirement that the data section be located within a very short period of the header section being found, so that the controller can't miss the body of the sector, and then just keep going until it hits the data section of the following sector.

Anyway, I'm back at looking at this after almost three weeks of other activity.  Where I might actually start is by modifying my disk formatting code to write predictable material in all the sectors, so that when I try to read them, I know exactly where the material will have come from.  I will then also know the interleave order of the tracks, as well, so if I read material from the wrong sector, I will know where that is in relation to the correct data. I should even be able to automate characterisation of what is going wrong, perhaps.

Right-oh. I now write "Track N, Sector M, Offset $X." repeatedly in each sector, where N and M are the correct track and sector numbers, and X is the offset of the start of the string in the sector. So in this way I can determine whether we are just reading part of the sector, or reading some of the wrong sector.
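For illustration, a self-describing sector fill along these lines can be generated like this. It is a Python sketch rather than the actual formatting code, the function name is hypothetical, and it follows the field order that shows up in the track dump further down (offset first, then track and sector). It also emits plain ASCII, whereas the real test writes PETSCII.

```python
def fill_sector(track: int, sector: int, size: int = 512) -> bytes:
    """Fill a sector with records that each state where in the sector they start."""
    out = bytearray()
    while True:
        record = f"Offset ${len(out):x}, Track {track}, Sector {sector}. ".encode("ascii")
        if len(out) + len(record) > size:
            break                       # next record would not fit: stop here
        out += record
    return bytes(out.ljust(size, b"\x00"))  # zero-pad the remainder of the sector


data = fill_sector(0, 1)
print(data[:61].decode("ascii"))
```

Because every record names its own offset, a read that starts part-way into a sector, or that comes from a different sector altogether, identifies itself immediately.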

I have also improved my mfm-decode.c program, so that I can see if the floppy controller is reading the correct MFM data for the sectors. And the good news, is that it is reading the correct data, and because of how I collected the data, I can also get a pretty good idea of whether the SYNC flag is being correctly detected, and that looks fine, too.

In the process of doing all that, I discovered I was calculating the CRC of data sectors incorrectly when formatting the disk, so I have corrected that, so it's now all behaving as expected. I can now get nice displays like this, when it reads a sector correctly:

And then when it gets it wrong, I am seeing something like this:

So now rather than having to guess what has happened, I can clearly see that it is Track 0, Sector 1 that is being read, but that it is missing the first 2/3rds of the sector in this case.  These sorts of errors are occurring with the actual reading of the sector beginning at seemingly random offsets, so I don't think it is a 

I'm keen to get an idea of how often this happens, so I am instrumenting the program to keep reading the sector until it spots an error, as it only seems to happen about 10% or so of the time.  However, if I make the program repeatedly read the sector until an error occurs, it seems that an error never occurs: It tries 63 times to get a bad read, and doesn't hit one.   I can even re-trigger it several times, and it still doesn't happen.

So with this extra check, the failure rate is dropping to <1%, which makes no sense at all, since it is only retrying the read to get a failure if the first read succeeded, which is exactly the same action -- doing a single read -- that it was doing before.

Ah, finally I have it failing now on one of those runs, and I am not at all surprised to discover that it was on the first try.  So there is something that goes wrong when first asking it to read a sector, but which is neatly avoided when doing the retries.

I already have the head on the track, and the motor running and the drive selected, so there shouldn't be any functional difference, but clearly there is.

I am forming a theory: I think it has to do with the rotational position of the disk under the head somehow.  With some testing, I was able to confirm that this is indeed the problem.  What is happening, is that if the head is already over the target sector, the "last sector under the head" flags correctly indicate the target sector, and so the FDC starts reading the data. But it is possible that the head has already advanced over some fraction of the sector, so we only read the tail-end of the sector.  This has about a one-in-the-number-of-sectors-on-a-track chance of happening, so about a 10% chance, which is what I was seeing.

This should be fairly easy to fix, as I just make sure that the floppy read logic waits for the "last sector under the head" flag to change to the requested sector, rather than doing a simple level triggered action.
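The difference between the level-triggered and the edge-triggered start can be modelled very crudely. The Python below is a sketch under simplifying assumptions of my own: the track is just 10 equal sectors with no gaps or headers, the head position is quantised into an arbitrary 100 steps per sector, and the names are invented. It is not a model of the real VHDL.

```python
SECTORS_PER_TRACK = 10       # DD track, as in the post
POSITIONS_PER_SECTOR = 100   # arbitrary resolution for this model


def read_start_offset(head_pos: int, target: int, edge_triggered: bool) -> int:
    """Return how far into the target sector the read begins (0 = from its start)."""
    sector_under_head = head_pos // POSITIONS_PER_SECTOR + 1
    offset_in_sector = head_pos % POSITIONS_PER_SECTOR
    if not edge_triggered and sector_under_head == target:
        # Level-triggered: the flag already matches, so reading starts at once,
        # even though part of the sector has already passed under the head.
        return offset_in_sector
    # Otherwise we wait until the flag *changes* to the target sector.
    return 0


def count_partial_reads(target: int, edge_triggered: bool) -> int:
    """Count starting head positions that lead to a truncated read."""
    total = SECTORS_PER_TRACK * POSITIONS_PER_SECTOR
    return sum(1 for pos in range(total)
               if read_start_offset(pos, target, edge_triggered) > 0)


level = count_partial_reads(target=1, edge_triggered=False)
edge = count_partial_reads(target=1, edge_triggered=True)
print(f"level-triggered: {level} of 1000 start positions give a partial read")
print(f"edge-triggered:  {edge} of 1000 start positions give a partial read")
```

With the level-triggered check, roughly one start position in ten lands inside the target sector, which lines up with the ~10% failure rate observed. Waiting for the flag to change removes those cases, at the cost of up to one extra rotation of the disk.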

I have a fix for that synthesising now. That will take a while, so I might have a look back at writing of sectors. In theory, that should be more or less working, although I have spotted a problem with the CRC calculation that I have just fixed, and will try synthesising after the read fix has synthesised: It was not including the three $A1 sync mark bytes in the CRC.

But since we have the means to write a sector, and to read a track of raw data and decode it, I can do some before and after comparisons, to see if the sector writing is really happening.  So let's give that a whirl.

Here is how track 0, sector 1, side 0 looks after formatting it with the contents described previously:

(539 bytes since last sync)
SYNC MARK (3x $A1)
SECTOR HEADER: Track=0, Side=1, Sector=1, Size=2 (512 bytes) CRC ok
(42 bytes since last sync)
SYNC MARK (3x $A1)
SECTOR DATA:
  0000 : cf 46 46 53 45 54 20 24 30 2c 20 d4 52 41 43 4b    Offset $0, Track
  0010 : 20 30 2c 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46     0, Sector 1. Of
  0020 : 46 53 45 54 20 24 31 45 2c 20 d4 52 41 43 4b 20    fset $1e, Track
  0030 : 30 2c 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46 46    0, Sector 1. Off
  0040 : 53 45 54 20 24 33 44 2c 20 d4 52 41 43 4b 20 30    set $3d, Track 0
  0050 : 2c 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46 46 53    , Sector 1. Offs
  0060 : 45 54 20 24 35 43 2c 20 d4 52 41 43 4b 20 30 2c    et $5c, Track 0,
  0070 : 20 d3 45 43 54 4f 52 20 31 2e 20 cf 46 46 53 45     Sector 1. Offse
  0080 : 54 20 24 37 42 2c 20 d4 52 41 43 4b 20 30 2c 20    t $7b, Track 0,
  0090 : d3 45 43 54 4f 52 20 31 2e 20 cf 46 46 53 45 54    Sector 1. Offset
  00a0 : 20 24 39 41 2c 20 d4 52 41 43 4b 20 30 2c 20 d3     $9a, Track 0, S
  00b0 : 45 43 54 4f 52 20 31 2e 20 cf 46 46 53 45 54 20    ector 1. Offset
  00c0 : 24 42 39 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45    $b9, Track 0, Se
  00d0 : 43 54 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24    ctor 1. Offset $
  00e0 : 44 38 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43    d8, Track 0, Sec
  00f0 : 54 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 46    tor 1. Offset $f
  0100 : 37 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    7, Track 0, Sect
  0110 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 31    or 1. Offset $11
  0120 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0130 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 33    or 1. Offset $13
  0140 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0150 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 35    or 1. Offset $15
  0160 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0170 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 37    or 1. Offset $17
  0180 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  0190 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 39    or 1. Offset $19
  01a0 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  01b0 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 42    or 1. Offset $1b
  01c0 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  01d0 : 4f 52 20 31 2e 20 cf 46 46 53 45 54 20 24 31 44    or 1. Offset $1d
  01e0 : 36 2c 20 d4 52 41 43 4b 20 30 2c 20 d3 45 43 54    6, Track 0, Sect
  01f0 : 4f 52 20 31 2e 20 00 00 00 00 00 00 00 00 00 00    or 1. ..........
CRC ok
(539 bytes since last sync)
SYNC MARK (3x $A1)
SECTOR HEADER: Track=0, Side=1, Sector=2, Size=2 (512 bytes) CRC ok
(42 bytes since last sync)
SYNC MARK (3x $A1)

In short, we have a perfectly good sector with correct CRCs on the header and body following the formatting. 

But after we write the sector, things don't look so good. I have increased the output of the program, so that I can get some better clues as to what is going on. In particular, I have it printing out every byte -- sync mark or not -- so that I can see what's happening. This is what I see:

Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $00 $00 $01 $02 $ca $6f $13 $93 $93 $93 $93 $93 $93 $93 $93 $93 $93 $80 $00 $00 $00 $00 $00 $28SECTOR HEADER: Track=0, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(25 bytes since last sync)
Sync $A1 x #1
Sync $A1 x #2
 $d7 $52 $54 $45 $20 $4f $54 $41 $4b $30 $20 $45 $54 $52 $31 $20 $49 $45 $30 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00

So we see the sector header ok. That's expected, because we don't start writing until after that has passed under the head.  Then we see the $4E padding bytes following the sector header are messed up, being $93 instead. I'm guessing that this is because we are writing slightly out of phase. That's okay, actually, because the sync bytes should get it back into phase when they come. That's their purpose, after all. But we are only seeing two sync mark bytes, and then we are seeing $D7 instead of $FB to mark the start of the data sector.

Despite that, there is actually pretty good news in this, because (a) it is writing something, and (b) I can see the string I wrote, or rather, I can see parts of it, having been written.  So our write path is close to working.

Let's look at those data bytes, and compare them to what I try to write to them, which is "Written to track %d, sector %d, side %d".  The data is written as a PETSCII string, so lower case letters will be in the range $41 to $5A, so our data bytes:

$52 $54 $45 $20 $4f $54 $41 $4b $30 $20 $45 $54 $52 $31 $20 $49 $45 $30

correspond to the string "rte otak0 etr1 ie0". That looks suspiciously to me like every other character of the string... That could also explain why we are seeing only two sync mark bytes, if we are advancing two bytes at a time, we might be writing sync mark #1 and sync mark #3, and skipping sync mark #2.
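The "every other character" theory can be checked directly against the bytes. The Python below is just a verification sketch: `to_petscii` is a minimal converter of my own that only handles what this one string needs (unshifted letters to $41-$5A, shifted letters to $C1-$DA, everything else unchanged).

```python
def to_petscii(text: str) -> bytes:
    """Minimal ASCII-to-PETSCII conversion, sufficient for this test string only."""
    out = bytearray()
    for ch in text:
        c = ord(ch)
        if "a" <= ch <= "z":
            c -= 0x20   # unshifted letters live at $41-$5A
        elif "A" <= ch <= "Z":
            c += 0x80   # shifted letters live at $C1-$DA
        out.append(c)
    return bytes(out)


intended = to_petscii("Written to track 0, sector 1, side 0")
observed = bytes.fromhex("52 54 45 20 4f 54 41 4b 30 20 45 54 52 31 20 49 45 30")

print("first byte written: $%02x" % intended[0])
print("odd-indexed bytes match the raw read:", intended[1::2] == observed)
```

The first byte ($D7, the "W") is followed by exactly the bytes at odd offsets of the intended string, which is what you would expect if the byte pointer were advancing twice per byte written.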

I have a fix for that synthesising now.

While that builds, I am testing the fix for sector reading, and that seems to have done the trick, so that's a great relief. I can now even run the dreaded "sequential program load test disk" that Deft supplied me with, which used to crash after reading only 5 or 6 of the couple of hundred files on the disk. As I type it is up to file 168 and counting, and indeed it now completes successfully!

So now to just wait for the bug fixes to the sector writing to synthesise, and then see how that looks.

And it doesn't work. It looks like it is totally butchering up the entire track.  It also takes a long time to think it has written the sector, as in long enough for several rotations of the disk.  Thus I'm suspecting that the counter that keeps track of when the MFM writer is ready for the next byte is messing up. So I'll add a couple of debug registers that let me watch that during the writing.  I might also try simulating it as well, as that might get a faster answer as to what is going wrong. It's a bit frustrating, as it now feels super close to having working sector writing.

Another day, and fresh eyes to look at the problem.  I found and fixed a couple of bugs that were responsible for this mis-behaviour.  I now have it to the point where it is writing what looks more or less like a valid sector, such that I can at least read it back:

A raw read of the track reveals that we have things more or less under control:

Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $00 $00 $01 $02 $ca $6f $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $27 $00 $00 $00 $00 $00 $00 $
00 $00 $00 $00 $00 $00 $51
SECTOR HEADER: Track=0, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(43 bytes since last sync)
Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fb
 $d7 $52 $49 $54 $54 $45 $4e $20 $54 $4f $20 $54 $52 $41 $43 $4b $20 $30 $2c $20 $53 $45 $43 $54 $4f $52 $20 $31 $2c $20 $53 $49 $44 $45 $20 $30 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
...
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $55 $83 $4e $4e $4e $4e $4e $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $24 $
24 $24 $24 $24 $24 $14
SECTOR DATA:
  0000 : d7 52 49 54 54 45 4e 20 54 4f 20 54 52 41 43 4b    Written to track
  0010 : 20 30 2c 20 53 45 43 54 4f 52 20 31 2c 20 53 49     0, sector 1, si
  0020 : 44 45 20 30 00 00 00 00 00 00 00 00 00 00 00 00    de 0............
  0030 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
  ...
  01f0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
CRC FAIL!  (included field = $5583, calculated as $4f0f)
(539 bytes since last sync)
Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe


So we can see a few things here:

1. The gap bytes are $27 instead of $4E, i.e., they are shifted one bit to the right. That is, we are skipping one bit before writing them, and so end up out of phase.  This really doesn't matter, and I don't care.

2. The sector data is correctly written, and can be correctly decoded! Yay!

3. BUT it has the incorrect CRC.  That I should fix up.

4. The gap bytes after the sector change from $4E to $24 part way through, as we switch from the gap bytes we wrote out, to the ones provided on the formatted track already. Again, not a problem that I'm worried about.  The gaps are there to get rubbish in them.

5. The following sector is correctly identified without problem.

So of all that, I really just need to fix the CRC calculation stuff.  

The first step there, is to figure out what it is actually doing with the CRC. Is it doing the CRC on one too many bytes?  Is it skipping the first byte or one or more of the sync marks etc?

Taking a close look at the VHDL, I realised it was not including the $A1 sync mark bytes or the $FB data marker in the CRC calculation. I was also writing the two bytes of the CRC out in reverse order. I'm now synthesising a fix for those problems.
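To make the CRC rules concrete, here is a Python sketch of the usual CRC-16 used on IBM-style MFM floppies (polynomial $1021, initial value $FFFF, most significant bit first). This is a reference calculation, not a transcription of the VHDL. The check value is the sector header shown in the raw track dump above (track 0, side 0, sector 1, size 2, CRC bytes $CA $6F).

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


SYNC = bytes([0xA1] * 3)                           # the three sync mark bytes
header = bytes([0xFE, 0x00, 0x00, 0x01, 0x02])     # mark, track, side, sector, size

crc = crc16_ccitt(SYNC + header)     # the sync marks ARE part of the CRC
on_disk = crc.to_bytes(2, "big")     # high byte is written first

print("CRC with sync marks:    $%04x" % crc)
print("CRC without sync marks: $%04x" % crc16_ccitt(header))
print("bytes on disk:", on_disk.hex())
```

For a data field the same function would be fed the three $A1 bytes, the $FB data mark and then the 512 data bytes. Leaving out the sync marks or the mark byte, or writing the two CRC bytes low byte first, each produces a field that a standard decoder will reject.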

But while I wait for that, I'm really tempted to try to use the HEADER command of BASIC 10 to format a disk, as it should now work if I run the CPU at 40MHz to work around the bug there.  It chugs through formatting the disk, but then fails with a 27,READ ERROR,40,0.  That error indicates a CRC error in the sector header.  

Digging through the C65 DOS source code, this could in fact be triggered by the CRC of a sector being bad. It checks $D082, and if bit 3 is set, then it fails with a 27 READ ERROR:

;* 'CheckEr'   Check for CRC/RNF status following read of currently selected drive
;*
;* If an error is detected the error handler is JuMPed to.  Preserve Z register.

CheckEr
        jsr BusyWait            ;Busywait for op to complete
        lda stata
        and #bit4+bit3          ;Check RECORD NOT FOUND and CRC
        beq noerror             ;    branch if no errors
        cmp #bit4               ;    what is the error?
        beq 10$
        bcs 20$

        ldx #5                  ;       DOS error # 23 (data CRC)
                .byte $2c
10$     ldx #9                  ;       DOS error # 27 (header CRC)
                .byte $2c

So I probably just have to be patient, and wait for that new bitstream to finish synthesising.  Which it now has, and the CRC calculation is still wrong. Now to try to work out how it is wrong, again.

Turns out when I did that fix above, I told the CRC logic to expect a byte, but didn't actually give it the value, so it is reading rubbish instead, and thus generating a bogus CRC value.  So we shall have to wait for yet another synthesis run, which means another day or two delay, as I have to be up in 8.5 hours time for work.

It's now tomorrow, and the synthesis has completed.  So first thing is to check whether we are now generating valid CRCs when writing the sectors, which we seem to be:

SECTOR DATA:
  0000 : d7 52 49 54 54 45 4e 20 54 4f 20 54 52 41 43 4b    Written to track
  0010 : 20 30 2c 20 53 45 43 54 4f 52 20 31 2c 20 53 49     0, sector 1, si
  0020 : 44 45 20 30 00 00 00 00 00 00 00 00 00 00 00 00    de 0............
  0030 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
  ...
  01f0 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
CRC ok


But if I try to format a disk, I am still getting the 27 READ ERRORs:



So let's see if we are setting the CRC error flag when reading the sector: Yes, we are correctly setting it.  On closer examination of the DOS code above, I can now see that the 27 READ ERROR is if the "RNF" = Request Not Found bit is set, which happens if the sector header can't be found, or equivalently, has a CRC error, as matches the error code according to Commodore.
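The branch structure of that CheckEr excerpt is easy to misread, so here it is restated as a small Python sketch. I am assuming that `bit3` and `bit4` in the DOS source mean bits 3 and 4 of the status byte, and the function name is my own. The case where both bits are set branches to label 20$, which is not part of the excerpt, so it is deliberately left unresolved here.

```python
RNF = 1 << 4   # assumed value of "bit4" in the DOS source
CRC = 1 << 3   # assumed value of "bit3" in the DOS source

NOT_SHOWN = "20$ (not in excerpt)"


def dos_error_for_status(stata: int):
    """Mirror the compare/branch logic of the CheckEr excerpt."""
    flags = stata & (RNF | CRC)     # and #bit4+bit3
    if flags == 0:                  # beq noerror
        return None
    if flags == RNF:                # cmp #bit4 / beq 10$
        return 27                   # header CRC / sector not found
    if flags > RNF:                 # bcs 20$ (both bits set)
        return NOT_SHOWN
    return 23                       # only the CRC bit set: data CRC error


for status in (0x00, 0x08, 0x10, 0x18):
    print("status $%02x ->" % status, dos_error_for_status(status))
```

So a 27 READ ERROR comes from RNF being set on its own, while a bad data CRC on its own gives error 23.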

Prodding about a bit more, I am noticing something funny: Reading from track 0 in my test program seems to work, but reading from track 39, the directory track, is resulting in this kind of error.  Oddly, it looks like the sector is found, even though the error indicates that it is not found.  I'll do a low-level read of that track to see if it's been butchered by writing sectors to it.

Nothing looks amiss with it, which is a bit odd.  I'm formatting the disk again with my test utility, and will try reading the sectors on track 39 again, and see if that works... and it does. I can read sectors on that track fine now.  

But trying to use the HEADER command to do a quick-format, that preserves the low-level formatting on the disk doesn't seem to work properly, either.  The fresh BAM etc don't get written. Writing sectors and formatting to D81 images works fine, so I assume the floppy controller is getting the correct information it needs to write them.  But I don't know if the quick format code tries to read before writing or not.  

So let's focus on reading sectors from disks formatted using the C65 HEADER command.  Reading from track 0 works fine, but not from any other track. But if I try a disk that was formatted in a catweazle -- or one I have formatted using my test utility -- then they read fine.  So I'm guessing that there is some subtle difference in the way the C65 ROM formats the disks, and that this is tickling our floppy controller into not accepting the sector header.

The fact that track 0 can be read fine feels like it should be a significant clue, but I'm not yet sure what it tells me.

So let's compare what we have in the sector header for track 1, sector 1, side 0 on a disk formatted by the C65 DOS, and thus doesn't work, vs the catweazle formatted disk that does work:

C65 DOS:

Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $00 $00 $01 $02 $ca $6f $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00SECTOR HEADER: Track=0, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(45 bytes since last sync)

Catweazle disk:

Sync $A1 x #1
Sync $A1 x #2
SYNC MARK (3x $A1)
Sync $A1 x #3
Data field type $fe
 $01 $00 $01 $02 $bc $db $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00SECTOR HEADER: Track=1, Side=0, Sector=1, Size=2 (512 bytes) CRC ok
(41 bytes since last sync)

Well, the obvious thing here, is that the C65 DOS has written track 0, not track 1.

That is assuming that I have correctly seeked to the track, and that I am not just seeking one track wrong. But if that were the case, then I'd expect to see the catweazle disk not working. But I want to be 100% sure.

Ah... I think I have an idea of what is going on now: Track zero has not been formatted. I think the C65 ROM somehow ends up starting to format from track 1, instead of track 0, so it looks like the track numbers are wrong. 

I have indeed confirmed that it starts formatting from track 1, instead of track 0, and thus all the track numbers are out by one, and track 0 doesn't actually get formatted. The challenge is now to work out how on earth this happens.

The format code in the C65 DOS looks like this:

FormatDisk
    jsr cleardchange
    lda #2
    sta sz            ;sector size = 512
    ldx cdrive        ;drive number
    lda #$80        ;force INIT to bump
    sta curtrk,x
    sta last_drive        ;forget previous accesses            [910129] FAB
    jsr InitCtlr        ;init controller
    jsr CheckProt        ;check write protect status
    bcs wtabort        ;   can't continue- protected

    lda $d011        ;Disable VIC                    [910815] FAB
    sta rc8            ;    to be restored later
    and #%01101111
    sta $d011        ;    blank to suspend VIC DMA's

    lda #0            ;Set starting track (0-79, initially zero)
    sta tt1

fmttrk    lda #0            ;Set starting side (0-1, initially zero)
    sta sd
    lda #side
    tsb control        ;  (1581 has physical/logical swapped)

fmtside    jsr BusyWait
    lda #1            ;Set starting sector (1-10, initially one)
    jsr docrc        ;CRC track
    jsr wttrk        ;Format track
    lda sd
    bne 10$
    inc sd            ;Flip to second side
    lda #side
    trb control        ;  (1581 has physical/logical swapped)
    bra fmtside

10$    lda tt1
    cmp #79            ;Check max track
    bcs fmtdone        ;  yes- Done
    inc tt1            ;  no-    Move to the next track

20$    jsr CheckDC        ;Check for disk change
    beq 30$            ;  no-    Continue
    jsr setnodrv        ;  yes- set no drive flag        [900822] FAB
;    lda #0
;    sta command        ;???? to get rid of "nobuf"        [910203]
    jsr hed2ts
    lda rc8            ;Reenable VIC                [910815] FAB
    and #%01111111        ;    must keep RC8 low
    sta $d011
    ldx #12            ;DOS error #75 format error        [910420]
    bra nderror

30$    lda #stout        ;Step head out to next track
    sta command
    jsr SettleHead        ;Delay 24ms
    bra fmttrk        ;  continue


fmtdone
    lda rc8            ;Reenable VIC                [910815] FAB
    and #%01111111        ;    must keep RC8 low
    sta $d011
;    lda #0
;    sta command        ;???? to get rid of "nobuf"        [910203]
    ldx cdrive
    lda #bit7        ;????force a bump
    sta curtrk,x        ;(have to- head is now at track 80, DOS thinks it's at 1)

    bit iobyte        ;Verify?                [911008] FAB
    bpl done        ;    no
    jsr VerifyDisk        ;    yes


done    ldx jobnum        ;Flag job complete
    lda #0            ;                    [901127]
    sta jobs,x
    clc
wtabort    rts

The important bit right now is InitCtlr, which seeks to track 0 as part of its function, but that seems to work fine.  I tried to find the matching code in the ROM binary, so that I could put a break point on it, and then single step through the track stepping code. However, it seems that the latest MEGA65 patched C65 ROMs have refactored that part of the code.  So I will try it with one of the unpatched ROMs, and see if that works. It might be that the patched DOS code, which has only been tested using D81 disk images, might possibly have a bug with formatting or seeking on real disks.

Nope, the auto-stepper I added to the FDC ages ago to work around a bug we had with stepping was to blame. So I have removed that, and will resynthesise. But I can disable the auto-stepper temporarily via a register for testing, which results in a disk that is formatted from track 0, instead of track 1, and thus doesn't have two track 0s. But formatting still fails for some reason.

Maybe it is the disk side flag being inverted or something like that?  But if it were that simple, then if I formatted a disk using my test program, it would work for a quick format after, and it doesn't.

I did spot one error I had, which is that my test program was formatting with sector numbers 0 to 9 instead of 1 to 10, which I have fixed. But even with that, doing a quick format fails.  What is interesting, is that the quick format *thinks* it has worked, but ends up not actually writing the various sectors to the disk.  So that's what I'll have to investigate tomorrow.

I woke up a bit early, so have squeezed in a bit of investigation:  It looks like the floppy controller was calling the virtualisation hypervisor trap when writing to real disks as well as doing the actual write. This very likely causes the RNF signal to pulse as the hypervisor says it doesn't know about any disk image or virtualised access for that disk, which the C65 DOS notices, and thus fails.  So I have fixed that, and will synthesise a fix, and check it when I get back from work.

Well, that might have been _a_ problem, but it's not the root cause.  I have taken to making an instrumented C65 DOS build, and I can confirm that attempting to write sectors in that is what is causing these errors.

Further, whatever it is that the C65 DOS does before trying to write, it causes later attempts to write to also fail.

So we finally have a clue of sorts, and now my challenge is to follow it. So let's try to collect some evidence to help us infer what is happening:

1. Both the CRC Error and Request Not Found bits get set in this situation.

2. The head is on the correct track, and I can use the $D6A3-5 registers to see the target sector appearing periodically under the head.

3. I can write to the exact sector (Physical track 39, sector 1, side 0) from my floppy test program, so it's not the disk, and it's not that the floppy controller can't write to sectors, or can't write to that exact sector.

And I think I might have just found the root cause:

My test program sets the "use real floppy drive" bit, but doesn't clear the "use disk image" bit, and there is a piece of logic in the sector write code that doesn't correctly deal with the situation when the "use real floppy drive" bit is set: It returns an immediate Request Not Found if the "use disk image" bit is clear, erroneously thinking that someone is trying to write to a drive that has no disk image, and thus has nothing to write to.

The good thing is that I can immediately test this by setting the "use disk image" bit while keeping the "use real floppy drive" bit, and then trying to format a disk.

And, it does indeed enable sector writes to happen, so I need to fix that logic.

The format completes, and I get a nice friendly READY back, without any errors, but when I try to look at the directory of the disk, something is borked. I'm suspecting the "swap" bit of our floppy controller is not behaving in a bug compatible way with the C65's floppy controller.

To check this, I have read out the header and BAM sectors from the disk I just formatted, so that I can compare them with those of the known-good disk:

Remember that the physical sectors are 512 bytes, so there are two 256 byte logical sectors in each: Thus we see the header in the first half of the first sector, and then we are seeing two BAM sectors, and a $FF $FF track sector indicator in the 4th logical sector, which should be the first directory sector, which I am suspecting is wrong. So let's look at what a good disk has in that one.

And we see that it has $28 $04 in the track/sector link bytes, and because that disk is not empty, we see the directory entries in there, as well.
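The logical-to-physical mapping used in this discussion can be written down explicitly. The Python below is a sketch assuming the standard 1581 layout: logical tracks 1 to 80, forty 256-byte logical sectors (0 to 39) per track, packed two to a 512-byte physical sector, with logical sectors 0-19 on one side and 20-39 on the other. The function name is mine, and the side value follows the numbering used in this post (the header sector being on "side 0"), ignoring the 1581's physical/logical side swap.

```python
def logical_to_physical(track: int, sector: int):
    """Map a 1581 logical track/sector to (phys_track, side, phys_sector, half).

    half is 0 for the first 256 bytes of the 512-byte physical sector and
    1 for the second 256 bytes.
    """
    if not (1 <= track <= 80 and 0 <= sector <= 39):
        raise ValueError("logical track must be 1-80 and sector 0-39")
    phys_track = track - 1                 # physical tracks count from 0
    side = sector // 20                    # 20 logical sectors per side
    phys_sector = (sector % 20) // 2 + 1   # physical sectors count from 1
    half = sector % 2
    return phys_track, side, phys_sector, half


# Header, two BAM sectors, and the first two directory sectors on logical track 40
for logical_sector in range(5):
    print((40, logical_sector), "->", logical_to_physical(40, logical_sector))
```

Under those assumptions the header (40/0) is the first half of physical track 39, sector 1, the first directory sector (40/3) is the second half of physical sector 2, and the $28 $04 link on the good disk points at the first half of physical sector 3.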

So I am guessing that something is going wrong with writing the two $FF bytes into the sector buffer.  To get some more clues, I have just checked what appears on a D81 image when I format it using the HEADER command, and it looks like this:

There we have $00 $FF as the track/sector link bytes, and what I think is an end of directory marker.

Now this is rather mysterious, as the same code path is used to populate the floppy sector buffer in both cases, and so if there was a problem with populating the buffer, it should show up on both, but it doesn't.

Trying to reproduce the problem, it seems to not be being run -- unless my patch to instrument the code has broken it.  And it is now producing a half-valid disk, on which I can run a DIR command, and it correctly reports 3160 BLOCKS FREE, but doesn't display the header line:


So there is obviously still something wrong. But I think I'll leave it for the moment, until I have confirmed that I have the logic error in sector writing fixed, so that I can do a clean format, confident that no sector writes are being missed or messed up.

I am loading up the new bitstream at the moment, and will give that a try.  At the same time, I am thinking about the funny thing that formatting D81 images works fine, but not real disks. This says to me that there should not be any problem with the format code preparing incorrect contents of sectors.  Rather, my feeling is that some sector write must still not be happening, or something like that. So I might format a D81 and then compare those track 39 sectors.

Ah, that's interesting: Even with that last fix to try to fix the "am I using a disk image or a real drive" logic, if I only set the "I am using a real drive" bit, formatting still fails.  BUT if I do as before, and set both, then it seems to write to the real disk properly.  I'm just running a complete format again, to verify that it is really happening. If that works, then it means I have some other spot where the logic is messed up for real drive vs disk image. 

Looking at the logic, I think I can see the problem. Here is what I had:

if f011_ds="000" and ((diskimage1_enable or use_real_floppy0)='0'
   or (f011_disk1_present and use_real_floppy0) ='0'
   or (f011_disk1_present ='1' and use_real_floppy0 = '0' and f011_disk1_write_protected='1'))
then

  -- Fail with Request Not Found
end if;

What we want is when diskimage1_enable=0, f011_disk1_present=0 and use_real_floppy0=1 to NOT cause a request not found.  

Can you spot the problem?

The line or (f011_disk1_present and use_real_floppy0) ='0' is the problem: If either of those signals is zero, then it will be activated, and cause the Request Not Found to be triggered, so it should instead be or (f011_disk1_present or use_real_floppy0) ='0'

So I'll resynthesise that. 

But in the meantime, I can set both bits as I did yesterday, and it should work, which it _mostly_ does. What I mean by that, is that I can format the disk, and no error is reported, but when displaying the directory listing of a disk, the header line is not shown, or has some corruption in it. I'm also wondering if it isn't behaving differently based on whether the CPU was set to fast or slow during the format.

Yes, confirmed that if the CPU is at 40MHz, then we get a disk with no header line, and if we format it with the CPU slow, then we get a mostly valid disk, but with the ID field borked up.  I'll grab a copy of the header sector in both cases, and play spot the differences.

This is the header sector that was from a "fast CPU" format, and doesn't display the header line:

 0000: 28 03 44 00 46 41 53 54 43 50 55 a0 a0 a0 a0 a0  (cD`FASTCPU               
 0010: a0 a0 a0 a0 a0 a0 d0 39 a0 1b 44 a0 a0 00 00 00        P9 {D  ```          
 0020: 01 51 00 00 00 00 00 00 00 00 00 00 00 00 00 00  aQ``````````````          
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````
 ...
 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````          
 

Then the same for slow CPU, with the disk ID set to "89", except that this has also messed up in the same way this time. Let's grab an image of that sector as well:

 0000: 28 03 44 00 53 4c 4f 57 43 50 55 a0 a0 a0 a0 a0  (cD`SLOWCPU               
 0010: a0 a0 a0 a0 a0 a0 d0 39 a0 1b 44 a0 a0 00 00 00        P9 {D  ```          
 0020: 01 51 00 00 00 00 00 00 00 00 00 00 00 00 00 00  aQ``````````````          
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````          
 ...
 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````    

So I'll try again to get one where the header appears... which it's now refusing to do. So I'll just format the D81 image, and compare the above to that.

 0000: 28 03 44 00 4d 45 47 41 36 35 a0 a0 a0 a0 a0 a0  (cD`MEGA65                
 0010: a0 a0 a0 a0 a0 a0 44 4a a0 31 44 a0 a0 00 00 00        DJ 1D  ```          
 0020: 01 51 00 00 00 00 00 00 00 00 00 00 00 00 00 00  aQ``````````````          
 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````          
 ...      
 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ````````````````   

The key differences are in the row at $0010: the disk ID field at offsets $16-$17 and the DOS version field at offset $19. In particular, the first byte of the disk ID is getting borked when we are writing to a real disk: I asked for 89 = $38 $39, but we got $D0 $39. Also, the first byte of the DOS version is $1B instead of $31.
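The same comparison can be done mechanically. The following is a minimal Python sketch (the helper name diff_rows is made up for illustration) that takes the $0010 rows from the real-disk and D81 dumps above and lists the offsets where they disagree:

```python
def diff_rows(base_offset, row_a, row_b):
    """Return (offset, byte_a, byte_b) for every position where two
    hex-dump rows differ. Rows are space-separated hex byte strings."""
    a = bytes.fromhex(row_a)
    b = bytes.fromhex(row_b)
    return [(base_offset + i, x, y)
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Row $0010 of the header sector, copied from the dumps above
REAL_DISK = "a0 a0 a0 a0 a0 a0 d0 39 a0 1b 44 a0 a0 00 00 00"
D81_IMAGE = "a0 a0 a0 a0 a0 a0 44 4a a0 31 44 a0 a0 00 00 00"

for offset, real, image in diff_rows(0x10, REAL_DISK, D81_IMAGE):
    print(f"${offset:02X}: real disk ${real:02X}, D81 image ${image:02X}")
# $16: real disk $D0, D81 image $44
# $17: real disk $39, D81 image $4A
# $19: real disk $1B, D81 image $31
```

The difference at $17 is expected, since the D81 image carries the ID "DJ" rather than "89". That leaves $16 and $19 as the suspicious bytes.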

Those are almost certainly the cause of the problems, but the big question is how on earth this happens: The code that prepares the sector doesn't know if it is writing to a real disk or a D81 file.  We know that writing to sectors itself also works, so it seems like all the necessary steps are working.  So I might have to resort to instrumenting the sector write code in the C65 DOS, so that I can take a look at the sector buffer when it goes to write, and see if the contents there really are correct or not, and then whether the write succeeds or not.  I'll attack that after the new bitstream has finished synthesising, and I have run a few errands.

Bitstream is cooked, and I can indeed now select the real drive, and format a disk as above, without having to set the D81 image present flag, so that's a good step forward.  And now when I format, I am getting a header line, just with the borked disk ID and DOS version fields.  Again, it's the first byte of those two-byte fields that is the problem.

The code that builds the header sector looks like this, including the bit of debug code I added that should flash the border and wait for me to press a key before proceeding:

20$ jsr intdsk  ;init disk
 jsr initdr  ;clear directory only
 lda dskver  ;Get the disk's format type
 beq 30$   ;No version is okay
 cmp fmttyp  ;Is it our disk formating method?
 bne vnerr  ;wrong version #

30$
* PGS DEBUG XXX
zoop
 inc $d020
 lda $d016
 beq zoop
 sta $d016
 
 lda jobnum
 tay
 asl a
 tax
 lda #dsknam  ;set ptr to disk name
 sta buftab,x
 ldx filtbl
 ldz #27
 jsr trname  ;transfer cmd buf to bam
 ldy #0
 sty dirbuf  ;reset lsb
 ldx drvnum  ;      [900415]
 lda dirtrk,x
 sta (dirbuf),y  ;directory track
 iny
 lda #sysdirsec
 sta dirst,x  ;      [900415]
 sta (dirbuf),y  ;link to first dir blk
 iny
 lda fmttyp  ;Get our formating method
 sta dskver  ;Make it the current one
 sta (dirbuf),y  ;And this disk's format method
 iny
 lda #0   ;null
 sta (dirbuf),y
 ldy #22   ;skip name
 txa   ;      [900415]
 asl a
 tax
 lda dskid,x  ;Just in case it was never formatted
 bne 40$
 lda #'D'
40$ sta (dirbuf),y
 iny
 lda dskid+1,x
 bne 45$
 lda #'J'
45$ sta (dirbuf),y
 iny
 lda #160  ;shifted space
 sta (dirbuf),y
 iny
 lda dosver  ;Send out this dos's version number
 sta (dirbuf),y
 iny
 lda dskver  ;Get the current disk's formatting
 bne 31$   ;Method, if no number then okay

 lda fmttyp  ;Else use our version instead
31$ sta (dirbuf),y
 iny
 lda #160  ;shifted space
 sta (dirbuf),y
 iny
 sta (dirbuf),y
 iny
 lda #0   ;Pad remainder of the directory
32$ sta (dirbuf),y  ;Sector with zeros
 iny
 bne 32$

It all looks pretty straight-forward. 
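To make the resulting layout easier to see, here is a rough Python model of what that routine writes. It is a sketch under assumptions, not a translation of the ROM: the function name is mine, the default values ($28, $03, $44, $31) are simply read off the D81 dump above, and the name field is assumed to occupy offsets 4-21, padded with shifted spaces, as the dumps suggest.

```python
SHIFTED_SPACE = 0xA0


def build_header(name, disk_id, dir_track=0x28, first_dir_sector=0x03,
                 format_type=0x44, dos_version=0x31):
    """Build a 256-byte header sector following the layout in the listing."""
    sector = bytearray(256)          # remainder of the sector stays zero
    sector[0] = dir_track            # directory track
    sector[1] = first_dir_sector     # link to first directory block
    sector[2] = format_type          # this disk's format method
    sector[3] = 0                    # null
    # Disk name: assumed 16 characters max, padded to 18 bytes with $A0
    padded = name.encode("ascii")[:16].ljust(18, bytes([SHIFTED_SPACE]))
    sector[4:22] = padded
    # Disk ID, with the 'D' / 'J' fallback used when a byte is zero
    sector[22] = disk_id[0] or ord("D")
    sector[23] = disk_id[1] or ord("J")
    sector[24] = SHIFTED_SPACE
    sector[25] = dos_version
    sector[26] = format_type
    sector[27] = SHIFTED_SPACE
    sector[28] = SHIFTED_SPACE
    return bytes(sector)


header = build_header("MEGA65", bytes(2))   # no ID supplied
print(header[0x10:0x20].hex())
# a0a0a0a0a0a0444aa03144a0a0000000
```

With no ID supplied, the model reproduces the first two rows of the D81 dump. Note also that the routine copies whatever is in the ID bytes as long as they are non-zero, so a bad byte such as $D0 passes straight through to offset $16.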

Single-stepping through, the bogus byte at offset $16 of the header sector is coming from the lda dskid,x instruction.  That location has $D0 in it in this test run, thus causing the first byte of the disk ID in the header sector to be set to $D0.  My guess is that the disk ID is not being fully copied into place, but I will have to confirm this by trying to format a D81 image. In short, it is possible that it is a bug with the C65 DOS, although I don't remember it being a problem on the C65 I had -- but maybe it's only present in certain versions of the C65 ROM. But, anyway, let's continue to the other field that also has the first byte borked.

So for the format ID, we get $31 $44 = "1D", which are in fact the correct values for the C65, so that's a red herring -- it's only the disk ID that is the problem.

And investigating that, it looks like the C65 DOS command buffer has the $D0 byte in it:

:00010200:4E303A504F5441544F2CD03500352A00  N0:POTATO,<$D0>5<NULL>
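For anyone who doesn't read hex fluently, this short Python sketch splits that buffer into its parts. The helper name is hypothetical, and it assumes the command ends at the first NUL byte and has the N0:name,id shape seen here:

```python
def parse_new_command(buffer_hex):
    """Split a DOS 'N0:name,id' command buffer into (name, id_bytes)."""
    raw = bytes.fromhex(buffer_hex)
    command = raw.split(b"\x00", 1)[0]        # keep everything before the first NUL
    head, _, disk_id = command.partition(b",")
    name = head.split(b":", 1)[1]             # drop the "N0:" prefix
    return name.decode("ascii"), disk_id


name, disk_id = parse_new_command("4E303A504F5441544F2CD03500352A00")
print(name, disk_id.hex())  # POTATO d035
```

The second ID byte ($35, "5") is intact, while the first has already become $D0 by the time the command reaches the DOS.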

My suspicion is that the BASIC 10 HEADER command is not copying the ID field into place correctly, which I can verify by using the form HEADER "NAME,ID" instead of HEADER "NAME",IID, because in the second form BASIC parses the ID field, while in the first form it just passes the string intact.  So I am formatting a disk that way, and we will soon discover if it solves the problem.

... and of course the corruption of the DOS version field has chosen this moment to reappear, and thus I get an empty directory listing:


Most frustrating. But I can at least use the floppy test program to read the header sector and see whether the disk ID got written properly or not, which it does:

So at least we know the cause of that one.

Digging through BASIC, I can see that the dosdid bytes get set correctly while parsing the command initially, but then the first one seems to get stomped on at some point before the command is built for dispatch.

The BASIC ROM stores the disk ID in $118D and $118E, so I'll keep an eye on those for unexpected modification. The values are still intact when BASIC asks the user if they are sure they want to format the disk, so far, so good.... And in fact, they are still intact when the header command is running. So this looks like the command building stage is where it breaks.

Ok, that's weird: It's fine before and after it builds the DOS command, but the DOS command still gets built wrongly.  Maybe something in the DOS command building breaks it after it has been written correctly?

Bug found. Here is the faulty code-snippet:


***
rid
*** lda dosdid       ;include id
    sta $0850
    sta dosstr,x
    inx
    lda dosdid+1
    sta $0851
    jsr pgs_debug_key_wait
    bra sdp5         ;always

Can you spot the bug?  Post a comment when you spot it :)

It turns out the other bug was also due to an error in the C65 DOS ROM, which BitShifter kindly found for me quite quickly, once I had used the CPU watch point to trigger when it was being corrupted.  I'm very glad I fixed that facility again :)

So now I am running a HEADER command on a real disk, with disk ID set, at 3.5MHz, and let's see how it goes.  I shall be very pleased if it has the correct disk ID and DOS version info. You know, if I get an output something like this:


Yay! I shall now stop, before anything can go wrong :)

Seriously though, this is a really nice milestone, as it means that we have working read and write and format support for the internal floppy drive.