Sunday 12 September 2021

Adding transparent support for HD floppies

The internal floppy drive in the MEGA65 is actually a standard PC-compatible HD / 1.44MB floppy drive, so while the C65 DOS only understands DD disks, the hardware is capable of more.

While we could just stick to DD media, there are some good reasons to support HD media.  One of the key ones is that the MEGA65's advanced features mean that it's quite conceivable to imagine a game that would like more than 800KB of data on a disk.  In fact, this thought was triggered exactly by @Shallan50k wanting to fit 1MB maps for his kart-racer game for the MEGA65 onto a disk.

There are several challenges that I would like to attack in doing this:

1. The resulting disks should work with the C65 DOS, without modification, at least to the extent of being able to get a directory listing, and load one modest size program from it.

2. The disk format should allow creating disks that can be written to using the MEGA65's floppy controller, if that is the user's intention.

3. For mastering disks for games/software distribution, we don't care about write-ability (at least not on all tracks), but we would really like to be able to cram as much data as possible onto a disk.  By reducing the inter-sector gaps, it is possible to fit more sectors on a track. That's how the Amiga gets 880KB on a DD disk, compared to the 1581's 800KB.

4. We need a common disk image format that can be used for all variants of the above, to ease software development, and make it possible to run these images from SD card.

(1) and (2) are the easiest ones to solve. In fact, I have solved them already, by implementing a 2nd parallel MFM decoder in the MEGA65's floppy controller, that runs at 2x the data rate of the main decoder. As HD disks run at 2x the data rate, this means that we are automatically able to read (but not write) HD formatted disks, without having to modify the C65 DOS -- but the C65 DOS will only see the first 10 sectors on each track, which is totally fine for goal (1), as the directory listing can appear, and we can load up to ~800KB of files from it, by using only the DD-compatible sector numbers.

The floppytest.prg program in the https://github.com/mega65/mega65-tools repository now includes options to format and test HD-formatted disks created in this way, with 20 sectors per side instead of 10 sectors per side, giving a 1600KB fully-writeable disk. 

To be able to write to such disks, you just have to poke $28 into $D6A2, to set the floppy controller to the HD data rate, so that the MFM encoder (of which there is still only one) is looking for HD-formatted sectors.
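
In C, as compiled with CC65 for tools like floppytest.prg, that's just a single register write. Here's a minimal sketch (assuming the usual volatile-pointer idiom for poking an I/O register, and that the MEGA65 I/O area is already mapped):

/* Minimal sketch: select the HD data rate by writing $28 to $D6A2, as
   described above. Assumes MEGA65 I/O is mapped at $D000-$DFFF. */
#define FDC_DATA_RATE (*(volatile unsigned char *)0xD6A2)

void select_hd_rate(void)
{
  FDC_DATA_RATE = 0x28;   /* 40.5MHz / 40 cycles per magnetic interval */
}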

So that's all solved.

It's with (3) that it gets more interesting. We could just go Amiga style, and settle for 22 sectors per side per track, and thus get 1760KB per disk. But I know that we can fit more.  For a start, we can go 1541 style, and vary the data encoding rate on different tracks, and fit more sectors onto the outer tracks, since the normal data rate is good enough for the inner-most track.  But before we do that, let's talk for a bit about how a floppy works, and some of the important aspects of magnetic recording that affect us.

Magnetic grain size refers to the size of the individual magnetic domains on the floppy.  For the inner-most tracks, fewer magnetic domains will pass under the head per second, so we have to use longer magnetic pulses, i.e., store less.  For the outer tracks, more magnetic domains will pass under the head per second, so we can have more data on each track out there.  I'll talk about the relative length of the inner and outer tracks shortly.

Magnetic signal strength is how strong the raw signal from the floppy drive is. Its strength scales with the square of the velocity of the magnetic transitions going past the head. So on the inner tracks, the media is passing by more slowly, and the signal will be quite a bit weaker than on those lovely longer outer tracks, where that longer circumference goes past in the same time, i.e., at higher speed.

So in short, we should absolutely be able to cram more sectors on those outer tracks by increasing the data rate.  

By way of comparison, the 1541, which used only 35 tracks, varies between 17 and 21 sectors per track. That is, on the outer tracks it crams in 21/17 ≈ 124% as much data as on the inner tracks, i.e., about 24% more. And fully half the disk uses that maximum number of sectors per track, as it really is only the innermost few tracks that are so short as to be a problem. In fact, only the innermost 5 tracks on the 1541 have the minimum 17 sectors per track.  Thus compared with a naive 17 x 35 = 595 sectors, the 1541 actually fits 683 sectors, i.e., almost 115% of the constant-sector-count capacity.

So let's think about what should be possible on an 80 track 3.5" floppy disk. We know that the floppy drive can read a standard 11 sectors per inner-track format, because that's what the Amiga did. So we will scale up from there. I'll come back to how we actually produce such tracks later.

First up, we need to know that track 0 is actually the outermost, i.e., longest track, so we will fit more sectors on lower-numbered tracks, and fewer sectors on the higher-numbered tracks.

This site claims the following about 3.5" disks:

        track spacing: .0074 inches
        drive track radius from center (inner to outer)
                 side 0: .9719 inches to 1.5551 inches
                 side 1: .9129 inches to 1.4961 inches
        track width: .115 mm (.0045 inch) after "trim erase" on either side (not confirmed for 4MB format)
                also see erase notes

If this is true, and the difference really is 8 tracks, I find that quite amazing, as it means that we could have had an extra 8 tracks on side 1, and that side 1 should have much worse properties near the inner-most track than side 0 has -- a property that we might be able to exploit.

But for now, let's just assume the smallest benefit, which comes from the largest diameters, i.e., those of side 0: the outer-most track is 1.5551/0.9719 = 1.6 times as long as the inner-most track, i.e., 160% of its length.  I was expecting some nice benefit, but fully 160% is even more than I had anticipated.  This means that we should be able to fit 160% x 22 = 35 sectors on the outermost track (scaling up the 22 HD sectors -- the Amiga's 11 DD sectors doubled for the HD data rate -- that fit on the inner-most track), instead of the 18 that a PC fits. Even the Amiga's HD floppy's "cramming" of 22 sectors onto the track starts to look quite lame.

So let me do a bit of a calculation here as to how many sectors we can fit, using Amiga style track-at-once, and also 1581-style sector-at-once fitting. Basically we work out the relative length of the tracks versus the inner-most track, and then scale up the number of sectors to the largest integer that fits.
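
Here is a minimal sketch of that calculation in C, using the side 0 geometry quoted above. The base sector counts -- 20 per track with 1581-style gaps, 22 Amiga-style, on the inner-most track -- are my assumptions for the sketch (the DD figures doubled for the HD data rate), so treat the output as illustrative:

#include <stdio.h>

/* Sketch of the sectors-per-track calculation described above.
   Geometry is the side 0 figures quoted earlier; the base counts of
   20 (1581-style) and 22 (Amiga-style) sectors on the inner-most
   track are assumptions for the sketch. */
#define TRACKS       80
#define TRACK_PITCH  0.0074   /* inches between tracks        */
#define INNER_RADIUS 0.9719   /* side 0 inner-most track, in. */

int main(void)
{
  int t, s1581 = 0, samiga = 0;
  for (t = 0; t < TRACKS; t++) {
    /* Track 0 is the outer-most, so it has the largest radius. */
    double radius = INNER_RADIUS + (TRACKS - 1 - t) * TRACK_PITCH;
    double scale  = radius / INNER_RADIUS;
    int n1581  = (int)(20 * scale);   /* largest whole sector count that fits */
    int namiga = (int)(22 * scale);
    s1581  += n1581;
    samiga += namiga;
    printf("Track #%02d : %3d/%5d (%4d KB) : %3d/%5d (%4d KB)\n",
           t, n1581, 2 * s1581, s1581, namiga, 2 * samiga, samiga);
  }
  return 0;
}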

Let's look at how many sectors each track would fit, and the cumulative number of sectors on the disk to that point using 1581 and Amiga style track writing:

              1581-style (sector)   Amiga-style (track)
Track #00 :   32/  64 (  32 KB) :   35/  70 (  35 KB)
Track #01 :   31/ 126 (  63 KB) :   35/ 140 (  70 KB)
Track #02 :   31/ 188 (  94 KB) :   34/ 208 ( 104 KB)
Track #03 :   31/ 250 ( 125 KB) :   34/ 276 ( 138 KB)
Track #04 :   31/ 312 ( 156 KB) :   34/ 344 ( 172 KB)
Track #05 :   31/ 374 ( 187 KB) :   34/ 412 ( 206 KB)
Track #06 :   31/ 436 ( 218 KB) :   34/ 480 ( 240 KB)
Track #07 :   30/ 496 ( 248 KB) :   34/ 548 ( 274 KB)
Track #08 :   30/ 556 ( 278 KB) :   33/ 614 ( 307 KB)
Track #09 :   30/ 616 ( 308 KB) :   33/ 680 ( 340 KB)
Track #10 :   30/ 676 ( 338 KB) :   33/ 746 ( 373 KB)
Track #11 :   30/ 736 ( 368 KB) :   33/ 812 ( 406 KB)
Track #12 :   30/ 796 ( 398 KB) :   33/ 878 ( 439 KB)
Track #13 :   30/ 856 ( 428 KB) :   33/ 944 ( 472 KB)
Track #14 :   29/ 914 ( 457 KB) :   32/1008 ( 504 KB)
Track #15 :   29/ 972 ( 486 KB) :   32/1072 ( 536 KB)
Track #16 :   29/1030 ( 515 KB) :   32/1136 ( 568 KB)
Track #17 :   29/1088 ( 544 KB) :   32/1200 ( 600 KB)
Track #18 :   29/1146 ( 573 KB) :   32/1264 ( 632 KB)
Track #19 :   29/1204 ( 602 KB) :   32/1328 ( 664 KB)
Track #20 :   29/1262 ( 631 KB) :   31/1390 ( 695 KB)
Track #21 :   28/1318 ( 659 KB) :   31/1452 ( 726 KB)
Track #22 :   28/1374 ( 687 KB) :   31/1514 ( 757 KB)
Track #23 :   28/1430 ( 715 KB) :   31/1576 ( 788 KB)
Track #24 :   28/1486 ( 743 KB) :   31/1638 ( 819 KB)
Track #25 :   28/1542 ( 771 KB) :   31/1700 ( 850 KB)
Track #26 :   28/1598 ( 799 KB) :   30/1760 ( 880 KB)
Track #27 :   27/1652 ( 826 KB) :   30/1820 ( 910 KB)
Track #28 :   27/1706 ( 853 KB) :   30/1880 ( 940 KB)
Track #29 :   27/1760 ( 880 KB) :   30/1940 ( 970 KB)
Track #30 :   27/1814 ( 907 KB) :   30/2000 (1000 KB)
Track #31 :   27/1868 ( 934 KB) :   30/2060 (1030 KB)
Track #32 :   27/1922 ( 961 KB) :   29/2118 (1059 KB)
Track #33 :   27/1976 ( 988 KB) :   29/2176 (1088 KB)
Track #34 :   26/2028 (1014 KB) :   29/2234 (1117 KB)
Track #35 :   26/2080 (1040 KB) :   29/2292 (1146 KB)
Track #36 :   26/2132 (1066 KB) :   29/2350 (1175 KB)
Track #37 :   26/2184 (1092 KB) :   29/2408 (1204 KB)
Track #38 :   26/2236 (1118 KB) :   28/2464 (1232 KB)
Track #39 :   26/2288 (1144 KB) :   28/2520 (1260 KB)
Track #40 :   26/2340 (1170 KB) :   28/2576 (1288 KB)
Track #41 :   25/2390 (1195 KB) :   28/2632 (1316 KB)
Track #42 :   25/2440 (1220 KB) :   28/2688 (1344 KB)
Track #43 :   25/2490 (1245 KB) :   28/2744 (1372 KB)
Track #44 :   25/2540 (1270 KB) :   27/2798 (1399 KB)
Track #45 :   25/2590 (1295 KB) :   27/2852 (1426 KB)
Track #46 :   25/2640 (1320 KB) :   27/2906 (1453 KB)
Track #47 :   24/2688 (1344 KB) :   27/2960 (1480 KB)
Track #48 :   24/2736 (1368 KB) :   27/3014 (1507 KB)
Track #49 :   24/2784 (1392 KB) :   27/3068 (1534 KB)
Track #50 :   24/2832 (1416 KB) :   26/3120 (1560 KB)
Track #51 :   24/2880 (1440 KB) :   26/3172 (1586 KB)
Track #52 :   24/2928 (1464 KB) :   26/3224 (1612 KB)
Track #53 :   24/2976 (1488 KB) :   26/3276 (1638 KB)
Track #54 :   23/3022 (1511 KB) :   26/3328 (1664 KB)
Track #55 :   23/3068 (1534 KB) :   26/3380 (1690 KB)
Track #56 :   23/3114 (1557 KB) :   25/3430 (1715 KB)
Track #57 :   23/3160 (1580 KB) :   25/3480 (1740 KB)
Track #58 :   23/3206 (1603 KB) :   25/3530 (1765 KB)
Track #59 :   23/3252 (1626 KB) :   25/3580 (1790 KB)
Track #60 :   23/3298 (1649 KB) :   25/3630 (1815 KB)
Track #61 :   22/3342 (1671 KB) :   25/3680 (1840 KB)
Track #62 :   22/3386 (1693 KB) :   24/3728 (1864 KB)
Track #63 :   22/3430 (1715 KB) :   24/3776 (1888 KB)
Track #64 :   22/3474 (1737 KB) :   24/3824 (1912 KB)
Track #65 :   22/3518 (1759 KB) :   24/3872 (1936 KB)
Track #66 :   22/3562 (1781 KB) :   24/3920 (1960 KB)
Track #67 :   21/3604 (1802 KB) :   24/3968 (1984 KB)
Track #68 :   21/3646 (1823 KB) :   23/4014 (2007 KB)
Track #69 :   21/3688 (1844 KB) :   23/4060 (2030 KB)
Track #70 :   21/3730 (1865 KB) :   23/4106 (2053 KB)
Track #71 :   21/3772 (1886 KB) :   23/4152 (2076 KB)
Track #72 :   21/3814 (1907 KB) :   23/4198 (2099 KB)
Track #73 :   21/3856 (1928 KB) :   23/4244 (2122 KB)
Track #74 :   20/3896 (1948 KB) :   22/4288 (2144 KB)
Track #75 :   20/3936 (1968 KB) :   22/4332 (2166 KB)
Track #76 :   20/3976 (1988 KB) :   22/4376 (2188 KB)
Track #77 :   20/4016 (2008 KB) :   22/4420 (2210 KB)
Track #78 :   20/4056 (2028 KB) :   22/4464 (2232 KB)
Track #79 :   20/4096 (2048 KB) :   22/4508 (2254 KB)


So remember that this is on a nominal "1.44MB" floppy, and using only 80 tracks. With 82 or 84 tracks, we can squeeze a bit more out. But remember those high-numbered tracks are inner-tracks, so the benefit will only be quite small.

What is interesting with the 1581-style approach is that we end up with exactly 2MiB. I have no idea if that is coincidence or part of the "2MB unformatted capacity" that is touted around 3.5" HD disks.  It might well be the latter, as this capacity calculation is based on constant bits per inch.

Now, coming back to "practical land", we can see that 32 sectors per track is the most we can fit with a 1581-style format that allows disks to be written to using normal sector operations; for about a 10% capacity increase, we need to deal with up to 35 sectors per track.  For those not familiar with floppy formatting, the Amiga squeezes its 10% extra capacity out of disks by having much shorter gaps between the sectors, because it doesn't need to tolerate variation in rotational speed between the drive that formatted the disk, and the drive that is writing to the disk right now.

If you have been reading these blog posts recently, you will know that the C65 DOS writes 71 gap bytes in addition to the 512 data bytes for every sector.  There is also an overhead of another 13 bytes per sector that is unavoidable, to mark where the sector starts and ends, and which sector it is. So, in short, the 1581 uses 13 + 512 + 71 = 596 bytes per sector written.  The Amiga reduces the number of gap bytes, so that it can safely fit an extra sector on.

Think about it like this: The 1581 requires 596 bytes to write a single sector, and fits 10 on a track, so needs a track to fit 5,960 bytes on it. If the Amiga wants to fit 11 sectors, it needs to reduce that down to 5,960 / 11 = ~541 bytes per sector.  We know we can get away with as little as 13 + 512 = 525 bytes per sector, and 525 x 11 < 5,960, so the Amiga can fit the extra sector in. But 525 x 12 = 6,300 bytes, which is a bit too much, so this is why the Amiga couldn't fit 12 sectors per track.
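
The same budget check, written down as a trivial helper (a sketch using the byte counts from the text):

/* Sketch: how many sectors fit in a track of a given size, using the
   per-sector byte counts from the text: 13 bytes of unavoidable
   overhead, 512 data bytes, and 71 gap bytes for 1581-style
   sector-at-once writing. */
#define SECTOR_OVERHEAD 13
#define SECTOR_DATA     512
#define SECTOR_GAP      71

int max_sectors(int track_bytes, int with_gaps)
{
  int per_sector = SECTOR_OVERHEAD + SECTOR_DATA + (with_gaps ? SECTOR_GAP : 0);
  return track_bytes / per_sector;
}

/* max_sectors(5960, 1) == 10 (1581-style), max_sectors(5960, 0) == 11 (Amiga-style) */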

Now, back to our situation: if we use that refined information, scaling the 5,960 bytes of the inner-most track and needing only 525 bytes per sector, it's possible we can squeeze an extra sector per track in here and there.  But it's probably not worth flying quite that close to the wire.

If we were going to do that, we could just add support for some kind of bizarre "super sector" that fills a whole track with as many bytes as we can.  The CRC16 would then probably not be strong enough anymore, and we would probably want to consider using an even higher data rate and using some appropriate error correction code to handle the kinds of errors that happen on floppy media.  I might do such a thing in the future, but for now, I think it's overkill.

The real question is whether we think it's worth fitting an extra 10% on a disk in return for not being able to write to it sector-at-once. Or more to the point, whether I should make the corresponding disk image format allow for this or not.

I'm really tempted to stick with 2048KB, as it means 32 sectors per track, which is easy to implement in the hardware for track offset calculation, and is just a pleasant round number. 

There is also a kind of half-way house, where we could have 32 sectors on the first 20 tracks, written track-at-once, and then have normally writeable tracks after that, to squeeze a few extra KB out of the disk.  This wouldn't require a different disk image format, as I am planning on just making the disk image format be 80 tracks x 32 sectors per track x 2 sides x 512 bytes = 2,560KB. When on SD card, you would have that full capacity available, and if you write the image to a real disk, there is some dead space. But maybe I will just stop whinging about multiplying by 35 in hardware, and make the image format allow for, say, 40 sectors per track = 3,200KB total size, in case I come up with future improvements that allow reaching that (like finding out how much we can creep the data rate up, overall ;)

This would mean that the programmer has the responsibility to know which sectors are safe to write to, but I think that's not unreasonable, since there will already need to be some mechanism for changing the data rate based on which track you are on, which leads me back to a particular problem... the directory track.

If we are going to support these disks, we still need to have some magic to make reading the directory track work, at the very least. The loader program could be required to live on that track, or more easily, on track 79, which would be at the normal data rate.

We also have to look at whether we can specify the data rate accurately enough to get all of these track sizes.  The data rate is specified in "40.5MHz cycles per magnetic interval", with normal HD disks at a setting of $28 = 40.  For the outer-most tracks we want 160% of the data rate, so we need 40 / 1.6 = 25 = $19.  So we have 15 different steps along the way, although because they are based on different divisors, they are not equally spaced.

I'll tweak my little program to work out the data rates that are possible on each track from the real ones available, and map the track fitting to those.  Hopefully it won't result in the loss of too many actual sectors on the disk format.  If it does, I could look at changing the way the floppy data rate is calculated from a simple divisor to an accumulator approach, that would allow much more accurate specification. But we will see if it is necessary, first.
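
For illustration, the mapping might look something like this -- a sketch only, assuming the usable divisors are the whole numbers from 25 ($19) to 40 ($28), and reusing the side 0 geometry from earlier:

#include <math.h>

/* Sketch: choose the divisor (40.5MHz cycles per magnetic interval)
   for a track, scaling up from the HD divisor of 40 on the inner-most
   track, while keeping the excess data rate within the given
   tolerance (e.g. 0.017 for 1.7%). */
int divisor_for_track(int track, double tolerance)
{
  double radius = 0.9719 + (79 - track) * 0.0074;   /* side 0 geometry  */
  double scale  = radius / 0.9719;                  /* length vs inner  */
  double ideal  = 40.0 / scale;                     /* smaller = faster */
  int divisor   = (int)ceil(ideal / (1.0 + tolerance));
  if (divisor < 25) divisor = 25;                   /* $19              */
  if (divisor > 40) divisor = 40;                   /* $28              */
  return divisor;
}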

So by using the rates we currently have available, we need to allow a tolerance of only 1.7% to get exactly 2048KB on a disk.  If we are strict and require 0% maximum excess data rate, then it drops to 2011KB.  If we were to allow 5% over-rate, then we could fit 2116KB, which says very much diminishing returns to me. These are all for sector-by-sector capacities.  For Amiga-style track-at-once writing, then the capacity would be 2403KB at 5%, 2328KB at 1.7% and 2287KB with strict 0% excess data rate.

If we really want to get that bit extra, then it probably makes much more sense to just use 82 or 84 tracks, which almost all drives can read, which at 1.7% speed tolerance allows up to 2,416KB on an Amiga-style track-at-once disk, and 2,128KB on a 1581-style sector-at-once disk.

So, back to specifying a convenient disk image format: I am going to go for 40 sectors x 85 tracks maximum = 3,400KiB = 3,481,600 bytes. Of course, as described above, only about 2MB of that will be usable on a real floppy, but this inefficiency is the cost of letting the MEGA65's SD card floppy emulation logic handle these images efficiently, while at the same time allowing us to extract some future improved capacity out of the real floppy drive -- although I think 2MB -- 2.2MB on nominally 1.44MB media is still a pretty nice result.

Thus I will now turn my attention to testing the feasibility of all of this by writing some code to actually master such variable data-rate disks, and making sure that the sectors fit, and that they can be reliably read back.  This will probably also require getting some other folks to do the same, to test repeatability on different drives -- although the fact that we are keeping within 1.7% of the fairly conservative officially supported data rates of the media gives me a fair bit of confidence.

What I am more interested in finding out, is just how close we actually come to filling tracks with these extra sectors at those rates: Does it all fit at all? Is there enough spare space to try cramming an extra sector or two onto some of the tracks?

To answer these questions, I am refactoring the floppytest.c program, so that I can request the writing of individual tracks, and then try reading the sectors on them back, to find out how many sectors at which data rates fit on a track, and confirm whether my back-of-envelope calculations above are correct.

In the process of doing that, I hit a funny bug, which I think is in CC65, where adding a little bit of extra code was causing some unrelated stuff to crash.  In particular, as I refactored out the track formatting code, the track reading code would break, even if I never called the formatting code first.  I've seen funny things like this with CC65 compiled programs before, and don't really know the cause.  But in this case, I could at least see that the generated code was incorrect.

In the end, I worked around it by reducing the amount of code a bit. The removed code was incidentally writing to $C0xx, which can end up on the CC65 C stack, but that is at the bottom of the 4KB stack, and shouldn't change what code the compiler produced for the track reading code. Anyway, it's a bit of a mystery, and I might have to keep an eye on it, to see if it happens more.

Anyway, now that I have code to format a track factored out, I'll start work on code that tries various data rates and counts how many sectors it can fit -- both with gaps for sector-at-once writing, and without them, for Amiga-style track-at-once writing, and the ~10% more sectors it should allow us to fit on each track.

That was easy to get working, and I have confirmed that it is writing tracks without the gaps. I can tell this, because the track read testing program is no longer able to keep up with an interleave of 2, because the sectors come around a bit quicker.

So next step is to make a routine that tries formatting a single track multiple times at multiple data-rates, and then checks which sectors can be read back.

Thinking ahead to a denser coding, I was reminded that what I want to use is RLL2,7 coding, not GCR coding, which actually is no more efficient than MFM.  RLL2,7 coding is a bit funny, because it uses variable length codes for different bit patterns, such as the following:

Input    Encoded

11       1000
10       0100
000      100100
010      000100
011      001000
0011     00001000
0010     00100100 

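To make the table concrete, here is a sketch of an encoder that uses it. The input groups happen to form a prefix-free set, so at each position exactly one entry can match. This is just an illustration of the table, not the MEGA65's actual RLL encoder:

#include <string.h>

/* Sketch: RLL2,7-encode a string of '0'/'1' data bits into channel
   bits using the table above. Trailing input bits that are too short
   to complete a code group are simply dropped. Illustration only. */
static const char *rll_in[]  = { "11", "10", "000", "010", "011",
                                 "0011", "0010" };
static const char *rll_out[] = { "1000", "0100", "100100", "000100",
                                 "001000", "00001000", "00100100" };

void rll27_encode(const char *bits, char *out)
{
  size_t i, pos = 0;
  out[0] = '\0';
  while (bits[pos]) {
    for (i = 0; i < 7; i++) {
      size_t len = strlen(rll_in[i]);
      if (strncmp(bits + pos, rll_in[i], len) == 0) {
        strcat(out, rll_out[i]);
        pos += len;
        break;
      }
    }
    if (i == 7) break;  /* incomplete trailing group */
  }
}
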
I also spent a long time trying to find out what the sync mark is for RLL2,7 encoding, and eventually found out from here, that it might be:

 1000100010001000100001001000100010001000

For it to be correct, it has to be impossible to build this by concatenating the pieces above. So let's try, beginning at all possible bit offsets into the sequence:

Starting at bit 0: it would decode as: 

1000 1000 1000 1000 1000 0100 1000 1000 1000 1000

 11   11   11   11   11   10   11   11   11   11

So that decodes, which means it shouldn't be the sync mark.

I eventually found the answer in this thesis:

RLL2,7 cannot generate 100000001001: no code word ends with more than three zeroes, so the only way to get a run of seven zeroes is for a code word ending in 000 to be followed by 00001000, the only code word with four leading zeroes -- and that forces the bits after the run to be 1000, not 1001. So the sync mark can never appear in encoded data -- super simple :)

Meanwhile, back in the world of MFM encoded disks, I did some more work on figuring out just how much I can cram onto each track using MFM coding with a variable data rate per track.  In fact, I live-streamed for a couple of hours this morning working on it.

So now I have a tool that will let me try different data rates on every track, with or without inter-sector gaps, and report the highest number of sectors that could be written and read back.  This takes quite a while to run, as it of course hits bad sectors, which have several seconds of timeout.  Writing this, I just realised an efficiency problem with this code: it would try all sectors, even after one had failed, which slows things down on the later tracks, where we know for sure they can't hold all the sectors. More to the point, if we get any error on a track, we stop counting at that point, so there is no point continuing after the first error. So I have set that running again now.

What I am also doing now is synthesising VHDL that will allow a complete track format command, without this need to do software feeding of every byte as it goes along.  The reason for this is that I was unable to increase the data rate to the level where I think it should go, and I think the actual problem is that the CC65 compiled code is just too slow to reliably feed the bytes.  So it's possible that by moving to hardware-assisted formatting I will be able to recover those extra potential sectors.  For example, I think I should be able to fit 36 or more sectors on track 0, but can't get above 31.  So the difference is potentially quite large.

In fact, as I discovered on the stream, it's possible to cram an extra couple of sectors on a track, so it is probably more like 38 sectors as the maximum, compared to the 31 I am seeing now. If I can pull that off, we should be able to get 2,176KiB on a sector-by-sector disk and 2,471KiB on an Amiga-style track-at-once disk.  All of this is also equally relevant for the RLL2,7 encoding, as it still means +50% over whatever we can do on MFM, plus potentially the odd extra sector here or there.

So a sector-by-sector disk with 3,264KiB (which is 3.34MB using floppy "marketing MB" of 1,024,000 bytes like the 1.44MB standard does), or an Amiga-style track-at-once disk of 3,742KB (= 3.83MB in floppy marketing terms) should be possible.  Thus my dream of outclassing a 2.88MB ED floppy drive with a standard HD drive and media really does look to be in reach -- assuming a lot of things yet to be proven.

But first, it's time to sit back and wait while I synthesise the hardware formatter, and hope that I haven't got too many bugs in it, so that I can know for sure that I have formatted disks properly when testing whether they can be reliably read back.

I will probably also pull out the floppy histogram display code again, too, to see how close to overlapping the peaks for the 1.0, 1.5 and 2.0 period buckets of bits are, as a further guide as to whether this will all have a chance of working reliably.

So the hardware formatter is indeed writing things to the disk, but it is mis-calculating the CRCs for the sector headers and bodies.  I'll have to have a think about the best way to test this via simulation, as it's likely that one or more bytes are not being included in the CRC, or are being counted twice, or something.  Past experience tells me it's hard to work that out from just staring at the source.  The challenge to simulating this is that it will take a long time, and requires sdcardio.vhdl as part of the simulation, rather than just the MFM encoder.

Update: I realised that I probably wasn't feeding the CRC engine the bytes I was writing, having mistakenly thought that I had it automatically plumbed, which I didn't. So I'm hopeful that this will work now.

While I wait for that to finish synthesising, let's take another aside in the history of floppy storage, and consider the SFD1001 drive from Commodore: This was a beast in its day: 1MB on DD 5.25" disks, compared to the 1541's 170KB.  Of course it was double sided, so the fairer comparison would have been 340KB.  Also, it was designed for quad-density media, so really 680KB, but DD media tended to work just fine. It achieved this capacity by doubling the number of tracks, and I presume, increasing the data rate of the standard Commodore GCR encoding.  So with ~80 tracks, they got ~500KB per side. We should be able to double that on a 3.5" HD disk -- which we are, with >1MB per side -- but not by much.  So in a sense, what we are doing now with HD 3.5" floppies is not too different to what Commodore did with the SFD1001, except that the SFD1001 officially used media with double the density.

Meanwhile, I finally have the hardware auto-formatter generating correct CRC values, and can now format disks using it. One of the nice things, is that it reduces the time between tracks, allowing the format to complete faster.

So now it's time to update floppycapacity.c, so that it uses the hardware-assisted formatting, so that we can see if we can't actually get to the maximum number of sectors we think we should -- somewhere around 36 with Amiga-style track-at-once.

Unfortunately, it looks like the floppy drive hardware refuses to behave properly if the magnetic interval is less than 30 cycles, i.e., about 40.5MHz/30 = 1.35MHz, compared to the nominal 1MHz that HD floppies use. This is probably because the filtering circuits in the floppy drive itself think that anything that fast is noise, not signal, and so suppress it.  This is a bit of a blow for our desire for maximum density on the longer tracks, as we can only get 35% more on those tracks, not the 60% that we should be able to get.

Running the test program, these are the rates and numbers of sectors that fit on each track, with sector gaps, like a 1581:

Tracks 0 -- 12 : Rate = 30 cycles (1.34MHz) : 28 sectors

Tracks 13 -- 25 : Rate = 31 cycles (1.31MHz) : 27 sectors

Tracks 26 -- 43 : Rate = 32 cycles (1.27MHz) : 27 sectors

Tracks 44 -- 47 : Rate = 33 cycles (1.23MHz) : 26 sectors

Tracks 48 -- 68 : Rate = 32 cycles (1.27MHz) : 27 sectors

Tracks 69 -- 75 : Rate = 33 cycles (1.23MHz) : 26 sectors

Tracks 76 -- 84 : Rate = 34 cycles (1.19MHz) : 25 sectors

First up, note that there is something weird with four tracks near the middle of the disk: It's possible that those tracks are just a bit more worn out, as I can't think of any other reason for those four consecutive tracks being worse -- unless it's a bit of luck as to what was written there before, but I do wipe the tracks before writing to them.

Second, notice that for some intervals, the number of sectors we can cram on a track doesn't change, even if we drop the bitrate a bit. That's because it might be that one bit rate can fit 27.8 sectors, while the next slower bitrate can fit, say, 27.1 sectors. Those differences might become important when we try again without gaps, as it might just be enough space to fit another sector in.

But if we assume we have to drop to 26 sectors per track from track 44, that gives us:

Tracks 0 -- 12 (13 tracks) @ 28 sectors per side

Tracks 13 -- 43 (31 tracks) @ 27 sectors per side

Tracks 44 -- 75 (32 tracks) @ 26 sectors per side

Tracks 76 -- 84 (9 tracks) @ 25 sectors per side

Remember that PCs put 18 sectors per track for 1.44MB, so this is quite a bit more.  And with those 5 extra tracks, that all adds up to 2,258KB, i.e., 2.26MB "storage industry megabytes", or 1.56x the PC HD standard storage. Whether we are flying too close to the wind with any of these densities, I'm not sure, and only time will tell.
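
A quick sanity check of that arithmetic, as a throwaway sketch:

#include <stdio.h>

int main(void)
{
  /* Zones from the gap-included results above: number of tracks per
     zone, and sectors per track per side. */
  int tracks [4] = { 13, 31, 32, 9 };
  int sectors[4] = { 28, 27, 26, 25 };
  long total = 0;
  int z;
  for (z = 0; z < 4; z++)
    total += (long)tracks[z] * sectors[z] * 2;      /* two sides */
  printf("%ld sectors x 512 bytes = %ld KB\n", total, total / 2);
  return 0;
}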

So now let's try it Amiga-style, without gaps between sectors. But first, I have to fix a bug with the hardware-assisted track formatting without inter-sector gaps. I was messing up the CRC calculation again. We should get at least 10% extra, and maybe a few "leap sectors" where there was not quite enough space to put an extra sector on with sector-gaps, but the 2.5 to 2.8 saved sectors helps us cram an extra one in.  

So I'm expecting between 170KB and 255KB extra, pulling us up into the 2.4MB -- 2.5MB range -- funnily enough about where I originally expected, just not for the exact reasons expected: We are limited with our maximum data rate to 1.34MHz, instead of 1.6MHz, but we are fitting a few extra sectors per track, regardless. But let's see what the reality is, after that synthesis completes.

A quick note while that runs, as I think about RLL2,7 encoding, though: We will still be limited to the 1.35MHz maximum pulse rate, so can still only hope for 50% greater density than MFM.  But it does make me think about longer RLL codes that have longer minimum distances between pulses, e.g., RLL4,13, that would allow for doubling the MFM data rate, but will require more accurate timing of pulses. That might let us cram more on the disk.

That has also just reminded me: It is possible that the problem we are hitting at rates above 1.34MHz is not in fact the floppy electronics, but rather the need for aggressive write pre-compensation, so that the gaps come out correctly when they are placed so closely together on the media.  The way to verify this is to read a raw track after formatting it, and see how it looks in terms of raw flux.  If I do that at various speeds, and see how the various transitions are detected (or not), and how early (or late) they appear, I should be able to get some interesting intelligence on this: It might end up being possible to push towards 1.6MHz after all, which would get us a further 18% or so on top of our 2,258KB, and take us towards 2.65MB while still keeping sector gaps.  The thought is tantalising... But first some sleep.

The quick summary is: 2,493KB without sector gaps, so an extra 235KB, which is near the upper-end of what I was hoping for.  Now for the track-zone break-down:

Tracks 0 -- 10 (11 tracks) : Rate 30 : 31 sectors per side

Tracks 11 -- 24 (14 tracks) : Rate 31 : 30 sectors per side

Tracks 25 -- 76 (52 tracks) : Rate 33 : 29 sectors per side

Tracks 77 -- 84 (8 tracks) : Rate 34 : 28 sectors per side

So now to think about how to test whether lack of write pre-compensation is the problem, or whether it is the floppy drive's electronics. Ideally I would look at a waveform of the data when read back, to see if all the pulses are there, and whether they have moved.  If pulses are missing, then it's the floppy's analog electronics, and if it's that the pulses have moved, then it's magnetic physics, and write pre-compensation should be able to fix it.

The trick to viewing the waveforms is that capturing them on the MEGA65 itself from a program is a bit tricky, because we are talking about pulses that can occur every 30 cycles, which means we need a very tight loop -- tighter than our current one.

What we can do, is go back to using DMA to read the FDC debug register $D6A0, which lets us directly read the floppy RDATA line. The DMA will alternate between reading that, and writing to the memory buffer, resulting in a sampling rate of 40.5MHz / 3 cycles (the register takes 2 cycles to read) = ~13.5MHz, which is a healthy ~10x the expected pulse rate.  We will be limited to a single DMA job of 65536 samples, for ~192K cycles = ~4.9 milliseconds. But that will be more than enough data to see what is happening.  If I run this over tracks recorded at various rates, we should get a clear picture of what is going on.
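
Turning the captured samples into pulse and gap lengths is then just a matter of counting runs of identical samples. A sketch (which assumes, purely for illustration, that the RDATA state lands in bit 0 of each captured byte -- the real bit position in $D6A0 may differ):

#include <stdio.h>

/* Sketch: convert a buffer of raw samples of the floppy RDATA line
   (one sample every 3 CPU cycles, i.e. ~13.5MHz) into run lengths,
   so missing or shifted flux pulses can be spotted.
   ASSUMPTION: RDATA appears in bit 0 of each sample. */
void measure_pulses(const unsigned char *samples, unsigned int count)
{
  unsigned int i, run = 1;
  unsigned char last = samples[0] & 1;
  for (i = 1; i < count; i++) {
    unsigned char bit = samples[i] & 1;
    if (bit == last) {
      run++;
    } else {
      printf("%s for %u samples (= %u cycles at 40.5MHz)\n",
             last ? "high" : "low", run, run * 3);
      last = bit;
      run = 1;
    }
  }
  /* Report the final run as well. */
  printf("%s for %u samples (= %u cycles at 40.5MHz)\n",
         last ? "high" : "low", run, run * 3);
}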

After chasing my tail on a faulty DMA job definition for a while, I am now collecting some data. For track 0, with a rate of 40.5MHz/30 = 1.35MHz, and using the DMA capture method, I am seeing the data pulses being 8 samples wide.  That is 8 x 3 = 24 cycles, i.e., 24 / 40.5MHz = 0.6 microseconds pulse duration, which sets a hard upper limit on the data rate, as otherwise the data pulse will just be continuous, presumably. That means we should be able to detect pulses at up to about 1.69MHz minus a bit, well above the 1.35MHz we have managed.

Let's try formatting a track at rate 26, which should yield 1.56MHz, and thus should have a short gap between each of the data pulses -- and we seem in fact to still be getting quite large gaps. Ah: I have of course confused myself a bit here: Because we are using MFM with an MFM rate of ~1MHz, this means an actual maximum flux inversion rate of ~0.5MHz.  Thus we should be able to safely go well above a 2MHz MFM rate.  Because I can see the pulses, this makes me think that it might well be the lack of write pre-compensation after all.  So back to drawing those waveforms from the data I read.

We are getting increasingly shifted data pulses, as the pulse frequency increases, i.e., the distance between the pulses reduces. In short, we need to implement write pre-compensation, if we want to support faster data rates.

I dug around on the internet for a while, but could not find any clear explanations of the write-precompensation algorithms used on MFM floppies.  I think it might be because these algorithms were held as trade-secrets by the floppy controller manufacturers, so we will need to do some reverse-engineering.  What I was able to discover, is that the write-precompensation shift can be positive or negative in time, and is dependent on the last several magnetic bits written. 

I also read somewhere that the need for write-precompensation isn't because the magnetic domains shift during writing, but rather the result of the magnetic fields on the head during reading causing an apparent shift in bit position. This probably means that the required shift depends on past and future bits to be written.

What I think I will do, is create a way to write various known bit patterns to the floppy, and then read them back, and from that, work out how to shift the pulses to be properly lined up. 

I did a bit of this on the live-stream this morning, and basically came to the conclusion that I would need a look-up table to make any sense of it.  I did eventually find the following interesting simple set of rules:

https://marc.info/?l=classiccmp&m=137609524004633

That then led me to the software for controlling a CatWeasel floppy controller, like this here:

https://github.com/qbarnes/cw2dmk/blob/master/dmk2cw.c

The important bit is here:

if (len == 2 && nextlen > 2) {
   adj = -(precomp * CWHZ/1000000000.0);
} else if (len > 2 && nextlen == 2) {
   adj = (precomp * CWHZ/1000000000.0);
} else {
   adj = 0.0;
}

In other words, place a pulse early if the last pulse was short, and the next one is long, or place the pulse late, if it's the other way around.  That looks like a very simple rule.  Because we are doing variable data rates, we will need to vary the amount of precompensation based on the track number, but finally I have a bit of a starting point.

So I might have a quick go at modifying my model to try implementing that, and see how well it matches the measured values.  I can make the model match the first 180 or so clocked pulses that I logged. After that it turns to rubbish for a while, because the captured data itself seems to be corrupt from that point on, and is not well aligned between the various sample logs I made at each speed.  Basically there isn't enough data there to be sure that I have properly generalised out the rules.

In the end, I have gone back to the first link and the table-based approach, but allowing for "small" and "big" corrections, depending on the difference in the duration between inversions. The code looks like this:

          case f_write_buf is
            when "0101000" =>
              -- short pulse before, long one after : pulse will be pushed
              -- early, so write it a bit late              
              f_write_time_adj <= to_integer(write_precomp_magnitude_b);
            when "1001000" =>
              -- medium pulse before, long one after : pulse will be pushed
              -- early, so write it a bit late
              f_write_time_adj <= to_integer(write_precomp_magnitude);              
            when "0001000" =>
              -- equal length pulses either side
              f_write_time_adj <= 0;
              
            when "0101010" =>
              -- equal length pulses either side
              f_write_time_adj <= 0;
            when "1001010" =>
              -- Medium pulse before, short one after : pulse will be pushed late,
              -- so write it a bit early
              f_write_time_adj <= - to_integer(write_precomp_magnitude);              
            when "0001010" =>
              -- Long pulse before, short one after
              --
              f_write_time_adj <= - to_integer(write_precomp_magnitude_b);

            when "0101001" =>
              -- Short pulse before, medium after
              f_write_time_adj <= to_integer(write_precomp_magnitude);
            when "1001001" =>
              -- equal length pulses either side
              f_write_time_adj <= 0;
            when "0001001" =>
              -- Long pulse before, medium after
              f_write_time_adj <= - to_integer(write_precomp_magnitude);

            when others =>
              -- All other combinations are invalid for MFM encoding, so do no
              -- write precompensation
              f_write_time_adj <= 0;                
          end case;

We only need cases with a 1 in the middle, as it is the middle bit we are writing, where 1 means a magnetic inversion, and 0 means no magnetic inversion, and we only need precompensation if the pattern either side is asymmetric. I just hope that I have the sign on the corrections correct! If not, I should be able to test by putting negative numbers in the corrections.  We shall see after synthesis has finished, probably in about 24 hours' time in reality, as I have a long day with my 60km round-trip bike ride to work tomorrow, which I am really looking forward to, as I didn't get to ride in earlier in the week, so it's been too long between rides.

Back at the desk, and I've had time to look further into this, and record another stream, where I got the pre-compensation stuff all working, and was able to drop the data rate divisor from 30 to 29, but more importantly, to sustain using divisor 29 over the first 40 tracks, and divisor 30 all the way to track 60.  Compare that with the prior effort without pre-compensation, where we had to drop to divisor 32 by track 26.

What was interesting is that a constant write pre-compensation of ~100ns and ~200ns respectively for small and large differences in the gaps between pulses was pretty much ideal across the whole range of tracks and data rates I tested.

However, I couldn't get anything below divisor 29 to work, and all rates below 32 were sometimes a bit funny.  It occurred to me during the stream that this is most likely because the start-of-track gap bytes are now too short at the higher data rates, and thus the start of the first sector is not actually getting written, thus stuffing the whole track up -- even though the peaks in the histograms indicate that going down to rate 28 or even 26 should be possible.

To resolve this, I am adding more gaps to the start of the track.  But rather than it being wasted, I am going to write a short "Track Information Block" at the start of each track together with the normal 12 start-of-track gap bytes -- but these will always be written at the DD / 720KB data rate, so that it will be a fixed length prefix to the track, thus avoiding any variability caused by differing data rates.  This Track Info Block will contain the divisor used for the rest of the track, as well as flags that indicate if it contains sector gaps, or conversely, is a track-at-once and thus read-only track. I have also reserved a flag that indicates if the encoding is MFM or RLL2,7, and for good measure, I have included the track number.  
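
For reference, the sort of information the Track Info Block carries could be collected in a structure like this -- a sketch only, where the field order, sizes and flag bits are my guesses rather than the actual on-disk layout:

/* Sketch of the Track Info Block contents described above. Field
   order, sizes and flag bit positions are illustrative guesses,
   NOT the real on-disk format. */
#define TIB_FLAG_SECTOR_GAPS 0x01  /* has inter-sector gaps, so writeable sector-at-once */
#define TIB_FLAG_RLL27       0x02  /* track uses RLL2,7 rather than MFM */

struct track_info_block {
  unsigned char track;         /* physical track number                */
  unsigned char rate_divisor;  /* 40.5MHz cycles per magnetic interval */
  unsigned char flags;         /* TIB_FLAG_* bits                      */
  unsigned char sector_count;  /* sectors on the track (added a little later in this post) */
};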

So now I just need to get that all working... I have implemented the code to write the Track Info Block, and also to read it back, and use it to set the data rate divisor, but it isn't getting detected.  One of many good reasons to blog about code you are writing, or to do live-streams, for that matter, is that you spend more time thinking about what you are doing and getting your thinking clear.  The revelation just now, is that it might be stuffing up, because I am using multiple data rates on each track, and that the timer loop in the inner MFM encoder might mess up if the divisor is reduced mid-bit, as it might have already counted past that divisor, and will thus have to wrap around.

I can verify if this is the problem, and generally get a better idea of what is going wrong, if I make a test harness that includes the whole sdcardio.vhdl, and feeds it synthetic register writes to command it to format a sector, and then decode what it writes out, to see what it looks like.

As usual, making a test harness turns out to be a very good idea: I almost immediately spotted that I was using $FB clock bytes instead of $FF clock bytes for the Track Info Block bytes. That would certainly have caused problems.  So I have a fix for that synthesising, but while that runs, I might dig further in the test harness, and see what I can see. Which I should have done before synthesising, as I would have picked up the next bug that way -- I was switching writing speed before the CRC bytes had a chance to be written out -- but I was tired, and had a nap while it was synthesising, instead. But this time I did check, and of course there was no further problem with it ;)

After synthesising I am still seeing a problem, that only one sector is being found, even though there are multiple sectors being written.  In simulation, I can see the various sector headers and sector data fields, and they all have CRCs and are being correctly numbered, so this one is being a bit annoying to debug -- which is a bit frustrating, because it is possibly the last barrier before I can test writing tracks with the safe constant-length lead-in.  Most frustrating!

A bit more digging, and I can read DD format disks just fine -- so it's possible the problem lies with the switch to enable the use of variable rate recording, which I did "clean up" a bit before.  That was part of the problem, the part that was stopping it from seeing the sectors. But what remains is that trying to read any of the sectors still just hangs.  I'll first synthesise the fix for selecting the variable rate recording properly, and then revisit it a bit later.

And that has it back to being able to read disks that I have formatted at high density again, but unfortunately it hasn't enabled any higher data rates to work.  It is possible that I can get one or two more, by making all 1.5x pulses slightly longer, as they all seem to be shifted towards the 1.0x pulses, and away from the 2.0x pulses.  I'm not sure of the precise mechanism by which this would be occurring, nor if simply delaying all 1.5x pulse signals a bit will really help, although it seems that it should be worth a try.  Otherwise, I suspect that it's time to start working on the RLL encoding, and see what we can wring out of that.

Righty oh. I have just implemented variable delay of the 1.5x pulses, to see if that doesn't help us improve peak resolution at the faster data rates. I suspect we really are now starting to get into the area where other magnetic effects and electronics filter effects in the floppies are coming into play, so this will be the last attempt to increase the data rate with MFM, after which I will switch to working on the RLL stuff, and see how much benefit that gets us in terms of increased capacity. I'm not feeling super confident about it right now, because of the various magnetic effects I am seeing at the current data rate, but I might just be being pessimistic, since the whole point of RLL is that it lowers the effective data rate in the magnetic domain for the same real data rate.  We shall see after it all synthesises.  

The other little fix I have made in this run is to include the number of sectors in the Track Info Block.  I have also rolled in fixes for the hypervisor code and freezer to support the 85 track x 64 sector = ~5.5MB ".D65" HD disk images.

Right, that's built, so now to modify floppytest.c again, to allow fiddling with the setting for shifting the 1.5x pulses, and see if that works. Interestingly it helps only if I shift the pulses in the opposite direction -- but the gain is very marginal. To give an idea, here is rate divisor 28 (=$1C), i.e., about 1.45MHz MFM rate, with sensible write pre-compensation settings, and no adjustment to the position of the 1.5x pulses:

We see three peaks for the 1.0x, 1.5x and 2.0x MFM intervals, with the 1.5x peak in the middle somewhat nearer the 1.0x peak on the left, which is the problem I have been describing. The WPC 04/04/00 means write pre-compensation of 4 and 4 for small and big gap differences, and a 1.5x pulse delay of 0.  If we now advance the pulse delay in the expected direction, we get something like this:

We can now see the 1.5x peak has split in two, but the extra peak is now further left, not moved to the right as we hoped. So if we reverse the direction by putting a negative correction on the 1.5x pulse position, we get the following, which is quite a bit better:

It is now quite nicely spaced out.  If I move it any further to the right, not only is it worse, but all sorts of other bad effects start happening. It's interesting that there are still quite clearly double peaks in the 1.0x and 1.5x peaks, so there is something else there that can in theory be found and improved, but how much further benefit is possible is hard to tell, and it is a bit of diminishing returns at this point, although each further divisor drop we can achieve does result in progressively larger amounts of data per track.

But I am starting to chase my tail a bit here, and conclude (for the time being at least) that we are at the limit of what we can get out of this drive and media using MFM encoding, and that further work should focus on RLL encoding, to see if that can't get us a substantial further boost for less effort.  For that, you will have to wait for the next blog post, however.

What I do want to close out in this post though, is making sure that normal DD disk activity still works correctly.  Specifically, when the C65 DOS ROM issues a format command, we don't want the Track Info Block data overriding and causing the tracks to be written at the wrong rate.  

The logic here is a little subtle: We want write operations to the drive to switch to the TIB-indicated rate and encoding settings, so that sector writes succeed. And there is something going a bit oddly with this, but it might have been a problem with a disk I borked up when it was being written at the wrong speed.  A HEADER command with an ID supplied, which should do a full format, was failing with "75, FORMAT ERROR", but a quick format would succeed.  Trying to save a file on the disk after that would result in messed-up sectors, most likely from writing at the wrong data rate. But if I instead do a quick format, then a full format, the format succeeds.  So it was probably just that I had made a complete mess of the disk by formatting some tracks at funny rates with incorrect TIBs etc., as I was testing things.

But even after the clean low-level re-format, if I then try to save a file on the disk, the directory is all messed up, so something is still wrong. So I'll have to fix that up, too.  But that will also have to wait for the next blog post, or you will all be stuck reading this one long after the pre-orders have sold out, because it has got so long ;)


2 comments:

  1. One of the interesting and useful things about the classic CAV, fixed clock formats implemented in pure hardware is that they are extraordinarily reliable in use because there's a pretty hard interlock that protects all data except the sector being written.

    The fixed sector size means that after reading a header, everything is strictly controlled by time - if there's a watchdog that prevents writing after time X data physically can't be damaged, and if after writing a sector you verify the results you know the data is there and nothing else was damaged.

    CLV or variable clock breaks that assumption. It needs a hard interlock between head position and clock to give the same guarantee. There's a few reported incidents of C64 disks corrupted via the wrong write rate being used on certain tracks.


    I'd therefore be pretty wary of any complicated or marginal software-implemented scheme for saving important data. For distribution formats, of course it's a different business. Reading should always be safe.

    Says the man whose Atari ST interrupt-driven disk routines used to eat my data...

  2. Wow, that was a read. Thank you :)
