Saturday 6 June 2020

Fixing some floppy bugs

Among everything else, we have been looking at some bugs with the MEGA65's internal floppy controller.  It was working most of the time, but would hang in various situations.

The first problem was that it would hang when loading files located a long way from the directory track.  At first I was worried that the MFM decoding wasn't good enough.  So I wrote a nice test programme that reads some MFM decoder debug info and shows a histogram of the gap sizes.  This should produce three very clear peaks, corresponding to the three different gap lengths between flux transitions that MFM produces.  As the test disk I have here is empty, the histogram is quite heavily skewed, but it is still clear that the peaks are there, well spaced, and nice and narrow:

The third peak here is really just a little blip, because the disk is empty.  But watching multiple frames, I could see that it is there and real.  The colours just indicate the height of the lines.  The left edge of the chart shows shorter intervals, and the right side longer intervals.
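For anyone wondering why a healthy MFM track gives exactly three peaks, the following C sketch encodes some bytes with the standard MFM rules and histograms the distance between flux transitions, measured in cells.  It is written for illustration and is not the MEGA65's decoder.  mfm_encode and gap_histogram are hypothetical names.  Every gap comes out as 2, 3 or 4 cells.  On a double-density disk with 2 µs cells, that corresponds to roughly 4, 6 and 8 µs.  The sketch assumes the data bit before the first byte was 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode bytes as MFM cells, MSB first. Each data bit becomes a clock
 * cell followed by a data cell; a 1 marks a flux transition.
 *   data 1            -> clock 0, data 1
 *   data 0 after a 1  -> clock 0, data 0
 *   data 0 after a 0  -> clock 1, data 0
 * 'cells' must have room for 16 * len entries. Returns the cell count. */
size_t mfm_encode(const uint8_t *data, size_t len, uint8_t *cells)
{
    size_t n = 0;
    int prev = 0; /* assumption: previous data bit was 0 */
    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            int bit = (data[i] >> b) & 1;
            cells[n++] = (uint8_t)(!bit && !prev); /* clock cell */
            cells[n++] = (uint8_t)bit;             /* data cell */
            prev = bit;
        }
    }
    return n;
}

/* Add the distance (in cells) between consecutive transitions to hist.
 * hist must be zeroed by the caller; gaps >= max_gap land in the last bin. */
void gap_histogram(const uint8_t *cells, size_t n, unsigned *hist, size_t max_gap)
{
    assert(max_gap > 0);
    size_t last = 0;
    int seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (!cells[i])
            continue;
        if (seen) {
            size_t gap = i - last;
            hist[gap < max_gap ? gap : max_gap - 1]++;
        }
        last = i;
        seen = 1;
    }
}
```

On real hardware the decoder only sees time intervals.  So the bins would be counts of clock ticks rather than cells, which is why the measured peaks have some width.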

I'm actually really happy with this nice little tool.  It runs continuously, so you can swap disks etc. and watch the histogram change.  With a formatted disk, it manages several frames per second.  This is, of course, running natively on the MEGA65.

The video mode is 640x200, using a combination of normal text and 16-colour text mode, where each nybble of a character byte encodes one pixel.  This means the whole screen fits in 640 x 200 x 0.5 bytes = 64,000 bytes (~64KB).  Being able to mix normal characters in makes it much easier (and faster) to draw text over the display.  All this contributes to the quite fast performance, even though I wrote it all in CC65.  CC65 is quite handy, but it doesn't produce particularly fast code.  One day we will teach it some of the 4510 and 45GS02's tricks so that it produces MUCH faster output, but that will have to wait.
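The 64,000-byte figure comes from packing two 4-bit pixels into each byte.  The sketch below shows just that arithmetic, on a hypothetical linear 640x200 buffer.  The real MEGA65 16-colour text mode arranges pixels in character cells.  This sketch makes no claim about which nybble holds the left-hand pixel there.  fb_set_pixel and fb_get_pixel are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define FB_WIDTH  640
#define FB_HEIGHT 200
/* Two 4-bit pixels per byte: 640 * 200 / 2 = 64,000 bytes. */
#define FB_BYTES  (FB_WIDTH * 200L / 2)

static uint8_t fb[FB_BYTES];

/* Assumed layout: pixel (x, y) lives in byte (y * 640 + x) / 2, with even
 * x in the low nybble and odd x in the high nybble. The index uses
 * unsigned long because y * 640 overflows a 16-bit int (as on CC65). */
void fb_set_pixel(unsigned x, unsigned y, uint8_t colour)
{
    assert(x < FB_WIDTH && y < FB_HEIGHT && colour < 16);
    unsigned long idx = ((unsigned long)y * FB_WIDTH + x) / 2;
    if (x & 1)
        fb[idx] = (uint8_t)((fb[idx] & 0x0F) | (colour << 4));
    else
        fb[idx] = (uint8_t)((fb[idx] & 0xF0) | colour);
}

uint8_t fb_get_pixel(unsigned x, unsigned y)
{
    assert(x < FB_WIDTH && y < FB_HEIGHT);
    unsigned long idx = ((unsigned long)y * FB_WIDTH + x) / 2;
    return (x & 1) ? (uint8_t)(fb[idx] >> 4) : (uint8_t)(fb[idx] & 0x0F);
}
```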

Meanwhile, if you are curious what the distribution of an unformatted track looks like, here is an example:





We still see some indication of the first and second peaks.  This is perhaps due to a factory formatting artefact, or to whatever else was on these disks previously.  But the distribution is continuous, so it isn't really possible to classify any given sample with certainty.  The drop-off on the left edge is presumably due to the limits of the magnetic medium.
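A fixed-window classifier shows why a continuous distribution defeats classification.  It accepts a gap as 2, 3 or 4 cells only if it lies within some tolerance of the nominal length, and rejects it otherwise.  This is a hypothetical sketch: classify_gap, the tick units and the tolerance are all assumptions, and the MEGA65 decoder does not necessarily work this way.  With a tolerance of at most 14% the three windows cannot overlap, so each accepted gap has exactly one interpretation.  On an unformatted track, many samples land between the windows and come back as 0.

```c
#include <assert.h>

/* Classify a measured gap between flux transitions, given the nominal
 * length of one MFM cell in the same (hypothetical) tick units.
 * Returns 2, 3 or 4 if the gap is within +/- tol_pct percent of that many
 * cells, or 0 if it falls outside every window. tol_pct <= 14 keeps the
 * windows disjoint, since 3 * 1.14 < 4 * 0.86. */
int classify_gap(unsigned gap, unsigned cell, unsigned tol_pct)
{
    assert(cell > 0 && tol_pct <= 14);
    for (unsigned cells = 2; cells <= 4; cells++) {
        unsigned long nominal  = (unsigned long)cells * cell * 100;
        unsigned long measured = (unsigned long)gap * 100;
        unsigned long slack    = (unsigned long)cells * cell * tol_pct;
        /* measured + slack >= nominal avoids unsigned underflow */
        if (measured + slack >= nominal && measured <= nominal + slack)
            return (int)cells;
    }
    return 0;
}
```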

I find the whole low-level signal-processing side of floppies quite fascinating.  One day, when I have time, I want to see just how much data I can cram onto a 720K or 1.44MB floppy.  The plan would be modern RLL(2,7) encoding, a single really long sector, and a write speed that varies per track to match the tracks' differing linear velocities.  I would also add modern error-correcting codes to tolerate some errors.  My gut feeling is that at least double the capacity should be possible.  But that, too, will have to wait for another day.
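In RLL(d,k) notation, a 1 in the encoded bit stream is a flux transition.  Between any two 1s there must be at least d and at most k 0s.  MFM is RLL(1,3), which is exactly why it produces only three gap lengths, while RLL(2,7) permits a wider range.  The hypothetical satisfies_rll function below just checks a bit sequence against a given (d,k) pair.  It says nothing about how a real RLL(2,7) encoder maps data bits to channel bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the channel bits satisfy an RLL(d,k) constraint: every run
 * of 0s between two 1s is at least d long, and no run of 0s anywhere
 * (including leading or trailing runs) is longer than k. */
int satisfies_rll(const uint8_t *bits, size_t n, unsigned d, unsigned k)
{
    assert(d <= k);
    size_t zeros = 0;
    int seen_one = 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i]) {
            if (seen_one && zeros < d)
                return 0; /* transitions too close together */
            seen_one = 1;
            zeros = 0;
        } else if (++zeros > k) {
            return 0;     /* too long without a transition */
        }
    }
    return 1;
}
```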

Anyway, having confirmed that the floppy was being read reliably, I implemented a random track-seek function, to see whether seeking was the problem.  And indeed it was: sometimes the drive would seek either one track too few or one track too many.

I thought about a few different ways to solve this problem. In the end, I opted to include a feature that makes it easier to use the controller:  If the MFM decoder spots a sector on the track under the head, and it doesn't match the track we are expecting, the controller will step the head one track in the correct direction. It's a bit like an auto-tuner for a celebrity who can't reliably stay on the correct notes, but for floppy drives.

This is nice from a programmer's perspective, because you don't have to step the drive to the right track before scheduling a read or a write.  It can still be turned off if you don't want it, but for most use-cases it's probably a good idea.
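The auto-tune rule itself is simple enough to sketch.  This is a hypothetical C model: autotune_step and simulate_autotune are made-up names, and the real logic lives in the controller hardware.  Whenever a sector header reports a track other than the one wanted, the model takes a single step towards it.  An off-by-one seek is therefore corrected after one extra step.

```c
#include <assert.h>

/* Given the track number read from a sector header and the track we
 * want, return the step to take: +1 (towards higher tracks), -1
 * (towards track 0) or 0 (already on the right track). */
int autotune_step(unsigned header_track, unsigned wanted_track)
{
    if (header_track < wanted_track) return +1;
    if (header_track > wanted_track) return -1;
    return 0;
}

/* Simulate a seek that landed on 'landed_on', then let auto-tune step
 * the head one track per header seen until it reaches 'wanted'.
 * Returns the number of correction steps taken. */
unsigned simulate_autotune(unsigned landed_on, unsigned wanted, unsigned max_steps)
{
    unsigned head = landed_on, steps = 0;
    int s;
    while ((s = autotune_step(head, wanted)) != 0) {
        assert(steps < max_steps); /* guard against a runaway loop */
        head = (unsigned)((int)head + s);
        steps++;
    }
    return steps;
}
```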

With auto-tune implemented, the tracking was now quite reliable, which fixed the problem of loading files that were a long way from the directory track.  However, loading big files would sometimes hang.  Falk, who is working on the MEGA65 GEOS port, was also having drive lock-up problems.  So I enhanced the floppy test utility to include a looping read test.  This reproduced the problem: the test would lock up after random amounts of time.  It would also hang completely if the drive was on an unformatted track, or if no disk was inserted.

So I went through the read timeout logic with a fine-tooth comb, and found and fixed some corner cases.  That got it working nicely.  Here is the read test working:


The two-tone green is just so that you can more easily work out which track is involved. Track 0 is on the left, and it will try up to track 85, just because I felt like it.
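The read-timeout corner cases come down to making sure every wait has an exit.  The following is a hypothetical sketch only: wait_for_sector, the event names and the limits are all assumptions, not the controller's actual design.  Counting index pulses bounds a read on a formatted track, so a missing sector gives up after a few revolutions.  A separate tick limit catches the no-disk case, where index pulses may never arrive.

```c
#include <assert.h>
#include <stddef.h>

/* One event per simulated tick (hypothetical). */
enum fdc_event { EV_NONE, EV_INDEX, EV_HEADER_MATCH, EV_HEADER_OTHER };

enum read_result { READ_OK, READ_NOT_FOUND, READ_TIMEOUT };

/* Wait for the wanted sector, but never forever:
 *  - READ_NOT_FOUND after max_revs index pulses without a match
 *    (e.g. unformatted track, or the sector is missing);
 *  - READ_TIMEOUT if max_ticks pass with no index pulse at all
 *    (e.g. no disk inserted). The tick counter resets on every index
 *    pulse, so the two limits are independent. */
enum read_result wait_for_sector(const enum fdc_event *ev, size_t n,
                                 unsigned max_revs, unsigned max_ticks)
{
    assert(max_revs > 0 && max_ticks > 0);
    unsigned revs = 0, ticks_since_index = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_HEADER_MATCH:
            return READ_OK;
        case EV_INDEX:
            ticks_since_index = 0;
            if (++revs >= max_revs)
                return READ_NOT_FOUND;
            break;
        default: /* EV_NONE or a header for some other sector */
            if (++ticks_since_index >= max_ticks)
                return READ_TIMEOUT;
            break;
        }
    }
    /* Simulation ran out of events before any limit fired: report a
     * timeout rather than pretending the read succeeded. */
    return READ_TIMEOUT;
}
```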

In the process, I also discovered that you can't really trust the side byte in the sector headers of disks formatted on a 1581.  So I modified the controller so that it only checks that the track and sector numbers match.
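The header comparison can be sketched as follows.  The struct follows the standard IBM-style ID field: track, side, sector and size code.  header_matches is a hypothetical name.  The point is simply that the side byte is stored but never compared.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an IBM-style MFM sector ID header. */
struct sector_id {
    uint8_t track;  /* C: cylinder */
    uint8_t side;   /* H: head; reportedly untrustworthy on 1581-formatted disks */
    uint8_t sector; /* R: record (sector number) */
    uint8_t size;   /* N: size code, sector is 128 << N bytes */
};

/* Accept a header if track and sector match, deliberately ignoring side. */
int header_matches(const struct sector_id *h, uint8_t track, uint8_t sector)
{
    assert(h);
    return h->track == track && h->sector == sector;
}
```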


There will probably be a few more wrinkles to sort out in all this, but it's a nice step forward.
