Wednesday 31 January 2018

Creating some documentation on the MEGA65's enhanced text mode

This is a bit of a work in progress, but I have created some documentation for the enhanced text mode of the MEGA65. It is still missing the 256-colour mode information, but does explain the core of the memory layout when using 2 bytes per char instead of one.

Hopefully I will be able to expand on this over the coming week. However, this post also marks the end of my annual leave, and thus, sadly, progress will almost certainly slow down again for a while. That said, I am very happy with the progress made over the past six weeks or so. I tried to keep a bit of a progress log for my own personal satisfaction, and while I know I missed a bunch of stuff, it is still quite a list of things that have been dealt with:

24DEC17 - 800x600 video modes work
24DEC17 - Joystick input not working
24DEC17 - CPU bug fixed (Boulder Mark etc runs fine)
24DEC17 - b0 command in UART monitor stops CPU on BRK instruction
25DEC17 - $DC00 always reads as zeros
26DEC17 - Fix sprite fine horizontal placement problem
26DEC17 - PDM/Sigma-Delta audio output working
26DEC17 - Kickstart looks for file "NTSC", if not present, switches to PAL
26DEC17 - CIA clock speed is always 1MHz, except in C128 2MHz mode.
26DEC17 - Fix CIA clock halving bug
26DEC17 - $D016 smooth scroll in 320H mode fixed
26DEC17 - CIA is 1MHz even in 2MHz mode
26DEC17 - NumLock on PS/2 / USB keyboard is now "joystick lock" (WASD+shift, cursors+space)
27DEC17 - Got rid of single stray pixel by right border
27DEC17 - Make On-screen-keyboard X position variable via $D619
27DEC17 - C= + <- key to toggle matrix mode on C64 keyboards
27DEC17 - On-screen-keyboard again shows key events
28DEC17 - Stop kickstart screen format getting clobbered when setting PAL/NTSC
28DEC17 - Fix doubled first row of pixels in chargen/bitmap
28DEC17 - Joystick input on MEGA65 r1 PCB
28DEC17 - Stereo channel swap/merge on $D6F9
29DEC17 - Speed up PCB synthesis via map command line option
29DEC17 - Find bug stopping IEC serial working (was driving lines high)
29DEC17 - Find hardware errata; No SRQ line on IEC serial port
29DEC17 - Make joystick controlled quick-synthesising debug rig
29DEC17 - Right SID is now on right channel
29DEC17 - $D612 bits 6-7 allow rotation of joystick inputs by 180 degrees
30DEC17 - $D03x can no longer be written over in VIC-II mode
30DEC17 - IEC serial port works, at least partially
30DEC17 - Digital audio outputs reduced volume to prevent amplifier complaining
30DEC17 - Find and fix timer b and ISR reading bugs in CIAs
31DEC17 - Investigate and fix lack of shift register in CIAs for C65 DOS fast mode
31DEC17 - C65 disk drive check succeeds without shift-register status kludge
01JAN18 - VIC-II Bitmap mode displays last pixel row as though from next char row
01JAN18 - Fix trimming of sprite pixels vertically and when expanded horizontally
01JAN18 - Cartridge port accesses now work
01JAN18 - Cartridge ROMs now map to memory
02JAN18 - International Soccer cartridge works
03JAN18 - Ultimax-Mode cartridges work
04JAN18 - 1351/POT interface proof-of-concept
04JAN18 - 3.5" floppy drive proof-of-concept
05JAN18 - dotclock on cartridge port sped up from 4MHz to ~6.5MHz
06JAN18 - dotclock on cartridge port correctly at 8MHz
06JAN18 - 1351 mouse/POT inputs work correctly, and are in MEGA65 VHDL
07JAN18 - 16-colour sprite mode
07JAN18 - Sprite rendering bugs fixed
08JAN18 - Top row of pixels in sprite can now collide
08JAN18 - Sprite:sprite collision detection much more accurate (one Impossible Mission bug remains)
08JAN18 - Sprite Y position corrected
09JAN18 - Ethernet can receive (but frames lack CRC!)
09JAN18 - 1581 repaired, and 3.5" test disks prepared
09JAN18 - Ethernet TX phase correction for r1 PCB (but CRC received by nexys still wrong)
09JAN18 - Ethernet TX and RX fully working on both r1 PCB and Nexys4DDR
10JAN18 - Resistor pull-up pack for floppy interface
10JAN18 - Floppy drive reads from real disk
10JAN18 - Worked out how to decode floppy data
10JAN18 - Real drive tracks F011 activity
10JAN18 - Step and spin-up delays set for real floppy drive
11JAN18 - MFM gap finder
11JAN18 - MFM gap quantiser
11JAN18 - MFM byte decoder
11JAN18 - 1581 sector decoder
11JAN18 - CRC16
11JAN18 - Real floppy mode for F011
12JAN18 - MFM decoder decodes real disks (but MEGA65 doesn't get the data for some reason)
13JAN18 - Amiga/1351 mouse / joystick auto detection
13JAN18 - Copy 1351 mouse status to Amiga mouse status to avoid mouse cursor jumping
14JAN18 - Fix problems with buffer writing when reading from FDC
14JAN18 - $14 F011 command waits instead of steps, as expected
15JAN18 - Can load sector from FDC into sector buffer (but job doesn't complete properly)
15JAN18 - Ethernet MIIM now working
15JAN18 - FDC sector rotation bug fixed
15JAN18 - sector buffer collapsed to one physical copy (saves 2 BRAMs)
15JAN18 - FDC sector reading now works with C65 ROM (can DIR a real disk)
20JAN18 - Pull in Daniel England's diskmenu improvements
19JAN18 - Fix FDISK bugs for system partition creation
20JAN18 - Implement basic system partition reading
20JAN18 - Use CTRL,ALT and SHIFT to control boot process instead of FPGA switches
21JAN18 - Enhanced DMA list mode (and update FDISK, Kickstart to use it)
21JAN18 - Multiplier in CPU
21JAN18 - Hold RESTORE for HyperTrap, instead of double-tap (no stray NMI caused)
26JAN18 - Config program loads config from SD card
26JAN18 - Psygnosis owl sprite demo
26JAN18 - Fix logo display bug with new enhanced DMA
28JAN18 - Config utility works and saves and loads
28JAN18 - V400 character rendering bugs fixed
28JAN18 - V400 border position bugs fixed
29JAN18 - Virtual D81 F011 reading works again
29JAN18 - VD81 (buffered) writing seems to work

The main thing is that the hardware has been tested well enough to allow for the second revision of the motherboard to be designed and assembled over the next couple of months.

Raster splits in BASIC

I have started trying to generate some documentation on the advanced character modes of the MEGA65's VIC-IV video controller, and in the process decided to write some simple example programs, so that people can more easily see what each feature does, and experiment with them themselves.  To make the tests more accessible, I figured that I would write them in BASIC, rather than assembly language. 

The trick was, if I am changing the character mode, and yet want to have some kind of useful information on the display, then I would need a raster split, so that part of the screen could be in the special mode, but the information in ordinary text mode. But hang on, I am programming in BASIC, not assembly, so how can I possibly have a raster split? The answer is with 50MHz! What used to take a second on a C64 can be done in a single frame on the MEGA65, and what used to take a frame can be done in just a few rasters. Thus I figured that if I used the rather underrated WAIT command in C64 BASIC, I could probably get a stable enough raster split for this purpose. It won't be perfect, but it will still be pretty good.

WAIT is a bit of a strange beast. It takes three arguments: the address to watch, a value to AND with the contents of that location, and a value to XOR with the contents before doing the AND. So:

WAIT 53265,128,0

will wait while ((PEEK(53265) XOR 0) AND 128) = 0

That is, it will wait until the VIC-II (or III or IV) is near the bottom of the screen, and:

WAIT 53266,64,0

will wait while ((PEEK(53266) XOR 0) AND 64) = 0

That is, until the VIC-II/III/IV has just about finished drawing the top two lines of text.  If I only want to put text in the top line, that's close enough for my little test program.
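As a rough model of the WAIT condition described above, here is a small Python sketch (not MEGA65 code; the function name `wait_finished` is made up for illustration). It takes the byte currently read from the watched address and reports whether WAIT would stop waiting.

```python
def wait_finished(value: int, mask: int, invert: int = 0) -> bool:
    """Model of BASIC's WAIT address,mask,invert for one read of the address.

    value is the byte currently at the address. WAIT keeps looping while
    ((value XOR invert) AND mask) is zero, and continues once it is non-zero.
    """
    return ((value ^ invert) & mask) != 0


# Waiting on $D011 (53265) with mask 128: continues once bit 7 is set.
print(wait_finished(0x9B, 128, 0))    # bit 7 set
print(wait_finished(0x1B, 128, 0))    # bit 7 clear, keep waiting

# The same register with invert 128: continues once bit 7 is clear again.
print(wait_finished(0x1B, 128, 128))

# Waiting on $D012 (53266) with mask 64: continues once bit 6 is set.
print(wait_finished(64, 64, 0))
```

The third argument is what lets WAIT look for a bit being clear rather than set, which the border colour loop relies on.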

So, let's put this together into a little loop that changes the border colour based on where we are on the screen:

2000 R = 53248 + 17: R1 = R + 1
2020 WAIT R,128,0: POKE 53280,0
2060 WAIT R,128,128: POKE 53280,2
2070 WAIT R1,64,0: POKE 53280,1
2090 GET A$: IF A$="" GOTO 2020

Line 2000 works out $D011 and $D012, the two raster indicator registers on the VIC-II in decimal, so that we can easily use them in the routine.

Line 2020 waits until bit 7 of $D011 = 1, i.e., for the bottom part of the screen to start being drawn, and then sets the border colour to black.

Line 2060 waits until we are out of the vertical fly back, with bit 7 of $D011 = 0 again, and then sets the border colour to red.

Line 2070 waits for raster #64, i.e., almost the end of the second row of text, and then makes the border white.

So, if this works, we should see white border for most of the screen, but red at the top, and black at the bottom, and indeed we do:

So now you don't need to learn assembly language to do a raster split any more ;)

Displaying 256 colour images and 16 colour sprites

The material in this post is from Daniel, who has been working hard on tools for preparing and displaying 256-colour images on the MEGA65, using the full-colour text mode, where each pixel is represented by a byte, and where characters can be re-used to save space when drawing a low-entropy image. But over to Daniel...

As you all might know, I've been working on a few demos getting a few things tested and working...  At first, I needed a nice sprite pointer and there were some problems I couldn't solve so I handed the tests over to Paul and he sent me back an awesome demo.  After implementing my pointer, I was quite inspired and wanted to try to improve on what he had started.  So I upped the ante and released a three sprite animation with a few more frames.  I had to generate all of the data myself, including extracting each frame of the animation.  I wrote a GUI tool to handle the data conversion because I couldn't find Paul's for looking.  I had tried to do a five sprite version but I couldn't figure out the memory utilisation at the time, so I will be revisiting it in the future.

Next, I wanted to get some 320x200x256 images on the screen.  Quite some time ago (gosh, is it two years already?) I built a tool to convert images into a format that I could use with the 16bit tile feature on the Mega65.  I dragged it out and tried to get it working.  I found out that it was well out-of-date and didn't work.  In fact, when I built the tool, the feature was still in the design stage and it was never actually used to produce an image on the M65 (at the time the C64GS).  

After some consultation with Paul I started to rework the tool.  I also had to build a loader program for the M65 side.  After quite a bit of trial-and-error and annoying Paul with some silly questions, I finally got an image on the screen.  I didn't actually release that demo because it was too simple and seemed to have a few problems.  And...  The image was a very basic outline of something that I might want to reuse later...

I needed to test the colour use so I made the very pretty spectrum example.  I just used a gradient brush in GIMP to paint out the gradient and converted it to 256 colours using dithering, which I don't normally do, because I thought it might be nice for this example.  However, when I tried to convert it, my tool came back with some 800 tiles.  That was impossible for my loader to handle at the time because it required that the tiles be stored in a contiguous block.  So, I reworked the image, copying sections such that it is actually comprised of only two unique sections (one of them repeated three times).  I was worried it would look terrible but I think it's okay.

[Ed: each 8x8 tile in this mode requires 64 bytes, one byte for each pixel, so 800 tiles requires 51,200 bytes, so some would need to go under IO or under ROMs to fit in the C64 memory map].
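The arithmetic in that note can be checked with a few lines of Python. This is only a back-of-the-envelope sketch, and the function name `tile_bytes` is made up for illustration.

```python
BYTES_PER_TILE = 8 * 8  # one byte per pixel in an 8x8 full-colour tile


def tile_bytes(tile_count: int) -> int:
    """Memory needed to store tile_count unique tiles."""
    return tile_count * BYTES_PER_TILE


print(tile_bytes(800))      # the dithered spectrum image
print(tile_bytes(40 * 25))  # a unique tile for every cell of a 320x200 screen
```

So 800 tiles need 51,200 bytes, and a full screen of 1000 unique tiles needs 64,000 bytes, which is why the question of where to put the tiles matters.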

So the general principle was now sound but I needed to be sure that the tile reuse by foreground colour replacement was actually working.  I came up with the idea of making a little Andy Warhol tribute using the breadbox image which I had made some time ago.  For something that seems so simple, I had to struggle with GIMP to get it to do what I wanted.  I had to munge the images a few times before I got what I was looking for.  In the end, I think it's rather pretty and the colour substitution was working.  I did have a problem with the flip bits initially but I checked some information Paul gave me and corrected the problem in my tool.

Now onto the big stuff...  In starting the process I had wanted to put a 1000 tile image on the screen.  That's a unique tile for every location, a real 320x200x256 image and I wanted to have sprites, too.  I had already made the image I wanted to use (a very long process in Photoshop, let me tell you!  I recoloured the whole image - every single detail and that was just the start!).  It was actually for a little animation demo I did a long time ago...

I wanted to replicate some of that demo but my tool and loader were not up to the task in any shape or form.  Firstly, I was having to modify my loader for each individual image and secondly, my tool could only output to a contiguous block of RAM.  If you look at the memory map for the M65, you'll see that there just isn't a contiguous 64kB block of RAM that you can use when you need screen RAM and so on, too.

When I was asking Paul about it he quite casually said, "just don't make it contiguous and use whatever memory you can."  Hmm...  I don't know if he knew the complexity of what he was suggesting but I knew what I had to do.  I had to rework my tool and image format to allow for segment based mapping.  I also wanted it to be able to be processed by a generic loader because I was really over all of the complicated calculations I had to do to get the data loading loops to work and it just wasn't something that could be sold as a solution.

I built a test application that would allow me to specify free, reserved and system segments in the memory range that I could use.  It actually came together rather quickly.  Next, I planned how to change my data format and make it loadable by a generic loader.  I then incorporated my changes into my conversion tool.  That wasn't quite so easy...  I found some issues with the GUI messaging that were preventing me from knowing when the user had actually done the mapping, of all things that could go wrong.  Other than a few minor bugs, the tool side was completed more quickly than I had anticipated.  
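To give a feel for what segment based mapping involves, here is a minimal Python sketch. It is not Daniel's tool or data format: the function name `place_tiles`, the segment addresses in the example, and the choice to align every tile to a 64-byte boundary are all assumptions made for illustration.

```python
TILE_SIZE = 64  # bytes per 8x8 tile at one byte per pixel


def place_tiles(tile_count, free_segments):
    """Give each tile a start address inside a list of free memory segments.

    free_segments is a list of (start, length) pairs. A tile is never split
    across two segments, and each tile is placed on a 64-byte boundary
    (an assumption of this sketch).
    """
    addresses = []
    for start, length in free_segments:
        end = start + length
        # Round the first slot up to the next multiple of TILE_SIZE.
        addr = (start + TILE_SIZE - 1) // TILE_SIZE * TILE_SIZE
        while addr + TILE_SIZE <= end and len(addresses) < tile_count:
            addresses.append(addr)
            addr += TILE_SIZE
    if len(addresses) < tile_count:
        raise ValueError("not enough free memory for all tiles")
    return addresses


# Hypothetical free segments, chosen only to show the behaviour.
segments = [(0x1000, 128), (0x4010, 200)]
print([hex(a) for a in place_tiles(4, segments)])
```

The screen data then only has to record which address each cell's tile ended up at, so a generic loader can copy each segment's worth of tiles without any image-specific calculations.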

Next I had to write the loader.  Oh boy.  My loaders up to that point were horribly hacked so I pretty much had to start from scratch.  That was okay because the data format had changed quite significantly.  After a few calculation problems, complete and utter system failures (I'm sure it was my fault), copy-and-paste bugs and a weird "you have to do this, in this order" thing, I got the loader working.  It took me a few more hours to do than I'd planned for, but it didn't stop me being ecstatic at my success!

While I was waiting for something, I'd written the small sprite test that I wanted to use in the final demo.  I quickly incorporated it into my image program (and bothered Paul with silly questions again which I figured out before he got back to me) and voila!  A true 320x200x256 image with 16 colour sprites!  1000 unique tiles and 256 colours!

There is still a bug though...  In the middle of the screen, offset a little to the right, a tile is incorrect.  I have no idea why.  I'll need to consult Paul.  Also, as the sprite approaches the bottom border, it gets a strange warping effect on it.  Something Paul will need to fix.

My tool allows for arbitrary sizes of images (up to 255x255 cells).  You may be able to guess what's coming next...

I'll release the updated version of my tool and loader in a day or two.  I promised someone I would be available tomorrow and have a day away from coding.

ARGH!  I was supposed to sleep but I just had to add the music I wanted, too...  Please enjoy "Beachparty" by Zyron.  It sounds a little off pitch, though.  My humble apologies!

Edit:  I got up first thing in the morning to look at why that tile was broken.  I finally figured it out and with some direction from Paul was able to fix it. [Ed: The sprite pointer list that normally lives directly after the screen memory was in the middle of one of the tiles. Fortunately, this list can be moved around easily on the MEGA65 by writing to $D06C/D/E, so that was easily solved.]

 And the following video shows all three loading and being run on the MEGA65.  Apologies for the effectively silent audio level in the video, as I don't have my decent camera here, and the microphone on my phone is half dead.

Tuesday 30 January 2018

MEGA65 configuration utility / Konfigurations Programme

[I am once again trying to present this post in both English and German. Hopefully my German is not so bad that it becomes uncomfortable to read. Thanks also to Arndt for some corrections.]

After lots of posts full of words, here is one mostly full of pictures.  Daniel has been beavering away on a configuration utility for the MEGA65, so that you can set all the important settings, without having to load anything.  Daniel has done a great job, and freed me up to work on various other things, mostly bug fixing various little things, in the meantime.

Die letzten Posten haben alle viele Wörter gemacht, hier ist mal einer mit vielen Bildern. Daniel ist bis zum Umfallen geschuftet und ein "Konfigurations-Utility" für den MEGA65 geschrieben. Jetzt kann man alle wichtigen Einstellungen vornehmen, ohne irgendwas dafür laden zu müssen. Daniel hat das richtig gut gemacht und mir damit Gelegenheit gegeben, verschiedene Dinge voranzubringen, meist nötige Bug-Fixes.

So first, we start by holding C= and resetting the MEGA65, and then pressing ALT while still holding C= after kickstart says "release control to continue booting". This makes the Utility Menu appear. Think of it like Batman's utility belt, only 8-bit, and generally lacking in the shark repellant department:

Als Erstes: Wenn wir beim Starten oder Resetten C= drücken, meldet das Kickstart "Release control to continue booting". Wenn stattdessen die ALT-Taste dazu gedrückt wird, erscheint das Utility-Menü. Das muss man wie Batmans Ausrüstungsgürtel vorstellen, nur in 8-bit,  und natürlich ohne die Haiabwehr-Sachen:

When we press 1 to load the new configuration utility, it quickly appears, and can be controlled by mouse or keyboard to check and set various configuration options. The following few photos show the current contents of the screens (which is likely to change over time).
Wenn wir die 1 drücken, kommt ziemlich schnell das Konfigurations-Utility. Es wird entweder mit der Maus oder per Tastatur bedient, um damit alle möglichen Konfigurationsoptionen einzustellen. Die nächsten Bilder zeigen den gegenwärtigen Inhalt dieses Bildschirms (was sich in Zukunft sehr wahrscheinlich noch ändern wird).

Then when you are finished checking and setting everything, you can save and exit as you wish.  There is a confirmation prompt for added comfort.

Nach dem Abschluss der Einstellungen kann man (wenn man möchte) speichern und die Konfigurationen verlassen. Es gibt eine Abfrage, die einen an die verschiedenen Speichermöglichkeiten erinnert.

Friday 26 January 2018

Testing 16-colour sprite mode

What originally started out as debugging the 16-colour sprite mode to allow for a more colourful mouse pointer in the MEGA65 configuration program ended up going a bit further.

Daniel wrote a test program for 16 colour sprites to reproduce some strange behaviour that he was seeing. The mistake in the VHDL was simple, and quickly fixed. In the process, we also tweaked the transparent colour selection, so that it is now the foreground colour of the sprite.

Having fixed that and a few other little problems, I wanted to make a more interesting test of the 16-colour sprites.  So I decided it would be fun to animate a well-known 16-colour sprite assembly from the Amiga.

The first step was to produce the sprite data in the correct format. To do this, I added a new mode to the pngprepare program in the MEGA65 source tree, that takes a PNG with 16 or fewer colours, and produces a nice binary format that can be used natively on the MEGA65, including the palette entries, and information about how tall the sprites are, so that extended height mode can be used.

Then I modified Daniel's program to use that binary format, and to animate the resulting sprites. The result is quite nice:

This uses just two of the eight sprites, side-by-side to get the 32 pixel horizontal resolution.  The sprite itself is at 320x200.

So then I started thinking about using some multiplexing to get more owls on the screen.  I can easily use 3 pairs of sprites to get three owls on without multiplexing (the 4th pair could be used, but I would have to do a bit more palette fiddling, and it is already late). As the owl is $57 pixels tall, and I can have three overlapping owls, this means one every 29 (decimal) rasters.  I could also trim the sprites vertically (at the moment the blank space above and below the owl as it moves is part of the sprite, wasting about 50% of the height in any one frame), so it would be quite easy to get more than double this number of owls on the screen.  But, let's have a few more owls anyway, for good measure:

I gave in and did the palette fiddling I mentioned, so there are four pairs of sprites, which are then multiplexed 3x over (parts are concealed by the top and bottom borders).  So there are probably 8 or 9 whole 32x86 pixel 16-colour owls worth -- all using the 8 hardware sprites, not Amiga-style BOBs.  It would also be possible to horizontally multiplex the sprites (this is easier on the MEGA65 than on the C64, in part because the CPU is 50x faster, and there is DMA to splat the sprite X positions during a raster line), and the blank space above/below the owls could also be trimmed out to optimise things, making it fairly easy to have about 4x more owls on the screen if we wanted to.  But I'll leave that fun to someone else...

Improving the ethernet adapter

I have made a couple of improvements to the ethernet interface in the MEGA65:

1. The Ethernet PHY MIIM registers now work, so you can find out if the ethernet port is connected, and at what speed, for example.  This was not too tricky, except that I had the bus running at 2x the correct speed, because I read the 400ns edge-to-edge time, i.e., half-clock time, as being the clock cycle time.  After that, it was all peachy.

2. MAC address filtering, including recognising broadcast and multi-cast ethernet frames.

This is basically an extension of the "allow bad-CRC" flag, together with new registers to store the MAC address of the machine. A received ethernet frame is now filtered out if its CRC is bad, or if it is neither broadcast, nor multicast, nor addressed to the MEGA65.  Acceptance of broadcast and multicast frames can each be disabled independently, if you really want to cut down on processing load by only listening for packets addressed to the machine itself.
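The filtering decision can be modelled in a few lines of Python. This is only a sketch of the logic as described here, not the VHDL: the function name `accept_frame` and its flag names are invented, and the exact way the real flags interact is an assumption. The MAC address used is the one the MEGA65 shows in the ARP table further down.

```python
BROADCAST = bytes([0xFF] * 6)


def accept_frame(dest, own_mac, crc_ok,
                 allow_bad_crc=False,
                 allow_broadcast=True,
                 allow_multicast=True):
    """Decide whether a received frame should be passed on to the CPU.

    dest and own_mac are 6-byte MAC addresses. The flag names are
    hypothetical stand-ins for the controller's register bits.
    """
    if not crc_ok and not allow_bad_crc:
        return False
    if dest == own_mac:
        return True
    if dest == BROADCAST:
        return allow_broadcast
    if dest[0] & 0x01:  # group bit set in the first byte: multicast
        return allow_multicast
    return False


mega65 = bytes.fromhex("024753656565")
print(accept_frame(mega65, mega65, crc_ok=True))      # addressed to us
print(accept_frame(BROADCAST, mega65, crc_ok=True))   # e.g. an ARP request
print(accept_frame(bytes.fromhex("3810d52966ef"), mega65, crc_ok=True))  # someone else's frame
```

Turning off the broadcast and multicast flags in this model leaves only frames addressed to the machine itself, which is the low-load configuration mentioned above.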

After chasing down the bugs introduced in the process, it now all works. LGB wrote a nice little ethernet test utility that responds to ARP requests and PING.  This now works very nicely, and I can ping the MEGA65 from my Linux laptop:

$ ping

PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.316 ms
64 bytes from icmp_seq=2 ttl=64 time=0.304 ms
64 bytes from icmp_seq=3 ttl=64 time=0.304 ms
64 bytes from icmp_seq=4 ttl=64 time=0.336 ms
64 bytes from icmp_seq=5 ttl=64 time=0.306 ms
--- ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4085ms
rtt min/avg/max/mdev = 0.304/0.313/0.336/0.016 ms

$ arp -na
? ( auf 38:10:d5:29:66:ef [ether] auf wlp1s0
? ( auf 02:47:53:65:65:65 [ether] auf enp0s31f6

Here we can see the program running on the MEGA65, with counters on the top line for the number of packets received (RX), transmitted (TX), ARP packets replied to (ARP), PING packets replied to (PING) and UDP packets seen (UDP).
Below that is the MAC address of the MEGA65 displayed three times over:

Thursday 25 January 2018

Upgrading the SD card controller: SDHC cards now work!

 This little job has been a LONG time coming. We knew several years ago that we wanted at least SDHC support (and ideally, SDXC, as well) for the MEGA65.
The limitation was in the open-source VHDL SD card controller that we were using.  Fortunately, some nice people have improved that controller, so that it now supports SDHC cards, i.e., SD cards between 4GB and 32GB in size.  SDXC begins from 64GB and goes up to 2TB, which seems to be rather larger than one might need for the MEGA65, however, SDHC cards will start to be harder to get in due course, so we want to add support for SDXC. For now, the job is about SDHC cards, however.

The first step was to pull in the new version of the SD controller in VHDL.  This proved to be not too hard, as it really is a re-vamp of the old  one we were already using. There were some changes to the handshaking, which took a little work to get right, and it also assumed you knew at compile time whether you had an SD or SDHC card -- whereas we want to be able to work out what is plugged in when you turn on the computer.

With all that working, it was then time to fix FDISK/FORMAT, and Kickstart itself. With a little effort, both of these now reliably detect whether you have an SD or SDHC card, and can prepare and boot from either.  Of course, the SDHC cards are quite a bit faster as well, which is nice. I am using only a class 4 card, and it is very noticeably faster than the old 2GB SD card I was using.

Also, preliminary testing suggests that the unreliability issues I was seeing with the SD card interface have also been resolved with the use of this new interface, which is very welcome, since they were stopping me from making a functional freeze function.  So, finally, I can get back to doing exactly that.

Wednesday 24 January 2018

Freezing and Unfreezing - part 1

Freezer cartridges are a common tool on the C64, to allow for saving, debugging and a bunch of other useful tasks.  However, getting the things to work on the MEGA65 is not easy, and even if we could get them working, they wouldn't be aware of the extra memory and special functionality of the MEGA65. So for these and other reasons, the MEGA65 will have a built-in freeze function, which will be accessed by pressing the RESTORE key for 0.5 - 2 seconds, i.e., longer than a quick tap as would be used for RUN/STOP-RESTORE, but shorter than the 2 - 4 seconds which is used to trigger a reset (we might remove the reset function, once we have a functional reset button on the MEGA65, since it will become a bit redundant).
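The RESTORE key timing above can be summarised as a small Python sketch. The function name `restore_key_action` and the returned labels are made up, and the exact handling of the boundary values, and of presses longer than 4 seconds, is an assumption, since only the rough ranges are given here.

```python
def restore_key_action(held_seconds: float) -> str:
    """Map how long RESTORE was held to the action it triggers."""
    if held_seconds < 0:
        raise ValueError("duration cannot be negative")
    if held_seconds < 0.5:
        return "nmi"       # quick tap, as used for RUN/STOP-RESTORE
    if held_seconds < 2.0:
        return "freeze"    # trap to the Hypervisor freeze function
    if held_seconds <= 4.0:
        return "reset"
    return "unspecified"   # behaviour beyond 4 seconds is not described


for duration in (0.1, 1.0, 3.0):
    print(duration, restore_key_action(duration))
```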

(Talking about the difficulties of supporting real freezer/fast load cartridges, the Epyx Fastload is a good example of the kind of strange problems that can come up.  As I discovered here, there is a little Resistor-Capacitor circuit on that cartridge, that causes it to disable itself, if it is not accessed for ~400 clock cycles.  Kickstart takes longer than this to check the SD card and load the user's preferences etc, and so the cartridge is never visible.  The Action Replay doesn't do that, but gets upset if you try to read from $DExx, and I have yet to actually get it to present its ROM. Apparently the Action Replay also uses some illegal opcodes. All in all, I have discovered them to be rather a pain to work with.  All the more reason to have a native freeze function on the MEGA65.)

One of the advantages we have on the MEGA65 is that we have the 16KB Hypervisor memory, plus Hypervisor traps, so we can completely suspend a running program, without having to overwrite even a single byte of the stack.  Thus, the resulting freeze function should be quite robust.

A challenge, however, is the saving and restoring of the SD card access registers and sector buffer.  The access registers are only a few bytes, so we can copy those to a scratch area in the Hypervisor memory.  Then we can write whatever is in the sector buffer to a reserved area on the SD card, so that it is safely recorded.  At that point we can safely make use of the SD card direct access facilities, without corrupting the state of the program being frozen. The saved state should probably start with the stashed SD card access registers.

To un-freeze, we can finish off by loading the stashed sector buffer contents, and reading those stashed access registers back into the actual SD card access registers.  At that point, even a program that was making direct SD card access should be safely restored.

There are a couple of further complications.

First, the F011 floppy controller sector buffer and the SD card direct access sector buffer cannot be direct memory mapped at the same time, so copying the F011 sector buffer to the SD card sector buffer is rather painful.  In fact, it is painful enough, that I have already written a fix, that makes the buffers visible at $FFD6E00-FFF (SD card direct access) and $FFD6C00-DFF (F011 floppy sector buffer).  A pleasant side-effect of this is that it also gives us 3KB of scratch space at $FFD6000-$FFD6BFF, which we can use in the freeze process, or indeed elsewhere in the Hypervisor.
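The sizes of those three areas can be double-checked from the address ranges with a quick Python sketch. The constant names are mine, and the ranges are written out in full from the abbreviated form above.

```python
# (first address, last address) of each area, inclusive
SCRATCH_SPACE = (0xFFD6000, 0xFFD6BFF)
F011_SECTOR_BUFFER = (0xFFD6C00, 0xFFD6DFF)
SD_SECTOR_BUFFER = (0xFFD6E00, 0xFFD6FFF)


def region_size(region):
    """Number of bytes in an inclusive (first, last) address range."""
    first, last = region
    return last - first + 1


print(region_size(SCRATCH_SPACE))       # 3072 bytes, i.e. 3KB
print(region_size(F011_SECTOR_BUFFER))  # 512 bytes, one sector
print(region_size(SD_SECTOR_BUFFER))    # 512 bytes, one sector
```

Together the three areas fill the 4KB from $FFD6000 to $FFD6FFF with no gaps or overlaps.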

Second, if a program was accessing the SD card directly, the things it was accessing may not be there any longer, which may result in the reading of sensitive data, or the corruption of who knows what.  This is actually a very strong argument for not allowing programs to have direct SD card access, except in the rarest of situations, and even then, having the user confirm granting of the permission to do this before the Hypervisor grants direct SD card access.  In fact, all storage access has similar problems. However, at least for F011 disk images and the real floppy drive, these can be reasonably managed by remembering the disk image that needs to be re-mounted on unfreeze, or similarly, if the F011 floppy controller should instead be connected to the real 3.5" floppy drive.

Apart from the above, it should just be a reasonably straight-forward process of writing blocks of memory out to the SD card. It would be really nice to have a DMA to/from SD card option to rather automate this, however, we don't (yet) have such a beast.

In terms of the various blocks of memory, some are not entirely trivial, such as the VIC-IV palette blocks, which we have to select which is visible in the memory map.  To solve this, and the fact that all of the state that we need to save is not contiguously arranged in the 28-bit address space, there is a list of regions that need to be saved/restored, together with their lengths, and a one-byte option that is used to select a little routine that is run before access, so that the region is visible and otherwise ready for copying.
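The region list idea might look something like the following Python sketch. This is not the Hypervisor code: the names `FreezeRegion` and `save_regions`, the two example regions and their addresses, and the preparation routines are all hypothetical, and memory reads are faked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreezeRegion:
    start: int    # 28-bit start address of the region
    length: int   # number of bytes to save
    prep_id: int  # one-byte option selecting the preparation routine


def save_regions(regions, prep_routines, read_memory):
    """Run each region's preparation routine, then copy the region out."""
    saved = []
    for region in regions:
        prep_routines[region.prep_id]()  # make the region visible first
        saved.append(read_memory(region.start, region.length))
    return saved


# Hypothetical example: plain RAM needs no preparation, while a palette
# block needs the right bank selected before it can be read.
log = []
prep_routines = {
    0: lambda: log.append("no-op"),
    1: lambda: log.append("select palette bank"),
}
regions = [
    FreezeRegion(start=0x0000000, length=0x10000, prep_id=0),
    FreezeRegion(start=0xFFD3100, length=0x300, prep_id=1),
]


def fake_read(start, length):
    """Stand-in for real memory reads: returns zero bytes of the right size."""
    return bytes(length)


blocks = save_regions(regions, prep_routines, fake_read)
print([len(block) for block in blocks], log)
```

Unfreezing can walk the same list in the same way, writing instead of reading, so the list is the single place that describes what makes up the machine state.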

As I have started to write the freeze routine, it hasn't surprised me that I have managed to mess up the SD card file system by writing sectors in the wrong place -- notably over the Master Boot Record, i.e., the partition table.  While I can use the native MEGA65 FDISK and FORMAT utility to put it back, it does wipe the FAT32 file system in the process, and thus the C65 ROM is removed, as well as the MEGA65 system partition.  This is important, because the freeze function can only work if there is a MEGA65 system partition, as otherwise it doesn't know where it can safely save the frozen program, and without the C65 ROM, the machine doesn't ever leave the Hypervisor, and so the freeze trap to the Hypervisor cannot happen.

My solution to this is to allow the native FDISK/FORMAT utility to realise when a ROM has been loaded, and to save that to the newly formatted file system at the end of the FORMAT function.  So now, if I mess up the SD card, I can just reset while holding the ALT key to get the Hypervisor Utility menu, select FDISK/FORMAT, reformat the SD card, and be on my way again, ready to mess it up again in no time at all -- and without having to pull the SD card out, which is a bit annoying to do with the prototype PCB.

The problem I am facing now is some reliability problems when writing to the SD card: Basically the SD card controller locks up under certain conditions, and has to be reset to fix it.  These problems have become annoying enough that I want to solve them once and for all.  This ties in with the lack of SDHC support, which is related to the same module. So my very next step is to try to switch out the old SD controller for the newer version of the open-source one that has been released in the meantime. Hopefully that will fix both SDHC support, as well as the lock-ups, and after that it should hopefully be relatively smooth sailing to having a working freeze function.

Saturday 20 January 2018

Improving the DMAgic controller interface

The C65 has a DMA controller, the "DMAgic". Well, actually, each C65 has either the A or B revision of the F018 DMAgic IC.  This means we already have some magic to support different revisions of the C65 ROM that assume either one or the other revision of the F018.

Then add to that that the MEGA65 already has extensions to DMAgic to support memory access outside of the 1st mega-byte of memory. Until now, those were implemented as memory mapped registers, as that was the most convenient way to implement them.  However, as those extensions have grown, having more and more memory mapped registers that affect the way that a DMA job is interpreted was getting a bit hairy, as it meant that each caller of a DMAgic job needed to be aware of those registers, or alternatively, the previous caller had to be well behaved and clear all the extra registers out. Neither seemed a really satisfactory option.

So instead, I have removed all those extra registers, leaving only two additional registers, that are required for issuing DMA jobs, plus the register that allows selection between F018A and F018B mode for normal DMA jobs:

$D703 - Select F018A/B mode.
$D704 - Sets the upper 8 bits of the 28-bit address where the DMA list is to be loaded from, i.e., which mega-byte of memory the DMA list lives in.
$D705 - Like $D700, it sets the bottom 8 bits of the DMA list address, and triggers the start of a DMA job, but unlike $D700, it triggers an enhanced DMA job.

All the extra options, like which MB of RAM is being copied from and to, and DMA stepping rates are now specified in a set of variable-length options prefixed to the front of the DMA list.

For example, the DMA job to clear the screen in the hypervisor now looks like:

        ; Set bottom 22 bits of DMA list address as for C65
        ; (8MB address range)
        lda #$ff
        sta $d702

        ; Kickstart ROM is at $FFFE000 - $FFFFFFF, so
        ; we need to tell DMAgic that DMA list is in $FFxxxxx.
        ; this has to be done AFTER writing to $d702, as $d702
        ; clears bits 27 - 22 of the DMA list address to help with
        ; compatibility.
        lda #$ff
        sta $d704

        lda #>erasescreendmalist
        sta $d701

        ; set bottom 8 bits of address and trigger DMA.
        lda #<erasescreendmalist
        sta $d705

        ; Clear screen RAM
        ; MEGA65 enhanced DMA options
        .byte $0A      ; Request format is F018A
        .byte $00 ; end of options marker
; F018A DMA list
        .byte $04   ; COPY + chained request
        .word 1996  ; 40x25x2-4 = 1996
        .word $0400 ; copy from start of screen at $0400
        .byte $00   ; source bank 00
        .word $0404 ; ... to screen at $0404
        .byte $00   ; screen is in bank $00
        .word $0000 ; modulo (unused)

The lines that differ from the old method of calling such a job are the store to $D705 and the two option bytes at the start of the list: We write to $D705 instead of $D700 to initiate the DMA job, and the DMA list now begins, in this case, with two option bytes. The first tells the DMAgic that the DMA list that follows the end of options marker ($00) will be in F018A format.  Now there is no confusion for a particular list as to whether it expects an F018A or B, and any further extensions that we add can be safely ignored, because they are all disabled by default for each job.  Also, the job setup code is shorter, because it doesn't need to set or clear any DMA options, and there is no longer any need for DMA cleanup code, to put options back to how a naive caller might expect them.
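For anyone generating such lists from a PC-side tool, the byte layout of the listing above can be sketched in a few lines of Python. The field order is taken directly from the listing (command, count, source, source bank, destination, destination bank, modulo); the helper names `f018a_entry` and `enhanced_dma_list` are made up for this sketch.

```python
import struct

def f018a_entry(command, count, src, src_bank, dst, dst_bank, modulo=0):
    """Pack one 11-byte F018A list entry, fields in the order of the listing."""
    return struct.pack("<BHHBHBH", command, count, src, src_bank,
                       dst, dst_bank, modulo)

def enhanced_dma_list(option_bytes, entry):
    """Prefix the option bytes and the $00 end-of-options marker."""
    return bytes(option_bytes) + b"\x00" + entry

# The screen-clearing job from the listing: option $0A, then the F018A entry.
erase_screen = enhanced_dma_list(
    [0x0A],
    f018a_entry(0x04, 1996, 0x0400, 0x00, 0x0404, 0x00),
)
print(erase_screen.hex(" "))
```

The two option bytes are the only addition compared with a plain F018A list, so an existing list can be converted by prefixing `0A 00` and triggering it via $D705.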

The current list of supported options is:

$00 = End of options
$06 = Use $86 $xx transparency value (don't write source bytes to destination, if byte value matches $xx)
$07 = Disable $86 $xx transparency value.
$0A = Use F018A list format
$0B = Use F018B list format
$80 $xx = Set MB of source address
$81 $xx = Set MB of destination address
$82 $xx = Set source skip rate (/256ths of bytes)
$83 $xx = Set source skip rate (whole bytes)
$84 $xx = Set destination skip rate (/256ths of bytes)
$85 $xx = Set destination skip rate (whole bytes)
$86 $xx = Don't write to destination if byte value = $xx, and option $06 enabled

$00 and $0A we have already met.
$0B is the opposite of $0A, and tells the DMAgic to expect an F018B format DMA list.
$06 and $07 allow enabling/disabling of a "transparent value", that is a value that is not written during a DMA copy. For example, if you were copying an image with a transparent colour, you can now tell the DMAgic what colour that is, and it will copy all the bytes that don't have that value. The value is set via the $86 option.
Then $80 and $81 allow setting of the upper 8 bits of the 28-bit source and destination addresses, i.e., which mega-byte of memory to copy to/from.
$82 - $85 allow setting the stepping rate of the DMA. This allows memory copies that smear out or squish up the source, say, for example, if you wanted to scale a texture when drawing it.
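As a way of checking a list by eye, the option prefix can be walked with a small parser like the Python sketch below. It only knows the options in the table above. The default values are assumptions on my part (in particular, a step of one whole byte and no list format chosen until $0A or $0B appears), as are the names `parse_options`, `FLAG_OPTIONS` and `ARG_OPTIONS`.

```python
# Options without an argument byte, and options followed by one $xx byte,
# exactly as listed in the table above.
FLAG_OPTIONS = {
    0x06: ("transparency", True),
    0x07: ("transparency", False),
    0x0A: ("list_format", "F018A"),
    0x0B: ("list_format", "F018B"),
}
ARG_OPTIONS = {
    0x80: "src_mb",
    0x81: "dst_mb",
    0x82: "src_skip_fraction",   # 256ths of a byte
    0x83: "src_skip_whole",      # whole bytes
    0x84: "dst_skip_fraction",
    0x85: "dst_skip_whole",
    0x86: "transparent_value",
}

def parse_options(data):
    """Return (settings, offset of the DMA list proper)."""
    settings = {                       # assumed per-job defaults
        "transparency": False, "list_format": None,
        "src_mb": 0, "dst_mb": 0,
        "src_skip_fraction": 0, "src_skip_whole": 1,
        "dst_skip_fraction": 0, "dst_skip_whole": 1,
        "transparent_value": 0,
    }
    pos = 0
    while True:
        option = data[pos]
        pos += 1
        if option == 0x00:             # end of options marker
            return settings, pos
        if option in FLAG_OPTIONS:
            key, value = FLAG_OPTIONS[option]
            settings[key] = value
        elif option in ARG_OPTIONS:
            settings[ARG_OPTIONS[option]] = data[pos]
            pos += 1
        else:
            raise ValueError(f"unknown DMA option ${option:02X}")
```

For example, `parse_options(bytes([0x81, 0xFF, 0x06, 0x86, 0x20, 0x0B, 0x00]))` describes a job writing into mega-byte $FF with $20 as the transparent value, using an F018B list that starts at offset 7.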

Anyway, the net result is a nicely extensible architecture for the DMAgic in the MEGA65, and one that results in increased compatibility with the C65 when faced with lazy programmers, as well as saving bytes  for the typical case where most of the options are not required.  It also makes it easier to freeze and resume a MEGA65 program, because there are now fewer registers to save and restore.

Thursday 18 January 2018

Repairing an Amiga Mouse, and then using it on the MEGA65

As anticipated earlier, I have added transparent support for Amiga mouses to the MEGA65, so that people don't have to find a 1351, and can use the existing USB to Amiga mouse adapters, to allow use of newer mouses.

To get that working, I put out an appeal on Facebook for anyone who had an Amiga mouse to spare (not necessarily 100% working, just enough so that I could test with), and a kind person from Tasmania sent me one with a problem with the X axis.  Nonetheless, it was enough for me to do what I needed to do, and as a result the MEGA65 works very nicely indeed when an Amiga mouse is plugged in, and can automatically detect an Amiga mouse, 1351 mouse, paddles and joystick -- in real time -- and switch modes on the joystick port as required, so that no user fiddling is required.

This all works by looking at what the digital and analog lines are doing on each joystick port, to see whether the behaviour matches one or the other type of device.  It all turned out to be relatively simple in the end, requiring only a modest amount of fiddling to get it stable.

However, as I mentioned, the Amiga mouse that was donated was a bit sick, and I would really like it fully working, as I can use it with the MEGA65 r1 PCB without any funny adapters to route the POT lines, as I currently do for the 1351 mouse (this will of course be fixed on the r2 PCB).  So I started poking around in the mouse today to find out what is wrong, exactly.

It didn't take long with the multi-meter to work out that the voltage output of one of the infra-red light sensors on the mouse was a bit lower than it should be. The mouse shows signs of having been physically repaired around that part of the assembly, so it isn't too surprising, really.  After a bit of looking at Amiga mouse schematics, I could see that the mouse sensors are fed through a LM339 type comparator to produce a nice 5V square-wave output from the fuzzy light sensor readings.

Each light sensor is paired with a reference voltage that is used to determine whether to output 5V or 0V, depending whether the sensor voltage exceeds the reference voltage or not.  Since the problem was the sensor voltage was a bit low, I figured the easy solution was to put a bit of extra bias towards 0V on the reference voltage, so that the sensor voltage would again be correct.  It looked like there was about 1.5K resistance between the reference voltage and ground, so I thought I would start by adding a 2K Ohm resistor between the reference voltage and ground, like this:

Only I realised when testing that I had put the bias on the sensor voltage instead, i.e., making the problem worse, rather than better.  So then I thought, well, why don't I just put the bias towards 5V on the sensor voltage, like this:

Only it turns out that that doesn't work. I didn't bother exploring why.  So, back to Plan A, only this time with the correct pins:

The astute observer will notice that the above picture shows a 1K not 2K resistor. This was because in testing I found a 1K resistor provided the correct bias correction.
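For the curious, the effect of the extra resistor is just a parallel-resistance calculation. The sketch below assumes that the roughly 1.5K measured earlier really is the leg between the reference voltage and ground, which an in-circuit measurement can't guarantee. The resistance on the 5V side of the reference divider was not measured, so `r_top` is left as a parameter rather than guessed.

```python
def parallel(r1, r2):
    """Resistance of two resistors in parallel, in ohms."""
    return r1 * r2 / (r1 + r2)

def reference_voltage(r_top, r_bottom, vcc=5.0):
    """Output of a plain two-resistor divider between vcc and ground."""
    return vcc * r_bottom / (r_top + r_bottom)

print(round(parallel(1500, 2000)))  # about 857 ohms with the 2K resistor
print(round(parallel(1500, 1000)))  # 600 ohms with the 1K resistor
```

Whatever the top leg is, lowering the bottom leg from 1.5K towards 600 ohms pulls the reference voltage down, which is the direction needed when the sensor voltage is too low.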

Then it was time to test on the MEGA65 using the Mouse Test 2 program, which as we can see below I was able to use to move the 1351 mouse indicator, with the others staying put -- even though I was using an Amiga mouse:

However, that is all a bit dull to look at, so I made a short video showing it in action, with me turning the 1351 emulation mode on and off in real-time:

So now the mouse all works very nicely, and I can work on software that uses the mouse.

Wednesday 17 January 2018

Rebuilding SD card access

In my enthusiasm reworking the SD card and F011 code, it turns out I got a bit overzealous, and stripped out the code that provides the memory-mapped sector buffer for SD card access. Clearly not optimal. So I had better fix that up.

Before the rework, there were three separate sector buffers:
1. One for the CPU to read as a memory mapped sector buffer;
2. One for the CPU to write to, and the SD controller to read from; and
3. One for the F011 emulation.

At the moment, only the third one is still there.

This means that SD card access is currently only working for mounted disk images -- but there is no way to mount a disk image, as the Hypervisor can't even read the partition table of the SD card.

We could go back to having three buffers, but that seems a waste of precious BRAMs.  The question is whether we can do everything we need with just the one buffer, or whether we need two.  The answer to that is that we do need a second buffer if we want to have memory mapped access to the sector buffer.  That second buffer would be connected to the MEGA65's FastIO bus for reading, so that the CPU can read from it when it wants to. This buffer would be written to when the SD card controller reads data from the SD card.  To allow the CPU to write to the buffer, we would trap writes to the sector buffer location, and pass them to the SD controller to actually perform the write operation.

The complication is that when the SD card controller is asked to write a sector, it has no way to read this buffer, as only the CPU can cause it to be read.  We can solve this by making the F011 sector buffer 1K instead of 512 bytes (since BRAMs are 4K, this doesn't cost us anything extra), and whenever the CPU writes to the SD card sector buffer, we also write to the second 512 bytes of the F011 buffer.  When the SD card controller is asked to write a sector to the SD card, it uses this second copy of the data, which it does have read access to, in order to perform the write.
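The mirroring trick is easier to see as a toy model. The Python class below is only a behavioural sketch of the scheme just described, not the VHDL: the class and method names are made up, and it models nothing beyond "CPU writes land in both places, the SD controller reads only from the mirror".

```python
SECTOR = 512

class SectorBuffers:
    """Toy model of the CPU-visible buffer plus the 1K F011 buffer."""

    def __init__(self):
        self.fastio_buffer = bytearray(SECTOR)    # what the CPU reads
        self.f011_buffer = bytearray(2 * SECTOR)  # F011 half + mirror half

    def sd_sector_read(self, data):
        """SD controller has read a sector: expose it to the CPU."""
        self.fastio_buffer[:] = data

    def cpu_read(self, offset):
        return self.fastio_buffer[offset]

    def cpu_write(self, offset, value):
        """Trapped CPU write: update the visible copy and the mirror."""
        self.fastio_buffer[offset] = value
        self.f011_buffer[SECTOR + offset] = value

    def sd_sector_write_source(self):
        """Data the SD controller sends to the card on a sector write."""
        return bytes(self.f011_buffer[SECTOR:])
```

Note that in this model a sector read from the card does not update the mirror, so a program that wants to modify a sector has to write all 512 bytes back through `cpu_write` before requesting the write. Whether the hardware needs the same care is not something this sketch can answer.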

Another wrinkle is virtualised F011 mode, where the hypervisor gets a trap whenever a sector read or write is attempted.  This is used with monitor_load to allow feeding a disk image via the serial monitor interface, instead of using the SD card (handy for some software development tasks, and if your SD card interface is broken, as it is on Falk's r1 PCB).  So I need to preserve that.

Probably the best solution here is to have the two buffers, each with 2 x 512 bytes, with the lower half the F011 buffer, and the upper half the SD card buffer, and have a register bit that allows selecting which one is being accessed.

After a bit of fiddling about, this is all done now and working nicely, and the saved BRAM is also a nice result.

Switching back and forth between the SD card and floppy drive works, but it seems that it is possible for the track numbers to get out of sync, so it is necessary to seek to track 0 after switching from SD card, to make sure that everything matches up.   Ideally we would have some program to allow switching back and forth in a really easy way. Initially modifying the disk chooser menu program is probably the right way to do this, so that there is an option to select "internal drive" as one of the disk choices.

After that, the next step on the floppy drive now is to get writing working, including ideally formatting disks which requires unbuffered writing.

Tuesday 16 January 2018

Improving my bench MEGA65 prototype hardware

After the last several posts focussing on VHDL implementation of various interfaces and things, here is a much shorter read with more pictures, following the improvement of my bench-test MEGA65 revision 1 PCB. 

Here it is before improvement:


The main problems I wanted to solve were, in no particular order:

1. The floppy drive had to live externally and loose, and be powered by an adapter on the joystick port. I want it internal, and firmly held in place.

2. No keyboard. Is further explanation really necessary?

3. The headphone jack and FPGA programming port are very close, which I can't change, but the hole in the case for them was too small to allow both to be plugged in at the same time.  Very annoying when the kids want to play games, and I have to keep unplugging the sound to plug in the data interface and vice-versa.

4. The hole for the joystick ports was very tight, and needed a little enlarging.

5. The hole for the HDMI port was also too small.

6. The cartridge port hole was also too small.

7. I wanted to install pull-ups for the IEC bus, so that it would just work, without having to plug anything strange in (and so that it wouldn't lock the C65  ROM during boot-up).
8. I wanted all the improvements to result in a reasonably physically robust arrangement, that wouldn't be at risk of falling down and damaging itself or shorting itself out when used, whether by myself, or by the kids.

The clear plastic case is big enough for everything, so I figured I would just enlarge holes, and in the case of the floppy drive, make some new holes.  It would be nice if I had the correct tools for cutting holes in plastic. Instead, I have a power drill and some 3mm wood drill bits.  Making the hole for the floppy drive consisted of drilling perforations around the outline, and then joining them up using the drill as a kind of power saw.  Not ideal at all, but it worked. Here it is with most of the holes drilled:

Then with all the holes joined up, and the piece knocked out, but yet to be filed into a nice rectangle. Sorry the shot is a bit blurry. Cameras don't like photographing nearly invisible objects very much.

 We'll come back to that a bit later.

Then it was time to think about how to attach my genuine C65 keyboard (without printed key caps) to the top of the box, such that it couldn't fall off, fall in, or snag the fragile ribbon cable that connects it to the motherboard.

I had a piece of acrylic the right size to sit on top of the box, so I traced out around where I wanted the keyboard to be mounted on it:

Then fitted a couple of scrap plastic blocks to the underside, so that when it rests in the top of the plastic case, it can't move in any direction.  Here is the arrangement from underneath:

With those in place, and a couple of extra holes to fix the keyboard to the top (using the existing two holes in the keyboard, which presumably were designed with a similar purpose in mind), I had a keyboard sitting nicely and securely on top of the box:

Here you can see it from the side, with the green plastic thingoes you put in walls before you put screws in.  If only I could remember their name. Anyway, even nameless they work just fine for this job:

Then it was time to fix a few issues with the PCB, adding a floppy power connector, and pull-ups for the IEC serial bus, so that that can work without any special cable attached. By permanently fitting the pull-ups, it means I can't use the IEC port to drive the POT lines on the joystick, as I did while implementing 1351 mouse support, however as that is done, and since I can use an Amiga mouse transparently (a blog post on that coming up soon), I figured this was no great loss. The expansion/cartridge port was the easiest place to find power:

So after doing that, and having finished making the hole for the floppy drive (which is held in place with four screws on the underside, like in an old PC), I had all the electronics inside. By this time I had also already enlarged the various holes in the case.

Connecting the ribbon cable for the keyboard was fairly straightforward:

Here it is all together. The keyboard ribbon pokes up a bit, which I don't really like, as it is still at some risk of damage like that. But it is not snagging on anything or under any tension, so it will have to do for now:

And the view from the left side with the power and joystick ports etc:

 And set up in my office, ready for use:
So while it is clearly a bench prototype, it is now all assembled and functional, without a mess of cables and having to plug and unplug things all the time.

Monday 15 January 2018

Bringing the internal 3.5" floppy drive to life - part 2

Again, a longer post documenting the process of making the real 3.5" floppy drive work.  What I thought would be hard, the MFM decoding and sector reading from the real disk, turned out to be quite easy -- taking only a day or so.  But what I thought would be easy, plumbing this into my existing F011 implementation, turned out to be a long string of strange bugs to track down, and took a week or so to work through.  Anyway, here it is.

So, yesterday, I managed to successfully decode MFM pulses from the 3.5" floppy drive, but only in a little C test program on my laptop.  Today, I want to move that into VHDL, make sure I can decode MFM data there, and then correctly parse the sector markers etc, to find a requested sector, and push the sector bytes out, and do all the other little bits and pieces, like CRC checking, required to plumb it into my F011 floppy controller implementation in the MEGA65. The result should be that the MEGA65 can read data from a real 1581 floppy disk in the 3.5" floppy drive.

Yesterday I had broken down the MFM parsing into a set of clearly defined steps: measure pulse gaps, quantise pulse gaps, turn pulse gaps into bits/sync markers, turn bits into bytes.  Turning those things into VHDL was rather easy, give or take the odd spot of debugging.  Similarly, implementing a parser that looks for sync marks and captures the track/sector numbers and compares them with requested track/sector, and works out whether the following sector data should be read out was also fairly easy.  Then came the CRC checks.
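To give a feel for the gaps-to-bits-to-bytes part of that pipeline, here is a small Python round trip. It is a sketch of textbook MFM rather than my VHDL or C code: a clock pulse is inserted only between two zero data bits, gaps are counted in half bit cells (so valid gaps are 2, 3 or 4), and the sync marks, which deliberately break the clock rule, are left out. All function names are made up for the example.

```python
def mfm_encode(data, prev_bit=0):
    """Return the raw cell stream (clock, data, clock, data, ...)."""
    raw = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            clock = 1 if (prev_bit == 0 and bit == 0) else 0
            raw.extend((clock, bit))
            prev_bit = bit
    return raw

def raw_to_gaps(raw):
    """Position of the first pulse, then distances between pulses."""
    pulses = [i for i, cell in enumerate(raw) if cell]
    return pulses[0], [b - a for a, b in zip(pulses, pulses[1:])]

def gaps_to_raw(first_pulse, gaps, total_cells):
    """Rebuild the cell stream from the gap lengths."""
    raw = [0] * total_cells
    position = first_pulse
    raw[position] = 1
    for gap in gaps:
        position += gap
        raw[position] = 1
    return raw

def raw_to_bytes(raw):
    """Drop the clock cells and pack the data cells into bytes."""
    data_bits = raw[1::2]
    out = bytearray()
    for start in range(0, len(data_bits) - 7, 8):
        value = 0
        for bit in data_bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

message = bytes([0x4E, 0x00, 0xFE, 0x01])
cells = mfm_encode(message)
first, gaps = raw_to_gaps(cells)
decoded = raw_to_bytes(gaps_to_raw(first, gaps, len(cells)))
```

The decoder here is handed the position of the first pulse, which is exactly the alignment information that the sync marks provide on a real disk.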

The 1581 uses a CRC check that is similar to, but not exactly like the CCITT CRC16 algorithm.  The C65 specifications manual provides an explanation and example routine:

Generating the CRC

     The  CRC  is a sixteen bit value that must be generated serially,
one  bit  at  a  time.  Think of it as a 16 bit shift register that is
broken in two places. To CRC a byte of data, you must do the following
eight  times,  (once  for each bit) beginning with the MSB or bit 7 of
the input byte.

     1. Take the exclusive OR of the MSB of the input byte and CRC
        bit 15. Call this INBIT.
     2. Shift the entire 16 bit CRC left (toward MSB) 1 bit position,
        shifting a 0 into CRC bit 0.
     3. If INBIT is a 1, toggle CRC bits 0, 5, and 12.

     To  Generate a CRC value for a header,  or for a data field,  you
must  first  initialize the CRC to all 1's (FFFF hex).  Be sure to CRC
all bytes of the header or data field, beginning with the first of the
three  A1  marks,  and ending with the before the two CRC bytes.  Then
output  the  most  significant CRC byte (bits 8-15) and then the least
significant CRC byte  (bits 7-0).  You may also CRC the two CRC bytes.
If you do, the final CRC value should be 0.

     Shown below is an example of code required to CRC bytes of data.

; CRC a byte. Assuming byte to CRC in accumulator and cumulative
;             CRC value in CRC (lsb) and CRC+1 (msb).

        CRCBYTE LDX  #8          ; CRC eight bits
                STA  TEMP
        CRCLOOP ASL  TEMP        ; shift bit into carry
                JSR  CRCBIT      ; CRC it
                DEX              ; one bit done
                BNE  CRCLOOP
                RTS

; CRC a bit. Assuming bit to CRC in carry, and cumulative CRC
;            value in CRC (lsb) and CRC+1 (msb).

       CRCBIT   ROR
                EOR CRC+1       ; MSB contains INBIT
                PHP             ; keep INBIT (N flag) across the shift
                ASL CRC
                ROL CRC+1       ; shift CRC word
                PLP             ; N flag is INBIT again
                BPL RTS
                LDA CRC         ; toggle bits 0, 5, and 12 if INBIT is 1.
                EOR #$21
                STA CRC
                LDA CRC+1
                EOR #$10
                STA CRC+1
       RTS      RTS

It is super helpful to have an example implementation, as well as explanation. Nonetheless, it took me about 2 hours to actually get the CRC calculating correctly, as CRC routines are notoriously difficult to get exactly right: they require considerable attention to detail and a very sound comprehension of the algorithm.
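For cross-checking, the three steps from the manual translate almost word for word into Python. This is a sketch written from the description above, not a port of my VHDL; the function names are made up. The two header examples use the bytes from the captured trace further down in this post, including the three $A1 sync bytes that the manual says must be included.

```python
def crc16_update(crc, byte):
    """Feed one byte, MSB first, following steps 1 to 3 of the manual."""
    for shift in range(7, -1, -1):
        inbit = ((byte >> shift) & 1) ^ ((crc >> 15) & 1)  # step 1
        crc = (crc << 1) & 0xFFFF                          # step 2
        if inbit:
            crc ^= 0x1021                                  # step 3: bits 0, 5, 12
    return crc

def crc16(data, crc=0xFFFF):
    """CRC of a whole field, starting from all 1's."""
    for byte in data:
        crc = crc16_update(crc, byte)
    return crc

header = bytes([0xA1, 0xA1, 0xA1, 0xFE, 0x00, 0x01, 0x01, 0x02])
print(f"${crc16(header):04X}")
```

Running the CRC over a field followed by its own two CRC bytes gives 0, as the manual promises, which makes a handy self-test for any implementation.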

Another interesting problem was how to test this in simulation.  I already have a debug register on the MEGA65 that allows me to read the FDC data read line. However, as some signals can be as narrow as 120ns, this requires sampling at at least 5MHz.  Using DMA I could sample it at around 20MHz, however, this meant being able to capture only a part of a sector.  And even at 50MHz, the CPU is fractionally too slow, unless I completely unrolled the data capture loop, in which case I would still only be able to capture a relatively few samples, as 64KB of data capture loop would only be able to cover about 10K samples.  What I realised after, was that I should just write a C program that MFM encodes a set of sectors, and feed that in.  If my existing C program for decoding MFM data can read it, then it should make fine test input data for the VHDL. Writing such a program would also be the first step towards being able to write data to floppies, so it is work that needs to happen, anyway.  If I have problems with the current work, I will certainly follow this path.

Then it was a case of debugging some out-by-one errors and the like on sector read lengths, and making sure the right flags are asserted at the right times.  Finally, the whole new MFM assembly had to be plumbed into the existing VHDL, and some new registers added to the SDcard (and now, floppy drive) controller, so that it is possible to select the SD card or the floppy drive as the source for C65 F011 floppy controller accesses.

At this stage, all the machinery is now in place for this to work, assuming that there are no bugs in my VHDL.  Because synthesis takes a long time, I have added some debug registers that will allow me to interrogate more closely what the floppy drive and MFM decoder are doing.  While it would be nice to think that I won't need them, and the test-benching of the MFM decoder with real captured data helps reduce the risks, I suspect I will be getting familiar with them.  We will see how that pans out in a few hours.

So, synthesis has finished, and the selection register to switch to using the real floppy drive seems to work, and the read request line gets asserted to the MFM decoder, but there is no sign of bytes being read from the drive.  So, I will do what I should have done before synthesising the first time, and add debug registers to see right down to the lowest level of the MFM decoder.

Okay, next resynthesis, and I can see that it is finding gaps, and decoding bytes, but it never finds a sync byte. This is likely due to the register I set up for the number of cycles per MFM bit being limited to too low a value, thus preventing the gaps from being properly detected.  I rather confused myself here for a while, because I couldn't remember definitively the data rate for a 720K drive. Was it 250K, 500K or 1Mbit?  It took a long time to actually find a list. It is 250K, so 4 usec per bit. At 50MHz, this means we should be using 200 as the cycles per interval. So I should be seeing gaps of around 200, 300 and 400 cycles.
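The arithmetic and the quantisation step can be sketched like this in Python. The 50MHz clock and 250K data rate are from the text above; the tolerance of a quarter of an interval and the name `quantise_gap` are assumptions for the sketch, and the VHDL may well draw its thresholds differently.

```python
CLOCK_HZ = 50_000_000
DATA_RATE_BITS_PER_SEC = 250_000
CYCLES_PER_INTERVAL = CLOCK_HZ // DATA_RATE_BITS_PER_SEC   # 200 cycles = 4 usec

def quantise_gap(cycles, cycles_per_interval=CYCLES_PER_INTERVAL, tolerance=0.25):
    """Map a measured gap to 1.0, 1.5 or 2.0 intervals, or None if invalid."""
    ratio = cycles / cycles_per_interval
    for nominal in (1.0, 1.5, 2.0):
        if abs(ratio - nominal) < tolerance:
            return nominal
    return None

for measured in (200, 310, 395, 1000):
    print(measured, quantise_gap(measured))
```

With the cycles-per-interval value set too low, every real gap lands above the 2.0 bucket and is rejected, which fits the symptom of bytes being decoded but no sync mark ever being found.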

So, in terms of gaps, I am seeing plenty of them around 200 cycles, corresponding to runs of identical bits. This makes sense.  I am also seeing a reasonable number around 400, which also makes sense, as well as some around 300. However, I am also seeing a lot of seemingly randomly distributed gaps, anywhere up to at least 1000 cycles, which is more than 2x the maximum we should see.  There is something clearly wrong here.  However, looking at the source code for the gap collecting code, it is outrageously simple, and of course, it works fine in simulation.  What is particularly curious is that the failure mode appears to be the missing of pulses, rather than the seeing of spurious pulses, as would happen, for example, if there was high frequency glitching on the floppy read-data line.  Yet to be missing pulses is very strange, since the debug register that allows direct reading of the floppy read-data line shows no sign of these random-length pulse intervals. This makes me a little worried about intermittent glitches I have seen, where a couple of registers in particular are read with the wrong values.  As much as I would like to blame the synthesis tools, it is quite possible I have a subtle timing bug that might just be tickling things here.  My immediate next step is to resynthesise with the mfm_gaps module outputting a history of the values it has seen on the f_rdata line, so that I can see if it is indeed missing pulses.

And things get stranger.  While waiting for synthesis of the above debug register, I tried again, this time simply having the M65 try to boot the C65 ROM using the real floppy drive.  In this situation, the C65 ROM will try to load an auto-start file from the floppy.  And all of a sudden, I am seeing better distribution of gaps. I think that what is happening here is the ROM steps the head, which results in it being definitively on a track, whereas if the drive was powered off, the head might have been able to move off track a little (or outside tracks 0 - 79).  I'm still not really sure, but it seems to be something mechanical.

To avoid the long resynthesis times, I have imported my MFM controller into my joystick controlled test bench.  That way I can iterate after only a few minutes. The down side is I have very limited input and output, and can't capture and stream signal values.  This is what I used to work out that stepping the head helps.  However, even after stepping, while I see quantised gaps being detected, with essentially no invalid gap lengths, it is still not detecting any sync bytes.
So, I added yet another debug option (and had another 8,000 second synthesis) to log the quantised pulse gaps, and wrote a little program to interpret a capture of those signals.  That is, I tested the function of the pulse detection and gap quantisation.  And the results are good. Here is what I saw from my captured trace after decoding:

 $42 $42 $42 $42 $42 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72
 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72
 $72 $72 $72 $70 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $05
Sync $A1
Sync $A1
Sync $A1
 $fe $00 $01 $01 $02 $fd $5f $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00
Sync $A1
Sync $A1
Sync $A1
 $fb $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00

$00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $da $6e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00
Sync $A1
Sync $A1
Sync $A1
 $fe $00 $01 $02 $02 $a8 $0c $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e

First, we can see that Sync $A1 bytes are correctly detected.  Second, we can see that a complete sector is observed, with the $FE header ID byte and track, side, sector and sector size bytes. Then, following the next burst of Sync bytes, there is a $FB ID byte indicating the sector data, and a completely empty sector, followed by two CRC bytes ($da and $6e), and the $4e and $00 gap bytes, before the header of another sector is visible. I did notice an error in checking this: I was pulling the header bytes out as Track, Sector, then Side, not Track, Side and then Sector. That's easy enough to fix.
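Picking the header apart in the right order looks like this as a Python sketch. The field order is the one just described (track, side, sector, size). Turning the size byte into a length as 128 shifted left by the size code is the usual convention for this family of disk formats and is an assumption here, as are the names `SectorHeader` and `parse_header`.

```python
from typing import NamedTuple

class SectorHeader(NamedTuple):
    track: int
    side: int
    sector: int
    size_bytes: int
    crc: int

def parse_header(after_sync):
    """Parse the 7 bytes that follow the three Sync $A1 marks."""
    if after_sync[0] != 0xFE:
        raise ValueError("not a sector header (expected $FE ID byte)")
    track, side, sector, size_code = after_sync[1:5]
    crc = (after_sync[5] << 8) | after_sync[6]   # most significant byte first
    return SectorHeader(track, side, sector, 128 << size_code, crc)

# First header from the trace above: $fe $00 $01 $01 $02 $fd $5f
print(parse_header(bytes([0xFE, 0x00, 0x01, 0x01, 0x02, 0xFD, 0x5F])))
```

For the two headers in the trace this gives track 0, side 1, sectors 1 and 2, each 512 bytes long, which matches the 512 data bytes seen between the $FB byte and the CRC.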

So now we know that the intervals are being correctly quantised, and the stream of intervals that should be detected as a Sync mark are being generated.  Attention thus turns to the mfm_gaps_to_bits module, where the sync bytes are generated. Assuming that the problem must be in the collection and assembly of the strings of gaps into bytes, I am modifying my VHDL test rig so that I can sample the most recent four gaps at any point in time, and see that sample held.  I am wondering if the gap_valid signal is being asserted for more than one clock cycle, for example, which is causing a gap to be registered more than once.  It is also possible that gaps are being missed, but that will be harder to determine.

The logged sets of recent gaps seem to be okay, and I even managed to see a sync mark. So I added a counter for the number of times the sync marks are found. I also added a safety catch in case the strobes to indicate a new pulse has been seen were sticking around for more than one clock cycle.  Whether that was the issue or not, I am seeing of the order of 256 sync marks per second.  Given that there should be 5 rotations per second, and 10 physical sectors per track, each with 2 x 3 sync marks, this means I should be seeing 5 x 10 x 2 x 3 = 300 sync marks per second.  This basically matches my eye-ball investigation of ~256 per second, watching the highest order bit of an 8-bit counter blinking about twice per second.  In short, it seems reasonable to conclude that sync marks are now being reliably recognised.  The question is whether that tweak to the strobe line handling fixed anything, or not. This is at least easy to test: Change one line of VHDL to remove the extra check I put in place, and then resynthesise the VHDL test rig. Hmm. Seems that it wasn't necessary.

So, it seems that the problem must be higher up the processing chain.  So I took a closer look at the top level mfm_decoder module, and noticed a couple of errors in CRC checking of sectors, and when the found_track/sector/side variables are set.  I then realised that my test bench tested that the sectors were found on the disk, but not that this information was communicated back up to the caller.  Needless to say I have fixed those now, and am resynthesising.

While waiting for synthesis of the whole MEGA65, I also made the same change to my VHDL test rig.  Finally, that is now finding sectors! I can also see which track I am on, from the reported track number, as well as watching the sector numbers cycle through.  Now I am impatiently waiting for synthesis to complete, so that I can test it in the MEGA65 again...

Resynthesis has finished, so now trying it out.  I have hit the problem that sometimes happens where some registers for the SD card interface and the non-existent input switches misbehave. This means booting up without an SD card, and hence, without the F011 disk present flag being set by mounting a D81 image.  It turns out that this is the only source of the disk present flag at the moment. So I need to plumb the disk ready line.

However, I have just hit the exact same problem that causes Amiga 600 and 1200 disk drives to tick constantly when there is no disk inserted: The /DISKCHANGE line does double duty as the disk present signal.  As it doesn't seem right for an 8-bit computer that is likely to only sometimes have a disk in the drive to tick like a bomb, I'll just lie and tell the F011 that it always has a disk inserted when it is talking to a real floppy drive.

I also found a similar problem in the logic that checks if a disk is available before dispatching a sector read or write job via the F011.  I have now fixed that logic ready for next synthesis. This was again because the existing F011 code assumed that the SD card was the only source of floppy disks.  It also meant that I could work around it by setting the disk image enable flag for floppy drive 0, so that it would look like a disk is available to the internal logic.

With those two fixes, the C65 BASIC no longer reports DRIVE NOT READY when attempting to run a DIR command.  Instead it hangs... because I don't assert the Request Not Found (RNF) flag if a requested sector is not found.  Reading the C65 specifications, the RNF flag should be set if the requested sector has not been found before six index pulses, i.e., within about a second.
This is implemented now, ready for the next synthesis run.
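The timeout rule can be sketched as a tiny state machine. This is only a Python model of the behaviour described above, not the actual VHDL, and the class and method names are invented for the example.

```python
class RnfTimeout:
    """Toy model: abandon a sector search after a number of index pulses."""

    def __init__(self, max_index_pulses=6):
        self.max_index_pulses = max_index_pulses
        self.pulses_seen = 0
        self.searching = False
        self.rnf = False          # Request Not Found flag

    def start_search(self):
        self.pulses_seen = 0
        self.searching = True
        self.rnf = False

    def sector_found(self):
        # The requested sector header matched, so the search is over.
        self.searching = False

    def index_pulse(self):
        # Called once per disk rotation while the drive is spinning.
        if not self.searching:
            return
        self.pulses_seen += 1
        if self.pulses_seen >= self.max_index_pulses:
            self.rnf = True
            self.searching = False


def timeout_seconds(max_index_pulses=6, rotations_per_second=5):
    """Worst-case wait before RNF is raised."""
    return max_index_pulses / rotations_per_second
```

With 5 rotations per second, six index pulses works out to 1.2 seconds, which is the "about a second" mentioned above.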

The hanging, I think, was because I had stepped the disk drive to a different track, which confused it, as the sector headers would not have matched, and so it would have kept on searching.  Now with the drive on the correct track, the DIR command returns a broken directory, that looks to me as though it is reading all zeroes from disk, or some similar problem.

Taking a look at the sector buffer for the F011, it looks like it is finding all zeroes.  Ah, that would be because I set the "match any sector" flag.  This is actually encouraging, as it means that it is reading the empty sectors on the disk.  With that flag turned off, DOS is back to hanging. Perhaps I have the side 1/2 head select flag wrong?  The request that it is currently stuck on is track 39 (this is floppy track numbering, the C65 is talking about track 40, the directory track on a 1581 disk), sector 1, side 0.  Checking the sectors that the MFM decoder is seeing (via the debug registers I added at $D6A3-5), I can see that it is in fact correctly finding the sectors for that track and side.  More specifically, I can see that it finds track 39, sector 1, side 0.  Yet it never seems to indicate to the F011 that it has read the sector, and the busy flag stays asserted, which is why DOS is hanging.

At this point, I could try to work my way through the C65 DOS to figure out everything that is going on. However, as the regular SD card mode of operation works fine, I am instead going to write a little test program that tries to drive the F011 to read a sector, and see if that works. If not, then I will know where the process is failing.  If it does work, then I can take a look at the DOS after all.

This reveals that reading a sector doesn't seem to properly complete. So I finally got around to writing a program that produces a whole sector's worth of MFM data to input into simulation, to see if that sheds any light on the situation. Found one more bug: sector_found was being reset the same cycle as the sector_end flag was being asserted, and the MEGA65 was not handling that correctly.  So I have added a fix for that, which is now synthesising.  I have also added some extra debug registers so that I can see if the MEGA65 thinks it is accepting sector bytes from the MFM decoder.  Synthesis time again.

Following synthesis of the above, I have worked out that the SIDE flag from the F011 registers is being inverted in sense, so I have pushed a fix for that, which will require another synthesis.  The next mystery is that the Request Not Found flag is still being set, even when the track and side are correct. Setting the match any sector flag solves this, by accepting any sector. So this suggests that there is still something faulty with the track/sector/side match logic.  Indeed, I can see that the found_track/sector/side matches what is being asked for from the F011.
This problem was caused by comparing unsigned values, instead of first casting them to integers.  This is one of many really annoying "features" of VHDL.

After various other little fixes, I can now see the correct sector is being found, and the various signals are being asserted that should be telling the MEGA65 that bytes are ready for writing to the sector buffer as they are read from the floppy drive. However, still no bytes are read.  I am using the debug register $D6A1 as the source of my information, and recording the combinations of signals that I see by incrementing a byte of RAM corresponding to each value read, using a routine like LDX $D6A1, INC $0400,X. In this way the contents of $0400-$04FF form a histogram, which I can see on screen as it is gathered, and yet the sampling rate is still able to be ~1MHz.  This will miss some instances of short-lived signals, however, by repeatedly sampling in this way, those will still tend to show up over time.  So here are the combinations of signals I am seeing:

fdc_sector_end - Very frequent. This is pulsed each time the end of a sector is reached, whether or not it is the sector we are looking for.

fdc_sector_found - Is held for a period of time, ~5 times per second, i.e., each time the sector we are searching for passes under the head.  This is encouraging, as it means we are finding our sector quite reliably.

fdc_sector_found | fdc_sector_data_gap - As above. This indicates that we have passed the sector header, and are now in the gap between the sector header, and the data of the sector itself. Again, a very healthy sign.

fdc_sector_found | fdc_byte_valid - This also pulses a number of times each rotation, indicating that the MFM decoder has the bytes off the sector to serve.

fdc_sector_found | fdc_byte_valid | fdc_first_byte - This gets indicated only once per reading of the sector, to indicate the first byte of the sector is being presented. Again, a very healthy sign.
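The histogram trick described above is easy to mimic off-target. The following Python sketch emulates the LDX/INC loop over a list of sampled register values. The bit positions in FLAG_BITS are purely hypothetical, since the real layout of $D6A1 is not given here.

```python
# Hypothetical bit assignments for the debug register; the real layout of
# $D6A1 is not stated in the text, so these masks are for illustration only.
FLAG_BITS = {
    "fdc_read_request": 0x01,
    "fdc_sector_found": 0x02,
    "fdc_sector_data_gap": 0x04,
    "fdc_byte_valid": 0x08,
    "fdc_first_byte": 0x10,
    "fdc_sector_end": 0x20,
}

def sample_histogram(samples):
    """Emulate `LDX $D6A1` / `INC $0400,X` over a stream of register values."""
    page = [0] * 256                      # stands in for $0400-$04FF
    for value in samples:
        x = value & 0xFF                  # LDX loads an 8-bit value
        page[x] = (page[x] + 1) & 0xFF    # INC wraps from $FF back to $00
    return page

def describe(value):
    """Name the flags set in one register value."""
    names = [name for name, bit in FLAG_BITS.items() if value & bit]
    return " | ".join(names) if names else "(none)"

def report(page):
    """Map each combination that was seen at least once to its count."""
    return {describe(v): count for v, count in enumerate(page) if count}
```

Because each bucket is a single byte, the counts wrap at 256, so a frequently seen combination can momentarily read as zero. For this kind of debugging only "never seen" versus "seen" really matters.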

So, we have clear evidence that the sequence of signals is more or less correct when the MFM decoder is doing its job.  It should be noted that these are generated automatically, whether or not the F011 has asked for a sector to be read.  When a read sector command is given, there is an extra signal that is asserted by the F011 so that it knows that it should accept the bytes presented by the MFM decoder, fdc_read_request.  By running a bunch of read requests while my histogram is collecting, I can verify if all of these combinations also occur while fdc_read_request is asserted.

Because the histogram is running continuously, I have to manually issue the read command via the serial monitor, so the number of samples is much smaller.  As a result I didn't detect any instances of the rather rare combination of fdc_sector_found | fdc_byte_valid | fdc_first_byte, however the fact that all of the others are showing up gives me confidence that it should be showing up.

For a byte to be written, fdc_read_request, at least one of fdc_sector_found and fdc_sector_end, and fdc_byte_valid must be simultaneously asserted. From my histogram, I can see that fdc_read_request | fdc_byte_valid | fdc_sector_found happens quite frequently, as one would expect. That is, the conditions are satisfied for writing the bytes read from the floppy into the buffer. However, there is not so much as a single byte changed in the buffer. The logic that makes this decision consists of a couple of nested if statements, so I might put something in one of the outer if statements, so that I can see if it is getting there.  I'll also go through the synthesis warnings to see if there is anything fishy there that looks like it could be causing this problem.
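Written out as a predicate, the write condition looks like this. It is a Python restatement of the sentence above rather than the actual VHDL, and the function names are made up.

```python
from itertools import product

def should_write_byte(read_request, sector_found, sector_end, byte_valid):
    """True when a byte from the MFM decoder should go into the sector buffer."""
    return bool(read_request and (sector_found or sector_end) and byte_valid)

def satisfying_combinations():
    """All (read_request, sector_found, sector_end, byte_valid) tuples that pass."""
    return [bits for bits in product((0, 1), repeat=4)
            if should_write_byte(*bits)]
```

Only three of the sixteen possible combinations satisfy it, and the one the histogram shows most often (read request, byte valid and sector found) is among them.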

Nothing fishy turned up in the synthesis warnings, however instrumenting those if statements also showed that it never gets to the appropriate test.  A little further searching, and it looks like the fdc_sector_end signal was not being handled correctly: Whenever it occurred -- whether it was the end of the sector that was being searched or not -- it would cancel the current read request.  I have now added this extra check, and am resynthesising. 
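A toy model shows how damaging that bug is. In this Python sketch a track is reduced to a repeating list of header, data and end events. The event granularity and all of the names are assumptions made for the example, and real timing is ignored.

```python
def track_events(sectors=10):
    """One rotation as a flat list of (kind, sector_number) events."""
    events = []
    for s in range(1, sectors + 1):
        events += [("header", s), ("data", s), ("end", s)]
    return events

def read_succeeds(target, start_index, cancel_on_any_end, rotations=2):
    """Issue a read just before event start_index and see if data arrives."""
    events = track_events() * rotations
    request_active = True
    sector_found = False
    for kind, sector in events[start_index:]:
        if kind == "header":
            sector_found = (sector == target)
        elif kind == "data":
            if request_active and sector_found:
                return True
        else:  # "end"
            sector_found = False
            if cancel_on_any_end:       # the bug: any sector end cancels
                request_active = False
    return False

def successes(target, cancel_on_any_end):
    """Count start positions within one rotation that lead to a read."""
    positions = range(len(track_events()))
    return sum(read_succeeds(target, i, cancel_on_any_end) for i in positions)
```

In this model, with the bug present only one of the 30 possible start positions results in a read, while with the extra check every start position succeeds within two rotations.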

As I was thinking about this bug, I was initially thinking that this should mean I have a 1 in (sectors per track) chance of reading a sector. However, it instead required that the read request be initiated in the gap between the end of the sector just before the one desired and the start of the sector to be read.  This was quite a comforting revelation, as it explained why I was never seeing a sector being read -- because the probability was probably less than one in a thousand that this condition would be met. Even when set to accept any sector, the inter-sector gap is only a small percentage of the length of a sector.  Now we await synthesis to see if this theory is correct.
Indeed, with that fix, the MEGA65 thinks the sector data is being read into the sector buffer. I say thinks, because the sector buffer data doesn't change.  I lost several hours to this problem, until Falk reminded me that we have two copies of the sector buffer: one where the CPU controls access to it, and the other where the SD card writes directly into it. This is because the SD card can't be held off while the CPU is looking at the sector buffer and, vice versa, when the CPU is reading from the buffer, there isn't a mechanism to add an extra wait-state if the SD card (or now also the floppy controller) is writing to the buffer.  There is special logic that allows each side to cause writes to happen to the other copy of the sector buffer, and I had forgotten to implement that for floppy reads. That is now synthesising.

Although the data itself is not being read into the sector buffer, in theory, the status signalling should be mostly complete, and I should be able to boot the C65 ROM with the real floppy drive enabled, so I tried doing this. This revealed that there is a problem with the track stepping: The DOS gets stuck on boot trying to read 1581 track 40, sector 0, because no data ever arrives, which is because the floppy drive has stepped only to track 38, not track 40. I'm not quite sure what the cause for this is.  If I manually step the drive forward to the correct track, the read can then complete, although I think I am seeing a problem with the buffer empty/full flag being erroneously set to indicate the buffer is empty, rather than full. This is used by the C65 ROM to work out if a sector has been read or not.  This might just be because of how I am probing the registers to debug, as the EQ flag inhibit when reading gets cleared if $D087 (the port register to the sector buffer) is read. Since I am reading 16 registers at $D080-$D08F as a batch, this would cause that problem.  So I expect that this is a non-problem. So it really just leaves the track stepping problem.

The C65 DOS steps tracks assuming that this works without problem, and always lands on the correct track. The routine for seeking to a track is at $9B98, and looks something like:

   ldy #$00
   ldx $1FE8    ; presumably the current drive ID
   lda $D084    ; The target track
   jsr $9A88    ; Wait for F011 busy flag to clear
   cmp $010F,x  ; compare with the track we think we are on?
   beq foundtrack
   bcc lowertrack

   inc $010F,x  ; increase the track we think we are on?
   ldy #$18
   bra issuestepcommand

lowertrack:
   dec $010F,x  ; decrease the track we think we are on?
   ldy #$10

issuestepcommand:
   sty $D081    ; Tell F011 to step
   bra waitforready

   beq done     ; if we didn't step, skip head settling time
   jsr waitforheadsettle


Single-stepping through this routine, it seems to do what it should. However, after that, $14 gets written to the command register in the waitforheadsettle routine. And that's where the problem is: I was interpreting that as a head step command, because the high nybble is $1.  Again, a couple of lines to fix it, and probably a couple of hours of waiting for synthesis to run.  Then I found that the F011 disk-side select line was inverted, and I wasn't updating the pointer to the location in the second copy of the sector buffer, i.e., the one that the CPU sees.  So, it's off to synthesis again, but very much with the feeling now that it is the last few barriers this time.
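The command decoding mistake is easy to show in isolation. In the Python sketch below, the $10 and $18 values are the step commands seen in the seek routine above. The "fixed" decoder, which matches those two values exactly, is an assumption about one reasonable fix and not a copy of the real VHDL.

```python
STEP_DOWN = 0x10   # written after the DOS decrements its track number
STEP_UP = 0x18     # written after the DOS increments its track number

def is_step_buggy(command):
    """Original mistake: anything with a high nybble of $1 counts as a step."""
    return (command >> 4) == 0x1

def is_step_fixed(command):
    """Assumed fix: only the two exact step command values count."""
    return command in (STEP_DOWN, STEP_UP)

def count_steps(commands, is_step):
    """How many head steps a sequence of command writes would trigger."""
    return sum(1 for c in commands if is_step(c))
```

For example, the sequence $18, $14, $18, $14 (two steps, each followed by the head settle command) triggers four steps with the buggy decoder and two with the fixed one, which is the kind of discrepancy that leaves the head on the wrong track.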

It turns out there were still a number of other little niggly bugs to track down with filling the sector buffer, which once solved frustratingly still haven't quite got it working.  Sectors do now get read, and loading a directory goes through the motions of working -- but it is reading the wrong sectors for some reason.  Or perhaps looking at the wrong halves of the sectors, since the 1581 uses standard 512 byte sectors as the on-disk format, which each contain two logical 256 byte sectors, presumably because the PC-style floppy controller IC it uses doesn't support 256 byte sectors, or because using 256 byte sectors would have resulted in a slightly lower disk capacity.  Whatever the cause, the question is how SD card and floppy can behave differently, when all the buffer pointer management is the same regardless of whether a sector is read from the SD card or from the floppy drive -- and yet the problem only occurs when reading from the floppy drive.  Thus there must still be some subtle difference. 
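To make the "two logical sectors per physical sector" relationship concrete, here is a Python sketch of the mapping as I understand it. The track offset and the T40/S0 to track 39, sector 1, side 0 case match what is described above. The rest, in particular which logical sectors land on which side and half, is an assumption based on the usual D81 layout (80 tracks of 40 logical sectors) and should be checked against real hardware.

```python
LOGICAL_SECTORS_PER_TRACK = 40    # 256-byte sectors as seen by the DOS
LOGICAL_SECTORS_PER_SIDE = 20     # assumed: first 20 on one side, rest on the other

def logical_to_physical(track, sector):
    """Map a 1581 logical track/sector to (phys_track, side, phys_sector, half).

    half is 0 for the first 256 bytes of the 512-byte physical sector and
    1 for the second 256 bytes.
    """
    if not 1 <= track <= 80:
        raise ValueError("1581 tracks are numbered 1-80")
    if not 0 <= sector < LOGICAL_SECTORS_PER_TRACK:
        raise ValueError("1581 logical sectors are numbered 0-39")
    phys_track = track - 1                                       # floppy tracks count from 0
    side = sector // LOGICAL_SECTORS_PER_SIDE                    # F011 side number (assumed)
    phys_sector = (sector % LOGICAL_SECTORS_PER_SIDE) // 2 + 1   # physical sectors count from 1
    half = sector % 2
    return phys_track, side, phys_sector, half
```

Under this mapping, logical sectors 0 and 1 of track 40 both live in physical sector 1 on floppy track 39, so picking the wrong half gives plausible-looking but wrong data.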

To try to figure out what the difference might be, I tried loading the same sector from both SD card and from the real floppy drive.  Lo and behold, the sector from the floppy drive was rotated by one byte through the sector buffer.  Looking through the source code, I can see that the buffer pointer in question was not being zeroed when the read sector job was accepted, i.e., the part common to both SD card and floppy drive access, but instead in the setup stage for SD card access. It is satisfying to see that this makes sense. In retrospect, I saw that the buffer write addresses were out by one in simulation, but the consequences didn't fully occur to me at the time.  Anyway, time for synthesis again...
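The effect of a stale write pointer can be reproduced in a few lines. This Python sketch assumes a 512-byte buffer whose write pointer wraps around, and that the stale pointer started at 1. The exact starting value and the direction of the rotation are assumptions for the example.

```python
SECTOR_SIZE = 512

def fill_buffer(data, start_pointer):
    """Write a sector into the buffer, starting wherever the pointer was left."""
    buf = [0] * SECTOR_SIZE
    pointer = start_pointer
    for byte in data:
        buf[pointer] = byte
        pointer = (pointer + 1) % SECTOR_SIZE    # pointer wraps at the buffer end
    return buf

sector = [i & 0xFF for i in range(SECTOR_SIZE)]
via_sd_card = fill_buffer(sector, start_pointer=0)   # pointer zeroed in SD setup
via_floppy = fill_buffer(sector, start_pointer=1)    # pointer never zeroed
```

Here via_floppy holds the same bytes as via_sd_card, just rotated by one position, with the last byte of the sector wrapped around to the start of the buffer.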

Okay, so now the sector buffer rotation bug is fixed, and yet DIR still shows gibberish, as though it is reading the sector data incorrectly.  The C65 DOS uses unbuffered reading, where it accepts each byte as it is supplied by the F011 floppy controller, rather than waiting for it all to arrive. So I wondered if there was some sort of bug in that handling -- although, again, for SD card reads, it works without problem.  So I wrote a test program to make sure that unbuffered reading works correctly, and the correct data is read out, and in order. All seems to be correct there.  However, when trying to load the directory of a disk, it still looks very much like it is accessing the wrong half of the sector.

Unlike for the sector write offset, I can see no difference in the way the read pointer into the sector buffer is handled between the FDC and SD card data paths. Yet, like the last time I said this, there must be some difference, or else it would be working.

The F011 has a SWAP bit, that allows the halves of the sector buffer to be switched from the CPU's perspective, and the handling of this is the prime suspect as the root cause for this bug.  Single stepping through the C65 DOS, it turns out that this is not the problem.  Rather, the problem is that the floppy drive reads data at a slower rate than the CPU can read it, and the EQ flag that indicates if the buffer is empty claims not-empty as soon as the sector starts being read, regardless of whether the data has been read far enough or not.  As a result, the problem is actually that the C65 DOS is reading the contents of the buffer, before it has been filled from the floppy drive.  So the SWAP flag can stay as it is, but the EQ flag requires correction.
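The race can be illustrated with a small simulation. In this Python sketch the floppy delivers one byte every few ticks while the CPU tries to read on every tick. The "corrected" rule, where the buffer counts as empty whenever the read pointer has caught up with the write pointer, is my reading of how an EQ-style flag should behave, and the buffer size and tick ratio are arbitrary.

```python
def simulate(eq_tracks_pointers, size=16, ticks_per_floppy_byte=4):
    """Return the bytes the CPU reads; None marks a read of unfilled data."""
    buf = [None] * size
    write_ptr = 0      # advanced by the (slow) floppy side
    read_ptr = 0       # advanced by the (fast) CPU side
    received = []
    tick = 0
    while read_ptr < size:
        if tick % ticks_per_floppy_byte == 0 and write_ptr < size:
            buf[write_ptr] = write_ptr          # byte value = its position
            write_ptr += 1
        if eq_tracks_pointers:
            empty = (read_ptr == write_ptr)     # corrected: wait for data
        else:
            empty = (write_ptr == 0)            # buggy: not-empty once started
        if not empty:
            received.append(buf[read_ptr])
            read_ptr += 1
        tick += 1
    return received
```

With the buggy rule the CPU gets the first byte correctly and then runs ahead of the floppy, reading unfilled buffer contents. With the pointer comparison it receives every byte in order, just more slowly.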

In the end, I looked at the whole handling of the EQ flag, and with the distance of time from when I first implemented it, it seemed rather overcomplicated to me.  So I re-worked the entire sector buffer handling code, so that it now works much more like the real F011, and the duplicate copies of the sector buffer to solve bus contention have been reduced to a single buffer, with a bit of clever pre-fetching and write-buffering. The result is a whole lot simpler, and, it works.  FINALLY I can stick a 1581 disk in the real floppy drive, type DIR from C65 BASIC, and get a directory listing. Here is one I specially formatted the other day using my resurrected 1581:

And here is an old disk from the mid-90s when I was using the C65 I had at the time as my main computer:

I tried a bunch of my other old 1581 disks, but only a couple would work.  What I don't know is whether that means those disks have simply rotted away, which is entirely possible after 20+ years, or that my error correction for MFM decoding could do with a bit more work.  I guess I will pull them out again at some point, and do some data captures from them, and from there work out whether my error correction could be improved. But adding write-support to the floppy drive is a higher priority than that.