Sunday, 7 January 2024

Hardware Accelerated IEC Controller -- Part 4

In the previous post, I implemented the JiffyDOS protocol into my hardware IEC controller for the MEGA65 (well, except for the optimised LOAD routine, but it works fine without it -- if a little slower). This means that the key remaining piece of functionality we need is the C128 "fast serial" interface that the 1571 and 1581 support. I put "fast serial" in quotes, because while it is faster than the 1541 protocol, despite needing extra hardware, it is slower than a 1541 with JiffyDOS.  But as many folks have stock 1571s or 1581s, and also just for completeness, it makes sense to implement it.

For the standard and JiffyDOS protocols, I made heavy use of VHDL simulation of a 1541 drive, so that I could have nice unit tests that verify lots of the functionality, and let me spy on the IEC bus in a way that just isn't possible with a physical setup. In particular, I could see whether the drive or the host computer was pulling IEC lines low, which really helps debug things. And with my new build server, I can run the dozen or so tests in less than 30 seconds.

Now the trick for doing this for the C128 fast serial protocol is that the 1541 I emulate in VHDL obviously doesn't support it. So I need to emulate either a 1571 or a 1581. I suspect the 1581 will be the easier of the two. Digging around, I have found a copy of the 1581 service manual, which includes a bunch of useful information for setting up an emulated 1581.

In particular, it gives me the memory map for the drive:

$0000-$1FFF = 8KB RAM
$4000-$400F = 8520A CIA
$6000-$6003 = WD1770 FDC
$8000-$FFFF = 32KB ROM
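As a rough illustration of the address decoding this map implies, here is a minimal Python sketch. The ranges come straight from the list above; the `decode` helper is a hypothetical name, and treating every unlisted address as unmapped is my assumption (the real hardware may well mirror devices across the gaps).

```python
# Address ranges taken from the memory map above (inclusive bounds).
MEMORY_MAP = [
    (0x0000, 0x1FFF, "RAM"),
    (0x4000, 0x400F, "CIA"),
    (0x6000, 0x6003, "FDC"),
    (0x8000, 0xFFFF, "ROM"),
]


def decode(address):
    """Return (device, offset) for a 16-bit CPU address, or (None, None)."""
    for start, end, device in MEMORY_MAP:
        if start <= address <= end:
            return device, address - start
    # Assumption: anything not listed is treated as unmapped in this sketch.
    return None, None


print(decode(0x4005))  # a CIA register
print(decode(0xFFFC))  # the 6502 reset vector lives in ROM
```

In VHDL the same decision is just a handful of chip-select signals derived from the upper address bits.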

The CPU is a 6502A at 2MHz. The CPU I use for the 1541 can be reused for this, just run at double the speed. 

The 8KB RAM won't be a problem.

The 8520A is a 2MHz CIA, very similar to the 6526 CIAs used in the C64, but with a different format for the time of day clock.  I'm suspecting that I can ignore the differences, and just use a 6526 CIA, for which we have good VHDL.

The WD1770 FDC is pin-compatible with the slightly improved WD1772, for which I can find a VHDL implementation.

Finally, the 32KB ROM won't be a problem either.

So I can start by adapting my internal1541.vhdl to make an internal1581.vhdl, ingest the 1581 ROM, and see if I can't get it running.

A lot of this is just refactoring to change the memory layout, and the plumbing of the IEC port to the CIA.  The 1581 Service Manual has a good clear schematic, from which I grabbed the following images.

First, we can see the general signals that need to connect to the CIA:

Some of these go through other circuits, but many are already pretty clear, as to how I need to plumb them.

Perhaps not surprisingly, the IEC CLK, DATA and ATN lines are laid out exactly the same as for the 1541, including the circuitry to connect them to the physical ports:

This would have allowed Commodore to reuse a lot of the 1541's IEC routines without significant modification.

The fast serial port is formed using the SP and CNT pins of the CIA.  To protect them electrically from having multiple devices driving them simultaneously, these are gated through a 74LS241, under the control of a data direction line on bit 5 of the 2nd CIA port. If the line is high, then the 1581 will output, and if it is low, then the 1581 will input -- again, with the lines fed through glue-logic to make them behave open-collector, to eliminate all risk of cross-driving of lines on the bus.
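To make the open-collector behaviour concrete, here is a small Python sketch of the idea: a line is low if any device pulls it low, and the drive can only pull the fast serial lines low while its direction bit is high. The function names are hypothetical, and this is a simplified logical model rather than a description of the actual 74LS241 wiring.

```python
def bus_level(pull_low_flags):
    """Open-collector line: low (0) if any device pulls it low, else high (1)."""
    return 0 if any(pull_low_flags) else 1


def drive_pulls_low(direction_bit, drive_output_level):
    """Return True if the drive is pulling the fast serial line low.

    direction_bit models bit 5 of the second CIA port: 1 = output, 0 = input.
    """
    if direction_bit == 0:
        # Input mode: the drive never drives the line in this model.
        return False
    return drive_output_level == 0


# Drive in output mode sending a 0, host idle: the line goes low.
print(bus_level([drive_pulls_low(1, 0), False]))
# Drive in input mode: its output level no longer reaches the bus.
print(bus_level([drive_pulls_low(0, 0), False]))
```

Because nobody ever actively drives the line high, two devices "talking" at once can only produce a low, never a short circuit.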

Bo Zimmerman's archive even has the source for the 1581 ROM, which I am sure will be invaluable as I go about debugging this.

I now have everything tied together as a test suite called by make test-iec-c128, that uses tb_iec_serial_1581.vhdl as the top-level.  I'm running it now, and expecting lots of tests to fail.  It will be interesting to see if any that use the 1581 work, through sheer luck. And indeed lots of things are failing.

First stop will be to make sure I am allowing enough time for the 1581 to get to its IDLE loop, before asking it to do anything.  It's possible it will get stuck asking the floppy controller things, or testing its internal RAM or something like that.

It looks like the CPU gets going, but it gets stuck in an IRQ loop. The CIA is the only thing that can trigger an IRQ, so why is it doing that? It looks like Timer A is enabled, and set in continuous run mode, with a counter value of 6. Thus it will try to trigger an IRQ every 6 cycles.  This is not what the ROM is probably trying to do, except it seems to be pretty much what the initialisation routine at $AF26 does: Enable Timer A, tick source = bus clock (2MHz), and enable Timer B underflow IRQ.

Ah! This might be a bug in our CIA implementation. Indeed it is: For some reason we have our 6526 CIA configured to have Timer A interrupts enabled on power on.  This should not be the case.
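For reference, here is a minimal Python sketch of how I understand the 6526 interrupt mask to behave: the mask is empty after reset, and a write with bit 7 set enables the selected sources, while a write with bit 7 clear disables them. The class and method names are hypothetical, and the model leaves out everything else the interrupt control register does (such as clearing flags on read).

```python
class InterruptControl:
    """Simplified model of a CIA interrupt mask and flag register."""

    TIMER_A = 0x01
    TIMER_B = 0x02

    def __init__(self):
        self.reset()

    def reset(self):
        # After reset no interrupt source should be enabled.
        self.mask = 0x00
        self.flags = 0x00

    def write_mask(self, value):
        bits = value & 0x1F
        if value & 0x80:
            self.mask |= bits   # bit 7 set: enable the selected sources
        else:
            self.mask &= ~bits  # bit 7 clear: disable the selected sources

    def raise_source(self, source):
        self.flags |= source

    def irq_asserted(self):
        return (self.flags & self.mask) != 0


icr = InterruptControl()
icr.raise_source(InterruptControl.TIMER_A)
print(icr.irq_asserted())  # Timer A underflowed, but it is masked off
icr.write_mask(0x80 | InterruptControl.TIMER_B)
icr.raise_source(InterruptControl.TIMER_B)
print(icr.irq_asserted())  # Timer B is enabled, so the IRQ line is asserted
```

With Timer A wrongly enabled at power-on, the first `irq_asserted()` call above would already be true, which is the IRQ storm the ROM was getting stuck in.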

Now we have something really interesting going on:  It looks like the 1581 ROM uses the BRK instruction on purpose!

brk_controller
      brk
      nop
1$    lda  jobs,x
      bmi  1$

It looks like it uses the BRK instruction as a "convenient" way to ask for translation of a logical track and sector into physical ones.  But I'm not 100% sure.

I am sorely tempted to just mask out the BRK instruction, since I don't really care about having the drive work fully, just boot up enough to talk on the IEC bus.

Hmm.. That doesn't work, because it seems the 1581 uses BRK to switch from DOS to controller personalities.  I'm reluctant to implement it, as I purposely use the fact that BRK is unimplemented in this CPU to detect when the CPU goes rogue. 

I've worked around it by masking out much of the routine in question, and just leaving the other funny stack operations it has in it. It now gets further, but without a ROM disassembly, it's a bit of a pain to figure out if the drive is in the idle loop, or if something else is going wrong. In any case, the tests are still failing.

Meanwhile, @johnwayner on our Discord server has had the chance to plug a JiffyDOS-enabled drive into a MEGA65 with a core with the new IEC controller enabled, and tried to test it for me.  Unfortunately it doesn't work quite correctly yet.  When trying to read the DOS command channel, which should give messages like this:

73, JIFFY DOS X.Y,00,00

and 

00, OK,00,00

It is giving rubbish. Well, not exactly rubbish, as we have quickly determined that it repeats with the correct interval for the 00, OK,00,00 message.  

All in hex, this is what we see:

A1 A0 A3 A6 A6 A5 A5 A6 A5 A7 A4 A0 A1
A3 A0 A0 A0 A1 A1 A0 A3 A0 A0 A3 A0 A0
93
A0 A0 A3 A0 A7 A6 A3 A0 A0 A3 A0 A0 93
A0 A0 A3 A0 A7 A6 A3 A0 A0 A3 A0 A0 93
...

The first obvious thing, is that the upper 4 bits are almost uniformly 1010 (= A in hex).  Can we gain some insight into what is going wrong here by comparing the actual to expected values?

Looking at the following table:
 

  HEX                BINARY
--    --   -----------    -----------

                 vv                vv
30 vs A0 : 00 11 00 00 vs 10 10 00 00
2C vs A3 : 00 10 11 00 vs 10 10 00 11
20 vs A0 : 00 10 00 00 vs 10 10 00 00
4F vs A7 : 01 00 11 11 vs 10 10 01 11
4B vs A6 : 01 00 10 11 vs 10 10 01 10
0D vs 93 : 00 00 11 01 vs 10 01 00 11
           ^^                   ^^

It looks suspiciously like we are waiting ~2x longer for every interval: We read the second pair of bits of the byte as the first pair of bits (the columns marked ^^), and the 4th and final pair of bits, we read as the 2nd pair of bits (the columns marked vv).

The 3rd column from right, is almost always 10, which looks suspiciously like the status bits that come at the end of a JiffyDOS byte transfer. Notice the only difference is the 01 for the $0D, which is the carriage return at the end of the DOS status message, that I believe is sent with EOI, and that's exactly what this bit pair would mean if it was read as the JiffyDOS byte status bits.
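The pattern in the table can be checked mechanically. The following Python sketch takes one of the repeating 13-byte lines from the hex dump and compares it against the bytes of 00, OK,00,00 followed by a carriage return (my assumption for what that line should contain). It tests that the third observed bit pair equals the first expected pair (the ^^ columns), and the fourth observed pair equals the third expected pair (the vv columns). The helper names are hypothetical.

```python
def bit_pairs(byte):
    """Split a byte into four 2-bit pairs, leftmost pair first."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


# Assumed expected message: "00, OK,00,00" plus carriage return ($0D).
expected = b"00, OK,00,00\r"
# One of the repeating lines from the hex dump above.
observed = bytes.fromhex("A0A0A3A0A7A6A3A0A0A3A0A093")


def matches_hypothesis(exp_byte, obs_byte):
    e = bit_pairs(exp_byte)
    o = bit_pairs(obs_byte)
    # ^^ columns: expected pair 1 shows up as observed pair 3.
    # vv columns: expected pair 3 shows up as observed pair 4.
    return o[2] == e[0] and o[3] == e[2]


all_match = all(matches_hypothesis(e, o) for e, o in zip(expected, observed))
# The pair that looks like the JiffyDOS status bits (3rd column from the right).
status_pairs = [bit_pairs(o)[1] for o in observed]

print(all_match)
print(status_pairs)
```

Every byte fits the pattern, and the suspected status pair is 10 for all bytes except the final carriage return, where it is 01.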

So how can we be waiting 2x longer than we should, when all the VHDL simulation tests are passing?

This would require one of the key clocks in the test framework to be wrong.

It seems unlikely to be the micro-second clock in iec_serial.vhdl, since the simulation seems to correctly track the time elapsed compared with that of the delays in the wait_micro() calls. 

It also seems unlikely that the clock being fed into iec_serial.vhdl in synthesis is different, because that would break all sorts of things.

But I am still curious to make sure that the 81MHz and 40.5MHz clocks in simulation are really ticking at the right rate. Because if they aren't, and I've made wait_micro() behave correctly with an incorrect clock, that could explain everything.

I have just confirmed that the 40.5MHz clock is ticking at the right rate in the simulation, and as it is the 81MHz clock / 2, the 81MHz clock must also be fine.

So what about the 1MHz clock?

Yes, that's ticking at the right rate, too.

The micro second timer in iec_serial.vhdl is also ticking at the right rate.

I was wondering if the drive with JiffyDOS wasn't running at 2MHz, but it was a 1541, so that can't be an issue, either.

Perhaps to help debug this, I should make a command that toggles the IEC lines at a fixed expected frequency. Then the actual speed can be probed on the bus using an oscilloscope.  That's probably a good idea.

Well, that's surprising: On real hardware, it really does wait twice as long as it should in the calls to wait_micro()! Why, I don't really have a clear explanation right now. The logic is really quite simple: It just counts 81 cycles of the 81MHz pixel clock.  I really can't see how this is happening.  Anyway, to further explore the issue, I am adding an option to allow selection of 1x or 2x speed for these delays, so that it can be compensated for at run-time.

@johnwayner kindly tested this new bitstream overnight, and with the 2x speed setting, it is quite a lot better:

RUN
DEVICE 8 TALK
SECONDARY ADDRSS 15
TURN AROUND TO LISTEN
READ DOS STATUS
73,JIFFYDOS 5.0 1541,03,00
00, OK,30,00
00, OK,00,00
00, OK,00,00
00, OK,00,0000, OK,03,00
00, OK,00,00
00, OK,00,0000, OK,00,00
00, OK,00,00
00, OK,00,00
00, OK,03,00
00, OK,00,0000, OK,00,00
00, READY.

We can actually see the bytes, and most of them are correct, even ;)

I'm suspecting that errors are because the 2x mode couldn't provide exactly 2x speed up, because you can't divide 81 by 2 and get an integer.  I've tried a quick little fix to compensate for this, but if that doesn't work, I'll just make the division loop a bit more accurate by tracking the remainder between clock ticks to bring it into line... but first, it's nap time, as I am still getting over this cold.
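The remainder-tracking idea can be sketched in a few lines of Python. Instead of counting to a fixed number of cycles, the divider adds the desired tick rate to an accumulator on every 81MHz cycle, and emits a tick whenever the accumulator reaches 81, carrying the remainder forward. The function and parameter names are hypothetical, and this is only a model of the technique, not the VHDL that ended up in the controller.

```python
def tick_cycles(total_cycles, clock_mhz=81, ticks_per_usec=2):
    """Return the cycle numbers at which a fractional divider emits a tick."""
    ticks = []
    remainder = 0
    for cycle in range(1, total_cycles + 1):
        remainder += ticks_per_usec
        if remainder >= clock_mhz:
            # Keep the leftover so the error never accumulates.
            remainder -= clock_mhz
            ticks.append(cycle)
    return ticks


ticks = tick_cycles(81 * 1000)  # 1000 microseconds of 81MHz cycles
intervals = {b - a for a, b in zip(ticks, ticks[1:])}
print(len(ticks))
print(sorted(intervals))
```

Individual ticks alternate between 40 and 41 cycles apart, but over any whole number of microseconds the count is exact, which is what matters for the protocol timing.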

@johnwayner and @spairhaid/BAS on Discord have been kindly helping with testing on real drives, which has been great.  Partly to support them, I've also written up the initial documentation for the controller to appear in an appendix of the MEGA65 Chipset Reference, and also of course in the "big book".

That weird 2x speed issue is still causing problems: I tried making it default to the modified speed, so that JiffyDOS communications will work out of the box, but now apparently turn-around from talker is failing for the normal protocol, regardless of the speed setting.

So let's try to find the root cause of this.

There is this highly suspicious warning from Vivado (emphasis added):

WARNING: [Synth 8-5787] Register usec_toggle_reg in module iec_serial is clocked by two different clocks in the same process. This may cause simulation mismatches and synthesis errors. Consider using different process statements  [/home/paul/Projects/mega65/mega65-core/src/vhdl/iec_serial.vhdl:276]

That sounds quite plausible as the source of the problem. So let's go and fix that.

Okay, I think I found it :) I was doing this:

usec_toggle <= last_usec_toggle

when what I should have been doing was:

last_usec_toggle <= usec_toggle

Which would have caused glitching on the usec_toggle line, and fully explains the weirdness we are seeing.

This is because usec_toggle gets set in the 81MHz clock domain (to get the exact 1MHz signal, since 40.5MHz isn't a whole number of MHz). Then, in the 40.5MHz domain, it was being modified, as well, because I had that assignment backwards! 

This slipped through for so long because it mostly works: The block that this assignment was in just cared whether the two signals had different values or not, so setting one to the other, or the other way around, it didn't care. 

The problem came when it was doing this assignment at the same time that the 81MHz domain was trying to modify it -- which occurs about 1/2 the time, since 81MHz / 40.5MHz = 2.  Then it would be a bit of a race to see which assignment would take effect, which is partially indeterminate.

Oh such hilarities!

The real root cause is that the MEGA65 core still generates way too many warnings when building, so we tend to ignore them.  We should do something about that at some point.

Okay, so that has fixed the ~1/2 speed thing. Super. All the simulation based tests are still passing, but it's now failing with a real 1541 when it gets to the turn-around from talker to listener.  It works fine with JiffyDOS-equipped drives still, though, so it must be something quite marginal.

@johnwayner and @spairhead/BAS on Discord are continuing to be fantastic, providing support with testing while I don't have access to hardware, nor to the full range of Commodore disk drives, with and without JiffyDOS, that they have on hand.

To help resolve this issue, it has spurred me into writing some decent documentation for the IEC controller, and also making my own versions of the IEC timing diagrams that appear in the C64 Programmer's Reference Guide, including corrections to some of the timing requirements that I noticed while implementing this controller. 

At a personal level, I don't really enjoy making these diagrams, as I find drawing programs a bit frustrating, because it takes me (what feels like) ages to make anything in them -- even with something as good and powerful as Inkscape. But I also know that in the process of making these diagrams, I also learn a ton.  

And I have noticed that, in all the existing reference material for the Commodore serial bus, no one seems to have gone to the effort of making updated versions of these charts and their timing requirements. The closest that any of it comes at the moment is the great work by Michael Steil in his blog posts about the Commodore serial bus, and the associated git repository. But while Michael has made a couple of very useful timing diagrams of the protocol, they don't cover the full protocol, nor indicate the true timing requirements.

The hardware IEC controller is almost uniquely positioned both to support this and to use this effort to debug its remaining problems: Because it can switch and sense the IEC lines with a precision of <50ns, it will be possible to test the full range of each timing value, to discover when things actually break, as compared to what is claimed in the Programmer's Reference Guide. I can make this much faster and easier to do, by making each of those delays run-time configurable, so that I can write a BASIC65 program that can test each parameter, to determine what the real limits are.

So I'll work through each of the timing diagrams in turn, and pull out the claimed timing tolerances, and make those all run-time adjustable, and I hope, get to the bottom of this regression fairly quickly in the process.
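The sweep itself is simple. The Python sketch below shows the shape of it: try each candidate value for one timing parameter, from smallest to largest, and report the first one for which a transfer succeeds. Everything here is hypothetical: `find_minimum_working` is an illustrative name, and `fake_transfer_ok` is a stand-in with a made-up 60 usec threshold, where the real test would set the delay in the controller and attempt a transfer with an actual drive.

```python
def find_minimum_working(values, transfer_ok):
    """Return the smallest value for which transfer_ok(value) is True, else None."""
    for value in sorted(values):
        if transfer_ok(value):
            return value
    return None


# Stand-in for real hardware: the 60 usec threshold is invented for the demo.
def fake_transfer_ok(delay_usec):
    return delay_usec >= 60


print(find_minimum_working(range(0, 101, 5), fake_transfer_ok))
```

Since the failures are marginal, a real sweep would probably want to repeat each setting several times before declaring it good.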

The starting point is the turn-around from talker to listener, as that's the point where it actually gets stuck (although I'm now suspecting the problem is actually with sending bytes under attention, which will be the second diagram I will do).

Here is the turn-around from talker to listener diagram I have made:

For those of you familiar with the diagrams in the C64 Programmer's Reference Guide, you will notice that I haven't included the transfer of a whole byte on either side, as this isn't necessary, and otherwise results in the image being a bit cramped. I really wanted to demystify the whole protocol, thus the fairly generous spacing.  I'll likely make a combined diagram later, that shows a complete sequence of transactions stitched together, so that the reader can also get the big picture.

For each of these, I'm also building a table with the timing constraints for each of the labelled intervals.  In the C64 PRG, these are all collated at the end, which means you have to flip pages to be able to match them up. To avoid this, I'm going to display them on the same page. Here's the one for this diagram:

Following this, I have a set of notes where required:

* TDA is provided by the peripheral, not the controller. The 4541 is able to respond much faster than a C64 to serial bus events, and thus requires only 4 usec to ensure the CLK line has time to rise to 5V. For controllers using software implementations of the protocol, such as the C64, the minimum is 80 usec. Therefore peripherals should always use a value of at least 80 usec for TDA.


^ TR is claimed in the Commodore™ 64 Programmer’s Reference Guide to have a minimum duration of 20 usec. We see no evidence that would suggest that any implementation requires a delay before the ATN line can be released following the transfer of a byte.

I've also added commands to the $D698 command register to allow setting each of those timing values.

So that's that one. Now to do sending bytes under attention, which is where I am suspecting the current bug really lies. So let's make that timing diagram:

A couple of things to note: First, the yellow and blue bars are just there to make it visually easier to find out which half of the clock pulses they are referring to. Second, there are a few timing constraints listed here that are not described in the C64 Programmer's Reference Guide, and one that I have changed the name of:

TAC is the time between pulling ATN low, and pulling CLK low.  I'm pretty sure I've seen some problems if they are pulled down simultaneously.

TAL is the time after the first peripheral has responded to the ATN requests by pulling DATA to 0V, before releasing the CLK signal.  It can probably be zero, but nonetheless I've made it adjustable, so that we can confirm this.

Finally, THA is the same as TH, but in ATN bytes. My IEC controller does enforce a non-infinite timeout for this.

With these definitions, there is no part of the transaction that we do not have the timing clearly specified for.

I've set about making all of these timing parameters adjustable, so that we can debug the problem.

In the process, I noticed that the bit setup and valid hold times were 35 micro-seconds each, and I recalled that this is one of the really timing sensitive parts, especially for the 1541 with its 1MHz CPU.  This also tallies with the problem not showing up with a 1581, which has a 2MHz CPU.

So hopefully that will be the root cause resolved. If so, I'll have to go through and make sure I have the bit timings correct for non-ATN TX as well. Interestingly, I had 70 usec for that case!

Okay, I've gone through and parameterised almost all of the delays in the IEC controller, so that they can be controlled at run-time. This will allow us to more easily try fixes.  The current state of play is that 1541 and 1571 drives work fine, but 1581s (whether or not with JiffyDOS) are jamming up when trying to read the DOS status.

@johnwayner has done a great job quickly hacking a waveform visualiser for the IEC bus that uses the data logger that I have built into the IEC controller. It shows nice waveforms like this:


I think it's pretty cool that we can view high-speed signal waveforms in PETSCII on an 8-bit computer, using BASIC!

That capture above is an example of when the communications with the 1581 locks up.  The following is an example of when it was working, before I reworked all the timing stuff:

The first key difference I can see is that during the first byte transfer, in the failing case, it reaches state 127 quite early on -- this results in a DEVICE NOT PRESENT error, which would certainly explain things.  State 127 is when the controller is waiting for a drive that has already begun to respond, to indicate that it is ready to receive the byte under attention.  This suggests to me that T_HA is either too short, or not being properly waited for.

In the process of digging around the code, I realised I was using T_H instead of T_HA here, and that T_HA was not being rescaled from milli-seconds to micro-seconds.  That combination shouldn't be a problem, but it should get fixed anyway.

Actually, with that change, T_H doesn't get read anywhere at all! This is because we really do allow infinite timeout waiting for a device to indicate it is ready for the next byte, which we should.

To debug whether the delay was being properly applied, I have modified the data logger to also capture the duration of each state.  I've also dug up my old 1581, which I am sure was not working last time, but seems to be quite happy today.  Given that the 1581 is the most cantankerous of the drives we have tested with this new controller, this might just have been some of the marginal IEC timing problems in the C65 ROMs making it look like the drive was faulty.

Anyway, that's all chugging away re-synthesising, and I'm meanwhile working on the documentation.  It's now about 32 pages, complete with timing diagrams for JiffyDOS, which I haven't seen elsewhere. Hopefully they will be of use to others as well:

 




If you want the full timing specifications and explanations, you will need to grab the books from https://mega65.org/docs, but they will probably take a while to be updated to include this new content.  Look for the MEGA65 Compendium or the MEGA65 Chipset Reference Guide to find it. Or you can build your own PDFs from the open-source documentation repository for the MEGA65.

But in the meantime, it's wait until morning when the folks on Discord are able to help me finish tracking down the 1581 stability problems.

@johnwayner has updated the data logger program, so that it now shows the number of cycles in a particular bus state, so that I can try to debug further.

The following video shows how easy it is to do all this completely on a MEGA65, which is really nice:

 


The very first part of the transfer reaches state 157, which is a timeout. This is quite likely the root cause.  The issue seems to be in state 124, where it waits for only 251 cycles at 40.5MHz (about 6 microseconds), instead of up to 64 milliseconds. This is the T_HA delay.  That would certainly explain things. 

Now, the question is why T_HA is not waiting its full duration. The logic allows for delays of up to 2^31 cycles, and the way that T_HA is defined in milliseconds, should end up with it being multiplied by 1024 (because that's cheaper than multiplying by 1000 in hardware).  Yet we get 251, instead of 64*1024 = 65,536 micro seconds.
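The numbers involved are easy to sanity-check. This Python sketch redoes the arithmetic: converting milliseconds to (approximate) microseconds with a shift by 10 bits, converting that to 40.5MHz cycles, and converting the observed 251 cycles back to microseconds. The helper names are hypothetical and only illustrate the calculation.

```python
CLOCK_MHZ = 40.5  # cycles per microsecond in the controller's clock domain


def ms_to_usec_shift(ms):
    """Approximate ms -> usec by multiplying by 1024 (a left shift by 10)."""
    return ms << 10


def cycles_to_usec(cycles):
    return cycles / CLOCK_MHZ


t_ha_usec = ms_to_usec_shift(64)
t_ha_cycles = int(t_ha_usec * CLOCK_MHZ)

print(t_ha_usec)                 # intended T_HA in microseconds
print(t_ha_cycles)               # the same delay in 40.5MHz cycles
print(t_ha_cycles < 2 ** 31)     # fits within the 2^31 cycle limit
print(round(cycles_to_usec(251), 1))  # what 251 cycles really amounts to
```

So the intended delay is roughly ten thousand times longer than the 251 cycles that were logged.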

I'll do a quick test under simulation to see if the value is being calculated correctly and passed into the delay routine. Yes, it does seem to.

Ah, the error in reading the durations might be wrap-around in the counter, because it counts clock cycles, not micro-seconds. Except that that can't be it, either, because I was already sensible enough to clamp it to stay at 65,535 if the delay is longer.

I am wondering if the problem might not be very short glitches on the IEC bus lines. This could trick the controller to exiting T_HA if DATA glitched high briefly, but then when it would go and look at the line, it would be low again.  I can fix this by putting in some logic to de-bounce the IEC lines, which I should probably have done long ago, anyway. Except that the data logger would log those events, so it can't be that. But I think I may need to patch the data logger to show the upper 8 bits of the delay.
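For what it's worth, de-bouncing a line is not much logic. The Python sketch below only accepts a new level once it has been seen for a number of consecutive samples, so a single-sample glitch is ignored. The function name and the `stable_count` of 3 are arbitrary choices for illustration, not values from the controller.

```python
def debounce(samples, stable_count=3, initial=1):
    """Return the de-bounced level after each input sample."""
    output = initial
    candidate = initial
    run = 0
    result = []
    for sample in samples:
        if sample == candidate:
            run += 1
        else:
            # The level changed: start counting the new candidate afresh.
            candidate = sample
            run = 1
        if candidate != output and run >= stable_count:
            output = candidate
        result.append(output)
    return result


# DATA held low, one brief glitch high, then a genuine release to high.
line = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(debounce(line, stable_count=3, initial=0))
```

The cost is a small fixed latency of `stable_count` samples on every genuine edge, which would need to be accounted for in the tightest timing parameters.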

The other thing I am wondering, is if the 1581 doesn't like CLK being left high so long after ATN is asserted. That's T_AC, and I can quite easily try adjusting that down to see if it solves the problem.  It's striking me as the most likely cause right now.

Well, @johnwayner has just reported that it works fine, provided we allow more time for the 1581 to reset, or don't reset the drive at all before starting. That is, no changes to T_AC required or anything else like that.  This is odd, since I am sure I tried increasing the delay to 10 seconds, and it still didn't work for me.  But he has tried it 10x in a row, and every time it has worked.

Meanwhile, in our investigating of this, and the fact that the 1581 seems to do something on the IEC bus when it powers up, I discovered a nice list of 1581 ROM routine locations in chapter 10 of the 1581 user's guide.  So I can give it another go to get 1581 simulation working for automated tests.

So let's follow the boot process through a bit. 

The only vector it hits is the one for the DOS error routine, $FF3F, at instruction #1569 to report error 74, i.e., drive not ready. This is not surprising, given the simulated drive doesn't have a disk in it -- or even a working floppy controller. What is surprising, is that it never seems to exit.

Anyway, that mystery can wait for another day, since we have the 1581 working in real-life, and these recent blog posts have had a habit of ending up super-long.

