Wednesday, 3 January 2024

Hardware-Accelerated IEC Serial Bus -- Part 3 - JiffyDOS

In the previous post, I got the standard C64/1541 version of the protocol working to at least the point where I could query the DOS status from a real 1541.  But we also want it to work with JiffyDOS, and with the fast serial mode of the 1571 and 1581 drives. Those are two quite different extensions to the standard protocol.

As I was working on all this, I had a nice chat with Gideon, the maker of the Ultimate64 and other goodies for the C64. Gideon is very talented technically, and also just a really nice guy. Anyway, we were chatting about how the Ultimate 1541 II+ implements its hardware-accelerated IEC serial bus, and Gideon kindly offered to let me make use of that implementation.

If this had occurred to me a few months earlier, I probably would have gone down that path. But my own implementation is now well advanced, and it is designed to be the bus master rather than a bus slave, so I'm currently planning to finish it off for use in the MEGA65. That said, it is very helpful to know that there is a good VHDL implementation of the slave side that is quite light on FPGA resources. For example, I could use it in my VHDL simulation-based tests -- at least for the slow and JiffyDOS protocols, because Gideon's implementation doesn't currently implement the C128 fast protocol.

I'm planning to work on the JiffyDOS protocol next, anyway. JiffyDOS is pretty amazing, really, in what it achieved. The 1541 and C64 together were horribly, horribly slow. Commodore fixed this with the C128, fairly credibly in fairness, by using a hardware serial shift-register to speed things up. But then some clever folks came up with JiffyDOS, which works on a stock C64 and 1541 that lack working hardware serial shift-register plumbing into the IEC bus, and so have to do things the hard way -- and yet they ended up with a system that is faster than even the C128 fast serial protocol. JiffyDOS is still available to buy, because it is so darn handy.

JiffyDOS obtains its speed-up through a few different tricks. The main one is that it replaces Commodore's horrible original serial protocol, which limits the data rate to about 1 bit per 100 usec, because each and every bit is handshaked in a horribly slow (and still not that reliable) way. JiffyDOS instead uses both the CLK and DATA lines to carry data, and relies on accurate timing at each end in place of the handshaking. To do this, it has to partially scramble the order of the bits being sent, to keep to the ~20 usec per bit-pair that it sends. Together, this allows approximately 10x faster transfer than the original.
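As a quick sanity check on that "approximately 10x" figure, here is a back-of-the-envelope calculation in Python using only the two numbers above (about 100 usec per bit for the original protocol, about 20 usec per bit-pair for JiffyDOS). It deliberately ignores per-byte handshaking, inter-byte gaps and EOI signalling, so it is an idealised ratio rather than a measured one.

```python
# Idealised per-byte bit-transfer times, using the approximate figures above.
# Per-byte overheads (handshakes, inter-byte gaps, EOI) are ignored.
ORIGINAL_US_PER_BIT = 100      # ~1 bit per 100 usec, every bit handshaked
JIFFY_US_PER_BIT_PAIR = 20     # ~20 usec per pair of bits (one on CLK, one on DATA)


def original_us_per_byte() -> int:
    return 8 * ORIGINAL_US_PER_BIT


def jiffy_us_per_byte() -> int:
    # 8 bits are sent as 4 pairs.
    return (8 // 2) * JIFFY_US_PER_BIT_PAIR


def speedup() -> float:
    return original_us_per_byte() / jiffy_us_per_byte()


print(original_us_per_byte(), jiffy_us_per_byte(), speedup())  # 800 80 10.0
```

In practice the real gain depends on the per-byte overheads this leaves out, which is where tricks like the LOAD optimisation come in.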

Another trick it uses is to optimise LOADing even further. That provides a further speed-up of perhaps another 20-30%. We might implement that in the fullness of time, but not initially, as it's not required, and the marginal gain is relatively slight. A similar situation applies to the C128 fast serial protocol, which has a "burst mode" that further increases the speed of things.

The first step is to buy a JiffyDOS license (US$8) for the emulated 1541 in my tests, and then build tests around it. I'll probably keep all the tests the same as for the slow protocol, and just use the different ROM. The ROM itself will not end up in the open-source repository.

I'm preparing these tests and running them on my build-server now.  I'm expecting one or more of them to fail, because my IEC controller already claims to support the JiffyDOS protocol, but doesn't actually implement it.  In the process I've also added support for running the tests in parallel, since my new build server has 8 performance cores, each with Hyper-Threading, compared to the sad old dual-core processor in my laptop.  This nicely reduces the total run time for the tests down to ~25 seconds, instead of about 200 seconds.

Anyway, as expected, all but the simplest tests now fail.  

Let's start with the ATN test for a non-connected device, which was failing to report a DEVICE NOT PRESENT error. It was failing because it seems that the JiffyDOS ROM asserts the /DATA line on the IEC bus during booting. I've now fixed this test to hold the 1541 under reset when it isn't supposed to be present.

Next, the ATN test with a dummy device. This needed a similar change, to hold the simulated 1541 under reset during the boot sequence.

So we now have 5 of the 9 tests passing, with the remaining four being the ones that actually require reading from the drive. Let's start with the simplest case: reading from the DOS command channel.

This gets into the routine to read from the drive, i.e., we expect the drive running JiffyDOS to try to do this using the JiffyDOS fast protocol, rather than the original slow protocol. Pleasingly, we see that the JiffyDOS ROM is indeed trying to do this, as it enters the routine at $FF79, as indicated in the documentation about the JiffyDOS protocol that Gideon sent me (I haven't attempted to disassemble the JiffyDOS ROM).

However, oddly, the IEC controller is not able to detect that the drive supports JiffyDOS. This turned out to be because I was putting the JiffyDOS detection pulse one bit early. With that fixed, the JiffyDOS ROM recognises it. What is odd, though, is that even when the pulse was in the wrong place, and the ROM didn't seem to be recognising it, the JiffyDOS ROM still seemed to think it should send the byte using the JiffyDOS protocol. I'll have to investigate that a bit further.

What I might do here is add an option to stop the controller from offering the JiffyDOS (or C128 fast) protocol, and confirm that it behaves correctly in that case.

Anyway, fixing the JiffyDOS protocol detection bug has now caused two of the tests to fail. The test for the ATN sequence with a dummy drive is the first of those regressions. It turns out I still had the JiffyDOS detection logic a bit messed up. Fixing that corrected the regressions, and now we are back to the point where I believe the 1541 is trying to talk JiffyDOS to the controller.

Before I proceed further, I might just add an extra test that confirms that disabling the JiffyDOS offer works correctly. In the process of doing that, I have also refactored the tests to use a bunch of little procedures that really simplify the test definitions. So now the test definition in VHDL looks a bit like this:

elsif run("ATN Sequence with no device gets DEVICE NOT PRESENT") then
  -- Hold 1541 under reset, so that it can't answer
  f1541_reset_n <= '0';

  POKE(x"D689",x"28"); -- Access device 8 (drive is device 11, so shouldn't respond)
  POKE(x"D688",x"30"); -- Trigger ATN write

  wait_a_while_until_done(400000);

  fail_if_DEVICE_NOT_PRESENT;
  fail_if_TIMEOUT;

I am very happy with how semantically clear this kind of test definition is. It also makes it easy to convert the tests to BASIC for testing on real hardware, whenever that's required.

Anyway, back to dealing with the failing tests now that the refactor is done... First is the ATN command sequence with no device present. That was a regression from the refactoring of the tests. See if you can spot the 2 bugs in the test definition above :)

With that fixed, we are now up to testing the reading of the DOS error channel, which we expect to fail until we implement the JiffyDOS byte transfer routines in both directions.  Again, I might fork these tests to JiffyDOS and non-JiffyDOS versions.

Okay, I've forked that off, but all of the tests that read the DOS command channel status fail -- whether JiffyDOS is enabled or not. So I must have messed something up in the various refactorings.  Oh well. But that's why I like to have lots of tests -- any regressions are easily caught, and relatively easily corrected with the support of good debug tools.

Now, a good disassembly of the JiffyDOS ROM for the 1541 would be helpful here. But since I don't have one, I am instead switching to the normal 1541 ROM during debugging, since the test fails without JiffyDOS, too. That way my little tool can still print correct information about IEC transfers.

I did chase my tail for a while doing this, because something in VUnit likes to keep previously built files, so it was silently using the JiffyDOS ROM I had converted to a VHDL file, even when I was asking for the normal ROM.

With that out of the way, it looks like I am releasing the ATN line too early after sending the secondary address. As a result, the 1541 falls out of the secondary address handling logic back to idle, and thus is not expecting the turn-around when it comes. I think I had added the ATN release to make the test for the debug RAM in my IEC controller work. I'm not sure why it took me so long to find the problem here. Anyway, it's fixed, and all tests pass for the non-JiffyDOS simulated 1541 again.

The next hurdle was the JiffyDOS-disabled test, but with a JiffyDOS ROM in the 1541. This turned out to be failing, because my test was looking for "73,C" when checking it could read the command channel. But JiffyDOS ROMs return "73,JIFFY DOS...".  This one was simple to fix: Only look for the "73," part.

So now all the remaining failing tests are those that actually require a working JiffyDOS protocol implementation -- not just the detection of it.

First up we need the RX routine to receive a byte from a JiffyDOS enabled drive. Finally, into the meat of JiffyDOS!

Some of the best documentation of the JiffyDOS protocol is in the MEGA65 OpenROMs project, including a working reference implementation, with such nice routines as jiffydos_rx_byte, which yields the following general summary of the protocol:

1. Wait for drive to release the CLK line.
2. Release DATA line to tell device it can start sending.
3. Wait 15 usec
4. Receive 2 bits on CLK and DATA
5. Wait 10 usec
6. Receive 2 bits on CLK and DATA
7. Wait 11 usec
8. Receive 2 bits on CLK and DATA
9. Wait 11 usec
10. Receive 2 bits on CLK and DATA
11. Wait 11 usec
12. Receive 2 status bits on CLK and DATA
13. Wait 4 usec.
14. Pull DATA to 0V to tell drive to wait before sending next byte
15. If CLK was low in status read, then byte was successfully read, and terminate.
16. If DATA was low, it's EOI.
17. If DATA was high, it's a timeout error (i.e., the drive probably didn't send anything).

All the bits are sent inverted, so we have to invert them on reception.
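To make the receive sequence concrete, here is a minimal Python model of the sampling schedule and the status decode from the steps above. It is a sketch, not the OpenROMs code: it assumes each wait runs from the previous sample, starting when DATA is released in step 2, and the pair-to-bit order (PLACEHOLDER_PAIR_ORDER) is a hypothetical placeholder, because the steps don't say which byte bit travels on which line in each pair.

```python
from enum import Enum
from itertools import accumulate

# Waits (usec) from steps 3, 5, 7, 9 and 11: before each data pair, then
# before the status pair. Assumes each wait runs from the previous sample,
# starting when DATA is released in step 2.
RX_WAITS_US = [15, 10, 11, 11, 11]
RX_SAMPLE_TIMES_US = list(accumulate(RX_WAITS_US))  # [15, 25, 36, 47, 58]

# HYPOTHETICAL placeholder: which byte bit each (CLK, DATA) sample carries.
# The real JiffyDOS order is not given in the steps; see jiffydos_rx_byte
# in OpenROMs for the actual mapping.
PLACEHOLDER_PAIR_ORDER = [(0, 1), (2, 3), (4, 5), (6, 7)]


class RxStatus(Enum):
    OK = "ok"
    EOI = "eoi"
    TIMEOUT = "timeout"


def decode_status(clk_high: bool, data_high: bool) -> RxStatus:
    """Steps 15-17, checked in the order they are listed."""
    if not clk_high:
        return RxStatus.OK
    if not data_high:
        return RxStatus.EOI
    return RxStatus.TIMEOUT


def decode_byte(samples, pair_order=PLACEHOLDER_PAIR_ORDER) -> int:
    """samples: four (clk_high, data_high) readings, one per data pair.
    Bits are sent inverted, so a line read low (0V) is a 1 bit."""
    value = 0
    for (clk_high, data_high), (clk_bit, data_bit) in zip(samples, pair_order):
        if not clk_high:
            value |= 1 << clk_bit
        if not data_high:
            value |= 1 << data_bit
    return value
```

For example, `decode_byte([(False, True)] * 4)` gives $55 with the placeholder order: CLK low in every pair sets bits 0, 2, 4 and 6. Swapping in the real pair order only changes the table, not the decode logic.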

This is a much simpler protocol than Commodore's slower protocol, in both hardware and software implementations, and it didn't take me long to write something that looks plausible.

In the process of testing, I discovered that JiffyDOS only reports itself when being sent a TALK or LISTEN command, not, for example, a secondary address. So I need to take this into account when updating the device capability flags. With that, I now have both the JiffyDOS ROM and my IEC controller thinking that they need to communicate using the JiffyDOS protocol.
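The capability-flag rule can be sketched like this. The command-byte ranges are the standard IEC ones (LISTEN $20+device, UNLISTEN $3F, TALK $40+device, UNTALK $5F, secondary address $60+channel, CLOSE $E0+channel, OPEN $F0+channel), and the bit positions follow the iec_devinfo(6) (JiffyDOS) and iec_devinfo(5) (C128 fast) tests in state 400 of the VHDL controller. The function names are hypothetical; this is a Python model of the idea, not the controller's actual logic.

```python
JIFFYDOS_BIT = 6    # iec_devinfo(6) in the VHDL controller
C128_FAST_BIT = 5   # iec_devinfo(5)


def classify_atn_byte(b: int) -> str:
    """Classify a byte sent under ATN using the standard IEC ranges."""
    if b == 0x3F:
        return "UNLISTEN"
    if b == 0x5F:
        return "UNTALK"
    top = b & 0xE0
    if top == 0x20:
        return "LISTEN"
    if top == 0x40:
        return "TALK"
    if top == 0x60:
        return "SECONDARY"
    if (b & 0xF0) == 0xE0:
        return "CLOSE"
    if (b & 0xF0) == 0xF0:
        return "OPEN"
    return "OTHER"


def update_devinfo(devinfo: int, atn_byte: int, jiffy_seen: bool) -> int:
    """Only trust the JiffyDOS detection result after TALK or LISTEN;
    for anything else (e.g. a secondary address) keep the existing flag."""
    if classify_atn_byte(atn_byte) not in ("LISTEN", "TALK"):
        return devinfo
    if jiffy_seen:
        return devinfo | (1 << JIFFYDOS_BIT)
    return devinfo & ~(1 << JIFFYDOS_BIT)


def choose_tx_protocol(devinfo: int) -> str:
    """Same priority order as the if/elsif chain in state 400."""
    if devinfo & (1 << JIFFYDOS_BIT):
        return "jiffydos"
    if devinfo & (1 << C128_FAST_BIT):
        return "c128_fast"
    return "slow"


flags = update_devinfo(0, 0x28, jiffy_seen=True)        # LISTEN device 8
flags = update_devinfo(flags, 0x6F, jiffy_seen=False)   # secondary 15: ignored
print(choose_tx_protocol(flags))                        # jiffydos
```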

Of course, it still doesn't work.  Using my debug tooling, it looks like the JiffyDOS ROM doesn't start sending the byte until well after my IEC controller expected the data to arrive.

Ah! This is because I tried to read a byte probably less than one microsecond after asking the 1541 to do something, and its 1 MHz CPU takes time to process things, including finishing its housekeeping after turning around to talk. I could just allow some extra time, but I'm not yet sure I understand how JiffyDOS indicates that it is ready to send the first byte following the turn-around.

It is supposed to be when CLK goes high, but based on the documentation from Gideon, it looks like the JiffyDOS send byte routine releases CLK some time before it is actually ready to send the byte. Adding a 100 usec delay to the end of the talk-to-listen turn-around has helped. Now it seems to receive a byte from the 1541, but it is being read as $00, and flagged as EOI. Either that's true, in which case something else has gone wrong, or the byte RX is still getting confused data.

I just tried adding huge (700usec) delays at the end of the turn-around, as well as to the start of the JiffyDOS receive routine. Neither (nor their combination) solves the problem.

Using the simulation, I can spy on whatever the byte is that is supposed to be sent, to see which it is... and I can see that it is indeed $37, the first character of the DOS status message. So somehow we are still not being properly synchronised with the 1541 JiffyDOS ROM.

Well! After some fiddling around, I realised that my test bench was timing out whenever I extended the delay in the IEC turn-around function. I had previously assumed that the failures as I allowed longer times in the turn-around meant the protocol was not working properly. But instead it was my test harness failing.

Once I made the turn-around test allow much more time, suddenly I was receiving correct bytes using the JiffyDOS protocol! I tried to find the critical time delay required, but the turn-around test was now adding a constant delay, masking the effects.

I've now reworked the turn-around test, to only wait until the READY flag of the IEC controller is re-asserted, so that it doesn't fail if I increase the delay in the turn-around function. Now I can tune it to find the safe time. It needs quite a while -- around 540 usec. I'm going to allow 600 usec, just to be safe.
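The harness change boils down to polling for a condition with a generous deadline, instead of sleeping for a fixed time. Here is a minimal Python sketch of that pattern, using a simulated clock so it runs instantly; FakeController and its 600 usec turn-around are hypothetical stand-ins for the IEC controller and its READY flag.

```python
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     advance: Callable[[int], None],
                     timeout_us: int,
                     step_us: int = 10) -> int:
    """Poll is_ready() every step_us of (simulated) time.
    Returns the elapsed time when READY was seen, or raises TimeoutError."""
    elapsed = 0
    while not is_ready():
        if elapsed >= timeout_us:
            raise TimeoutError(f"READY not seen within {timeout_us} usec")
        advance(step_us)
        elapsed += step_us
    return elapsed


class FakeController:
    """Hypothetical stand-in: READY comes back a fixed time after turn-around."""

    def __init__(self, turnaround_us: int):
        self.now = 0
        self.turnaround_us = turnaround_us

    def advance(self, us: int) -> None:
        self.now += us

    def ready(self) -> bool:
        return self.now >= self.turnaround_us


ctl = FakeController(turnaround_us=600)
print(wait_until_ready(ctl.ready, ctl.advance, timeout_us=2000))  # 600
```

Because the test only waits for READY, changing the turn-around delay inside the controller no longer trips the test's own timeout; only the outer deadline needs to be generous.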

Interestingly, none of the documentation I have found suggests that this delay should be necessary. The turn-around protocol is supposed to not complete until the sender is actually ready. But for JiffyDOS this is either not the case, or there is some subtle difference in the way that JiffyDOS responds following a turn-around that I am not aware of. I'm actually a bit suspicious that this might be the case. 

Except it turns out to be my imagination, perhaps because of that problem with the testing of the turn-around.  It now looks to be behaving perfectly.

So let's see how our test result table is looking now:

==== Summary ============================================================================================================================
pass lib.tb_iec_serial.Debug RAM can be read                                                                          (1.8 seconds)
pass lib.tb_iec_serial.ATN Sequence with no device gets DEVICE NOT PRESENT                                            (1.9 seconds)
pass lib.tb_iec_serial.ATN Sequence with dummy device succeeds                                                        (3.5 seconds)
pass lib.tb_iec_serial.ATN Sequence with VHDL 1541 device succeeds with JiffyDOS and C128 FAST disabled               (10.6 seconds)
pass lib.tb_iec_serial.ATN Sequence with VHDL 1541 device succeeds                                                    (10.7 seconds)
pass lib.tb_iec_serial.Simulated 1541 runs                                                                            (11.5 seconds)
pass lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 device succeeds                                      (23.5 seconds)
pass lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 device succeeds with JiffyDOS and C128 FAST disabled (23.7 seconds)
pass lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 with delay before turn-around                        (27.4 seconds)
pass lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 with SRQ low                                         (27.9 seconds)
fail lib.tb_iec_serial.Write to and read from Command Channel (15) of VHDL 1541 device succeeds                       (18.8 seconds)
=========================================================================================================================================
pass 10 of 11
fail 1 of 11
=========================================================================================================================================
Total time was 161.1 seconds
Elapsed time was 27.9 seconds
=========================================================================================================================================


Three more tests are passing now :)

Only one test is still failing -- and that's the one that requires the JiffyDOS byte send function, which I have not yet implemented.  So let's attack that, and see if we can't get that last test passing, and thus reach the point where we believe that JiffyDOS support should be complete.

Sending bytes from the computer to drive doesn't require the turn-around logic, so I'm quietly hopeful that this will be much faster for me to implement, than the receiving was.

Okay, so I have the general gizzards of the JiffyDOS TX routine in place.

First, we have to decide when to use it:

            -- SEND A BYTE (no attention)
          when 400 =>

            -- First, make sure ATN has been released.
            a('1');
            -- T_R -- Release of ATN at end of frame: 20 usec
            -- But we don't need to pay it if ATN was already released
            if iec_atn_int = '0' then
              micro_wait(20);
            end if;

            -- Decide whether to send using slow, fast or JiffyDOS protocol
            if iec_devinfo(6)='1' then
              -- Assume drive will be expecting JiffyDOS protocol
              iec_state <= 480;
            elsif iec_devinfo(5)='1' then
              -- Assume drive will be expecting C128 fast serial protocol
            else
              -- Use original slow Commodore serial protocol
              null;
            end if;

Then if we are using JiffyDOS protocol, we have this section of the state machine:

          when 480 => report "IEC: Sending byte using JiffyDOS(tm) protocol";
                      wait_data_high <= '1';
          when 481 =>                     c('1');             micro_wait(10);
          when 482 => d(iec_data_out(5)); c(iec_data_out(4)); micro_wait(13);
          when 483 => d(iec_data_out(7)); c(iec_data_out(6)); micro_wait(11);
          when 484 => d(iec_data_out(1)); c(iec_data_out(0)); micro_wait(13);
          when 485 => d(iec_data_out(3)); c(iec_data_out(2)); micro_wait(20);
          when 486 => d('1');             c(send_eoi);        micro_wait(20);
                      send_eoi <= '0';
                      wait_data_high <= '1';
          when 487 => if iec_data_i='1' then
                        -- ERROR: Report timeout
                        iec_dev_listening <= '0';
                        iec_devinfo(1) <= '1';
                        iec_devinfo(0) <= '1'; -- while outputting data
                        iec_busy <= '0';
                        iec_state_reached <= to_unsigned(iec_state,12);
                        iec_state <= 0;
                      else
                        -- No error, JiffyDOS drive is busy again
                        null;
                      end if;
          when 488 => report "IEC: Successfully sent byte using JiffyDOS(tm) protocol";
                      iec_devinfo(7) <= '0';
                      iec_busy <= '0';

                      iec_dev_listening <= '1';

                      -- And we are no longer under attention
                      iec_under_attention <= '0';
                      iec_devinfo(4) <= '0';
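For checking timing by hand, it helps to flatten states 481 to 486 into a timeline. This Python sketch transcribes the d()/c() calls and micro_wait() values from the VHDL above exactly as written, starting once state 480 has seen DATA go high. As the post goes on to find, the bits as sent here still needed inverting and reordering, so treat the bit values as what this snapshot of the code does, not as a verified JiffyDOS encoding.

```python
# (DATA bit, CLK bit, wait after in usec) for states 482..485, copied from
# the VHDL above. Values are what gets passed to d() and c().
PAIR_STATES = [(5, 4, 13), (7, 6, 11), (1, 0, 13), (3, 2, 20)]


def tx_schedule(byte: int, send_eoi: bool):
    """Return ([(time_us, data, clk), ...], total_us) for states 481..486.
    data/clk are the d()/c() arguments as booleans ('1' -> True);
    None means the state leaves that line unchanged."""

    def bit(n: int) -> bool:
        return bool((byte >> n) & 1)

    events = []
    t = 0
    events.append((t, None, True))                  # 481: c('1'), wait 10
    t += 10
    for data_bit, clk_bit, wait_us in PAIR_STATES:  # 482..485
        events.append((t, bit(data_bit), bit(clk_bit)))
        t += wait_us
    events.append((t, True, send_eoi))              # 486: d('1'), c(send_eoi), wait 20
    t += 20
    return events, t


events, total = tx_schedule(0x55, send_eoi=False)
print([t for t, _, _ in events], total)  # [0, 10, 23, 34, 47, 67] 87
```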

And it looks like it is doing what it should. But the JiffyDOS ROM in the simulated 1541 is getting stuck waiting for the CLK line to be released. I can see the simulated CPU in a loop doing BIT $1800  / BNE *-3, i.e., waiting for $1800 to read as $00. 

Now, this leads me to suspect that JiffyDOS for the 1541 has an interesting bug: you _must_ strap the drive to be hardware device 8, or else this will fail. The device number straps are on bits 5 and 6 of this byte, and if either or both are high (as in my simulation, where I have the drive strapped to device 11), then the loop will keep seeing those bits set, no matter what, because those pins are inputs. In my case, it keeps reading $60.
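One detail worth keeping in mind when reading that loop: on the 6502, BIT does not test the memory byte on its own, it sets Z from A AND memory. So BIT $1800 / BNE *-3 only falls through once (A AND $1800) is zero, which means the strap bits keep it spinning only if the accumulator also has bit 5 or 6 set, as the stuck loop implies. The sketch below models just that flag logic; the accumulator values used with it are hypothetical, and strap_bits() only encodes the two readings described above (device 8 as $00, device 11 as $60).

```python
def bit_sets_z(a: int, m: int) -> bool:
    """6502 BIT: Z is set when (A AND M) == 0. (N and V are copied from
    bits 7 and 6 of M, but BNE only looks at Z.)"""
    return (a & m & 0xFF) == 0


def bit_bne_loop_spins(a: int, port_value: int) -> bool:
    """BIT $1800 / BNE *-3 branches back (keeps looping) while Z is clear."""
    return not bit_sets_z(a, port_value)


def strap_bits(device: int) -> int:
    """ASSUMED encoding matching the readings above:
    device 8 -> $00, device 11 -> $60 (bits 5 and 6)."""
    return ((device - 8) & 0x03) << 5


# Hypothetical A = $FF, with every other bit of $1800 already clear:
print(bit_bne_loop_spins(0xFF, strap_bits(11)))  # True: stuck on $60
print(bit_bne_loop_spins(0xFF, strap_bits(8)))   # False: falls through
```

With a narrower mask in A (say, only the bit being waited on), the strap inputs would not matter, which is one thing worth checking when comparing against a real drive.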

Changing the device to 8 has indeed avoided this problem (although it isn't working correctly yet). The question is whether it is a bug in JiffyDOS, or in my 1541 VHDL implementation, specifically whether the VIA 6522s have a different behaviour that would allow the JiffyDOS code to work. I'll ask around to see if anyone has a 1541 with device number switches on it and JiffyDOS, to see if they can reproduce the problem on a real drive (it should refuse to communicate, e.g., if you send a DOS command to it). If you have such hardware, and feel inclined to test it out, let me know in the comments if you can reproduce the problem, and if so, what model of 1541 and version of JiffyDOS you have.

Anyway, back to figuring out the remaining problems.

The first byte sent should be $55, but is being received as $A3. The upper nybble is just inverted. The lower nybble is also inverted, but has a couple of bits swapped around, because that makes it more efficient for the 1541 to decode. I've added a new test that specifically tests the bit order and polarity of the bytes sent via the JiffyDOS protocol, and with that got that test passing.
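A single test byte can't pin the mapping down, which is why a dedicated bit-order test is worth having. The Python sketch below probes a channel with $00 and the eight walking-one bytes to recover the inversion mask and bit permutation, assuming the channel is a fixed permutation followed by an XOR. It also counts how many of the 40320 possible bit permutations (with full inversion) are consistent with $55 arriving as $A3. The example channel, which inverts everything and swaps bits 0 and 3, is just one mapping that fits that observation, chosen for illustration; it is not necessarily the real JiffyDOS wiring.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def permute_bits(value: int, dest: List[int]) -> int:
    """Move bit i of value to bit position dest[i]."""
    out = 0
    for i, d in enumerate(dest):
        if (value >> i) & 1:
            out |= 1 << d
    return out


def infer_mapping(channel: Callable[[int], int]) -> Tuple[int, List[int]]:
    """Recover (invert_mask, dest) from $00 and the eight walking-one bytes,
    assuming channel(b) == permute_bits(b, dest) ^ invert_mask."""
    invert_mask = channel(0x00)
    dest = []
    for i in range(8):
        diff = channel(1 << i) ^ invert_mask
        if diff == 0 or diff & (diff - 1):
            raise ValueError("not a bit permutation plus XOR mask")
        dest.append(diff.bit_length() - 1)
    return invert_mask, dest


def count_consistent(sent: int, received: int, invert_mask: int = 0xFF) -> int:
    """How many of the 8! bit permutations explain sent -> received?"""
    return sum(1 for p in permutations(range(8))
               if (permute_bits(sent, list(p)) ^ invert_mask) == received)


# Illustrative only: invert every bit and swap bits 0 and 3.
EXAMPLE_DEST = [3, 1, 2, 0, 4, 5, 6, 7]


def example_channel(b: int) -> int:
    return permute_bits(b, EXAMPLE_DEST) ^ 0xFF


print(hex(example_channel(0x55)))      # 0xa3
print(infer_mapping(example_channel))  # (255, [3, 1, 2, 0, 4, 5, 6, 7])
print(count_consistent(0x55, 0xA3))    # 576
```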

I also sped up most of the tests by tweaking the way I wait for things in them. However, that has resulted in a regression in the test that writes to the command channel and reads the result back.

I've narrowed it down to needing only this delay here:

        report "IEC: Sending UI- command";
        iec_tx(x"55");  -- U
        iec_tx(x"49");  -- I
      wait_a_while(1800000);
        iec_tx_eoi(x"2D");  -- -

 

Moving the delay to follow the iec_tx_eoi doesn't work.

But it can be moved to be between the two iec_tx calls, instead.

And what about just before them? Nope, that doesn't work, either.

This doesn't make a great deal of sense: The slow IEC protocol, which is what is being used here, is supposed to not allow transfers to get de-synchronised. 

So let's take the delay out, and try to see what goes wrong. Does the 1541 receive the UI- command at all? The hack I added to my test harness to let me see the most recently received byte by the 1541 will let us determine this.  Probably, I should just add that following all iec_tx() and iec_tx_eoi() calls, anyway.
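That "check what the drive actually received" idea generalises into a small wrapper. Here is a hypothetical Python version of the pattern; tx_checked and FakeDrive are illustrative names, with FakeDrive standing in for the simulated 1541's last-received-byte hook, and they are not the VHDL test procedures themselves.

```python
from typing import Callable


def tx_checked(tx: Callable[[int], None], last_rx: Callable[[], int],
               value: int) -> None:
    """Send one byte, then confirm the drive's last received byte matches."""
    tx(value)
    got = last_rx()
    if got != value:
        raise AssertionError(f"drive received ${got:02X}, expected ${value:02X}")


class FakeDrive:
    """Hypothetical stand-in for the simulated drive's 'last byte' hook.
    corrupt_mask lets us simulate a byte getting mangled on the way."""

    def __init__(self, corrupt_mask: int = 0):
        self.last = -1
        self.corrupt_mask = corrupt_mask

    def receive(self, value: int) -> None:
        self.last = value ^ self.corrupt_mask

    def last_byte(self) -> int:
        return self.last


drive = FakeDrive()
for b in b"UI-":
    tx_checked(drive.receive, drive.last_byte, b)  # passes silently
```

Failing at the first byte that goes wrong makes it much easier to tell a transmit problem apart from a problem later in the command sequence.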

All three bytes of the "UI-" command are correctly received, so that's good.

So what about the UNLISTEN command following that?

That gets stuck quite early in the ACPTR routine around $EA0E, which is a tight loop checking for CLK=0V, which has already expired. This is weird, since CLK has been low for over a millisecond.

Ah! Delaying the asserting of CLK for 20 usec after asserting ATN seems to fix it!

As far as I can tell, the $E9C9 routine expects to see CLK drop and then rise again, and if the two are too close together, it breaks.

So will this also fix the same test under JiffyDOS? Also, is the 20 usec always enough? Does it need to be a bit more for safety?

Well, we can't answer these questions yet, because there is a regression with the JiffyDOS TX to the drive.  Sending the first byte works ok, but the 2nd byte is not received. 

It looks like our JiffyDOS TX routine is not allowing enough time after sending the status bits, before being ready to try sending again. 

Nope, it looks more like we might be sending the wrong status bits, and it's being interpreted as EOI. We have to assert DATA=0V for a little while before releasing it, to check whether the 1541 has received the byte or not. With that corrected, that test now passes.

So now we have only 4 JiffyDOS tests failing -- all the ones that read from the DOS status channel.

It looks like we start trying to receive the bytes a little too early, before the JiffyDOS send routine in the 1541 is ready. This doesn't happen for the first byte received, only for the second, so probably the time we hold after receiving a byte is a bit short after refactoring all the test timing stuff. Anyway, after a bit of fiddling around, I have that all working again. I'm actually glad I tightened the timing of the tests up, as otherwise it wouldn't have revealed these problems.

Now the only test that is failing with JiffyDOS is the one that sends the UI- DOS command, and then reads the status back.

That one looks like it gets to the point where it tries to tell the drive to UNLISTEN after having sent the UI- command.  It might just be that we have to allow more time for the DOS command to finish executing.

It's actually possible that the JiffyDOS ROM executes the command immediately after sending the byte with EOI (or that I am not sending the EOI correctly).

It looks like JiffyDOS is detecting EOI on the 2nd of the 3 bytes, rather than on the 3rd one. As a result it will be busy processing the UI command when the controller asserts ATN. This means that the timing must be a bit out.  Hopefully it won't be hard to find and fix.

It was indeed not too much trouble: I just needed to reduce the delay before sending the status bits, and then increase the delay after sending them a bit in compensation. Now JiffyDOS on the 1541 correctly detects the 3 bytes of the "UI-" command, including that exactly the last byte is the EOI. However, JiffyDOS stays in the receive loop, and doesn't notice that ATN is pulled low by the controller.

Looking deeper, I had the EOI indication not quite right: CLK gets pulled low early or late by the controller, depending on whether it is a no-EOI or EOI condition. Anyway, after fixing that, it's still doing the same thing.

Tracing through, I can see that the ATN line triggers an IRQ, and that the ATN event is logged into location $7C.  It looks suspiciously like the EOI is not really recorded, or there is some other subtle thing I don't understand about how the JiffyDOS ROM handles the end of stream.

Once JiffyDOS in the 1541 is committed to reading the next byte from the computer, it's already too late for an ATN event to be noticed. So I will need to work backwards in time, to see where the decision point goes wrong.

Following the standard 1541 ROM disassembly, I can confirm that I have correctly marked the EOI, as it is used to correctly determine the end of command at $CFED.

Ah! I think I might have it: when sending a byte with EOI via JiffyDOS, you also have to assert ATN immediately. You have only one window for it to be noticed, which is when the 1541 side reads the status bits.

This gets it further: the 1541 breaks out of the tight JiffyDOS receive loop, and I can even read the DOS status back afterwards. But it's still 73,JIFFY..., not 00,OK... I think this is because I also need to release ATN, so that the drive can get back to the IDLE state and process the queued command.

It looks like the ATN pulse has to be just long enough for the status bits to latch it, but then released again before the drive gets to the normal serial receive byte loop, where the proper check for ATN is made.

I also spotted a stupid bug in the test definition that was introduced in the timing rework I did: The delay for the UI- command to execute was effectively removed, which would also explain why I was seeing the 73,... instead of 00,... message.

Hmm... it looks like with the JiffyDOS protocol, that the byte accompanying an EOI is not actually processed.  This is a bit annoying, as it means we need a separate case for sending EOI after sending the character normally.

This is slightly complicated by the tb_iec_serial tests checking that the most recently sent byte is actually correctly received by the drive. So we can't just send a dummy byte with $FF, but have to actually send the same byte again. Oh well.

Hmm... nope. Not quite. The EOI has to be indicated on the last byte, yes, but then ATN has to be asserted in that very short window of opportunity. Or some other subtle thing that I don't quite understand. Anyway, the point is, that with the EOI indication on the byte, and then sending the byte a 2nd time using the JiffyDOS protocol with ATN asserted, it causes the correct behaviour, and the test passes.
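Written out as a list of controller actions, the sequence that finally made the test pass looks roughly like this. It is a hypothetical Python sketch of the ordering described above (the action names are made up), not the VHDL implementation, and it leaves out the UNLISTEN that follows.

```python
from typing import List, Tuple


def jiffy_command_plan(data: bytes) -> List[Tuple[str, int]]:
    """Hypothetical action list for sending a DOS command via JiffyDOS:
    all bytes sent normally, the last one flagged with EOI, then that last
    byte sent a second time with ATN asserted, so that the drive sees ATN
    when it samples the status bits."""
    if not data:
        raise ValueError("need at least one byte to send")
    plan = [("tx", b) for b in data[:-1]]
    plan.append(("tx_eoi", data[-1]))
    plan.append(("tx_with_atn", data[-1]))
    return plan


print(jiffy_command_plan(b"UI-"))
# [('tx', 85), ('tx', 73), ('tx_eoi', 45), ('tx_with_atn', 45)]
```

Repeating the real last byte, rather than a dummy like $FF, keeps the "last byte received" check in the test harness happy.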

I'm not totally happy with this, as it feels like there should be a sequence of actions that correctly aborts it without sending the 2nd byte. My best guess is that on a real C64, the code path in the KERNAL to EOI and unlisten a channel normally happens with the "just right" delay, for this to all magically work.  

What I have implemented is, on reflection, I think, safer.

Anyway, we now have a happy test result summary for JiffyDOS.

Next will be to test this on a real drive that has JiffyDOS, as well as re-test on my stock 1541 that lacks JiffyDOS, to make sure that there have been no regressions for function on real hardware during all this simulation-based testing.
