Sunday 31 December 2023

Hardware-Accelerated IEC Serial Interface - Part 2

In a previous post, we got the hardware-accelerated IEC serial interface to the point where I could send bytes under attention to connected devices, and perform the talker-to-listener turn-around, using a VHDL simulation of a 1541 and some tools I built to test and debug it.  Now its time to get receiving of bytes from the bus working!

Receiving of bytes using the "slow" protocol is fairly easy: Wait for CLK to go to 5V to indicate the sender is ready, then release DATA to 5V to indicate that we are ready to receive a byte.  Then 8 bits will be sent that should be latched on the rising-edge of CLK pulses.  Once all 8 bits are received, the sender drives CLK low one more time, after which the receiver should drive the DATA line to 0V to indicate acknowledgement of the byte -- thus getting us back to where we started, i.e., ready to receive the next byte.

The C128 fast serial protocol uses the SRQ line instead of CLK to clock the data bits, and drives the data bits at a much faster rate.  Bits are accepted on the rising edge of the SRQ, but the bit order is reversed compared to the normal protocol. Which protocol is being used can be determined by whether CLK or SRQ pulses first.  We'll start for now with just the slow protocol, because that's all our VHDL 1541 does.  Likewise, we'll worry about the JiffyDOS protocol a bit later.

Okay, so I have quickly implemented the receiving of bytes using both the slow and C128 fast protocol -- although I can't easily test the latter just yet. Remarkably, the only bug in the first go, was that I had the bit orders all reversed.

I've also refactored the test driver, so that now my test looks something like this:

elsif run("Read from Error Channel (15) of VHDL 1541 device succeeds") then

  -- Send $48, $6F under ATN, then do turn-around to listen, and receive
  -- 73,... status message from the drive.

  report "IEC: Commencing sending DEVICE 11 TALK ($4B) byte under ATN";
  atn_tx_byte(x"4B"); -- Device 11 TALK

  report "IEC: Commencing sending SECONDARY ADDRESS 15 byte under ATN";

  report "IEC: Commencing turn-around to listen";

  report "IEC: Trying to receive a byte";

  -- Check for first 4 bytes of "73,CBM DOS..." message

I'm liking how clear and consise the test definition is.  It's also nice to see this passing, including reading those 4 bytes! Actually, the simplicity of this test definition speaks to the simplicity of the interface that I am creating for this: Each of those test functions really just writes to a couple of registers in the IEC controller, and then checks the status register for result. In other words, these tests can be easily adapted to run on the real-hardware when done.

So anyway, we now have TX under attention, turn-around and RX working.  The next missing piece is TX without attention, which should in theory be quite similar to TX under attention.  The key difference is that TX without attention is allowed to use FAST and JiffyDOS protocols. This means it probably needs a separate implementation.  Again, I'll start with just the slow protocol. In terms of what to test, I'm thinking I'll send a command to the 1541 to do something, and then try to read the updated status message back.  I'll go with the "UI-" command, that sets the faster VIC-20 timing of the bus protocol, since it doesn't need a disk in the drive.

Hmm, the UI- command is implemented using a horrible, horrible hack in the 1541 ROM by the look of things:  Take a look at $FF01 in and you will see that they basically put a shim routine in the NMI handler for the CPU that checks for the - sign of the UI- command.  I've not dived deep enough into the 1541's internals to know why they are even using an NMI to dispatch user commands. Ah -- actually it is that the UI command just jumps to the NMI vector. So they just hacked in the + / - check.  That's an, er, "efficient" solution, I suppose.

I guess in a way, it doesn't matter, provided we know that it gets executed, which we can do by watching if the PC reaches the routine at $FF01, and in particular at $FF0D where the C64/VIC20 speed flag is written to location $23.

Well, as easy as the IEC RX was, the IEC TX without ATN is proving to be a bit of a pain. Or at least my general bus control doing more complex things. Specifically, I have written another test that tries to issue the UI- command, but it fails for some reason. This is what I am doing in the test:

        -- Send LISTEN to device 11, channel 15, send the "UI-" command, then
        -- Send TALK to device 11, channel 15, and read back 00,OK,00,00 message

        report "IEC: Commencing sending DEVICE 11 LISTEN ($2B) byte under ATN";
        atn_tx_byte(x"2B"); -- Device 11 LISTEN

        report "IEC: Commencing sending SECONDARY ADDRESS 15 byte under ATN";

        report "IEC: Sending UI- command";
        iec_tx(x"55");  -- U
        iec_tx(x"49");  -- I
        iec_tx(x"2D");  -- +

        report "IEC: Sending UNLISTEN to device 11";

        report "Clearing ATN";
        report "IEC: Request read command channel 15 of device 11";
        report "IEC: Commencing turn-around to listen";

        report "IEC: Trying to receive a byte";

        -- Check for first 4 bytes of "00,OK..." message

It gets to the ATN release, and then ATN doesn't release.  It looks like something wonky is going on with the IEC command dispatch stuff, because in the atn_release procedure it first does a command $00 to halt any existing command, and then a $41 to release ATN to 5V.  It does the $00 command, but not the $41:

e): IEC: Release ATN line and abort any command in progress
: IEC: REG: register write: $00 -> reg $8
: IEC: REG: register write: $41 -> reg $8
 IEC: Command Dispatch: $00
e): IEC: Request read command channel 15 of device 11
: IEC: REG: register write: $4B -> reg $9

It looks like I don't wait long enough after issuing the $00 command, before issuing the $41 command. Adding a delay now gets it further, and it tries to read the command channel status back, but still sees the 73, CBM DOS message there, not a 00, OK.  So I need to look at what is going on there. Do I need to put an EOI on the last byte of the command? Do I have a bug in my 6502 CPU for the 1541? Is it something else? I'll have to investigate another day, as its bed time now.

Found one problem: You can't do $6F if you then want to write to the command channel. Instead you have to do $FF to open the command channel.  With that, I am now getting the problem that the drive is not acknowledging the $FF byte.  No idea why yet, but it's something very specific to investigate, which makes our life much easier.

Digging through the 1541 ROM disassembly I found that indeed the DOS commands do have to be terminated with an EOI, so I will need to implement EOI next.  But for some reason I'm not getting a DEVICE NOT PRESENT when sending the $FF secondary address.  Sending $48 to talk instead works fine, though.  

Well, I had the ACK sense of the data line inverted in my check. How that was even working before, I have no idea, other than my very fast hardware implementation was able to think the byte was being ACK'd before the 1541 ROM had time to pull DATA low. That is, the famous charts on pages 364-365 of the Programmer's reference guide erroneously claim T_NE has no minimum, when in fact it does.

The next problem is that the 1541 is missing a bit or two in the byte being sent as part of the DOS command.  This seems to rather ironically be because the sender is sending the bits slightly slower than it would like, which causes the 1541 ROM to have to loop back to check for the bit again.  

More carefully reading the timing specifications, I am still being a bit too fast: It should be 70usec with CLK low, and 20 with CLK high for each bit. I was using 35 + 35. I'm a little sceptical that 70+20 will work, as I'm fairly sure that more like 35 usec is required with CLK high. But we will see. It might just be that this combination causes the 1541 loops to be in the right place, for it to work.

Well, it does seem to work. So that's a plus.

Now I am back to reading error code 73,CBM DOS... instead of 00, OK,00,00, because the DOS command is not being executed.  As already mentioned, this is partly at least because I need to implement EOI.  But I'm also seeing the 1541 seems to not think that the command channel is available when processing the bytes received:

: IEC: REG: register write: $55 -> reg $9
: IEC: REG: register write: $31 -> reg $8
 IEC: Command Dispatch: $31
            iec_state = 400
    +2702.663 : ATN=1, DATA=0(DRIVE), CLK=0(C64)
            iec_state = 401
      +19.062 : ATN=1, DATA=0(DRIVE), CLK=1
$EA2E            1541: LISTEN (Starting to receive a byte)
$EA44            1541: LISTEN OPEN  (abort listening, due to lack of active channel)
$E9C9            1541: ACPTR (Serial bus receive byte)
$E9CD            1541: ACP00A (wait for CLK to go to 5V)

Now, I don't know if that's supposed to happen, and it's just the lack of EOI stuffing things up, or whether it's a real problem.  And I won't know until I implement EOI, so let's do that before going any further.  For EOI, instead of holding T_NE for 40 usec, we hold it for 200usec, and expect to see the drive pull DATA low for ~60usec to acknowledge the EOI.

I have that implemented now, but the 1541 is just timing out, without acking the EOI.  It looks like the 6522 VIA timer that the 1541 uses to indicate this situation isn't ticking.  It looks like my boot speed optimisations have probably prevented something that needs to happen: It looks like the code at $FB20 is what enables timer 1.  But that seems to be in the disk format routine, so I don't think that's right.  Maybe the 6522 is supposed to power on with Timer 1 already in free-running mode?  It's a mystery that will have to wait until after I have slept...

Ok, nothing like a nights sleep to nut over a problem.  At $ED9A the high byte of timer 1 gets written to, which should set the timer running in one-shot mode. But it never counts down.  That's the cause of the EOI timeout not being detected.  Digging into the VIA 6522 implementation I found the problem: When I had modified it to run at 40.5MHz internally, I messed up the cycle phase advance logic. This meant that it never got to the point of decrementing the timer 1 value, and thus the problem we have been experiencing. There were also some other problems with the shift to 40.5MHz, that were preventing writes to the registers of the 6522.

With all that fixed, I can see the timer 1 counting down during the EOI transmission, but when it reaches zero, the interrupt flag for the timer is not being asserted, and thus the 1541 never realises that the timer has expired.

Ok, found and fixed a variety of bugs that were causing that, including in my IEC controller, as well as in the 40MHz operation of the VIA 6522.  With that done, the EOI byte is now sent, and it looks like it is correctly recognised. I now need to check:

1. Does the 1541 really process the EOI?

2. Why does the 6502 in the 1541 get stuck in an infinite IRQ loop when ATN is asserted the next time? (given the first time ATN was asserted, there was no such problem)

Hmm... Well, I did something that fixed that. Now it's actually trying to parse the DOS command, which is a great step forward :)  It's failing, though, because its now hitting some opcodes I haven't implemented yet in the CPU, specifically JMP ($nnnn).  That won't be hard to implement. And indeed it wasn't. So where are we up to now?

The command to get the device to read out the command channel after sending the command fails with device not present.  This seems to be because the 1541 is still busy processing the DOS command. The controller has no way to detect this, so far as I can tell. The simple solution is to make sure that we allow enough time for the DOS command to be processed.

After allowing some more time, and fixing a few other minor things, including in the test, the test now passes. Here's the final test script. See if you can spot the errors I picked up in the test definition:

      elsif run("Write to and read from Command Channel (15) of VHDL 1541 device succeeds") then

        -- Send LIST

EN to device 11, channel 15, send the "UI-" command, then
        -- Send TALK to device 11, channel 15, and read back 00,OK,00,00 message

        report "IEC: Commencing sending DEVICE 11 LISTEN ($2B) byte under ATN";
        atn_tx_byte(x"2B"); -- Device 11 LISTEN

        report "IEC: Commencing sending OPEN SECONDARY ADDRESS 15 byte under ATN";
        atn_tx_byte(x"FF"); -- Some documentation claims $FF should be used
                            -- here, but that yields device not present on the
                            -- VHDL 1541 for some reason?  $6F seems to work, though?

        report "Clearing ATN";
        report "IEC: Sending UI- command";
        iec_tx(x"55");  -- U
        iec_tx(x"49");  -- I
        iec_tx_eoi(x"2D");  -- +

        report "IEC: Sending UNLISTEN to device 11";

        report "Clearing ATN";

        -- Processing the command takes quite a while, because we have to do
        -- that whole computationally expensive retrieval of error message text
        -- from tokens thing.
        report "IEC: Allow 1541 time to process the UI+ command.";
        for i in 1 to 300000 loop
        end loop;         
        report "IEC: Request read command channel 15 of device 11";
        report "IEC: Commencing turn-around to listen";

        report "IEC: Trying to receive a byte";

        -- Check for first 5 bytes of "00, OK..." message

This is quite a satisfying achievement, as it is a complete plausible IEC transaction.

Okay, now that we have a lot of it working, it's time to implement detection of EOI, and then we have the complete slow serial protocol implemented, so far as I can tell, and I can synthesise a bitstream incorporating it, and see if I can talk to a real 1541 with it, before I move on to completing and testing the JiffyDOS and C128 fast serial variants of the protocol.

The EOI detection ended up being a bit fiddly, partly because the IEC timing diagrams don't indicate which side of a communication is pulling lines low, so I confused myself a couple of times. But it is the _receiver_ of an EOI who has to acknowledge it by pulsing the DATA line low.  Once I had that in place, it was fine.  The test does now take about 5 minutes to run, because I have to simulate several milli-seconds of run-time to go through the whole process that culminates in receiving a character with EOI.  In the process, I did also make the reception of non-EOI bytes explicitly check that no EOI is indicated in that case.

The next step is to test the bitstream.  I did a quick initial test, but it didn't talk to my 1541. Also, my Pi1541 I think has a blown 7406 that I need to replace.  I think I have spares in the shed. I'm keen to measure the voltage directly at the IEC connector when setting and clearing the various signals, to make sure that it is behaving correctly. I might finally get around to making an IEC cable that has open wiring at one end, so that I can easily connect it to the oscilloscope.

A quick probe with the oscilloscope and a simple BASIC program that POKEs values to $D698 to toggle the CLK, DATA and ATN lines shows no sign of life on the pins. So I'm guessing that my plumbing has a problem somewhere. So let's trace it through.

The iec_serial.vhdl module has output signals called iec_clk_en_n, iec_data_en_n and iec_atn.  Those go low when the corresponding IEC lines should be pulled low. We know those work, because our VHDL test framework confirmed this.

The iec_serial module is instantiated in iomapper, where those pins are assigned as follow:

        iec_clk_en_n => iec_hwa_clk,
        iec_data_en_n => iec_hwa_data,
        iec_srq_en_n => iec_hwa_srq,

The 2nd CIA that has the existing IEC interface then has the following pin assignments:

    -- CIA port a (VIC-II bank select + IEC serial port)
    portain(2 downto 0) => (others => '1'),
    portain(3) => iec_atn_reflect, -- IEC serial ATN
    -- We reflect the output values for CLK and DATA straight back in,
    -- as they are needed to be visible for the DOS routines to work.
    portain(4) => iec_clk_reflect,
    portain(5) => iec_data_reflect,
    portain(6) => iec_clk_external,
    portain(7) => iec_data_external,
    portaout(2) => dummy(2),
    portaout(1 downto 0) => dd00_bits_out,
    portaout(3) => iec_atn_fromcia,
    portaout(4) => iec_clk_fromcia,
    portaout(5) => iec_data_fromcia,
    portaout(7 downto 6) => dummy(4 downto 3),
    portaddr(3 downto 2) => dummy(8 downto 7),
    portaddr(4) => iec_clk_en,
    portaddr(5) => iec_data_en,
    portaddr(7 downto 6) => dummy(10 downto 9),
    portaddr(1 downto 0) => dd00_bits_ddr,

    spin => iec_srq_external,
    spout => iec_srq_o,
    sp_ddr => iec_srq_en,
    countin => '1'

This is then all tied to the output of iomapper.vhdl using the following:

           iec_clk_reflect <= iec_clk_fromcia and (not iec_hwa_clk);
      iec_data_reflect <= iec_data_fromcia and (not iec_hwa_data);
      iec_atn_reflect <= iec_atn_fromcia and iec_hwa_atn;
      iec_clk_o <= iec_clk_fromcia and (not iec_hwa_clk);
      iec_data_o <= iec_data_fromcia and (not iec_hwa_data);
      iec_atn_o <= iec_atn_fromcia and iec_hwa_atn;

This is kind of doubled-up, because we have the reflected internally readable signals with _reflect suffixes, as well as the signals actually being output on the pins with _o suffixes. 

Where it gets a bit crazy, is that the inputs and outputs are reversed on the CIA to the IEC port, because they go through either invert buffers on the input side, or through open-collector inverters on the output side -- and both are joined together, as can be seen in the following snippet of the C64 schematic:

In any case, the input side comes in via iec_clk_external and friends.  Those don't come into the current situation, because we are just trying to debug the setting of IEC port lines at the moment.

So let's move up to machine.vhdl that instantiates iomapper.vhdl, that really just passes the signals up to the next level, that is instantiated by the top-level file for each target, e.g., mega65r4.vhdl or mega65r3.vhdl.

In those files, iec_clk_o is connected to iec_clk_o_drive, and so on. This gives the FPGA an extra 25ns clock tick to propagate the signals around.  

Because the MEGA65 uses active-low open-collector output buffers for the IEC port, we have both an enable and a signal level we can set.  If you set the enable to low, whatever is on the signal level will get pushed out onto the pin. Otherwise it is tri-stated, i.e., now pushing a signal out. So we have the following code:

      -- The active-high EN lines enable the IEC output drivers.
      -- We need to invert the signal, so that if a signal from CIA
      -- is high, we drive the IEC pin low. Else we let the line
      -- float high.  We have external pull-ups, so shouldn't use them
      -- in the FPGA.  This also means we can leave the input line to
      -- the output drivers set a 0, as we never "send" a 1 -- only relax
      -- and let it float to 1.
      iec_srq_o <= '0';
      iec_clk_o <= '0';
      iec_data_o <= '0';

      -- Finally, because we have the output value of 0 hard-wired
      -- on the output drivers, we need only gate the EN line.
      -- But we only do this if the DDR is set to output
      iec_srq_en <= not (iec_srq_o_drive and iec_srq_en_drive);
      iec_clk_en <= not (iec_clk_o_drive and iec_clk_en_drive);
      iec_data_en <= not (iec_data_o_drive and iec_data_en_drive);

That is, by keeping the selected signal set to 0, we can pull each IEC line low by setting the corresponding _en signal low.  

Now, as mentioned earlier, the CIA sees these lines inverted. Thus we have to invert them here with those 'not' keywords.  This is also why we invert the iec_serial signal select lines, because they are inverted for a second time here at the top level.  

Anyway, all that plumbing looks fairly reasonable. I did see that the SRQ line was not correctly plumbed, which I have fixed.

However one subtle point is important here: The logic that takes the iec_serial outputs only uses the CIA to select the pin directions. This means that if the CIA has the CLK or DATA pins set to input, that the iec_serial logic can't drive the physical pins.

That shouldn't be an issue, though, because the CIAs are normally set to the correct directions on boot.  But it's also the most likely problem in the setup that I can think of right now. It can of course be fairly easily checked. I can also try using the CIA to control the IO lines, and see whether that's working, or if there is some sign that I have borked something else up, e.g., in the FPGA pin mappings.

So, trying to toggle the lines using the CIA, I can make DATA and ATN toggle, but not CLK.  I've just double-checked on the schematic to make sure that the FPGA pins for the CLK line are set correctly, and they are. The CLK line stays at +5V, so I assume that the enable line for output to the CLK pin is not being pulled low when it should, since the actual value on the signal is tied to 0 in the FPGA.

Hmm... I can't see any problem with that plumbing.  This makes me worry a bit that the CLK output driver on my R4 board might be cactus.  I should be able to test this fairly easily by making a trivial bitstream that just toggles it, or finding another test bitstream that can do that for me, without me having to resynthesise it.

It's possible that this line is borked.  Sometimes I can switch the output driver on and off, but other times it resists.  Interestingly, when I rerun the bitstream, it starts with CLK low, and I can then change the _o and _en lines, and make it go high. It's the getting it to go low again that is the problem.

I can toggle the iec_clk_en line, if I haven't touched the iec_clk_o line first, and it seems to switch reliably.    Hmm.. Made a new bitstream with iec_clk_o tied low, and the switching using iec_clk_en works fine -- which is actually what the full MEGA65 bitstream is doing. So it's all a bit weird.

I'm now just going to go through all the IEC line plumbing like a pack of epsom salts, and make sure everything is consistent. I'm going to make all the signals match the expected voltages on the line, i.e., binary 0 will cause 0V, and binary 1 will cause release to pull-up to 5V, and generally simplify everything as much as I can.

Okay, with all that done, I can use the Pi1541, so all the IEC lines are working. This also confirms that the addition of the iec_serial module has not interfered with the operation of the IEC bus.

So let's now try setting and clearing the various lines via the iec_serial.vhdl interface.

The DATA line can be controlled, and if asserted via iec_serial.vhdl, then the normal IEC interface is non-responsive until it is released -- as it should be. The same goes for the ATN line and the CLK line! So it looks like we should be good to try the hardware accelerated IEC communications.

... except that it doesn't want to talk to the Pi1541 at least, and I can't get it to report DEVICE NOT PRESENT if nothing is connected. So something is going wrong still. The CLK and DATA lines can be read correctly by iec_serial.vhdl, which I have confirmed by using the ability to read those signals from the status byte at $D698 of the iec_serial module.

So what on earth is going on? 

The Pi1541 is handy in that it has an HDMI output that shows the ATN, DATA and CLK lines in a continuous graph.  Unfortunately it doesn't have high enough time resolution to resolve the transmission of individual characters or bits.

Trying now with the real 1541.  In the process I realised I had a silly error in my test program, which after fixing, I can now correctly detect DEVICE NOT PRESENT, and also the opposite, when the device number matches the present device.  However it hangs during the IEC turn-around sequence.  This now works identically with the 1541 and the Pi1541. So why does it get stuck in the IEC turn-around sequence?

The automated tests for the turn-around still pass, so why isn't it passing in real life? 

My gut feeling is that the CLK or DATA line is somehow not being correctly read by the iec_serial module when plumbed into the MEGA65 core. Turnaround requires reading the CLK line.  It looks like the CLK line never gets asserted to 0V by the drive for some reason.

This is all very weird, as for the 1541 at least, the simulation is using exactly the same ROM etc. Perhaps the turn-around is marginally too fast, and the 20usec intervals are not being noticed, especially if there is a bit of drift in the clock on master or slave. It's easy to increase that 20usec to, say, 100usec, and try it out again after resynthesising.

Ok, that gets us a step further: The turnaround from talker to listener now completes.  However, reading the bytes from the DOS status still times out. Now, this could be caused by the Pi1541 supporting either JiffyDOS or C128 fast serial, and thus mistakenly trying to use one of those when sending the bytes back, as we don't have those implemented on byte receive yet. If that's the case, it should be possible to talk to my physical 1541, which supports neither.

Well, the physical 1541 still hangs at the turn-around to listen. How annoying. I'll have to hook up the oscilloscope to this somehow, and probe the lines while it is doing stuff, to see what's going on.  Oh how nice it would be to be able to easily probe the PC of the CPU in the real 1541 :) If I had a 16-channel data logger, I could in theory put probes on the address lines of the CPU, but I don't, and it really shouldn't be necessary. Just probing CLK and DATA on the oscilloscope should really be sufficient.  So I'll have to make that half-IEC cable, so that I can just have the oscilloscope probes super easily connected.

I went and bought the parts for that, and then realised that it would be better to just make a VHDL data-logger. The idea would be to log all the IEC signal states, and to save BRAM space for logging them, log how many cycles between each change.  This would let us log over quite long time periods using only a small amount of BRAM.  Using 32 bit BRAM, we should be able to record the input state of CLK, DATA and SRQ, and the output state of CLK, DATA, SRQ, ATN and RESET. That's 8 bits. Then we can add a 24-bit time step from the last change, measured in clock cycles. At 40.5MHz, that's up to 414ms per record.  This means we can potentially record 1024 records in a 4KB BRAM covering up to 423 seconds, or about 7 minutes. Of course, if the lines are waggling during a test, then it will be much less than this.  But its clearly long enough.

I'll implement this in iec_serial.vhdl, with a special command mode telling it to reset the log buffer, and then another special command that tells it to play out the 4KB buffer. I can rework the existing debug logger I had made, that just logged 8 bits per event, at ~1usec intervals, thus allowing logging of about 4ms.  I can just make it so that it builds the 32 bit value to write, and commits it over 4 clock cycles, so that I don't need to change the BRAM to an asymmetric interface with 32 bit writes and 8 bit writes, although that's quite possible to do.

Ok, I have implemented this, let's see if it works.  IEC accelerator commands $D0 clears the signal log, and effectively sets it going again, $D1 resets the reading to the start of the log, and $D2 reads the next byte from the signal log, making it visible on the IEC data register... Assuming it works :)  First step is check if it synthesises, and I'll probably work on testing under simulation at the same time.

At this point, it would be really nice if the new FPGA build server were ready, to cut the synthesis time down from 45 -- 60 minutes, down to hopefully more like 10 minutes. But as yet, I don't have a delivery date. Hopefully Scorptec will give me an update in the coming days. The only progress I have on that front so far, is that I separately purchased some thermal transfer paste for the CPU, that I realised I hadn't included in the original order.

It's now a few weeks later, the new FPGA build machine has arrived, and cuts synthesis down to around 12 -- 15 minutes, which I've been using in the meantime to fix some other bugs. So that's great.

Now to tackle the remaining work for this, beginning with repairing my Pi1541, which I am pretty sure has a blown open-collector inverter chip. It can use a 74LS05, 74LS06, 74LS16 or 7405 or 7406.  The trick is whether I have any spares on hand, it being the Christmas holidays and all, and thus all the shops will be shut.

Fortunately I do still have a coupe on hand. In fact, what I found first is a ceramic packaged 7405 with manufacture date of week 5 of 1975!

Hopefully that brings it back to life again.

Unfortunately not, it seems.  Not yet sure if it is the 48 year old replacement IC that is the problem, or something else.

I think in any case, that I will go back to my previous idea for testing, which is to stick with the VHDL 1541, which I have working with stock ROM, and then try that with a JiffyDOS ROM, and make sure JiffyDOS gets detected, and that it all works properly. Then I can do a VHDL 1581 with and without JiffyDOS, to test the fast serial mode, as well as its interaction with JiffyDOS, to make sure that it works with all of those combinations. In theory, that should cover the four combinations of communications protocol adequately, and give me more time to think about real hardware for testing.

I also need to separate all my changes out into a separate feature branch, so that it can be more easily merged in to the mainline tree later. That's probably actually the sensible starting point, in fact.

Ok, I now have the 736-hardware-iec-accel branch more or less separated out.  I've confirmed that the IEC bus still works under software control, so that's good. 

Next step is to make sure that the hardware-accelerated IEC interface is also still working.  I have my test program that tries to read the start of the 1541 DOS status message from a real drive I have connected here:

And it gets part way through, but fails to complete the turn-around to listen:

I recall having this problem before.  Anyway, the best way to deal with this is to run the unit tests I built around this -- hopefully they will still work on this new branch, else I will have to fix that first.

... and all the tests fail. Hmmm... 

Oddly there don't seem to be any changes that are likely to be to blame for this. So I'll have to dig a bit deeper. The emulated 1541's CPU seems to hit BRK instructions.  This happens after the first RTS instruction is encountered. Is it that the RAM of the emulated drive is not behaving correctly? Nope, in the branch separation process, I had messed up the /WRITE line from the CPU, so when the JSR was happening, the return address was not being written to the stack -- and thus when the RTS was read, unhelpful things happened...

Now re-running the tests, and at least some of them are passing. Let's see what the result is in a moment:

==== Summary ======================================================================================================
pass lib.tb_iec_serial.Simulated 1541 runs                                                      (24.1 seconds)
pass lib.tb_iec_serial.ATN Sequence with no device gets DEVICE NOT PRESENT                      (4.2 seconds)
pass lib.tb_iec_serial.Debug RAM can be read                                                    (4.2 seconds)
fail lib.tb_iec_serial.ATN Sequence with dummy device succeeds                                  (7.2 seconds)
fail lib.tb_iec_serial.ATN Sequence with VHDL 1541 device succeeds                              (22.2 seconds)
fail lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 device succeeds                (23.8 seconds)
fail lib.tb_iec_serial.Write to and read from Command Channel (15) of VHDL 1541 device succeeds (23.2 seconds)

As I've said before, failing tests under simulation is a good thing: It's way faster to fix such problems.  I'm suspecting an IEC line plumbing error of some sort.

But for my sanity, I'm going to run the tests on the pre-extraction version of the branch, just to make sure I haven't missed anything obvious -- and it has the same failing tests there. I have a vague memory that the tests started failing once I plumbed in the whole IEC bus logic together -- both the CIAs and the new hardware accelerated interface -- together with the complete emulated 1541.

Let's start with the first failing one, the ATN sequence with a dummy device, as that should also be the easiest to debug, and it seems to me just as likely, that it will be some single fault that is at play, like ATN not releasing, or something like that.

Okay, so the first issue is that in the test cases, the ATN line is not being correctly presented: The dummy device never sees ATN go low. This is, however, weird, because the sole source of the ATN signal in this test framework is from the hardware accelerated interface. And it looks like it is being made visible.

The problem there was that the dummy device was looking at the wrong internal signal for the CLK line, and thus never responded.  That's fixed now, but I don't think that's going to fix the problem for the emulated 1541, but I'm expecting it to be something similar.

It looks like the 1541 only thinks it has received 6 bits, rather than 8.  Most likely this will be some timing issue in our protocol driver. This is reassuring, as it is a probable cause for the issues we were seeing on real hardware.

Trudging through the 1541 instruction trace, I can see that it sees the CLK line get released, ready to start receiving data bits, and that the 1541 notices this, and then releases the DATA line in response to this. At this point, our hardware IEC driver begin sending bits... but it looks like the 1541 pulls the DATA line low again! Why is this?

So let's trace through the ATN request service routine again, to see where it goes wrong. The entry is at $A85B, which then calls ACPTR ($E959) from $E884 to actually reads the byte. This occurs at instruction #3597 in the 1541 simulation.

Let's go through this methodically:

1. The IEC controller sets ATN=0, CLK=0, and the 1541 is not driving anything, at instruction 3393.

2. The IEC controller is in state 121, which is the place where it waits upto 1ms for the DATA line to be pulled low by the 1541 at instruction, at instruction 3397.

3. The 1541 ROM reaches $E85B, the ATN service routine, at instruction 3597.

4. The 1541 pulls DATA low to acknowledge the ATN request, at instruction 3616.

5. The IEC controller advances to state 122 after waiting 1ms for DATA to go low, at instruction 3724.

6. The 1541 enters the ACPTR routine at $E9C9, at instruction 3729. This is the start of the routine to receive a byte from the IEC bus, and indicates that the CLK low pulse from the IEC controller has been observed.

7. The 1541 releases the DATA line at $E9D7, at instruction 3748.  This occurs, because the 1541 ROM sees that the CLK line has been released by the IEC controller.

8. The IEC controller reaches state 129, which is the clock low half of the first data bit, at instruction 3760.

Okay, so at this point, I think we are where we intend to be: ATN and CLK are being asserted by the IEC controller, and the DATA line is not being held by anyone: The 1541 will now be expecting the transmission of the first bit within 255 usec. If it takes longer, then it would be an EOI condition.  So let's see what happens next...

The EOI detection is done by checking the VIA timeout flag in $180D bit 6, which if detected, will cause execution to go to $E9F2 -- which is exactly what we see happen at instruction 3760.  

It looks like the VIA timer is not ticking at 1MHz, but rather, is ticking down at 40.5MHz.  Fixing that gets us further: This test now passes. What about the remaining tests?

==== Summary ======================================================================================================
pass lib.tb_iec_serial.Simulated 1541 runs                                                      (23.9 seconds)
pass lib.tb_iec_serial.ATN Sequence with no device gets DEVICE NOT PRESENT                      (3.9 seconds)
pass lib.tb_iec_serial.Debug RAM can be read                                                    (4.1 seconds)
pass lib.tb_iec_serial.ATN Sequence with dummy device succeeds                                  (7.2 seconds)
pass lib.tb_iec_serial.ATN Sequence with VHDL 1541 device succeeds                              (22.8 seconds)
fail lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 device succeeds                (29.1 seconds)
fail lib.tb_iec_serial.Write to and read from Command Channel (15) of VHDL 1541 device succeeds (150.6 seconds)
pass 5 of 7
fail 2 of 7

The bold test is the one we just got working again. So that's good. Unfortunately, as we see, the remaining two still fail. That's ok.  We'll take a look at the output, and see where they get to, and where they break.  I've dug up the program I made before, but had forgotten about, for scanning the test output to give me a summarised annotated version of the test output. Let's see what that has to say about the first of the two failing tests:

It looks like it sends the first byte under ATN to the 1541, but then gets stuck in the turn-around from talker to listener. It gets to state 203 in the IEC controller, where it is waiting for the CLK line to go low, to indicate that the 1541 has accepted the request.

Compared to when I had this working in the previous post, before refactoring a pile of stuff, the main difference I can see is that I changed the delays in this routine, to leave more time after releasing ATN before messing with the other lines. Reverting this got the test passing again.  Did it also fix the last test?

And, yes, it did :)

==== Summary ======================================================================================================
pass lib.tb_iec_serial.Simulated 1541 runs                                                      (27.9 seconds)
pass lib.tb_iec_serial.ATN Sequence with no device gets DEVICE NOT PRESENT                      (5.0 seconds)
pass lib.tb_iec_serial.Debug RAM can be read                                                    (3.7 seconds)
pass lib.tb_iec_serial.ATN Sequence with dummy device succeeds                                  (6.8 seconds)
pass lib.tb_iec_serial.ATN Sequence with VHDL 1541 device succeeds                              (19.9 seconds)
pass lib.tb_iec_serial.Read from Error Channel (15) of VHDL 1541 device succeeds                (52.2 seconds)
pass lib.tb_iec_serial.Write to and read from Command Channel (15) of VHDL 1541 device succeeds (232.8 seconds)

Okay, so let's synthesise this into a bitstream, and try it out on real hardware.

It works partially: The device detection and request to turn around from talker to listener seems to succeed, but then the attempt to read from the command channel status fails.  

I'm suspecting that that sensitivity to timing in the turnaround code in the controller might be revealing that I have something a bit messed up, and that the delays from driving this at low speed in BASIC are causing things to mess up.

So I think it might be worth increasing those timers again so that the turn-around test fails, and then examining why this happens, as it should still work with those longer timeouts.  Once I understand that better, and just as likely find and fix some subtle bug with the protocol, that we should be in a better position with this.

Ah -- found the first problem: After sending a byte under ATN, it was releasing both ATN, and also CLK, which allows the 1541 to immediately start the count-down for detecting an EOI event.  This explains what I saw with the real hardware test.  I could sharpen the tests involving turn-around by adding a delay of 1ms after sending a byte under ATN, so that it would trip up.

Anyway, fixing that doesn't cause any of the existing tests to fail, however, my test to talk to a real 1541 from BASIC using the new IEC controller still fails when trying to read the DOS status message back. I'm still assuming that the problem is with handling the turn-around from talker to listener, so I think I'll have to add that extra test with a delay to allow the simulated 1541 to hopefully get confused, like the real drive does.

I added a test for this, but it still passes the simulation test, but fails with a real drive. That's annoying. I am now writing a little test program in assembly to run from C64 mode with the CPU at full speed, to try to talk to the 1541. In the process of doing that, I realised that the IEC controller does not release the IEC lines on soft-reset, which makes testing a pain, as the C65-mode ROM has an auto-boot function from IEC device, that is triggered by one of the IEC lines being low.  So I'm fixing that, while I work on this little test program.

While that is synthesising, I've been playing around more with my test program, and it is getting stuck waiting for the IEC ready flag to get asserted.  With that clue, I have fixed a couple of bugs where the busy flag wasn't clearing, which might in fact be the cause of the errors we are seeing. That is, it might be communicating with the 1541 correctly, but not reporting back to the processor that it is ready for the next IEC command.

It is weird that this is not showing up in the simulation tests.

With a bit more digging, on real hardware, the busy flag bug only seems to happen with the IEC turn-around command ($35).  More oddly, it doesn't seem to happen immediately: Rather, the busy flag seems to be cleared initially, and only after a while, does it get set.

Well, now I have encountered something really weird: Issuing a command from C64 mode to the IEC controller, even via the serial monitor interface results in the IEC controller busy flag being stuck. But if the machine is in C65 mode, then this doesn't happen.  For example, if the MEGA65 is sitting at the BASIC65 ready prompt, I can do the following in the serial monitor:

.sffd3699 48
.sffd3698 30

I have highlit the register that contains the busy/ready flag. A value of $20 = ready, while $00 = busy.

Now if I issue exactly the same from C64 mode, i.e., just after typing GO64 into BASIC65:

.sffd3699 48
.sffd3698 30

We can see that this time, it stays stuck in the busy state. Curiously bit 5 of $D698 seems to get magically set in C64 mode.  That's the SRQ sense line. This would seem to suggest that the SRQ line is stuck low.  That's driven by one of the CIAs on the C65, so it shouldn't matter. But clearly it is messing something up.

Time to add a simulation based test, where I set the SRQ line in this way, and see what happens, i.e., whether I can reproduce the error or not.

And it does! Yay! It turns out my C128 fast serial protocol detection was a bit over-zealous: If SRQ is low, it assumes fast protocol, instead of looking for a negative going SRQ edge.  Fixing that gets the test to pass. So synthesising a fresh build including this.

And it works!

With my test program slightly modified, like this:


we get a result like this:

we get a result like this:

Yay! Finally it is working for normal IEC serial protocol.  I'm going to call this post here, because it's already way too long, and continue work on JiffyDOS and the C128 fast serial protocols in a follow-on post.

Tuesday 26 December 2023

Tracking down and fixing the thumbnail images in the freezer

After some recent fixes for some other bugs in the MEGA65 core, the hardware-generated thumbnails are no longer working in the freeze menu, instead they just show up black:


This issue is being tracked here.

This is sad, because having the little thumbnail of whatever was running when you froze is super handy, and just a really nice visual touch.  So we will of course fix this.

The problem stems from a fix to deconflict some address resolution stuff, that has meant that the thumbnail generator is not being decoded properly now, meaning that the freeze menu just reads all zeroes, thus giving the black display.

I'm going to fix it by moving the thumbnail interface to its own 4KB memory-mapped slab, so that we don't have to do anything fancy to read it out in the freezer any more. This will save some code in the HYPPO hypervisor that previously had to do this when preparing to freeze.

The big clue for this came from the following conversation on Discord:

From @johnwayner in discord:
It appears that the addressing changes in broke the ability of the freeze code to read from the thumbnail generator addresses at $FFD2640 and $FFD2641. Thus the freezer is just reading some memory somewhere else. My VHDL is very weak. <@824166878284349461> might be able to help with this as he was involved with these changes.

The first step, was for me to change the mapping, so that it is now visible at $FFD4xxx (see .  This gets us a little further, in that the thumbnail is now all the colour of one part of the screen, in this case, blue.

It looks like the mapping is fine, but that it is either reading the same address in all cases, or that it is writing to the same address in all cases when producing the thumbnail. As the thumbnail generation logic has not changed, I'm strongly suspecting that the problem is on the read-address side of things.

The best way to tackle this will be to simulate the MEGA65, and see what is going wrong. Getting simulation of the whole MEGA65 going is always a bit of a trial, as we don't tend to need to do it often, and there are almost always a bunch of weird regressions from the perspective of the GHDL simulation software. 

After an hour or so, I can now run simulation again. So let's add some simple code that tries to read the thumbnail data, and see if we are reading the same address repeatedly or not.  This is a touch annoying to craft, because $FFD4xxx is not mapped anywhere convenient. An inline DMA job is probably the simplest approach here.

Inline DMA jobs on the MEGA65 work by writing any value to the magic address $D707. The DMA controller then reads the bytes following the program counter as the DMA list, and then resumes the CPU at the next byte following.  This means we can read from all 4KB of the thumbnail buffer with something like this:

               sta $d707
        ;; MEGA65 Enhanced DMA options
        !8 $0A  ;; Request format is F018A
        !8 $80,$ff ;; Source is $00xxxxx
        !8 $81,$00 ;; Destination is $00xxxxx
        !8 $00  ;; No more options
        ;; copy $FFD4000-$FFD4FFF to $4000-$4FFF
        ;; F018A DMA list
        ;; (MB offsets get set in routine)
        !8 $00 ;; copy + last request in chain
        !16 $1000 ;; size of copy is 4KB
        !16 $4000 ;; starting at $4000
        !8 $0D   ;; of bank $0
        !16 $4000 ;; destination address is $4000
        !8 $00   ;; of bank $0
        !16 $0000 ;; modulo (unused)

Simulation indicates that it seems to be reading from successive addresses. So that's not the problem. Adding further debug output into the simulation reveals that it does indeed seem to be reading from successive addresses and returning the resulting data as output to the bus.  Quite mysterious.  More the point, quite annoying: This kind of "doesn't-show-up-in-simulation" type bug is quite frustrating in VHDL. 

Sometimes there are clues in warnings in the output of Vivado. But nothing there really seems to give me any clues this time.


The plumbing from the buffer for the thumbnail to the bus is a little convoluted, so I'll try inserting dummy data at the mid-point of the plumbing, and see if that appears, or not. And it fails! Yay! I'm happy because that gives me a clue as to what is going on, or rather, not going on.  The output from the thumbnail buffer is simply not being used when working out what is on the bus.

Okay, so I found that it should now be testable under simulation: The previous test only checked that the data could be read out, but didn't check what the DMA controller saw.  Now doing that, it looks like there is a plumbing problem.

It seems that thumbnail_cs and fastio_read are both only asserted for the first one or two bytes of the read. Poking around some more, mostly just adding debug statements, it now seems to be working correctly under simulation. So let's see how it goes synthesised...

Well, still the same problem: The colour of the border shows up in _all_ the bytes of the thumbnail data.  I'm pretty sure it is being read out correctly now.  And I know that something is being written, because I can change what value gets read out by changing the border colour.

My current theory is that the logic that writes the thumbnail is also broken. I'm reworking it to have much simpler and more robust logic to determine the correct thumbnail pixel coordinate based on the current screen pixel coordinate, rather than having a pile of magic that works out if it is a pixel row that we are currently needing to sample into the thumbnail buffer, and if so, which pixels along that line.

Hmm... That changed things, but didn't fix it: Now it seems that no thumbnail pixels are being written at all.  The most logical explanations for this, are that the hypervisor mode check is messed up, or that the pixel coordinate calculations are still broken. That none of the thumbnail pixels have been painted suggests to me that the hypervisor mode check might be the problem.

I'm now resynthesising with the hypervisor mode check disabled. I also found and fixed a bug where the thumbnail address would step by 2 at a time, instead of 1 at a time for the row and column tracking.

Hmm... it still looks like nothing gets written to the thumbnail buffer.  Confirmed this by changing the default contents of the thumbnail buffer.  So how can this be? Oddly when the CPU is in hypervisor mode, I can see other contents rather than the default contents.  This also makes me suspect that there might be some problem on the reading side, where it is still just reading the same address all the time.

I'll try debugging that by adding some debug registers that let me see the number of writes that have happened to the thumbnail buffer, as well as the lower bits of the last address written to, like this:

    if fastio_read='1' and (thumbnail_cs='1') then
      if fastio_addr(11 downto 0) = x"000" then
        fastio_rdata <= thumbnail_write_address_int(7 downto 0);
      elsif fastio_addr(11 downto 0) = x"001" then
        fastio_rdata <= thumbnail_write_count;
        fastio_rdata <= thumbnail_rdata;
        report "THUMB: Exporting read data $" & to_hexstring(thumbnail_rdata);
--      fastio_rdata <= fastio_addr(7 downto 0);
--      report "THUMB: Spoofing read data $" & to_hexstring(fastio_addr(7 downto 0));
      end if;
      fastio_rdata <= (others => 'Z');
    end if;

This way, I will be able to tell whether it thinks it is writing lots of pixels or not, or if it thinks it is stuck writing to the same address etc.  I.e., I'm hoping it will be quite diagnostic.

... and of course, both are working quite nicely. Also, they are both being decoded, which also means that the address decoding isn't the problem, either. The way I am checking for pixel writes is also robust, in that it reads the same signal that is used as the write enable for the thumbnail buffer.

Thus I'm starting to suspect I might be tickling a Vivado bug. I've seen similar issues before where BRAM inferencing goes wonky, and it feels a bit like that here. Changing the BRAM definition to one of the "officially blessed" configurations tends to fix it. We'll see if that helps here.

And, yes, that does seem to have fixed the thumbnail buffer writing issue: When I DMA the contents out now, it has plausible looking data in there.  Now, whether it is correct data is another question. It looks like it is capturing only the top-left part of the screen, but in too high resolution. i.e., my stepping for pixel rows and columns is not striding over enough pixels.  

Also, for some reason, the freeze menu is still not reading the data, and showing a blank blue thumbnail. So there is still some problem there somewhere. The freeze monitor does work to view the thumbnail, so I'm guessing the thumbnail renderer in the freezer is broken. It might be fetching from a fixed offset in the freeze slot, which has shifted since the internal virtual 1541 ROM and RAM contents were added to the memory freeze list. Ah! It was because previously the thumbnail claimed to be frozen at address $1000, rather than $FFD4000.  So that should fix that. That can be fixed without resynthesis, since it is just in FREEZER.M65.

With that fix, we can now see a thumbnail again:

But it clearly has problems.  There are black pixels in the border, and white rubbish in the main image.  More the point, it is not showing a thumbnail of what was last on the screen!

Magically, if I run a program in BASIC to DMA the thumbnail to the screen, it suddenly updates, but only to what was DMAd -- or at least seems that way. Am I still reading the wrong memory in the freezer?  Nope, it is really what is in the thumbnail buffer.

Incidentally, BASIC65 makes it easy to read the thumbnail data and draw it (in a non-decoded way) on the screen with something like this:

EDMA 0,$800, $FFD4000, $0800

Also, it looks like my worry about the pixels only being of the top corner of the screen is not quite right: It's still probably the upper part of the screen only, but it's certainly showing the full width.

I had disabled the suppression of thumbnail update in hypervisor mode, which might have been partially messing things up. I've put that back, and also tried to fix the vertical positioning, so will see how that goes in a new bitstream.

Hmm... when I turn the hypervisor mode check back on, the thumbnail never gets written -- even though I have added debug code that let's me confirm that the hypervisor mode flag is there, and correctly readable by the thumbnail generator logic.  What on earth is going on here?

It looks like some writes happen, but not all.  Yet the hypervisor mode bit stays clear.  We have no timing violations in the log, and the logic is all in a single clock domain.  Ah! I think I might have found the answer to this and related mysteries of it not updating properly, or only updating when I am reading from the thumbnail: The BRAM instantiation I am using uses the same chip-select line for reading and writing. We need separate ones.

Yay :) That fixes most of the remaining problems. Now to fix the vertical positioning, and then we are good to go. It can still do with a little more tweaking, but that is something that is now simple enough, that someone else can likely tackle it, so I'm going to leave it at this point, with the thumbnails working again:

Tuesday 12 December 2023

R5 Board Bring-Up

The first R5 board is on its way to me from Germany, so it's time to get organised and build a test bitstream for it before it arrives.  I was hoping that the new i9 13900K-based FPGA build machine would be ready first, but I'm still waiting for the parts to arrive, so I'll be synthesising using my increasingly venerable i7 laptop, that dates back to around 2017.  It's still not bad, but probably 3x slower than the i9 will be for FPGA synthesis. It will be nice to get synthesis times down to around 10 minutes.

Anyway, let's focus on what needs to be done for the R5:

1. Some IEC lines have changed pins.

2. Some cartridge ports have changed control mechanisms.

3. The DIP switches are now connected to an I2C IO expander. There are now 8 of them instead of 4, and there are also four pins each for board major and minor version on there, too, so that we can tell the R5 apart from any future boards, or variants, e.g., if we change the SDRAM, HyperRAM and/or QSPI flash part on future batches.

4. The DC-DC power supplies have an I2C interface and a register that needs to be set to prevent noise from them.

5. The joystick data lines can now be controlled bi-directionally, to allow some funny joystick port peripherals to work correctly.

I'm starting with the DIP switches, and just getting a bitstream building, even though it won't yet work.  By running a complete synthesis run I can tell if I have the top-level pin changes in order, and the general infrastructure for the new R5 target in the automated build system.

That build. So now I have added the new signals for the cartridge port to the XDC file.  That's probably all that's required on that front for now, because the MEGA65 core doesn't initially need to make use of them.

Ah, now having done the DIP switch stuff, I have just realised that they are not in fact connected to the same I2C bus as the rest of the I2C device: Rather the dip switches and DC-DC converters are on their own dedicated I2C bus.  So I'll need to rejig some of the plumbing. On the flip side, it does rather simplify some stuff, as it means I don't have to modify the R4 I2C device controller, but rather can re-use it for the R5.

Okay, the board has arrived!

First thing, it looks like the (very easy to make) mistake has been made of putting the DIP switch bank foot print on the reverse side of the board has been made. This will be easy to fix for production. You can see the DIP switches on the rear in this next shot:

... and that they aren't on the front in this shot:


Second, I was warned that the DC-DC regulators make a lot of unpleasant noise, if they aren't appropriately programmed via I2C to use the correct mode. And they certainly do! They sound like an old 15KHz CRT that is about to die.  So first step is to try to find out how to program those to shut them up.  We do this by setting the MODE bit (bit 0 of register 1) on the DC-DC converters.  There are two of them, and they have I2C addresses $61 and $67.  So I'll make the I2C master set those periodically in an endless loop.  I'll also make it read the I2C expander DIP switches and board revision signals.

Well, a week has gone by with a lot of stuff going on that has limited my time to sink into this. On the up-side, this has allowed time for my new FPGA build box to arrive and get setup: I can now synthesise a MEGA65 core in about 11 minutes, instead of 25 -- 40 minutes. A very welcome improvement!

Michael has also discovered an oddity in the R5 schematic, that means that some joystick lines can only be read when the cartridge port is enabled.  So I have gone on to fix the state of that line for the MEGA65 core.

BASIC is still not booting -- this time, it looks like $DD00 has some bits set wrong, causing the C65 ROM to be stuck in a loop waiting for an IEC connected drive to respond to a request to load a boot file, because CLK and DATA are low, fooling the KERNAL into thinking that there is a device attached. So I'll have to check the IEC lines.

At least the CLK and DATA lines are staying low, even if I tie the output enables for those lines high, which should be dis-activated. It's all a bit of a mystery, as these bi-directional driver circuits are fairly simple, and we have used them elsewhere on the design before. I've even checked that the 5V supply to them isn't being switched off or anything crazy like that.

Well, I think I had some error in my build environment for using the new build server, where it was using old versions of files, because after some fiddling, it's all working as originally expected. The joys of teething problems :/

Anyway, I'm not looking back at the I2C interface to the DC-DC converters and the DIP switch I2C IO expander.  In theory this should be pretty easy to get working: I've started by copy-pasting and adapting one of the existing I2C controllers from one of the other MEGA65 board revisions.  I have clearly messed something up, though, because under simulation it only does the first two I2C transactions, and then sits stuck after the I2C driver clears its BUSY flag.

It took way more messing around with the I2C master, as I seem to be doing some quite strange things with it in the other board revisions, even though it works for those. So I have refactored things around a bit, and can now read the DIP switches and board revision resistors and set the DC-DC mode under simulation.  I quickly built a bitstream with the new build box to confirm that it fixes the DC-DC noise, which it does :)  Next, let's check that the board revision resistors and DIP switches get read correctly...

I had set aside this Sunday morning to make more progress, but was rather thwarted by a tree falling on the power-lines nearby, resulting 10 hours WOE (With Out Electricity).  So I didn't get any MEGA65 stuff done this morning, apart from looking up a datasheet on my phone. But it did get me to finally get around to buying a pure sine-wave inverter for these situations, so that our fridge and freezer contents don't spoil.  In winter its not a big deal, but summer in Australia can easily produce days and nights where the temperature never drops below 30C, which results in rather fast defrosting of freezer contents in particular. Anyway, that's all solved now, the generator is cooling off again ready to get packed away for next time it's needed, and I have a couple of hours before dinner (roast is already in the oven) to make some more progress.

So, let's see how the DIP switch and board revision reading is going. $D69D reads the DIP switches, while $D629 reads the board revision info.  At the moment they are reading as all 1's, which is what you get if an I2C device doesn't respond.  The most common cause is having the address wrong.  Looking through the MEGA65 R5 schematic, I can see I did indeed have the wrong address: I was using $41 instead of $40. So let's resynthesise, and see if that fixes the problem... unfortunately not.  So what is going on?

The R5 board conveniently has a header that exposes the I2C lines for this bus, so I can probe it with my oscilloscope, to see if the transactions look good. I'm not really expecting them to be bad, since the simulation tests with simulated I2C devices all pass -- including reading the ports that the DIP switches and board revision lines are connected to.

It's a little hard to trace on the oscilloscope, because the transactions are repeated continuously.  It might be worth trying to add a delay between them, so that I can more easily check which part of the transaction it is in at any point in time.

The plot thickens, as pages 7 and 8 of the datasheet indicates that there are many more addresses this part can end up on. It looks like there is some magic where you can connect the 3 address pins to either GND, VCC, SCL or SDA, and the IC detects this to select the correct address. It's quite a clever scheme that let's you select from 64 addresses using only 3 pins. I like it.  Except that it's not yielding the address that I am expecting.

Ok, I think I know what the problem is: The address is $40 in 8-bit format, not in 7-bit I2C address notation: So shifting it right one to make it $20 should fix it.  Time to resynthesise again, and again glad I can do that in less than 12 minutes now, instead of taking close to an hour. If I were still using the laptop for FPGA builds, I'd have had to have resorted to making a separate test bitstream with just the I2C controller and some serial output so that I could read the output.  It would have worked, but would have required more effort to construct it, and generally taken a bunch of time.

Good: I can now read the board revision info, although it's appearing in the DIP switch register. I also found that the revision was returning $FF, because I hadn't added logic to the target ID stuff in c65uart.  I've now modified it so that for MEGA65 R5 boards, it reads the straps on the board, instead of just passing a hard-coded value.  The target ID doesn't have extra bits for minor revision of boards, so now I have made that available in the upper nybl of $D628, which was some unused DDR bits for an IO port.  I should also make it return $1 for the R3A board, except that we can't, because we have no easy way to tell the R3 from R3A at the VHDL level.  So that will remain an anomaly, unless I get really excited at some point.

Anyway, to try to fix the problem with the board revision appearing in the DIP switch, I'm switching the ports used for the board revision straps and DIP switches, in case I have mixed them up some how, or that the read behaviour of the PCA9655 differs from the model I am using. If it works, then we assume that one of these must have been the case. 

While I wait for those to synthesise, I can probably start to take a look at the joystick bidirectional driving: It should now be possible to pull joystick lines by setting their DDR to output and the lines to low.  For example, setting $DC02 to $FF and $DC00 to $00 should pull all the lines on joystick port 2 down: And that works first go, with the plumbing I have already put in place for those lines :) Yay!

Actually, an important advantage of the MEGA65 here, is that it controls these lines as open-collector outputs. This means you can't so easily fry the MEGA65 if you put something on the joystick port that tries to pull a data line low while you are driving it high. If the joy device tries to drive it high while the MEGA65 is pulling it low you might still be able to fry the peripheral (just like on a C64), but the MEGA65 should not be harmed.

While waiting for synthesis runs, I've also generalised out the R5 board ID and related bits and pieces to also cater for any R6 or later boards.  This will probably guarantee that we will never need one ;).

With that build, the RTC now works again. So that leaves only the IEC bus that requires testing now. Plugging in my Pi1541, I don't get any life. So let's investigate each of the IEC lines, and see if I can make them toggle.

The main IEC lines are controlled by bits on $DD00:

  • Bit #3: Serial bus ATN OUT; 0 = High; 1 = Low.

  • Bit #4: Serial bus CLOCK OUT; 0 = High; 1 = Low.

  • Bit #5: Serial bus DATA OUT; 0 = High; 1 = Low.

  • Bit #6: Serial bus CLOCK IN; 0 = Low; 1 = High.

  • Bit #7: Serial bus DATA IN; 0 = Low; 1 = High.

So let's check those first, and then deal with SRQ in a moment, as that is controlled in a different way using C65 IO registers.

So what we expect is that when the bit is 0 for output, that we will then read a high value, and vice-versa. It looks like I have them inverted, because the R5 IEC line controls are flipped.  Re-inverting seems to get the CLK and DATA lines behaving properly.

I now have them right way round, and have also confirmed that ATN and RESET are correctly wired on the IEC bus... But I still can't load from the Pi1541.  Could it be that the SRQ line is wrong some how?  I don't think so. Or at least, SRQ is sitting high, so shouldn't be interfering.  

So why on earth can't I get the Pi1541 to respond? Is it broken? I can at least test that using the R3A machine and older core... and that's not working, either. So I am guessing my Pi1541 probably has some problems.  

Next will be to try a real 1541 with either a C64 ROM or with C65 ROM and switching the internal drive number away from 8. If that works, then I know it's fine.  But that will have to wait for another day, as I need to sleep now.

Another day is here, and I have just squeaked the time in to test with a real 1541, and it works, so the IEC port is confirmed working.

That leaves only the cartridge port control signals. 

I just tried a Radar Rat Race cartridge that uses Ultimax mode, and the MEGA65 correctly reads the /EXROM and /GAME signals, but the cartridge doesn't start. I'm sure this worked on previous boards at some point. But I'm not sure if its some change we have made in the VHDL, or a physical problem. I'll test that by trying the C64 core for the R5 board, that Michael and co have already got working.

No luck with the C64 core, either.  

I'm building a new bitstream that disables cartridges while in hypervisor context, so that it's easier to try to debug the machine with a cartridge installed.

The problem I am seeing in the MEGA65 core at least, is that the ROM of the cartridge is being read as all 1s.  That would be because we have new direction control registers. With those fixed, I can again start cartridges:


Now that we have everything working

So let's summarise the errata for the R5 board:

1. DIP switch footprint is on the rear of the PCB: Move it to the front-side of the board.

2. U9 /OE1 and /OE2 lines should be simply tied to GND, leaving EXP_SLOT_EN only controlling U66. i.e., U65 can be removed.

Errata that are desirable but not vital, i.e., can be deferred to an R5A board (as bitstreams will still be compatible with R5):

3. Add 3.3V line to J17, if there is space, so that additional I2C peripherals can be connected without needing to source the 3.3V from elsewhere.

4. Add a buffer between the DBG header and the joystick bi-directional control lines and U64A, with /OE controlled by DBG11. This will allow this debug header to be used either to provide those features, or alternatively, to connect a high-speed device directly to the FPGA with 11 available GPIOs.

Sunday 5 November 2023

Fixing Yet Another HyperRAM Bug

Grargle! I thought I had fixed all the bugs with the HyperRAM / expansion RAM interface, but it looks like at least one still remains.

The 585_test program passes, but my older hyperramtest.c program fails to detect the RAM size correctly, and the read stability test quickly reveals something of the nature of the problem:  The two bytes read during a linear read are correct, but half of subsequent bytes are wrong, as it alternates between displaying the correct byte, and then repeating the correct byte instead of presenting the next byte.  Looking at it more closely, the problem seems to be oriented around cache lines, that is, the first two bytes of a cache line will be correctly read.... or something like that.

So let's start by reducing it down to a simplest failing case: Writing values to $8000000 -- $8000002 and then reading them sequentially looks like it should trigger it, like this:

Okay, so we have a minimum failing test case. Now to subject it to simulation, and confirm that we can reproduce it there.

Hmm.. simulation of just the slow memory interface and hyperram alone doesn't reproduce the error. So I'm guessing it requires CPU involvement.

Well, simulating with the CPU as well still doesn't cause it to show up, so it must be something with marginal timing.

I believe the timing problem is on the read side, rather than when writing.

What I would really like to establish, is whether the problem is in slow_devices, hyperram, or the timing constraints on the physical pins of the FPGA that connect to the HyperRAM.

One way to start trying to peel this back, is to use the debug registers I built into the HyperRAM controller at $BFFFFFx. These are helpful here, because they don't have any latency due to the HyperRAM, and we know that the values read at those addresses can't be messed about by any potentially dodgy communications with the HyperRAM chip.  In particular, $BFFFFF2 (controller mode information), $BFFFFF3 (write latency), $BFFFFF4 (extra write latency) and $BFFFFF5 (read time adjust) are all read-writeable registers, that don't get changed by the controller when it operating.

So I wrote this little program to test the stability of reading these registers:

If it's working correctly, we should see only the same value in each column. But instead we see this:

I.e., successive pairs of reads seem to read the same value, then the following read does the right thing, but it's value then gets read again on the next transaction.

In other words, this tells us that the problem is not in the HyperRAM controller, but rather in the interface between the HyperRAM controller and the slow_devices, or between the slow_devices and the CPU.

We can eliminate the latter as a likely source, because we can read from other regions of the slow_devices memory without seeing such effects.  That is, the problem very likely lies in the interaction of the HyperRAM with the slow_devices.

The communication of results between these two occurs using a data ready toggle, which naturally has 2 states, and thus makes me a bit suspicious that it might be involved, since the problem we see is very much related to pairs of successive reads.

What would be nice, would be to be able to read the status of the ready toggle and the expected value of that toggle from the slow_devices module, so that we can see if it's getting confused in some way. There is supposed to be some nice unmapped memory in the slow_devices module that basically works like a slab of read-only registers, but it's not doing the right thing, which might be a clue. I'll have to think on that.

Meanwhile, I've modified my little program above to do a memory access between each one displayed to test and confirm some of my supsicions:

If we access memory location $8000000 between each iteration, so that each displayed read is from a separate pair of reads, then it either works properly like this, or always displays the contents of $8000000, depending which way around the pairing is acting:

But if I access a location that is still on the slow_devices module, but not in the HyperRAM, say, an address on the cartridge port interface, then the doubling remains:

This makes it much more likely that the problem is in the interface between the HyperRAM and slow_devices.  It's just really annoying that it doesn't show up under simulation.

So what else can I figure out from the behaviour I am able to observe to help track it down? Interestingly, writing different values to $BFFFFF2 changes things.

$E0 is what it was, and behaves as above. But $01 or $02 cause the doubling-up to return, even with the read of $8000000 between. Those enable fast command and fast read mode. It looks like bit 7 has to be set for the problem to go away via the read to $8000000. 

Ok, how about if we read from $BFFFFF0 instead of $80? In that case, the value in $BFFFFF2 doesn't matter at all -- it always reads correctly, provided that extra PEEK is in there. This test is interesting, because it does not involve touching the actual HyperRAM chip at all -- it's all just internal registers in hyperram.vhdl. So if the HyperRAM chip is not involved, it can't be the problem.  This makes me increasingly confident that the problem is in the communications between the slow_devices and hyperram modules.

The hyperram module drives the toggle at 162MHz, and is caught by slow_devices in the 81MHz clock domain. That _shouldn't_ be a problem, because we are using a toggle rather than a strobe.  But who knows what funniness might be going on. It might be glitching, for example. In which case adding a drive stage to the toggle on the export side might help to fix that.  Actually, related to that, the ability to select between the hyperram and SDRAM means that there is a bit of combinatorial logic at the top level that multiplexes between the data ready toggle from these two sources -- that could also be adding a little bit of delay that might be causing some havoc on the latching side in the slow_devices.

Another little test I could do, would be to write an assembly routine that does these accesses, and times whether the bad reads are timing out in slow_devices, and thus take longer. If so, that would tell us that the toggle line is not being seen to change.

I'm adding a debug register at $F000000 that will let me check those toggle lines directly. That shows nothing untoward.  I've added an extra signal to that register that samples the toggle in the block that actually uses it, in case it is being set with some delay.

Actually, while thinking about that, I realised that I have another nice way to diagnose where things are going wrong: Switching from HyperRAM to SDRAM.  If the doubling still happens, its in slow_devices. If it doesn't still happen, it _must_ be in hyperram.vhdl. ... and the verdict is, it must be in hyperram.vhdl.

Next thing to try is to add a debug register to slow_devices.vhdl that will let me see if the data ready toggle is arriving before the data does.

What I did instead, was add a 1 cycle delay to the toggle, so that the data value would be setup a full cycle early, so that if there was any clock phase issues between the source 162MHz clock and the destination 81MHz clock, the toggle would definitely not be noticed before the data was made available.

This has got reading working fine in the general sense, without the doubling of data. However, when I run the hyperramtest.prg, it seems to trigger it to cause the problem. Perhaps because a memory read times out or something.

Actually, the problem there seemed to be it was fiddling with the cache settings of the HyperRAM controller, which was upsetting things. I've patched hyperramtest.prg to not do that, and it then passes the main test.

So that has things a bit further, but now the "mis-write test" in hyperramtest.c is consistently failing in a curious way: Writing to the HyperRAM continues to work (which I can verify by reloading the core and checking its contents), but reading from it ceases to work.  Hmm.. the latest synthesis run doesn't have this problem, so presumably I fixed it along the way. I'm just doing a final synthesis run to be 100% sure... and with that new bitstream it's also fine. So whatever that problem was, it was just a bad bitstream build, either because I hadn't merged all the changes in, or the Vivado randomness of synthesis was causing dramas again.

Anyway, so far as I can tell now, the HyperRAM controller is now rock-solid, and doing all that it should. So hopefully that's the end of that until we have the time and energy to upgrade the HyperRAM controller to use the higher-performance one that Michael built for MiSTer cores.