Sunday 5 November 2023

Fixing Yet Another HyperRAM Bug

Grargle! I thought I had fixed all the bugs with the HyperRAM / expansion RAM interface, but it looks like at least one still remains.

The 585_test program passes, but my older hyperramtest.c program fails to detect the RAM size correctly, and the read stability test quickly reveals something of the nature of the problem: the first two bytes read during a linear read are correct, but half of the subsequent bytes are wrong, as it alternates between displaying the correct byte and then repeating that byte instead of presenting the next one.  Looking at it more closely, the problem seems to be oriented around cache lines, that is, the first two bytes of a cache line will be read correctly... or something like that.

So let's start by reducing it down to a simplest failing case: Writing values to $8000000 -- $8000002 and then reading them sequentially looks like it should trigger it, like this:

Okay, so we have a minimum failing test case. Now to subject it to simulation, and confirm that we can reproduce it there.

Hmm.. simulation of just the slow memory interface and hyperram alone doesn't reproduce the error. So I'm guessing it requires CPU involvement.

Well, simulating with the CPU as well still doesn't cause it to show up, so it must be something with marginal timing.

I believe the timing problem is on the read side, rather than when writing.

What I would really like to establish, is whether the problem is in slow_devices, hyperram, or the timing constraints on the physical pins of the FPGA that connect to the HyperRAM.

One way to start trying to peel this back is to use the debug registers I built into the HyperRAM controller at $BFFFFFx. These are helpful here, because they don't have any latency due to the HyperRAM, and we know that the values read at those addresses can't be messed about by any potentially dodgy communications with the HyperRAM chip.  In particular, $BFFFFF2 (controller mode information), $BFFFFF3 (write latency), $BFFFFF4 (extra write latency) and $BFFFFF5 (read time adjust) are all read/write registers that don't get changed by the controller while it is operating.

So I wrote this little program to test the stability of reading these registers:

If it's working correctly, we should see only the same value in each column. But instead we see this:

I.e., successive pairs of reads seem to return the same value: one read does the right thing, but its value then gets read again on the next transaction.
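A check of this kind can be sketched in C roughly as follows. This is not the program I actually ran, just a host-side illustration: `read8_fn`, `count_bad_reads`, the two `fake_read_*` stand-ins and the register contents other than $E0 are all made up for the example, and on real hardware `read8` would wrap whatever far-memory peek your toolchain provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: read one byte from a 28-bit address. */
typedef uint8_t (*read8_fn)(uint32_t addr);

/* Read `count` consecutive registers starting at `base`, `passes` times,
   and count the reads that do not return the value expected for that
   register. Registers the controller never modifies should give 0. */
size_t count_bad_reads(read8_fn read8, uint32_t base,
                       const uint8_t *expected, size_t count, size_t passes)
{
    size_t bad = 0;
    assert(read8 != NULL && expected != NULL);
    for (size_t p = 0; p < passes; p++)
        for (size_t i = 0; i < count; i++)
            if (read8(base + (uint32_t)i) != expected[i])
                bad++;
    return bad;
}

/* ---- Host-side stand-ins, so the check can be tried without hardware ---- */
#define FAKE_BASE 0xBFFFFF2u

/* Illustrative contents only; just the $E0 comes from the real machine. */
const uint8_t fake_regs[4] = { 0xE0, 0x14, 0x00, 0x01 };

/* A well-behaved device: every read returns the addressed register. */
uint8_t fake_read_ok(uint32_t addr)
{
    return fake_regs[(addr - FAKE_BASE) & 3];
}

/* A device showing the symptom: every second transaction repeats the
   result of the previous one instead of the addressed register. */
uint8_t fake_read_doubled(uint32_t addr)
{
    static int repeat_next = 0;
    static uint8_t last = 0;
    if (repeat_next) {
        repeat_next = 0;
        return last;
    }
    repeat_next = 1;
    last = fake_regs[(addr - FAKE_BASE) & 3];
    return last;
}
```

Comparing against known expected values, rather than against the first pass, matters here: a fault that repeats identically on every pass would otherwise go unnoticed.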

In other words, this tells us that the problem is not in the HyperRAM controller, but rather in the interface between the HyperRAM controller and the slow_devices, or between the slow_devices and the CPU.

We can eliminate the latter as a likely source, because we can read from other regions of the slow_devices memory without seeing such effects.  That is, the problem very likely lies in the interaction of the HyperRAM with the slow_devices.

The communication of results between these two occurs using a data ready toggle, which naturally has 2 states, and thus makes me a bit suspicious that it might be involved, since the problem we see is very much related to pairs of successive reads.
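To make the two-state worry concrete, here is a toy C model of a data ready toggle. It is only a sketch of the general idea, not the VHDL in hyperram.vhdl or slow_devices: the names `ready_link`, `toggle_rx`, `link_publish` and `link_take` are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the producer drives: a data byte plus a toggle line. */
typedef struct {
    uint8_t data;
    bool    toggle;
} ready_link;

/* What the consumer remembers: the toggle level it last acted on. */
typedef struct {
    bool last_toggle;
} toggle_rx;

/* Producer: present the data, then flip the toggle to announce it. */
void link_publish(ready_link *l, uint8_t value)
{
    assert(l != NULL);
    l->data = value;
    l->toggle = !l->toggle;
}

/* Consumer: a result is ready only if the toggle differs from the level
   we last saw. Returns true and stores the byte in *out in that case. */
bool link_take(toggle_rx *rx, const ready_link *l, uint8_t *out)
{
    assert(rx != NULL && l != NULL && out != NULL);
    if (l->toggle == rx->last_toggle)
        return false;              /* nothing new as far as we can tell */
    rx->last_toggle = l->toggle;
    *out = l->data;
    return true;
}
```

Because the line has only two levels, two flips between polls leave it back where it started, and the consumer sees nothing at all. Any fault that depends on which way round the toggle currently is will therefore tend to show up with a period of two transactions.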

What would be nice, would be to be able to read the status of the ready toggle and the expected value of that toggle from the slow_devices module, so that we can see if it's getting confused in some way. There is supposed to be some nice unmapped memory in the slow_devices module that basically works like a slab of read-only registers, but it's not doing the right thing, which might be a clue. I'll have to think on that.

Meanwhile, I've modified my little program above to do a memory access between each one displayed to test and confirm some of my suspicions:

If we access memory location $8000000 between each iteration, so that each displayed read is from a separate pair of reads, then it either works properly like this, or always displays the contents of $8000000, depending which way around the pairing is acting:

But if I access a location that is still on the slow_devices module, but not in the HyperRAM, say, an address on the cartridge port interface, then the doubling remains:



This makes it much more likely that the problem is in the interface between the HyperRAM and slow_devices.  It's just really annoying that it doesn't show up under simulation.

So what else can I figure out from the behaviour I am able to observe to help track it down? Interestingly, writing different values to $BFFFFF2 changes things.

$E0 is what it was, and behaves as above. But $01 or $02 cause the doubling-up to return, even with the read of $8000000 between. Those enable fast command and fast read mode. It looks like bit 7 has to be set for the problem to go away via the read to $8000000. 

Ok, how about if we read from $BFFFFF0 instead of $8000000? In that case, the value in $BFFFFF2 doesn't matter at all -- it always reads correctly, provided that extra PEEK is in there. This test is interesting, because it does not involve touching the actual HyperRAM chip at all -- it's all just internal registers in hyperram.vhdl. So if the HyperRAM chip is not involved, it can't be the problem.  This makes me increasingly confident that the problem is in the communications between the slow_devices and hyperram modules.

The hyperram module drives the toggle at 162MHz, and it is caught by slow_devices in the 81MHz clock domain. That _shouldn't_ be a problem, because we are using a toggle rather than a strobe.  But who knows what funniness might be going on. It might be glitching, for example, in which case adding a drive stage to the toggle on the export side might help to fix that.  Actually, related to that, the ability to select between the hyperram and SDRAM means that there is a bit of combinatorial logic at the top level that multiplexes between the data ready toggle from these two sources -- that could also be adding a little bit of delay that might be causing some havoc on the latching side in the slow_devices.
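The reason a toggle should be safe where a strobe would not be can be shown with a small C model. This is an idealised sketch, assuming a receiver that samples exactly every second fast cycle and events at least two fast cycles apart; it says nothing about glitches or routing delay, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* event[i] is true if the sender signals "data ready" in fast (162MHz)
   cycle i. The receiver (81MHz) samples only the fast cycles where
   i % 2 == phase, with phase being 0 or 1. */

/* Strobe: the line is high only during the cycle of the event itself,
   so events landing on an unsampled cycle are lost. */
size_t seen_with_strobe(const bool *event, size_t n, size_t phase)
{
    size_t seen = 0;
    assert(event != NULL && phase < 2);
    for (size_t i = phase; i < n; i += 2)
        if (event[i])
            seen++;
    return seen;
}

/* Toggle: the sender flips a level, which then stays put until the next
   event, so the receiver notices it on its next sample whenever that is. */
size_t seen_with_toggle(const bool *event, size_t n, size_t phase)
{
    bool line = false;   /* level driven by the sender */
    bool last = false;   /* level the receiver last acted on */
    size_t seen = 0;
    assert(event != NULL && phase < 2);
    for (size_t i = 0; i < n; i++) {
        if (event[i])
            line = !line;
        if (i % 2 == phase && line != last) {
            last = line;
            seen++;
        }
    }
    return seen;
}
```

With events in fast cycles 1, 4 and 7 of a 10-cycle run, the strobe version sees one or two of the three events depending on the phase, while the toggle version sees all three for either phase.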

Another little test I could do, would be to write an assembly routine that does these accesses, and times whether the bad reads are timing out in slow_devices, and thus take longer. If so, that would tell us that the toggle line is not being seen to change.

I'm adding a debug register at $F000000 that will let me check those toggle lines directly. That shows nothing untoward.  I've added an extra signal to that register that samples the toggle in the block that actually uses it, in case it is being set with some delay.

Actually, while thinking about that, I realised that I have another nice way to diagnose where things are going wrong: Switching from HyperRAM to SDRAM.  If the doubling still happens, it's in slow_devices. If it doesn't still happen, it _must_ be in hyperram.vhdl. ... and the verdict is, it must be in hyperram.vhdl.

Next thing to try is to add a debug register to slow_devices.vhdl that will let me see if the data ready toggle is arriving before the data does.

What I did instead was add a 1 cycle delay to the toggle, so that the data value would be set up a full cycle early. That way, if there were any clock phase issues between the source 162MHz clock and the destination 81MHz clock, the toggle would definitely not be noticed before the data was made available.
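The effect of that ordering can be illustrated with a deliberately crude C model. It is a sketch under my own assumptions, not a description of the real signal timing: `toggle_delay` and `data_skew` are hypothetical parameters measured in fast cycles, and the stale-by-one pattern it produces is not claimed to match the exact pairing seen on the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For each transaction k the producer presents values[k] and flips the
   toggle `toggle_delay` fast cycles later. The consumer only sees the new
   data `data_skew` cycles after it was presented, and latches whatever it
   sees on the bus at the moment the toggle change arrives. */
void run_reads(const uint8_t *values, uint8_t *latched, size_t n,
               unsigned toggle_delay, unsigned data_skew)
{
    uint8_t bus_seen = 0;  /* what the consumer sees before any update */
    assert(values != NULL && latched != NULL);
    for (size_t k = 0; k < n; k++) {
        if (toggle_delay >= data_skew)
            bus_seen = values[k];  /* new data visible before the toggle */
        latched[k] = bus_seen;     /* consumer latches on the toggle */
        bus_seen = values[k];      /* bus has settled before the next read */
    }
}
```

With `toggle_delay` 0 and `data_skew` 1, every read returns the result of the previous transaction; raising `toggle_delay` to 1 makes every read return its own data, which is the point of delaying the toggle.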

This has got reading working fine in the general sense, without the doubling of data. However, running hyperramtest.prg seems to trigger the problem again. Perhaps because a memory read times out or something.

Actually, the problem there seemed to be it was fiddling with the cache settings of the HyperRAM controller, which was upsetting things. I've patched hyperramtest.prg to not do that, and it then passes the main test.

So that gets things a bit further, but now the "mis-write test" in hyperramtest.c is consistently failing in a curious way: Writing to the HyperRAM continues to work (which I can verify by reloading the core and checking its contents), but reading from it ceases to work.  Hmm.. the latest synthesis run doesn't have this problem, so presumably I fixed it along the way. I'm just doing a final synthesis run to be 100% sure... and with that new bitstream it's also fine. So whatever that problem was, it was just a bad bitstream build, either because I hadn't merged all the changes in, or the Vivado randomness of synthesis was causing dramas again.

Anyway, so far as I can tell now, the HyperRAM controller is now rock-solid, and doing all that it should. So hopefully that's the end of that until we have the time and energy to upgrade the HyperRAM controller to use the higher-performance one that Michael built for MiSTer cores.