There are a couple of niggly little bugs left in the HyperRAM controller. Annoyingly, they don't show up under simulation, so I am having to do a bit more detective work than I would like.
The first happens when reading: The first byte of a read block can be wrong. But this only seems to happen if reads are constrained to a 16-byte region. If I read a 256-byte block of memory repeatedly, the values are always correct.
This shows up in the following way.
First, we can make sure we have the data written properly. In this case, I have $01 -- $1F written to $8000000 -- $800001E. The remaining values are just whatever the HyperRAM contained on power up. We see that we read the correct values of those bytes back:
.M8000000
:08000000:0102030405060708090A0B0C0D0E0F10
:08000010:1112131415161718191A1B1C1D1E1F55
:08000020:5561515DD5555551555557554D555455
:08000030:155555555D555555157555D5155D54D7
:08000040:557555D5455555550557555441455555
:08000050:15575117554555541555D5555C555755
:08000060:53555555515D575555555155555DD555
:08000070:555515465575D5555515555555155555
:08000080:5551055555555555477115455555561F
:08000090:55F15555D45551555754551551555555
:080000A0:55415455555515555555555555555515
:080000B0:595155555555554144455C5D15555555
:080000C0:54555565555D5D155455555554555515
:080000D0:51015455514555505555155754551545
:080000E0:D4555511D5D15115D193545455551D55
:080000F0:71575115D75555D55555515555D554D1
If I execute the same command again to read those 256 bytes back, they read back correctly:
.M8000000
:08000000:0102030405060708090A0B0C0D0E0F10
:08000010:1112131415161718191A1B1C1D1E1F55
:08000020:5561515DD5555551555557554D555455
:08000030:155555555D555555157555D5155D54D7
:08000040:557555D5455555550557555441455555
:08000050:15575117554555541555D5555C555755
:08000060:53555555515D575555555155555DD555
:08000070:555515465575D5555515555555155555
:08000080:5551055555555555477115455555561F
:08000090:55F15555D45551555754551551555555
:080000A0:55415455555515555555555555555515
:080000B0:595155555555554144455C5D15555555
:080000C0:54555565555D5D155455555554555515
:080000D0:51015455514555505555155754551545
:080000E0:D4555511D5D15115D193545455551D55
:080000F0:71575115D75555D55555515555D554D1
So far, so good. If I now use the "little m" command to just read 16 bytes back, instead of 256, it works the first time:
.m8000000
:08000000:0102030405060708090A0B0C0D0E0F10
But things start getting weird if I now read those same 16 bytes back:
.m8000000
:08000000:1F02030405060708090A0B0C0D0E0F10
Notice that the first byte is $1F instead of $01. This is an interesting clue: previously I thought that it was reading the wrong byte and returning it, but it actually looks like it is returning the most recently written byte.
Anyway, this problem is persistent: I can repeat that 16-byte read, and it will keep returning that same incorrect byte:
.m8000000
:08000000:1F02030405060708090A0B0C0D0E0F10
Let's test the theory that it is returning the most recently written byte by writing a different byte to a different location, and seeing what happens:
.s800000f 90
.m8000000
:08000000:0102030405060708090A0B0C0D0E0F90
Okay, so the first read after the write happens properly, which is also the behaviour that I have seen previously. But if I repeat the read, I will now get that incorrect byte:
.m8000000
:08000000:9002030405060708090A0B0C0D0E0F90
.m8000000
:08000000:9002030405060708090A0B0C0D0E0F90
What happens if we write to an address that is not in that little area, say $8000040? I picked that address because the cache in the HyperRAM controller has only 2x 8-byte lines, which are 8-byte aligned, but there is also a 32-byte pre-fetch buffer, which is 32-byte aligned. So addresses in $8000000 -- $800001F could all be in the same pre-fetch zone, while $8000040 is definitely in neither.
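To make that address arithmetic concrete, here is a little Python sketch of the mapping (the sizes come from the description above; the helper names are just mine, not anything from the actual controller):

```python
# Which cache line / pre-fetch block does an address fall in?
# Sizes per the text: 8-byte-aligned cache lines and a
# 32-byte-aligned pre-fetch buffer.

def cache_line_base(addr):
    return addr & ~0x07       # 8-byte cache line base

def prefetch_base(addr):
    return addr & ~0x1F       # 32-byte pre-fetch block base

for a in (0x8000000, 0x800001F, 0x8000040):
    print(f"${a:07X}: line ${cache_line_base(a):07X}, "
          f"pre-fetch ${prefetch_base(a):07X}")
```

Running this confirms that $8000000 and $800001F share the pre-fetch block at $8000000, while $8000040 sits in its own block (and its own cache line).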
So let's write a new value and see what we see:
.s8000040 33
.m8000000
:08000000:9002030405060708090A0B0C0D0E0F90
.m8000000
:08000000:9002030405060708090A0B0C0D0E0F90
Interesting, so we aren't seeing the new incorrect value, but rather the old one. Reading the whole 256-byte block resolves it on the 2nd read, as previously seen. I'm presuming this is because some part of the cache gets invalidated:
.M8000000
:08000000:9002030405060708090A0B0C0D0E0F90
:08000010:1112131415161718191A1B1C1D1E1F55
:08000020:5561515DD5555551555557554D555455
...
:080000F0:71575115D75555D55555515555D554D1
.m8000000
:08000000:0102030405060708090A0B0C0D0E0F90
But then if I just read the same 16-byte region again, the most recently written byte shows up again consistently:
.m8000000
:08000000:3302030405060708090A0B0C0D0E0F90
.m8000000
:08000000:3302030405060708090A0B0C0D0E0F90
.m8000000
:08000000:3302030405060708090A0B0C0D0E0F90
So there must be some mechanism by which the most recently written byte is being incorrectly returned. It's interesting that the address of the most recently written byte is quite irrelevant: whether it is $800001F, $8000040, or, say, $8100000, the problem is the same:
.s8100000 44
.m8000000
:08000000:0102030405060708090A0B0C0D0E0F90
.m8000000
:08000000:4402030405060708090A0B0C0D0E0F90
.m8000000
:08000000:4402030405060708090A0B0C0D0E0F90
.m8000000
:08000000:4402030405060708090A0B0C0D0E0F90
But it seems to rely on reading beginning at $xxxxxx0. If I do the 16-byte read from $8000008, for example, the problem doesn't show up:
.m8000008
:08000008:090A0B0C0D0E0F901112131415161718
.m8000008
:08000008:090A0B0C0D0E0F901112131415161718
.m8000008
:08000008:090A0B0C0D0E0F901112131415161718
.m8000008
:08000008:090A0B0C0D0E0F901112131415161718
But if I go back to reading from $8000000, the problem reappears!
.m8000008
:08000008:090A0B0C0D0E0F901112131415161718
.m8000008
:08000008:090A0B0C0D0E0F901112131415161718
.m8000000
:08000000:4402030405060708090A0B0C0D0E0F90
A little more poking around reveals that this problem only occurs with addresses that are aligned to 32-bytes, which is suspiciously the alignment of the pre-fetch buffer...
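That alignment condition is simply a test on the low five address bits. A tiny sketch of the check (the helper name is mine):

```python
def is_prefetch_aligned(addr):
    # 32-byte alignment: the low five address bits are all zero,
    # matching the alignment of the pre-fetch buffer.
    return (addr & 0x1F) == 0

print(is_prefetch_aligned(0x8000000))  # True: this address shows the bug
print(is_prefetch_aligned(0x8000008))  # False: this one does not
```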
So we now have 4 clues as to what is going on:
1. The most recently written byte is somehow being returned. Where is this being held?
2. Reading a large enough slab of memory to bust our little cache seems to be enough to stop the problem, at least for the first read after that.
3. It seems unlikely that the incorrect byte is in the 2x8-byte read cache lines, as invalidating those doesn't make the problem go away.
4. The problem only occurs when reading a 32-byte aligned address. This makes it highly likely that the pre-fetch logic is to blame.
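Putting those clues together, one mechanism that would reproduce exactly this behaviour is if refilling the pre-fetch buffer clobbered its first byte with the most recently written data byte. The following Python model is purely speculative -- it is not the actual VHDL, and all the names are mine -- but it matches every step of the transcript above:

```python
class BuggyPrefetchModel:
    """Toy model of the suspected fault: whenever the 32-byte
    pre-fetch buffer is refilled, its byte 0 is clobbered with
    the most recently written data byte. Speculative sketch."""

    PREFETCH = 32

    def __init__(self, size=0x200):
        self.ram = bytearray(size)
        self.pf_base = None        # base address of valid pre-fetch block
        self.pf_buf = None
        self.last_written = 0x00

    def write(self, addr, value):
        self.ram[addr] = value
        self.last_written = value
        # A write inside the pre-fetched block invalidates it;
        # writes elsewhere leave it (stale) in place.
        if self.pf_base is not None and \
           self.pf_base <= addr < self.pf_base + self.PREFETCH:
            self.pf_base = None

    def bust_cache(self):
        # Stand-in for a big 256-byte read that flushes the buffer.
        self.pf_base = None

    def read16(self, addr):
        aligned = (addr & 0x1F) == 0
        if aligned and self.pf_base == addr:
            return bytes(self.pf_buf[:16])       # served from stale buffer
        data = bytes(self.ram[addr:addr + 16])   # correct read from RAM
        if aligned:
            base = addr & ~0x1F
            self.pf_buf = bytearray(self.ram[base:base + self.PREFETCH])
            self.pf_buf[0] = self.last_written   # <-- the hypothesised bug
            self.pf_base = base
        return data

m = BuggyPrefetchModel()
for i in range(0x1F):
    m.write(i, i + 1)      # $01..$1F at offsets $00..$1E
print(m.read16(0).hex())   # 0102030405060708090a0b0c0d0e0f10 (correct)
print(m.read16(0).hex())   # 1f02030405060708090a0b0c0d0e0f10 (bug)
```

This model reproduces the whole transcript: the first aligned read is correct, repeats return the last-written byte, a write inside the region gives one correct read before the new byte leaks in, a write outside the region leaves the old stale byte in place, and busting the cache buys exactly one correct read. Reads from unaligned addresses like $8000008 are never served from the buffer, so they are always correct.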
Ok, so I was sensible and built in a register to control various aspects of the cache controller, including the ability to disable the block-read logic... and when I disable that, the problem disappears! So the block-read logic is to blame. My challenge now is to work out how, so that I can fix it.
Now, the fact that the pre-fetch buffer is implicated gives a possible explanation as to why the problem has not shown up under simulation: the simulation test hammers the HyperRAM controller, sending new requests as soon as possible. This means that often a pre-fetch operation will be aborted in order to serve a new transaction. So it's possible the controller can never get into this problematic state under those conditions. Increasing the delay between transactions might enable it to show up, however.
Unfortunately that didn't work. So we will live for now with the pre-fetch logic disabled, at a modest cost to the read performance of the HyperRAM. However, it's still quite acceptable overall.
So we can still copy to and from the HyperRAM at 5MB/sec or better. It is only when copying between regions of HyperRAM that things really slow down, because of the increased contention that this causes, which the block-fetch mechanism helps to relieve. In any case, it's good enough for now, and I'll focus on other more pressing issues as we approach the release of the DevKits.