Now, the good news is that I have managed to work around the DDR controller write bugs (which are probably my own fault, but which I currently lack the time and brain power to track down and fix).
The write bugs are really quite bizarre.
Here is a routine that does work to load data into the DDR RAM:
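; copy sector into slow ram
        ldx #$00
rr1:
buggyddrwriteretry1:
        lda $de00,x             ; first 256 bytes of the sector
        sta $4000,x             ; write into the DDR-backed RAM
        cmp $4000,x             ; read back and verify
        bne buggyddrwriteretry1 ; write did not stick, so retry
        inx
        bne rr1                 ; loop until X wraps back to $00
rr2:
buggyddrwriteretry2:
        lda $df00,x             ; second 256 bytes of the sector
        sta $4100,x
        cmp $4100,x             ; read back and verify
        bne buggyddrwriteretry2 ; write did not stick, so retry
        inx
        bne rr2                 ; loop until X wraps back to $00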
This routine copies a 512-byte sector loaded from the SD card into a piece of the DDR RAM that has been mapped to $4000-$5FFF.
As you can see, I have to retry writes in case they don't take effect. This sort of bug is easy to deal with.
The following is the earlier version of the routine that does not work, instead causing the same data to be written to $41xx and $40xx due to mysterious factors (or some glaring bug in the routine that I am incapable of seeing):
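; copy sector into slow ram
        ldx #$00
rr1:
buggyddrwriteretry1:
        lda $de00,x             ; byte X of the first half of the sector
        sta $4000,x
        cmp $4000,x             ; read back and verify
        bne buggyddrwriteretry1 ; write did not stick, so retry
buggyddrwriteretry2:
        lda $df00,x             ; byte X of the second half of the sector
        sta $4100,x
        cmp $4100,x             ; read back and verify
        bne buggyddrwriteretry2 ; write did not stick, so retry
        inx
        bne rr1                 ; both pages advance together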
The only practical difference is that this version alternates between the two pages of memory on every byte, instead of finishing one page before starting the next. Bizarre.
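To make the "no visible bug" claim concrete, here is a minimal Python sketch that models both loop orders against an ideal memory, i.e. one where every write sticks. The function names and the dictionary-based memory are hypothetical stand-ins for illustration only, not anything in Kickstart. On ideal memory the two orderings leave identical contents, which suggests the difference seen on hardware comes from the DDR path rather than from the loop logic.

```python
def write_verified(mem, addr, value):
    """Write a byte and re-read it, retrying until it sticks (like sta/cmp/bne)."""
    while True:
        mem[addr] = value
        if mem[addr] == value:
            return

def copy_sequential(sector, base=0x4000):
    """Model of the working routine: fill page $40xx, then page $41xx."""
    mem = {}
    for x in range(256):
        write_verified(mem, base + x, sector[x])
    for x in range(256):
        write_verified(mem, base + 0x100 + x, sector[256 + x])
    return mem

def copy_interleaved(sector, base=0x4000):
    """Model of the earlier routine: alternate between the two pages."""
    mem = {}
    for x in range(256):
        write_verified(mem, base + x, sector[x])
        write_verified(mem, base + 0x100 + x, sector[256 + x])
    return mem

# Arbitrary 512-byte test pattern standing in for a sector from the SD card.
sector = bytes((i * 7 + 3) & 0xFF for i in range(512))
same = copy_sequential(sector) == copy_interleaved(sector)
```

With memory that behaves, `same` is true, so any divergence between the two routines has to be introduced below the level this model captures.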
Anyway, the good news is that the fixed version of the routine does seem to correctly load the ROM. Kickstart now attempts to boot the C65 ROM, but this is thwarted by a read bug in my DDR sub-system. Below is a piece of what happens. Note that each PC value is the value after the instruction bytes shown on the right have been executed. I'll add some commentary to help explain what is going on.
PC A X Y Z B SP MAPL MAPH LAST-OP P P-FLAGS RGP uS IO
CBAA 00 FF 01 B3 A5 01FB E300 B300 28 65 00 .VE..I.C. ..P 13 -00 --
The CPU has just done a PLP above, and the PC is now at $CBAA. This is in the C65 "Interface ROM" at $C800-$CFFF that acts as the glue between the C64-mode, C65-mode and internal drive DOS parts of the ROM. Nothing surprising here.
PC A X Y Z B SP MAPL MAPH LAST-OP P P-FLAGS RGP uS IO
F0C1 00 FF 01 B3 A5 01FD E300 B300 60 65 00 .VE..I.C. ..P 13 -00 --
Now we have executed an RTS, returning control to $F0C1 in the C64-mode Kernal, which is in the process of working out whether to stay in C64 mode, or boot into C65 mode. Again, nothing looks surprising here. However, it is worth showing the bytes of the ROM at this point:
.df0c0
:777F0C0 CB A2 FF AD 11 D0 10 FB A9 08 CD 12 D0 90 06 AD
The PC is now at $F0C1, so we expect the next instruction opcode to be $A2, i.e., LDX #$FF. Let's see what happens:
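For readers who want to check that expectation mechanically, the following Python sketch pulls a byte out of the monitor dump line shown above. The helper names are hypothetical, and it assumes that the low 16 bits of the dump address correspond to the CPU address, which is how the dump is being read here.

```python
DUMP_LINE = ":777F0C0 CB A2 FF AD 11 D0 10 FB A9 08 CD 12 D0 90 06 AD"

def parse_dump_line(line):
    """Split a monitor dump line into (base address, bytes)."""
    fields = line.lstrip(":").split()
    base = int(fields[0], 16)
    data = bytes(int(b, 16) for b in fields[1:])
    return base, data

def byte_at(line, cpu_addr):
    """Return the dumped byte at cpu_addr, comparing only the low 16 address bits."""
    base, data = parse_dump_line(line)
    offset = (cpu_addr - base) & 0xFFFF
    return data[offset]

expected_opcode = byte_at(DUMP_LINE, 0xF0C1)   # 0xA2, the opcode for LDX #imm
```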
PC A X Y Z B SP MAPL MAPH LAST-OP P P-FLAGS RGP uS IO
F0C2 A5 FF 01 B3 A5 01FD E300 B300 7B E5 00 NVE..I.C. ..P 13 -00 --
Er, $7B is not $A2. This is really Not Good.
And there, my friends, is the current read bug with the DDR controller.
The good news is that this seems to be a glitch with the DDR read cache, which should be fairly easy to fix, although part of me dreads that it might be the DDR controller returning the wrong line of data from the DDR RAM.
The next instruction after this erroneous $7B (TBA) instruction is $00 (BRK). That is, $F0C1-$F0C2 is being read as $7B $00. Assuming that the bytes must occur at $xxx1 & $xxx2 in the ROM file to show up in the cache at these offsets (the cache reads 16-byte wide lines from the DDR controller), a quick search through the ROM file reveals that the only instance that fits this constraint is at $CBC1-$CBC2:
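The search described above can be sketched in a few lines of Python. This is an illustration of the idea, not the exact command that was used; the function name and parameters are hypothetical. It looks for every occurrence of $7B $00 whose first byte sits at offset 1 within a 16-byte line.

```python
def find_aligned_pairs(rom, pattern=b"\x7b\x00", offset_in_line=1, line_size=16):
    """Return offsets of `pattern` that start at `offset_in_line` within a line."""
    hits = []
    pos = rom.find(pattern)
    while pos != -1:
        if pos % line_size == offset_in_line:
            hits.append(pos)
        pos = rom.find(pattern, pos + 1)
    return hits

# The 16 bytes at ROM offset $CBC0, taken from the hex dump in this post.
line_cbc0 = bytes.fromhex("00 7b 00 5c 00 3d 00 2e 00 16 00 07 00 a2 22 16")
hits = [0xCBC0 + off for off in find_aligned_pairs(line_cbc0)]   # [0xCBC1]
```

Running the same function over the whole ROM image (read with `open(path, "rb").read()`, where `path` is whatever ROM file is at hand) would list every candidate line.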
0000cbc0: 00 7b 00 5c 00 3d 00 2e 00 16 00 07 00 a2 22 16
This is rather interesting. The DDR memory access immediately before $F0C1 was of $CBAA, which is suspiciously close. The DDR cache I have implemented is 8KB in size, consisting of 512 lines of 16 bytes each. This means that $800x and $A00x would map to the same cache line, and eject each other from the cache. However, $F0Cx and $CBCx should not map to the same cache line.
That is, even if the cache logic somehow erroneously was showing a stale cache line, it shouldn't be able to show the cache line for $CBCx.
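The mapping argument is easy to verify with a little arithmetic. The Python sketch below assumes a direct-mapped cache whose line index is taken from the address bits just above the 4-bit byte offset; that is what the $800x/$A00x example implies, but the function itself is a hypothetical model, not the hardware logic.

```python
LINE_BYTES = 16    # bytes per cache line
NUM_LINES = 512    # 512 lines x 16 bytes = 8KB

def cache_line_index(addr):
    """Index of the cache line an address maps to in a direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_LINES

collide = cache_line_index(0x8000) == cache_line_index(0xA000)   # True
idx_f0c = cache_line_index(0xF0C1)   # 0x10C
idx_cbc = cache_line_index(0xCBC1)   # 0x0BC
idx_cba = cache_line_index(0xCBAA)   # 0x0BA
```

Under this model $F0Cx lands on line $10C while $CBCx lands on line $0BC, so a merely stale line at the $F0Cx slot could not contain the $CBCx data.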
So, either there is some horrible bug in my cache logic, or the DDR controller is returning the wrong line of data (again, quite possibly due to my poor DDR controller implementation rather than an intrinsic fault in the Xilinx DDR low-level controller or anything else done by someone else).
What is interesting is that, given enough time, it seems to start returning the correct data for the cache row. That is, if the CPU were to hang around for a few cycles (quite possibly hundreds or thousands of cycles), it all seems to catch up and start delivering the right data.
When a read is requested from the DDR cache, it waits until the DDR controller is presenting the correct row of data, which is tested by examining the top 23 bits of the requested address and comparing those bits to the cache line address field in the data delivered from my DDR controller. Only when they match does the cache logic let the CPU resume and read the data.
The cache is implemented as a dual-port memory between the DDR controller and the CPU, with the DDR controller being the only side that can write, and the CPU the only side that can read. This avoids all the clock-domain-crossing problems between the two.
So in theory, there should not be any glitches with reading the data that might cause trouble. This is especially true since the CPU checks the cache line ID as described above, so even if the DDR controller thinks we have asked for something else, the CPU will realise, and persist in requesting the line it was after until the DDR controller gets it right.
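The check described above can be modelled roughly as follows. This Python sketch is only an illustration under stated assumptions: the names are hypothetical, the controller is reduced to a sequence of (line address, data) pairs that it presents over time, and the line address is simply the requested address with the 4 byte-offset bits dropped (the full address width is not modelled). The two 16-byte lines are the ones quoted in the dumps earlier in this post.

```python
LINE_BYTES = 16

def line_tag(addr):
    """Line address: the requested address without its 4 byte-offset bits."""
    return addr // LINE_BYTES

def cached_read(addr, presented_lines):
    """Poll the lines the controller presents until the tag matches addr.

    Returns (byte, number_of_polls). The data is trusted as soon as the
    tag matches; the contents themselves are never checked.
    """
    for polls, (tag, data) in enumerate(presented_lines, start=1):
        if tag == line_tag(addr):
            return data[addr % LINE_BYTES], polls
    raise TimeoutError("controller never presented the requested line")

line_f0c0 = bytes.fromhex("CB A2 FF AD 11 D0 10 FB A9 08 CD 12 D0 90 06 AD")
line_cbc0 = bytes.fromhex("00 7b 00 5c 00 3d 00 2e 00 16 00 07 00 a2 22 16")

# Controller first presents a stale line, then the right one: the CPU waits.
honest = [(line_tag(0xCBC0), line_cbc0), (line_tag(0xF0C0), line_f0c0)]
good = cached_read(0xF0C1, honest)        # (0xA2, 2)

# Controller labels the line as $F0Cx but fills it with $CBCx data.
mislabelled = [(line_tag(0xF0C0), line_cbc0)]
bad = cached_read(0xF0C1, mislabelled)    # (0x7B, 1)
```

In this model the tag comparison protects against stale or late lines, but it cannot catch a line whose tag is right and whose contents came from somewhere else, which is the case that yields $7B at $F0C1.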
This leads me to the frustrating conclusion that the DDR controller is supplying data that it thinks is the right data, but is in fact from a different memory line... So more DDR controller pain awaits me.