Wednesday, 25 February 2015

Almost all working on the Nexys4DDR board

First, confession time.

It turns out that the funny pin assignment problems I reported in the previous post were because I hadn't correctly compiled the FPGA bitstream, and so I was using a bitstream that was meant for the original model of the Nexys4 board. So one dunce cap for me.

Second, good progress.

Having identified the cause of the compilation failure, I was able to build a bitstream, and it did indeed have the correct pinout all of a sudden. Funny that.

Reading from the slowram was now giving some results, but they were all a bit weird, like the following:

.M8000000                                                       
 :8000000 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000010 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000020 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000030 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000040 70 71 70 71 70 71 70 71 70 71 41 08 41 08 41 08
 :8000050 41 08 41 08 41 08 41 08 41 08 41 08 41 08 41 08
 :8000060 41 08 41 08 41 08 41 08 41 08 41 08 41 08 41 08
 :8000070 41 08 41 08 41 08 41 08 41 08 41 08 41 08 41 08
...

The same values were being repeated. Using the serial monitor like this can be very diagnostic, because it makes the CPU perform the 16 memory reads for each row in consecutive CPU cycles, so any lag in the memory reading shows up as repeated values.

This had me suspecting that I needed to increase the number of waitstates on the slowram. This was a bit annoying, because it already had six waitstates, so each access takes 7 cycles, giving an effective speed of just under 6.9MHz (48/7) when working in slowram.
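As a rough check on that arithmetic, here is a minimal Python sketch. It assumes a 48MHz CPU clock (inferred from the 48/(2+1) = 16MHz figure later in this post) and that each access costs one cycle plus the configured waitstates. The function name is just for illustration.

```python
CPU_CLOCK_MHZ = 48.0  # assumption: inferred from the 48/(2+1) = 16MHz figure in this post


def effective_mhz(waitstates, clock_mhz=CPU_CLOCK_MHZ):
    """Effective access rate when every access costs 1 cycle plus the waitstates."""
    cycles_per_access = waitstates + 1
    return clock_mhz / cycles_per_access


# Six waitstates means 7 cycles per access.
print(round(effective_mhz(6), 2))  # 6.86
```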

Some experimentation showed that the minimum stable setting was $22 = 34 waitstates. In other words, less than 2MHz. There was also a funny thing happening, as can be seen below where I set 16 memory locations to sequential values:


.sffc00a0 22

.m8000000 
 :8000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.s8000000 1 2 3 4 5 6 7 8 9 a b c d e f 0

.m8000000
 :8000000 03 04 03 04 07 08 07 08 0B 0C 0B 0C 0F 00 0F 00

.

Basically the memory lookup was ignoring address line 1, so the same 16 bits were being read twice. I think I have found and fixed the cause of that problem, but I will only know for sure once I resynthesise the design. I have also fixed the address mapping of the slowram, so that it is available as one 127MB contiguous block.
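To convince myself that this explanation fits the dump, here is a small Python model. It is only a sketch: it assumes address bit 1 is effectively stuck at 1 on every access, which is one fault consistent with the output above (the dump alone cannot tell whether the read path, the write path, or both are affected). The function names are made up for illustration.

```python
def faulty_address(addr):
    # Assumption: address bit 1 is not decoded and behaves as if stuck at 1.
    return addr | 0x2


def simulate_fill_and_dump(values, base=0):
    """Write the values to consecutive addresses, then read the same range back."""
    memory = {}
    for offset, value in enumerate(values):
        memory[faulty_address(base + offset)] = value
    return [memory[faulty_address(base + offset)] for offset in range(len(values))]


values = [0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
          0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, 0x0]
row = simulate_fill_and_dump(values)
print(" ".join(f"{byte:02X}" for byte in row))
# 03 04 03 04 07 08 07 08 0B 0C 0B 0C 0F 00 0F 00
```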

The amazingly abysmal latency of the DDR2 slowram has me thinking about simple caching strategies. About 30 of the 34 waitstates are due to the RAS and CAS latency of the RAM, and so are unavoidable in that sense, although I could in theory work out when only a CAS select needs to occur (because the required row is already open), and trim some cycles off when that happens.

But a much better idea is to take advantage of the fact that those 34 cycles get you 16 bytes of data, and build a nice little cache. That cache could be accessed in perhaps 2 waitstates (so 48/(2+1) = 16MHz effective speed), could hold a few KB of data, and could do some pre-emptive reading to hide the 34 cycle delay when it is incurred. The end result should be faster, on average, than the old slowram was. The motivation for implementing this is that the C65 DOS will be about four times slower until I implement the cache, because it runs out of "ROM", which is really held in slowram. As a result this might happen sooner rather than later.
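To get a feel for what such a cache might buy, here is a rough Python model. It is a sketch under stated assumptions, not the actual design: a direct-mapped cache of 16-byte lines, 4KB in total (a hypothetical size), a hit costing 3 cycles (2 waitstates + 1), a miss costing 35 cycles (34 waitstates + 1), a 48MHz clock, and no pre-fetching at all.

```python
LINE_BYTES = 16      # one slowram access returns 16 bytes
HIT_CYCLES = 3       # assumption: 2 waitstates + 1 cycle
MISS_CYCLES = 35     # assumption: 34 waitstates + 1 cycle
CLOCK_MHZ = 48.0     # assumption: taken from the 48/(2+1) = 16MHz figure


class LineCache:
    """Hypothetical direct-mapped cache of 16-byte lines (256 lines = 4KB)."""

    def __init__(self, num_lines=256):
        self.num_lines = num_lines
        self.tags = [None] * num_lines

    def access(self, addr):
        """Return the cycle cost of reading one byte at addr."""
        line = addr // LINE_BYTES
        index = line % self.num_lines
        if self.tags[index] == line:
            return HIT_CYCLES
        self.tags[index] = line  # fill the line on a miss
        return MISS_CYCLES


def effective_mhz(addresses, cache):
    """Average access rate over a sequence of byte addresses."""
    total_cycles = sum(cache.access(addr) for addr in addresses)
    return CLOCK_MHZ * len(addresses) / total_cycles


# Sequential read of 4KB: each line costs 1 miss + 15 hits = 80 cycles per 16 bytes.
print(round(effective_mhz(range(4096), LineCache()), 2))  # 9.6
```

Under these assumptions a purely sequential read averages about 9.6MHz without any pre-fetching, and code that loops within the cached range would approach the 16MHz hit rate.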

If I get really excited, I might also add some IO registers that allow you to ask the cache to pre-fetch a line of memory, so that if you know you will need some memory soon, you can ask for it to be fetched ahead of time. That will take a bit of effort, though, and since it won't provide any immediate benefit, it won't be too high up the priority list.
