I am getting closer to being able to communicate with the QSPI flash, so that we can have the MEGA65 update its own bitstreams in the field.
To recap the current situation:
1. Most of the signals to the QSPI flash are easy to connect to, except the clock, which is normally driven by the FPGA's configuration logic.
2. The FPGA has a facility, the STARTUPE2 component, that allows the running bitstream to take control of this signal.
3. I have managed to achieve (2) in a test bitstream, as confirmed by my new JTAG boundary scan setup.
4. But I haven't got it working for a real bitstream.
To get to this point, from the last blog post, I discovered that the STARTUPE2 component *must* be in the top level of a design.
The question now is why it still isn't working in the real bitstream, even though I have moved it to the top level.
Basically it works in the pixeltext test target, which lacks an M65 computer, but not in the nexys4ddr-widget target. Weirder still, when I removed the M65 computer component from this second target, it still wasn't working.
This makes me suspect that there might be some kind of target setup in the Vivado project that is to blame. There is a "persist" flag that can be used, which causes the configuration clock to remain active on the QSPI clock pin.  That could be the problem -- but then I would still be expecting to see the line waggle, which it doesn't seem to.
However, digging further, I did manage to control the line with the M65 computer component taken out of the real bitstream.  Now I am trying to put it back in, but with a dedicated 1Hz clock on the pin, so that I can eliminate internal problems in the plumbing of the line to the register I had it hooked up to.  Basically I can keep pushing the connection deeper down into the design, until it is in the component where I was controlling it.
Ok, so with the full machine core, and the 1Hz clock in the outer layer, I can control the clock line. The next step is to drive it from inside the sdcardio.vhdl file, where it gets connected, to see if I can toggle it there under automatic control.  If that works, then I must have some subtle bug in the register plumbing. If not, then the plumbing problem must be between sdcardio.vhdl and the outer layer of the design. Either way, I will be able to considerably narrow down where the problem can be hiding.
So, the clock toggles, meaning the problem is probably in sdcardio.vhdl somewhere...
Okay.... So, this is one of those funny bug fixes that I really hate. It could well be that I have done something really stupid, but if so, I am ignorant of what it is.  But the solution was to create a 2nd register to control the QSPI clock at $D6CD.  With that implemented, magically $D6CC works to control the clock.  I've had this kind of problem before with VHDL, where possibly something is incorrectly optimising out the ability to write to some signal.  Anyway, it is solved for now.
Then I started trying to investigate things, and came to the rapid conclusion that my life would be so much nicer if I could make my new JTAG boundary scanner produce industry-standard VCD files that I could view in gtkwave, to get a more effective understanding of what is going on.  So I did. It wasn't too hard, and now I can produce pretty pictures like this:
Which is helpfully showing me that I can waggle the clock line, and also control the CS (chip select) line, but that the data lines are seemingly not doing anything.  But I know from prior experimentation that I can indeed control these lines, so this is probably an example of me having an error in my test program.  But how nice it is to be able to determine that in just a few seconds :)
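For readers who have not met the VCD format, here is a minimal sketch in Python of what writing one involves. It is not the code in monitor_load; the function name `write_vcd`, the scope name `pins` and the sample data are made up for illustration, and only single-bit signals are handled.

```python
import io

def write_vcd(samples, names, out, timescale="1 us"):
    """Write (time, [bit, ...]) samples as a Value Change Dump.

    samples: iterable of (timestamp, values), one 0/1 value per name.
    Only signals that changed since the previous sample are emitted.
    """
    # Each signal gets a short printable-ASCII identifier, starting at '!'.
    ids = [chr(33 + i) for i in range(len(names))]
    out.write(f"$timescale {timescale} $end\n")
    out.write("$scope module pins $end\n")
    for ident, name in zip(ids, names):
        out.write(f"$var wire 1 {ident} {name} $end\n")
    out.write("$upscope $end\n$enddefinitions $end\n")

    previous = [None] * len(names)
    for t, values in samples:
        changes = [f"{v}{ids[i]}" for i, v in enumerate(values)
                   if v != previous[i]]
        if changes:
            out.write(f"#{t}\n" + "\n".join(changes) + "\n")
            previous = list(values)

# Hypothetical samples: clock toggling while chip select drops at t=2.
buf = io.StringIO()
write_vcd([(0, [0, 1]), (1, [1, 1]), (2, [0, 0])],
          ["qspisck", "qspicsn"], buf)
print(buf.getvalue())
```

The header declares each signal once, and after that the file only records a timestamp plus the signals that changed, which is why a slow boundary-scan sampler can produce compact logs.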
Digging through this, I fixed the initial problem, but also found I had the SO and SI lines switched around from the way they should be, so that will need a resynthesis...  Well, then I wasn't so sure, so I made it so that the four data lines are open-collector with internal pull-ups in the FPGA. This means that the lines can be either driven low, or float high, so I can fiddle with which line is which etc, without having to resynthesise each time.
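The open-collector behaviour can be modelled in a few lines. This is only a logical sketch (the function name and arguments are invented here, and it ignores all electrical timing): the line reads low if any device pulls it low, and otherwise floats high through the pull-up.

```python
def wired_and_level(drivers_pull_low, has_pull_up=True):
    """Return the logic level seen on a shared open-collector line.

    drivers_pull_low: one boolean per device on the line; True means the
    device is actively pulling the line to ground, False means released.
    """
    if any(drivers_pull_low):
        return 0  # any single device pulling low wins
    # Nobody is driving: the pull-up (if there is one) floats the line high.
    return 1 if has_pull_up else None  # None = undefined, floating

# FPGA released, flash released: line floats high.
print(wired_and_level([False, False]))
# FPGA released, flash pulling low: line reads low.
print(wired_and_level([False, True]))
```

The `has_pull_up=False` case is the interesting one: without a pull-up, a released line has nothing to bring it high again.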
However, I am seeing some quite weird things with the data lines when I look at the JTAG traces:
So let me explain what we have here.  Because I was seeing weird things, I made a test program that tries every possible value on the four data lines, CS and clock pins to the QSPI flash.  The open-collector operation means that the direction pins (the .ctl pins in the lower half) basically indicate what we *should* be seeing on the actual pins (in the top half).  This holds true for QspiDB[2], QspiDB[3], QspiCSn and the clock, but not for QspiDB[1] and QspiDB[0]: these two pins switch a short time later.  This would only make real sense if the QSPI flash were pulling those lines down (remember, open-collector outputs "float" high, so any device connected to them can pull them down to ground), or if there is something really fishy going on with the FPGA control of those pins.  I now need to try to solve this riddle.
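The idea of the exhaustive test can be sketched as follows. The real test program runs on the MEGA65 itself; this Python version only illustrates the logic, and the pin labels and helper names are hypothetical.

```python
from itertools import product

# Hypothetical labels for the six pins being exercised.
PINS = ["sck", "csn", "db3", "db2", "db1", "db0"]

def all_pin_patterns():
    """Yield every 0/1 combination for the six pins (2**6 = 64 patterns)."""
    for bits in product((0, 1), repeat=len(PINS)):
        yield dict(zip(PINS, bits))

def mismatched_pins(expected, observed):
    """List the pins whose observed level differs from what was commanded."""
    return [p for p in PINS if expected[p] != observed[p]]

patterns = list(all_pin_patterns())
print(len(patterns))  # 64

# Example: everything commanded high, but db1 and db0 are still read low.
expected = dict.fromkeys(PINS, 1)
observed = dict(expected, db1=0, db0=0)
print(mismatched_pins(expected, observed))  # ['db1', 'db0']
```

Comparing commanded against observed levels for each pattern is what makes the lagging pins stand out in the trace.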
Let's look first at FPGA control of the pins as a potential cause. As the other pins don't exhibit this strange behaviour, and the four DB pins are all controlled in an identical manner, I find it hard to believe that the problem is there.  That leaves the QSPI flash as the current primary suspect.
First stop: Check the schematics.  Nothing sinister here on the Nexys4DDR boards: the QSPI flash is directly connected to the FPGA, with only some external pull-up resistors, which can't cause this funny problem I am seeing.
So that suggests it is most likely just the way that I am communicating with the QSPI flash.
Poking around, it seems that DB0 only changes (or is only changeable) when CS is high. This makes sense, as when CS is high, the QSPI flash is not active, and so shouldn't be trying to drive any lines. When it is low, then DB1 stays tied low.  This makes me 99% sure that DB1 is the line from the QSPI to the FPGA, and DB0 is the command line from the FPGA to the QSPI.
This means, in theory at least, that I should be able to talk to the QSPI flash, if I drive the correct waveform. However, so far at least, there are no signs of active response from the QSPI flash.  And looking at the trace, here we see this weird problem again: The DB0 signal stays low for one clock tick longer than it is being pulled low:
This is really weird. I can slow the clock down even more (it's currently less than 1kHz, anyway) to the point where it looks much better, but this feels altogether wrong: the FPGA can read out its bitstream from this QSPI interface at 66MHz, so ~660Hz should be absolutely no problem!  The 1.8K pull-ups should be able to pull these lines high in <1 microsecond, but we are seeing rise (or delay) times of >1 millisecond -- a thousand times slower.
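The sub-microsecond expectation can be checked with a back-of-envelope RC estimate. The 1.8K resistance is from the text; the 50 pF line capacitance is purely an assumed, deliberately generous figure for pin plus trace capacitance, not a measured value.

```python
import math

def rise_time_10_90(resistance_ohms, capacitance_farads):
    """10%-90% rise time of a first-order RC node: t = R * C * ln(9)."""
    return resistance_ohms * capacitance_farads * math.log(9)

R_PULLUP = 1.8e3   # 1.8 kOhm pull-up, from the text
C_LINE = 50e-12    # ASSUMED: 50 pF of pin + trace capacitance

t_rise = rise_time_10_90(R_PULLUP, C_LINE)
print(f"estimated rise time: {t_rise * 1e9:.0f} ns")

# Working backwards: capacitance needed for a 1 ms rise through 1.8 kOhm.
c_needed = 1e-3 / (R_PULLUP * math.log(9))
print(f"capacitance for a 1 ms rise: {c_needed * 1e6:.2f} uF")
```

Under these assumptions the rise time comes out at roughly 200 ns, and a 1 ms rise would need about a quarter of a microfarad on the line, which is not plausible for a PCB trace. That points away from a simple slow pull-up through 1.8K and towards something else being wrong.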
This bizarre delay occurs whether the QSPI flash is selected via the CS line, or not.  This would seem to suggest that it is not the QSPI flash to blame -- unless it is in some strange mode following the FPGA configuration process.
Ok, looking again at the schematic, there are indeed 1.8K pull-ups on the DB2 and DB3 lines, but not on DB0 or DB1. This means that it is possible that running these lines open-collector might not be practicable. So I resynthesised with the ability to push those lines actively high, as well as pull them low, or tri-state them, as before.  Now by actively pushing them, they respond immediately, as expected. So now I can send a byte via the SPI interface, and it all looks right:
Of course, it still isn't working. But that could be because I just realised I am sending the bits least-significant-bit first, instead of most-significant-bit first. And indeed, that suddenly gets it responding to me!
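The bit-order mistake is easy to picture in code. The sketch below is not the actual test program: `set_pins` is a hypothetical callback standing in for the register writes that drive the clock and data lines, and the command byte $9F (the JEDEC read-ID command on many SPI flashes) is used only as an example value.

```python
def spi_bits_msb_first(byte):
    """Bits of `byte` in most-significant-bit-first order (bit 7 first)."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

def spi_bits_lsb_first(byte):
    """The mistaken ordering: bit 0 first."""
    return [(byte >> i) & 1 for i in range(8)]

def bitbang_byte(byte, set_pins):
    """Clock one byte out MSB first, changing data while the clock is low.

    set_pins(sck, mosi) is a hypothetical callback that drives the lines.
    """
    for bit in spi_bits_msb_first(byte):
        set_pins(0, bit)  # present the data bit while the clock is low
        set_pins(1, bit)  # rising edge: the receiver samples the bit
    set_pins(0, 0)        # leave the clock idle low

print(spi_bits_msb_first(0x9F))  # [1, 0, 0, 1, 1, 1, 1, 1]
print(spi_bits_lsb_first(0x9F))  # [1, 1, 1, 1, 1, 0, 0, 1]
```

Sent LSB first, $9F arrives at the flash looking like $F9, which is a different command, so the chip has no reason to answer.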
Now we're finally getting somewhere :)  Again, I am so glad I implemented this VCD logger and JTAG boundary scan stuff.
Of course I could have just figured out how to do it from in Vivado, but it's so much nicer to have a little light-weight and open-source tool.  Also, by having it integrated in monitor_load, I can do multiple things all in one quick action.  Here is how I run the test program, and then ask monitor_load to sample those pins -- all in one single command:
make src/tests/qspitest.prg && src/tools/monitor_load -F -4 -r src/tests/qspitest.prg -V log.vcd -J src/vhdl/nexys4ddr-widget.xdc,${HOME}/build/artix7/public/bsdl/xc7a100tl_csg324.bsd,qspisck,qspicsn,qspidb[3],qspidb[2],qspidb[1],qspidb[0]
Okay, so it's a bit of a long command, but that's what pressing the up arrow in a shell is all about, so you can just use it again and again, without having to re-type it.
When that command has logged the pins for long enough, I just hit control-C, and then launch gtkwave on the resulting log.vcd file, with a little tiny script that tells it to automatically show all signals:
gtkwave -S allsigs.tcl log.vcd 
So the whole work-flow is now super easy and efficient.
But anyway, back to figuring out why the test program doesn't read the data from the SPI response correctly... It's currently reading all ones, i.e., not noticing when the DB1 line goes low. Adding a short delay fixes this. Not entirely sure why. But with that, I can finally read some useful things out of the chip, and display them:
QSPI FLASH MANUFACTURER = $01           
QSPI DEVICE ID = $2018                  
RDID BYTE COUNT = 77                    
SECTOR ARCHITECTURE IS 4KB PARAMETER SEC
TORS WITH 64KB SECTORS.                 
PART FAMILY IS 8000                     
256/512 BYTE PROGRAM TYPICAL TIME IS 2^8
 MICROSECONDS.                          
ERASE TYPICAL TIME IS 2^8 MILLISECONDS. 
 01 80 30 30 80 FF FF FF                
 FF FF FF FF 51 52 59 02                
 00 40 00 53 46 51 00 27                
 36 00 00 06 08 08 0F 02                
 02 03 03 18 02 01 08 00                
 02 1F 00 10 00 FD 00 00                
 01 FF FF FF FF FF FF FF                
 FF FF FF FF 50 52 49 31                
                                        
READY.                                  
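As a rough cross-check of the dump above, the sketch below finds the ASCII markers "QRY" and "PRI" in the 64 bytes shown and reads two timing exponents. The offsets relative to "QRY" are an assumption based on the generic CFI query layout, not something taken from this part's data sheet; they are used here only because they reproduce the 2^8 values that the test program printed.

```python
# The 64 bytes exactly as displayed in the dump above.
DUMP = bytes.fromhex("""
    01 80 30 30 80 FF FF FF
    FF FF FF FF 51 52 59 02
    00 40 00 53 46 51 00 27
    36 00 00 06 08 08 0F 02
    02 03 03 18 02 01 08 00
    02 1F 00 10 00 FD 00 00
    01 FF FF FF FF FF FF FF
    FF FF FF FF 50 52 49 31
""")

qry = DUMP.find(b"QRY")  # start of the CFI query string
pri = DUMP.find(b"PRI")  # start of the vendor-specific extended table

# ASSUMED offsets (generic CFI layout), counted from the 'Q' of "QRY".
page_program_exp = DUMP[qry + 0x10]  # typical time = 2**n microseconds
sector_erase_exp = DUMP[qry + 0x11]  # typical time = 2**n milliseconds

print(f"QRY at offset {qry}, PRI at offset {pri}")
print(f"page program: 2^{page_program_exp} = {2 ** page_program_exp} us")
print(f"sector erase: 2^{sector_erase_exp} = {2 ** sector_erase_exp} ms")
```

Both exponents come out as 8, matching the "2^8 MICROSECONDS" and "2^8 MILLISECONDS" lines in the program's output.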
I confirmed with the data sheet that these data are broadly sensible.  So the next step will be to extract all the relevant data out, e.g., the information I need to programme the device, and after that, to implement simple block read, erase and write functions... Which turned out to be remarkably painless, if rather boring internally.  The more exciting part will be in the next post, where I (hopefully) actually implement writing of bitstreams to the QSPI flash.





Is it planned to have some "extract" of mega65-core that provides at least things like VGA, digital video, etc. as VHDL components, so that it's easy to develop new cores without re-implementing everything from zero? Even with my near-zero VHDL knowledge, I created an implementation of a simple Z80-based computer (using the t80 opencore ...) running on a Nexys4 board, so with the above it would be rather easy, I guess, for people to provide alternative cores, which could attract more people who are interested in running other things on a "physical" Mega65 with alternative cores as well.
We should do this, I agree. But it will likely first happen after we get the machine out.