I am getting closer to being able to communicate with the QSPI flash,
so that we can have the MEGA65 update its own bitstreams in the field.
To recap the current situation:
1. Most of the signals to the QSPI flash are easy to connect to, except the clock, which is normally driven by the FPGA's configuration logic.
2. The FPGA has a facility, the STARTUPE2 component, that allows the running bitstream to take control of this signal.
3. I have managed to achieve (2) in a test bitstream, as confirmed by my new JTAG boundary scan setup.
4. But I haven't got it working for a real bitstream.
To get to this point, from the last blog post, I discovered that the STARTUPE2 component *must* be in the top level of a design.
The question now is why it still isn't working in the real bitstream, even though I have moved it to the top level.
Basically it works in the pixeltext test target, which lacks an M65 computer, but not in the nexys4ddr-widget target. Weirder still, when I removed the M65 computer component from this second target, it still didn't work.
This makes me suspect that some kind of target setup in the Vivado project might be to blame. There is a "persist" flag that can be used, which causes the configuration clock to remain active on the QSPI clock pin. That could be the problem -- but then I would still expect to see the line waggle, which it doesn't seem to.
However, digging further, I did manage to control the line with the M65 computer component taken out of the real bitstream. I am now trying to put it back in, but with a dedicated 1Hz clock on the pin, so that I can eliminate internal problems in the plumbing of the line to the register I had it hooked up to. Basically I can keep pushing the connection deeper down into the design, until it is in the component where I was controlling it.
Ok, so with the full machine core, and the 1Hz clock in the outer layer, I can control the clock line. The next step is to go into the sdcardio.vhdl file, where the line gets connected, and see if I can toggle it there under automatic control. If that works, then I must have some subtle bug in the register plumbing. If not, then the plumbing problem must be between sdcardio.vhdl and the outer layer of the design. Either way, I will be able to considerably narrow down where the problem can be hiding.
So, the clock toggles, meaning the problem is probably in sdcardio.vhdl somewhere...
Okay....
So, this is one of those funny bug fixes that I really hate. It could well be that I have done something really stupid, but if so, I am ignorant of what it is. But the solution was to create a second register to control the QSPI clock at $D6CD. With that implemented, magically $D6CC works to control the clock. I've had this kind of problem before with VHDL, where possibly something is incorrectly optimising out the ability to write to some signal. Anyway, it is solved for now.
Then I started trying to investigate things, and came to the rapid conclusion that my life would be so much nicer if I could make my new JTAG boundary scanner produce industry-standard VCD files that I could view in gtkwave, to get a more effective understanding of what is going on. So I did. It wasn't too hard, and now I can produce pretty pictures like this:
Which is helpfully showing me that I can waggle the clock line, and also control the CS (chip select) line, but that the data lines are seemingly not doing anything. But I know from prior experimentation that I can indeed control these lines, so this is probably an example of me having an error in my test program. But how nice it is to be able to determine that in just a few seconds :)
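For anyone who hasn't looked inside a VCD file before, here is a minimal sketch of the format in Python. It is not the code in monitor_load, just an illustration: the function name `write_vcd`, the scope name `pins` and the sample values are all made up for the example. It writes a header declaring each one-bit signal, then a timestamp line followed by only those signals that changed.

```python
def write_vcd(samples, names, timescale="1 us"):
    """Return VCD text for a list of (time, {name: 0 or 1}) samples."""
    # Each signal gets a short printable identifier code, starting at '!'.
    ids = {name: chr(33 + i) for i, name in enumerate(names)}
    out = ["$timescale %s $end" % timescale, "$scope module pins $end"]
    for name in names:
        out.append("$var wire 1 %s %s $end" % (ids[name], name))
    out += ["$upscope $end", "$enddefinitions $end"]

    last = {}
    for t, values in samples:
        # Only value changes are recorded, which keeps the file small.
        changes = [(n, values[n]) for n in names if last.get(n) != values[n]]
        if changes:
            out.append("#%d" % t)
            for n, v in changes:
                out.append("%d%s" % (v, ids[n]))
                last[n] = v
    return "\n".join(out) + "\n"


# Hypothetical samples: clock toggling, chip select going low at t=2.
samples = [
    (0, {"qspisck": 0, "qspicsn": 1}),
    (1, {"qspisck": 1, "qspicsn": 1}),
    (2, {"qspisck": 0, "qspicsn": 0}),
]
vcd_text = write_vcd(samples, ["qspisck", "qspicsn"])
print(vcd_text)
```

Saving that text to a file gives something gtkwave can open directly.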
Digging through this, I fixed the initial problem, but also found I had the SO and SI lines switched around from the way they should be, so that will need a resynthesis... Well, then I wasn't so sure, so I made it so that the four data lines are open-collector with internal pull-ups in the FPGA. This means that the lines can be either driven low, or float high. This means I can fiddle with which line is which etc, without having to resynthesise each time.
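The open-collector idea can be sketched as a tiny model in Python. This is only an illustration of the principle, with a made-up function name `line_level`: each device on the line can either pull it low or let go, and the pull-up makes the line read high only when nobody is pulling.

```python
def line_level(pulling_low):
    """Level on a shared open-collector line with a pull-up.

    pulling_low holds one boolean per device on the line:
    True means that device is driving the line low.
    """
    return 0 if any(pulling_low) else 1


# Order of the flags here is (FPGA, flash), purely for illustration.
print(line_level([False, False]))  # nobody pulls: line floats high
print(line_level([True, False]))   # FPGA pulls low
print(line_level([False, True]))   # flash pulls low, even though FPGA released
```

The last case is the interesting one when debugging: the pin can read low while the FPGA's own control signal says it has let go.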
However, I am seeing some quite weird things with the data lines when I look at the JTAG traces:
So let me explain what we have here. Because I was seeing weird things, I made a test program that tries every possible value on the four data lines, CS and clock pins to the QSPI flash. The open-collector operation means that the direction pins (the .ctl pins in the lower half) basically indicate what we *should* be seeing on the actual pins (in the top half). This holds true for QspiDB[2], QspiDB[3], QspiCSn and the clock, but not for QspiDB[1] and QspiDB[0]: these two pins switch a short time later. This would only make real sense if the QSPI flash was pulling those lines down (remember, open-collector outputs "float" high, so any device connected to them can pull them down to ground), or if there is something really fishy going on with the FPGA control of those pins. I now need to try to solve this riddle.
Let's look first at FPGA control of the pins as a potential cause. As the other pins don't exhibit this strange behaviour, and the four DB pins are all controlled in an identical manner, I find it hard to believe that the problem is there. That leaves the QSPI flash as the current primary suspect.
First stop: Check the schematics.
Nothing sinister here on the Nexys4DDR boards: the QSPI flash is
directly connected to the FPGA, with only some external pull-up
resistors, which can't cause this funny problem I am seeing.
So that suggests it is most likely just the way that I am communicating with the QSPI flash.
Poking around, it seems that DB0 only changes (or is only changeable) when CS is high. This makes sense, as when CS is high, the QSPI flash is not active, and so shouldn't be trying to drive any lines. When it is low, then DB1 stays tied low. This makes me 99% sure that DB1 is the line from the QSPI to the FPGA, and DB0 is the command line from the FPGA to the QSPI.
This means, in theory at least, that I should be able to talk to the QSPI flash, if I drive the correct waveform. However, so far at least, there are no signs of active response from the QSPI flash. And looking at the trace, here we see this weird problem again: The DB0 signal stays low for one clock tick longer than it is being pulled low:
This is really weird. I can slow the clock down even more (it's currently less than 1kHz, anyway) to the point where it looks much better, but this feels altogether wrong: The FPGA can read out its bitstream from this QSPI interface at 66MHz, so ~660Hz should be absolutely no problem! The 1.8KOhm pull-ups should be able to pull these lines high in <1 microsecond, but we are seeing rise (or delay) times of >1 millisecond -- a thousand times slower.
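As a rough sanity check of that "<1 microsecond" expectation, here is the RC arithmetic in Python. The 1.8KOhm value is from above; the 50pF line capacitance is purely an assumed ballpark for a pin plus a short trace, not a measured figure, so treat the result as an order-of-magnitude estimate only.

```python
import math

R_PULLUP = 1800.0   # ohms, the pull-up value quoted above
C_LINE = 50e-12     # farads: ASSUMED ballpark, not measured on the board

tau = R_PULLUP * C_LINE               # RC time constant in seconds
rise_10_90 = math.log(9) * tau        # 10%-90% rise time of an RC charge curve

observed = 1e-3                       # seconds, the ">1 millisecond" seen in the trace
print("time constant: %.0f ns" % (tau * 1e9))
print("10-90%% rise time: %.0f ns" % (rise_10_90 * 1e9))
print("observed / expected: about %.0fx" % (observed / rise_10_90))
```

With these numbers the expected rise time comes out at a couple of hundred nanoseconds, so a delay in the millisecond range is thousands of times too slow for a line that really has a 1.8KOhm pull-up on it.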
This bizarre delay occurs whether the QSPI flash is selected via the CS line, or not. This would seem to suggest that it is not the QSPI flash to blame -- unless it is in some strange mode following the FPGA configuration process.
Ok, looking again at the schematic, there are indeed 1.8K pull-ups on the DB2 and DB3 lines, but not on DB0 or DB1. This means that running these lines open-collector might not be practicable. So I resynthesised with the ability to push those lines actively high, as well as pull them low, or tri-state them, as before. Now by actively pushing them, they respond immediately, as expected. So now I can send a byte via the SPI interface, and it all looks right:
Of course, it still isn't working. But that could be because I just realised I am sending the bits least-significant-bit first, instead of most-significant-bit first. And indeed, that suddenly gets it responding to me!
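The bit-order mistake is easy to show in a few lines of Python. This is a sketch with made-up helper names, not the actual test program. The value $9F is used only as an example byte (it is commonly the JEDEC read-ID opcode on SPI flashes, but check the data sheet for the part in question).

```python
def bits_msb_first(byte):
    """Bits in the order SPI normally expects: bit 7 first, bit 0 last."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


def bits_lsb_first(byte):
    """The wrong order for this purpose: bit 0 first, bit 7 last."""
    return [(byte >> i) & 1 for i in range(8)]


def as_received(bits):
    """What a receiver that assumes MSB-first makes of a bit sequence."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


cmd = 0x9F
print("sent MSB first, flash sees $%02X" % as_received(bits_msb_first(cmd)))
print("sent LSB first, flash sees $%02X" % as_received(bits_lsb_first(cmd)))
```

Sent in the wrong order, the byte arrives bit-reversed, so the flash sees a command it most likely does not recognise and simply stays silent.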
Now we're finally getting somewhere :) Again, I am so glad I implemented this VCD logger and JTAG boundary scan stuff.
Of course I could have just figured out how to do it from in Vivado, but it's so much nicer to have a little light-weight and open-source tool. Also, by having it integrated in monitor_load, I can do multiple things all in one quick action. Here is how I run the test program, and then ask monitor_load to sample those pins -- all in one single command:
make src/tests/qspitest.prg && src/tools/monitor_load -F -4 -r src/tests/qspitest.prg -V log.vcd -J src/vhdl/nexys4ddr-widget.xdc,${HOME}/build/artix7/public/bsdl/xc7a100tl_csg324.bsd,qspisck,qspicsn,qspidb[3],qspidb[2],qspidb[1],qspidb[0]
Okay, so it's a bit of a long command, but that's what pressing the up arrow in a shell is all about, so you can just use it again and again, without having to re-type it.
When that command has logged the pins for long enough, I just hit control-C, and then launch gtkwave on the resulting log.vcd file, with a tiny script that tells it to automatically show all signals:
gtkwave -S allsigs.tcl log.vcd
So the whole work-flow is now super easy and efficient.
But anyway, back to figuring out why the test program doesn't read the data from the SPI response correctly... It's currently reading all ones, i.e., not noticing when the DB1 line goes low. Adding a short delay fixes this. Not entirely sure why. But with that, I can finally read some useful things out of the chip, and display them:
QSPI FLASH MANUFACTURER = $01
QSPI DEVICE ID = $2018
RDID BYTE COUNT = 77
SECTOR ARCHITECTURE IS 4KB PARAMETER SEC
TORS WITH 64KB SECTORS.
PART FAMILY IS 8000
256/512 BYTE PROGRAM TYPICAL TIME IS 2^8
MICROSECONDS.
ERASE TYPICAL TIME IS 2^8 MILLISECONDS.
01 80 30 30 80 FF FF FF
FF FF FF FF 51 52 59 02
00 40 00 53 46 51 00 27
36 00 00 06 08 08 0F 02
02 03 03 18 02 01 08 00
02 1F 00 10 00 FD 00 00
01 FF FF FF FF FF FF FF
FF FF FF FF 50 52 49 31
READY.
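To show how a dump like this can be picked apart, here is a small Python sketch that works on the 64 bytes shown above. It is not the test program itself. The bytes 51 52 59 and 50 52 49 are the ASCII strings "QRY" and "PRI", which make handy landmarks. The two offsets relative to "QRY" are my reading of the usual CFI layout and should be treated as assumptions to be checked against the data sheet; they do at least agree with the 2^8 figures printed above.

```python
# The 64 bytes exactly as displayed by the test program above.
dump = bytes.fromhex(
    "01 80 30 30 80 FF FF FF "
    "FF FF FF FF 51 52 59 02 "
    "00 40 00 53 46 51 00 27 "
    "36 00 00 06 08 08 0F 02 "
    "02 03 03 18 02 01 08 00 "
    "02 1F 00 10 00 FD 00 00 "
    "01 FF FF FF FF FF FF FF "
    "FF FF FF FF 50 52 49 31 "
)

qry = dump.find(b"QRY")   # start of the CFI query block
pri = dump.find(b"PRI")   # start of the primary extended table

# ASSUMED offsets relative to "QRY" (check against the data sheet):
PROGRAM_TIME_OFFSET = 0x10   # typical page program time, 2^N microseconds
ERASE_TIME_OFFSET = 0x11     # typical sector erase time, 2^N milliseconds

program_us = 2 ** dump[qry + PROGRAM_TIME_OFFSET]
erase_ms = 2 ** dump[qry + ERASE_TIME_OFFSET]

print("QRY at index %d, PRI at index %d" % (qry, pri))
print("typical program time: %d microseconds" % program_us)
print("typical erase time: %d milliseconds" % erase_ms)
```

Searching for the markers rather than hard-coding absolute positions means the same code keeps working if the dump starts a few bytes earlier or later in the RDID response.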
I confirmed with the data sheet that these data are broadly sensible. So the next step will be to extract all the relevant data out, e.g., the information I need to programme the device, and after that, to implement simple block read, erase and write functions... Which turned out to be remarkably painless, if rather boring internally. The more exciting part will be in the next post, where I (hopefully) actually implement writing of bitstreams to the QSPI flash.
Is it planned to have some "extract" of mega65-core, with at least things like VGA and digital video in VHDL form, so that it's easy to develop new cores without having to re-implement everything from zero? Even with my near-zero VHDL knowledge, I created an implementation of a simple Z80-based computer (using the t80 opencore ...) running on a Nexys4 board. With those pieces available, I guess it would be rather easy for people to provide alternative cores, which could attract more people who are interested in running other things on a "physical" Mega65 as well.
We should do this, I agree. But it will likely first happen after we get the machine out.