Wednesday 1 January 2020

Running multiple bitstreams

The MEGA65 is based on an FPGA.  FPGAs are like a blank canvas that you load a hardware design into, with that design being typically stored in flash memory.  Generally you don't notice this, because the whole process of loading the design into the FPGA and starting it, takes only about 0.3 seconds.  This is why the MEGA65 can boot much faster than, say, a THE C64, which has to boot a Linux operating system and fire up an emulator.

It's one of the many advantages of FPGAs, if you have the time and sanity to spare to implement a retro-computer that way, instead of using software emulation.  But there is a potential down-side to this: With software emulation, it's really easy to change the program you are running. So, for example, emulator-based systems typically let you run not only C64, but also VIC-20, Amiga, Spectrum, Apple ][ and a whole pile of other systems.

So how can we have a framework for "swapping programs" like this on an FPGA?  Fortunately, this is a question that lots of big-spending customers of FPGAs asked a very long time ago, and so Xilinx and the other major vendors all have various ways of doing this.  In this blog post, I will document my learning process, as I explore the Xilinx documentation, to work out how to do this on the MEGA65, so that we can potentially have different machine cores down the track, but also, so that we can more easily have updates for the MEGA65's main core, without having the risk of bricking the machine if an update fails part way through.

So the starting point is Xilinx's documentation for configuring their FPGAs. Configuration is Xilinx's name for "loading the design into an FPGA and setting it running".  You can just think of it as being like loading a programme on a regular computer.  Anyway, Xilinx's documentation lives here.  We're particularly interested in Chapter 7 "Reconfiguration and Multiboot", since what Xilinx calls "multiboot" is exactly what we want.

Xilinx's Multiboot facility basically allows one bitstream (the FPGA program) to indicate where the FPGA should look in the flash memory for a different FPGA program, and then tell the FPGA to pretend it has just been turned on, so that it will load the new bitstream instead.  This means two lots of the approximately 0.3 seconds of boot time, if you want to have the first bitstream load the second one.  Actually, it can be a bit quicker, if the first bitstream, which Xilinx calls the "Golden Bitstream," is a really simple design, and thus will compress well.

My current thinking is that our Golden Bitstream will just be a known-working release of the normal MEGA65 core.  At least to begin with.  What I'm thinking of doing, is adding the necessary extra bits to the bitstream to allow the triggering of reconfiguration, together with a little bit of code in the Hypervisor, that checks if any of the number keys from 1 to 9 are being held down. If one of them is, then it will calculate an address in the flash memory based on the number pressed, and then trigger reconfiguration.  This will allow the use off the standard MEGA65 core, as well as up to 9 other cores, subject to them all fitting in the flash memory.

We also want to be able to support having updates to the MEGA65 core itself, which I am currently thinking will be implemented by having the Hypervisor try to load an updated bitstream from a specific part of the flash memory, if none of those number keys are pressed.  If 0 is held down, then I will have it not do this, so if you need to "downgrade", this will be possible. For example, if some bitstream update doesn't work for some particular reason.

The Xilinx FPGAs are also capable of a nice trick: If when you try to load a bitstream from somewhere else in the flash memory, and it fails, it will reload the Golden Bitstream again, but this time, with special flag set to say that it has fallen back to the Golden Bitstream. That way, we can even have the MEGA65 display some kind of message on first boot, if the updated bitstream doesn't work for whatever reason.

All up, this should give us a good basis on which to build a nice update mechanism for the bitstream on the MEGA65.  All I need to do now, is actually extract the information I need from Xilinx's documentation, and then actually implement it.  This could be the fun part, as this is a feature that is notoriously under-documented...

First step: Find out how to instantiate the ICAPE2 thingy (Dingsbums for the Germans reading along), that allows access to the whole configuration system.  This seems to be available here on page 178.  What worries me, is that it looks to be a bit minimalistic:

Library UNISIM;
use UNISIM.vcomponents.all;
-- ICAPE2: Internal Configuration Access Port
-- 7Series
-- Xilinx HDL Libraries Guide, version 2012.2

ICAPE2_inst: ICAPE2 
generic map(
   DEVICE_ID => X"3651093",    -- Specifies the pre-programmed
                               -- Device ID value to be used for
                               -- simulation purposes.
   ICAP_WIDTH => "X32",        -- Specifies the input and output
                               -- data width.
   SIM_CFG_FILE_NAME => "NONE" -- Specifies the Raw Bitstream (RBT)
                               -- file to be parsed by the
                               -- simulation model.
)
portmap(
   O => O,        -- 32-bit output : Configuration data output bus
   CLK => CLK,    -- 1-bit input   : Clock Input 
   CSIB => CSIB,  -- 1-bit input   : Active-Low ICAP Enable
   I => I,        -- 32-bit input  : Configuration data input bus
   RDWRB => RDWRB -- 1-bit input   : Read/Write Select input
); 
-- End of ICAPE2_inst instantiation

So now I need to figure out what each of those does.
The DEVICE_ID and SIM_CFG_FILE_NAME are apprently only used for simulation, so that the fake configuration register values can be read-out, so we can ignore those, I think.

ICAP_WIDTH, O and I also seems to be prettz logical, defining the width and input and output bus.  The fact that it is allowing the width to be varied is tempting for trying to make the interface 8-bit, but I have a gut feeling that that would just Lead To Trouble.  But I'll have a think about it as I keep exploring.

So that just leaves CLK, which should be straight-forward, CSIB and RDWRB, which I am not yet totally sure about.

Reading page 148 of this, suggests that we have to write a series of 32-bit values that are basically a pretend tiny bitstream.  This would explain why the interface has only Read/Write select and Chip Select (CS) sigals to go with the data: We just have to write the correct series of values. It also suggests that the 8-bit interface mode might just work, too, which would be nice -- if I can get the byte order correct.

Xilinx's recommended set of values to send are:

FFFFFFFF - Dummy word
AA995566 - Sync word 
20000000 - Type 1 NOOP
30020001 - Type 1 write to WBSTAR
00000000 - Warm-boot start address
30008001 - Type 1 write words to CMD
0000000F - IPROG word
20000000 - Type 1 NOOP

Let's try to go through those to understand what is going on.
The dummy word probably doesn't require much explanation. The sync word, I think, helps the FPGA work out the bit/byte order / endian-ness. Might also work to help get 8-bit mode right.  We'll investigate that later.
Then we have some "Type 1 NOOP"s in there. Those we can generally ignore for now, as well.
Then we have the interesting part, where we write to the WBSTAR register.  This sets the upper bits of the flash memory address used to configure the FPGA from.  The lower 8 bits are undefined, so apparently the bitstream should be pre-padded with 256 FF bytes, to make sure.
Then we have the writing the IPROG word to the CMD register. This is apparently what tells the FPGA to reset and reconfigure, but keeping the just-set WBSTAR value.

So, let's cook up a bit of VHDL that embeds one of these ICAPE2 thingies, and tries to tell it to load a bitstream from a particular place, and see if we can make it work.

Along the way, I also found that the bit order of each byte in the ICAPE2 entity have to be reversed. I also found what claims to be a working implementation.

Then I discovered that on the Artix 7 FPGAs, you have to allow 3 cycles, so the write sequence ends up like this:

  signal bitstream_values : reg_value_pair := (
    x"FFFFFFFF", -- Dummy word
    x"FFFFFFFF", -- Dummy word
    x"FFFFFFFF", -- Dummy word
    x"FFFFFFFF", -- Dummy word
    x"FFFFFFFF", -- Dummy word
    x"AA995566", -- Sync word
    x"20000000", -- Type 1 NOOP
    x"20000000", -- Type 1 NOOP
    x"30020001", -- Type 1 write to WBSTAR
    x"00000000", -- Warm-boot start address
    x"20000000", -- Type 1 NOOP
    x"20000000", -- Type 1 NOOP
    x"30008001", -- Type 1 write words to CMD
    x"0000000F", -- IPROG word
    x"20000000", -- Type 1 NOOP
    x"20000000", -- Type 1 NOOP
    others => x"FFFFFFFF"
    );


I then dynamically change the contents of the entry for the Warm-boot start address via some memory mapped registers:

      bitstream_values(9) <= reconfigure_address;
        cs <= '1';
        rw <= '1';
        if trigger_reconfigure = '1' then
          counter <= 0;
        end if;


Asserting trigger_reconfigure sets the counter to the start of the command stream, and then sends them all, which triggers the reconfigure.

Then it's just a case of memory-mapping access to those registers:

           when x"C8" =>
              -- @IO:GS $D6C8-B - Address of bitstream in boot flash for reconfiguration
              reconfigure_address(7 downto 0) <= fastio_wdata;
            when x"C9" =>
              reconfigure_address(15 downto 8) <= fastio_wdata;
            when x"CA" =>
              reconfigure_address(23 downto 16) <= fastio_wdata;
            when x"CB" =>
              reconfigure_address(31 downto 24) <= fastio_wdata;
            when x"CF" =>
              -- @IO:GS $D6CF - Write $42 to Trigger FPGA reconfiguration to switch to alternate bitstream.
              if fastio_wdata = x"42" then
                trigger_reconfigure <= '1';
              end if;              


I got this all together, but then still had a problem: When I tested it, it wouldn't work.  I suspected that this is because I was loading the bitstream via JTAG, rather than from the SPI flash that contains the usual bitstream.  This means that the FPGA hasn't been setup for the SPI flash configuration, and that thus trying to load a subsequent bitstream will fail.  To test this, I had to reflash the SPI flash to contain this new bitstream, and then try it from there... And it worked without problem!

To be a bit more specific: If you set $D6C8-$D6CB to the value $00000000, and then write $42 to $D6CF, it will reload itself, since it is at $00000000.  Thus it works like a kind of Very Hard Reset Indeed for the MEGA65.  Better, if you put some other value in there, where no valid bitstream exists, the FPGA has a watchdog timer that gets tripped when the FPGA fails to configure up, and thus after a few seconds it falls back to the original bitstream at $00000000!  This means that if something goes wrong you get the "police lights" on the keyboard for a few seconds, before the machine boots normally.

Now, apparently there is a way to work out if this has happened, so that you can avoid an infinite loop of trying to start a broken bitstream, which I'll look into in due course.  Similarly, I need to work out how to write the extra bitstreams into the flash, so that we can actually use the multi-boot facility. Those who want to follow along, or see all the code, hop over to https://github.com/MEGA65/mega65-core/issues/153.

But for now, I think it's time to open those fireworks we bought for New Year's Eve / Silvester that are sitting in our MEGA65 Hack Session New Year's Kit* and celebrate!


* Limited stocks. Some items not available in some countries.  Contents may vary from above image. Contains small parts. Not suitable for children under 3 years of age or lactating rhinoceroses, except under medical supervision.

No comments:

Post a Comment