Tuesday 3 March 2020

Starting work on a libc for the MEGA65, and further work on QSPI flash updating

I'm continuing to work on creating a tool to write updated bitstreams (or "cores", or whatever you like to call the things that turn an FPGA into an interesting computer :) Here's what it looks like after I got it working:

But let's go back to the beginning...

In a recent post, I confirmed that I am able to access the QSPI flash on the MEGA65, reading it, writing to it, and erasing sectors of it, as required.  So the next step is to turn this into some kind of functioning utility that lets you actually pick out a bitstream from the SD card, and then write it into one of the slots in the flash memory.  The MEGA65 R2 board has 32MB of flash memory, and each core needs just under 4MB, so I will arrange things using 4MB slots. The little bit of spare space might well get used to allow including icons and descriptions for the bitstreams, so that we can have a more visually interactive means of switching cores.
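
As a rough sketch of the slot arithmetic described above: 32MB of flash divided into 4MB slots gives eight slots, and each slot starts at its slot number times 4MB. The names below (`SLOT_COUNT`, `slot_base`) are made up for illustration and are not taken from the actual flash utility.

```c
#include <stdint.h>

#define FLASH_SIZE_BYTES (32UL * 1024UL * 1024UL) /* 32MB on the R2 board */
#define SLOT_SIZE_BYTES  (4UL * 1024UL * 1024UL)  /* one 4MB slot per core */
#define SLOT_COUNT       (FLASH_SIZE_BYTES / SLOT_SIZE_BYTES)

/* Hypothetical helper: first flash address of a slot,
   or UINT32_MAX if the slot number is out of range. */
uint32_t slot_base(uint8_t slot)
{
    if (slot >= SLOT_COUNT) return UINT32_MAX;
    return (uint32_t)(slot * SLOT_SIZE_BYTES);
}
```

With these numbers, slot 1 begins at $400000, which is the address that shows up in the write logs later in this post.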

But first, we need to be able to read a bitstream file to write to the flash.  To do that, we need to use the hypervisor calls for accessing the SD card's FAT32 file system, since the bitstreams are too big to fit in a D81 disk image.  Also, the FAT32 file system is MUCH faster (up to around 1MB/sec) compared with the C65 DOS on D81 disk images (typically <20KB/sec, even with the CPU at 40MHz).

Fortunately, I already have working Hypervisor calls implemented for traversing directories, and even opening and reading files.  What I don't have, though, is a collection of nice wrapper routines that I can call from a C program written using CC65, i.e., a kind of MEGA65 standard library or libc.  Eventually, this should go into CC65 as part of a MEGA65 target (and we would welcome a volunteer to work on that). But for now, I have created a repository where those routines can be collected: https://github.com/mega65/mega65-libc.

Many of the routines that I have already put in there are duplicated in mega65-fdisk, mega65-freezer and so on, so this was really a job that needed to be done.  At some point, I will adapt those programmes to use the new library as well, probably after I have confirmed that the library is working for the megaflash utility...

Argument and return value passing in CC65 is a little bit fiddly, but I have done it before, so I am hoping to not have any great problems there.

It didn't take too long to pull together a simple programme that can display the contents of the different areas of the QSPI flash: It considers each 4MB section as a "slot" for a bitstream, and allows a name (and soon, an icon) to be stored in each, followed by the bitstream itself:

  

To test this programme, I then hooked up keyboard input, so that pressing the numbers 1 - 8 will cause the FPGA to load the corresponding bitstream... and then it refused to reconfigure the FPGA.  This was unexpected and annoying, as I had gotten the FPGA reconfiguration stuff working only a couple of weeks earlier, and it was rock solid.

Then I eventually remembered that there is a funny aspect of the FPGA reconfiguration stuff: You must have started the first bitstream from the QSPI flash, if you want it to reconfigure from the QSPI flash.  This is because there is some magic at the start of the bitstream that tells it to use QSPI flash, what the clock frequency for that should be, along with a few other parameters.

Once I had remembered this, everything was good again, but it is still annoying, as I have to write the bitstream to flash to be able to test it, which takes a couple of minutes, instead of just being able to use it with the monitor_load command, that lets you load a bitstream in a couple of seconds.   This has me wondering, if it isn't possible to figure out the magic that tells the FPGA how to configure from the QSPI flash, so that I can enable that myself as part of the bitstreams I build.  That would just help speed up the remaining development of this feature.

The problem is that this seems to have me back into Poorly Documented Features of FPGAs territory.  I might just have to try experimenting building bitstreams with different frequencies and/or bus widths, to see if I can spot the differences.  Hardly ideal, especially when the whole point was to save time, not waste it :/ I'll try switching the SPI width from 4 to 1, and see if I see anything obvious looking, and maybe the frequency as well, but if it takes too long, I'll just give up, I think.  Project X-Ray does have some information as well, but I can't immediately figure out where to find what I need.

The first difference I have found is this:

30 03 E0 01 00 00 02 6C for 4 bit QSPI
30 03 E0 01 00 00 00 0C for 1 bit SPI

The 30 03 E0 01 is an instruction to write $001 words to register $3E/2 = $1F = 0b11111 of the FPGA, i.e., write a value to a specific register.  From the Xilinx 7 Series Configuration Guide, I have figured out that this is most likely the BPI/SPI Configuration Options Register described in Table 5-41.  Indeed the changed bits indicate a change in configuration bus width and SPI read command (SPI chips have different read commands for 1-bit, 2-bit and 4-bit wide transfers). Setting the frequency to 1MHz changed another register, register 0b01001, which has a field for the oscillator frequency, which seems to be conveniently directly encoded in MHz.
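
To make that decoding less hand-wavy, here is a small sketch that pulls a header word apart using the Type 1 packet layout from the 7 Series configuration guide: header type in bits 31:29, opcode in bits 28:27, register address in bits 26:13 and word count in bits 10:0. The struct and function names are my own invention for illustration.

```c
#include <stdint.h>

/* Fields of a 7 Series Type 1 configuration packet header word. */
struct type1_header {
    uint8_t  type;        /* bits 31:29, 1 for a Type 1 packet */
    uint8_t  opcode;      /* bits 28:27, 0 = NOP, 1 = read, 2 = write */
    uint16_t reg;         /* bits 26:13, register address */
    uint16_t word_count;  /* bits 10:0, number of 32-bit data words */
};

/* Bitstream words are stored most significant byte first. */
uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split a header word into its fields. */
struct type1_header decode_type1(uint32_t w)
{
    struct type1_header h;
    h.type       = (uint8_t)((w >> 29) & 0x07);
    h.opcode     = (uint8_t)((w >> 27) & 0x03);
    h.reg        = (uint16_t)((w >> 13) & 0x3FFF);
    h.word_count = (uint16_t)(w & 0x07FF);
    return h;
}
```

Feeding it the bytes 30 03 E0 01 gives a Type 1 write of one word to register $1F, which matches the reading above.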

I also found other information about the FPGA register setup, and tried it out, but it still didn't work.  So rather than wasting more time, I think I will just concentrate on doing it the way I can -- just at the cost of a few minutes of extra delay when trying a new bitstream.

So, anyway, back to writing the programme...

Today I copied the disk image chooser code from the freeze menu in, and adapted that to allow selecting a .BIT file from the SD card.  That required a bit of fiddling about, because the freeze menu uses 16-bit text mode, while the flash utility uses the normal VIC-II text mode. This uses the opendir() and readdir() functions in the library I have been writing.  So if you hold the CONTROL key when pressing the digit for a flash slot, it will show you the list of .BIT files that are available:


Since I already have the code working to erase, write and read flash memory pages, that just leaves the code to read the contents of the BIT files.  I have already implemented the open() and close() functions in the library, as well as a read512() function, that reads one sector of data into a buffer.  Those were quite easy to hook up to test, but it looks like they aren't working for some reason.  My guess is that the problem lies somewhere in the library functions, so I'll start having a dig through that next.

To investigate the open() and read512() functions, I want to know if they are calling the correct Hypervisor traps, and if those are working. To do that, I'll add checkpoint message code to those Hypervisor functions temporarily, so that I can know if the calls are succeeding.  My gut feeling is that open() is fine, but that the C parameter handling for read512() might be messed up, as that is a bit fiddly.  So I might also add some debug output in the read512() function to show what it thinks are in the passed parameters.

So, first up, I have confirmed that open() correctly gets the filename from the passed parameters.  Now to check that it really opens the file.  Hmm.. It seems to not enter the Hypervisor open() call for some reason.  That I fixed by making the library force VIC-IV IO mode, to make sure the Hypervisor trap registers remain visible.

After that, I wasted a few hours chasing my tail around various problems.  The main one that I have worked around, but not really completely resolved, is that the assembly routines I have for calling from the CC65 programme were doing weird things with parameter passing.  For example, both the _open() and _read512() functions take a single pointer argument.  I'd expect both functions to be treated the same way by CC65, but one gets the pointer passed on the stack, while the other gets its pointer passed in the A and X registers. At some point I'll figure out why, but for now, I have it working.

That got me to the point where I can now select a bitstream, and read the contents of the file from the SD card.  Together with the flash access routines, every piece is now known to have worked at least once.  But putting it all together will have to wait until we get settled back in at Arkaroola, after being rained out for a week.

Okay, so we are still rained out, and our trailer has thrown an axle, so we're still stuck here.  I don't have a screen or keyboard to connect to the Nexys4DDR board, so I have to work blind. Well, almost blind, because I implemented a crude text screen grab function in monitor_load a while back.

As I had all the support functions for accessing the flash, drawing the progress bar and reading the bitstream file from the FAT32 file system, I figured I could at least write the block of code to erase and then write to the flash:

First step is to erase the flash area we are using.  These flash chips can have funny size flash sectors, which can make working out which sectors to erase a bit tricky. Fortunately, they also allow you to specify the page to erase by giving an address in the flash.  The trick is, you don't know how much of the flash has been erased, because you don't know the size of page it has just erased.  My solution to this is to read the flash region progressively, and if it needs erasing, ask for that piece of the flash to be erased.  I then verify that it has been erased, and continue.  This seems to work fine. Here is that block of code:

  // Do a smart erase: read blocks, and only erase pages if they are
  // not all $FF.  Later we can make it even smarter, and only clear
  // pages where bits need clearing.
  // Also, we will assume the BIT files contain the 4KB header we want
  // so we will just write up to 4MB of stuff in one go.
  progress=0; progress_acc=0;
  for(addr=(4L*1024L*1024L)*slot;addr<(4L*1024L*1024L)*(slot+1);addr+=512) {
    progress_acc+=512;
    if (progress_acc>26214) {
      progress_acc-=26214;
      progress++;
      progress_bar(progress);
    }
    read_data(addr);
    for(i=0;i<512;i++) if (data_buffer[i]!=0xff) break;
    if (i<512) {
      erase_sector(addr);
      // Wait a while for erasing to finish
      for(i=0;i<100;i++) usleep(10000);
      // Then verify that the sector has been erased
      read_data(addr);
      for(i=0;i<512;i++) if (data_buffer[i]!=0xff) break;
      if (i<512) {
        printf("\n! Failed to erase flash page at $%llx\n",addr);
        printf("  byte %d = $%x instead of $FF\n",i,data_buffer[i]);
        while(1) continue;
      }
    }
  }


The progress_acc>26214 stuff is to get the progress bar to work nicely.  4MB / 160 positions in the progress bar = 26,214 bytes (rounded down). That is, the progress bar needs to grow a little after each 26,214 bytes. Otherwise it is fairly logical.  Erased bytes of flash should contain 0xFF.  The delay after erasing before reading is to allow the flash chip time to complete erasing the sector.  I'm probably being a bit conservative with this, and it can certainly be optimised, but it works for now.
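
To check that the accumulator trick really does end with a full bar, the same bookkeeping can be run on its own, without any flash access. This is only a sketch: `progress_steps` is a hypothetical helper that mirrors the loop above and counts how often the bar would advance.

```c
/* Mirror of the progress bookkeeping in the erase loop: walk through
   total_bytes in steps of chunk bytes, and count how many times the
   progress bar would be advanced. */
unsigned int progress_steps(unsigned long total_bytes,
                            unsigned long chunk,
                            unsigned long bytes_per_step)
{
    unsigned long done;
    unsigned long acc = 0;
    unsigned int steps = 0;

    for (done = 0; done < total_bytes; done += chunk) {
        acc += chunk;
        if (acc > bytes_per_step) {
            acc -= bytes_per_step;
            steps++;
        }
    }
    return steps;
}
```

Calling `progress_steps(4UL*1024UL*1024UL, 512, 26214)` returns 160, so the bar reaches its last position just as the final 512 bytes of the slot are processed.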

Once that is done, I can then try to write to the flash.  Here is what I have cooked up so far:

  // Read the flash file and write it to the flash
  printf("Writing bitstream to flash...\n");
  progress=0; progress_acc=0;
  for(addr=(4L*1024L*1024L)*slot;addr<(4L*1024L*1024L)*(slot+1);addr+=512) {
    progress_acc+=512;
    if (progress_acc>26214) {
      progress_acc-=26214;
      progress++;
      progress_bar(progress);
    }

    bytes_returned=read512(buffer);
   
    if (!bytes_returned) break;

    // Programming works on 256 byte pages, so we have to write two of them.
    lcopy((unsigned long)&buffer[0],(unsigned long)data_buffer,256);
    program_page(addr);
    for(i=0;i<100;i++) usleep(10000);
    lcopy((unsigned long)&buffer[256],(unsigned long)data_buffer,256);
    program_page(addr+256);
    for(i=0;i<100;i++) usleep(10000);

    // Verify
    read_data(addr);
    for(i=0;i<512;i++) if (data_buffer[i]!=buffer[i]) break;
    if (i<512)
      {
        // Failed to verify. Try once more, then give up.

        if (i<256) {
          // The mismatch is in the first 256 byte page, so re-program that one.
          lcopy((unsigned long)&buffer[0],(unsigned long)data_buffer,256);
          program_page(addr);
          for(i=0;i<100;i++) usleep(10000);
        } else {
          // The mismatch is in the second 256 byte page.
          lcopy((unsigned long)&buffer[256],(unsigned long)data_buffer,256);
          program_page(addr+256);
          for(i=0;i<100;i++) usleep(10000);
        }

        // Verify
        read_data(addr);
        for(i=0;i<512;i++) if (data_buffer[i]!=buffer[i]) break;
        // Retry worked, so carry on with the next 512 bytes.
        // (A break here would abandon the rest of the bitstream.)
        if (i==512) continue;

        printf("Verification error at address $%llx:\n",
               addr+i);
        printf("Read back $%x instead of $%x\n",
               data_buffer[i],buffer[i]);
        while(1) continue;
      }
   
   
  }


For writing, the loop is somewhat similar to the erasing. The main difference is that here we read the 512 bytes of file data that we need to write, and then try to write it.  Writing (at the moment) happens in 256 byte blocks, so we have to split the data in halves.  That's the theory. But at the moment, it successfully writes only the first 256 bytes.  So I get output like the following (this is using the crude screen grab feature, so some of it is a bit messed up in appearance):

?RASING FLASH SLOT...                  
ACTIVATING WRITE ENABLE...             
CLEARING STATUS REGISTER...            
ERASING SECTOR...                      
                                       
?RITING BITSTREAM TO FLASH...          
ACTIVATING WRITE ENABLE...             
CLEARING STATUS REGISTER...            
WRITING 256 BYTES OF DATA...           
DATA AT $00400000 WRITTEN.             
ACTIVATING WRITE ENABLE...             
CLEARING STATUS REGISTER...            
WRITING 256 BYTES OF DATA...           
DATA AT $00400100 WRITTEN.             
ACTIVATING WRITE ENABLE...             
CLEARING STATUS REGISTER...            
WRITING 256 BYTES OF DATA...           
DATA AT $00400000 WRITTEN.             
?ERIFICATION ERROR AT ADDRESS $400000: 
?EAD BACK $FF INSTEAD OF $0            


On this particular run, we see it trying to write to $00400000 and then $00400100, which is the first 512 bytes of the flash slot.  It tries to write to $00400000 again, because the verification process detects that the write didn't happen correctly.  Sometimes it will do the same, but the verification error happens in the $00400100 write, instead of the $00400000 one.

This is all a bit annoying, as I had previously tested the write routines, and they worked fine.  I am presuming that the problem is something to do with the sequencing of the writes after each other, or the write after the erase + read or something like that.

Okay, so working some more, it turns out that the flash chip has some sort of read buffer, and it can return the contents of that instead of what has just been written (or erased).  I don't know how to invalidate the read buffer, other than to just do all the erase or write operations, and then check the whole thing over again after.  On the upside, I have managed to write a bitstream, and start it.

Erasing and programming is quite slow, several times slower than using the Vivado tools. Much of this, I am sure, is because I have an 8-bit CPU bit-bashing the SPI interface, and because it isn't even using quad SPI when programming, which slows the transfer down even more.  It actually shouldn't be that hard to get QSPI writing working.  Indeed, I probably should, since it will make the rest of the development team's life easier as they test things, as well as just making everyone happier who ever uses a MEGA65, by not having to wait as long when applying an update.

It's now several days later.  The intermittent flash writing problems I was experiencing above turned out to be a trivial fix: I was checking the wrong bits in the status register of the QSPI flash when checking if it had finished writing.  As a result, it would get in some funny state when I asked it to write another block of data while the first one was still writing. From there, it was fairly downhill running, and I had reliable writing and erasing of the flash.
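
For illustration, a busy-wait of the kind described might look like the sketch below. It rests on assumptions, as the exact bits involved are not spelled out here: on many SPI NOR flash parts, bit 0 of status register 1 is the write-in-progress flag and bit 1 is the write-enable latch, but the datasheet for the actual chip is the authority. The callback type and the `fake_` helpers are hypothetical stand-ins so that the logic can be tried on a PC without any hardware.

```c
#include <stdint.h>

#define STATUS_WIP 0x01 /* assumed: write in progress */
#define STATUS_WEL 0x02 /* assumed: write enable latch */

/* Hypothetical callback that reads the flash status register. */
typedef uint8_t (*read_status_fn)(void);

/* Poll until the write-in-progress bit clears.
   Returns 1 once the flash is idle, or 0 if max_polls reads
   were made and it was still busy. */
int wait_flash_idle(read_status_fn read_status, unsigned long max_polls)
{
    while (max_polls--) {
        if (!(read_status() & STATUS_WIP)) return 1;
    }
    return 0;
}

/* Stand-in for real hardware: reports busy for the next
   fake_busy_polls reads, then idle. */
unsigned int fake_busy_polls = 0;

uint8_t fake_read_status(void)
{
    if (fake_busy_polls) {
        fake_busy_polls--;
        return STATUS_WIP | STATUS_WEL;
    }
    return 0x00;
}
```

The point of the timeout is simply that a wrong guess about the status bits shows up as an error return rather than as a hang or, worse, as a write issued while the previous one is still running.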

So then the next step was to tie this all together, so that the flash menu could be invoked on boot up. Actually, more to the point, the flash menu MUST get invoked on every boot, to see if it should reconfigure to an upgraded core, if you have installed one.  Again, this is so that when you install an upgrade core, you never actually remove the original "factory" core, which can then remain there as a safe fall-back, meaning you can't brick the machine, no matter how hard you try when installing an upgraded bitstream/core.

This means that the hypervisor has to copy the flash menu into the right place in memory, and briefly transfer control to it.  However, we don't want to wait for the SD card to finish resetting, partly as it would make booting an upgraded bitstream too slow, and partly because it would mean if you messed up your SD card, you could get into a position where you couldn't even enter the flash menu.  That would be a bit too fragile, so like the FDISK utility for preparing a new SD card, we prefer to have this all pre-loaded into the BRAM in the bitstream.

The trick with this is that we have written the flash menu using the CC65 C compiler targeting a C64, for ease of software development. This means that it makes use of various KERNAL calls, e.g., to implement printf().  So now we need to have a ROM available to us as well.  Fortunately, we started the OpenROM project to create open-source C64/C65 ROM sets.

As the flash menu needs to access the QSPI flash, which I don't intend to leave available from outside the hypervisor (so that programmes can't trash the QSPI flash and thus brick the machine), I don't intend to "boot" the ROM in the normal way.  Instead, all I want to do is to map the KERNAL in, initialise the screen and the indirect vector table, so that the program can run.

This all sounds great, but I hit a snag: The OpenROM for the C65/MEGA65 has had a number of improvements, and one of those required a small change to the CPU.  In short, Roman came up with a really nice trick, where some of the rarely used initialisation routines in the KERNAL have been moved to a second 8KB part of the ROM that was previously unused in the C65 ROM.  This frees up space for more interesting improvements, like support for turbo tape loading without a wedge etc.

However, it also caused problems for calling the KERNAL initialisation functions from within the hypervisor, because Roman used the MAP instruction to map that extra 8KB in. But of course the ROM doesn't expect to be in hypervisor mode, so it doesn't make any effort to preserve the existing memory map.  And in fact, it couldn't, even if it tried, because of an annoying flaw with the MAP instruction: You can't determine the current memory map with it, rather you can only destroy and replace it.

The workaround was, however, rather simple: When the CPU is in Hypervisor Mode, it refuses to remove the Hypervisor from memory.  Here is the little bit of VHDL that does that check:

-- Lock the upper 32KB memory map when in hypervisor mode, so that nothing
-- can accidentally de-map it.  This will hopefully also fix using OpenROMs
-- with megaflash menu during boot (issue #156)
if hypervisor_mode='0' then
    reg_offset_high <= reg_z(3 downto 0) & reg_y;
    reg_map_high <= std_logic_vector(reg_z(7 downto 4));

end if;

With that fixed, the Hypervisor could then map the KERNAL in, and set everything up, and then call the flash menu. This has a few subtleties to it, that I will explain as we go along. So here goes...

First up, we need to tell the flash menu where to re-enter the Hypervisor boot code, so that booting can resume.  We do this by writing the return address into $CF80/$CF81:

launch_flash_menu:
   
    // Store where the flash menu should jump to if it doesn't need to do anything.
    lda #<return_from_flashmenu
    sta $cf80
    lda #>return_from_flashmenu
    sta $cf81
    // Then actually start it.
    // NOTE: Flash menu runs in hypervisor mode, so can't use memory beyond $7FFF etc.




Then we have to copy the flash menu program into place, which we do via DMA for speed and simplicity:

    // Run the flash menu which is pre-loaded into memory on first boot
    // (in the FPGA BRAM).

        lda #$ff
        sta $d702
        lda #$ff
        sta $d704  // dma list is in top MB of address space
        lda #>flashmenu_dmalist
        sta $d701
        // Trigger enhanced DMA
        lda #<flashmenu_dmalist
        sta $d705



Then we need to make the memory map look a bit like a C64, with the KERNAL at $E000-$FFFF.  IO is already mapped in, so no problem there. We have to do a little trick of writing in absolute mode to $0001 instead of $01, because Zero Page is mapped elsewhere by default in the Hypervisor. Of course, writing this now, I can see that I can save a byte and the hassle, by just remapping Zero page first, and then doing this, but anyway. We also work around the assembler not knowing the TAB and TYS opcodes.


    // Bank in KERNAL ROM space so megaflash can run
    // Writing to $01 when ZP is relocated is a bit tricky, as
    // we have to mess about with the Base Register, or force
    // the assembler to do an absolute write.
    lda #$37
    .byte $8d,$01,$00 // ABS STA $0001

    // XXX Move Stack and ZP to normal places, before letting C64 KERNAL loose on
    // Hypervisor memory map!
    lda #$00
    .byte $5B // tab
    ldy #$01
    .byte $2B // tys


Then we make sure we are in a VIC-II video mode, and call the minimum set of KERNAL initialisation routines required to enable the flash menu to not crash:
   
    // We should also reset video mode to normal
    lda #$40
    sta $d054
   

    // Tell KERNAL screen is at $0400
    lda #>$0400
    sta $0288
    // Now ask KERNAL to setup vectors
    jsr $fd15
    // And clear screen, setup screen editor
    jsr $e518


Ah yes, for some reason that I do not in the slightest recall, we have 8KB of the 8MB RAM expansion mapped to $4000-$5FFF in the Hypervisor memory map. Maybe I thought the Hypervisor needed some scratch space.  Of course, it is moot for now, because the expansion RAM doesn't yet work, although I am getting much closer.  But anyway, we remove it from the memory map, so that the flash menu program can use up to 30KB from $0800 - $7FFF:

    // Clear memory map at $4000-5FFF
    // (Why on earth do we even map some of the HyperRAM there, anyway???)
    lda #0
    tax
    tay
    ldz #$3f
    map
    eom

Then we simply jump into the entry point for the flash menu program:
   
    // Actually launch flash menu
    jmp $080d





The DMA lists for setting everything up are here. Basically we copy the flash menu program down from $50000 to $07FF (so that the two load-address bytes don't displace things), and then save the screen RAM that the Hypervisor was using, so that we can restore it on exit, since the KERNAL initialisation routines that make it possible for the flash menu to run actually also clear the C64-mode screen, which overlaps with the Hypervisor's screen memory.

flashmenu_dmalist:
        // copy $50000-$577FE to $00007FF-$0007FFD

        // MEGA65 Enhanced DMA options
        .byte $0A      // Request format is F018A
        .byte $80,$00  // Copy from $00xxxxx
        .byte $81,$00  // Copy to $00xxxxx

        // Copy screen from $0400-$0BFF to $0009000
        .byte $00 // no more options
        // F018A DMA list
        .byte $04 // copy + chained
        .word $0800 // size of copy
        .word $0400 // starting addr
        .byte $00   // of bank $0
        .word $9000 // destination address is $9000
        .byte $00   // of bank $0
        .word $0000 // modulo (unused)

        // Copy program down
        .byte $00 // no more options
        // F018A DMA list
        .byte $00 // copy + not chained request
        .word $77FF // size of copy
        .word $0000 // starting addr
        .byte $05   // of bank $5
        .word $07FF // destination address is $0801 - 2
        .byte $00   // of bank $0
        .word $0000 // modulo (unused)
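
For anyone who would rather see the job layout from the C side, here is a sketch that serialises one copy job in the same 11-byte order as the listing above: command, count, source address and bank, destination address and bank, then modulo, with 16-bit values stored low byte first. The function names are hypothetical, and the enhanced option bytes that precede each job are left out.

```c
#include <stdint.h>
#include <stddef.h>

#define DMA_JOB_BYTES 11

/* Serialise one copy job in the byte order used by the listing above.
   Returns the number of bytes written. */
size_t dma_copy_job(uint8_t out[DMA_JOB_BYTES], uint8_t command,
                    uint16_t count, uint16_t src, uint8_t src_bank,
                    uint16_t dst, uint8_t dst_bank)
{
    out[0]  = command;                 /* $00 = copy, $04 = copy + chained */
    out[1]  = (uint8_t)(count & 0xFF); /* size of copy, low byte first */
    out[2]  = (uint8_t)(count >> 8);
    out[3]  = (uint8_t)(src & 0xFF);   /* source address */
    out[4]  = (uint8_t)(src >> 8);
    out[5]  = src_bank;                /* source bank */
    out[6]  = (uint8_t)(dst & 0xFF);   /* destination address */
    out[7]  = (uint8_t)(dst >> 8);
    out[8]  = dst_bank;                /* destination bank */
    out[9]  = 0x00;                    /* modulo (unused) */
    out[10] = 0x00;
    return DMA_JOB_BYTES;
}

/* Last address touched by a copy of count bytes starting at start. */
uint32_t dma_last_address(uint32_t start, uint16_t count)
{
    return start + count - 1;
}
```

For the program copy, `dma_last_address(0x07FF, 0x77FF)` comes out as $7FFD, so the copied program stays below the $8000 boundary.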

Now we get back to what happens when the flash menu returns control to the Hypervisor (I'll explain how it does that in a moment).  Basically we just rearrange the furniture back to how the Hypervisor had everything, including restoring the screen:

return_from_flashmenu:   

    // Here we have been given control back from the flash menu program.
    // So we have to put some things back to continue the kickstart boot process.

    // Put ZP and stack back where they belong
    lda #$bf
    .byte $5B // tab
    ldy #$be
    .byte $2B // tys
    ldx #$ff
    txs
   
        lda #$ff
        sta $d702
        lda #$ff
        sta $d704  // dma list is in top MB of address space

    // Don't forget to reset colour RAM also
    lda #$01
    tsb $d030
        lda #>erasescreendmalist
        sta $d701
        // set bottom 8 bits of address and trigger DMA.
        //
        lda #<erasescreendmalist
        sta $d705
    lda #$01
    trb $d030
   
    // And finally, the screen data
        lda #>screenrestore_dmalist
        sta $d701
        // Trigger enhanced DMA
        lda #<screenrestore_dmalist
        sta $d705

    jsr resetdisplay
       
    jmp dont_launch_flash_menu
 

Otherwise, we have a couple of little niceties in the Hypervisor that check if you are trying to launch the flash menu after having already booted. If you try to do that, then it tells you that you need to power off and on first.  This is because the Hypervisor can't be sure that the flash menu is still intact in RAM after it has let you, the user, use the machine ;)
   
flash_menu_missing:
        ldx #<msg_flashmenumissing
        ldy #>msg_flashmenumissing
        jsr printmessage

dont_launch_flash_menu:

    // Check for the TAB key being pressed, indicating that the user wants
    // to enter the flash menu
    lda $d610
    cmp #$09
    bne fpga_has_been_reconfigured

    // Tell user what to do if they can't access the flash menu
noflash_menu:
        ldx #<msg_noflashmenu
        ldy #>msg_noflashmenu
        jsr printmessage
    inc $d020
nfm1:
    jmp nfm1

... boot normally

So, now let's look at how the flash menu program works out what to do, and passes control back to the hypervisor when required. The most important thing is this little line of code:

  if (PEEK(0xD610)!=0x09) {

It simply checks if you don't have the TAB key held down.  If it isn't held down, then it goes through and checks if the FPGA has already been reconfigured once since power on, which would mean that the flash menu has already done its job of switching to an upgraded bitstream.  If this isn't the case, then it looks to see if you have a valid, invalid or empty flash slot #1.  If it's valid, it switches to that bitstream. If it's empty, it just returns. If it has invalid contents, then it shows you a message, before entering the flash menu.  We also see here how the flash menu uses the return address provided by the Hypervisor to jump back into the Hypervisor, if the flash menu has nothing to do:

    if (PEEK(0xD6C5)&0x01) {
      // FPGA has been reconfigured, so assume that we should boot
      // normally, unless magic keys are being pressed.
      if ((PEEK(0xD610)==0x09)||(!(PEEK(0xDC00)&0x10))||(!(PEEK(0xDC01)&0x10)))
        {
          // Magic key pressed, so proceed to flash menu after flushing keyboard input buffer
          while(PEEK(0xD610)) POKE(0xD610,0);
        }
      else {
        // We should actually jump ($CF80) to resume hypervisor booting
        // (see src/hyppo/main.asm launch_flash_menu routine for more info)
        POKE(0xCF7f,0x4C);
        asm (" jmp $cf7f ");
      }
    } else {
      // FPGA has NOT been reconfigured
      // So if we have a valid upgrade bitstream in slot 1, then run it.
      // Else, just show the menu.
      // XXX - For now, we just always show the menu

      // Check valid flag and empty state of the slot before launching it.
      read_data(4*1048576+0*256);
      y=0xff;
      valid=1;
      for(x=0;x<256;x++) y&=data_buffer[x];
      for(x=0;x<16;x++) if (data_buffer[x]!=bitstream_magic[x]) { valid=0; break; }
      // Check 512 bytes in total, because sometimes >256 bytes of FF are at the start of a bitstream.
      if (y==0xff) {
        read_data(4*1048576+1*256);
        for(x=0;x<256;x++) y&=data_buffer[x];
      } else {
        //      for(i=0;i<255;i++) printf("%02x",data_buffer[i]);
        //      printf("\n");
        printf("(First sector not empty. Code $%02x)\n",y);
      }

      if (valid) {
        // Valid bitstream -- so start it
        reconfig_fpga(1*(4*1048576)+4096);
      } else if (y==0xff) {
        // Empty slot -- ignore and resume
        POKE(0xCF7f,0x4C);
        asm (" jmp $cf7f ");
      } else {
        printf("WARNING: Flash slot 1 seems to be\n"
               "messed up (code $%02X).\n",y);
        printf("To avoid seeing this message every time,either "
               "erase or re-flash the slot.\n");
        printf("\nPress almost any key to continue...\n");
        while(PEEK(0xD610)) POKE(0xD610,0);
        // Ignore TAB, since they might still be holding it
        while((!PEEK(0xD610))||(PEEK(0xD610)==0x09)) {
          if (PEEK(0xD610)==0x09) POKE(0xD610,0);
          continue;
        }
        while(PEEK(0xD610)) POKE(0xD610,0);
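
The valid / empty / invalid decision in the listing above can be boiled down to a small pure function, which makes the three cases easier to see. This is a simplified sketch and not the flash menu's actual code: it takes the first 512 bytes of the slot in one buffer rather than reading two 256-byte pages, and the magic value is passed in by the caller because the real contents of bitstream_magic are not shown here.

```c
#include <stdint.h>
#include <string.h>

enum slot_state { SLOT_EMPTY, SLOT_VALID, SLOT_INVALID };

/* Classify a flash slot from its first 512 bytes.
   magic16 is the 16-byte signature expected at the start of a
   valid core (hypothetical parameter for this sketch). */
enum slot_state classify_slot(const uint8_t *first512, const uint8_t *magic16)
{
    unsigned int i;
    uint8_t all = 0xff;

    /* A matching signature means there is a core to launch. */
    if (memcmp(first512, magic16, 16) == 0) return SLOT_VALID;

    /* Erased flash reads back as $FF, so AND-ing every byte
       together leaves $FF only if the slot is empty. */
    for (i = 0; i < 512; i++) all &= first512[i];

    return (all == 0xff) ? SLOT_EMPTY : SLOT_INVALID;
}
```

The caller would then reconfigure the FPGA for SLOT_VALID, hand control back to the Hypervisor for SLOT_EMPTY, and show the warning for SLOT_INVALID.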


But of course don't worry if you can't follow how it works. All you need to know is that you hold the TAB key down, while turning the computer off and on, if you want to enter the flash menu.  Otherwise, the MEGA65 will just boot normally, including switching to any upgraded bitstream/core that you have installed via the flash menu.  This process of launching the flash menu to check if it needs to switch to an upgrade bitstream/core and all the rest takes less than 0.5 seconds, keeping the MEGA65's boot time faster than most monitors can latch to the video signal.

Now, if you do use the TAB key to force the flash menu to appear, you get a display like this:


If you then press CONTROL and 1 through 7, this will let you choose which core file you want to write to that slot, or alternatively, to erase the slot.  I was lazy and hadn't put any on my SD card this time around, so we just see the "erase slot" option:


If I then hit enter, it will proceed to erase it, and show me a nice old-school progress bar while it does it:

Finally, we have the updated messages in the Hypervisor boot process, that tell us how to get into the flash menu, and of course also into the general utility menu:

But of course if you have already booted the machine without turning it off first, then the flash menu can't be started, and it will tell you this, and what you should do:

Whew! That took a while. So then I set about creating a little utility called bit2core, that takes a bitstream file, and adds the correct header to it to make it into a COR file for the flash menu to program into the flash.

That all went well, until that is, I tried to flash a COR file. Then the flash menu refused to find any files... Then I suddenly remembered that the flash menu is now running from within the hypervisor context. This means it can't use the normal Hypervisor Trap mechanism.  Probably the easiest solution here is to implement the basic FAT file system access stuff in the flash menu. Fortunately I can copy that from the fdisk program, which has all of that there.  I just hope it doesn't cause the flash menu to become too big, as we can only use 30KB in this context, and it is already about 23KB...

Fortunately I managed to make that all fit, and I can now use the flash menu to update the contents of flash slots.  I even fixed up the stuff to show the name and version of the bitstream/core that has been installed:


I also tidied up a few loose ends, like making it so that you can't accidentally try to flash over the factory bitstream in slot 0, while still making it possible for a determined user to do it. Basically there is a secret key to press, and then you have to answer a series of increasingly difficult questions.

So now it is pretty much all working, certainly well enough for us to share internally, and provided we don't discover any new problems with it, to include as the default bitstream on the MEGA65 DevKits once they are available -- which we hope won't be very far away now.

And if you'd like to see it all in action, here is a video of me installing a core update on the MEGA65:


It covers the full process, so feel free to skip over the boring 5 minutes in the middle :)