Thursday, November 22, 2018

Tool for easily transferring files to/from SD card via serial

The MEGA65 uses an SD card as its bulk data storage device.  The ROMs, disk images, utility programs and other bits and pieces are all loaded from the SD card.  This means that while we are developing the MEGA65 there is often a need to put updated files on the SD card, or to pull files from the SD card, so that we can check that things have been written properly.

This is a bit of a nuisance, as it means pulling the SD card, finding where on earth I have left my SD card adaptor (neither of my laptops has a working SD card slot), and generally fiddling with things a lot.  For the MEGA65 rev1 PCB prototype, this is especially annoying, because the SD card interface on those is quite fragile, having been hacked on after the PCBs were produced.

So, I have been thinking for a while of how to make an easier way to get files on and off the SD card, without having to pull the SD card out.  We have made some progress on an FTP server for the MEGA65 that would allow this to happen at high speed via ethernet, but that still requires having an ethernet connection, which isn't always the case, and won't be possible for the MEGA65 phone prototypes, since they won't have ethernet at all.

I finally realised that a simple way to solve this problem would be to make a program under Linux that operates a lot like FTP, but actually communicates with the MEGA65 via the serial interface.  It basically takes remote control of the SD card interface by stopping the CPU and writing directly to its IO registers, and then uses the serial monitor commands to read back the data.  This would require implementing a simple FAT file system reader, but I have already done that a few times.  Otherwise, the main issue is that the maximum throughput would be limited to 20KB/sec because of the 2mbit serial speed.  So nowhere near as fast as ethernet, but fast enough for easy fiddling, especially since it can talk over the existing serial link, which means I can include accessing the SD card as part of automated tests.  So all up, it should be a nice solution.

The particular catalyst for getting around to this is that I am tracking down a bug with writing to D81 disk images, and so have needed to do a lot of putting-back-the-unmodified-D81 followed by copying-back-the-one-I-just-broke, so that I could examine what the problem was.

So after a few hours of fiddling, I got to the point where I could parse a FAT32 directory (short names only for now), and find the cluster chain for a given file.  A few more hours, and I had cooked up something that is quite similar to FTP to use, but indeed works over the MEGA65 serial monitor interface.
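Finding the cluster chain for a file amounts to repeatedly looking up the next cluster number in the FAT until an end-of-chain marker is reached. The following is a minimal sketch of that idea, not the code from the tool itself: it assumes the FAT has already been read into memory as an array of 32-bit entries (the real tool has to fetch FAT sectors over the serial monitor), and the function name and signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FAT32 entries use only the low 28 bits; values at or above
   0x0FFFFFF8 mark the end of a cluster chain. */
#define FAT32_ENTRY_MASK   0x0FFFFFFFu
#define FAT32_END_OF_CHAIN 0x0FFFFFF8u

/* Hypothetical helper: walk the chain starting at first_cluster and
   store up to max_out cluster numbers in out[].
   Returns the number of clusters stored, or -1 if the chain is
   invalid or longer than max_out. */
int fat32_cluster_chain(const uint32_t *fat, size_t fat_entries,
                        uint32_t first_cluster,
                        uint32_t *out, size_t max_out)
{
    size_t count = 0;
    uint32_t cluster = first_cluster & FAT32_ENTRY_MASK;

    assert(fat != NULL && out != NULL);

    while (cluster < FAT32_END_OF_CHAIN) {
        /* Clusters 0 and 1 are reserved, and anything beyond the
           end of the FAT (including the bad-cluster marker) is invalid */
        if (cluster < 2 || cluster >= fat_entries) return -1;
        /* The length limit also stops us looping forever on a corrupt FAT */
        if (count >= max_out) return -1;
        out[count++] = cluster;
        cluster = fat[cluster] & FAT32_ENTRY_MASK;
    }
    return (int)count;
}
```

With the chain in hand, the sector for each cluster follows from the partition start, the FAT size and the sectors-per-cluster value in the boot sector.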

I discovered or implemented a few interesting things in the process of this:

1. Sending serial commands back and forth by USB has a LOT of latency. I had hit this in other projects, but the combined latency of lots of commands was a big problem here. So I worked out how to set the USB serial port latency down from 16ms to 1ms.  That got the actual latency down to only about 3ms -- still 3x longer than it should be, but at least it helped a lot.

2. There are two ways to reduce the latency.  One uses the ASYNC_LOW_LATENCY flag, set via an ioctl, and in theory doesn't require root access.  Sounds great, but doesn't actually seem to reduce the latency.  The other way is to use a command like: echo 1 | sudo tee /sys/bus/usb-serial/devices/ttyUSB0/latency_timer which does work, but requires root access. Very annoying.  The difference is about 3x - 4x in actual throughput, so it really matters.

3. I can reliably set the serial speed to 4mbit/sec, instead of 2mbit/sec. That helps quite a lot with throughput, and lets us get close to 30KB/sec, which while not stellar, is good enough when pushing files <1MB in size.

4. Writing sectors is much slower than reading them, so when writing a sector, I first read it to see if it already has the correct contents, and if so, don't write it.  This, together with a read-cache, helps a lot to avoid unnecessarily re-reading sectors of the FAT.
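The read-before-write idea in point 4 can be sketched roughly as follows. This is only an illustration, not the actual mega65_ftp code: the in-memory `card` array stands in for the SD card, the counters stand in for slow serial transactions, and the function names and the single-sector cache are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 16

/* Stand-in for the SD card; the real tool would issue serial
   monitor commands here instead of copying memory */
static uint8_t card[NUM_SECTORS][SECTOR_SIZE];
static int slow_reads = 0, slow_writes = 0;

/* One-sector read cache (hypothetical; a real cache would hold many) */
static uint8_t cache_data[SECTOR_SIZE];
static int cached_sector = -1;

void read_sector(int sector, uint8_t *buf)
{
    assert(sector >= 0 && sector < NUM_SECTORS);
    if (sector != cached_sector) {
        memcpy(cache_data, card[sector], SECTOR_SIZE);  /* slow path */
        cached_sector = sector;
        slow_reads++;
    }
    memcpy(buf, cache_data, SECTOR_SIZE);
}

/* Returns 1 if the sector was actually written, 0 if it was skipped
   because the card already holds the same contents */
int write_sector(int sector, const uint8_t *buf)
{
    uint8_t current[SECTOR_SIZE];

    read_sector(sector, current);
    if (memcmp(current, buf, SECTOR_SIZE) == 0) return 0;

    memcpy(card[sector], buf, SECTOR_SIZE);  /* slow path */
    slow_writes++;
    /* Keep the cache consistent with what is now on the card */
    memcpy(cache_data, buf, SECTOR_SIZE);
    cached_sector = sector;
    return 1;
}
```

Re-uploading a mostly unchanged D81 image, or touching the same FAT sector repeatedly, then costs only the cheaper reads for the sectors that have not changed.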

So how does it all look?  Here is an example session of me pushing a fresh D81 file:

$ src/tools/mega65_ftp -b bin/nexys4ddr-widget.bit
fpgajtag -a bin/nexys4ddr-widget.bit
fpgajtag: Digilent:Digilent USB Device:210292645477; bcd:700
count 0/1 cortex -1 dcount 0 trail 0
STATUS 0050107d done 0 release_done 0 eos 10 startup_state 4
fpgajtag: Starting to send file
fpgajtag: Done sending file
fpgajtag: bypass already programmed bc
STATUS 0050107d done 0 release_done 0 eos 10 startup_state 4
fpgajtag: ERROR failed to run pciescanportal: No such file or directory
[T+3sec] Bitstream loaded
Detected new-style UART monitor.
SD card not yet ready, so reset it.
MEGA65 SD-card> dir
Found FAT32 partition in partition slot 1 : start=0x800, size=965 MB
FAT32 file system has 965MB formatted capacity, first cluster = 2, 1928 sectors per FAT
FATs begin at sector 0x238 and 0x9c0
           0 M.E.G.A..65!
      131072 MEGA65.ROM
      819200 MEGA65.D81
MEGA65 SD-card> put ../c65/games2.d81 MEGA65.D81
Uploaded 819200 bytes in 28 seconds (28.6KB/sec)
MEGA65 SD-card>

You might notice that you have to supply a bitstream for the FPGA.  This is because we need to catch the MEGA65 booting in the Hypervisor, and stop the CPU there, so that we have full access to the hardware. Also, it makes sure we actually have a MEGA65 system to connect to, in case the FPGA board was previously doing something else.

So with this, I was able to work out that sector writes were only writing every other byte, fix that, and then find some rotated-by-1-byte bugs and fix those, all without having to work out where I have put that SD card adapter.  This helped me to fix the nasty disk corruption bug I was facing, that would cause sad scenes like this:



I'm now looking forward to beginning to write the freeze menu, as this really is the heart of the MEGA65 system in many ways, for selecting disk images to attach, as well as swapping running programs and so on.

Sunday, November 18, 2018

Just because a splash screen is obligatory...

Still working on getting the MEGA65 presentation software ready for Linux Conf in January.  We decided that no presentation software could exist without having an annoying splash screen that appears while it loads, and that the incongruity will be very fun during the presentation. Here is the current version of it (we will fix the colour glitches).



It uses a 240x128 256-colour logo converted from a PNG, and then displayed using the MEGA65's full-colour text mode. As the program for displaying this doesn't need to be too efficient, I am just writing it using CC65, the little C compiler for the C64.  All it really does is copy the screen and colour RAM to another area, and turn it into the two-bytes-per-character version required when using full-colour text mode with character sets with >256 chars, and then draw the characters for the splash logo onto that.  It also uses a neat trick of the VIC-IV, where you can tell it the first 256 characters are normal mono/multi-colour characters, and only higher character numbers are full-colour. This lets you easily mix in normal text, which in our case, for overlaying the logo, was a really nice simple solution.  The entire source code is really quite simple. Here are the interesting parts:

void main(void)
{
    // Set CPU to full speed, enable VIC-IV IO registers
    m65_io_enable();

    // Go back to upper case, because CC65 programs start by
    // going to lower-case for some reason
    POKE(0xD018,0x14);

    // Copy palette into place
    // $100 bytes each for red, green, and blue = $300 total
    lcopy(splashlogo,0xFFD3100U,0x300);

    // Copy splash logo to the top of RAM. The logo consists of
    // 30x16 8x8 full-colour characters, so 64 bytes per
    // character for a total of 30x16x64 = 30,720 bytes.
    // $58000 is near the top of the 384KB RAM
    lcopy(splashlogo+0x300,0x58000U,30720);

    // Copy screen to $57800 to make 16-bit version of screen
    // i.e. put the screen just before the character data
    // Do similar for colour RAM
    for(n=0;n<1000;n++) {
      // Screen RAM:
      // In 16-bit text mode, the character number is the lower
      // 13-bits. So when copying the C64 screen to this mode,
      // we can just copy the screen RAM byte with the character
      // number to the lower byte, and leave the upper byte
      // blank.
      lpoke(0x57800U+n*2,PEEK(0x0400+n));
      lpoke(0x57800U+n*2+1,0);

      // Colour RAM is similar, but the colour goes in the 2nd
      // byte (this is just how the VIC-IV works)
      lpoke(0xFF80800+n*2+0,0);
      lpoke(0xFF80800+n*2+1,PEEK(0xD800U+n));
    }

    // Draw logo on the screen.
    // 240 x 128 pixels = 30 columns x 16 rows of 8x8 characters
    // Character numbers >256 in full-colour mode refer to
    // fixed addresses of (character number)*64, so we have to
    // add $58000/64 = $1600 = 5632 to character number
    // which is x + y * 30.  These then get stored into the
    // appropriate bytes of the screen memory at $57800.
    // We use lpoke() because the addresses are >$FFFF.
    for(x=0;x<30;x++)
      for(y=0;y<16;y++) {
        glyph=(0x58000U/0x40)+x+y*30;
        lpoke(0x57800U+0+(5*2)+(4*40*2)+x*2+y*(40*2),glyph&0xff);
        lpoke(0x57800U+1+(5*2)+(4*40*2)+x*2+y*(40*2),glyph>>8);
      }

    // Set 16-bit text mode, enable compositor, and full-colour
    // text mode for characters >255
    POKE(0xD054U,0xC5);

    // Set screen line length to 40*2=80 bytes
    POKE(0xD058U,40*2);

    // Move start of screen address to $057800
    POKE(0xD060U,0x00); POKE(0xD061U,0x78); POKE(0xD062U,0x05);

    // Set colour RAM start to $FF80000 + $0800
    POKE(0xD064U,0x00); POKE(0xD065U,0x08);

    // Then pretend to load for now
    while(1) {
      POKE(0xD020U,0x0e);
      for(n=0;n<12000;n++) continue;
      POKE(0xD020U,0x01);
      for(x=0;x<25;x++) continue;
    }

    return;
}
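The address arithmetic in the logo-drawing loop is easy to get wrong, so it can be handy to check it on a host machine. The sketch below is ordinary C rather than CC65 code, and the helper and macro names are made up for illustration; it only reproduces the calculations from the listing above.

```c
#include <assert.h>
#include <stdint.h>

#define LOGO_BASE   0x58000UL  /* where the logo's character data is copied */
#define SCREEN_BASE 0x57800UL  /* start of the 16-bit screen RAM */
#define LOGO_COLS   30
#define LOGO_ROWS   16
#define SCREEN_COLS 40
#define LOGO_X      5          /* logo starts 5 columns in ... */
#define LOGO_Y      4          /* ... and 4 rows down */

/* Full-colour character n lives at address n*64, so the first logo
   character is number $58000/64 = $1600 */
uint16_t logo_glyph(unsigned x, unsigned y)
{
    assert(x < LOGO_COLS && y < LOGO_ROWS);
    return (uint16_t)(LOGO_BASE / 64 + x + y * LOGO_COLS);
}

/* Address of the low byte of the screen cell for logo cell (x, y).
   Each cell is two bytes wide and each screen row holds 40 cells. */
uint32_t logo_screen_addr(unsigned x, unsigned y)
{
    assert(x < LOGO_COLS && y < LOGO_ROWS);
    return (uint32_t)(SCREEN_BASE
                      + 2UL * ((LOGO_Y + y) * SCREEN_COLS + (LOGO_X + x)));
}
```

The highest glyph number used is 5632 + 29 + 15*30 = 6111, which fits comfortably in the 13-bit character number field, and its 64 bytes of data end exactly at $58000 + 30,720.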

Tuesday, November 13, 2018

More work getting ready for presenting the MEGA65 at Linux Conf

We have been accepted to give a talk at Linux Conf in Christchurch, New Zealand in January, and I really want the MEGA65 to be able to present itself at that event, rather than using another device to run the slides.  Many of you will have seen the recent work towards that.  We have made more progress, with basic slide editing and presentation mode now working:


You can edit slides, and switch between them, and go back and forth between editor and presenter modes (which really just changes whether the cursor is there and editing is available for now).

There are still a bunch of bugs to fix and performance improvements to make (rendering the text is still much slower than it should/could be, for example), but it fundamentally works. With a little effort, it is possible to make simple presentations like this one.  And as it is built on our prior work on anti-aliased text for the MEGA65, it really looks quite nice, which is a bit clearer in the still image:


In the video you can also see sprites with alpha blending.  This is used for the slide number indicator that appears on the bottom right corner of the screen and then fades out using an alpha transition from opaque to transparent to smoothly disappear.

Sunday, November 11, 2018

Vertical border knockout for Wizball

The logic for what feature or bug we work on next on the MEGA65 varies quite a lot.  Today, the motivation was that my kids have discovered the joy of Wizball, and in particular, playing it as a two-player cooperative team.  The trouble is that the MEGA65 didn't support vertical border knockout in a way that was compatible with existing C64 software.

It turns out to be a real pain playing Wizball without vertical border knockout working, because you can see neither the power up section at the top, nor the colour buckets at the bottom.  I don't think I had realised until today that those were all fully placed in the vertical borders, but they are.

This bug/feature turned out to be really quite easy to fix. All I had to do was make the vertical border enable/disable logic in the VIC-IV be edge triggered like on the VIC-II (and I presume, the VIC-III), so that if you moved the border position so that the VIC-IV never sees the start of the vertical borders, then the vertical borders never appear.  
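To illustrate why edge triggering matters, here is a small C model of the idea. It is not the actual VIC-IV VHDL: the names are invented, the frame is assumed to have 312 raster lines (PAL), and the border positions are the usual VIC-II values of 51/251 for 25 rows and 55/247 for 24 rows. The level-based variant is only a guess at what non-edge-triggered logic might look like, included for contrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified vertical border flip-flop (illustrative model only) */
typedef struct {
    bool border_on;
} vborder_t;

/* Edge-triggered: the flip-flop changes only on an exact raster match */
void vborder_step_edge(vborder_t *b, int raster, int top, int bottom)
{
    if (raster == bottom) b->border_on = true;
    else if (raster == top) b->border_on = false;
}

/* Level-based: the border is recomputed from a range comparison on
   every line, so moving the edges can never knock it out */
void vborder_step_level(vborder_t *b, int raster, int top, int bottom)
{
    b->border_on = (raster < top) || (raster >= bottom);
}

/* Run one frame in which software switches from the 25-row to the
   24-row border positions at raster line 249, i.e. after the 24-row
   bottom edge (247) has passed but before the 25-row one (251).
   Returns true if the lower border ever turns on. */
bool lower_border_appears(void (*step)(vborder_t *, int, int, int))
{
    vborder_t b = { true };
    int top = 51, bottom = 251;
    bool seen = false;

    assert(step != NULL);
    for (int raster = 0; raster < 312; raster++) {
        if (raster == 249) { top = 55; bottom = 247; }  /* the software trick */
        step(&b, raster, top, bottom);
        if (raster >= 249 && b.border_on) seen = true;
    }
    return seen;
}
```

In this model, `lower_border_appears(vborder_step_edge)` reports that the lower border never switches on, whereas the level-based version still draws it, which is the difference existing C64 software relies on.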

After a quick synthesis run, this was the result (with the border colour changed to blue to make it easier to see what is going on):


Yay! We can now see the missing sections of the screen.  

There is however some detritus in the lower border, which I need to deal with.  This is not entirely simple, because it is actually the 26th and further rows of the screen being drawn, because the VIC-IV isn't limited to 25 rows like the VIC-II or VIC-III.  I'll have to have a think about how I deal with that.   While there is little software for the MEGA65 at the moment, it might just be easiest to have a register that sets the number of lines of text to draw, and just have it default to the usual 25.

Monday, November 5, 2018

Working on the user's guide

Another little milestone for us: we started work more seriously on the user's guide for the MEGA65, including making a LaTeX template for us to use as the basis for the collaborative development stage of the user's guide. This adds to our existing quiet little team of volunteers on the user's guide who have offered assistance with professional layout and editing. Whether we use the LaTeX template in the end will depend on whether we can make it look exactly like we want.   There is of course a lot of planning and content writing between now and then to be done.

On the content and planning, our intention is to make a user's guide that is at least as friendly and useful as the original C64 user's guide.
 
But for now, here is one possible front-cover for the user's guide:



Also, we'd love to hear what you think should be in the MEGA65 user's guide (and what shouldn't), so comment below!

Wednesday, October 31, 2018

Video closure more or less done. Alpha-blending fixed along the way. Proportional text rendering looking quite nice.

After a pile more work, and just a little more hair-pulling, we now have timing-closure on the video modes, and by and large, they are working as expected, although there are a few remaining things to fix, such as some of the composited overlay displays that are used in particular circumstances.  But those should be fairly fixable without great difficulty.  (There is still a niggling problem that the CPU is not making timing closure, but that will have to wait for another day to fix. I might have to just drop the CPU speed to 40MHz, if we can't use the faster grades of the FPGA part we are using, but I hope we can instead just improve the timing to get closure at the current 50MHz.)

The Xilinx FIFO was doing really weird things for us (most likely because we weren't driving it properly, but I am still not sure), so in the end we ditched it, and implemented our own little pixel FIFO to cross from the 100MHz VIC-IV clock to the 30 or 40MHz VGA/HDMI pixel clock.  That worked quite well, and only took a day or so to mostly shake down.

In the process, and since I was doing a bunch of synthesis runs, I figured I would track down the remaining bugs with the alpha-blending logic in the VIC-IV.  Basically you can make full-colour characters either be 256 colours, or instead, turn on the alpha blender, and get 256 gradations between the background colour and the character colour.
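As an illustration of what blending between the two colours means, one 8-bit colour channel with an 8-bit alpha value could be computed as below. This is a sketch of ordinary linear interpolation; the exact arithmetic and rounding used by the VIC-IV hardware are not described in this post, so treat the formula and the function name as assumptions.

```c
#include <stdint.h>

/* Blend one colour channel: alpha = 0 gives pure background,
   alpha = 255 gives pure foreground (character) colour */
uint8_t alpha_blend_channel(uint8_t background, uint8_t foreground,
                            uint8_t alpha)
{
    uint32_t mixed = (uint32_t)foreground * alpha
                   + (uint32_t)background * (255u - alpha);
    return (uint8_t)((mixed + 127u) / 255u);  /* round to nearest */
}
```

Applying this to the red, green and blue channels separately, with the per-pixel value of the full-colour character used as alpha, gives the smooth edges needed for anti-aliased text.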

This will be handy for a bunch of uses, but for now, we are focusing on our simple presentation software for the MEGA65, because we would love to introduce the MEGA65 at gatherings using the MEGA65 itself.  

We have an intern who is doing great work on that presentation software, and has a mostly functional editor that supports simultaneous use of multiple type faces, as well as colours, underline, reverse and blink attributes (all using the VIC-III extended character attributes).

Having this editor mostly working made it really easy to do some tests that the alpha blender was working properly, and that the video output was stable with the timing closure work, as you can see here:


That screen full of stuff was basically just typed in by me in the space of a minute or two, using short-cut keys to change colours (using the normal control and Commodore key combinations), as well as to set the attributes (RVS ON / Control-R, Control-U and Control-B).  Remember that this is all being rendered using gymnastics in text mode, not using a bitmap mode.  Each big character is drawn using a number of normal 8x8 characters with alpha-blending and full-colour mode enabled.  Kerning is via the kerning extended attribute of the VIC-IV.

While there are still a few glitches and bugs to be worked through, the result is clearly pretty nice, and quite amazing for a little 8-bit computer.   Here is a closer view of part of the display, where you can clearly see how the alpha blender is being used to anti-alias the text:


 And even closer in, this can be seen even more clearly.  Note that the shadowing on the red "l" is an artifact of my camera, and that it is in fact properly blended.

Here we show that the anti-aliasing really is dynamic to the background colour, by changing the background colour of the screen, and as expected, the anti-aliasing is now all between the text colour and the background colour:


One of the interesting effects of this, is that it is possible to have many more than 256 colours on the screen at once, because each text cell can be showing a variety of colours between the foreground and background colours.

And just to confirm that we really can reproduce annoying 1990s style blinking text using the VIC-III's hardware blink attribute:


So, while there are still a few loose ends to tidy up, I am happy that we are moving forward, and I can cross a couple of long-standing issues off my list.

Monday, October 29, 2018

More work on video mode closure

Getting the video mode switching stuff finally settled is driving me bananas.

I have a simple test bed that implements the video output system, and that works just fine.  But the instant I feed it with the whole design, so that the VIC-IV is feeding the video output system, I get really weird problems.  And I mean really weird.

For example, the digital video output lines that are fed to the resistor network for making the analog VGA signals should be either 3.3V or 0V at any point in time.  However, with the whole design, they suddenly are showing quite weird analog modulation on the signals.  This should simply not be possible.  But if I bypass the FIFO and just pass the video signals through, then they come out just fine.  FIFOs are First-In-First-Out digital buffers. They should not be able to add weird analog effects.  In fact, if I even just pass the video signals in and out of the pixel_driver module, without even modifying them or delaying them, then the problems show up.  The following photo gives an idea of the kind of thing we are seeing:

You can see that the display fades off to black on the right hand side.  It is also doing it a bit on the left.

In the process of trying to debug it, I found that if I put an oscilloscope probe on the HSYNC line of the VGA port, then the pattern will shift sideways.  It does this on all 3 FPGA boards we tested it on.  Very weird. Especially since the HSYNC line is outputting just fine, and is an output-only signal.  Clearly there is some weird analog thing going on.

We can also work out a few other things from this display.  Primarily, the FIFO feeding and reading are both working fine, because the text display is stable, and is not warped or distorted.  There is also no jitter on the pixels: they are rock solid.  It is only this weird analog effect that is causing trouble. We can see what is going on there a bit better if I make the screen all white, so that the red, green and blue channels are all fully saturated, and then look at those signals. In these images, the top channel is the HSYNC pulse, and the bottom channel the blue VGA output.

So, here in the first one, we can see the time around a single HSYNC pulse. We see the blue channel is active, except for the HSYNC fly-back time, which is what we expect. We can also see that, although the blue channel should be stably fully saturated, it is showing a variation of about 300mV continuously:

So let's look at this more closely.  The following shot shows the start of the blue channel activating, and we can see a clear pattern where about every 25ns, the blue channel varies by about 300mV.  It should instead be totally stable, but clearly isn't:


So what is going on here?  What are some possibilities?

Well, on the one hand, while we are using a 40MHz pixel clock (= 25ns period), the real pixel clock is 120MHz, i.e., 3x that. It is possible to speculate that we are seeing the blue channel pulling down 1/3 of the time, instead of holding constant.  The monitor might then not latch onto this clock cleanly, and the fading effect could be due to clock drift: as the monitor's sampling point drifts relative to the peaks and troughs in the signal, it ends up mostly sampling the troughs instead of the peaks.  I did check that it isn't the signal itself going funny at the end of the raster lines; instead, the signal looks more or less identical throughout.

What I can see is sometimes the blue channel has some strange oscillations visible on it.  This could be due to meta-stability on cross-domain signal crossing, except that by using the FIFO the data for the blue channel doesn't actually cross clock domains.


Back to the sampling theory, there might be some evidence in support of this, because if I switch the video mode to 50Hz, which uses a different dot clock (30MHz instead of 40MHz), then the whole effect changes.  Instead of the ragged right edge, only some of the columns of pixels are visible:



While various columns of pixels are missing, we can see that the image itself is there. As with the 60Hz mode, the pixels are stationary horizontally, with just a little bit of sparkle as some pixels decide whether they are visible or not.  If I make the screen white again, so that we have a constant saturated channel to look at, the display looks like this:



 And the oscilloscope view of the blue channel looks like this:

That blue channel at the bottom looks anything but constant! It really is no wonder that the display looks like it does, with the blue channel jumping all over the place like that, instead of being steady.  The ~50mV ripple in the HSYNC line also worries me. It has a period of about 25ns, i.e., pretty close to the 30MHz pixel clock, which makes me think there is something leaking somewhere.  However, this is more interesting as a clue of what is going wrong, rather than a functional problem, because the ripple is small enough to not be a problem.  The almost 1V swing on the blue channel is of course a completely different story.

In short, this is all really weird and makes very little sense to me. Especially since the test harness target I wrote exhibits none of these behaviours, despite using the same pixel_driver module I wrote.

But what is even stranger is something we discovered today, when one of our students was helping me to debug this.  He bypassed the FIFO to see what would happen, and the result is quite interesting.  Basically, it all works, without any of the funny problems, but of course the pixels are not all lined up in both modes, because the pixels are released on the wrong clock domain, so it isn't really an option.  It does make me think that there is some problem in the way that I am using the FIFO, though. It also tempts me to make my own simpler FIFO for this job.  We'll see. In the meantime, I'll sleep on it.