Thursday, November 22, 2018

Tool for easily transferring files to/from SD card via serial

The MEGA65 uses an SD card as its bulk data storage device.  The ROMs, disk images, utility programs and other bits and pieces are all loaded from the SD card.  This means that while we are developing the MEGA65 there is often a need to put updated files on the SD card, or to pull files from the SD card, so that we can check that things have been written properly.

This is a bit of a nuisance, as it means pulling the SD card, finding where on earth I have left my SD card adapter (neither of my laptops has a working SD card slot), and generally fiddling with things a lot.  For the MEGA65 rev1 PCB prototype, this is especially annoying, because the SD card interface on those is quite fragile, being hacked on after the PCBs were produced.

So, I have been thinking for a while about how to make an easier way to get files on and off the SD card, without having to pull the SD card out.  We have some progress on an FTP server for the MEGA65 that would allow this to happen at high speed via ethernet, but that still requires having an ethernet connection, which isn't always the case, and won't be possible at all for the MEGA65 phone prototypes, since they won't have ethernet at all.

I finally realised that a simple way to solve this problem would be to make a program under Linux that operates a lot like FTP, but actually communicates with the MEGA65 via the serial interface, and basically takes remote control of the SD card interface by stopping the CPU and writing directly to its IO registers, and using the serial monitor commands to read back the data.  This would require implementing a simple FAT file system reader, but I have already done that a few times.  Otherwise, the main issue is that the maximum throughput would be limited to 20KB/sec because of the 2mbit serial speed.  So nowhere near as fast as ethernet, but fast enough for easy fiddling, especially since it can talk over the existing serial link, which means I can include accessing the SD card as part of automated tests.  So all up, it should be a nice solution.

The particular catalyst for getting around to this is that I am tracking down a bug with writing to D81 disk images, and so have needed to do a lot of putting-back-the-unmodified-D81 followed by copying-back-the-one-I-just-broke, so that I could examine what the problem was.

So after a few hours of fiddling, I got to the point where I could parse a FAT32 directory (short names only for now), and find the cluster chain for a given file.  A few more hours, and I had cooked up something that is quite similar to FTP to use, but indeed works over the MEGA65 serial monitor interface.
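To give a feel for what "parsing a FAT32 directory (short names only)" and "finding the cluster chain" involve, here is a minimal sketch in C. It is not taken from the mega65_ftp source: the names `parse_dirent`, `cluster_chain` and `struct dirent83` are made up for illustration, and the FAT is assumed to already be in memory. The field offsets (attributes at byte 11, first cluster split over bytes 20-21 and 26-27, size at byte 28) are those of the standard FAT32 directory entry.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Read little-endian values from a raw sector buffer. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type for this sketch. */
struct dirent83 {
    char name[13];            /* "NAME.EXT" plus terminator */
    uint32_t first_cluster;
    uint32_t size;
};

/* Decode one 32-byte short-name directory entry.
   Returns 1 on success, 0 for free, deleted, long-name
   or volume-label entries. */
int parse_dirent(const uint8_t *e, struct dirent83 *out)
{
    int i, n = 0;
    if (e[0] == 0x00 || e[0] == 0xE5) return 0;   /* free or deleted */
    if ((e[11] & 0x0F) == 0x0F) return 0;         /* long-name fragment */
    if (e[11] & 0x08) return 0;                   /* volume label */
    for (i = 0; i < 8 && e[i] != ' '; i++) out->name[n++] = (char)e[i];
    if (e[8] != ' ') {
        out->name[n++] = '.';
        for (i = 8; i < 11 && e[i] != ' '; i++) out->name[n++] = (char)e[i];
    }
    out->name[n] = 0;
    /* FAT32 splits the first cluster number over two 16-bit fields. */
    out->first_cluster = ((uint32_t)rd16(e + 20) << 16) | rd16(e + 26);
    out->size = rd32(e + 28);
    return 1;
}

/* Follow a cluster chain through an in-memory copy of the FAT.
   Stores up to max cluster numbers and returns how many were stored. */
size_t cluster_chain(const uint8_t *fat, uint32_t first,
                     uint32_t *chain, size_t max)
{
    size_t n = 0;
    uint32_t c = first;
    /* Values from 0x0FFFFFF7 upwards mean bad cluster or end of chain. */
    while (c >= 2 && c < 0x0FFFFFF7U && n < max) {
        chain[n++] = c;
        c = rd32(fat + 4 * (size_t)c) & 0x0FFFFFFFU;  /* top 4 bits are reserved */
    }
    return n;
}
```

In a tool like the one described, each cluster number would then be turned into a sector address and fetched over the serial link; that part is left out here.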

I discovered or implemented a few interesting things in the process of this:

1. Sending serial commands back and forth by USB has a LOT of latency. I had hit this in other projects, but the combined latency of lots of commands was a big problem here. So I worked out how to set the USB serial port latency down from 16ms to 1ms.  That got the actual latency down to only about 3ms -- still 3x longer than it should be, but at least it helped a lot.

2. There are two ways to reduce the latency.  One uses the ASYNC_LOW_LATENCY IOCTL, and in theory doesn't require root access.  Sounds great, but doesn't actually seem to reduce the latency.  The other way is to use a command like:

echo 1 | sudo tee /sys/bus/usb-serial/devices/ttyUSB0/latency_timer

which does work, but requires root access. Very annoying.  The difference in performance is about 3x - 4x actual throughput, so it really makes a difference.

3. I can reliably set the serial speed to 4mbit/sec, instead of 2mbit/sec. That helps quite a lot with throughput, and lets us get close to 30KB/sec, which while not stellar, is good enough when pushing files <1MB in size.

4. Writing sectors is much slower than reading them, so when writing a sector, I first read it to see if it already has the correct contents, and if so, don't write it.  This, together with a read-cache, helps a lot to avoid unnecessarily re-reading sectors of the FAT.
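The read-before-write trick from point 4 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: `slow_read` and `slow_write` are hypothetical stand-ins for the serial-monitor transfers (here backed by a small RAM array so the example is self-contained), and the direct-mapped cache with 8 slots is an arbitrary choice.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define CARD_SECTORS 16   /* size of the pretend card (assumption) */
#define CACHE_SLOTS  8    /* arbitrary cache size for this sketch */

/* Pretend SD card plus counters, standing in for the slow serial link. */
static uint8_t fake_card[CARD_SECTORS][SECTOR_SIZE];
static unsigned slow_reads, slow_writes;

static int slow_read(uint32_t sector, uint8_t *buf)
{
    if (sector >= CARD_SECTORS) return -1;
    memcpy(buf, fake_card[sector], SECTOR_SIZE);
    slow_reads++;
    return 0;
}

static int slow_write(uint32_t sector, const uint8_t *buf)
{
    if (sector >= CARD_SECTORS) return -1;
    memcpy(fake_card[sector], buf, SECTOR_SIZE);
    slow_writes++;
    return 0;
}

/* Direct-mapped read cache: one slot per (sector % CACHE_SLOTS). */
struct cache_slot { int valid; uint32_t sector; uint8_t data[SECTOR_SIZE]; };
static struct cache_slot cache[CACHE_SLOTS];

int cached_read(uint32_t sector, uint8_t *buf)
{
    struct cache_slot *s = &cache[sector % CACHE_SLOTS];
    if (!s->valid || s->sector != sector) {
        if (slow_read(sector, s->data) != 0) { s->valid = 0; return -1; }
        s->sector = sector;
        s->valid = 1;
    }
    memcpy(buf, s->data, SECTOR_SIZE);
    return 0;
}

/* Returns 1 if the sector was written, 0 if the write was skipped
   because the contents already matched, -1 on error. */
int write_if_changed(uint32_t sector, const uint8_t *buf)
{
    uint8_t cur[SECTOR_SIZE];
    struct cache_slot *s = &cache[sector % CACHE_SLOTS];
    if (cached_read(sector, cur) != 0) return -1;
    if (memcmp(cur, buf, SECTOR_SIZE) == 0) return 0;
    if (slow_write(sector, buf) != 0) { s->valid = 0; return -1; }
    memcpy(s->data, buf, SECTOR_SIZE);   /* keep the cache in step */
    return 1;
}
```

Writing the same data to a sector twice costs one slow read and one slow write in total: the second call is answered from the cache and skipped.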

So how does it all look?  Here is an example session of me pushing a fresh D81 file:

$ src/tools/mega65_ftp -b bin/nexys4ddr-widget.bit
fpgajtag -a bin/nexys4ddr-widget.bit
fpgajtag: Digilent:Digilent USB Device:210292645477; bcd:700
count 0/1 cortex -1 dcount 0 trail 0
STATUS 0050107d done 0 release_done 0 eos 10 startup_state 4
fpgajtag: Starting to send file
fpgajtag: Done sending file
fpgajtag: bypass already programmed bc
STATUS 0050107d done 0 release_done 0 eos 10 startup_state 4
fpgajtag: ERROR failed to run pciescanportal: No such file or directory
[T+3sec] Bitstream loaded
Detected new-style UART monitor.
SD card not yet ready, so reset it.
MEGA65 SD-card> dir
Found FAT32 partition in partition slot 1 : start=0x800, size=965 MB
FAT32 file system has 965MB formatted capacity, first cluster = 2, 1928 sectors per FAT
FATs begin at sector 0x238 and 0x9c0
           0 M.E.G.A..65!
      131072 MEGA65.ROM
      819200 MEGA65.D81
MEGA65 SD-card> put ../c65/games2.d81 MEGA65.D81
Uploaded 819200 bytes in 28 seconds (28.6KB/sec)
MEGA65 SD-card>

You might notice that you have to supply a bitstream for the FPGA.  This is because we need to catch the MEGA65 booting in the Hypervisor, and stop the CPU there, so that we have full access to the hardware. Also, it makes sure we actually have a MEGA65 system to connect to, in case the FPGA board was previously doing something else.

So with this, I was able to work out that sector writes were only writing every other byte, fix that, and then find some rotated-by-1-byte bugs and fix those, all without having to work out where I have put that SD card adapter.  This helped me to fix the nasty disk corruption bug I was facing, that would cause sad scenes like this:



I'm now looking forward to beginning to write the freeze menu, as this really is the heart of the MEGA65 system in many ways, for selecting disk images to attach, as well as swapping running programs and so on.

Sunday, November 18, 2018

Just because a splash screen is obligatory...

Still working on getting the MEGA65 presentation software ready for Linux Conf in January.  We decided that no presentation software could exist without having an annoying splash screen that appears while it loads, and that the incongruity will be very fun during the presentation. Here is the current version of it (we will fix the colour glitches).



It uses a 240x128 256-colour logo converted from a PNG, and then displayed using the MEGA65's full-colour text mode. As the program for displaying this doesn't need to be too efficient, I am just writing it using CC65, the little C compiler for the C64.  All it really does is copy the screen and colour RAM to another area, and turn it into the two-bytes-per-character version required when using full-colour text mode with character sets with >256 chars, and then draw the characters for the splash logo onto that.  It also uses a neat trick of the VIC-IV, where you can tell it the first 256 characters are normal mono/multi-colour characters, and only higher character numbers are full-colour. This lets you easily mix in normal text, which in our case, for overlaying the logo, was a really nice simple solution.  The entire source code is really quite simple. Here are the interesting parts:

void main(void)
{
    // Set CPU to full speed, enable VIC-IV IO registers
    m65_io_enable();

    // Go back to upper case, because CC65 programs start by
    // going to lower-case for some reason
    POKE(0xD018,0x14);

    // Copy palette into place
    // $100 bytes each for red, green, and blue = $300 total
    lcopy(splashlogo,0xFFD3100U,0x300);

    // Copy splash logo to the top of RAM. The logo consists of
    // 30x16 8x8 full-colour characters, so 64 bytes per
    // character for a total of 30x16x64 = 30,720 bytes.
    // $58000 is near the top of the 384KB RAM
    lcopy(splashlogo+0x300,0x58000U,30720);

    // Copy screen to $57800 to make 16-bit version of screen
    // i.e. put the screen just before the character data
    // Do similar for colour RAM
    for(n=0;n<1000;n++) {
      // Screen RAM:
      // In 16-bit text mode, the character number is the lower
      // 13-bits. So when copying the C64 screen to this mode,
      // we can just copy the screen RAM byte with the character
      // number to the lower byte, and leave the upper byte
      // blank.
      lpoke(0x57800U+n*2,PEEK(0x0400+n));
      lpoke(0x57800U+n*2+1,0);

      // Colour RAM is similar, but the colour goes in the 2nd
      // byte (this is just how the VIC-IV works)
      lpoke(0xFF80800+n*2+0,0);
      lpoke(0xFF80800+n*2+1,PEEK(0xD800U+n));
    }

    // Draw logo on the screen.
    // 240 x 128 pixels = 30 columns x 16 rows of characters
    // Character numbers >255 in full-colour mode refer to
    // fixed addresses of (character number)*64, so we have to
    // add $58000/64 = $1600 = 5632 to character number
    // which is x + y * 30.  These then get stored into the
    // appropriate bytes of the screen memory at $57800.
    // we use lpoke() because the addresses are >$FFFF.
    for(x=0;x<30;x++)
      for(y=0;y<16;y++) {
        glyph=(0x58000U/0x40)+x+y*30;
        lpoke(0x57800U+0+(5*2)+(4*40*2)+x*2+y*(40*2),glyph&0xff);
        lpoke(0x57800U+1+(5*2)+(4*40*2)+x*2+y*(40*2),glyph>>8);
      }

    // set 16-bit text mode, enable compositor, and full-colour
    // text mode for characters >255
    POKE(0xD054U,0xC5);

    // set screen line length to 40*2=80 bytes
    POKE(0xD058U,40*2);

    // Move start of screen address to $057800
    POKE(0xD060U,0x00); POKE(0xD061U,0x78); POKE(0xD062U,0x05);

    // Set colour RAM start to $FF80000 + $0800
    POKE(0xD064U,0x00); POKE(0xD065U,0x08);

    // Then pretend to load for now
    while(1) {
      POKE(0xD020U,0x0e);
      for(n=0;n<12000;n++) continue;
      POKE(0xD020U,0x01);
      for(x=0;x<25;x++) continue;
    }

    return;
}

Tuesday, November 13, 2018

More work getting ready for presenting the MEGA65 at Linux Conf

We have been accepted to give a talk at Linux Conf in Christchurch, New Zealand in January, and I really want the MEGA65 to be able to present itself at that event, rather than using another device to run the slides.  Many of you will have seen the recent work towards that.  We have made more progress, with basic slide editing and presentation mode now working:


You can edit slides, and switch between them, and go back and forth between editor and presenter modes (which really just changes whether the cursor is there and editing is available for now).

There are still a bunch of bugs to fix and performance improvements to make (rendering the text is still much slower than it should/could be, for example), but it fundamentally works. With a little effort, it is possible to make simple presentations like this one.  And as it is built on our prior work on anti-aliased text for the MEGA65, it really looks quite nice, which is a bit clearer in the still image:


In the video you can also see sprites with alpha blending.  This is used for the slide number indicator that appears on the bottom right corner of the screen and then fades out using an alpha transition from opaque to transparent to smoothly disappear.

Sunday, November 11, 2018

Vertical border knockout for Wizball

The logic for what feature or bug we work on next on the MEGA65 varies quite a lot.  Today, the motivation was that my kids have discovered the joy of Wizball, and in particular, playing it as a two-player cooperative team.  The trouble is that the MEGA65 didn't support vertical border knockout in a way that was compatible with existing C64 software.

It turns out to be a real pain playing Wizball without vertical border knockout working, because you can see neither the power up section at the top, nor the colour buckets at the bottom.  I don't think I had realised until today that those were all fully placed in the vertical borders, but they are.

This bug/feature turned out to be really quite easy to fix. All I had to do was make the vertical border enable/disable logic in the VIC-IV be edge triggered like on the VIC-II (and I presume, the VIC-III), so that if you moved the border position so that the VIC-IV never sees the start of the vertical borders, then the vertical borders never appear.  

After a quick synthesis run, this was the result (with the border colour changed to blue to make it easier to see what is going on):


Yay! We can now see the missing sections of the screen.  

There is however some detritus in the lower border, which I need to deal with.  This is not entirely simple, because it is actually the 26th and further rows of the screen being drawn, because the VIC-IV isn't limited to 25 rows like the VIC-II or VIC-III.  I'll have to have a think about how I deal with that.   While there is little software for the MEGA65 at the moment, it might just be easiest to have a register that sets the number of lines of text to draw, and just have it default to the usual 25.

Monday, November 5, 2018

Working on the user's guide

Another little milestone for us: we have started work more seriously on the user's guide for the MEGA65, including making a LaTeX template for us to use as the basis for the collaborative development stage of the user's guide. This adds to our existing quiet little team of volunteers on the user's guide who have offered assistance with professional layout and editing. Whether we use the LaTeX template in the end will depend on whether we can make it look exactly like we want.   There is of course a lot of planning and content writing between now and then to be done.

On the content and planning, our intention is to make a user's guide that is at least as friendly and useful as the original C64 user's guide.
 
But for now, here is one possible front-cover for the user's guide:



Also, we'd love to hear what you think should be in the MEGA65 user's guide (and what shouldn't), so comment below!

Wednesday, October 31, 2018

Video closure more or less done. Alpha-blending fixed along the way. Proportional text rendering looking quite nice.

After a pile more work, and just a little more hair-pulling, we now have timing-closure on the video modes, and by and large, they are working as expected, although there are a few remaining things to fix, such as some of the composited overlay displays that are used in particular circumstances.  But those should be fairly fixable without great difficulty.  (There is still a niggling problem that the CPU is not making timing closure, but that will have to wait for another day to fix. I might have to just drop the CPU speed to 40MHz, if we can't use the faster grades of the FPGA part we are using, but I hope we can instead just improve the timing to get closure at the current 50MHz.)

The Xilinx FIFO was doing really weird things for us (most likely because we weren't driving it properly, but I am still not sure), so in the end we ditched it, and implemented our own little pixel FIFO to cross from the 100MHz VIC-IV clock to the 30 or 40MHz VGA/HDMI pixel clock.  That worked quite well, and only took a day or so to mostly shake down.

In the process, and since I was doing a bunch of synthesis runs, I figured I would track down the remaining bugs with the alpha-blending logic in the VIC-IV.  Basically you can make full-colour characters either be 256 colours, or instead, turn on the alpha blender, and get 256 gradations between the background colour and the character colour.
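As an illustration of what blending between the background colour and the character colour means for a single 8-bit colour channel, here is a small C sketch. The exact arithmetic and rounding used inside the VIC-IV is not described here, so the linear formula below is an assumption, and `blend_channel` is a made-up name.

```c
#include <stdint.h>

/* Linear blend of one 8-bit colour channel (assumed formula, not taken
   from the VIC-IV source): alpha 0 gives the background value, alpha 255
   gives the foreground value, values in between mix the two. */
uint8_t blend_channel(uint8_t bg, uint8_t fg, uint8_t alpha)
{
    unsigned mix = (unsigned)bg * (255u - alpha) + (unsigned)fg * alpha;
    return (uint8_t)((mix + 127u) / 255u);   /* round to nearest */
}
```

Applying this to the red, green and blue channels of each pixel, with the per-pixel value of the character data as alpha, gives the anti-aliased edges: a pixel that is half covered by the glyph ends up roughly half way between the two colours.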

This will be handy for a bunch of uses, but for now, we are focusing on our simple presentation software for the MEGA65, because we would love to introduce the MEGA65 at gatherings using the MEGA65 itself.  

We have an intern who is doing great work on that presentation software, and has a mostly functional editor that supports simultaneous use of multiple typefaces, as well as colours, underline, reverse and blink attributes (all using the VIC-III extended character attributes).

Having this editor mostly working made it really easy to do some tests that the alpha blender was working properly, and that the video output was stable with the timing closure work, as you can see here:


That screen full of stuff was basically just typed in by me in the space of a minute or two, using short-cut keys to change colours (using the normal control and Commodore key combinations), as well as to set the attributes (RVS ON / Control-R, Control-U and Control-B).  Remember that this is all being rendered using gymnastics in text mode, not using a bitmap mode.  Each big character is drawn using a number of normal 8x8 characters with alpha-blending and full-colour mode enabled.  Kerning is via the kerning extended attribute of the VIC-IV.

While there are still a few glitches and bugs to be worked through, the result is clearly pretty nice, and quite amazing for a little 8-bit computer.   Here is a closer view of part of the display, where you can clearly see how the alpha blender is being used to anti-alias the text:


 And even closer in, this can be seen even more clearly.  Note that the shadowing on the red "l" is an artifact of my camera, and that it is in fact properly blended.

Here we show that the anti-aliasing really is dynamic to the background colour, by changing the background colour of the screen, and as expected, the anti-aliasing is now all between the text colour and the background colour:


One of the interesting effects of this is that it is possible to have many more than 256 colours on the screen at once, because each text cell can be showing a variety of colours between the foreground and background colours.

And just to confirm that we really can reproduce annoying 1990s style blinking text using the VIC-III's hardware blink attribute:


So, while there are still a few loose ends to tidy up, I am happy that we are moving forward, and I can cross a couple of long-standing issues off my list.

Monday, October 29, 2018

More work on video mode closure

Getting the video mode switching stuff finally settled is driving me bananas.

I have a simple test bed that implements the video output system, and that works just fine.  But the instant I feed it with the whole design, so that the VIC-IV is feeding the video output system, I get really weird problems.  And I mean really weird.

For example, the digital video output lines that are fed to the resistor network for making the analog VGA signals should be either 3.3V or 0V at any point in time.  However, with the whole design, they suddenly are showing quite weird analog modulation on the signals.  This should simply not be possible.  But if I bypass the FIFO and just pass the video signals through, then they come out just fine.  FIFOs are First-In-First-Out digital buffers. They should not be able to add weird analog effects.  In fact, if I even just pass the video signals in and out of the pixel_driver module, without even modifying them or delaying them, then the problems show up.  The following photo gives an idea of the kind of thing we are seeing:

You can see that the display fades off to black on the right hand side.  It is also doing it a bit on the left.

In the process of trying to debug it, I found that if I put an oscilloscope probe on the HSYNC line of the VGA port, then the pattern will shift sideways.  It does this on all 3 FPGA boards we tested it on.  Very weird. Especially since the HSYNC line is outputting just fine, and is an output-only signal.  Clearly there is some weird analog thing going on.

We can also work out a few other things from this display.  Primarily, the FIFO feeding and reading are both working fine, because the text display is stable, and is not warped or distorted.  There is also no jitter on the pixels: they are rock solid.  It is only this weird analog effect that is causing trouble. We can see what is going on there a bit better if I make the screen all white, so that the red, green and blue channels are all fully saturated, and then look at those signals. In these images, the top channel is the HSYNC pulse, and the bottom channel the blue VGA output.

So, here in the first one, we can see the time around a single HSYNC pulse. We see the blue channel is active, except for the HSYNC fly-back time, which is what we expect. We can also see that, although the blue channel should be stably fully saturated, it is showing a variation of about 300mV continuously:

So let's look at this more closely.  The following shot shows the start of the blue channel activating, and we can see a clear pattern where about every 25ns, the blue channel varies by about 300mV.  It should instead be totally stable, but clearly isn't:


So what is going on here?  What are some possibilities?

Well, on the one hand, while we are using a 40MHz pixel clock (= 25ns period), the real pixel clock is 120MHz, i.e., 3x that. It is possible to speculate that we are seeing the blue channel pulling down 1/3 of the time, instead of holding constant.  The monitor might then not latch onto this clock cleanly, and the fading effect could come from the monitor's sampling point drifting relative to the peaks and troughs in the signal, so that it ends up mostly sampling the troughs instead of the peaks.  I did check that it isn't the signal itself going funny at the end of the raster lines; the signal looks more or less identical throughout.

What I can see is sometimes the blue channel has some strange oscillations visible on it.  This could be due to meta-stability on cross-domain signal crossing, except that by using the FIFO the data for the blue channel doesn't actually cross clock domains.


Back to the sampling theory, there might be some evidence in support of this, because if I switch the video mode to 50Hz, which uses a different dot clock (30MHz instead of 40MHz), then the whole effect changes.  Instead of the ragged right edge, only some of the columns of pixels are visible:



While various columns of pixels are missing, we can see that the image itself is there. As with the 60Hz mode, the pixels are stationary horizontally, with just a little bit of sparkle as some pixels decide whether they are visible or not.  If I make the screen white again, so that we have a constant saturated channel to look at, the display looks like this:



 And the oscilloscope view of the blue channel looks like this:

 That blue channel at the bottom looks anything but constant! It really is no wonder that the display looks like it does, with the blue channel jumping all over the place like that, instead of being steady.  The ~50mV ripple in the HSYNC line also worries me. It has a period of about 25ns, i.e., pretty close to the 30MHz pixel clock, which makes me think there is something leaking somewhere.  However, this is more interesting as a clue of what is going wrong, rather than a functional problem, because the ripple is small enough to not be a problem.  The almost 1V swing on the blue channel is of course a completely different story.

In short, this is all really weird and makes very little sense to me. Especially since the test harness target I wrote exhibits none of these behaviours, despite using the same pixel_driver module I wrote.

But what is even stranger is something we discovered today, when one of our students was helping me to debug this.  He bypassed the FIFO to see what would happen, and the result is quite interesting.  Basically, it all works, without any of the funny problems, but of course the pixels are not all lined up in both modes, because the pixels are released on the wrong clock domain, so it isn't really an option.  It does make me think that there is some problem in the way that I am using the FIFO, though. It also tempts me to make my own simpler FIFO for this job.  We'll see. In the meantime, I'll sleep on it.

Sunday, October 28, 2018

Improving timing closure with stable 50/60Hz video

Recently I got the video output nice and stable at 50Hz and 60Hz.  However, in the process I introduced a bunch of cross-domain timing dependencies because of how VHDL works, and fixing those problems has been driving me bananas!

Our latest crop of bananas. Much tastier than the VHDL kind of bananas I have been experiencing.

Basically I have 30MHz and 40MHz pixel clocks for the video output modes, 100MHz for the VIC-IV internal pixel generator and 50MHz for the CPU.  Whenever signals have to cross between these domains, things get tricky.

The CPU and VIC-IV are integer multiples of each other, so that ends up being simple -- all the transfers just have to satisfy the 100MHz timing of the VIC-IV.

The problem is the integration of the 30MHz, 40MHz and 100MHz clocks, because some clock ticks can happen very close to one another from time to time.  100MHz means a 10ns clock, 40MHz a 25ns clock, and 30MHz 33.33ns.  They all start with their first pulse at 0ns, but as time goes on, we can see that some clock pulses will happen very close in time.  For example, after two ticks of the 30MHz clock, 66.66ns have passed, and after seven ticks of the 100MHz clock 70ns have passed.  This means that any signal being transferred from the 30MHz clock domain to the 100MHz clock domain has only 70ns - 66.66ns = 3.33ns to occur, effectively acting as though the clock were 300MHz instead of 30MHz.
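The edge arithmetic above can be checked with a few lines of C. This is just a sketch of the calculation, under the assumptions that both clocks start with an edge at time zero and that the frequencies divide 600MHz exactly (true for 30, 40 and 100MHz), so that time can be counted in whole ticks of 1/600MHz, about 1.667ns each. The function names are made up for illustration.

```c
/* Time unit: one tick = 1/600MHz (about 1.667ns). 600MHz is the lowest
   frequency that 30, 40 and 100MHz all divide, so every clock period
   is a whole number of ticks. */
#define BASE_MHZ 600u

static unsigned gcd(unsigned a, unsigned b)
{
    while (b) { unsigned t = a % b; a = b; b = t; }
    return a;
}

/* Smallest non-zero delay, in ticks, from an edge of the source clock
   to the next edge of the destination clock, with both clocks started
   at t = 0. Assumes both frequencies divide BASE_MHZ exactly. */
unsigned min_edge_gap(unsigned src_mhz, unsigned dst_mhz)
{
    unsigned src_period = BASE_MHZ / src_mhz;
    unsigned dst_period = BASE_MHZ / dst_mhz;
    /* The edge pattern repeats after the least common multiple. */
    unsigned repeat = src_period / gcd(src_period, dst_period) * dst_period;
    unsigned best = dst_period;
    unsigned t;
    for (t = 0; t < repeat; t += src_period) {
        unsigned gap = (dst_period - t % dst_period) % dst_period;
        if (gap != 0 && gap < best) best = gap;
    }
    return best;
}

/* Clock frequency whose period equals the given gap. */
unsigned equivalent_mhz(unsigned gap_ticks)
{
    return BASE_MHZ / gap_ticks;
}
```

For 30MHz into 100MHz, `min_edge_gap(30, 100)` is 2 ticks (3.33ns), which `equivalent_mhz` reports as 300MHz, matching the figure worked out by hand. The same calculation for 40MHz into 100MHz gives 3 ticks, i.e. 5ns.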

This has really nasty effects on things, and so we need to avoid it, and I have been pulling my hair out over this for a few days now, to try to find a good solution.  What I finally realised is that I can avoid almost all of these problems by simplifying things down to a single 120MHz clock for the pixel drivers, instead of the 30MHz and 40MHz clocks, and by separately generating the signals needed for the 100MHz VIC-IV interface natively from a 100MHz clock.

On the VIC-IV side what we really need are the start-of-raster, start-of-frame and time-for-next-pixel signals. These don't have to be precise, because the generated pixels go through a FIFO buffer, from which we then clock the pixels out at exactly the right time using (until now) the 30MHz or 40MHz pixel clock appropriate for that mode.  Thus we can generate those VIC-IV signals on the 100MHz side, without messing anything up.  Then, on the output side, we can use a 120MHz clock, and count off either 3 clock ticks for 40MHz or 4 clock ticks for 30MHz, and everything should be just fine, and there will be absolutely no cross-domain signals on the video path, which should make timing closure much easier.
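The idea of deriving both pixel rates from the one 120MHz clock can be modelled in software as a simple counter. The real thing is VHDL, so the C below is only a behavioural sketch, and `struct pixel_divider`, `pixel_strobe` and `strobes_in` are names invented for it.

```c
/* Behavioural model of a pixel-rate divider running on a 120MHz clock. */
struct pixel_divider {
    unsigned divide_by;   /* 3 for 40MHz pixels, 4 for 30MHz pixels */
    unsigned count;       /* position within the current pixel */
};

/* Call once per 120MHz tick. Returns 1 on the ticks where the next
   pixel should be clocked out, 0 otherwise. */
int pixel_strobe(struct pixel_divider *d)
{
    int fire = (d->count == 0);
    d->count++;
    if (d->count >= d->divide_by) d->count = 0;
    return fire;
}

/* Count how many pixels are emitted over a given number of 120MHz ticks. */
unsigned strobes_in(unsigned divide_by, unsigned ticks)
{
    struct pixel_divider d = { divide_by, 0 };
    unsigned n = 0, i;
    for (i = 0; i < ticks; i++) n += (unsigned)pixel_strobe(&d);
    return n;
}
```

Over 120 ticks, which is one microsecond at 120MHz, dividing by 3 gives 40 pixels and dividing by 4 gives 30 pixels, i.e. the 40MHz and 30MHz pixel rates, with every pixel edge landing on an edge of the same clock.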

So, I need to:

1. Modify the PLL setup to give me a 120MHz clock instead of the 30MHz and 40MHz clocks.
2. Modify frame_generator.vhdl, so that it assumes a 120MHz output clock, and generates the 100MHz-oriented signals for the VIC-IV as well as the 120MHz-oriented signals for the video output.
3. Modify pixel_driver.vhdl, so that it works with the single 120MHz clock, and just multiplexes the signals for the respective video modes as required.
4. Ensure that everything is properly plumbed, and works.

To do all this, I will use the pixeltest target that I built earlier.  This has the advantage that it synthesises in less than five minutes, compared to 15 - 40 minutes for the whole MEGA65. In the process of revisiting this, I found the timing problems mentioned above, even with the rest of the design removed, so I am hopeful that once I have fixed them, the synthesis of the pixeltest target should be much faster, because it won't need to try hard to meet impossible timing requirements.

So, first step, regenerate the PLL setup to get me a 120MHz clock. This is in some ways the fiddliest part, because our PLL setup was generated in ISE, not Vivado, so I need to use the old ISE tool to modify it, as I am not game to manually fiddle with clock divider values, although it should in theory be possible.

Well, it turns out I did have to fiddle it by hand, and it looks like it worked just fine.  I only had to adjust the frequency divider values for the 30MHz clock to get it to be 30MHz x 4 = 120 MHz.

With that out the way, I have been working on frame_generator and pixel_driver to make them both work with the single 120MHz output clock.  I have it to the point where the two video modes are working again, but there are still funny cross-domain timing dependencies between the 100MHz and the 120MHz clock domains, which I am now working through.

And here is the point at which I find myself feeling completely stupid, or that I have missed something really basic and really important.

Vivado has a method for describing when a signal shouldn't be considered timing sensitive, for example, when it is a flag crossing timing domains.  This works using the set_false_path directive in a .xdc file.  It sounds great in theory, but I cannot for love or money figure out the exact signal name it expects.

Vivado logs might show a signal, like, say, pixeldriver/frame60/y_zero_driver100b, that is, the signal called y_zero_driver100b, inside something called frame60, inside something called pixeldriver.  However, when it comes to reading the XDC file, it claims that no signals match.  I have tried a variety of variations on this, but am no closer to a solution than when I started. What is really annoying is that everything seems to indicate that this is how it should be done, but nowhere can I find documented how you actually do it.

So after a bit of digging around, I discovered you can kind of test this in the Vivado IDE, by using the tcl console, to type things like:

get_cells pixeldriver/fifo_inuse*




That will tell you if it thinks anything matches or not.  After a bit of fiddling around, I was able to find things that claimed to be found. But then when synthesising, I keep getting errors that they aren't found.

Finally making some headway on this:  The synthesis logs WILL report errors on all the signals you reference in the first part of the log.  This is because the file gets read once BEFORE synthesis.  So, you should just ignore those errors, and only care if those errors appear TWICE, as it is the second instance that matters.  Arggghhh!!!!

Also, it turns out you need funny suffixes on SOME but not ALL of the signals (if you put them on the wrong ones, they won't work, either).  So I had to add _reg_srl onto the end of the names of some, and _reg onto others.  I might work out why at some point.


Oh, well. I have at least figured out how to do this part.  Now of course, something else is not working, but I feel like I have at least made some progress.

Friday, October 26, 2018

Taking the big picture view

Some folks have been a bit confused by some of the different things going on in the MEGA65 project, and have found it hard to pick out the big picture of what we are doing.  In fairness, we have not been perfect at putting out a clear vision all the time.  In part, this is simply because we are a bunch of volunteers, and don't have the time to do all that we would like, and prepare all the material we would like.  Also, there is a bit of a skewed view of our activity, because this blog is the main source of information, and so whatever I am working on at the time is the most obvious.

While this post can't do everything, I figure it is still worth spending a little time to lay out our overall plans, and give a bit of an insight into what is going on at the moment.

The primary goal of the project remains to create the MEGA65, a Commodore 65 compatible home computer, that faithfully recreates the experience of using a Commodore 65.  This means a real floppy drive (mostly done), a real keyboard (also mostly done), and a real injection moulded case (still being worked on).
We also want to provide a completely delightful experience when you buy one, with a good user manual, and fun packaging.

The injection moulded case is one of the big things that we are slowly grinding away on.  We have partners to help with this, but it is a slow process, precisely because we will not do any kind of pre-sales or crowd-funding until we are 100% certain we have everything in place, and the risks of production have been dealt with.  We have seen some other projects that have failed at this point, and want neither to create such a disaster, nor to be part of any such disaster.  As an idea, I suspect the tooling for the case would cost between 50 and 90 thousand Euro if we went to the normal market of suppliers -- so this really is a big cost. But we hope that, as with the keyboard, we are able to work with our fantastic partners to get it done for much less than that, but again, this means we are turning a money cost into a time cost in many regards.  But don't interpret that as us being idle, or having given up.

We also have to do the 2nd revision of the main board, which is underway at the moment, and depending on the vagaries of hardware design, we might need a 3rd revision or so before that is totally ready.  This is a bit intertwined with the case design, as we need to make sure that everything actually fits.

Otherwise, there is the VHDL to get finished, and the core software, so that you can configure the machine, and easily pick which disk image you want to use at any point in time.  This continues as I have time to attack it, and also in part helped by students who are doing MEGA65-related projects in my lab. 

On the VHDL side, this is quite fiddly in places, in part because we want the machine to be a really nice and powerful 8-bit computer, with all the key peripherals built in.  Also, we don't want to have to issue piles of updates that affect the core function of the machine. This means that in places we have to grind through slowly.  This has been most true for the video display, as we have changed video modes for improved monitor compatibility, and in the process had to refactor quite a few things. This is still an ongoing process, and we are getting towards the end of it I hope.  Otherwise, getting the SD card rock-solid is a real concern for us, and, again, we have had to refactor that quite a bit to get SDHC cards working, since 2GB SD cards are getting a bit hard to get hold of.  There are a number of these sorts of two steps forward, one step back kind of things that have happened.

Now, talking about the research students, this is actually where some of the other confusion seems to have come from.   This is because the research students are typically not working on critical path items for our primary goal.  This is in part because we can't tell ahead of time how much a student will be able to get done, and at what quality.  The students are often fantastic, and get great stuff done, so don't take it as a form of criticism, but rather a form of contingency and risk management on our part.

Also, a research student has to do, well, research. This means that we have to form a research question, and pursue it.  This means that academically it is not appropriate to give them a project of "finish the MEGA65".  As a result, some of their projects have a specific focus that is somewhat distinct from the primary goal, although almost always supports the primary goal in some way.

An example of this is the intern working on the presentation software for the MEGA65. Is it required for the computer to be released? Probably not. Will it be great for letting us use the MEGA65 as the platform for showing itself off? Absolutely.  Also, just the working on the tools, and being able to shake down various bugs as we write software on it and for it is invaluable. 

Another example is the hand-held version of the MEGA65.  It doesn't make sense to add students to the process of making the new PCB revision for the desktop computer, because that will just complicate matters. But, it doesn't mean that we can't explore alternative complementary form-factors, like a hybrid console/phone device.  Again, there are collateral benefits for the core project, and just increasing the number of folks working on it in the lab actually helps the whole project along.

Yet another example is a student working on a security framework based on the MEGA65. Basically, the MEGA65 is simple enough to be verifiable in the field, unlike modern phones and computers.  So, we are able to have another project looking into that space, and which, again, provides a bunch of collateral benefits for the project.  Incidentally, this is where the matrix rain display comes in. It doubles as the transition for the built-in memory monitor (which itself is a product of the security work), as well as the indicator when switching to and from secure mode.  Here is the poster for his project:



What I am trying to present here is the means for folks to see how we remain focused on the bigger picture, but are leveraging the activity in the telecommunications research lab that I work in, so that both useful research is being done, and at the same time, the MEGA65 is advanced.  The end result, even if it doesn't look like it all the time from the outside, is that the MEGA65 will be ready sooner, and in a more mature state, and as an added bonus, we will likely end up with quite a nice hand-held version in due course.

In the meantime, we will keep working towards our primary goal, and, as always, the community is welcome to accelerate things by contributing to the project.  And, like yourselves, we are very much looking forward to completing what we still think will be a very nice and very fun computer for us all to play with for many years to come.

Tuesday, October 16, 2018

Revisiting proportional text rendering

A long time ago I wrote about implementing a proportional text renderer for the MEGA65, that uses the crazy enhanced text modes of the MEGA65 to make it much more memory efficient, and also bucket loads faster.

Well, after a long pause, we have a great intern who is working on this with me, with the goal of making a functional simple presentation program for the MEGA65, i.e., something a bit like a very simple version of Powerpoint(tm). Only about a million times smaller ;)

So after a lot of preliminary work, we have something that visibly works:

As you can see, we already have colours working.  These are selected in the editor, just like on the C64: Control+number.  I also made a little video showing how fast it is (but couldn't play with the colours while holding the camera in my other hand):

That's an 80 point anti-aliased proportional text display! And it is fast. Even though we are using CC65 for everything at the moment, rather than hand-tuned assembly language.  I know that CC65 is producing some quite horrible code in there, and that there is thus plenty of room for optimisation.  But for now, let's just pull apart a bit of how this all works.

First, we take a TrueType (TTF) font file, and feed it through a special rasteriser that I wrote. That rasteriser produces C64-style 8x8 tiles of pixels, and a map of how to make each glyph in the font from those tiles.  The tiles can in principle be either monochrome or 8-bit pixels.  For now, we actually only support the 8-bit pixels.
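To make the idea of "tiles plus a map" concrete, here is a rough Python sketch of cutting an already-rasterised glyph into 8x8 tiles. It is not the actual rasteriser: the function name `glyph_to_tiles`, the padding with zero pixels, and the reuse of identical tiles are all assumptions made for illustration.

```python
TILE = 8  # tiles are 8x8 pixels, as on the C64


def glyph_to_tiles(glyph, tiles, tile_index):
    """Split one glyph (a list of equal-length rows of 8-bit pixel values)
    into 8x8 tiles, reusing identical tiles already in the tile set.

    `tiles` is the shared list of 64-pixel tiles, `tile_index` maps a tile's
    contents to its number. Returns the glyph's map: rows of tile numbers."""
    height = len(glyph)
    width = len(glyph[0]) if height else 0
    tile_map = []
    for ty in range(0, height, TILE):
        map_row = []
        for tx in range(0, width, TILE):
            # Gather 64 pixels, padding with 0 past the glyph's edges
            # (assumption: 0 means "fully background").
            tile = tuple(
                glyph[y][x] if y < height and x < width else 0
                for y in range(ty, ty + TILE)
                for x in range(tx, tx + TILE)
            )
            # Assumption: identical tiles are stored only once.
            if tile not in tile_index:
                tile_index[tile] = len(tiles)
                tiles.append(tile)
            map_row.append(tile_index[tile])
        tile_map.append(map_row)
    return tile_map
```

Sharing one tile set across all glyphs in a font is where the memory saving would come from in this sketch, since solid interior tiles of large glyphs repeat a lot.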

This works with the MEGA65's alpha channel mode and full-colour text mode where one byte is used for each pixel in the characters, thus requiring 64 bytes per character tile, instead of the usual 8.  With the alpha bit set on the characters, the pixel values are used to mix foreground and background colours for nice anti-aliased rendering. This also means you can end up with way more than 256 colours on the screen, because the alpha-blended colours don't eat precious palette entries.
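As a minimal sketch of what the alpha mixing means numerically, the following Python treats each 8-bit pixel value as a blend factor between the foreground and background colours. The exact formula and rounding used by the hardware are not given here, so the linear mix below is an assumption, and the function names are made up for the example.

```python
# Storage per 8x8 character tile in the two pixel formats.
BYTES_PER_TILE_MONO = 8 * 8 // 8   # 1 bit per pixel  -> 8 bytes
BYTES_PER_TILE_FULL = 8 * 8        # 1 byte per pixel -> 64 bytes


def blend_channel(fg, bg, alpha):
    """Mix one 8-bit colour channel: alpha=255 gives fg, alpha=0 gives bg.
    (Assumed linear blend with rounding; the hardware may differ.)"""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255


def blend_pixel(fg_rgb, bg_rgb, alpha):
    """Blend an (r, g, b) foreground over a background using the pixel value."""
    return tuple(blend_channel(f, b, alpha) for f, b in zip(fg_rgb, bg_rgb))
```

Because the in-between shades are computed from just two colours, none of them needs a palette entry of its own, which is why the screen can show more than 256 distinct colours.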

Anyway, our crazy little program takes one or more of these rasterised fonts, and lets you render Unicode strings, and edit them.  At the moment we only support 16-bit Unicode points, so no smiling poo emoji for now.  That said, the engine does support colour glyphs, so it would be quite easy to support coloured emoji, just by having the alpha flag for those glyphs cleared.  The only change we would need to make is to have a per-glyph alpha flag, instead of having it per-typeface as at present.

The next steps are to make the editor more robust, and support loading multiple fonts at the same time. It already supports multiple colours, and multiple typefaces is really just an issue of us adding the code to load the fonts in from the SD card.

Sunday, October 14, 2018

Stable 50Hz and 60Hz video

The main reason for going to 800x600, is that we can get most monitors to do a decent 50Hz mode and 60Hz mode at this resolution.  This means we can support PAL and NTSC in a very natural way.

Unfortunately, both of these video modes use different dot clock rates, which complicates issues a bit.  The issue is that the VIC-IV feeds pixels out at a fixed 100MHz, and works out internally when it is time for the next real pixel, which is at either 30MHz or 40MHz, depending on the video mode.  If we do this naively, then the pixels will either jitter or smear on the VGA output, because the edges will not line up exactly from pixel to pixel.  It also just won't work for HDMI output, which requires a perfectly regular clock.
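The jitter is easy to see with a little arithmetic. The Python sketch below models the naive approach as a phase accumulator running at 100MHz; this is only a simple model for illustration, not the VIC-IV's actual logic, and `pixel_widths` is a made-up name.

```python
def pixel_widths(pixel_mhz, ticks, base_mhz=100):
    """Return how many base-clock ticks each output pixel lasts when a new
    pixel is started whenever a phase accumulator overflows."""
    widths = []
    acc = 0       # phase accumulator
    current = 0   # ticks spent on the pixel currently being output
    for _ in range(ticks):
        acc += pixel_mhz
        current += 1
        if acc >= base_mhz:
            # Accumulator overflowed: the current pixel ends here.
            acc -= base_mhz
            widths.append(current)
            current = 0
    return widths


widths_60hz = pixel_widths(40, 10)  # 40MHz dot clock
widths_50hz = pixel_widths(30, 10)  # 30MHz dot clock
```

In this model `widths_60hz` is `[3, 2, 3, 2]` and `widths_50hz` is `[4, 3, 3]`, i.e. pixels that should each last 25ns (or 33.3ns) instead come out as a mix of 20ns, 30ns and 40ns pixels. The average rate is right, but the individual edges are not, which is exactly what shows up as ragged pixel edges.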

I have already put a bunch of logic in to deal with this, but it doesn't seem to be working properly, and as a result, there is visible jitter and sparkle on the edges of pixels, and it really doesn't look good, as you can see here:


It actually is more annoying in real-life, because the ragged edges of the pixels crawl up the frame.

So to try to fix this, I have added an extra buffer that buffers a whole raster line, and, so theory has it, clocks out the pixels using the correct clock.  The only problem is that the image above was taken with it enabled, and the problem is unchanged.  So now it is time to work my way back through the video pipeline and find out where it is all going wrong.

First step, let's check that the buffer is working correctly.  To test this, I have modified the video pipeline so that instead of buffering a real video pixel, it buffers black and white pixels alternately, to make a simple kind of test pattern. Thus, if it is working correctly, I should see stable vertical lines in both the 50Hz and 60Hz video modes.  This takes only a few lines to implement (and then about half an hour to synthesise...), so it isn't too hard to test.

Well, that was the theory. In fact, it all refused to work for me, so I started instead to completely rework the video pipeline handling so that this is all fixed from one end to the other. 

The first step in this was to make an FPGA target with just the video frame generators for PAL and NTSC, and try to make it so that it can be switched from one to the other, with stable pixels for both.  Thus I made a test pattern (just alternating black and white lines like before) and the frame drivers, and tried to think of the most synthesis-friendly way to switch the pixel clocks.

What I settled on for now is to use an asynchronous FIFO, which is great for crossing clock domains, and to implement logic to multiplex between the 30MHz and 40MHz clocks, to pull the pixels out at the correct rate.

This took quite a bit of fiddling around to get all the plumbing in the pixel driver cleaned up, and using the new multiplexer. It did, at least, eventually synthesise, and the VGA monitor thinks there is an image in both modes, and thinks it is 800x600 at 50Hz and 60Hz, as desired.  So that looks good.  However, there is no image at all, not even the test pattern. So I need to try to figure out why.

After a bunch of fiddling, I managed to get some picture. Part of the problem was that I had forgotten to blank the video during the vertical refresh, which is when most monitors work out the base-line voltage of the video signal. As a result, it had decided that the entire image should be black.

A bit more fiddling and I got a kind of test pattern working, that shows the read and write addresses computed for the FIFO buffer. Well, more correctly, for the memory buffer I was using previously.  It looks, in fact, like the FIFO is perhaps the problem now.  I had ditched my home-written BRAM memory buffer for the official Xilinx standard-issue FIFO, because it is supposed to be the best way to do this kind of thing.  However, right now I am scratching my head as to why it isn't working.  There is a reset line, and the documentation isn't totally clear on whether it is active high or low. So I connected that to a switch, and tried both positions, but to no avail.

The FIFO has a line that strobes every time a byte is committed to the FIFO, and that is staying stubbornly silent.  So it looks indeed like I have messed up something with the FIFO, and it is not accepting the pixels I am pushing into it.

Yet more fiddling, and it now seems that the FIFO is working.  Part of what I had to do was add a reset sequence for the FIFO.  So now the next trick is to get columns of alternating pixels showing, to make sure that we are able to properly convert the pixels from the incoming 100MHz clock to the outgoing 30/40MHz clocks with no jitter. 

After a little work, I now have it more or less working, but with some artifacts in both the 50Hz and 60Hz video modes, as can be seen in these images:

First, we have 60Hz mode. The vertical banding is an artifact of my camera, but the fuzzy vertical lines are very much real. It happens spaced regularly about every 70 or so pixels.  My suspicion is that one of the few signals that are required to cross the clock domain is responsible, but it is hard to know for sure.


Meanwhile, for the 50Hz display, we have vertical banding that is visible in real life. Also, the pixels don't seem to be of even width:


The uneven width of pixels is really strange, because the FIFO should be using the correct pixel clock to clock the data out.  My only guess, and this might be the cause of both the uneven pixel thicknesses and the occasional fuzzy vertical line in the other mode, is that the FIFO is emptying, which then means that the jitter (from the perspective of the output clocks) of the incoming pixel stream cannot be concealed.

The best way to fix this, is to make sure we don't start reading from the FIFO until the FIFO's "almost empty" signal clears, which indicates that it has a few words stored in it, and thus should be safe to start reading from, since the input side should be supplying pixels at least as fast.
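The read-side gating described here can be sketched as a small behavioural model in Python. This is not the Xilinx FIFO primitive: the class name `PixelFifo`, the threshold of 4 words, and the exact meaning given to "almost empty" are assumptions chosen just to show the idea.

```python
from collections import deque


class PixelFifo:
    """Toy FIFO whose reader only starts once 'almost empty' has cleared."""

    def __init__(self, almost_empty_threshold=4):
        self.words = deque()
        self.threshold = almost_empty_threshold  # hypothetical threshold
        self.reading = False

    @property
    def almost_empty(self):
        # Assumption: asserted while the FIFO holds `threshold` words or fewer.
        return len(self.words) <= self.threshold

    def write(self, pixel):
        """Called from the 100MHz (input) side."""
        self.words.append(pixel)

    def read_tick(self):
        """Called once per output pixel clock; returns a pixel or None."""
        if not self.reading and not self.almost_empty:
            # Enough words are buffered to hide input-side jitter.
            self.reading = True
        if not self.reading:
            return None
        if not self.words:
            # Ran dry (e.g. end of raster line): wait for a refill.
            self.reading = False
            return None
        return self.words.popleft()
```

The point of the gate is that once reading has started, the output side never has to wait on an individual input pixel, so any unevenness in when pixels arrive stays hidden inside the buffer.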

Well, that sounded like a nice theory, but I have confirmed with the oscilloscope that the FIFO only empties at the end of each raster line, and thus should not be the problem.

Closer examination of the 50Hz display shows that there is some tearing on the right edges of some columns of pixels, i.e., there is uncertainty which column will be wide or narrow.  This makes me think that the FIFO machinery I have built must not really be working to the output clock.  The first suspect here is the multiplexer that ties either the 30MHz or 40MHz clock to the read-side of the FIFO.  So, let's just force the read side to 30MHz. In theory, that should give us rock-solid 50Hz display, and messed up 60Hz display. And, indeed it does mess up the 60Hz display in quite interesting ways, but the 50Hz display is unchanged.

I am now really at a loss as to quite what is going on, and am increasingly suspecting that the semantics of the FIFO are not quite what I expect for some reason. But first, let's check that using the oscilloscope. The pixels alternate green between full on, and full off.  Thus, if we stick a probe on the green line of the VGA out, we should be able to get an idea of what is going on.  And the results are indeed interesting...

If the problems I was describing above were with the video signal, then we would expect to see jitter in the widths of the peaks and troughs of the green pulse.  However, they look completely equal to me in both modes:




So then I thought to myself, I wonder if the problem isn't in the monitor sampling the pixels strangely.  That would really explain a lot of things.  So I hit auto-adjust on the monitor, and the vertical bars in 60Hz mode briefly disappeared completely, before the monitor decided that it really wanted to re-adapt to a slightly fuzzy setting again. And checking the mode information detected by the monitor for 50Hz mode, I discovered that it now thought the mode was 1080x610 (I didn't even know that such a mode exists!), so the banding makes sense, because again, the monitor is sampling the pixels at odd times and interpolating it out to full width.




So, this means that my FIFO machinery is all fine, which is great, but that there is something funny with the frames I am outputting, in that the monitor thinks that they are not quite right, and thus has to interpolate a few pixels one way or the other.

By changing the test pattern to a counter of the number of pixels output on a line, so that it represents a series of binary vertical counters, the problem becomes apparent: 



Basically there are a bunch of dummy pixels at the left edge of the screen (the part that looks like fat ! signs and the clear blue area next to it, before the recurring binary pattern starts).  That means that there are more than 800 pixels, and so the monitor tries to deal with that, by squishing or stretching, depending on what mode it thinks it is in.

A bit of digging around revealed a bug in the frame generator, where an allowance for the video pipeline depth was not being applied in a consistent manner to the start and end of rasters. 
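To illustrate the class of bug, here is a small Python sketch of how an inconsistently applied pipeline-depth allowance changes the number of pixels per raster. The depth of 12 clocks and all the names are hypothetical; the real frame generator's details are not shown here.

```python
H_ACTIVE = 800        # visible pixels per raster
PIPELINE_DEPTH = 12   # hypothetical delay through the video pipeline, in pixel clocks


def fetch_window(display_start, compensate_start=True, compensate_end=True):
    """Return (first, last_exclusive) x positions at which pixels must be
    requested so that they appear at display_start .. display_start + 800."""
    start = display_start - (PIPELINE_DEPTH if compensate_start else 0)
    end = display_start + H_ACTIVE - (PIPELINE_DEPTH if compensate_end else 0)
    return start, end


def pixels_per_raster(display_start, **flags):
    """Number of pixels requested per raster for the given compensation flags."""
    start, end = fetch_window(display_start, **flags)
    return end - start
```

With the allowance applied at both ends the raster is exactly 800 pixels wide. Applying it only at the start gives 812 pixels, and only at the end gives 788, either of which would leave the monitor guessing at the mode and stretching or squishing the image to fit.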

Finally, after a lot more fiddling and fixing silly little bugs, I got the new frame driver working, although in 50Hz mode, the monitor still claims it is 1080 x 610.

So the next step is to integrate all this back into the mainline, which means adjusting the whole video pipeline to use the signals provided by the updated pixel driver here.  That will be a job for another post.

Monday, September 10, 2018

Keyboard prototypes have been manufactured, and they are AMAZING

I am not particularly prone to being over excited, but I will confess to being just a bit excited when I saw the photos of the keyboard samples that have been produced for us for the MEGA65.

From the outset, we were exacting in our demands:

1. The space bar must be the full size.  This is no small thing, because NO ONE makes a 19cm wide space bar any more, and the injection moulding tooling for such a beast would cost thousands of dollars on its own.

2. The graphics symbols MUST be on the front, not on the top.

3. The shift and caps lock keys MUST be nice and clicky.

4. The whole thing must not in any way break the magic spell of 8-bit-ness.

Oh, and of course we have no money up-front to get the tooling made.

So we had set a high bar, and almost impossible conditions, and yet our friends at GMK have come to the party. And oh, how they have come.  There really isn't anything more to say.  I'll just show you some pictures.


The cases you can see in the first few shots are our 3D printed prototype cases: There is no CGI here -- just real objects.


What first struck me when I saw the first images, was that unless you really knew what you were looking for, you would have no idea that this was not an existing Commodore 65 prototype.  This is of course exactly the effect we want to create :)
* The peg is an optional extra, not included in the standard package.

 Trade-mark fun and games means you get the MEGA65 logo on the "Vendor Key". You can also see here that we have put an LED on the shift-lock and caps-lock keys, in the style of the old Amiga keyboards.

Our volunteer team went to a LOT of work to match the type-face of the original C65 keyboards.  These keyboards just look so fresh and crisp, and yet so 1990, all at the same time.

Across to the right hand side now, all the usual suspects are there.  We have also dealt with the "right cursor key rubs on case" issue that many original C65s had.  Again, the look of the keys is excruciatingly close to the original.

Now the whole keyboard from above (We have just noticed that Blogger has munged the image resolution. We will try to get higher-res images up soon).
 I'll just leave you in peace to take in the next few shots, before I comment again.





 The discerning viewer might notice that the MEGA logo is bleeding together a bit in the little vertical gap. We will tweak this. Also, the printing on the fronts of the keys is currently black instead of grey, and there is a little over-bleed on the graphical symbols. We will also get this fixed. But even as it is now, I find that it is a thing of beauty, and would be just fine, but we are not satisfied with "just fine", we want to get as close to perfection as we can.



 We can see here that the keyboard has a nice continuous slope to the key tops, nicer even than the original.











 In this next shot, you can make out the extra ASCII symbols we have put on the front of some keys that on a C64/128/65 lack any graphics symbols.  The symbols are {, }, _, ~, | and \, and will be accessed from native MEGA65 software by holding the MEGA key down.  It will also be possible to patch C64 and C65 ROMs to support them.  Backquote (`) is also available as MEGA + the <- key, which can be spotted in some of the earlier photos.  Having these missing symbols will make it much more pleasant to do more modern workflows, or even just programming in C, or any other language that uses curly-braces.
 Now we look at the PCB.  Due to some tight deadlines, the keyboards have a slightly wrong PCB outline.  So we had two made with the correct PCB outline to fit into our case design, but that won't work, because lots of tracks have been cut.


Here you can see that we have a full metal plate in the keyboard.  Combined with the CPLD and diodes, this will be a keyboard that has no ghosting when used natively (C64 and C65 ROMs will still have ghosting, if you don't change the keyboard scanning routines, because they don't have perfect provision for a non-ghosting keyboard mechanism.  We might be able to work around this in the VHDL. We will see).







 And now some side-by-side comparisons with an original C65 keyboard: