Sunday, November 18, 2018

Just because a splash screen is obligatory...

Still working on getting the MEGA65 presentation software ready for Linux Conf in January.  We decided that no presentation software could exist without having an annoying splash screen that appears while it loads, and that the incongruity will be very fun during the presentation. Here is the current version of it (we will fix the colour glitches).



It uses a 240x128 256-colour logo converted from a PNG, and then displayed using the MEGA65's full-colour text mode. As the program for displaying this doesn't need to be too efficient, I am just writing it using CC65, the little C compiler for the C64.  All it really does is copy the screen and colour RAM to another area, and turn it into the two-bytes-per-character version required when using full-colour text mode with character sets with >256 chars, and then draw the characters for the splash logo onto that.  It also uses a neat trick of the VIC-IV, where you can tell it the firsts 256 characters are normal mono/multi-colour characters, and only higher character numbers are full-colour. This lets you easily mix in normal text, which in our case, for overlaying the logo was a really nice simple solution.  The entire source code is really quite simple. Here is the interesting parts:

void main(void)
{

    // Set CPU to full speed, enable VIC-IV IO registers
    m65_io_enable();

    // Go back to upper case, because CC65 programs start by

    // going to lower-case for some reason
    POKE(0xD018,0x14);

    // Copy palette into place

    // $100 bytes each for red, green, and blue = $300 total
    lcopy(splashlogo,0xFFD3100U,0x300);
   
    // Copy splash logo to the top of RAM. The logo consists of

    // 30x16 8x8 full-colour characters, so 64 bytes per
    // character for a total of 30x16x64 = 30,720 bytes.
    // $58000 is near to top of the 384KB RAM
    lcopy(splashlogo+0x300,0x58000U,30720);

    // Copy screen to $57800 to make 16-bit version of screen

    // i.e. put the screen just before the character data
    // Do similar for colour RAM
    for(n=0;n<1000;n++) {
      // Screen RAM:

      // In 16-bit text mode, the character number is the lower
      // 13-bits. So when copying the C64 screen to this mode,
      // we can just copy the screen RAM byte with the character
      // number to the lower byte, and leave the upper byte
      // blank.
      lpoke(0x57800U+n*2,PEEK(0x0400+n));
      lpoke(0x57800U+n*2+1,0);

      // Colour RAM is similar, but the colour goes in the 2nd

      // byte (this is just how the VIC-IV works)
      lpoke(0xFF80800+n*2+0,0);
      lpoke(0xFF80800+n*2+1,PEEK(0xD800U+n));
    }    

    // Draw logo on the screen.
    // 240 x 128 = 30 x 16 rows

    // Character numbers >256 in full-colour mode refer to 
    // fixed addresses of (character number)*64, so we have to
    // add $58000/64 = $1600 = 5632 to character number
    // which is x + y * 30.  These then get stored into the 
    // appropriate bytes of the screen memory at $57800.
    // we use lpoke() because the addresses are >$FFFF.
    for(x=0;x<30;x++)
      for(y=0;y<16;y++) {
        glyph=(0x58000U/0x40)+x+y*30;
        lpoke(0x57800U+0+(5*2)+(4*40*2)+x*2+y*(40*2),glyph&0xff);
        lpoke(0x57800U+1+(5*2)+(4*40*2)+x*2+y*(40*2),glyph>>8);
      }
   
    // set 16-bit text mode, enable compositor, and full-colour

    // text mode for characters >255
    POKE(0xD054U,0xC5);


    // set screen line length to 40*2=80 bytes 
    POKE(0xD058U,40*2);

    // Move start of screen address to $057800
    POKE(0xD060U,0x00); POKE(0xD061U,0x78); POKE(0xD062U,0x05);
  
    // Set colour RAM start to $FF80000 + $0800
    POKE(0xD064U,0x00); POKE(0xD065U,0x08);

    // Then pretend to load for now
    while(1) {
      POKE(0xD020U,0x0e);
      for(n=0;n<12000;n++) continue;
      POKE(0xD020U,0x01);
      for(x=0;x<25;x++) continue;
    }
   
    return;
}

Tuesday, November 13, 2018

More work getting ready for presenting the MEGA65 at Linux Conf

We have been accepted to give a talk at Linux Conf in Christchurch New Zealand in January, and I really want the MEGA65 to be able to present itself at that event, rather than using another device to run the slides.  Many of you will have seen the recent work towards that.  We have more progress now, with basic slide editing and presentation mode working now:


You can edit slides, and switch between them, and go back and forth between editor and presenter modes (which really just changes whether the cursor is there and editing is available for now).

There are still a bunch of bugs and performance improvements (rendering the text is still much slower than it should/could be, for example), but it fundamentally works. With a little effort, it is possible to make simple presentations like this one.  And as it is built on our prior work on anti-aliased text for the MEGA65, it really looks quite nice, which is a bit clearer in the still image:


In the video you can also see sprites with alpha blending.  This is used for the slide number indicator that appears on the bottom right corner of the screen and then fades out using an alpha transition from opaque to transparent to smoothly disappear.

Sunday, November 11, 2018

Vertical border knockout for Wizball

The logic for what feature or bug we work on next on the MEGA65 varies quite a lot.  Today, the motivation was my kids have discovered the joy of Wizball, and in particular, playing it as a two-player cooperative team.  The trouble is that the MEGA65 didn't support vertical border knockout in a way that was compatible with existing C64 software.  

It turns out to be a real pain playing Wizball without vertical border knockout working, because you can see neither the power up section at the top, nor the colour buckets at the bottom.  I don't think I had realised until today that those were all fully placed in the vertical borders, but they are.

This bug/feature turned out to be really quite easy to fix. All I had to do was make the vertical border enable/disable logic in the VIC-IV be edge triggered like on the VIC-II (and I presume, the VIC-III), so that if you moved the border position so that the VIC-IV never sees the start of the vertical borders, then the vertical borders never appear.  

After a quick synthesis run, this was the result (with the border colour changed to blue to make it easier to see what is going on):


Yay! We can now see the missing sections of the screen.  

There is however some detritis in the lower border, which I need to deal with.  This is not entirely simple, because it is actually the 26th and further rows of the screen being drawn, because the VIC-IV isn't limited to 25 rows like the VIC-II or VIC-III.  I'll have to have a think about how I deal with that.   While there is little software for the MEGA65 at the moment, it might just be easiest to have a register that sets the number of lines of text to draw, and just have it default to the usual 25.

Monday, November 5, 2018

Working on the user's guide

Another little milestone for us, we started work more seriously on the user guide for the MEGA65, including making a latex template for us to use as the basis for the collaborative development stage of the user guide. This adds to our existing quiet little team of volunteers on the user's guide who have offered assistance with professional layout and editing. Whether we use the latex template in the end will depend on whether we can make it look exactly like we want.   There is of course a lot of planning and content writing between now and then to be done.

On the content and planning, our intention is to make a user's guide that is at least as friendly and useful as the original C64 user's guide.
 
But for now, here is one possible front-cover for the user's guide:



Also, we'd love to hear what you think should be in the MEGA65 user's guide (and what shouldn't), so comment below!

Wednesday, October 31, 2018

Video closure more or less done. Alpha-blending fixed along the way. Proportional text rendering looking quite nice.

After a pile more work, and just a little more hair-pulling, we now have timing-closure on the video modes, and by and large, they are working as expected, although there are a few remaining things to fix, such as some of the composited overlay displays that are used in particular circumstances.  But those should be fairly fixable without great difficulty.  (There is still a niggling problem that the CPU is not making timing closure, but that will have to wait for another day to fix. I might have to just drop the CPU speed to 40MHz, if we can't use the faster grades of the FPGA part we are using, but I hope we can instead just improve the timing to get closure at the current 50MHz.)

The Xilinx FIFO was doing really weird things for us (most likely because we weren't driving it properly, but I am still not sure), so in the end we ditched it, and implemented our own little pixel FIFO to cross from the 100MHz VIC-IV clock to the 30 or 40MHz VGA/HDMI pixel clock.  That worked quite well, and only took a day or so to mostly shake down.

In the process, and since I was doing a bunch of synthesis runs, I figured I would track down the remaining bugs with the alpha-blending logic in the VIC-IV.  Basically you can make full-colour characters either be 256 colours, or instead, turn on the alpha blender, and get 256 graduations between the background colour and the character colour.  

This will be handy for a bunch of uses, but for now, we are focusing on our simple presentation software for the MEGA65, because we would love to introduce the MEGA65 at gatherings using the MEGA65 itself.  

We have an intern who is doing great work on that presentation software, and has a mostly functional editor that supports simultaneous use of multiple type faces, as well as colours, underline, reverse and blink attributes (all using the VIC-III extended character attributes).

Having this editor mostly working made it really easy to do some tests that the alpha blender was working properly, and that the video output was stable with the timing closure work, as you can see here:


That screen full of stuff was basically just typed in by me in the space of a minute or two, using short-cut keys to change colours (using the normal control and Commodore key combinations), as well as to set the attributes (RVS ON / Control-R, Control-U and Control-B).  Remember that this is all being rendered using gymnastics in text mode, not using a bitmap mode.  Each big character is drawn using a number of normal 8x8 characters with alpha-blending and full-colour mode enabled.  Kerning is via the kerning extended attribute of the VIC-IV.

While there are still a few glitches and bugs to be worked through, the result is clearly pretty nice, and quite amazing for a little 8-bit computer.   Here is a closer view of part of the display, where you can clearly see how the alpha blender is being used to anti-alias the text:


 And even closer in, this can be seen even more clearly.  Note that the shadowing on the red "l" is an artifact of my camera, and that it is in fact properly blended.

Here we show that the anti-aliasing really is dynamic to the background colour, by changing the background colour of the screen, and as expected, the anti-aliasing is now all between the text colour and the background colour:


One of the interesting effects of this, is that it is possible to have many more than 256 colours on the screen at once, because each text cell can be showing a variety of colours between the foreground and background colours.

And just to confirm that we really can reproduce annoying 1990s style blinking text using the VIC-III's hardware blink attribute:


So, while there are still a few loose ends to tidy up, I am happy that we are moving forward, and I can cross a couple of long-standing issues off my list.

Monday, October 29, 2018

More work on video mode closure

Getting the video mode switching stuff finally settled is driving me bananas.

I have a simple test bed that implements the video output system, and that works just fine.  But the instant I feed it with the whole design, so that the VIC-IV is feeding the video output system, I get really weird problems.  And I mean really weird.

For example, the digital video output lines that are fed to the resister network for making the analog VGA signals should be either 3.3V or 0V at any point in time.  However, with the whole design, they suddenly are showing quite weird analog modulation on the signals.  This should simply not be possible.  But if I bypass the FIFO and just pass the video signals through, then they come out just fine.  FIFOs are First-In-First-Out digital buffers. They should not be able to add weird analog effects.  In fact, if I even just pass the video signals in and out of the pixel_driver module, without even modifying them or delaying them, then the problems show up.  The following photo gives an idea of the kind of thing we are seeing:

You can see that the displays fades off to black on the right hand side.  It is also doing it a bit on the left.

In the process of trying to debug it, I found that if I put an oscilloscope probe on the HSYNC line of the VGA port, then the pattern will shift sideways.  It does this on all 3 FPGA boards we tested it on.  Very weird. Especially since the HSYNC line is outputting just fine, and is an output-only signal.  Clearly there is some weird analogy thing going on.

We can also work out a few other things from this display.  Primarily, the FIFO feeding and reading are both working fine, because the text display is stable, and is not warped or distorted.  There is also no jitter on the pixels: they are rock solid.  It is only this weird analogy effect that iss causing trouble. We can see what is going on there a bit better if I make the screen all white, so that the red green and blue channels are all fully saturated, and then look at those signals. In these image, the top channel is the HSYNC pulse, and the bottom channel the blue VGA output.

So, here in the first one, we can see the time around a single HSYNC pulse. We see the blue channel is active, except for the HSYNC fly-back time, which is what we expect. We can also see that, although the blue channel should be stably fully saturated, it is showing a variation of about 300mV continuously:

So lets look at this more closely.  The following shot shows the start of the blue channel activating, and we can see a clear pattern where about every 25ns, the blue channel varies by about 300mV.  It should instead be totally stable, but clearly isn't:


So what is going on here?  What are some possibilities?

Well, on the one hand, while we are using a 40MHz pixel clock (= 25ns period), the real pixel clock is 120MHz, i.e., 3x that. It is possible to speculate that we are seeing the blue channel pulling down 1/3 of the time, instead of holding constant.  The monitor might then not latch onto this clock cleanly, and might thus have the fading effect due to different clock drift of when the monitor samples the signal, and when the sampling point of the monitor drifts with regard to the peaks and troughs in the signal, and ends up mostly sampling the troughs instead of the peaks.  I did check that it isn't the signal itself going funny at the end of the raster lines, instead the signal looks more or less identical throughout.

What I can see is sometimes the blue channel has some strange oscillations visible on it.  This could be due to meta-stability on cross-domain signal crossing, except that by using the FIFO the data for the blue channel doesn't actually cross clock domains.


Back to the sampling theory, there might be some evidence in support of this, because if I switch the video mode to 50Hz, which uses a different dot clock (30MHz instead of 40MHz), then the whole effect changes.  Instead of the ragged right edge, only some of the columns of pixels are visible:



While various columns of pixels are missing, we can see that the image itself is there. As with the 60Hz mode, the pixels are stationary horizontally, with just a little bit of sparkle as some pixels decide whether they are visible or not.  If I make the screen white again, so that we have a constant saturated channel to look at, the display looks like this:



 And the oscilloscope view of the blue channel looks like this:

 That blue channel at the bottom looks anything but constant! It really is no wonder that the display looks like it does, with the blue channel jumping all over the place like that, instead of being steady.  The ~50mV ripple in the HSYNC line also worries me. It has a period of about 25ns, i.e., pretty close to the 30MHz pixel clock, which makes me think there is something leaking somewhere.  However, this is more interesting as a clue of what is going wrong, rather than a functional problem, because the ripple is small enough to not be a problem.  The almost 1V swing on the blue channel is of course a complete different story.

In short, this is all really weird and makes very little sense to me. Especially since the test harness target I wrote exhibits none of these behaviours, despite using the same pixel_driver module I wrote.

But what is even stranger is something we discovered today, when one of our students was helping me to debug this.  He bypassed the FIFO to see what would happen, and the result is quite interesting.  Basically, it all works, without any of the funny problems, but of course the pixels are not all lined up in both modes, because the pixels are released on the wrong clock domain, so it isn't really an option.  It does make me think that there is some problem in the way that I am using the FIFO, though. It also tempts me to make my own simpler FIFO for this job.  We'll see. In the meantime, I'll sleep on it.

Sunday, October 28, 2018

Improving timing closure with stable 50/60Hz video

Recently I got the video output nice and stable at 50Hz and 60Hz.  However, in the process I introduced a bunch of cross-domain timing dependencies because of how VHDL works, and fixing those problems has been driving me bananas!

Our latest crop of bananas. Much tastier than the VHDL kind of bananas I have been experiencing.

Basically I have 30MHz and 40MHz pixel clocks for the video output modes, 100MHz for the VIC-IV internal pixel generator and 50MHz for the CPU.  Whenever signals have to cross between these domains, things get tricky.

The CPU and VIC-IV are integer multiples of each other, so that ends up being simple -- all the transfers just have  to satisfy the 100MHz timing of the VIC-IV.

The problem is the integration of the 30MHz, 40Mz and 100MHz clocks, because some clock ticks can happen very close to one another from time to time.  100MHz means a 10ns clock, 40MHz a 25ns clock, and 30MHz 33.33ns.  They all start with their first pulse at 0ns, but as time goes on, we can see that some clock pulses will happen very close in time.  For example, after two ticks of the 30MHz clock, 66.66ns have passed, and after seven ticks of the 100MHz clock 70ns have passed.  This means that any signal being transferred from the 30MHz clock domain to the 100MHz clock domain has only 70ns - 66.66ns = 3.33ns to occur, effectively acting as though the clock were 300MHz instead of 30MHz.

This has really nasty effects on things, and so we need to avoid it, and I have been pulling my hair out over this for a few days now, to try to find a good solution.  What I finally realised is that I can avoid almost all of these problems by simplifying things down to a single 120MHz clock for the pixel drivers, instead of the 30MHz and 40MHz clocks, and by separately generating the signals needed for the 100MHz VIC-IV interface natively from a 100MHz clock.

On the VIC-IV side what we really need are the start-of-raster, start-of-frame and time-for-next pixel signals. These don't have to be precise, because the generated pixels go through a FIFO buffer, that we then clock the pixels out at exactly the right side using (until now) the 30MHz or 40MHz pixel clock appropriate for that mode.  Thus we can generate those VIC-IV signals on the 100MHz side, without messing anything up.  Then, on the output side, we can use a 120MHz clock, and count of either 3 clock ticks for 40MHz or 4 clock ticks for 30MHz, and everything should be just fine, and there will be absolutely no cross-domain signals on the video path, which should help the timing closure be made much more easily.

So, I need to:

1. Modify the PLL setup to give me a 120MHz clock instead of the 30MHz and 40MHz clocks.
2. Modify frame_generator.vhdl, so that it assumes a 120MHz output clock, and generates the 100MHz-oriented signals for the VIC-IV as well as the 120MHz-oriented signals for the video output.
3. Modify pixel_driver.vhdl, so that it works with the single 120MHz clock, and just multiplexes the signals for the respective video modes as required.
4. Ensure that everything is properly plumbed, and works.

To do all this, I will use the pixeltest target that I built earlier.  This has the advantage that it synthesises in less than five minutes, compared to 15 - 40 minutes for the whole MEGA65. In the process of revisiting this, I found the timing problems mentioned above, even with the rest of the design removed, so I am hopeful that once I have fixed them, that the synthesis of the pixeltest target should be much faster, because it won't need to try hard to meet impossible timing requirements.

So, first step, regenerate the PLL setup to get me a 120MHz clock. This is in some ways the fiddliest part, because our PLL setup was generated in ISE, not Vivado, so I need to use the old ISE tool to modify it, as I am not game to manually fiddle with clock divider values, although it should in theory be possible.

Well, it turns out I did have to fiddle it by hand, and it looks like it worked just fine.  I only had to adjust the frequency divider values for the 30MHz clock to get it to be 30MHz x 4 = 120 MHz.

With that out the way, I have been working on frame_generator and pixel_driver to make them both work with the single 120MHz output clock.  I have it to the point where the two video modes are working again, but there are still funny cross-domain timing dependencies between the 100MHz and the 120MHz clock domains, which I am now working through.

And here is the point at which I find myself feeling completely stupid, or that I have missed something really basic and really important.

Vivado has a method for describing when a signal shouldn't be considered timing sensitive, for example, when it is a flag crossing timing domains.  This works using the set_false_path directive in a .xdc file.  It sounds great in theory, but I cannot for the love or money figure out what the exact signal name it expects.

Vivado logs might show a signal, like, say, pixeldriver/frame60/y_zero_driver100b, that is, the signal called y_zero_driver100b, inside something called frame60, inside something called pixeldriver.  However, when it comes to reading the XDC file, it claims that no signals match.  I have tried a variety of variations on this, but am no closer to a solution than when I started. What is really annoying is that everything seems to indicate that this is how it should be done, but nowhere can I find documented how you actually do it.

So after a bit of digging around, I discovered you can kind of test this in the Vivado IDE, by using the tcl console, to type things like:

get_cells pixeldriver/fifo_inuse*




That will tell you if it thinks anything matches or not.  After a bit of fiddling around, I was able to find things that claimed to be found. But then when synthesising, I keep getting errors that they aren't found.

Finally making some headway on this:  The synthesis logs WILL report errors on all the signals you reference in the first part of the log.  This is because the file gets read once BEFORE synthesis.  So, you should just ignore those errors, and only care if those errors appear TWICE, as it is is the second instance that matters.  Arggghhh!!!!

Also, it turns out you need funny suffixes on SOME but not ALL of the signals (if you put them on the wrong ones, they won't work, either).  So I had to add _reg_srl onto the end of the names of some, and _reg onto others.  I might work out why at some point.


Oh, well. I have at least figured out how to do this part.  Now of course, something else is not working, but I feel like  I have at least made some progress.