Monday 29 October 2018

More work on video mode closure

Getting the video mode switching stuff finally settled is driving me bananas.

I have a simple test bed that implements the video output system, and that works just fine.  But the instant I feed it with the whole design, so that the VIC-IV is feeding the video output system, I get really weird problems.  And I mean really weird.

For example, the digital video output lines that are fed to the resister network for making the analog VGA signals should be either 3.3V or 0V at any point in time.  However, with the whole design, they suddenly are showing quite weird analog modulation on the signals.  This should simply not be possible.  But if I bypass the FIFO and just pass the video signals through, then they come out just fine.  FIFOs are First-In-First-Out digital buffers. They should not be able to add weird analog effects.  In fact, if I even just pass the video signals in and out of the pixel_driver module, without even modifying them or delaying them, then the problems show up.  The following photo gives an idea of the kind of thing we are seeing:

You can see that the displays fades off to black on the right hand side.  It is also doing it a bit on the left.

In the process of trying to debug it, I found that if I put an oscilloscope probe on the HSYNC line of the VGA port, then the pattern will shift sideways.  It does this on all 3 FPGA boards we tested it on.  Very weird. Especially since the HSYNC line is outputting just fine, and is an output-only signal.  Clearly there is some weird analogy thing going on.

We can also work out a few other things from this display.  Primarily, the FIFO feeding and reading are both working fine, because the text display is stable, and is not warped or distorted.  There is also no jitter on the pixels: they are rock solid.  It is only this weird analogy effect that iss causing trouble. We can see what is going on there a bit better if I make the screen all white, so that the red green and blue channels are all fully saturated, and then look at those signals. In these image, the top channel is the HSYNC pulse, and the bottom channel the blue VGA output.

So, here in the first one, we can see the time around a single HSYNC pulse. We see the blue channel is active, except for the HSYNC fly-back time, which is what we expect. We can also see that, although the blue channel should be stably fully saturated, it is showing a variation of about 300mV continuously:

So lets look at this more closely.  The following shot shows the start of the blue channel activating, and we can see a clear pattern where about every 25ns, the blue channel varies by about 300mV.  It should instead be totally stable, but clearly isn't:


So what is going on here?  What are some possibilities?

Well, on the one hand, while we are using a 40MHz pixel clock (= 25ns period), the real pixel clock is 120MHz, i.e., 3x that. It is possible to speculate that we are seeing the blue channel pulling down 1/3 of the time, instead of holding constant.  The monitor might then not latch onto this clock cleanly, and might thus have the fading effect due to different clock drift of when the monitor samples the signal, and when the sampling point of the monitor drifts with regard to the peaks and troughs in the signal, and ends up mostly sampling the troughs instead of the peaks.  I did check that it isn't the signal itself going funny at the end of the raster lines, instead the signal looks more or less identical throughout.

What I can see is sometimes the blue channel has some strange oscillations visible on it.  This could be due to meta-stability on cross-domain signal crossing, except that by using the FIFO the data for the blue channel doesn't actually cross clock domains.


Back to the sampling theory, there might be some evidence in support of this, because if I switch the video mode to 50Hz, which uses a different dot clock (30MHz instead of 40MHz), then the whole effect changes.  Instead of the ragged right edge, only some of the columns of pixels are visible:



While various columns of pixels are missing, we can see that the image itself is there. As with the 60Hz mode, the pixels are stationary horizontally, with just a little bit of sparkle as some pixels decide whether they are visible or not.  If I make the screen white again, so that we have a constant saturated channel to look at, the display looks like this:



 And the oscilloscope view of the blue channel looks like this:

 That blue channel at the bottom looks anything but constant! It really is no wonder that the display looks like it does, with the blue channel jumping all over the place like that, instead of being steady.  The ~50mV ripple in the HSYNC line also worries me. It has a period of about 25ns, i.e., pretty close to the 30MHz pixel clock, which makes me think there is something leaking somewhere.  However, this is more interesting as a clue of what is going wrong, rather than a functional problem, because the ripple is small enough to not be a problem.  The almost 1V swing on the blue channel is of course a complete different story.

In short, this is all really weird and makes very little sense to me. Especially since the test harness target I wrote exhibits none of these behaviours, despite using the same pixel_driver module I wrote.

But what is even stranger is something we discovered today, when one of our students was helping me to debug this.  He bypassed the FIFO to see what would happen, and the result is quite interesting.  Basically, it all works, without any of the funny problems, but of course the pixels are not all lined up in both modes, because the pixels are released on the wrong clock domain, so it isn't really an option.  It does make me think that there is some problem in the way that I am using the FIFO, though. It also tempts me to make my own simpler FIFO for this job.  We'll see. In the meantime, I'll sleep on it.

Sunday 28 October 2018

Improving timing closure with stable 50/60Hz video

Recently I got the video output nice and stable at 50Hz and 60Hz.  However, in the process I introduced a bunch of cross-domain timing dependencies because of how VHDL works, and fixing those problems has been driving me bananas!

Our latest crop of bananas. Much tastier than the VHDL kind of bananas I have been experiencing.

Basically I have 30MHz and 40MHz pixel clocks for the video output modes, 100MHz for the VIC-IV internal pixel generator and 50MHz for the CPU.  Whenever signals have to cross between these domains, things get tricky.

The CPU and VIC-IV are integer multiples of each other, so that ends up being simple -- all the transfers just have  to satisfy the 100MHz timing of the VIC-IV.

The problem is the integration of the 30MHz, 40Mz and 100MHz clocks, because some clock ticks can happen very close to one another from time to time.  100MHz means a 10ns clock, 40MHz a 25ns clock, and 30MHz 33.33ns.  They all start with their first pulse at 0ns, but as time goes on, we can see that some clock pulses will happen very close in time.  For example, after two ticks of the 30MHz clock, 66.66ns have passed, and after seven ticks of the 100MHz clock 70ns have passed.  This means that any signal being transferred from the 30MHz clock domain to the 100MHz clock domain has only 70ns - 66.66ns = 3.33ns to occur, effectively acting as though the clock were 300MHz instead of 30MHz.

This has really nasty effects on things, and so we need to avoid it, and I have been pulling my hair out over this for a few days now, to try to find a good solution.  What I finally realised is that I can avoid almost all of these problems by simplifying things down to a single 120MHz clock for the pixel drivers, instead of the 30MHz and 40MHz clocks, and by separately generating the signals needed for the 100MHz VIC-IV interface natively from a 100MHz clock.

On the VIC-IV side what we really need are the start-of-raster, start-of-frame and time-for-next pixel signals. These don't have to be precise, because the generated pixels go through a FIFO buffer, that we then clock the pixels out at exactly the right side using (until now) the 30MHz or 40MHz pixel clock appropriate for that mode.  Thus we can generate those VIC-IV signals on the 100MHz side, without messing anything up.  Then, on the output side, we can use a 120MHz clock, and count of either 3 clock ticks for 40MHz or 4 clock ticks for 30MHz, and everything should be just fine, and there will be absolutely no cross-domain signals on the video path, which should help the timing closure be made much more easily.

So, I need to:

1. Modify the PLL setup to give me a 120MHz clock instead of the 30MHz and 40MHz clocks.
2. Modify frame_generator.vhdl, so that it assumes a 120MHz output clock, and generates the 100MHz-oriented signals for the VIC-IV as well as the 120MHz-oriented signals for the video output.
3. Modify pixel_driver.vhdl, so that it works with the single 120MHz clock, and just multiplexes the signals for the respective video modes as required.
4. Ensure that everything is properly plumbed, and works.

To do all this, I will use the pixeltest target that I built earlier.  This has the advantage that it synthesises in less than five minutes, compared to 15 - 40 minutes for the whole MEGA65. In the process of revisiting this, I found the timing problems mentioned above, even with the rest of the design removed, so I am hopeful that once I have fixed them, that the synthesis of the pixeltest target should be much faster, because it won't need to try hard to meet impossible timing requirements.

So, first step, regenerate the PLL setup to get me a 120MHz clock. This is in some ways the fiddliest part, because our PLL setup was generated in ISE, not Vivado, so I need to use the old ISE tool to modify it, as I am not game to manually fiddle with clock divider values, although it should in theory be possible.

Well, it turns out I did have to fiddle it by hand, and it looks like it worked just fine.  I only had to adjust the frequency divider values for the 30MHz clock to get it to be 30MHz x 4 = 120 MHz.

With that out the way, I have been working on frame_generator and pixel_driver to make them both work with the single 120MHz output clock.  I have it to the point where the two video modes are working again, but there are still funny cross-domain timing dependencies between the 100MHz and the 120MHz clock domains, which I am now working through.

And here is the point at which I find myself feeling completely stupid, or that I have missed something really basic and really important.

Vivado has a method for describing when a signal shouldn't be considered timing sensitive, for example, when it is a flag crossing timing domains.  This works using the set_false_path directive in a .xdc file.  It sounds great in theory, but I cannot for the love or money figure out what the exact signal name it expects.

Vivado logs might show a signal, like, say, pixeldriver/frame60/y_zero_driver100b, that is, the signal called y_zero_driver100b, inside something called frame60, inside something called pixeldriver.  However, when it comes to reading the XDC file, it claims that no signals match.  I have tried a variety of variations on this, but am no closer to a solution than when I started. What is really annoying is that everything seems to indicate that this is how it should be done, but nowhere can I find documented how you actually do it.

So after a bit of digging around, I discovered you can kind of test this in the Vivado IDE, by using the tcl console, to type things like:

get_cells pixeldriver/fifo_inuse*




That will tell you if it thinks anything matches or not.  After a bit of fiddling around, I was able to find things that claimed to be found. But then when synthesising, I keep getting errors that they aren't found.

Finally making some headway on this:  The synthesis logs WILL report errors on all the signals you reference in the first part of the log.  This is because the file gets read once BEFORE synthesis.  So, you should just ignore those errors, and only care if those errors appear TWICE, as it is is the second instance that matters.  Arggghhh!!!!

Also, it turns out you need funny suffixes on SOME but not ALL of the signals (if you put them on the wrong ones, they won't work, either).  So I had to add _reg_srl onto the end of the names of some, and _reg onto others.  I might work out why at some point.


Oh, well. I have at least figured out how to do this part.  Now of course, something else is not working, but I feel like  I have at least made some progress.

Friday 26 October 2018

Taking the big picture view

Some folks have been a bit confused by some of the different things going on in the MEGA65 project, and have found it hard to pick out the big picture of what we are doing.  In fairness, we have not been perfect at putting out a clear vision all the time.  In part, this is simply because we are a bunch of volunteers, and don't have the time to do all that we would like, and prepare all the material we would like.  Also, there is a bit of a skewed view of our activity, because this blog is the main source of information, and so whatever I am working at the time is the most obvious.

While this post can't do everything, I figure it is still worth spending a little time to layout our overall plans, and give a bit of an insight in to what is going on at the moment.

The primary goal of the project remains to create the MEGA65, a Commodore 65 compatible home computer, that faithfully recreates the experience of using a Commodore 65.  This means a real floppy drive (mostly done), a real keyboard
(also mostly done), and a real injection moulded case (still being worked on).
We also want to provide a complete delightful experience when you buy one, with a good user manual, and fun packaging.

The injection moulded case is one of the big things that we are slowly grinding away on.  We have partners to help with this, but it is a slow process, precisely because we will not do any kind of pre-sales or crowd-funding until we are 100% certain we have everything in place, and the risks of production have been dealt with.  We have seen some other projects that have failed at this point, and want neither to create such a disaster, nor to be part of any such disaster.  As an idea, I suspect the tooling for the case would cost between 50 and 90 thousand Euro if we went to the normal market of suppliers -- so this really is a big cost. But we hope that, as with the keyboard, we are able to work with our fantastic partners to get it done for much less than that, but again, this means we are turning a money cost into a time cost in many regards.  But don't interpret that as us being idle, or having given up.

We also have to do the 2nd revision of the main board, which is underway at the moment, and depending on the vagueries of hardware design, we might need a 3rd revision or so before that is totally ready.  This is a bit intertwined with the case design, as we need to make sure that everything actually fits.

Otherwise, there is the VHDL to get finished, and the core software, so that you can configure the machine, and easily pick which disk image you want to use at any point in time.  This continues as I have time to attack it, and also in part helped by students who are doing MEGA65-related projects in my lab. 

On the VHDL side, this is quite fiddly in places, in part because we want the machine to be a really nice and powerful 8-bit computer, with all the key peripherals built in.  Also, we don't want to have to issue piles of updates that affect the core function of the machine. This means that in places we have to grind through slowly.  This has been most true for the video display, as we have changed video modes for improved monitor compatibility, and in the process had to refactor quite a few things. This is still an ongoing progress, and we are getting towards the end of it I hope.  Otherwise, getting the SD card rock-solid is a real concern for us, and, again, we have had to refactor that quite a bit to get SDHC cards working, since 2GB SD cards are getting a bit hard to get hold of.  There are a number of these sorts of two steps forward, one step back kind of things that have happened.

Now, talking about the research students, this is actually where some of the other confusion seems to have come from.   This is because the research students are typically not working on critical path items for our primary goal.  This is in part because we can't tell ahead of time how much a student will be able to get done, and at what quality.  The students are often fantastic, and get great stuff done, so don't take it as a form of criticism, but rather a form of contingency and risk management on our part.

Also, a research student has to do, well, research. This means that we have to form a research question, and pursue it.  This means that academically it is not appropriate to give them a project of "finish the MEGA65".  As a result, some of their projects have a specific focus that is somewhat distinct from the primary goal, although almost always supports the primary goal in some way.

An example of this is the intern working on the presentation software for the MEGA65. Is it required for the computer to be released? Probably not. Will it be great for letting us use the MEGA65 as the platform for showing itself off? Absolutely.  Also, just the working on the tools, and being able to shake down various bugs as we write software on it and for it is invaluable. 

Another example is the hand-held version of the MEGA65.  It doesn't make sense to add students to the process of making the new PCB revision for the desktop computer, because that will just complicate matters. But, it doesn't mean that we can't explore alternative complementary form-factors, like a hybrid console/phone device.  Again, there are collatoral benefits for the core project, and just increasing the number of folks working on it in the lab actually helps the whole project along.

Yet another example is a student working on a security framework based on the MEGA65. Basically, the MEGA65 is simple enough to be verifyable in the field, unlike modern phones and computers.  So, we are able to have another project looking into that space, and which, again, provides a bunch of collatoral benefits for the project.  Incidentally, this is where the matrix rain display comes in. It doubles as the transition for the built-in memory monitor (which itself is a product of the security work), as well as the indicator when switching to and from secure mode.  Here is the poster for his project:



What I am trying present here is the means for folks to see how we remain focused on the bigger picture, but are leveraging the activity in the telecommunications research lab that I work in, so that both useful research is being done, and at the same time, the MEGA65 is advanced.  The end result is, even if it doesn't look like it all the time from the outside, is that the MEGA65 will be ready sooner, and in a more mature state, and as an added bonus, we will likely end up with quite a nice hand-held version in due course.

In the meantime, we will keep working towards our primary goal, and, as always, the community is welcome to accelerate things by contributing to the project.  And, like yourselves, we are very much looking forward to completing what we still think will be a very nice and very fun computer for us all to play with for many years to come.

Tuesday 16 October 2018

Revisiting proportional text rendering

A long time ago I wrote about implementing a proportional text renderer for the MEGA65, that uses the crazy enhanced text modes of the MEGA65 to make it much more memory efficient, and also bucket loads faster.

Well, after a long pause, we have a great intern who is working on this with me, with the goal of making a functional simple presentation program for the MEGA65, i.e., something a bit like a very simple version of Powerpoint(tm). Only about a million times smaller ;)

So after a lot of preliminary work, we have something that visibly works:

As you can see, we already have colours working.  These are selected in the editor, just like on the C64: Control+number.  I also made a little video showing how fast it is (but couldn't play with the colours while holding the camera in my other hand):

That's an 80 point anti-alised proportional text display! And it is fast. Even though we are using CC65 for everything at the moment, rather than hand-tuned assembly language.  I know the CC65 is producing some quite horrible code in there, and that there is thus plenty of room for optimisation.  But for now, let's just pull apart a bit of how this all works.

First, we take a TrueType (TTF) font file, and feed it through a special rasteriser that I wrote. That rasteriser produces C64-style 8x8 tiles of pixels, and a map of how to make each glyph in the font from those tiles.  The tiles can in principle be either monochrome or 8-bit pixels.  For now, we actually only support the 8-bit pixels.

This works with the MEGA65's alpha channel mode and full-colour text mode where one byte is used for each pixel in the characters, thus requiring 64 bytes per character tile, instead of the usual 8.  With the alpha bit set on the characters, the pixel values are used to mix foreground and background colours for nice anti-aliased rendering. This also means you can end up with way more than 256 colours on the screen, because the alpha-blended colours don't eat precious palette entries.

Anyway, our crazy little program takes one or more of these rasterised fonts, and lets you render Unicode strings, and edit them.  At the moment we only support 16-bit Unicode points, so no smiling poo emoji for now.  That said, the engine does support colour glyphs, so it would be quite easy to support coloured emoji, just by having the alpha flag for those glyphs cleared.  The only change we would need to make is to have a per-glyph alpha flag, instead of having it per-typeface as at present.

The next steps are to make the editor more robust, and support loading multiple fonts at the same time. It already supports multiple colours, and multiple typefaces is really just an issue of us adding the code to load the fonts in from the SD card.

Monday 15 October 2018

Stable 50Hz and 60Hz video

The main reason for going to 800x600, is that we can get most monitors to do a decent 50Hz mode and 60Hz mode at this resolution.  This means we can support PAL and NTSC in a very natural way.

Unfortunately, both of these video modes use different dot clock rates, which complicates issues a bit.  The issue is that the VIC-IV feeds pixels out at a fixed 100MHz, and works out internally when it is time for the next real pixel, which is at either 30MHz or 40Mhz, depending on the video mode.  If we do this naively, then the pixels will either jitter or smear on the VGA output, because the edges will not line up exactly from pixel to pixel.  It also just won't work for HDMI output, which requires a perfectly regular clock.

I have already put a bunch of logic in to deal with this, but it doesn't seem to be working properly, and as a result, there is visible jitter and sparkle on the edges of pixels, and it really doesn't look good, as you can see here:


It actually is more annoying in real-life, because the ragged edges of the pixels crawl up the frame.

So to try to fix this, I have added an extra buffer that buffers a whole raster line, and, so theory has it, clocks out the pixels using the correct clock.  The only problem is that the image above was taken with it enabled, and the problem is unchanged.  So now it is time to work my way back through the video pipeline and find out where it all going wrong.

First step, let's check that the buffer is working correctly.  To test this, I have modified the video pipeline so that instead of buffering a real video pixel, it buffers black and white pixels alternately, to make a simple kind of test pattern. Thus, if it is working correctly, I should see stable vertical lines in both the 50Hz and 60Hz video modes.  This takes only a few lines to implement (and then about half an hour to synthesise...), so it isn't too hard to test.

Well, that was the theory. In fact, it all refused to work for me, so I started instead to completely rework the video pipeline handling so that this is all fixed from one end to the other. 

The first step in this was to make an FPGA target with just the video frame generators for PAL and NTSC, and try to make it so that it can be switched from one to the other, with stable pixels for both.  Thus I made a test-pattern (just alternating black and white lines like before), and the frame drivers and tried to think of the most synthesis friendly way to switch the pixel clocks.

What I settled on for now, is to use an asynchronous FIFO, which are great for crossing clock domains, and to implement logic to multi-plex between the 30 and 40MHz clocks, to pull the pixels out at the correct rate.

This took quite a bit of fiddling around to get all the plumbing in the pixel driver cleaned up, and using the new multiplexer. It did, at least, eventually synthesise, and the VGA monitor thinks there is an image in both modes, and, thinks it is 800x600 at 50Hz and 60Hz, as desired.  So that looks good.  However, there is no image at all, not even the test pattern. So I need to try to figure out why.

After a bunch of fiddling, I managed to get some picture. Part of the problem was that I had forgot to blank the video during the vertical refresh, which is when most monitors work out the base-line voltage of the video signal. As a result, it had decided that the entire image should be black.

A bit more fiddling and I got a kind of test pattern working, that shows the read and write addresses computed for the FIFO buffer. Well, more correctly, for the memory buffer I was using previously.  It looks, in fact, like the FIFO is perhaps the problem now.  I had ditched my home-written BRAM memory buffer for the official Xilinx standard-issue FIFO, because it is supposed to be the best way to do this kind of thing.  However, right now I am scratching my head as to why it isn't working.  There is a reset line, which the documentation isn't totally clear whether it is active high or low. So I connected that to a switch, and tried both positions, but to no avail.

The FIFO has a line that strobes every time a byte is committed to the FIFO, and that is staying stubbornly silent.  So it looks indeed like I have messed up something with the FIFO, and it is not accepting the pixels I am pushing into it.

Yet more fiddling, and it now seems that the FIFO is working.  Part of what I had to do was add a reset sequence for the FIFO.  So now the next trick is to get columns of alternating pixels showing, to make sure that we are able to properly convert the pixels from the incoming 100MHz clock to the outgoing 30/40MHz clocks with no jitter. 

After a little work, I now have it more or less working, but with some artifacts in both the 50Hz and 60Hz video modes, as can be seen in these images:

First, we have 60Hz mode. The vertical banding is an artifact of my camera, but the fuzzy vertical lines are very much real. It happens spaced regularly about every 70 or so pixels.  My suspicion is that one of the few signals that are required to cross the clock domain is responsible, but it is hard to know for sure.


Meanwhile, for the 50Hz display, we have vertical banding that is visible in real life. Also, the pixels don't seem to be of even width:


The uneven width of pixels is really strange, because the FIFO should be using the correct pixel clock to clock the data out.  My only guess, and this might be the cause of both the uneven pixel thicknesses and the occassional fuzzy vertical line in the other mode, is that the FIFO is emptying, which then means that the jitter (from the perspective of the output clocks) of the incoming pixel stream cannot be concealed.

The best way to fix this, is to make sure we don't start reading from the FIFO until the FIFO's "almost empty" signal clears, which indicates that it has a few words stored in it, and thus should be safe to start reading from, since the input side should be supplying pixels at least as fast.

Well, that sounded like a nice theory, but I have confirmed with the oscilloscope that the FIFO only empties at the end of each raster line, and thus should not be the problem.

Closer examination of the 50Hz display shows that there is some tearing on the right edges of some columns of pixels, i.e., there is uncertainty which column will be wide or narrow.  This makes me think that the FIFO machinary I have built must not really be working to the output clock.  The first suspect here is the multiplexor that ties either the 30MHz or 40MHz clock to the read-side of the FIFO.  So, let's just force the read side to 30MHz. In theory, that should give us rock-solid 50Hz display, and messed up 60Hz display. And, indeed it does mess up the 60Hz display in quite interesting ways, but the 50Hz display is unchanged.

I am now really at a miss as to quite what is going on, and am increasingly suspecting that the semantics of the FIFO are not quite what I expect for some reason. But first, let's check that using the oscilloscope. The pixels alternate green between full on, and full off.  Thus, if we stick a probe on the green line of the VGA out, we should be able to get an idea of what is going on.  And the results are indeed interesting...

If the problems I was describing above were with the video signal, then we would expect to see jitter in the widths off the peaks and troughs of the green pulse.  However, they look completely equal to me in both modes:




So then I thought to myself, I wonder if the problem isn't in the monitor sampling the pixels strangely.  That would really explain a lot of things.  So I hit auto-adjust on the monitor, and the vertical bars in 60Hz mode briefly disappeared completely, before the monitor decided that it really wanted to re-adapt to a slightly fuzzy setting again. And checking the mode information detected by the monitor for 50Hz mode, I discovered that it now thought the mode was 1080x610 (I didn't even know tht such a mode exists!), so the banding makes sense, because again, the monitor is sampling the pixels at odd times and interpolating it out to full width.




So, this means that my FIFO machinery is all fine, which is great, but that there is something funny with the frames I am outputting, in that the monitor thinks that they are not quite right, and thus has to interpolate a few pixels one way or the other.

By changing the test pattern to a counter of the number of pixels output on a line, so that it represents a series of binary vertical counters, the problem becomes apparent: 



Basically there are a bunch of dummy pixels at the left edge of the screen (the part that looks like fat ! signs and the clear blue area next to it, before the recurring binary pattern starts).  That means that there are more than 800 pixels, and so the monitor tries to deal with that, by squishing or stretching, depending on what mode it thinks it is in.

A bit of digging around revealed a bug in the frame generator, where an allowance for the video pipeline depth was not being applied in a consistent manner to the start and end of rasters. 

Finally, after a lot more fiddling and fixing silly little bugs, I got the new frame driver working, although in 50Hz mode, the monitor still claims it is 1080 x 610.

So the next step is to integrate all this back into the mainline, which means adjusting the whole video pipeline to use the signals provided by the updated pixel driver here.  That will be a job for another post.