Sunday, 28 October 2018

Improving timing closure with stable 50/60Hz video

Recently I got the video output nice and stable at 50Hz and 60Hz.  However, in the process I introduced a bunch of cross-domain timing dependencies because of how VHDL works, and fixing those problems has been driving me bananas!

Our latest crop of bananas. Much tastier than the VHDL kind of bananas I have been experiencing.

Basically I have 30MHz and 40MHz pixel clocks for the video output modes, 100MHz for the VIC-IV internal pixel generator and 50MHz for the CPU.  Whenever signals have to cross between these domains, things get tricky.

The CPU and VIC-IV are integer multiples of each other, so that ends up being simple -- all the transfers just have  to satisfy the 100MHz timing of the VIC-IV.

The problem is the integration of the 30MHz, 40Mz and 100MHz clocks, because some clock ticks can happen very close to one another from time to time.  100MHz means a 10ns clock, 40MHz a 25ns clock, and 30MHz 33.33ns.  They all start with their first pulse at 0ns, but as time goes on, we can see that some clock pulses will happen very close in time.  For example, after two ticks of the 30MHz clock, 66.66ns have passed, and after seven ticks of the 100MHz clock 70ns have passed.  This means that any signal being transferred from the 30MHz clock domain to the 100MHz clock domain has only 70ns - 66.66ns = 3.33ns to occur, effectively acting as though the clock were 300MHz instead of 30MHz.

This has really nasty effects on things, and so we need to avoid it, and I have been pulling my hair out over this for a few days now, to try to find a good solution.  What I finally realised is that I can avoid almost all of these problems by simplifying things down to a single 120MHz clock for the pixel drivers, instead of the 30MHz and 40MHz clocks, and by separately generating the signals needed for the 100MHz VIC-IV interface natively from a 100MHz clock.

On the VIC-IV side what we really need are the start-of-raster, start-of-frame and time-for-next pixel signals. These don't have to be precise, because the generated pixels go through a FIFO buffer, that we then clock the pixels out at exactly the right side using (until now) the 30MHz or 40MHz pixel clock appropriate for that mode.  Thus we can generate those VIC-IV signals on the 100MHz side, without messing anything up.  Then, on the output side, we can use a 120MHz clock, and count of either 3 clock ticks for 40MHz or 4 clock ticks for 30MHz, and everything should be just fine, and there will be absolutely no cross-domain signals on the video path, which should help the timing closure be made much more easily.

So, I need to:

1. Modify the PLL setup to give me a 120MHz clock instead of the 30MHz and 40MHz clocks.
2. Modify frame_generator.vhdl, so that it assumes a 120MHz output clock, and generates the 100MHz-oriented signals for the VIC-IV as well as the 120MHz-oriented signals for the video output.
3. Modify pixel_driver.vhdl, so that it works with the single 120MHz clock, and just multiplexes the signals for the respective video modes as required.
4. Ensure that everything is properly plumbed, and works.

To do all this, I will use the pixeltest target that I built earlier.  This has the advantage that it synthesises in less than five minutes, compared to 15 - 40 minutes for the whole MEGA65. In the process of revisiting this, I found the timing problems mentioned above, even with the rest of the design removed, so I am hopeful that once I have fixed them, that the synthesis of the pixeltest target should be much faster, because it won't need to try hard to meet impossible timing requirements.

So, first step, regenerate the PLL setup to get me a 120MHz clock. This is in some ways the fiddliest part, because our PLL setup was generated in ISE, not Vivado, so I need to use the old ISE tool to modify it, as I am not game to manually fiddle with clock divider values, although it should in theory be possible.

Well, it turns out I did have to fiddle it by hand, and it looks like it worked just fine.  I only had to adjust the frequency divider values for the 30MHz clock to get it to be 30MHz x 4 = 120 MHz.

With that out the way, I have been working on frame_generator and pixel_driver to make them both work with the single 120MHz output clock.  I have it to the point where the two video modes are working again, but there are still funny cross-domain timing dependencies between the 100MHz and the 120MHz clock domains, which I am now working through.

And here is the point at which I find myself feeling completely stupid, or that I have missed something really basic and really important.

Vivado has a method for describing when a signal shouldn't be considered timing sensitive, for example, when it is a flag crossing timing domains.  This works using the set_false_path directive in a .xdc file.  It sounds great in theory, but I cannot for the love or money figure out what the exact signal name it expects.

Vivado logs might show a signal, like, say, pixeldriver/frame60/y_zero_driver100b, that is, the signal called y_zero_driver100b, inside something called frame60, inside something called pixeldriver.  However, when it comes to reading the XDC file, it claims that no signals match.  I have tried a variety of variations on this, but am no closer to a solution than when I started. What is really annoying is that everything seems to indicate that this is how it should be done, but nowhere can I find documented how you actually do it.

So after a bit of digging around, I discovered you can kind of test this in the Vivado IDE, by using the tcl console, to type things like:

get_cells pixeldriver/fifo_inuse*

That will tell you if it thinks anything matches or not.  After a bit of fiddling around, I was able to find things that claimed to be found. But then when synthesising, I keep getting errors that they aren't found.

Finally making some headway on this:  The synthesis logs WILL report errors on all the signals you reference in the first part of the log.  This is because the file gets read once BEFORE synthesis.  So, you should just ignore those errors, and only care if those errors appear TWICE, as it is is the second instance that matters.  Arggghhh!!!!

Also, it turns out you need funny suffixes on SOME but not ALL of the signals (if you put them on the wrong ones, they won't work, either).  So I had to add _reg_srl onto the end of the names of some, and _reg onto others.  I might work out why at some point.

Oh, well. I have at least figured out how to do this part.  Now of course, something else is not working, but I feel like  I have at least made some progress.

1 comment:

  1. Maybe, I can't understand the situation here fully ... But if 120MHz is great for being a multiplier for both of 30 and 40MHz pixel clock (50 and 60Hz video mode) as well, wouldn't it implies that 60MHz CPU clock would be better, since 120 is again an integer multiplier of 60, but not for 50 (currently used, in M65-fast mode, I guess)? Or the cross-clock domain problem is more about the video modes, and not between CPU and video things?