Monday, 11 November 2019

Improving raster timing accuracy / Preparing for HDMI output

While the PAL and NTSC video modes for the MEGA65 are already kind of okay, they are far from perfect for various reasons:

1. Current timing is rather approximate, in terms of cycles per raster and rasters per frame. We need to be able to have this exactly match the C64, if we want better compatibility.
2. For HDMI output, we also need to use "real" video modes.
3. The different pixel clocks for the PAL and NTSC modes cause various annoying problems.

In looking through all this, I have decided that the video modes that make the most sense are:

PAL: 720x576 50Hz
NTSC: 720x480 60Hz

These are widely supported HDMI-compatible modes used for TV, which have the added advantage that they both use a 27MHz pixel clock -- which means we can in theory ditch all the complexity I previously added for supporting 30MHz and 40MHz pixel rates for the old VGA-style modes I was using.

The first question I had was whether VGA-input monitors are able to display these modes.  My initial experimentation with my monitors here, and those of the MEGA team, seems to indicate that this is no big problem -- although some fine-tuning is required to get the best picture on most screens.  VGA is really only intended as a backup to the HDMI connector, in any case, so we are willing to accept moderate compromises on the VGA output, if required.

So, first step: modify the clock generator in the FPGA to provide me with 27MHz for the pixel clock, and some multiples of it, for when I need to do things within a clock tick.  This was a bit of a pain, because the MEGA65's clock generator was generated using the old Xilinx ISE toolset, which we no longer use, and which, even when we have tried reinstalling it, refuses to generate the updated clocking file.

The solution here was to run the ISE generator as far as I could, so that it would tell me the clock multiplier and divider values for the PLL, and then to import those by hand into my existing file.  Of course, this is a process prone to errors, so I generated a bitstream that just exports the clocks to some pins on a connector, so that I could verify that the frequencies were correct.  After a few false starts, I managed to get this sorted.
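(For the curious: one way to export a clock to a pin for this kind of check is to regenerate it through an ODDR flip-flop in the output block, rather than driving the clock net straight onto fabric I/O.  Here is a minimal sketch of that technique -- the entity and signal names are mine, and the actual test bitstream may have done it differently:

  library ieee;
  use ieee.std_logic_1164.all;
  library unisim;
  use unisim.vcomponents.all;

  entity clock_probe is
    port ( clk_pixel : in  std_logic;    -- clock under test
           probe_pin : out std_logic );  -- route to a header pin in the XDC
  end entity;

  architecture rtl of clock_probe is
  begin
    -- D1 is output during the high phase of the clock, D2 during the low
    -- phase, so driving 1/0 reproduces the clock waveform cleanly on the
    -- pin, where a frequency counter or oscilloscope can check it.
    oddr0 : ODDR
      generic map ( DDR_CLK_EDGE => "SAME_EDGE", INIT => '0', SRTYPE => "SYNC" )
      port map ( Q  => probe_pin,
                 C  => clk_pixel,
                 CE => '1',
                 D1 => '1',
                 D2 => '0',
                 R  => '0',
                 S  => '0' );
  end architecture;
)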

(One side-effect of this is that I couldn't use exactly the same 40MHz CPU clock, but had to increase the CPU clock to 40.625MHz.  That shouldn't hurt the timing closure too much, and will be a welcome ~1.5% CPU speed increase.)

Okay, but now I have a problem: we need a 650MHz clock, so that we can divide it in various ways to generate the 27MHz pixel clock and its multiples, while also being able to generate the ~40MHz CPU clock.  This works because 27x6 = 162 ~= 650/4 = 162.5, and 162.5/4 = 40.625 -- so all the clocks we need come out of 650MHz by integer division, with better than 1% error.  The problem is that I also need a 200MHz clock for the ethernet TX phase control, and there is no integer n which satisfies 200n = 650*.  So I have a problem.

(* Excluding the possibility of assistance from Stupid Bloody Johnson.)

One approach to solving this would be if the FPGA we are using has two such clock units that I could configure separately.  Reading the datasheet, it seems as though there should be 6 of them -- so I should just be able to create the 200MHz clock on a second one.  This would normally be problematic, because the phase of that 200MHz clock would have no fixed relationship to the 50MHz fundamental clock for the ethernet.  However, in this case, as the 200MHz clock is only used to adjust the phase of the ethernet TX clock versus that of the TX bits, the lack of a phase relationship shouldn't matter.  In fact, we could in theory use a different clock frequency for this function: quite possibly just 100MHz, allowing only 2 instead of 4 phase offsets for the ethernet.  That should be enough, and it solves the problem for us, so that's what I will do.
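To make the arithmetic concrete, here is a minimal sketch of the resulting clocking arrangement, assuming a 100MHz input clock (as on the Nexys4 boards) and Xilinx's MMCME2_BASE primitive.  The divider values follow from the maths above, but this is an illustration, not the actual MEGA65 clocking file:

  library ieee;
  use ieee.std_logic_1164.all;
  library unisim;
  use unisim.vcomponents.all;

  entity clocks_sketch is
    port ( clk_in_100 : in  std_logic;
           clk_pixel  : out std_logic;    -- 650/24 ~= 27.08MHz (~0.3% fast)
           clk_vic    : out std_logic;    -- 650/8   = 81.25MHz (3x pixel)
           clk_cpu    : out std_logic );  -- 650/16  = 40.625MHz
  end entity;

  architecture rtl of clocks_sketch is
    signal clkfb : std_logic;
  begin
    -- One MMCM with a 650MHz VCO yields all the video and CPU clocks.
    -- 200MHz would need 650/3.25, which no integer divider provides; the
    -- ethernet TX phase clock instead comes from a second MMCM (the part
    -- has several), since its phase relative to these clocks doesn't
    -- matter here.  (BUFGs on the outputs are omitted for brevity.)
    mmcm0 : MMCME2_BASE
      generic map ( CLKIN1_PERIOD    => 10.0,  -- 100MHz input
                    DIVCLK_DIVIDE    => 1,
                    CLKFBOUT_MULT_F  => 6.5,   -- VCO = 100MHz x 6.5 = 650MHz
                    CLKOUT0_DIVIDE_F => 24.0,
                    CLKOUT1_DIVIDE   => 8,
                    CLKOUT2_DIVIDE   => 16 )
      port map ( CLKIN1   => clk_in_100,
                 CLKFBIN  => clkfb,
                 CLKFBOUT => clkfb,
                 CLKOUT0  => clk_pixel,
                 CLKOUT1  => clk_vic,
                 CLKOUT2  => clk_cpu,
                 RST      => '0',
                 PWRDWN   => '0' );
  end architecture;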

So now I have the correct clocks.  The next step is to rework the frame generators, so that they natively use the 27MHz pixel clock to generate the frames.  In theory this should remove a pile of complexity from those files.  The MythTV database lists the two modes as:

720 480 60 Hz 31.4685 kHz ModeLine "720x480" 27.00 720 736 798 858 480 489 495 525 -HSync -VSync
...
720 576 50 Hz 31.25 kHz ModeLine "720x576" 27.00 720 732 796 864 576 581 586 625 -HSync -VSync

First, we see that the horizontal scan rates are approximately double what we would expect, with ~31kHz corresponding (coincidentally) to about 31 usec per physical raster line.  That's okay, because we are using double-scanning instead of interlace, so each C64-style raster is simply drawn twice.  The VIC-IV manages this internally, so we can ignore it (for now).

So, these look pretty good -- except that the NTSC C64 has 65 cycles per raster, compared with 63 cycles per raster for a PAL C64.  This means that if we want accurate timing, we need to adjust the NTSC mode's rasters to be 65/63 the duration of the PAL mode's.  We'll come back to this after we actually get the modes working correctly.
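To decode those modelines: the first four numbers after the pixel clock are active pixels, sync start, sync end, and total ticks per line, and likewise vertically in lines.  A sketch of how they might be captured as constants (the type and names are mine, not from the actual VIC-IV source), with the rates checked against the 27MHz pixel clock:

  package video_modes_sketch is
    type video_mode_t is record
      h_active, h_sync_start, h_sync_end, h_total : natural;
      v_active, v_sync_start, v_sync_end, v_total : natural;
    end record;

    -- ModeLine "720x480" 27.00 720 736 798 858 480 489 495 525
    constant MODE_NTSC_480P : video_mode_t
      := (720, 736, 798, 858, 480, 489, 495, 525);
    -- ModeLine "720x576" 27.00 720 732 796 864 576 581 586 625
    constant MODE_PAL_576P : video_mode_t
      := (720, 732, 796, 864, 576, 581, 586, 625);

    -- Sanity check at 27MHz:
    --   NTSC: 27,000,000 / 858 = 31.469kHz lines; / 525 lines = 59.94Hz
    --   PAL:  27,000,000 / 864 = 31.250kHz lines; / 625 lines = 50.00Hz
  end package;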

Now, previously I had a FIFO buffer that took the pixels generated by the VIC-IV on the 80MHz pixel clock, and allowed them to be read out using the 30MHz or 40MHz pixel clock of the actual display mode.  In theory, this should work quite well.  However, in practice, it always seemed to have jitter problems from somewhere, where the pixel positions would vary sideways a little.  Thus I am taking the FIFO out, and requiring that the pixels have explicit X positions when being written, so that the jitter cannot happen.  This buffer will get read out at the native 27MHz pixel clock rate, which should also completely eliminate the fat/skinny pixel problems caused by the video mode not matching what monitors expect, and monitors thus trying to latch the pixels at some rate other than 27MHz.

The only challenge with this re-work is that we need to make sure there is no vertical tear line when the VIC-IV catches up with the pixel driver outputting from the buffer.  I'll work on that once I get the basic scheme working, as there are a few ways to deal with it.

Actually, stop press on that...  The VIC-IV's internal pixel clock is now 81MHz and the video pixel clock is 27MHz, because we had to make them related in order to generate them all from the one clock unit on the FPGA.  That means the VIC-IV's internal pixel clock is exactly 3x the real pixel clock, so we can just clock a pixel out every 3 cycles, and completely dispense with the FIFO, buffer or whatever, AND avoid the vertical tear line, AND free up a precious 4KB BRAM in the FPGA in the process.  I'm not sure why it took so long for me to realise this...
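A minimal sketch of that idea (signal names are mine): run everything on the 81MHz clock, and let a strobe mark every third tick as a real 27MHz pixel, so no FIFO or clock-domain crossing is needed at all:

  library ieee;
  use ieee.std_logic_1164.all;

  entity pixel_div3 is
    port ( clk81        : in  std_logic;                      -- VIC-IV clock
           vic_rgb      : in  std_logic_vector(23 downto 0);  -- VIC-IV pixel
           pixel_rgb    : out std_logic_vector(23 downto 0);  -- 27MHz pixel
           pixel_strobe : out std_logic );                    -- 1 tick in 3
  end entity;

  architecture rtl of pixel_div3 is
    signal phase : integer range 0 to 2 := 0;
  begin
    process(clk81)
    begin
      if rising_edge(clk81) then
        if phase = 0 then
          pixel_rgb    <= vic_rgb;   -- clock a pixel out every 3rd cycle
          pixel_strobe <= '1';
        else
          pixel_strobe <= '0';
        end if;
        if phase = 2 then
          phase <= 0;
        else
          phase <= phase + 1;
        end if;
      end if;
    end process;
  end architecture;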

Anyway, now that I have realised it, I am in the process of restructuring everything to use this property.  It's really nice, in that it is allowing me to strip a pile of complexity out of the system.  It took a few hours to get the pixeltest bitstream working with it, but it now displays a nice stable pattern in both the 50Hz PAL and 60Hz NTSC video modes.

The next step was integrating the changes into the main targets for the MEGA65 PCB and Nexys4 boards.  This wasn't too hard, but there were a few places where I had to adjust the display fetch logic, to account for the fact that the display is now 720 pixels wide, instead of 800.  In particular, this was preventing the character / sprite / bitplane fetch logic from occurring each raster line, because it was waiting for the X position to reach 800, which of course would now never happen.
 
There was then about a two-month pause, while I had too heavy a workload at work to have any real spare time to investigate further.  After this involuntary pause, I started looking more seriously into the ADV7511 HDMI controller chip that we are using.

I'd previously had trouble talking to the I2C interface on this chip, so I started by trying to waggle the SDA and SCL lines of the device.  This revealed that I had one of the FPGA pin numbers incorrectly recorded.  After fixing that, I was finally able to communicate with the 7511, which was quite a nice step forward.  However, while I could read all the registers, it was refusing writes to most of them.  After some hunting around, I discovered that this was because it hadn't detected a monitor, which causes it to stay in power-down mode, in which you can't write to most of the registers.

The mystery of why it was stuck in power-down when I clearly had a monitor connected took a little bit of exploration to work out.  I hadn't realised that the MEGA65 main board has a kind of helper chip to protect the 7511 from static and other over-voltage hazards.  That chip has an output-enable line which, if not asserted, prevents its built-in level converter from connecting the 7511 to the actual HDMI port.  Once I figured that out, I was able to activate the output enable, and then I was suddenly able to get the 7511 to power up, and even to read the EDID information from the connected monitor, which can be seen in the lowest two rows of hex in the second image.  The first image shows the registers before being configured.  You can play spot-the-differences to work out some of the register settings I used ;)



The next step was to configure the various registers of the 7511 for operation.  The documentation lists a whole bunch of mandatory register settings that are required for correct operation, as well as a whole bunch of other registers that control the video format and other things.  It's quite hard to know exactly the right settings.

After some trial and error, and a general lack of HDMI picture appearing, I had a look at Mike Field's example VHDL for driving the same chip.  I'm not sure why I hadn't discovered this a long time ago.  It took a few hours to adapt his code to run on the MEGA65 main board, since it was intended for another board.  However, it too failed to show any display initially.  At this point I was quite despondent, as I had really expected his code to work, and thus to give me a starting point from which to slowly adapt across to my purposes.  It was also midnight, so I sensibly gave up and went to bed.

It wasn't until this evening that I sat down again to look at this, and it occurred to me that Mike's code might have been assuming the incorrect I2C address for the 7511, since it supports two different addresses.  It was wrong, but fixing it didn't fix the problem.  However, my own code had shown me that sometimes you have to send the I2C commands to the 7511 twice instead of once, if the monitor disconnects and reconnects again.  It seems that both of my HDMI monitors do this.  So I wrote a little routine to trigger the sending of these commands to the 7511 every second.  And FINALLY I had some sort of picture showing up on the HDMI output, as can be seen below.
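The retrigger routine itself is nothing fancy -- something along these lines (a sketch only, assuming the 27MHz clock is available to the I2C logic; entity and signal names are mine):

  library ieee;
  use ieee.std_logic_1164.all;

  entity i2c_retrigger is
    port ( clk27       : in  std_logic;    -- 27MHz pixel clock
           resend_regs : out std_logic );  -- 1-cycle pulse, once a second
  end entity;

  architecture rtl of i2c_retrigger is
    signal ticks : integer range 0 to 26_999_999 := 0;
  begin
    process(clk27)
    begin
      if rising_edge(clk27) then
        if ticks = 26_999_999 then
          ticks       <= 0;
          resend_regs <= '1';  -- kick off sending the register set again
        else
          ticks       <= ticks + 1;
          resend_regs <= '0';
        end if;
      end if;
    end process;
  end architecture;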

So here we see a couple of problems with the HDMI output (right screen) vs VGA (left screen), as well as me writing this blog post on my laptop ;)

1. The HDMI output has the same colour for every pixel on a raster.

2. The colours are All Wrong.

For (1), I don't (yet) have any firm idea, but it could be something to do with the way Mike's program generates the HDMI frame using some complicated DDR output mode that I don't need for our 27MHz pixel clock, and thus haven't really tried to understand.

For (2), amongst the pile of reading I have done on the ADV7511, I know that this is likely a colour-space conversion problem: if the 7511 believes it is being fed YCbCr when we are actually feeding it RGB, then RGB black (0,0,0) gets interpreted as Y=0, Cb=0, Cr=0, and since those chroma values sit far below their neutral value of 128, the result is a strong green.  That should thus be easy enough to fix by changing the default register setup.

Of these, (2) looks the easiest to fix for now.  So I'll start there.  There are two main registers on the 7511 that control the pixel format and related things:

The upper nybble of $15 controls the audio sample format, which we are not yet ready to worry about.  The lower nybble, however, controls the pixel format.  $x6 indicates YCbCr, which is what Mike's code was using.  That would explain things.  So I'll try $x0 instead, which should tell it to use RGB 4:4:4 (i.e., RGB where each colour channel is sampled at the same rate, i.e., Stink Normal RGB).  And indeed it makes quite a difference:

Note that while we can now see lots of pixels -- which probably means that (1) was not really a problem, just some weird artefact -- not all the colours are being produced the same on both outputs.  The keen observer will also notice that the image is slightly rotated horizontally.  We'll also worry about that later.

The next register of interest is register $16, which controls a number of things related to pixel format. Mike's code sets it to $37.  In my code, I had it set to $30.  The difference is that my value has "Not valid" instead of "Style 2" for the pin assignments for the pixels coming to the 7511. That's definitely a possible problem.  Bit 1 controls DDR rising/falling edge for certain modes, which shouldn't matter for us.  Bit 0 selects the input colour space, where 0 is RGB and 1 is YCbCr.  I'm going to try $30 in Mike's code, and see if it breaks the display of the image, and then if that's the case, try $34, which I think should work in my code.  Except... that it seems to make no difference to Mike's program when I do that (or to my code when I try the different settings).
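For reference, here are those two pixel-format registers written out as (register, value) pairs, the way an init table might hold them.  This is a sketch only -- the type and names are mine, the values are the ones discussed above, and the mandatory fixed registers from the ADV7511 programming guide would belong in the same table:

  library ieee;
  use ieee.std_logic_1164.all;

  package adv7511_regs_sketch is
    type reg_pair_t is record
      reg : std_logic_vector(7 downto 0);
      val : std_logic_vector(7 downto 0);
    end record;
    type reg_list_t is array (natural range <>) of reg_pair_t;

    constant PIXEL_FORMAT_REGS : reg_list_t := (
      -- $15: [7:4] audio sample format (ignore for now); [3:0]=0: RGB 4:4:4
      ( reg => x"15", val => x"00" ),
      -- $16: input style / DDR edge / input colour space, as discussed above
      ( reg => x"16", val => x"30" ) );
  end package;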

So then I tried copying in the colour space conversion registers, but again without any visible effect in my code.

Now I'm a bit confused.  Basically it seems to me that the only two differences are the use of the DDR versus simple delivery of pixels, and the video mode.  Neither of which should prevent the thing from displaying an image.  The video mode in particular should be fine, since the 7511 even reports that it detects the mode correctly as the EDTV 576p or 480p video mode, which means it thinks that the HSYNC and VSYNC timing are all just fine.

But just in case, and also to make it easier to progressively converge Mike's working code and my not working code, I am converting Mike's code to produce a 576p EDTV PAL image.  If that displays on HDMI, then I know it isn't the mode, and must presumably be to do with the pixel format, and that I might have to use DDR pixel delivery after all.

Okay, so video mode switched to 576p 50Hz, and with Mike's code it still displays just fine.  Well, fine-ish, since I have messed with the colour space stuff.  In adapting Mike's code to produce 576p, I ended up using a slightly different pixel clock of 1400/52 = 26.9-something MHz.


The monitor shows the mode correctly as 720x576 @ 50Hz, so this might be a good option for the MEGA65 to use, since it might get us back to a simpler set of clock frequencies.  But that's for another day.  First, let's get some sensible pixel colours happening again, by making sure we have the pixel format set to RGB etc.  And finally, we have a working HDMI image:



After simplifying the code, and re-working it to use the MEGA65-style externally supplied 27MHz pixel clock, it is now VERY similar in operation to my own HDMI driver that doesn't work.  This is really quite mysterious.  The main difference I can see so far is that Mike's code has an I2C sender that does the initialisation of the HDMI controller automatically, instead of it being bit-bashed, as in mine.  I've checked the registers that it sends, and I am pretty sure I send all of them too.  But I have a sneaking suspicion that there is still some subtle difference there.  So I might just try incorporating his I2C sender into the MEGA65 target next.
 
