Making a C64/C65 compatible computer: VIC-IV

Saturday, 22 November 2014

Working on the anti-aliaser

I finally got around to plumbing the alpha-blender together to begin testing it for use with anti-aliasing text.

Not surprisingly, there are some things that aren't quite right. Primarily, Xilinx's reference implementation of an alpha blender doesn't quite seem up to the 192MHz pixel clock. In particular, the blue channel is failing to meet timing closure, and seems to be delayed by a few pixels, as can be seen in the screen shots here:

The displacement is a fixed number of physical pixels, as can be seen if I change the horizontal scaling of the text generator:

The timing failure isn't too surprising, given that the alpha blender uses a double-clock rate to multiplex the DSP operations to save a bit of silicon. However, it means that the blender is effectively running at 192MHz x 2 = 384MHz.
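As a quick sanity check on those numbers, the clock periods can be worked out with a few lines of Python. This is only arithmetic on the two frequencies mentioned above; it says nothing about the actual slack that the Xilinx tools report.

```python
def period_ns(freq_mhz):
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

pixel_clock_mhz = 192
# The blender multiplexes its DSP operations on a double-rate clock.
blender_clock_mhz = pixel_clock_mhz * 2

print(f"{pixel_clock_mhz} MHz -> {period_ns(pixel_clock_mhz):.3f} ns per cycle")
print(f"{blender_clock_mhz} MHz -> {period_ns(blender_clock_mhz):.3f} ns per cycle")
```

In other words, the multiplexed DSP path has roughly half the time per operation that the rest of the pixel pipeline gets.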

I think I will have to modify the blender so that it doesn't need to multiplex DSP blocks, and can hopefully then meet timing, and avoid the colour planes separating like this.

Wednesday, 19 November 2014

Proportional font glitches fixed

It is almost a week since I fixed some underlying faults in the hardware support for drawing proportional fonts, together with some bugs in my font file generator, which was trimming the right-most pixels from some glyphs.

However, it is only tonight that I have managed to get the display working, because the character generator logic in the VIC-IV would seize up when trying to display variable-width full-colour characters. I eventually tracked the bug down to the paint_ready flag never being cleared if the VIC-IV was asked to draw a one-pixel wide 256-colour character.

This came about because the logic previously assumed that there would always be a 2nd-to-last pixel in a character, which is clearly not true when a character is trimmed to be just a single pixel wide.

Now that I have that fixed, I can finally draw variable-width anti-aliased* fonts. The following two screen shots are of the same Unicode string being rendered using 10 point and 20 point versions of the same font. You can get an idea of the size of characters from the grid of junk at the bottom of each.

Again, the matrix-like tint is due to the colour-cube approximation of the VNC viewer I used to capture the image. The characters look pure white on the VGA display.

Obviously the 20 point version (top) looks nicer than the 10 point version. That said, to my wooden eye, they both look pretty decent. The kerning looks acceptable, and both ascenders and descenders on characters are drawing fine.

I haven't optimised the unicode printing routine code at all yet, so it is still fairly slow to draw. Nonetheless, I can draw those several words in well under a frame, as can be seen in the following screenshot where I set the border to white while the drawing happens:

Earlier I said that I can now draw anti-aliased characters. This is currently true only if I want to consume the entire 256-colour palette on fake alpha-values. As discussed in earlier posts, I am part-way through implementing a real alpha-blender that will allow each character to be a different colour, and appear on a different background.
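To make the difference between the two approaches concrete, here is a minimal sketch of per-channel alpha blending in Python. It uses the textbook formula, which is an assumption on my part and not necessarily what the Xilinx reference blender or the VIC-IV does internally; the function names are made up for illustration.

```python
def blend_channel(fg, bg, alpha):
    """Blend one 8-bit colour channel.

    alpha = 255 gives the foreground, alpha = 0 gives the background.
    The +127 rounds to the nearest integer instead of truncating.
    """
    return (fg * alpha + bg * (255 - alpha) + 127) // 255

def blend_rgb(fg, bg, alpha):
    """Blend two (r, g, b) tuples using a single 8-bit alpha value."""
    return tuple(blend_channel(f, b, alpha) for f, b in zip(fg, bg))

# White text on a black background: the result is just the alpha value,
# which is why a grey-gradient palette can stand in for alpha for now.
print(blend_rgb((255, 255, 255), (0, 0, 0), 200))

# With a real blender, the same glyph data can be any colour on any background.
print(blend_rgb((255, 255, 0), (0, 0, 255), 128))
```

The first call shows why the grey-ramp palette trick works only for one fixed foreground and background pair, while the second shows what the real blender buys.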

I also need to get around to implementing variable height characters, so that there are no large blank regions between lines when the upper rows of pixels in the top-most row of characters of a font are unused.

The other thing I hope to work on soon is to update the little programme I wrote to display the unicode text so that it can access fonts located outside of the first 64KB of RAM. This will use the new 32-bit indirect zero-page addressing modes. Then I will be able to display text from different fonts on the same display.

Monday, 10 November 2014

Even more progress on proportional fonts

Hot on the heels of the earlier fixes of the evening, descenders (the bits of characters that hang below the base-line) and bits poking out the top are now correctly drawn, as can be seen in the screen-shot below.

Along the way I have fixed bugs in my unicode printing programme, and also a couple of bugs in my true-type font rasteriser (the top pixel of characters could be chopped off, and character tiles were being written to the file in the wrong order).

As with the other screen-shots, the Matrix-like green tint is an artefact of the image capture process I am using, and the bad kerning is pending a patch to the FPGA so that it stops ignoring the least significant bit of the kerning field of each character.

Once the horizontal kerning has been fixed, that just leaves the vertical spacing to fix. This will probably require both FPGA and more font-file format tweaking.

The above image was captured the same way as the one in the previous post. Then I realised that my VNC viewer was scaling the image, so below is the same image, this time really without any scaling:

Just for further fun and to prove that it works for larger font sizes, I produced a 24 point version of the same font to try out:

This revealed a few new glitches, such as the "t" in jupiter being mostly missing. There are also some other issues that I will have to look into, like the width of the right vertical of "H", and Alef (the right-most Hebrew character).

The disappearing "t" turned out to be a quick fix, so the final result for the night is as follows:

Note also that with a big font the suppression of the spare blank raster lines becomes less important. I still intend to fix it, however.

Sunday, 9 November 2014

More work on proportional fonts

I have done a bit more work on the unicode text display programme.

The following screen-shot is completely generated by the programme, in contrast to the screen-shot in the previous post where I manually populated the screen memory with the tiles and kerning adjustments.

The Unicode string used to feed the program is:

unicodestring:
.word 'H,'e,'l,'l,'o,$20,'W,'o,'r,'l,'d
.word $000d
.word $05d4,$05d3,$05d2,$05d1,$05d0
.word $000d
.word 'g,'a,'r,'y,$20,'j,'u,'p,'i,'t,'e,'r
.word $000d
.word 0

Character $20 is of course space, and the $05xx characters are some Hebrew glyphs I threw in just to show that we are not limited to latin characters.
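For anyone wanting to generate such a table rather than type it by hand, the following Python sketch builds the same sequence of 16-bit code points. It assumes the assembler's .word directive emits little-endian 16-bit values, as 6502 assemblers normally do, and the helper name is hypothetical.

```python
import struct

CR = 0x000D    # carriage return, ends each line
END = 0x0000   # terminator word at the end of the string

def encode_lines(lines):
    """Turn a list of text lines into a list of 16-bit code points."""
    words = []
    for line in lines:
        words.extend(ord(ch) for ch in line)
        words.append(CR)
    words.append(END)
    return words

lines = ["Hello World", "\u05d4\u05d3\u05d2\u05d1\u05d0", "gary jupiter"]
words = encode_lines(lines)

# Pack as little-endian unsigned 16-bit words, low byte first.
data = struct.pack("<%dH" % len(words), *words)
print(" ".join("$%04x" % w for w in words))
```

This only works for code points below $10000, which is all that a 16-bit string can hold anyway.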

So, how did it turn out?

The screen shot is actual size so that there are hopefully no strange things going on in terms of smoothing of the zoomed image.

The slight green sheen is an artefact of the C65GS VNC server which sends pixels using a 3x3x2 colour cube, and so suffers some colour distortion. This doesn't need fixing since it is purely a VNC artefact.

We still have the kerning glitch, because the VIC-IV isn't honouring the least significant bit of the kerning field, so H and W are followed by one more blank pixel than they should. This is on my list to fix in the FPGA programme. I'll likely rearrange the bits so that the kerning bits are all adjacent, instead of spread between two different bytes (one of which is clearly not being read properly).

The programme has, however, faithfully rendered the first line of Latin and Hebrew Unicode characters, including H and W that are two tiles wide.

It also handles the carriage-returns. However, it doesn't correctly calculate how many rows of characters are needed, nor does it allow for reducing the height of character rows to provide correct line spacing. This will require implementing the remaining hardware support for this in the VIC-IV.

Also, in the character rows that don't contain active non-blank glyphs, it isn't putting a blank tile there, nor is it adjusting the width of each to kern them to the correct width, so there is rubbish which is wider than the actual text on the 2nd and 3rd rows of each line of output. This won't be too hard to fix, as it is just a software issue.

Finally, the third row is really messed up, because the line of characters that are drawing the under-hanging pixels are being written over the main row, instead of being written into the next row down. This shouldn't be too hard to fix either, as it is just a software issue.

Also, it would be nice if the routine cleared the remainder of the screen, but that's really just icing on the cake.

So for now we have a nice bit of progress, and I might take a look at fixing the kerning and rows of junk bugs next.

Wednesday, 5 November 2014

Starting to write tests for 16-bit text mode, including proportional fonts

I have been bashing away at a little test program that demonstrates printing unicode strings using a proportional font.

This involves looking up the glyph in the font, then getting its tile map, and then building the screen lines to draw and several other steps. While none of these steps are too complicated, it does make for about 1KB of code, including switching screen modes and setting the palette to a grey gradient since I don't have the alpha-blender working yet.

I could have settled for just a simple hard-coded test, but I think it is worth exercising the whole idea of how I intend to draw proportional fonts to make sure that it is feasible.

Anyway, it's got late, and I haven't got the code working yet, but I did spend a few minutes hand picking the necessary tiles and setting the kerning values to narrow the characters down so that it looks half-decent.

The main visual glitch is that single pixel resolution of the kerning is not working for some reason, so the gap between "H" and "e" is one pixel too wide.

Also, the colour cube that the VNC viewer uses butchers the gradient, and so some of the shading looks weird. On the real screen this looks quite a bit better. Nonetheless, the result is pretty reasonable for a first go at it, and will certainly look nicer once I figure out why it is ignoring the least significant kerning bit so that there is no big gap between "H" and "e".

Tuesday, 4 November 2014

More work on proportional fonts

I am edging my way towards getting hardware proportional fonts working.

To recap progress to date, text mode can be switched to 16-bit mode, where two screen RAM bytes and two colour RAM bytes describe each character instead of one of each. Some of those extra bits can be used to specify the width of a character, between 1 and 8 pixels wide. Thus proportional fonts can be constructed using some full-width characters and some narrowed characters.

I have now written a little programme that takes a true-type font, and produces a font file composed of 8x8 character blocks. It isn't perfect, but it does create a simple file with a list of unicode points, tile maps that say which tiles go to make each glyph, and then the array of 8x8 tiles, 64 bytes each. You can see the source at:
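To give a feel for the storage cost, this Python sketch works out how many 8x8 tiles a glyph of a given pixel size needs, at 64 bytes per tile. It is an upper bound under my own assumptions: the real rasteriser may well skip or share blank tiles, and the function names are invented for illustration.

```python
TILE_SIZE = 8          # tiles are 8x8 pixels
BYTES_PER_TILE = 64    # one byte per pixel

def tiles_for_glyph(width_px, height_px):
    """Return (columns, rows) of 8x8 tiles needed to cover a glyph."""
    cols = -(-width_px // TILE_SIZE)   # ceiling division
    rows = -(-height_px // TILE_SIZE)
    return cols, rows

def glyph_bytes(width_px, height_px):
    """Return the uncompressed tile data size for one glyph, in bytes."""
    cols, rows = tiles_for_glyph(width_px, height_px)
    return cols * rows * BYTES_PER_TILE

# A hypothetical glyph 13 pixels wide and 20 pixels high.
print(tiles_for_glyph(13, 20), glyph_bytes(13, 20))
```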

http://github.com/gardners/c65gs-font-rasteriser

It doesn't do any compression of the 64-byte blocks, so the fonts are quite a bit bigger than they need to be. However, it should be very easy to write some 6502 assembler that, given a 16-bit unicode string, can set up a 16-bit text mode screen to display the text.

Simultaneously, I have been working on the anti-alias renderer for 8x8 tiles. It isn't done yet, but the alpha blender is in the design now. Assuming that it works, it shouldn't be too hard to plumb it in, and start displaying alpha-blended 8x8 character tiles.

Friday, 31 October 2014

More work towards hardware proportional fonts

I have spent a bit more time tonight working on hardware support for proportional fonts.

For those coming in late, the VIC-IV already has the ability to draw skinny characters 2, 4 or 6 pixels wide as well as the usual 8. This can be used to construct large characters from one or more 8x8 character blocks to make any even number of pixels in width on screen. For a large type face, each character may be several 8x8 character blocks wide.

This means that a row of proportional text may have a variable number of characters, because if there are skinny characters, then more will fit on a line. Conversely, if there is no text on the right of the display, then it doesn't make sense to waste RAM describing empty characters. Thus I have followed Jeremy's idea of implementing a special end of line marker, so that each row can differ in length, and we can hopefully use RAM much more efficiently when faced with large high-resolution text displays.

In the previous post I described the work on skinny characters.

Now I have just about finished implementing the end of line markers, although as I write there is one remaining bug which is quite obvious in the screenshot below in the form of the vertical bars that shouldn't be there:

At first glance, this looks mostly like a normal C64 text mode display. However the entire screen is described using only about 80 bytes each of screen and colour RAM:

The screen RAM:

:0400 01 C0 02 00 03 00 04 00 05 00 FF FF 06 00 07 00
:0410 08 00 FF FF FF FF FF FF FF FF FF FF 09 00 0A 00
:0420 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
:0430 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
:0440 0B 00 FF FF 0C 00 FF FF 0D 00 FF FF

The colour RAM:

:D800 00 01 02 03 04 05 07 01 02 02 02 02 02 0E 0F 0E
:D810 0E 0E 0E 0E 00 00 00 00 02 03 04 05 02 08 01 02
:D820 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E
:D830 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E
:D840 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E
:D850 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E

To follow what is going on, remember that in this mode ($D054 = $01), two bytes are used to describe each character. The first byte is the low 8 bits of the character number, and the low nybble of the second byte holds extra character number bits. The top two bits of the second byte set the width of the character as it is displayed on screen.

Thus $01 $C0 at $0400 draws only the left two pixels of the letter A, which shows up as the stumpy black line in the screen shot. The rest of the row of text is now offset by 6 pixels compared to normal.

Normal characters are encoded from $0402 - $0409. This is followed by $FF $FF which tells the VIC-IV that there is an early end of line. Thus the letter F described by $06 $00 at $040C-$040D appears at the beginning of the next line, and no other characters appear to the right of the letter E.
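The encoding can be modelled in a few lines of Python. This sketch decodes the screen RAM dump above into rows of characters, treating $FF $FF as the end of line marker. The width mapping (8, 6, 4, 2 for the top two bits 0 to 3) is inferred from $C0 giving a two pixel wide character, so treat it as an assumption; this is a software model only, not the VIC-IV logic.

```python
END_OF_LINE = (0xFF, 0xFF)

def decode_cell(lo, hi):
    """Decode one two-byte screen RAM entry into character number and width."""
    return {
        "char": lo | ((hi & 0x0F) << 8),   # 12-bit character number
        "width": 8 - 2 * (hi >> 6),        # top two bits: 0 -> 8 px ... 3 -> 2 px
    }

def decode_rows(screen_ram):
    """Split 16-bit text mode screen RAM into rows at each $FF $FF marker."""
    rows, current = [], []
    for i in range(0, len(screen_ram) - 1, 2):
        lo, hi = screen_ram[i], screen_ram[i + 1]
        if (lo, hi) == END_OF_LINE:
            rows.append(current)
            current = []
        else:
            current.append(decode_cell(lo, hi))
    if current:                 # data that ends without a final marker
        rows.append(current)
    return rows

# The screen RAM dump from $0400 to $044B, as listed above.
dump = bytes.fromhex(
    "01 C0 02 00 03 00 04 00 05 00 FF FF 06 00 07 00"
    "08 00 FF FF FF FF FF FF FF FF FF FF 09 00 0A 00"
    "FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF"
    "FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF"
    "0B 00 FF FF 0C 00 FF FF 0D 00 FF FF"
)

rows = decode_rows(dump)
for number, row in enumerate(rows):
    if row:
        print(number, [(cell["char"], cell["width"]) for cell in row])
```

Decoded this way, the 76 bytes of the dump describe 25 rows, most of them empty, compared with the 2,000 bytes that a full 40x25 screen would take at two bytes per character.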

Colour RAM is drawn in a somewhat strange way, that I will probably fix. Within a row, the colour RAM bytes are read one per character, and so $D801 has $01 (white), and this is applied to the letter B encoded in $0402-$0403. That seems quite reasonable. But following an end of line, the colour RAM address catches up with the screen RAM. What I intend to do is make the colour RAM address advance two bytes for each character, so that the extra byte of information can be used. I might use this to allow skinny characters to be odd widths, and also to have a kind of super-extended-background-mode, where the other bits select the background colour.

However, before I do any of that, I need to fix the bug that happens when a character row consists only of a $FF $FF end of row marker. In that case the length of the character data is incorrectly set to the maximum value, instead of zero, and so whatever rubbish was hanging around in the character raster buffer gets redrawn.

Although, as I write that, I am not entirely convinced that this is the whole story. Indeed, it seems that the three bars are the contents of $3894, $300C, and $80CB. Very strange.

Hopefully my little bug-fix will work, otherwise I will just specify that each row must have at least one character, so you would use something like $20 $00 $FF $FF to make a row empty with one space at the beginning.

Thursday, 30 October 2014

Hardware support for proportional fonts and other text mode effects

I was talking with Jeremy in the lab today, and we got talking about hardware support for proportional fonts on the C65GS. I had thought about doing this before, but had put it on the back-burner for a while, because I hadn't really come up with an elegant solution.

While we were talking today, however, we got talking about how I had 2 spare bits in the character number in 16-bit text mode, where two screen RAM bytes are used for each character. I don't remember exactly who came up with which part of it, but by the end of it, we had come up with a workable solution not only for hardware support for proportional fonts, but anti-aliased fonts, too (more on that in a future post).

Proportional fonts really just require specifying the width of each character, so that some can be narrower than others. With two spare bits, it was easy enough to allow characters to be 8, 6, 4 or 2 pixels wide. Characters more than 8 pixels wide can then be constructed with any number of 8 pixel wide characters, and one narrow character to make it easy to obtain any even number of pixels in width. It would be nice to allow odd widths, but it feels like it is a reasonable trade-off to have to round to the nearest 2 pixels.
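As a sketch of how software might pick the cells for a wide glyph, the following Python function splits a pixel width into full 8 pixel cells plus at most one narrow cell. Rounding odd widths up rather than down is my assumption, chosen so that no pixels are clipped; the function name is hypothetical.

```python
def cell_widths(glyph_width_px):
    """Return the cell widths (8, 6, 4 or 2) that together span a glyph."""
    if glyph_width_px < 1:
        raise ValueError("glyph must be at least one pixel wide")
    even = glyph_width_px + (glyph_width_px % 2)   # round odd widths up
    full, rest = divmod(even, 8)
    return [8] * full + ([rest] if rest else [])

for width in (2, 8, 13, 20):
    print(width, cell_widths(width))
```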

Similarly, tall fonts can be constructed with multiple rows of text. For fonts not a multiple of 8 pixels high, a raster split could be used to skip one or more rows of pixels.

The result is conceptually very simple. The main trade-off is that skinny characters still use the same amount of RAM as full-width ones, but that seems a reasonable trade-off. In fact, it was so simple that I was able to implement it in about an hour, and the main functionality worked first time, as can be seen below:

Each row has a different width specified for the "A" at the beginning of the line.

To use this feature, you first have to enable 16-bit text mode, where two bytes of screen memory describe each character on screen. This is done by setting bit 0 in $D054.

In terms of screen RAM, the memory for the rows here looks something like:

0400 01 00 02 00 03 00 04 00 05 00
0428 01 40 02 00 03 00 04 00 05 00
0450 01 80 02 00 03 00 04 00 05 00
0478 01 c0 02 00 03 00 04 00 05 00

The 01, 02 ... 05 are the character numbers for A through E, and these stay the same. For all but the A's, the 2nd byte is 00 to indicate that no special attributes are set for the characters B through E. However, the high-byte for the A characters is modified in each row to have all possible combinations in bits 6 and 7: the higher the value, the narrower the character.

While I am fiddling with text attributes, I should explain what the other bits in the high byte mean:

bits 0 - 3 = bits 11 - 8 of the character number. i.e., there can now be 4,096 characters in a character set.
bit 4 = flip character horizontally
bit 5 = flip character vertically

The following screen shot shows the flip bits in action:

The contents of screen RAM are:

0400 01 00 02 10 03 20 04 30 05 00
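The bit layout described above can be captured in a small Python helper that builds the two screen RAM bytes for a character. The function name and argument names are hypothetical, and the width encoding assumes the 8, 6, 4, 2 ordering implied by the earlier example.

```python
def encode_cell(char, width=8, flip_h=False, flip_v=False):
    """Return the two screen RAM bytes for one character in 16-bit text mode."""
    if not 0 <= char < 4096:
        raise ValueError("character number must fit in 12 bits")
    if width not in (8, 6, 4, 2):
        raise ValueError("width must be 8, 6, 4 or 2 pixels")
    hi = (char >> 8) & 0x0F            # bits 0-3: character number bits 8-11
    if flip_h:
        hi |= 0x10                     # bit 4: flip horizontally
    if flip_v:
        hi |= 0x20                     # bit 5: flip vertically
    hi |= ((8 - width) // 2) << 6      # bits 6-7: narrower = higher value
    return bytes([char & 0xFF, hi])

# Reproduce the row above: A plain, B flipped horizontally,
# C flipped vertically, D flipped both ways, E plain.
row = b"".join([
    encode_cell(1),
    encode_cell(2, flip_h=True),
    encode_cell(3, flip_v=True),
    encode_cell(4, flip_h=True, flip_v=True),
    encode_cell(5),
])
print(row.hex(" "))
```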

The ability to flip characters is designed to be used with full-colour text mode, where (some or all) characters on the screen consist of 64 8-bit pixels, providing a graphics mode that can be quickly scrolled.

Flipping characters in such a mode allows 64-byte characters to be reused in a graphical display without too much obvious repetition, e.g., for textures in games.

Combining this with variable width characters introduces even more opportunity to reuse characters, and thus allow more interesting and complex high-resolution graphics within the limits of the 128KB of chipram.

Wednesday, 29 October 2014

Hardware thumbnail generation is now usable

Today I had a chance to fix a few bugs with the hardware thumbnail generation, and actually test it out by writing a small program that does a raster split, with the thumbnail being drawn in 256-colour mode in the bottom corner of the screen every frame, as you can see below:

There are still a few glitches, but it is working pretty nicely. In particular, you can see that it is being drawn in real-time, because the thumbnail contains an image of the thumbnail that contains an image of the thumbnail :)

What you can't see is that the thumbnail is a bit different every frame, I think because the counter that decides which raster to look at isn't reset at the start of each frame. This causes some weird things to happen. Also, the thumbnail just contains the value of chosen pixels. I am considering changing this so that it shows the average of the pixels in the sample area of the raster line, but at the same time, the current scheme works fairly well.

This is a nice milestone for a few reasons.

First, the hardware thumbnail generator is clearly working to some reasonable degree.

Second, full-colour text mode is working fairly well as well, such that I could write this program.

And finally, I have actually written a programme that does something (slightly) useful, using C65GS special features, and it works :)

Next stop is to fix the counter problem, and see if I am then happy enough with it, and if so, to move on to some of the other interesting things in my queue, like enhanced sprites.

Thursday, 23 October 2014

Hardware thumbnail generator for task-switcher

One of the main reasons for implementing the hypervisor is so that it will be possible to switch between different tasks running on the machine. The tasks won't be running at the same time, but rather they will be suspended while another task is running.

For a task-switcher to be nice, it would be really handy to be able to show a low-res screen-shot of the last state of each task so that the user can visually select which one they want. In other words, to have something that is not too unlike the Windows and OSX window/task switcher interfaces.

However, this is tricky on an 8-bit computer that has no frame buffer, and may be using all sorts of crazy raster effects.

Thus I need some way to have the VIC-IV update a little low-res screen shot, i.e., a thumbnail image, that the hypervisor can read out, and retain for later task-switching calls to show the user what was running in each task before they were suspended.

So I set about implementing a little 4KB thumbnail buffer which is automatically written to by the VIC-IV, and which can be read from the hypervisor. This resolution allows for 80x50, which should be sufficient to get the idea of what is on a display. Each pixel is an 8-bit RRRGGGBB colour byte.

Because the VIC-IV writes the thumbnail data directly from the pixel stream, it occurs after palette selection, sprites and all raster effects. That is, the thumbnails it generates should be "true".

After a bit of fiddling around, it is mostly working.

To test it, I wrote a little BASIC programme that reads from the one-byte access to the 4KB buffer, copying it to $4000-$4FFF. Then I used the serial monitor to grab that copy of the data, and wrote some UNIX shell scripts and a little C programme to munge it into an 80x50 Windows BMP file.
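For illustration, here is roughly what the conversion step looks like in Python. This is a sketch and not the C programme mentioned above: it writes a binary PPM instead of a BMP because the format is simpler, and it assumes the buffer is laid out row by row from the top-left pixel, with the red bits in the top three bits of each byte.

```python
WIDTH, HEIGHT = 80, 50

def rrrgggbb_to_rgb(value):
    """Expand one RRRGGGBB byte into an (r, g, b) tuple of 8-bit values."""
    r = (value >> 5) & 0x07
    g = (value >> 2) & 0x07
    b = value & 0x03
    # Scale 3-bit and 2-bit channels so full intensity maps to 255.
    return (r * 255 // 7, g * 255 // 7, b * 255 // 3)

def thumbnail_to_ppm(buffer):
    """Convert the first 80x50 bytes of the thumbnail buffer to a binary PPM."""
    pixels = buffer[:WIDTH * HEIGHT]
    if len(pixels) < WIDTH * HEIGHT:
        raise ValueError("thumbnail buffer is too short")
    header = b"P6\n%d %d\n255\n" % (WIDTH, HEIGHT)
    body = bytes(channel for p in pixels for channel in rrrgggbb_to_rgb(p))
    return header + body

# Example with a dummy 4KB buffer holding a colour ramp.
dummy = bytes(i & 0xFF for i in range(4096))
ppm = thumbnail_to_ppm(dummy)
print(len(ppm), "bytes of PPM data")
```

Only 4,000 of the 4,096 bytes are used by an 80x50 image, so the last 96 bytes of the buffer are ignored here.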

Here is how it looks, with the image rather enlarged to make it easier to see:

While not perfect, it is an improvement on the first capture, where I forgot to read from the start of the thumbnail buffer, so it was all out of whack:

I need to find out what is causing the "clouds", and also why it is writing only 77 pixels per line instead of 80 pixels per line.

But other than these problems, I am well on the way to being able to present a nice graphical display to allow for switching between tasks from the hypervisor.

Thursday, 16 October 2014

Sprites behind the border, and another bug discovered

Our almost-4yo went to sleep on the way home at 16h30 today, and so as a result is now up at 02h00. While I'd rather be sleeping, being up with him for a while gives me the chance to try the latest change that I left synthesising when I went to bed. That change was to make the VIC-II sprites honour the border.

My favourite way to test sprites at the moment is to run Lemmings. This confirmed that the sprites were now honouring the border. I also finally remembered the controls for Lemmings to start a game, and was pleasantly surprised to find that the game works, with little lemmings walking around the place as they should. The game is raster interrupt driven, so the speed was more or less correct as well, as you can see from the following screen shot:

I also learned two extra things:

1. Lemmings apparently uses sprites for the main display.

2. I have a bug where the bottom row of each sprite appears first.

I was also unable to see the cross-hairs, which I assume must be done with characters or bitmap data.

A quick check in VICE confirmed that this is indeed how the cross-hairs are drawn. So now I need to find out what is going wrong with this on the C65GS. I do at least now know that it is in characters $FC and $FD, and the screen is at $4000 for half the frames.

A quick bit of poking around has revealed the problem: I haven't implemented sprite background priority yet, so the sprites are hiding the cross hairs.

In theory, I should be able to use the joystick to move the cross-hairs to a blank section so that I can see it, however, for some reason joystick control isn't working. Maybe I have messed up the joystick CIA input in some way. I'll have to investigate this further, along with the sprite display problem.

Tuesday, 14 October 2014

More work on sprites

I don't have any nice screen shots to put in here (but I might add some in later), but I have been working on VIC-II sprites.

These sprites are now displaying properly, apart from the lack of border/foreground priority and hardware collision detection. Sprite positions are now correct with regard to the text/bitmap screen.

Unfortunately, adding the extra logic to the VIC-IV memory access paths has thrown FPGA timing closure out the window for now.

The 192MHz pixel clock requires timing within about 5.1ns, but is currently sitting around 7.3ns. It is an amazing testimony to the Artix7 FPGAs that the system still seems to run flawlessly. This is partly because the FPGA is speed rated for operation at 85 degrees Centigrade and an operating voltage of 0.95 Volts instead of the nominal 1.00 Volt supply.

In any case, I want to get the timing at least close to meeting closure (i.e., being fast enough), so that I can avoid problems later, and also to make sure that everything else that I want to add will still fit.

My approach to this at the moment is to unify the VIC-II compatibility sprite data fetches so that there is only one extra data stream that has to be plugged into the chipram/fastram. I am part way through this, and have already improved timing to about 6.6ns, and it looks like it shouldn't be too hard to further improve on this.

I have also started thinking about the design for the new sprites. This is all subject to change, but here is what I am thinking about at the moment:

The basic design of the new VIC-IV sprites, is that each sprite will have a dedicated 4KB memory buffer, and will be strictly one byte per pixel. This allows for sprites of up to 64x64 256 colour pixels.

Like with the VIC-II, one physical sprite can be used multiple times on a frame without reloading the data by altering the data offset within the 4KB block, and possibly the height and width of the sprite. I am also thinking about allowing sprites to be much wider.

Foreground/background priority will be by applying a bit mask to the character/bitmap data to decide whether it should appear in front of the sprite or behind the sprite. This will allow sprites and the background to perform many of the functions of Amiga-style bit planes, although the way it will be done will be rather different.

Bit masks are also provided to allow modification of the colours of sprites. For example applying an AND mask of $1f and an OR mask of $80 will translate all colours to $80-$9F. This can be used to allow a common image to be used for different characters in a game, with selected colours being altered. The 256 colour sprite palette can be separated from the bitmap palette, so there is improved flexibility compared to just applying bit masks to a flat 256 colour palette shared by all on-screen elements. If I get really excited it might even be possible to use the other two 256 colour palettes for different sprites.
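The mask arithmetic is easy to model in Python. This sketch applies the AND mask first and then the OR mask, and treats a pixel as transparent when the AND result is zero, as noted in the planned register list. The order of operations and the function name are my assumptions, since none of this exists in hardware yet.

```python
def remap_colour(colour, and_mask, or_mask):
    """Apply the sprite colour masks to one pixel value.

    Returns None when the pixel should not be drawn (AND result is $00).
    """
    masked = colour & and_mask
    if masked == 0:
        return None
    return masked | or_mask

# The example from the text: AND $1F, OR $80.
visible = {remap_colour(c, 0x1F, 0x80) for c in range(256)} - {None}
print("$%02X-$%02X" % (min(visible), max(visible)))
```

Note that with transparency handled this way, $80 itself is never produced, because a zero AND result hides the pixel instead.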

Finally, I intend to provide hardware scaling and rotation support. I thought about having simple angle and zoom factor settings, but currently I am thinking that I will simply provide a linear 2D transformation matrix per sprite so that other effects can also be used.

The registers for the VIC-IV sprites are currently planned to live at $D710-$D7FF, allowing for up to 15 of these sprites, but there may end up being fewer than this depending on how many I can wrangle in.

All this is subject to change, as is the register map, but here is the structure I am currently looking at:

$D7x0-$D7x1 - Enhanced sprite X position in physical pixels (lower 12 bits)
$D7x1.4-7 - Enhanced sprite width (4 -- 64 pixels)
$D7x2-$D7x3 - Enhanced sprite Y position in physical pixels (lower 12 bits)
$D7x3.4-7 - Enhanced sprite height (4 -- 64 pixels)
$D7x4 - Enhanced sprite data offset in its 4KB SpriteRAM (x16 bytes)
$D7x5 - Enhanced sprite foreground mask
$D7x6 - Enhanced sprite colour AND mask (sprite not visible if result = $00)
$D7x7 - Enhanced sprite colour OR mask
$D7x8-$D7x9 - Enhanced sprite 2x2 linear transform matrix 0,0 (5.11 bits)
$D7xA-$D7xB - Enhanced sprite 2x2 linear transform matrix 0,1 (5.11 bits)
$D7xC-$D7xD - Enhanced sprite 2x2 linear transform matrix 1,0 (5.11 bits)
$D7xE-$D7xF - Enhanced sprite 2x2 linear transform matrix 1,1 (5.11 bits)
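To show how simple angle and zoom settings could be turned into the four matrix registers, here is a Python sketch that converts a rotation and scale into 5.11 fixed-point values. It assumes the values are signed two's complement with 11 fractional bits, and it leaves open whether the hardware would want the forward or the inverse transform; both points are guesses, since the design is not final.

```python
import math

FRAC_BITS = 11   # 5.11 format: 5 integer bits, 11 fractional bits

def to_fixed_5_11(value):
    """Convert a float to a 16-bit 5.11 fixed-point word (two's complement)."""
    raw = round(value * (1 << FRAC_BITS))
    if not -0x8000 <= raw <= 0x7FFF:
        raise ValueError("value does not fit in 5.11 format")
    return raw & 0xFFFF

def sprite_matrix(angle_degrees, zoom=1.0):
    """Return the matrix entries [0,0], [0,1], [1,0], [1,1] as 5.11 words."""
    angle = math.radians(angle_degrees)
    c = math.cos(angle) * zoom
    s = math.sin(angle) * zoom
    return [to_fixed_5_11(v) for v in (c, -s, s, c)]

for angle, zoom in ((0, 1.0), (90, 1.0), (0, 2.0)):
    print(angle, zoom, ["$%04X" % w for w in sprite_matrix(angle, zoom)])
```

With 5 integer bits including the sign, the largest representable factor is just under 16, which is plenty for scaling a sprite of at most 64 pixels.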

The attentive reader will note that nowhere does this address the 4KB data blocks for each sprite. This will be direct mapped in the 28-bit address space. I am tossing around the idea of over-mapping it with the 64KB colour RAM at $FF80000 (the first 1KB of which is also available at $D800 for C64 compatibility). The reason for this is that the 4KB sprite RAM will probably be write-only to simplify the data plumbing. However, to allow for freezing (and hence multi-tasking), I really do want some way to read the sprite data. The trade-off of course is that this means that you wouldn't be able to use all 64KB for colour RAM if it is also being used as a proxy to the sprite RAM data.

Monday, 6 October 2014

Initial work on sprites

Last night I didn't sleep solidly, so I got up and did a bit more work on implementing VIC-II sprites in the C65GS's VIC-IV.

The focus here is on implementing "normal" C64/C128/C65 sprites for existing software. As such the focus is not on adding new functionality to these sprites, in particular allowing more colours or more than 8 sprites (although I am planning to relax the 21 pixel high limitation to allow taller sprites, and if all goes well, I may also allow wider sprites).

Along with the SID chip, it is the sprites that really made the C64 stand out from its competition in the early 1980s. Therefore it is important that I get them right, and so far as possible implement all required functionality. So let's just go over what the sprites are, and how they work on the VIC-II/VIC-III (they behave identically on the C64/128 VIC-II and C65 VIC-III).

Basically the sprites are bitmap objects that are drawn either on top of or behind the background graphics in real-time as the frame is drawn raster by raster. This is done with dedicated hardware support in the VIC-II/III chips that allows the user to simply provide the X and Y coordinates at which to display each sprite, and a pointer to the start of the bitmap data. There are also some special flags to modify the priority of the sprites with regard to the rest of the display, so that they can appear "in front" or "behind" the main graphics -- and this can be controlled separately for each sprite. There is also hardware detection for sprite-to-sprite and sprite-to-foreground collision that can be used in games to detect when things touch. Altogether, this allows much more advanced games and graphics on the 1MHz CPU of a C64 compared to contemporary machines. The cost of this flexibility and power is that the sprites consume about 3/4 of the space in the VIC-II, however history has shown that this was a great investment.

The 8 sprites have a fixed priority with respect to one another, so that lower numbered sprites will always appear in front of higher numbered sprites. This can be easily implemented by creating a pipeline of 8 identical sprite blocks, each of which draws over the output of the previous sprite.
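As a rough illustration of that fixed priority, here is a minimal software sketch of one pixel passing through a chain of sprite stages. It is only a model of the idea, not the VHDL in the C65GS: the function name `composite_pixel` and the use of `None` for a transparent sprite pixel are assumptions made for this example, and the sprite-to-background priority flags are ignored.

```python
from typing import Optional, Sequence


def composite_pixel(background: int,
                    sprite_pixels: Sequence[Optional[int]]) -> int:
    """Pass one pixel through a chain of sprite stages.

    sprite_pixels[n] is the colour sprite n wants to draw at this pixel,
    or None where sprite n is transparent (hypothetical convention).
    """
    colour = background
    # The pixel visits the highest numbered sprite first and sprite 0 last,
    # so each lower numbered sprite draws over the output of the stage
    # before it and therefore ends up in front.
    for n in reversed(range(len(sprite_pixels))):
        if sprite_pixels[n] is not None:
            colour = sprite_pixels[n]
    return colour
```

In this model, if sprites 1 and 7 both cover a pixel, the colour of sprite 1 is what comes out of the end of the chain.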

There is some circumstantial evidence to suggest that this is exactly what the VIC-II/III does, as there is a 12 pixel latency in its video pipeline, and it is reasonable to suspect that 8 of those cycles are for the 8 sprite compositing stages. Staging the sprites in a linear pipeline also makes it easier to meet the timing requirements, because the sprite signals need only move to the next sprite in the pipeline. The alternative would be to gather all of the sprite signals together in some other way, for example with a tree structure, which would be possible but harder to route. This is especially relevant for the C65GS, where the video dot clock is running at 192MHz, and so I have to keep the logic depth shallow and avoid dependencies on distant signals.

This pipeline is what I have managed to get working at present, as can be seen in the following screen shot:

There are a couple of obvious things:

1. The red sprite is visible over the top border. This is because I don't have border masking active for sprites. This will be easy enough to do, but I will defer it until I have finished the rest of the work on the sprites, as it is convenient in the meantime to see the sprites wherever they are.

2. The sprites are showing a solid block of colour. This is because I haven't implemented the fetching of the bitmap data by the VIC-IV, and feeding it into the sprite pipeline (more on this in a moment).

There are also some things not working that you can't see right now, for example foreground/background priority, and the hardware collision detection stuff.

However, what is clear is that the sprites do work, and the synthesis results show that by using the pipelined approach I described above, the timing of the design in the FPGA is no worse than before. The sprites themselves are currently consuming about 5% of the entire FPGA, which is quite acceptable. The complete design is now consuming about 42% of the FPGA.

Now, back to feeding bitmap data into the sprite pipeline. As I mentioned earlier, at 192MHz it isn't actually possible to feed data into (or extract data out of) all 8 sprites in parallel, because the logic depth and physical distance on the FPGA die become too great.

To get around this, I have constructed a data delivery pipeline that allows the VIC-IV to feed bitmap data to any of the 8 sprites, and it is forwarded by each sprite to the following sprite. Thus in return for a latency of 8 cycles, we can deliver bitmap data to any sprite without messing up the timing closure of the design.
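The forwarding scheme can be modelled in software as a chain of registers, one per sprite. The sketch below is only an illustration under stated assumptions: the `SpriteStage` class, the `(sprite number, byte)` packet format, and the `run_chain` helper are all hypothetical names invented for this example, not the signals used in the real design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

# Hypothetical packet format: (target sprite number, bitmap byte).
Packet = Tuple[int, int]


@dataclass
class SpriteStage:
    number: int
    received: List[int] = field(default_factory=list)
    # Register holding the packet that is forwarded on the next cycle.
    latch: Optional[Packet] = None

    def clock(self, incoming: Optional[Packet]) -> Optional[Packet]:
        """Advance one cycle: keep bytes addressed to us, forward everything."""
        outgoing = self.latch
        if incoming is not None and incoming[0] == self.number:
            self.received.append(incoming[1])
        self.latch = incoming
        return outgoing


def run_chain(stages: Sequence[SpriteStage],
              packets: Sequence[Packet],
              extra_cycles: int = 8) -> None:
    """Feed packets into the first stage, one per cycle, then idle cycles
    so that everything drains through to the last stage."""
    for packet in list(packets) + [None] * extra_cycles:
        signal: Optional[Packet] = packet
        for stage in stages:
            # Each stage only ever talks to its immediate neighbour.
            signal = stage.clock(signal)
```

Each stage sees a packet one cycle after the stage before it, so the last sprite in the chain receives its data several cycles after the first. No stage ever needs a signal from anywhere but its neighbour, which is the property that matters for timing closure.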

This allows the VIC-IV to feed data to the sprites. However, it still needs to know what address to fetch the data from.

One of the rather strange tricks the VIC-II used to reduce the number of registers in the design is that a few bytes at the end of screen RAM are used to hold the data pointers for the sprites. The Y position within each sprite is then multiplied by 3 and added to the base address from this pointer to work out which 3 bytes need to be fetched and buffered in each sprite.
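The address calculation can be sketched as follows. The constants are taken from the standard VIC-II layout on the C64 (pointers in the last 8 bytes of the 1KB screen matrix, each selecting a 64-byte block within the current video bank) rather than from anything specific to the VIC-IV. The function names and the flat `memory` array are assumptions made for this example.

```python
SPRITE_POINTER_OFFSET = 0x3F8  # last 8 bytes of the 1KB screen matrix (VIC-II)
SPRITE_BLOCK_SIZE = 64         # each pointer value selects a 64-byte block
BYTES_PER_ROW = 3              # 24 pixels wide at 1 bit per pixel


def sprite_row_address(memory: bytes, screen_base: int,
                       sprite: int, row: int) -> int:
    """Address (within the video bank) of the 3 bytes for one sprite row."""
    pointer = memory[screen_base + SPRITE_POINTER_OFFSET + sprite]
    base = pointer * SPRITE_BLOCK_SIZE
    return base + row * BYTES_PER_ROW


def fetch_sprite_row(memory: bytes, screen_base: int,
                     sprite: int, row: int) -> bytes:
    """Fetch the 3 bitmap bytes that would be buffered for this row."""
    address = sprite_row_address(memory, screen_base, sprite, row)
    return bytes(memory[address:address + BYTES_PER_ROW])
```

For example, with the screen at $0400 and the pointer for sprite 2 set to 13, row 5 of that sprite comes from 13 x 64 + 5 x 3 = 847 ($034F).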

On the VIC-IV, the sprites exist outside of the main design due to the timing issues described above. Thus there has to be a third data pipeline that allows the sprites to tell the VIC-IV the Y position they are currently drawing. The VIC-IV can then fetch the required bytes, and pass them through the data pipeline.

All of these extra paths are plumbed through the sprite pipeline, but a few important pieces are not finished. Hopefully I will be able to get these things done in the not too distant future.

After that, it will be time to implement the VIC-IV enhanced sprites, for which I have a few ideas.
