Last night I didn't sleep solidly, so I got up and did a bit more work on implementing VIC-II sprites in the C65GS's VIC-IV.
The focus here is on implementing "normal" C64/C128/C65 sprites for existing software. As such, I am not yet adding new functionality to these sprites, in particular more colours or more than 8 sprites (although I am planning to relax the 21 pixel high limitation to allow taller sprites, and if all goes well, I may also allow wider sprites).
Along with the SID chip, it is the sprites that really made the C64 stand out from its competition in the early 1980s. Therefore it is important that I get them right, and so far as possible implement all required functionality. So let's just go over what the sprites are, and how they work on the VIC-II/VIC-III (they behave identically on the C64/128 VIC-II and C65 VIC-III).
Basically, the sprites are bitmap objects that are drawn either on top of or behind the background graphics in real time as the frame is drawn raster by raster. This is done with dedicated hardware support in the VIC-II/III chips that allows the user to simply provide the X and Y coordinates at which to display each sprite, and a pointer to the start of the bitmap data. There are also some special flags to modify the priority of the sprites with regard to the rest of the display, so that they can appear "in front of" or "behind" the main graphics, and this can be controlled separately for each sprite. There is also hardware detection for sprite-to-sprite and sprite-to-foreground collision that can be used in games to detect when things touch. Altogether, this allows much more advanced games and graphics on the 1MHz CPU of a C64 compared to contemporary machines. The cost of this flexibility and power is that the sprites consume about 3/4 of the space in the VIC-II. However, history has shown that this was a great investment.
The 8 sprites have a fixed priority with respect to one another, so that lower numbered sprites always appear in front of higher numbered sprites. This can be easily implemented by creating a pipeline of 8 identical sprite blocks, each of which draws over the output of the previous sprite.
There is some circumstantial evidence to suggest that this is exactly what the VIC-II/III does, as there is a 12 pixel latency in its video pipeline, and it is reasonable to suspect that 8 of those cycles are for the 8 sprite compositing stages. Also, staging the sprites in a linear pipeline makes it easier to meet the timing requirements, because the sprite signals need only move to the next sprite in the pipeline, instead of all having to be gathered together in some other way, such as a tree structure (although that would also be possible). This is especially relevant for the C65GS, where the video dot clock is running at 192MHz, and so I have to keep the logic depth shallow and avoid dependencies on distant signals.
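To make the compositing idea concrete, here is a minimal software sketch in Python. It is not the actual hardware description, only a model of the behaviour. The names TRANSPARENT, sprite_stage and composite are made up for illustration, and the stage order (sprite 7 first, sprite 0 last) is an assumption chosen so that the lowest numbered sprite ends up in front. Foreground/background priority and collision detection are left out.

```python
TRANSPARENT = None  # hypothetical marker: this sprite has no pixel at this position


def sprite_stage(pixel_in, sprite_pixel):
    """One pipeline stage: the sprite's pixel replaces the incoming pixel
    unless the sprite is transparent at this position."""
    return pixel_in if sprite_pixel is TRANSPARENT else sprite_pixel


def composite(background, sprite_pixels):
    """Pass one pixel through 8 identical stages.

    sprite_pixels[n] is the pixel sprite n wants to draw (or TRANSPARENT).
    Assumption: sprite 7 is the first stage and sprite 0 the last, so each
    stage draws over the previous one and sprite 0 wins any overlap.
    """
    pixel = background
    for n in range(7, -1, -1):
        pixel = sprite_stage(pixel, sprite_pixels[n])
    return pixel
```

In hardware each stage would be a registered block, so a pixel would take 8 clocks to pass through all 8 stages; the loop above collapses that into a single function call.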
This pipeline is what I have managed to get working at present, as can be seen in the following screen shot:
There are a couple of obvious things:
1. The red sprite is visible over the top border. This is because I don't have border masking active for sprites. This will be easy enough to do, but I will defer it until I have finished the rest of the work on the sprites, as it is convenient in the meantime to see the sprites wherever they are.
2. The sprites are showing a solid block of colour. This is because I haven't implemented the fetching of the bitmap data by the VIC-IV, and feeding it into the sprite pipeline (more on this in a moment).
There are also some things not working that you can't see right now, for example foreground/background priority, and the hardware collision detection stuff.
However, what is clear is that the sprites do work, and the synthesis results show that by using the pipelined approach I described above, the timing of the design in the FPGA is no worse than before. The sprites themselves are currently consuming about 5% of the entire FPGA, which is quite acceptable. The complete design is now consuming about 42% of the FPGA.
Now, back to feeding bitmap data into the sprite pipeline. As I mentioned earlier, at 192MHz it isn't actually possible to feed data into (or extract data out of) all 8 sprites in parallel, because the logic depth and physical distance on the FPGA die become too great.
To get around this, I have constructed a data delivery pipeline that allows the VIC-IV to feed bitmap data to any of the 8 sprites, and it is forwarded by each sprite to the following sprite. Thus in return for a latency of 8 cycles, we can deliver bitmap data to any sprite without messing up the timing closure of the design.
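The forwarding scheme can be modelled in a few lines of Python. This is only a behavioural sketch under my own assumptions: the class SpriteStage, the function run and the (sprite_number, byte) item format are hypothetical, and the real design may tag and buffer the data differently.

```python
NUM_SPRITES = 8


class SpriteStage:
    """One sprite in the chain, with a single register towards the next sprite."""

    def __init__(self, number):
        self.number = number
        self.buffer = []   # bytes this sprite has captured for itself
        self.latch = None  # item held for the next stage, one clock later

    def clock(self, incoming):
        """Capture data addressed to this sprite, latch the incoming item,
        and hand the previously latched item on to the following stage."""
        outgoing = self.latch
        if incoming is not None and incoming[0] == self.number:
            self.buffer.append(incoming[1])
        self.latch = incoming
        return outgoing


def run(stages, feed):
    """Clock the whole chain once per entry in feed.

    Each entry is either None (nothing sent) or a (sprite_number, byte) pair.
    """
    for item in feed:
        for stage in stages:
            item = stage.clock(item)
```

In this model, a byte sent to sprite 0 is captured on the first clock, while a byte sent to sprite 7 is captured on the eighth, which matches the latency of 8 cycles mentioned above. Each stage only ever talks to its immediate neighbour, which is the property that keeps the signal paths short.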
This allows the VIC-IV to feed data to the sprites. However, it needs to know what address to fetch the data from.
One of the rather strange tricks the VIC-II used to reduce the number of registers in the design is that a few bytes at the end of screen RAM are used to hold the data pointers to the sprites. The Y position within each sprite is then multiplied by 3 and added to the base address from this pointer to work out which 3 bytes need to be fetched and buffered in each sprite.
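As a worked example of that address arithmetic, here is a small Python sketch. The constants are the standard VIC-II ones rather than anything specific to this post: the 8 pointer bytes sit at offset $3F8 from the start of the screen matrix, and each pointer value selects a 64-byte block within the current video bank. The function names are made up for illustration, and the VIC-IV may of course end up doing this differently.

```python
SPRITE_POINTER_OFFSET = 0x3F8  # last 8 bytes of the 1KB screen matrix (standard VIC-II)
SPRITE_BLOCK_SIZE = 64         # each pointer value selects one 64-byte block
BYTES_PER_ROW = 3              # 24 pixels wide at 1 bit per pixel


def sprite_pointer_address(screen_base, sprite):
    """Address of the pointer byte for the given sprite (0-7)."""
    return screen_base + SPRITE_POINTER_OFFSET + sprite


def sprite_row_address(pointer_value, row):
    """Address, relative to the video bank, of the first of the 3 bytes
    for the given row (0-20) of a sprite."""
    return pointer_value * SPRITE_BLOCK_SIZE + row * BYTES_PER_ROW
```

With the default C64 screen at $0400, the pointer for sprite 0 lives at $07F8. If that byte holds 13, row 0 of the sprite starts at 13 * 64 = 832 ($0340) and row 20 starts at 832 + 20 * 3 = 892.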
On the VIC-IV, the sprites exist outside of the main design due to the timing issues described above. Thus there has to be a third data pipeline that allows the sprites to tell the VIC-IV the Y position they are currently drawing. The VIC-IV can then fetch the required bytes, and pass them through the data pipeline.
All of these extra paths are plumbed through the sprite pipeline, but a few important pieces are not finished. Hopefully I will be able to get these things done in the not too distant future.
After that, it will be time to implement the VIC-IV enhanced sprites, for which I have a few ideas.