Sunday 30 March 2014

Running synthmark64 before optimisation of CPU

I have a few CPU optimisations I plan to implement soon, primarily relating to fast instruction decode by making use of the 64-bit wide bus to fetch whole instructions in a single cycle*.

Thanks to the folks on #c-64, I found out about the synthmark64 benchmark program and summary.

It makes for interesting reading.  As can be seen, the SuperCPU and Chameleon are both around 20x faster than a stock C64, provided you don't touch IO.

I managed to get synthmark64 running on the C65GS in C64 mode, although not without some weird fiddling with the serial monitor to get it unstuck.  [Update: found the problem. I had left a CPU breakpoint enabled, which synthmark64 was triggering.  Clearing it lets the benchmark run properly without any fiddling.]

Once unstuck, all the tests run through repeatedly without further intervention.

This means I have a synthmark64 speed rating for the C65GS as it stands, before I optimise the CPU, which will allow me to properly evaluate the utility of the improvements I make.

[Update: Here is a screenshot after I fixed the CIA timer bug that was causing the display to show 37x instead of 18x, as discussed further down]

Display showing correct speed up after I fixed the CIA timer bug.
Before you look at the screenshot and go all googly-eyed over the prospect of the machine currently being 37x faster, and thus almost 2x the speed of the SuperCPU or Chameleon, be aware that there is a bug in the CIA: phi0 is being fed in at 500kHz, not 1MHz.  The counters are therefore running at half speed, which means you need to halve the results.
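To make the halving concrete, here is a minimal sketch of the correction. It assumes the displayed rating scales inversely with the number of timer ticks counted during a test, so a timer running at half rate doubles the apparent speed. The function name and parameters are my own for illustration and are not part of synthmark64.

```python
def corrected_speedup(displayed, actual_timer_hz=500_000, expected_timer_hz=1_000_000):
    """Scale a displayed speed rating by how slow the timer really is.

    Assumption: the benchmark divides work done by timer ticks elapsed,
    so a timer ticking at half the expected rate inflates the rating 2x.
    """
    return displayed * actual_timer_hz / expected_timer_hz


# The buggy display showed 37x with phi0 at 500kHz instead of 1MHz.
print(corrected_speedup(37))  # 18.5
```

That 18.5 lines up with the roughly 18x shown once the timer bug is fixed.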

Nonetheless, this means the machine is already about as fast as the Chameleon or SuperCPU for most operations, despite the dreadful 2-cycles-per-byte-of-instruction decoder it is currently using.  And for IO operations it is much, much faster.
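As a rough illustration of why the decoder matters, the sketch below compares instruction fetch cost under the current 2-cycles-per-byte scheme with a single-cycle fetch over the 64-bit bus. It is a simplified model with hypothetical function names: it counts only the cycles spent fetching instruction bytes, and it assumes the whole instruction arrives in one 64-bit fetch, ignoring alignment and execution time.

```python
def fetch_cycles_current(instruction_bytes):
    # Current decoder: 2 cycles for every byte of the instruction.
    return 2 * instruction_bytes


def fetch_cycles_wide(instruction_bytes):
    # Planned decoder (assumption): the whole instruction is fetched
    # in a single cycle over the 64-bit bus, whatever its length.
    return 1


# 6502-style instructions are 1 to 3 bytes long.
for n in (1, 2, 3):
    print(n, "bytes:", fetch_cycles_current(n), "cycles now,",
          fetch_cycles_wide(n), "cycle with wide fetch")
```

Under these assumptions a 3-byte instruction drops from 6 fetch cycles to 1, which is where most of the expected gain would come from.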

Don't forget to halve these values for real comparison, because a CIA clock bug makes the machine look 2x faster than it is!

The obvious thing to see here is that the C65GS doesn't slow down when it hits I/O, because the I/O all runs at 48MHz instead of 1MHz like on a real C64.

It is also interesting to see that the C65GS has a relatively fast JSR compared with the others.  This is probably because JSR pushes the two-byte return address to the stack, and those pushes are writes, so they don't incur any wait-states.

More on this as I implement the various improvements, and also get around to fixing that CIA clock bug.

