Friday, 20 June 2014

A speed improvement

Reflecting on the recent benchmark results, and especially on the fact that the revision 9 Chameleon is almost exactly the same speed as the C65GS in the BoulderMark benchmark, I wondered if there was any low-hanging fruit I could tackle to increase the speed of the C65GS.

The main slow-down with the C65GS is the wait-state on reading chipram.  I had tried various ways to suppress the wait-state at its root cause in the FPGA dual-ported block RAM, without luck.  Then it occurred to me this morning that I could make a single-port shadow RAM that shadows all of chipram.  Writes to chipram go to both RAMs, and reads by the CPU are sourced from the shadow RAM, with no wait-state.
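To make the idea concrete, here is a rough software model of it in Python.  This is only a sketch, not the actual VHDL: the class name, the method names and the cycle costs (one cycle per access, plus one wait-state on reads from the dual-ported RAM) are all assumptions made for illustration.

```python
class ShadowedChipRam:
    """Toy model of chipram with a shadow copy used for CPU reads.

    Assumed costs (illustrative only): every access takes one cycle,
    and reading the dual-ported chipram adds `read_wait_states` cycles.
    """

    def __init__(self, size, read_wait_states=1):
        self.chipram = bytearray(size)  # dual-ported RAM, shared with the video side
        self.shadow = bytearray(size)   # single-port copy, read only by the CPU
        self.read_wait_states = read_wait_states
        self.cycles = 0

    def write(self, addr, value):
        # Every write goes to both RAMs so the shadow never goes stale.
        self.chipram[addr] = value & 0xFF
        self.shadow[addr] = value & 0xFF
        self.cycles += 1

    def read_via_chipram(self, addr):
        # Old path: the CPU reads the dual-ported RAM and pays the wait-state.
        self.cycles += 1 + self.read_wait_states
        return self.chipram[addr]

    def read_via_shadow(self, addr):
        # New path: the CPU reads the shadow RAM with no wait-state.
        self.cycles += 1
        return self.shadow[addr]


if __name__ == "__main__":
    ram = ShadowedChipRam(64 * 1024)
    ram.write(0x0400, 0x2A)
    before = ram.cycles
    ram.read_via_chipram(0x0400)
    slow = ram.cycles - before
    before = ram.cycles
    ram.read_via_shadow(0x0400)
    fast = ram.cycles - before
    print("chipram read cycles:", slow, "shadow read cycles:", fast)
```

The price of this approach is a second copy of the memory, but the two copies cannot disagree as long as every write goes through the one path that updates both.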

So, as a reminder, here is the state of affairs before today's improvements:


Removing the wait state on chipram by implementing the shadow RAM had quite a nice impact:


Function calls are about 30% faster, and RAM operations in general are all moderately improved, as might be expected.  This also made BoulderMark quite a bit faster.


In the process I realised what should have been obvious to me: implied/accumulator mode single-byte instructions were still taking two cycles, and could easily be reduced to one cycle.  This makes NOPs run at an amazing 71x, and pushes the overall rating up a little to 26.9x:


BoulderMark now indicates just over 55x.  I am still at a loss as to why the machine is so much faster than a stock C64 for BoulderMark, but the same phenomenon is visible with the latest version of the Chameleon, which gets a rating of around 14,000 (see http://wiki.icomp.de/wiki/C64_Benchmarks).  That's a mystery that will have to remain for now.


In the meantime, I have a couple more ideas to improve performance that I will try.
