Saturday, December 23, 2017

Fixing IRQ following BRK / BoulderMark score = 102x stock C64

The BRK instruction on the 6502 is effectively just a software-triggered IRQ. The main differences are that the B (break) bit is set in the copy of the processor flags that gets pushed on the stack, and that the return address BRK pushes is the address of the BRK opcode plus two, so the byte following the opcode is skipped on return.
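To make that difference concrete, here is a minimal Python sketch of the three bytes that end up on the stack in each case. The function name `interrupt_pushes` and its arguments are made up for illustration; this is a model of documented 6502 behaviour, not the MEGA65's VHDL.

```python
FLAG_B = 0x10  # break flag: only meaningful in the pushed copy of P
FLAG_U = 0x20  # unused bit: always pushed as 1


def interrupt_pushes(pc, p, is_brk):
    """Return the bytes pushed on the stack, in push order: PCH, PCL, P.

    For BRK, `pc` is the address of the BRK opcode itself.
    For a hardware IRQ, `pc` is the address of the next instruction to run.
    (Hypothetical helper, for illustration only.)
    """
    if is_brk:
        ret = (pc + 2) & 0xFFFF          # BRK skips the byte after the opcode
        pushed_p = p | FLAG_U | FLAG_B   # B set in the pushed flags
    else:
        ret = pc & 0xFFFF                # IRQ resumes exactly where it left off
        pushed_p = (p | FLAG_U) & ~FLAG_B  # B clear in the pushed flags
    return [ret >> 8, ret & 0xFF, pushed_p & 0xFF]


brk_frame = interrupt_pushes(0x1000, 0x00, is_brk=True)
irq_frame = interrupt_pushes(0x1000, 0x00, is_brk=False)
```

The handler can only tell the two apart by looking at the B bit in the pushed flags byte, which is why getting the two cases tangled up in the CPU is so easy to miss.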

I had been seeing a problem for a while now, where some instructions would not work reliably. I thought that I had a bug in the branch instructions, but it turns out the problem was in BRK, which the Lorenz test programs for branches use to find out which address a branch has actually gone to. Basically these tests would sometimes fail. But only sometimes. Intermittent bugs can be a real pain, and this was no exception.

First, I instrumented the Lorenz test program for BNE to report the expected and actual branch address, and then to sit in an infinite loop in the test. I noticed that the actual address was always 2 bytes later than expected. My first guess was that the program counter was being mistakenly incremented during a branch instruction under some special condition. By accident, I discovered that it was dependent on an IRQ occurring.
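As a rough illustration of why a misbehaving BRK looks exactly like a misbehaving branch, here is a small Python sketch. It assumes the test's BRK handler works out where the branch landed by taking the return address from the stack and subtracting two. I have not reproduced the Lorenz code here, so the function names and the handler logic are assumptions.

```python
def brk_address_from_stack(pcl, pch):
    """Recover the address of the BRK opcode from the pushed return address.

    Assumes the handler simply subtracts the 2 that BRK adds.
    """
    ret = (pch << 8) | pcl
    return (ret - 2) & 0xFFFF


def branch_error(expected, pcl, pch):
    """Signed distance between the apparent landing address and the expected one."""
    actual = brk_address_from_stack(pcl, pch)
    diff = (actual - expected) & 0xFFFF
    return diff - 0x10000 if diff & 0x8000 else diff


# Branch correctly lands on a BRK at $0900, which pushes $0902.
good = branch_error(0x0900, pcl=0x02, pch=0x09)

# Same branch, but the pushed return address is 2 too high.
bad = branch_error(0x0900, pcl=0x04, pch=0x09)
```

In the second case the branch itself did nothing wrong, yet the test reports a landing address 2 bytes late, which matches the symptom I was seeing.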

This really was a piece of luck, as I had disabled IRQs on a Nexys4 MEGA65, so that I could single-step without always ending up in the IRQ routine.  I found that I couldn't make the problem occur in that mode, even if I set the CPU free running at 50MHz overnight. 

This was a major clue to tracking down the problem, although I still thought the problem was in the branch instruction not setting some program counter control flag correctly. But then I remembered that BRK is a software interrupt, and if it got tangled up with a real IRQ, and I wasn't handling things properly, then things could go bad, in exactly the way that I was seeing.

So, after having bashed my head against this bug on and off for a few months, it took just two lines of VHDL to make sure that an IRQ could never get confused with a BRK instruction.

What was particularly strange about this bug, in the end, is that I had noticed that at some point along the line the BoulderMark benchmark for the C64 had stopped working. This is part of what made me suspect the branch instructions, as I could see that faulty branching could certainly make things crash -- although why it should only happen in some programs and not others was a mystery.

Now, the situation has been turned on its head: I have fixed the BRK instruction's interaction with IRQs, and not modified the operation of any other instruction, and suddenly BoulderMark is working again -- even though I have confirmed that it never uses BRK anywhere.  It is a bit dissatisfying, actually, to not know exactly why this has fixed BoulderMark, as it leaves a niggly fear that there might be some other subtle bug lurking still in all this.  But, for now I am just happy that BoulderMark runs again.  While the few changes to the CPU since it was last working stably have been relatively minor, it was still nice to see that the MEGA65 has now ticked over to yielding a BoulderMark score that is >100x that of a stock C64, as you can see below.