Sunday 2 May 2021

Debugging the 32-bit virtual-register instructions

The MEGA65 has a feature that lets you use the A, X, Y and Z registers together as a single 32-bit virtual register, Q, so that 32-bit operations can be done much less painfully.

For example, to add two 32-bit values on a 6502, you need:

CLC
LDA val1+0
ADC val2+0
STA out+0
LDA val1+1
ADC val2+1
STA out+1
LDA val1+2
ADC val2+2
STA out+2
LDA val1+3
ADC val2+3
STA out+3

That's a lot of instructions and CPU cycles, with plenty of opportunity for copy-paste errors as you propagate the carry through the four bytes.
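To make the carry chain concrete, here is a minimal host-side sketch in C that mirrors the byte-by-byte sequence above. The function name add32_bytewise is made up for illustration, and it assumes binary (not decimal) mode:

```c
#include <stdint.h>

/* Hypothetical helper: add two 32-bit values one byte at a time,
   the way the 6502 sequence does. Assumes decimal mode is off. */
static uint32_t add32_bytewise(uint32_t val1, uint32_t val2)
{
  uint32_t out = 0;
  unsigned carry = 0;                                   /* CLC */
  int i;

  for (i = 0; i < 4; i++) {
    unsigned a = (unsigned)(val1 >> (8 * i)) & 0xFF;    /* LDA val1+i */
    unsigned m = (unsigned)(val2 >> (8 * i)) & 0xFF;
    unsigned sum = a + m + carry;                       /* ADC val2+i */

    carry = sum >> 8;                                   /* carry into the next byte */
    out |= (uint32_t)(sum & 0xFF) << (8 * i);           /* STA out+i */
  }
  return out;                                           /* final carry is dropped */
}
```

Calling add32_bytewise(0x12345678, 0x0DCBA989) carries out of every byte and gives the same answer as a plain 32-bit addition, which is what a single Q instruction has to get right in hardware.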

What would be nice is to be able to write:

CLC
LDQ val1
ADQ val2
STQ out

The MEGA65 makes this possible by using special prefixes on existing instructions. To do the above, you put the "next instruction is a Q instruction" prefix (two NEG instructions) in front of the normal version of the instruction, so LDQ val becomes:

NEG
NEG
LDA val


So our whole little 32-bit addition using Q would look like this when fully expanded:

CLC
NEG
NEG
LDA val1
NEG
NEG
ADC val2
NEG
NEG
STA out

But you don't need to write this out by hand, because most C64 assemblers now support the MEGA65's 45GS02 CPU, and will let you just write "ADQ $1234" etc.
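If you ever do need to produce the prefixed bytes yourself (for example, when building machine code from C), the expansion is mechanical. The following is only a sketch: emit_q_abs and emit_add32 are hypothetical helpers, and it handles absolute addressing only. The opcode values are the standard ones for NEG, CLC and the absolute forms of LDA, ADC and STA:

```c
#include <stddef.h>
#include <stdint.h>

#define OP_NEG 0x42  /* two NEGs in a row mark the next instruction as a Q instruction */

/* Hypothetical helper: write "NEG NEG <opcode> <addr lo> <addr hi>" into buf.
   Returns the number of bytes written (always 5 for absolute addressing). */
static size_t emit_q_abs(uint8_t *buf, uint8_t opcode, uint16_t addr)
{
  buf[0] = OP_NEG;
  buf[1] = OP_NEG;
  buf[2] = opcode;
  buf[3] = (uint8_t)(addr & 0xFF);
  buf[4] = (uint8_t)(addr >> 8);
  return 5;
}

/* Hypothetical helper: assemble CLC / LDQ val1 / ADQ val2 / STQ out.
   buf must have room for 16 bytes; returns the total length. */
static size_t emit_add32(uint8_t *buf, uint16_t val1, uint16_t val2, uint16_t out)
{
  size_t n = 0;

  buf[n++] = 0x18;                       /* CLC */
  n += emit_q_abs(buf + n, 0xAD, val1);  /* LDA abs becomes LDQ */
  n += emit_q_abs(buf + n, 0x6D, val2);  /* ADC abs becomes ADQ */
  n += emit_q_abs(buf + n, 0x8D, out);   /* STA abs becomes STQ */
  return n;
}
```

The whole 32-bit addition comes to 16 bytes, compared with 37 bytes for the byte-by-byte 6502 version when every operand uses absolute addressing.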

So that's all great, except that the implementation of these instructions on the MEGA65 had some timing closure problems: it took too long to read the A, X, Y and Z registers, potentially do some 32-bit operation on them with a long carry chain, and then write the results back to the A, X, Y and Z registers again.

I started hacking away at fixing those problems, which then led to the need for a convenient test harness for verifying that the instructions work correctly.

I ended up writing this using CC65, with a little helper routine in assembly language that runs the instruction under test. The helper routine looks like this:

  /* Setup our code snippet:
     SEI
     ; LDQ $0380
     NEG
     NEG
     LDA $0380
     ; Do some Q instruction
     CLC
     NEG
     NEG
     XXX $0384
     ; Store result back
     ; STQ $0388
     NEG
     NEG
     STA $0388
     ; And store back safely as well
     STA $038C
     STX $038D
     STY $038E
     STZ $038F
     CLI
     RTS
   */
unsigned char code_snippet[31]=
  {
   0x78,0x42,0x42,0xAD,0x80,0x03,0x18,0x42,0x42,0x6D,0x84,0x03,0x42,0x42,0x8d,0x88,
   0x03,0x8d,0x8c,0x03,0x8e,0x8d,0x03,0x8c,0x8e,0x03,0x9c,0x8f,0x03,0x58,0x60
  };
#define INSTRUCTION_OFFSET  9                
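Because the test works by overwriting a single byte, it is worth convincing yourself that INSTRUCTION_OFFSET really points at the opcode and not at a prefix or an operand byte. Here is a rough host-side sketch of that check. It uses only the first 17 bytes of the snippet, and the names snippet_head, is_q_opcode_slot and make_patched are invented for this example:

```c
#include <stdint.h>
#include <string.h>

#define INSTRUCTION_OFFSET 9
#define HEAD_LEN 17

/* First 17 bytes of the snippet: SEI, LDQ $0380, CLC, <Q op> $0384, STQ $0388 */
static const uint8_t snippet_head[HEAD_LEN] = {
  0x78,
  0x42, 0x42, 0xAD, 0x80, 0x03,
  0x18,
  0x42, 0x42, 0x6D, 0x84, 0x03,
  0x42, 0x42, 0x8D, 0x88, 0x03
};

/* Return 1 if code[offset] is preceded by NEG NEG and followed by the
   little-endian bytes of addr, i.e. it sits where an absolute-mode Q opcode goes. */
static int is_q_opcode_slot(const uint8_t *code, unsigned len,
                            unsigned offset, uint16_t addr)
{
  if (offset < 2 || offset + 2 >= len) return 0;
  return code[offset - 2] == 0x42 && code[offset - 1] == 0x42
      && code[offset + 1] == (uint8_t)(addr & 0xFF)
      && code[offset + 2] == (uint8_t)(addr >> 8);
}

/* Copy the snippet into dst (HEAD_LEN bytes) and swap in the opcode under test. */
static void make_patched(uint8_t *dst, uint8_t opcode)
{
  memcpy(dst, snippet_head, HEAD_LEN);
  dst[INSTRUCTION_OFFSET] = opcode;
}
```

With the bytes above, is_q_opcode_slot(snippet_head, HEAD_LEN, INSTRUCTION_OFFSET, 0x0384) holds, so patching offset 9 changes which operation is applied to $0384 and nothing else.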
 

Then to run a test, we can just mash the right values into $0380-$0387, and check the results in $0388-$038F (or $0384-$0387, if testing a read-modify-write (RMW) instruction):

  // Run each test
  for(i=0;tests[i].opcode;i++) {
    expected= tests[i].expected;
    // Setup input values
    *(unsigned long*)0x380 = tests[i].val1;
    *(unsigned long*)0x384 = tests[i].val2;
    
    code_buf[INSTRUCTION_OFFSET]=tests[i].opcode;
    __asm__ ( "jsr $0340");
    if (tests[i].rmw) result_q= *(unsigned long*)0x384;
    else result_q= *(unsigned long*)0x388;
    if (result_q!=expected) {
      snprintf(msg,64,"FAIL:#%d:$%02X:%s",
           (int)i,(int)tests[i].opcode,tests[i].instruction);
      print_text(0,line_num++,2,msg);
      snprintf(msg,64,"     Expect=$%08lx, Saw=$%08lx",expected,result_q);
      print_text(0,line_num++,2,msg);
      errors++;
      if (line_num>=23) {
        print_text(0,line_num,8,"TOO MANY ERRORS: Aborting");
        while(1) continue;
      }
    }
  }
  snprintf(msg,64,"%d tests complete, with %d errors.",
       i,errors);
  print_text(0,24,7,msg);

The last key part was to make a simple way to define the tests. I do this using an array of structs in C, which makes it much easier to add new tests: just add the appropriate single line to the tests block:

struct test tests[]=
  {
   // ADC - Check carry chain works properly
   {0,0x6d,"ADC",0x12345678,0x00000000,0x12345678},
   {0,0x6d,"ADC",0x12345678,0x00000001,0x12345679},
   {0,0x6d,"ADC",0x12345678,0x00000100,0x12345778},
   {0,0x6d,"ADC",0x12345678,0x00000101,0x12345779},
   {0,0x6d,"ADC",0x12345678,0x000000FF,0x12345777},
   {0,0x6d,"ADC",0x12345678,0x0000FF00,0x12355578},
   {0,0x6d,"ADC",0x12345678,0x0DCBA989,0x20000001},
   // EOR
   {0,0x4d,"EOR",0x12345678,0x12340000,0x00005678},
   {0,0x4d,"EOR",0x12345678,0x00005678,0x12340000},
   // AND
   {0,0x2d,"AND",0x12345678,0x0000FFFF,0x00005678},
   {0,0x2d,"AND",0x12345678,0xFFFF0000,0x12340000},
   // AND - operands with disjoint and partially overlapping bits
   {0,0x2d,"AND",0x12340000,0x00005678,0x00000000},
   {0,0x2d,"AND",0x12345600,0x00005678,0x00005600},
   // INC
   {1,0xEE,"INC",0,0x12345678,0x12345679},
   {1,0xEE,"INC",0,0x00000000,0x00000001},
   {1,0xEE,"INC",0,0x00FFFFFF,0x01000000},
   // DEC
   {1,0xCE,"DEC",0,0x12345678,0x12345677},
   {1,0xCE,"DEC",0,0x00000000,0xFFFFFFFF},
   {1,0xCE,"DEC",0,0x00FFFFFF,0x00FFFFFE},
   
   {0,0x00,"END",0,0,0}
  };
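Since the expected values are typed in by hand, a typo in the table would look exactly like a CPU bug. One way to guard against that is to run the table through a trivial reference model on the host before trusting it on hardware. The sketch below assumes a struct layout inferred from the initialisers above (the real definition is not shown here), and the names model_q, count_mismatches and sample_tests are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout, inferred from the order of the initialisers */
struct test {
  unsigned char rmw;        /* 1 = result is read back from the operand itself */
  unsigned char opcode;     /* opcode patched into the snippet; 0 ends the table */
  const char *instruction;
  uint32_t val1;            /* loaded into Q first */
  uint32_t val2;            /* memory operand */
  uint32_t expected;
};

/* Reference model for the opcodes used so far. Carry is assumed clear,
   because the snippet executes CLC first. Returns 0 for unknown opcodes. */
static int model_q(const struct test *t, uint32_t *result)
{
  switch (t->opcode) {
  case 0x6D: *result = t->val1 + t->val2; return 1;  /* ADC */
  case 0x4D: *result = t->val1 ^ t->val2; return 1;  /* EOR */
  case 0x2D: *result = t->val1 & t->val2; return 1;  /* AND */
  case 0xEE: *result = t->val2 + 1;       return 1;  /* INC (RMW) */
  case 0xCE: *result = t->val2 - 1;       return 1;  /* DEC (RMW) */
  default:   return 0;
  }
}

/* Walk the table up to the opcode-0 terminator and count rows whose
   expected value disagrees with the model. */
static int count_mismatches(const struct test *tests)
{
  int bad = 0;
  uint32_t r;

  for (; tests->opcode; tests++) {
    if (!model_q(tests, &r) || r != tests->expected) {
      printf("CHECK:$%02X:%s\n", (unsigned)tests->opcode, tests->instruction);
      bad++;
    }
  }
  return bad;
}

/* A few rows copied from the table above */
static const struct test sample_tests[] = {
  {0, 0x6D, "ADC", 0x12345678, 0x0DCBA989, 0x20000001},
  {0, 0x4D, "EOR", 0x12345678, 0x12340000, 0x00005678},
  {0, 0x2D, "AND", 0x12345600, 0x00005678, 0x00005600},
  {1, 0xEE, "INC", 0, 0x00FFFFFF, 0x01000000},
  {1, 0xCE, "DEC", 0, 0x00000000, 0xFFFFFFFF},
  {0, 0x00, "END", 0, 0, 0}
};
```

Running count_mismatches(sample_tests) on a PC flags any row where the hand-typed expected value and the model disagree, so a FAIL on the real machine can be blamed on the bitstream rather than on the table.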

This made it all very nice and comfortable to test that the latest bitstream had fixed the known problems with those instructions (tests for the other instructions still need to be written).

And to make sure I wasn't imagining things, I tried it out on an older bitstream that didn't have the corrections in it, and confirmed that it fails horribly, as expected.

So now we can write more tests for the rest of the Q instructions, and make sure that they are all fine.

