Sunday, 10 December 2017

Automatic 4502 / 6502 Instruction Set Switching

We have known for a long time that we need to support 6502 illegal opcodes on the MEGA65. Initially, we thought that this would affect only a very small percentage of C64 software. However, it seems that a reasonable fraction of software has trouble with illegal opcodes. Perhaps it is one or more of the common decrunch routines. Or it could just be that we have some subtle bug in our 4502 implementation that means some 6502 instructions sometimes go astray, even though we pass the runnable 6502 instruction test suite for all official op-codes.

In any case, we know we need to add "6502 mode" to our CPU.  In fact, the bulk of the work was done quite some time ago, but I am only just now getting around to testing it.

Give or take getting the precise behaviour of the illegal op-codes correct, it wasn't too hard to add a second personality to the CPU: It is just some of the instruction fetch and decode logic that needed to be duplicated. Then the CPU needed a flag to indicate which mode to be in.

Then it should just be a case of working out when we are in C64 mode versus C65 mode, and setting the CPU mode accordingly, right? Unfortunately not.

This is because the C65's C64-mode KERNAL ROM uses 4502 opcodes to work out whether it should talk to the internal 1581/1565 drive, or to a drive on the IEC bus.

There are a few key parts of the routine we need to worry about (this is from the 910111 version of the C65 ROM):

$F72C - Context switch to C65 DOS
$F83E - Context switch back from C65 DOS on return from DOS call

The context switch to C65 DOS and back are fairly similar, and worth a quick look. First, context switching to the C65 DOS:

F72C   78         SEI
F72D   48         PHA
; C65 IO / VIC-III mode enable sequence
F72E   A9 A5      LDA #$A5      
F730   8D 2F D0   STA $D02F
F733   A9 96      LDA #$96
F735   8D 2F D0   STA $D02F     
; set bit 6 in $D031 to put CPU at 3.5MHz
F738   A9 40      LDA #$40
F73A   0C 31 D0   TSB $D031     
; bank in $C000 interface ROM and remove CIAs from IO map
; so that 2KB of colour RAM is visible $D800-$DFFF
F73D   A9 21      LDA #$21
F73F   0C 30 D0   TSB $D030
; Save registers from C64 mode, so that they can be restored
; on return
F742   68         PLA
F743   8D F6 DF   STA $DFF6
F746   8E F7 DF   STX $DFF7
F749   8C F8 DF   STY $DFF8
F74C   9C F9 DF   STZ $DFF9
; Now pull the return address from the stack, and save that, too.
F74F   68         PLA
F750   8D FB DF   STA $DFFB
F753   68         PLA
F754   8D FC DF   STA $DFFC
; Remember what the stack pointer was
F757   BA         TSX
F758   8E FF DF   STX $DFFF
; Rearrange memory map:
; Map $0000-$1FFF to $10000-$11FFF 
; Map $8000-$BFFF to $20000-$23FFF 
; (C64 KERNAL stays visible at $E000-$FFFF)
F75B   A9 00      LDA #$00
F75D   A2 11      LDX #$11   ($0000+$10000)
F75F   A0 80      LDY #$80
F761   A3 31      LDZ #$31   ($8000+$18000)
F763   5C         MAP        ; activate new map
; We are now in C65 DOS memory map
; Set stack pointer to $1FF
F764   A2 FF      LDX #$FF
F766   9A         TXS
; Load the saved return address, and put it into the C65 DOS
; stack
F76A   48         PHA
F76E   48         PHA
; Restore all the saved registers
F76F   AD F6 DF   LDA $DFF6
F772   AE F7 DF   LDX $DFF7
F775   AC F8 DF   LDY $DFF8
; Return to the return address that had just been copied to this stack
F77B   60         RTS

The general procedure of this routine is quite interesting. The last few bytes of colour RAM (it is 2KB long on the C65, and is visible at $D800-$DFFF when the CIAs are banked out of the way) are used as a scratch transfer area. The contents of the registers are saved, so that they are available to the DOS function that has been called. The only piece of mild gymnastics going on is the way that the return address of the caller is copied from the C64 stack to the C65 DOS stack. The C64 KERNAL is kept visible, so that the RTS will continue in the C64 KERNAL routine at that point, which then makes the indirect jump into the C65 DOS to do the necessary work. There is no risk of interrupts happening while in this mode, as IRQs and NMIs are both disabled by the MAP instruction, until a NOP instruction (EOM on the 4502) is executed.
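To make the register values loaded before the MAP instruction a little less cryptic, here is a minimal Python sketch that decodes them. It assumes the usual 4502 MAP encoding: A and the low nibble of X form the offset for the lower 32KB, Y and the low nibble of Z form the offset for the upper 32KB, and the high nibbles of X and Z select which 8KB blocks are mapped. The function name decode_map and the 20-bit address mask are my own illustrative choices, not anything taken from the ROM.

```python
def decode_map(a, x, y, z):
    """Decode 4502 MAP register values (assumed encoding, see text above).

    Returns a list of (cpu_start, cpu_end, target_start) tuples, one per
    mapped 8KB block.
    """
    lower_offset = ((x & 0x0F) << 16) | (a << 8)  # applies to $0000-$7FFF
    upper_offset = ((z & 0x0F) << 16) | (y << 8)  # applies to $8000-$FFFF
    mappings = []
    for block in range(8):
        if block < 4:
            enabled = (x >> (4 + block)) & 1
            offset = lower_offset
        else:
            enabled = (z >> block) & 1  # bits 4..7 of Z cover blocks 4..7
            offset = upper_offset
        if enabled:
            start = block * 0x2000
            # Assumption: addresses wrap within a 20-bit (1MB) space.
            mappings.append((start, start + 0x1FFF, (start + offset) & 0xFFFFF))
    return mappings


# Values from the listing: LDA #$00, LDX #$11, LDY #$80, LDZ #$31
result = decode_map(0x00, 0x11, 0x80, 0x31)
for cpu_start, cpu_end, target in result:
    print(f"${cpu_start:04X}-${cpu_end:04X} -> ${target:05X}")
```

Under those assumptions, the decoded blocks agree with the comments in the listing: $0000-$1FFF lands at $10000, and the two 8KB blocks at $8000-$BFFF land at $20000-$23FFF, while $E000-$FFFF is left unmapped so the C64 KERNAL stays visible.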

This routine takes 21 ~1MHz clock cycles (26 including the JSR) before the CPU is switched to 3.5MHz, and then 85 more clock cycles at 3.5MHz. The total cost of switching to the C65 DOS context is thus about 50 microseconds.
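As a quick sanity check of that figure, the arithmetic can be sketched in a few lines of Python. The clock rates are rounded to exactly 1MHz and 3.5MHz, which is an assumption (the real clocks differ slightly between PAL and NTSC), and the helper name microseconds is purely illustrative.

```python
SLOW_MHZ = 1.0   # assumed nominal C64-mode clock
FAST_MHZ = 3.5   # assumed nominal C65 fast clock


def microseconds(cycles, mhz):
    """Time taken by a number of CPU cycles at a given clock rate."""
    return cycles / mhz


# 26 cycles at ~1MHz (including the JSR), then 85 cycles at 3.5MHz
switch_in_us = microseconds(26, SLOW_MHZ) + microseconds(85, FAST_MHZ)
print(f"switch to C65 DOS context: about {switch_in_us:.0f} microseconds")
```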

The return routine is similar, essentially reversing the process, although the handling of the two different return addresses at $DFFB/C versus $DFFD/E is still not entirely clear to me.

; Pop return address of caller and save, ready for copying to C64
; stack.
F83E  68          PLA
F842  68          PLA
F843  8D FE DF    STA $DFFE
; restore C64 memory map and stack pointer for the original
; C64-context caller (61 cycles at 3.5MHz)
F846  20 7C F7    JSR $F77C
; clear bit 7 in $C0, indicating not a C65 internal drive
F849  77 C0       RMB7 $C0  
F84B  6B          TZA
F84C  10 0C       BPL $F85A
F84E  A9 00       LDA #$00
; set bit 7 in $C0, indicating a C65 internal drive
F850  F7 C0       SMB7 $C0 
; Push return address back onto C64 stack 
F855  DA          PHX
F859  DA          PHX
; Set bits in $90 (status) if required from A
F85A  04 90       TSB $90   
; Restore CPU registers
F85C  AE F7 DF    LDX $DFF7
F85F  AC F8 DF    LDY $DFF8
F862  AB F9 DF    LDZ $DFF9
F865  AD F6 DF    LDA $DFF6
; bank out $C000 ROM and bank CIAs back in.
F868  48          PHA
F869  A9 21       LDA #$21
F86B  1C 30 D0    TRB $D030
; return CPU to 1MHz. 
F86E  A9 40       LDA #$40
F870  1C 31 D0    TRB $D031
; return to VIC-II mode
F873  8D 2F D0    STA $D02F
; Re-restore Accumulator
F876  68          PLA
; Re-enable IRQ & NMI after MAP change made in the call at $F846
F877  EA          EOM       
F878  58          CLI
F879  18          CLC
F87A  60          RTS

Because the C65 IO mode and 3.5MHz CPU mode are already active, the cost is 56 cycles at ~3.5MHz = ~16 microseconds for the call to $F77C, plus 70 cycles at 3.5MHz (= ~20 microseconds) and 15 cycles at ~1MHz for the routine itself. The total cost for switching from C64 mode to C65 DOS and back again, without actually doing any work, is thus 50 microseconds for the switch to the C65 DOS context, plus 16 + 20 + 15 = 51 microseconds to switch back, i.e., ~0.1 millisecond. (I have counted both code paths for the stack restoration, as I am not entirely sure what is going on there. However, that only makes about 5.5 microseconds difference.)

This convoluted process explains why the C65 DOS is so incredibly slow, achieving less than 2KB/second even though the internal floppy drive can theoretically read at 30KB/sec: there is a complete and time-consuming context switch whenever a byte is read from the internal floppy drive. I've been tempted to patch the C65 DOS to either support an efficient LOAD system call, or to implement some sort of buffering for sequential reads. I've even thought about teaching the CPU about these context switch routines, and basically having a special case for them that does the context switch in just one or a few cycles. However, that is a job for another day. In the meantime, on an M65, running the CPU at 50MHz is an easy interim solution, as loading can happen at some tens of KB/second, even with this inefficiency.
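To put the context switch overhead in perspective, here is a rough Python sketch of the throughput ceiling it would impose by itself. It assumes exactly one full round trip (about 50 microseconds in, about 51 microseconds back) per byte and ignores all the time spent actually doing work inside the DOS, so it is only an upper bound on this one source of overhead, not a model of real transfer speed. The names are illustrative.

```python
SWITCH_IN_US = 50    # switch to C65 DOS context, from the estimate above
SWITCH_BACK_US = 51  # switch back to the C64 context


def max_bytes_per_second(overhead_us_per_byte):
    """Upper bound on throughput if each byte costs this much overhead."""
    return 1_000_000 / overhead_us_per_byte


round_trip_us = SWITCH_IN_US + SWITCH_BACK_US
ceiling = max_bytes_per_second(round_trip_us)
print(f"round trip: {round_trip_us} microseconds per byte")
print(f"ceiling from context switching alone: {ceiling / 1000:.1f} KB/second")
```

Even this optimistic ceiling is well below the 30KB/sec the drive can theoretically deliver, so the per-byte context switch is a bottleneck before any DOS work is counted.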

Meanwhile, back on the topic of CPU personality selection, those two routines are not particularly troublesome, as they only use 4502 opcodes after using $D02F to enable C65 / VIC-III IO mode, which we can use as a reliable clue to switch the CPU to 4502 mode.  This gives us our first rule:

(1) If the C65 / VIC-III (or M65 / VIC-IV) IO mode is enabled, the CPU should always be in 4502 mode.
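As an illustration of how rule (1) could be driven, here is a small Python sketch that watches writes to $D02F for the $A5, $96 sequence used in the routine above. It is a simplified model under stated assumptions: any write other than $A5 that does not complete the sequence is treated as a return to VIC-II IO mode (as the $40 write in the return routine does), and the additional M65 / VIC-IV sequence is not modelled. The class IoModeTracker is hypothetical and is not how the actual hardware logic is written.

```python
class IoModeTracker:
    """Toy model of the $D02F key register (simplified, see text above)."""

    KEY_REGISTER = 0xD02F

    def __init__(self):
        self.c65_io = False   # machine starts in C64 / VIC-II IO mode
        self._last = None     # previous value written to the key register

    def write(self, addr, value):
        if addr != self.KEY_REGISTER:
            return
        if self._last == 0xA5 and value == 0x96:
            self.c65_io = True        # knock sequence completed
        elif value != 0xA5:
            self.c65_io = False       # assumption: other values drop to VIC-II
        self._last = value

    def cpu_must_be_4502(self):
        """Rule (1): C65 IO mode forces the 4502 personality."""
        return self.c65_io


tracker = IoModeTracker()
tracker.write(0xD02F, 0xA5)
tracker.write(0xD02F, 0x96)
print(tracker.cpu_must_be_4502())
```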

The routines to enter the various DOS calls are, however, a little troublesome.   They are all very similar, so I present just one of them here as an example:

F7E4 FF C0 09 BBS7 $C0,$F7F0 ; Is current device C65 DOS?
F7E7 20 2C F7 JSR $F72C ; Context switch to C65 DOS memory map
F7EA 22 0A 80 JSR ($800A) ; Call C65 DOS TALK routine
F7ED 20 3E F8 JSR $F83E ; Context switch back to C64 memory map
F7F0 4C C7 ED JMP $EDC7 ; Send TALK on IEC bus in all cases

The BBS7 (opcode $FF) and JSR ($800A) (opcode $22) instructions are 4502 instructions.

Opcode $22 is normally a KIL instruction on the 6502, so we could safely make that do the new indirect JSR at all times, without great risk. $FF is normally ISC $nnnn,X, where ISC is the combination of INC and SBC. I have no idea if people find uses for that opcode.  However, we don't need to worry about that, because these opcodes only exist in the C64-mode KERNAL on the C65.  Thus, we can simply switch the CPU to 4502 mode whenever executing code in the KERNAL in C64 mode.  This gives us our second rule:

(2) When executing code in the KERNAL (ROM at $E000-$FFFF), the CPU should always be in 4502 mode.

So, if the CPU were in 6502 mode in C64 mode, then these two rules would ensure that the C65 internal DOS would work.  So that's good.

Now the trick is we just need to work out when we are in C64 mode and when we are in C65 mode.

First, a C65 starts up in C64 mode, and then escalates to C65 mode if it doesn't see why it should stay in C64 mode. That is, the machine always starts in C64 mode.  Fortunately, the switch to C65 mode does enable C65 / VIC-III IO mode, so it is possible that we need no further rules.

Finally, when the Hypervisor is running, the CPU should always be in 4502 mode.

(3) When executing code in Hypervisor mode, the CPU should always be in 4502 mode.
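Taken together, the three rules amount to a small decision function. The following Python sketch shows one way to express them. The parameter names, the Personality enum and the kernal_rom_mapped flag (meaning that ROM, rather than RAM, is currently visible at $E000-$FFFF) are assumptions made for illustration, and the real selection happens in the CPU's hardware logic rather than in software like this.

```python
from enum import Enum


class Personality(Enum):
    MOS6502 = "6502"
    CSG4502 = "4502"


def select_personality(hypervisor_mode, c65_io_enabled, pc, kernal_rom_mapped):
    """Pick the CPU personality for the instruction at address pc."""
    # Rule (3): Hypervisor code always runs as 4502.
    if hypervisor_mode:
        return Personality.CSG4502
    # Rule (1): C65 / VIC-III (or M65 / VIC-IV) IO mode implies 4502.
    if c65_io_enabled:
        return Personality.CSG4502
    # Rule (2): code fetched from the KERNAL ROM at $E000-$FFFF runs as 4502.
    if kernal_rom_mapped and 0xE000 <= pc <= 0xFFFF:
        return Personality.CSG4502
    # Otherwise, C64 programs get a 6502-compatible CPU.
    return Personality.MOS6502


# The BBS7 at $F7E4 in the C64-mode KERNAL, before $D02F has been touched:
print(select_personality(False, False, 0xF7E4, True).value)
# An ordinary C64 program running from RAM at $0810:
print(select_personality(False, False, 0x0810, True).value)
```

Checking the ROM flag as well as the address range means code running from RAM underneath the KERNAL would still get the 6502 personality in this sketch.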

So, in theory at least, that should be all we need to automatically set the CPU personality, in a way that maximises compatibility, i.e., where C64 programs get a 6502-compatible CPU, unless they ask for something different, but the C65's C64-mode KERNAL runs in 4502 mode, so that the rather ugly DOS inter-process communications can still happen.

The present state of play is that this is all implemented, but not yet enabled by default or tested on the MEGA65. This is one of the jobs on my list for the coming week.


  1. It's still not clear to me why, given that the C65 has DMA, they didn't use it for data transfer. On the M65, it would even be possible to use the "32-bit linear addressing" stuff in addition to that.

    1. Like on the C64, they got a working DOS, and then seem to have stopped. DMA can only really be used for a LOAD routine, not for providing byte-by-byte sequential access.

  2. When switching to 6502 mode, you should also change two things: 1.) instruction execution timing (all opcodes to match the original cycle count) 2.) VIC badline emulation (like DTV). Are these also planned?

    1. Hello,
      Yes, the instruction timings also change with CPU personality. VIC badline emulation is not yet there, and will be a function of the VIC-IV, not the CPU. Badline emulation will occur when the CPU is at 1, 2 or 3.5MHz, regardless of CPU personality, i.e., it will only be disabled when the CPU is running at full-speed.

  3. This one is, in my opinion, not a good idea:

    "When executing code in the KERNAL (ROM at $E000-$FFFF), the CPU should always be in 4502 mode."

    It will cause problems regarding the RAM below the ROM. There are programs that use this RAM for their own purposes, which means you need to support illegal opcodes there. Another group of programs copies ROM into RAM, then switches the ROM off and applies some patches. For this group of software, you need to support 4502 instructions.

    1. Hello,
      It is currently configured to only force 4502 mode when reading from the ROM in the KERNAL space. If you copy to RAM, or bank out ROM and run code there, it will be 6502 by default. But if software wants 4502 mode, it can still ask for it. While this won't solve the copy-ROM-to-RAM-and-patch programs, it is the best compromise I can think of. Of course, for those programs that want to patch the ROM, you could start the M65 with a pure C64 ROM, and 4502 mode disabled. At the end of the day, 100% compatibility is not possible, but we can get fairly close, and have workarounds for many of the exceptions. But if you have ideas how we can do it better, please share them, so that we can take them into account.


  4. I understand the challenge of finding a solution that is both elegant and compatible at the same time. It is clear that some compromise needs to be made. However, the ROM to RAM copy is not a small thing: the technique is explained in a lot of C64 books, so the impact might be similar to having no illegal opcodes at all.

    I think my preferred solution would be to remap the BBS7 instruction to a different opcode in 6502 mode. It requires a ROM patch and yes, that is not nice. Having the same instruction on two different opcodes is not nice either. However, this approach, combined with the remaining rules, would result in a system with few practical disadvantages: illegal opcodes would work fine, and so would the ROM switching to 4502 mode. A possible opcode candidate would be $F2. ROM patching would be trivial: simply change $FF to $F2 in a few places. As the code stays the same size, no re-arrangement of instructions is necessary this way.

    Another solution that I can think of is to make the KERNAL execution detection more advanced, for example by using the jump table at the end of the KERNAL as a signature. If the jump table is present, 4502 mode should be used. It would work, but is a lot less elegant IMO than the opcode remap.

    Lastly, as you suggest in the text, not supporting this particular addressing mode of ISC is another approach. It has some compatibility impact, but that impact might well be lower than the impact of not supporting ROM to RAM copying.

    1. Hello,

      Yes, life is never easy ;) The main problem with the signature detection is that the logic for that cannot be tested every clock cycle, so there has to be something to monitor this over time, and it would likely be a bit error prone. I would like the ROMs to be able to be used unmodified, if at all possible, but it could be possible to have a "maximum C65 compatibility" versus "maximum C64 compatibility" option that selects between different behaviours, and perhaps patches the ROM at the same time. I also agree that losing $FF as an ISC illegal opcode is quite likely a small loss of 6502 compatibility, and would in many ways be the simplest option. Anyway, the good thing is that all these options we are looking at are only small architectural tweaks, and we don't have to be locked in to a particular solution right now.