In any case, we know we need to add "6502 mode" to our CPU. In fact, the bulk of the work was done quite some time ago, but I am only just now getting around to testing it.
Give or take getting the precise behaviour of the illegal op-codes correct, it wasn't too hard to add a second personality to the CPU: It is just some of the instruction fetch and decode logic that needed to be duplicated. Then the CPU needed a flag to indicate which mode to be in.
Then it should just be a case of working out when we are in C64 mode versus C65 mode, and setting the CPU mode accordingly, right? Unfortunately not.
This is because the C65's C64-mode KERNAL ROM uses 4502 opcodes to work out whether it should talk to the internal 1581/1565 drive, or to a drive on the IEC bus.
There are a few key parts of the routine we need to worry about (this is from the 910111 version of the C65 ROM):
$F72C - Context switch to C65 DOS
$F83E - Context switch back from C65 DOS on return from DOS call
The context switches to C65 DOS and back are fairly similar, and worth a quick look. First, context switching to the C65 DOS:
F72C 78 SEI
F72D 48 PHA
; C65 IO / VIC-III mode enable sequence
F72E A9 A5 LDA #$A5
F730 8D 2F D0 STA $D02F
F733 A9 96 LDA #$96
F735 8D 2F D0 STA $D02F
; set bit 6 in $D031 to put CPU at 3.5MHz
F738 A9 40 LDA #$40
F73A 0C 31 D0 TSB $D031
; bank in $C000 interface ROM and remove CIAs from IO map
; so that 2KB of colour RAM is visible $D800-$DFFF
F73D A9 21 LDA #$21
F73F 0C 30 D0 TSB $D030
; Save registers from C64 mode, so that they can be restored
; on return
F742 68 PLA
F743 8D F6 DF STA $DFF6
F746 8E F7 DF STX $DFF7
F749 8C F8 DF STY $DFF8
F74C 9C F9 DF STZ $DFF9
; Now pull the return address from the stack, and save that, too.
F74F 68 PLA
F750 8D FB DF STA $DFFB
F753 68 PLA
F754 8D FC DF STA $DFFC
; Remember what the stack pointer was
F757 BA TSX
F758 8E FF DF STX $DFFF
; Rearrange memory map:
; Map $0000-$1FFF to $10000-$11FFF
; Map $8000-$BFFF to $20000-$23FFF
; (C64 KERNAL stays visible at $E000-$FFFF)
F75B A9 00 LDA #$00
F75D A2 11 LDX #$11 ($0000+$10000)
F75F A0 80 LDY #$80
F761 A3 31 LDZ #$31 ($8000+$18000)
F763 5C MAP ; activate new map
; We are now in C65 DOS memory map
; Set stack pointer to $1FF
F764 A2 FF LDX #$FF
F766 9A TXS
; Load the saved return address, and put it into the C65 DOS
; stack
F767 AD FC DF LDA $DFFC
F76A 48 PHA
F76B AD FB DF LDA $DFFB
F76E 48 PHA
; Restore all the saved registers
F76F AD F6 DF LDA $DFF6
F772 AE F7 DF LDX $DFF7
F775 AC F8 DF LDY $DFF8
F778 AB FA DF LDZ $DFFA
; Return to the return address that had just been copied to this stack
F77B 60 RTS
The general procedure of this routine is quite interesting. The last few bytes of colour RAM (it is 2KB long on the C65, and is visible at $D800-$DFFF when the CIAs are banked out of the way) are used as a scratch transfer area. The contents of the registers are saved, so that they are available to the DOS function that has been called. The only piece of mild gymnastics going on is the way that the return address of the caller is copied from the C64 stack to the C65 DOS stack. The C64 KERNAL is kept visible, so that the RTS will continue in the C64 KERNAL routine at that point, which then makes the indirect jump into the C65 DOS to do the necessary work. There is no risk of interrupts happening while in this mode, as IRQs and NMIs are both disabled by the MAP instruction until a NOP (EOM) instruction is executed.
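To make the MAP step in the listing above a little more concrete, here is a minimal Python sketch of how the A, X, Y and Z values decode into a memory map. It assumes the usual 4510 layout as I understand it: A and the low nibble of X form the offset for the lower 32KB, Y and the low nibble of Z form the offset for the upper 32KB, and the high nibbles of X and Z select which 8KB blocks get the offset applied. The function name `decode_map` is purely illustrative.

```python
def decode_map(a, x, y, z):
    """Return {cpu_block_start: physical_start} for each mapped 8KB block.

    Assumed register layout (not taken from the ROM listing itself):
      lower offset bits 15..8 = A, bits 19..16 = low nibble of X
      upper offset bits 15..8 = Y, bits 19..16 = low nibble of Z
      high nibble of X enables blocks $0000, $2000, $4000, $6000
      high nibble of Z enables blocks $8000, $A000, $C000, $E000
    """
    lower_offset = ((x & 0x0F) << 16) | (a << 8)
    upper_offset = ((z & 0x0F) << 16) | (y << 8)
    mapping = {}
    for block in range(8):
        if block < 4:
            enabled = (x >> (4 + block)) & 1
            offset = lower_offset
        else:
            enabled = (z >> block) & 1  # bits 4..7 of Z for blocks 4..7
            offset = upper_offset
        if enabled:
            cpu_addr = block * 0x2000
            mapping[cpu_addr] = (cpu_addr + offset) & 0xFFFFF
    return mapping


# Values loaded at $F75B-$F761 in the listing above
dos_map = decode_map(a=0x00, x=0x11, y=0x80, z=0x31)
for cpu_addr, phys in sorted(dos_map.items()):
    print(f"${cpu_addr:04X}-${cpu_addr + 0x1FFF:04X} -> "
          f"${phys:05X}-${phys + 0x1FFF:05X}")
```

Under those assumptions, the decoded map agrees with the comments in the listing: $0000-$1FFF lands at $10000, and $8000-$BFFF lands at $20000-$23FFF, while $E000-$FFFF is left unmapped so the C64 KERNAL stays visible.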
This routine takes 21 clock cycles at ~1MHz (26 including the JSR) before the CPU is switched to 3.5MHz, then 85 more clock cycles at 3.5MHz. The total cost of switching to the C65 DOS context is thus about 50 microseconds.
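The arithmetic behind that figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses nominal clock rates of exactly 1MHz and 3.5MHz (the real clocks are only approximately that), and the helper name `cycles_to_us` is made up for the example.

```python
def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count at a given clock rate (MHz) to microseconds."""
    return cycles / clock_mhz


SLOW_MHZ = 1.0  # nominal C64-mode speed (assumption: exactly 1MHz)
FAST_MHZ = 3.5  # nominal C65 fast speed

# 26 cycles (including the JSR) at ~1MHz, then 85 cycles at 3.5MHz
entry_us = cycles_to_us(26, SLOW_MHZ) + cycles_to_us(85, FAST_MHZ)
print(f"Switch to C65 DOS context: {entry_us:.1f} microseconds")
```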
The return routine is similar, essentially reversing the process, although the handling of the two different return addresses at $DFFB/C versus $DFFD/E is still not entirely clear to me.
; Pop return address of caller and save, ready for copying to C64
; stack.
F83E 68 PLA
F83F 8D FD DF STA $DFFD
F842 68 PLA
F843 8D FE DF STA $DFFE
; restore C64 memory map and stack pointer for the original
; C64-context caller (61 cycles at 3.5MHz)
F846 20 7C F7 JSR $F77C
; clear bit 7 in $C0, indicating not a C65 internal drive
F849 77 C0 RMB7 $C0
F84B 6B TZA
F84C 10 0C BPL $F85A
F84E A9 00 LDA #$00
; set bit 7 in $C0, indicating a C65 internal drive
F850 F7 C0 SMB7 $C0
; Push return address back onto C64 stack
F852 AE FE DF LDX $DFFE
F855 DA PHX
F856 AE FD DF LDX $DFFD
F859 DA PHX
; Set bits in $90 (status) if required from A
F85A 04 90 TSB $90
; Restore CPU registers
F85C AE F7 DF LDX $DFF7
F85F AC F8 DF LDY $DFF8
F862 AB F9 DF LDZ $DFF9
F865 AD F6 DF LDA $DFF6
; bank out $C000 ROM and bank CIAs back in.
F868 48 PHA
F869 A9 21 LDA #$21
F86B 1C 30 D0 TRB $D030
; return CPU to 1MHz.
F86E A9 40 LDA #$40
F870 1C 31 D0 TRB $D031
; return to VIC-II mode
F873 8D 2F D0 STA $D02F
; Re-restore Accumulator
F876 68 PLA
; Re-enable IRQ & NMI after MAP change made in the call at $F846
F877 EA EOM
F878 58 CLI
F879 18 CLC
F87A 60 RTS
Because the C65 IO mode and 3.5MHz CPU mode are already active, the cost is 56 cycles at ~3.5MHz = ~16 microseconds for the call to $F77C, plus 70 cycles at 3.5MHz (= ~20 microseconds) and 15 cycles at ~1MHz for the routine itself. The total cost for switching from C64 mode to C65 DOS and back again, without actually doing any work, is thus 50 microseconds for the switch to the C65 DOS context, plus 16 + 20 + 15 = 51 microseconds to switch back, i.e., ~0.1 millisecond. (I have counted both code paths for the stack restoration, as I am not entirely sure what is going on there. However, that only makes about 5.5 microseconds difference.)
This convoluted process explains why the C65 DOS is so incredibly slow, achieving less than 2KB/second even though the internal floppy drive can theoretically read at 30KB/sec: there is a complete and time-consuming context switch whenever a byte is read from the internal floppy drive. I've been tempted to patch the C65 DOS to either support an efficient LOAD system call, or to implement some sort of buffering for sequential reads. I've even thought about teaching the CPU about these context switch routines, and basically having a special case for them that does the context switch in just one or a few cycles. However, that is a job for another day. In the meantime, on an M65, running the CPU at 50MHz is an easy interim solution, as loading can happen at some tens of KB/second, even with this inefficiency.
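As a rough sanity check on those figures, the following Python sketch adds up the round trip and works out the throughput ceiling it implies. It assumes one full round trip per byte and nominal clocks of exactly 1MHz and 3.5MHz, and it ignores all the real work the DOS does in between, so it is an upper bound rather than a prediction.

```python
FAST_MHZ = 3.5  # nominal C65 fast speed
SLOW_MHZ = 1.0  # nominal C64-mode speed (assumption: exactly 1MHz)

switch_in_us = 50.0  # cost of entering the C65 DOS context, from above
# 56 cycles for the call to $F77C and 70 cycles for the routine at 3.5MHz,
# plus 15 cycles at ~1MHz after the CPU has been slowed down again
switch_out_us = 56 / FAST_MHZ + 70 / FAST_MHZ + 15 / SLOW_MHZ

round_trip_us = switch_in_us + switch_out_us
max_bytes_per_second = 1_000_000 / round_trip_us

print(f"Round trip: {round_trip_us:.0f} microseconds")
print(f"Ceiling from context switching alone: "
      f"{max_bytes_per_second / 1000:.1f} thousand bytes/second")
```

On these assumptions the context switching alone caps sequential reads at just under ten thousand bytes per second, about a third of the drive's theoretical rate, before the DOS has done any actual work.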
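As far as the listings above show, both context-switch routines only use 4502 opcodes while C65 IO mode is enabled. This gives us our first rule:
(1) If the C65 / VIC-III (or M65 / VIC-IV) IO mode is enabled, the CPU should always be in 4502 mode.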
The routines to enter the various DOS calls are, however, a little troublesome. They are all very similar, so I present just one of them here as an example:
F7E4 FF C0 09 BBS7 $C0,$F7F0 ; Is current device C65 DOS?
F7E7 20 2C F7 JSR $F72C ; Context switch to C65 DOS memory map
F7EA 22 0A 80 JSR ($800A) ; Call C65 DOS TALK routine
F7ED 20 3E F8 JSR $F83E ; Context switch back to C64 memory map
F7F0 4C C7 ED JMP $EDC7 ; Send TALK on IEC bus in all cases
The BBS7 and JSR ($800A) instructions are 4502 instructions.
Opcode $22 is normally a KIL instruction on the 6502, so we could safely make that do the new indirect JSR at all times, without great risk. $FF is normally ISC $nnnn,X, where ISC is the combination of INC and SBC. I have no idea if people find uses for that opcode. However, we don't need to worry about that, because these opcodes only exist in the C64-mode KERNAL on the C65. Thus, we can simply switch the CPU to 4502 mode whenever executing code in the KERNAL in C64 mode. This gives us our second rule:
(2) When executing code in the KERNAL (ROM at $E000-$FFFF), the CPU should always be in 4502 mode.
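To illustrate why the personality matters for these two opcodes, here is a small Python sketch that decodes the same bytes under each personality. It only handles $22 and $FF, and the function name `decode` and the mode strings are invented for the example; it is not how the CPU's decode logic is actually written.

```python
def decode(pc, code, mode):
    """Decode one instruction located at pc under the given personality.

    code holds the opcode and its two operand bytes; mode is "6502" or
    "4502". Only opcodes $22 and $FF are covered in this sketch.
    """
    opcode = code[0]
    if opcode == 0x22:
        if mode == "6502":
            return "KIL"  # illegal opcode that halts an NMOS 6502
        target = code[1] | (code[2] << 8)
        return f"JSR (${target:04X})"
    if opcode == 0xFF:
        if mode == "6502":
            addr = code[1] | (code[2] << 8)
            return f"ISC ${addr:04X},X"  # illegal INC-then-SBC opcode
        zp = code[1]
        # Branch offset is signed and relative to the next instruction
        offset = code[2] - 0x100 if code[2] & 0x80 else code[2]
        target = (pc + 3 + offset) & 0xFFFF
        return f"BBS7 ${zp:02X},${target:04X}"
    raise ValueError(f"opcode ${opcode:02X} not covered by this sketch")


for mode in ("6502", "4502"):
    print(mode, decode(0xF7E4, bytes([0xFF, 0xC0, 0x09]), mode))
    print(mode, decode(0xF7EA, bytes([0x22, 0x0A, 0x80]), mode))
```

In 4502 mode the bytes at $F7E4 decode to BBS7 $C0,$F7F0, matching the listing; in 6502 mode the very same bytes would be an ISC on an unrelated address, followed shortly by a KIL.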
So, if the CPU defaults to 6502 mode while in C64 mode, these two rules ensure that the C65 internal DOS still works. So that's good.
Now the trick is we just need to work out when we are in C64 mode and when we are in C65 mode.
First, a C65 starts up in C64 mode, and then escalates to C65 mode if it doesn't see why it should stay in C64 mode. That is, the machine always starts in C64 mode. Fortunately, the switch to C65 mode does enable C65 / VIC-III IO mode, so it is possible that we need no further rules.
Finally, when the Hypervisor is running, the CPU should always be in 4502 mode.
(3) When executing code in Hypervisor mode, the CPU should always be in 4502 mode.
So, in theory at least, that should be all we need to automatically set the CPU personality, in a way that maximises compatibility, i.e., where C64 programs get a 6502-compatible CPU, unless they ask for something different, but the C65's C64-mode KERNAL runs in 4502 mode, so that the rather ugly DOS inter-process communications can still happen.
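The three rules can be summarised as a small decision function. The following Python sketch is only a model of the rules as stated in this post, not the actual CPU logic; the function name `cpu_personality` and its parameters are hypothetical, and it treats rule (2) as applying only when the KERNAL ROM is actually visible at $E000-$FFFF.

```python
def cpu_personality(hypervisor_mode, c65_io_enabled, pc, kernal_rom_visible):
    """Pick the CPU personality for the instruction at pc.

    All parameter names are illustrative. Returns "4502" or "6502".
    """
    if hypervisor_mode:
        return "4502"  # rule (3): Hypervisor code is always 4502
    if c65_io_enabled:
        return "4502"  # rule (1): C65 / VIC-III (or VIC-IV) IO mode enabled
    if kernal_rom_visible and 0xE000 <= pc <= 0xFFFF:
        return "4502"  # rule (2): executing from the KERNAL ROM
    return "6502"      # default: plain C64 programs get a 6502


# A C64 program running from RAM at $0810 gets a 6502
print(cpu_personality(False, False, 0x0810, True))
# The KERNAL's DOS entry stub at $F7E4 runs as 4502
print(cpu_personality(False, False, 0xF7E4, True))
```

The order of the checks does not matter here, since every rule can only ever force 4502 mode; 6502 mode is what is left when none of them apply.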
The present state of play is that this is all implemented, but not yet enabled by default or tested on the MEGA65. This is one of the jobs on my list for the coming week.
It's still not clear to me why, given that the C65 has DMA, they didn't use it for data transfer in the C65. On the M65, it would even be possible to use the "32-bit linear addressing" stuff in addition to that.
Like on the C64, they got a working DOS, and then seem to have stopped. DMA can only really be used for a LOAD routine, not for providing byte-by-byte sequential access.
When switching to 6502 mode, you should also change two things: 1.) instruction execution timing (all opcodes to match the original cycle count) 2.) VIC badline emulation (like DTV). Are these also planned?
Hello,
Yes, the instruction timings also change with CPU personality. VIC badline emulation is not yet there, and will be a function of the VIC-IV, not the CPU. Badline emulation will occur when the CPU is at 1, 2 or 3.5MHz, regardless of CPU personality, i.e., it will only be disabled when the CPU is running at full speed.
This one is, in my opinion, not a good idea:
"When executing code in the KERNAL (ROM at $E000-$FFFF), the CPU should always be in 4502 mode."
It will cause problems regarding the RAM below the ROM. There are programs that use this RAM for their own code, which means you need to support illegal opcodes there. Another group of programs copies ROM into RAM, then switches ROM off and does some patches. For this group of software, you need to support 4502 instructions.
Hello,
It is currently configured to only force 4502 mode when reading from the ROM in the KERNAL space. If you copy to RAM, or bank out ROM and run code there, it will be 6502 by default, but if software wants 4502 mode, it can still ask for it. While this won't solve the copy-ROM-to-RAM-and-patch programs, it is the best compromise I can think of. Of course, for those programs that want to patch the ROM, you could start the M65 with a pure C64 ROM, and 4502 mode disabled. At the end of the day, 100% compatibility is not possible, but we can get fairly close, and have workarounds for many of the exceptions. But if you have ideas how we can do it better, please share them, so that we can take them into account.
Paul.
I understand the challenge of finding a solution that is both elegant and compatible at the same time. It is clear that some compromise needs to be made. However, the ROM to RAM copy is not a small thing, as the technique is explained in a lot of C64 books; the impact might be similar to having no illegal opcodes at all.
I think my preferred solution would be to remap the BBS7 instruction to a different opcode in 6502 mode. It requires a ROM patch and yes, that is not nice. Having the same instruction on two different opcodes is not nice either. However, this approach, combined with the remaining rules, would result in a system with few practical disadvantages: illegal opcodes would work fine, just as well as the ROM switching to 4502 mode. A possible opcode candidate would be $F2. ROM patching would be trivial: simply change $FF to $F2 in a few places. As the code stays the same size, no re-arrangement of instructions is necessary this way.
Another solution that I can think of is to make the KERNAL execution detection more advanced, for example by using the jump table at the end of the KERNAL as a signature. If the jump table is present, 4502 mode should be used. It would work, but is a lot less elegant IMO than the opcode remap.
Lastly, as you suggest in the text, not supporting this particular addressing mode of ISC is another approach, with some compatibility impact, but the impact might well be lower than the impact of not supporting ROM to RAM copying.
Hello,
Yes, life is never easy ;) The main problem with the signature detection is that the logic for that cannot be tested every clock cycle, so there has to be something to monitor this over time, and it would likely be a bit error prone. I would like the ROMs to be able to be used unmodified, if at all possible, but it could be possible to have a "maximum C65 compatibility" versus "maximum C64 compatibility" option that selects between different behaviours, and perhaps patches the ROM at the same time. I also agree that losing $FF as an ISC illegal opcode is quite likely a small loss of 6502 compatibility, and would in many ways be the simplest option. Anyway, the good thing is that all these options we are looking at are only small architectural tweaks, and we don't have to be locked in to a particular solution right now.
Paul.