Saturday, 8 November 2014

Easily Accessing All of Memory

As I have begun work on writing a general-purpose Unicode string renderer for the C65GS, it has got me right at the coal face of developing software to run on this machine, as compared to just making the machine. Creator and user have rather different needs and experiences.
Until now, I had simply assumed that I would use DMA and memory banking to provide access to all of memory.  Technically it provided everything that one could need.  Once I started to write software for the machine, it became immediately apparent that these methods were fine for accessing slabs of memory, but would be rather inconvenient for the normal use case of reading or writing some random piece of memory somewhere.
What I found was that if I had a pointer to memory and wanted to PEEK or POKE through that pointer, it was going to be a herculean task, and one that would waste many bytes of code and cycles of CPU to accomplish -- not good for a task that is the mainstay of software.
I was also reminded that the ($nn),Y operations of the 6502 are essentially pointer-dereference operations.  So I thought, why don't I just allow the pointers to grow from 16 bits to 32 bits?  Then one could just use ($nn),Y or ($nn),Z operations to act directly on distant pieces of memory.
The slight problem with this is that the 4502 has all 256 opcodes occupied, so I couldn't just assign a new one.  I would need some sort of flag to indicate what size the pointers should be.  This had to be done in a way that would not break existing 6502 or 4502 code.
The experience of the 65816 led me to think that a global flag was not a good idea, because it makes it really hard to work out what is going on just by looking at a piece of code, especially where instruction lengths change.
So I decided to go for a bit of an ugly hack: if an instruction that uses the ($nn),Z addressing mode immediately follows an EOM instruction (which is what NOP is called on the 4502), then the pointer is 32 bits instead of 16 bits.
While ugly, it seems to me that it should be safe: no 6502 code uses ($nn),Z, because that addressing mode doesn't exist on the 6502. Similarly, there is so little C65 software that it is unlikely that any of it uses ($nn),Z, and even less likely that it has an EOM just before such an instruction.
In fact, in the process of implementing 32-bit pointers, I discovered that ($nn),Z on the 4510 was actually doing ($nn),Y, among other bugs.  So clearly the C65 ROM mustn't have even been using the addressing mode at all!
Here is the summary of how this new addressing mode works in practice.  The text below is as it appears in the C65GS System Notes, which are being developed with the help of the community.

32-bit Memory Addresses using 32-bit indirect zero-page indexed addressing

The ($nn),Z addressing mode is normally identical in behaviour to ($nn),Y other than that the indexing is by the Z register instead of the Y register.  That is, two bytes of zero-page memory are used to form a 16-bit pointer to the address to be accessed. However, if an instruction using the ($nn),Z addressing mode is immediately preceded by an EOM instruction, then it uses four bytes of zero-page address to form a 32-bit address.  So for example:

zppointer: .byte $11,$22,$33,$04

ldz #$05
eom
lda (zppointer),Z

This would load the contents of memory location $04332216 (the pointer value $04332211 plus the $05 in Z) into the accumulator.
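The address arithmetic can be checked with a small model. The following Python sketch is not C65GS code; it only mimics how the pointer is formed. The function name `effective_address`, the choice of $10 as the location of zppointer, and the wrap-around of the sum at the pointer width are assumptions made for illustration. The least-significant-byte-first ordering is inferred from the example above.

```python
def effective_address(zero_page, nn, z, eom_prefixed):
    """Model the address formed by ($nn),Z, with or without an EOM prefix."""
    size = 4 if eom_prefixed else 2  # pointer width in bytes
    if nn + size > len(zero_page):
        raise ValueError("pointer runs past the end of zero page (not modelled)")
    # Assumption: pointer bytes are stored least-significant byte first.
    pointer = int.from_bytes(bytes(zero_page[nn:nn + size]), "little")
    # Assumption: adding Z wraps around at the pointer width.
    return (pointer + z) & ((1 << (8 * size)) - 1)


zero_page = bytearray(256)
# zppointer placed at $10 (an arbitrary choice for this sketch)
zero_page[0x10:0x14] = bytes([0x11, 0x22, 0x33, 0x04])

print(hex(effective_address(zero_page, 0x10, 0x05, eom_prefixed=True)))   # 0x4332216
print(hex(effective_address(zero_page, 0x10, 0x05, eom_prefixed=False)))  # 0x2216
```

With the EOM prefix all four bytes are used and the result matches the address quoted above; without it only the first two bytes form the pointer.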

LDA, STA, EOR, AND, ORA, ADC and SBC are all available with this addressing mode.

Memory accesses made using 32-bit indirect zero-page indexed addressing require three extra cycles compared to 16-bit indirect zero-page indexed addressing: one for the EOM, and two for the extra pointer value fetches.

This makes it fairly easy to access any byte of memory in the full 28-bit address space.  The upper four bits should be zeroes for now, so that in future we can expand the C65GS to a 4GB address space.
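As a rough illustration of the rule about the upper four bits, the Python sketch below builds the four pointer bytes for a given address and rejects anything outside the current 28-bit space. The helper name `pointer_bytes` is hypothetical, and the least-significant-byte-first ordering is assumed from the $11,$22,$33,$04 example.

```python
ADDRESS_BITS = 28  # current C65GS address space, per the text


def pointer_bytes(address):
    """Return the four zero-page bytes for a 32-bit pointer to `address`."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("upper four bits of the address must be zero")
    # Assumption: least-significant byte first, as in the example pointer.
    return list(address.to_bytes(4, "little"))


print([f"${b:02X}" for b in pointer_bytes(0x04332211)])
# ['$11', '$22', '$33', '$04']
```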

1 comment:

  1. Paul, looking at this I'm very glad! I was through learning programming 65816 very interested to create indirect of indirect access 'cause 65816 24-bit address space is by bankswitching, I wanted to override it and add else one bankswitching meaning that MMU of C128 allows to operate with 16 banks, but collaboration with 65816 is not possible and I left that idea...
    So, for C128 or C64 with SuperCPU must to be enough 16MB RAM.. You reach 4GB = 256 banks of 16MB looking at it that way...
    Really want to say: you're building computer for 21st century and I want to programming it!