Friday 3 April 2015

About C65GS Memory

As promised, here is a very brief introduction to the memory available to a programmer on the C65GS, and some related notes:

First, there are three types of addresses you need to know about on the C65GS:

1. C64-style 16-bit CPU mapped addresses.  These are your good old friends like $0800, $D020 and $FFD2 and so on.  They are how you reference things with the CPU if you are using the machine like a normal 6502-based C64.

2. C65-style 20-bit addresses.  These are the addresses you can reference using the C65's MAP instruction.  You identify which 20-bit address, like $20000 where the C65 DOS ROM lives, you want to map somewhere in the 64KB address space, do some strange calculations, and voila, you have some piece of the 1MB address space mapped.

3. C65GS 28-bit addresses.  There aren't too many computers with 28-bit address buses, so I thought that we should have one.  The first 1MB of this space matches the C65's 20-bit address space, so $0020000 is also the C65 DOS ROM.  Needless to say, the rest of the 256MB address space holds other interesting things.

Now, for a horribly simplified memory map with only the relevant parts left in:

$0000000 - Same as $00 on C64 (CPU port)
$0000001 - Same as $01 on C64 (CPU port)
$0000002 - $000FFFF - C64 ~64KB RAM. Zero wait states. VIC-IV can see this.
$0010000 - $001F7FF - 62KB. C65 2nd 64KB RAM. Zero wait states. VIC-IV can see this.
$001F800 - $001FFFF - First 2KB of colour RAM. 1 wait state. VIC-IV can see this only for colour information, so don't try putting character sets or bitmap data there.
$0020000 - $003FFFF - 128KB C65 "ROM". Zero wait states. VIC-IV can't see this. Really a RAM! You can replace the contents.  Think of it as like fastram on an Amiga.
$8000000 - $FEFFFFF - 127MB of DDR RAM (or it will be, once I get the DDR controller working). VIC-IV can't see this yet.
$FF80000 - $FF8FFFF - 64KB colour RAM. 1 wait state. VIC-IV can see this only for colour information.

So all up, you have:

126KB "chipram" that the VIC-IV can use for bitmaps, sprites and so on.
128KB "fastram" which will have the ROM in it when you start, and after you replace the ROM, you won't have the ROM any more.
64KB colour RAM.  Works great, except when you try to run code from it for reasons I have yet to investigate.
and later, 128MB of DDR RAM, which has horrible latency, made up for only slightly by a small cache. It is currently very buggy, as described in previous posts.

So you have about 256KB of RAM for code, plus 64KB of colour RAM that can double as general storage, including for code; just don't try to run code from there right now.  In time I will fix this.

Now, for the truly dedicated, you can delaminate the 128KB chipram from its shadow RAM.  The shadow RAM is what the CPU really reads from.  Once delaminated, the chipram becomes write-only to the CPU: the VIC-IV still reads it, but the CPU can no longer tell what was put there.  In many cases, this isn't a real problem.  You can then map the 128KB shadow RAM somewhere else in the first 8MB of address space (the location is configurable), and have an extra 128KB of fast RAM.  Thus you can have 126KB of chipram, 256KB of fastram and 64KB of colour RAM.  I might disable this option later, because it makes it almost impossible to freeze a program that uses it.  Reading the data back out of the chipram would need sprite collision tricks taking many frames, or else I'd have to add some sort of horrible reflection process from the VIC-IV, all while trying not to overstrain the already overstrained memory bus on the VIC-IV side of things.

I completely concede that this is a very bizarre arrangement. It is however what you get when providing backward compatibility with a machine that was never really finished, and which in turn provides backwards compatibility with a machine that was almost ten years old at the time.  I'd also say that it adds a certain degree of charm.

Now, for those wanting an easy way to access any byte in the 28-bit address space, you can use the new-and-improved Z-indirect addressing mode.  Ordinarily it works like the indirect-Y addressing mode you know:

LDA ($nn),Y

The 4502 has the Z version:

LDA ($nn),Z

This also behaves as you would expect, dereferencing the 16-bit pointer at $nn and $nn+1.

However, if you precede this instruction with a NOP, then it dereferences a 32-bit pointer at $nn through to $nn+3, allowing easy access to single bytes anywhere in memory. So the following routine would read the colour RAM byte for the first column of the second row of the C64-mode screen (address $FF80028).

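pointer: .byte $28,$00,$F8,$0F   ; $0FF80028, little-endian; must be in the base page

LDZ #$00           ; Z offset = 0
NOP                ; prefix: treat (pointer) as a 32-bit pointer
LDA (pointer),Z    ; A = byte at $FF80028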

This instruction takes just 2 more cycles than the normal 16-bit indirect version, making for very fast access to arbitrary memory.  If the C65 BASIC were rewritten using this, it would likely be quite a lot faster (as it stands, it is about 3x slower than C64 BASIC).

Note that a 32-bit pointer can address 4GB, allowing for fun future expansion.  Consequently, you should always make sure the upper nybble is $0, so that your programs will work on the future C65GS+.

To make running programs bigger than 64KB easier, I am also part-way through implementing 32-bit addressed JMP, JSR and RTS instructions.  These will automatically map the correct 16KB of RAM to $4000-$7FFF and jump to it.  They will be selected by preceding JMP, JSR or RTS with two SED instructions.  For example, the following would jump to the routine located at addr, where addr is a 32-bit address:

SED
SED
JMP <<addr
.word >>addr

This avoids the need to do crazy bank switching calculations every time you want to call a little routine, or return from one.

Indeed, both these enhancements were planned in response to how horrible and inefficient it was to endlessly use DMA lists and the MAP instruction.  Now I just need to finish the far-jump support, and then make some tools that can actually use these features.  Of course, these two features make it much easier to contemplate targeting a C compiler at the C65GS, because the compiler can ignore all banking apart from the requirement that each function be less than 16KB long, or be broken into 16KB pieces.

1 comment:

  1. If you're considering adapting a C compiler, I'd suggest you take a look at vbcc:
    http://www.compilers.de/vbcc.html
    It's efficient and a lot smaller than the big names.
