Comments on Making a C64/C65 compatible computer: On cycle count predictability and related things
Paul Gardner-Stephen (http://www.blogger.com/profile/10150903760695355706)
2024-02-28T07:29:15.484+10:30

2016-07-22T01:33:41.725+09:30
This comment has been removed by the author.
The Mind (https://www.blogger.com/profile/15982535603996996554)

2016-05-15T21:49:09.542+09:30
The simplest solution here is to make the 4502 MAP instruction on the secondary cores instead be a hypervisor trap. The hypervisor can then do the memory mapping. But as I say, I haven't really come to a final position -- it will depend on how the final CPU implementation looks, as to how I can best implement memory protection of some kind, while allowing the secondary cores as much latitude as possible. It might be, for example, that IO can be enabled/disabled for each chip for each core, e.g., you can choose which core can see CIA1 or the VIC-IV etc. This would need to be in addition to protection for the RAM, so that other cores can't (without permission) scribble over the RAM another core or process is using. It would be faster if most allowed remapping can be done by the core itself, so that hypervisor traps can be avoided. It also gets a bit interesting with the 32-bit indirect ZP addressing mode, since that doesn't see the effects of any memory mapping, and so I'll need to come up with some interesting protection scheme for that as well.
As I say, there is still some thinking to do on this.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-05-15T21:33:23.218+09:30
Hmm... I'm hoping that the whole address space (less the reserved area) would be open to user tasks so that no I/O appears in the address space. Wouldn't this mean that in order for the traps to work, a call into the system (in the reserved space) would need to map in that I/O?

At any rate, the only way I can think of to handle system calls from a user task requires that the auxiliary CPUs be able to perform memory mapping, even if it's just to map in the appropriate system "ROM". I don't really know anything about the hypervisor, however.

I really wouldn't want the I/O area to be present for user tasks. Perhaps instead of the hypervisor for the auxiliary CPUs, they can do things like signal and wait? But, as I said, I think that they would need to be able to perform some memory mapping, even if it's simple ($00, $01?).

This leads to some issues because it would have to be done via an instruction if not $00 and $01, and a 6510 "personality" would need one reclaimed for the job (and even for the 6502). This is not too much of a concern for me since I personally think the auxiliary CPUs should be 4502s or 6502s anyway. I'm not sure how the 4502 MAP instruction works or what you would want.
Anonymous (https://www.blogger.com/profile/04831681461356153688)

2016-05-15T20:56:37.095+09:30
Hello,

This is more or less in line with my thinking. The primary CPU is the only one that would be able to access the hypervisor, and would certainly not be able to be stopped by the other CPUs. By limiting access to the hypervisor, all these other things should mostly fall out.

It might be, for example, that if one of the other cores asks for the hypervisor through a trap, that the calling core is suspended until the main core gets around to responding to it.

Anyway, this is still all a bit fluid while I get the new CPU design actually working, and then work through what is easier and harder to implement. The end result is that I do intend it to be possible to have such hierarchical coordination between the cores, if only because we want to use this machine for teaching computer architecture.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-05-15T20:21:48.790+09:30
I'm thinking of the situation where we have a multi-tasking operating system using multiple cores. I'd like to reserve the "primary core" for "ring 0" or privileged tasks and the others for user tasks. I think some means of interrupting the process on the auxiliary cores is going to be necessary for simplifying the task switching process. Otherwise, we need to be able to halt/stall and resume them and preserve and restore the PC from the "primary" core.

Using interrupts, this is almost done for us so long as we reserve some of the address space for the system, I'm thinking the top $F000-$FFFF for interfacing as well as being able to switch in/out $0000-$0FFF for application-specific state/system info.
If there are no tasks to run, it could be a little ugly, though.

What are your thoughts?
Anonymous (https://www.blogger.com/profile/04831681461356153688)

2016-05-15T18:32:12.156+09:30
Hello,

I might add a little interrupt switch that allows each source to be set to any core, or I might just have the sources fixed to particular cores. Not entirely decided yet. Likewise for inter-core communications.

For memory maps, they will all "see" the same 28-bit address space, but I may disable memory remapping from the auxiliary cores. For floppy emulation, they will simply be set to a memory map that contains what they need and nothing else. As on a real C65, $01 and $00 are actually $00001 and $00000, so if you map other memory, they aren't visible, so the problem of other cores accessing them is elegantly avoided, without adding any complexity.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-05-15T16:49:59.976+09:30
I'm wondering about interrupt handling/sharing amongst the CPU cores. Will there be a mechanism for determining which sources will go where? Will there be a way of generating interrupts on a core from another one?

Also, will memory mapping be available for each of the cores, independently? Will only one of the cores be able to function as the 6510/4510 with the $00, $01 IO ports? How will you handle the situation where the drive 6502s mustn't have those ports?

Daniel.
Anonymous (https://www.blogger.com/profile/04831681461356153688)

2016-05-05T07:50:15.840+09:30
Hello,

Unfortunately exposing such information would add considerably to the complexity of the CPU. It is also complicated by the lack of any unused opcodes.

That said, the pipeline is actually relatively simple and predictable in terms of cycle timing, so it will almost certainly be possible to still do cycle counting in your head as you write code. In fact, it will likely be simpler than on a regular 6502, because instructions will typically take either 1, 2 or 10 cycles (exact values to be finalised).

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-05-05T06:09:01.660+09:30
Hello Paul,

I'm in favor of the new pipelined architecture. I'm wondering if there is a way to expose either values being evaluated in interim stages or status flags to indicate what stage an instruction is in.

Just as an example case, for an INC $add instruction, you have to fetch memory, add, and write it back out - if the value stuck around while in the add phase of the pipeline, maybe we could transfer the value pre-add or post-add into the accumulator?

If status flags were available, it might be helpful when profiling code. The big difference with the existing 6502 is that you can do the math in your head about how many cycles an instruction is going to take, with branching and page boundaries. But with a pipeline, it depends on which stage the pipe is in, whether you can do some math on a different register while fetching from memory, etc. This level of complexity requires profiling tools to optimize code. I remember a simple command-line tool that Hitachi provided for the SH-4 that would visually print out the stages of the pipeline for a given set of instructions.

 - Gary
Anonymous (https://www.blogger.com/profile/03664327725235821055)

2016-04-28T05:20:19.020+09:30
Hello,

Thanks. I am looking forward to seeing what the result will be as well. I'm hoping for at least 100x in SynthMark, but I won't know until I get it all done.

For the VIC-IV, which is our enhanced version of the VIC-III, I don't yet know if it will be possible to increase the graphics RAM -- it will depend on available resources in the FPGA, which I won't know until we have everything else settled.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-04-28T05:15:24.320+09:30
Hello,

Thanks :) It is always nice to hear encouraging words.

The project is called the MEGA65 these days. The blog pre-dates that change, which is why it still appears as "C65GS" in the URL.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-04-28T02:09:15.330+09:30
Looking forward to seeing how the new CPU will perform.

Do you plan to further enhance the VicV - and perhaps increase the amount of graphics RAM beyond 128 KB, or will that break compatibility?
Solei (https://www.blogger.com/profile/12093743561530649494)

2016-04-28T00:34:05.354+09:30
Thanks for your responses, Paul. I'm excited about the C65GS/MEGA65/whatever its current project name is.
rob (https://www.blogger.com/profile/12074939979211461276)

2016-04-27T16:45:13.670+09:30
Well, it is partly because the implementation of the previous CPU happened somewhat organically, and so it isn't as efficient as possible. Also, with the new CPU, sharing the memory controller among the three cores that we need generates some significant space savings.
However, there is nothing special about the new design that makes use of any special FPGA resources.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-04-27T07:07:29.456+09:30
You also mentioned that the CPU would be "smaller"... even while it sounds bigger (to me). Is it that you're leveraging capabilities that are provided "for free" in the FPGA's environment, or is there something else that I don't know about CPU design (and there's a lot I don't know about it!)?
rob (https://www.blogger.com/profile/12074939979211461276)

2016-04-24T11:28:46.270+09:30
Hello,

So the 4510 has a 4502 CPU core in it, plus two 6526 CIAs (or are they 4526s?). I use 4502 to be specific about the fact that I am not talking about the CIAs. As for the Chameleon, its 6510 is a 6502 + 6 pin IO port, just like the 6510 in a real C64. While my terminology might not be perfect, there is nonetheless method in my madness.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-04-24T08:34:41.939+09:30
It's called a CSG 4510 VICTOR (NOT 4502, there isn't, never was, and never will be a 4502).

And the C64 and thus the Chameleon uses a 6510 (not a 6502: that is the PET, VIC20 and 1541's CPU).

Reading that gives me a headache.
Anonymous (https://www.blogger.com/profile/07736535998226717214)

2016-04-23T02:07:01.450+09:30
Saying the Mega65 has its own personality and a strong one gave me the greatest smile ever!!! MY EVER NEED TO OWN one...ever so lingers in my vein!
Froy (https://www.blogger.com/profile/03470479672215256496)

2016-04-22T11:29:50.074+09:30
Hello,

We really want to keep it compatible with the original ROM. We will probably do something dynamic that can work out which mode it should be in. It might even be that we make it so that it is in 4502 mode only when in the kernal, but 6502 mode otherwise, if the machine is otherwise in C64 mode.

As for JiffyDOS, this will be up to people to do themselves. We might be able to make an even faster loader, however, that uses custom code, and takes advantage of the fact that the MEGA65 has much faster I/O, and so could potentially lock to the true 1MHz clock of the floppy drive, and allow the drive to write to the port as fast as it can, and purely on the basis of timing, read the bits correctly on the receive side. We might even be able to do the GCR decoding on the receive side, and effectively just stream the GCR read register over the serial line. This should certainly work for 1571 and 1581 drives where we can use the shift register, but I suspect it might even be possible on a 1541, if we use illegal opcodes, e.g., with something along the lines of LAX gcr / AND #$03 / STA ioport / TXA / LSR / LSR / TAX / STA ioport / TXA / LSR / LSR / TAX / AND #$03 / STA ioport / TXA / LSR / LSR / STA ioport -- but this is just off the top of my head, there might be problems with doing this, or not.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-04-22T10:18:40.226+09:30
For the C64 mode, you should rather use your own custom modified Kernal instead of the original one, thus getting rid of the 4510 code (and perhaps optionally adding the JiffyDOS protocol for external peripherals,
too).
Anonymous (https://www.blogger.com/profile/17684373497811345881)

2016-04-22T09:40:31.511+09:30
Hello,

As you are hinting at, this is more or less what we are doing anyway. We already support C128 and C65 speed switch sequences. Adding the DTV one shouldn't be too hard, although I haven't researched exactly how it works. The problem is if things check if it works, and then try to do DTV-specific things.

As for 6502/4502 switching, it is unfortunately not that simple, because the C64-mode kernal on the C65 actually uses some 4502 instructions for the DOS, but these get run at 1MHz, and in an otherwise completely C64 context.

Paul.
Paul Gardner-Stephen (https://www.blogger.com/profile/10150903760695355706)

2016-04-22T09:23:00.278+09:30
I think the best way is to make as many components as possible optional (or modular) - and let it be up to the actual user whether he turns them on or off from the current application. These pipelining and caching (etc.) features seem slightly similar to those of the DTV (and a little bit to the SuperCPU). Thus, make the 1, 2 and 3.5 MHz modes exactly cycle-accurate, with maximal backwards compatibility for the existing old software made (or new software being made) for C64, C128 and C65 computers. Plus add your special MEGA65 (formerly 48 MHz) mode. (These have all already been there of course.)

Now, on top of that, add yet another switch to also activate the extra features with pipelining and caching (in any mode, not depending on the speed). If you make this switch activated by the same instruction sequence that the DTV uses, then you also gain another backwards compatibility above the others: partially for the DTV (more than nothing, at least). Perhaps you might also add yet another switch in the manner of the SuperCPU...
(But only for altering the speed - not the instruction set or other things, of course.)

And finally, to also make it simple for the user at the same time, add a POKE0,65 that turns on ALL of these at once. (And a POKE0,64 that turns off all of these at once, too.)

The 1 and 2 MHz modes should automatically use the 6510 instruction set, while every other mode should use the 4510 instruction set.

And then everybody will be happy.
Anonymous (https://www.blogger.com/profile/17684373497811345881)