For a while now I have been thinking about making a simple fast-loader for the MEGA65 that bypasses the C65 DOS and accesses the floppy controller directly. The topic comes up from time to time, for example among developers who want to load large files from disk. So I spent a couple of hours yesterday writing a proof-of-concept version. My design criteria were:
1. Must be able to be run from an IRQ, so that it can be used in games or demos to load in the background while other activity goes on. The C65 DOS cannot be sensibly used for this, because when it runs, it blocks all interrupts for arbitrary periods of time, which can exceed 200ms(!!!).
2. Must allow loading to any address in memory.
3. Must be small enough that it can be easily incorporated into other programs.
(1) and (3) meant that it had to be written in assembly.
So here's what I created. It is still missing a few things: it doesn't save and restore the DMA list address registers (in case you were composing a DMA job in real-time just as the IRQ triggered), and it doesn't support specifying how much of a file to load, which would allow progressive streaming of a file. Both would be fairly easy to implement. But back to what we do have, an annotated walk through the source:
First up, to demonstrate it, we have a simple BASIC header (I am running it from C64 mode, but you could almost as easily run it from C65 mode):
basic_header
!byte 0x10,0x08,<2021,>2021,0x9e
!pet "2061"
!byte 0x00,0x00,0x00
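As a sanity check on the SYS target, the header arithmetic can be sketched on a host machine in Python. This is only a model, and it assumes the program is loaded at the usual C64 BASIC start address of $0801, which the listing does not state explicitly.

```python
# Host-side model of the BASIC stub (assumption: load address is $0801).
LOAD_ADDRESS = 0x0801  # assumed, the standard C64 BASIC start

header = bytes([0x10, 0x08])             # pointer to next BASIC line
header += (2021).to_bytes(2, "little")   # line number 2021
header += bytes([0x9E])                  # token for SYS
header += b"2061"                        # SYS argument as digits
header += bytes([0x00, 0x00, 0x00])      # end of line, end of program

# The machine code starts right after the 12-byte header.
program_start = LOAD_ADDRESS + len(header)
print(program_start)  # 2061
```

Under that assumption, the 12 header bytes put program_start at 2061, which is why the stub says SYS 2061.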
Then we have the start of the demo program that uses the fast-loader. The actual fast-loader code will come a bit later. We do the usual things: make sure MEGA65 IO is enabled and the CPU is at full speed, plus some boilerplate to clear the screen, set the screen colours, and so on:
program_start:
;; Select MEGA65 IO mode
lda #$47
sta $d02f
lda #$53
sta $d02f
;; Select 40MHz mode
lda #65
sta $0
lda #$00
sta $d020
sta $d021
lda #$01
sta $0286
jsr $e544
Next it is time to set up our raster interrupt. This should all be very familiar to C64 coders:
;; Install our raster IRQ with our fastloader
sei
lda #$7f
sta $dc0d
sta $dd0d
lda #$40
sta $d012
lda #$1b
sta $d011
lda #$01
sta $d01a
dec $d019
lda #$16
sta $d018
lda #<irq_handler
sta $0314
lda #>irq_handler
sta $0315
cli
We'll get to the IRQ handler in a moment, but we will finish looking at the real-time part of the program first. The fast-loader uses a single byte state/status variable to keep track of what it is doing. If it is $00, then the loader is idle. If you want to ask it to load something, you set up the filename and load address, and then write $01 into the variable. It will go back to $00 when it's done, or have bit 7 set if there is some kind of error. This means you can check status with BEQ and BMI. The load address is updated progressively to show how far loading has got, if that's important for you to track. In the example, we load the game GYRRUS into bank 4 at $00040000:
;; Example for using the fast loader
;; copy filename from start of screen
;; Expected to be PETSCII and $A0 padded at end, and exactly 16 chars
ldx #$0f
lda #$a0
clearfilename:
sta fastload_filename,x
dex
bpl clearfilename
ldx #$ff
filenamecopyloop:
inx
cpx #$10
beq endofname
lda filename,x
beq endofname
sta fastload_filename,x
bne filenamecopyloop
endofname:
inx
stx fastload_filename_len
;; Set load address (32-bit)
;; = $40000 = BANK 4
lda #$00
sta fastload_address+0
lda #$00
sta fastload_address+1
lda #$04
sta fastload_address+2
lda #$00
sta fastload_address+3
Remember what I said about the status variable? We need to make sure it is $00 before we submit our load request. This is important because when the fast-loader initialises, it doesn't know what track the drive is on, and so it seeks back to track 0 first. So we make sure that that completes before we submit our job. If we didn't do this, reading of any sector from the disk on a real drive would hang, because the head would be on the wrong track.
;; Give the fastload time to get itself sorted
;; (largely seeking to track 0)
wait_for_fastload:
lda fastload_request
bne wait_for_fastload
Finally the fast-loader is ready, so we can then submit our job. It really is this simple:
;; Request fastload job
lda #$01
sta fastload_request
We can then go off and do whatever we want in real-time, knowing that the raster interrupt will be calling the fast-loader, and allowing it to progress in the background. For simplicity, in our demo we just wait for the fast-load to complete, and indicate if an error occurred, or if it loaded ok.
;; Then just wait for the request byte to
;; go back to $00, or to report an error by having the MSB
;; set. The request value will continually update based on the
;; state of the loading.
waiting
lda fastload_request
bmi error
bne waiting
beq done
error
inc $042f
jmp error
done
inc $d020
jmp done
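The meaning of the status byte can be summarised with a small host-side Python sketch. The function name is made up for illustration; the three cases simply mirror the BEQ and BMI checks in the waiting loop.

```python
def classify_status(status: int) -> str:
    """Mirror the BEQ/BMI checks on the fastload_request byte."""
    if status == 0x00:
        return "idle"    # BEQ taken: loader idle, or the load has finished
    if status & 0x80:
        return "error"   # BMI taken: bit 7 is set
    return "busy"        # any other value: a request is in progress

print(classify_status(0x00), classify_status(0x03), classify_status(0x80))
# idle busy error
```

A game loop would normally do this check once per frame rather than spinning, which is the whole point of running the loader from an IRQ.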
That's over and done with for real-time, so now let's look at our raster interrupt. This is also quite simple: Acknowledge the IRQ source, set border colour to white, call the fastload_irq routine, then return the border colour to black, before returning via the well known $EA81 interrupt exit handler code in the C64 KERNAL. You can of course do whatever you want, but this shows just how simple it can be. The border colour stuff is of course optional, but lets us see just how little raster time this loader uses.
irq_handler:
;; Here is our nice minimalistic IRQ handler that calls the fastload IRQ
dec $d019
;; Call fastload and show raster time used in the loader
lda #$01
sta $d020
jsr fastload_irq
lda #$00
sta $d020
;; Chain to KERNAL IRQ exit
jmp $ea81
As mentioned, I set this demo up to load GYRRUS into bank 4, just because that was a file on the disk image I had active in my MEGA65 at the time. Note that the filename has to be padded with $A0s, because the fast-load code literally compares all 16 bytes of the filename with the 16 bytes of filename in the directory sectors. It doesn't support partitions or sub-directories on the disk image. Someone could hack that in if they wanted it, but I don't think it will be necessary for almost all use-cases.
filename:
;; GYRRUS for testing
!byte $47,$59,$52,$52,$55,$53,$a0,$a0
!byte $a0,$a0,$a0,$a0,$a0,$a0,$a0,$a0
;; ----------------------------------------------------------------------------
;; ----------------------------------------------------------------------------
;; ----------------------------------------------------------------------------
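If you want the padded bytes for your own filename, a host-side helper along these lines can generate them. This is a sketch with a made-up function name, and it deliberately assumes the name only uses characters whose PETSCII code equals their ASCII code (upper-case letters, digits and space), so it does no real PETSCII conversion.

```python
def fastload_name(name: str) -> bytes:
    """Return the 16-byte, $A0-padded form of a filename.

    Assumption: only upper-case A-Z, digits and space are used, so the
    ASCII bytes can be taken as PETSCII unchanged.
    """
    allowed = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")
    if len(name) > 16 or not set(name) <= allowed:
        raise ValueError("unsupported filename for this sketch")
    raw = name.encode("ascii")
    return raw + bytes([0xA0]) * (16 - len(raw))

padded = fastload_name("GYRRUS")
# Print in a form that can be pasted after a !byte directive.
print(", ".join(f"${b:02x}" for b in padded))
```

For GYRRUS this reproduces the sixteen bytes in the filename table of the demo.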
So that was the code for our example driver of the fast load. For your own programs, you can cut everything above here away, and just keep what follows. It requires about 1.2KB, including the 512 byte sector buffer, so it's quite small in the grand scheme of things.
;; ------------------------------------------------------------
;; Actual fast-loader code
;; ------------------------------------------------------------
First up, we have the variables and temporary storage for the fast loader: The filename and length (which actually gets ignored, because of the use of $A0 padding, so can be removed at some point), the address where the user wants to load, and the state/status variable. These four variables are the only ones you need to access from your code. Everything else that follows is internal to the fast-loader.
fastload_filename:
*=*+16
fastload_filename_len:
!byte 0
fastload_address:
!byte 0,0,0,0
fastload_request:
;; Start with seeking to track 0
!byte 4
This variable keeps track of which physical track on the disk the loader thinks the head is currently over, so that we can step to the correct track:
fl_current_track: !byte 0
Then we have variables for the logical track and sector of the next 256 byte block of the file. These have to get translated into the physical track and sector of the drive, which, like the 1581, stores two blocks in each physical sector.
fl_file_next_track: !byte 0
fl_file_next_sector: !byte 0
Then finally, we have the 512 byte sector buffer. Now, this could be optimised away, by enabling mapping of the sector buffer at $DE00-$DFFF, but I couldn't be bothered remembering how to do that, and also didn't want to cause potential problems for code that also uses REU emulation or other things that might appear in the IO area. It's not that it can't be done, but rather that I just took the quick and easy path. It would be a great exercise for the reader to change this, and reduce the total size of the loader to <1KB as a result.
fastload_sector_buffer:
*=*+512
Now let's take a look at the fast-loader's IRQ handler. It basically checks if there is an active request, and if not does nothing. Then it checks if the floppy controller is busy doing something that it asked it to earlier. If so, it does nothing. But if we have an active job, and the floppy controller is not busy, this means that we can ask for the next operation to occur. The fastload_request variable doubles as the state number for the resulting simple state-machine. This approach really simplifies the code a lot, and makes it much easier to run in an interrupt.
Before going further, it is worth noting that if you run the interrupt on a normal raster IRQ, the loader will be able to load at most one block = 254 bytes of usable data per frame. This means 254 x 50 = ~12.7KB/sec in PAL or 15.2KB/sec in NTSC. If you are using a real 800KB 1581 disk, that's not a problem, because the drive will slow you down more than that. But if you are using a disk image, or one of the MEGA65's HD disk formats, then this will slow things down.
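The arithmetic behind those figures, as a quick host-side Python sketch. The frame rates are the nominal 50 and 60 frames per second, and the blocks_per_frame parameter is a hypothetical knob added purely for illustration.

```python
PAYLOAD_BYTES_PER_BLOCK = 254  # 256-byte block minus the 2-byte track/sector link

def bytes_per_second(frames_per_second: int, blocks_per_frame: int = 1) -> int:
    """Upper bound on throughput when each loader call consumes one block."""
    return PAYLOAD_BYTES_PER_BLOCK * frames_per_second * blocks_per_frame

print(bytes_per_second(50))  # PAL:  12700 bytes/s
print(bytes_per_second(60))  # NTSC: 15240 bytes/s
```

These are upper bounds only: they ignore head stepping and any time the floppy controller spends busy.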
The easy solution is to have your IRQ routine trigger multiple times per frame, or enable IRQs in the floppy controller, and have it be called on demand whenever a sector is ready. You will need to acknowledge the floppy controller interrupts, if you do that.
There is also a further ~2x speed up without doing that which is possible by modifying the loader to realise when a single sector contains two consecutive blocks of a file. It doesn't currently do this, which is a bit stupid. Fixing that would also be a great exercise for the reader.
fastload_irq:
;; If there is no active request, or the FDC is busy, do nothing,
;; as we can't progress.
;; This really simplifies the state machine into a series of
;; sector reads
lda fastload_request
bne todo
rts
todo:
lda $d082
bpl fl_fdc_not_busy
rts
fl_fdc_not_busy:
;; FDC is not busy, so check what state we are in
lda fastload_request
bpl fl_not_in_error_state
rts
fl_not_in_error_state:
It's worth explaining how the IRQ handler calls the various routines for the different states, because it uses a nice feature of the 65CE02: JMP indirect, X-indexed. This instruction basically allows you to have a jump-table without the silly push-addr-minus-one to stack trick you have to use on the C64. The resulting code is quite a lot simpler and clearer as a result:
;; Shift state left one bit, so that we can use it as a lookup
;; into a jump table.
;; Everything else is handled by the jump table
cmp #6
bcc fl_job_ok
;; Ignore request/status codes that don't correspond to actions
rts
fl_job_ok:
asl
tax
jmp (fl_jumptable,x)
fl_jumptable:
!16 fl_idle
!16 fl_new_request
!16 fl_directory_scan
!16 fl_read_file_block
!16 fl_seek_track_0
!16 fl_step_track
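To make the state numbering explicit, here is a host-side Python model of the dispatch. It is only a sketch: the function name is invented, and the FDC busy check that happens before dispatch is left out.

```python
STATE_NAMES = [
    "fl_idle",             # 0
    "fl_new_request",      # 1
    "fl_directory_scan",   # 2
    "fl_read_file_block",  # 3
    "fl_seek_track_0",     # 4
    "fl_step_track",       # 5
]

def dispatch(state: int):
    """Return (table byte offset, handler name), or None if no handler runs."""
    if state == 0 or state & 0x80:
        return None   # idle or error: fastload_irq returns early
    if state >= len(STATE_NAMES):
        return None   # the cmp #6 / bcc guard rejects unknown codes
    return state * 2, STATE_NAMES[state]  # asl: two bytes per !16 entry

print(dispatch(4))  # (8, 'fl_seek_track_0')
```

This also shows why fastload_request starts out as 4: the first thing the loader does is run fl_seek_track_0.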
The first of those state routines is the one for when the loader is idle: Just return immediately. This can be optimised away, since (1) there are plenty of other RTS instructions we could point at; and (2) it never gets called, because we have the short-circuit exit at the start of the IRQ handler. If you haven't got the idea by now: I have really just hacked this together until it worked, and then stopped to document it. Lots of opportunities for you to get involved and improve it ;)
fl_idle:
rts
The next state handler checks if we are on track 0 yet, and if not, commands a step towards track 0, which like all other floppy controller actions, will have the floppy controller busy until the step has completed. Again, our nice busy check in the start of the IRQ handler means that we can just keep stepping in this routine until we reach track 0. Note how it writes $00 into fastload_request when done, to indicate that the loader is idle and ready for a new job.
fl_seek_track_0:
lda $d082
and #$01
bne fl_not_on_track_0
lda #$00
sta fastload_request
sta fl_current_track
rts
fl_not_on_track_0:
;; Step back towards track 0
lda #$10
sta $d081
rts
As you saw in the demo driver code, to submit a new job, you write $01 into fastload_request. This causes the following routine to be run when the IRQ is next triggered. It puts $02 into fastload_request, so that it knows that it has just accepted a job, and also immediately requests the reading of the first physical sector that contains a directory block, ready for us to look for the requested file.
fl_new_request:
;; Acknowledge fastload request
lda #2
sta fastload_request
;; Start motor
lda #$60
sta $d080
;; Request T40 S3 to start directory scan
;; (remember we have to do silly translation to real sectors)
lda #40-1
sta $d084
lda #(3/2)+1
sta $d085
lda #$00
sta $d086 ; side
;; Request read
jsr fl_read_sector
rts
The routine above sets fastload_request so that this routine is called on each IRQ, i.e., as each sector of the directory is loaded. We then look through the whole 512 byte sector for a matching filename and, if one is found, change state to load the file, starting from the logical track and sector of its first block as given in the directory entry. Note that we ignore the file type, including whether the file is deleted. Again, a great opportunity for someone to improve the loader.
fl_directory_scan:
;; Check if our filename we want is in this sector
jsr fl_copy_sector_to_buffer
;; (XXX we scan the last BAM sector as well, to keep the code simple.)
;; filenames are at offset 5 in each 32-byte directory entry, padded at
;; the end with $A0
lda #<fastload_sector_buffer
sta fl_buffaddr+1
lda #>fastload_sector_buffer
sta fl_buffaddr+2
fl_check_logical_sector:
ldx #$05
fl_filenamecheckloop:
ldy #$00
fl_check_loop_inner:
fl_buffaddr:
lda fastload_sector_buffer+$100,x
cmp fastload_filename,y
bne fl_filename_differs
inx
iny
cpy #$10
bne fl_check_loop_inner
;; Filename matches
txa
sec
sbc #$12
tax
lda fl_buffaddr+2
cmp #>fastload_sector_buffer
bne fl_file_in_2nd_logical_sector
;; Y=Track, A=Sector
lda fastload_sector_buffer,x
tay
lda fastload_sector_buffer+1,x
jmp fl_got_file_track_and_sector
fl_file_in_2nd_logical_sector:
;; Y=Track, A=Sector
lda fastload_sector_buffer+$100,x
tay
lda fastload_sector_buffer+$101,x
fl_got_file_track_and_sector:
;; Store track and sector of file
sty fl_file_next_track
sta fl_file_next_sector
;; Request reading of next track and sector
jsr fl_read_next_sector
;; Advance to next state
lda #3
sta fastload_request
rts
fl_filename_differs:
;; Skip same number of chars as though we had matched
cpy #$10
beq fl_end_of_name
inx
iny
jmp fl_filename_differs
fl_end_of_name:
;; Advance to next directory entry
txa
clc
adc #$10
tax
bcc fl_filenamecheckloop
inc fl_buffaddr+2
lda fl_buffaddr+2
cmp #(>fastload_sector_buffer)+1
bne fl_checked_both_halves
jmp fl_check_logical_sector
fl_checked_both_halves:
;; No matching name in this 512 byte sector.
;; Load the next one, or give up the search
inc $d085
lda $d085
cmp #11
bne fl_load_next_dir_sector
;; Ran out of sectors in directory track
;; (XXX only checks side 0, and assumes DD disk)
;; Mark load as failed
lda #$80 ; $80 = File not found
sta fastload_request
rts
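The directory search is easier to follow in a high-level form. The Python sketch below models one 256-byte logical block; the offsets are taken from the assembly (the compare starts at X = 5, and subtracting $12 after a 16-byte match lands on offset 3), while the function name and the test data are made up.

```python
ENTRY_SIZE = 32
NAME_OFFSET = 5     # the scan loop starts comparing at X = 5
TRACK_OFFSET = 3    # X - $12 after a 16-byte match lands here
SECTOR_OFFSET = 4

def find_file(block: bytes, padded_name: bytes):
    """Search one 256-byte logical directory block for a 16-byte name.

    Returns (track, sector) of the file's first block, or None.
    Like the loader, this ignores the file type byte.
    """
    assert len(block) == 256 and len(padded_name) == 16
    for base in range(0, 256, ENTRY_SIZE):
        name = block[base + NAME_OFFSET : base + NAME_OFFSET + 16]
        if name == padded_name:
            return block[base + TRACK_OFFSET], block[base + SECTOR_OFFSET]
    return None

# Synthetic block with one entry in the third slot (made-up values).
name = b"GYRRUS" + bytes([0xA0]) * 10
block = bytearray(256)
block[64 + TRACK_OFFSET] = 41
block[64 + SECTOR_OFFSET] = 0
block[64 + NAME_OFFSET : 64 + NAME_OFFSET + 16] = name
print(find_file(bytes(block), name))  # (41, 0)
```

The loader runs this search over both 256-byte halves of each physical sector before asking for the next one.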
We now have several little utility routines related to reading sectors from the disk, including doing the conversion from 1581 logical sectors to 3.5" floppy physical sectors, and tracking the head if we aren't on the correct track already etc. If it detects that it needs to step the head, it changes fastload_request to point to a handler for that, which in turn sets it back to the handler for reading blocks of the file.
Note that I haven't actually tried this on a real disk, yet. This should be done, as there will quite likely be some subtle problem that will need shaking out, most likely with the track stepping. But it shouldn't be too hard to fix, and who knows, I might have got it right the first time ;)
fl_load_next_dir_sector:
;; Request read
jsr fl_read_sector
;; No need to change state
rts
fl_read_sector:
;; Check if we are already on the correct track/side
;; and if not, select/step as required
lda #$40
sta $d081
rts
fl_step_track:
lda #3
sta fastload_request
;; FALL THROUGH
fl_read_next_sector:
;; Check if we reached the end of the file first
lda fl_file_next_track
bne fl_not_end_of_file
rts
fl_not_end_of_file:
;; Read next sector of file
jsr fl_logical_to_physical_sector
lda $d084 ; target physical track, as set by fl_logical_to_physical_sector
cmp fl_current_track
beq fl_on_correct_track
bcc fl_step_in
fl_step_out:
;; We need to step first
lda #$18
sta $d081
inc fl_current_track
lda #5
sta fastload_request
rts
fl_step_in:
;; We need to step first
lda #$10
sta $d081
dec fl_current_track
lda #5
sta fastload_request
rts
fl_on_correct_track:
jsr fl_read_sector
rts
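The stepping decision can be modelled on a host machine as follows. This Python sketch uses the command bytes from the assembly ($10, $18 and $40), but the function names are invented and the model ignores the wait for the floppy controller to become idle between steps.

```python
STEP_TOWARDS_TRACK_0 = 0x10    # command byte used in fl_step_in
STEP_AWAY_FROM_TRACK_0 = 0x18  # command byte used in fl_step_out
READ_SECTOR = 0x40             # command byte used in fl_read_sector

def next_command(current_track: int, target_track: int):
    """Return (command byte, new current track) for one loader call."""
    if target_track == current_track:
        return READ_SECTOR, current_track
    if target_track < current_track:
        return STEP_TOWARDS_TRACK_0, current_track - 1
    return STEP_AWAY_FROM_TRACK_0, current_track + 1

def steps_until_read(current_track: int, target_track: int) -> int:
    """Count how many step commands are issued before the read command."""
    steps = 0
    while True:
        command, current_track = next_command(current_track, target_track)
        if command == READ_SECTOR:
            return steps
        steps += 1

print(steps_until_read(0, 39))  # 39
```

So reaching the directory track from track 0 costs 39 step commands, each of which has to complete before the next loader call can do anything.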
Here we have another utility routine that does the logical-to-physical track and sector conversion. Again, this basically mirrors what the 1581 does. It will need modifying to use the fast-loader on HD disks, because there will be more sectors on each side of the disk.
fl_logical_to_physical_sector:
;; Convert 1581 sector numbers to physical ones on the disk.
;; Track = Track - 1
;; Sector = 1 + (Sector/2)
;; Side = 0
;; If sector > 10, then sector=sector-10, side=1
lda #$00 ; side 0
sta $d086
lda fl_file_next_track
dec
sta $d084
lda fl_file_next_sector
lsr
inc
cmp #11 ; physical sectors 1-10 stay on side 0
bcs fl_on_second_side
sta $d085
jmp fl_set_fdc_head
fl_on_second_side:
sec
sbc #10
sta $d085
lda #1
sta $d086
;; FALL THROUGH
fl_set_fdc_head:
;; Select correct side of real disk drive
lda $d086
asl
asl
asl
and #$08
ora #$60
sta $d080
rts
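A worked version of the mapping, as a host-side Python sketch. It follows the rules stated in the routine's header comment and assumes a DD disk with 10 physical sectors per side; the function name and the extra "half" result are additions for illustration.

```python
def logical_to_physical(track: int, sector: int):
    """Map a 1581 logical (track, sector) to (track, sector, side, half).

    half is 0 for the first 256 bytes of the physical sector and 1 for
    the second 256 bytes.
    """
    phys_track = track - 1
    phys_sector = 1 + sector // 2
    side = 0
    if phys_sector > 10:
        phys_sector -= 10
        side = 1
    half = sector & 1
    return phys_track, phys_sector, side, half

print(logical_to_physical(40, 3))   # (39, 2, 0, 1)
print(logical_to_physical(40, 20))  # (39, 1, 1, 0)
```

The first result matches the values that fl_new_request writes to the FDC registers to start the directory scan at logical track 40, sector 3.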
This is the routine that really does the loading: It gets the read physical sector, works out which half of it contains the data for us, DMAs the read bytes into the destination location in memory, and then follows the block chain to the next block of the file, and detects the end-of-file marker indicated by logical track = $00.
fl_read_file_block:
;; We have a sector from the floppy drive.
;; Work out which half and how many bytes,
;; and copy them into place.
;; Get sector from FDC
jsr fl_copy_sector_to_buffer
;; Assume full sector initially
lda #254
sta fl_bytes_to_copy
;; Work out which half we care about
lda fl_file_next_sector
and #$01
bne fl_read_from_second_half
fl_read_from_first_half:
lda #(>fastload_sector_buffer)+0
sta fl_read_dma_page
lda fastload_sector_buffer+1
sta fl_file_next_sector
lda fastload_sector_buffer+0
sta fl_file_next_track
bne fl_1st_half_full_sector
fl_1st_half_partial_sector:
lda fastload_sector_buffer+1
sta fl_bytes_to_copy
;; Mark end of loading
lda #$00
sta fastload_request
fl_1st_half_full_sector:
jmp fl_dma_read_bytes
fl_read_from_second_half:
lda #(>fastload_sector_buffer)+1
sta fl_read_dma_page
lda fastload_sector_buffer+$101
sta fl_file_next_sector
lda fastload_sector_buffer+$100
sta fl_file_next_track
bne fl_2nd_half_full_sector
fl_2nd_half_partial_sector:
lda fastload_sector_buffer+$101
sta fl_bytes_to_copy
;; Mark end of loading
lda #$00
sta fastload_request
fl_2nd_half_full_sector:
;; FALLTHROUGH
fl_dma_read_bytes:
;; Update destination address
lda fastload_address+3
asl
asl
asl
asl
sta fl_data_read_dmalist+2
lda fastload_address+2
lsr
lsr
lsr
lsr
ora fl_data_read_dmalist+2
sta fl_data_read_dmalist+2
lda fastload_address+2
and #$0f
sta fl_data_read_dmalist+12
lda fastload_address+1
sta fl_data_read_dmalist+11
lda fastload_address+0
sta fl_data_read_dmalist+10
;; Copy the data from our buffer to the load address
lda #$00
sta $d704
lda #>fl_data_read_dmalist
sta $d701
lda #<fl_data_read_dmalist
sta $d705
;; Update load address
lda fastload_address+0
clc
adc fl_bytes_to_copy
sta fastload_address+0
lda fastload_address+1
adc #0
sta fastload_address+1
lda fastload_address+2
adc #0
sta fastload_address+2
lda fastload_address+3
adc #0
sta fastload_address+3
;; Schedule reading of next block
jsr fl_read_next_sector
rts
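Here is the same block handling as a host-side Python sketch. It mirrors the routine as written, including using the second link byte directly as the copy length for the last block; the function name and the sample sector contents are made up.

```python
def parse_block(physical_sector: bytes, logical_sector: int):
    """Return (next_track, next_sector, payload) for one 256-byte block.

    The payload starts after the 2-byte link. A link track of 0 marks
    the last block, in which case the loader uses the link's second
    byte as the number of bytes to copy.
    """
    assert len(physical_sector) == 512
    base = 0x100 if logical_sector & 1 else 0x000
    next_track = physical_sector[base]
    next_sector = physical_sector[base + 1]
    length = 254 if next_track != 0 else next_sector
    payload = physical_sector[base + 2 : base + 2 + length]
    return next_track, next_sector, payload

# Synthetic physical sector (made-up contents).
sector = bytearray(512)
sector[0x000:0x002] = bytes([40, 5])   # first half links on to T40 S5
sector[0x100:0x102] = bytes([0, 10])   # second half is the last block
sector[0x102:0x10C] = b"0123456789"
print(parse_block(bytes(sector), 1))   # (0, 10, b'0123456789')
```

Odd logical sectors live in the second half of the physical sector, which is why only the low bit of the sector number is needed to pick the half.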
We are now almost at the end. What we have here are the DMA lists for copying the read data to its final destination, as well as the routine and DMA list for copying a physical sector from the FDC's buffer down to fastload_sector_buffer. As previously noted, we can probably shrink the whole thing (and make it use less raster time) by avoiding that copy, if we instead fiddle the IO banking so that the floppy sector buffer is mapped at $DE00-$DFFF (there is a special bit that enables this). But what we have here works, and isn't that much slower, as the DMA doesn't take very long.
fl_data_read_dmalist:
!byte $0b ; F018B type list
!byte $81,$00 ; Destination MB
!byte 0 ; no more options
!byte 0 ; copy
fl_bytes_to_copy:
!word 0 ; size of copy
fl_read_page_word:
fl_read_dma_page = fl_read_page_word + 1
;; +2 is to skip track/header link
!word fastload_sector_buffer+2 ; Source address
!byte $00 ; Source bank
!word 0 ; Dest address
!byte $00 ; Dest bank
!byte $00 ; sub-command
!word 0 ; modulo (unused)
rts
fl_copy_sector_to_buffer:
;; Make sure FDC sector buffer is selected
lda #$80
trb $d689
;; Copy FDC data to our buffer
lda #$00
sta $d704
lda #>fl_sector_read_dmalist
sta $d701
lda #<fl_sector_read_dmalist
sta $d705
rts
fl_sector_read_dmalist:
!byte $0b ; F018B type list
!byte $80,$ff ; MB of FDC sector buffer address ($FFD6C00)
!byte 0 ; no more options
!byte 0 ; copy
!word 512 ; size of copy
!word $6c00 ; low 16 bits of FDC sector buffer address
!byte $0d ; next 4 bits of FDC sector buffer address
!word fastload_sector_buffer ; Dest address
!byte $00 ; Dest bank
!byte $00 ; sub-command
!word 0 ; modulo (unused)
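Both DMA lists split a 28-bit address into a megabyte byte, a 4-bit bank and a 16-bit offset. The host-side Python sketch below shows that split and its inverse, as I read it from the listings; the function names are invented for illustration.

```python
def split_address(address: int):
    """Split a 28-bit address into (megabyte, bank, offset) DMA list fields."""
    assert 0 <= address < (1 << 28)
    megabyte = (address >> 20) & 0xFF
    bank = (address >> 16) & 0x0F
    offset = address & 0xFFFF
    return megabyte, bank, offset

def join_address(megabyte: int, bank: int, offset: int) -> int:
    """Inverse of split_address."""
    return (megabyte << 20) | (bank << 16) | offset

print(split_address(0x0040000))                # (0, 4, 0)
print(hex(join_address(0xFF, 0x0D, 0x6C00)))   # 0xffd6c00
```

The first line is the demo's load address in bank 4, and the second reassembles the FDC sector buffer address from the fields in fl_sector_read_dmalist.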
And that's it. The loader really is quite simple, especially compared with a 1541 fast-loader. You can find the source in https://github.com/mega65/mega65-tools, just look for fastload-demo.asm.
Finally, a somewhat arbitrary screen-shot, because every blog post requires at least one, but it's kind of hard to show a fast-loader in action in a still image.