Tuesday 12 March 2019

Speeding up formatting and freezing with multi-sector SD card writes

The MEGA65 format utility can take a terribly long time to run, making you feel rather much more like you are in the 1980s than you might like to, especially when formatting a large capacity 64GB SDXC card or similar.   This is because until now we have not supported multi-sector writes to the SD card.  As a result, the SD card controller must read an entire flash block (often 64KB or larger), modify the 512 byte sector we have written, erase the entire 64KB block in the flash, and then put the new data back down.  The end result is that it is about 10x slower or worse than it could be.

This problem also affects the freeze menu that has to save ~512KB of data every time you freeze.  Because the SD cards are smart internally, they tend to ignore writes with unchanged data, but even then, you have the penalty of the ~64KB block read internally in the card.  Thus freezing a problem that had very different memory contents than the last frozen program could take a number of seconds.  This was quite annoying when all you might want to do in the freeze menu is toggle between PAL and NTSC, for example.

Thus I figured the time had come to attack the root cause of these problems and implement multi-sector writes on the SD card.  But first I had to figure out exactly how they work. This was not as easy as it sounded, as there is a lot of contradictory information out there.

The only certain thing, was that CMD25 should be used instead of CMD24 to initiate a multi-block write. Some sources said you should also use a CMD18 to say how many blocks you are going to write, and others said you should use a CMD23 after to flush the SD card's internal cache.  Then everyone seemed to disagree on the actual mechanics of multi-sector writes.

What I figured out was:  You don't need CMD18 or CMD23, just CMD25.  Then use the 0xFC data start token instead of 0xFE when writing the first sector. Then just keep writing extra sectors, this time with an 0xFC token until done, after which you should write an 0xFD token (without a sector of data following it) to mark the end of the multi-sector write.  If you try to write data after the 0xFD, then really bad/weird things happen that require you to physically power things off and on again.

In the end, the main part of the change to the low-level SD card controller was quite small:

if write_multi = '0' then
+                -- Normal CMD24 write
                 txCmd_v := WRITE_BLK_CMD_C & addr_i & FAKE_CRC_C;  -- Use address supplied by host.
-                addr_v  := unsigned(addr_i);  -- Store address for multi-block operations.
+                state_v    := START_TX;  -- Go to this FSM subroutine to send the command ...
+                rtnState_v := WR_BLK;  -- then go to this state to write the data block.
+              elsif write_multi = '1' and write_multi_first='1' then
+                -- First block of multi-block write
+                -- We should begin things with CMD25 instead of CMD24, but
+                -- then we don't need to repeat it for each extra block
+                txCmd_v := WRITE_MULTI_BLK_CMD_C & addr_i & FAKE_CRC_C;  -- Use address supplied by host.
+                state_v    := START_TX;  -- Go to this FSM subroutine to send the command ...
+                rtnState_v := WR_BLK;  -- then go to this state to write the data block.
+              else
+                -- We are in a multi-block write.  So just go direct to WR_BLK
+                state_v    := WR_BLK;       -- Go to this FSM subroutine to send the command ...               
               end if;

Essentially the change was just to write the correct command for starting a multi-block write, and to write subsequent blocks with the correct data token at the start, and without re-sending any CMD25 or anything else. I actually managed to get that part correct on almost the first go, once I had figured the mechanics out, which rather surprised me.  But it took quite a bit more effort to get FDISK/FORMAT and the hypervisor freeze routines to work properly with it.  Indeed there are still a few wrinkles to work out. For example, when copying the frozen program to a new freeze slot, trying to use multi-sector writes causes things to go haywire. So for now, I have just disabled that.

That said, the changes to the various programs are actually quite small as well.  It's just the usual problem of writing assembly language and low-level C code that works reliably.  Add in the fun of CC65 not telling you when you have run out of memory during compilation, and life remains "interesting".  For example, here are the changes to the freeze routine:

       jsr copy_sdcard_regs_to_scratch

        ; Save current SD card sector buffer contents
-       jsr freeze_write_sector_and_wait
+       jsr freeze_write_first_sector_and_wait

        ; Save each region in the list
        ldx #$00
@@ -31,6 +31,8 @@ freeze_next_region:
        cmp #$ff
        bne freeze_next_region

+       jsr freeze_end_multi_block_write

        jsr sd_wait_for_ready_reset_if_required

-       ; Trigger the write
-       lda #$03
+       ; Trigger the write (subsequent sector of multi-sector write)
+       lda #$05
        sta $d680

        jsr sd_wait_for_ready
@@ -168,6 +206,13 @@ freeze_write_sector_and_wait:

+       jsr sd_wait_for_ready
+       lda #$06
+       sta $d680
+       jsr sd_wait_for_ready
+       rts

In short, use the new commands $04 (begin multi-sector write), $05 (write next sector of multi-sector write) and $06 (finish multi-sector write) when writing to $D680, the SD card control register. The changes to FDISK/FORMAT were similarly simple, again, just changing the bulk-erase function to use these new commands.  While I was there, I also implemented a little SD card reading speed test, because I was curious how fast (or slow) the SD card was going.  My 8GB class 4 card can read more than 700KB/second. One day I will implement the 4-wire interface, which will allow speeds 8x faster, but for now 700KB/sec is ample.

All up, the result is already very pleasing:  Formatting an SD card is easily 10x faster, taking less than 1 minute for an 8GB card now, instead of 10 minutes or more.  You can see how fast it is in this video:

Freezing is now also MUCH faster, reliably now only about 1 second to get to the menu on my machine here.  The biggest delay when freezing is now waiting for the monitor to resync if the frozen program was PAL (as the freeze menu is always 60Hz NTSC for better monitor compatibility).  You can see how fast the freeze menu is in the videos below:

This time, we will have the English video first, and the German one below:

Und auf Deutsch:

No comments:

Post a Comment