Sunday 3 January 2016

Making Hypervisor debugging easier

The hypervisor has bugs. This isn't really a surprise.

The problem is how to make it easy and efficient to debug the hypervisor.  I can already put break-points on the CPU via the serial monitor, however, that is a rather frustrating process, partly because the assembler I am using doesn't emit a nice symbol table to let me easily work out where to break.  So I wanted an easier approach.

I know there is a bug in the DOS function for opening and reading from a directory from user-land, which makes the whole machine reset back to kickstart trying to boot the machine.  This was the catalyst for me to find a nicer way.

So what I have done is to implement a checkpoint function in the hypervisor, which writes characters directly to the serial monitor interface.  With that, I could easily emit messages indicating where in the hypervisor the code has reached, and thus help to track down where things are going west.

I wanted this checkpoint system to be as simple to use as possible, so that I could easily add and remove checkpoints throughout the hypervisor code, but at the same time I want the checkpoint routine to show useful information, such as the address it was called from, and the current contents of the major registers.  So with something like:

   jsr checkpoint

I want it to produce output on the serial interface like:

Checkpoint @ $886C A:$0C, X:$04, Y:$10, Z:$03, P:$34 :

That would be a big step forward, but what I would really like, is to be able to attach text messages to each checkpoint, so that I can very easily relate them back to the source code, e.g.:

   jsr checkpoint
   .byte 0,"dos_readdir <1>",0

Should generate something like:

Checkpoint @ $886C A:$0C, X:$04, Y:$10, Z:$03, P:$34 :dos_readdir <1>  

That way, with checkpoints throughout the code, I should easily be able to generate progress logs of the hypervisor like the following, which traces a single call to dos_readdir:

Checkpoint @ $886C A:$0C, X:$04, Y:$10, Z:$03, P:$34 :dos_readdir <1>           
Checkpoint @ $8885 A:$00, X:$04, Y:$10, Z:$03, P:$37 :dos_readdir <2>
Checkpoint @ $889C A:$02, X:$00, Y:$1A, Z:$03, P:$37 :dos_readdir <3>
Checkpoint @ $88DC A:$80, X:$10, Y:$1A, Z:$03, P:$37 :drd_isdir
Checkpoint @ $88F4 A:$00, X:$FF, Y:$1A, Z:$03, P:$B5 :drce1
Checkpoint @ $8904 A:$02, X:$00, Y:$1A, Z:$03, P:$37 :drce1 <2>           
Checkpoint @ $8915 A:$81, X:$00, Y:$1A, Z:$03, P:$B5 :drce1 <3>           
Checkpoint @ $8923 A:$81, X:$00, Y:$1A, Z:$03, P:$B5 :drce_next_piece <1>
Checkpoint @ $8960 A:$00, X:$1B, Y:$0C, Z:$03, P:$37 :drce_next_piece <2>
Checkpoint @ $8986 A:$00, X:$00, Y:$0C, Z:$03, P:$36 :drce_next_piece <3>
Checkpoint @ $8A1A A:$00, X:$01, Y:$03, Z:$04, P:$36 :drce_ignore_lfn_piece
Checkpoint @ $8A39 A:$C0, X:$01, Y:$10, Z:$04, P:$B5 :drce_ignore_lfn_piece <2>
Checkpoint @ $8923 A:$C0, X:$01, Y:$10, Z:$04, P:$B5 :drce_next_piece <1>
Checkpoint @ $8960 A:$00, X:$1B, Y:$0C, Z:$04, P:$37 :drce_next_piece <2>
Checkpoint @ $8986 A:$00, X:$00, Y:$0C, Z:$04, P:$36 :drce_next_piece <3>
Checkpoint @ $89B8 A:$67, X:$05, Y:$0B, Z:$00, P:$36 :drce2
Checkpoint @ $89DA A:$2E, X:$0B, Y:$1A, Z:$00, P:$36 :drce3
Checkpoint @ $89FC A:$38, X:$0D, Y:$20, Z:$00, P:$36 :drce4
Checkpoint @ $8A1A A:$00, X:$0D, Y:$20, Z:$00, P:$36 :drce_ignore_lfn_piece
Checkpoint @ $8A39 A:$E0, X:$0D, Y:$10, Z:$00, P:$B5 :drce_ignore_lfn_piece <2>
Checkpoint @ $8923 A:$E0, X:$0D, Y:$10, Z:$00, P:$B5 :drce_next_piece <1>
Checkpoint @ $8A77 A:$22, X:$1B, Y:$0B, Z:$00, P:$35 :drce_shortname
Checkpoint @ $8A96 A:$5F, X:$1B, Y:$00, Z:$00, P:$34 :drce_shortname <2>
Checkpoint @ $8ABA A:$31, X:$0B, Y:$0B, Z:$00, P:$37 :drce5
Checkpoint @ $8B28 A:$0D, X:$0B, Y:$0B, Z:$00, P:$35 :drce_already_have_long_name
Checkpoint @ $8B78 A:$22, X:$04, Y:$0B, Z:$00, P:$35 :drce_fl <1>
Checkpoint @ $8BAB A:$01, X:$04, Y:$10, Z:$00, P:$37 :drce_noteof

The source code for part of the hypervisor that produces the above log has simple and easy to read (and later remove) checkpoint annotations:

; Get the current file entry, and advance pointer
; This requires parsing the current directory entry onwards, accumulating
; long filename parts as required.  We only support filenames to 64 chars,
; so long names longer than that will get ignored.
; LFN entries have an attribute byte of $0F (normally indicates volume label)
; LFN entries use 16-bit unicode values. For now we will just keep the lower
; byte of these

jsr checkpoint
.byte 0,"dos_readdir <1>",0

; clear long file name data from last call
lda #0
sta dos_dirent_longfilename_length

jsr checkpoint
.byte 0,"dos_readdir <2>",0
jsr dos_file_read_current_sector

jsr checkpoint
.byte 0,"dos_readdir <3>",0

ldx dos_current_file_descriptor_offset
lda [dos_file_descriptors+dos_filedescriptor_offset_mode],x
cmp #dos_filemode_directoryaccess
beq drd_isdir
cmp #dos_filemode_end_of_directory
bne drd_notadir

jsr checkpoint
.byte 0,"dos_readdir <4>",0

The checkpoint function itself is not too complex. The main things it needs to do is to save the major registers into some scratch space, pop the return address of the stack, so that we can show the call address in the output, and also update the return address to skip any message to be displayed.  

Other than that, it is mostly focused on preparing the message to be written to the serial monitor interface.  To keep the hardware simple, the serial monitor doesn't look after the timing between characters, so if the checkpoint routine sends characters too quickly, they will be lost. Thus the checkpoint routine has a loop to that waits between sending successive characters.  Since each checkpoint message is about 100 bytes long, and the serial monitor runs at 230400bps, this means we should be able to output 200 checkpoint messages per second, if we get the delay loop exactly right.

For those with a morbid interest in these things (or who would like to spot bugs for me), here is the checkpoint function in its entirety:

; Routine to record the progress of code through the hypervisor for
; debugging problems in the hypervisor.
; If the JSR checkpoint is followed by $00, then a text string describing the
; checkpoint is inserted into the checkpoint log.
; Checkpoint data is recorded in the 2nd 16KB of colour RAM.

; Save all registers and CPU flags
sta checkpoint_a
stx checkpoint_x
sty checkpoint_y
stz checkpoint_z
sta checkpoint_p

; pull PC return address from stack
; (JSR pushes return_address-1, so add one)
adc #$01
sta checkpoint_pcl
adc #$00
sta checkpoint_pch

; Only do checkpoints visibly if switch 12 is set
lda $d6f1
and #$10
beq cp9

inc $d020

; Write checkpoint byte values out as hex into message template
ldx checkpoint_a
jsr checkpoint_bytetohex
sty msg_checkpoint_a+0
stx msg_checkpoint_a+1
ldx checkpoint_x
jsr checkpoint_bytetohex
sty msg_checkpoint_x+0
stx msg_checkpoint_x+1
ldx checkpoint_y
jsr checkpoint_bytetohex
sty msg_checkpoint_y+0
stx msg_checkpoint_y+1
ldx checkpoint_z
jsr checkpoint_bytetohex
sty msg_checkpoint_z+0
stx msg_checkpoint_z+1
ldx checkpoint_p
jsr checkpoint_bytetohex
sty msg_checkpoint_p+0
stx msg_checkpoint_p+1
ldx checkpoint_pch
jsr checkpoint_bytetohex
sty msg_checkpoint_pc+0
stx msg_checkpoint_pc+1
ldx checkpoint_pcl
jsr checkpoint_bytetohex
sty msg_checkpoint_pc+2
stx msg_checkpoint_pc+3

; Clear out checkpoint message
ldx #39
lda #$20
cp4: sta msg_checkpointmsg,x
bpl cp4
; Read next byte following the return address to see if it is $00,
; if so, then also store the $00-terminated text message that follows.
; e.g.:
; jsr checkpoint
; .text 0,"OPEN DIRECTORY",0
; to record a checkpoint with the string "OPEN DIRECTORY"
ldy #$00
lda (<checkpoint_pcl),y

bne nocheckpointmessage

; Copy null-terminated checkpoint string
ldx #$00
cp3: lda (<checkpoint_pcl),y
beq endofcheckpointmessage
sta msg_checkpointmsg,x
cpy #40
bne cp3

; Skip $00 at end of message

; Advance return address following any checkpoint message
adc checkpoint_pcl
sta checkpoint_pcl
lda checkpoint_pch
adc #$00
sta checkpoint_pch

; Only do checkpoints visibly if switch 12 is set
lda $d6f1
and #$10
beq checkpoint_return

; output checkpoint message to serial monitor
ldx #0
cp5: lda msg_checkpoint,x
sta hypervisor_write_char_to_serial_monitor

; delay at least 2,000 cycles to allow character to be written
; each inner loop is 2 + 256 * (2+3) = ~1,250 cycles
; so 2 such loops should take long enough
ldy #2
ldz #0
cp6: inz
bne cp6
bpl cp6

cmp #10
bne cp5
; restore registers
lda checkpoint_p
lda checkpoint_a
ldx checkpoint_x
ldy checkpoint_y
ldz checkpoint_z

; return by jumping to the 
jmp (checkpoint_pcl)

and #$f0
jsr checkpoint_nybltohex
and #$0f
jsr checkpoint_nybltohex
and #$0f
ora #$30
cmp #$3a
bcs cpnth1
cpnth1: adc #$06

; checkpoint message
msg_checkpoint:      .byte "Checkpoint @ $"
msg_checkpoint_pc:    .byte "%%%% A:$"
msg_checkpoint_a:     .byte "%%, X:$"
msg_checkpoint_x:     .byte "%%, Y:$"
msg_checkpoint_y:     .byte "%%, Z:$"
msg_checkpoint_z:     .byte "%%, P:$"
msg_checkpoint_p:     .byte "%% :"
msg_checkpointmsg:    .byte "                                        "
     .byte 13,10  ; CR/LF