Monday 1 August 2016

Revenge of the Decimal Flag

[Deutsch Übersetzen unten]

I first learned to hate the 6502's decimal flag when I was about 17.  I was still at school, and was offered a job of writing the software for a dual-6502 based 30' (8+m) industrial roll former (the machine that puts the corrugations into corrugated iron sheeting).  If was a bit older and wiser, I might have thought twice about it, but at the time it seemed like a great idea. Anyway, the process went surprisingly well except for one intermittent bug: Sometimes it wouldn't count the distances out correctly.

This caused a hair raising last 3 days before the blasted thing was due to be shipped off to Argentina while we tried to track down the source.  The cause turned out to be that the 6502, while specified in the data-sheet to start up with the decimal flag clear, starts up, in fact, with the decimal flag only usually clear.  Needless to say, one of the absolute first things that I did with the MEGA65 was make sure that its CPU always starts up with the decimal flag clear.  End of problem. Well, so I thought...

Some of you will be aware that I have been trying to track down the source of some nasty bugs in the Hypervisor, where making trap calls into the Hypervisor would sometimes fail for seemingly inexplicable reasons.  This came to a head when I tried to build the new Disk Menu program into the Hypervisor ROM, instead of the amazingly horrible diskchooser thing that I cobbled together back in the beginning.

With the Disk Menu built in (running in user mode, not in the Hypervisor, but installed automatically at $C000 by the Hypervisor), the trap bug was happening consistently.  Big problem.  Especially since months of looking at it intermittently failed to provide a simple explanation.

Today in the lab we had a breakthrough.  Ben pointed out that the Hypervisor checkpoint debug system was displaying a corrupted message when we tried to debug the Disk Menu program.  This gave us a clue that I started to follow, and with a bit of poking around and following the single-step trace output (Gurce, please feel free to make that program that can show the instruction disassembly for the serial monitor as soon as you like :), I realised that the Hypervisor was incorrectly calculating the skip address when stepping over the message for a checkpoint.  It was adding #$01 to #$1D and getting #$24 as the result.

I was about to start pulling my hair out and wonder exactly what had gone so badly to pot that the ALU was now not even able to compute a simple addition.  Then some little neuron in the back of my mind told me that it looked like the result of a Binary Coded Decimal (BCD) addition.  I then cast my eyes to the right on the trace output to look at the CPU status register, and sure enough, there was the evil "D" staring at me: The CPU was in decimal mode! In the Hypervisor!

It then occurred to me that when the CPU traps to the hypervisor, it preserves the D flag's value, as well as saving it in the hyper_p register for restoration on exit.  A single line change to the code there has fixed that (and I also added a CLD instruction to the main Hypervisor trap entry point because I am paranoid).  Then it was time to go home, so we will have to try it out in the morning.  Hopefully it will let us finally get the Disk Menu program working, which will be a very big step forward.

This still leaves the mystery of how the D flag got set in the processor flags to begin with.  It really shouldn't have, as the Disk Menu program doesn't use decimal mode, either.  That will have to wait until tomorrow as well to be investigated.

For now, I am just happy that we have finally found the main bug, and can hopefully start moving forward again.

--

Ich war nur 17, wenn ich erst gelernt das "Dezimal Modus" Fahne des 6502s zu hassen. Ich war noch in der Schule und bekam ein Job, das Software ein 8,7m lang Roll Former Maschine zu schreiben. Es benutzt ein doppel-6502 Platte und hatte nur 8KB RAM pro CPU. Wenn ich weiser war, dann würde ich es nicht akzeptiert haben.  Alles ging ziemlich gut. Dass ist, bis wir hatten nur drei Tage, es zu fertigstellen, bevor es an Bord ein Schiff nach Argentinien muss. Alles funktioniert gut, außer dass manchmal zählt es die Lange falsch. Endlicht entdeckt uns, dass nur meistens gestartet ein 6502 mit dem D-Fahne leer, obwohl das Data-Sheet sagt es immer ohne D-Fahne starten würde.

Natürlich mit dem MEGA65, einige die erste Sache, dass ich gemacht hatte, war die D-Fahne aus am Reset machen. Ich hatte gedacht, dass alles war am Ende mit der blöden D-Fahne. Aber das war nicht so.

Wir haben für ein paar Monaten ein böse Bug im Hypervisor gekriegt. Manchmal, wenn eine Programme ein Hypervisor-Trap gerufen, dann wird es falsch gehen.  Ab und zu hatten wir es untersucht ohne Erfolg. Dann Heute im unseren Labor hat Ben bemerkt, dass wann wir die neue Disk Menu Programme debuggt, dass eine von Checkpoint Nachrichten war immer beschädigt. Endlich hatten wir ein reproduzierbar Bug, dass wir könnten untersuchen.

Nur eine Stunde später hatte ich entdeckt, dass die blöde D-Fahne war auf. Im Hypervisor! Ich war gar nicht glücklich.  Aber wie bekam die D-Fahne auf?  Dann erkannte ich, dass ich der D-Fahne Wert konserviert, wenn ein Hypervisor Trap auftritt.  Es braucht nur Eine Linie zu reparieren, nach wir hatten das Problem entdeckt. Aber es war schon Feierabend. Deshalb müssen wir Morgen prüfen, wenn es das Problem wirklich repariert.

No comments:

Post a Comment