Sunday, 2 May 2021

Debugging the 32-bit virtual-register instructions

The MEGA65 has a function where you can use A,X,Y and Z together as a 32-bit virtual register, so that 32-bit operations can be done much less painfully.

For example, to add two 32-bit values, on a 6502 you need:

CLC
LDA val1+0
ADC val2+0
STA out+0
LDA val1+1
ADC val2+1
STA out+1
LDA val1+2
ADC val2+2
STA out+2
LDA val1+3
ADC val2+3STA out+3

That's a lot of instructions and CPU cycles, and plenty of chance to get copy-paste errors as you do the carry through the various bytes.

What would be nice, is to be able to do:

CLC
LDQ val1
ADQ val2
STQ out

And the MEGA65 makes this possible, by using special prefixes on various instructions. So to do the above, you put the "next instruction is a Q instruction" prefix (two NEG instructions) on the front of the normal version of the instruction, so LDQ val becomes:

NEG
NEG
LDA val


So our whole little 32-bit addition using Q would look like this fully expanded:

CLC
NEG
NEG
LDA val1
NEG
NEG
ADC val2
NEG
NEG
STA out

But you don't need to do this, because most C64 assemblers now support MEGA65's 45GS02 CPU, and will let you just do "ADQ $1234" etc.

So that's all great, except that the instruction implementation on the MEGA65 had some timing closure problems, as it took too long to get the A,X,Y and Z registers, potentially do some 32-bit operation on them with a long carry-chain, and then get the results back to the A,X,Y and Z registers again.

I started hacking away at fixing those problems, which then led to the need for a convenient test harness for verifying that the instructions work correctly.

I ended up writing this using CC65, with a little helper routine in assembly language that tests the instruction.  The helper routine looks like this:

  /* Setup our code snippet:
     SEI
     ; LDQ $0380
     NEG
     NEG
     LDA $0380
     ; Do some Q instruction
     CLC
     NEG
     NEG
     XXX $0384
     ; Store result back
     ; STQ $0388
     NEG
     NEG
     STA $0388
     ; And store back safely as well
     STA $038C
     STX $038D
     STY $038E
     STZ $038F
     CLI
     RTS
   */
unsigned char code_snippet[31]=
  {
   0x78,0x42,0x42,0xAD,0x80,0x03,0x18,0x42,0x42,0x6D,0x84,0x03,0x42,0x42,0x8d,0x88,
   0x03,0x8d,0x8c,0x03,0x8e,0x8d,0x03,0x8c,0x8e,0x03,0x9c,0x8f,0x03,0x60,0x00
  };
#define INSTRUCTION_OFFSET  9                
 

 Then to run a test, we can just mash the right values into $0380-$0387, and check the results in $0388-$038F (or $0384-$0387, if testing an RMW instruction):

  // Run each test
  for(i=0;tests[i].opcode;i++) {
    expected= tests[i].expected;
    // Setup input values
    *(unsigned long*)0x380 = tests[i].val1;
    *(unsigned long*)0x384 = tests[i].val2;
    
    code_buf[INSTRUCTION_OFFSET]=tests[i].opcode;
    __asm__ ( "jsr $0340");
    if (tests[i].rmw) result_q= *(unsigned long*)0x384;
    else result_q= *(unsigned long*)0x388;
    if (result_q!=expected) {
      snprintf(msg,64,"FAIL:#%d:$%02X:%s",
           (int)i,(int)tests[i].opcode,tests[i].instruction);
      print_text(0,line_num++,2,msg);
      snprintf(msg,64,"     Expect=$%08lx, Saw=$%08lx",expected,result_q);
      print_text(0,line_num++,2,msg);
      errors++;
    if (line_num>=23) {
    print_text(0,line_num,8,"TOO MANY ERRORS: Aborting");
    while(1) continue;
      }
    }
  }
  snprintf(msg,64,"%d tests complete, with %d errors.",
       i,errors);
  print_text(0,24,7,msg);

Then the last key part, was to make a simple way to define the tests. I do this using a struct in C, which makes life much easier to add new tests: Just add the appropriate single line to the tests block:

struct test tests[]=
  {
   // ADC - Check carry chain works properly
   {0,0x6d,"ADC",0x12345678,0x00000000,0x12345678},
   {0,0x6d,"ADC",0x12345678,0x00000001,0x12345679},
   {0,0x6d,"ADC",0x12345678,0x00000100,0x12345778},
   {0,0x6d,"ADC",0x12345678,0x00000101,0x12345779},
   {0,0x6d,"ADC",0x12345678,0x000000FF,0x12345777},
   {0,0x6d,"ADC",0x12345678,0x0000FF00,0x12355578},
   {0,0x6d,"ADC",0x12345678,0x0DCBA989,0x20000001},
   // EOR
   {0,0x4d,"EOR",0x12345678,0x12340000,0x00005678},
   {0,0x4d,"EOR",0x12345678,0x00005678,0x12340000},
   // AND
   {0,0x2d,"AND",0x12345678,0x0000FFFF,0x00005678},
   {0,0x2d,"AND",0x12345678,0xFFFF0000,0x12340000},
   // ORA
   {0,0x2d,"AND",0x12340000,0x00005678,0x00000000},
   {0,0x2d,"AND",0x12345600,0x00005678,0x00005600},
   // INC
   {1,0xEE,"INC",0,0x12345678,0x12345679},
   {1,0xEE,"INC",0,0x00000000,0x00000001},
   {1,0xEE,"INC",0,0x00FFFFFF,0x01000000},
   // DEC
   {1,0xCE,"DEC",0,0x12345678,0x12345677},
   {1,0xCE,"DEC",0,0x00000000,0xFFFFFFFF},
   {1,0xCE,"DEC",0,0x00FFFFFF,0x00FFFFFE},
   
   {0,0x00,"END",0,0,0}
  };

This made it all very nice and comfortable to test that the latest bitstream had fixed the known problems with those instructions (more tests for others need to be written still):

And to make sure I wasn't imagining things, I tried it out on an older bitstream that didn't have the corrections in it, and confirmed that it fails horribly, as expected:

So now we can write more tests for the rest of the Q instructions, and make sure that they are all fine.


Saturday, 24 April 2021

Working on a simple ethernet-enabled terminal programme for the MEGA65

There are still some nice C64 BBSs around that can be accessed via the internet.  So the topic came up in conversation about making a PETSCII terminal programme for the MEGA65.  This struck me as a good test for the MEGA65 port of WeeIP I have been grinding away on in the background.

I already had WeeIP to the point where it mostly works, including implementing ARP, DHCP, and DNS facilities, which all seem to work well enough that I can plug the MEGA65 into my Fritzbox router via a pair of ethernet-over-powerline adaptors to reach between my office and router.

The core of a PETSCII terminal programme is really just piping the keyboard input over a TCP/IP socket, and printing out whatever comes back over the socket.  Thus with a little bit of work, I can _sometimes_ get a display like this:

Very exciting and tantalising!

Except that it doesn't always display that, and often displays what we are lovingly calling "Haustiermist" (literally, "pet manure"), in the form of junk characters at the bottom.  This name comes from the cheesy name I came up for the terminal programme, "Haustierbegriff" which is the word-for-word translation of "PET TERM", but purposely using both words in the wrong sense, as a fine example of pseudo "Denglish", because we were talking about Denglish on the MEGA65 Discord server at the time.

So, now I need to investigate the "pet manure" and work out what is going wrong. LGB had previously pointed out this problem with WeeIP, and we both believe it is some faulty piece of packet length handling or similar.  So its time to investigate...

We will start by looking at the most recently received ethernet frame in a connection where the Haustiermist was particularly bad, which will hopefully be the last TCP packet containing the data + manure.  

Digging around in this, I found that the offset to the first byte in the TCP packet was not being calculated properly. It was fixed at 68, rather than adding the length of the Ethernet, IP and TCP header lengths.  Fixing that, we now have increased happiness, with no Haustiermist visible :)

 


This is very nice progress :)

But it doesn't matter if I press the delete key or anything else, nothing seems to happen.  There is also a problem with closing the TCP connection, but that can wait until I get actual interaction working...

I'm currently using the ASCII hardware accelerated keyboard scanner on the MEGA65, i.e., reading $D610 to get each new character. This is nice when you want ASCII, but less ideal for PETSCII, where using the ROM-supplied character input routine is probably better.

Also, it looks like we might still have some problem with the TCP data reception stuff, because when the BBS times out and disconnects, it displays part of the disconnection display, but then shows gibberish.  I'm not really sure of the cause of that yet, and whether it is on my side or something else.  

This is the kind of thing I am seeing:


The junk looks rather repetitive, like some kind of escape code. Or it could just be more "pet poo".

So I installed cgterm and tried that, and that let me connect, register and do everything just fine.  So I know that I can connect to the BBS if all is well... Just not from the MEGA65 yet.

Looking at how it works, the protocol doesn't seem to support many strange escape sequences that would be causing that kind of rubbish on the screen.

What would be nice, is if I could connect to my laptop from the MEGA65, as I could then feed various things and see how it behaves.  However my home router doesn't seem to want to allow connections between wired and wireless clients -- or else there is something else fishy going on.

While I think about potential solutions to that, I might try to work on why sending backspace doesn't work, and try calling the KERNAL to check for keyboard input, to get PETSCII keyboard input working.

In trying to do that, I hit a weird problem where just including conio.h in CC65 would stop the TCP connection from being established.  So I ended up just hammering the C64's keyboard buffer directly, which seems to work, in that it gets the characters.  Here is the little bit of horror:

     // Directly read from C64's keyboard buffer
     if (PEEK(198)) {
       buf[0]=PEEK(631);
       socket_select(s);
       socket_send(buf, 1);
       POKE(0xD020,PEEK(0xD020)+1);
       POKE(198,0);
     }

However the characters still don't actually seem to get received by the BBS.  I'm also seeing the logon banner for the BBS being truncated.  If I remove the above code, it magically works again.  Something VERY weird is going on here... No, actually it just works _sometimes_.  

After several attempts, I did get to the point where the BBS was asking me for my ID:

 

However, no matter what I tried, I could not get it to show any further signs of interaction.

At this point, I am beginning to suspect that my port of WeeIP has some interesting bugs left in it, that I need to fix.  They are very likely my fault, as I did hack things around quite a bit to make it work with the MEGA65's Ethernet controller.

Update: I did manage to get it to accept my ID and start letting me enter my password, but after the first character of the password, it stopped responding:


 

This doesn't really change my view on the most probable cause, which is that the TCP/IP state machine is not correctly handling things, in particular, I suspect that packet loss in one or the other direction is occurring (possibly due to problems in the MEGA65's Ethernet controller, but that's pure speculation right now), and that it never recovers, due to not sending the correct ACK or otherwise.

I thus started digging through to find what is going on.  

First up, I found why I couldn't make it connect to my laptop via the local wifi: There was a bug in weeIP's handling of sending TCP/UDP packets when there is no ARP entry: It marked the packet as sent, even though it couldn't yet be sent.  I fixed this, so that it would retry to send the packet after a short interval, to give the ARP resolution time to run (it was being correctly triggered).  With that done, I can now connect to sockets on my local network. 

That makes debugging MUCH easier from here, as I can just use netcat to listen, and try typing stuff from each end, and seeing what happens.

This revealed that typing on the netcat side always results in output on the MEGA65, although sometimes with delays of several seconds for it to appear, for no reason I can currently figure out.  Maybe some Ethernet frames do get dropped on the MEGA65 side due to CRC errors or something, although I'm pretty sure that is not a problem any more.  

Anyway, that's less of an issue than in the opposite direction, where only sometimes does what is typed get through, and it doesn't back-log, but rather gets lost, if it didn't get through at first. So there is some fairly major bug in weeIP that needs fixing here.

On the plus side, though, once that is fixed, it should work. Also, it means I might be able to try to interact with the BBS by just typing the same key over and over until it does get recognised.

I think part of the problem is that when there is still unacknowledged data, the socket_send() function just replaces the buffer of what needs sending, instead of either: (a) reporting "not yet ready"; or (b) adding new data to the end of the queued data. Either of those would be helpful improvements on the current state of affairs.

It turned out that (a) was the easier option for me to implement. So now if socket_send() realises that there is pending data to be sent, it returns failure. It is up to the caller to then asynchronously retry.  This does get typing text in order to work correctly now, for the most part, and more the point, without losing typed text. Of course, if acknowledging takes too long, and you have over-filled the C64's 10 character keyboard buffer, you will still lose key strokes. But that's easy enough to fix by adding our own intermediate buffer which is much larger.

So... this has made a BIG difference, and I am now basically able to use the BBS to some degree, as we can see here:

However, if the BBS is engaged, then I don't see the engaged message.  I think this is because weeIP doesn't correctly handle the case when there is a FIN or RST flag, but the packet contains data.

This is going to be a little tricky to fix, as we need to simultaneously return a WEEIP_EV_DISCONNECT and WEEIP_EV_DATA.  I might just have to make a new event that covers both, or see if the event codes are bits in a bit field, in which case we can handle it that way.

I also still need to find out why there are patches of rubbish data that are received.  I still don't know if this is some special character codes that I need to support, or whether some graphics characters are shifted when sent, or something like that.

But both of these will have to wait for another day.

I had a little time tonight to investigate, and added support for receiving data with a RST packet, but it still doesn't solve the problem.  I also added a very simple menu to choose from a pre-determined list of BBSs to help testers have some fun, but the latency bug is still there, which makes things a bit annoying.

But still, it led to some fun testing, connecting to some other BBSs:

I think I'll forget about the data on disconnect problem for now, and focus on tracking down the latency bug, so that it becomes much more pleasant and responsive to use.  If nothing else, I don't want people getting the false idea that the MEGA65's Ethernet interface is slow, when it is in reality super fast.

So I have had a little bit of time to think about the latency bug, and begin investigating it. I recall that there was a problem with the Ethernet controller not returning packets immediately, but rather, only releasing a packet when another packet had been received.  That would cause this problem we are seeing, but I need to be sure that that is what is happening.

To check that, I need a simple packet analyser that can run on the MEGA65. WireShark would be nice, but is much too big, so I have started writing "WireKrill" as somthing a bit smaller.  Basically it just displays the first part of each received packet as hex for now:

I'll add some further pretties to it, like being able to correctly decode ICMP PING packets, and display info about the pinger. That way I can then use ping on my laptop to prod the MEGA65, and see if the MEGA65 receives the packets immediately, or delayed by one packet.

Okay, I can now display ICMP ping packets:

That display was caught mid-scroll, thus the funny colours at the bottom etc, but you get the general idea.

I did have a bug where each packet was being received 3 times, which I have figured out how to work around.

But more importantly, having this tool has let me confirm that the Ethernet controller seems to always be one frame late in announcing the reception of frames. This means that the last received frame isn't actually available to the computer until another frame arrives.  This could very easily cause the very laggy behaviour for the BBS interaction.

I was able to confirm that this is the problem, and that by feeding a regular stream of packets to the network that were totally unrelated, I was able to greatly reduce the lag when using the BBS client.

So now I need to work out why the Ethernet controller does this funny delay of one packet before releasing the packets to the CPU-side, or how exactly I am mis-handling the rotation of the ring buffers.  But that will need to wait for another day.

Tuesday, 30 March 2021

Guest Post from Bitshifter: Fixing the Oldest and Nastiest Bug in Commodore BASIC

We again are able to enjoy another guest-post from Bitshifter, as he fixes and enhances the C65 ROMs for use on the MEGA65:

---  

The probably oldest and nastiest bug in Commodore BASIC

A huge part of working on the ROM for the MEGA65, which contains the resident part of the operating system and BASIC interpreter, is Debugging. Not counting the character sets, the source code, written in 45GS02 assembler, has about 30000 lines, separated into the modules kernel, editor, DOS, BASIC and graphics.

The debugging is necessary mainly for two reasons:
1) Programmers, developers, software engineers are humans.
2) A source code, that is not your own, is full of traps, side effects and assumptions, that one is not aware of.
Of course my own code is full of traps and side effects too, but I know them (at least for the next few months).

While working on the ROM, fixing Commodore bugs, optimising code to get free space for extensions and introducing new features (and sometimes new bugs), I write often some hundred lines of assembly code and make changes on existing code. Sometimes these changes seem to work perfectly, some result in crashes and freezes, because of errors and some only seem to work fine in my own test, but fail if a developer named „ubik“ demonstrates situations, which the code cannot handle.

Well, I’ll not tell a long story, how I debug, but come directly to the bug mentioned in the title. I tracked his existence down to BASIC 2.0 as used in the VIC-20, C64 and the early PET/CBM series and it seems, that it was never detected, documented or fixed.

It is related to temporary strings, the stack of descriptors for temporary strings, that has a size of 3, and the so called „garbage collection“, which in reality doesn’t collect garbage, but does a defragmentation of string storage.

Let’s look at an example:

10 V1 = 12345: V2 = 6789

20 A$ = RIGHT$(MID$(STR$(V1),2)) + RIGHT$(MID$(STR$(V2),2))


and let’s look into string memory, after the execution of these two statements:



The interpreter needed 6 temporary strings, these are the strings, which are followed by the two byte „garbage“ word in magenta, to create the resulting permanent string, marked here with the „back link“ word in yellow.


Each time, where a string is modified or new assigned, it gets a new allocation in string memory and the old string is flagged as „garbage“ by replacing its back link, which points to the string descriptor, by the garbage word, which is the length of the string, followed by $ff (The high byte $ff can never appear as part of a back link, because the highest address in string memory is $F6FF, therefore it is safe, to use this value as flag).


In programs with much string activity, the occupied string space, which grows from top ($f6ff) to bottom (top of array space), will be filled very fast and if there is no more space left for creating a new string, the call of the defragmentation routine is triggered. This starts at top of screen memory and copies all strings, which do not carry the garbage flag, to adjacent addresses, skipping the garbage strings. Doing this it is necessary, to update the string pointers of the descriptors. That’s the purpose of the backlink: After copying a string to the new address the back link is used to find the descriptor, and the two pointer bytes of the descriptor get the new address. Te descriptor can reside in scalar space, e.g. for a variable like AB$, or in array space, e.g. XY$(I,J) or it can be a temporary descriptor on the descriptor stack in the zero page. And this is, where the bug lurks.

The descriptor stack is 9 bytes long in the zero page and has therefore room for 3 descriptors. Additionally we have a stackpointer, which has four valid values for 0,1,2 or 3 descriptors in use. And we have a LASTPT called variable, which points to the last temporary descriptor, which was used to allocate a string. This is an optimisation tool. When a string is no more needed and it is the last string, that was allocated, the pointer to the used string memory can be updated by the length (plus 2 for the back link) instead of just flagging the string as garbage. This method can slow down the speed of filling the string memory somewhat and let’s the defragmentation happen less often.

This LASTPT is initialised at the execution start, alas only the high byte!




The low byte therefore has the value, that was either initialised at power up or has the value from a previous use. This use can for example be the loading of a program, because this involves string handling for the filename and therefore the use of the descriptor stack.


So it can happen, that the routine, that pops a value from the descriptor stack compares the current pointer with LASTPT, sees equality and decides not to flag the string as garbage, but to update the pointer to free string memory instead. This is an error, if the value of LASTPT is from a previous usage and was not set in the current statement. Alas, it will have no consequences in 99.9 % of the cases, because at the end of a statement all temporary descriptors are freed anyway.

The bug can drive mischief if following conditions meet:


1) A program with heavy usage of strings, which triggers the garbage collection / defragmentation frequently.


2) Statements with complex string operations, which need more than one temporary descriptor.


Then a really rare event can happen:

The garbage collection is triggered in the middle of a complex string formula with temporary descriptors on the stack. The garbage collection is aware of temporary descriptors and copies their strings too and updates the descriptors, but if one of the descriptors freed his string due to a false LASTPT comparison, his string is outside of the garbage collection area and will not be updated. And this causes strange things to be happen. The descripror has now a pointer, that does not point to a valid string. So the whole system of links and back links is corrupt. To make the bug hunting more interesting, this corrupt pointer often does no obvious harm until the next garbage collection, or the collection after it, but each will make the error worse, until back links appear in screen memory or strings dissappear or the computer freezes.


And all this because the setting of LASTPNT was not done with:

STA LASTPT

STA LASTPT+1


instead the first part, setting the low byte, was forgotten.

BTW, the highy byte is always zero, because the descriptor stack resides in the zero page, so this usage of the high byte is redundant.

I found this bug in all versions of Commodore BASIC, that I investigated, VIC-20, C64, C128, C65, MEGA65.

But it needed the „11 BASIC“ preprocessor/compiler of UBIK, to let the bug appear.

And it was very difficult to detect, because only huge programs with heavy string activity could activate it.


So the old programmer’s talk is true:

You can only find the penultimate error in a program!
There is always the ultimate error, that remained undetected.