Sunday, 2 November 2025

MEGAphone contact list and Dialer

Now that I've got the SMS message machinery more or less in place, it's time to turn to the contact list.  My current concept is for the SMS thread or the contact details (to allow editing of the contact) to be displayed alongside this, since the device is basically locked to landscape for a variety of reasons.

My current thinking is just to display a list of contacts on the left, with scrolling as I've done for the SMS threads.  We probably do want searching using the index for contacts, so that they can be filtered and found easily. 

There also needs to be "new contact" and dial-pad buttons, as well as "call" and "message" buttons on the contact.   

But let's step back a moment, and think about the different displays that we want, so that we can architect this properly:

1. Contact list

2. Contact details (including to create a new contact) 

3. Dialpad / active call display

4.  SMS message list (which also handles call history as pseudo messages)

I'm thinking that each thing can only be displayed on a dedicated half of the screen.  So SMS message thread stays on the right-hand side.

The active call display should be on the left.  And perhaps that's the only thing that should go on the left.

This means we'd implement the contact list on the right, and have a way to navigate back from the SMS message thread to the contact list.  That's not a big problem.

It does mean that we should probably display a banner with the contact name and CALL button next to it at the top of the SMS display.

Let's start with the dialer. I need a nice way to make the dial pad keys.  Most likely, those need to be FCM glyphs, so that the digits can be big enough, as the Unicode fonts we have in our proportional text engine are limited to 16px tall.  They could be NCM to save some bytes, as I'm not aiming for anything too over the top visually.

I could even use sprites, but we don't have 12 sprites, so I'd have to multiplex or use tall sprites, that would then be tricky to change colour when a given digit is pressed.  So FCM/NCM glyphs it will have to be.

Because of the 640x480 video mode, we will probably want each digit to be 8 chars wide and 4 high to look square-ish, with the part with the digit on it taking up a central 4x2 area. So that means we need 4x2 x 64 bytes x 2 (remember we are in interlaced mode) symbols = 1,024 bytes per digit. So 1KB per digit, and we need 12KB overall.  We then just need to stash it somewhere above bank 0. I'll have a think about that.
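As a sanity check on that arithmetic, here is a small sketch of the calculation. The constant and function names are mine, made up for illustration, and I'm assuming the 12 keys are 0-9, * and #:

```c
#include <stdint.h>

/* Illustrative constants, taken from the figures in the text. */
#define DIGIT_GLYPHS_WIDE 4   /* central area holding the digit, in chars */
#define DIGIT_GLYPHS_HIGH 2
#define BYTES_PER_GLYPH   64  /* one 8x8 FCM glyph, one byte per pixel */
#define INTERLACE_FIELDS  2   /* odd and even raster lines stored separately */
#define DIALPAD_KEYS      12  /* assumed: 0-9, * and # */

/* Bytes of glyph data needed for the digit area of one key. */
uint16_t dialpad_bytes_per_key(void)
{
  return (uint16_t)(DIGIT_GLYPHS_WIDE * DIGIT_GLYPHS_HIGH
                    * BYTES_PER_GLYPH * INTERLACE_FIELDS);
}

/* Bytes of glyph data for the whole dial pad, before any compaction. */
uint16_t dialpad_bytes_total(void)
{
  return (uint16_t)(dialpad_bytes_per_key() * DIALPAD_KEYS);
}
```

That gives 1,024 bytes per key and 12,288 bytes for the full pad, matching the 1KB and 12KB figures.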

So that just leaves us with needing to make the tool to generate the glyphs, so that we can load them and draw them in place.  Probably easiest is to adapt the shared resource font rasteriser, since it's basically the same task, but with a different output layout.

Much of the 12KB will be full of zeroes, though, so it will be worth trying to only keep the non-empty pixel blocks.

I might start by just writing one out, and making sure I have the format right. That way it will be small enough for me to fit the data into the existing test program. Once I have it all right, I can put it on the SD card somewhere, and load it in to an upper bank, without making the memory footprint of the binary bigger -- 12KB would be way too much for that.

Okay, so I have modified the dialpad generator tool, tools/make-dialpad.c, so that if any glyphs (actually interlaced glyph pairs) are all zeroes, then they don't get included in the file. This reduces the whole thing down to 8KB.
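The idea of dropping blank glyph pairs can be sketched roughly like this. This is not the code from tools/make-dialpad.c. The function names, the 0xFF "dropped" marker and the index map are all assumptions for the sake of the example:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define GLYPH_PAIR_BYTES 128  /* 64-byte glyph x 2 interlace fields */

/* Returns 1 if every byte of the glyph pair is zero. */
int pair_is_blank(const uint8_t *pair)
{
  for (size_t i = 0; i < GLYPH_PAIR_BYTES; i++)
    if (pair[i]) return 0;
  return 1;
}

/* Copies only the non-blank pairs from in[] to out[].
   map[i] receives the packed index of pair i, or 0xFF if it was dropped
   (hypothetical marker; assumes fewer than 255 pairs).
   Returns the number of pairs kept. */
size_t pack_glyph_pairs(const uint8_t *in, size_t pair_count,
                        uint8_t *out, uint8_t *map)
{
  size_t kept = 0;
  for (size_t i = 0; i < pair_count; i++) {
    const uint8_t *pair = in + i * GLYPH_PAIR_BYTES;
    if (pair_is_blank(pair)) { map[i] = 0xFF; continue; }
    memcpy(out + kept * GLYPH_PAIR_BYTES, pair, GLYPH_PAIR_BYTES);
    map[i] = (uint8_t)kept++;
  }
  return kept;
}
```

The drawing code would then need some way to treat a dropped pair as a blank cell, which is what the map is standing in for here.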

We still need somewhere to put it.  Banks 4 and 5 are FCM cache, so it can't go there.  Bank 0 is the program.  Work buffer is 88KB from mid-way through bank 1 to near the end of Bank 2. The rest of Bank 2 is consumed with the C64 KERNAL ROM.  Screen RAM is in Bank 1 at $12000, and goes to near $18000. 

That all leaves Bank 3, where the C65-mode ROM is, which we can re-use. It also leaves the 8KB at $10000-$11FFF in Bank 1, that CBDOS normally uses. But we aren't using CBDOS, so we can overwrite that.  So it looks like a good place to stash it.

Before we can use that, we need the helper utilities for loading files from the SD card that I made for GRAZE, the web-browser-like program for the MEGA65. With those, I should be able to have the file stored natively on the SD card in the /PHONE directory, and then load it at address $10000. This also gets us the utility for passing control from one program to another, for when the code inevitably ends up too big to fit in 64KB -- although with LLVM, I'm already fitting in more than I expected.

Okay, I have it partly working, loading the glyphs for the digits and displaying them. Without the SMS display to the right, it looks fine, but when that's there, something upsets some part of it -- possibly via some weird interaction with the interlace mode:

But the digits are being rendered correctly, so that means my tooling for preparing those is all right, which is nice to know :)

I'll take a look at the RRB glitching and figure out the cause tomorrow.

What I have already figured out, though, is that the GOTOX's are the cause of the glitching.  But the big question is how?

The issue only shows up if the GOTOX points to a position > 255.

The horizontal jiggle of the SMS display only happens when the high-res text mode is enabled. I'm trying to simplify the address calculation logic for that, even though I can't really see how it could cause this effect.

The GOTOX thing is also a big fat mystery. Maybe this bitstream was generated with poor timing closure or something?

CHARY16 addr calc fix didn't help. But having 2 GOTOX's instead of one _does_ cause weird things to happen. So I'm assuming it must be a problem latching the 2nd screen RAM byte, since the problem only shows up with a GOTOX > 255.

Fascinatingly it's not whether it's <256 or not, but rather it seems that it's the lower bits of the position that matter. 255 is okay, as is 38x8-1 = 303, but 256 or 304 are bad.  After a bit more mucking about, it turns out that most of the problem is related to whether it's odd or even.  Odd is fine, even is bad.

But there are still some other artifacts to resolve:

These gaps and shifted lines seem to be aligned to where the SMS text display to the right is placed.  In particular, they follow the first pixel row of a character row.  In that situation we have the badline doing a big fat fetch, so there are fewer RRB cycles available.  My best guess is that I just have to reduce the number of render columns.

Yup, reducing our render count from 160 to 157 solves the problem:


Right, so now we can add the missing elements. I'd like a time and phone signal display on the top row. Like this:


 



I'll rearrange things so that it goes flush to the top-right, and the scroll bar for the SMS display doesn't reach the top. But I'm pretty happy with this. We have signal indication and battery level indication. Those were a bit of a bastard to get working, because the Unicode codepoints for blocks aren't in the emoji fonts I'm using, so these are actually custom VIC-II mono chars.

But it's all progressing and starting to look like a plausible phone display!

So let's fix the formatting at the top right, and start plumbing more of this together.


 I've also refactored the status bar code so that it's now really easy to adjust the widths and positions of the elements using a bunch of #define statements in status.h:

// Space allocations for status bar
#define ST_PX_TIME 64
#define ST_GL_TIME 16

#define ST_PX_NETNAME_START (ST_PX_TIME)
#define ST_GL_NETNAME_START (ST_GL_TIME)

#define ST_PX_NETNAME 128
#define ST_GL_NETNAME 24 

#define ST_PX_RESERVED_START (ST_PX_NETNAME_START + ST_PX_NETNAME)
#define ST_GL_RESERVED_START (ST_GL_NETNAME_START + ST_GL_NETNAME)

#define ST_GL_RESERVED 50
#define ST_PX_RESERVED (199+38)

#define ST_PX_INDICATORS_START (ST_PX_RESERVED_START + ST_PX_RESERVED)
#define ST_GL_INDICATORS_START (ST_GL_RESERVED_START + ST_GL_RESERVED)

#define ST_GL_INDICATORS 24
#define ST_PX_INDICATORS 130

#define ST_PX_SIGNAL_START (ST_PX_INDICATORS_START + ST_PX_INDICATORS)
#define ST_GL_SIGNAL_START (ST_GL_INDICATORS_START + ST_GL_INDICATORS)

#define ST_GL_SIGNAL 8
#define ST_PX_SIGNAL 48

#define ST_PX_BATTERY_START (ST_PX_SIGNAL_START + ST_PX_SIGNAL)
#define ST_GL_BATTERY_START (ST_GL_SIGNAL_START + ST_GL_SIGNAL)

#define ST_GL_BATTERY 16
#define ST_PX_BATTERY 64

 

That's about it for the status bar. So let's make a routine to display a compact form of a contact, so that we can show the contact for whom the SMS is being composed above the SMS thread.  We can then re-use that routine for displaying the contact list.  With a bit of skullduggery we can also re-use it for editing the contact details.  We can indicate which field is active by changing the background colour of the field (or possibly by toggling the reverse field?), and allow the TAB key as well as touch inputs to rotate between them.

This approach also has the advantage that adding a contact requires just allocating the contact record, and then displaying the SMS thread for it.  Simple is good :) 

Okay, so let's work on displaying contacts. We'll start by showing the currently selected contact above the SMS message list on the right half. All we really need is labels for first name, last name and telephone number.  We can just use a Number and Name label to do all that we need.  To keep life simple, we can limit each field to a single line.

I have it now displaying that, but there's a glitch with printing the 3rd field box:

 


The weird thing though, is that the code that draws these fields looks like this, and thus their position should be identical:

char contact_draw(uint8_t x, uint8_t y,
          uint16_t x_start_px,
          uint8_t w_gl, uint16_t w_px,
          unsigned int contact_id,
          uint8_t active_field,
          unsigned char *contact_record)
{
  unsigned char *string;

  if (w_gl<20) return 1;
  if (w_px<96) return 2;

  uint8_t fields[3]={FIELD_FIRSTNAME, FIELD_LASTNAME, FIELD_PHONENUMBER};
  unsigned char *labels[3]
    ={(unsigned char*)"First:",(unsigned char *)"Last: ",(unsigned char *)"Phone:"};
    
  for(uint8_t field=0;field<3;field++) {
    
    draw_string_nowrap(x, y+field,
               FONT_UI,
               0x0f, // light grey for label text
               labels[field],
               x_start_px,
               LABEL_WIDTH_PX,
               LABEL_WIDTH_GL,
               NULL,
               VIEWPORT_PADDED,
               NULL,
               NULL);
    
    string = find_field(contact_record, RECORD_DATA_SIZE, fields[field],NULL);      
    draw_string_nowrap(x + LABEL_WIDTH_GL, y+field,
               FONT_UI,
               active_field==(field+1) ? 0x8f : 0x8b, // reverse medium grey if not selected
               (unsigned char *)string,
               x_start_px + LABEL_WIDTH_PX,
               w_px - LABEL_WIDTH_PX,
               w_gl - LABEL_WIDTH_GL,
               NULL,
               VIEWPORT_PADDED,
               NULL,
               NULL);
  }
  
  return 0;
}

With some poking around, I can confirm that the 3rd field is being written into the correct characters. That is, the problem is the padding of the "Phone:" label is too wide.

It looks like the issue is when the field has spare glyphs after the padding has reached the intended X position.  It's _supposed_ to then fill the remaining space with GOTOX tokens that hold it in position. But something is clearly going wrong with that.

The problem turned out to be how I was passing the viewport information to the drawing calls.  That's now fixed and it looks much better.  I would like to allow padding on the left of a string instead of on the right, so that I can have right-aligned text. This should be fairly easy by copying the screen and colour RAM across and then putting the padding (and GOTOX tokens with correct changed destination position) following that.

The main complication is that lcopy() isn't safe for overlapping addresses, specifically where the destination starts within the source string.
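The usual way around that kind of overlap is to copy from the last byte backwards whenever the destination starts inside the source. Here is a minimal sketch using ordinary pointers rather than the DMA-based lcopy(). The function name is made up, and this is not necessarily how I ended up fixing it:

```c
#include <stdint.h>
#include <stddef.h>

/* Copy len bytes from src to dst, safe when the two regions overlap.
   Both pointers are assumed to point into the same buffer.
   If dst lies inside [src, src+len), a forward copy would overwrite
   source bytes before they have been read, so copy backwards instead. */
void copy_overlap_safe(uint8_t *dst, const uint8_t *src, size_t len)
{
  if (dst > src && dst < src + len) {
    /* Backwards: last byte first. */
    while (len--) dst[len] = src[len];
  } else {
    /* Forwards is fine for non-overlapping or dst-before-src cases. */
    for (size_t i = 0; i < len; i++) dst[i] = src[i];
  }
}
```

For the right-alignment case, the string is being shifted to the right within the same row, which is exactly the "destination starts within the source" situation.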

Got that fixed, and now the weird bug with GOTOX tokens pointing to even addresses has shown up again. I figured it was time once and for all to get to the bottom of that, so I pasted all 250KB of viciv.vhdl into ChatGPT and worked with it a bit to find the problem.

Initially it was suggesting various forms of nonsense, but then by luck or skill it found what I think is the actual problem: In CHARY16 mode, we set bit 0 of the character number when fetching the odd raster lines to achieve the interlace. But if that token's a GOTOX, then on the odd raster lines, that bit forms part of the X position... thus the 1px offset between odd and even raster lines.

It sounds promising, anyway.  Cooking a bitstream, and we'll know for sure in a moment -- yup, that fixes it:
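That explanation also accounts for the odd/even pattern I saw earlier. Here is a toy model in C (not the VHDL, and the function names are made up) of what happens if bit 0 is forced on for odd raster lines and the value is then used as a GOTOX position:

```c
#include <stdint.h>

/* Toy model: the value seen for a screen RAM token on a given raster
   line, with bit 0 forced on for odd lines as described above. */
uint16_t chary16_effective_value(uint16_t token_value, uint8_t raster_is_odd)
{
  return raster_is_odd ? (uint16_t)(token_value | 1u) : token_value;
}

/* Horizontal offset in pixels between odd and even raster lines,
   if the token is treated as a GOTOX X position. */
uint16_t gotox_interlace_skew(uint16_t x_position)
{
  return (uint16_t)(chary16_effective_value(x_position, 1)
                    - chary16_effective_value(x_position, 0));
}
```

Odd positions like 255 and 303 already have bit 0 set, so there is no skew. Even positions like 256 and 304 end up 1px to the right on the odd lines.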



In the meantime, let's start plumbing in the TAB key to cycle between the various fields, so that we can use the SMS thread display to also allow editing of the contact details.

I've added an active_field variable that we use to know which field is active.  For the SMS message draft box, I now have it so that the cursor appears or hides, depending on which field is active.

I now need to do similar for the contact fields, and also allow editing of them. This mostly consists of keeping track of which field we have in buffers.textbox.draft, so that we know which field to redraw when editing it, and where to stash the result whenever it changes.

This requires a bit of a re-work for the whole text input thing. Not fatally so, but still fairly extensive, because it previously assumed that buffers.textbox.draft only ever held the SMS message draft. But now it will be able to have that, or any of the fields from the contact, or the phone number currently being dialed / DTMF codes being input.

Previously I had the main input loop check at the end of the loop what needed to be (re)drawn.  But when you change fields, we need to redraw the old field and the new one. So I'll make a routine to redraw any single field, including knowing whether it's the active field or not, so that background colour is set correctly, and the cursor can be hidden or drawn in the field as appropriate.  It shouldn't actually be too hard, just a case of grinding through it.

I've done a bunch of stuff on that, and then also gotten around to caching the D81 mount status, so that we won't unnecessarily re-mount disk images that are already mounted, which should speed a bunch of stuff up -- and it does. It's now much faster to scroll through messages.

I've also done much of the work required for editing any of the fields, including in a contact.  And parts of it work nicely now, but things are still going wrong and the contact records are getting corrupted. I've possibly also messed up the provisioning stuff, as when I re-provisioned the SD card, the contacts are fine, but the message thread for the contact was empty.

Let's start by figuring out where the SMS messages from provisioning have gone. It looks like the messages never got written.  So let's look at the provisioning stuff: Yup, it looks like the messages don't actually get written. The contacts do, but not the messages in the threads.

It looks like contact 1 gets stuff written properly, but not the others.  Looks like this is our problem:

INFO: Rebuilding contact index before importing next SMS
src/telephony/index.c:177:disk_reindex(): Returning with error 1
src/telephony/index.c:246:contacts_reindex(): Returning with error 5
 

The problem is that I disabled disk_reindex() for native MEGA65 running, because the algorithm is obscenely slow right now. But it's okay for on Linux. So I just need to selectively disable it on MEGA65, but not cross-compiled Linux binaries.
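Something along these lines would do it. PHONE_TARGET_MEGA65 is a hypothetical build flag that I'm assuming gets defined only for the native build, and the function names are stand-ins rather than the real ones in index.c. Having the native build report success is also just an assumption in this sketch:

```c
#include <stdint.h>

/* Counts how often the slow path runs; only here for illustration. */
uint8_t reindex_calls = 0;

/* Stand-in for the real (slow) reindexing work. */
static char do_full_reindex(void)
{
  reindex_calls++;
  return 0;
}

char disk_reindex_sketch(void)
{
#ifdef PHONE_TARGET_MEGA65
  /* Native build: skip the slow algorithm.  Returning 0 (success) is an
     assumption of this sketch, not something decided in the text. */
  return 0;
#else
  /* Cross-compiled Linux build: do the full reindex. */
  return do_full_reindex();
#endif
}
```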

Done, and now we have our message threads back. 

In the process, I realised that scrolling the message thread discards changes to the active text field, so we need to fix that, too. But that won't be hard. Done.

Now there's just random crashes when I do stuff that I need to track down.

It looks like delete_field() is causing them.  Do local variables of the calling function get stomped, or is it the stack that gets stomped?

Nope, just the interaction of an out-by-one error, the odd semantics of lcopy() on the MEGA65 (a length of 0 means a length of 64KB!) and a couple of other subtle logic bugs. With those fixed, I can now tab through the fields, and the contact fields are getting saved.
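One cheap defence against the zero-length trap is a guard in front of the copy. In this sketch, mock_lcopy() is a host-side stand-in that only models the "0 means 64KB" behaviour described above, and lcopy_safe() is a made-up wrapper name:

```c
#include <stdint.h>

/* Host-side stand-in so the sketch compiles off-target.  It does not
   copy anything, it only records how many bytes would have moved. */
uint32_t mock_bytes_copied = 0;

void mock_lcopy(uint32_t src, uint32_t dst, uint16_t count)
{
  (void)src;
  (void)dst;
  /* Model the semantics described above: a count of 0 means 64KB. */
  mock_bytes_copied += count ? (uint32_t)count : 0x10000UL;
}

/* Guarded wrapper: a zero-length request becomes a no-op. */
void lcopy_safe(uint32_t src, uint32_t dst, uint16_t count)
{
  if (!count) return;
  mock_lcopy(src, dst, count);
}
```

An off-by-one that turns a 1-byte copy into a 0-byte copy is then harmless, instead of splatting 64KB over whatever follows the destination.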

There is a problem now with saving and retrieval of the SMS draft, though. This is a bit weird, because it used to be fine.  I'm sure it will be some subtle thing again.  Tabbing through keeps the SMS draft. The issue is that unless you tab away from it, it doesn't get saved.  Actually, that applies to all fields.  It's probably okay like that, but it is a change from the earlier semantics where it got saved with every key-press, which ensured you couldn't lose a message draft if the machine crashed, rebooted or was otherwise suddenly stopped.

The more I think about it, the more I think that the old semantics were safer. I just have to see how much it slows things down, as it means removing the cursor, and then putting it back -- in the same place as before, not just at the end of the field.  I have code for doing this for when delete is pressed, but not for other edits:

      uint16_t cursor_stash = buffers.textbox.draft_cursor_position;
      // Remove cursor
      textbox_remove_cursor();
      // Save changes
      af_store(active_field,contact_id);
      // Reinsert cursor
      textbox_insert_cursor(cursor_stash);
      // Redraw
      af_redraw(0xff,active_field);

I also have it disabled for the SMS draft field, possibly because I was worried that it would be too slow for long text messages.  There's also a weird thing where the first key press when editing the SMS draft field gets munched. Or something weird is happening with it. So I need to get to the bottom of that. I'm suspecting that the length of the field is not properly initialised or something.

I've added some instrumentation, and it looks like it is actually being initialised correctly. So with "abcd" in the message draft, we start with this:

DEBUG: draft
0000: 61 62 63 64 01 00 00 00 00 00 00 00 00 00 00 00   abcd............
draft_len=0x0005, draft_cursor_position=0x0004
 

Then pressing 'e', the result is:

DEBUG: draft
0000: 61 62 63 64 65 01 00 00 00 00 00 00 00 00 00 00   abcde...........
draft_len=0x0006, draft_cursor_position=0x0005 

... which is how it should be.  But the 'e' doesn't get displayed.  If I then tab through the fields one full loop back to the SMS draft field and press 'f', it is as if the 'e' was never pressed.

But if I do it all over again, then the 'f' is still retained. So what's different between the two events?  We know that draft_len and draft_cursor_position and the draft text itself are correctly set and updated. So what goes wrong?

It looks like the edit action itself works.  So why does the field re-draw not work?

Hmm... Adding instrumentation seems to suppress the problem. Is it a compiler bug? It's feeling like it: Adding and removing instrumentation changes whether the problem occurs or not. That's really annoying, because making a unit test to elicit this problem isn't going to be simple.

Except it might not be a compiler bug: I can see a situation where the failure to save the field changes after each key press means that it gets discarded by the time the display is redrawn.  It still shouldn't happen, but I can see the mechanism. So fixing the save-on-every-change bug should fix this one, too. It's worth a try.

So that has everything almost perfect -- except that the contact fields, when selected, get redrawn slightly too wide. This is visible after de-selecting them, as the extra bit of highlight remains to the right, like I'm pointing at here:


 But apart from that, it's all working pretty nicely now:


So let's try to find the cause of that pixel poop.  Time to add more instrumentation, this time in the padded string drawing routine, draw_string_nowrap(), and its children.

The pixel poop is not because the field is wider when highlighted. Rather, it looks like the padding of the non-highlighted string doesn't use up all of the remaining glyphs. Because the widths of the padding glyphs in the highlighted version (which had a cursor) are different, that last glyph has a different width to make the padding work.  This is also why the pixel poop is at most 8px wide.

So why is the non-highlighted version not overwriting that last glyph? The only difference in rendering is the lack of cursor -- otherwise it's only the colour.  It looks like the padding routine gets called with the correct arguments. So now to find out why it borks. 

Well, there we have it: when highlighted, it actually does write one extra glyph. Most fascinating.

[draw_string_nowrap] ENTER xg0=37 yg0=03 f=03 col=8F x_px0=00000197 x_px_vp=000000F7 xg_vp=37 padP=01 utf8=0xBB4F end=0x0000 pxUsed*=0x0000 gUsed*=0x0000 
[pad_string_viewport] ENTER xg0=3B yg0=03 col=0F x_px_vp_w=000000E1 xg_vp=37 abs_epx=028E 
PAD: glyph=3B , trim=00 
...
PAD: glyph=48 , trim=00 
PAD: glyph=49 , trim=0F 
Backtrace (most recent call first):
[02] 0x55EE draw_string_nowrap+0x01DB, SP=0xD000
[01] 0x7765 af_retrieve+0x000F, SP=0xD000
[00] 0x0C44 main+0x002D, SP=0xD000
[draw_string_nowrap] ENTER xg0=37 yg0=03 f=03 col=8B x_px0=00000197 x_px_vp=000000F7 xg_vp=37 padP=01 utf8=0xBB4F end=0x0000 pxUsed*=0x0000 gUsed*=0x0000 
[pad_string_viewport] ENTER xg0=3A yg0=03 col=0B x_px_vp_w=000000E4 xg_vp=37 abs_epx=028E 
PAD: glyph=3A , trim=00 
...
PAD: glyph=47 , trim=00 
PAD: glyph=48 , trim=0C 

This is quite odd, because of how the loop is constructed. It should always terminate after the same number of glyphs -- unless we haven't reserved enough glyphs for the field, in which case funny business can occur. But that doesn't seem to be what's happening here.

Found the problem: We only check for equality with x_glyphs_viewport, not whether we are greater than it.  And the value being passed in is the width of the field, rather than the right-hand glyph of the field. Thus it uses a variable number of glyphs, and we get this effect.
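The shape of the fix is roughly this: work out the right-hand glyph column from the start and width, and stop when we reach or pass it. The function names and the example numbers are made up. This is a sketch of the idea, not the actual pad_string_viewport() code:

```c
#include <stdint.h>

/* Right-hand glyph column (exclusive) of a field, computed from its
   start column and its width in glyphs. */
uint16_t field_right_glyph(uint8_t start_glyph, uint8_t width_glyphs)
{
  return (uint16_t)(start_glyph + width_glyphs);
}

/* Number of padding glyphs needed to get from x_glyph to the right-hand
   edge.  Using >= rather than == means a caller that has already reached
   or overshot the edge gets 0, instead of running on. */
uint8_t pad_glyphs_needed(uint8_t x_glyph, uint16_t right_glyph)
{
  if (x_glyph >= right_glyph) return 0;
  return (uint8_t)(right_glyph - x_glyph);
}
```

With a field starting at glyph $37 and a hypothetical width of $13 glyphs, padding that starts at $3B needs 15 glyphs and padding that starts at $3A needs 16. Both finish on the same right-hand glyph, which is what stops the stale highlight being left behind.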

Right-o, so that's the last known visual glitch dealt with.  On to switching between the SMS thread display and the contact list display.  I'm not going to worry about searching or sorting of the contact list right now, just displaying and browsing them.  The rest is icing that can wait.

Let's make F3 toggle between contact list and SMS thread display.

We already have a routine that can display the three fields of a contact, so we can re-use that.  The main thing we need to do is to refactor things out so that we can support those two different display modes.

A couple of hours of fishing about in the gizzards and hooking everything up, we've now got a nice step forward: Contact list scrolling and selection works :)


In filming that I did hit a bug with the SMS message drafting field. The problem is during deletion it sometimes thinks the message box should be higher up, and possibly consists of multiple lines. There's also a possibly related annoying thing where it's defaulting to * as the message draft, when it should be empty. 

Let's look at that * bug: It looks like the * is present in the D81 as 0x2a 0x06 in the start of the draft records.  I'm guessing that this is the record number marker we pre-populate at the start of each sector on a record-oriented D81 in records.c.

We can fix it one of two ways: Make it skip over the first two bytes, so that the marker remains intact, or not pre-populate that record with the marker.  I've done the first, by sneakily allocating 2 bytes just before the textbox.draft[] array, and reading the record into there, so that it spills over in exactly the right way into the textbox.draft[] array. Net cost: 2 bytes of program size :)
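In struct form, the trick looks something like the following. The struct layout, the names and the 16-byte draft size are illustrative only, as the real buffers.textbox definition isn't shown here:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RECORD_MARKER_BYTES 2
#define DRAFT_BYTES 16  /* illustrative; the real size is not given here */

struct textbox_sketch {
  uint8_t record_marker[RECORD_MARKER_BYTES]; /* absorbs the marker bytes */
  uint8_t draft[DRAFT_BYTES];                 /* draft text starts here */
};

/* The spill-over only works if draft[] directly follows the marker. */
_Static_assert(offsetof(struct textbox_sketch, draft) == RECORD_MARKER_BYTES,
               "draft[] must directly follow the marker bytes");

/* Read a raw record so that its first two bytes land in record_marker[]
   and the rest spills over into draft[]. */
void load_draft_record(struct textbox_sketch *tb,
                       const uint8_t *record, size_t record_len)
{
  if (record_len > sizeof *tb) record_len = sizeof *tb;
  memcpy((uint8_t *)tb, record, record_len);
}
```

The marker on disk stays intact, and everything that reads draft[] never sees it.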

So now let's figure out why editing the SMS draft sometimes causes it to think it's three lines tall instead of just one. It doesn't just happen with delete. It can also happen when typing.

The problem is that calc_break_points() is leaving the number of lines = 3. I think it's junk hanging around in the buffer and a lack of null termination -- but I'm not entirely sure.

My problem now, though, is that I have _finally_ run out of RAM. I'm 61 bytes short of being able to call dump_bytes() to see what's in the draft buffer when this happens. Grr!

Okay, refactored a bit of code to save some space, so that I can get the output now.

It looks like the last rendered SMS message is still hanging around. Or rather, that the af_retrieve() routine is not being called after redrawing the SMS thread display, I think. 

Yup -- that was it. I've added an af_dirty flag when retrieving a field so that we know if we need to call calc_line_breaks(), but only need to do it once, so that it doesn't slow things down. 
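The dirty-flag pattern is simple enough to show in a few lines. The names below are stand-ins for the real ones, and the counter just stands in for the actual line-break calculation:

```c
#include <stdint.h>

uint8_t af_dirty_sketch = 0;       /* set when a field has been retrieved */
uint16_t line_break_recalcs = 0;   /* counts the (expensive) recalculations */

/* Called whenever a field is loaded into the edit buffer. */
void on_field_retrieved(void)
{
  af_dirty_sketch = 1;
}

/* Called before each redraw: recalculate only if something changed. */
void before_redraw(void)
{
  if (!af_dirty_sketch) return;
  line_break_recalcs++;  /* stand-in for the real line-break calculation */
  af_dirty_sketch = 0;
}
```

Retrieving a field and then redrawing twice only triggers one recalculation.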

So now all the major GUI bugs have been dealt with.  All that's left in this part of things is to implement contact creation, which we'll do in a really simple way: Do the BAM allocation, and then write an empty record to that contact.

... except that results in "/<" in each of the contact fields, because the blank record doesn't have them set, and so the find_field() calls fail, resulting in garbage being returned.  Actually, I think it returns a null pointer, which then reads locations $0000 and $0001 before hitting a zero byte at $0002. 

Anyway, the solution is probably to build a proper blank contact. I do have a function for that, anyway. Yup, that fixed that. Then a couple of other minor list handling bugs, and a comfort improvement of putting the cursor in the "first name" box when creating a new contact, instead of defaulting onto the SMS draft editing box, and it's looking pretty good. Certainly fit for initial purpose.
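Building a proper blank contact is the real fix, but a belt-and-braces guard around failed lookups would also stop the garbage from being drawn. This helper is hypothetical and isn't in the code base:

```c
#include <stddef.h>

/* Substitute an empty string when a lookup such as find_field() fails
   and returns a null pointer, so that the drawing code never walks
   memory starting at address zero. */
const unsigned char *field_or_empty(const unsigned char *field)
{
  static const unsigned char empty[1] = { 0 };
  return field ? field : empty;
}
```

The result can be passed straight to the drawing routine, at a cost of a few bytes of code.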

So the question now is whether I have enough code space left in this binary to implement the remaining dialer stuff...

Specifically we need a caller/callee bar (which can show a contact or number dialed or be blank), call / hang-up buttons, a call status indicator, and a text field that shows the digits dialed for DTMF injection in the call.

This means we need icons for the dial and hang-up. I'm a bit concerned about where I'm going to find RAM to stash them, so I might tackle those first.  It might be possible to use the same tool I made to make the dialpad keys to also make a phone button to represent call, and something to represent hang up.

That increases the size of our dialer glyphs from 6KB to 10KB.  We currently stash that at $10000, but we only have 8KB reserved there.  So I'll need to move the screen on from $12000 to $12800 at least.  I don't remember what else I have stashed away in bank 1. I think work buffers for indexing.  Screen is 30 rows x $200 bytes = ~$4000 = 16KB, so I think we can just safely bump the screen address.
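The numbers can be checked quickly. The constants come from the figures above, and the function names are just for illustration:

```c
#include <stdint.h>

#define DIALER_GLYPHS_BASE 0x10000UL
#define DIALER_GLYPHS_SIZE (10UL * 1024UL)  /* 10KB of dialer glyphs */
#define SCREEN_ROWS        30UL
#define SCREEN_ROW_BYTES   0x200UL

/* First free address after the dialer glyph data. */
uint32_t dialer_glyphs_end(void)
{
  return DIALER_GLYPHS_BASE + DIALER_GLYPHS_SIZE;
}

/* Exact size of screen RAM in bytes. */
uint32_t screen_ram_size(void)
{
  return SCREEN_ROWS * SCREEN_ROW_BYTES;
}

/* Returns 1 if a screen starting at screen_base clears the glyph data. */
int screen_base_is_clear(uint32_t screen_base)
{
  return screen_base >= dialer_glyphs_end();
}
```

The glyphs end at $127FF, so $12800 is the lowest usable screen address, and the screen itself is exactly $3C00 bytes.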

Meanwhile, I hit the compiled program size limit again. To work around that for the time being, I've added features.h where I #define various things we can live without for now. Initially, this is just SMS thread indexing, because that's the only thing that requires all of the indexing code.  By disabling that, I've won back enough space to keep things compiling for now.  I might also have to look at using a cruncher and different memory map at some point, to eke out a bit more space, e.g., allowing the program to fully occupy $0200 - $CFFF, and then allowing some data to also be at $E000-$EFFF, and maybe even a bit further up.

But that's all digression for now... let's get those remaining dialer elements in place!

First up, let's draw the call, mute and hang-up buttons: 



I still totally love the speak-no-evil monkey as the mute button.

I'm wondering if I shouldn't put the call control buttons down the side of the dial pad, instead of below, as the aspect ratio feels a bit out, and it doesn't leave much space at the top for the contact name of the call plus DTMF digit history from in the call. 

Yeah, I think it looks better:


So let's think about call state indication:

1. "Use dialpad or hold-contact to call"

2. <telephone number + contact name being dialed + Calling>

3. <telephone number + contact name being dialed + In Call (with timer?) + DTMF box>

4. <telephone number + contact name being dialed + Incoming Call>

5. <telephone number + contact name being dialed + Call Ended>  (replaces 1 if a call has been made "recently")

It's far from perfect, but it should work as a functional starting point.
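The five states above map naturally onto an enum. This is only a sketch: the identifiers are made up, and only the label strings come from the list:

```c
#include <stdint.h>

enum call_state {
  CALL_STATE_IDLE = 0,   /* 1: nothing happening */
  CALL_STATE_CALLING,    /* 2: outgoing call being set up */
  CALL_STATE_IN_CALL,    /* 3: call connected */
  CALL_STATE_INCOMING,   /* 4: someone is calling us */
  CALL_STATE_ENDED       /* 5: replaces IDLE after a recent call */
};

/* Status text to show for each state. */
const char *call_state_label(enum call_state s)
{
  switch (s) {
  case CALL_STATE_CALLING:  return "Calling";
  case CALL_STATE_IN_CALL:  return "In Call";
  case CALL_STATE_INCOMING: return "Incoming Call";
  case CALL_STATE_ENDED:    return "Call Ended";
  case CALL_STATE_IDLE:
  default:                  return "Use dialpad or hold-contact to call";
  }
}

/* In the list above, only the in-call state includes the DTMF box. */
uint8_t call_state_shows_dtmf_box(enum call_state s)
{
  return s == CALL_STATE_IN_CALL;
}
```

The number and contact name would be drawn separately in every state except idle.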

The only problem is while implementing it, I'm hitting weird bugs again, which I think are from skirting close to the maximum program size -- just adding an extra string causes consistent crashes, but without generating a compile or link-time error.

So I'm going to have to find something else that I can selectively disable.  SMS sending is a fairly obvious candidate, but the problem there is that the call state handling actually needs to record pseudo-messages to indicate received, missed calls etc.

I could lose the function list for stack back-trace, but that would only gain about 1/2 KB right now.  But I might as well take the free bonus I can get from trimming some of the bootstrap symbol names that are really long like /usr/local/bin/../mos-platform/c64/lib/libcrt0.a(init-stack.S.obj):(.init.100). By cutting those at the last /, we can drop them down to something like libcrt0.a(init-stack.S.obj):(.init.100).
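Cutting at the last / is a one-liner with strrchr(). The function name here is made up, and where exactly this gets applied in the build is not shown:

```c
#include <string.h>

/* Return the part of a symbol name after the last '/', or the whole
   name if it contains no '/'. */
const char *trim_symbol_path(const char *name)
{
  const char *slash = strrchr(name, '/');
  return slash ? slash + 1 : name;
}
```

Names without a path, like main, pass through unchanged.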

So that fixed the problem (for now), so it really does seem to be some self-stomping problem when we get close to using all RAM --- possibly the software stack goes down and treads on something. Anyway, I don't have time to track the problem down right now.

Now I'm tracking down a bunch of weird display glitches, most of which are related to the annoying RRB glitch thing. I really would like to know the cause and be able to fix it, because it's really cramping a bunch of things. 

Changes to $D05E (the number of glyphs that should be rendered) cause changes in the display, even when the extra glyphs should never appear on the screen. I'm assuming it's something crazy in the RRB paint state-machine not resetting at the start of the next raster if it's doing some specific phase of activity.  Anyway, I've worked around it by reducing the number of rendered columns, but allowing the text renderer to pretend the display is a bit wider, so that we don't get notching on the right-hand side.

Now the last remaining visual glitch is when scrolling down through an SMS thread the draft message field disappears sometimes, messing things up.  One part of the problem is the cursor deletion code is trimming when there is no cursor. Fixed that, and also selection of the correct field by default when displaying a contact.  Apart from the lack of cursor in the dialpad fields, and that you can't do anything with them, it's now looking good and glitch-free.

Well, almost glitch-free: typing some letters results in bogus glyphs appearing instead of the correct ones. I think the glyph cache is incorrectly claiming to have the corresponding codepoint loaded, but it ends up having some other glyph in there.

Lower-case K is doing it reliably for me right now, so let's look and see what's in the cache. And it is claiming to be there in slot 73.  We use 256 bytes per glyph, so the data should be at $44900-$449FF.
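For reference, the slot-to-address calculation looks like this, assuming the cache starts at the bottom of bank 4 ($40000). The names are illustrative:

```c
#include <stdint.h>

#define GLYPH_CACHE_BASE       0x40000UL  /* assumed: start of bank 4 */
#define GLYPH_CACHE_SLOT_BYTES 256UL      /* bytes per cached glyph */

/* Address of the first byte of a glyph cache slot. */
uint32_t glyph_slot_address(uint16_t slot)
{
  return GLYPH_CACHE_BASE + (uint32_t)slot * GLYPH_CACHE_SLOT_BYTES;
}
```

Slot 73 is $49 in hex, so the slot starts at $44900 and runs to $449FF.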

The problem only occurs if the character hasn't already been included somewhere on the screen.  On start it isn't in the cache, but then typing it (with the contact that I'm currently displaying) it does indeed get loaded into slot 73 --- but clearly the data is getting messed up somehow.

And now I can't reproduce it anymore :/ 


Oh well, I'll track it down when it shows up next.

But let's get the cursor and editing working for the dialer fields. For simplicity, we'll only allow the cursor to be at the end of the dialer field. And the DTMF history in-call won't even have a cursor. It should also only appear when in-call and post-call.

Okay, the cursor is working, and highlighting dialpad buttons should work now, too... except that I've run out of RAM again. I managed to find enough leftover debug code to remove to get it working again, and with a bit of fiddling, the dialpad does what we expect :)


And I think I'll stop here to post this, before it ends up even longer, still. 

Monday, 6 October 2025

SMS Thread Display, Message Editing etc

In a previous post, I got Unicode text rendering working, complete with line breaking, emojis and a pile of other stuff, which means we can show message threads.

Now we need to refactor that out into something more usable, and add the missing bits, like being able to type a message, hit send, have a functional scroll-bar etc.

Then we'll have almost an entire working SMS system, sans talking to the actual cellular module -- which I really should finally hook up and test. 

Let's start with refactoring this stuff out into smsscreens.c 

I'd also like proper working scroll bars. After a bit of thought, I decided to use H640 multi-colour sprites that are full height (the VIC-IV in the MEGA65 lets you change the height of sprites, so that you don't need to use a multiplexer). That way I can have one colour for the background of the scroll area, and another for the foreground.

Done this way, the scroll bar implementation becomes embarrassingly simple:

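char draw_scrollbar(unsigned char sprite_num,
            unsigned int start,
            unsigned int end,
            unsigned int total)
{
  unsigned char first;
  unsigned char last;
  unsigned long temp;

  // Sanitise the inputs, so that the divisions below are always safe
  if (!total) total=1;
  if (start>total) start=0;
  if (end>total) end=total;
  if (end<start) end=start;

  // Scale start and end into the 0-255 row range of the full-height sprite.
  // Cast before shifting, so that the shift happens in 32 bits, not 16.
  temp = (unsigned long)start<<8;
  temp /= total;
  if (temp>255) temp=255;
  first = temp;

  temp = (unsigned long)end<<8;
  temp /= total;
  if (temp>255) temp=255;   // end==total would otherwise wrap around to 0
  last = temp;

  // Each sprite row is 3 bytes: paint the whole track, then the bar over it
  lfill(0xf100 + (sprite_num*0x300), 0x55, 0x300);
  lfill(0xf100 + (sprite_num*0x300) + first*3, 0xAA, (last-first+1)*3);

  return 0;
}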
  

So let's now implement simple scrolling through the message thread using the cursor keys, and show the current scroll region... And it works  :) The main issue is that it's quite slow to draw at the moment, because I haven't optimised anything. That's totally fine for now.




 And a video showing just how slow it is:


Okay, so that's the display of the messages working.  Next stop is letting you type and edit a message. I'm not going to allow up and down cursor keys to navigate the message draft: Those will still scroll the thread up and down.  Left and right we will make work, though. And any general key press will do what it should.  Backspace will work.  RETURN will be for send. For simplicity, we'll probably just always show the message edit box at the bottom. For now, I'll make it black on light grey, so that it's easy to tell apart from the message thread above.  

Unicode entry (e.g., for emoji) is something that I have yet to solve.  It's all a bit complicated because we can only have 512 unique glyphs on the screen at a time, because of our limited glyph buffer size.  What I might end up doing later is having an emoji/special character entry button that hides the SMS thread display, and just has an emoji/unicode point chooser, and when you select one, it passes it back to the editor that called it.  That will probably be a separate helper program, because of the code size it will entail.  It's also not on the critical path for demonstrating core functionality, so I'm not going to let myself be distracted by it.

Okay, so let's add this message drafting box.  Ah, that reminds me: the way I draw multi-line strings like this, I can't tell it to fill a certain number of lines. So I'll just make it take the space it needs, and it will grow as required, and trigger a re-draw of the whole screen if the number of lines in the message draft changes. I'm also not yet sure how I'll handle the cursor. I might make it a special dummy character in the string (possibly just a | character, since it looks sufficiently cursor-ish).

Well, I continue to be pleased with the architecture I've laid down for this. It's not perfect, as we have those things I've described above that we'll need to do. Nonetheless, I've been able to quickly plumb in the ability to do simple message drafting, complete with cursor key navigation, backspace, and because it's a C64-type machine, shift+HOME clears the message draft.

 

In short, we have all the ingredients ready, right up to the point where we need to add the message to the thread to simulate sending it, and then also send it to the cellular modem.

To finish that off, I'll need a routine to write to a record in the D81 on native hardware. That shouldn't be too hard.  Then we can try to actually plumb it into the cellular communications stuff.

To test the saving and restoring of SMS drafts, I've added the bit of code necessary to read any saved draft, still using our horrible but adequate hack of using a | character as the cursor:

  // Read last record in disk to get any saved draft
  read_record_by_id(0,USABLE_SECTORS_PER_DISK -1, buffers.textbox.draft);
  buffers.textbox.draft_len = strlen(buffers.textbox.draft);
  buffers.textbox.draft_cursor_position = strlen(buffers.textbox.draft) - 1;
  // Reposition cursor to first '|' character in the draft
  // XXX - We really need a better solution than using | as the cursor, but it works for now
  for(position = 0; position<buffers.textbox.draft_len; position++) {
    if (buffers.textbox.draft[position]=='|') {
      buffers.textbox.draft_cursor_position = position;
      break;
    }
  }

Then all we need to do is to write that record back whenever we modify the draft message:

      // Update saved draft in the D81
      write_record_by_id(0,USABLE_SECTORS_PER_DISK -1, buffers.textbox.draft);

That should be all we need, once I have a working write_sector() routine. Which I think I have implemented now, but the draft message is notably not being restored when I re-run the program. So something is going wrong. 

The FDC busy flag strobes when requesting the write, so it seems like it should be working.  I've also confirmed that the buffer contents looks correct as it's being written.

Okay, I can actually confirm that it's being written to the sector on the D81. So why isn't it being loaded properly when the SMS thread is loaded?  The reason is that the read sector call is failing.  So why is it failing? We know it works in the general case.  Is our record number too high, causing an invalid read?

Found the problem -- I wasn't mounting the disk image before reading the message draft :) With that fixed, it now retrieves the draft message on start-up. Nice!

So in principle, I should be able to now implement the "send message" function of hitting return by:

1.  Building an "outgoing" SMS message.

2. Allocating the next free record and writing it to it. 

3. Shifting the scroll position in the thread to the new end of the thread. 

4. Clearing the draft message.

5. Redrawing the thread.

We'll ignore what to do if the message thread is over-filled, and just silently fail. 

The good news is that we have all the functions we need to do this already, as this is just replicating what the import utility does when populating these message threads.

So let's start with forming the outgoing SMS message and logging it. It turns out that the sms_log() function does both 1 and 2.  All it needs is the phone number the message is being sent to. It's a little inefficient, in that it does the search for the contact that matches the phone number, instead of just slotting it into the current contact.  I might refactor that out, both for speed, and to stop messages being logged against the wrong contact if multiple contacts have the same phone number.

Hmm... the program is crashing now. I wonder if I haven't stomped over some RAM somewhere. I've had this before, where CC65 compiled programs go wonky on me when I hit memory limits.

I really have been using CC65 for this out of habit, but perhaps this is the time to try one of the other MEGA65 supported compilers, like LLVM-MOS.

I built and installed llvm-mos like this:

$ git clone https://github.com/llvm-mos/llvm-mos.git
$ cd llvm-mos
$ cmake -C clang/cmake/caches/MOS.cmake -G Ninja -S llvm -B build 
$ cd build
$ ninja
$ cd ..
$ ninja -C build install-distribution
$ cd ..
$ git clone https://github.com/llvm-mos/llvm-mos-sdk.git
$ cd llvm-mos-sdk
$ mkdir build && cd build
$ cmake -G Ninja .. -DCMAKE_INSTALL_PREFIX=/usr/local   # or $HOME/opt/llvm-mos
$ sudo ninja install

With that I should be able to use the same build setup as in GRAZE.

I've got it compiling now, but it's not working correctly, so I have to go through and debug whatever issues the switch to LLVM has caused. On the plus side, with optimisation enabled, the LLVM compiled program is only about 30KB, which is a significant improvement on CC65 -- even if I suspect it's mostly link-time optimisation, i.e., leaving out functions that don't ever get called. But if it still makes the difference between works and doesn't work, then that's fine by me.

This comes back to a recurring problem that I have with debugging this kind of thing: If the error occurs within nested function calls, then I initially can only detect the outermost error, which is probably just because some inner thing failed. When I'm cross-compiling, I have a fatal() macro that reports the failure tree. But for native builds I don't have that doing anything useful, because the screen gets all messed up, and because the code takes up space. But with LLVM, we have a bit more space available to us again.
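
For anyone wondering what such a failure-reporting macro can look like, here is a minimal sketch. It is not my actual fatal() macro: the names format_failure() and FAIL() are hypothetical, and it prints with puts() where a native build would more likely push the text out over the serial monitor. The line format is the file:line:function():code shape that appears in the failure logs in this post.

```c
#include <stdio.h>

/* Hypothetical helper: format one failure-report line as
   file:line:function():0xNN into a caller-supplied buffer. */
int format_failure(char *out, size_t cap,
                   const char *file, int line,
                   const char *func, unsigned char code)
{
  return snprintf(out, cap, "%s:%d:%s():0x%02X", file, line, func, code);
}

/* Report the failure, then return the code to the caller. If each caller
   does the same on a non-zero result, every level of a nested failure adds
   its own line, which is what gives the "failure tree". */
#define FAIL(code) do { \
    char msg_[128]; \
    format_failure(msg_, sizeof msg_, __FILE__, __LINE__, __func__, (code)); \
    puts(msg_); \
    return (code); \
  } while (0)
```

A function that returns an error code would then write something like `if (!filename) FAIL(3);`, and the report identifies exactly which check tripped.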

We do have the serial monitor interface available to us, though. Or even just dumping stuff into RAM somewhere magic for later extraction.  The MEGAphone software doesn't use the AtticRAM at all, so I could just dump stuff up there, if it's too tricky to push the messages over the serial monitor interface. 

And now I'm pulling my hair out, because LLVM is generating incorrect code for the usleep() function and/or the function for writing strings to the serial UART monitor interface when I use it with -Oz, -O1, -O2 or -O3. And it's failing in different ways.

So I think a lot of the problem is that llvm-mos assumes the Z register always contains zero. But then it gets set in places, and it doesn't realise what's happened.  Or perhaps somewhat equivalently it is using LDA ($zp) instructions that on the MEGA65 are actually LDA ($zp),Z, without realising.

Found the problem there: I was corrupting Z in another routine, which I've fixed.

But even after that, some stuff is output incorrectly from the mega65_uart_printhex() routine -- but only if I use stack storage for the hex string in that routine.  The problem is that the pointer it's using for the stack is pointing to $FFFB, i.e., under the KERNAL, and so it reads rubbish out, instead of the correct data.

That happens because the stack pointer is stored at $02 and $03, and something is causing a null pointer to be written to, thus overwriting the stack pointer.  Not cool.

Something to do with the shared resources helper assembly is borking things. I'm not sure why just yet, because it has an allocation for the 5 bytes it uses, so they shouldn't be at $0000.

Yup, for some reason the address is resolving to $0000.  Found the problem: the extern declaration for the _shres_regs area was as a pointer, and thus the code was dereferencing the stored data, yielding $0000.
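
This is a classic C trap, so here is a small illustration of it. This is a sketch, not the real helper code: shres_regs stands in for the assembly-side symbol, the two function names are made up, and the "wrong" view is simulated with memcpy() rather than with a genuinely mismatched declaration (which would be undefined behaviour).

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the 5-byte area reserved by the assembly helper. */
unsigned char shres_regs[5];

/* Correct view: `extern unsigned char shres_regs[];` makes the symbol
   itself the address of the data. */
unsigned char *regs_as_array(void)
{
  return shres_regs;
}

/* Mistaken view: `extern unsigned char *shres_regs;` tells the compiler
   that the symbol holds a pointer, so it loads the first two bytes of the
   area and uses them as the address. Simulated here with memcpy(),
   assuming 16-bit pointers as on the 6502 family. */
uint16_t regs_as_pointer_value(void)
{
  uint16_t stored;
  memcpy(&stored, shres_regs, sizeof stored);
  return stored;
}
```

With the area still zeroed, the mistaken view produces address $0000, so every access to the "registers" lands in the bottom of zero-page instead.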

With that fixed, I can now see the failure points:

src/telephony/contacts.c:45:mount_contact_qso():0x03
src/telephony/contacts.c:58:mount_contact_qso():0x0A
src/telephony/contacts.c:60:mount_contact_qso():0x08
src/telephony/records.c:146:read_record_by_id():0x03
src/telephony/contacts.c:40:mount_contact_qso():0x01
src/telephony/contacts.c:41:mount_contact_qso():0x02
src/telephony/contacts.c:45:mount_contact_qso():0x03
src/telephony/contacts.c:51:mount_contact_qso():0x05
src/telephony/contacts.c:54:mount_contact_qso():0x06
src/telephony/contacts.c:58:mount_contact_qso():0x0A
src/telephony/contacts.c:60:mount_contact_qso():0x08
src/telephony/records.c:170:write_record_by_id():0x04
src/telephony/records.c:179:write_record_by_id():0x03
 

First is top of the list.  So I'll start investigating those. 

The first one was actually a debug thing I'd put in place. The second one is a file not found error when mounting the D81. This happens because the pointer to the filename is somehow winding up as a null pointer. Possibly because something upstream has messed with a null pointer. I _really_ don't like the SP being at $02-$03 in ZP with LLVM, as even the smallest null pointer dereference write will smash it, and make it hard to track down the root cause. Logged an issue: https://github.com/llvm-mos/llvm-mos/issues/505 

Anyway, I tracked the problem down. I hadn't converted the assembly helper routine from the CC65 calling convention to the LLVM one.  Now I'm back to it displaying the message thread for the contact :)

That said, it's still reporting some failures.  I'm now thinking I'd like to be able to get stack back-traces working, to make debugging even easier. After an hour or so of mucking about, I have it working as well as I can without having to load a line-by-line symbol table:

src/telephony/contacts.c:41:mount_contact_qso():0x02
Backtrace (most recent call first):
[02] 0x312A mount_contact_qso+0x00C5, SP=0xD000
[01] 0x10DB main+0x0660, SP=0xD000
[00] 0x0A89 main+0x000E, SP=0xD000
 

This is running entirely on the MEGA65, with this output via serial monitor UART interface. 

I can then make a utility that works out the exact line that those function offsets correspond to, should the need arise. But just having a general idea of the position of the call within the calling function should be enough.

I'm actually really happy with this. Actually, that statement really undersells just how happy I am to have access to stack back-traces in my code at reasonable cost.

I think it has the potential to be handy to lots of other folks, so I've documented it in a blog post of its own.

Okay, so let's get back to debugging our actual bugs...

The problem now is that it's not mounting contact conversations properly. Or perhaps it's doing it correctly once, but not on subsequent attempts.

Yes, it's succeeding on the first attempt, but not on a subsequent call.

What's strange is it looks like the HYPPO chdir() call succeeded. i.e., that the problem is somewhere in the processing of the return value.  There is no HYPPO DOS error code set, which further supports this hypothesis.

Looks like the chdirroot() call isn't working. The problem was that the LLVM fileio.s didn't have the fix to chdirroot() that I had done a while back for CC65.

Now I'm finally back to having the SMS conversation thread displaying. But there is something funny where some of the messages are only having their first line displayed, instead of all lines of the message. 

Also pressing the INST/DEL key crashes it, instead of editing the draft SMS message. And now it's completely screwing up on start. Seems like Z register containing rubbish again. I've now added code to force Z = #$00 on startup, and it's getting further, but it looks like it's not loading the unicode glyphs any more.

So what weird thing is going on now?

Looks like the shared resource loader is loading the same empty glyph for every character. Which in turn is because it thinks that the font it's loading from is of zero length. So let's get to the bottom of this.

The magic string detection is failing. The SD card sector seems to be read correctly, and the magic_string buffer contains the right values, so how on earth is it failing?

The problem is that &magic_string[i] resolves to $002A, not the real address.

It turns out that LLVM doesn't seem to be calling its own __copy_zp_data() function during initialisation prior to entering main() _and_ the linker decided that it would be better to relocate the magic_string down there into ZP. It makes no sense, but that's what was happening.

With that fixed, it's looking less bad, at least while I had debug code in there -- it was even loading glyphs, although only the first row of text in a message was displaying.

Now it's back to the program crashing with Z=$F7 after I removed the debug code while I was figuring that ZP initialisation stuff out.

To figure out what's causing that, I've added an NMI catcher that shows the backtrace, like this:

NMI/BRK triggered.
                  Backtrace (most recent call first):
[04] 0x0A7E nmi_catcher+0x0001, SP=0xD000
[03] 0x4260 calc_break_points+0x0163, SP=0xD000
[02] 0x4205 calc_break_points+0x0108, SP=0xD000
[01] 0x1131 main+0x06A1, SP=0xD000
[00] 0xD000 __ashlhi3+0x772B, SP=0x41FF

Now that's a way faster way to get to the heart of the problem :)  

The code to add this was simple, too:


void nmi_catcher(void)
{
  mega65_uart_print("NMI/BRK triggered.\n");
  dump_backtrace();
  while(1) continue;
}

... 

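  // Install NMI/BRK catcher
  // $0316-$0317: KERNAL BRK vector
  POKE(0x0316,(uint8_t)((uint16_t)&nmi_catcher));
  POKE(0x0317,((uint16_t)&nmi_catcher)>>8);
  // $0318-$0319: KERNAL NMI vector
  POKE(0x0318,(uint8_t)((uint16_t)&nmi_catcher));
  POKE(0x0319,((uint16_t)&nmi_catcher)>>8);
  // $FFFE-$FFFF: CPU IRQ/BRK vector, for when the KERNAL is banked out
  POKE(0xFFFE,(uint8_t)((uint16_t)&nmi_catcher));
  POKE(0xFFFF,((uint16_t)&nmi_catcher)>>8);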
 

The fact that our function instrumentation is keeping track of the function entry and exit apart from the stack makes this quite robust.
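
To show the general idea of keeping that record apart from the real stack, here is a minimal sketch of a shadow call stack. It is not the actual instrumentation: trace_enter(), trace_exit(), trace_snapshot() and MAX_DEPTH are hypothetical names, and how the hooks get called on function entry and exit is left out entirely.

```c
#include <stdint.h>

#define MAX_DEPTH 32   /* assumption: illustrative limit only */

struct frame {
  uint16_t call_site;  /* address the call was made from */
  uint16_t sp;         /* stack pointer value seen at entry */
};

static struct frame callstack[MAX_DEPTH];
static uint8_t depth;

/* Hypothetical entry hook: record the frame in our own array, not on the
   CPU or C stack, so that a smashed stack can't take the record with it. */
void trace_enter(uint16_t call_site, uint16_t sp)
{
  if (depth < MAX_DEPTH) {
    callstack[depth].call_site = call_site;
    callstack[depth].sp = sp;
  }
  depth++;   /* still counted when too deep, so that exits stay balanced */
}

/* Hypothetical exit hook. */
void trace_exit(void)
{
  if (depth) depth--;
}

/* Copy up to max frames, most recent first. Returns the number copied. */
uint8_t trace_snapshot(struct frame *out, uint8_t max)
{
  uint8_t n = 0;
  uint8_t d = depth > MAX_DEPTH ? MAX_DEPTH : depth;
  while (d && n < max) out[n++] = callstack[--d];
  return n;
}
```

Because the NMI/BRK catcher only has to walk this array, it can still print a sensible backtrace when the real stack contents are rubbish.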

Anyway, let's find out where this is all going west inside calc_break_points().

The double-call for it is the result of in-lining, I think. 

Anyway, with a bit more work, I have my NMI catcher showing the symbol and offset, and PC value of where it happened:

>>> NMI/BRK triggered.
 A:9E X:84 Y:05 P:B1 S:E8
  BRK source @ 0x4717 calc_break_points+0x04F5
Backtrace (most recent call first):
[03] 0x29A0 nmi_catcher+0x009E, SP=0xD000
[02] 0x43A8 calc_break_points+0x0186, SP=0xD000
[01] 0x111E main+0x06A1, SP=0xD000
[00] 0x0DD0 main+0x0353, SP=0x1E57

With that, I can disassemble and see what's going on: 

,00004700  64 05     STZ   $05
,00004702  64 06     STZ   $06
,00004704  86 07     STX   $07
,00004706  A6 1A     LDX   $1A
,00004708  86 08     STX   $08
,0000470A  64 09     STZ   $09
,0000470C  A6 1B     LDX   $1B
,0000470E  86 0A     STX   $0A
,00004710  A6 1A     LDX   $1A
,00004712  86 0B     STX   $0B
,00004714  A2 84     LDX   #$84
,00004716  0A        ASL   
,00004717  00 D0     BRK   $D0
,00004719  75 57     ADC   $57,X
,0000471B  00 D0     BRK   $D0
,0000471D  86 04     STX   $04

And look at that, there _is_ a BRK instruction in the middle of the instruction stream. What on earth is it doing there?   

Yup, something is bonkers here.  If I use this on the ELF object:

llvm-objdump -drS --no-show-raw-insn --print-imm-hex bin65/unicode-font-test.llvm.prg.elf

I can see this:

;   lcopy((unsigned long)buffers.textbox.break_costs,0x1A000L,RECORD_DATA_SIZE);
    46fe:       stz     $4                      ; 0xe004 <__heap_start+0x43d8>
    4700:       stz     $5                      ; 0xe005 <__heap_start+0x43d9>
    4702:       stz     $6                      ; 0xe006 <__heap_start+0x43da>
    4704:       stx     $7                      ; 0xe007 <__heap_start+0x43db>
    4706:       ldx     $1a                     ; 0xe01a <__heap_start+0x43ee>
    4708:       stx     $8                      ; 0xe008 <__heap_start+0x43dc>
    470a:       stz     $9                      ; 0xe009 <__heap_start+0x43dd>
    470c:       ldx     $1b                     ; 0xe01b <__heap_start+0x43ef>
    470e:       stx     $a                      ; 0xe00a <__heap_start+0x43de>
    4710:       ldx     $1a                     ; 0xe01a <__heap_start+0x43ee>
    4712:       stx     $b                      ; 0xe00b <__heap_start+0x43df>
    4714:       ldx     #$84
    4716:       lda     #$32
    4718:       jsr     $5775 <lcopy>
;   CHECKPOINT("post string_render_analyse");
    471b:       ldx     #$25
    471d:       stx     $4                      ; 0xe004 <__heap_start+0x43d8>

Let's focus on the bit that's broken:

,00004712  86 0B     STX   $0B
,00004714  A2 84     LDX   #$84
,00004716  0A        ASL   
,00004717  00 D0     BRK   $D0
,00004719  75 57     ADC   $57,X
 

vs 

    4712:       stx     $b                      ; 0xe00b <__heap_start+0x43df>
    4714:       ldx     #$84
    4716:       lda     #$32
    4718:       jsr     $5775 <lcopy>
 

Ah -- when I load it, but haven't yet run it, it's okay:

,00004712  86 0B     STX   $0B
,00004714  A2 84     LDX   #$84
,00004716  A9 32     LDA   #$32
,00004718  20 75 57  JSR   $5775
,0000471B  A2 B3     LDX   #$B3
,0000471D  86 04     STX   $04
,0000471F  A2 5D     LDX   #$5D 

So we're somehow overwriting this bit of code. That explains why the behaviour is so random.

Let's stick a watch on $4717 and see what's to blame for corrupting it.

Bingo -- we've found it:

w4717
.t0
.!
PC   A  X  Y  Z  B  SP   MAPH MAPL LAST-OP In     P  P-FLAGS   RGP uS IO ws h RECA8LHC
247B 0A 97 02 F7 00 01F2 0000 0000 A507    00     21 ..E....C ...P 15 -  00 - .....lh.
,0777247B  A4 09     LDY   $09

.D2470
,00002470  08        PHP   
,00002471  85 05     STA   $05
,00002473  A9 00     LDA   #$00
,00002475  A0 02     LDY   #$02
,00002477  91 04     STA   ($04),Y
,00002479  A5 07     LDA   $07
,0000247B  A4 09     LDY   $09
,0000247D  91 04     STA   ($04),Y
,0000247F  18        CLC   
,00002480  A5 08     LDA   $08
,00002482  69 02     ADC   #$02
,00002484  85 07     STA   $07
,00002486  A5 06     LDA   $06
,00002488  69 00     ADC   #$00
,0000248A  5A        PHY   
,0000248B  A4 07     LDY   $07

.m4
:00000004:1547470A1501004C0000000400000000

So why does the ZP vector at $04-$05 point there? And where is this bit of code?

Okay, so this is a bit embarrassing: It's in the trace-back logging code. In particular here:

;   callstack[depth] = (struct frame){ call_site, &__stack };
    2457:       lda     $40                     ; 0x6040 <__bss_size+0x2f24>
    2459:       asl
    245a:       rol     $6                      ; 0x6006 <__bss_size+0x2eea>
    245c:       asl
    245d:       sta     $4                      ; 0x6004 <__bss_size+0x2ee8>
    245f:       rol     $6                      ; 0x6006 <__bss_size+0x2eea>
    2461:       lda     #$15
    2463:       clc
    2464:       adc     $4                      ; 0x6004 <__bss_size+0x2ee8>
    2466:       tay
    2467:       lda     #$6b
    2469:       adc     $6                      ; 0x6006 <__bss_size+0x2eea>
    246b:       sta     $6                      ; 0x6006 <__bss_size+0x2eea>
    246d:       sty     $4                      ; 0x6004 <__bss_size+0x2ee8>
    246f:       sty     $8                      ; 0x6008 <__bss_size+0x2eec>
    2471:       sta     $5                      ; 0x6005 <__bss_size+0x2ee9>
    2473:       lda     #$0
    2475:       ldy     #$2
    2477:       sta     ($4),y                  ; 0x6004 <__bss_size+0x2ee8>
    2479:       lda     $7                      ; 0x6007 <__bss_size+0x2eeb>
    247b:       ldy     $9                      ; 0x6009 <__bss_size+0x2eed>
    247d:       sta     ($4),y                  ; 0x6004 <__bss_size+0x2ee8>
 

And again, the root cause is that Z has somehow got itself = $F7.

Where on earth is that coming from?  It's all the side-effect of Z not being zero on initial entry, which causes all sorts of things to go wrong in the initial pre-main() routines, and then thereafter.

I was able to force it to be cleared by adding this to hal_asm_llvm.s: 


    ;;  Ensure Z is cleared on entry    
    .section .init.000,"ax",@progbits
    ldz #0            
    cld            ; Because I'm really paranoid
 

With that all in place, it draws the display more or less properly again.  But if I try to edit the message, it still crashes, but it does yield a stack back-trace at least for part of that problem. So I can investigate and fix that, and see what else remains broken:

src/telephony/records.c:144:read_record_by_id():0x01
Backtrace (most recent call first):
[02] 0x30C9 write_record_by_id+0x000E, SP=0xD000
[01] 0x1AFE main+0x107E, SP=0xD000
[00] 0x0A9A main+0x001A, SP=0xD000


src/telephony/records.c:75:append_field():0x01
Backtrace (most recent call first):
[04] 0x2DDE append_field+0x018E, SP=0xD000
[03] 0x1DEE main+0x136E, SP=0xD000
[02] 0x1C84 main+0x1204, SP=0xD000
[01] 0x1AFE main+0x107E, SP=0xD000
[00] 0x0A9A main+0x001A, SP=0xD000

Okay,  I'm ready to move forward again after a bunch of further diversions, and logging the odd issue against llvm-mos, and implementing a VHDL-based memory write-protection scheme to detect memory corruption bugs earlier.

I also had to re-provision the SD card files, because the SMS thread got corrupted during all of the above.

So now I have it at the point where it _looks_ like sending an SMS works, except that after sending it doesn't show up in the thread.  Either it's not being written to the thread, or the message count in the thread is not being updated.

It looks like the problem is happening during the index update. Disabling that temporarily, I can now have SMS messages stored into the message thread, and they display:

I need to fix the removal of the cursor before they get stored. And then also track down the reason why it messes up when updating the index.

Okay, I have the cursor hiding working now (although there are still some subtle bugs with cursor handling).  I've added instrumentation that lets me see which sectors of which disk image are being written to -- complete with the path and name of the disk image:

DEBUG: BAM sector before allocation
0000: 0 0 FF FF 07 00 00 00 00 00 00 00 00 00 00 00   ................
DEBUG: BAM sector after allocation
0000: 0 0 FF FF 0F 00 00 00 00 00 00 00 00 00 00 00   ................
Image in drive 00 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 0 FF FF 0F 00 00 00 00 00 00 00 00 00 00 00   ................
Allocated record 003 for new SMS message
DEBUG: BAM sector read back after
0000: 0 0 FF FF 0F 00 00 00 00 00 00 00 00 00 00 00   ................
Image in drive 00 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 27 00 00 06 0D 2B 39 39 39 32 36 37 35 3 34   .'....+99926754
Image in drive 0 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 0 FF FF 07 00 00 00 00 00 00 00 00 00 00 00   ................
Image in drive 0 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 03 00 00 06 0D 2B 39 39 39 32 36 37 35 3 34   ......+99926754
Image in drive 0 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
...
 

With this, I can see that for some reason, the indexing code thinks that it's always writing to the MESSAGES.D81 (I need to fix that trimming of the displayed filename), even when updating the index. That would absolutely cause the kind of problem that we're seeing. So time to add some more instrumenting. I am so glad that I have the instrumentation stuff set up now.

Okay, found a big bug in write_sector: It was always selecting drive 0.

I'd like to optimise the indexing code, so that we only modify index sectors that have changed, since writing is much slower than reading. Ideally we would use freezer-style multi-sector writes to speed things further, but we don't have that implemented in the HAL yet. More to the point, because we are accessing via the FDC emulation, multi-sector writes aren't actually possible.  So that'll have to go on the back-burner.
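
The "only write what changed" part could be sketched roughly like this. It is a sketch under assumptions, not the indexing code: the struct and function names are hypothetical, INDEX_SECTORS is an arbitrary illustrative size, and I'm assuming 512-byte sectors. The idea is simply to set a per-sector dirty flag whenever a byte really changes, and later write back only the flagged sectors.

```c
#include <stdint.h>

#define SECTOR_SIZE   512   /* assumption: bytes per sector */
#define INDEX_SECTORS 4     /* assumption: illustrative size only */

struct index_cache {
  uint8_t data[INDEX_SECTORS * SECTOR_SIZE];
  uint8_t dirty[INDEX_SECTORS];   /* 1 = sector needs writing back */
};

/* Store a byte in the cached index, and mark its sector dirty only if the
   value actually changed. */
void index_set_byte(struct index_cache *c, uint16_t offset, uint8_t value)
{
  if (offset >= sizeof c->data) return;
  if (c->data[offset] == value) return;
  c->data[offset] = value;
  c->dirty[offset / SECTOR_SIZE] = 1;
}

/* How many sectors would have to be written back. */
uint8_t index_dirty_count(const struct index_cache *c)
{
  uint8_t i, n = 0;
  for (i = 0; i < INDEX_SECTORS; i++)
    if (c->dirty[i]) n++;
  return n;
}
```

The write-back loop then only issues a sector write where dirty[i] is set, and clears the flag afterwards. Holding a whole index in RAM like this may well be too expensive on the real machine, so treat it purely as an illustration of the bookkeeping.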

But what I can do in the meantime is add a busy indication with an hour-glass sprite. Except I've decided to go with a 1 TON "wait".  I may even add an IRQ routine to animate it, so that the weight seems to drop continuously while "weighting".  Okay, so it's tacky. But it puts everything in place for a much better wait indication. Anyone whose eyeballs are bleeding at what I have created is invited to submit alternate artwork for consideration. This is what you have to improve upon:


So now the problem I have is that the text box for the SMS message  drafting is not being consistently drawn, and when it is, it's with a different height. First check is to see whether the flag to draw it is actually being seen.

Okay, so with a bit of debug output, we can see that it's being asked to be drawn, but the number of lines to be drawn is varying all over the place:

with_edit_box_P = 01, textbox.line_count = 00
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 01
with_edit_box_P = 01, textbox.line_count = 01
with_edit_box_P = 01, textbox.line_count = 02
with_edit_box_P = 01, textbox.line_count = 02
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 02
 

Why? The draft message itself is empty.  And that's the problem: If it's zero bytes long, then it returns failure. I've fixed that so that it instead reports that a single line should still be shown in that case. The empty draft now has constant vertical space reserved on the screen, but if the string is empty, then that line still doesn't get shown.  I can live with that, because it should never happen, because we should always have a cursor character in there.

This then feeds back to the other bugs affecting the whole cursor thing, because there are a bunch of them.

This current one we can deal with by saying that if we have a string without a cursor marker, that we should add one to the end.

Also, if a string has more than one cursor marker, we should get rid of the extras.

Right now we use a | character to approximate a cursor. But we should probably go past that now, and use something better. Possibly a width trimmed reverse space. We could then enable the hardware blink attribute on it, to make it a proper blinking cursor.

We do need a way to represent the cursor in the string. I'd rather not use a >0x7f value, because then we have to worry about UTF8 encoding stuff. But we can use a low value <0x20, that normally wouldn't get displayed. Like 0x01, for example, and then just replace that with a cursor when we encounter it.
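
Those rules (append a marker if there is none, drop any extras, and use a low control value as the marker) can be sketched as below. This is not the code from the project: normalise_cursor() and CURSOR_MARKER are hypothetical names, and I've assumed that the first marker found is the one to keep.

```c
#include <stddef.h>

#define CURSOR_MARKER 0x01   /* the low control value suggested above */

/* Ensure that s (a NUL-terminated string in a buffer of cap bytes) contains
   exactly one cursor marker. Keeps the first marker, drops any extras, and
   appends one if there was none. Returns the index of the marker, or -1 if
   there was no room to append one. */
int normalise_cursor(char *s, size_t cap)
{
  size_t in, out = 0;
  int pos = -1;

  /* Compact the string in place, skipping every marker after the first */
  for (in = 0; s[in]; in++) {
    if (s[in] == CURSOR_MARKER) {
      if (pos >= 0) continue;
      pos = (int)out;
    }
    s[out++] = s[in];
  }
  s[out] = 0;

  /* No marker at all: put one at the end, if it fits */
  if (pos < 0) {
    if (out + 2 > cap) return -1;   /* need room for marker and NUL */
    s[out] = CURSOR_MARKER;
    s[out + 1] = 0;
    pos = (int)out;
  }
  return pos;
}
```

Running something like this over a draft as it is loaded would mean the rest of the editor can rely on there being exactly one cursor, and the return value doubles as the cursor position.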

Okay, so I've implemented this, and we even now have a nice blinking cursor.  But if the cursor isn't at the end of the message, then when it gets reloaded, the cursor isn't there, and one of the characters next to where the cursor was gets munched. So I guess I'm handling the cursor finding thing wrong.  So, time for more serial monitor debug messages!

That's all fixed now, too.

So I think the last thing I'd like to deal with for now is to allow deleting messages, so that I don't clog up the message threads with all my testing.  This shouldn't be too hard: all I need to do is to deallocate the record in the BAM, and then update the index.  I can probably do this in two steps, by first zeroing out the message we're deleting, and pretending to index it. That will update the index. Then I just need to deallocate the BAM bit. But first, I need a key combo for it. I'm going to use SHIFT+DEL, since it's easy.
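
The BAM side of that is just bit twiddling. Here is a sketch, with the loud caveat that the layout is an assumption: I've taken the bitmap to start at byte 4 of the BAM sector, one bit per record with the lowest bit first, purely because that matches the debug dumps in this post where byte 4 goes from 07 to 0F when record 3 is allocated. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: allocation bitmap begins at byte 4 of the BAM sector, one
   bit per record, least significant bit first. Inferred from the debug
   dumps, not from the real record code. */
#define BAM_BITMAP_OFFSET 4

int bam_is_allocated(const uint8_t *bam, uint16_t record)
{
  return (bam[BAM_BITMAP_OFFSET + (record >> 3)] >> (record & 7)) & 1;
}

void bam_allocate(uint8_t *bam, uint16_t record)
{
  bam[BAM_BITMAP_OFFSET + (record >> 3)] |= (uint8_t)(1u << (record & 7));
}

/* Deleting a message: clear the record's bit so that it can be reused. */
void bam_free(uint8_t *bam, uint16_t record)
{
  bam[BAM_BITMAP_OFFSET + (record >> 3)] &= (uint8_t)~(1u << (record & 7));
}
```

So deleting the most recent message in a thread comes down to a bam_free() on its record, followed by writing the BAM sector back.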

The more complex case is deleting messages that aren't at the end of the conversation.  I'm not updating the index when that happens, because the routine for reindexing a whole disk is currently embarrassingly slow on real hardware (of the order of an hour!).  But I don't need the index stuff right now, since searching isn't on the critical path. 

So I think that's probably everything we need here for the moment.  Time to get contact list and dialpad working, and then hook it up to the cellular modem and actually do some telephony! 

Saturday, 4 October 2025

Simple Memory Protection Scheme

Using LLVM has me wanting to implement a simple scheme that enforces memory protection, so that I can more easily detect memory corruption events. This is partly because of the unfortunate (although understandable) way that LLVM stores its stack pointer in zero-page at addresses $02 and $03, which renders it susceptible to easy corruption, which then results in all manner of down-stream corruption.

While protecting the stack pointer itself would require munging with LLVM's code generator, I can at least make it possible to write-protect code and read-only data segments, wired so that they simulate a BRK instruction if attempted to be written to. 

Tracking via Issue #921

I'm going to have to work out how to do this in a way that can survive freeze and unfreeze, ideally without changing the frozen process image. But that can wait.

We'll start by defining what I want it to do, which I think is fairly simple:

1. Allow two ranges, each of which define a write-protected region.

2. Two flags that enable each write protected region.

3. A bit field that indicates whether each region should trigger a hypervisor trap (to freezer), a BRK-like IRQ, or something else.

It all took a lot more mucking about to get working, due to some weird glitching causing false-positives. I've fixed those, and I can now cause an interrupt on a write to either protected region. However, the writes are still occurring, at least to chip RAM, which is hardly ideal.

That would be because we have this weird split regime that got added in when we moved from the old synthesis tool, whose name I can't even remember at the moment, to Vivado.  The old one allowed a slightly weird BRAM timing configuration that the CPU design depended on deeply, and it was solved by splitting the whole thing into these two separate processes. But this means that our detection of the write violation and our inhibiting of the write are now split over two separate processes.

So I need some simple way to fix this, without messing up timing.

Well, I've got it enforcing for chip RAM now, but not IO. But I can live with that.

I doubt that this will make it into development, but who knows.  But here are the registers (write-only) for this:

$FFD5000-1 = low address (inclusive) for write-protect region 0
$FFD5002-3 = high address (inclusive) for write-protect region 0
$FFD5004-5 = low address (inclusive) for write-protect region 1
$FFD5006-7 = high address (inclusive) for write-protect region 1
$FFD5008 bit 0 = enable write protection region 0
$FFD5008 bit 4 = enable write protection region 1
$FFD5008 bits 1-3 = write protection region 0 violation action: 111 = nothing, 000 = simulate BRK, 001 = NMI, 010 = trigger freezer.
$FFD5008 bits 5-7 = write protection region 1 violation action: 111 = nothing, 000 = simulate BRK, 001 = NMI, 010 = trigger freezer.

Writing to any of $FFD5000-$FFD5007 disables write protection for both regions. As does entering the freezer.

So for example we can do:

sffd5000 0 8 10 8 0 0 0 0 1

And that will write-protect $0800-$0810 inclusive, and trigger a fake BRK (which will trigger the MEGA65 ROM monitor by default).
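To make the register layout concrete, here is a minimal Python sketch that builds the same 9 bytes and the matching monitor command. The function and constant names are made up for illustration. It assumes the address registers are 16-bit little-endian (which is what the example above implies, since "0 8" means $0800), and that each 3-bit action value sits in its field with its least significant bit lowest. The control byte comes last, which matters because writing any of the address registers disables protection:

```python
# Violation actions, as listed in the register description above.
ACTION_BRK, ACTION_NMI, ACTION_FREEZER, ACTION_NONE = 0b000, 0b001, 0b010, 0b111


def write_protect_vector(region0=None, region1=None):
    """Build the 9 bytes for $FFD5000-$FFD5008.

    Each region is a (low, high, action) tuple with inclusive 16-bit
    addresses, or None to leave that region disabled.
    """
    out = bytearray()
    control = 0
    for index, region in enumerate((region0, region1)):
        if region is None:
            out += bytes(4)  # unused region: zeroed address registers
            continue
        low, high, action = region
        if not 0 <= low <= high <= 0xFFFF:
            raise ValueError("region must satisfy 0 <= low <= high <= $FFFF")
        out += bytes([low & 0xFF, low >> 8, high & 0xFF, high >> 8])
        # Enable bit is bit 0 (region 0) or bit 4 (region 1); the action
        # occupies the three bits directly above the enable bit.
        control |= (1 | ((action & 0b111) << 1)) << (4 * index)
    out.append(control)
    return bytes(out)


def monitor_command(vector, address=0xFFD5000):
    """Format the vector as a serial monitor 's' (set memory) command."""
    return f"s{address:x} " + " ".join(f"{b:x}" for b in vector)


print(monitor_command(write_protect_vector((0x0800, 0x0810, ACTION_BRK))))
# prints "sffd5000 0 8 10 8 0 0 0 0 1"
```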

So now I can make a little shim for LLVM that extracts the address range of the code and rodata segments, and then enforces write-protection.  It felt like it should be possible to do this with the linker, but I didn't have the time to dig deep. So I just extended some Python that parses the map file for the program, which I already had in order to add the symbol tables that allow debug symbols in the natively generated stack backtraces on BRK instruction. It now also makes the 9-byte vector of values that gets put at $FFD5000 to set up the write protection.
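The range-extraction step might look roughly like the following. This is only a sketch, not the actual script: it assumes the ld.lld map layout of "VMA LMA Size Align" columns followed by the output section name, and the sample map text, including its section sizes, is invented for the example:

```python
import re

# An output-section line: VMA, LMA, Size (hex), Align (decimal), then a
# bare section name such as ".text" in the "Out" column.
SECTION_LINE = re.compile(
    r"\s*([0-9a-fA-F]+)\s+[0-9a-fA-F]+\s+([0-9a-fA-F]+)\s+\d+\s+(\.\w+)\s*$"
)


def section_ranges(map_text, wanted=(".text", ".rodata")):
    """Return {section: (first_address, last_address)}, both inclusive."""
    ranges = {}
    for line in map_text.splitlines():
        m = SECTION_LINE.match(line)
        if not m or m.group(3) not in wanted or m.group(3) in ranges:
            continue
        start, size = int(m.group(1), 16), int(m.group(2), 16)
        if size:  # an empty section has nothing to protect
            ranges[m.group(3)] = (start, start + size - 1)
    return ranges


# Hypothetical map excerpt; only the "main" line mirrors the real format
# shown in the function-table tool, the rest is made up.
SAMPLE_MAP = """\
     VMA      LMA     Size Align Out     In      Symbol
     a7b      a7b     3000     1 .text
     a7b      a7b     196b     1         bin65/example.o:(.text.main)
     a7b      a7b     196b     1                 main
    3a7b     3a7b      200     1 .rodata
"""

for name, (low, high) in section_ranges(SAMPLE_MAP).items():
    print(f"{name}: ${low:04X}-${high:04X}")
```

The inclusive end address is start + size - 1, which matches the inclusive high-address registers.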

And with the latest commits to everything, it now works -- and I get a stack backtrace generated whenever the code or read-only data area gets written to :)

So now it's back to fixing the remaining bugs with the LLVM transition for the telephony software...

Sunday, 28 September 2025

Stack backtrace on MEGA65 using MOS-LLVM

For the MEGAphone, I wanted to make my software debugging on the MEGA65 easier.  Stack back-traces are a great way to help debug errors, but we don't have gdb or lldb or anything like that on the MEGA65.  m65dbg and related tools can help here, but they don't have an easy way to provide the complete call stack.

To solve this for my needs, I added function instrumentation using the following in my Makefile:

COPT_M65=    -Iinclude    -Isrc/telephony/mega65 -Isrc/mega65-libc/include

COMPILER=llvm
COMPILER_PATH=/usr/local/bin
CC=   $(COMPILER_PATH)/mos-c64-clang -mcpu=mos45gs02 -Iinclude -Isrc/telephony/mega65 -Isrc/mega65-libc/include -DLLVM -fno-unroll-loops -ffunction-sections -fdata-sections -mllvm -inline-threshold=0 -fvisibility=hidden -Oz -Wall -Wextra -Wtype-limits

# Uncomment to include stacktraces on calls to fail()
CC+=    -g -finstrument-functions -DWITH_BACKTRACE

LD=   $(COMPILER_PATH)/ld.lld
CL=   $(COMPILER_PATH)/mos-c64-clang -DLLVM -mcpu=mos45gs02
HELPERS=        src/helper-llvm.c

LDFLAGS += -Wl,-Map,bin65/unicode-font-test.map
LDFLAGS += -Wl,-T,src/telephony/asserts.ld


Then for the build target, I compile twice: the first pass generates a map file with the memory addresses of all the functions in it, from which I generate a C structure listing the address and name of each function, and the second pass links that table in:

# For backtrace support we have to compile twice: Once to generate the map file, from which we
# can generate the function list, and then a second time, where we link that in.
bin65/unicode-font-test.llvm.prg:    src/telephony/unicode-font-test.c $(NATIVE_TELEPHONY_COMMON)
    mkdir -p bin65
    rm -f src/telephony/mega65/function_table.c
    echo "struct function_table function_table[]={}; int function_table_count=0;" > src/telephony/mega65/function_table.c
    $(CC) -o bin65/unicode-font-test.llvm.prg -Iinclude -Isrc/mega65-libc/include src/telephony/unicode-font-test.c src/telephony/attr_tables.c src/telephony/helper-llvm.s src/telephony/mega65/hal.c src/telephony/mega65/hal_asm_llvm.s $(SRC_TELEPHONY_COMMON) $(SRC_MEGA65_LIBC_LLVM) $(LDFLAGS)
    tools/function_table.py bin65/unicode-font-test.map src/telephony/mega65/function_table.c
    $(CC) -o bin65/unicode-font-test.llvm.prg -Iinclude -Isrc/mega65-libc/include src/telephony/unicode-font-test.c src/telephony/attr_tables.c src/telephony/helper-llvm.s src/telephony/mega65/hal.c src/telephony/mega65/hal_asm_llvm.s $(SRC_TELEPHONY_COMMON) $(SRC_MEGA65_LIBC_LLVM) $(LDFLAGS)

The tool that generates the function list is fairly simple:

#!/usr/bin/env python3
import sys
import re

if len(sys.argv) != 3:
    print(f"usage: {sys.argv[0]} <mapfile> <output.c>")
    sys.exit(1)

mapfile, outfile = sys.argv[1], sys.argv[2]

entries = []
in_text = False

with open(mapfile) as f:
    for line in f:
        if ".text" in line and line.strip().endswith(".text"):
            in_text = True
            continue
        if ".rodata" in line:
            break
        if not in_text:
            continue
        # match lines like: " a7b      a7b     196b     1                 main"
        m = re.match(r"\s*([0-9a-fA-F]+)\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+\d+\s+(\S+)$", line)
        if m:
            addr = int(m.group(1), 16)
            name = m.group(2)
            # skip synthetic names if you want
            if name.startswith("bin") or name.endswith(".o:"):
                continue
            entries.append((addr, name))

with open(outfile, "w") as out:
    out.write("/* auto-generated from map file */\n")
    out.write("const struct function_table function_table[] = {\n")
    for addr, name in entries:
        out.write(f"  {{ 0x{addr:04x}, \"{name}\" }},\n")
    out.write("};\n")
    out.write(f"const unsigned function_table_count = {len(entries)};\n")
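As a small worked example of what the tool does to a single map line, here is the same regular expression applied to the sample line from the comment above. The helper name table_row is only for illustration:

```python
import re

# Same pattern as in the tool: address, two more hex columns, alignment, symbol.
LINE = re.compile(
    r"\s*([0-9a-fA-F]+)\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+\d+\s+(\S+)$"
)


def table_row(map_line):
    """Turn one map-file symbol line into one C initialiser row, or None."""
    m = LINE.match(map_line)
    if not m:
        return None
    return f'  {{ 0x{int(m.group(1), 16):04x}, "{m.group(2)}" }},'


print(table_row(" a7b      a7b     196b     1                 main"))
# prints:   { 0x0a7b, "main" },
```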

Then in an include file, I have:

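/* Include guard; the name INCLUDES_H is illustrative. */
#ifndef INCLUDES_H
#define INCLUDES_H

#include <stdint.h>   /* uint16_t */

#ifdef WITH_BACKTRACE
/* Two-step stringification so that __LINE__ expands to its number first */
#define STR_HELPER(x) #x
#define STR(x)        STR_HELPER(x)

#define fail(X) mega65_fail(__FILE__,__FUNCTION__,STR(__LINE__),X)
void mega65_fail(const char *file, const char *function, const char *line, unsigned char error_code);
#else
/* Without backtrace support, fail() compiles to nothing */
#define fail(X)
#endif

/* One entry per function: start address and name */
struct function_table {
  const uint16_t addr;
  const char *function;
};

#endif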

The last bit of setup then is to have a C file that includes the function table and implements the helper functions:

#include "includes.h"

extern const unsigned char __stack; 

#ifdef WITH_BACKTRACE
#include "function_table.c"
#endif

void dump_backtrace(void);

#ifdef WITH_BACKTRACE

__attribute__((no_instrument_function))
void mega65_uart_print(const char *s)
{  
  while(*s) {
    asm volatile (
        "sta $D643\n\t"   // write A to the trap register
        "clv"             // must be the very next instruction
        :
        : "a"(*s)         // put the current character into A before the block
        : "v", "memory"   // CLV changes V; 'memory' blocks reordering across the I/O write
    );

    // Wait a bit between chars
    for(char n=0;n<2;n++) {
      asm volatile(
           "ldx $D012\n"
           "1:\n"
           "cpx $D012\n"
           "beq 1b\n"
           :
           :
           : "x"   // X is clobbered
           );
    }
    
    s++;
  }

}

__attribute__((no_instrument_function))
void mega65_uart_printhex(const unsigned char v)
{
  char hex_str[3];

  hex_str[0]=to_hex(v>>4);
  hex_str[1]=to_hex(v&0xf);
  hex_str[2]=0;
  mega65_uart_print(&hex_str[0]);
}

__attribute__((no_instrument_function))
void mega65_uart_printptr(const void *v)
{
  mega65_uart_print("0x");
  mega65_uart_printhex(((unsigned int)v)>>8);
  mega65_uart_printhex(((unsigned int)v));
}

__attribute__((no_instrument_function))
void mega65_fail(const char *file, const char *function, const char *line, unsigned char error_code)
{

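  // Stash LLVM's soft stack pointer (zero-page $02/$03) at $0428/$0429
  // so that it can be inspected after the failure.
  POKE(0x0428,PEEK(0x02));
  POKE(0x0429,PEEK(0x03));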

  mega65_uart_print(file);

  mega65_uart_print(":");

  mega65_uart_print(line);
  mega65_uart_print(":");
  mega65_uart_print(function);
  mega65_uart_print("():0x");

  mega65_uart_printhex(error_code);
  mega65_uart_print("\n\r");

  dump_backtrace();

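  // Flush any queued keys from the hardware keyboard buffer ($D610), then
  // wait for a keypress, flickering the background colour in the meantime.
  while(PEEK(0xD610)) POKE(0xD610,0);
  while(!PEEK(0xD610)) POKE(0xD021,PEEK(0xD012));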

}

/*
  Stack back-trace facility to help debug error locations.

*/

#define MAX_BT 32
struct frame { const void *site, *stack_pointer; };
static struct frame callstack[MAX_BT];
static uint8_t depth, sp;

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void) {
  if (depth>=MAX_BT) depth--;
  
  // Get SPL into sp variable declared above.
  __asm__ volatile ("tsx" : "=x"(sp));
  // Now convert that into a pointer into the hardware stack page ($0100-$01FF)
  const uint8_t *stack_pointer = (void *)(0x0100 + sp);
  
  void *call_site = (void *)((*((uint16_t *)&stack_pointer[1])) - 1);
  
  callstack[depth] = (struct frame){ call_site, &__stack };
  depth++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void) {
  if (depth) --depth; // simple, assumes well-nested calls
}

__attribute__((no_instrument_function))
void dump_backtrace(void) {
  // For each frame, either:
  //  - print raw addresses, or
  //  - call your on-target addr2line() to print file:line + function

  mega65_uart_print("Backtrace (most recent call first):\n\r");

  for(unsigned char d = depth-1;d!=0xff;d--) {
    mega65_uart_print("[");
    mega65_uart_printhex(d);
    mega65_uart_print("] ");

    // Find function in table
    unsigned int func_num = 0;
    while(func_num<(function_table_count-1) && function_table[func_num+1].addr < (uint16_t)callstack[d].site)
      func_num++;

    // Display offset from function
    mega65_uart_printptr(callstack[d].site);
    mega65_uart_print(" ");
    mega65_uart_print(function_table[func_num].function);
    mega65_uart_print("+");
    mega65_uart_printptr((void*)((uint16_t)callstack[d].site - function_table[func_num].addr));

    // Show stack pointer
    mega65_uart_print(", SP=");
    mega65_uart_printptr(callstack[d].stack_pointer);
    mega65_uart_print("\n\r");
  } 
}
#endif

With all that in place, if you call fail(X) where X is an error code, the MEGA65's serial monitor interface will output something like this, and then wait for a keypress on the MEGA65's keyboard before continuing:

src/telephony/contacts.c:44:mount_contact_qso():0x03
Backtrace (most recent call first):
[02] 0x312A mount_contact_qso+0x00C5, SP=0xD000
[01] 0x10DB main+0x0660, SP=0xD000
[00] 0x0A89 main+0x000E, SP=0xD000

 

So now I know that fail(3) was called from inside mount_contact_qso(), which was called from main().
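The address-to-function lookup can also be reproduced off-target, which is handy for checking the table against a log. The sketch below is a host-side Python illustration, not the on-device code: it uses a binary search via bisect instead of the linear scan in dump_backtrace(), and an address that lands exactly on a function's first byte is attributed to that function. The two table entries are inferred from the sample output above (main at $0A7B from the map-line example, and mount_contact_qso at $312A - $C5 = $3065):

```python
from bisect import bisect_right

# (start_address, name), sorted by address. Values inferred from the
# sample backtrace, not taken from a real map file.
FUNCTION_TABLE = [
    (0x0A7B, "main"),
    (0x3065, "mount_contact_qso"),
]


def symbolize(addr, table=FUNCTION_TABLE):
    """Format addr as 'ADDR function+OFFSET' using the nearest start at or below it."""
    starts = [start for start, _ in table]
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return f"0x{addr:04X} ?"  # below the first known function
    base, name = table[i]
    return f"0x{addr:04X} {name}+0x{addr - base:04X}"


for site in (0x312A, 0x10DB, 0x0A89):
    print(symbolize(site))
# prints:
# 0x312A mount_contact_qso+0x00C5
# 0x10DB main+0x0660
# 0x0A89 main+0x000E
```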