Monday, 6 October 2025

SMS Thread Display, Message Editing etc

In a previous post, I got Unicode text rendering working, complete with line breaking, emojis and a pile of other stuff. That means we can show message threads.

Now we need to refactor that out into something more usable, and add the missing bits, like being able to type a message, hit send, have a functional scroll-bar and so on.

Then we'll have almost an entire working SMS system, sans talking to the actual cellular module -- which I really should finally hook up and test. 

Let's start with refactoring this stuff out into smsscreens.c 

I'd also like proper working scroll bars. After a bit of thought, I decided to use H640 multi-colour sprites that are full height (the VIC-IV in the MEGA65 lets you change the height of sprites, so that you don't need to use a multiplexer). That way I can have one colour for the background of the scroll area, and another for the foreground.

Done this way, the scroll bar implementation becomes embarrassingly simple:

char draw_scrollbar(unsigned char sprite_num,
            unsigned int start,
            unsigned int end,
            unsigned int total)
{
  unsigned char first;
  unsigned char last;
  unsigned long temp;

  // Clamp the inputs so we can't divide by zero or run off the end
  if (!total) total=1;
  if (start>total) start=0;
  if (end>total) end=total;

  // Scale start and end to 0-255 sprite rows. Promote to long before
  // shifting, so that start<<8 can't overflow a 16-bit int.
  temp = ((unsigned long)start)<<8;
  temp /= total;
  first = temp;

  temp = ((unsigned long)end)<<8;
  temp /= total;
  last = temp;

  // Fill the whole bar (0x300 bytes = 256 rows x 3 bytes) with the
  // background bit pattern, then draw the foreground pattern over the
  // rows corresponding to the visible region.
  lfill(0xf100 + (sprite_num*0x300), 0x55, 0x300);
  lfill(0xf100 + (sprite_num*0x300) + first*3, 0xAA, (last-first+1)*3);

  return 0;
}
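
For example, refreshing the scroll bar for the visible window of the thread is then a single call per redraw (the sprite number and variable names here are just illustrative):

  // messages first_visible..last_visible of msg_count currently on screen
  draw_scrollbar(7, first_visible, last_visible, msg_count);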
  

So let's now implement simple scrolling through the message thread using the cursor keys, and show the current scroll region... And it works  :) The main issue is that it's quite slow to draw at the moment, because I haven't optimised anything. That's totally fine for now.

 And a video showing just how slow it is:


Okay, so that's the display of the messages working. Next stop is letting you type and edit a message. I'm not going to allow the up and down cursor keys to navigate the message draft: those will still scroll the thread up and down. Left and right we will make work, though. Any general key press will do what it should, backspace will work, and RETURN will be for send. For simplicity, we'll probably just always show the message edit box at the bottom. For now, I'll make it black on light grey, so that it's easy to tell apart from the message thread above.
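
Here's a sketch of that dispatch. $D610 is the MEGA65 hardware keyboard queue (write to it to acknowledge a key), the key codes are the usual C64/PETSCII ones, and every helper function named here is a hypothetical stand-in for the behaviour just described:

unsigned char k = PEEK(0xD610);
if (k) {
  POKE(0xD610, 0);                         // consume the key
  switch (k) {
  case 0x91: scroll_thread(-1); break;     // cursor up: scroll thread
  case 0x11: scroll_thread(+1); break;     // cursor down: scroll thread
  case 0x9d: move_draft_cursor(-1); break; // cursor left
  case 0x1d: move_draft_cursor(+1); break; // cursor right
  case 0x14: draft_backspace(); break;     // INST/DEL deletes
  case 0x93: clear_draft(); break;         // SHIFT+HOME clears the draft
  case 0x0d: send_draft_message(); break;  // RETURN sends
  default:   draft_insert_char(k); break;  // anything else types into the draft
  }
}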

Unicode entry (e.g., for emoji) is something that I have yet to solve. It's all a bit complicated because our limited glyph buffer means we can only have 512 unique glyphs on the screen at a time. What I might end up doing later is having an emoji/special-character entry button that hides the SMS thread display and just shows an emoji/Unicode code point chooser, and when you select one, it passes it back to the editor that called it. That will probably be a separate helper program, because of the code size it will entail. It's also not on the critical path for demonstrating core functionality, so I'm not going to let myself be distracted by it.

Okay, so let's add this message drafting box. Ah, that reminds me: the way I draw multi-line strings like this, I can't tell it to fill a certain number of lines. So I'll just make it take the space it needs, let it grow as required, and trigger a re-draw of the whole screen if the number of lines in the message draft changes. I'm also not yet sure how I'll handle the cursor. I might make it a special dummy character in the string (possibly just a | character, since it looks sufficiently cursor-ish).

Well, I continue to be pleased with the architecture I've laid down for this. It's not perfect, as we have those things I've described above that we'll need to do. Nonetheless, I've been able to quickly plumb in the ability to do simple message drafting, complete with cursor key navigation and backspace, and because it's a C64-type machine, SHIFT+HOME clears the message draft.

 

In short, we have all the ingredients ready, right up to the point where we need to add the message to the thread to simulate sending it, and then also send it to the cellular modem.

To finish that off, I'll need a routine to write to a record in the D81 on native hardware. That shouldn't be too hard. Then we can try to actually plumb it into the cellular communications stuff.

To test the saving and restoring of SMS drafts, I've added the bit of code necessary to read any saved draft, still using our horrible but adequate hack of using a | character as the cursor:

  // Read last record in disk to get any saved draft
  read_record_by_id(0,USABLE_SECTORS_PER_DISK -1, buffers.textbox.draft);
  buffers.textbox.draft_len = strlen(buffers.textbox.draft);
  // Default the cursor to the end of the draft, in case no marker is found
  buffers.textbox.draft_cursor_position = buffers.textbox.draft_len - 1;
  // Reposition cursor to first '|' character in the draft
  // XXX - We really need a better solution than using | as the cursor, but it works for now
  for(position = 0; position<buffers.textbox.draft_len; position++) {
    if (buffers.textbox.draft[position]=='|') {
      buffers.textbox.draft_cursor_position = position;
      break;
    }
  }

Then all we need to do is to write that record back whenever we modify the draft message:

      // Update saved draft in the D81
      write_record_by_id(0,USABLE_SECTORS_PER_DISK -1, buffers.textbox.draft);

That should be all we need, once I have a working write_sector() routine -- which I think I have implemented now. But the draft message is notably not being restored when I re-run the program, so something is going wrong.

The FDC busy flag strobes when requesting the write, so it seems like it should be working. I've also confirmed that the buffer contents look correct as they're being written.

Okay, I can actually confirm that it's being written to the sector on the D81. So why isn't it being loaded properly when the SMS thread is loaded? The reason is that the read sector call is failing. So why is it failing, when we know it works in the general case? Is our record number too high, causing an invalid read?

Found the problem -- I wasn't mounting the disk image before reading the message draft :) With that fixed, it now retrieves the draft message on start-up. Nice!

So in principle, I should be able to now implement the "send message" function of hitting return by:

1. Build an "outgoing" SMS message.

2. Allocate the next free record and write the message to it.

3. Shift the scroll position in the thread to the new end of the thread.

4. Clear the draft message.

5. Redraw the thread.

We'll ignore what to do if the message thread is over-filled, and just silently fail. 

The good news is that we have all the functions we need to do this already, as this is just replicating what the import utility does when populating these message threads.

So let's start with forming the outgoing SMS message and logging it. It turns out that the sms_log() function does both steps 1 and 2. All it needs is the phone number the message is being sent to. It's a little inefficient, in that it searches for the contact that matches the phone number, instead of just slotting the message into the current contact. I might refactor that out, both for speed, and to stop messages being logged against the wrong contact if multiple contacts have the same phone number.
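
Putting the five steps together, the RETURN handler is then roughly the following sketch -- sms_log() and write_record_by_id() are the real routines discussed in this post, while the other helpers (and sms_log()'s exact argument list) are my assumptions:

void send_draft_message(void)
{
  // Steps 1 & 2: form the outgoing SMS, allocate the next free record,
  // and write the message to it (sms_log() does both)
  strip_cursor_marker(buffers.textbox.draft);              // hypothetical
  sms_log(current_contact_phone_number, buffers.textbox.draft);

  // Step 3: jump the scroll position to the new end of the thread
  scroll_to_end_of_thread();                               // hypothetical

  // Step 4: clear the draft, including the saved copy in the D81
  buffers.textbox.draft[0] = 0;
  buffers.textbox.draft_len = 0;
  write_record_by_id(0, USABLE_SECTORS_PER_DISK - 1, buffers.textbox.draft);

  // Step 5: redraw the thread
  redraw_thread();                                         // hypothetical
}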

Hmm... the program is crashing now. I wonder if I haven't stomped over some RAM somewhere. I've had this before, where CC65 compiled programs go wonky on me when I hit memory limits.

I really have been using CC65 for this out of habit, but perhaps it's time to try one of the other MEGA65-supported compilers, like llvm-mos.

I built and installed llvm-mos like this:

$ git clone https://github.com/llvm-mos/llvm-mos.git
$ cd llvm-mos
$ cmake -C clang/cmake/caches/MOS.cmake -G Ninja -S llvm -B build 
$ cd build
$ ninja
$ cd ..
$ ninja -C build install-distribution
$ cd ..
$ git clone https://github.com/llvm-mos/llvm-mos-sdk.git
$ cd llvm-mos-sdk
$ mkdir build && cd build
$ cmake -G Ninja .. -DCMAKE_INSTALL_PREFIX=/usr/local   # or $HOME/opt/llvm-mos
$ sudo ninja install

With that I should be able to use the same build setup as in GRAZE.

I've got it compiling now, but it's not working correctly, so I have to go through and debug whatever issues the switch to LLVM has caused. On the plus side, with optimisation enabled, the LLVM-compiled program is only about 30KB, which is a significant improvement on CC65 -- even if I suspect it's mostly link-time optimisation, i.e., leaving out functions that never get called. But if it still makes the difference between works and doesn't work, then that's fine by me.

This comes back to a recurring problem that I have with debugging this kind of thing: if the error occurs within nested function calls, then I initially can only detect the outermost error, which is probably just a symptom of some inner thing failing. When I'm cross-compiling, I have a fatal() macro that reports the failure tree. But for native builds that doesn't do anything useful, because the screen gets all messed up, and because the code takes up space. But with LLVM, we have a bit more space available to us again.

We do have the serial monitor interface available to us, though. Or even just dumping stuff into RAM somewhere magic for later extraction.  The MEGAphone software doesn't use the AtticRAM at all, so I could just dump stuff up there, if it's too tricky to push the messages over the serial monitor interface. 
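
If I go the AtticRAM route, a minimal version looks like this -- AtticRAM starts at $8000000 on the MEGA65, and lcopy() is the far-copy routine from mega65-libc; the log layout is just an assumption:

// Append debug bytes to an ever-growing log in AtticRAM,
// for later extraction via the serial monitor
#define ATTIC_LOG_BASE 0x8000000UL
static unsigned long attic_log_ofs = 0;

void attic_log(const unsigned char *buf, unsigned int len)
{
  lcopy((unsigned long)buf, ATTIC_LOG_BASE + attic_log_ofs, len);
  attic_log_ofs += len;
}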

And now I'm pulling my hair out, because LLVM is generating incorrect code for the usleep() function and/or the function for writing strings to the serial UART monitor interface when I use -Oz, -O1, -O2 or -O3 -- and it fails in different ways at each level.

So I think a lot of the problem is that llvm-mos assumes the Z register always contains zero. But then Z gets set in places, and it doesn't realise what's happened. Or perhaps, somewhat equivalently, it is using LDA ($zp) instructions that on the MEGA65 are actually LDA ($zp),Z, without realising.

Found the problem there: I was corrupting Z in another routine, which I've fixed.

But even after that, some stuff is output incorrectly from the mega65_uart_printhex() routine -- but only if I use stack storage for the hex string in that routine. The problem is that the pointer it's using for the stack points to $FFFB, i.e., under the KERNAL, and so it reads rubbish out instead of the correct data.

That happens because the stack pointer is stored at $02 and $03, and something is writing through a null pointer, thus overwriting the stack pointer. Not cool.

Something to do with the shared resources helper assembly is borking things. I'm not sure why just yet, because it has an allocation for the 5 bytes it uses, so they shouldn't be at $0000.

Yup, for some reason the address is resolving to $0000. Found the problem: the extern declaration for the _shres_regs area was as a pointer, and thus dereferencing the stored data, yielding $0000.

With that fixed, I can now see the failure points:

src/telephony/contacts.c:45:mount_contact_qso():0x03
src/telephony/contacts.c:58:mount_contact_qso():0x0A
src/telephony/contacts.c:60:mount_contact_qso():0x08
src/telephony/records.c:146:read_record_by_id():0x03
src/telephony/contacts.c:40:mount_contact_qso():0x01
src/telephony/contacts.c:41:mount_contact_qso():0x02
src/telephony/contacts.c:45:mount_contact_qso():0x03
src/telephony/contacts.c:51:mount_contact_qso():0x05
src/telephony/contacts.c:54:mount_contact_qso():0x06
src/telephony/contacts.c:58:mount_contact_qso():0x0A
src/telephony/contacts.c:60:mount_contact_qso():0x08
src/telephony/records.c:170:write_record_by_id():0x04
src/telephony/records.c:179:write_record_by_id():0x03
 

First up is the top of the list. So I'll start investigating those.

The first one was actually a debug thing I'd put in place. The second one is a file not found error when mounting the D81. This happens because the pointer to the filename is somehow winding up as a null pointer. Possibly because something upstream has messed with a null pointer. I _really_ don't like the SP being at $02-$03 in ZP with LLVM, as even the smallest null pointer dereference write will smash it, and make it hard to track down the root cause. Logged an issue: https://github.com/llvm-mos/llvm-mos/issues/505 

Anyway, I tracked the problem down: I hadn't converted the assembly helper routine from the CC65 calling convention to the LLVM one. Now I'm back to it displaying the message thread for the contact :)

That said, it's still reporting some failures.  I'm now thinking I'd like to be able to get stack back-traces working, to make debugging even easier. After an hour or so of mucking about, I have it working as well as I can without having to load a line-by-line symbol table:

src/telephony/contacts.c:41:mount_contact_qso():0x02
Backtrace (most recent call first):
[02] 0x312A mount_contact_qso+0x00C5, SP=0xD000
[01] 0x10DB main+0x0660, SP=0xD000
[00] 0x0A89 main+0x000E, SP=0xD000
 

This is running entirely on the MEGA65, with this output via the serial monitor UART interface.

I can then make a utility that works out the exact lines those function offsets correspond to, should the need arise. But just having a general idea of the position of the call within the calling function should be enough.

I'm really happy with this. Actually, that statement undersells just how happy I am to have access to stack back-traces in my code at reasonable cost.

I think it has the potential to be handy to lots of other folks, so I've documented it in a blog post of its own (the 28 September entry below).

Okay, so let's get back to debugging our actual bugs...

The problem now is that it's not mounting contact conversations properly. Or perhaps it's doing it correctly once, but not on subsequent attempts.

Yes, it's succeeding on the first attempt, but not on a subsequent call.

What's strange is it looks like the HYPPO chdir() call succeeded. i.e., that the problem is somewhere in the processing of the return value.  There is no HYPPO DOS error code set, which further supports this hypothesis.

Looks like the chdirroot() call isn't working. The problem was that the LLVM fileio.s didn't have the fix to chdirroot() that I had done a while back for CC65.

Now I'm finally back to having the SMS conversation thread displaying. But there is something funny where some messages have only their first line displayed, instead of all lines of the message.

Also, pressing the INST/DEL key crashes it, instead of editing the draft SMS message. And now it's completely screwing up on start. Seems like the Z register contains rubbish again. I've added code to force Z = #$00 on startup, and it's getting further, but it looks like it's not loading the unicode glyphs any more.

So what weird thing is going on now?

Looks like the shared resource loader is loading the same empty glyph for every character. Which in turn is because it thinks that the font it's loading from is of zero length. So let's get to the bottom of this.

The magic string detection is failing. The SD card sector seems to be read correctly, and the magic_string buffer contains the right values, so how on earth is it failing?

The problem is that &magic_string[i] resolves to $002A, not the real address.

It turns out that LLVM doesn't seem to be calling its own __copy_zp_data() function during initialisation prior to entering main(), _and_ the linker decided that it would be better to relocate magic_string down into ZP. It makes no sense, but that's what was happening.

With that fixed, it's looking less bad, at least while I had debug code in there -- it was even loading glyphs, although only the first row of text in a message was displaying.

Now, after removing the debug code I'd added while figuring out that ZP initialisation stuff, it's back to crashing with Z=$F7.

To figure out what's causing that, I've added an NMI catcher that shows the backtrace, like this:

NMI/BRK triggered.
                  Backtrace (most recent call first):
[04] 0x0A7E nmi_catcher+0x0001, SP=0xD000
[03] 0x4260 calc_break_points+0x0163, SP=0xD000
[02] 0x4205 calc_break_points+0x0108, SP=0xD000
[01] 0x1131 main+0x06A1, SP=0xD000
[00] 0xD000 __ashlhi3+0x772B, SP=0x41FF

Now that's a way faster way to get to the heart of the problem :)  

The code to add this was simple, too:


void nmi_catcher(void)
{
  mega65_uart_print("NMI/BRK triggered.\n");
  dump_backtrace();
  while(1) continue;
}

... 

  // Install NMI/BRK catcher: point the KERNAL BRK ($0316/7) and NMI
  // ($0318/9) indirect vectors, plus the hardware IRQ/BRK vector at
  // $FFFE/F, at our handler
  POKE(0x0316,(uint8_t)&nmi_catcher);
  POKE(0x0317,((uint16_t)&nmi_catcher)>>8);
  POKE(0x0318,(uint8_t)&nmi_catcher);
  POKE(0x0319,((uint16_t)&nmi_catcher)>>8);
  POKE(0xFFFE,(uint8_t)&nmi_catcher);
  POKE(0xFFFF,((uint16_t)&nmi_catcher)>>8);
 

The fact that our function instrumentation keeps track of function entry and exit separately from the CPU stack makes this quite robust.

Anyway, let's find out where this is all going west inside calc_break_points().

The doubled entry for calc_break_points() is the result of in-lining, I think.

Anyway, with a bit more work, I have my NMI catcher showing the symbol, offset, and PC value of where it happened:

>>> NMI/BRK triggered.
 A:9E X:84 Y:05 P:B1 S:E8
  BRK source @ 0x4717 calc_break_points+0x04F5
Backtrace (most recent call first):
[03] 0x29A0 nmi_catcher+0x009E, SP=0xD000
[02] 0x43A8 calc_break_points+0x0186, SP=0xD000
[01] 0x111E main+0x06A1, SP=0xD000
[00] 0x0DD0 main+0x0353, SP=0x1E57

With that, I can disassemble and see what's going on: 

,00004700  64 05     STZ   $05
,00004702  64 06     STZ   $06
,00004704  86 07     STX   $07
,00004706  A6 1A     LDX   $1A
,00004708  86 08     STX   $08
,0000470A  64 09     STZ   $09
,0000470C  A6 1B     LDX   $1B
,0000470E  86 0A     STX   $0A
,00004710  A6 1A     LDX   $1A
,00004712  86 0B     STX   $0B
,00004714  A2 84     LDX   #$84
,00004716  0A        ASL   
,00004717  00 D0     BRK   $D0
,00004719  75 57     ADC   $57,X
,0000471B  00 D0     BRK   $D0
,0000471D  86 04     STX   $04

And look at that, there _is_ a BRK instruction in the middle of the instruction stream. What on earth is it doing there?   

Yup, something is bonkers here.  If I use this on the ELF object:

llvm-objdump -drS --no-show-raw-insn --print-imm-hex bin65/unicode-font-test.llvm.prg.elf

I can see this:

;   lcopy((unsigned long)buffers.textbox.break_costs,0x1A000L,RECORD_DATA_SIZE);
    46fe:       stz     $4                      ; 0xe004 <__heap_start+0x43d8>
    4700:       stz     $5                      ; 0xe005 <__heap_start+0x43d9>
    4702:       stz     $6                      ; 0xe006 <__heap_start+0x43da>
    4704:       stx     $7                      ; 0xe007 <__heap_start+0x43db>
    4706:       ldx     $1a                     ; 0xe01a <__heap_start+0x43ee>
    4708:       stx     $8                      ; 0xe008 <__heap_start+0x43dc>
    470a:       stz     $9                      ; 0xe009 <__heap_start+0x43dd>
    470c:       ldx     $1b                     ; 0xe01b <__heap_start+0x43ef>
    470e:       stx     $a                      ; 0xe00a <__heap_start+0x43de>
    4710:       ldx     $1a                     ; 0xe01a <__heap_start+0x43ee>
    4712:       stx     $b                      ; 0xe00b <__heap_start+0x43df>
    4714:       ldx     #$84
    4716:       lda     #$32
    4718:       jsr     $5775 <lcopy>
;   CHECKPOINT("post string_render_analyse");
    471b:       ldx     #$25
    471d:       stx     $4                      ; 0xe004 <__heap_start+0x43d8>

Let's focus on the bit that's broken:

,00004712  86 0B     STX   $0B
,00004714  A2 84     LDX   #$84
,00004716  0A        ASL   
,00004717  00 D0     BRK   $D0
,00004719  75 57     ADC   $57,X
 

vs 

    4712:       stx     $b                      ; 0xe00b <__heap_start+0x43df>
    4714:       ldx     #$84
    4716:       lda     #$32
    4718:       jsr     $5775 <lcopy>
 

Ah -- when I load it, but haven't yet run it, it's okay:

,00004712  86 0B     STX   $0B
,00004714  A2 84     LDX   #$84
,00004716  A9 32     LDA   #$32
,00004718  20 75 57  JSR   $5775
,0000471B  A2 B3     LDX   #$B3
,0000471D  86 04     STX   $04
,0000471F  A2 5D     LDX   #$5D 

So we're somehow overwriting this bit of code. That explains why the behaviour is so random.

Let's stick a watch on $4717 and see what's to blame for corrupting it.

Bingo -- we've found it:

w4717
.t0
.!
PC   A  X  Y  Z  B  SP   MAPH MAPL LAST-OP In     P  P-FLAGS   RGP uS IO ws h RECA8LHC
247B 0A 97 02 F7 00 01F2 0000 0000 A507    00     21 ..E....C ...P 15 -  00 - .....lh.
,0777247B  A4 09     LDY   $09

.D2470
,00002470  08        PHP   
,00002471  85 05     STA   $05
,00002473  A9 00     LDA   #$00
,00002475  A0 02     LDY   #$02
,00002477  91 04     STA   ($04),Y
,00002479  A5 07     LDA   $07
,0000247B  A4 09     LDY   $09
,0000247D  91 04     STA   ($04),Y
,0000247F  18        CLC   
,00002480  A5 08     LDA   $08
,00002482  69 02     ADC   #$02
,00002484  85 07     STA   $07
,00002486  A5 06     LDA   $06
,00002488  69 00     ADC   #$00
,0000248A  5A        PHY   
,0000248B  A4 07     LDY   $07

.m4
:00000004:1547470A1501004C0000000400000000

So why does the ZP vector at $04-$05 point there? And where is this bit of code?

Okay, so this is a bit embarrassing: It's in the trace-back logging code. In particular here:

;   callstack[depth] = (struct frame){ call_site, &__stack };
    2457:       lda     $40                     ; 0x6040 <__bss_size+0x2f24>
    2459:       asl
    245a:       rol     $6                      ; 0x6006 <__bss_size+0x2eea>
    245c:       asl
    245d:       sta     $4                      ; 0x6004 <__bss_size+0x2ee8>
    245f:       rol     $6                      ; 0x6006 <__bss_size+0x2eea>
    2461:       lda     #$15
    2463:       clc
    2464:       adc     $4                      ; 0x6004 <__bss_size+0x2ee8>
    2466:       tay
    2467:       lda     #$6b
    2469:       adc     $6                      ; 0x6006 <__bss_size+0x2eea>
    246b:       sta     $6                      ; 0x6006 <__bss_size+0x2eea>
    246d:       sty     $4                      ; 0x6004 <__bss_size+0x2ee8>
    246f:       sty     $8                      ; 0x6008 <__bss_size+0x2eec>
    2471:       sta     $5                      ; 0x6005 <__bss_size+0x2ee9>
    2473:       lda     #$0
    2475:       ldy     #$2
    2477:       sta     ($4),y                  ; 0x6004 <__bss_size+0x2ee8>
    2479:       lda     $7                      ; 0x6007 <__bss_size+0x2eeb>
    247b:       ldy     $9                      ; 0x6009 <__bss_size+0x2eed>
    247d:       sta     ($4),y                  ; 0x6004 <__bss_size+0x2ee8>
 

And again, the root cause is that Z has somehow got itself set to $F7.

Where on earth is that coming from?  It's all the side-effect of Z not being zero on initial entry, which causes all sorts of things to go wrong in the initial pre-main() routines, and then thereafter.

I was able to force it to be cleared by adding this to hal_asm_llvm.s: 


    ;;  Ensure Z is cleared on entry    
    .section .init.000,"ax",@progbits
    ldz #0            
    cld            ; Because I'm really paranoid
 

With that all in place, it draws the display more or less properly again. But if I try to edit the message, it still crashes -- though it does at least yield a stack back-trace for part of the problem. So I can investigate and fix that, and see what else remains broken:

src/telephony/records.c:144:read_record_by_id():0x01
Backtrace (most recent call first):
[02] 0x30C9 write_record_by_id+0x000E, SP=0xD000
[01] 0x1AFE main+0x107E, SP=0xD000
[00] 0x0A9A main+0x001A, SP=0xD000


src/telephony/records.c:75:append_field():0x01
Backtrace (most recent call first):
[04] 0x2DDE append_field+0x018E, SP=0xD000
[03] 0x1DEE main+0x136E, SP=0xD000
[02] 0x1C84 main+0x1204, SP=0xD000
[01] 0x1AFE main+0x107E, SP=0xD000
[00] 0x0A9A main+0x001A, SP=0xD000

Okay, I'm ready to move forward again after a bunch of further diversions: logging the odd issue against llvm-mos, and implementing a VHDL-based memory write-protection scheme to detect memory corruption bugs earlier.

I also had to re-provision the SD card files, because the SMS thread got corrupted during all of the above.

So now I have it at the point where it _looks_ like sending an SMS works, except that after sending, the message doesn't show up in the thread. Either it's not being written to the thread, or the message count in the thread is not being updated.

It looks like the problem is happening during the index update. Disabling that temporarily, I can now have an SMS message get stored into the message thread and displayed:

I need to fix the removal of the cursor before messages get stored, and then also track down why it messes up when updating the index.

Okay, I have the cursor hiding working now (although there are still some subtle bugs with cursor handling).  I've added instrumentation that lets me see which sectors of which disk image are being written to -- complete with the path and name of the disk image:

DEBUG: BAM sector before allocation
0000: 0 0 FF FF 07 00 00 00 00 00 00 00 00 00 00 00   ................
DEBUG: BAM sector after allocation
0000: 0 0 FF FF 0F 00 00 00 00 00 00 00 00 00 00 00   ................
Image in drive 00 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 0 FF FF 0F 00 00 00 00 00 00 00 00 00 00 00   ................
Allocated record 003 for new SMS message
DEBUG: BAM sector read back after
0000: 0 0 FF FF 0F 00 00 00 00 00 00 00 00 00 00 00   ................
Image in drive 00 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 27 00 00 06 0D 2B 39 39 39 32 36 37 35 3 34   .'....+99926754
Image in drive 0 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 0 FF FF 07 00 00 00 00 00 00 00 00 00 00 00   ................
Image in drive 0 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
0000: 0 03 00 00 06 0D 2B 39 39 39 32 36 37 35 3 34   ......+99926754
Image in drive 0 is /PHONE/THREADS/0/0/0/3/MESSAGES.D8
DEBUG: Writing sector data beginning with
...
 

With this, I can see that for some reason the indexing code thinks it's always writing to MESSAGES.D81 (I need to fix the trimming of the displayed filename), even when updating the index. That would absolutely cause the kind of problem that we're seeing. So time to add some more instrumentation. I am so glad that I have the instrumentation stuff set up now.

Okay, found a big bug in write_sector: It was always selecting drive 0.

I'd like to optimise the indexing code so that we only modify index sectors that have changed, since writing is much slower than reading. Ideally we would use freezer-style multi-sector writes to speed things up further, but we don't have that implemented in the HAL yet. More to the point, because we are accessing via the FDC emulation, multi-sector writes aren't actually possible. So that'll have to go on the back-burner.
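
The write-avoidance part would look something like this sketch -- read_sector() and write_sector() are the routines discussed in this post, but their exact signatures here are assumptions:

// Only write an index sector if its contents actually changed, since
// writes are much slower than reads via the FDC emulation
unsigned char update_index_sector(unsigned long sector, const unsigned char *new_data)
{
  static unsigned char current[512];
  unsigned int i;
  read_sector(sector, current);
  for (i = 0; i < 512; i++)
    if (current[i] != new_data[i]) break;
  if (i == 512) return 0;   // unchanged: skip the slow write
  return write_sector(sector, new_data);
}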

But what I can do in the meantime is add a busy indication with an hour-glass sprite. Except I've decided to go with a 1 TON "weight". I may even add an IRQ routine to animate it, so that the weight seems to drop continuously while "weighting". Okay, so it's tacky. But it puts everything in place for a much better wait indication. Anyone whose eyeballs are bleeding at what I have created is invited to submit alternate artwork for consideration. This is what you have to improve upon:


So now the problem I have is that the text box for drafting the SMS message is not being consistently drawn, and when it is, it's drawn with varying heights. First check is to see whether the flag to draw it is actually being seen.

Okay, so with a bit of debug output, we can see that it's being asked to be drawn, but the number of lines to be drawn is varying all over the place:

with_edit_box_P = 01, textbox.line_count = 00
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 01
with_edit_box_P = 01, textbox.line_count = 01
with_edit_box_P = 01, textbox.line_count = 02
with_edit_box_P = 01, textbox.line_count = 02
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 04
with_edit_box_P = 01, textbox.line_count = 02
 

Why? The draft message itself is empty. And that's the problem: if it's zero bytes long, the line-counting returns failure. I've fixed that so it reports a single line in that case. The empty draft now has constant vertical space reserved on the screen, although if the string is empty that line still doesn't get shown. I can live with that, because it should never happen: we should always have a cursor character in there.

This then feeds back to the other bugs affecting the whole cursor thing, because there are a bunch of them.

This current one we can deal with by saying that if we have a string without a cursor marker, we should add one to the end.

Also, if a string has more than one cursor marker, we should get rid of the extras.

Right now we use a | character to approximate a cursor. But we should probably move past that now, and use something better -- possibly a width-trimmed reverse space. We could then enable the hardware blink attribute on it, to make it a proper blinking cursor.

We do need a way to represent the cursor in the string. I'd rather not use a value >0x7F, because then we have to worry about UTF-8 encoding stuff. But we can use a low value <0x20 that normally wouldn't get displayed -- like 0x01, for example -- and then just replace it with a cursor when we encounter it.
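
As a sketch of those two rules with the 0x01 marker (the names here are mine, not the real code):

#define CURSOR_CHAR 0x01

// Ensure the draft contains exactly one cursor marker: drop any extras,
// and if there is none at all, append one at the end
void normalise_cursor(char *draft, unsigned int *len)
{
  unsigned int i, out = 0;
  unsigned char seen = 0;
  for (i = 0; i < *len; i++) {
    if (draft[i] == CURSOR_CHAR) {
      if (seen) continue;                // extra marker: drop it
      seen = 1;
    }
    draft[out++] = draft[i];
  }
  if (!seen) draft[out++] = CURSOR_CHAR; // no marker: cursor goes to the end
  draft[out] = 0;
  *len = out;
}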

Okay, so I've implemented this, and we even now have a nice blinking cursor. But if the cursor isn't at the end of the message, then when the draft gets reloaded, the cursor isn't there, and one of the characters next to where the cursor was gets munched. So I guess I'm handling the cursor finding wrong. So, time for more serial monitor debug messages!

That's all fixed now, too.

So I think the last thing I'd like to deal with for now is allowing messages to be deleted, so that I don't clog up the message threads with all my testing. This shouldn't be too hard: all I need to do is deallocate the record in the BAM, and then update the index. I can probably do this in two steps: first zero out the message we're deleting and pretend to index it, which will update the index; then deallocate the BAM bit. But first, I need a key combo for it. I'm going to use SHIFT+DEL, since it's easy.
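
Sketched out, with index_record() and bam_deallocate_record() as hypothetical names for the two steps (write_record_by_id() and RECORD_DATA_SIZE are from the real code):

// Two-step delete: zero the record and re-index it (which updates the
// index), then free the record's bit in the BAM
void delete_message(unsigned int record_id)
{
  static char zeroes[RECORD_DATA_SIZE];  // stays all 0x00
  write_record_by_id(0, record_id, zeroes);
  index_record(0, record_id);            // hypothetical
  bam_deallocate_record(0, record_id);   // hypothetical
}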

The more complex case is deleting messages that aren't at the end of the conversation. I'm not updating the index when that happens, because the routine for re-indexing a whole disk is currently embarrassingly slow on real hardware (of the order of an hour!). But I don't need the index stuff right now, since searching isn't on the critical path.

So I think that's probably everything we need here for the moment.  Time to get contact list and dialpad working, and then hook it up to the cellular modem and actually do some telephony! 

Saturday, 4 October 2025

Simple Memory Protection Scheme

Using LLVM has me wanting to implement a simple scheme that enforces memory protection, so that I can more easily detect memory corruption events -- in part because of the unfortunate (although understandable) way that LLVM stores its stack pointer in zero-page at $02 and $03, which renders it susceptible to easy corruption, which in turn results in all manner of downstream corruption.

While protecting the stack pointer itself would require munging with LLVM's code generator, I can at least make it possible to write-protect the code and read-only data segments, wired so that they simulate a BRK instruction when a write is attempted.

Tracking via Issue #921

I'm going to have to work out how to do this in a way that can survive freeze and unfreeze, ideally without changing the frozen process image. But that can wait.

We'll start by defining what I want it to do, which I think is fairly simple:

1. Allow two ranges, each of which defines a write-protected region.

2. Two flags that enable each write-protected region.

3. A bit field that indicates whether each region should trigger a hypervisor trap (to the freezer), a BRK-like IRQ, or something else.

It was all a lot more mucking about to get working than expected, due to some weird glitching causing false positives. I've fixed those, and I can now cause an interrupt on a write to either protected region. However, the writes are still occurring, at least to chip RAM, which is hardly ideal.

That would be because of the weird split regime that got added when we moved from the old synthesis tool, whose name I can't even remember at the moment, to Vivado. The old one allowed a slightly weird BRAM timing configuration that the CPU design deeply depended on, and the move was solved by splitting the whole thing into two separate processes. But this means that our detection of the write violation and our inhibiting of the write are now split over two separate processes.

So I need some simple way to fix this, without messing up timing.

Well, I've got it enforcing for chip RAM now, but not IO. But I can live with that.

I doubt that this will make it into the development branch, but who knows. Anyway, here are the (write-only) registers for this:

$FFD5000-1 = low address (inclusive) for write-protect region 0
$FFD5002-3 = high address (inclusive) for write-protect region 0
$FFD5004-5 = low address (inclusive) for write-protect region 1
$FFD5006-7 = high address (inclusive) for write-protect region 1
$FFD5008 bit 0 = enable write protection region 0
$FFD5008 bit 4 = enable write protection region 1
$FFD5008 bits 1-3 = write protection region 0 violation action: 111 = nothing, 000 = simulate BRK, 001 = NMI, 010 = trigger freezer.
$FFD5008 bits 5-7 = write protection region 1 violation action: 111 = nothing, 000 = simulate BRK, 001 = NMI, 010 = trigger freezer.

Writing to any of $FFD5000-$5007 disables write protection for both regions. As does entering the freezer.

So for example we can do:

sffd5000 0 8 10 8 0 0 0 0 1

And that will write-protect $0800-$0810 inclusive, and trigger a fake BRK (which will trigger the MEGA65 ROM monitor by default).
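
Doing the same from C looks something like this -- lpoke() is mega65-libc's 28-bit far write, and the byte layout follows the monitor example above:

// Write-protect the inclusive range low..high using region 0, with the
// violation action left at 000 = simulate BRK
void wp_protect_region0(unsigned int low, unsigned int high)
{
  // Writing the address registers also disables both regions,
  // so the enable write must come last
  lpoke(0xFFD5000UL, low & 0xff);
  lpoke(0xFFD5001UL, low >> 8);
  lpoke(0xFFD5002UL, high & 0xff);
  lpoke(0xFFD5003UL, high >> 8);
  lpoke(0xFFD5008UL, 0x01);   // bit 0: enable region 0
}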

So now I can make a little shim for LLVM that extracts the address ranges of the code and rodata segments, and then enforces write-protection. Well, it felt like it should be possible to do in the linker, but I didn't have the time to dig deep, so I just made some Python that parses the program's map file (which I already had, to provide the symbol tables for the natively generated stack backtraces on BRK instructions) and also emits the 9-byte vector of values to be put at $FFD5000 to set up the write protection.

And with the latest commits to everything, it now works -- and I get a stack backtrace generated whenever the code or read-only data area gets written to :)

So now it's back to fixing the remaining bugs with the LLVM transition for the telephony software... 


Sunday, 28 September 2025

Stack backtrace on MEGA65 using MOS-LLVM

For the MEGAphone, I wanted to make my software debugging on the MEGA65 easier.  Stack back-traces are a great way to help debug errors, but we don't have gdb or lldb or anything like that on the MEGA65.  m65dbg and related tools can help here, but they don't have an easy way to provide the complete call stack.

To solve this for my needs, I added function instrumentation using the following in my Makefile:

COPT_M65=    -Iinclude    -Isrc/telephony/mega65 -Isrc/mega65-libc/include

COMPILER=llvm
COMPILER_PATH=/usr/local/bin
CC=   $(COMPILER_PATH)/mos-c64-clang -mcpu=mos45gs02 -Iinclude -Isrc/telephony/mega65 -Isrc/mega65-libc/include -DLLVM -fno-unroll-loops -ffunction-sections -fdata-sections -mllvm -inline-threshold=0 -fvisibility=hidden -Oz -Wall -Wextra -Wtype-limits

# Uncomment to include stacktraces on calls to fail()
CC+=    -g -finstrument-functions -DWITH_BACKTRACE

LD=   $(COMPILER_PATH)/ld.lld
CL=   $(COMPILER_PATH)/mos-c64-clang -DLLVM -mcpu=mos45gs02
HELPERS=        src/helper-llvm.c

LDFLAGS += -Wl,-Map,bin65/unicode-font-test.map
LDFLAGS += -Wl,-T,src/telephony/asserts.ld

Then for the build target, I run the compiler twice: first to generate a map file with the memory addresses of all the functions in it, from which I generate a C structure listing the address and name of each function, and then a second time to link that structure in:

# For backtrace support we have to compile twice: Once to generate the map file, from which we
# can generate the function list, and then a second time, where we link that in.
bin65/unicode-font-test.llvm.prg:    src/telephony/unicode-font-test.c $(NATIVE_TELEPHONY_COMMON)
    mkdir -p bin65
    rm -f src/telephony/mega65/function_table.c
    echo "struct function_table function_table[]={}; int function_table_count=0;" > src/telephony/mega65/function_table.c
    $(CC) -o bin65/unicode-font-test.llvm.prg -Iinclude -Isrc/mega65-libc/include src/telephony/unicode-font-test.c src/telephony/attr_tables.c src/telephony/helper-llvm.s src/telephony/mega65/hal.c src/telephony/mega65/hal_asm_llvm.s $(SRC_TELEPHONY_COMMON) $(SRC_MEGA65_LIBC_LLVM) $(LDFLAGS)
    tools/function_table.py bin65/unicode-font-test.map src/telephony/mega65/function_table.c
    $(CC) -o bin65/unicode-font-test.llvm.prg -Iinclude -Isrc/mega65-libc/include src/telephony/unicode-font-test.c src/telephony/attr_tables.c src/telephony/helper-llvm.s src/telephony/mega65/hal.c src/telephony/mega65/hal_asm_llvm.s $(SRC_TELEPHONY_COMMON) $(SRC_MEGA65_LIBC_LLVM) $(LDFLAGS)

The tool that generates the function list is fairly simple:

#!/usr/bin/env python3
import sys
import re

if len(sys.argv) != 3:
    print(f"usage: {sys.argv[0]} <mapfile> <output.c>")
    sys.exit(1)

mapfile, outfile = sys.argv[1], sys.argv[2]

entries = []
in_text = False

with open(mapfile) as f:
    for line in f:
        if ".text" in line and line.strip().endswith(".text"):
            in_text = True
            continue
        if ".rodata" in line:
            break
        if not in_text:
            continue
        # match lines like: " a7b      a7b     196b     1                 main"
        m = re.match(r"\s*([0-9a-fA-F]+)\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+\d+\s+(\S+)$", line)
        if m:
            addr = int(m.group(1), 16)
            name = m.group(2)
            # skip synthetic names if you want
            if name.startswith("bin") or name.endswith(".o:"):
                continue
            entries.append((addr, name))

with open(outfile, "w") as out:
    out.write("/* auto-generated from map file */\n")
    out.write("const struct function_table function_table[] = {\n")
    for addr, name in entries:
        out.write(f"  {{ 0x{addr:04x}, \"{name}\" }},\n")
    out.write("};\n")
    out.write(f"const unsigned function_table_count = {len(entries)};\n")

Then in an include file, I have:

#ifdef WITH_BACKTRACE
#define STR_HELPER(x) #x
#define STR(x)        STR_HELPER(x)

#define fail(X) mega65_fail(__FILE__,__FUNCTION__,STR(__LINE__),X)
void mega65_fail(const char *file, const char *function, const char *line, unsigned char error_code);
#else
#define fail(X)
#endif

struct function_table {
  const uint16_t addr;
  const char *function;
};

#endif /* include guard -- the matching #ifndef at the top of the file is not shown */

The last bit of setup then is to have a C file that includes the function table and implements the helper functions:

#include "includes.h"

extern const unsigned char __stack; 

#ifdef WITH_BACKTRACE
#include "function_table.c"
#endif

void dump_backtrace(void);

#ifdef WITH_BACKTRACE

__attribute__((no_instrument_function))
void mega65_uart_print(const char *s)
{  
  while(*s) {
    asm volatile (
        "sta $D643\n\t"   // write A to the trap register
        "clv"             // must be the very next instruction
        :
        : "a"(*s) // put 'error_code' into A before the block
        : "v", "memory"   // CLV changes V; 'memory' blocks reordering across the I/O write
    );

    // Wait a bit between chars
    for(char n=0;n<2;n++) {
      asm volatile(
           "ldx $D012\n"
           "1:\n"
           "cpx $D012\n"
           "beq 1b\n"
           :
           :
           : "x"   // X is clobbered
           );
    }
    
    s++;
  }

}

__attribute__((no_instrument_function))
void mega65_uart_printhex(const unsigned char v)
{
  char hex_str[3];

  hex_str[0]=to_hex(v>>4);
  hex_str[1]=to_hex(v&0xf);
  hex_str[2]=0;
  mega65_uart_print(&hex_str[0]);
}

__attribute__((no_instrument_function))
void mega65_uart_printptr(const void *v)
{
  mega65_uart_print("0x");
  mega65_uart_printhex(((unsigned int)v)>>8);
  mega65_uart_printhex(((unsigned int)v));
}

__attribute__((no_instrument_function))
void mega65_fail(const char *file, const char *function, const char *line, unsigned char error_code)
{

  // Stash the LLVM soft stack pointer (ZP $02/$03) on the C64 screen for inspection
  POKE(0x0428,PEEK(0x02));
  POKE(0x0429,PEEK(0x03));

  mega65_uart_print(file);

  mega65_uart_print(":");

  mega65_uart_print(line);
  mega65_uart_print(":");
  mega65_uart_print(function);
  mega65_uart_print("():0x");

  mega65_uart_printhex(error_code);
  mega65_uart_print("\n\r");

  dump_backtrace();

  while(PEEK(0xD610)) POKE(0xD610,0);              // drain the keyboard queue
  while(!PEEK(0xD610)) POKE(0xD021,PEEK(0xD012));  // flash background until a key is pressed

}

/*
  Stack back-trace facility to help debug error locations.

*/

#define MAX_BT 32
struct frame { const void *site, *stack_pointer; };
static struct frame callstack[MAX_BT];
static uint8_t depth, sp;

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void) {
  if (depth>=MAX_BT) depth--;
  
  // Get SPL into sp variable declared above.
  __asm__ volatile ("tsx" : "=x"(sp));
  // Now convert that into a pointer into the hardware stack page
  const uint8_t *stack_pointer = (void *)(0x0100 + sp);

  // Recover the caller's address from the JSR return address on the stack
  void *call_site = (void *)((*((uint16_t *)&stack_pointer[1])) - 1);
  
  callstack[depth] = (struct frame){ call_site, &__stack };
  depth++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void) {
  if (depth) --depth; // simple, assumes well-nested calls
}

__attribute__((no_instrument_function))
void dump_backtrace(void) {
  // For each frame, either:
  //  - print raw addresses, or
  //  - call your on-target addr2line() to print file:line + function

  mega65_uart_print("Backtrace (most recent call first):\n\r");
  // Walk the recorded frames from most recent to oldest (d wraps to 0xFF)
  for(unsigned char d = depth-1;d!=0xff;d--) {
    mega65_uart_print("[");
    mega65_uart_printhex(d);
    mega65_uart_print("] ");

    // Find function in table
    unsigned int func_num = 0;
    while(func_num<(function_table_count-1) && function_table[func_num+1].addr < (uint16_t)callstack[d].site)
      func_num++;

    // Display offset from function
    mega65_uart_printptr(callstack[d].site);
    mega65_uart_print(" ");
    mega65_uart_print(function_table[func_num].function);
    mega65_uart_print("+");
    mega65_uart_printptr((void*)((uint16_t)callstack[d].site - function_table[func_num].addr));

    // Show stack pointer
    mega65_uart_print(", SP=");
    mega65_uart_printptr(callstack[d].stack_pointer);
    mega65_uart_print("\n\r");
  } 
}
#endif

With all that in place, if you call fail(X) where X is an error code, the MEGA65's serial monitor interface will output something like this, and then wait for a keypress on the MEGA65's keyboard before continuing:

src/telephony/contacts.c:44:mount_contact_qso():0x03
Backtrace (most recent call first):
[02] 0x312A mount_contact_qso+0x00C5, SP=0xD000
[01] 0x10DB main+0x0660, SP=0xD000
[00] 0x0A89 main+0x000E, SP=0xD000

 

So now I know that fail(3) was called from inside mount_contact_qso(), which was called from main().

 

Monday, 8 September 2025

Unicode / TrueType Pre-Rendered Fonts for the MEGA65

Okay, so another side-path I have to deal with for the MEGAphone telephony software: I need pre-rendered Unicode fonts so that we can support all languages and emojis (at least for display --- multi-language input and emoji inserting will come a bit later).

Sneak peek of where we get to at the end: 

 

I've already got a utility that I've been writing that can generate ASCII renderings of glyphs using libfreetype. With a bit of mucking about, I have that working for a Nokia-esque open-source font. And in another blog post I have the HYPPO and mega65-libc APIs for accessing pre-rendered fonts via the new MEGA65 System Partition Shared Resource Area (SHRES). So our focus here will be on improving the font pre-renderer, so that we can make font files that I can put into that area.

The format for these fonts will be very simple:

1. 16x16 FCM per glyph, using 256 bytes per Unicode code point.

2. The pre-rendered fonts will be laid out linearly, with a 256-byte slot for each and every possible Unicode code point. All 1,114,112 ( = 64K x 17 ) of them.

3. Use the last byte (corresponding to the bottom-right pixel of each glyph) to encode some important info: the width of the glyph (5 bits), whether it's colour or monochrome (1 bit), and whether the bottom-right pixel should be fully on, fully off, or match the colour of the pixel to its left or above it (2 bits). That should cover almost all plausible requirements for the pixel we are stealing bits from.

In other words, our font files will be 272MiB in size (1,114,112 slots x 256 bytes)!

But on the plus side, they will be super easy to use: multiply the code point by 256, use the SHRES API to request the 256 bytes for the glyph, parse the magic byte that has the width of the glyph etc., and shove it into the right place in Chip RAM.
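
Pulling the fields out of that magic byte looks like this -- width in bits 0-4 and the colour flag in bit 7 match the bit-shuffling macro later in this post; the corner-pixel mode landing in bits 5-6 is my inference from the bits left over:

// Decode the info byte stolen from the glyph's bottom-right pixel
// (byte 255 of each 256-byte glyph)
unsigned char glyph_width(unsigned char flags)       { return flags & 0x1f; }     // width in pixels
unsigned char glyph_is_colour(unsigned char flags)   { return (flags >> 7) & 1; } // colour vs mono
unsigned char glyph_corner_mode(unsigned char flags) { return (flags >> 5) & 3; } // corner pixel treatment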

So let's write the code to write out each glyph in this format. We're making the characters logically be 2 VIC-IV characters wide, and 2 high --- but using the VIC-IV's vertical resolution doubling feature so that we don't have to manage two rows of character data.  This does mean that we have to interlace the pixel data when writing it out.  Not really a big problem.

I've got it allegedly working, and made a little script and Makefile targets for the mega65-modular-telephony repo to prepare the shared resources area on the SD card:

 $ tools/prepare-megaphone-shared-resources.sh
make: Nothing to be done for 'fonts'.
INFO: Attempting to open shared resource file or disk image '/dev/sdb'
INFO: Found MEGA65 SYSPART at sector 0x014d7ffe
INFO: Found MEGA65 SYSPART shared resource area of 2048 MiB.
DEBUG: area_start=0x00029b0ffc00, area_length=0x80000000
Resource Table (4 entries):
Idx  Start   Sectors  Bytes      Flags      Name
---- ------- -------- ---------- ---------- -------------------------
0    6       521244   266876928  0x00000007 NotoColorEmoji
1    521250  521244   266876928  0x00000007 NotoEmoji
2    1042494 61328    31399680   0x00000007 NotoSans
3    1103822 32639    16710912   0x00000007 Nokia Pixel Large

Those four fonts should be a good starting point. The Nokia-like font for a fun retro feel on the phone UI, and then the Noto fonts to cover practically all languages and emoji.

What I don't know is whether those fonts contain all the symbols that we might like to use to draw UI furniture. Those could just be the standard C64 font, but made into an 8x16 interlaced font, allowing smoother curves etc.

But that's secondary to whether we can display the Unicode fonts --- that is where our focus will go now.

We're going to use 80-column mode + double-height characters to effectively give 720x480, but with half the screen RAM etc. requirements. I might come to regret not being able to do half-height characters, but I can go back on that later if I need to, and just re-generate the fonts without the interlacing. But I like the idea of needing less screen RAM fiddling to display text, as it will speed up rendering.

First step is to set up a video mode for this. For 90x30 characters in 16-bit text mode, we'll need 2,700 x 2 = 5,400 bytes. We can stash that under the C64 KERNAL at $E000. Colour RAM I'll put at offset $0800 in the colour RAM. This all also means that the C64 screen stays untouched, which I know from experience is handy to leave available for debug output via printf() etc.

Ah, except we actually need to allow more than 90 characters per row, so that we have extra chars to fill in the space if some unicode glyphs aren't exactly 16 pixels wide. So I'll increase the number of chars per row from 90 to 255.

 

So let's get back to working on our unicode font pre-renderer, to make a font to actually install!

Well, not quite so fast --- I'm making the pull request for mega65-libc, and we've now got a bit more structure for doing this than in the past. So I've had to doxygen-document it (chatgpt is very handy here), clang-format it, and make at least an initial implementation of an llvm-mos compatible version of shres_asm.s. All done. I used chatgpt to make the llvm-mos assembler version of the shres_asm.s file.

https://github.com/MEGA65/mega65-libc/pull/71

Now we can really get back to making the unicode font renderer.

Here's my code that sets up the 16-bit text mode with this screen layout:

unsigned long screen_ram = 0x12000;
unsigned long colour_ram = 0xff80800L;

void screen_setup(void)
{
  // 16-bit text mode 
  POKE(0xD054,0x05);

  // PAL
  POKE(0xD06f,0x00);
  
  // Retract borders to be 1px
  POKE(0xD05C,0x3B);
  POKE(0xD048,0x3B); POKE(0xD049,0x00);
  POKE(0xD04A,0x1c); POKE(0xD04B,0x02);  

  // H640 + fast CPU
  POKE(0xD031,0xc0);  
  
  // 90 columns wide (but with virtual line length of 255)
  // Advance 512 bytes per line
  POKE(0xD058,0x00); POKE(0xD059,0x02);
  // XXX -- We can display more than 128, but then we need to insert GOTOX tokens to prevent RRB wrap-around
  POKE(0xD05E,0x80); // display 128 
  
  // 30 rows
  POKE(0xD07B,30 - 1);
  
  // Chargen vertically centred for 30 rows, and at left-edge of 720px display
  // (We _could_ use all 800px horizontally on the phone display, but then we can't see it on VGA output for development/debugging)
  POKE(0xD04C,0x3B); POKE(0xD04D,0x00);
  POKE(0xD04E,0x39); POKE(0xD04F,0x00);
  
  // Double-height char mode
  POKE(0xD07A,0x10);

  // Colour RAM offset
  POKE(0xD064,colour_ram>>0);
  POKE(0xD065,colour_ram>>8);
  
  // Screen RAM address
  POKE(0xD060,((unsigned long)screen_ram)>>0);
  POKE(0xD061,((unsigned long)screen_ram)>>8);
  POKE(0xD062,((unsigned long)screen_ram)>>16);

}

 

Then the beginnings of the unicode glyph loader --- and here we can see all the hard work of making pre-rendered unicode glyphs and the Shared Resource API paying off:


// 128KB buffer for 128KB / 256 bytes per glyph = 512 unique unicode glyphs on screen at once
#define GLYPH_DATA_START 0x40000
#define GLYPH_CACHE_SIZE 512
#define BYTES_PER_GLYPH 256
unsigned long cached_codepoints[GLYPH_CACHE_SIZE];
unsigned char cached_fontnums[GLYPH_CACHE_SIZE];
unsigned char glyph_buffer[BYTES_PER_GLYPH];

void reset_glyph_cache(void)
{
  lfill(GLYPH_DATA_START,0x00,GLYPH_CACHE_SIZE * BYTES_PER_GLYPH);
  lfill((unsigned long)cached_codepoints,0x00,GLYPH_CACHE_SIZE*sizeof(unsigned long));
}

void load_glyph(int font, unsigned long codepoint, unsigned int cache_slot)
{
  shseek(&fonts[font],codepoint<<8,SEEK_SET);
  shread(glyph_buffer,256,&fonts[font]);
  // XXX Extract glyph flags etc
  lcopy((unsigned long)glyph_buffer,GLYPH_DATA_START + ((unsigned long)cache_slot<<8), BYTES_PER_GLYPH);
  cached_codepoints[cache_slot]=codepoint;
  cached_fontnums[cache_slot]=font;
}

void draw_glyph(int x, int y, int font, unsigned long codepoint)
{
  unsigned int i;
  // Look for the glyph in the cache, stopping at the first empty slot
  for(i=0;i<GLYPH_CACHE_SIZE;i++) {
    if (!cached_codepoints[i]) break;
    if (cached_codepoints[i]==codepoint&&cached_fontnums[i]==font) break;
  }
  // Guard against running off the end when the cache is full
  // XXX -- should evict something sensible; for now recycle the last slot
  if (i==GLYPH_CACHE_SIZE) i=GLYPH_CACHE_SIZE-1;
  if (cached_codepoints[i]!=codepoint) {
    load_glyph(font, codepoint, i);
  }

  // XXX -- Actually set the character pointers here
}

 

We can see that the actual code to load a glyph from the Shared Resources area is almost absurdly simple.  So now I should work on those missing bits in this, and then use it to draw an emoji on the screen :)

I'm planning on having a pre-computed 256-entry table for each screen and colour RAM byte, indexed by the glyph flag byte value -- which encodes whether the glyph is coloured or not, and the width of the glyph, which in turn indicates whether a non-coloured glyph is FCM (1-15px wide) or NCM (16-31px wide). (Note that we use NCM for 16px wide, so that we still have space for a blank column of pixels on the right-hand edge of the glyph, to make it easier to display, without having to add an extra 1px-wide blank character to the right.)

For 16-bit text mode ("Super Extended Attribute Mode") the VIC-IV uses 2 bytes of screen RAM and 2 bytes of colour RAM for each glyph.   

So we can compute the values for these bytes for all possible glyph_flags byte values. In retrospect, I should have made the glyph_flags byte keep all the bits that matter for this in the lower bits, so that I'd only need a 6-bit table. But that said, we can still do that by just shuffling the bits before doing the lookup. So let's start by defining that function. The following should do it:

#define glyph_flags_6(X) ( (X&0x1f) + ((X>>2)&0x20))

This shifts the colour/mono flag from bit 7 down to bit 5, while keeping the glyph width in pixels in the bottom 5 bits.
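
The lookups then all use the same shuffled 6-bit value -- a sketch, where glyph_buffer[255] is the info byte from the font format, and the colour RAM tables are the ones generated further below:

// Index all the attribute tables with the same shuffled 6-bit value
unsigned char idx = glyph_flags_6(glyph_buffer[255]);
unsigned char screen1_left  = screen_ram_1_left[idx];
unsigned char screen1_right = screen_ram_1_right[idx];
unsigned char colour0_left  = colour_ram_0_left[idx];
unsigned char colour1       = colour_ram_1[idx];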

Now in terms of our screen and colour RAM values, let's work  out what we need in each.

For Screen RAM byte 0, we don't need a table, because it only contains information about which character to display.

For Screen RAM byte 1, we have the lower 3 bits of the number of pixels to trim from the right-hand edge of the glyph, stored in the upper 3 bits of the byte. The rest of the bits only contain information about which character to display. Now don't forget that we have two characters being used together to form the entire glyph with our unicode fonts. This means that we may have one or two characters actually rendered, based on whether the glyph fits in just a single 8px-wide character or not. So we'll need a flag to handle that, as well as values for Screen RAM byte 1 Left and Screen RAM byte 1 Right with the appropriate width-reduction values. Whenever the Right character is needed, the Left character will be the full 8px wide.

So let's generate those tables first. To avoid errors in hand-generating them, I'm writing a C program that outputs the C code that actually contains the tables, like this:


// start_table(), emit_val() and end_table() are small helpers (not shown)
// that print the C array boilerplate around the generated values.
int main(int argc,char **argv)
{

  start_table("screen_ram_1_left");
  for(int i=0;i<64;i++)
    {
      int width = i&31;
      int is_colour=i&32;
      int width_subtract = 0;
      if (width<8) width_subtract = 8 - width;
      // zero-width glyphs don't exist
      if (width_subtract == 8) width_subtract = 7;
      
      unsigned char val = 0x00;
      val |= (width_subtract&7)<<5;

      emit_val(val);
    }
  end_table();

  start_table("screen_ram_1_right");
  for(int i=0;i<64;i++)
    {
      int width = i&31;
      int is_colour=i&32;
      int width_subtract = 0;
      if (width<16) width_subtract = 16 - width;
      else if (width<32) width_subtract = 32 - width;
      if (width_subtract == 8) width_subtract = 7;
      
      unsigned char val = 0x00;
      val |= (width_subtract&7)<<5;

      emit_val(val);
    }
  end_table();  
}

Which generates something like this:

$ make tools/gen_attr_tables  && tools/gen_attr_tables 
gcc -o tools/gen_attr_tables tools/gen_attr_tables.c
unsigned char screen_ram_1_left[64]={
  0xe0,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
  0xe0,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00
};

unsigned char screen_ram_1_right[64]={
  0x00,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0xe0,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0x00,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0xe0,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0x00,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0xe0,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0x00,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20,
  0xe0,0xe0,0xc0,0xa0,0x80,0x60,0x40,0x20
};

That way if I make an error, I can quickly fix it. And if any poor sod has to maintain my program after me, they'll have the equations that were used to generate the tables.

So now to keep generating more tables...

For colour RAM byte 0, we have the 4th bit of the width reduction, as well as the NCM flag, which we have to set if a glyph is >16 pixels wide --- there's an issue in the font generation logic where I thought we had to do it if >15px wide, which I should fix (I forgot that the blank column of pixels was already counted in the width, so there's no need to switch to NCM until we hit 17px wide, since we don't need to add the extra pixel for the inter-character gap).

We also have the alpha/full-colour flag, which we use to choose whether to display a colour glyph, or to interpret its pixel values as intensity values that blend the character foreground and background colours.

So we end up with something like this:


  start_table("colour_ram_0_left");
  for(int i=0;i<64;i++)
    {
      int width = i&31;
      int is_colour=i&32;
      int width_subtract = 0;
      if (width<8) width_subtract = 8 - width;
      // zero-width glyphs don't exist
      if (width_subtract == 8) width_subtract = 7;
      
      unsigned char val = 0x00;
      val |= is_colour? 0x00 : 0x20;
      val |= (width>15)? 0x04 : 0x00;
      val |= (width_subtract&8) ? 0x01 : 0x00;
      
      emit_val(val);
    }
  end_table();
  
  start_table("colour_ram_0_right");
  for(int i=0;i<64;i++)
    {
      int width = i&31;
      int is_colour=i&32;
      int width_subtract = 0;
      if (width<16) width_subtract = 16 - width;
      else if (width<32) width_subtract = 32 - width;
      // zero-width glyphs don't exist
      if (width_subtract >= 8) width_subtract = 7;
      
      unsigned char val = 0x00;
      val |= is_colour? 0x00 : 0x20;
      val |= (width>15)? 0x04 : 0x00;
      val |= (width_subtract&8) ? 0x01 : 0x00;
      
      emit_val(val);
    }
  end_table();

 

That just leaves the 2nd colour RAM byte. The only thing we care about here is setting the "bold" attribute, which tells the VIC-IV to use the alternate palette for the glyph. We want to set that for colour glyphs, so that we select the 332 colour cube -- while still being able to maintain a C64-like primary palette for other UI furniture.

  start_table("colour_ram_1");
  for(int i=0;i<64;i++)
    {
      int is_colour=i&32;
      
      unsigned char val = 0x00;
      val |= is_colour? 0x40 : 0x00;  // Select alternate palette for colour glyphs
      
      emit_val(val);
    }
  end_table();
 

Then with a bit of Makefile magic, I can cause the attribute tables to be auto-generated using this program at build time:

tools/gen_attr_tables:    tools/gen_attr_tables.c
    gcc -o tools/gen_attr_tables tools/gen_attr_tables.c

src/telephony/attr_tables.c:    tools/gen_attr_tables
    $< > $@

bin65/unicode-font-test.prg:    src/telephony/unicode-font-test.c src/telephony/attr_tables.c
    mkdir -p bin65
    $(CC65) -Iinclude -Isrc/mega65-libc/include src/telephony/unicode-font-test.c
    $(CC65) -Iinclude -Isrc/mega65-libc/include src/telephony/attr_tables.c
    $(CC65) -Iinclude -Isrc/mega65-libc/include src/mega65-libc/src/shres.c
    $(CC65) -Iinclude -Isrc/mega65-libc/include src/mega65-libc/src/memory.c
    $(CC65) -Iinclude -Isrc/mega65-libc/include src/mega65-libc/src/hal.c
    $(CL65) -o bin65/unicode-font-test.prg -Iinclude -Isrc/mega65-libc/include src/telephony/unicode-font-test.s src/telephony/attr_tables.s src/mega65-libc/src/shres.s src/mega65-libc/src/cc65/shres_asm.s src/mega65-libc/src/memory.s src/mega65-libc/src/cc65/memory_asm.s src/mega65-libc/src/hal.s

So now that we have those tables, we can pull them in as externs, and use them to try to draw a glyph!
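To make the intended use concrete, here's a rough sketch of how the tables get consumed when poking one half of a glyph into screen and colour RAM. This is illustrative only, not the actual draw_glyph() internals: idx packs the 5-bit width and the is-colour flag, matching the i&31 / i&32 encoding in the generator, and lpoke() is the mega65-libc banked memory write.

  extern unsigned char screen_ram_1_left[64];
  extern unsigned char screen_ram_1_right[64];
  extern unsigned char colour_ram_0_left[64];
  extern unsigned char colour_ram_0_right[64];
  extern unsigned char colour_ram_1[64];

  void poke_glyph_cell(unsigned long screen_addr, unsigned long colour_addr,
                       unsigned int char_num, unsigned char idx,
                       unsigned char is_right, unsigned char colour)
  {
    // Screen RAM byte 0: low 8 bits of the character number
    lpoke(screen_addr + 0, char_num & 0xff);
    // Screen RAM byte 1: high 5 bits of the character number, plus width trim
    lpoke(screen_addr + 1, ((char_num >> 8) & 0x1f)
          | (is_right ? screen_ram_1_right[idx] : screen_ram_1_left[idx]));
    // Colour RAM byte 0: NCM / alpha / extra width-trim flags
    lpoke(colour_addr + 0,
          is_right ? colour_ram_0_right[idx] : colour_ram_0_left[idx]);
    // Colour RAM byte 1: BOLD/alt-palette flag, plus foreground colour nybl
    lpoke(colour_addr + 1, colour_ram_1[idx] | (colour & 0x0f));
  }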

Well, I have it drawing a glyph --- but it all looks like gibberish. So I'll need to figure out what I am doing wrong.  Am I loading the wrong glyph? Is the screen RAM selecting the wrong glyph?

For a start, it is setting the glyph number wrongly. Ah -- I was writing the colour RAM bytes into the screen RAM.  With that fixed, I don't see gibberish, but rather a 1px-wide blank character.

If I replace the glyph loader's code that reads from the fonts in the shared resources and instead construct a dummy glyph, then I can get that to display.  I found the alpha compositor was off, so the alpha-blending wasn't working, which I've fixed now.  

The next problem is that it isn't enabling NCM for glyphs >16px wide. That was a red herring: It was actually the RRB position pointer wrapping. There is a real thing here to be dealt with, though: To support variable-width characters, we have to muck about with the RRB to put the right edge of any UI element in the right place.  I'll have to deal with that in due course, but there are a few ways to do it.  We could do a GOTOX after drawing the background, and then back-track to draw the variable width text in. But that wastes RRB time and screen RAM columns. The other way is to render the variable width text, and then add dummy characters to the right of it to make it line back up where it needs to. That's probably the saner way, since we are using H640 and V400 mode, which means we can't double our RRB time.

But that's all for later -- right now I want to focus on getting glyphs from fonts to actually render.

... and it looks like I might have hit a hardware bug in the MEGA65 core: double-height characters in FCM/NCM mode don't actually do anything.  Yup, the FCM data path completely ignores the flag.  Okay, filed an issue: https://github.com/MEGA65/mega65-core/issues/902

This looks like it should do the trick:

paul@bob:~/Projects/mega65/mega65-core$ git diff
diff --git a/src/vhdl/viciv.vhdl b/src/vhdl/viciv.vhdl
index 492a9a30a..165af9e88 100644
--- a/src/vhdl/viciv.vhdl
+++ b/src/vhdl/viciv.vhdl
@@ -4357,7 +4357,19 @@ begin
             -- We only allow 8192 characters in extended mode.
             -- The spare bits are used to provide some (hopefully useful)
             -- extended attributes.
-            glyph_number(12 downto 8) <= screen_ram_buffer_dout(4 downto 0);
+
+            -- FCM/NCM + CHARY16 causes interlacing of two consecutive glyphs
+            if reg_char_y16='1' and charrow_repeated='1' then
+              glyph_number(7 downto 0) <= glyph_number(7 downto 0) + 1;
+              if glyph_number(7 downto 0) = x"ff" then
+                glyph_number(12 downto 8) <= screen_ram_buffer_dout(4 downto 0) + 1;
+              else
+                glyph_number(12 downto 8) <= screen_ram_buffer_dout(4 downto 0);
+              end if;
+            else
+              glyph_number(12 downto 8) <= screen_ram_buffer_dout(4 downto 0);
+            end if;              
+            
             glyph_width_deduct(2 downto 0) <= screen_ram_buffer_dout(7 downto 5);
             glyph_width_deduct(3) <= '0';
             if screen_ram_buffer_dout = x"ff" then
@@ -4482,6 +4494,7 @@ begin
             -- Mark as possibly coming from ROM
             character_data_from_rom <= '1';
           end if;
+
           raster_fetch_state <= FetchTextCellColourAndSource;
         when FetchBitmapData =>
           -- Show what we are doing in debug display mode

Okay, so let's try synthesising that, and seeing if it does the trick.

Yes --- it works, but the ordering of the fonts is wrong, so I'll have to rebuild them to stack vertically, instead of scanning horizontally among the 4x64-byte cards. But I've put a bit of a hack in for now that rearranges it. And that seems to work.

The issue now is that the non-FCM characters are seemingly being affected by it as well.  Fixing that and resynthesising a bitstream... And then I over-fixed it.

But it is looking very promising --- it looks like it is doing what it should, apart from the rendering problems. Well, and that it seems to pull out the wrong character, offset by a position of 1; e.g., if you ask for BCDE you will get CBED, or thereabouts. Why they are swapped, I have no clear idea as yet. Maybe the font generator is putting them in in the wrong order?

Also, it looks like the nybl order for the NCM fonts is wrong --- but I am seeing tantalising hints of mono Emoji being rendered.  The colour ones are not rendering at all just yet, but I'll look into that later.

But either way, it looks like it's time to dive back into my font generator and:

1. Confirm that it's not reordering glyphs.

2. Swap the nybl order when writing bytes out.

3. Change the card ordering to match my renderer.

4. Figure out what's going on with the colour glyph rendering: Is it just my colour cube not being selected, or is it dodgy data?

First, let's take a look at what's in our pre-rendered font files, and see if the swapped order of glyphs is in there. A good way to do this is to look at character positions 0x20 and 0x21, as 0x20 should be blank (space) and 0x21 should not (exclamation mark).

And it looks like the exclamation mark character is at 0x2100-0x21ff in the file. So they don't seem to be swapped there.  So they probably aren't swapped on the SD card in the system partition. I've just double-checked this by trying to render 0x21, and then looking at the SD card sector buffer to see what was actually loaded.

This makes me think that maybe the shared resources access API is wonky.  Interestingly, the data for the swapped glyphs exists within the same SD card sector.

Yup! In shread() I wasn't waiting for the SD card sector read to finish before using the data.  With that fixed, (1) is now fixed. So on to (2) and (3), as those are fairly easy fixes.

And they were indeed easy to fix. I also fixed an issue I hadn't thought about before: I was rendering glyphs 17px high, not 16, because the Nokia-esque font was pixel-perfect that way. But that meant the very top line of pixels was getting truncated, which is bad.

At this point I am so glad I made all those tools to automate the font processing, and even copy them into the shared resources partition on an actual SD card. This means I can change the font generation, move the SD card from the MEGA65 to my PC, run a single command, then move it back, and test again -- all within a minute or two.

So it looks like (2) and (3) are fixed now. Back onto the colour glyphs: And a quick look in the generated font file shows that it's writing all transparent pixels.  So hopefully this will be a quick fix, too.

The problem here is that the NotoColorEmoji font is being reported as having pixels that are of type FT_PIXEL_MODE_GRAY, when we were expecting them to be FT_PIXEL_MODE_BGRA.  And they are being reported as having 0 rows and 0 columns of pixels.

It turns out that this is because libfreetype2 doesn't support colour fonts (at least, not in the way I'm using it here).  ChatGPT is saying that there's no C library it knows about that supports them, but it suggests some Python approaches. So I'll try to get it to write me a Python equivalent of my C program.

Except that didn't work either.

So I'm going to instead use the Twitter twemoji repository that has SVG files for all emojis and use a bit of Python to make our 16x16 glyphs in our binary format from those.

Got that working.  Now the problem is that the palette is wrong --- it should be selecting the alternate palette.  The problem there was that I hadn't enabled VIC-III Extended Attribute mode.

After that I had to work around a bug with the first row of text in the VIC-IV, where the first row of pixels is doubled up (a long known bug). But when interlaced text is used, it seems to be totally messed up.  

But with that done, on real hardware the Emoji displays! The m65 utility's screen renderer gets it mostly right, but the background and foreground colour of the FCM chars is wrong, so I can't show a digital pixel-perfect version here. But I can take a photo of the screen...


Look at that, we have Unicode text and emoji rendering working!

The code to draw them ends up being super simple, too:

  // Say hello to the world!
  // (Note the gap at x==2: 'm' is wider than 8px, so it takes two cells.)
  draw_glyph(0,1, FONT_UI, 'E',0x01);
  draw_glyph(1,1, FONT_UI, 'm',0x01);
  draw_glyph(3,1, FONT_UI, 'o',0x01);
  draw_glyph(4,1, FONT_UI, 'j',0x01);
  draw_glyph(5,1, FONT_UI, 'i',0x01);
  draw_glyph(6,1, FONT_UI, '!',0x01);
  draw_glyph(7,1, FONT_UI, ' ',0x01);
  draw_glyph(8,1, FONT_EMOJI_COLOUR, 0x1f929L,0x01);

I'll document this all more, and maybe even promote it into a shared library for other projects. For now it's all in:

https://github.com/MEGA65/megaphone-modular/blob/main/src/telephony/unicode-font-test.c

But first, I need to make a routine to parse a UTF-8 string and render it.  First step is to make a routine that extracts the next codepoint from a UTF-8 string, and lets us iterate over it.


unsigned long utf8_next_codepoint(unsigned char **s)
{
  unsigned char *p;
  unsigned long cp;

  if (!s || !(*s)) return 0L;

  p = *s;
  
  if (p[0] < 0x80) {
    cp = p[0];
    (*s)++;
    return cp;
  }

  // 2-byte sequence: 110xxxxx 10xxxxxx
  if ((p[0] & 0xE0) == 0xC0) {
    // Invalid continuation byte: consume the lead byte so we can't loop forever
    if ((p[1] & 0xC0) != 0x80) { (*s)++; return 0xFFFDL; }
    cp = ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    *s += 2;
    return cp;
  }

  // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
  if ((p[0] & 0xF0) == 0xE0) {
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80) { (*s)++; return 0xFFFDL; }
    // Cast before shifting: with CC65's 16-bit ints, (x << 12) would overflow
    cp = ((unsigned long)(p[0] & 0x0F) << 12) |
         ((p[1] & 0x3F) << 6) |
         (p[2] & 0x3F);
    *s += 3;
    return cp;
  }

  // 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  if ((p[0] & 0xF8) == 0xF0) {
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80 || (p[3] & 0xC0) != 0x80)
      { (*s)++; return 0xFFFDL; }
    cp = ((unsigned long)(p[0] & 0x07) << 18) |
      ((unsigned long)(p[1] & 0x3F) << 12) |
      ((p[2] & 0x3F) << 6) |
      (p[3] & 0x3F);
    *s += 4;
    return cp;
  }

  // Invalid or unsupported UTF-8 byte
  (*s)++;
  return 0xFFFDL;
}

Then we need to know which font to pull the glyphs from.  Eventually I might combine the emoji with other fonts, so that we don't need this logic, but it's probably okay for now.


char pick_font_by_codepoint(unsigned long cp)
{
    // Common emoji ranges
    if ((cp >= 0x1F300 && cp <= 0x1F5FF) ||  // Misc Symbols and Pictographs
        (cp >= 0x1F600 && cp <= 0x1F64F) ||  // Emoticons
        (cp >= 0x1F680 && cp <= 0x1F6FF) ||  // Transport & Map Symbols
        (cp >= 0x1F700 && cp <= 0x1F77F) ||  // Alchemical Symbols
        (cp >= 0x1F780 && cp <= 0x1F7FF) ||  // Geometric Shapes Extended
        (cp >= 0x1F800 && cp <= 0x1F8FF) ||  // Supplemental Arrows-C (used for emoji components)
        (cp >= 0x1F900 && cp <= 0x1F9FF) ||  // Supplemental Symbols and Pictographs
        (cp >= 0x1FA00 && cp <= 0x1FA6F) ||  // Symbols and Pictographs Extended-A
        (cp >= 0x1FA70 && cp <= 0x1FAFF) ||  // Symbols and Pictographs Extended-B
        (cp >= 0x2600 && cp <= 0x26FF)   ||  // Misc symbols (some emoji-like)
        (cp >= 0x2700 && cp <= 0x27BF)   ||  // Dingbats
        (cp >= 0xFE00 && cp <= 0xFE0F)   ||  // Variation Selectors (used with emoji)
        (cp >= 0x1F1E6 && cp <= 0x1F1FF))    // Regional Indicator Symbols (🇦 – 🇿)
        return FONT_EMOJI_COLOUR;    
    
    return FONT_UI;
}

Then we need a routine that uses that to render a native UTF-8 string:

  {
    unsigned char *string="Ümlaute! 👀 😀 😎 🐶 🐙 🍕 🍣 ⚽️ 🎮 🛠️ 🚀 🎲 🧩 📚 🧪 🎵 🎯 💡 🔥 🌈 🪁";
    unsigned char *s=string;
    unsigned char x=0;
    unsigned long cp;

    while ((cp = utf8_next_codepoint(&s))) {
      unsigned char f = pick_font_by_codepoint(cp);
      x += draw_glyph(x, 4, f, cp, 0x01);
    }
  }
 

What's cool here is that I am literally just using a UTF-8 string in the source, and then it renders!


Now, there is something funny going on, with white being shown as cyan or something. This must be some subtle thing in the VIC-IV super extended attribute mode, because it doesn't happen with the m65 -S screenshot tool, which has its own interpreter for those attributes (and other bugs of its own -- which is why the text isn't appearing, and the screen background is black instead of blue):


At some point I'll have to figure out both of those bugs. But it's not a super high priority right now.

What is more annoying is that while the Nokia-like font has good kerning, the Noto font doesn't. I probably need to account for some quirk of that font. It works, but as you can see here, the kerning is all messed up:

 


Compare the word "Ümlaute" (yes, I know, it's not a real word -- it should be Umlaute, but I wanted to test the rendering of an actual umlaut) in this screenshot to the one a few pictures back where it was in the Nokia-like font.  I think some glyphs, like a and e, have some weird shift in them, perhaps because they extend slightly to a negative X position or something.  Fonts are weird and complex.  I'll look into it in my font preparation tools at some point, and enforce that they have a blank column of pixels on the right-hand edge. But for now, that's lower priority.

The next step will be to accumulate characters until an acceptable word break is found, and then check whether we need to advance to the next line before printing the word.  This will require knowledge of the left and right columns available for rendering, both in terms of character columns and pixel positions, so that if either limit is reached, it can wrap.  It probably also needs to know the max row available, so that it can report an error and the point in the string that it was able to render to.

But all that will have to wait until tomorrow. 

Okay, so let's start defining the text box structure.

It's now some time later, and I have in the meantime worked out the whole storage and indexing of messages.  As part of this, there is scope to store some short attributes for each message: for example, the number of bytes of the message in each line when rendered in a message box, and the number of lines, so that we know how much space we need on the screen. Caching this information allows faster re-display of messages.
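Something like this, conceptually (the field names are invented for illustration; the real record layout is whatever the storage code defines):

  // Hypothetical cached render attributes stored alongside a message
  struct message_render_cache {
    unsigned char num_lines;       // screen rows this message needs
    unsigned char line_bytes[16];  // UTF-8 bytes consumed by each rendered line
  };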

So let's start with an algorithm to work out the break-points in a message for our standard message box width. From there we can make a routine that displays a line of text message, and then next up, displaying the whole message.

We already have our little routine for displaying strings on a single line, so we'll use that as a starting point.

One thing we'll need to deal with at some point is calculating all this even when other messages are visible on the screen -- which might mean that our Unicode unique-glyph buffers are too full to determine the full dimensions of the message.  But that can wait for now -- there are a few ways I can deal with it at the right time. For example: don't retain the codepoints read purely for the purpose of determining their width, and just buffer the widths of the last few in a move-to-front cache, so that we don't keep re-fetching the sectors with that information.
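To illustrate the move-to-front idea (a sketch only: the size and names are made up, and the insert-on-miss path isn't shown):

  #define WCACHE_SLOTS 16
  unsigned long wcache_cp[WCACHE_SLOTS];
  unsigned char wcache_width[WCACHE_SLOTS];

  // Return 1 and set *width if we have cp cached; hot glyphs bubble to slot 0
  char wcache_lookup(unsigned long cp, unsigned char *width)
  {
    unsigned char i, j;
    for (i = 0; i < WCACHE_SLOTS; i++) {
      if (wcache_cp[i] == cp) {
        *width = wcache_width[i];
        for (j = i; j; j--) {  // shuffle everything above the hit down one slot
          wcache_cp[j] = wcache_cp[j-1];
          wcache_width[j] = wcache_width[j-1];
        }
        wcache_cp[0] = cp;
        wcache_width[0] = *width;
        return 1;
      }
    }
    return 0;  // miss: caller fetches the width from the font data
  }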

Let's start with the routine to display a string, limited to a given view-port:

char draw_string_nowrap(unsigned char x_glyph_start, unsigned char y_glyph_start, // Starting coordinates in glyphs
            unsigned char f, // font
            unsigned char colour, // colour
            unsigned char *utf8, // UTF-8 string to draw
            // Number of pixels available for width
            unsigned int x_pixels_viewport,
            // Number of glyphs available
            unsigned char x_glyphs_viewport,
             // And return the number of each consumed
            unsigned int *pixels_used,
            unsigned char *glyphs_used)
{
  unsigned char x=0;
  unsigned long cp;
  unsigned char *utf8_start = utf8;
  unsigned int pixels_wide = 0;
  unsigned char glyph_pixels;
  
  if (pixels_used) *pixels_used = 0;
  
  while ((cp = utf8_next_codepoint(&utf8))) {
    // unsigned char f = pick_font_by_codepoint(cp);

    // Abort if the glyph won't fit.
    if (lookup_glyph(f,cp,&glyph_pixels, NULL) + x >= x_glyphs_viewport) break;
    if (glyph_pixels + pixels_wide > x_pixels_viewport) break;

    // Glyph fits, so draw it, and update our dimension trackers
    glyph_pixels = 0;
    x += draw_glyph(x_glyph_start + x, y_glyph_start, f, cp, colour, &glyph_pixels);
    pixels_wide += glyph_pixels;
  }

  if (glyphs_used) *glyphs_used = x;
  if (pixels_used) *pixels_used = pixels_wide;

  // Return the number of bytes of the string that were consumed
  return utf8 - utf8_start;
}
 

There isn't really too much to it: try drawing glyphs until we reach the end of the string, or until the next glyph won't fit in the viewport.
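As a usage sketch (the viewport numbers and colour here are invented for illustration):

  unsigned int px_used;
  unsigned char gl_used;
  unsigned char bytes;

  // Draw into a viewport starting at glyph (0,6), at most 20 glyphs / 160 px wide
  bytes = draw_string_nowrap(0, 6, FONT_UI, 0x01,
                             (unsigned char *)"Hello!",
                             160, 20, &px_used, &gl_used);
  // 'bytes' is how far into the string we got, for resuming on the next line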

Perhaps the next thing is to make a function that after printing a string, pads it out to fill the remaining viewport space with blank glyphs. This is so that, for example, we could draw a box around a message, and have the boundaries line up. Or maybe we just want to be able to put more content on the right-hand side of a message like this.

Using NCM glyphs, we can consume up to 32 pixels per glyph.

I would like to be able to make it work with reversed text as well, so that we could make a filled bubble around messages. This should work with the reverse SEAM flag, and it does work for mono glyphs. But for coloured glyphs, the reverse flag messes with the colours, I think. In any case, it looks rubbish, and the background isn't inverted, which is really the point of what we wanted.

Yup, the reverse flag with FCM glyphs inverts the pixel values. What we want instead is for it to switch the background and foreground colours. I've filed a bug and crafted a possible fix here.

While that fix is synthesising, I can still work on the viewport alignment. Ideally we want the padding to use all tokens in the remainder of the viewport.  So if there are spares after getting the alignment right, we should probably fill the rest with GOTOXs that just stay on the appropriate pixel.  But that means we need to know the absolute X position of the viewport end.  So we need another argument for the "parking position".
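Conceptually, the padding might look something like this (a sketch only, not the working implementation: if I recall the RRB encoding correctly, a GOTOX token has bit 4 set in its colour RAM byte 0, with the new X position in its screen RAM bytes; the addresses and lpoke() plumbing are simplified here):

  void pad_viewport(unsigned long screen_addr, unsigned long colour_addr,
                    unsigned char glyphs_spare, unsigned int park_x)
  {
    unsigned char i;
    // Fill the spare tokens with GOTOXs that all park at the viewport's end
    for (i = 0; i < glyphs_spare; i++) {
      lpoke(screen_addr + i*2 + 0, park_x & 0xff);         // X position, low byte
      lpoke(screen_addr + i*2 + 1, (park_x >> 8) & 0x03);  // X position, high bits
      lpoke(colour_addr + i*2 + 0, 0x10);                  // GOTOX flag
      lpoke(colour_addr + i*2 + 1, 0x00);
    }
  }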

I have it implemented, but it's not working properly: The padding is messing up the display instead of doing what it should. Also, that fix for reversed FCM glyphs doesn't seem to have worked.  I've made one small fix to the VHDL and have set the bitstream synthesising again. While that's underway, I'll investigate what's wrong with the padding stuff. 

Okay, I've fixed a bunch of the padding bugs, but there's still something messed up with the reverse mode for FCM colour glyphs. Their colours are still being inverted, despite the fix for choosing background or foreground colour for the transparent pixels (value $00) now working properly.  But I can't for the life of me see where the inversion is happening.

The string padding is a lot closer now, though: I can display non-reversed text, and it looks fine, and gets padded to a straight right edge.  But when I render it reversed, it looks like the nybl order (this is for monochrome/alpha-blended glyphs) is all messed up. It's kind of funny how it comes out, as it makes the nokiaesque font look almost like a gothic font:


We can see the normal video line at the top looks fine, but all the problems I've been describing can be seen in the reverse-video lines.  The coloured FCM glyphs for the emoji have borked-up colour, while something weird is going on with the pixel column selection for the reverse video.  I'm not sure exactly what it is yet, though.

Actually, I think I have an idea what it is for the alpha-blended glyphs: Nybls with value $F are getting inverted to value $0, which then gets rendered in the foreground colour, because of the fix I did for the colour glyphs.  Time for another synthesis run -- I'm glad I have the fast build box these days. Less than ten minutes to do a synthesis run.

I thought the off colours might be from accidentally selecting the alternate palette, but that's not the cause. Likewise, my attempted fix hasn't worked for the reverse alpha-blended text.  I'm doing a `make clean` and rebuilding the bitstream, just in case it didn't rebuild properly.

So, that's fixed the alpha-blended NCM text:

Now we've got the foreground colour not showing for the emoji full-colour glyphs --- and they're still all messed up in colour.  This has me stumped for the moment, so I'll progressively try a few things in the background, while I focus on getting wrapped string display and determining appropriate breakpoints.

I'm wondering if I shouldn't immediately pull in a text message from one of the D81s rather than use hand-coded strings, so that I can move more directly towards a routine that can display a text message.  It will also give me an idea of the combined code size required to have the string display routines as well as the message handling routines -- if they don't both fit in the same executable, things are going to get interesting.

So let's start by pulling out the screen setup code (which won't need to be in the same executable) and then the string/unicode display stuff into separate source files so that I can pull in exactly which routines we need in each module.

Talking with Mirage and BobbyTables on Discord (thanks both for helping me figure out what's going on), and looking at the screen init code above(!), I realised that the problem is almost certainly in the handling of the REVERSE and BOLD attribute bits in FCM, and in how I have the FCM glyphs use the alternate palette to access the colour cube.

i.e., I'm setting REVERSE and BOLD for ALTPALETTE in all the FCM glyphs.  But then for reverse mode we have a problem, as reversed glyphs need REVERSE but not BOLD set.

Ideally I'd just use a single palette, but life's not quite that simple: in the mode we're using, we have only 16 foreground colours available to us, so we'd need the transparent pixels to come from one palette, while the foreground pixels come from the other!

It's probably not unreasonable to make the transparent pixels use the default palette, regardless. But that still leaves the problem of how to signal both REVERSE and alternate palette.  After a lot of fiddling about, I've used a spare register bit in $D053 to indicate that BOLD = ALTPALETTE, and that REVERSE should operate independently of it.  That way no existing software should get messed up, but we can do what we want here. Synthesising that now.

Okay, some progress: The reverse bit now doesn't cause the palette to get messed up: Both normal and reverse emojis are showing the correct palette.  However, when in reverse, the transparent pixels are being drawn black instead of the white colour I've selected. I'm guessing that it's using the alternate palette for that case, when it should use the primary palette foreground colour.

By writing over the palette entries, I can confirm that is _not_ what is happening. Instead it's a colour from the primary palette. Colour 17, in fact. That's suspiciously what we'd expect if the BOLD bit is working as BOLD still.  Which it will be, since my BOLDISALT fix only forced it to select the alternate palette, not disable its other normal functions.  I have a fix for that synthesising now.

Finally it shows fine:



The only remaining minor thing is that white in the FCM glyphs is represented by value $FF, which gets replaced with the foreground colour of the glyph, making part of the eyes and the soccer ball disappear.  But that's not a problem in this program --- it's in the font preparation program.  Although we _could_ make an FCM display mode that disables the $FF colour substitution, it's better to just change the glyphs to use the closest colour to white.
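On the font side, the remap could be as small as this (a sketch for the font-preparation tool; picking $FE as the nearest colour-cube entry to white is my assumption):

  // FCM pixel value $FF gets replaced with the glyph's foreground colour at
  // display time, so remap white to the closest non-reserved 332 value.
  unsigned char fix_fcm_pixel(unsigned char v)
  {
    return (v == 0xff) ? 0xfe : v;
  }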

So let's move on to wrapped string display now for a bit.  I intend to record the line-break points in the messages when we store them, so that we don't have to calculate them at run-time.  Actually generating the break points is a bit of a rabbit-hole, though.  LaTeX has a nice, simple-ish way of optimally breaking words, with a score-based system to help choose the break points.  It needs a trie structure of ~100KB, which we could accommodate.  But it does need one per language.

That's all a bit complex, so I'll just try to break on white-space, and if that's not possible, then just break arbitrarily. Otherwise I'm going to waste even more time thinking about this.

The algorithm is thus quite simple: find the extent of the next word, and measure its width in pixels (and in glyphs, in case we run out of those), then break it if it's too long, or add it to the line if it fits.  This should all happen off-screen, just to get the break points in the string.  So I'll start by getting the width of each glyph into an array, and then we can do the rest from there.  This will give us a framework for applying a cost to breaking at any given point in the string.
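Roughly, the measuring pass could look like this (a sketch under assumptions: measure_glyph_width() stands in for the real width lookup, breaks only happen on spaces or when a word exceeds the line, and the cost framework isn't shown):

  // Compute byte offsets of line breaks for a message of at most max_lines rows.
  unsigned char find_breakpoints(unsigned char *utf8, unsigned int max_px,
                                 unsigned int *break_offset, unsigned char max_lines)
  {
    unsigned char *s = utf8;
    unsigned char *word_start = utf8;
    unsigned long cp;
    unsigned int line_px = 0, word_px = 0;
    unsigned char lines = 0;

    while ((cp = utf8_next_codepoint(&s))) {
      word_px += measure_glyph_width(pick_font_by_codepoint(cp), cp);
      if (cp == ' ') {
        // Word (plus trailing space) complete: does it fit on the current line?
        if (line_px + word_px > max_px) {
          if (lines >= max_lines) return lines;       // out of rows: report progress
          break_offset[lines++] = word_start - utf8;  // break before this word
          line_px = 0;
        }
        line_px += word_px;
        word_px = 0;
        word_start = s;
      } else if (word_px > max_px) {
        // A single word wider than the line: break arbitrarily mid-word
        if (lines >= max_lines) return lines;
        break_offset[lines++] = s - utf8;
        line_px = 0;
        word_px = 0;
        word_start = s;
      }
    }
    return lines;
  }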

I'm pulling that together. In the process I've got the whole code base now compiling under CC65 and producing a single MEGA65 native binary. I half expected it to barf, because I've got sorting, indexing and a pile of other routines in there at the moment, and the CC65 linker doesn't prune out unneeded stuff. But the binary is currently still only 28KB or so.  That makes my life easier for the moment, even though I'll still need to separate things out later. 

Some progress: I can now display a wrapped string (although I haven't fully exposed the API for this just yet): 

 That looks really good! It's preferentially breaking on words, like we want. Nothing is too wide. And the box is constant width. Great!

Let's give it a bit more space to fit a bit more on each line: 

Ah. There's something going wrong there.

Let's put some strings at a nominal fixed position to the right, and see if the alignment is good and it's just the lack of reversed spaces that's our problem:

Nope. Something is wonderfully borked. Found the bugs: the main one was indeed in the call to the padding when displaying a string segment. Now it works nicely:

Now to use this code to make the routines that store the line-break information into the SMS message record, and then retrieve it again.

Which means that we need to hook in the remaining Hardware Abstraction Layer routines to mount and access the disk images etc.  Then I can make the code actually read real SMS messages and display those. 

Going through the libc source, I'm trying to find if there is already code for calling the MOUNT, CHDIR and other HYPPO calls that we'll need here.

Okay, I've found chdir and chdirroot, but not calls for mounting. A bit annoying, but chdir has the machinery necessary for providing a filename to HYPPO, which I can re-use for making a mount call.  So it shouldn't be too hard.

So this is how chdir() handles its argument and the HYPPO call. It does a bit more than we need, because we don't need to do a findfile() first when mounting: we just need to copy the D81 filename down to $0100, then do the HYPPO setname call, and then the actual mount call.
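In other words, something along these lines (a hedged sketch: hyppo_setname() and hyppo_d81attach() stand in for small assembly thunks that trap to HYPPO via $D640, and their real subfunction numbers live in the assembly, so I'm not reproducing them here):

  char mount_d81(const char *name, unsigned char drive)
  {
    unsigned char i;
    // HYPPO expects the filename at $0100, NUL-terminated
    for (i = 0; name[i]; i++) ((unsigned char *)0x0100)[i] = name[i];
    ((unsigned char *)0x0100)[i] = 0;
    if (hyppo_setname()) return -1;  // point HYPPO at the name we just copied
    return hyppo_d81attach(drive);   // then attach it as drive 0 or 1
  }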

Now, ideally, we'd reuse that code from the mega65-libc, rather than me having to cook it up from scratch, and having it use twice as much code. But the routines are small, so I think this is less of an issue than it might otherwise be.  I should probably log an issue on the mega65-libc repo suggesting pulling in the mount() code that I'll be writing for this repo.

That's all done now -- I did just fork the code and include it in hal_asm.s in the telephony repo, as it was easier. I've confirmed that I can mount disk images on drive IDs 0 and 1, so that's good.  Let's check the CHDIR functionality, including to root directory -- both of those are working, as confirmed by mounting disk images from two separate directories.

Next up is populating my SD card with the telephony directory structure, complete with sample message traffic, using the tooling I made for that earlier.  That tool can write directly to a mounted SD card, to make our life easier. Done.

So now I can use code like this:

  mega65_cdroot();
  mega65_chdir("PHONE");
  mega65_chdir("THREADS");
  mega65_chdir("0");
  mega65_chdir("0");
  mega65_chdir("0");
  mega65_chdir("0");
  mount_d81("MESSAGES.D81",0);

And that will mount the messages D81 for contact 0. In fact, we should be able to use the mount_contact_qso(int contact_id) function to do this. Except that it doesn't work: for some reason the chdirroot() call is failing.  Weird. Ah! The mega65-libc chdirroot() function doesn't set the X register value -- which CC65 only sets if an int argument is provided on the call. And in turn, HYPPO is returning the wrong value on success. So time for issues galore.  With those fixed, I can use mount_contact_qso() in C, and have it mount the messages and message-index disk images as drive IDs 0 and 1 (= CBDOS devices 8 and 9):


If I now add routines for reading and writing sectors from the disk image, I can retrieve a record with a dummy message in it, and try to render it -- it's getting exciting :)

With a bit more mucking about, I now have a message displaying:

This is a great step forward -- now to refactor the code doing this to be all reusable, and to store the line-breaks in the record for faster future rendering.  I've done that, and it's broadly working -- but only if I render to the left-most glyph on each character row.

Fixed a couple more bugs, and now that's working, too, but with a subtle glitch on the first raster of each text box at a precise horizontal position on the screen.

This seems to be related to the max render count for VIC-IV character rows in FCM. Even with the wrap-around protection on the RRB, it still glitches if it's too high. I'm guessing that this is because the RRB runs out of time.  But any value > 128 seems to trigger it, so I'm a little suspicious that something else might be involved here, too.

Okay, so, yes. If the RRB is still being rendered when it should be restarting the render process, then it doesn't abort the rendering. Instead it continues, and basically ignores the request to render the next raster line.  This causes all sorts of ugly almost interlace-looking glitching.

The root cause is that the state machine for raster_fetch_state lives lower down in the viciv.vhdl file, which means that its internal state overrides the impulse to start rendering a new raster line into the RRB.  So I'll fix that, too.

I've also realised that the HDMI output doesn't seem to be showing the full 720px width: only about 680px is visible.  This seems to be a monitor problem, as on a different monitor the cut-off point is different. But it's still not the full 720px.

So it turns out that it's totally normal for even digital PAL over HDMI to be clipped by the monitor to 702 or 704 pixels wide, which is justifiable under the standard. But further trimming to 680 -- 700 pixels? That's just plain annoying.

I suspect this doesn't happen on VGA. Wrong. It does. 680 seems to be all I can convince VGA or HDMI to display. So I guess that's what I'll work with for now.

Next up: we are drawing an extra line of junk at the bottom of the text boxes. Hmm. That seems to be in the actual record on disk.  The characters are supposed to be emoji, but are getting mis-rendered or mis-decoded somewhere.  Ah, it was the automatic font-selection magic for rendering emojis from the colour emoji font: both the width assessment and the actual drawing of the string were messed up.  With those fixed, it now looks good:

So let's try drawing a thread, working from the latest messages at the bottom of the screen, back to the top, until we run out of space. 

Look at that! It's a message thread. Loaded from actual SMS message storage in a D81 on the SD card. We even have the sent and reply sense indicated by the colour and position like on a Sensible Phone!  There's still plenty more to do, but this post is already more than long enough.