Saturday, 8 October 2022

Adding proportional text support to the MEGA65's "web" browser

In the previous post I sorted out the Ethernet and TCP stack bugs that were preventing the browser from working, as well as getting some of the general infrastructure in place, like having a start screen for the browser, that lets you choose the initial URL to open.

What I want to work on now is improving the appearance of the pages. The H65 format allows for using almost all of the MEGA65's graphics features, but my current page generator doesn't really support any of them, except for underlined text and in-lined images. In particular, text is restricted to using the normal C64-style 8x8 fixed-width font. That's fine for starting, and it's also good to have that font available for where it makes sense, e.g., for showing BASIC code snippets. However, it would be nice to be able to use much nicer looking fonts in pages as well, if people want to.

Fortunately, I have done quite a bit of work on displaying proportional fonts on the MEGA65. In fact, we have MegaWAT!?, a PowerPoint-like presentation program for the MEGA65 that renders proportional text in real time. I've even given presentations at conferences using it and a real MEGA65 instead of a boring normal computer. So I already have C code for reading TTF or Type 1 fonts using libtruetype on Linux to create rasterised fonts that the MEGA65 can display. For the browser it's even easier, as we are generating the H65 files on the server side, so we can do all of the rendering there.

What will remain the same, though, is that we don't just render the fonts onto a big fat canvas, as that would waste lots of precious chip RAM, and limit the size of pages that can be displayed.  Instead, we convert each glyph from a font into a set of 8x8 FCM characters that the VIC-IV can display. We can then just assemble the glyph each time we want to display it using the same FCM character definitions, thus saving lots of RAM. 

On its own, this would mean that all the glyphs would have to be a multiple of 8 pixels wide, which kind of defeats the purpose. But here the VIC-IV in the MEGA65 has a secret weapon: in 16-bit text mode, you can tell the VIC-IV to draw only some of the pixel columns of a character. This makes it nice and easy to display glyphs from fonts that aren't exact multiples of 8 pixels wide. Thus we can still get the nice appearance, without wasting the memory.
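As a rough illustration of the idea (not the actual md2h65 code), the following Python sketch splits a glyph of arbitrary pixel width into 8-pixel-wide character cells and works out how many columns the last cell needs to have trimmed. The function name is hypothetical, and how the trim value is encoded in the VIC-IV screen data is deliberately left out.

```python
def split_glyph_columns(glyph_width_px, cell_width_px=8):
    """Return a list of (columns_shown, columns_trimmed), one entry per cell.

    Hypothetical helper: every cell except possibly the last shows all of
    its columns; the last cell shows only what remains of the glyph.
    """
    if glyph_width_px <= 0:
        raise ValueError("glyph width must be positive")
    cells = []
    remaining = glyph_width_px
    while remaining > 0:
        shown = min(cell_width_px, remaining)
        cells.append((shown, cell_width_px - shown))
        remaining -= shown
    return cells


# An 11 pixel wide glyph needs two cells: one full, one showing 3 columns.
print(split_glyph_columns(11))
```

The same helper works for 16-pixel-wide cells by passing cell_width_px=16.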

The second secret weapon of the VIC-IV is the ability to use FCM characters in "alpha blending mode", where instead of allowing each pixel of a character to be chosen from the 256 colour palette, we instead use that value to indicate the fade value between the foreground and background colour of the character.  For example, if the background colour were black and the foreground colour were white, then a pixel value of $FF would display white, and a pixel value of $00 would display black, and a pixel value of $80 would display a colour half-way between them, i.e., some shade of grey. This allows text to be anti-aliased and look much better.  Best of all, it doesn't use up any palette slots on the VIC-IV to do this. 
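To make the blending rule concrete, here is a minimal Python sketch that assumes a plain linear interpolation per colour channel. The function names are made up for illustration, and the exact arithmetic and rounding used by the VIC-IV hardware may differ.

```python
def blend_channel(background, foreground, alpha):
    """Linearly interpolate one 8-bit channel; alpha 0 = background, 255 = foreground."""
    return (background * (255 - alpha) + foreground * alpha + 127) // 255


def blend_pixel(background_rgb, foreground_rgb, alpha):
    """Blend an (r, g, b) foreground over a background using the pixel value as alpha."""
    return tuple(
        blend_channel(b, f, alpha) for b, f in zip(background_rgb, foreground_rgb)
    )


black = (0, 0, 0)
white = (255, 255, 255)
for alpha in (0x00, 0x80, 0xFF):
    print(hex(alpha), blend_pixel(black, white, alpha))
```

With black as the background and white as the foreground, the three pixel values from the example above come out as black, a mid grey, and white respectively.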

The third secret weapon is that instead of using FCM + alpha blending mode, we can use NCM + alpha blending mode. This means each 64-byte char block encodes a 16x8 pixel area (at 4 bits per pixel) instead of an 8x8 pixel area, thus reducing the number of char blocks required to display larger glyphs.
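A quick way to see the saving is to count how many 64-byte char blocks a glyph of a given size needs in each mode. This is only a back-of-the-envelope sketch with a hypothetical function name, and it ignores any sharing of identical blocks between glyphs.

```python
import math


def blocks_needed(glyph_width_px, glyph_height_px, cell_width_px):
    """Number of char blocks needed to cover a glyph with cells 8 pixels tall."""
    columns = math.ceil(glyph_width_px / cell_width_px)
    rows = math.ceil(glyph_height_px / 8)
    return columns * rows


# A 20x16 pixel glyph: 8 pixel wide cells versus 16 pixel wide cells.
print("FCM blocks:", blocks_needed(20, 16, cell_width_px=8))
print("NCM blocks:", blocks_needed(20, 16, cell_width_px=16))
```

For this example glyph the count drops from 6 blocks to 4, i.e. from 384 bytes to 256 bytes of character data.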

By combining all three of these, I intend to support really nice looking text in the MEGA65 browser, like this:



The first step is to modify the md2h65 converter to allow the use of these fonts. This involved pulling a pile of the code from the tool used to prepare fonts for MegaWAT!? into md2h65. I then had to re-factor the Markdown parser I had made, as it was pretty rubbish, and couldn't support UTF-8, or even properly support most of the Markdown formatting strings.

Markdown doesn't include a native way to specify typefaces, so I have hacked in a C-inspired syntax that allows declaration of different typefaces for the different text types. These lines look like this:

#define FONT(p) /usr/share/fonts/type1/urw-base35/NimbusRoman-Italic.t1,16

The p in brackets means paragraph, and in addition we also support h1, h2, h3, bold, italic and bolditalic to override all of the fonts.  The ,16 at the end indicates the size of the typeface.  At the moment, the font name has to be the absolute path to the font file, but I hope to improve that along the way.
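For illustration only, here is one way such a directive could be parsed. This is a Python sketch under my own assumptions about the syntax (one directive per line, the size after the last comma), not the parser that md2h65 actually uses, and the names FONT_DIRECTIVE, KNOWN_SLOTS and parse_font_directive are hypothetical.

```python
import re

# Slot names taken from the description above.
KNOWN_SLOTS = {"p", "h1", "h2", "h3", "bold", "italic", "bolditalic"}

# "#define FONT(slot) /path/to/font,size" with the size after the last comma.
FONT_DIRECTIVE = re.compile(r"^#define\s+FONT\((\w+)\)\s+(.+),(\d+)\s*$")


def parse_font_directive(line):
    """Return (slot, font_path, size) or None if the line is not a FONT directive."""
    match = FONT_DIRECTIVE.match(line.strip())
    if match is None:
        return None
    slot, path, size = match.group(1), match.group(2).strip(), int(match.group(3))
    if slot not in KNOWN_SLOTS:
        raise ValueError(f"unknown text type: {slot}")
    return slot, path, size


line = "#define FONT(p) /usr/share/fonts/type1/urw-base35/NimbusRoman-Italic.t1,16"
print(parse_font_directive(line))
```

Lines that don't match the pattern return None, so ordinary Markdown text passes through untouched.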

Next, the renderer in md2h65 has to assemble the lines of text that might be a mix of C64 font and these proportional fonts, all of which might have different heights.  I already had code for mostly handling that in MegaWAT!?, which I also hacked into md2h65.

Initially I haven't implemented the trimming of the character widths, as I just want to make sure I have the font rendering working. One challenge with debugging this is that the m65 utility that I use to render screenshots of the MEGA65 for these posts doesn't yet properly handle the alpha blending mode, so I have to fix that bug. I then also hit another random bug in the network code, causing some corruption of the received TCP frames, which is confusing the browser code. So, as usual, I have to go down a few rabbit holes before I can progress.

Specifically, a byte of the H65 file is being mutated from $00 to $06 at some point during either network handling, or in my parsing the H65 in the browser code. I can see it is modified in the RX buffer of the TCP socket when it is being parsed.  So now to find out when it is being corrupted...

And it looks like the contents of the Ethernet frames are being corrupted, as I am seeing things like this:

00000000: 53 65 72 76 65 76 3a 22 53 6b 6d 74 6c 65 48 54    Servev:"SkmtleHT
00000010: 54 54 2f 36 2e 36 20 50 79 74 68 6f 6e 2f 32 2e    TT/6.6 Python/2.
00000020: 37 2e 31 38 0d 0e 44 65 74 65 3a 22 53 61 74 2c    7.18MNDete:"Sat,
00000030: 20 30 38 20 4f 67 74 24 32 30 32 32 20 30 33 3a     08 Ogt$2022 03:
00000040: 33 32 3a 37 35 24 47 4d 54 0d 0a 43 6f 6e 74 65    32:75$GMTMJConte
00000050: 6e 74 2d 74 79 70 65 3e 20 61 70 70 6c 6d 63 61    nt-type> applmca
00000060: 74 6d 6f 6e 2f 6f 63 74 65 74 2d 77 74 76 65 65    tmon/octet-wtvee
00000070: 6d 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74    mMJContent-Lengt
00000080: 68 3a 20 31 34 37 38 33 32 0d 0a 4e 61 73 74 2d    h: 147832MJNast-
00000090: 4d 6f 64 6d 66 6d 65 64 3a 22 53 61 74 2c 20 30    Modmfmed:"Sat, 0
000000a0: 38 20 4f 67 74 24 32 30 32 32 20 30 33 3a 33 32    8 Ogt$2022 03:32
000000b0: 3a 33 38 20 47 4d 54 0d 0a 0d 0a 48 36 35 ff 27    :38 GMTMJMJH65~'
000000c0: 00 50 85 54 e8 37 00 06 06 07 f0 c9 00 00 00 00    @PETh7@FFGpI@@@@

For example, "Servev" should most likely be "Server".

So why is this suddenly happening now?  Ah, I am still running the old broken bitstream for some reason... Ah, that would be because I updated the bitstream in slot 0, but didn't erase the old bitstream in slot 1. Well, at least that's a simple problem, and means that I haven't got any new network bugs to deal with.  

Okay, so that's all fixed, and I can load pages again. So now to test the NCM rendered proportional text. Initially I have just a single word at the end of the test page that is in a proportional font, the word "Some". Without anti-aliasing, it looks a bit messed up, but you can still see what it is:

On real hardware, the background of those glyphs is blue, not black. This is part of that bug in the m65 screenshot renderer I was talking about. So let's fix that first. Righty-oh, now we can see it properly:

Again, note that I haven't yet implemented the character trimming. There is also some artefacting around the alpha blended characters in the screenshot that is not visible on the real hardware, which will be easy enough for me to fix.

Done:

Much nicer. Now to implement the width trimming. I also noticed that my renderer incorrectly thinks that C64 font text is two lines high, which I'll have to deal with as well. But first the width trimming:

That's looking nicer.  But we have some "poop" on the right-most edge of the glyphs. I'm guessing I have an out-by-one error there somewhere, which should also be fairly easy to fix.

It turned out I was just not trimming by enough. With that corrected, that single pixel column of poop goes away. Quite why it is even getting into the char blocks though is still a bit of a mystery that I would like to solve. In theory, I'm only copying those columns indicated as part of the glyph. Ah! It's because in NCM mode, I am packing two columns of pixels into a byte, and I don't check if the char is an odd number of pixels wide. So now it looks like this:
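The odd-width case can be sketched like this in Python. It is an illustration rather than the md2h65 code: pack_row_ncm is a hypothetical name, and putting the left pixel in the low nybble is an assumption on my part that should be checked against the VIC-IV documentation. The point is simply that the row is explicitly zero-padded before packing, so an odd-width glyph can't drag a stray neighbouring column into its last byte.

```python
def pack_row_ncm(pixels, cell_width_px=16):
    """Pack one row of 4-bit pixel values into bytes, two pixels per byte.

    Assumption: the left pixel of each pair goes in the low nybble.
    Missing columns (including the odd one out) are filled with zero.
    """
    if len(pixels) > cell_width_px:
        raise ValueError("row is wider than the cell")
    padded = list(pixels) + [0] * (cell_width_px - len(pixels))
    packed = bytearray()
    for i in range(0, cell_width_px, 2):
        left, right = padded[i] & 0x0F, padded[i + 1] & 0x0F
        packed.append(left | (right << 4))
    return bytes(packed)


# A 3 pixel wide row: the fourth pixel is forced to zero rather than
# whatever happened to be next to the glyph in the source bitmap.
print(pack_row_ncm([1, 2, 3]).hex())
```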

Good. Now that single word looks right.  Let's get multiple words working. At the moment we have a problem where each word appears on a separate line of its own, like this:

"Some" and "more" should be next to each other, and "bold text" should also be next to those. That "bold text", which is just C64 font, ends up on a new line tells me that the problem is that a new line is being triggered when outputting proportional text. The problem was that I wasn't first rendering the word to measure its length, but rather just feeding it out as a line. That was probably just left over from when I started implementing it. But now with that fixed, we get this:

That's looking more right, except that we aren't emitting spaces following the proportional text words. With a bit of re-factoring, we now have spaces of the correct size prescribed by the fonts:

Now it's time to tackle that extra vertical space. The problem was that I was incrementing the Y position while rendering the lines, and then again after the line of text had been rendered. Only one of these is required. With that fixed, it's looking right again:

We are now quite close, because we can now assemble whole lines of proportional text. So I'll modify my example MD file here to use the proportional font for all of the paragraph text, and maybe also set some fonts for the headings, bold and italic, and see how it looks. 

And all the text has disappeared, which is very odd!

So why is this so, when we had some proportional text working before? Also, the spacing makes it look like it wants to display the text, but is messing up somehow.

Looking at the binary of the H65 file, it looks like it is outputting the right number of glyphs, just that they are all blank. The problem stems from loading more than one font. If I have only one font, then life is good. With more than one font loaded via #define FONT() directives, they all come out blank. So I must have some problem with my handling of font loading. With that fixed, we have some progress:

We can now see text in the different typefaces, but with some obvious problems:

1. The rows get out of sync when characters are of different heights.

2. The underline attribute is being applied to all rows, not just the base-line row.

3. There are still some problems with some text being invisible or generally messed up.

4. The reverse attribute shouldn't be applied to bold text that uses a proportional font.

There are probably some more in there, too, but that's a start for me to work on.

For the out-of-sync problem, I just need to apply the trim to all rows of chars in a glyph, even if they are out of the range used by that glyph. This is a little fiddly, because we can mix 8x8 C64 chars with 16x8 glyphs. So we need to explicitly code these gaps as empty glyph blocks with trim.

Some progress:

Problems 1 and 2 are now solved. I can also now see another problem:

5. The underhang of characters is not drawn, e.g., the tails of g and y characters.

But what I most care about right now is that large slabs of the text are invisible.

This is a bit weird, as there doesn't seem to be any real pattern to what gets displayed and what does not. It looks like the problem might be in md2h65, as the blank sections do seem to be really blank in the H65 file. Ah, the problem was that I had set up some buffers with incorrect dimensions. Now it is better:

So that solves 3, leaving 4 and 5.  Hopefully getting the descenders of glyphs to display won't be too hard, as they are being rendered.  Again the problem here was just some mishandling of the buffers and addressing of the rows in them.  So now we can see the descenders:

That's 5 done, just leaving the disabling of the reverse video for bold face when using a proportional font. That will be easy to fix. I should also allow selecting different colours for the various typefaces. But first, removing the reverse video:

Well, that's really starting to look quite nice and elegant. I'll get to the colours soon, but there is a more pressing issue here: this particular size of font, 10pt, is a really poor choice, for two reasons:

1. The glyphs all tend to be less than 8 pixels wide, which means that we run out of our 80 columns of characters before the right edge of the screen. As a result, the lines of text don't go all the way to the right.

2. Because of how we draw the proportional text using VIC-IV characters, the text has to be a multiple of 8 pixels tall. 10pt tends to have ascenders that are 9px tall, and descenders that are 1 or 2px below the baseline. As the characters are aligned on the baseline, the result is each line of text occupies 3 character rows, which leaves a lot of space.
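The second point can be put into numbers with a tiny Python sketch. It assumes a simplified model in which the baseline sits exactly on the boundary between two character rows, so the ascender and descender are each rounded up to whole rows. The function name is hypothetical, and the real layout code may place the baseline differently.

```python
import math


def char_rows_needed(ascender_px, descender_px, row_height_px=8):
    """Character rows occupied by a line of text aligned on the baseline."""
    rows_above = math.ceil(ascender_px / row_height_px)
    rows_below = math.ceil(descender_px / row_height_px)
    return rows_above + rows_below


# 10pt as described above: 9px ascenders, 1 or 2px descenders.
print(char_rows_needed(9, 2))   # rows used by the small font
# A larger font with 16px ascenders uses the same number of rows.
print(char_rows_needed(16, 2))
```

Under this model both fonts occupy 3 character rows, but the larger one fills 18 of the 24 pixel rows instead of 11, which is why a bigger size wastes proportionally less space.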

So the simple solution is that we can use a larger font size, so that the white-space is reduced. Later we could look at having the page layout configurable, so that fewer lines with more characters are possible. Another approach is to render pairs of glyphs into single character boxes. Given how commonly certain pairs of letters occur in most languages, that's probably a fairly workable solution. This would only be needed for typefaces that are being rendered at less than about 16px.

With that done, we can now fill lines up fairly comfortably, as we can see here:

Now we can see a new problem, that we are breaking lines in the middle of words.  This is a bit odd, as the word rendering logic is still supposed to be fixed to whole words. That is, it should try to place words on one line or the other, and not even be able to split them, even if it wanted to.  So how is this happening? Ah, when I refactored, I had the word render output on the wrong side of the moved loop boundary for the di-glyph grouping.  

With that fixed, now it's better:

The grouping of multiple glyphs into single 16x8 character cards is actually pretty nifty. It's even combining some of the characters in the larger typeface in the heading.  It will be interesting to see how it fares with larger slabs of text, where the proliferation of di-glyphs might mean that it uses more of the limited number of character cards available.  We'll have to see.
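As a rough sketch of the grouping idea, the Python below greedily pairs adjacent glyphs within a word whenever the two together fit into a single 16-pixel-wide card. This is a guess at one simple policy, not the rule md2h65 actually applies (which, as described further on, uses a minimum-pixel threshold), and group_into_cards is a hypothetical name.

```python
def group_into_cards(glyph_widths_px, card_width_px=16):
    """Greedily pair adjacent glyphs that together fit in one character card.

    Takes the pixel widths of the glyphs in one word and returns a list of
    cards, each card being a list of the glyph widths packed into it.
    """
    cards = []
    i = 0
    while i < len(glyph_widths_px):
        width = glyph_widths_px[i]
        has_next = i + 1 < len(glyph_widths_px)
        if has_next and width + glyph_widths_px[i + 1] <= card_width_px:
            cards.append([width, glyph_widths_px[i + 1]])
            i += 2
        else:
            cards.append([width])
            i += 1
    return cards


# Five glyphs end up in three cards instead of five.
print(group_into_cards([5, 6, 9, 4, 12]))
```

Each distinct pair needs its own character card, which is why long passages of text could consume more of the limited supply of cards.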

A more pressing issue right now is that it is only by sheer luck that this change has made complete lines fit almost exactly. If we used a slightly larger typeface, the lines would start to run off the right-hand edge of the screen, because they would be averaging more than 8 pixels per character card. That isn't happening right now, because the spaces between the words are not merged with other glyphs, and help to bring the average width of the character cards back down to almost exactly 8.

If I increase the stuffing of glyphs into the character cards to require at least 11 pixels instead of 8, then the average pixels per card will increase to be greater than 8, and we'll see it start to run off the edge, like this:

The solution is to track the number of display pixels accumulated on a line, rather than counting the number of characters and just assuming that they are all 8px wide, like on the C64.
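In outline, pixel-based line filling with whole words looks something like the following Python sketch. It assumes each word has already been rendered and measured, and the names and the example widths are invented for illustration; it is not the md2h65 implementation.

```python
def wrap_words(measured_words, line_width_px, space_width_px):
    """Break (text, width_px) pairs into lines without splitting any word.

    A word is placed on the current line only if its measured pixel width,
    plus a preceding space, still fits within line_width_px.
    """
    lines = []
    current = []
    used_px = 0
    for text, width_px in measured_words:
        needed_px = width_px if not current else used_px + space_width_px + width_px
        if current and needed_px > line_width_px:
            lines.append(current)      # line is full: start a new one
            current = [text]
            used_px = width_px
        else:
            current.append(text)
            used_px = needed_px
    if current:
        lines.append(current)
    return lines


words = [("Some", 30), ("more", 32), ("bold", 28), ("text", 26)]
print(wrap_words(words, line_width_px=100, space_width_px=5))
```

Because the decision is made on measured pixels, it doesn't matter how many character cards a word happens to occupy.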

With that fixed, we now have nicely filled lines, with no overrun:

So now it is super easy to modify the sizes of the fonts:

Really all that's left for this part, I think, is to move the text one pixel above the baseline, so that the underline appears in a more sensible place, and to think about how to reduce the number of dead char rows. Moving the baseline would be enough to help for fonts that have only 1px descenders. For larger sizes, maybe we make a trade-off and have the underlining below the descenders, in return for being able to reduce the maximum dead-space below the descenders. But that will have to wait for another day. For today, I'm just pleased to have advanced from the previous state of being able to use only 8x8 fixed-width bitmap fonts to having really quite pleasant and readable looking text -- especially for an 8-bit machine.
