Saturday, 5 September 2020

Fixing the remaining ethernet RX bugs

For a while now there has been an intermittent problem with receiving ethernet frames: The occasional byte of a received packet is corrupted.  The CRC of the received packets is checked fine, so I am assuming it is read/write contention on the receive buffer dual-port memories: Basically if you try to read from a dual-port memory when it is being written to, you can end up reading rubbish.

A secondary problem is that we have only two RX buffers, which means that the MEGA65 has a bit of a risk of missing the odd packet, if they come in quick succession. Also, the mechanism for indicating when an RX buffer is free is a bit of a pain and prone to occasionally confusing the ethernet RX engine.

My plan is to attack both problems at the same time, by switching from using a single 4KB RX buffer memory, divided into two logical halves, into four separate 2KB RX buffers.  That should help avoid packet loss, and also the annoying RX errors.

Then to improve the RX buffer handling, I'll make a register that asks for the next received packet to be made visible, and to simultaneously to free the previously visible RX buffer.  In this way it will be super simple to use, and will also make it easy to later add more RX buffers, without having to change the ethernet driver code.

First step is to replace the single 4KB buffer with 4x2KB buffers. Done.

Then we need to keep track of which buffer the CPU can see, and which the ethernet controller can see. Also easy and done.

Then we need to have a way to have the CPU be able to tell the ethernet controller when an RX buffer is free.  I'll do this at the point when the CPU asks for the next received packet to be made visible.  I'll keep a counter of the number of free RX buffer slots, which can only ever be between 0 and 3, because one buffer is always held by the CPU (so that the RX buffer can't change while the CPU is reading the packet out).

To implement this, the register $D6E1 will be changed: Bits 1 and 2 previously were used to pick which of the two RX buffers were visible to the CPU, and which was able to be written to by the ethernet controller.  Now, reading those two bits will return the number of free RX buffers, and writing a 1 to bit 1 will cause the current RX buffer that was available to the CPU to be released, and the next received packet, if any, to be made available to the processor.

I was pleasantly surprised how few synthesis runs I needed to get this to generally work: Writing with bit 1 set to $D6E1 flips RX buffer, and all the rest. However, there is a subtle problem where flipping to the next RX buffer sometimes does something weird, and the RX buffer doesn't get cleared, but the count of free RX buffers does get updates.  The result is that the logic to pick the next received packet to present to the CPU can keep presenting the same packet to the CPU over and over.  

The logic I have there looks right, but clearly isn't.  It is possible I am trying to be a bit too clever, by trying to find the first available buffer / filled buffer, rather than just rotating around them in a ring.  That would also make sure we process packets in the correct order.  I'm not really sure why I didn't do this to begin with, as it is clearly the most sensible algorithm for handling the packet buffers.  So time to synthesise again...

Ok, so the ring buffer works much better, but we are still ending up with the number of free RX buffers being incorrectly calculated.  I am suspecting that part of the problem might be that writing to $D6E1 to accept a packet from the RX buffer queue may be being written for two successive cycles on the MEGA65, which causes the logic to be executed more than once, generally resulting in confusion.  So I am synthesising a fix for that, as well.

Now, the whole reason for doing this was to fix the corruption of received bytes. But I am still seeing corrupt ping reply packets.  But at the same time, I am not seeing signs of the packets being received incorrectly by the MEGA65, nor of receive errors on the other side, which is what would happen if the transmit side were having signal integrity problems.  

So if the problem isn't on the receive side, or the actual sending of packets, it makes me wonder if the problem isn't in the stuffing of the TX buffer. There is only one TX buffer, and if we don't try to send a packet while another is already being sent, then it shouldn't ever have read-write conflicts.

But I can at least test this theory by not merely sending multiple responses to a single ping request, but actually building the ping reply packets from scratch for each: If some of these get corrupted, but others not, then we know it is not on the receive side -- else we would have copied the same error, and the clones would all have the same fault.

First up, I am indeed seeing that some packets are being corrupted, but not others, which means that the RX buffers are not the problem.  BUT I am also seeing something weird where the entire RX buffer is changing.  What I'm not sure of, is if that is because the RX buffer is some how being advanced to the next, or if something else weird is going on.  Similarly, I am also sometimes seeing the same packet over and over.  

A bit of debugging shows that the ethernet RX buffer management is still being a bit dodgy:  Packet after packet, it can get stuck with the CPU and ethernet controller accessing the same RX buffers, even when we advance the buffer.  Looking closer, this is when it correctly thinks that the there have been no more packets received. But it still shouldn't stay the same all the time, but rather should rotate through the set of RX buffers. 

I *think* the problem there was that if the ethernet register was written to for more than one consecutive cycle it could rotate through all the buffers, and generally confuse things.  In any case, I haven't seen any more packets with corrupt bytes in them.  So that's a plus.

However, there is still a few percent of packet loss.  In at least one case, there were three consecutive packets that were truncated -- but still with valid CRCs. Also, if I try to construct and send multiple replies to ping packets, often most of the duplicates end up containing an incorrect completely different packet.  This causes packet traces like this:

08:36:53.065726 IP > ICMP echo request, id 32126, seq 146, length 64
    0x0000:  4040 4040 4040 1005 019f fcfd 0800 4500  @@@@@@........E.
    0x0010:  0054 ffb9 4000 4001 ce85 cb13 6b01 cb13  .T..@.@.....k...
    0x0020:  6b41 0800 6d0c 7d7e 0092 8db0 3d5f 0000  kA..m.}~....=_..
    0x0030:  0000 8200 0100 0000 0000 1011 1213 1415  ................
    0x0040:  1617 1819 1a1b 1c1d 1e1f 2021 2223 2425  ...........!"#$%
    0x0050:  2627 2829 2a2b 2c2d 2e2f 3031 3233 3435  &'()*+,-./012345
    0x0060:  3637                                     67
08:36:53.066540 IP > ICMP echo reply, id 32126, seq 146, length 64
    0x0000:  1005 019f fcfd 4040 4040 4040 0800 4500  ......@@@@@@..E.
    0x0010:  0054 ffb9 4000 4001 ce85 cb13 6b41 cb13  .T..@.@.....kA..
    0x0020:  6b01 0000 6d0c 7d7e 0092 8db0 3d5f 0000  k...m.}~....=_..
    0x0030:  0000 8a00 0100 0000 0000 1011 1213 1415  ................
    0x0040:  1617 1819 1a1b 1c1d 1e1f 2021 2223 2425  ...........!"#$%
    0x0050:  2627 2829 2a2b 2c2d 2e2f 3031 3233 3435  &'()*+,-./012345
    0x0060:  3637 bc55 ce51                           67.U.Q
08:36:53.067141 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+
08:36:53.067452 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+
08:36:53.067744 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+
08:36:53.068047 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+
08:36:53.068351 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+
08:36:53.068658 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+
08:36:53.068962 IP6 fe80:be:1205:141::9f:fcfd.5353 > ff02::800:fb.5353: 0 PTR (QM)? _googlecast._tcp.local. (40)
    0x0000:  1005 019f fcfd 4040 4040 4040 86dd 600f  ......@@@@@@..`.
    0x0010:  6e5a 0030 1101 fe80 00be 1205 0141 0000  nZ.0.........A..
    0x0020:  0000 009f fcfd ff02 0000 0000 0000 0000  ................
    0x0030:  0000 0800 00fb 14e9 14e9 0030 7d83 0000  ...........0}...
    0x0040:  0000 0001 0000 0000 0000 0b5f 676f 6f67  ..........._goog
    0x0050:  6c65 6361 7374 045f 7463 7005 6c6f 6361  lecast._tcp.loca
    0x0060:  6c00 000c 0001 4df4 072b                 l.....M..+

That is, we see the ICMP PING request, followed by one valid reply, and then 9 of these strange other packets, each of which are identical.  They have the source MAC address of the MEGA65 (40:40:40:40:40:40), and they are being sent a fraction of a milli-second apart, which matches the code that I am using to try to send multiple replies:

    ; It's a ping packet.

    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply
    jsr send_icmp_reply

Nowhere in this routine does the RX buffer get rotated. And more the point, I have no idea where this other packet data is coming from.  Certainly the MEGA65 doesn't support GoogleCast, so I can only assume that it is some other packet that has ended up in a buffer somewhere. 

Ok, that's interesting: When I instrument the RX buffer selection between sending these packets, the RX buffers are in fact being rotated after sending the first packet back, but not after that.  Before sending the first reply, I am seeing, for example, CPU side RX buffer is #1, and ethernet side is #2. Then after sending the first reply, it is CPU side RX buffer #0 and ethernet side is #1.  Ah, that I recognise: The combination is the default buffer allocation when the ethernet state machine has been reset. The cause was that I had code to unjam the TX state machine before sending a packet to deal with previous bugs.  So I'll remove that, and see how we go.

Ok, so that fixes that problem: I am now getting an ample quantity of identical ping packets -- and still without any corrupt packets, which is also nice. So we are making headway.  Now the remaining problem is that we still are getting some packet loss.  What's not yet clear is whether that is the MEGA65 is missing packets, or whether the replies are being ignored for some reason.

Hmm.... I just removed the duplicate reply sending code, and saw the first corrupted packet in a while. More weird, each PING is getting a reply immediately, and then after 1 second, i.e., when the next PING packet comes in.   Thus the packet log looks like this:

09:00:33.205017 IP > ICMP echo request, id 5052, seq 176, length 64

09:00:33.205830 IP > ICMP echo reply, id 5052, seq 175, length 64

09:00:33.206449 IP > ICMP echo reply, id 5052, seq 176, length 64

And I just saw another of those corrupted packets.  In duplicate because of this bug.  But both duplicates were identical, indicating that they were due to the MEGA65 receiving the request packet originally.

Time to add some diagnostics to see if the MEGA65 thinks it is receiving each packet twice... Which it does.  Now to find out why...  I can see that the ethernet side RX buffer advances as the new packet comes in, but then for some reason we are processing the old packet again first.  Found it: I was being oversensitive to packet reception, while dealing with bugs in the ethernet controller that I have since fixed. Basically I was noticing when a packet was started to be received, as well as when it was actually received.

I've also looked more into the few remaining packet errors that I am seeing:  These are now exclusively the last byte of the packet being changed to $BD.  This is a strong hint as to what is going on: Almost certainly this is the end-of-packet marker that is supposed to be written to the receive buffer immediately following the packet.  Something is causing it to sometimes be written over the last byte of the packet.  

It might be that we detect the RX Valid signal from the ethernet controller very slightly early, e.g., after we have received 6 of the 8 bits of the last byte.  It must be marginal timing if this is the cause, as the ethernet CRC is still being correctly verified.

I'll try fixing this by requiring the RX Valid signal to be disasserted for 3 cycles instead of 2.  I'll also change the byte that gets written at the end to another value, so that I can confirm conclusively that this is the cause of the incorrect values.  The choice of $BD was quite arbitrary. I'll make the equally arbitrary decision that the new value will be $EA.

Now testing this fix.  Watching a ping run for a long time, I am seeing that some of the remaining packet loss is being caused by packets truncated by some non-trivial amount, often 12 bytes or 34 bytes.  I have instrumented the receive programme and found that these truncated packets are received truncated by the MEGA65, so the problem is on the receive side.  The question is what the source of that might be?

On the receive side, these short packets are not triggering a CRC error, which makes me rather suspcious.  Also, sometimes these packets end up with the "is multicast" flag in our receiver being asserted, rather than the "is broadcast" flag.  Even so, some of those packets result in valid replies to the PING.

I wondered if I had a dodgy ethernet cable, but switching to another made no difference to the kinds of errors I am seeing.

So to summarise, in terms of missed ping replies there seem to be several causes:

1. MEGA65 receives some packets truncated.

2. MEGA65 receives a valid packet, but the connected computer receives no reply packet, dispite all evidence that the MEGA65 has sent a reply.

and also:

3. MEGA65 sometimes thinks broadcast ICMP PING packets are multicast packets instead, but then replies to them without problem.

4. The connected computer does not indicate any RX frame errors, suggesting that the MEGA65 is not sending any invalid frames at any point.

Now what I haven't tested yet, is whether it is my poor laptop that has the problem with its ethernet, since that is the other common piece here. So time to pull the Mac out and try with that.

First thing, the Mac is complaining that the first data byte of the ping replies is incorrect.  Apparently I had the location of the ICMP CRC field in the packet wrong, and so was updating the first data byte, not the ICMP CRC.

The second thing that the Mac revealed, is that at least some of the packets are being rejected because the ICMP CRC is incorrect, being out by 1.  I'm not sure if the problem is that the packets are being generated with the wrong CRC.  But it probably suggests that its worth calculating the ICMP CRC ourselves, rather than trying to just patch the CRC of the PING.

I've added the code to calculate the ICMP checksum directly, which has reduced, but not eliminated, the number of bad packets.  More the point, the types of errors seem to change as I make these fixes, and eliminate potential sources of the problem -- but somewhere between 3% and 5% of packets continue to be lost. AND the problem continues to be worse with larger packets. With 900 byte ICMP PING packets only a very small fraction actually succeed.

This brings me back to the ethernet RX and TX lines -- in particular the TX lines, as we seem to reliably receive even very large packets.  However adjusting the TX phase seems to make no difference at all.  Modifying the TX routine to send the same TX buffer twice reveals why: Exactly the same checksum error occurs on both copies -- so the problem is actually in the TX buffer.  So hopefully if I can fix that, I can finally fix the packet errors.

Ok, found one quite likely source of the problem: I was clocking the TX buffer's write side on the 50MHz clock, instead of on the CPU clock.  This would cause exactly the kind of random corruption of bytes during writing to the buffer that we have been seeing. This also explains why the ethernet frames were not being sent with corrupt CRCs, because the ethernet controller computes the ethernet CRC in real-time as it sends the frame, and then appends it at the end.

Fixed that, and that's got rid of the byte corruption in transmitted packets.  I can now reliably respond to ping packets both big and small.  The packet error rate is now down to about 2%. The remaining packet errors are short packets, i.e., with valid ethernet checksum, but truncated for some reason.  The question is whether this is due to some problem with transmission, or the transmit buffer length setup.  This can be answered by going back to sending two copies of each packet, and seeing whether both or only one gets truncated when this problem occurs.

Ok, that didn't take long: And as I hoped, the problem occurs for both packets:

14:58:55.768602 IP > ICMP echo request, id 1676, seq 10, length 958
14:58:55.770340 IP > ICMP echo reply, id 1676, seq 10, length 958
14:58:55.770404 IP > ICMP echo reply, id 1676, seq 10, length 958
14:58:56.769145 IP > ICMP echo request, id 1676, seq 11, length 958
14:58:56.770697 IP truncated-ip - 886 bytes missing! > ICMP echo reply, id 1676, seq 11, length 958
14:58:56.770718 IP truncated-ip - 886 bytes missing! > ICMP echo reply, id 1676, seq 11, length 958

What is interesting is that the ethernet frame is the correct length: It is only the contents of the packet that are wrong.  I need to look at the contents of one of these truncated packets to see exactly where the problem is... Except I just also saw one where only one of the packets was messed up... And I just saw a packet with incorrect data in it.

All this points to some subtle problem with the packet reception side of things again.

And from there it has all gone down hill a bit again. The intermittent problem I have previously seen where the TX side of the finite state machine locks up is now happening pretty much all of the time.  This means that no packets get sent at all, while the program waits in vain for the TX side to indicate readiness. However, it IS ready, in that if I go ahead and send a packet anyway, it will get sent.

The symptoms are that the eth_tx_complete line gets stuck high.  The only place that asserts that is the SentFrame state.  As soon as eth_tx_trigger goes low, it will leave that state, and go to state IdleWait. The instrumentation I have on $D6E4 lets me see both eth_tx_complete and eth_tx_trigger lines in real-time.  And eth_tx_complete is high, and eth_tx_trigger is low. This means that it can't be stuck in the SentFRame state.  The IdleWait state just counts 50 cycles, and then proceeds to InterPacketGap. As soon as the inter-packet gap timeout has been served, eth_tx_complete gets cleared, and the state moves to Idle. It is not possible for eth_tx_complete to get asserted again in that cycle, but only after sending a complete frame.

Watching the values in the debug flags of $D6E4, I see the value sits on $34, unless I send a packet.  The $3x part means that the eth_txd bits are set to "11". The $x4 part indicates that eth_tx_complete is high, while eth_tx_trigger, eth_tx_commenced and eth_txen_int are all low: That is, we are not sending a packet, nor have we been triggered to do so.

 So lets look back at the Idle state. There are three conditions where it can leave Idle:

1. eth_tx_trigger = '1' , i.e., the user has asked for a packet to be sent. This is the highest priority option.

2. activity_dump='1' and activity_dump_ready_toggle has toggled. 

3. eth_videostream = '1' , activity_dump='0', and buffer_moby_toggle has toggled.

eth_videostream is accessible via bit 3 of $D6E1, and activity_dump can be set by writing to bit 2 of $D6E1 (but can't be read). So in theory we can clear those by writing $03 into $D6E1.   

Clearing them made no difference, but setting bit 3 (eth_videostream) DID temporarily clear the problem.  Hmm...but if I set eth_videostream, then reset the ethernet controller, then it seems to come good.

Also trying to TFTP fetch a file causes the problem to happen almost immediately, which is convenient for debugging.

Ok, so some poking around, I found I had already make a good solution for this problem, but just wasn't using it: Bit 7 of $D6E0 indicates if the TX state machine is ready to send a packet. This means a simple:

ethtxwait: BIT $D6E0 

           BPL waitsomemore

can be used to wait until the ethernet controller is ready to send the next packet.

So that fixes that, but now it still goes funny after a while, refusing to answer ARP requests.  This seems to be the return of another niggly bug, where the controller goes nuts sending some sort of frame, possibly the video or CPU activity dump frames we used to support. However disabling those functions doesn't stop it. The only cure is to reset the whole ethernet controller.  I'm currently synthesising a bitstream that actually disables the dispatch of those packet types, to see if it cures the problem.  What is frustrating here, is that I can't see for the life of me how it gets into this state, and more so, how it can stay in such a loop.  The FSM is not that complex, and once a frame starts sending, its pretty inevitable that it progresses through to completion.

Disabling those special frame modes seems to have solved that part of the problem: The TX FSM no longer locks-up, and I can even use my dodgy work-in-progress TFTP server to fetch a file fairly quickly:

$ tftp
tftp> get MEGA65.D81
Received 819200 bytes in 3.2 seconds
tftp> get MEGA65.D81
Received 819200 bytes in 2.5 seconds
tftp> get MEGA65.D81
Received 819200 bytes in 2.5 seconds
tftp> get MEGA65.D81
Error code 1: ? FILE NOT FOUND ERROR

The TFTP server even has nice 8-bit style error messages, as you can see above :) The reason it gets file not found error here, is that there is a bug in HYPPO, the MEGA65 Hypervisor's FAT32 file descriptor code, that gets a touch confused when too many files have been opened.

Now, while that works, its only behaving nicely because I am sending every TFTP packet several times.  Also, every time one of those multiple sendings triggers and ACK from the receiver, that triggers yet more retransmissions. This nicely coverse the odd corrupted packet that we are still encountering.

After doing all the TFTP fetches, the RX side seems to have gotten out of sync a bit, with one more packet received than it releases, causing all ping replies to be one second late.  Reseting the ethernet controller fixes that, but it still shouldn't happen. I'll have to make another debug bitstream that shows the RX buffer allocations again, so that I can see what on earth is going on.

The errors I am seeing now are a combination of:

1. Bad ethernet frames received by the PC, indicating that our TX side is doing something wrong.

2. Truncated ethernet frames received by the PC, but with valid checksums, indicating that probably our RX side has messed up.

For both of these, and even for some of the prior problems, I am wondering if there isn't a problem with the clocks for the ethernet. The ethernet clocks were not being driven though a BUFG global buffer, and thus might have been skewing around within the ethernet controller. I've added BUFGs to all the ethernet clocks, to see if that doesn't solve the problem.

Well, that didn't solve the problem either: We still are stuck with this rather stubborn 3% -- 5% packet loss, truncation or corruption.  That said, its now flat at that level, regardless of the size of the packets, which it wasn't previously, and the contents of the received packets seem to get corrupted much less frequently, so significant progress has still been made.

But... Looking at the timing report, it looks like adding the BUFGs to the clocks has changed the names of the clock, so timing closure was quite badly out.  So I guess I'll resynthesise that, and try it out.  Resynthesised that, timing closure is now good, but the problem remains.  

Next step is to try producing a series of identical packets from my Linux machine, and send them to the MEGA65, to see if packets are getting corrupted while being sent to the MEGA65.

I'm pulling my hair out on this, so I'll go and attack some other problems for a while, then come back to this in a week or two.

Its now a couple of weeks later, and I am revisiting this.  I found and fixed one problem with the TX state machine getting stuck, which fixed the problem where it would stop sending packets after a while. However we are still seeing wrong packet lengths on received packets, and also that some packets are not being sent.

To try to understand why some packets are not being sent, I have been testing sending series of identical packets, and have indeed observed that sometimes the packets don't get sent.  The linux box reports no RX frame errors, which means that we are either sending good frames, or not sending a frame at all.

This makes me think that the ethernet chip on the MEGA65 itself is blocking the transmission of the frames OR that my TX finite state machine is getting stuck in a way that prevents a single packet from being sent OR that the frame preamble is too short to be detected by the receiver. Lets look at each of the three, to see if we can figure out which it is, and how to fix it.

1. I've looked through the datasheet, and I can't find any mechanism for suppressing the transmission of frames (some chipsets have jabber prevention on the TX side, but this one doesn't seem to). So it probably isn't that.

2. I've added a counter that counts each time the TX frame gets triggered, which should help detect this.  It would also be great to probe the TXEN pin on the ethernet chip.  Unfortunately it is on the underside of the PCB, and I think its one of those little surface-mount jobs that have no exposed pins, which would make that difficult.  But the counter should tell us, anyway.

3. If its the preamble being a bit too short, then extending it a few symbols to allow time for TXEN to take effect should be a fairly simple intervention.  It needs to be 56 bits = 28 dibits. I was using 29, presumably to allow for the first to be lost.  I've now increased it to 32.

Those changes are now synthesising, so hopefully we will get some answers. If it suddenly is fixed, then we know that the longer preamble was the problem.  If not, we can check the TX counter to make sure that it increases after each packet is sent.

Well, that's interesting: Increasing the TX preamble has now reduced the percentage of packets that get accepted to less than 10%.  I was able to confirm that the TX state machine seems to be working fine, though, using the debug insturmentation I added for this.

Now it looks like the preamble length is the problem. So I will make this run-time adjustable, to speed up the debugging of this. That still requires another bitstream to be built, but hopefully once built, I can nail this problem.

I found it interesting that making the preamble longer than required caused problems. I would have thought that ethernet cards would accept any valid preamble sequence, and thus a long preamble would be more reliably received.  But maybe its the other way around: An excessively long preamble gets ignored as being assumed to be a jabbering network card, and perhaps shorter preambles will be accepted, with the receiver assuming they might have missed the first few bits.

One curious thing with this: I was sending 29 dibits = 58 bits = 7.25 bytes, then the %11 dibit that marks the end of the preamble. Thus I was 4 bits short of the correct length.  Anyway, once this bitstream builds, I can work out how the preamble length affects reception of packets in a more thorough way.

Bitstream built, and the results are a bit annoying:  The randomly lost packets still continues, but I have also seen the TX state machine lockup. Admittedly just the once, but that's one too many.

What I have discovered, is that the preamble MUST be an odd number of dibits long. This does suggest one possible mechanism for the problem: If there is a race-condition to set the TX dibit and assert TXEN, then it is possible that sometimes the first dibit sent is NOT correctly sent.  This is not so implausible, as I have seen that if I do set it to use an even number of dibits, that it isn't that no packets are received, but rather that a very small fraction of packets are received... a suspiciously similar ratio to the number of packets that get lost when I use an odd number of di-bits for the preamble.

The trick is to see what I can do, to avoid this problem. I already have a TX phase adjustment mechanism, where I can vary the phase of the TX signals compared with the 50MHz reference clock. However, I don't have a mechanism to vary the phase between these two signals. However, according to what I can see in the datasheet for the ethernet controller, this shouldn't matter: All that is important, is that these signals be stable on each rising edge of the 50MHz reference clock.

So it should be possible to adjust the TX phase to make sure of this -- provided that the path skew on the eth_txd and eth_txen are stable and approximately equal.  So lets check the four possible TX delay phase values, by checking the packet loss rate of sending ~100 ping reply packets, 5 times each to get an average:

%00 - 93, 87, 85, 90, 93 = average 89.6% packets delivered = 10.4% packet loss

%01 - 92, 95, 89, 92, 84 = average 90.4% packets delivered = 9.6% packet loss

%10 - 88, 91, 88, 92, 90 = average 89.8% packets delivered =10.2% packets lost

%11 - 94, 88, 85, 84, 87 = average 87.6% packets delivered =12.4% packets lost.

All are broadly similar, to within the statistical variation of the tests.  For example, running subsequent tests with %11, I see results of between 8% and 13% packet loss. So I am fairly confident in the conclusion that the TX phase is not having any effect.

This is a bit surprising, as I would expect shifting the phase should allow at least one setting where the eth reference clock and data are sufficiently out of phase as to cause problems. Indeed, I have seen it doing this previously. It is sufficiently weird as to make me suspect that the TX phase control is not working for some reason. However, every indication from the source code is that it is working.

I have also checked that I have a few output drive stages, so there shouldn't be any big problems with skew or glitching.

I'm now going through the timing report for the design, to look for any problems there.  There are no timing violations for the eth_txd signals, but there is a large latency from eth_txd_out[1] to the actual eth_txd[1] pin of the FPGA of 8.913 ns, and for eth_txd_out[0] to eth_txd[0] a delay of 8.880 ns.  The eth_txen output delay is 8.586 ns, which should be close enough, given the 50MHz reference clock has a period of 20 ns, and we are talking about a skew between the txen and txd signals of less than 0.5 ns.  Thus because the difference in delay between them is small, any problem should be solvable by tweaking the TX phase setting to get them to line up nicely with the ethernet clock... which is clearly not happening.

The slew rate on the physical pins were set to slow, which I am now trying to set to fast, to see if that doesn't help tidy things up.

But its also looking like I will need to pull the mainboard out of the MEGA65 so that I can flip it over to get to the ethernet chip, and measure things at the pins of the ethernet chip, to see if there is any weird problem with the eth_txen being differentially delayed by different pcb trace lengths or some other effect.  

I'll get to that in a moment, but in the meantime I also resynthesised with a change that sets the eth_txd signal to %01 when idly waiting before transmitting a packet.  My goal here was to make sure that it couldn't latch an incorrect dibit value, thus causing the preamble to be sent with an even instead of odd number if dibits.  The results of this for the four different TX phase values, as before are:

%00 - 964 out of 1,000 packets received = 3.6% packet loss

%01 - 962 out of 1,000 packets received = 3.8% packet loss

%10 - 974 out of 1,000 packets received = 2.6% packet loss

%11 - 971 out of 1,000 packets received = 2.9% packet loss

Again, these are all within statistical noise of one another, but are all also substantially better than without this little change.  Sending 900 byte ping packets has the same packet loss, so we can at least be confident that the bit frame is fine.

It is also still holding true, that if I change the preamble length to be even, that I end up with packet loss the opposite to if the preamble is set to and odd length, i.e., ~96% packet loss instead of ~4%.  Thus I still think that the TXEN line is asserting at not quite the right time.  So time to flip the MEGA65 main board over, and investigate...

Well, that's interesting... At first I thought the problem was slow ramping of the eth_txen line. But that's not the problem: For starters, I can rotate it around the eth reference clock using the TX phase control, and it makes no difference.  So I stuck the oscilloscope probes on eth_txen and eth_txd, so that I could see the delay from eth_txen being asserted to the start of the packet body.  When I do that, and have the oscilloscope set to trigger on eth_txen rising edge, I can see that the preamble is not always the same length. It sometimes is 20ns = one dibit longer. 

Confirming this required a bit of rather fiddly work. Fortunately I have a really nice Zeiss microscope available to me that has HDMI out.  This thing really is a dream for fiddly work:

So I set that up with OBS to capture the HDMI stream (my spare HDMI monitor is broken, so I couldn't just display it directly). Then I could easily see the tiny 5mm square chip, and find traces that led to the pins I cared about, and stick the oscilloscope probes in the vias on those traces:

To give an idea of how nice the microscope, the above was taken at x0.8 magnification, without any real fiddling.  The digital camera is quite high resolution, and I can easily zoom in to see features a fraction of a milli-metre wide, like the traces below the chip:

... and that's without switching the front optics to a higher gain.  But I digress...

Then it was a bit of gymnastics to hold both probes in the tiny vias which were also so close to each other I had to hold them at the right angles, and then hold my phone in the other hand to take repeated photos of the oscilloscope display until I got one that showed both good and bad traces. But I managed it in the end:

Well, this is all a welcome discovery, as it explains all of what I have been seeing lately, including that switching to an even preamble length roughly inverts the packet loss rate.

The only problem is, I can't for the life of me see how this can be happening.  The part of the state machine responsible for sending the preamble is rather simple:

    if rising_edge(clock50mhz) then
        case eth_tx_state is


        when Idle =>
          if eth_tx_trigger = '0' then
            eth_tx_complete <= '0';
          end if;

          -- XXX Try having TXD be ready for preamble, to see if that
          -- fixes the weird problem with packet loss due to wrong preamble length.
          eth_txd_int <= "01";
          if eth_tx_trigger = '1' then

            -- reset frame padding state
            eth_tx_padding <= '0';
            if to_integer(eth_tx_size)<60 then
              eth_tx_size_padded <= to_unsigned(60,12);
              eth_tx_size_padded <= eth_tx_size;
            end if;
            -- begin transmission
            tx_frame_count <= tx_frame_count + 1;
            eth_tx_commenced <= '1';
            eth_tx_complete <= '0';
            -- 56 bits of preamble = 28 dibits.
            -- We add a few extra just to make sure.
            tx_preamble_count <= tx_preamble_length;
            eth_txen_int <= '1';
            eth_txd_int <= "01";
            eth_tx_state <= WaitBeforeTX;
            eth_tx_viciv <= '0';
            eth_tx_dump <= '0';

          end if;

        when WaitBeforeTX =>
          if eth_tx_packet_count /= "111111" then
            eth_tx_packet_count <= eth_tx_packet_count + 1;
            eth_tx_packet_count <= "000000";
          end if;          

          txbuffer_readaddress <= 0;
          eth_tx_state <= SendingPreamble;
          report "Reseting TX CRC";
          tx_fcs_crc_load_init <= '1';
          report "TX CRC init not announcing data";
          tx_fcs_crc_d_valid <= '0';
          tx_fcs_crc_calc_en <= '0';
        when SendingPreamble =>
          if tx_preamble_count = 0 then
            eth_txd_int <= "11";
            eth_tx_state <= SendingFrame;
            eth_tx_bit_count <= 0;
            txbuffer_readaddress <= txbuffer_readaddress + 1;
            tx_fcs_crc_load_init <= '0';
            report "Releasing TX CRC init";
            tx_fcs_crc_d_valid <= '1';
            tx_fcs_crc_calc_en <= '1';
            report "TX CRC announcing input";
            if eth_tx_viciv='0' and eth_tx_dump='0' then
              eth_tx_bits <= txbuffer_rdata;
              tx_fcs_crc_data_in <= std_logic_vector(txbuffer_rdata);
              report "Feeding TX CRC $" & to_hstring(txbuffer_rdata);
              eth_tx_bits <= x"ff";
              tx_fcs_crc_data_in <= x"ff";
            end if;
            eth_txd_int <= "01";
            tx_preamble_count <= tx_preamble_count - 1;
          end if;


The bold lines show that we work on the 50MHz clock, and send a preamble of tx_preamble_length di-bits (which is a constant register), before moving to sending the frame.  

There is simply no place for a single extra di-bit to be added to the preamble.  The ethernet clock is fed through a BUFG globl buffer, so shouldn't suffer from excessive jitter or skew.

It is nigglingly me that it feels like the state machine can get confused in some way that I am not considering, given that the transmit state machine generally seems able to get in a mess.  There are only two signals that can influence the TX state machine outside of the state machine itself: eth_tx_trigger and reset lines.  The reset lines are stable, so should not pose a problem. However, eth_tx_trigger does get directly set by the 40.5MHz CPU side, and might be able to glitch, rather than just asserting stably.

This is easy enough to test: Add a drive stage on the 40.5MHz side of the signal, and then add an extra flip-flop to latch it across to the 50MHz ethernet side.

I won't be surprised if this works, as when the ethernet stuff used to work perfectly, it was probably when the CPU was at 50MHz, and thus there was no need to do cross-domain clocking stuff in the ethernet controller.  The only question will be if there are other signals crossing between the two sides that I need to handle.

However, as seems to perpetually be the case with sorting this ethernet adapter out, it still doesn't work.  BUT it has fixed the random insertion of an extra dibit into the preamble. It hasn't fixed the TX state machine locking up, however. So there are two pieces of evidence that something fishy is still going on.

BUT we have progress: The dropped packet rate HAS dropped, and all of the dropped packets are actually due to truncated packets, i.e., the problem we discovered earlier where the length of received packets is incorrectly recorded. So maybe we ARE on the home run after all...

To find out more about the truncated packet problem, I tried a little experiment. Basically I suspect that the length field of the wrong receive buffer is updated either all of time or at least some of the time.  So I ran two separate ping sessions with the same packet size as each other, and then again with different sized packets.  If the problem is recording packet lengths in the wrong receive buffers, then we should see a difference in the number of truncated packets.

So, let's try that, first with 900 byte ping packets on both sessions: 2% and 3% packet loss in the two sessions, i.e., about what we see doing a single session.

Now lets do it again with 900 byte ping packets and 100 byte ping packets: 30% and 12% packet loss, and bizarrely, one duplicate.

The packet loss is clearly higher with different size packets, and is higher for the longer of the two lengths.  Both point to the length of the wrong packets being used, as for the shorter packets, the PING reply packet can be properly constructed with too much data; its only with truncated data that it causes problems.  I can see further confirmation of this when I instrument the test programme on the MEGA65 to display the reported frame length when doing the double-ping with different packet sizes, and I can see that the wrong length is reported for some frames.

What is mysterious is how the wrong length can get written.  eth_frame_len is used to write the bytes of the ethernet frame as it is received.  This is then the value that gets written into the first 2 bytes of the RX buffer, so that the receiver knows the length of the received frame.  The end of frame is not indicated to the RX buffer rotator until 15 cycles after the last byte gets written, which should be stacks of time (especially since during ethernet RX one byte gets written every 4 cycles, and that seems to work just fine).

Starting to dig around in this: One part of the problem is resetting the TX finite state machine actually also resets the RX buffer allocations.  This causes some problems.  So we need to really fix the TX state machine lockup stuff properly.

Then what I am seeing with two pings running, is that it tends to go fine until some other packet is received, which presumably causes a pattern of RX buffer usage that tickles the problem as the following output from my test programme shows, where presumably two packets of 64 bytes each arrive, causing messed up packet lengths to be reported:

 346 34 ping!                  ca @a  
 146 43 ping!                  @b ab  
 346 35 ping!                  ac bc  
 146 44 ping!                  b@ c@  
 346 36 ping!                  ca @a  
 64 45 ping!                   @b ab  
 346 37 ping!                  ac bc  
 146 46 ping!                  b@ c@  
 346 38 ping!                  ca @a  
 64 47 ping!                   @b ab  
 146 39 ping!                  ac bc  

Each line begins with the size of the packet, the sequence ID for the ping, and then the buffer allocation information, both before and after processing the packet as two letters.  @ means RX buffer 0, a = RX buffer 1, .. c = RX buffer 3.  

Thus the first means that a 346 byte ping packet with sequence number 34 was received. Before the packet begins to be processed, the CPU can see RX buffer 3 and the ethernet controller will write into buffer 1 next, as indicated by the ca string. After the packet has been processed, the CPU is now looking at RX buffer 0, and the ethernet controller still at buffer 1, as indicated by the @a string, since no new packets have been received.

Next a 146 byte ping packet with sequence number 43 (from the second ping session) is received, and we see the RX buffers advance one position.

This repeats until we see that apparently we have received a 64 byte ping packet with sequence 45, when it should be 146 bytes long.

So we can see that the buffer lengths really are being mixed up. 

The more I look at it, the more it looks like the length of the received packet simply doesn't get written some of the time, as compared to being written into the wrong buffer.  This is rather odd, since the contents of packets get written without error, so it seems rather unlikely that this logic is faulty.  Even when we reject a frame due to CRC error or it not being addressed to our MAC address, the logic writes these length bytes.

It looks like we only successfully write the length of a frame around half to three-quarters of the time.  So this is quite a frequently occurring problem.  It seems like it should therefore have a pretty obvious cause in the VHDL...

The only thing that looks like it could have any impact is the logic for doing the cross-domain clock transfer of the write requests.  However, the fact that it works reliably for the bytes in each frame makes it seem rather unlikely that this is to blame. 

Nonetheless I have tweaked the logic for detecting the toggle for triggering a write to the RX buffers to see if that solves the problem.  The problem still persists, however, with packet lengths not being reliably written.  But it does seem to happen less often.  I'm not sure if I am happy about that or not.  On the one hand, its nice that it is getting closer, and has helped. But on the other hand, it makes no sense that it shouldn't work for writing the frame length bytes when it works very reliably to write the bodies of packets, without a single data error.

Okay: I think I might have found it after all that: There was no delay being added between writing the 1st and 2nd bytes of the frame length, when there should have been. That would cause writing the low-byte of the frame length to fail sometimes. So this sounds quite plausible as a mechanism.  The tidying up of the latching would also have helped reduce, but not eliminate the problem, so it all is consistent.  Let's see if it really fixes it...

... and it does! I can now run two ping sessions with different sized packets, and have zero packet loss.

Now the only problem is that the RX buffer management is still being a bit odd, with the last packet received not being available until the following packet arrives.  This only happens after some condition has been satisfied, however, as resetting the ethernet controller clears this behaviour.  Of course it only happened once, and I can't reproduce it, to see what is going on...

Ah, that's got it! It seems to be happening after the RX buffers have all been filled. I've fixed that, but now it looks like the ethernet RX IRQ is a bit sticky, which means that we sometimes see the same packet 2 or 3 times.  But that, I can live with for now, since the actual function of the ethernet controller is now basically as it should be, with zero packet loss, even if I run a ping flood from linux at it with 10,000 packets.  So I think I'll leave it there for today.

Friday, 21 August 2020

OpenROMs Update

Howdy folks,

Feral Child has been working through the European summer heat on the OpenROMs (our free and open-source C64/C65 ROM project), and has the following update for our reading and viewing pleasure:


Hi there! It's been a while since the previous Open ROMs update, time for a new one!

I'll start from not-so-good news - I'm running out of ROM space. On the stock C64 the Kernal + BASIC has to fit within 16 KB total; with the original C64 ROM set there are very few bytes left, and the code is already space-optimized most of the time. That means, if you reimplement the firmware (or mod the original one), it is very hard (or even impossible) to add any significant feature without removing something else. This is already clearly visible - for unexpanded ROM it is no longer possible to enable all the features the Kernal replacement offers, also there is not that much BASIC ROM area left. For example, I had to disable the built-in tape head alignment tool for the default build:

Fortunately, lack of space is not a problem on the MEGA65 - it has an abundance of extra ROM, which can be mapped anywhere in the C64 address space with 8KB granularity. Also, the Ultimate64 community started an interesting effort - additional Kernal ROM banks. Although the concept is quite different from what MEGA65 does, it should still be possible to take advantage of this extra space in the future.

So, what takes so much space? The official Kernal API is nearly complete at this moment, but feature-wise... it is now able to utilize two most popular drive mods: DolphinDOS and JiffyDOS (but the latter protocol is not fully implemented yet, LOADing is not as fast as it could be - but still much faster than with standard IEC transfer). Additionally, there is tape support - limited to loading programs, but can read both the standard format and TurboTape64. The turbo-saved files can be up to 250 blocks long (not all the implementations can read such a large chunks of data), and there are raster and sound effects (with normal format only raster effects are available - it uses such a long pulses, that for now every attempt to add SFX resulted in a very low buzzing sound, probably dangerous to the speakers in the long run). And that's not all - normal/turbo formats can be autodetected, you don't have to worry about special syntax to select the proper loading routine. No turbo implementation I know is able to do this!

BASIC also saw some work, although it still needs more effort before it is really usable. Besides various reworks and speed improvements, at last we have some variable support. For now only strings (including TI$, string arrays are mostly functional too!) - although some floating point routines are already written, they still need more testing, and are not exposed as variables yet. The expression parser (code that interprets mathematical expressions, including brackets and operator precedence) is already written too - for now it can be used to concatenate strings.

Our string handling is a little bit different than C64 one. Microsoft BASIC interpreter has a mechanism called garbage collector, which is triggered once there is so much garbage (obsolete strings - replaced with different ones, but still kept in memory), that it isn't possible to allocate any more memory - the original mechanism (used in PETs, VIC-20, and C64) is very ineffective (square computational complexity), the C64 can freeze for half a minute before the process is complete. Starting from CBM-II and C16/C116/C+4, an improved algorithm was implemented - much faster, but requires additional 2 bytes per each string, a so-called back-pointer. Open ROMs contains a similar mechanism, although it differs in details (there is no magic value to denote a temporary string). You can read more about these algorithms here:

Open ROMs has one more improvement over the original memory handling - the Commodore C64 ROM displays '38911 BASIC BYTES FREE' message on the startup banner, despite the machine having 64KB of RAM (not including the colour memory). 38911 is much less than 64KB - but this was done for a reason. First, the ROM needs some memory for itself, to store screen content and some other stuff - this takes away 2KB. Second, some of the RAM cannot be easily accessed by C64 BASIC, as it is hidden 'below' the ROM or I/O area. Old builds of Open ROMs displayed '61438 BASIC BYTES FREE', as they contained a specialized code (helper routines, which has to be copied to RAM) to access nearly all the C64 memory. But this code has two drawbacks: it is much slower, and utilizes some RAM which normally can be used by applications or games - which might easily lead to compatibility problems. Therefore, at some point I have disabled this mechanism (but it is still there, and can be enabled on custom builds) and implemented the C64-like behaviour. But now the default Open ROMs build displays '51199 BYTES FREE':

because we have another 'memory model', which is able to use RAM hidden below the $A000-$BFFF BASIC ROM - which can be accessed relatively cheaply if the memory access routines are placed in a high ROM area, $E000 or above. It does not use RAM below the I/O or the $E000-$FFFF ROM - it is harder to access safely, as one has to redirect the interrupt handlers back to the ROM. I hope this will remain as a default memory model for MEGA65 build - I'm not sure about the generic C64 one, as the required memory access routines need some extra ROM space.

That's all for now - let's hope not much will be found broken once I start testing the Open ROMs with the upcoming DevKit!

Tuesday, 18 August 2020

Debugging IEC port on MEGA65 R2/R3 boards

So we are almost finished testing the MEGA65 R3 board, but have hit a funny problem, that is now also showing up on my R2 board:  The IEC port is not working reliably.

The symptom is that the CLK and DATA  lines don't go to 0V when they are pulled low by the MEGA65, but stay at around 2V -- but only when a drive is connected.  

The only "drive" I have here in the desert with me is a Pi1541, so that will have to do.  Fortunately it has nice test points:


It uses a 7406 inverting hex buffer, to give maximum compatibility.

The Pi1541 uses 1K pull-ups, resulting in 5mA at 5V, while the MEGA65 uses NC7SZ126 drivers, which can sink 25mA. So the MEGA65 really should win, hands down.  But this is not what we see.

What is interesting, is that the problem occurs, even if the Pi1541 is not powered.  After being reminded that 7406's can be fried by hot-plugging drives and computers while powered on, I am now wondering if it isn't the 7406 that is fried.  Because the MEGA65's drivers are quite strong, I wonder if that hasn't increased the likelihood.

Basically I can't think of a better theory, since the problem happens only when the Pi1541 hat is connected to the MEGA65, even if I disconnect the hat from the Pi itself.  There just isn't really much more on the board apart from that and the pull-ups.  I did test that the pull-ups weren't too heavy by measuring the resistance between the lines, and also simulating 1K pull-up with a resistor directly in the MEGA65's IEC port.

So time to pull and replace the 7406. There are a few problems, though. First, I'm in the middle of the Outback with only limited facilities. I *thought* I had packed a set of hot soldering tweesers as part my soldering station, but that was, sadly, only my imagination speaking.  So that means its time for some quality time with my solder sucker. Except that seems to make as good a seal as a fly-screen door on a submarine.  So time to chop the legs of the 7406, and poke them out with the soldering iron one by one and replace the chip.  

Now about replacing that chip... I still live in the middle of the Outback.  Fortunately I do have my box of crazy electronics parts, which includes a good selection of early 1980s 74 glue chips. Unfortunately, none are 7406's.  BUT I have several 7405s, which are quite similar, but have lower maximum current sink capacity.  Depending on the manufacture it can be a limit of around 8mA instead of 30 or 40mA for the 7606.  8mA is probably sailing a bit close to the wind, but as the nearest spare 7406 is several hundred kilometres away, we'll give it a try. The worst that can happen is that we let the magic smoke out.

But while I am working through all of that, it occurred to me that the original problem of the IEC port not working properly might have been that I was controlling the IEC drivers on the MEGA65 main board incorrectly, pushing high, instead of tri-stating the output buffer, and relying on the pull-up resistors to float the lines high.  

Of course I realised this just after the German part of the team had gone to sleep for the night, and as I don't have any other drives here, I couldn't test it.  Anton has now tested it on his R2 and R3 PCBs, and his SD2IEC and 1581 each work on both boards.  However, his 1541 didn't work on the R3. Testing it on the R2 board found it also not working there. So now we are trying to work out if his 1541 is faulty, or if the MEGA65s are having problems communicating with real 1541s.

We know the problem is unlikely to be timing, because the other drives work.  

So we are now remotely going through the process of probing the IEC lines with his 1581 plugged in, which we know works.  We will then repeat the process with the 1541 plugged in, and see if we can spot any differences. As this can be sensitive to the level of the IEC signals, we will do it 8 times on each, with the different IEC line level combinations.  In IEC test the lines are cleared with 1,3 and 5, and set with 2,4 and 6 respectively.  So we will use that notation for the tests, and in all cases, probe pin 5, the data line.

The easy way

1581: pin 5 (data)

135 - HIGH 4.85 V

136 - LOW ~0V

145 - HIGH 4,85 V

146 - LOW ~0V

235 - HIGH 4.85 V

236 -LOW ~0V

245 - HIGH 4,85 V

246 - LOW ~0V


1581: pin 4 (clock)

135 - HIGH 4.85 V

136 - HIGH 4.85 V

145 - LOW

146 - LOW

235 - HIGH 4.85 V

236 - HIGH 4.85 V

245 - LOW

246 - LOW

1541: pin 5 (data)

135 - HIGH 4.53 V

136 - LOW

145 - HIGH 4.52 V

146 - LOW

235 - HIGH 4.52 V

236 - LOW

245 - HIGH 4.52 V

246 - LOW


1581: pin 4

135 - HIGH 4.51 V

136 - HIGH 4.51 V

145 - LOW (always at around 0.08V)

146 - LOW

235 - HIGH 4.51 V

236 - HIGH 4.51 V

245 - LOW

246 - LOW


So, those all look pretty much the same.  This makes me think that maybe there is still a CPU speed problem somewhere. If so, its rather odd, because I can run the freeze-combined.prg test programme, and while it isn't perfectly meta-stable, its accurate to 99.9% or so of C64 CPU speed (the only difference is we are not currently doing the write(s) of an instruction during the 3-cycle BA setup time).

Investigating further, we find that the first time trying to access the 1581 from C64 mode, we get a device not present error -- but 2nd time it works.  So we tried again from C65 mode, and now the 1541 is responding.  Nothing else has changed, except maybe that the 1541 has warmed up a bit.

Yes, and now its also responding from C64 mode.

Rechecking voltages on the 1541, its now upto about 4.86V, so we can't (yet) rule temperature out. But even if it is temperature, we need to know what is going wrong, so that we can actually make it work with all drives all of the time.

Anyway, this is all feeling weirder, although my gut still tells me that some timing issue is still not quite right, and is the cause.  The trick is how we can capture good and bad traces, so that we can play "spot the differences" and work out the problem.  One of those cheap USB data loggers and an IEC extension cable that taps them off might be a good path here. Again, not something I can easily conjure up in the desert, and if I don't get the Pi1541 working, I don't have a way to test anyway, so it will be depending on the rest of the team back in Germany to help me further on this one.

Another option is to use two MEGA65's connected via their IEC busses, with one sniffing and logging the bus traffic, while the other talks to the drive.  Alternatively, we could use something like Vivado Integrated Logic Analyser, that can log various signals to BRAM in the FPGA, and then later visualise them.  As the R3 board has more BRAM than the R2, there is actually some free BRAM, enough to be able to try this. But first, some sleep, then to think about the best path of attack.

But before I do hit the sack, we tested using a C64 ROM on the MEGA65, and also using drives set to device 11, and leaving the MEGA65's internal drive sitting on 8 and 9. And that DOES work will all drives. So it looks like we are tickling bugs in the C65's not completely finished DOS -- or the ROM patch to change the device number needs a bit of further tweaking.  But that can all wait for another blog post.

Sunday, 16 August 2020

MEGA65 R3 PCB Testing

This week we have been testing all the changes made on the R3 PCB. The pain has been that we have a very tight time-frame for this, and I don't have one of the new R3 PCBs here with me.  

In short, we have to tell Trenz in the next day or so, whether there are any known problems with the board, so that they can order the bare PCBs in time for the DevKits to go out later in the year.  But COVID19 and logistics have thrown a spanner in the works, because I am here in the Australian Outback -- some 17,000km from the test boards in Germany -- and international freight is, shall we say, a little problematic at the moment.

So I am having to do the whole PCB bring-up stuff remotely.  Fortunately the R3 board is very similar to the R2 board, with just a few fix-ups in various places. The list of things that have changed in some way, includes:

1. Pull-ups included on all ports that were missing them on the R2.

2. POT line support on the joysticks, in a way that is compatible with the 1351 mouse and paddles.

3. SRQ line on the IEC port is now fully present.

4. Support for 2nd internal floppy drive on the 34-pin connector.

5. Internal speaker is now stereo, and using a different amplifier.

6. digital video / audio output now uses a direct FPGA connection instead of ADV7511 buffer chip.

(1) was easy to test: If the machine boots and works with joysticks and cartridges etc, without soldering anything on the board, then it is fine, which it is.

For (2), this was also fairly straight-forward, in that hooking up a mouse and testing it with a little BASIC programme is pretty simple.

(3) and (4) were also fairly straight-forward. The real problems with remotely testing are with (5) and (6), as I can't just poke and prod with my oscilloscope.  Even just checking if new registers are showing up becomes much harder.  We have had to coordinate across the oceans and time-zones, and do a lot of remote-desktop control, so that I can interact with the R3 PCB from afar:

To add to the fun, it looks like one of the three sample boards has a fault with the IEC port circuitry.

We have of course organised to send one of the boards here, but with COVID19, its really hard to predict when the board will make it here to Australia. Then when it does arrive in Australia, it has to get to where I live at the moment.  Normally I live in a large city, but not this year. This year for various reasons I am living in the middle of the Australian Outback. While this has a number of nice benefits, such as seeing pretty wild flowers, our fun lizards and other bush critters, and even a bit of opal hunting, as the following pictures show, it does also create some logistical problems...

Sturt's Desert Pea, the state's national flower, which is almost impossible to grow purposely, but sprouts up in the Outback, wherever it feels like.

Bearded Dragon warming itself on the road.

Edge-on view of opalised fossil cockle shell.

But back to those logistical problems...

I won't give my actual address here, but let's suffice by saying that it can be approximated as something like:

"The house the other side of the caravan park in the tiny village in the middle of the Outback, about 135 km from the nearest sealed road."

For the postal service, life is relatively simple: There is a private bag for the village (kind of a postal version of the old party-line phone lines), which gets picked up from the nearest regional centre (about a 6 hour drive away!), and then delivered to us on an Outback Mail Run twice a week.  While things might not be quite as tenacious as they were in the 1950s when Tom Kruse was the famous Outback Mailman, the posty still has to drive that 135km of rough, stony and corroguated dirt road.  

Where the post can go wrong, is if the post office in the city puts the post into the wrong private bag.  Then it goes to the wrong Outback Station, who hopefully realise and put it back into the next bag for collection by the posty.  That adds a week, then add another week for it to get back to us here. And that's assuming it doesn't sit on someone's mail shelf for a week or two, before they realise it isn't for one of their workers, and put it back in the bag, and then convince the post office to not put it straight back into their bag again the next week.  In short, the post can take upto 6 weeks to get here, if things go a bit pear shaped.  Even Tom Kruse was faster than that in his old Lleyland!

But for international couriers, its even more annoying: NO couriers deliver to Arkaroola, where we live. The tourist enterprise here has to ship their own supplies in, including all food, because they can't get any delivery service apart from the post. The nearest point that they will deliver is the end of the sealed road 135km away.

Fortunately we know folks who run the caravan park there, and they are kind enough to hold couriered parcels for us, and bring them over when they visit. So in theory we can get a parcel with the R3 PCB sent to them, and then get it over here quickly after that at the standard Outback surcharge of a cold beer. But first we have to get it to them...  And that's where COVID19 is playing jip with everything: International freight is quite impacted, so it might take a few weeks to get here. Or a few days. Noone can tell us, and DHLs tracking page is also not really telling us anything useful, only that the parcel has been cleared for departure from Frankfurt. 

And in the meantime, we continue to debug remotely...

Friday, 7 August 2020

MEGA65 Revision 3 PCB Samples For Testing

 Another exciting moment this week: We have the samples of the Revision 3 PCBs for the MEGA!  Pretty pictures follow, along with some discussion along the way.

First up, we can see that the PCB is quite similar to the Revision 2.  In fact, it has as few changes as we could get away with, while fixing various niggly little problems:

This view more from above shows that we have much nicer silk-screen artwork, clearly labeling it as a MEGA65, recognising Trenz, MEGA, and the MOULDS TEAM, the tiny entity we had to setup to handle the moulds funding stuff:
We'll cover some of these things a bit more in some of the photos coming up, but we can see here that there is an extra 2-pin connector on the right hand-side, so we can now support stereo internal speakers, complete with 2W amplifier, so you don't need to worry about having a monitor with speakers if you want to kit your MEGA65 out for easy party use:

A bit closer view of some of the connectors.  The keen-eyed will notice some changes to the digital video output area: We have changed from the ADV7511 display driver to directly driving the digital video from the FPGA, to solve problems we had with sound on the ADV7511.  The 3.5mm audio jack also has the pinout corrected, and works for stereo output straight up now:

The power switch and cartridge port area has been reworked to allow more room for super-wide cartridges, as well as adding missing pull-ups to the cartridge port and joystick ports:

And perhaps the biggest change, in terms of long-term effect, is that we now have an A200T FPGA instead of the A100T.  This means about 2x the logic capacity and almost 3x the internal RAM (BRAM) in the FPGA. This means we can support other more complex machines more effectively.  AGA Amigas should be absolutely no problem for example.
From the back, the main difference to see is all the nice pull-up resistors for the expansion port:
Back to the front, from top left going anti-clockwise, we have the internal SD card, PMOD expansion connectors, dual floppy power connectors, and the 34-pin floppy connector.  The floppy connector now has all the signals to support two drives connected at the same time:
Here we have the nice MEGA logo, and the keyboard connector above it.  The red connector and normal header are both connected together. This just gives us some freedom as to which style cable we use.  The keyboard header can also in principle host some simple expansions, as it has lines direct to the Xilinx FPGA.  Speaking of FPGAs, we also see the MAX10 service FPGA.  This FPGA does some power house-keeping, and also gives us the eventual posibility of making the main FPGA boot from a bitstream stored on the SD card, so that you aren't limited to what can fit in the 32MB flash that the Xilinx normally boots from. We also see the unpopulated JP21, which has 12 lines directly connected to the MAX10 FPGA, and can be used for internal expansions as well.  We're in the process of making a mechanism to remotely access that header from the Xilinx FPGA:
Here we have the monster A200T FPGA again:
As previously mentioned, we have the extra speaker connector to support stereo internal speakers. There are screw bosses in the injection moulded cases to accept some really nice 2W 31mm diameter speakers we found for using the in MEGAphone. We won't be supplying these in the MEGA65, but you have the connectors there to add some nice loud internal speakers:

Finally, we have the overall top view, before I sign out and start working in earnest getting the board fully tested, so that we can commit to manufacture for the DevKits!

Tuesday, 28 July 2020

MEGA65 Emulator and Tools Live ISO

Snoopy, one of the MEGA65 enthusiasts who frequents forum64 has done some amazing workk, making a live ISO image containing LGB's MEGA65 emulator, the MEGA65 documentation and various other goodies.  You can follow the thread directly here, but for your viewing convenience, and with Snoopy's permission, here's the low-down:

As mentioned and discussed in this thread, I have written some toolscripts, configured them for suitable use and build now a new iso image for USB stick or DVD which let you boot a live system based on Xubuntu for an easy use of xmega65 and xc65 of the Xemu emulator.

Because of potential copyright issues the original C65 ROM is not included in the live system.

The live system offers you after booting the options to load the system ROM file (has to be named MEGA65.ROM) from a connected storage medium (USB stick, SD card, ...) or to download the ROM file from the website '' and patch it automatically to the latest C65 ROM patch "AH". The patch "AH" fixes some issues of the original ROM (e.g. more free BASIC10 memory and a work around for the "disc change bug"). Because the live system is read only, you have to repeat this procedure after every new start of the live system.

Please note, if you do not load or download the C65 ROM you can use xmega65 only with the included OpenROM from the Mega team.

The live system also includes the "Mega65 book" manual (the version from 18th of July 2020 with 734 pages) which can be updated to the latest available version with the "Update MEGA65 book" icon on the desktop. Because the live system is read only, you have to update the MEGA65 book after every new start if necessary.

Also included are a few disc images (read only, because it's a live system!), which you should copy first to an external USB stick or SD card. So you could use them also for saving data onto it with the emulator. In Xemu you can attach a disc image with a right mouse click.

The Xemu live system offers you a Xubuntu (64bit) system with English system language and four selectable keyboard layouts:

- English (US) (as default)

- German

- Hungarian (greetings to LGB-Z for his great work on the Xemu emulator!)

- Italian (there are some great C65 fans in Italy)

Please note that the Xemu emulator only uses English keyboard layout, regardless which system layout you have chosen.

You start the Xemu emulator with a click on the appropriate icon on the desktop:

'xmega65' for MEGA65 emulation:

'xc65' for C65 emulation:

Also included in the live system are some "everyday" programs like LibreWriter, LibreCalc, Leafpad, Calculator, GHex, Geany, Midnight Commander (MC) and the latest version of Firefox (Firefox 78.0.2 - 64-bit) with the most important bookmarks. :D

Handling of the iso file:

The iso image has to be copied to an USB stick (at least 2 GB big) or burn onto a DVD. That should be no great problem in most cases. To burn a DVD you can use nearly every CD burning program. To copy the iso bootable to a USB stick you can use an image writer tool (e.g. "ISO to USB" for Windows or using "Startup Disk Creator" with Linux/Ubuntu).

After done this, plug the stick into the PC or the laptop and turn it on ... and wait until it's ready. Nearly every computer which can handle 64bit systems and can boot from external stick should be good enough for the live system. I have tested it positive with four different computers (even an old "Samsung NC10 plus" netbook works).

You can also use the iso file within a virtual environment as e.g. Virtualbox. But it will be more or less slower as using a real computer.

Please note that every change you made to the system or any file in the live system will be lost after turning the computer off. You have to copy files and disc images, you wanted permanently save and change, to an external medium (usb stick, SD card, external harddrive, ...) and save the data on them, because the files on the live system are read only.

If you need the system passwords, there are:

user: mega65

password: mega65

root password: mega65

You can read the password infos also by clicking the desktop icon "Read me".

Installation of the live system onto your computer:

If you select "Boot system installer" at the very first boot screen of the iso file, you can install the live system onto your computer. Please follow the instructions shown on the screen.

Please note that my main topic was the live system, so I don't used much time for the installer and possible issues with it. Maybe it works on your computer, maybe not. ;)

Other boot options are:

- "safe graphics mode" (try this if your computer has some graphics trouble with the live system)

- "debug mode" (shows much more details during booting for debugging what's going wrong if something's going wrong)

You can download the iso file "xemu_live_system_20200724.iso" (1,72 GB) via https from this filehoster ( link:


Don't blame me for possible slow download speed or other traffic limitations, it's up to the filehoster. :zeig:

If you want to be (nearly) sure, that you have downloaded the correct and original iso file, you can use the md5sum for control. With Windows you can use e.g. the freeware "MD5 Checksum Tool" and in Linux e.g. the shell command md5sum -b xemu_live_system_20200724.iso for it.

The md5sum of the iso file 'xemu_live_system_20200724.iso' is:

4e4c845f21ed9f1ea0e48bab3dcb04c7 *xemu_live_system_20200724.iso

If your downloaded iso file has a different md5checksum, something has gone wrong. :achtung

I have checked the content of the ISO for possible copyright issues to the best of my knowledge. If anyone have legal issues with that live system, please contact me and I will delete the iso file from the filehoster as soon as possible!

To all other guys: Have fun with the Xemu live system!