Wednesday 14 September 2022

MEGAphone R4 PCB Bring-up -- Part 2

Now that I can control the I2C IO expander that drives the power rails, it's time to make sure that each power system fires up.

To do this, I am trying each power rail in turn, checking which indicator LED lights for it, and which isolation switch cuts it off: 

LED_D7 (VCC_FPGA?) stays on the whole time.
LED_D7_PS1 stays on when indicate button is pressed, except as noted below.

VCC_MODEM1 - Port 0, Bit 0 - Switch S5 (4th from top) - LED D7_PS2
VCC_MODEM2 - Port 0, Bit 1 - Switch S7 (5th from top) - LED D7_PS3
VCC_RFD900 - Port 0, Bit 2 - Switch S9 (6th from top) - LED D7_PS4 **
VCC_WIFI - Port 0, Bit 3 - Switch S8 (1st from top) - LED D7_PS6
VCC_SCREEN - Port 0, Bit 4 - no disable switch - LED D7_PS5
VCC_AMPLIFIER - Port 0, Bit 5 - no disable switch - no LED change visible
VCC_5V  - Port 1, Bit 5 - no disable switch - LED D6_PS7 **

** D7_PS1 dims or turns off completely when this rail is enabled.

So while some are working fine, some are being a bit funny. First, the modem controls, wifi and screen power controls are all fine.  But there is some funny interaction for the RFD900 and 5V (for microphones and some other bits) with the VCC_FPGA control. Specifically, the VCC_FPGA LED dimming or turning off when VCC_RFD900 or VCC_5V are enabled is quite odd.  

The VCC_FPGA is controlled by an SN74HC74 S/R latch, so that the Xilinx FPGA can turn itself off, and the Crab or another interrupt source can turn it on. Checking with the oscilloscope, turning on VCC_5V doesn't cause the 3.3V rail to the Xilinx FPGA to be disabled -- only the indicator LED for it.

Probing the VCC_FPGA LED with the oscilloscope, I can see that the pin that goes to GND to cause the LED to light is dropping a bit when VCC_5V is enabled. Basically it is sitting at 1.8V instead of floating to 3.3V when the indicator button is not pressed.

The problem seems to be that the INDICATORS line changes between 1.8V and 3.3V, which is then breaking the assumptions about the indicator LED circuit.  It might just be that the INDICATORS line needs a pull-up or pull-down resistor to dissipate some small current leakage from somewhere.  

The INDICATORS line is floating when inactive, and tied to GND when active. But it is of course also connected to the low side of each indicator LED through a 220 Ohm resistor.  So if one or more of the power supplies are on, then there will be some leakage. Also, the leakage will be proportional to the output voltage of the rail. This means that VCC_5V will affect things differently to the rest. And the RFD900 power rail is also 5V, and that's the other one that causes the odd behaviour.

So I think we have a root-cause analysis of the problem here. The probable best solution is to put diodes between the low side of each indicator LED and the INDICATORS line, so that there is no back-flow between the power supplies, thus keeping them isolated from one another.

Next step is to check that the power rails do actually switch, by checking power pins on the various sub-systems' connectors.  After that we will check the Xilinx FPGA power control system, to make sure that we can turn the FPGA on and off as required.

It's a bit fiddly to have to flip the board over and load a different bitstream every time I want to change the settings, so I will do this by beginning to implement the serial control protocol between the Xilinx and Crab FPGAs, but for now routing it to the ESP32/sound module header, where it is easy for me to get at a couple of pins.  Then I can use an FTDI USB serial cable to communicate with the board and select the power rails to turn on and off, and have it report which power rails are on and off, so that I can debug this comfortably. It will also mean that I get a jump start on this communications protocol for the complete system.

When we do set up the communications protocol with the Xilinx FPGA, we have 4 IO lines between it and the Crab.  As both have stable clocks, we can just implement two UART links.  As the Crab will be providing the interface to higher-speed services, such as WiFi and cellular data, it's probably a good idea to set up two UART interfaces, one of which is dedicated to the current high-speed serial link, while the other carries low-speed communications and the control messages to select the high-speed link, and to control the power rails and other signals.

So let's start by getting a loop-back on the serial lines, and the FTDI cable set up, so that we have the communications channel available to us. On the Crab header pin P2_8 is the TX line to the ESP32 header, and P2_9 is the RX line.  These should correspond to GPIO_9 for TX, and GPIO_6 for RX, according to the Crab.

A UART loopback is super easy to implement, as we just feed the input on the RX pin to be output on the TX pin. With a bit of cable spaghetti, I was able to quickly confirm I had all the pins correctly selected, by typing into a terminal program and confirming that the characters are echoed back out to me.

Next stop is to implement the UART control protocol, to allow at least setting and reading the power rails.  I'll need a UART TX and RX module, and a bit of logic to implement the state-machine for processing the serial data.

First step on that path will be to implement a UART TX module, and confirm that I can receive output from it.  We want the link to be as fast as possible, so I will aim for 4Mbps while using the FTDI USB serial adapter. Once integrated into the MEGA65 core, we may be able to use an even higher data rate.  The Orange Crab is fed by a 48MHz clock, so 4Mbps is a simple divide by 12.  The MEGA65 core itself uses 40.5MHz, which doesn't divide so neatly, requiring 10.125 clocks per tick. I'll deal with that when I get to it, as there are ways to get around that.
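The divider arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation, not the FPGA code itself, and the helper names `clocks_per_bit` and `baud_error_percent` are made up for illustration.

```python
from fractions import Fraction

def clocks_per_bit(clock_hz: int, baud: int) -> Fraction:
    """Exact number of clock cycles in one UART bit period."""
    return Fraction(clock_hz, baud)

def baud_error_percent(clock_hz: int, baud: int) -> float:
    """Bit-rate error (in percent) if the divider is rounded to an integer."""
    divider = round(clocks_per_bit(clock_hz, baud))
    actual_baud = Fraction(clock_hz, divider)
    return float((actual_baud - baud) / baud * 100)

# Clock frequencies are the ones quoted in the text.
for name, clock_hz in (("Orange Crab", 48_000_000), ("MEGA65 core", 40_500_000)):
    exact = float(clocks_per_bit(clock_hz, 4_000_000))
    error = baud_error_percent(clock_hz, 4_000_000)
    print(f"{name}: {exact} clocks per bit, {error:+.2f}% error with an integer divider")
```

Simply rounding 10.125 down to 10 would make the 40.5MHz side run about 1.25% fast, which is one of the possible ways around the awkward ratio, since UARTs usually tolerate a small mismatch like that.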

After some fiddling, I have the UART TX working, but I did have to drop it to 2Mbps, as 4Mbps wasn't working with the FTDI cable I have here. Next step is to implement UART RX.  I'll start by implementing a simple listener for specific characters to perform specific functions.

I must say, working with the open-source yosys FPGA toolchain is really nice: Synthesis is still taking less than 10 seconds for this admittedly simple design.  But given the Xilinx Vivado tools take over a minute just to write out a bitstream for a design after synthesis, the reduced time wasting is profoundly beneficial: I don't have to keep myself on task for half an hour between synthesis runs, but can rather debug things with fast iteration, just like we are all used to with normal software.

And as a result, I have the UART RX confirmed working in just a few minutes. For now, I am just having it display an identification banner, rather than doing anything specific with it.  But I can now start to implement control sequences.  

I am thinking that I will eventually shift from normal 8-bit serial characters to 9-bit ones, so that I don't have to muck about with escape characters. The result will be 11 bits per byte sent, instead of 10, so 10/11 = ~91% efficiency. This is a bit lower than the average performance if using an escape character. But as escape characters can result in worst-case performance of 50%, and are just fiddlier to manage, it seems a reasonable approach.  In particular, not having to have a state machine will be super nice. It's only a pain to work with 9-bit bytes with normal terminal software, that doesn't support them -- neither minicom nor cu on Linux support them, for example. But once I implement the other end of the protocol in the Xilinx FPGA, it won't matter.
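To put rough numbers on that trade-off, here is a small Python sketch. The escape scheme in it is hypothetical (one reserved byte value, sent as two bytes whenever it appears in the payload), as are the function names; it is only meant to show where the 91% and 50% figures come from.

```python
def nine_bit_relative_efficiency() -> float:
    """Throughput of 9-bit framing relative to plain 8N1 framing.

    8N1 puts 10 bits on the wire per payload byte (start + 8 data + stop);
    a 9th data bit makes that 11.
    """
    return 10 / 11

def escaped_relative_efficiency(payload: bytes, reserved: set) -> float:
    """Throughput of an escape-based scheme relative to plain 8N1 framing.

    Assumption: every payload byte whose value is in `reserved` costs two
    bytes on the wire (escape prefix + encoded value).
    """
    if not payload:
        return 1.0
    wire_bytes = sum(2 if b in reserved else 1 for b in payload)
    return len(payload) / wire_bytes

ESC = 0x1B  # hypothetical escape value
print(f"9-bit framing:        {nine_bit_relative_efficiency():.1%}")
print(f"escape, even spread:  {escaped_relative_efficiency(bytes(range(256)), {ESC}):.1%}")
print(f"escape, worst case:   {escaped_relative_efficiency(bytes([ESC]) * 100, {ESC}):.1%}")
```

With evenly spread data the escape scheme loses well under 1%, but a payload made entirely of the reserved value halves the throughput, whereas the 9-bit cost is fixed and needs no unescaping logic.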

In the meantime, I'll stick to 8-bit bytes, and just implement the control sequences using the 256 character codes that allows me. The other 256 character codes that the 9th bit will grant I will use for the serial data itself.  Also in the meantime for debugging, I can use the 2nd UART between the two FPGAs to carry the data.

So in terms of control codes, we need ones to turn on and off each of the power rails, as well as one to query the state of the various inputs. I'll start with being able to set and query the power rails.  

The first step towards this is enabling us to set and scan the values for all three IO expanders that are on the I2C bus, and verify that this is working.  Then I can start implementing the serial commands to set and clear each line, as well as providing regular updates on the state of the input pins, e.g., from the D-PAD, buttons and joystick port.

I have the initial code in place to do this, but it is currently reading $FF on all the bytes, which makes me suspect that it isn't working correctly.  There is also a funny problem where receiving characters by serial stopped working.  I might fix that one first, and then start moving forward again on debugging the I2C communications.

...Except of course when I go to test it, it seems to be suddenly working again. So maybe it was just an issue with a loose connection, or the FTDI USB serial adapter driver got confused.  Then it started happening again, and using the oscilloscope I was able to confirm the FTDI cable is not outputting any serial data, and even looping the TX and RX lines together to make a loop-back, doesn't result in any serial activity: So there is some problem with the FTDI adapter or its driver.  Disconnecting and reconnecting the USB port seems to fix it most of the time.

Right-o. So now back to the reading of the I2C busses... They are now all reading as $00 rather than $FF, but that could be because I had disconnected the battery overnight: Yes, connecting the battery makes it go to $FF again.  

So now to investigate the I2C connections.  One quick and simple test is to see if the buttons cause any of the bits to go to zero: If so, then we are probably reading the ports correctly... and there is no sign of response. So time to do some I2C captures, and see if I can see what is going wrong. I may also be able to run some simulations, although I am still hampered on that front by not being able to get the simulated I2C traffic to be able to be automatically decoded and analysed.  

In theory, sigrok should be able to decode the I2C traffic.  The PulseView GUI can open a VCD file produced by the simulation, but seems to treat it as though it is only a single sample, thus preventing analysis.  This is due to this bug, which is in turn caused by this other bug dating back to 2017. It was apparently incorporated into mainline of sigrok in 2020, so why isn't it available to me on Ubuntu 20.04? Maybe it was too early.

Time to try building libsigrok from source, and see if I can't get the support in that way.  Well, that was a great way to waste several hours: Trying to get the latest version of pulseview to run on Ubuntu 22.04 using any method at all is quite problematic. It won't build from source.  There is an AppImage of it, which has some well-documented problems with Ubuntu 22.04, which can be worked around, but then still refuses to read the VCD files. And I can't find any decent tool that will convert from VCD to PulseView's native .SR file format.

I think it will be faster and more effective for me to make a simple I2C decoder that takes in raw SCL/SDA data, or reads them from a VCD file, and just decodes the protocol that way. I shouldn't need to resort to this, but such is life.  I can't be bothered wasting any more time doing it the "normal" way.
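The core of such a decoder is small. The following Python sketch is not my i2cdecode tool, just an illustration of the idea: it works on a list of (SCL, SDA) samples and leaves out the VCD parsing entirely. The function names and the event tuple format are invented for this example.

```python
def decode_i2c(samples):
    """Decode (scl, sda) samples into I2C events.

    Returns a list of ("START",), ("STOP",) or ("BYTE", value, nack) tuples.
    A repeated START shows up as another ("START",) without a STOP before it.
    """
    events = []
    bits = []
    active = False
    prev_scl, prev_sda = 1, 1  # the bus idles with both lines high
    for scl, sda in samples:
        if prev_scl and scl:
            # SDA moving while SCL stays high marks START (fall) or STOP (rise)
            if prev_sda and not sda:
                events.append(("START",))
                active, bits = True, []
            elif not prev_sda and sda:
                events.append(("STOP",))
                active, bits = False, []
        elif active and not prev_scl and scl:
            # Rising SCL edge: sample one bit, most significant bit first
            bits.append(sda)
            if len(bits) == 9:  # 8 data bits followed by the ACK/NACK bit
                value = 0
                for bit in bits[:8]:
                    value = (value << 1) | bit
                events.append(("BYTE", value, bits[8]))
                bits = []
        prev_scl, prev_sda = scl, sda
    return events

def encode_byte(value, nack):
    """Test helper: samples that clock out one byte plus its ACK/NACK bit."""
    samples = []
    for bit in [(value >> i) & 1 for i in range(7, -1, -1)] + [nack]:
        samples += [(0, bit), (1, bit), (0, bit)]
    return samples

# Idle, START, two bytes, STOP
samples = ([(1, 1), (1, 0), (0, 0)]
           + encode_byte(0x48, 0) + encode_byte(0x00, 0)
           + [(0, 0), (1, 0), (1, 1)])
events = decode_i2c(samples)
for event in events:
    print(event)
```

The stray bit that gets sampled on the SCL rise just before a repeated START or a STOP is thrown away when the START or STOP is detected, which is why the bit buffer is cleared in both cases.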

There are lots of good examples like this, that describe how I2C works.  The protocol itself is fairly simple.  It hasn't taken me long to get my i2cdecoder to work to the point where I can see something like this:

$ make i2cdecode  && ./i2cdecode test.vcd
make: 'i2cdecode' is up to date.
INFO: Found module scope 'testbed'
INFO: SCL signal is indicated by !
INFO: SDA signal is indicated by "
INFO: Found module scope 'blink'
INFO: SCL signal is indicated by !
INFO: SDA signal is indicated by "
INFO: Found module scope 'i2c_master'
INFO: SCL signal is indicated by !
INFO: SDA signal is indicated by "
INFO: Found module scope 'xilinx_uart0_rx'
INFO: Found module scope 'xilinx_uart0_tx'
DEBUG:            14440 I2C START
DEBUG:            19200 I2C START CLOCK LOW
DEBUG:           182720 I2C byte = $20 (nack=1)
DEBUG:           355880 I2C byte = $00 (nack=1)
DEBUG:           529040 I2C byte = $00 (nack=1)
DEBUG:           553160 I2C RE-START
DEBUG:           557920 I2C START CLOCK LOW
DEBUG:           721440 I2C byte = $21 (nack=1)
DEBUG:           894600 I2C byte = $ff (nack=0)
DEBUG:          1067760 I2C byte = $ff (nack=1)
DEBUG:          1091880 I2C RE-START
DEBUG:          1096640 I2C START CLOCK LOW
DEBUG:          1260160 I2C byte = $20 (nack=1)
DEBUG:          1433320 I2C byte = $04 (nack=1)
DEBUG:          1606480 I2C byte = $00 (nack=1)
DEBUG:          1779640 I2C byte = $00 (nack=1)
DEBUG:          1803760 I2C STOP


This has already revealed the first bug: The I2C address byte, i.e., the first byte after a START condition, should be $48 -- $4D, depending on the I2C IO expander, and whether we are reading or writing (read addresses are the write address + 1).  When I added support for the multiple IO expanders, I messed up the selection and shifting of the upper part of the I2C address.  With that fixed, now I see something like this:

DEBUG:            14440 I2C START
DEBUG:            19200 I2C START CLOCK LOW
DEBUG:           182720 I2C byte = $48 (nack=1)
DEBUG:           355880 I2C byte = $00 (nack=1)
DEBUG:           529040 I2C byte = $00 (nack=1)
DEBUG:           553160 I2C RE-START
DEBUG:           557920 I2C START CLOCK LOW
DEBUG:           721440 I2C byte = $49 (nack=1)
DEBUG:           894600 I2C byte = $ff (nack=0)
DEBUG:          1067760 I2C byte = $ff (nack=1)
DEBUG:          1091880 I2C RE-START
DEBUG:          1096640 I2C START CLOCK LOW
DEBUG:          1260160 I2C byte = $48 (nack=1)
DEBUG:          1433320 I2C byte = $04 (nack=1)
DEBUG:          1606480 I2C byte = $00 (nack=1)
DEBUG:          1779640 I2C byte = $00 (nack=1)
DEBUG:          1803760 I2C STOP


That looks better. Now to give it a quick test, since that only takes a minute or so with the open-source yosys FPGA synthesis tools...
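For reference, the $48 and $49 in that trace are what you get from the usual I2C convention of shifting the 7-bit device address left by one and putting the read/write flag in the lowest bit. A quick Python sketch of that, assuming the three IO expanders sit at 7-bit addresses $24 -- $26 (the helper names are just for illustration):

```python
def address_byte(device: int, read: bool) -> int:
    """First byte after START: 7-bit device address, then the R/W flag in bit 0."""
    return ((device & 0x7F) << 1) | (1 if read else 0)

def split_address_byte(byte: int):
    """Inverse of address_byte(): returns (device, is_read)."""
    return byte >> 1, bool(byte & 1)

for device in (0x24, 0x25, 0x26):
    write, read = address_byte(device, False), address_byte(device, True)
    print(f"device ${device:02x}: write ${write:02x}, read ${read:02x}")
```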

That's a nice step forward: I now see something like this from the UART info:

MEGAphone CTL0 07000000003f5

What's even better, is that if I press either of the buttons on the S3 switch, I can see the change in the 2nd hex digit of that string to something like:

MEGAphone CTL0 06000000003f5

This means that we are now reading the ports correctly, and presumably that I am setting the ports up at least half-way correctly.  The other buttons and D-PAD don't, however, cause any change. So I'll have to check that I have those set up correctly in terms of data direction.

To help, I'll improve my I2C decoder to show the actual I2C register reads and writes, so that I can have increased confidence that it is doing things properly, or indeed, to show up where it is doing things incorrectly.  I also had to extend the run-time of the simulation, to capture the complete rotation of I2C accesses that we perform: This is because it rotates through all three I2C IO expanders, and for each of those, it rotates through which registers it writes during the setup phase.  After that, it settles into a constant loop of just writing the port outputs to them. 
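The register-level view is a thin layer on top of the byte-level decode. As a rough Python sketch (again, not the real decoder): treat the first byte of each write as the register number, and let the register pointer advance by one for every data byte written or read. That simple auto-increment model is an assumption made for this illustration, and the function name and input format are invented.

```python
def decode_registers(transactions):
    """Turn byte-level I2C transactions into register-level log lines.

    Each transaction is (address_byte, data_bytes), i.e. everything between
    one START (or repeated START) and the next.
    """
    pointers = {}  # current register pointer per device
    lines = []
    for address, data in transactions:
        device, is_read = address >> 1, bool(address & 1)
        if is_read:
            reg = pointers.get(device, 0)
            for value in data:
                lines.append(f"Device ${device:02x},  read reg ${reg:02x} == ${value:02x}")
                reg += 1
        else:
            if not data:
                continue
            reg = data[0]  # the first byte of a write selects the register
            for value in data[1:]:
                lines.append(f"Device ${device:02x}, WRITE reg ${reg:02x} <= ${value:02x}")
                reg += 1
        pointers[device] = reg
    return lines

# The bytes from the corrected trace shown earlier
trace = [(0x48, [0x00, 0x00]), (0x49, [0xFF, 0xFF]), (0x48, [0x04, 0x00, 0x00])]
for line in decode_registers(trace):
    print(line)
```

A write that carries only the register number produces no WRITE line at all and just moves the pointer, which is how a plain "select register, then read" sequence should decode.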

The start of the sequence looks like this in the capture:

// On the first IO expander, write to register 0, read registers 1 and 2, and set registers 4 and 5 to $00 and $00. 

INFO: Device $24, WRITE reg $00 <= $00
INFO: Device $24,  read reg $01 == $ff
INFO: Device $24,  read reg $02 == $ff
INFO: Device $24, WRITE reg $04 <= $00
INFO: Device $24, WRITE reg $05 <= $00 

Okay, so at this point, we already have a problem: That write to register 0 should not happen: It should select register zero, but not write to it, so that when the reads occur, they are of registers 0 and 1. I believe the problem is that there is some funny behaviour when our I2C master switches from write to read, that it can sometimes perform an additional action. I'll have to have a think about how to fix that.  But, again, having a nice simple decode of the I2C traffic from simulation is making this all much easier to get to the bottom of.

So with simulation, I have confirmed that the register number is written twice during the write-to-read sequence, thus causing the problem.  The question now is how to fix it.

Oddly, this problem happens only on the very first I2C transaction we do.  After that, it correctly writes the value only once.  Given the register that gets written to is register $00, which is a read-only port, we can safely ignore the problem.  So let's look at the rest of the stream:

First, we have our erroneous write to a read-only register:

INFO: Device $24, WRITE reg $00 <= $00

This then causes the following reads to read from incorrect registers, which will get sorted out next time we read from them, so we can also ignore this:
INFO: Device $24,  read reg $01 == $ff
INFO: Device $24,  read reg $02 == $ff

Then we write $00 into registers 4 and 5 (polarity inversion bits for the IO pins), which happens correctly:

INFO: Device $24, WRITE reg $04 <= $00
INFO: Device $24, WRITE reg $05 <= $00

Now as this isn't the very first I2C transaction, we don't see the erroneous write to register 0, and instead we correctly read registers 0 and 1 from the second IO expander (note the device ID has been incremented by one):

INFO: Device $25,  read reg $00 == $ff
INFO: Device $25,  read reg $01 == $ff

We then clear the polarity inversion for this 2nd IO expander:

INFO: Device $25, WRITE reg $04 <= $00
INFO: Device $25, WRITE reg $05 <= $00

Now we are up to the 3rd IO expander, and again, we read those first two registers correctly:

INFO: Device $26,  read reg $00 == $ff
INFO: Device $26,  read reg $01 == $ff

And also clear its polarity inversion bits correctly:

INFO: Device $26, WRITE reg $04 <= $00
INFO: Device $26, WRITE reg $05 <= $00

So now we have the 2nd round where we write the direction bits for each of the IO ports. A 1 means input, and a 0 means output.  Again, the first action for each IO expander is to read the input ports:

INFO: Device $24,  read reg $00 == $ff
INFO: Device $24,  read reg $01 == $ff

Then we set those direction bits, making most of the pins on the first IO expander be inputs:

INFO: Device $24, WRITE reg $06 <= $c0
INFO: Device $24, WRITE reg $07 <= $df

Then we have the 2nd IO expander, again, reading the input register and then writing to the data direction registers. This time, most of the pins are outputs:

INFO: Device $25,  read reg $00 == $ff
INFO: Device $25,  read reg $01 == $ff
INFO: Device $25, WRITE reg $06 <= $c0
INFO: Device $25, WRITE reg $07 <= $40

And then finally the 3rd one, which we set to be mostly inputs:

INFO: Device $26,  read reg $00 == $ff
INFO: Device $26,  read reg $01 == $ff
INFO: Device $26, WRITE reg $06 <= $bf
INFO: Device $26, WRITE reg $07 <= $ff
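Those direction values are easier to sanity-check when expanded into individual pins. A small Python sketch, assuming register 6 holds the direction bits for port 0 and register 7 those for port 1 (the helper names are made up):

```python
def pin_directions(reg6: int, reg7: int):
    """Expand two direction registers into (port, bit, direction) tuples.

    A 1 bit means input and a 0 bit means output, as described above.
    """
    pins = []
    for port, value in enumerate((reg6, reg7)):
        for bit in range(8):
            direction = "input" if (value >> bit) & 1 else "output"
            pins.append((port, bit, direction))
    return pins

def count_inputs(reg6: int, reg7: int) -> int:
    """Number of pins configured as inputs across both ports."""
    return sum(1 for _, _, d in pin_directions(reg6, reg7) if d == "input")

# Direction values written in the simulation capture
for device, (reg6, reg7) in {0x24: (0xC0, 0xDF), 0x25: (0xC0, 0x40), 0x26: (0xBF, 0xFF)}.items():
    inputs = count_inputs(reg6, reg7)
    print(f"Device ${device:02x}: {inputs} inputs, {16 - inputs} outputs")
```

For $c0 and $df this gives outputs on port 0 bits 0 to 5 and port 1 bit 5, which lines up with the power rail bit assignments listed at the top of this post.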

Once this has been done, we now enter a loop, where for each IO expander, we read the pins, and update any write values that should be written to them:

INFO: Device $24,  read reg $00 == $ff
INFO: Device $24,  read reg $01 == $ff
INFO: Device $24, WRITE reg $02 <= $00
INFO: Device $24, WRITE reg $03 <= $00

Which it then does in an endless loop for the 3 IO expanders:

INFO: Device $25,  read reg $00 == $ff
INFO: Device $25,  read reg $01 == $ff
INFO: Device $25, WRITE reg $02 <= $00
INFO: Device $25, WRITE reg $03 <= $40
INFO: Device $26,  read reg $00 == $ff
INFO: Device $26,  read reg $01 == $ff
INFO: Device $26, WRITE reg $02 <= $bf
INFO: Device $26, WRITE reg $03 <= $ff
INFO: Device $24,  read reg $00 == $ff
INFO: Device $24,  read reg $01 == $ff
INFO: Device $24, WRITE reg $02 <= $00
INFO: Device $24, WRITE reg $03 <= $00

So, this all looks fine, provided I have the correct direction flags on all of the IO expanders.  I'm guessing I don't, which is why only the S3 buttons can be read.

This just leaves the mystery of why it's behaving oddly.

Ok, a bit of further prodding, reveals that both the S2 and S3 buttons can be read, oddly:

S3 buttons are read in "IO expander 0, port 0" bits 0 and 1,
S2 buttons are read in "IO expander 2, port 1" bits 4 and 5

When they should be "IO expander 2, port 1" and "IO expander 2, port 0", respectively. So they are getting rotated by one position.  That's easy enough to rotate back around. The real question is why the D-PAD lines aren't floating high. These should be in the lower 4 bits of the same port as contains the S2 buttons, and correctly have pull-up resistors:


The most logical explanation for this, is that I have somehow set the port to output on those pins, instead of to input. So a quick and dirty test will be for me to change the DDR on all the ports of all the IO expanders to input, and see what happens. I can also probe the pins of the IO expanders to see if they really are low or not.

Setting them all high hasn't solved the problem. So I must have some other issue with the I2C setup, or with the wiring.  I need to sleep now, but I think the next check will be confirming the voltage from the buttons is on the pins of the IO expanders where it should be, so that I know the problem is then just limited to the I2C communications.

So I have confirmed that the voltages are fine (with all the lines set to input), and swing as required. This means it is something with the I2C communications. I suspect that the issue with the bytes being rotated around from where they should be is related to the real issue. And now that I am looking at the correct outputs, I can see that the D-PAD is also working fine. The joystick port inputs aren't working properly, though. But that's most likely because I have the 5V rail turned off, so the voltage level converter isn't running. I'll check that as I go through the process of getting all the port direction initialisations correct, since we know that at least some of that is still wrong: the D-PAD etc were not working when I had them set as I expected they should need to be.

Right, so I have fixed the time at which I sample the read values, and now I have both bytes for each IO expander being sampled at the correct time. The only issue is that IO expander 0's bytes are showing up in IO expander 2. I finally realised the reason for this: The IO expanders are numbered U12 -- U14, but their I2C addresses are in descending rather than ascending order. This means I was setting up the wrong DDR values etc for them all. Well, actually, just the 1st and last were swapped.  So now to fix that...

So we are now finally at the point where the IO expanders seem to be under sane control, and we can read all the various inputs.  The next step then is to add support for setting the various outputs.

We have quite a few of these that we need to care for:

      // Default output settings
      power_rail_modem1 <= 1'b0;
      power_rail_modem2 <= 1'b0;
      power_rail_rfd900 <= 1'b0;
      power_rail_esp32 <= 1'b0;
      power_rail_screen <= 1'b0;
      power_rail_speaker_amplifier <= 1'b0;
      lcd_standby <= 1'b0;
      modem1_wake_n <= 1'b0;
      power_rail_mic <= 1'b0;
      cm4_en <= 1'b0;
      cm4_wifi_en <= 1'b0;
      cm4_bt_en <= 1'b0;
      lcd_display_en <= 1'b0;
      lcd_backlight_en <= 1'b0;
      esp32_reset_n <= 1'b0;
      hdmi_hotplug_detect_enable <= 1'b0;
      hdmi_en <= 1'b0;
      hdmi_rx_enable <= 1'b0;
      modem1_reset_n <= 1'b0;
      modem2_wake_n <= 1'b0;
      modem2_reset_n <= 1'b0;
      modem2_wireless_disable <= 1'b0;
      modem1_wireless_disable <= 1'b0;
      power_rail_headphone_amplifier <= 1'b0;
      hdmi_hotplug_detect <= 1'b0;
      hdmi_cec_a <= 1'b0;
      otp_hold_n <= 1'b0;
      otp_reset_n <= 1'b0;
      otp_cs2 <= 1'b0;
      otp_cs1 <= 1'b0;
      otp_wp_n <= 1'b0;
      otp_si <= 1'b0;

There are in fact 32 of them.  The easiest approach is to just have 32 chars to enable them, and another 32 chars to clear them.  Perhaps the easiest here is to use $40 -- $5F and $60 -- $7F as the two ranges. This means that to turn a signal on we can use the characters @A..Z[\]^_ and to turn it off, the characters `a..z{|}~ and DEL.  Only DEL will be a bit of a pain to type while testing, which is fine, as I'll assign that to otp_si, which is only needed when we get to testing the One Time Pad SPI flash.
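In Python terms, the mapping from a received character to a signal would look something like the sketch below. It assumes the signal numbering follows the order of the defaults list above, and the `parse_command` helper is purely illustrative; the real thing lives in the FPGA logic.

```python
# Signal names in the order of the defaults list above (assumed numbering)
SIGNALS = [
    "power_rail_modem1", "power_rail_modem2", "power_rail_rfd900",
    "power_rail_esp32", "power_rail_screen", "power_rail_speaker_amplifier",
    "lcd_standby", "modem1_wake_n", "power_rail_mic", "cm4_en",
    "cm4_wifi_en", "cm4_bt_en", "lcd_display_en", "lcd_backlight_en",
    "esp32_reset_n", "hdmi_hotplug_detect_enable", "hdmi_en",
    "hdmi_rx_enable", "modem1_reset_n", "modem2_wake_n", "modem2_reset_n",
    "modem2_wireless_disable", "modem1_wireless_disable",
    "power_rail_headphone_amplifier", "hdmi_hotplug_detect", "hdmi_cec_a",
    "otp_hold_n", "otp_reset_n", "otp_cs2", "otp_cs1", "otp_wp_n", "otp_si",
]

def parse_command(ch: str):
    """Map a command character to (signal name, new value), or None."""
    code = ord(ch)
    if 0x40 <= code <= 0x5F:   # '@' .. '_' sets the signal
        return SIGNALS[code - 0x40], True
    if 0x60 <= code <= 0x7F:   # '`' .. DEL clears the signal
        return SIGNALS[code - 0x60], False
    return None

for ch in "@`Hh":
    print(repr(ch), parse_command(ch))
```

Because the two ranges differ only in bit 5, the upper-case and lower-case versions of a letter always refer to the same signal, which keeps things easy to remember while typing commands by hand.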

Well, that was fairly easy in the end: I have also set it up so that it notices whenever a signal changes, and reports that by repeating the state message, so now if I press H to turn on the 5V supply to the joystick port, and then h to turn it off again, I see something like this:

MEGAphone CTL0 000000003fff0
MEGAphone CTL0 000000003f071

We can see those extra 5 bits at the end light up when the 5V is on.  This makes me want to test a joystick with it now, as it will also update automatically as I use the joystick:

Yes, I can now mash the joystick, and it shows up with very low latency:

MEGAphone CTL0 000000003fff2
MEGAphone CTL0 000000003fdf3
MEGAphone CTL0 000000003fff4
MEGAphone CTL0 000000003fef5
MEGAphone CTL0 000000003fff6
MEGAphone CTL0 000000003f7f7
MEGAphone CTL0 000000003fff8
MEGAphone CTL0 000000003fbf9
MEGAphone CTL0 000000003fffa
MEGAphone CTL0 000000003ff7b
MEGAphone CTL0 000000003fffc
MEGAphone CTL0 000000003ff7d
MEGAphone CTL0 000000003fb7e
MEGAphone CTL0 000000003f97f

And of course a picture of a joystick connected to the board, just so that you can see what a joyful horror this all is:

Now I just need to verify that each and every power rail can be controlled, which should now not take very long to do, but that can wait for the next blog post.


