Sunday, December 10, 2017

Automatic 4502 / 6502 Instruction Set Switching

We have known for a long time that we need to support 6502 illegal opcodes on the MEGA65.  Initially, we thought that this would affect only a very small percentage of C64 software. However, it seems that a reasonable fraction of software has trouble with illegal opcodes. Perhaps it is one or more of the common decrunch routines.  Or it could just be that we have some subtle bug in our 4502 implementation that means some 6502 instructions sometimes go astray, even though we pass the runnable 6502 instruction test suite for all official op-codes.

In any case, we know we need to add "6502 mode" to our CPU.  In fact, the bulk of the work was done quite some time ago, but I am only just now getting around to testing it.

Give or take getting the precise behaviour of the illegal op-codes correct, it wasn't too hard to add a second personality to the CPU: It is just some of the instruction fetch and decode logic that needed to be duplicated. Then the CPU needed a flag to indicate which mode to be in.

Then it should just be a case of working out when we are in C64 mode versus C65 mode, and setting the CPU mode accordingly, right? Unfortunately not.

This is because the C65's C64-mode KERNAL ROM uses 4502 opcodes to work out whether it should talk to the internal 1581/1565 drive, or to a drive on the IEC bus.

There are a few key parts of the routine we need to worry about (this is from the 910111 version of the C65 ROM):

$F72C - Context switch to C65 DOS
$F83E - Context switch back from C65 DOS on return from DOS call

The context switch to C65 DOS and back are fairly similar, and worth a quick look. First, context switching to the C65 DOS:

F72C   78         SEI
F72D   48         PHA
; C65 IO / VIC-III mode enable sequence
F72E   A9 A5      LDA #$A5      
F730   8D 2F D0   STA $D02F
F733   A9 96      LDA #$96
F735   8D 2F D0   STA $D02F     
; set bit 6 in $D031 to put CPU at 3.5MHz
F738   A9 40      LDA #$40
F73A   0C 31 D0   TSB $D031     
; bank in $C000 interface ROM and remove CIAs from IO map
; so that 2KB of colour RAM is visible $D800-$DFFF
F73D   A9 21      LDA #$21
F73F   0C 30 D0   TSB $D030
; Save registers from C64 mode, so that they can be restored
; on return
F742   68         PLA
F743   8D F6 DF   STA $DFF6
F746   8E F7 DF   STX $DFF7
F749   8C F8 DF   STY $DFF8
F74C   9C F9 DF   STZ $DFF9
; Now pull the return address from the stack, and save that, too.
F74F   68         PLA
F750   8D FB DF   STA $DFFB
F753   68         PLA
F754   8D FC DF   STA $DFFC
; Remember what the stack pointer was
F757   BA         TSX
F758   8E FF DF   STX $DFFF
; Rearrange memory map:
; Map $0000-$1FFF to $10000-$11FFF 
; Map $8000-$BFFF to $20000-$23FFF 
; (C64 KERNAL stays visible at $E000-$FFFF)
F75B   A9 00      LDA #$00
F75D   A2 11      LDX #$11   ($0000+$10000)
F75F   A0 80      LDY #$80
F761   A3 31      LDZ #$31   ($8000+$18000)
F763   5C         MAP        ; activate new map
; We are now in C65 DOS memory map
; Set stack pointer to $1FF
F764   A2 FF      LDX #$FF
F766   9A         TXS
; Load the saved return address, and put it into the C65 DOS
; stack
F767   AD FC DF   LDA $DFFC
F76A   48         PHA
F76B   AD FB DF   LDA $DFFB
F76E   48         PHA
; Restore all the saved registers
F76F   AD F6 DF   LDA $DFF6
F772   AE F7 DF   LDX $DFF7
F775   AC F8 DF   LDY $DFF8
F778   AB F9 DF   LDZ $DFF9
; Return to the return address that had just been copied to this stack
F77B   60         RTS

The general procedure of this routine is quite interesting. The last few bytes of colour RAM (it is 2KB long on the C65, and is visible at $D800-$DFFF when the CIAs are banked out of the way) are used as a scratch transfer area.  The contents of the registers are saved, so that they are available to the DOS function that has been called. The only piece of mild gymnastics going on is the way that the return address of the caller is copied from the C64 stack to the C65 DOS stack.  The C64 KERNAL is kept visible, so that the RTS will continue in the C64 KERNAL routine at that point, which then makes the indirect jump into the C65 DOS to do the necessary work.  There is no risk of interrupts happening while in this mode, as IRQs and NMIs are both disabled by the MAP instruction, until a NOP (EOM) instruction is executed.
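To make the register values passed to MAP a little more concrete, here is a minimal sketch in C of how they decode, assuming the layout that the listing's own comments imply: the first register of each pair holds offset bits 8-15, and the second holds offset bits 16-19 in its low nibble plus one enable bit per 8KB block in its high nibble. The names map_half, decode_map_half and map_translate are made up for this illustration, and unmapped addresses are simply passed through rather than modelling the normal C64 banking.

```c
#include <stdint.h>

/* One half (lower or upper 32KB) of a MAP request. Hypothetical type. */
typedef struct {
    uint32_t offset;     /* added to CPU addresses in mapped 8KB blocks */
    uint8_t  block_mask; /* bit n set = n-th 8KB block of this half is mapped */
} map_half;

/* Decode one register pair (A/X for the lower half, Y/Z for the upper). */
map_half decode_map_half(uint8_t lo_reg, uint8_t hi_reg)
{
    map_half h;
    h.offset = ((uint32_t)(hi_reg & 0x0F) << 16) | ((uint32_t)lo_reg << 8);
    h.block_mask = hi_reg >> 4;
    return h;
}

/* Translate a 16-bit CPU address under a lower/upper mapping pair. */
uint32_t map_translate(map_half lower, map_half upper, uint16_t addr)
{
    map_half h = (addr < 0x8000) ? lower : upper;
    unsigned block = (addr >> 13) & 3;        /* 8KB block within its half */
    if (h.block_mask & (1u << block))
        return ((uint32_t)addr + h.offset) & 0xFFFFF; /* 20-bit result */
    return addr;  /* unmapped: left alone in this sketch */
}
```

With A=$00, X=$11, Y=$80, Z=$31 as in the listing, $0000 lands at $10000 and $8000 lands at $20000, while $E000 is untouched, which matches the comments in the routine.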

This routine takes 21 ~1MHz clock cycles (26 including the JSR) before the CPU is switched to 3.5MHz, then 85 more clock cycles at 3.5MHz. The total cost of switching to the C65 DOS context is thus about 50 microseconds.

The return routine is similar, essentially reversing the process, although the handling of the two different return addresses at $DFFB/C versus $DFFD/E is still not entirely clear to me.

; Pop return address of caller and save, ready for copying to C64
; stack.
F83E  68          PLA
F83F  8D FD DF    STA $DFFD
F842  68          PLA
F843  8D FE DF    STA $DFFE
; restore C64 memory map and stack pointer for the original
; C64-context caller (61 cycles at 3.5MHz)
F846  20 7C F7    JSR $F77C
; clear bit 7 in $C0, indicating not a C65 internal drive
F849  77 C0       RMB7 $C0  
F84B  6B          TZA
F84C  10 0C       BPL $F85A
F84E  A9 00       LDA #$00
; set bit 7 in $C0, indicating a C65 internal drive
F850  F7 C0       SMB7 $C0 
; Push return address back onto C64 stack 
F855  DA          PHX
F859  DA          PHX
; Set bits in $90 (status) if required from A
F85A  04 90       TSB $90   
; Restore CPU registers
F85C  AE F7 DF    LDX $DFF7
F85F  AC F8 DF    LDY $DFF8
F862  AB F9 DF    LDZ $DFF9
F865  AD F6 DF    LDA $DFF6
; bank out $C000 ROM and bank CIAs back in.
F868  48          PHA
F869  A9 21       LDA #$21
F86B  1C 30 D0    TRB $D030
; return CPU to 1MHz. 
F86E  A9 40       LDA #$40
F870  1C 31 D0    TRB $D031
; return to VIC-II mode
F873  8D 2F D0    STA $D02F
; Re-restore Accumulator
F876  68          PLA
; Re-enable IRQ & NMI after the MAP change made in the call at $F846
F877  EA          EOM       
F878  58          CLI
F879  18          CLC
F87A  60          RTS

Because the C65 IO mode and 3.5MHz CPU mode are already active, the cost is 56 cycles at ~3.5MHz (= ~16 microseconds) for the call to $F77C, plus 70 cycles at 3.5MHz (= ~20 microseconds) and 15 cycles at ~1MHz (= ~15 microseconds) for the routine itself.  The total cost for switching from C64 mode to C65 DOS and back again, without actually doing any work, is thus 50 microseconds for the switch to the C65 DOS context, plus 16 + 20 + 15 = 51 microseconds to switch back, i.e., ~0.1 millisecond.  (I have counted both code paths for the stack restoration, as I am not entirely sure what is going on there.  However, that only makes about 5.5 microseconds difference.)
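The arithmetic behind those figures can be checked with a few lines of C. This is only a sketch: the cycle counts are the ones quoted above, the clock rates are rounded to exactly 1 MHz and 3.5 MHz (an assumption, the real C64-mode clock is slightly under 1 MHz), and the function names are invented here.

```c
/* Microseconds taken by `cycles` CPU cycles at `mhz` megahertz. */
double cycles_to_us(double cycles, double mhz)
{
    return cycles / mhz;
}

/* Entry: 26 cycles (including the JSR) at ~1 MHz, then 85 at 3.5 MHz. */
double switch_to_dos_us(void)
{
    return cycles_to_us(26, 1.0) + cycles_to_us(85, 3.5);
}

/* Return: 56 + 70 cycles at 3.5 MHz, then 15 cycles at ~1 MHz. */
double switch_back_us(void)
{
    return cycles_to_us(56, 3.5) + cycles_to_us(70, 3.5)
         + cycles_to_us(15, 1.0);
}
```

Under those assumptions the two halves come to roughly 50 and 51 microseconds, so one round trip is about a tenth of a millisecond.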

This convoluted process explains why the C65 DOS is so incredibly slow, achieving less than 2KB/second even though the internal floppy drive can theoretically read at 30KB/sec: there is a complete and time-consuming context switch whenever a byte is read from the internal floppy drive. I've been tempted to patch the C65 DOS to either support an efficient LOAD system call, or to implement some sort of buffering for sequential reads.  I've even thought about teaching the CPU about these context switch routines, and basically having a special case for them that does the context switch in just one or a few cycles. However, that is a job for another day. In the meantime, on an M65, running the CPU at 50MHz is an easy interim solution, as loading can happen at some tens of KB/second, even with this inefficiency.

Meanwhile, back on the topic of CPU personality selection, those two routines are not particularly troublesome, as they only use 4502 opcodes after using $D02F to enable C65 / VIC-III IO mode, which we can use as a reliable clue to switch the CPU to 4502 mode.  This gives us our first rule:

(1) If the C65 / VIC-III (or M65 / VIC-IV) IO mode is enabled, the CPU should always be in 4502 mode.

The routines to enter the various DOS calls are, however, a little troublesome.   They are all very similar, so I present just one of them here as an example:

F7E4   FF C0 09   BBS7 $C0,$F7F0  ; Is current device C65 DOS?
F7E7   20 2C F7   JSR $F72C       ; Context switch to C65 DOS memory map
F7EA   22 0A 80   JSR ($800A)     ; Call C65 DOS TALK routine
F7ED   20 3E F8   JSR $F83E       ; Context switch back to C64 memory map
F7F0   4C C7 ED   JMP $EDC7       ; Send TALK on IEC bus in all cases

The BBS7 and JSR ($800A) instructions (opcodes $FF and $22) are 4502 instructions.

Opcode $22 is normally a KIL instruction on the 6502, so we could safely make that do the new indirect JSR at all times, without great risk. $FF is normally ISC $nnnn,X, where ISC is the combination of INC and SBC. I have no idea if people find uses for that opcode.  However, we don't need to worry about that, because these opcodes only exist in the C64-mode KERNAL on the C65.  Thus, we can simply switch the CPU to 4502 mode whenever executing code in the KERNAL in C64 mode.  This gives us our second rule:

(2) When executing code in the KERNAL (ROM at $E000-$FFFF), the CPU should always be in 4502 mode.
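To make the difference between the two personalities concrete for the two opcodes discussed above, here is a small illustrative sketch in C. It covers only $22 and $FF, and the function name and the mode flag are invented for this example; it is not how the decode logic in the actual CPU is written.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the mnemonic for an opcode under the given personality.
   mode_4502 != 0 selects the 4502 decoding, 0 selects NMOS 6502
   decoding including illegal opcodes. */
const char *describe_opcode(uint8_t opcode, int mode_4502)
{
    switch (opcode) {
    case 0x22: return mode_4502 ? "JSR ($nnnn)"  : "KIL";
    case 0xFF: return mode_4502 ? "BBS7 $nn,rel" : "ISC $nnnn,X";
    default:   return NULL;   /* not covered by this sketch */
    }
}
```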

So, if the CPU were in 6502 mode in C64 mode, then these two rules would ensure that the C65 internal DOS would work.  So that's good.

Now the trick is we just need to work out when we are in C64 mode and when we are in C65 mode.

First, a C65 starts up in C64 mode, and then escalates to C65 mode if it doesn't see why it should stay in C64 mode. That is, the machine always starts in C64 mode.  Fortunately, the switch to C65 mode does enable C65 / VIC-III IO mode, so it is possible that we need no further rules.

Finally, when the Hypervisor is running, the CPU should always be in 4502 mode.

(3) When executing code in Hypervisor mode, the CPU should always be in 4502 mode.

So, in theory at least, that should be all we need to automatically set the CPU personality, in a way that maximises compatibility, i.e., where C64 programs get a 6502-compatible CPU, unless they ask for something different, but the C65's C64-mode KERNAL runs in 4502 mode, so that the rather ugly DOS inter-process communications can still happen.
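Put together, the three rules amount to a very small decision function. The following C sketch is just a model of the rules as stated here, not the VHDL in the MEGA65 core; the struct fields and names are hypothetical, and "executing code in the KERNAL" is approximated as "PC is at or above $E000 while the KERNAL ROM is visible there".

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CPU_MODE_6502, CPU_MODE_4502 } cpu_mode;

/* Hypothetical snapshot of the state the rules depend on. */
typedef struct {
    bool     c65_io_enabled;     /* C65/VIC-III or M65/VIC-IV IO mode */
    bool     kernal_rom_visible; /* KERNAL ROM mapped at $E000-$FFFF */
    bool     hypervisor;         /* running in Hypervisor mode */
    uint16_t pc;                 /* address of the current instruction */
} cpu_state;

cpu_mode select_cpu_mode(const cpu_state *s)
{
    if (s->c65_io_enabled)
        return CPU_MODE_4502;                 /* rule 1 */
    if (s->kernal_rom_visible && s->pc >= 0xE000)
        return CPU_MODE_4502;                 /* rule 2 */
    if (s->hypervisor)
        return CPU_MODE_4502;                 /* rule 3 */
    return CPU_MODE_6502;                     /* plain C64 software */
}
```

A program running from RAM at $0801 with none of the conditions set gets the 6502 personality, while the DOS entry code at $F7E4 in the KERNAL ROM gets the 4502 one.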

The present state of play is that this is all implemented, but not yet enabled by default or tested on the MEGA65.  This is one of the jobs on my list for the coming week.

Thursday, November 23, 2017

1351 Mouse Progress: Accuracy better than the original

Here is the latest update from Daniël's work on making a 1351-compatible mouse using an Arduino, which was achieved a few weeks ago, but I have been tied up (not literally) in remote tropical jungle (literally) for my work.

Wiring the Arduino to the C64

Time to start connecting the Arduino PWM outputs to the C64. The PWM outputs of the Arduino cannot be directly connected to the C64; some components in between are necessary. At this point, plugging stuff directly into the Arduino became impractical, so I spent quite a few hours searching for where I had left my breadboard. Luckily I found it, and work could continue.
First, we need some resistance to control the speed at which the capacitors in the C64 charge, and to limit the current drawn from the output pins. In order to determine the right resistance experimentally I used two potmeters of 0-1000 Ω. This is much lower than the 1351 uses, but my experimentation did show that in order to be able to transmit low values, it is necessary to be able to charge the SID's capacitors fast, and I don't see any objection to fast charging, as long as the currents remain reasonable.
While experimenting I was also reminded that the PWM outputs have the property that they pull down the line when they are low: In other words, they pull the charge out of the C64's capacitors and they pull strongly. This is undesired, as it prevents detection of the SID sync pulses, and therefore we need a diode to prevent this.
In my box of electronic components I have two types of diodes: 1N4007 and BAT43 (a Schottky diode). If we need to be able to charge fast, a low voltage drop might come in handy, so I went for the BAT43. If you look at the voltage drop, the BAT43 is one of the best diodes that you can buy at good prices as a hobbyist.
After some experimentation I had my design ready, and it looks as follows:

I've recreated it in Fritzing, so you can look at it in more detail. Breadboard view:

Schematic view:

With the circuitry in place, it should be possible to send values from the Arduino to the C64. I typed the following program on the Commodore 64:

10 POKE $DC0D,1
20 PRINT PEEK($D419)
30 GOTO 20

The first line disables the CIA timer interrupt on the C64. This is necessary because the CIA lines that control the analog multiplexer, which connects either control port 1 or control port 2 to the SID, are shared with the keyboard matrix select lines. The C64 timer interrupt modifies these lines in order to scan the keyboard, but as a result also switches between the control port 1 and control port 2 POTX lines. Because this happens in the middle of SID read cycles, the C64 interrupt causes noise in the values read on the C64.

Because disabling the timer interrupt makes the C64 keyboard inoperable, you cannot interrupt the above program once started other than with a reset button. Because of this it is highly recommended to use a cartridge that can recover a BASIC program after reset. I am using Final Cartridge III. A KCS Power Cartridge will do the trick as well. To interrupt, just reset and type “OLD” for the FC3, or “UNNEW” for the KCS. Both cartridges extend BASIC with $ for hexadecimal, which is why I am conveniently using hexadecimal numbers in my program.

Yes, it did work! However, the results did need calibration and they were still noisy. I still had to do some programming to get it correct.

Reducing noise and calibrating the thing

The initial results were way too noisy to be useful. It did take some analysis to see where the noise came from. I found 3 sources. First, there was noise on the 5V line when the Arduino was powered from USB. Powering it from the C64 did show much more stable results. Some people might be tempted to think that this is because of the linear power supply of the C64, but no, those original power supplies should not be used anymore. For my experiments here I use a modified power supply with switching regulators. Even with my laptop running on battery, the USB 5V supply was much noisier than the C64 5V supply. While this is not a problem for the final product, I did need to connect the USB for uploading sketches. It turns out that the 5V noise causes some inaccuracy in exactly when the comparator triggers. Through experimentation I found out that changing my voltage divider from 100k/15k to 100k/33k resistors did reduce this jitter a lot, evading the problem.
Another noise source was found in the Arduino's timer interrupt. By default the Arduino's timer0 generates interrupts in order to support library functions such as delay(). However, if a comparator interrupt is triggered while the timer interrupt is being handled, there is a delay before the comparator interrupt handler gets executed. To eliminate this noise I did disable timer0. This renders all timing functions inside the Arduino runtime library unusable. All timing needs to be done with our own timer: The SID synchronisation pulse.
A third source of noise was found in timer1 itself: Because the Arduino 16MHz clock is divided by 8 in order to get to 2 MHz, after setting the timer1 counter to 0, there can be 1-8 Arduino cycles before it is increased to 1. Luckily the timer prescaler can be reset, so besides setting the timer counter to 0, we also need to reset the prescaler, i.e. our IRQ handler should look like this:

GTCCR = _BV(PSRSYNC); /* reset the prescaler (assumption: via the ATmega328P's PSRSYNC bit) */
TCNT1 = 0;

After these adjustments, the values read on the C64 were quite stable, only they weren't correct: we need to calibrate stuff. Now because the PAL C64 runs at 985 kHz and the NTSC C64 runs at 1023 kHz, this calibration is PAL/NTSC dependent. Therefore I did add a small check to the IRQ handler:

commodore_is_pal = (TCNT1 < 512);

It's quite simple: The PAL C64 runs slower than 1 MHz, so the timer will already have overflowed when the comparator interrupt occurs, and the value in the timer counter should be low. The NTSC C64 runs faster than 1 MHz, so the timer counter will not yet have overflowed when the interrupt occurs, and the counter should contain a high value. So a simple low-or-high check can distinguish between PAL and NTSC.
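The numbers behind this check can be worked out on a PC. The sketch below is plain C, not Arduino code, and assumes what the program's comments state: a 2 MHz timer that wraps after 1023, and 512 C64 clock cycles between SID sync pulses. The macro and function names are made up for the illustration, and the clock rates are the rounded 985 kHz and 1023 kHz figures.

```c
#include <stdint.h>

#define TIMER_HZ     2000000.0 /* timer 1 clock: 16 MHz / 8 */
#define TIMER_PERIOD 1024      /* 10-bit counter, wraps after 1023 */
#define SID_CYCLES   512.0     /* C64 clock cycles per SID measurement */

/* Counter value expected when the next SID sync pulse arrives, given
   that the counter was set to 0 at the previous one. */
uint16_t expected_counter(double c64_hz)
{
    double ticks = SID_CYCLES / c64_hz * TIMER_HZ;
    return (uint16_t)((uint32_t)ticks % TIMER_PERIOD);
}

/* The same test the interrupt handler applies to TCNT1. */
int looks_like_pal(uint16_t counter)
{
    return counter < 512;
}
```

For PAL the timer has counted about 1039 ticks and wrapped around to roughly 15; for NTSC it has counted about 1000 and not wrapped, so the two cases sit far apart on either side of 512.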

Through experimentation I found that if I set my potmeters to 340 Ω and do the following adjustments in software:

void set_potx(u8 potx) {
  /* Our timer runs at 2000 kHz while a PAL C64 runs at 985 kHz.
     This means that we need approximately 2 timer cycles for 1 C64 cycle, so multiply
     by 2 */
  u16 d;
  d = 0x1f2 + 2 * potx;
  /* However, because the ratio is not exactly 2, this means our pulses would be slightly too short.
     Compensate. */
  if (commodore_is_pal) {
    if (potx>=24) d++;
    if (potx>=48) d++;
    if (potx>=87) d++;
    if (potx>=121) d++;
    if (potx>=156) d++;
    if (potx>=187) d++;
    if (potx>=223) d++;
    if (potx>=251) d++;
  } /* TODO NTSC */
  OCR1A = d; /* assumption: POTX is driven from OC1A (Arduino pin 9) */
}


... I was able to get perfect results! This means that I am able to transmit values in the full range of 0 to 255 to the C64, with no noise at all. The real 1351 can only transmit values in the range of 64-191 and the lowest bit is always noise, so this is a notable achievement. It is also not possible to get the full range of values with analog paddles: Commodore's paddles get you results from about 20 to 235.
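The chain of if statements in the adjustment above can also be written as a table, which makes the experimentally found PAL thresholds easier to see and to retune. This is only a sketch of the same arithmetic in portable C: the function name and table name are invented, and it returns the timer value instead of writing it to a compare register.

```c
#include <stddef.h>
#include <stdint.h>

/* PAL thresholds from the text: one extra timer tick is added for each
   threshold that the POT value has reached. */
static const uint8_t pal_steps[] = { 24, 48, 87, 121, 156, 187, 223, 251 };

uint16_t pal_pulse_ticks(uint8_t pot)
{
    uint16_t d = 0x1f2 + 2 * pot;   /* base offset plus 2 ticks per step */
    for (size_t i = 0; i < sizeof pal_steps / sizeof pal_steps[0]; i++)
        if (pot >= pal_steps[i])
            d++;
    return d;
}
```

The largest result, for a POT value of 255, is 1016, which still fits inside the 0-1023 range of the timer.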

Here are some screenshots with a value of 2 and a value of 64:

...with a real 1351 you will see the noise in the least significant bit on the screen.

The oscilloscope view for both situations (green measured on POT line, yellow on pulse line before any components):

Full Arduino program

The following program counts POTX and POTY from 0 to 255 in a loop. Arduino pin2 can be used to pause the process: Put a wire between pin 2 and ground and the counting stops. You can watch the program do its work by reading registers $D419 and $D41A on the Commodore 64.


Commodore 1351 mouse simulation code for Arduino

Written by Daniël Mantione

volatile unsigned long sid_measurement_cycles=0;

u8 posx=0;
u8 posy=0;

bool commodore_is_pal = true;

/* Analog comparator interrupt handler (assumption: vector name ANALOG_COMP_vect) */
ISR(ANALOG_COMP_vect) {
  /* The SID has started its discharge cycle. We now need to synchronize the PWM, but first we do a
     PAL/NTSC check.
     This is highly time critical code: Any modifications here before the timer is set to 0 will
     require adjustment of the PWM offset in set_potx/set_poty. */
  /* This is untested on NTSC! A PAL system runs at less than 1MHz and therefore 512 SID cycles
     will take longer than our 1024 timer cycles. Therefore we expect a low timer counter value when
     the interrupt occurs. An NTSC system runs at higher than 1MHz and therefore 512 SID cycles
     will take shorter than our 1024 timer cycles. Therefore we expect a high timer value when the
     interrupt occurs. */
  commodore_is_pal = (TCNT1 < 512);
  /* In order to synchronize the PWM with the SID we will reset the clock prescaler of the
     microcontroller and reset timer 1 by writing 0 to its counter: */
  GTCCR = _BV(PSRSYNC); /* assumption: prescaler reset via the PSRSYNC bit */
  TCNT1 = 0;
  /* Timer has been reset, so end of time critical code. */
  sid_measurement_cycles++; /* assumption: counted here, used for the delay in loop() */
}

void setup_comparator() {
  ACSR =
    (0<<ACD) | // Analog Comparator: Enabled
    (0<<ACBG) | // Analog Comparator Bandgap Select: AIN0 is applied to the positive input
    (0<<ACO) | // Analog Comparator Output: Off
    (1<<ACI) | // Analog Comparator Interrupt Flag: Clear Pending Interrupt
    (1<<ACIE) | // Analog Comparator Interrupt: Enabled
    (0<<ACIC) | // Analog Comparator Input Capture: Disabled
    (1<<ACIS1) | (0<<ACIS0); // Analog Comparator Interrupt Mode: Comparator Interrupt on Falling
                             // Output Edge
  pinMode(6, INPUT); //Avoid interfering with comparator
  pinMode(7, INPUT); //Avoid interfering with comparator
}

void setup_timer1() {
  /* For generating the pulses for POTX/POTY we will use timer 1 of the Atmega328p. This is a
     16-bit timer, which allows for high precision.*/
  TIMSK1 = 0;
  /* We set the clock source to none, so the timer does not run while we adjust it. The WGM12 bit
     is to already select the right timer mode, otherwise OCR1A/OCR1B cannot be set correctly (read
     on). */
  TCCR1B = _BV(WGM12);

  /* Activate PWM on Arduino pin 9 and 10. The PWM pin is high when the counter is higher than
     MATCH. Select Fast PWM mode 7. */
  TCCR1A = _BV(COM1A1) | _BV(COM1A0) | _BV(COM1B0) | _BV(COM1B1) | _BV(WGM10) | _BV(WGM11);
  TCNT1 = 0x000;
  pinMode(9, OUTPUT);  /* assumption: the PWM pins must be outputs to drive the POT lines */
  pinMode(10, OUTPUT);

  /* WGM12=1 to select Fast PWM mode 7. CS11 selects a clock source of clkio / 8, which results in
     2MHz. A timer clock of 2MHz combined with a range of 0-1023 is acceptable for our purposes. By
     selecting a clock the timer starts counting */
  TCCR1B = _BV(WGM12) | _BV(CS11);
}

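void setup() {
  /* Arduino timer 0 is used by default to support timing functions like delay(). Its interrupt
     handler may delay the analog comparator interrupt and thus cause noise. Therefore switch it
     off. */
  TIMSK0 = 0; /* assumption: done by disabling the timer 0 interrupts */
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, LOW);
  pinMode(2, INPUT_PULLUP); /* assumption: pin 2 reads high unless wired to ground */
  setup_comparator();
  setup_timer1();
}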

void set_potx(u8 potx) {
  /* Our timer runs at 2000 kHz while a PAL C64 runs at 985 kHz.
     This means that we need approximately 2 timer cycles for 1 C64 cycle, so multiply
     by 2 */
  u16 d;
  d = 0x1f2 + 2 * potx;
  /* However, because the ratio is not exactly 2, this means our pulses would be slightly too short.
     Compensate. */
  if (commodore_is_pal) {
    if (potx>=24) d++;
    if (potx>=48) d++;
    if (potx>=87) d++;
    if (potx>=121) d++;
    if (potx>=156) d++;
    if (potx>=187) d++;
    if (potx>=223) d++;
    if (potx>=251) d++;
  } /* TODO NTSC */
  OCR1A = d; /* assumption: POTX is driven from OC1A (Arduino pin 9) */
}

void set_poty(u8 poty) {
  /* Our timer runs at 2000 kHz while a PAL C64 runs at 985 kHz.
     This means that we need approximately 2 timer cycles for 1 C64 cycle, so multiply
     by 2 */
  u16 d;
  d = 0x1f2 + 2 * poty;
  /* However, because the ratio is not exactly 2, this means our pulses would be slightly too short.
     Compensate. */
  if (commodore_is_pal) {
    if (poty>=24) d++;
    if (poty>=48) d++;
    if (poty>=87) d++;
    if (poty>=121) d++;
    if (poty>=156) d++;
    if (poty>=187) d++;
    if (poty>=223) d++;
    if (poty>=251) d++;
  } /* TODO NTSC */
  OCR1B = d; /* assumption: POTY is driven from OC1B (Arduino pin 10) */
}

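void loop() {
  int s;
  if (digitalRead(2)) {
    posx++;
    posy++;
    set_potx(posx); /* assumption: the new values are applied here */
    set_poty(posy);
  }
  /* Because timer 0 is stopped, we cannot use the normal delay() routines, so we have to
     delay a different way. */
  s = sid_measurement_cycles >> 8; /* assumption: wait until 256 more SID cycles have passed */
  while (s == sid_measurement_cycles >> 8)
    ;
}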

The input voltage

There is one final thing to worry about. The program above gives the exact right result when the Arduino is powered from USB. When powered from the C64, the values are off by 1. It turns out that the cause is the voltage: My voltage meter measures 5.11V on the Arduino when powered from USB, but just 4.91V when powered from the C64.

Therefore, in order to generate really 100% perfect results, it is necessary to adjust for this. I think this can be done by burning a little bit of the voltage with a zener diode like this:

The Arduino outputs voltages of approximately 5V. The zener diode burns any voltage higher than 4.7V away into heat. The BAT43 has an official voltage drop of max. 0.33V, but my measurements show about 0.2V drop. This means that any supply voltage higher than about 4.9V is burned away, probably good enough for our purposes. We should be able to charge the SID capacitors fast enough with 4.7V and knowing it is exactly 4.7 should eliminate the uncertainty about the exact input voltage.

The next standard zener value below 4.7V is 4.3V. Here I have more doubts that we are able to charge fast and high enough, but if so, this allows even more tolerance in the input voltage and more choice in diodes.
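The clamping argument can be put into numbers with a tiny model in C. It is an idealised sketch only: it assumes the zener sits after the BAT43, treats both diodes as having fixed voltages, and ignores the series resistance. The function name is invented here.

```c
/* Voltage available to charge the POT line: the supply minus the
   Schottky diode drop, limited to the zener voltage. */
double pot_drive_volts(double supply, double diode_drop, double zener)
{
    double v = supply - diode_drop;
    return v < zener ? v : zener;
}
```

With the measured 0.2V drop and a 4.7V zener, both the 5.11V USB supply and the 4.91V C64 supply end up at 4.7V in this model, although the C64 case only just reaches the clamp. With a 4.3V zener, any supply above about 4.5V would be clamped.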

I think it will be a good idea to design the Megastick PCB with room for both a zener diode and potmeter: It is always possible to remove the zener later and replace the potmeter with a fixed resistor, but at least initially it will be very useful to do some manual adjustments in order to get exactly the right value. 0-500 Ω is probably better for accuracy purposes than the 0-1000 Ω that I used.

I was thinking about using a MOSFET as an alternative, so you can avoid the diode and its voltage drop:

... but this probably doesn't work: While the POT line is charged, the voltage between gate and drain drops, causing the MOSFET to turn itself off, likely too early.

Friday, September 8, 2017

That recent C65 sale on ebay

Just a quick post while I am in Vanuatu for work.

Many of you have probably already seen the recent ebay sale of a Commodore 65.

What is interesting here is that this unit is incomplete, lacking the video chip, without which it doesn't work, and yet it still sold for US$18K !

Meanwhile, we continue to chug away in the background on the MEGA65, which we can assure you will be cheaper to buy than that :)

Once this current spate of trips to Vanuatu and beyond for work settles down, I hope to make some more progress on keyboard and case moulding solutions.

Thursday, August 24, 2017

1351 Mouse Progress: Detecting when the SID chip drains the POT lines

The following is all from Daniël Mantione, who is doing great work on working out how to emulate a 1351 mouse using an Arduino, which we intend to use to make a solid-state joystick which will also work as a proportional 1351-compatible mouse:


The comparator

It turns out that the analog comparator on the Arduino is located on digital pins 6 and 7. It does not share any functionality with the analog ADC pins.
Two pins are necessary: The comparator compares the voltage on both pins; one cannot compare with a voltage specified in software. The internal resistance of the Arduino is incredible: pins configured as inputs have a resistance of many megaohms, so a single resistor from 5V to the pin doesn't really work; you need two resistors to create a voltage divider.
I took my box of electronic components and after a bit of searching and experimenting I came up with a 100 kOhm and a 15 kOhm resistor. 100 kOhm from 5V to the pin and 15 kOhm from the pin to ground:

This did result in a voltage of 0.65V on the pin, which is good for our purposes. Now, once the voltage drops below this, we want an interrupt.
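The 0.65V figure follows from the usual voltage divider formula, which is easy to check with a line of C. The function name is invented for this sketch, and it assumes an exact 5V supply and ideal resistors.

```c
/* Output of a two-resistor divider: r_top from vin to the pin,
   r_bottom from the pin to ground. */
double divider_volts(double vin, double r_top_ohm, double r_bottom_ohm)
{
    return vin * r_bottom_ohm / (r_top_ohm + r_bottom_ohm);
}
```

For 5V with 100 kOhm on top and 15 kOhm below, this gives 5 × 15 / 115, or about 0.652V.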
This turned out to be relatively easy to code. After a bit of experimentation, I came up with the following Arduino program:
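volatile unsigned long sid_measurement_cycles=0;

/* Analog comparator interrupt handler (assumption: vector name ANALOG_COMP_vect) */
ISR(ANALOG_COMP_vect) {
  /* Count the cycle, then briefly flash the internal LED */
  sid_measurement_cycles++;
  digitalWrite(LED_BUILTIN, HIGH);
  digitalWrite(LED_BUILTIN, LOW);
}

void setup() {
  Serial.begin(9600); /* assumption: any baud rate that matches the serial monitor */
  ACSR =
    (0<<ACD) | // Analog Comparator: Enabled
    (0<<ACBG) | // Analog Comparator Bandgap Select: AIN0 is applied to the positive input
    (0<<ACO) | // Analog Comparator Output: Off
    (1<<ACI) | // Analog Comparator Interrupt Flag: Clear Pending Interrupt
    (1<<ACIE) | // Analog Comparator Interrupt: Enabled
    (0<<ACIC) | // Analog Comparator Input Capture: Disabled
    (1<<ACIS1) | (0<<ACIS0); // Analog Comparator Interrupt Mode: Comparator Interrupt on Falling Output Edge

  pinMode(LED_BUILTIN, OUTPUT); /* assumption: LED pin configured as an output */
  digitalWrite(LED_BUILTIN, LOW);
  pinMode(6, INPUT); //Avoid interfering with comparator
  pinMode(7, INPUT); //Avoid interfering with comparator
}

void loop() {
  if (sid_measurement_cycles % 1000 == 1) {
    Serial.print("SID measurement cycles: ");
    Serial.println(sid_measurement_cycles);
  }
}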

Basically, the analog comparator needs to be activated, told to trigger when the voltage drops below (rather than above) the reference voltage and told to generate interrupts.
The interrupt service routine is very simple: It increases a counter and then briefly flashes the internal led of the Arduino. The internal led is pin 13, and can be connected to an oscilloscope. Connecting the second input of the oscilloscope to pin 13:

...the result was as follows:

Exactly the intended result. The Arduino serial monitor shows:

SID measurement cycles: 1
SID measurement cycles: 2
SID measurement cycles: 1001
SID measurement cycles: 2001
SID measurement cycles: 2002
SID measurement cycles: 3001
SID measurement cycles: 3002
SID measurement cycles: 4001

This went much smoother than I expected.

Now, the next steps are going to be quite a bit more difficult. First, we need to wait 256 microseconds. This is easy to achieve using busy waiting, but doing so would mean that the Arduino would spend the majority of its CPU time inside the ISR. This already sounds like bad taste.

Then we need to generate pulses on POTX and POTY. Doing this with busy waiting as well is possible, but would make the Arduino spend the vast majority of its time inside the ISR. That might make it unworkable to do anything useful with the Arduino besides transmitting data.

In other words, we need to do the transmission without involving the CPU. Using the PWM generators of the Arduino sounds like a good approach. This involves advanced timer programming: We first need to program a timer to generate an interrupt at exactly the right time, then program two timers to generate PWM pulses.

The Arduino has one 16-bit timer and two 8-bit timers. The 16-bit timer can count at high frequency (16 MHz) and is therefore best suited for high precision. But there is only one 16-bit timer, while we need to generate two pulses.
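A quick calculation shows why the timer width matters for the 256 microsecond wait. The C sketch below runs on a PC rather than the Arduino; the function names are invented, and the only inputs are the 16 MHz clock and the 256 microseconds mentioned above, plus a prescaler chosen for illustration.

```c
#include <stdint.h>

/* Number of timer ticks in `micros` microseconds for a given CPU clock
   and timer prescaler. */
uint32_t timer_ticks(uint32_t f_cpu_hz, uint32_t prescaler, uint32_t micros)
{
    return (uint32_t)((uint64_t)f_cpu_hz / prescaler * micros / 1000000u);
}

/* Does a tick count fit in a timer that is `bits` wide? */
int fits_in_bits(uint32_t ticks, unsigned bits)
{
    return ticks < (1u << bits);
}
```

At the full 16 MHz, 256 microseconds is 4096 ticks, which fits a 16-bit timer but not an 8-bit one. An 8-bit timer would need a prescaler such as 64, leaving only 64 ticks of 4 microseconds each, which is far coarser than one C64 clock cycle.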

In other words, I am not completely certain yet what the best approach is. Perhaps I will try first to generate a timer interrupt at the right time and then generate both pulses with busy waiting. That should already result in less than half of the CPU time spent in ISRs. As both pulses need to be generated inside a single loop, the loop body will contain branches and this might have an effect on accuracy as well. But we'll see.