Saturday 26 July 2014

Debugging the new CPU

Debugging of the re-implemented CPU continues, and is hopefully close to complete.  I thought I would describe some of the process I have followed.

The real secret to debugging anything is discoverability: that is, having the means to work out what is happening, so that you can tell not only whether the result is correct, but also how the result is being calculated.

With a hardware design this can be rather annoying, because the time it takes to make a trivial change, resynthesise and test the design can be of the order of an hour.  Assuming that you correctly expose the thing you are trying to debug, that means you can examine (and hopefully fix) at most a dozen or so defects per full day of effort. Not Good.

Fortunately there are simulation tools for VHDL that let you debug without having to go through the whole synthesis process, and thus reduce the time to examine a defect from hours to minutes.  Simulation has its limitations (for example, to debug the SD card interface in simulation I would need to write an SD card simulator), but it is extremely useful, and I have made extensive use of the free and open-source simulation tool, ghdl.

The processor redesign basically consisted of gutting out the first implementation of the CPU and leaving just the shell that accesses the memory and interfaces with the serial monitor, which I have described in previous posts.  The serial monitor is extremely useful, because it allows reading and writing of all memory, as well as examining the processor state, and single-stepping the processor.

The first part was to re-do the serial monitor interface, because this needed an overhaul for the new processor architecture.  This was rather tricky, because simulating a serial connection feeding various commands in would take a fair bit of work, and the time scales of serial input mean that simulation would be rather slow anyway.  So as a result I used some of the LEDs on the FPGA board to provide some useful debugging output, and worked as carefully as I could to make sure that the code was likely to work.

The second and related step was getting the memory access stuff working again, and accessible via the reworked serial monitor interface.

These two steps took much longer than I had hoped, and were really frustrating.  In retrospect, it might well have been easier to make a simulator for serial input and use ghdl simulation to shorten the process a bit.

After this, I set about implementing a few simple instructions so that I could get single-stepping of the CPU through the serial monitor working.  This also turned out to take way longer than I would have liked, partly because the new CPU architecture uses 6502-style end-of-instruction pipelining which really complicated single-stepping.  I did get it working in the end.

Then it was on to implementing LDA, STA, JMP and a few other instructions to allow the writing of simple little test programs to confirm that the CPU was generally working.  At this point ghdl was useful to allow quick testing of the instructions and their interactions.

In the process of doing this, I realised that the debug output I was producing in ghdl was not as good as it could be.  Basically I was looking at hexadecimal instruction bytes and trying to decide if it was right or not.

It would be much easier to debug if I could get ghdl to show full instruction disassemblies as well, so instead of just seeing 8D 0D DC, it would also show STA $DC0D.  Also, it would help enormously to know what memory access was happening each cycle, so that I could get an idea of exactly where an instruction was going astray.
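
To illustrate the idea of turning instruction bytes into readable text, here is a minimal Python sketch.  It is not the VHDL used in the actual design: the table covers only four standard 6502 opcodes, and the names OPCODES and disassemble are made up for this example.

```python
# Hypothetical mini-disassembler (illustration only, not the VHDL in the design).
# Each entry maps an opcode byte to (mnemonic, operand format, instruction length).
OPCODES = {
    0xA9: ("LDA", "#${:02X}", 2),  # load accumulator, immediate
    0x85: ("STA", "${:02X}", 2),   # store accumulator, zero page
    0x8D: ("STA", "${:04X}", 3),   # store accumulator, absolute
    0x4C: ("JMP", "${:04X}", 3),   # jump, absolute
}


def disassemble(data):
    """Return the text form of the single instruction at the start of data."""
    data = bytes(data)
    opcode = data[0]
    if opcode not in OPCODES:
        return "??? ${:02X}".format(opcode)
    mnemonic, fmt, length = OPCODES[opcode]
    if len(data) < length:
        return mnemonic + " <truncated>"
    # 6502 operands are stored low byte first (little-endian).
    operand = int.from_bytes(data[1:length], "little")
    return mnemonic + " " + fmt.format(operand)


print(disassemble([0x8D, 0x0D, 0xDC]))  # STA $DC0D
```

A real disassembler needs the full opcode table and every addressing mode, but the principle is the same table lookup.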

I finally had time to implement this during the week, and now I can easily get output like:

MEMORY reading $FFFF654 = $A9
MEMORY reading $FFFF655 = $00
MEMORY reading $FFFF656 = $85
$F654 A9 00     lda  #$00          A:00 X:22 Y:33 Z:00 SP:01FF P:26 $01=3F  ..E-.IZ.  
MEMORY reading $FFFF657 = $20
MEMORY reading $FFFF658 = $A9
MEMORY reading $FFFF658 = $A9
MEMORY writing $0000020 <= $00
$F656 85 20     sta  $20           A:00 X:22 Y:33 Z:00 SP:01FF P:26 $01=3F  ..E-.IZ.  
MEMORY reading $FFFF659 = $91
MEMORY reading $FFFF65A = $91
MEMORY reading $FFFF65A = $85
$F658 A9 91     lda  #$91          A:91 X:22 Y:33 Z:00 SP:01FF P:A4 $01=3F  N.E-.I..

Actually the output has a little more information in it, but the above gives you an idea.
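
A trace in this format is also easy to process with a script.  The following Python sketch is an assumption about how one might do that, not something that exists in the project: the regular expressions are inferred purely from the sample lines above, and parse_trace and SAMPLE are hypothetical names.  It attaches the memory accesses logged before each instruction line to that instruction.

```python
import re

# Patterns inferred from the sample output above; the real log has more fields.
MEM_RE = re.compile(r"MEMORY (reading|writing) \$([0-9A-F]+) <?= \$([0-9A-F]{2})")
INSTR_RE = re.compile(r"^\$([0-9A-F]{4}) ((?:[0-9A-F]{2} )+)\s*(.*?)\s+A:")
REG_RE = re.compile(r"\b([A-Z]+):([0-9A-F]+)")


def parse_trace(lines):
    """Group the memory accesses logged before each instruction line with it."""
    instructions = []
    pending = []  # accesses seen since the previous instruction line
    for line in lines:
        mem = MEM_RE.match(line)
        if mem:
            kind, addr, value = mem.groups()
            pending.append((kind, int(addr, 16), int(value, 16)))
            continue
        ins = INSTR_RE.match(line)
        if ins:
            instructions.append({
                "pc": int(ins.group(1), 16),
                "text": " ".join(ins.group(3).split()),
                "regs": {n: int(v, 16) for n, v in REG_RE.findall(line)},
                "accesses": pending,
            })
            pending = []
    return instructions


SAMPLE = """\
MEMORY reading $FFFF654 = $A9
MEMORY reading $FFFF655 = $00
MEMORY reading $FFFF656 = $85
$F654 A9 00     lda  #$00          A:00 X:22 Y:33 Z:00 SP:01FF P:26 $01=3F  ..E-.IZ.
MEMORY reading $FFFF657 = $20
MEMORY reading $FFFF658 = $A9
MEMORY reading $FFFF658 = $A9
MEMORY writing $0000020 <= $00
$F656 85 20     sta  $20           A:00 X:22 Y:33 Z:00 SP:01FF P:26 $01=3F  ..E-.IZ.
""".splitlines()

for ins in parse_trace(SAMPLE):
    writes = [a for a in ins["accesses"] if a[0] == "writing"]
    print("${:04X} {:10s} writes={}".format(ins["pc"], ins["text"], writes))
```

Because of the pre-fetching, some of the reads grouped with an instruction really belong to the next one, so the grouping is only approximate.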

We can see a few things from this output.

First, the instructions seem to work, as we see the right values end up in the accumulator, and the correct value being written to the correct address.

Second, we can see that there is a dummy read in STA, which is part of the design that allows 48MHz operation.  So for some instructions at least, we don't expect 48x performance.  Some of these might get improved down the track, but some penalty cycles will have to remain.

Third, we can see the 6502-style pre-fetching of the next instruction while the previous instruction is finishing off.

Armed with the ability to produce this kind of trace, I used the TTL6502 test program for 6502 processors, and by examining the simulation output was able to quickly find and fix quite a number of bugs.

The TTL6502 program only tests the original 6502 instructions, not any of the 4502 extensions.  So I have followed a bit of an ad-hoc process of writing little programs that use each of the new instructions, and verifying from the memory trace, register and flag values that all is well.  This has also turned up a great many bugs.
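
Checking flag values by eye can be partly automated.  The Python sketch below is only an illustration with made-up names (flag_string, lda_flags_ok).  The flag layout is an assumption inferred from the two distinct trace lines shown earlier (P:26 printed as ..E-.IZ. and P:A4 as N.E-.I..), with the usual 6502 letters assumed for the positions that never appear set in that sample.

```python
# Assumed layout, bit 7 down to bit 0.  Bit 4 appears to always print as '-'.
FLAG_LETTERS = "NVE-DIZC"


def flag_string(p):
    """Render the status register byte the way the trace appears to print it."""
    out = []
    for i, letter in enumerate(FLAG_LETTERS):
        bit = 7 - i
        if letter == "-":
            out.append("-")
        elif p & (1 << bit):
            out.append(letter)
        else:
            out.append(".")
    return "".join(out)


def lda_flags_ok(a, p):
    """After LDA, N must mirror bit 7 of A and Z must be set only if A is zero."""
    n_set = bool(p & 0x80)
    z_set = bool(p & 0x02)
    return n_set == bool(a & 0x80) and z_set == (a == 0)


# The (A, P) pairs from the two LDA lines in the sample trace.
for a, p in [(0x00, 0x26), (0x91, 0xA4)]:
    print("A:{:02X} P:{:02X} {} ok={}".format(a, p, flag_string(p), lda_flags_ok(a, p)))
```

A similar small check could be written for each new instruction, so that a wrong flag shows up as a failed check rather than something to spot in the log.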

This is more or less where I am at now, fixing bugs with PHW (push word, immediate or absolute) and a few other remaining instructions.  Once that is done, we should hopefully be back to being able to boot the C65 ROM into C64 mode, and then soon after running SynthMark64 to get an idea of the speed of the new CPU.
