Reverse engineering SWD JTAG debugging/flashing protocol for Freescale chips
One of the great breakthroughs of the Arduino has been that to get started, you only need the device itself and the software. The try/fail/try again cycle of development got reduced to altering your code and pressing run (again).
Most other platforms completely fail in this respect. Many chip-manufacturers have excellent and cheap try-out boards that are even pin compatible with the Arduino. However, while Arduino gets many things right – hardware choices and software choices are not two of them. The Atmel AVR series is easy but outdated. The Arduino “IDE” barely beats notepad.exe in functionality and project management.
Chip manufacturers almost invariably fail to recognize that having to struggle through ANYTHING between the compile button and the thing actually running your code is losing them enthusiasm, mindshare and ultimately customers. Because the manufacturers have no idea how to approach this part of the developer experience themselves, and apparently no inclination to get involved, they attempt to extend the Eclipse IDE (with endless pages of settings tabs) or try to become hardware compatible with Arduino.
So when I was recently asked if I would like to host a workshop programming games for the latest incarnation of the microgameboy, I had a problem! There is no real foolproof way to even talk to these chips! The official toolchain only supports a handful of programmer devices, which you have to attach separately using a tiny half-pitch connector that is not easy to find. This will not do for novices.
So part of my “this really should NOT be Rocket Science” quest has become this:
Make developing for a platform like the microgameboy EASY.
Stage one: creating a cheap tool to connect the microgameboy to your pc to update the program memory.
The chip I am using in the microgameboy is made by Freescale (the MKL02Z32 to be precise). This chip belongs to the ARM Cortex M0 family of devices. ARM is a big company that creates standard designs for chips. If you keep to the standard, software written for that standard will run on your device. Luckily, the Cortex standard includes a chapter on the debugging interface. The debugging interface allows anyone to inspect the inner state of the chip and poke around in its system memory. With some clever mangling (and the manual from Freescale – since this bit is outside the Cortex standard) you can trick the chip into updating/rewriting its own flash memory. This is what I’m going to do.
The first step of such an undertaking is (as usual): Homework.
Serial Wire Debug manual by ARM
KL26 manual by Freescale
There are endless stacks of documents to be found that all refer to the debug interface in some way or other.. which always leads me to:
The second step of such an undertaking is (as usual): screw this shit..
I am not going to sit here reading 5000 pages of dense text. I’ll have a look at the actual data instead.
Last year I ordered a whole bunch of tools to deal with inspection of electrical signals. Amongst this set is an Open Workbench Logic Analyzer by GadgetFactory. It allows you to record a whole bunch of signals at the same time, at very high speeds. Using this device, I got this:
Saving this to a file and parsing the file for clock/data state transitions got me this huge array of bits:
data dump from jlink capture:
As you can see, the file contains some obvious patterns, creating diagonal lines in the data dump like 60’s wallpaper. By pressing enter at the diagonals, I separated the data dump into a more logical grouping of bits.
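The step of turning a saved capture into a string of bits can be sketched roughly as below. This is not the code used for the dump in this post. It assumes each sample is one byte with the clock on bit 0 and the data line on bit 1 (a made-up channel assignment), and it leaves the choice of clock edge to the caller, since which edge gives clean data depends on who is driving the line. The name extract_bits is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed channel layout of one logic-analyzer sample (hypothetical). */
#define CLK_MASK  0x01u   /* bit 0: clock line */
#define DATA_MASK 0x02u   /* bit 1: data line  */

/* Walk the samples and emit one '0'/'1' character per clock edge.
   'rising' selects which clock edge triggers a data sample.
   Returns the number of bits written; 'out' is NUL-terminated. */
size_t extract_bits(const uint8_t *samples, size_t count,
                    bool rising, char *out, size_t out_size)
{
    assert(samples != NULL && out != NULL && out_size > 0);
    size_t n = 0;
    for (size_t i = 1; i < count && n + 1 < out_size; i++) {
        bool prev = (samples[i - 1] & CLK_MASK) != 0;
        bool curr = (samples[i] & CLK_MASK) != 0;
        bool edge = rising ? (!prev && curr) : (prev && !curr);
        if (edge)
            out[n++] = (samples[i] & DATA_MASK) ? '1' : '0';
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the raw sample buffer and printing the resulting string gives the kind of wall of ones and zeros shown in the dump.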
Spacing it better
111111111111111111111111111111111 11111111111111111111111011110011110011111111111 11111111111111111111111111111111111111111111111 10110110110110111111111111111111111111111111111 11111111111111111111111111000000000000000010100 10110011101110001010001000001111010000000001010 01011001110111000101000100000111101000001010000 00110011011110000000000000000000000000000000000 01001010110011000000000000000000000000000010100 00000001011000110000000010000000000000000000001 11111010001101100110000111100000000000000000000 00000000000011111001100000000000000000000000000 00000000010111011011001000110000000000111011100 01000000101011110110011000000000001000000000000 00111111010001101100110000000000000000000000000 00000000000000011000101100110100100000000000000 00000110001001000000011010001100110000000000000 10000000000000011111000000011111001100000000000 00000000000000000000000010111101011001100000000 00011111111111111111111101101000110011000011111 11100000000000000001111000000001111100110000000 00000000000000000000000000001011111001100101100 00000000000000000000000000110111110011000000100 10000000000000000000000000101111100110010100000 00000000000000000000000001011110101100100011010 00000000000000000000000010110100011001100100000 00000100000000000000111100000000111110011000000 00000000000000000000000000000101111010110011000 00000001111111111111111111101011010001100110000 11111111100000000000000011111000000011111001100 00000000000000000000000000000000010111110011001 01100000000000000000000000000001101111100110000 00100100000000000000000000000001011111001100101 00000000000000000000000000000010111101011001000 11010000000000000000000000000101101000110011000 10000000001000000000000001111000000001111100110 00000000000000000000000000000000001011110101100 11000000000010111111000000001111110101001011001 11011100010100010000011110100000101000000110011 01111000000000000000000000000000000000001001010 11001100000000000000000000000000001010000000001 
01100011000000001000000000000000000000111111010 00110110011000000000000000000000000000000000000 00001100010110011010000000000000000000000110001 00000000001101000110011000000001011011100000000 00000111100000001111100110000000000000000000000 00000000000001010111101100000000000110001100110 00010000010010110100011001100110000101101110000 00000000011110000000111110011000000000000000000 00000000000000000101011110110000000000000000001 01000000101111101010000001100110111100000000000 00000000000000000001000110110011000000000000000 00000000000000000011010001100110000000000000100 00000000000001110110001011001101001000000000000 00000001100010011101110110011010000000000000000 00000000000000100000000101111011000000000000000 00000000000000000000101000000110011011110000000 00000000000000000000000100011011001100000000000 00000000000000000000001101000110011000011111011 01110000000000000111111000101100110100100000000 00000000000110001001111110011000000000000000000 00000000000000000101011110110010000000000000000 00000001000000001010000001100110111100000000000 00000000000000000001000110110011000000000000000 00000000000000000011010001100110000000000000100 00000000000001110110001011001101001000000000000 00000001100010011111100110000000000000000000000 00000000000001010111101100000001000000000000000 00000000000110100000011001101111000000000000000 00000000000000010001101100110000000000000000000 00000000000000110100011001100111111101101110000 00000000011111100010110011010010000000000000000 00011000100111111001100000000000000000000000000 00000000010101111011000000000000000000000000001 00000001101000000110011011110000000000000000000 00000000000100011011001100000000000000000000000 00000000001101000110011000000000000100000000000 00000111011000101100110100100000000000000000001 10001001111110011000000000000000000000000000000 00000101011110110000000000000000000000000000000 10011010000001100110111100000000000000000000000 00000001000110110011000000000000000000000000000 
00000011010001100110000000000001000000000000000 01110110001011001101001000000000000000000011000 10011101110110011100000000000000000000000000001 00000000000101111011000000000000000000000000000 00000000101000000110011011110000000000000000000 00000000000100011011001100000000000000000000000 00000000001101000110011000110000000111111110000 00000111111000101100110100100000000000000000001 10001001111110011000000000000000000000000000000 00000101011110110000000000000000000000000000000 00001010100101100111011100010100010000011110100 00010100000011001101111000000000000000000000000 0
Almost repetitive. I did not manage to create a straight line, but the new grouping again revealed a pattern in the data. I remembered reading about the packet layout somewhere on the first pages of the manuals. The packets should be 8 bits, a pause, 3 bits, an optional pause, 33 bits and another optional pause. Let’s see how that fits the data:
Syncing it up
10100101 100 11101110001010001000001111010000 0
10100101 100 11101110001010001000001111010000 0
10000001 100 11011110000000000000000000000000 0
10010101 100 11000000000000000000000000000010 1
10110001 100 00000010000000000000000000001111 1
10001101 100 11000011110000000000000000000000 0
11111001 100 00000000000000000000000000000000 0
11101101 100 10001100000000001110111000100000 0
10111101 100 11000000000001000000000000001111 1
10001101 100 11000000000000000000000000000000 0
11000101 100 11010010000000000000000000110001 0
11010001 100 11000000000000010000000000000011 1
11111001 100 00000000000000000000000000000000 0
11110101 100 11000000000001111111111111111111 1
11010001 100 11000011111111000000000000000011 1
11111001 100 00000000000000000000000000000000 0
11111001 100 10110000000000000000000000000000 1
11111001 100 00001001000000000000000000000000 0
11111001 100 10100000000000000000000000000000 0
11110101 100 10001101000000000000000000000000 0
11010001 100 11001000000000010000000000000011 1
11111001 100 00000000000000000000000000000000 0
11110101 100 11000000000011111111111111111111 0
11010001 100 11000011111111100000000000000011 1
11111001 100 00000000000000000000000000000000 0
11111001 100 10110000000000000000000000000000 1
11111001 100 00001001000000000000000000000000 0
11111001 100 10100000000000000000000000000000 0
11110101 100 10001101000000000000000000000000 0
11010001 100 11000100000000010000000000000011 1
11111001 100 00000000000000000000000000000000 0
11110101 100 11000000000010111111000000001111 1
Way better again. Especially the centre column of 100s is looking good! This is the ACK-message of the SWD-standard.
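To read the columns as numbers, it helps to know that SWD puts the least significant bit on the wire first. The sketch below converts a column into a value and checks the trailing parity bit of the 33-bit group. The function names are made up for this illustration. With it, the centre column "100" comes out as 1 (the OK response) and the data column of the first packet comes out as 0x0BC11477.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a string of '0'/'1' characters, in the order they appeared
   on the wire, into a number. The first character is bit 0. */
uint32_t lsb_first(const char *bits, size_t len)
{
    assert(bits != NULL && len <= 32);
    uint32_t value = 0;
    for (size_t i = 0; i < len; i++)
        if (bits[i] == '1')
            value |= (uint32_t)1 << i;
    return value;
}

/* True when the parity bit matches the data word: the parity bit is
   '1' when the 32-bit word contains an odd number of ones. */
bool data_parity_ok(uint32_t word, char parity_bit)
{
    unsigned ones = 0;
    for (int i = 0; i < 32; i++)
        ones += (word >> i) & 1u;
    return (ones & 1u) == (unsigned)(parity_bit == '1');
}
```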
Parsing it
To parse it I had to look at the manual again *sigh*. There are some bits in the protocol that allow the chip to do sanity-checking on the signal: start bit, stop bit, park bit and parity bit. Checking for these bits allowed me to sync the signal even better. More reading of the manual gave me the names of the various registers and bits: read bit, write bit, debug bit, ACK, error and all that jazz.
Behold the interpreted bit stream of a commercial programming box talking to a Freescale-chip:
needle (0x0BC11477) in LSB first format: 11101110001010001000001111010000
found needle!
skipped: 11111111111111111111111111111111111111111111111111111111011110011110011111111111111111111111111111111111111111111111111111111
AP R ??????
skipped: 11111111111111111111111111111
DP R ID
DP R ID 0x0000001E 10000001 100 01111000000000000000000000000000 0
DP W CTRLSTAT -> 0x50000000 10010101 100 00000000000000000000000000001010 1
DP R CTRLSTAT 0x000000F0 10001101 100 00001111000000000000000000000000 0
AP R DATAREAD
AP R ??????
DP R RDBUFF 0x00000000 10001101 100 00000000000000000000000000000000 0
AP W CSW -> 0x23000012 11000101 100 01001000000000000000000011000100 0
AP W TARGET -> 0xF0002000 11010001 100 00000000000001000000000000001111 1
AP R DATAREAD
AP R TARGET 0xF0000FF0 11010001 100 00001111111100000000000000001111 1
AP R DATAREAD
AP R DATAREAD
AP R DATAREAD
AP R DATAREAD
AP R TARGET
Download the code/program/dumps HERE.
Coding it
Now that I had captured a real bit of setup-code that actually performed the thing I needed to do on the chip, I could rewrite it! With some tweaking, head-banging and by using the logic analyser again to check if my own output matched the official output – I managed to get something working. Now I have a way to dump the contents of the memory of any Cortex M0 device with an enabled SWD subsystem:
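uint32_t Peek2(uint32_t address)
{
    Write(false, DP_W_ABORT, 0x1e);   // clear the sticky error flags
    Write(false, DP_W_SELECT, 0);     // select access port 0, register bank 0
    Write(true, AP_TAR, address);     // address of the word we want to read
    // 0x23000012 = SIZE_32 | AUTOINC_SINGLE | ((uint32_t)1 << 24) | ((uint32_t)1 << 25) | ((uint32_t)1 << 29)
    Write(true, AP_CSW, 0x23000012);
    Read(true, AP_DRW);               // AP reads are posted: this only starts the transfer
    return Read(false, DP_R_RDBUFF);  // the actual value arrives via RDBUFF
}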
Calling this in a loop now gives the following output on my MKL02Z32 test device:
SWD-DP id: 0x0BC11477
Cortex M0 identified!
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFF7E
D0342900 22002301 4288B410 2401D32C
42A10724 4281D204 0109D202 E7F8011B
42A100E4 4281D204 0049D202 E7F8005B
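The magic number 0x23000012 written to the CSW register in Peek2 can be rebuilt from the flags named in its comment. The sketch below assumes the usual MEM-AP layout, with the transfer size in bits 2:0 and the address increment mode in bits 5:4. The flag values and the helper name make_csw are assumptions for illustration, and bits 24, 25 and 29 are copied from the comment as they stand.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for the flags named in the Peek2 comment. */
#define SIZE_32         ((uint32_t)0x2)        /* 32-bit transfers */
#define AUTOINC_SINGLE  ((uint32_t)0x1 << 4)   /* bump the address after each access */

/* Combine size and increment mode with the three high bits that the
   captured setup sequence also sets. */
uint32_t make_csw(uint32_t size, uint32_t autoinc)
{
    assert((size & ~(uint32_t)0x7) == 0);   /* size must fit in bits 2:0 */
    return size | autoinc
         | ((uint32_t)1 << 24)
         | ((uint32_t)1 << 25)
         | ((uint32_t)1 << 29);
}
```

Calling make_csw(SIZE_32, AUTOINC_SINGLE) yields 0x23000012, the same value that shows up in the "AP W CSW" line of the interpreted capture.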
Up next: writing hex-files to the built-in flash memory. To be continued..