The two main limitations of the Microchip compiler are that it doesn't allow GCC's -O3 level optimizations and won't compile to the MIPS16e instruction set unless you pay for the full version. The loss of MIPS16e is a real bummer since it saves a lot of program memory by using 16-bit instructions instead of 32-bit ones, much like ARM's Thumb instruction set. Microchip claims that programs in this format are 40% smaller without losing much performance, which would be great for one of my projects. The standard MIPS GCC has none of these limitations, but it also doesn't come with any headers or other files specific to the PIC32. This got me thinking about how I could combine the two tools and use MIPS GCC on Linux to generate part of the program. A few tests showed that Microchip's compiler will accept assembly files written for either the 16-bit or 32-bit instruction set, and the linker will accept object files for either instruction set too. This way, all the chip-specific parts of the program, like initial setup, can be compiled with Microchip's compiler while the bulk of the program can be compiled externally with optimizations and linked in later.
One neat thing about Microchip's IDE and compiler is that it includes a simulator that lets you single-step through the program and see registers and memory. Unfortunately, the source code view in the simulator does not work very well. It duplicates source lines and often highlights the wrong line of assembly whether the source is assembly, optimized C, or unoptimized C. As an alternative, I found instructions for setting up qemu-mipsel-static for simulating and gdb-multiarch for debugging on Linux, and the combination works really well! (MIPS also provides similar instructions.) Since the programs are compiled with mipsel-linux-gnu-gcc, it's possible to use Linux syscalls to output to the screen and read and write files. Even though the final PIC32 program won't have access to syscalls, they came in handy several times for debugging. The compiled MIPS programs run right from the command line as if they were native since the shell invokes QEMU automatically. On Ubuntu 22, it wasn't even necessary to install binfmt, as mentioned in the instructions above, to get this functionality. (Edit: binfmt requires systemd to initialize, so Windows Subsystem for Linux 2 requires sudo update-binfmts --enable after restart.)
Configuring gdb-multiarch correctly was a bit of a challenge because it needs to know which architecture it's debugging, but none of the obvious arguments to "set arch" put it in the right mode. It turns out "set arch" with no argument shows all supported architectures, including mips:isa32r2, which is what runs in the PIC32. Another useful command is "layout regs", which puts gdb in a TUI mode that works extremely well. The one drawback so far is that gdb doesn't recognize when the program emulated in QEMU switches into MIPS16e mode, so it doesn't allow single-stepping in that mode. Once all of this was set up, I was able to make a lot of progress on PIC32 programming.
After working with other embedded assembly languages like 6502, 8051, and MSP430, it was really fun to try MIPS, which is so much more convenient to program in. Lots of 32-bit wide registers and the three-register instruction format are a huge improvement over 8-bit architectures. On the other hand, there are some weird things about MIPS like the delay slot and the lack of a flags register. The weirdest part, though, is how assembly works with the GNU assembler, as. Various macros synthesize operations, such as loading an immediate into a register, that can't always be done with one instruction. Some instructions like div also function like macros and output several instructions, which only became apparent after seeing the assembly listing. After looking up the bit patterns for the generated bytes and decoding the instructions, I was confused that the instructions emitted for the division seemed incomplete. As it turns out, the assembly listing only shows the first few instructions laid down by a macro, so the command line argument --listing-cont-lines is needed to increase the number of lines shown and see the bytes for all the generated instructions. Translating those bytes into instructions to see what was going on was tedious, so I fed them through the rasm2 tool of radare2 to speed the process up. This worked all right on Ubuntu 21 where radare2 installed fine, but it failed to install on a fresh copy of Ubuntu 22, so I made a Python script to add instruction decoding to assembly listings:
Bytes     Source                Decoded instruction
========  ====================  ===========================
00000000  nop                 | sll $zero, $zero, 0
42000824  li $t0, 0x42        | addiu $t0, $zero, 0x0042
20400901  add $t0, $t1        | add $t0, $t0, $t1
05000821  add $t0, 5          | addi $t0, $t0, 0x0005
3412083C  lw $t0, 0x12345678  | lui $t0, 0x1234
7856088D                      | lw $t0, 0x5678($t0)
02002015  div $t0, $t1        | bne $t1, $zero, 0x00008
1A000901                      | div $t0, $t1
0D000700                      | break
FFFF0124                      | addiu $at, $zero, -0x0001
04002115                      | bne $t1, $at, 0x00010
0080013C                      | lui $at, 0x8000
02000115                      | bne $t0, $at, 0x00008
00000000                      | sll $zero, $zero, 0
0D000600                      | break
12400000                      | mflo $t0
- The li macro uses addiu with the zero register to load constants that fit into 16 bits rather than an ori instruction like some examples show.
- Instructions like add, and, xor, etc. take registers as arguments rather than immediates but will quietly generate the addi, andi, xori, etc. variants if an immediate argument is supplied.
- Instructions like lb, lh, and lw generally take an offset and a register holding an address as their second argument, but they will also accept a bare address instead, in which case the assembler loads the address for you before fetching the data it points to.
- The div instruction adds extra instructions that trap with break on division by zero and on the overflow case where the divisor is -1 and the dividend is 0x80000000.
- Three-register instructions can generally be shortened to two registers if the destination and first source are the same.
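Two of the expansions in the listing can be modelled in a few lines of Python to make the arithmetic concrete. This is only a sketch: the names split_address and checked_div are made up for illustration, and the code mimics what the listing shows rather than how the assembler is actually implemented. The adjustment for low halves of 0x8000 and above is not exercised by the listing; it is included on the assumption that it matters because lw sign-extends its 16-bit offset.

```python
def split_address(address):
    """Split a 32-bit address into the lui immediate and the lw offset."""
    low = address & 0xFFFF
    # lw sign-extends its offset, so a low half of 0x8000 or more acts as a
    # negative number and the high half must be bumped by one to compensate.
    high = ((address + 0x8000) >> 16) & 0xFFFF
    return high, low


def checked_div(dividend, divisor):
    """Model the checks in the div expansion on signed 32-bit values."""
    if divisor == 0:
        raise ZeroDivisionError("break 7: division by zero")
    if divisor == -1 and dividend == -0x80000000:
        raise OverflowError("break 6: quotient does not fit in 32 bits")
    # MIPS div truncates toward zero, unlike Python's floor division.
    quotient = abs(dividend) // abs(divisor)
    return -quotient if (dividend < 0) != (divisor < 0) else quotient


print([hex(part) for part in split_address(0x12345678)])
print([hex(part) for part in split_address(0x1234ABCD)])
print(checked_div(-7, 2))
```

The first call reproduces the 0x1234 and 0x5678 halves seen in the lui and lw pair above, while the second shows a case where the high half has to be rounded up.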
The Python script to do the decoding was really easy to set up. As with other projects, I relied on a spreadsheet to quickly organize the data, then used spreadsheet formulas to generate a Python dictionary to paste directly into the script. After a while, I also added an option to decode a single instruction from the command line in big- or little-endian mode instead of decoding a whole listing file:
$ mips-decode -el 42000824
addiu t, s, i 001001ssssstttttiiiiiiiiiiiiiiii
addiu $t0, $zero, 0x0042 00100100000010000000000001000010
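The pattern string in that output suggests how such a decoder can work: fixed bits must match exactly, and the letters mark operand fields. Here is a minimal sketch of the idea, not the actual script. The PATTERNS table holds only three hand-written entries, the function names are hypothetical, and immediates are printed unsigned, so negative values and branch offsets are not formatted the way the listing shows them.

```python
# Conventional MIPS register names, indexed by register number.
REGISTERS = ("zero at v0 v1 a0 a1 a2 a3 t0 t1 t2 t3 t4 t5 t6 t7 "
             "s0 s1 s2 s3 s4 s5 s6 s7 t8 t9 k0 k1 gp sp fp ra").split()

# Illustrative table: (mnemonic, operand order, 32-character bit pattern).
PATTERNS = [
    ("addiu", "t, s, i", "001001ssssstttttiiiiiiiiiiiiiiii"),
    ("lui", "t, i", "00111100000tttttiiiiiiiiiiiiiiii"),
    ("add", "d, s, t", "000000ssssstttttddddd00000100000"),
]


def to_bits(hex_bytes, little_endian=True):
    """Turn eight hex digits from a listing into a 32-character bit string."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 4:
        raise ValueError("expected exactly four bytes")
    word = int.from_bytes(raw, "little" if little_endian else "big")
    return format(word, "032b")


def match(bits, pattern):
    """Return the field bits keyed by letter, or None if a fixed bit differs."""
    fields = {}
    for bit, symbol in zip(bits, pattern):
        if symbol in "01":
            if bit != symbol:
                return None
        else:
            fields[symbol] = fields.get(symbol, "") + bit
    return fields


def decode(hex_bytes, little_endian=True):
    """Decode one instruction, or return None if no pattern matches."""
    bits = to_bits(hex_bytes, little_endian)
    for mnemonic, operands, pattern in PATTERNS:
        fields = match(bits, pattern)
        if fields is None:
            continue
        parts = []
        for name in operands.split(", "):
            value = int(fields[name], 2)
            # "i" is an immediate; every other letter names a register field.
            parts.append(f"0x{value:04x}" if name == "i"
                         else "$" + REGISTERS[value])
        return f"{mnemonic} {', '.join(parts)}"
    return None


print(decode("42000824"))                       # little-endian listing bytes
print(decode("24080042", little_endian=False))  # same word, big-endian
```

Keeping the patterns as data rather than code is what makes the spreadsheet approach practical, since adding an instruction only means adding a row.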
The last thing to figure out was a set of strange memory access instructions that became apparent after finishing the decoding script. The la instruction for loading an address into a register would load a value relative to $gp, which is the global pointer, instead of loading a fixed address as an immediate. All the jump instructions were replaced by relative branches too. These weird replacements were due to GCC compiling to position-independent code by default. Passing -mno-abicalls and -mno-shared as arguments to GCC produces non-position-independent code that works on the PIC32. A few short tests with Microchip's compiler show that it produces the same type of code.
Another strategy I investigated was running Linux for MIPS in a virtual machine to see if using native tools like gcc and gdb would be more convenient in any way than cross compiling. Debian Linux seemed like the best candidate since it's related to Ubuntu and is one of the few distributions that still supports MIPS. Setting this up turned out to be much more difficult than expected. My usual approach of supplying a virtual machine with minimal command line arguments for a virtual hard drive, CPU type, and path to an installation ISO did not work at all with QEMU. Lots of head scratching and googling led me to links here and here describing the process, but both links are outdated and don't work with the latest version of Debian. A more recent link here has broken links for Debian 9.1 and incomplete instructions, although a comment further down the page explains the missing step of pulling initrd.gz off the virtual hard drive after installation. After a lot more effort than I intended to spend on this, Debian 11.5 for MIPS finally works in QEMU. The results, however, were totally underwhelming as the overhead of emulation is really apparent even for simple operations. None of the acceleration options like kvm or xen are available since there is no hardware acceleration for MIPS on my x86 Linux server. Summing all the integers from 0 to 999,999 in Python takes about 2.4 seconds in the emulated system compared to only 0.09 seconds when run natively. This type of oversimplified benchmark is next to meaningless but enough to show that the emulated MIPS system is just way too slow to be useful.
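For anyone wanting to repeat the comparison, a benchmark along these lines would do. This is a guess at the shape of the test, not the script that produced the timings above: the explicit loop and the sum_integers name are assumptions, and a version using the built-in sum would likely time differently.

```python
import time


def sum_integers(limit):
    """Add up the integers from 0 to limit - 1 with a plain loop."""
    total = 0
    for number in range(limit):
        total += number
    return total


start = time.perf_counter()
total = sum_integers(1_000_000)
elapsed = time.perf_counter() - start
print(f"sum = {total}, took {elapsed:.2f} s")
```

Running the same file natively and inside the emulated Debian system gives the two numbers to compare.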
There are a few other options I haven't tried. chipKIT has a compiler package for programming PIC32s which is based on GCC, as I understand it, and not locked down like Microchip's compiler. Another thing to try is passing all of the individual options enabled by -O3 to Microchip's compiler, which is supposed to have the same effect as -O3 itself. Finally, while I've downloaded the Microchip compiler for Linux, I haven't tried compiling with it since I would still need to switch over to Windows to write the compiled firmware to the chip. Hopefully, a project generated by the IDE on Windows will compile with the Linux command line version if that's ever necessary.