Sunday, January 2, 2022

Linux for Embedded Development

A little over a year ago, I bought a new Casio FX-9750GIII graphing calculator. It really appealed to me for several reasons, so I decided to buy one even though my days of collecting calculators are mostly over. One of the main things I like is that it comes with Python, which has become one of my favorite programming languages. Another interesting thing is that the processor speed was doubled to 59MHz over the preceding FX-9750GII, making it as fast as the FX-9860GII and the FX-CG50. Amazingly, there is a port of Xcas for it, so you can run the same Computer Algebra System (CAS) used in the HP Prime, TI-89, and TI-Nspire. The styling of this one has changed back to the standard rectangular design, which is much better than the awkward and ugly rounded design Casio used for some of the preceding models. All of this is very impressive and it only costs $39! Best of all, it can run programs written in C and assembly like previous Casio models. Other modern calculators like the HP Prime, TI-Nspire, and TI-84+ CE don't allow native code execution, which is why I've stayed away from them. The catch with the FX-9750GIII is that the GCC toolchain put together by the community only runs on Linux. It took quite a bit of work getting the toolchain installed on Ubuntu in a virtual machine since I didn't have much experience with Linux. A few months ago, I moved everything over to an old server that someone donated. Below are some things I've learned while doing more and more of my embedded projects on Ubuntu.

Virtual Machine
At first, I installed Ubuntu Server on VirtualBox so I could use the GCC toolchain for the FX-9750GIII. It took quite a while to install and then some more work to get things like networking and a shared folder with the host to work. The GCC toolchain turned out to be very problematic to install, which was not entirely due to my lack of experience with Linux. Thankfully, the maintainer of the toolchain put in a lot of work helping me get it working on the Planet Casio forum. One of the first problems was trying to get the scripts working without understanding how the various types of shells work in Linux, which is very different from the Windows command prompt I'm used to using. There is a package called GiteaPC that the maintainers put together that is supposed to install everything, but this didn't work for me at first either. Other problems were beyond my control, such as not being able to update CMake to the version the toolchain requires. For this, I had to update Ubuntu from 20.04 LTS to 21.04, which does not have long term support. CMake also had problems finding the PATH variables. Eventually, I did get the toolchain running, though it failed when I tried to install it again sometime later since the maintainers had changed the toolchain's structure so it no longer built correctly. In the end, I did get GiteaPC working, again with the help of the maintainers, though I have to say the weeks-long process of just getting to Hello, World was pretty awful. All of this led me to reinstalling Ubuntu several times on VirtualBox.

Thursday, December 30, 2021

Kitty Salon

For Christmas this year, my girlfriend and I made a talking toy cat for her nephew. It was a fun opportunity to build something together and practice some fabrication skills that are useful for calculator projects.

The idea for this project came from her nephew pretending to be a cat and calling himself "Kitty Salon." The toy version of Kitty Salon has five recorded messages saying things like "Meow! Meow!" and "Rise and shine!" A button on the bottom plays one of the messages every time it's pressed, cycling through all five. Two LEDs for eyes glow to show that the cat is on and fade on and off while the cat is talking.

Body
The first step to building the cat was sculpting a model out of clay. The innermost part of the model is a small box made out of foamcore which is large enough to hold all the electronics. The first try at making the cat out of clay from the craft store failed. The box the clay came in said that it dries when exposed to water. In our case, the model cracked in several places after drying and the feet fell off. The second try used Sculpey which doesn't require water to set and has the added benefit of working well with silicone rubber molds. You can see in the picture above that the back end of the cat is a little square and fat since the Sculpey is molded around the box for the electronics.

Wednesday, July 21, 2021

6507 Graphing Calculator: Update

After a few months of progress, a demo version of this project is on my website. You can try out all the functionality of the calculator in your browser since it runs on my 6502 emulator. Below are some updates since the last post.

Software
The functionality of the calculator is close to complete. The Forth system is basically done and supports these new structures: IF/ELSE/THEN, DO/LOOP, BEGIN/AGAIN, and BEGIN/UNTIL. The only part missing from the Forth system is a way to look at the words and variables. Most Forths have a word called WORDS for this. My plan is to have a very simple list-based browser that will let the user delete words and will handle garbage collection. Logical operations like AND and OR have been implemented as well as CORDIC-based functions like SIN, COS, and TAN.
The firmware running in simulation is now a little more than 9KB, which is too big to fit on the calculator's 8KB EEPROM. The CORDIC routines especially take up a lot of room. I'll need to shave at least 2KB off of the firmware to get everything to fit and have room to implement the last few remaining features. My previous plan was to rewrite parts of the firmware in Forth. Now, I plan to also implement a byte-based virtual machine for floating point registers which should make the firmware much smaller and hopefully not sacrifice much speed. 
Features I plan to add to the firmware:
  • Word browser with garbage collection
  • Simple graphing
  • Improved numerical output on the stack display
  • New words: ATAN, ACOS, ASIN, MOD, ^, E^, LN
Hardware
There haven't been many updates to the hardware part of this project. The voltage regulator is a switching boost converter that generates 5V from three lithium AA batteries. These have good battery life for high current drain applications like this and also don't leak and damage the device the way alkaline batteries can. The boost converter will let me use the batteries down to a fairly low voltage, whereas a buck converter with four AAs would stop working when the batteries hit 1.3V and still have a lot of charge left. Picking the right capacitors and inductors for the regulator was a little tough since the exact ones listed in the datasheet have been discontinued. The few I picked to try seem to work, but the voltage is too high when I run the regulator on a breadboard, so I'll have to wait until I solder it down to the circuit board to be sure.

Website
The new page on my website for the project has the firmware running in my 6502 emulator with documentation for the project below it. Some of the documentation sections, like the word list, took quite a bit of HTML and CSS to arrange, so I made a simple markup system to generate the pages. The information for the pages is stored in a markup text file that is easy to read and edit. A Python script reads the file and generates all the HTML files which would have taken a very long time to write by hand. I'm not sure what tools people generally use for this type of job. The system I made was a fun sub-project and really saved a considerable amount of time.
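To give a feel for how small such a generator can be, here is a minimal sketch. The markup syntax is purely hypothetical ("# " for a heading, "* " for a list item, anything else a paragraph) and is not the format the real script reads; the function name render is also made up.

```python
import html


def render(markup):
    """Convert a tiny, made-up markup format into an HTML fragment."""
    out = []
    in_list = False
    for raw in markup.splitlines():
        line = raw.strip()
        is_item = line.startswith("* ")
        # Close an open list as soon as a non-item line shows up.
        if in_list and not is_item:
            out.append("</ul>")
            in_list = False
        if not line:
            continue
        if line.startswith("# "):
            out.append(f"<h2>{html.escape(line[2:])}</h2>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"  <li>{html.escape(line[2:])}</li>")
        else:
            out.append(f"<p>{html.escape(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


print(render("# Words\n* DUP\n* SWAP\nStack words."))
```

Escaping every piece of text with html.escape keeps characters like < and & in word names from breaking the generated page.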

There were also a few small updates to the 6502 emulator that the calculator firmware is running on. One is a check for indirect jumps where the low byte of the address is 0xFF. There is a well known bug in the original NMOS 6502 where those addresses are calculated incorrectly. It turns out that the new check I added to the emulator is not really necessary since assemblers do their own check, but I'll leave it in for now. The HTML layout of the emulator pages is also more modular and cleaned up so it's easier to add new projects that use the emulator.
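For anyone who hasn't run into the indirect jump bug, it can be sketched in a few lines of Python. This is only an illustration (the emulator itself is JavaScript, and the function name and memory contents here are made up): when the pointer sits at the last byte of a page, the NMOS part fetches the high byte of the target from the start of the same page instead of the next page.

```python
def read_indirect_target(memory, pointer, nmos_bug=True):
    """Return the 16-bit target address of JMP (pointer)."""
    low = memory[pointer]
    if nmos_bug and (pointer & 0x00FF) == 0x00FF:
        # NMOS 6502: the pointer increment does not carry into the
        # high byte, so the read wraps to the start of the same page.
        high_address = pointer & 0xFF00
    else:
        high_address = (pointer + 1) & 0xFFFF
    high = memory[high_address]
    return (high << 8) | low


memory = bytearray(0x10000)
memory[0x02FF] = 0x34  # low byte of the target
memory[0x0300] = 0x12  # where the high byte should come from
memory[0x0200] = 0x56  # where the NMOS part actually reads it

print(hex(read_indirect_target(memory, 0x02FF, nmos_bug=True)))   # 0x5634
print(hex(read_indirect_target(memory, 0x02FF, nmos_bug=False)))  # 0x1234
```

A check like the one described above only needs the first condition: flag any indirect jump whose pointer has a low byte of 0xFF.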

Conclusion
Now that this project is at a stopping point, I'm going to shift my attention to a new project that I'm working on for my resume. This won't take as long to build as a calculator, so hopefully I can shift back to this project when that one is finished.

Monday, January 25, 2021

New Project: 6507 Graphing Calculator

Despite the decision in 2019 to finish up some lingering projects before beginning anything new, I did start working on a new calculator project. One of the members on the 6502 forum organized a contest about a year ago for projects using the 6507 processor, so I decided to make an exception and build this calculator as an entry. Most of my free time in 2020 was spent working on my Robot Game project before making a lot of progress on this project. Rather than a series of progress update posts like my other calculator projects have, this is one long post about the project so far. Some of this information is already available in the 6502 forum post on the project. 

Contest
The contest started in January 2020. The goal is to build something using a 6507 processor, 6532 RIOT, which stands for RAM, IO, and Timer, and a 2KB 2716 EPROM. After some discussion, a larger ROM of up to 8KB was allowed since it would be difficult for most people to program a 2716. The limit seems to apply to the total memory, so I may have accidentally disqualified myself by adding 2KB SRAM to the 8KB ROM. The 6507 and 6532 run at 1MHz and are both NMOS parts, so they consume a lot of current compared to the newer CMOS 65C02. The pair was used together in products like the Atari 2600. The 6507 is a reduced version of the 6502 and has 28 pins instead of 40, so it can only address 8KB of memory rather than the usual 64KB. It's also missing the IRQ and NMI interrupt pins. These limitations make it interesting to work with. The 6532 has 128 bytes of RAM, two IO ports, and a programmable timer, which is a lot of useful stuff to have in one package.

Hardware
The first 2KB of address space is occupied by RAM. Because the 6507 can only address 8KB, the last 4KB of address space is banked for the ROM. The 6532 takes 256 bytes of space leaving 1.75KB in the middle of the memory map that is assigned to ROM. Having a fixed amount of ROM there that isn't banked simplifies handling the banks and initializing the system. Here is the memory map:
  • 0x0000 - 0x07FF: 2KB RAM
  • 0x0800 - 0x08FF: RIOT
  • 0x0900 - 0x0FFF: 1.75KB fixed ROM
  • 0x1000 - 0x1FFF: 4KB banked ROM
The address decoding is done with an ATF16V8C GAL, which is similar to a CPLD, though less capable. The WinCUPL software for programming these is atrocious and crashes frequently, but setting up the design for the GAL itself was pretty simple. The chips can be programmed just fine with the TL866II+ programmer that I use for EEPROMs. According to the datasheet, the ATF16V8C can draw over 100mA, so I also got some lower power variants to try to see if they save power.
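The memory map above boils down to a handful of range checks. Here is a rough Python model of the decoding, useful mostly for checking the boundaries; the region names and the decode function are assumptions for illustration, and the real logic lives in the GAL equations, which are not shown here.

```python
ADDRESS_MASK = 0x1FFF  # the 6507 has 13 address lines, so 8KB in total


def decode(address):
    """Return the name of the device selected for a CPU address."""
    address &= ADDRESS_MASK
    if address <= 0x07FF:
        return "RAM"
    if address <= 0x08FF:
        return "RIOT"
    if address <= 0x0FFF:
        return "FIXED_ROM"
    return "BANKED_ROM"


for probe in (0x0000, 0x07FF, 0x0800, 0x0900, 0x0FFF, 0x1000, 0x1FFF):
    print(f"0x{probe:04X} -> {decode(probe)}")
```

The fixed ROM window spans 0x0900 to 0x0FFF, which is 0x700 bytes, or the 1.75KB mentioned above.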

The display is a 128x64 monochrome LCD compatible with the KS0108. This is the first time I've used a display like this and it works pretty well so far. The header pins are on the bottom edge of the screen, so it has to be turned upside down to fit the system I'm building. This means the font data has to be written upside down and backwards, which complicates things a little. It wasn't easy to find a completely free 5x8 font, so I made one myself. Later, I experimented with different styles to see what looks best:


The system would not start reliably on a breadboard, and I eventually tracked the problem down to sagging voltage on the breadboard. The problem was actually the jumper wires used to carry the supply current, not the breadboard itself. Soldering everything onto protoboard fixed the startup problems and everything worked reliably after that:


There is still a good amount of work to do even though the hardware is pretty far along. The keyboard needs to be soldered and connected to one of the latches. There are a few possibilities for making the labels for the buttons that still need to be worked out. Another big part still missing is the power supply. The first plan was to run the system on four AA batteries, though this would waste a lot of power using a linear regulator or even a buck regulator since a lot of power would be left in the batteries by the time the voltage drops too low to use with a regulator. Instead, three AAs with a boost regulator will allow me to use more of the batteries' charge and also use rechargeables, which wouldn't be possible with the buck regulator. My plan is to use a MAX756 boost regulator, though I still need to set it up and test it. One cool thing about the chip is that it has a low battery indicator, which is something I was planning on trying to implement myself. The power supply also needs a way to be switched on and off by software control, so the keypad can have an on/off key rather than a separate power switch. This is really important so the system has a chance to save its state before powering itself down. The last piece of the power supply system is a way to put the processor to sleep when it isn't needed. This can be done with the RDY pin. Hopefully, the timer and interrupt system of the RIOT can be combined with the GAL and software control to save energy while the calculator is between key presses.

Software
The interface is built around a Forth-style system with an 8 level stack of 8 byte objects. Unlike most Forths, there are three different types: floats, strings, and hex. The floats use 6 bytes for a 12 digit BCD significand and 2 bytes for the exponent and sign. Strings can take up all 8 bytes. Hex objects hold a 16-bit unsigned integer that can be used for memory addressing and come in two types: smart and raw. The raw hex type just stores a single 16-bit number. The smart hex type holds a base address, offset, and calculated sum of the two. The offset comes from any value added or subtracted to the object. This lets pointers be garbage collected since only the base needs to be updated then the existing offset can be applied to generate the sum. Garbage collection is also not a common feature in most Forths, but is definitely necessary for a Forth-based calculator with only 2KB of RAM to be viable. 
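The smart hex idea is easier to see in code. Below is a minimal Python sketch, assuming a class called SmartHex with add and relocate methods; those names, and the field name total for the calculated sum, are invented for illustration and say nothing about how the firmware lays the object out in its 8 bytes.

```python
from dataclasses import dataclass


@dataclass
class SmartHex:
    base: int        # address of the block the pointer refers to
    offset: int = 0  # running total of everything added or subtracted
    total: int = 0   # cached base + offset, wrapped to 16 bits

    def __post_init__(self):
        self._update()

    def _update(self):
        self.total = (self.base + self.offset) & 0xFFFF

    def add(self, amount):
        """Pointer arithmetic only ever changes the offset."""
        self.offset += amount
        self._update()

    def relocate(self, new_base):
        """Called when garbage collection moves the block."""
        self.base = new_base & 0xFFFF
        self._update()


pointer = SmartHex(base=0x0400)
pointer.add(6)
pointer.add(-2)
print(hex(pointer.total))   # 0x404

pointer.relocate(0x0380)    # the block moved down by 0x80 bytes
print(hex(pointer.total))   # 0x384
```

Because the offset is stored separately, the collector never has to guess which part of a raw 16-bit value was the original address.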

The Forth system is token based to conserve memory. The header of each word describes how many objects of which type the word expects on the stack. The dispatcher can check those objects and prevent stack underflow, which saves a lot of space compared to doing the checks in the word itself. Marking each word and piece of data with a token allows for garbage collection, since every piece of data that is a pointer which may change can be identified and updated when needed. 
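As a rough illustration of the dispatcher check, here is a Python sketch. The Word class, the type tags, and the single float-only "+" are all assumptions made for the example; the real firmware is 6502 assembly and its header format is not shown in this post.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

FLOAT, STRING, HEX = "float", "string", "hex"


@dataclass
class Word:
    name: str
    expects: Tuple[str, ...]         # required types, deepest item first
    action: Callable[[list], None]   # body of the word, no checks inside


def dispatch(word, stack):
    """Validate the stack against the word's header, then run the word."""
    needed = len(word.expects)
    if len(stack) < needed:
        raise IndexError(f"{word.name}: stack underflow")
    found = tuple(kind for kind, _ in stack[-needed:]) if needed else ()
    if found != word.expects:
        raise TypeError(f"{word.name}: expected {word.expects}, found {found}")
    word.action(stack)


def float_add(stack):
    _, right = stack.pop()
    _, left = stack.pop()
    stack.append((FLOAT, left + right))


ADD = Word("+", (FLOAT, FLOAT), float_add)

stack = [(FLOAT, 1.5), (FLOAT, 2.0)]
dispatch(ADD, stack)
print(stack)  # [('float', 3.5)]
```

The word body (float_add here) contains no checks at all, which is where the space saving comes from: the checking code exists once in the dispatcher instead of once per word.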

So far, the basic Forth system including defining words, tick, and EXEC, and the four basic arithmetic functions for floats and hex objects have been defined. The main functions remaining to be implemented are loops, IF statements, and transcendental functions like sine and logarithms. The firmware is already 6.5KB, so there is very little room to squeeze in a lot of capability. My plan now is to continue writing everything in assembly, even if the firmware overflows the 8KB limit, then rewrite key parts of it in Forth to squeeze it into the available memory.

Floating point
The floating point numbers have a 12 decimal digit significand with guard, round, and sticky digits and rounding to even, which is the same format used by the HP-48 series of calculators. Implementing a floating point package is somewhat complicated and easy to make a mistake on, so a lot of testing is needed to verify the calculations. At first, I included tests in the firmware running in my JavaScript emulator, but this doesn't work well at all for a large file of test data. Next, I ported the emulator to node.js, so I can run the tests in a separate console window. This works by first generating a few million test calculations in Python using the Decimal package configured for the same number of decimal places and rounding style. The calculations are written to a file that comes to about 200MB then directed to the node.js emulator. The firmware running in the emulator reads the file generated by Python and performs calculations on the data there then compares the results to the results in the file and flags any that don't match. This system let me catch several errors in my code that would have been difficult to find otherwise.
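The Python side of a setup like this might look roughly as follows. This is a sketch under assumptions: the operand ranges, the set of operations, and the make_cases helper are invented for illustration, and the actual file format fed to the node.js emulator is not reproduced. The part taken from the description above is a decimal context with 12 digits of precision and round-half-even.

```python
import random
from decimal import Context, Decimal, ROUND_HALF_EVEN

# 12 significant decimal digits, ties rounded to even.
CTX = Context(prec=12, rounding=ROUND_HALF_EVEN)

OPS = {
    "+": CTX.add,
    "-": CTX.subtract,
    "*": CTX.multiply,
    "/": CTX.divide,
}


def random_operand(rng):
    """Return a nonzero value with a full 12-digit significand."""
    digits = rng.randrange(10**11, 10**12)
    exponent = rng.randint(-20, 20)   # assumed range, for illustration
    sign = rng.choice((1, -1))
    return Decimal(sign * digits).scaleb(exponent - 11)


def make_cases(count, seed=0):
    """Yield (a, op, b, expected) tuples for the firmware to check."""
    rng = random.Random(seed)
    for _ in range(count):
        a, b = random_operand(rng), random_operand(rng)
        op = rng.choice(sorted(OPS))
        yield a, op, b, OPS[op](a, b)


for a, op, b, expected in make_cases(3):
    print(a, op, b, expected)
```

Seeding the generator makes a failing case reproducible, which matters when the file holds millions of calculations.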

Conclusion
Despite the months of work on this project, there is still a lot to do! In the coming months, I'll work on the keypad, power supply, and finishing the firmware, so I can finish the project and submit it to the contest.

Thursday, July 9, 2020

Robot Game: 65C02 C, assembly, and Forth Comparison

The last six months or so, I have been working on a project to compare the performance of C, assembly, and Forth on the 65C02. This week I finally finished everything and published the results on my website. A few days later, the project was featured on Hackaday!

To compare the performance between the languages, I made a simple game called Robot Game in Python in order to have something to port. The C version is compiled with CC65 and the Forth version with Tali Forth 2. There are two assembly versions. One is plain 65C02 assembly using the X register as a data stack and the other uses the Assembly Optimizer I have been working on to get slightly better performance by putting locals in zero page at fixed addresses. All four versions of the game work in my JavaScript-based emulator and can be played on my website. The source code for all the versions of the game, the emulator, and optimizer are now on my GitHub page.

Porting the four games was a challenge and also a lot of fun! I learned a lot about how C and especially Forth work internally. After I had the games ported, I looked at 28 different tasks the game does internally and compared performance across languages. Here are the results where traditional assembly is normalized at 100%:


The optimizer adds up to 10% performance and 25% in one very short routine. C was about 2-3 times slower than regular assembly in the tests and Forth was usually around 10-20 times slower. One of the really interesting things was figuring out why Forth is so much slower despite Forth fans claiming it's much better than C. 

Sunday, December 8, 2019

Optimized MSP430 Assembly in Mecrisp Forth


A few years ago, I got interested in Forth due to its connection to RPN calculators but lost interest because of the huge hit to performance a stack-based system takes on an architecture like 6502. Recently, my interest was piqued again during a conversation in the #Forth IRC channel on FreeNode about Mecrisp-Across, which uses an ARM-based microcontroller board to produce optimized assembly for the MSP430. Because the compiler tries to keep values in registers instead of the stack, it has the potential to reclaim some of the performance lost in most Forths. Some of these optimizations might be useful for a 6502-based Forth, so I tried to understand how optimizations in Mecrisp work.

The first step was getting Mecrisp running on a PC. The author provides versions that are compiled for ARM with Linux system calls, so you can use QEMU for emulation under Linux. Getting this working on Windows was a real challenge. One Debian image I tried emulating with QEMU crashed the emulator outright. Another image booted into Linux but refused to install QEMU complaining about needing the Linux CD in the drive before it would install any packages. Getting Ubuntu working on VirtualBox under Windows 10 and installing QEMU on it wasn't too tough, but the Mecrisp image wouldn't run until I got the right type of QEMU (there are several). The right combination was VirtualBox, an Ubuntu image from OSboxes, and QEMU-user-static.

Monday, August 19, 2019

New Project: 6502 Assembly Optimizer


This is another project on the list of open projects from a few months ago. The goal is to improve 6502 assembly code used in the firmware of a graphing calculator project. The first step is to manage how local memory is used, since the current options are not that great. One option is stack-based addressing in zero page, which monopolizes the X register. This is one cycle slower than hard coding a static address in zero page and mostly prevents the use of the X register for anything else useful, leading to less efficient code. The other option is assigning each local variable to its own address in zero page, which is very wasteful. With only 256 bytes, you run out of room pretty fast. This project assigns zero page automatically by determining which functions call each other (call graph pictured above) and assigning memory so that functions which never overlap share the same memory. In this test code, 38 bytes of local variables fit into only 19 bytes of zero page. Here is a graph where each row represents one byte of zero page and each block one byte of local memory in a function:


Depending on the shape of the call graph, it might be possible to fit a large number of variables in a small amount of zero page.
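One simple way to do this kind of assignment is to place each function's locals just past the locals of whichever caller reaches highest. The Python sketch below shows that approach; it is not necessarily the algorithm the optimizer uses, the function names and sizes are invented, and it assumes the call graph has no recursion.

```python
def assign_zero_page(local_sizes, calls):
    """Return ({function: start offset}, total bytes of zero page used)."""
    # Invert the call graph so each function knows who calls it.
    callers = {name: [] for name in local_sizes}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)

    offsets = {}

    def offset_of(name):
        # A function's locals start after the end of every caller's
        # locals, so nothing on the same call chain can overlap.
        if name not in offsets:
            offsets[name] = max(
                (offset_of(c) + local_sizes[c] for c in callers[name]),
                default=0,
            )
        return offsets[name]

    for name in local_sizes:
        offset_of(name)
    total = max((offsets[n] + local_sizes[n] for n in local_sizes), default=0)
    return offsets, total


local_sizes = {"main": 4, "draw": 6, "calc": 3, "plot": 2, "multiply": 5}
calls = {"main": ["draw", "calc"], "draw": ["plot"], "calc": ["multiply"]}

offsets, total = assign_zero_page(local_sizes, calls)
print(offsets)
print(sum(local_sizes.values()), "bytes of locals fit in", total)
```

In this made-up example, draw and calc are never active at the same time, so both start at offset 4 and the 20 bytes of locals fit in 12 bytes of zero page.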