Since my last post I have made a lot of changes to the CORDIC routines I was working on. As you can see in the chart above, the new version fits in less than the 4.1k of the last version even after adding in logarithms. The change came after I found an article about how the HP-35 did CORDIC calculations. Unlike the other descriptions of the method I have used, this version uses an atan table of powers of 10 instead of powers of 2. This lets you shift the arguments by whole decimal places, which is much faster with BCD numbers than powers of 2 . Before, I had wondered if this would be more efficient, but I don't understand the math behind it well enough to modify the routine myself. My implementation of the method the HP-35 uses only takes 90k cycles, compared to the 1.18 million of the last version. The X and Y that the calculation produces, however, cannot be scaled to produce sine and cosine directly. The scale factor depends on the number of rotations, which is easy to precalculate when the angle is halved every iteration, but not possible to know in advance when you use powers of 10 since each iteration requires up to 9 calculations with each affecting the scale factor. As a result, the ratio of X and Y provides the tangent and the HP-35 used identities involving multiplication, division, and square root to generate sine and cosine from tangent. In addition to the 90k cycles for the CORDIC calculation, I would need at least another 140k cycles for these identities. This is 5 times faster than how I was doing it before, although I'm afraid the accuracy would suffer from the added calculations.
The next thing I tried was posting on the HP museum forum, where Thomas Klemm showed me how to do CORDIC calculations in base 10 without resorting to identities. This was a huge help and a much better way to do things than anything I have tried before. The newest routine I have calculates tangent in only 186k cycles and provides sine and cosine without additional calculations! Like in Thomas' version, I tried to make this CORDIC routine more generic so that I can reuse it for inverse trig functions too. Logarithms and powers of e turned out to be much simpler than trig functions using the method described on the same page about the HP-35.
There are still a lot of things to learn about MSP430 assembly. One thing that tripped me up was thinking that RRA automatically shifts a 0 in from the left when rotating when it actually reproduces the leftmost bit instead. Another surprise was that shifting packed BCD by two decimal places is just as fast loading a word from memory, swapping and combining bytes, then writing the result back to memory as it is to just copy bytes directly. I thought the byte copies would be faster, but I was actually able to shave one cycle off by loading and writing words. I rewrote this short section of code five different times trying to save a cycle or two and eventually realized that this is not a useful way to spend my time. This improvement is too tiny to ever be noticeable, and I want to finish this project soon. There are a few other small improvements I have not spent time on since they would provide only a negligible speed up.
The next step is to start working on hardware and the interface code.