Sunday, December 30, 2018

Tiny Calculator: Sine, Cosine, and Tangent

For the trig functions, I started building the same angle table that I used for the CORDIC routines in my RPN Scientific Calculator. Comparing my table to an example calculation I found of how CORDIC works, I noticed that the values are all pi times bigger than the ones in the example. Could I have made the same mistake in my RPN Scientific Calculator? When I get back home from the holidays, I should check to make sure that this does not throw off some of the calculations. Another thing I found out from the same page was that the K value to multiply by after the CORDIC calculation is calculated for a particular number of iterations. In my RPN Scientific Calculator, I included a K value for the maximum number of iterations and truncated it when less precision was used. On second thought, I should have done some testing to make sure truncating the K value to the precision of the argument always yields the best results for a particular level of precision. At the moment I don't plan to redo anything on my older calculators unless something is majorly wrong. It is good, though, to recognize this type of mistake, so I can look out for it in future projects that use CORDIC.

To generate the angle table I used the bc command line calculator. It has a very simple programming language that can run from a script. Interestingly, bc does not seem to do any type of rounding, so the tables I have are unrounded. Maybe I will experiment with a rounded table to see how much of an effect it has. For some of the other tables (including a couple I ended up not needing) I used Python, which has been really helpful. The table itself contains angles descending in size, so I only sto

The first step to implementing the trig functions was replicating the CORDIC calculations in a spreadsheet. This works for about 30 iterations until the numbers become too small for the spreadsheet to handle. Rather than doing bit shift operations like I would on the microcontroller, I multiplied by inverse powers of two, which accomplishes the same goal. When I had that working, I tried generating those multipliers on the MSP430 and multiplying, rather than bit shifts. I thought this might be fairly fast since the multiplier could be left in binary, which makes it quicker to use with the Russian Peasant algorithm, and the amount of zeroes in the multiplier would mean relatively few multiplies altogether. Consider this shortened table of successive inverse powers of two:

Friday, December 14, 2018

Tiny Calculator: Four Functions

The last few weeks I have gone down a little bit of a rabbit hole writing BCD math routines for my Tiny Calculator project. The goal has been to compare different calculation strategies to find which ones are fast but still small enough to fit in the 16k of flash the MSP430 I am using has. Like the addition routines I wrote about in my last post, all of these routines were a lot smaller than I expected. There were also some other surprises about what worked best.

Subtraction
To start with, I based this routine on the setup code for the addition routine, which saves some space. Originally my plan was to make everything modular so that I could substitute pieces of code in depending on what size and speed trade off I wanted, which is why I have separate body and ending options in the table for my addition routine. However, to save space and build one routine off the parts of another, it makes more sense to decide which version to use at the beginning. The only big difference between the two subtraction routines I tried was rolling and unrolling the main loop, with the unrolled version being about 75% faster. It's also about 25% larger but that is only around 50 bytes.

Multiplication
In addition to the straightforward method of repeated additions, I tried the quarter squares and Russian peasant algorithms I learned about doing 6502 BCD multiplication. It was a surprise that using quarter squares was over 10% slower on the MSP430 than the Russian peasant algorithm, whereas it was over 4 times faster on the 6502! Part of the difference may be the speed up I get on the MSP430 by converting each digit of one argument to binary (so it can be halved) only once and saving it, rather than converting it every time like on the 6502. Another major difference is how much slower accessing tables on the MSP430 is compared to register operations. It's interesting how the timing of the architecture can decide which algorithm works the fastest. It's also great that the Russian peasant algorithm doesn't need any tables and is therefore much smaller.