After rewriting the multiplication function I started on the division routine. There are several ways to do this. I used the subtract and shift method. It takes up a little more than 150 lines of code which makes it about twice as big as the multiplication routine. Unlike with the other operations, with division a maximum number of decimal places has to be specified. It seems unlikely that I would ever need more than 16 places, which makes me think a fixed place format might have worked too. This is still possible if I need to shrink the program size.
The next step is to implement logarithms and powers of e. This will be useful on its own but will also let me calculate powers and roots. One way I found to calculate logarithms is with a Taylor series like this:
This seems ideal since it only requires one loop and several multiply and divide operations which shouldn't take up much room in Flash. Even with the the MSP430 running at 16MHz, it rook over 3 seconds to calculate ln(7). In an effort to speed things up I examined the code for multiplying and dividing but didn't find much to improve. One thing I did change was the addition routine. I was adding a 0 to the beginning of every result so that there would be room for a 1 if I needed to carry. However, in most cases the 0 wasn't needed and when doing repeated additions during multiplication, the 0s accumulated at the beginning of the result wasting memory. Removing these 0s wasted time so I changed the addition code to only add the 0 when necessary. This still didn't make it much faster, though.
Next, I ran the code on my PC and noticed that it needed to access the external memory over 2 million times to calculate the logarithm to 16 decimal places. Clearly this was the bottleneck. The next idea I tried was the CORDIC method which is ideal for small microcontrollers like the MSP430. This uses a precomputed table and only needs adds and bit shifts, two things that microcontrollers can do much more efficiently than multiplying and dividing. Using this method I can calculate e^x and ln(x) for a wide range of values in less than one second because it only needs about 50,000 accesses to the external memory.
After I finished the ln function but before I started on the e^x function, I noticed that the program was already 6.5k. This is a little concerning since 16k is the max size for the program and there are still many more features to add. One thing that can reduce size is the static keyword. Making all my functions static shrunk the program size