In preparation for the 6502-based calculator I am planning on building, I started thinking about the best way to do BCD multiplication. One of the reasons I was interested in working with the 6502 is its built in BCD mode. This works great for add and subtract but the chip does not have any sort of multiply or divide, BCD or otherwise. (Some of the 6800 family I was researching before do have hardware multiply, as does the 8051.) There are several routines that can make BCD multiplication faster, so I tried some of them on the 6502 to see which one works best.

To test these I used the Kowalski 6502 Simulator, which is neat because you can assemble in the simulator and then single step or run the program and see which line is being executed. Another convenient thing is that an input/output window is mapped into memory starting at at $E000. I also like the help window which automatically gives you information on op codes when you type them. On the other hand, the included macro system is extremely primitive and you can't do much with it, especially compared to the excellent macro system in CA65. It has a lot of other shortcomings and is fairly buggy, but I don't want to complain since it is free software. After I got frustrated at the lack of basic macro functionality, I tried assembling with CA65 and loading the binary into the simulator. The diassembler seems to work well but of course there is no way for the simulator to know label or function names. For these multiplication tests I decided to just make do with the simulator since I don't need much macro functionality. In the future, though, I will go back to using the 6502 trainer I made before.

To run the tests I made a loop that multiplies all combinations of BCD numbers $00 x $00 to $99 x $99. The Kowalski simulator has a function to count cycles, so I used that to find the time of an empty loop. For each test I recorded the cycles it took, subtracted the loop overhead time, and divided by 10,000 to get average cycles per calculation. After I was done, I went back and used look up tables for operations like shifting by 4 and halving numbers to speed things up a little. I also tried checking for 0 and 1 as arguments to skip the rest of the calculation, but this was not faster. In the table below, the table size column takes into account all the space the tables take up, including unused padding bytes inside the table. Here are the results in the order I did the tests:

Descriptions of methods below:

To test these I used the Kowalski 6502 Simulator, which is neat because you can assemble in the simulator and then single step or run the program and see which line is being executed. Another convenient thing is that an input/output window is mapped into memory starting at at $E000. I also like the help window which automatically gives you information on op codes when you type them. On the other hand, the included macro system is extremely primitive and you can't do much with it, especially compared to the excellent macro system in CA65. It has a lot of other shortcomings and is fairly buggy, but I don't want to complain since it is free software. After I got frustrated at the lack of basic macro functionality, I tried assembling with CA65 and loading the binary into the simulator. The diassembler seems to work well but of course there is no way for the simulator to know label or function names. For these multiplication tests I decided to just make do with the simulator since I don't need much macro functionality. In the future, though, I will go back to using the 6502 trainer I made before.

To run the tests I made a loop that multiplies all combinations of BCD numbers $00 x $00 to $99 x $99. The Kowalski simulator has a function to count cycles, so I used that to find the time of an empty loop. For each test I recorded the cycles it took, subtracted the loop overhead time, and divided by 10,000 to get average cycles per calculation. After I was done, I went back and used look up tables for operations like shifting by 4 and halving numbers to speed things up a little. I also tried checking for 0 and 1 as arguments to skip the rest of the calculation, but this was not faster. In the table below, the table size column takes into account all the space the tables take up, including unused padding bytes inside the table. Here are the results in the order I did the tests:

Descriptions of methods below: