Wednesday, October 30, 2013

Microcontroller Showdown, Part 2

Here are the results for the tests I did with the MSP430 and the LPC1114. Lately I have been busy with work, so I haven't had a chance to do anything with the other three chips. I do have toolchains set up for them, and I plan to run the same tests on them soon.

To time the tests, I used an MSP430 LaunchPad that waited for a pulse from the chip under test to start timing and a second pulse to stop. The timer ran from the VLO, which is accurate to ±3%, so the results are not exact, but they give a rough idea of the difference in speed. Each test ran inside two nested loops that iterate 2,400,000 times in total. Each operation is done once per iteration, except for the add and GPIO operations, which are done 10 times per iteration since they should be much faster. At first I ran each test 5 times and took the average, but later I ran each test only once, since the results were usually very close and some of the later tests took quite a while. I first timed the empty loop, then subtracted that time from each result to get the time the operations themselves took. For both chips I compiled with gcc using both -Os and -O3, although there wasn't really a difference except in the BCD routine tests.
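The measurement procedure described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code used for the tests: the 2400 x 1000 split of the loops is a guess (only the 2,400,000 total is given), and `run_test`, `net_time`, `fake_pulse`, and `empty_operation` are hypothetical names. The pulse is a stand-in function so the sketch can run on a PC; on a real chip it would toggle the pin watched by the timing LaunchPad, and the operation would most likely be written inline rather than called through a pointer.

```c
#include <stdint.h>

/* Assumed split of the 2,400,000 total iterations; the post gives only the total. */
#define OUTER_ITERATIONS 2400u
#define INNER_ITERATIONS 1000u

typedef void (*hook_fn)(void);

/* Hypothetical host stand-ins. On a real chip the pulse would toggle a pin. */
static unsigned pulse_count;
static void fake_pulse(void) { pulse_count++; }
static void empty_operation(void) { }

/* Pulse, run the operation inside two nested loops, pulse again.
 * Returns how many times the operation ran. */
uint32_t run_test(hook_fn pulse, hook_fn operation)
{
    uint32_t iterations = 0;

    pulse();                                  /* timer starts counting */
    for (uint16_t i = 0; i < OUTER_ITERATIONS; i++) {
        for (uint16_t j = 0; j < INNER_ITERATIONS; j++) {
            operation();
            iterations++;
        }
    }
    pulse();                                  /* timer stops counting */
    return iterations;
}

/* Time attributable to the operations alone: measured time minus the
 * time of the same loops with nothing inside them. */
double net_time(double measured, double empty_loop)
{
    return measured - empty_loop;
}
```

Running `run_test(fake_pulse, empty_operation)` corresponds to the empty-loop measurement. With the figures reported for the MSP430 8-bit unsigned add at -O3, `net_time(9.826, 1.72)` gives 8.106.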

As you can see from the table, the LPC1114 was much faster than the MSP430, even running from the bottlenecked flash. It would be faster still if the tests had been copied to RAM and run from there. I was also surprised to find that the LPC1114 has a barrel shifter, which is very handy. Another neat feature is GPIO masking. Each mask is memory mapped, so individual pins can be toggled with a single write to memory. This makes GPIO operations much faster than the traditional load, modify, and store routine.
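To make the GPIO masking idea concrete, here is a small host-side model in C. It assumes the scheme described in NXP's LPC111x user manual, where bits [13:2] of the address used to access a port's data register act as a 12-bit pin mask. The function names `masked_offset` and `masked_write` are made up for this sketch, and the base address in the comment is an assumption to check against the manual.

```c
#include <stdint.h>

/* Byte offset into the port's data window that selects the pins in `mask`.
 * Assumption: address bits [13:2] carry the 12-bit pin mask. */
static inline uint32_t masked_offset(uint16_t mask)
{
    return (uint32_t)(mask & 0xFFFu) << 2;
}

/* Model of what the hardware does on a masked write: only the pins
 * selected by `mask` take their new state from `value`. */
static inline uint16_t masked_write(uint16_t port, uint16_t mask, uint16_t value)
{
    return (uint16_t)((port & ~mask) | (value & mask));
}

/* On the chip itself this would be a single store, something like:
 *
 *   *(volatile uint32_t *)(0x50000000u + masked_offset(1u << 3)) = 0xFFFu;
 *
 * (0x50000000 assumed to be the GPIO0 base). That one write sets pin 3
 * and leaves the other pins alone, with no read-modify-write sequence. */
```

For example, `masked_write(0x0F0, 0x008, 0xFFF)` yields `0x0F8`: pin 3 changes and the rest of the port keeps its state.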

These tests by themselves are a bit arbitrary, and the results are not especially useful since they only measure calculation speed. Other operations such as branching are just as important and would affect the results in a real situation. Memory access is also not tested here, since all of the data for the tests probably fits into the registers. For this reason I also tested some of the BCD routines I am using for my calculator project, since they are the kind of functions I would like to use these chips for. The LPC1114 turned out to be much faster for this too. For the BCD multiplication test I tried a version using the * operator and a version that does repeated adds instead. On the MSP430 the version not using * turned out to be a little faster. One strange thing about the BCD adding routines compiled for the MSP430 is that with -Os they are not only about half as big as when they are compiled with -O3, but they are also about 5% faster.
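The two multiplication strategies compared above can be sketched as follows. The calculator's actual BCD format and routines are not shown in this post, so this is only an illustration: it assumes packed BCD with eight digits per `uint32_t` (one digit per nibble), and the names `bcd_add`, `bcd_mul_by_adds`, and `bcd_mul_with_star` are hypothetical. Overflow past eight digits is silently dropped.

```c
#include <stdint.h>

/* Add two packed-BCD values one digit (nibble) at a time. */
uint32_t bcd_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;

    for (int shift = 0; shift < 32; shift += 4) {
        unsigned digit = ((a >> shift) & 0xFu) + ((b >> shift) & 0xFu) + carry;
        carry = digit > 9;
        if (carry)
            digit -= 10;
        result |= (uint32_t)digit << shift;
    }
    return result;
}

/* Multiply without '*': for each digit of b, add the suitably shifted
 * copy of a to the product that many times. */
uint32_t bcd_mul_by_adds(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    for (int shift = 0; shift < 32; shift += 4) {
        unsigned digit = (b >> shift) & 0xFu;
        for (unsigned i = 0; i < digit; i++)
            product = bcd_add(product, a << shift);
    }
    return product;
}

/* Multiply using '*' on single digits, then add up the partial products. */
uint32_t bcd_mul_with_star(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    for (int sb = 0; sb < 32; sb += 4) {
        unsigned db = (b >> sb) & 0xFu;
        uint32_t partial = 0;
        unsigned carry = 0;

        for (int sa = 0; sa + sb < 32; sa += 4) {
            unsigned t = ((a >> sa) & 0xFu) * db + carry;
            carry = t / 10;
            partial |= (uint32_t)(t % 10) << (sa + sb);
        }
        product = bcd_add(product, partial);
    }
    return product;
}
```

With this representation, the two test cases from the table are `bcd_add(0x120, 0x325)`, which gives `0x445`, and `bcd_mul_by_adds(0x24, 0x13)` or `bcd_mul_with_star(0x24, 0x13)`, which both give `0x312`.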

These tests are not a comprehensive benchmark. Many factors have to be taken into account to choose the right chip for a job. These tests only focus on a very narrow range of those factors.

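| Test | Optimization | MSP430 | MSP430 minus loop | LPC1114 | LPC1114 minus loop |
|---|---|---|---|---|---|
| Loop overhead time | -O3 | 1.72 | | 0.84 | |
| | -Os | 1.72 | | 0.91 | |
| Add unsigned 8 bit | -O3 | 9.826 | 8.106 | 4.64 | 3.8 |
| | -Os | 9.889 | 8.169 | 4.84 | 3.93 |
| Add unsigned 16 bit | -O3 | 10.032 | 8.312 | 4.96 | 4.12 |
| | -Os | 10.071 | 8.351 | 5.15 | 4.24 |
| Add unsigned 32 bit | -O3 | 29.592 | 27.872 | 4.23 | 3.39 |
| | -Os | 29.816 | 28.096 | 4.38 | 3.47 |
| Add signed 8 bit | -O3 | 9.84 | 8.12 | 4.63 | 3.79 |
| | -Os | 9.909 | 8.189 | 4.82 | 3.91 |
| Add signed 16 bit | -O3 | | | | |
| | -Os | 10.054 | 8.334 | 5.15 | 4.24 |
| Add signed 32 bit | -O3 | 29.536 | 27.816 | 4.2 | 3.36 |
| | -Os | 29.637 | 27.917 | 4.38 | 3.47 |
| Multiply unsigned 8 bit | -O3 | 19.65 | 17.93 | 1.28 | 0.44 |
| | -Os | 19.58 | 17.86 | 1.41 | 0.5 |
| Multiply unsigned 16 bit | -O3 | 21.14 | 19.42 | 1.28 | 0.44 |
| | -Os | 21.07 | 19.35 | 1.41 | 0.5 |
| Multiply unsigned 32 bit | -O3 | 40.36 | 38.64 | 1.33 | 0.49 |
| | -Os | 40.23 | 38.51 | 1.42 | 0.51 |
| Multiply signed 8 bit | -O3 | 19.98 | 18.26 | 1.39 | 0.55 |
| | -Os | 19.91 | 18.19 | 1.53 | 0.62 |
| Multiply signed 16 bit | -O3 | 20.65 | 18.93 | 1.39 | 0.55 |
| | -Os | 20.58 | 18.86 | 1.53 | 0.62 |
| Multiply signed 32 bit | -O3 | 70.24 | 68.52 | 1.23 | 0.39 |
| | -Os | 69.97 | 68.25 | 1.41 | 0.5 |
| Divide unsigned 8 bit | -O3 | 17.34 | 15.62 | 8.09 | 7.25 |
| | -Os | 17.27 | 15.55 | 8.33 | 7.42 |
| Divide unsigned 16 bit | -O3 | 29.44 | 27.72 | 11 | 10.16 |
| | -Os | 29.32 | 27.6 | 11.14 | 10.23 |
| Divide unsigned 32 bit | -O3 | 90.12 | 88.4 | 14.56 | 13.72 |
| | -Os | 89.75 | 88.03 | 14.65 | 13.74 |
| Divide signed 8 bit | -O3 | 33.27 | 31.55 | 9.35 | 8.51 |
| | -Os | 33.13 | 31.41 | 9.38 | 8.47 |
| Divide signed 16 bit | -O3 | 33.72 | 32 | 12.05 | 11.21 |
| | -Os | 33.47 | 31.75 | 12.29 | 11.38 |
| Divide signed 32 bit | -O3 | 96.94 | 95.22 | 15.5 | 14.66 |
| | -Os | 96.54 | 94.82 | 15.58 | 14.67 |
| Shift right unsigned 8 bit | -O3 | 14.86 | 13.14 | 1.83 | 0.99 |
| | -Os | 14.79 | 13.07 | 1.96 | 1.05 |
| Shift right unsigned 16 bit | -O3 | 29.79 | 28.07 | 1.83 | 0.99 |
| | -Os | 29.66 | 27.94 | 1.97 | 1.06 |
| Shift right unsigned 32 bit | -O3 | 69.62 | 67.9 | 1.72 | 0.88 |
| | -Os | 69.31 | 67.59 | 1.8 | 0.89 |
| Shift right signed 8 bit | -O3 | 14.85 | 13.13 | 2.16 | 1.32 |
| | -Os | 14.8 | 13.08 | 2.29 | 1.38 |
| Shift right signed 16 bit | -O3 | 25.15 | 23.43 | 2.16 | 1.32 |
| | -Os | 25.05 | 23.33 | 2.29 | 1.38 |
| Shift right signed 32 bit | -O3 | 59.69 | 57.97 | 1.72 | 0.88 |
| | -Os | 59.42 | 57.7 | 1.8 | 0.89 |
| Shift left unsigned 8 bit | -O3 | 14.86 | 13.14 | 1.83 | 0.99 |
| | -Os | 14.8 | 13.08 | 1.96 | 1.05 |
| Shift left unsigned 16 bit | -O3 | 25.16 | 23.44 | 1.83 | 0.99 |
| | -Os | 25.04 | 23.32 | 1.97 | 1.06 |
| Shift left unsigned 32 bit | -O3 | 59.67 | 57.98 | 1.72 | 0.88 |
| | -Os | 59.41 | 57.96 | 1.8 | 0.89 |
| Shift left signed 8 bit | -O3 | 14.86 | 13.14 | 2.16 | 1.32 |
| | -Os | 14.8 | 13.08 | 2.29 | 1.38 |
| Shift left signed 16 bit | -O3 | 25.17 | 23.45 | 2.16 | 1.32 |
| | -Os | 25.04 | 23.32 | 2.29 | 1.38 |
| Shift left signed 32 bit | -O3 | 59.68 | 57.96 | 1.72 | 0.88 |
| | -Os | 59.41 | 57.69 | 1.8 | 0.89 |
| GPIO | -O3 | 18.088 | 16.368 | 7.03 | 6.19 |
| | -Os | 18.056 | 16.336 | 7.26 | 6.35 |
| GPIO Constant Generator | -O3 | 14.752 | 13.032 | | |
| | -Os | 14.758 | 13.038 | | |
| GPIO Masked | -O3 | | | 3.22 | 2.38 |
| | -Os | | | 3.52 | 2.61 |
| BCD Add 120+325 | -O3 | 47.93 | 46.21 | 17.93 | 16.21 |
| | -Os | 50.05 | 48.33 | 20.83 | 19.11 |
| BCD Multiply 24 * 13 (Without multiplies) | -O3 | 393.4 | 391.68 | 131.49 | 129.77 |
| | -Os | 375.21 | 373.49 | 149.84 | 148.12 |
| BCD Multiply 24 * 13 (With multiplies) | -O3 | 405.64 | 403.92 | 131.88 | 130.16 |
| | -Os | 508.52 | 506.8 | 140.53 | 138.81 |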
