Today I was reading more about 8051 microcontrollers, and I realized I could make an improvement to some of the code I posted in 2017 for my BCD Addition Speed Test comparing the 6502 and 8051. As I wrote there, the 8051 has at least one 16-bit register called DPTR, which is the only way to get data to and from external memory. Unlike on the 6502, there is no indexing or indirection when accessing external memory, so all addressing has to be done manually. Here is the macro I had in my other post to compare the two types of addressing, with an extra column showing cycle count:
Line | 6502 | Cycles | 8051 | Cycles |
---|---|---|---|---|
001 | ... | | IndexDPTR0 MACRO DPTR_copy, Index | |
002 | LDY #Offset | 2 | clr C | 1 |
003 | STA (Address), Y | 6 | mov A, DPTR_copy | 1 |
004 | | | add A, Index | 1 |
005 | | | mov DP0L, A | 1 |
006 | | | mov A, LOW(DPTR_copy)+1 | 1 |
007 | | | addc A, #0 | 1 |
008 | | | mov DP0H, A | 1 |
009 | | | ENDM | |
010 | | | ... | |
011 | | | push A | 2 |
012 | | | IndexDPTR0 Address, #Offset | |
013 | | | pop A | 2 |
014 | | | movx @DPTR, A | 2 |
The first improvement is to get rid of the clr C on line 002, since add ignores the incoming carry. I'm not sure how I missed something that simple! When I wrote the post, I had mostly worked with the 6502, whose only add instruction is add with carry, so C has to be cleared explicitly. Maybe writing MSP430 assembly, which also has an add without carry instruction, helped me realize that. The second improvement is eliminating the push on line 011 that saves the accumulator. One shortcut is to save that value in DP0H at the start of the macro, then restore it with the xch instruction at the same moment the new DP0H is written. This also eliminates the pop on line 013. The new version looks like this:
Line | 6502 | Cycles | 8051 | Cycles |
---|---|---|---|---|
001 | ... | | IndexDPTR0 MACRO DPTR_copy, Index | |
002 | LDY #Offset | 2 | mov DP0H, A | 1 |
003 | STA (Address), Y | 6 | mov A, DPTR_copy | 1 |
004 | | | add A, Index | 1 |
005 | | | mov DP0L, A | 1 |
006 | | | mov A, LOW(DPTR_copy)+1 | 1 |
007 | | | addc A, #0 | 1 |
008 | | | xch A, DP0H | 1 |
009 | | | ENDM | |
010 | | | ... | |
011 | | | IndexDPTR0 Address, #Offset | |
012 | | | movx @DPTR, A | 2 |
The macro is still 7 cycles, since the mov DP0H, A at the top takes the place of the clr C, but it saves the 4 cycles for push/pop, or the 2 cycles it would take to save A in a register instead. It also looks much cleaner and will hopefully lead to fewer headaches.
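As a sanity check on the cycle counts and on the xch trick, here is a small host-side model in Python rather than 8051 code. It is only a sketch: the `Cpu` class and the function names are made up for illustration, the cycle costs are the ones from the tables in this post, and `dptr_copy` is passed as a 16-bit value instead of being read from two direct-addressed bytes.

```python
from dataclasses import dataclass

@dataclass
class Cpu:
    a: int = 0       # accumulator
    c: int = 0       # carry flag
    dp0l: int = 0    # DPTR low byte
    dp0h: int = 0    # DPTR high byte
    cycles: int = 0  # running cycle total

def add(cpu, operand, use_carry=False):
    """Model add (use_carry=False) or addc (use_carry=True); 1 cycle either way."""
    total = cpu.a + operand + (cpu.c if use_carry else 0)
    cpu.c = 1 if total > 0xFF else 0   # carry out of bit 7
    cpu.a = total & 0xFF
    cpu.cycles += 1

def compute_address(cpu, dptr_copy, index):
    """Shared middle of both macros: five 1-cycle instructions."""
    cpu.a = dptr_copy & 0xFF           # mov A, DPTR_copy
    cpu.cycles += 1
    add(cpu, index)                    # add A, Index (ignores the old carry)
    cpu.dp0l = cpu.a                   # mov DP0L, A
    cpu.cycles += 1
    cpu.a = dptr_copy >> 8             # mov A, LOW(DPTR_copy)+1
    cpu.cycles += 1
    add(cpu, 0, use_carry=True)        # addc A, #0

def index_old(cpu, dptr_copy, index):
    """Original version: push A, macro with clr C, pop A."""
    saved = cpu.a                      # push A
    cpu.cycles += 2
    cpu.c = 0                          # clr C
    cpu.cycles += 1
    compute_address(cpu, dptr_copy, index)
    cpu.dp0h = cpu.a                   # mov DP0H, A
    cpu.cycles += 1
    cpu.a = saved                      # pop A
    cpu.cycles += 2

def index_new(cpu, dptr_copy, index):
    """New version: park A in DP0H, swap it back with xch at the end."""
    cpu.dp0h = cpu.a                   # mov DP0H, A
    cpu.cycles += 1
    compute_address(cpu, dptr_copy, index)
    cpu.a, cpu.dp0h = cpu.dp0h, cpu.a  # xch A, DP0H
    cpu.cycles += 1

# Start with carry set to show that dropping clr C is harmless.
old = Cpu(a=0x42, c=1)
new = Cpu(a=0x42, c=1)
index_old(old, 0x12F0, 0x20)
index_new(new, 0x12F0, 0x20)
print(hex(old.dp0h << 8 | old.dp0l), hex(old.a), old.cycles)
print(hex(new.dp0h << 8 | new.dp0l), hex(new.a), new.cycles)
```

Both versions should leave the same address in DPTR and the same value in A, with the totals differing by the 4 cycles of push/pop.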
In addition to movx @DPTR, A and movx A, @DPTR, there are also movx @Rn, A and movx A, @Rn instructions that I had not paid much attention to, since using them constrains where objects like BCD numbers can be stored. At the very least, those objects would have to fit into 256-byte pages to use those instructions, and realistically the objects might have to start at the beginning of a page and possibly waste some memory. On the other hand, I have been doing a lot more thinking about how to store those objects to build a graphing calculator. Splitting memory into regular chunks (regardless of how much space the object occupying each chunk actually requires) is one strategy to prevent memory fragmentation, so using the faster movx @Rn, A and movx A, @Rn instructions might be a good fit. The R registers are also much more flexible for indexing, since it is faster to adjust the address when you don't have to account for crossing page boundaries like the macro does.
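To make the chunk idea a little more concrete, here is a rough Python model of it, again on the host rather than on the 8051. Everything in it is an assumption for illustration: the `SlotPool` class, the `byte_address` helper, and the choice of one 256-byte page per object are not from any real allocator. It also assumes the register form is limited to R0 or R1, with the upper address byte supplied separately (port P2 on the classic 8051; other variants may differ).

```python
PAGE_SIZE = 256  # one slot per page, so the register alone covers a whole object

class SlotPool:
    """Fixed-size, page-aligned slots: any freed slot fits any new object."""

    def __init__(self, first_page, page_count):
        # Stored in reverse so pop() hands out the lowest free page first.
        self.free_pages = list(range(first_page, first_page + page_count))[::-1]

    def allocate(self):
        """Return the page number (upper address byte) of a free slot."""
        return self.free_pages.pop()

    def release(self, page):
        """Give a slot back; it can be reused as-is, whatever was stored in it."""
        self.free_pages.append(page)

def byte_address(page, offset):
    """Address seen by external memory: fixed page byte, 8-bit register offset."""
    return (page << 8) | (offset & 0xFF)

pool = SlotPool(first_page=0x10, page_count=4)
first = pool.allocate()
second = pool.allocate()
pool.release(first)
reused = pool.allocate()
print(hex(first), hex(second), hex(reused))

# Stepping through an object only changes the 8-bit offset. There is no carry
# into the page byte, so an offset of 256 wraps back to the start of the page.
print(hex(byte_address(second, 0x05)), hex(byte_address(second, PAGE_SIZE)))
```

The wrap in the last line is the constraint mentioned above: indexing never has to handle a page crossing, but only because each object is confined to its own page.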