Friday, December 22, 2017

Pocket Calculator: Stack

For this calculator, I am working on a more sophisticated stack system than I had for my RPN Scientific Calculator. Instead of stack items being fixed at the maximum length, which wastes a lot of memory, item length varies. With only 4k to work with on the LPC1114, it will be very important to conserve memory wherever possible. As I have it now, each item has a header that indicates its position on the RPN stack, its name, and an ID number for variables in functions. RPN stack positions start at one so that variables marked zero will not appear on the RPN stack. This is useful for named variables, so that they can use the same stack system internally. The ID numbers will allow me to keep track of variables local to functions, so I can easily clear them when the function exits.

The system I have now writes the result of an operation directly to the top of the stack then marks the two consumed items as free memory. This has the advantage of allowing the result to be larger than the operands, which can be useful for things like string concatenation. The screenshot above shows how I was testing the system in C# with a simple calculator that can only add. Freeing items like this will make the stack grow fairly quickly and eventually require garbage collection. The advantage, however, is that calculations should be very fast until garbage collection is required. A slower method would be to scan the stack for a free section that is large enough for the result before adding to the stack, but this will eventually require garbage collection as well. My suspicion is that just adding to the top of the stack will be faster overall, even if garbage collection happens more often.

To keep track of the memory, I decided to implement my own garbage collection system, since malloc in C would lead to unrecoverable fragmentation. On IRC one day I made the mistake of asking about how to implement garbage collected stacks like these internally and ended up wasting over an hour arguing with someone who was convinced I was going about everything totally wrong. In the end, one of the suggestions was to have the calculator user push a 0 onto the stack manually before entering values to be operated on so that the system can overwrite the 0 with the result. This recommendation (and a few of the others given) were just laughably horrible and would seriously cripple the functionality of the calculator. Sometimes people on IRC can give good suggestions, but I also have to remind myself not to argue with people who don't really know what they are talking about but won't admit it.

One of the useful suggestions given by a different person was to use two stacks for calculations. The result goes on the second stack and is then copied back to the main stack overwriting the consumed operands. In theory this system will never need garbage collection, although it seems to me that it will take the same amount of time on average. It may be simpler since calculating free memory, for example, is comparing just one pointer, rather than crawling the stack adding up free chunks. This system assumes, however, that operations always work on the top of the stack, which is not the case when named variables share the same internal stack as the RPN stack. It might make more sense in that case to use the second stack only for named variables and do garbage collection on that stack, while keeping the first stack RPN only to prevent garbage collection.

Sunday, December 3, 2017

New project: Pocket Calculator

Recently I got interested in building a very small calculator I can carry around with me. The other calculator projects I worked on are neat, but I don't carry them around because they are too big and fragile. This time I want to make something really practical but also small enough to fit in my pocket. The smallest way I know how to do this with the chips I'm familiar with is using the same LPC1114 in DIP28 package I used for my Programmable RPN Calculator. There are a few changes I would like to make for this calculator, though. First, I want to be able to manipulate all kinds of data on the calculator's stack, such as strings, instead of just floating point BCD numbers. Second, I plan to redo the BCD format I used in the other calculators to store the least significant digits first, which should make them a little easier to deal with. Third, I plan to make all items on the stack variable length, rather than fixed length like on my other calculators. This will be necessary for this calculator since I plan to store the whole stack in the 4k RAM the microcontroller has, instead of on an external SPI RAM chip.

The main new feature I want this calculator to have is a BASIC interpreter. My experience with the first RPN calculator I worked on showed how much of the microcontroller's flash can be eaten up by interface code, so I really don't think I could fit all the calculator functions and interpreter in the LPC1114's 32k of flash. Instead, I plan to store a virtual machine on the chip that can interpret code stored on a large capacity flash or FRAM chip. Retrieving the code over SPI and interpreting it will be much slower than executing code natively on the chip, but I think it will be fast enough for the UI, BASIC editor, and maybe even the tokenizing part of the BASIC editor. On one hand I could switch to a chip with bigger flash, which I am not anxious to do, but I will need some kind of virtual machine anyway to interpret the code generated by the BASIC editor, so it is no loss to use the same virtual machine for the UI. The BASIC programs and editor data will be stored on an SPI SRAM chip, which should work fast enough to edit and tokenize programs, while all the stack and program variables will be stored on the microcontroller.

Tuesday, November 21, 2017

BCD Addition Speed Test: 6502 vs 8051

12345.6789 + 9999.99999 packed BCD on 8051
A few years ago I decided to revisit the microcontroller comparison I had done to see how much of a speed up I could get recoding my BCD addition routine in 8051 assembly. At the time I was surprised that it was 3-4 times faster than the C version. That original BCD routine, which I used for my RPN Scientific Calculator, uses unpacked BCD characters arranged with the most significant bytes first. For the calculator I want to build with a 6502, I wrote a new routine that stores the least significant bytes first in packed BCD. This takes advantage of the 6502's decimal mode to quickly add packed BCD bytes and avoids having to shift all the bytes when an extra byte has to be added for the final carry. It turns out that the 8051 has an instruction, DA, to correct the result of BCD additions from their binary result back into BCD, so it is easy to do packed BCD for it as well. To compare the speed of the two chips for use in a calculator, I translated the 6502 packed BCD routine to 8051 assembly and ran both versions in their respective simulators.

Wednesday, November 15, 2017

6502 BCD Multiplication

In preparation for the 6502-based calculator I am planning on building, I started thinking about the best way to do BCD multiplication. One of the reasons I was interested in working with the 6502 is its built in BCD mode. This works great for add and subtract but the chip does not have any sort of multiply or divide, BCD or otherwise. (Some of the 6800 family I was researching before do have hardware multiply, as does the 8051.) There are several routines that can make BCD multiplication faster, so I tried some of them on the 6502 to see which one works best.

To test these I used the Kowalski 6502 Simulator, which is neat because you can assemble in the simulator and then single step or run the program and see which line is being executed. Another convenient thing is that an input/output window is mapped into memory starting at at $E000. I also like the help window which automatically gives you information on op codes when you type them. On the other hand, the included macro system is extremely primitive and you can't do much with it, especially compared to the excellent macro system in CA65. It has a lot of other shortcomings and is fairly buggy, but I don't want to complain since it is free software. After I got frustrated at the lack of basic macro functionality, I tried assembling with CA65 and loading the binary into the simulator. The diassembler seems to work well but of course there is no way for the simulator to know label or function names. For these multiplication tests I decided to just make do with the simulator since I don't need much macro functionality. In the future, though, I will go back to using the 6502 trainer I made before.

To run the tests I made a loop that multiplies all combinations of BCD numbers $00 x $00 to $99 x $99. The Kowalski simulator has a function to count cycles, so I used that to find the time of an empty loop. For each test I recorded the cycles it took, subtracted the loop overhead time, and divided by 10,000 to get average cycles per calculation. After I was done, I went back and used look up tables for operations like shifting by 4 and halving numbers to speed things up a little. I also tried checking for 0 and 1 as arguments to skip the rest of the calculation, but this was not faster. In the table below, the table size column takes into account all the space the tables take up, including unused padding bytes inside the table. Here are the results in the order I did the tests:

MethodCycle averageCode sizeTable size
1Repeated additions1,596.23340
2Repeated additions, minimize loops1,236.66480
3Shift/add284.80670
44x8 lookup table101.006510k
54x8 lookup table, byte aligned97.006114.8k
6Russian peasant algorithm267.79660
7Quarter square table (half table)115.3385816
8Quarter square table (full table)118.00781,632
9Quarter square table (BCD to bin to BCD)86.00511,816
10Quarter square table (BCD to bin only)60.0034792
11Half squares113.60126200
12Simulated ROM lookup31.002046k

Descriptions of methods below:

Friday, November 3, 2017

6502 Address Decoding

For the 6502 graphing calculator I started working on two years ago I wanted to use an ATF1508 CPLD to do the address decoding. In another post, I wrote about how difficult it was to get this chip going. For one, the program I found to generate SVF files for it is 16-bit only. At the time I was running it in a virtual machine, but I don't have my Windows XP disk any more. If I keep using the homemade setup for it I had before, instead of buying a programmer (which is expensive), I think I might be able to run the program in a DOS emulator without Windows.

6502 address decoder with SRAM
Another way I was thinking about doing the decoding is with a 32k 12ns SRAM. For each address, the SRAM would act as a lookup table for the chip selects and other signals that the CPLD normaly does. This would not be as fast as the 7.5ns of the ATF1508, but it would still be very fast. One way would be to hold the 6502 in reset and cycle a counter through all 16-bits of the address space copying information from an EEPROM to the SRAM. After a certain amount of time, a timer would fire to disable that EEPROM and the counter and enable the SRAM, EEPROM, and 6502 of the main system. I'm not sure I got all of the signals exactly right but I made this schematic to illustrate approximately what I was thinking. The disadvantages of this setup is that the extra SRAM consumes 160ma, which would be a lot for the battery-based calculator I want to build, and the extra space the counters and second EEPROM would take.



Another way I thought about was disabling the main SRAM on startup but directing all reads to the EEPROM and all writes to the address decoding SRAM. This would let me write the whole lookup table into that SRAM with the 6502 . A transparent buffer would separate that SRAM from the data bus and prevent it from being written to further after a timer fires. If I got everything right, this would require more glue logic, but it would get rid of the counters and extra EEPROM of the last design.

A third way I thought of is to skip writing the memory decoder SRAM and use an EEPROM. Unfortunately, that would need 45ns or so to do the decoding, which is a lot slower than a CPLD. One alternative I found is 8k 25ns NVRAM, which stores its value permanently in an EEPROM but loads its content into SRAM when it turns on. This would give me 8 byte resolution, so I would probably lose less than 200 bytes of address space for peripherals. Also, if I drive the SPI clock with the NVRAM then I could set it low for one 8 byte stretch and fill that stretch with NOP instructions. Then to pull the clock low for a given number of cycles, I would just have to jump to the corresponding NOP in that stretch of memory.

On the 6502 forum I found a really great page on 6502 bus timing. If I understand the timing correctly, the 6502 needs up to 30ns to output its address and then the address decoding logic can go to work (either 12 or 25ns in this case). After that, the 74HC670 I have in the diagram will need 16ns up to nearly 40ns in the worst case to drive the high bits of the 512k SRAM. If I can get away with only two windows into the SRAM then it would only be about 6ns max if I replace the 74HC670 with a buffer and 74AS244. In order to avoid accidentally writing to wrong addresses, those delays will have to propagate before the clock goes high and the glue logic brings the WE\ of the SRAM low. This would allow me to run between 5 and 10MHz, depending on the combination. This is a lot less than the 14 MHz maximum the W65C02 I am using is rated for but it might work until I get the kinks out of working with the ATF1508.

Wednesday, November 1, 2017

6800 Family Tree

After about two years away from electronics, I have some time for my projects now that I am almost finished with my degree. Lately, I've been thinking about a lot of things and I want to start some new calculator projects and pick up where I left off on some other projects. One thing I have been thinking about is working on a simpler 6502 based calculator than I was working on before to get more experience working with the chip. I think the 6502 graphing calculator I was working on did not turn out very well because I rushed through everything too fast trying to get ready for Makevention in 2015.

Along those lines I recently got curious about other chips I could use to make a calculator. The last time I was curious about that I compared some of the microcontrollers I was interested in to see how fast they compute. One of my plans now is to do some other comparisons, which I think will be more useful. This time I want to compare processors like the 6502 as well, so I looked at other chips that would be fun to use. One group that I have been interested in for a while is the 6800 family like the 6809 and 6309. I could not find a family tree on the internet, so I started making one myself. When I was almost done, I found this tree in a document about the 6804: