Monday, August 19, 2019

New Project: 6502 Assembly Optimizer


This is another project on the list of open projects from a few months ago. The goal is to improve 6502 assembly code used in the firmware of a graphing calculator project.

The first step is to manage how local memory is used, since the current options are not that great. One option is stack-based addressing in zero page, which monopolizes the X register. This is one cycle slower than hard-coding a static address in zero page and mostly prevents the use of the X register for anything else useful, leading to less efficient code. The other option is assigning each local variable its own address in zero page, which is very wasteful: with only 256 bytes, you run out of room pretty fast.

This project assigns zero page automatically by determining which functions call each other (call graph pictured above) and assigning memory so that functions which never overlap share the same memory. In this test code, 38 bytes of local variables fit into only 19 bytes of zero page. Here is a graph where each row represents one byte of zero page and each block one byte of local memory in a function:
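The core of the allocation can be sketched in a few lines of Python. Assuming the call graph is free of recursion and a function's locals are live only while it or its callees are running, each function's locals can be placed just past those of its deepest caller; the function names, sizes, and graph below are invented for illustration:

```python
def assign_zero_page(calls, sizes):
    """Place each function's locals after those of its deepest caller.
    Functions on disjoint call paths end up sharing the same addresses."""
    callers = {f: [] for f in sizes}
    for f, callees in calls.items():
        for c in callees:
            callers[c].append(f)

    offsets = {}
    def offset(f):
        if f not in offsets:
            # a function's locals must sit past every caller's locals,
            # since the caller's variables stay live across the call
            offsets[f] = max((offset(c) + sizes[c] for c in callers[f]),
                             default=0)
        return offsets[f]

    for f in sizes:
        offset(f)
    return offsets

# toy example: main calls draw and keys, and both of those call mul
calls = {"main": ["draw", "keys"], "draw": ["mul"], "keys": ["mul"]}
sizes = {"main": 4, "draw": 3, "keys": 2, "mul": 5}
zp = assign_zero_page(calls, sizes)
# draw and keys never run at the same time, so they share addresses,
# and 14 bytes of locals fit in 12 bytes of zero page
```

Recursive or interrupt functions would break the liveness assumption here, which is presumably why the real optimizer needs extra handling for them.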


Depending on the shape of the call graph, it might be possible to fit a large number of variables in a small amount of zero page.

This system started with macros called FUNC and LOCAL that assign zero page usage without the help of the optimizer. FUNC starts a new function, laying down a label and defining some symbols but not laying down any instructions. LOCAL defines a local variable that stays in scope until the next non-local label. With these macros, each local variable can be assigned a unique address in memory. If a flag enabling the optimizer is present in the source, however, LOCAL instead outputs information that lets the optimizer assign the variable's address. The source is assembled with Macroassembler AS since it's able to output partially evaluated assembly after macro expansion. This partially evaluated file is fed to the assembly optimizer script, which uses the information from the macros to generate the call graph and assign addresses to variables.
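To give a feel for this pipeline, here is a minimal sketch of the collection pass. The record format is made up for illustration; suppose the LOCAL macro, with the optimizer enabled, emits comment records like ";OPT LOCAL <function> <variable> <size>" into the partially evaluated listing:

```python
import re

# hypothetical record format emitted by the LOCAL macro
RECORD = re.compile(r";OPT LOCAL (\w+) (\w+) (\d+)")

def collect_locals(lines):
    """Gather each function's local variables and their sizes in bytes."""
    locals_by_func = {}
    for line in lines:
        m = RECORD.search(line)
        if m:
            func, var, size = m.group(1), m.group(2), int(m.group(3))
            locals_by_func.setdefault(func, []).append((var, size))
    return locals_by_func

listing = [
    ";OPT LOCAL main tmp 2",
    "        lda #$00",
    ";OPT LOCAL main count 1",
]
```

A pass like this would give the optimizer everything it needs per function before it walks the call graph.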

The FUNC macro accepts optional keywords after the function name. For example, "FUNC main, begin" tells the optimizer to begin tracing the program flow from function main. Other words can be added in the future like "recursive" and "interrupt" to let the optimizer handle memory for those functions differently. Functionality like this should allow the use of indirect jumps and re-entrant code. My plan is to introduce a very small set of features like these to keep the project more like a macro system than a new programming language. Use of FUNC and LOCAL is optional with the optimizer, and they can be made to use only part of zero page, so I can use the rest of zero page how I'm used to.

The call graph and variable allocation output in the picture above are created in HTML, which allows for much better-looking debug information than outputting to the console or a text file. The first version used Python turtles to draw the call graph, which was unbelievably slow, taking several seconds to draw everything. The current version reads in a template HTML file and inserts an array of values to draw into the JavaScript portion of the file. It seems there is no way for a local JavaScript file to refresh itself when the file on disk is rewritten, so the window refreshes every time it regains focus, which is a good compromise. One interesting thing is that the variable assignment chart changes each time the script is run. This must be due to some of the data being stored in a dictionary, whose iteration order is not guaranteed to be the same on every run.
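The template trick can be as simple as a string substitution. A minimal sketch, where the placeholder name and the shape of the data are made up:

```python
import json

# hypothetical template with a placeholder where the chart data goes
template = """<html><body><canvas id="chart"></canvas>
<script>
var blocks = /*BLOCKS*/;  // [zero-page row, width] per local variable
// ...drawing code iterates over blocks...
</script></body></html>"""

blocks = [[0, 2], [0, 1], [4, 3]]
html = template.replace("/*BLOCKS*/", json.dumps(blocks))
```

Since JSON array syntax is valid JavaScript, json.dumps gives a literal the drawing code can use directly.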

This project has been really helpful in practicing Python and learning new features. Using dictionaries, lists, and sets is so much easier than how I would have done it in C or another language. It's really clear to me now what an advantage it is to use Python for stuff like this.

Parsing assembly, though, is a little more difficult than I realized when I started. It turns out that running the source through the assembler to have it expand macros is not much of a head start, since there is still a lot of work left to do after that. Evaluating expressions that can contain symbols of any type is not that easy.

One problem arises when an instruction contains a function with a label as an argument. (Function here refers to functions built into the assembler, like hi and lo, that can be evaluated at compile time.) For example, LDA f(LABEL) would perform function f on label LABEL and use the result as the address for the LDA instruction. The return value of f can be 8 or 16 bits wide, which determines whether the zero page or absolute version of LDA is used. If LABEL comes after the LDA instruction, its address depends on the length of the instruction, which depends on the return value of f, which in turn depends on the address of LABEL. Since there is no way to tell in advance how wide f's result will be, the optimizer has to guess and supply a test value for LABEL. If the optimizer guesses that f will return an 8 bit wide value, adjusts LABEL accordingly, then feeds that value to f, it is possible that f will return a 16 bit wide result. The optimizer then has to adjust the value of LABEL since its initial guess was wrong. In theory, adjusting LABEL could then change the return value of f back to an 8 bit result, which means LABEL has to be readjusted again, entering an endless loop of resizing and recalculating. In practice, this type of problem might not arise very often, but taking it into account complicates the evaluation of expressions.
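The guess-and-readjust loop above can be sketched as a small fixed-point iteration. This is only a toy model, not the optimizer's actual code: f stands in for an assembler function, and the instruction lengths assume a 2-byte zero page LDA versus a 3-byte absolute LDA:

```python
def resolve(f, label_base):
    """Pick zero-page vs absolute addressing for LDA f(LABEL) when
    LABEL sits right after the instruction, so its address depends
    on the instruction's own length."""
    width = 1  # guess: f returns an 8-bit value, use the 2-byte LDA
    for _ in range(8):  # cap iterations in case resizing never settles
        instr_len = 2 if width == 1 else 3
        label = label_base + instr_len  # LABEL lands after the LDA
        new_width = 1 if f(label) < 0x100 else 2
        if new_width == width:
            return label, width  # sizes are consistent, done
        width = new_width  # guess was wrong: resize and re-evaluate
    raise RuntimeError("label size never stabilized")

# stabilizes immediately: f returns an 8-bit value
resolve(lambda a: a & 0xFF, 0x0200)
# needs one readjustment: f returns a 16-bit value
resolve(lambda a: a, 0x0200)
```

The iteration cap is what turns the theoretical endless resizing loop into a reportable error instead of a hang.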

After the issue with shifting symbol sizes is fixed, the next step is the really fun part: peephole optimization.
