Sunday, July 10, 2022

6502 Interactive Assembler

One of my 2022 Project Goals is a PIC32-based robot which emulates a 6502. While working on the emulation, I kept having to look up details about address modes and flags, so I made a Python program for Linux called i65 to look up information like the addressing mode, size, cycle count, and flags for 6502 instructions (GitHub link). It can take an instruction name or addressing mode as an argument and displays information for any instructions that match.

If the program is started without any arguments, it waits for input and shows all instructions that match what you type until it's narrowed down far enough to show the full information. The Kowalski 6502 simulator has this, and I wanted something similar for Linux.
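The narrowing behavior can be sketched roughly as below. This is only an illustration and not the actual i65 code: the table layout and the `lookup` function are made up for the example, and only a handful of instructions are listed (the opcode, size, cycle, and flag values are standard 6502 data).

```python
# Hypothetical, heavily shortened table. Each row is:
# (mnemonic, addressing mode, opcode, size in bytes, cycles, flags affected)
INSTRUCTIONS = [
    ("LDA", "immediate",   0xA9, 2, 2, "NZ"),
    ("LDA", "zero page",   0xA5, 2, 3, "NZ"),
    ("LDA", "absolute",    0xAD, 3, 4, "NZ"),
    ("LDX", "immediate",   0xA2, 2, 2, "NZ"),
    ("LDY", "immediate",   0xA0, 2, 2, "NZ"),
    ("LSR", "accumulator", 0x4A, 1, 2, "NZC"),
    ("STA", "absolute",    0x8D, 3, 4, ""),
]

def lookup(typed):
    """Return matching mnemonics while the input is still ambiguous,
    or the full rows once it has narrowed down to a single mnemonic."""
    typed = typed.upper()
    rows = [row for row in INSTRUCTIONS if row[0].startswith(typed)]
    names = sorted({row[0] for row in rows})
    if len(names) > 1:
        return names
    return rows

print(lookup("l"))    # still ambiguous: a list of mnemonics
print(lookup("lda"))  # narrowed down: full rows for LDA
```

Calling `lookup` again after every keystroke gives the "narrow down as you type" effect without any extra state.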

After that program was done, I decided to use it as a base for a combined assembler and simulator that also works in a Linux terminal. The Kowalski simulator is usually open in my taskbar on Windows for when I need to try out little snippets, though I have a few gripes about using it. I always forget to put a .ORG directive and have to reassemble. The register window only shows info for the last executed instruction, although it would be useful to see it for all of them if the program is short. It would also be nice to be able to put instructions in the first column and to indent labels, neither of which the simulator allows. Another thing is that the parser chokes on a few simple things like LDA (5)+2. It seems to require any argument that begins with a parenthesis to be an indirect addressing mode. On the other hand, the guys maintaining Kowalski have added some improvements and missing instructions, which I appreciate.

The assembler I came up with (GitHub link) reassembles and simulates as much of the source as it can while you type, so you don't have to go through the whole cycle of assembling, fixing errors, re-assembling, and simulating. It somewhat resembles Compiler Explorer in that regard. There is no user input or output, and it doesn't produce binaries, so it won't replace an assembler and simulator like Kowalski, but for testing snippets for my emulation projects like the PIC32 robot, it works a lot better. The assembler has syntax highlighting including marking unknown symbols in yellow and syntax errors in red. The simulator keeps track of uninitialized memory and marks anything loaded or calculated from it with "?".
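One way to get the "?" behavior for uninitialized memory is to let an unknown value propagate through every calculation that touches it. The sketch below is an assumption about how that could look, not the simulator's actual code. `Memory`, `add_bytes`, and `show` are hypothetical names, and `None` stands in for "unknown".

```python
class Memory:
    """Sparse memory: an address that was never written reads back as None."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.cells.get(addr)  # None means uninitialized

def add_bytes(a, b):
    """8-bit add that stays unknown if either operand is unknown."""
    if a is None or b is None:
        return None
    return (a + b) & 0xFF

def show(value):
    """Format a byte for display, using "?" for unknown values."""
    return "?" if value is None else f"{value:02X}"

mem = Memory()
mem.write(0x10, 5)
print(show(add_bytes(mem.read(0x10), 3)))               # 08
print(show(add_bytes(mem.read(0x10), mem.read(0x11))))  # ?
```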

The address mode recognition and syntax checker for the assembler were especially interesting to work on. Rather than try to write out a state machine with a huge number of if statements, I mapped everything out in a spreadsheet. Once that was done, it was simple to have the spreadsheet format the data into a Python dictionary that can be directly pasted into the Python script. The Python code to advance the state machine is very compact and basically just does a lookup into the data structure. For example, starting with an instruction like LDA ((3)+2,X), the state machine would take its current state ("none" since nothing has been evaluated yet) and look up the next state based on the first input symbol, which is "(". In the spreadsheet table, column A holds the current state (A2 for "none") and row 1 holds the input symbol (D1 for "("). The lookup specifies the next state as D2, which is "(".

A lookup yielding "E" means the combination is invalid, as in E2 where starting an argument with "," is always invalid. On the next iteration, "(" is the current state (A5). One special rule is that any parentheses after the top level are treated as part of an expression since they don't specify the addressing mode. Any part of an expression, including numbers and operators, is coded as "*" (column B), so the next state is given by looking up A5 and B1, which yields "(*" from B5. Proceeding like this, the states are:

LDA ((3)+2,X)
1.  (          A2,D1  = D2:  "none" + "(" = "("
2.   (         A5,B1  = B5:  "("    + "*" = "(*"
3.    3        A8,B1  = B8:  "(*"   + "*" = "(*"
4.     )       A8,B1  = B8:  "(*"   + "*" = "(*"
5.      +      A8,B1  = B8:  "(*"   + "*" = "(*"
6.       2     A8,B1  = B8:  "(*"   + "*" = "(*"
7.        ,    A8,E1  = E8:  "(*"   + "," = "(*,"
8.         X   A12,G1 = G12: "(*,"  + "X" = "(*,X"
9.          )  A14,F1 = F14: "(*,X" + ")" = "(*,X)"

The result after all symbols are parsed is "(*,X)", which can be matched to the addressing mode for generating the machine code. This method lets me handle cases like LDA (3)+2 or LDA (3)+(2) and determine that they are absolute addressing mode like LDA 5 rather than indirect addressing like LDA (5). A similar state machine handles syntax checking, so I know anything passed to the addressing mode state machine is correct. This method also lets me tell the difference between * as multiplication and * as current address (i.e., JMP *), as well as - as subtraction and - as a minus sign.
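The lookup can be sketched in a few lines of Python. This is a reconstruction from the walkthrough above and not the real table: `TRANSITIONS`, `addressing_pattern`, the "#" states, and the simple tokenizer are all assumptions made for the example. A missing dictionary entry plays the role of "E" in the spreadsheet.

```python
import re

# Hypothetical table: TRANSITIONS[current state][input symbol] -> next state.
TRANSITIONS = {
    "none": {"*": "*", "(": "(", "#": "#"},
    "#":    {"*": "#*"},
    "#*":   {"*": "#*"},
    "*":    {"*": "*", ",": "*,"},
    "*,":   {"X": "*,X", "Y": "*,Y"},
    "(":    {"*": "(*"},
    "(*":   {"*": "(*", ",": "(*,", ")": "(*)"},
    "(*,":  {"X": "(*,X"},
    "(*,X": {")": "(*,X)"},
    "(*)":  {"*": "*", ",": "(*),"},
    "(*),": {"Y": "(*),Y"},
}

def addressing_pattern(argument):
    """Reduce an instruction argument to a pattern such as "(*,X)" or "E"."""
    state = "none"
    top_level = []  # one flag per open "(": True if it starts the argument
    for token in re.findall(r"\$[0-9A-Fa-f]+|\w+|\S", argument):
        if token == "(":
            top_level.append(state == "none")
            symbol = "(" if top_level[-1] else "*"  # nested parens are "*"
        elif token == ")":
            if not top_level:
                return "E"
            symbol = ")" if top_level.pop() else "*"
        elif token in (",", "#"):
            symbol = token
        elif state.endswith(",") and token.upper() in ("X", "Y"):
            symbol = token.upper()
        else:
            symbol = "*"  # numbers, labels, and operators
        state = TRANSITIONS.get(state, {}).get(symbol, "E")
        if state == "E":
            return "E"
    return state

for arg in ("((3)+2,X)", "(3)+2", "(3)+(2)", "(5)", "5"):
    print(arg, "->", addressing_pattern(arg))
```

With this table, LDA (3)+2 ends in the same "*" state as LDA 5 because anything following a closed top-level parenthesis turns the whole argument back into a plain expression.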

After the program was finished, I worked on using Transcrypt to convert the Python source into JavaScript so it can run in a browser. The first step was replacing the calls to the curses library on Linux that handle drawing text with output to a browser window. It only took about 100 lines of JavaScript to make a little module that replicates a terminal window with colored text. Next, I split the program up into modules so the input and output routines specific to the Linux version and JavaScript version could be abstracted away while the rest of the program would be the same between the versions. This led to a lot of annoying bugs due to the way Python handles modules. Trying to clear a list imported by name from another module by assigning [ ] to it doesn't actually clear the list. Instead, it rebinds the name in the importing module to a new list that the original module can't see. This is really annoying and almost certainly not what the programmer intends. One solution is to refer to the global variable by prefixing it with its module name, but that adds a lot of unnecessary verbosity in a simple program with a tiny namespace where the code is only modularized for the sake of organization. The solution I went with was to call the clear method of the list instead of assigning a new list to the variable with [ ].
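The difference between rebinding and clearing in place is easy to reproduce. The example below fakes a second source file with types.ModuleType so that it runs as a single script. The module name `shared` and its `lines` list are hypothetical stand-ins, not names from the assembler.

```python
import sys
import types

# Stand-in for a separate file "shared.py" containing:
#     lines = ["LDA #1", "STA $10"]
shared = types.ModuleType("shared")
shared.lines = ["LDA #1", "STA $10"]
sys.modules["shared"] = shared

from shared import lines  # a second name bound to the same list object

# Rebinding: only the local name changes, shared.lines is untouched.
lines = []
after_rebind = list(shared.lines)

# In-place clear: empties the one list object that both names refer to.
from shared import lines
lines.clear()
after_clear = list(shared.lines)

print(after_rebind)  # ['LDA #1', 'STA $10']
print(after_clear)   # []
```

In standard Python, `lines[:] = []` and `del lines[:]` empty the list in place the same way `lines.clear()` does.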

After splitting the program up into modules, I tried converting it to JavaScript with Transcrypt, but the result was a nearly empty JavaScript file. Transcrypt doesn't seem to be able to handle a project with multiple files, so I made a script to paste all of the modules back into a single file. The result of this was broken in a bunch of different ways. It would load the assembler but the display was corrupted. The addresses listed for the assembled bytes also didn't look right since it couldn't handle the simple expression ("0000"+hex(num)[2:])[-4:].upper() correctly until I removed the [-4:] and split the expression up into two lines. Before that, I had to reimplement the built-in hex function which was missing. The next problem was the browser's debug console complaining that Transcrypt couldn't find the method to clear a Python list. Replacing the references to list.clear() with list[:]=[] did not fix the problem, so I gave up on using Transcrypt altogether.
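For reference, this is what the address formatting expression does in standard Python, next to an equivalent that avoids slicing. The function names are made up for the example, and whether Transcrypt handles the second form any better was not tested here.

```python
def addr_slices(num):
    """The expression from the post: pad with zeros, keep the last 4 digits."""
    return ("0000" + hex(num)[2:])[-4:].upper()

def addr_format(num):
    """Equivalent for non-negative numbers, using a format specifier."""
    return format(num & 0xFFFF, "04X")

for num in (0, 0x200, 0xFFFC, 0x12345):
    print(addr_slices(num), addr_format(num))
```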

Next, I tried using Brython which converts the Python to JavaScript every time the website loads rather than doing the conversion once and outputting a JavaScript file to use on the website. One of the Brython files is about 4MB, so between loading that and doing the conversion, the page can take 10 seconds or so the first time it loads. On the other hand, interfacing the Python code with the webpage was extremely easy and everything works reliably. I think it's good enough for showing the project to the few people who are interested in 6502 assembly but can't run the original Python version. The JavaScript version is on my website here.

2 comments:

  1. Hi Joey,
    Interesting concept, I hope to peek in 6502 assembly with this tool finally.
    Thank you for sharing all other projects as well as principals and experiments with detail description of your's results (quite surprised about FORTH low performance for example, calculations methods, RPN calculators) and your incredibly fast learning progress).
    Good luck with 2022 goals!
    KR
    Pavel

  2. Hi Pavel,
    Thank you very much for your kind words and thank you for reading my posts. I hope this project will be useful to you and you make good progress with 6502 assembly.

    Take care,
    Joey
