For the last six months or so, I have been working on a project to compare the performance of C, assembly, and Forth on the 65C02. This week I finally finished everything and published the results on my website. A few days later, the project was featured on Hackaday!
To have something to port, I first wrote a simple game called Robot Game in Python. The C version is compiled with CC65 and the Forth version with Tali Forth 2. There are two assembly versions. One is plain 65C02 assembly that uses the X register as a data stack pointer. The other uses the Assembly Optimizer I have been working on, which gets slightly better performance by putting locals at fixed addresses in zero page. All four versions of the game run in my JavaScript-based emulator and can be played on my website. The source code for every version of the game, the emulator, and the optimizer is now on my GitHub page.
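To give a rough sense of where a fixed-zero-page gain can come from, here is a small cycle-counting sketch in Python. It assumes a hypothetical routine that adds two 8-bit locals and stores the result, and compares zero page,X addressing (locals on an X-indexed data stack) against fixed zero-page addresses. The instruction sequences and the `routine_cycles` helper are illustrative assumptions, not code from the actual games or the optimizer. The per-opcode cycle counts are the standard 65C02 timings in binary (non-decimal) mode.

```python
# Illustrative cycle model (an assumption, not the optimizer's code): compares two
# ways of addressing 8-bit locals on the 65C02. Counts are standard 65C02 timings
# in binary mode; zero-page accesses never incur a page-crossing penalty.
CYCLES = {
    "CLC": 2,
    "LDA zp": 3, "ADC zp": 3, "STA zp": 3,        # fixed zero-page address
    "LDA zp,X": 4, "ADC zp,X": 4, "STA zp,X": 4,  # zero page indexed by X
}


def routine_cycles(instructions):
    """Sum the cycle cost of a straight-line instruction sequence."""
    return sum(CYCLES[op] for op in instructions)


# Hypothetical c = a + b with locals on an X-indexed data stack,
# e.g. CLC / LDA 0,X / ADC 1,X / STA 2,X
data_stack = ["CLC", "LDA zp,X", "ADC zp,X", "STA zp,X"]

# The same add with locals at fixed zero-page addresses,
# e.g. CLC / LDA $10 / ADC $11 / STA $12
fixed_zp = ["CLC", "LDA zp", "ADC zp", "STA zp"]

stack_cost = routine_cycles(data_stack)
fixed_cost = routine_cycles(fixed_zp)
print(f"data stack: {stack_cost} cycles, fixed zero page: {fixed_cost} cycles")
print(f"fixed zero page takes {fixed_cost / stack_cost:.0%} of the data-stack time")
```

This counts only these four instructions. It leaves out any DEX/INX used to reserve and release data stack space, and real routines also contain many instructions that don't touch locals at all, so the overall effect on a complete routine depends on how much of its time goes to accessing locals.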
Porting the four games was a challenge and also a lot of fun! I learned a lot about how C and especially Forth work internally. Once the games were ported, I picked 28 different tasks the game performs internally and compared their performance across languages. Here are the results, with traditional assembly normalized to 100%:
The optimizer improves performance by up to 10%, and by 25% in one very short routine. In the tests, C was about 2-3 times slower than regular assembly, and Forth was usually around 10-20 times slower. One of the most interesting parts of the project was figuring out why Forth is so much slower, despite Forth fans claiming it's much better than C.