Doing some tinkering with my retro BCPL system recently, and while I’ve always thought it was faster than BASIC on the same hardware, I never really worked out just how much faster it was…
So, as it’s a retro system, I ran up some properly retro benchmarks (and a few newer ones) to find out.
The TL;DR
The good news? Yes! It’s faster! The less good news – it’s not as fast as I feel it could be. It may also seem that some of the benchmarks “cheat”. See the bit below about floating point.
Benchmarking
So let’s look at the benchmarks and talk about benchmarking in general.
Benchmarking is not something I’m unfamiliar with. I’ve worked in supercomputing in the past, and there we had a whole suite of tests to run, as well as people employed just to run (and tweak) those benchmarks. Having seen that tweaking first-hand, I’ve tried hard to keep the benchmarks fair across the different versions of BASIC I’ve used, as well as the compiled BCPL.
Remember too that these are synthetic benchmarks, meaning they may not represent the real-life programs we want to run. Would blisteringly fast floating point performance make any difference to an editor or a compiler, for example?
The Benchmarks
The benchmarks I’ve used are not the ones aimed at “power” users, but a mix of old ones that appeared in various magazines of the time (the late 1970s to the mid 1980s) and so suit the computers of that era, plus a few newer ones that are still suitable for those old systems.
- The Rugg/Feldman benchmarks – 1977: Wikipedia
- The Byte Sieve benchmark – 1981: Wikipedia
- Interface Age’s Prime Cruncher benchmark – 1980: Archive.Org
- Noel’s Retro Lab BASIC benchmark – 2020 (ish)
- My own text/ASCII Mandelbrot – 2019-2023
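For anyone who hasn’t met the Byte Sieve, here is a minimal Python sketch of its core loop, following the structure of the 1981 BYTE version: one flag per odd number, with the classic size of 8190 giving the well-known answer of 1899 primes. It’s purely illustrative; the actual benchmark runs were in BASIC and BCPL on the Ruby hardware.

```python
def byte_sieve(size: int = 8190) -> int:
    """Count primes the way the 1981 BYTE sieve does.

    flags[i] stands for the odd number 2*i + 3, so indices 0..size
    cover the odd numbers 3 .. 2*size + 3. The prime 2 is not counted.
    """
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            # Index i + prime holds 3*prime; stepping the index by prime
            # moves to the next odd multiple (5*prime, 7*prime, ...).
            k = i + prime
            while k <= size:
                flags[k] = False
                k += prime
            count += 1
    return count


if __name__ == "__main__":
    print(byte_sieve(), "primes")
```

Note that it strikes out multiples starting from 3×prime rather than prime², which wastes a little work but keeps the loop as simple as the original.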
Hardware
The hardware I’m using is my own Ruby 816 design – it has a WDC 65C816 CPU and 512KB of RAM, and runs at 16MHz. The ’816 is the so-called 16-bit 6502. For the BASIC benchmark runs, the CPU is running in 65C02 emulation mode, so it’s essentially the same old 6502 that was in the Commodore PET, Apple II, BBC Micro and so on. The 65C02 was used in the BBC Master, and BBC Basic version 4 takes advantage of the extra instructions this CPU offers.
The BCPL tests run the CPU in native 16-bit mode. The BCPL environment is actually a 32-bit one, so to make that work there is a bytecode interpreter at its heart. This is written in hand-coded 816 assembler.
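Because the 65C816’s accumulator is at most 16 bits wide, every 32-bit operation in the virtual machine has to be built from 16-bit pieces. The sketch below shows the principle for addition in Python: two 16-bit adds chained through the carry, much like a CLC followed by two ADC instructions. It illustrates the idea only; it isn’t the actual Ruby816 interpreter code.

```python
MASK16 = 0xFFFF


def add32_via_16(a: int, b: int) -> int:
    """Add two unsigned 32-bit values using only 16-bit additions.

    Mirrors CLC / ADC (low words) / ADC (high words) on a 16-bit
    accumulator: the carry out of the low half feeds the high half,
    and any carry out of the high half is lost, so the result wraps
    modulo 2**32.
    """
    lo = (a & MASK16) + (b & MASK16)                        # first ADC, carry clear
    carry = lo >> 16                                        # the carry flag (0 or 1)
    hi = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry  # second ADC
    return ((hi & MASK16) << 16) | (lo & MASK16)


if __name__ == "__main__":
    print(hex(add32_via_16(0x0001FFFF, 0x00000001)))  # 0x20000
```

Every 32-bit add, subtract, compare or load/store costs roughly twice the work of its 16-bit equivalent, which is part of the price of a 32-bit BCPL on a 16-bit CPU.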
BASICs and BCPL
The BASICs I’ve tested are:
- BBC Basic – Versions 1, 2, 3 and 4. Note that version 4 requires the 65C02, which is what the 65816 actually emulates at power-on time.
- EhBASIC – This is a BASIC written by the late Lee Davison, based on a disassembly of a Microsoft 6502 BASIC. It’s a version that has been tweaked to use some 65C02 features.
- CBM Basic 2 – The same BASIC that runs on the Commodore PET, VIC-20 and C64 computers. It was assembled from source and tweaked to run on the Ruby hardware.
The BCPL is Martin Richards’ 32-bit 2014 compiler, running on the same hardware with the CPU in native 16-bit mode. The compiler outputs a bytecode (called CINTCODE) for a 32-bit virtual machine, which is then interpreted by hand-written 65816 assembly code.
The tests were edited and run directly on the Ruby816 system.
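To give a feel for what “interpreted by hand-written assembly code” means, here is a minimal Python sketch of a bytecode interpreter’s fetch-decode-dispatch loop for a 32-bit accumulator machine. The opcodes (LDK, ADDK, JLT, HALT) and their encoding are hypothetical, invented for this example; real CINTCODE has its own, much richer instruction set, and the Ruby816 interpreter is 65816 assembler, not Python.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical opcodes for illustration only; these are NOT CINTCODE.
LDK, ADDK, JLT, HALT = range(4)


def run(program: list[tuple[int, ...]]) -> int:
    """Fetch-decode-dispatch loop for a tiny 32-bit accumulator machine."""
    a = 0    # accumulator, always kept to 32 bits
    pc = 0   # index of the next instruction
    while True:
        op, *args = program[pc]   # fetch
        pc += 1
        if op == LDK:             # A := constant
            a = args[0] & MASK32
        elif op == ADDK:          # A := A + constant, wrapping at 32 bits
            a = (a + args[0]) & MASK32
        elif op == JLT:           # if A < limit (unsigned) then jump to target
            limit, target = args
            if a < limit:
                pc = target
        elif op == HALT:
            return a
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")


# A counting loop in the style of the Rugg/Feldman tests:
# K = 0 : K = K + 1 : IF K < 1000 THEN (back to the add)
COUNT_TO_1000 = [
    (LDK, 0),
    (ADDK, 1),        # loop body
    (JLT, 1000, 1),   # branch back while K < 1000
    (HALT,),
]

if __name__ == "__main__":
    print(run(COUNT_TO_1000))  # 1000
```

The cost of the interpreter is the fetch and dispatch on every instruction; a compact bytecode still beats re-scanning tokenised BASIC text, which is where much of BCPL’s advantage comes from.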
And then there’s floating point
After a few runs and tests it became apparent that BCPL was going to win, but not for the reason that might initially be obvious… BASIC uses floating point numbers more or less by default; BCPL is more or less integer by default. Some BASICs can use integers, but the older Microsoft variants do this very inefficiently: integer variables (such as A%) are stored as 16-bit values but converted to floating point for the arithmetic itself. BBC Basic is the exception here, but even then, making sure everything is worked out as an integer can be tricky.
So… what I’ve done is write two versions of the BCPL benchmarks: one using integers and one using floating point. It’s not perfect, as FOR loops in BCPL are implicitly integer, but most of the Rugg/Feldman benchmarks are essentially WHILE loops, so that works out OK.
One just didn’t work well, though – the Byte Sieve simply wasn’t appropriate for conversion to floating point, so there’s no floating point version of it (the integer version is included). Noel’s Retro Lab benchmark was converted using a WHILE loop in the same way as the Rugg/Feldman ones. The floating point Mandelbrot is a line-for-line translation of the BASIC source (GOTOs and all), but I added a scaled integer version too – it’s not quite identical, but close enough to be representative.
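The scaled integer idea can be sketched in Python as the same escape-time loop written twice: once with floats, and once with fixed-point integers where every value is multiplied by a scale factor and each product is shifted back down. The scale of 2^12 = 4096 and all names here are assumptions for illustration, not the actual BCPL benchmark source. The truncation in those shifts is why a scaled integer version gives nearly, but not exactly, the same picture.

```python
SHIFT = 12             # hypothetical choice: 12 fractional bits
ONE = 1 << SHIFT       # 1.0 in fixed point (4096)
FOUR = 4 << SHIFT      # escape test: |z|^2 > 4


def mandel_float(cr: float, ci: float, max_iter: int = 50) -> int:
    """Floating point escape-time count for the point c = cr + ci*i."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:
            return n
        zi = 2.0 * zr * zi + ci
        zr = zr2 - zi2 + cr
    return max_iter


def mandel_fixed(cr: int, ci: int, max_iter: int = 50) -> int:
    """Same iteration with cr and ci already scaled by ONE.

    Each product of two scaled values carries ONE twice, so it is shifted
    right by SHIFT to bring it back to a single scale factor.
    """
    zr = zi = 0
    for n in range(max_iter):
        zr2 = (zr * zr) >> SHIFT
        zi2 = (zi * zi) >> SHIFT
        if zr2 + zi2 > FOUR:
            return n
        zi = ((zr * zi) >> (SHIFT - 1)) + ci   # 2*zr*zi, rescaled in one shift
        zr = zr2 - zi2 + cr
    return max_iter


def to_fixed(x: float) -> int:
    """Convert a float to the fixed-point scale used above."""
    return round(x * ONE)


if __name__ == "__main__":
    for cr, ci in [(0.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (0.3, 0.5)]:
        print(cr, ci, mandel_float(cr, ci), mandel_fixed(to_fixed(cr), to_fixed(ci)))
```

For the usual viewing window, 12 fractional bits keep the products comfortably inside a signed 32-bit word, which matters for a 32-bit BCPL; Python’s integers never overflow, so the sketch doesn’t show that constraint.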
And in a somewhat surprising twist of fate, there were some occasions where BBC Basic 4 was FASTER than BCPL at floating point. Why? Well… rather than hand-code IEEE 754 floating point in 65C816 assembler, I arranged for BCPL to use the on-board ATmega as a floating point co-processor, so it should be fast? Well, it is, but because of the way the two processors talk to each other, the latency is fairly high. Every operation pays that round-trip cost however simple it is, so doing a lot of trivial calculations (like add/subtract) takes longer in BCPL than in BBC Basic4. Is it worth doing anything about? Maybe, but not right now…
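The trade-off can be put as a simple per-operation cost model: offloading an operation only pays off when the round trip plus the co-processor’s own time is less than doing it in software on the main CPU. Here is a minimal Python sketch; the function names are made up, and the timings in the usage example are hypothetical placeholders, not measurements from the Ruby816.

```python
def offload_time(n_ops: int, round_trip: float, coproc_op: float) -> float:
    """Time for n_ops operations on a co-processor; each op pays the round trip."""
    return n_ops * (round_trip + coproc_op)


def local_time(n_ops: int, local_op: float) -> float:
    """Time for n_ops operations done in software on the main CPU."""
    return n_ops * local_op


def offload_pays_off(round_trip: float, coproc_op: float, local_op: float) -> bool:
    """Offloading wins only when the local cost exceeds round trip + remote cost.

    n_ops cancels out: doing more trivial operations doesn't help
    if every one of them is a separate round trip.
    """
    return round_trip + coproc_op < local_op


if __name__ == "__main__":
    # Hypothetical timings in microseconds, NOT measured on real hardware.
    ROUND_TRIP, COPROC_OP = 40.0, 2.0
    print(offload_pays_off(ROUND_TRIP, COPROC_OP, local_op=25.0))   # False: a cheap add
    print(offload_pays_off(ROUND_TRIP, COPROC_OP, local_op=300.0))  # True: a costly divide
```

Under this model the usual fixes are either to cut the round-trip latency or to send bigger chunks of work per round trip, so that the fixed cost is shared across several operations.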
Timings
Rather than try to use a stopwatch, I have used the timing facilities provided by the RubyOS.
When running in 6502 emulation mode, RubyOS provides a 100Hz ticker so that the BBC Basics’ built-in TIME function works as it ought. The other BASICs use PEEK and POKE to access the memory locations that the ticker uses. There is the potential for a mis-read (if the ticker updates the count between two PEEKs), but I feel this would not happen often enough to be an issue, and if it did, the results would be obvious, so a simple re-run would give the true result.
The BCPL system running in native CPU mode has a 1000Hz ticker and this is used when running the BCPL benchmarks.
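The mis-read mentioned above is the classic torn read: the counter spans several bytes, and if the ticker interrupt updates it between two PEEKs, the bytes come from different moments. A re-run is a perfectly good remedy here; in code, a common alternative is to read the counter repeatedly until two consecutive reads agree. Here is a minimal Python sketch of that, plus the conversion from ticks to seconds for the 100Hz and 1000Hz tickers. The `peek` callable stands in for BASIC’s PEEK, and the 4-byte little-endian layout is an assumption, not RubyOS’s documented layout.

```python
from typing import Callable, Optional


def ticks_from_bytes(data: list[int]) -> int:
    """Assemble a little-endian multi-byte counter (the layout is an assumption)."""
    return sum(byte << (8 * i) for i, byte in enumerate(data))


def read_ticks_stable(peek: Callable[[int], int], addr: int, width: int = 4) -> int:
    """Read a counter one byte at a time (as PEEK does) until two reads agree.

    If a tick lands between two PEEKs, a carry can ripple into a byte that
    was already read, giving a torn value. Ticks are far slower than the
    reads, so two identical consecutive reads almost certainly saw a counter
    that didn't change mid-read.
    """
    previous: Optional[int] = None
    while True:
        value = ticks_from_bytes([peek(addr + i) for i in range(width)])
        if value == previous:
            return value
        previous = value


def ticks_to_seconds(ticks: int, hz: int) -> float:
    """hz is 100 for the 6502-mode ticker, 1000 for the native-mode BCPL one."""
    return ticks / hz
```

For example, a 100Hz count of 124 and a 1000Hz count of 1240 both come out as 1.24 seconds.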
I ran each test a minimum of three times and picked the time that came up in two runs. So, e.g., if I saw 1.24, 1.24 and 1.23, then I picked 1.24. When I got three results that were very close but different, I ran it a few more times until I got two the same. Might be marginally scientific, but it’s good enough.
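The selection rule above is easy to pin down in code. Here is a minimal Python sketch, assuming a hypothetical `run_once` callable that runs one benchmark and returns its ticker-based time; because the ticker quantises the times, comparing them for exact equality is meaningful. The `max_runs` cap is an addition of this sketch so it can’t loop forever.

```python
from collections import Counter
from typing import Callable


def pick_repeated_time(run_once: Callable[[], float],
                       min_runs: int = 3, max_runs: int = 20) -> float:
    """Run a benchmark at least min_runs times and return the first time
    that has come up in two runs, re-running (up to max_runs) if needed."""
    counts = Counter()
    for n in range(1, max_runs + 1):
        counts[run_once()] += 1
        value, hits = counts.most_common(1)[0]
        if n >= min_runs and hits >= 2:
            return value
    raise RuntimeError(f"no time repeated within {max_runs} runs")
```

Fed the results 1.24, 1.24 and 1.23, it returns 1.24 after the third run; fed 1.25, 1.23, 1.24, 1.24, it needs a fourth run before returning 1.24.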
The results:
BCPL is generally faster. The easiest gains come from using integer loop counters in FOR loops and so on, but even with floating point there are gains to be had, as the compiler does a good job of turning the program text into something much more compact and easier to execute – even if it’s still being run inside an interpreted virtual machine.
BBC Basic improved from version 1 to version 4 – in particular the floating point code was extensively optimised to use the 65C02 instruction set in Basic4.
EhBASIC is a fraction slower than CBM Basic 2 but both come from the same Microsoft background. BBC Basic is much faster in all cases, but it is a larger BASIC and was written some years after Microsoft BASIC.
Source Code?
If you want it, it’s here
Conclusion
- The world doesn’t need another retro computer benchmark – mostly because we’re not running into shops in the early 80s to see who has the fastest BASIC. The BBC Micro won that war for something vaguely affordable (back then, anyway).
- Compiled languages are faster.
- BBC Basic4 is almost twice as fast as the Microsoft BASICs.
- BCPL can be up to 8 times faster when you cheat and rewrite the benchmark to use integers, but even with floating point it’s still mostly 2 to 3 times faster.
- My Ruby816 board has performed remarkably well, allowing me to edit and test all the code for these benchmarks on it. The BCPL compiler runs directly on it and, while not the fastest thing, isn’t so slow as to be unusable. Compiling the scaled integer Mandelbrot takes about 4.5 seconds.