
The Death Of The Von Neumann Architecture
June 10th, 2015

Lots of exciting new things were announced at the latest Apple WWDC this week, but hidden amongst the big announcements was an extension of something that's been bothering me for a while.

All programs developed for the Apple Watch are required to exist only in bytecode form.

I can tell this won't bother a lot of people. They may even like it. Well that's their business, but let me try and explain why I think this is heading down a bad path.

So there are two approaches to storing computer programs. (Well actually there's a few more, but go do your own damn research). The first is called the Harvard architecture. In this design there are two separate areas: one for storing the actual program code, and one for storing the data the program operates on.

The other is the Von Neumann architecture. In this there is only one area, which is just treated as a big table of bytes. You can either interpret those bytes as CPU instructions, or use them as data. Or you can mix and match as you see fit. The computer doesn't care, it's all just numbers.

Almost every single computer you've ever touched, seen, or heard about, uses the Von Neumann architecture. From the C64 through the SNES to the PC you may even be reading this on. The most common use for the Harvard architecture nowadays is in small microcontrollers - the thing that controls your microwave oven perhaps.

One of the key principles of the Von Neumann architecture is that because a program is just made of bytes, a program can change itself (known as self-modifying code). That property by itself, it turns out, is not especially useful. Self-modifying code is generally regarded as confusing to work with, and can trip the CPU up occasionally. No-one really uses it any more, so we'll move on past it here.

But what's more useful, and arguably essential, is that a program can build new programs just by placing the right bytes in the right place. So a program like gcc can spit out a series of bytes which then in turn become a new program.
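
To make that concrete, here's a minimal sketch in C of a program building a (very tiny) new program, assuming x86-64 and a desktop POSIX system that still allows it: six bytes of machine code are written into memory as ordinary data, the page is flipped to executable, and then those same bytes are called as a function.

    /* A minimal sketch of "data becomes code" on a Von Neumann machine.
     * Assumes x86-64 and a desktop POSIX OS that permits executable
     * mappings; on locked-down systems the mmap/mprotect step below is
     * exactly what gets refused. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* x86-64 machine code for: mov eax, 42; ret */
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        /* Ask for a page we can write to now and execute later. */
        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANON, -1, 0);
        if (page == MAP_FAILED) return 1;

        memcpy(page, code, sizeof code);               /* bytes written as data...  */
        if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) return 1;

        int (*make_42)(void) = (int (*)(void))page;    /* ...now treated as code    */
        printf("%d\n", make_42());                     /* prints 42 */
        return 0;
    }

That mprotect step, the moment data becomes code, is the capability everything below hinges on.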

In 2005, when I first worked on games for the Xbox 360, I noticed something that started to worry me. Now video game consoles care a lot about who gets to run programs on their system, and so to prevent unauthorized software, Microsoft made a decision to make the program code read-only. Once the program code leaves the PC it was originally compiled on, the operating system on the Xbox does not permit the program to be changed. Nor is there any way to ask the operating system nicely to convert some data you have into new code.

The reason for this, of course, is to prevent you from making new games (or pirating existing games) on the 360 without Microsoft's permission. Now I don't mind that part so much. Well actually no, I do, but we can argue about DRM another time.

What really struck me was that there was no flexibility here; no way for a program to make changes to itself or build new subprograms, even if Microsoft approved it. I let it go at the time, because consoles tend to have specialized requirements compared to desktop computers.

Then in 2007 Apple released the iPhone, and to my surprise it did the same thing - the code segment is read only, and there's no way to circumvent this, even just for short bursts.

What this means is that the iPhone and Xbox 360 use what's known as a "Modified Harvard Architecture", where the code and data are treated separately. A program on those systems cannot convert its data into code. You can maybe read your code as data, but not the other way.
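
Here's a small sketch of what that restriction looks like from a program's point of view. The same kind of request that succeeded in the earlier example is, as far as I can tell, simply refused for third-party apps on these systems; treat the details as illustrative rather than gospel.

    /* Sketch of the same request as seen from a third-party app on a
     * locked-down platform. On iOS (as I understand it) the kernel refuses
     * to hand back memory that is writable and executable at once, so the
     * data-to-code door never opens. The exact error reported may vary. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANON, -1, 0);
        if (page == MAP_FAILED) {
            perror("mmap");   /* on a locked-down OS, this is the path you hit */
            return 1;
        }
        /* On a desktop OS you'd likely get the page and carry on. */
        return 0;
    }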

This is possibly the biggest setback in computer science innovation to ever happen.

Why is this important?

The key to innovation is to be able to do things that weren't expected of you. If you can only accomplish things that the previous guy provided for you, you can't advance.

Let's say you're writing a web browser, and you have this cool scripting language you want to implement. But it runs kinda slowly. Then one day you have an idea: you could dynamically compile the script into native assembly, which could perhaps increase the performance by 30X!

Except the thing is, you can't do this on an iPhone, because the operating system doesn't permit you to make new code. In case you think this is an artificial example I've made up, this is the exact reason there's no Firefox on iOS. Of course, Apple's own Safari compiles scripts to assembly, because they've helpfully added a secret backdoor just for Safari, but not for any other programs.

Or perhaps you're running on an OS that doesn't have any support for hot-patching a running executable with new code. And you think "I know how to write a programming language runtime that can do that." Perhaps you'd seen how Erlang can do it, and wanted to try it yourself.
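
In case that sounds exotic, the crudest form of it is only a few lines of C. Here's a sketch, assuming x86-64 Linux, with function names invented for the example: make the page holding the old function writable, then overwrite its first bytes with a jump to the replacement. Real runtimes are vastly more careful about threads and caches, but they need the same underlying permission: code you're allowed to write to.

    /* A rough sketch of hot-patching a live function, assuming x86-64 Linux.
     * The function names are invented for the example. Real systems worry
     * about threads, instruction caches, and patches that straddle a page
     * boundary; this sketch ignores all of that. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    __attribute__((noinline)) static int version_one(void) { return 1; }
    __attribute__((noinline)) static int version_two(void) { return 2; }

    static void hot_patch(void *old_fn, void *new_fn) {
        /* Make the page containing old_fn writable as well as executable. */
        long pagesize = sysconf(_SC_PAGESIZE);
        void *page = (void *)((uintptr_t)old_fn & ~(uintptr_t)(pagesize - 1));
        mprotect(page, (size_t)pagesize, PROT_READ | PROT_WRITE | PROT_EXEC);

        /* Overwrite the first five bytes with: jmp rel32 (to new_fn). */
        unsigned char jmp[5] = { 0xE9, 0, 0, 0, 0 };
        int32_t rel = (int32_t)((intptr_t)new_fn - ((intptr_t)old_fn + 5));
        memcpy(jmp + 1, &rel, sizeof rel);
        memcpy(old_fn, jmp, sizeof jmp);

        /* Put the page back to read + execute. */
        mprotect(page, (size_t)pagesize, PROT_READ | PROT_EXEC);
    }

    int main(void) {
        int (*volatile fn)(void) = version_one;   /* volatile: force a real call */
        printf("%d\n", fn());                     /* prints 1 */
        hot_patch((void *)version_one, (void *)version_two);
        printf("%d\n", fn());                     /* prints 2: same symbol, new code */
        return 0;
    }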

You can't do that on iOS.

Or perhaps you think it would be really cool to invent a new programming system that blurs the line between statically typed code and dynamically typed code.

You can't do that on iOS.

What if you were on a system that didn't support DLLs or shared libraries, and you thought it'd be kinda useful to invent something like that?

What if your operating system didn't come with an especially great CPU profiler, so you decided to write one and perhaps sell it commercially - maybe by inserting code hooks at the start and end of each function as it runs?

Yep you guessed it - not on iOS, unless you're Apple.
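
To give a flavour of that last one, here's what per-function enter and exit hooks look like when you're allowed to do it the easy, compile-time way, via GCC and Clang's -finstrument-functions. The version a commercial profiler really wants, attaching to a program that's already running and patching hooks in on the fly, needs exactly the writable code pages we've been talking about.

    /* Per-function enter/exit hooks via GCC/Clang's -finstrument-functions.
     * The compiler inserts calls to these two functions around every
     * function in the program. Build with something like:
     *   cc -finstrument-functions -o demo demo.c
     */
    #include <stdio.h>

    __attribute__((no_instrument_function))
    void __cyg_profile_func_enter(void *fn, void *call_site) {
        fprintf(stderr, "enter %p (called from %p)\n", fn, call_site);
    }

    __attribute__((no_instrument_function))
    void __cyg_profile_func_exit(void *fn, void *call_site) {
        fprintf(stderr, "exit  %p\n", fn);
    }

    static int work(int x) { return x * 2; }

    int main(void) {
        return work(21) == 42 ? 0 : 1;
    }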

(Figure: LuaJIT's JIT-vs-interpreter comparison.)

In 2009, Mike Pall wrote a new VM for the Lua scripting language, called LuaJIT. LuaJIT shocked quite a lot of people in the compsci community by showing that a dynamic language can not only be fast, but really fast - sometimes even rivaling the output from statically-typed compilers. On one benchmark, LuaJIT's ARM JIT is 48X faster than its own interpreter. Android can run the JIT version, but iOS can only run the interpreter version.

What if the platform holder didn't think to provide you with something you wanted to use? As long as you can write bytes into memory, you have the ability to do it yourself and invent whatever technique you need.

The walled garden of modern systems worries a lot of people because it prevents them from shipping certain applications without the platform holder's permission - see the recent Pebble Time fiasco for an example. While I worry about that too, what scares me more is the simple technical restriction that prevents you from expanding past the boundaries set for you.

You're reliant on what Microsoft or Apple provides you. You cannot innovate in computing on these platforms; you have to hope the platform owner provides you with a pre-canned solution and that it doesn't suck.

Languages and runtimes like Java, JavaScript, and the .NET Framework could never have prospered on platforms that didn't allow them to create new native code as they ran. They could only exist because the platforms they ran on allowed them to make their own native code, and run it right there and then.

On a Von Neumann architecture, if you don't like the language you're provided, it's always possible to escape it and move upwards into one you created yourself. It's all just bytes in memory, so you just put the right bytes where you need them.

Now if it were just the occasional console or cellphone that behaved like this, I'd be annoyed but I'd just have to suck it up and get on with things.

What really worries me about Apple's latest decision is that apps wishing to ship on the new Apple Watch cannot even ship native assembly language - you HAVE to ship bytecode. It's not just a question of permission any more; an Apple Watch program simply has no capability to even think about creating new sub-programs. It has no way to reason about program code at runtime, no way to work inside its own medium.

Apple and Microsoft are showing every indication that in the near future this direction may well apply to desktops too. The restrictive 'store' APIs pushed by these platform holders seem to relish preventing the execution of unapproved code.

A step backwards to the Harvard architecture benefits no-one. Yet that is where I believe we will end up if these events continue to their natural conclusion.

The idea that bytes are just bytes, with no semantics attached, is one of the most powerful ones we have in computing. Please let's not throw it away.

Written by Richard Mitton,

software engineer and travelling wizard.

Follow me on twitter: http://twitter.com/grumpygiant