Announcing the Saber Virtual Machine

January 18, 2024

In this post I'm excited to announce the Saber Virtual Machine, or SaberVM. It's a project I've been working hard on for the past month, with fascinating properties and an in-progress MVP implementation that's coming along well. If you like writing functional languages and want a portable backend to target that's fast and/or safe, you may be very interested in SaberVM, especially in its upcoming AOT-compile-to-optimized-native form.

What is it?

SaberVM is a compiler backend for functional languages. Specifically, it is an abstract stack machine, with many possible implementations. It takes in CPS code that's been closure-converted and hoisted, and executes it or AOT compiles it to a native binary. My current implementation is a naive bytecode VM in Rust, without JITing, mostly for rapid prototyping as the design settles.

The goals of the project are safety, expressivity, portability, and reliability. To give a very quick overview, safety is achieved through a carefully designed type system, and primarily includes memory safety. Expressivity is preserved by using runtime checks and tagging for some of the safety guarantees, so you can use any convoluted memory scheme you want so long as you don't dereference a dangling pointer. SaberVM offers portability by being designed to be simple to implement on a variety of platforms. Lastly, SaberVM is reliable because of a built-in crash recovery system you can use to microreboot without the whole program terminating, like Erlang's BEAM.

The two main systems of SaberVM are regions and exceptions.

Regions

A region can be thought of as an arena with a malloc/free-style internal memory management system. To read or write from the heap, you need a region. It can be the heap itself. Regions have statically-checked lifetimes, and a capability-based system for checking that values within that region are only read or written to during the lifetime of the region. When a region's lifetime ends, it is freed like an arena.

This structure of memory is important because in a safe compilation the values in the heap are tagged with information about which inhabitant is at that place in memory, so pointers can check that they're pointing at the thing they think they are when they're dereferenced. This introduces a memory fragmentation issue: "slots" in memory can then later be used only by values that are the same size or smaller; they have an unchangeable "max size." To prove this, consider two values adjacent in memory, A and B, and their pointers, &A and &B. Now say we free both, and allocate C at the same address where A was (that is, &C == &A). If C is bigger than A was, then it has arbitrary control of the bytes used to tag B! That means a nefarious program could potentially cause &B to think that B is still there, leading dereferences to not crash but instead to read memory controlled by C but think that it's B.

If allocating something in memory fixes a certain max-size for that chunk of memory for the rest of the program's lifetime, that can cause issues from a poor use of memory. Therefore, SaberVM puts its values in regions so there are certain points where it's statically known that nothing will dereference some set of pointers ever again, so their referent memory can be really freed, with no restriction on its future use. As a language writer, if you find your output programs have significant fragmentation issues, you can do some light region inference to fix it. In addition, since regions are freed like arenas, regions offer a way to deallocate a bunch of memory instantly, and improve cache locality.

Exceptions

SaberVM's other interesting system is exceptions. Exceptions in SaberVM are not like normal exceptions, though there's nothing stopping a compiler writer from building a normal exception system on top of SaberVM. Instead, SaberVM exceptions don't take arguments. Every function must have a catch-all exception case, and only that. Why? Having this built-in to SaberVM means that instructions that fail at runtime don't crash your program, they just jump to the exception handler. The language targeting SaberVM is then expected to produce exception handlers that do at least one of four things: crash the program (with an explicit halt instruction), rethrow the exception (that is, jump to the caller's catch-all exception handler), restart the crashed function (in a microreboot or Erlang style, without information about what caused the exception), or release held resources (currently SaberVM doesn't have locks, only CAS, but this is likely to change).

Note that SaberVM exceptions are not expected to be how your own language handles its exceptions! For example, if you prefer a Result-style exception handling, you can write functions that attempt single fallible instructions with an exception handler that produces the corresponding Err value.

Conclusion

SaberVM is carefully designed but still a very young project. I have an in-progress implementation in Rust that I call the MVP, a subset of the final SaberVM that isn't even necessarily forwards-compatible, just to play with the ideas and their interactions and see how they really work in practice. I'm sure as I learn more the design will change somewhat but the core ideas and goals are very strong.

I've gotten most of the typechecker done in just a few days, which is really promising, and the algorithms are very simple. (I'll post more about them soon!) I think the VM runtime will be easier to write than the typechecker, since it'll be a naive prototype (without, say, JIT compiling), so I'd say the implementation is about halfway done, which is very exciting.

If you've read this far and think that the project sounds interesting, consider starring it on github, or even sponsoring my work on github or ko-fi!