maximecb / zetavm
- Monday, May 1, 2017 at 03:12:15
C++
Multi-Language Platform for Dynamic Programming Languages
Please note that ZetaVM is currently at the early prototype stage. As such, it is incomplete and breaking changes may happen often.
# Clone this repository
git clone git@github.com:maximecb/zetavm.git
# Run the configure script and compile zetavm
# Note: run configure with `--with-sdl2` to build graphics support
cd zetavm
./configure
make
# Optionally run tests to check that everything works properly
make test
# To run programs, pass the path to a source file to zeta, for example:
./zeta benchmarks/fib29.pls
ZetaVM is a virtual machine and JIT compiler for dynamic programming languages. It implements a basic core runtime environment on top of which dynamic programming languages can be implemented with relatively little effort.
Features of the VM will include:
Built-in support for dynamic typing
Garbage collection
JIT compilation
Dynamically growable objects (JS-like)
Dynamically-typed arrays (JS/Python-like)
64-bit integer and floating-point arithmetic
Immutable UTF-8 strings
Text-based image files (JSON-like)
Ability to suspend and resume programs
Graphical and audio libraries
Zeta image files (.zim) are JSON-like, human-readable text files containing objects, data and bytecodes to be executed by ZetaVM. They are intended to serve as a compilation target, and may contain executable programs, or libraries/packages.
This section aims to explicitly state the design goals and principles underlying ZetaVM. These should be used to guide the design of the VM.
ZetaVM should make it relatively easy to implement dynamic languages with common features such as dynamic typing, eval, dynamic arrays and dynamically-extensible objects.
ZetaVM aims to reach, after an initial prototyping phase, a stable set of core features which will eventually become completely frozen/unchanging, so that software compiled for ZetaVM can keep running, even decades after it has been written.
The core features provided by ZetaVM should be minimalistic. It is neither possible nor desirable to accommodate every possible use case. A large set of features is more likely to lead to the introduction of corner cases and unpredictable behaviors.
The semantics of the features provided by the VM should be simple and straightforward. There should be as few corner cases as possible.
The semantics of the ZetaVM core should be strict and precise, leaving as few undefined behaviors as possible (ideally none). Predictable semantics should be favored over small potential optimization opportunities. This will increase the likelihood that ZetaVM programs behave the same on every platform.
Often, it is preferable to be too strict in defining VM behavior and capabilities rather than too lax. Limitations on inputs the VM can take, for instance, can always be removed later, but are difficult to add later without breaking programs. In the same vein, ZetaVM should be strict in rejecting non-conforming inputs and program behaviors, so that programs do not begin to rely on unanticipated corner cases of the implementation.
Silent failures and nondeterministic behaviors should be avoided. Type errors and invalid pointer dereferences, for instance, should result in immediate program termination.
ZetaVM should be designed with some regard for performance. That is, its core semantics should be chosen so that known optimizations may be applied. However, performance is not the ultimate goal, and robustness should be favored over performance.
Core APIs and libraries provided by ZetaVM should intentionally be kept simple, low-level and minimalistic, again so that there are as few corner cases as possible. However, in order to enable forward-compatibility, core APIs should also be built with as few arbitrary limits as possible.
Some consideration needs to be given to extensibility and future-proofing. ZetaVM aims to provide a simple and stable environment, but if it is to last, it is inevitable that additions and extensions will be made.
This section roughly outlines medium-term plans for the ZetaVM project.
A first prototype of ZetaVM will be implemented. For this first prototype, there will only be an interpreter, and no garbage collector. The goal here is to get the system running quickly with a core set of features, so that we can begin prototyping and altering the design as necessary.
The first language implemented on top of ZetaVM will be a simple LISP subset. The main motivation is that a LISP-like syntax can be extremely simple to parse, but also highly extensible and expressive. This language will likely be implemented in Python, and will serve both to bootstrap the system, and to demonstrate to beginners how they can build their own language targeting ZetaVM.
A set of core libraries/packages will be provided as part of ZetaVM. These will implement a set of simple input/output APIs to interface with the outside world. The APIs provided will intentionally be kept low-level, simple and minimalistic to minimize the risk of introducing corner cases and undefined behaviors. It is expected that higher-level, more user-friendly libraries will be implemented on top of these.
Libraries should cover services such as file I/O, console I/O, basic 2D graphics (pixel plotting and blitting), mouse and keyboard input as well as raw PCM audio output. The early prototype version of the VM will implement only the most essential libraries.
Once ZetaVM is past the initial prototyping stage, a JIT compiler will be implemented to improve performance. This JIT will likely be based on basic block versioning.
The VM will eventually ship with a package manager. This package manager will make it trivial to immediately upload code you have written from the command-line and make it available to anyone.
Packages will be versioned and immutable. That is, once a package is uploaded, it will be assigned a version number, e.g. "john.imagelib.56". This package version will then be frozen and unchangeable. What this means is that once code relies on a specific package version, the dependencies can never be changed and broken. Hence, by freezing the core VM semantics, and freezing submitted packages, we make it possible to write software that never breaks.
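As a rough illustration of the versioning scheme described above, the following Python sketch models a registry in which every upload creates a new, frozen version. The package manager has not been designed yet, so the `PackageRegistry` class, its method names, the choice to start numbering at 1, and the reading of "john.imagelib.56" as author, package and version are all assumptions made for illustration.

```python
class PackageRegistry:
    """Toy in-memory registry: every upload gets a new, frozen version."""

    def __init__(self):
        # Maps a full name such as "john.imagelib.56" to the package contents.
        self._packages = {}
        # Maps "john.imagelib" to the last version number handed out.
        self._latest = {}

    def publish(self, author, name, contents):
        """Store contents under the next version number and return the full name."""
        key = f"{author}.{name}"
        version = self._latest.get(key, 0) + 1  # assumed: versions start at 1
        self._latest[key] = version
        full_name = f"{key}.{version}"
        self._packages[full_name] = contents
        return full_name

    def fetch(self, full_name):
        """Look up an exact version; there is no operation that replaces one."""
        return self._packages[full_name]


registry = PackageRegistry()
first = registry.publish("john", "imagelib", "v1 code")
second = registry.publish("john", "imagelib", "v2 code")
```

Because the registry offers no way to overwrite an existing entry, code that depends on `first` keeps resolving to the same contents after `second` is published.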
Stack vs register-based VM:
Plain function calls vs call/cc:
Memory accesses, loads and stores:
get_field and set_field for objects, get_elem and set_elem for arrays, get_char for strings
Static vs dynamic typing:
Immutable objects: Do we want the ability to tag object fields as constant/immutable? This may be good for optimization and safety. However, this is somewhat bad for monkey-patching and forward-compatibility. May want to just let people shoot themselves in the foot if they want to. Also, keeping things simple, particularly for the prototype, is important.
In ZetaVM, every value implicitly has an associated type tag which tells us which kind of value it is.
The type tags are:
($true and $false)
More complex datatypes, such as functions, variable-length lists and objects with prototypal inheritance are to be implemented by composing objects, arrays and other simpler types.
Type tags will be internally represented by the VM as 4-bit integer values. Note that in some cases, a JIT compiler may be able to avoid storing those values in memory, that is, the type tags can be implicitly known by the compiler at code generation time.
Type tags will be accessible to bytecode as strings, through the has_tag <val>, <tag_str> and get_tag <val> instructions. The has_tag instruction is a dynamic type test which makes it possible to answer questions such as "is this value of type int64?" at run time.
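The two tag instructions can be pictured with a small Python model. This is only a sketch of the behavior described above, not the VM's implementation: the `Value` class is hypothetical, and the "string" tag name is an assumption (only int64 is named in this document).

```python
class Value:
    """A VM value paired with its type tag (tag names other than int64 are assumed)."""

    def __init__(self, tag, payload):
        self.tag = tag
        self.payload = payload


def get_tag(val):
    """Return the type tag of a value as a string."""
    return val.tag


def has_tag(val, tag_str):
    """Dynamic type test: does this value carry the given tag?"""
    return val.tag == tag_str


n = Value("int64", 42)
s = Value("string", "hello")  # "string" is an assumed tag name
```

Here `has_tag(n, "int64")` is true while `has_tag(s, "int64")` is false, which is the kind of test a language implementation would emit before a typed operation.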
What should happen when an operation expecting values of a certain type (e.g. add_i64 expects two int64 values) receives operands of the wrong type? In order to avoid corner cases and undefined behaviors, we will guarantee that the VM will halt program execution should a type error of this kind occur.
By halt, we mean that the VM will abort execution and report an error to the process that instantiated it. Type errors will not produce an exception that can be caught by running programs because we do not want code to rely on type errors to infer types. Correct programs should insert dynamic type checks where appropriate.
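The halting behavior described above might look like the following from the host's point of view. This is a minimal sketch under stated assumptions: values are modeled as (tag, payload) pairs, `VMHalt` and `run` are hypothetical names, the "string" tag is assumed, and 64-bit overflow is ignored.

```python
class VMHalt(BaseException):
    """Signals that the VM must stop. Guest programs have no way to catch it."""


def add_i64(a, b):
    """Add two (tag, payload) values; halt the VM unless both are int64."""
    for tag, _ in (a, b):
        if tag != "int64":
            raise VMHalt(f"add_i64: expected int64, got {tag}")
    return ("int64", a[1] + b[1])


def run(program):
    """Host-side entry point: run a guest program and report how it ended."""
    try:
        return ("ok", program())
    except VMHalt as err:
        return ("halted", str(err))


good = run(lambda: add_i64(("int64", 1), ("int64", 2)))
bad = run(lambda: add_i64(("int64", 1), ("string", "x")))
```

The guest program never sees the error. Only the host that called `run` learns that execution was aborted, so guest code cannot use type errors as a substitute for `has_tag` checks.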
My work on Higgs and basic block versioning leads me to believe that it should be possible for the VM to determine the types of operands in almost every case. This is because typed operations such as add_i64 should be guarded by dynamic type tests. Hence, I do not believe that guaranteeing program termination on type errors will cause performance issues. If there is a performance cost, it will be small.
If we do not implement closures natively in ZetaVM, language implementers have to do it themselves. That means they must do closure conversion and allocate mutable cells or store values on functions. This is actually not that hard given that people will need to do some kind of scope analysis. Probably better to keep the VM implementation as simple as possible and not implement closures at the VM level, only plain function calls. MiniLISP will exemplify a simple implementation of closures.
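To make the closure conversion idea concrete, here is a hedged Python sketch of what a language implementer could generate for a simple counter closure, using only plain functions, objects and mutable cells. All names here (`new_cell`, `counter_body`, `make_counter`, `call_closure`, and the `func`/`env`/`value` fields) are hypothetical. This is not how MiniLISP does it.

```python
def new_cell(value):
    """Allocate a mutable cell; here a one-field object stands in for it."""
    return {"value": value}


def counter_body(env):
    """Plain function: the captured variable arrives through an explicit env."""
    cell = env["count"]
    cell["value"] = cell["value"] + 1
    return cell["value"]


def make_counter():
    """Build a closure as a (function, environment) pair."""
    env = {"count": new_cell(0)}
    return {"func": counter_body, "env": env}


def call_closure(closure):
    """Calling a closure is a plain call that passes the environment along."""
    return closure["func"](closure["env"])


a = make_counter()
b = make_counter()
results = [call_closure(a), call_closure(a), call_closure(b)]
```

Each closure owns its own cell, so `a` and `b` count independently, and the VM itself only ever performs plain function calls.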
Objects will follow a model that is similar to JS, where new properties can be added dynamically, with some simplifications:
Objects in ZetaVM will not support prototypal inheritance natively. Supporting this will be left to language implementers. This only requires implementing a recursive property lookup function.
ZetaVM will not support hidden properties in objects. If you want to hide properties from language users, you can always prefix user properties by some special character. Remapping names is not difficult.
JS supports arbitrary values as property names (keys). ZetaVM will limit property names to valid ZetaVM identifiers. Supporting other property names can be done through remapping. This is to facilitate the serialization of ZetaVM objects.
Unless a very convincing use case is found, property deletion will not be supported. Property deletion seems to be rarely needed in practice and is complex to implement efficiently.
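The recursive property lookup mentioned in the first simplification above could be sketched as follows. Objects are modeled as Python dictionaries, and the `proto` field name is an assumption chosen for illustration. ZetaVM does not reserve any such name.

```python
def get_prop(obj, name):
    """Look up a property on an object, then recursively on its prototype."""
    if name in obj:
        return obj[name]
    proto = obj.get("proto")  # hypothetical field holding the parent object
    if proto is None:
        raise KeyError(name)
    return get_prop(proto, name)


animal = {"legs": 4, "sound": "generic"}
dog = {"proto": animal, "sound": "woof"}
```

Looking up `sound` on `dog` finds the object's own field, while `legs` is found one level up the chain. A language implementation would emit this lookup in place of a bare `get_field`.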
I initially thought that ZetaVM should implement fixed-length arrays only, since these are simpler to implement than dynamically growable arrays. However, since Python, JS and Lua all have dynamically growable arrays, I think the VM should support growable arrays natively. If the VM does not provide growable arrays, then many language implementations running on top of ZetaVM will end up implementing their own incompatible array/list types, which will make language interoperability difficult. Hence, the arrays that ZetaVM implements should be growable, so that they are "good enough" for most language implementations.
The ZetaVM arrays will follow a model similar to JS, with some simplifications and corner cases removed. In JS, it's possible to create an array with "holes" (non-existent elements) that have undefined values. ZetaVM won't permit this. It also won't be possible to write out of the bounds of an array. This is to avoid the case where uninitialized array elements have undefined values. We will limit array instructions in ways that force people to initialize all values contained in arrays. The main advantage of this is that ZetaVM should be able to fairly easily infer array types as arrays are grown. This will be good for performance.
Because arrays are extensible, they will have both an associated length (number of elements currently contained) and a capacity (the number of elements the array is capable of containing). It will be possible to provide a minimum capacity hint when allocating an array, but it will not be possible to query the VM to know the current capacity of an array. The reason for this is that the capacity is essentially a "hidden state", an implementation detail which we do not want to expose. When serializing arrays to text, the capacity will not be present in the textual representation.
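The array rules above (no holes, no out-of-bounds writes, hidden capacity) can be modeled in a few lines of Python. This is a sketch, not the VM's implementation: the `ZArray` class and its `push` operation are hypothetical, and raising `IndexError` merely stands in for whatever the VM does on an out-of-bounds access.

```python
class ZArray:
    """Growable array with no holes; capacity stays an internal detail."""

    def __init__(self, min_capacity=0):
        # The hint is accepted but never exposed; this toy model ignores it.
        self._elems = []

    def arr_len(self):
        """Number of elements currently contained."""
        return len(self._elems)

    def push(self, value):
        """Grow by exactly one initialized element (hypothetical operation)."""
        self._elems.append(value)

    def get_elem(self, index):
        self._check(index)
        return self._elems[index]

    def set_elem(self, index, value):
        """Overwrite an existing element; writing past the end is an error."""
        self._check(index)
        self._elems[index] = value

    def _check(self, index):
        if not 0 <= index < len(self._elems):
            raise IndexError(f"index {index} out of bounds")


arr = ZArray(min_capacity=8)
arr.push(1)
arr.push(2)
arr.set_elem(1, 5)
```

Since the only way to grow the array is to append a value, every slot below the length is always initialized, and nothing in the interface reveals the capacity.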
Below is a tentative list of bytecodes to be provided by ZetaVM:
push_i64, push_str, pop, dup
add_i64, sub_i64, mul_i64, div_i64, mod_i64
add_f64, sub_f64, mul_f64, div_f64, sqrt_f64
lsft_i64, ulsft_i64, rsft_i64, and_i64, or_i64, xor_i64, not_i64
lt_i64, gt_i64, ...
get_tag <val>, has_tag <val> <tag>
if_true <bool_val>
jump
call, return
new_obj, new_array
get_field, set_field, has_field
get_elem, set_elem, arr_len
get_char, str_len
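To give a feel for how a few of these bytecodes might execute, here is a toy Python interpreter loop. It assumes a stack-based design, which this document still lists as an open question, and it encodes instructions as Python tuples, which is purely hypothetical. Type checks, 64-bit wrapping and control flow are left out.

```python
def run(instrs):
    """Execute a list of (opcode, operand...) tuples on an operand stack."""
    stack = []
    for instr in instrs:
        op = instr[0]
        if op == "push_i64":
            stack.append(instr[1])
        elif op == "pop":
            stack.pop()
        elif op == "dup":
            stack.append(stack[-1])
        elif op in ("add_i64", "sub_i64", "mul_i64"):
            # The right operand is on top of the stack.
            b = stack.pop()
            a = stack.pop()
            if op == "add_i64":
                stack.append(a + b)
            elif op == "sub_i64":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


# Computes (3 + 4) * (3 + 4) by duplicating the sum.
program = [
    ("push_i64", 3),
    ("push_i64", 4),
    ("add_i64",),
    ("dup",),
    ("mul_i64",),
]
```

Running `run(program)` leaves a single value on the stack. A register-based design would name the operands explicitly in each instruction instead of relying on stack order.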
Integer arithmetic operations that produce results that are out of bounds will result in overflows. There will be no undefined behaviors in this regard.
The ZetaVM prototype will not offer any special mechanisms for overflow detection, but the final version will support integer arithmetic operations both with and without overflow checking. This is because efficient overflow checks are useful to implement bignums, saturation and other such language features.
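The difference between the unchecked and checked arithmetic described above can be sketched in Python, whose integers are unbounded and therefore need explicit wrapping. This sketch assumes two's complement wraparound for unchecked operations, which the text above does not spell out. The `add_i64_checked` name and its (result, overflowed) return shape are also hypothetical.

```python
MASK64 = (1 << 64) - 1
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def wrap_i64(n):
    """Reduce an unbounded Python int to a signed 64-bit value."""
    n &= MASK64
    return n - (1 << 64) if n > INT64_MAX else n


def add_i64(a, b):
    """Addition without overflow checking: the result wraps around (assumed)."""
    return wrap_i64(a + b)


def add_i64_checked(a, b):
    """Addition with overflow checking: returns (result, overflowed)."""
    exact = a + b
    return wrap_i64(exact), not INT64_MIN <= exact <= INT64_MAX
```

A bignum implementation could call the checked form and switch to a larger representation whenever the flag is set, while saturating arithmetic could clamp to `INT64_MAX` or `INT64_MIN` instead.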
There are example image files in the /tests/vm directory of this repository. Image files have a ".zim" file name extension.
Why not use pure JSON:
Why not use s-expressions, or something LISP-like?
Special definitions:
$foobar
$true and $false are special definitions
$undef is a special definition
Top-level definitions:
Comments:
Library/package dependencies:
import bytecode instruction