Main Page


Rio is a statically-typed, stack-based programming language, with a heavy focus on metaprogramming and dependency management. The language is designed to promote point-free (concatenative) style of both positional and named arguments. Names are bound and invalidated in a way that strikes a balance between safety (referential transparency/dependency elimination) and speed (minimizing copying), based on alias-tracking.

Code snippets on this page will use the following format:

name: type of positional argument, named argument: type of named argument ⇒ type of positional return, named return: type of named return

For example, the integer addition symbol is:

int-+: int, int ⇒ int

As will be discussed later, int-+ is the "mangled" form of integer addition. Rio has parametric polymorphism of a sort, so in practice one simply uses + to add any two addable things; for now, only monomorphic procedures are described.

Finally, this notation lists the stack from bottom to top: in the stack string, int, the top of the stack has type int. Arguments for a procedure that takes this stack are supplied in that same order (first a string, then an int), but the procedure consumes them in reverse order, top first. For example, consider two procedures:

int-string: int ⇒ string

string-++: string, string ⇒ string

We can compose them simply by putting one after the other, so int-string string-++ will have the form string, int ⇒ string, and we might write "Lex stole " 40i int-string string-++ " cakes" string-++.
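The composition above can be modeled outside Rio. The following is a minimal Python sketch, not Rio's implementation: it assumes the stack is a Python list whose last element is the top, and the names int_string, string_concat, and run are hypothetical stand-ins for int-string, string-++, and the evaluator.

```python
# Hypothetical model: the stack is a list, and its last element is the top.

def int_string(stack):
    """Models int-string: int => string."""
    stack.append(str(stack.pop()))

def string_concat(stack):
    """Models string-++: string, string => string."""
    top = stack.pop()      # consumed first, but it is the right-hand operand
    below = stack.pop()    # consumed second, the left-hand operand
    stack.append(below + top)

def run(program):
    """Evaluate a flat program: literals push themselves, procedures act on the stack."""
    stack = []
    for item in program:
        if callable(item):
            item(stack)
        else:
            stack.append(item)
    return stack

# Models: "Lex stole " 40i int-string string-++ " cakes" string-++
result = run(["Lex stole ", 40, int_string, string_concat, " cakes", string_concat])
print(result)  # ['Lex stole 40 cakes']
```

Arguments are written in stack order, while each procedure pops them top first, which is why string_concat has to swap its operands back before joining them.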

LambdaConf 2016

Two talks were delivered about Rio at LambdaConf 2016: "Discrete Time and Race Conditions" and "Named and Typed Homoiconicity".

Three grammar elements

Rio only has three elements in its grammar: literals, symbols, and blocks. Literals can be numbers, characters, and strings. Numeric literals must be annotated with their type; for now, the only annotation is i which is a 32-bit integer. In the long run, all standard numeric types will be supported. Characters are surrounded by single quotes, and strings are surrounded by double quotes, with C-style escape characters.

There's another type of literal called the quasiquote, or quote for short. A quote is a string that is only defined at compile-time, and will often contain the name of a symbol. A quote begins with a single quote, and may not contain any other single quote characters. There are various core symbols for working with quotes, in particular converting them to symbols for evaluation.

A symbol is some sort of language instruction, which can take on a few forms. This page describes the core symbols, which are recognized by the compiler directly. The other symbol types are procedures, which are bindings from symbols to blocks, structs, which bind symbols to scopes, and bindings, which bind symbols to values.

A block is a list of literals, symbols, and blocks that are not evaluated when first encountered. Blocks are surrounded in braces, { }. Blocks do nothing on their own, but are values that must be consumed by symbols. For example, the if symbol consumes two or more blocks to generate a conditional branch.

Stack and dictionary behavior

Whenever a value is evaluated, it is pushed on a stack. For example, 1i pushes a literal integer onto the stack, and "hello" pushes a literal string onto the stack. The bind symbol takes a name and a value, and binds that value to that name.

bind: T, quote ⇒ value of quote: T

When a binding is created from a literal, the compiler records the block level of the binding, which is simply the number of unmatched opening braces { preceding the bind symbol. Whenever a block is closed, i.e., whenever a } is encountered, all bindings created at that block level are invalidated. Additionally, whenever a binding is created from a literal, it is assigned a global binding number, which is a unique identifier for that binding.

On the other hand, whenever a binding is created from another binding, the new binding receives the same block level and global binding number as the source binding. The global binding number serves to track binding aliases.

Whenever a binding is invalidated, all bindings that share the same global binding number are also invalidated. Put more simply, whenever a binding is invalidated, it also invalidates all its aliases.
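The bookkeeping described above can be sketched in a few lines of Python. This is an illustrative model only: the class Bindings and its method names are hypothetical, and it tracks only names, block levels, and global binding numbers, not values or types.

```python
import itertools

class Bindings:
    """Hypothetical model of block levels and global binding numbers."""

    def __init__(self):
        self.level = 0                   # number of unmatched '{' seen so far
        self.table = {}                  # name -> (block level, global binding number)
        self._numbers = itertools.count(1)

    def open_block(self):
        self.level += 1

    def bind_literal(self, name):
        # A binding from a literal gets the current level and a fresh number.
        self.table[name] = (self.level, next(self._numbers))

    def bind_alias(self, name, source):
        # A binding from another binding copies its level and number.
        self.table[name] = self.table[source]

    def invalidate(self, name):
        # Invalidating a binding also invalidates every alias of it.
        _, number = self.table[name]
        for alias in [n for n, (_, g) in self.table.items() if g == number]:
            del self.table[alias]

    def close_block(self):
        # Closing a block invalidates every binding created at that level.
        doomed = [n for n, (lvl, _) in self.table.items() if lvl == self.level]
        for name in doomed:
            if name in self.table:
                self.invalidate(name)
        self.level -= 1

b = Bindings()
b.bind_literal("x")        # level 0
b.open_block()
b.bind_literal("y")        # level 1
b.bind_alias("z", "x")     # alias of x, so it keeps level 0
b.bind_alias("w", "y")     # alias of y, so it keeps level 1
b.close_block()
after_close = sorted(b.table)
b.invalidate("z")
after_invalidate = sorted(b.table)
print(after_close, after_invalidate)  # ['x', 'z'] []
```

In this model an alias created inside a block outlives the block when its source was bound outside it, because the alias inherits the source's block level.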

There are a few other symbols to help with stack and dictionary management:

drop: any ⇒ empty

del: quote, value of quote: T ⇒ empty

dup: T ⇒ T, T

Note that the dup command will perform a deep copy of the value on top of the stack. For example, if the top of the stack is a list of strings, dup will construct a new list containing new strings, so this can be a potentially expensive operation.
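As a rough analogy in Python (a sketch, not Rio's implementation), a deep-copying dup behaves like copy.deepcopy applied to the top of a list-based stack. The function name dup and the nested-list value are illustrative assumptions.

```python
import copy

def dup(stack):
    """Models dup: T => T, T, where the new value is a deep copy."""
    stack.append(copy.deepcopy(stack[-1]))

stack = [[["a"], ["b"]]]           # one value on the stack: a list of lists
dup(stack)
stack[-1][0].append("changed")     # mutate only the copy on top

original_untouched = stack[0] == [["a"], ["b"]]
print(original_untouched)  # True
```

The cost grows with the size of the whole nested value, which is the expense the note above warns about.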

Finally, binding names may be preceded by a star *. Starred names have two properties. First, they are single-use, so evaluating a starred name implicitly invalidates it. Second, starred names are never mangled, which, as will be discussed later, allows them to be used for named arguments and returns to procedures.

Symbol rules

Rio supports mutating symbols after tokenizing but before evaluation. This allows symbols to be transformed as strings, which is mainly useful for adding prefixes to symbols to give them special meaning. There are two very basic ones, with more to be added as needed:

!sym transforms to 'sym bind, so 3i !x binds the integer 3 to x

@sym transforms to '*sym bind, so 3i @x makes a starred binding of the integer 3 to x
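The two rules above amount to a token rewrite that runs between tokenizing and evaluation. Here is a minimal Python sketch of that pass; the function name apply_symbol_rules is hypothetical, and splitting on whitespace is a simplification that ignores string literals containing spaces.

```python
def apply_symbol_rules(tokens):
    """Rewrite !sym and @sym tokens into their quote-plus-bind forms."""
    out = []
    for tok in tokens:
        if tok.startswith("!") and len(tok) > 1:
            out += ["'" + tok[1:], "bind"]     # !sym  ->  'sym bind
        elif tok.startswith("@") and len(tok) > 1:
            out += ["'*" + tok[1:], "bind"]    # @sym  ->  '*sym bind
        else:
            out.append(tok)
    return out

rewritten = apply_symbol_rules("3i !x 3i @x".split())
print(rewritten)  # ['3i', "'x", 'bind', '3i', "'*x", 'bind']
```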


Actual implementation of statically-typed polymorphism is typically based on name mangling. In this system, a polymorphic function has multiple variations, each of which is monomorphic. The various monomorphic functions have mangled names (i.e., names prefixed or suffixed to make them unique), which in some way indicate the types that version of the function accepts.

Standard mangled names in Rio are prefixed with the name of the type of the argument at the top of the stack; for example, int-+ is the version of + that expects an int at the top of the stack. The actual + procedure then looks at the type at the top of the stack, call it T, and emits the symbol T-+. To simplify this, we provide the following symbol:

poly: quote ⇒ value of quote: procedure that mangles names

For example, writing '+ poly will create a + procedure that simply mangles names according to the above rules.
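The dispatch rule can be modeled in Python as a lookup keyed on the type of the top of the stack. This is a sketch under stated assumptions: poly, type_name, and the procedure table are hypothetical, and where Rio emits the symbol T-+ at compile time, this model calls the mangled procedure at run time.

```python
def type_name(value):
    # Assumed mapping from Python values to Rio type names, for illustration.
    return {int: "int", str: "string"}[type(value)]

def poly(name, table):
    """Register a procedure under name that forwards to T-name for the top type T."""
    def dispatch(stack):
        mangled = f"{type_name(stack[-1])}-{name}"
        table[mangled](stack)
    table[name] = dispatch

def int_add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def string_concat(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

table = {"int-+": int_add, "string-++": string_concat}
poly("+", table)     # models: '+ poly
poly("++", table)    # models: '++ poly

ints = [2, 3]
table["+"](ints)
strings = ["a", "b"]
table["++"](strings)
print(ints, strings)  # [5] ['ab']
```

Adding a new instance is then just a matter of registering another entry whose name follows the mangling rule.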

There are then two primary ways of "instancing" a polymorphic procedure. First, the mangling rules are simple, so creating a procedure with the correct name will just work. Second, macros may be used to create mangled procedures en masse; in particular, macros that create types should also provide some mangled procedures for using them.

Primitive operations

The primitive operations follow the polymorphism rules established above, and are straightforward. Numeric operations are +, -, *, / (floating-point division), div, rem, and div-rem. Integer division is defined such that if either the dividend or the divisor (but not both) is negative, then the quotient is negative and the remainder is positive. More standard operations include =, /= (not equals), and, or, not, and the usual four comparison operators.
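One reading of the integer division rule is Euclidean division, in which the remainder is never negative. The Python sketch below follows that reading. It is an interpretation rather than Rio's confirmed definition: the name div_rem is borrowed from the symbol list, and the behavior when both operands are negative is an assumption, since the text does not specify it.

```python
def div_rem(a, b):
    """Return (quotient, remainder) with a == quotient * b + remainder
    and 0 <= remainder < abs(b). Assumes b != 0."""
    r = a % abs(b)       # with a positive modulus, Python's % is never negative
    q = (a - r) // b     # exact, because a - r is a multiple of b
    return q, r

print(div_rem(7, 2))     # (3, 1)
print(div_rem(-7, 2))    # (-4, 1)
print(div_rem(7, -2))    # (-3, 1)
print(div_rem(-7, -2))   # (4, 1), an assumption: this case is not stated in the text
```

Neither truncating nor flooring division satisfies the stated rule for both mixed-sign cases, which is why the sketch uses the Euclidean form.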


Rio lacks generics/type constructors. Instead, macros can be used to create new types, and the most fundamental such type is lists.

list: container name, element name ⇒ container name: element-list constructor

Example: 'int-list 'int list creates a type of lists of ints

The list macro also introduces the following mangled procedures:

T-length: T-list ⇒ int

T-capacity: T-list ⇒ int

T-at: T-list, *i: int ⇒ T

T-resize: T-list, *i: int ⇒ T-list

T-set: T-list, *i: int, *e: T ⇒ T-list

Example: int-list 1i @i int-list-resize 0i @i 5i @e int-list-set 0i @i int-list-at ends with the int 5 on the stack

The resize and set procedures invalidate their list-typed argument.

Upcasting and downcasting

sigma: any ⇒ sigma

Example: "hello" sigma

A value of any type can be passed to sigma, which upcasts it to a generic form and wraps it along with enough type information to safely downcast it later. The language itself only provides:

__unsafe-downcast: sigma, *e: quote ⇒ value of e

sigma-inner-type: sigma ⇒ int

__type-to-id: quote ⇒ int

Using these three things, safe downcasting can be implemented via metaprogramming. The standard library provides:

match: sigma, (quote, {value of preceding type ⇒ S})+, {sigma ⇒ S} ⇒ S

Example: "hello" sigma 'string { } 'int { int-string } { drop "unknown type" } match, ends with "hello" on the stack

Example: 5i sigma 'string { } 'int { int-string } { drop "unknown type" } match, ends with "5" on the stack

The second argument is one or more pairs of quotations and bodies. Each quotation is the name of a candidate type, and if the type contained in the sigma is listed, then the sigma is downcast to that type and passed to the associated body. The last block is given the uncast sigma, and represents an else/default clause. This is a macro around if, so look at the next section for more information.
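The behavior of match can be approximated in Python by tagging each value with its type name. This is a hedged model, not the standard library's implementation: Sigma, TYPE_NAMES, and the float entry are illustrative assumptions, and Python callables stand in for Rio blocks.

```python
from collections import namedtuple

# Hypothetical stand-in for a sigma: the wrapped value plus its type name.
Sigma = namedtuple("Sigma", ["type_name", "value"])

# Assumed mapping from Python types to type names, for illustration only.
TYPE_NAMES = {int: "int", str: "string", float: "float"}

def sigma(value):
    """Upcast: wrap a value with enough type information to downcast later."""
    return Sigma(TYPE_NAMES[type(value)], value)

def match(wrapped, cases, default):
    """Try each (type name, body) pair in order; fall back to default."""
    for type_name, body in cases:
        if wrapped.type_name == type_name:
            return body(wrapped.value)    # body receives the downcast value
    return default(wrapped)               # default receives the uncast sigma

def identity(s):
    return s

def unknown(wrapped):
    return "unknown type"

cases = [("string", identity), ("int", str)]

print(match(sigma("hello"), cases, unknown))  # hello
print(match(sigma(5), cases, unknown))        # 5
print(match(sigma(2.5), cases, unknown))      # unknown type
```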


In general, structuring data is not necessary in Rio. The main exception is when one wishes to upcast structured data, for which the following mechanism is provided:

struct: quote, {empty ⇒ any combination of quotes} ⇒ value of quote: block

Example: 'ivec2 { 'int !a 'int !b } struct

The name given to struct becomes a constructor, which requires the scope to be a superset of the scope the struct is bound to. Invoking the constructor takes the indicated values out of scope, and returns a sigma wrapping the struct type. Whenever a sigma containing a struct is matched (as in the match macro), the matched body is given the scope implied by the struct.

Example: 1i !a 2i !b ivec2 'ivec2 { a b + } { -1i } match

Control flow

There are two control flow symbols.

if: ({empty ⇒ bool}, {empty ⇒ S})+, {empty ⇒ S}? ⇒ S

Example: "hello " { false } { "world" } { true } { "place" } { "thing" } if ++, results in "hello place" on the stack

The if symbol takes any number of blocks. If the number of blocks is odd, then the last block is an else-style construct. Pairs of blocks are treated, from left to right, as successive if- or else-if-style constructs. The first block in each pair is a condition; if the condition is true, the code in the second block is evaluated and no other blocks are checked or evaluated. Otherwise, the next pair of blocks is checked. If no pairs are left and an else block was provided, the else block is evaluated.

All code blocks must result in identical stacks and symbol tables; if an else-style block is not provided, then all code blocks must result in stacks and symbol tables identical to the stack and symbol table prior to evaluating the if.
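The pairing rule for if can be sketched in Python with zero-argument callables standing in for blocks. The function name rio_if is hypothetical, and the sketch models only which block runs, not the stack and symbol table checks.

```python
def rio_if(blocks):
    """blocks: condition/body pairs, optionally followed by one else block."""
    pairs, else_block = blocks, None
    if len(blocks) % 2 == 1:                 # an odd count means a trailing else
        pairs, else_block = blocks[:-1], blocks[-1]
    for i in range(0, len(pairs), 2):
        condition, body = pairs[i], pairs[i + 1]
        if condition():
            return body()                    # first true condition wins
    return else_block() if else_block is not None else None

# Models: "hello " { false } { "world" } { true } { "place" } { "thing" } if ++
greeting = "hello " + rio_if([
    lambda: False, lambda: "world",
    lambda: True,  lambda: "place",
    lambda: "thing",
])
print(greeting)  # hello place
```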

while: {S ⇒ bool}, {S ⇒ S} ⇒ S

As long as the first block evaluates to true, the second block is evaluated. A standard simple while will pass information along the stack; for example, if looping over a list, then the index under consideration can be passed around on the stack. If more complicated behavior is required, starred names can be used to propagate state.


procedure: quote, {any ⇒ any} ⇒ value of quote: block

Example: 'inc { 1i + } procedure

The procedure symbol binds a block to a name. Then, whenever the name is evaluated, the associated block will be flattened and evaluated. Put another way, the elements of the procedure's block will be inserted inline at the call point. procedure must be evaluated at top level.
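Inlining at the call point can be illustrated with a small Python sketch that expands procedure names in a flat token list. The function flatten and the procedures dictionary are hypothetical, and the model assumes no procedure refers to itself, directly or indirectly.

```python
def flatten(program, procedures):
    """Replace each procedure name with the elements of its block, recursively."""
    out = []
    for sym in program:
        if isinstance(sym, str) and sym in procedures:
            out.extend(flatten(procedures[sym], procedures))
        else:
            out.append(sym)
    return out

# Models: 'inc { 1i + } procedure
procedures = {"inc": ["1i", "+"]}

expanded = flatten(["5i", "inc", "inc"], procedures)
print(expanded)  # ['5i', '1i', '+', '1i', '+']
```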

Stateful read

read: quote ⇒ depends

Example: "hello.txt" @filename 'file read, ends with the (string) contents of hello.txt on the stack

Stateful reading is always allowed; the read procedure takes a quote indicating a read target. Right now the only supported value is file, and there must be a string named filename that indicates the file to be read. The contents of the file are then pushed on the stack.

Program entrypoint

This section is preliminary; in the long run, the program entrypoint will be a major focus. For now, it is meant to be powerful enough to build a self-hosting compiler, and little more.

finalize: {empty ⇒ S}, *write: quote ⇒ empty

Example: 'console @write { "hello world!" } finalize is the classic program

The important thing here is the write argument, which indicates the write target of the program. For now, this can take on the values 'console and 'file. In the latter case, the entrypoint body must bind a string to the name filename. In both cases, the body must end with either an int or string on top of the stack, which is printed to the console or written to the file. finalize must be evaluated at top level.


There are four fundamental operations for metaprogramming Rio, but each of them is fairly complicated.

splice: {empty ⇒ S} ⇒ S

Rio internally uses a symbol stack to store symbols awaiting evaluation. When splice is called, the symbol stack is pushed onto a stack of inactive symbol stacks. The argument to splice is then set as the symbol stack. As long as there is at least one inactive symbol stack, the compiler is in meta-mode. If the symbol stack is empty, a symbol stack is popped off the stack of inactive symbol stacks and becomes the symbol stack. If both stacks are empty, the compiler has finished.

While in meta-mode: literals are evaluated to compile-time values, as in, their values are tracked by the compiler but are not used to generate code; and if and while are evaluated immediately by the compiler. This means that condition blocks consume a compile-time bool, and branching and looping is done by the compiler.
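The interplay between the symbol stack and the stack of inactive symbol stacks can be traced with a small Python model. It is a simplification under stated assumptions: the function run is hypothetical, a splice is written as a ("splice", block) tuple rather than consuming a block from the value stack, and symbols are only recorded together with whether they were seen in meta-mode.

```python
def run(program):
    """Trace each symbol with a flag telling whether it was seen in meta-mode."""
    active = list(reversed(program))   # next symbol to evaluate is at the end
    inactive = []                      # stack of inactive symbol stacks
    trace = []
    while True:
        if not active:
            if not inactive:
                break                  # both stacks are empty: finished
            active = inactive.pop()    # resume the most recently suspended stack
            continue
        sym = active.pop()
        if isinstance(sym, tuple) and sym[0] == "splice":
            inactive.append(active)                # suspend the current stack
            active = list(reversed(sym[1]))        # the block becomes the symbol stack
        else:
            in_meta_mode = bool(inactive)
            trace.append((sym, in_meta_mode))
    return trace

trace = run(["a", ("splice", ["b", "c"]), "d"])
print(trace)  # [('a', False), ('b', True), ('c', True), ('d', False)]
```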

fuse: compile-time value, compile-time value ⇒ compile-time value

One of the arguments to fuse must be a block. If both arguments are blocks, they are concatenated. If the right argument is a block, then the left argument is prepended to it; likewise, if the left argument is a block, the right argument is appended to it.
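The three cases of fuse map directly onto list operations. In this Python sketch, blocks are modeled as lists and any other value as a non-list; the function name mirrors the symbol, but raising TypeError when neither argument is a block is an assumption about how the error would be reported.

```python
def fuse(left, right):
    """Concatenate two blocks, or attach a single value to a block."""
    left_is_block = isinstance(left, list)
    right_is_block = isinstance(right, list)
    if left_is_block and right_is_block:
        return left + right          # both blocks: concatenate
    if right_is_block:
        return [left] + right        # prepend the left value
    if left_is_block:
        return left + [right]        # append the right value
    raise TypeError("fuse requires at least one block argument")

print(fuse(["a"], ["b"]))   # ['a', 'b']
print(fuse("a", ["b"]))     # ['a', 'b']
print(fuse(["a"], "b"))     # ['a', 'b']
```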

push-symbol: quote ⇒ empty

The symbol at the top of the stack is pushed onto the top inactive symbol stack. So this symbol will be evaluated once the current splice finishes, and will possibly not be evaluated in meta-mode.

push-quote: quote ⇒ empty

The quote at the top of the stack is pushed onto the top inactive symbol stack, but as a quotation.