Keti Programming Language by Ori Roth

Background

This is a minimalistic virtualization of a classic Lisp Machine, written in C. This implementation is very dynamic, easy to learn and fun! There is no use of pointers/allocations in the code (C accessories), only logic - the system is independent, and deals its memory by itself (including memory sweep). I am a Software Engineering student at the Technion, Israel, and this project was done at the summer of 2015.

Micro Lisp Style

Language values are:

~ Signed Integer: such as 1764. (Atomic value)

~ Strings: such as "Monty" (Atomic value).

~ True: t is true. (Atomic value)

~ Nil: () is nil (a list of 0 values). (Atomic value)

~ S-Expression: definde recursively- atomic values are S-Expressions. if x1 ... xn are S-Expression, so does (x1 ... xn).

Note:

- In this version you cannot type negative numbers, so -123 would be regarded as a symbol.

- "" would be regarded as nil.

- As you will see later on, we can create S-Expression (also called a list), using the "cons" native function (see also "quote").

Supported native functions are:

native function	number of arguments	what it does
quote	1	Return the argument as it is, without interpretation. Could be used in many ways, such as making lists [(quote (1 2 3)) => (1 (2 (3 nil)))] or making symbols [(quote x) => ( ("x" nil))].
define	2	Parameters are a symbol and a value. Makes the symbol a new variable, with the received value (see below).
lambda	2	Parameters are a list of parameters and a value. Return a new function, with the received parameters list and value (see below).
if	3	Resolve the first argument. If it is true, return the second argument, else the third (if-else structure).
atom	1	Returns true if the argument is not cons, else false.
cons	2	Creates a new cons, of the two received values.
car , cdr	1	Receives a cons, and return its car/cdr.
+ , - , * , / , = , > , <	2	Exactly what you think it does.
nil	1	Receives a value. Return true if it is nil, else false.

Most expressions are in the form of prefix notation (also called Polish notation) generally (function_name { argument}) - in parentheses, and the function comes before the arguments (if any, See more). Moreover, there are special structures which require specific identification: lambda functions (functions created by you) and symbols (system structure, used to connect a name to a value).

Some simple examples:

input	result
(+ 1 1)	2
(> 5 2)	t
(car (cons "car" "cdr"))	"car"
(lambda (x) (* 2 x))	{function that receives a number, return the number multiplied by two}
(define x 3)	3 {and now x is 3}

What is in the files

* mainloop: contains the main function, which has the main game loop that waits for input, translate and process it and show the result (or error) on screen.

* system: contains the main translation functions: resolve_expr (which receive a value and resolve it recursively), and lookup (which receive a symbol and return its value).

* memory: deals with memory management. Supports allocation of cons, and the mark-sweep algorithm.

* bitarray: supplies us with bit array, makros supporters.

* native: contains the implementation of the native functions.

* error: contains the errors logic.

Structure

The language implementation is built from three main parts: memory management (memory.c, memory.h), expressions execution (system.c, system.h) and the interpreter (mainloop.c). We will now cover the techniques used in each level of the language.

Memory Management

The memory system contains two pools (large arrays of available space): pool of cons (32 bits) - cons_pool, and pool of characters - strings_pool. Alongside the pools we hold a free addresses stack for the cons pool (every time we allocate a cons, we pop an address from the stack and returns it), and a filler for the characters pool (integer pointing the first free cell- during allocation we forward the filler by number of cells equals the length of the allocated string). Shallow copy of strings is supported- instead of pointing the string in the pool, string values (in cons pool) point string handlers (special structure on the cons pool, takes whole cons 32 bits), which are special cons that contains the actual address of the string. Instead of allocating a new string when copying one, we just redirect it to the same handler. Placing the handlers on the cons pool is not mandatory, but it saves a lot of space saved when cancelling the handlers pool.

When one of the pools is being filled, it is very possible that there are stale allocations we have done sometime in the past, which are no longer required. So instead of crashing the system, we preform Garbage Collection. Now we will describe the specific algorithm used in our memory system: First we define a root set- group of cons addresses that are valid and required (such as the current environment, see below). We mark every valid address- notice that the marking process is recursive, marking addresses from the root set and addresses pointed by them. Then we scan the cons pool, pushing free (or used but invalid) addresses to the free addresses stack. Additionally, if an address is mark and its car is handler identifier (special value used only for this purpose), then we push the address to another stack- handlers_sort_buffer. This buffer allows us to clean the strings pool- we sort the handler addresses by string address (at the strings pool), and then moving the strings one by one to the start of the pool (so we can move the strings filler to the left, means more free characters cells to its right). In conclusion, the first part of the algorithm is classic mark and sweep, but dealing with the strings required us using something called defragmentation.

System Management and Translation

The system uses a special cons structure called A-List. This is a simple cons list, represent the current environment nesting (the head of the list is the current environment). Each environment contains translation list- list of symbols (names and their values). The initial symbols stands for the native function. When we use define, we connect a name with a value- there are two options:

a) The symbol is not defined: the symbol is inserted to the current environment.

b) The symbol is already defined, in some environment: the symbol's value is replaced (it does not change place).

When we use a user function, a new environment is created (its next environment is the previous one), while the arguments sent to the function are being computed and inserted to it (as symbols- they are being linked to the function's parameters in order to create the symbols). Next the function value is computed in the new environment, and finally we destroy the environment (by redirecting the current environment index to the next environment).

The system structure- a list (environment nodes) of lists (symbols nodes). This specific state of the system can occur after calling a user function, which uses x as a parameter, with 7 as the argument [for instance, (factorial 7) when factorial is defined by (define factorial (lambda (x) (...)))].

You can look at this structure as a stack of environments, while every environment is a list of symbols (same as above!).

More about programming languages

As you may have seen, many aspects of programming languages can be observed in this Micro Lisp implementation:

Interpreter Implementation - a program is computed one row at a time. In this implementation, the interpreter turns a code line to a translatable object (sitting in the cons pool), ready to be evaluated.

Dynamic Typing - type mismatch means run time failure.

Strong Typing - no implicit cast (actually no casts at all...).

Disjoint Union - a cons is compound of car and cdr: each of them can be a pointer to another cons, or an atomic value (See more).

Garbage Collection - explained quite well before (under Memory Management).

Unit Type - (), nil, is a list of zero values.

Dynamic Scoping - an identifier refers to the identifier associated with the most recent environment. For example:

C style

int x = 42;

int f() {return x;}

void main() {int x = 1; printf("%d\n", f());} // => 42

Lisp style

(define second (lambda (x y) y)) ; helps us do two operations

(define x 42)

(define f (lambda () x))

(define main (lambda () (second (define x 1) (f))))

(main) ; => 1

Examples

First Steps in Lisp - you can experience the various operations, available in this implementation. Pay attention to the prefix notation, and to the way we define variables (remember functions are also values! "functions" as you know them are simply variables containing function values).

Functions - pay attention to the way we use conditioning (if native). We can make lists using "quote": (1 2 3) this will not run properly, as the system will try to use "1" as a function and fail. (quote (1 2 3)) will work, and will generate the list (1 (2 (3 ()))) ("()" is nil value, also """"). The interpreter translate the phrase "(1 2 3)" to this list, and quote returns it without evaluation.

Advanced Language Capabilities - remember the functions and symbols structure (see above). We can extract the lambda and symbol identifiers, using car operation on function and symbol values ("(quote x)" returns symbol, because quote does not evaluate its parameter). "eval" function (from the program) receives a symbol and return its evaluation: we use manipulations to activate the lookup procedure. "design" gets a list of symbols and a value, and translate recursively each symbol that appears in the parameter, unless its on the list. Pay attention how we use design in order to build "constant" - a function that gets a function of two parameters and a value, and returns a function of one parameter that compute the received function with the received value and its parameter. For example, (constant + 8) returns a function that receives a parameter, and add 8 to it. This example shows the great power of this language - although it is a minimized implementation, you are not limited at all and can make very sophisticated procedures (I have made a functions that creates polynomials. You can try by yourselves!).

Exercise

Self Practice:

a) What are the similarities and differences between these two problems?

- Write a function that receives a list of numbers, and multiply each number by 2 (called double).

- Write a function that receives a list of numbers, and returns its members multiplication (called multilist).

Many languages (such as Python) supports "map" and "reduce" functions. Map receives a function and a list, and applies the function to every item of the list (and returns a list of the results). Reduce also receives a function (of two parameters) and a list, and reduce the list into one value (you may search for examples on the net).

b) Implement map and reduce functions in this file.

c) In the same file, implement double and multilist functions (described above), using map and reduce you have wrote. You can add other functions to the program, if you want.

Challenges

Equals - the "=" native compares cons by address. Override it with an implementation that compares cons recursively (dont forget to save the native "=" in some variable! we still want to use it).

Most - define min and max functions using a generic function, most.

Want more? Contact me via email.

Object Oriented?

The idea behind Keti language was to create an object oriented extension of Micro Lisp, and objects were actualy implemented in the previous versions (not in the current, I'm afraid). It could be done by adding two natives:

- assign: receives a symbol, a value and some item. If the item is not an object, it turns it into one while adding the value as a field with the received symbol. Otherwise adds the field to the object.

- supps: receive a symbol and an item. Returns true if the item (which can be cons/atom/object) supports a function with that symbol. For objects, it may happen when the symbol one of it fields, or otherwise when the symbol indicates a global function or a native.

When calling a function, we should check if the first argument is an object. If so, we check if the function's symbol appears in its fields list- if it does, we call it instead.

How we build objects? Objects should be identified using an identifier similar to those of lambda expressions and symbols. It should contain a list of its fields, while the original value (we turn into object) rests under "core" field (and it should be accessed when using some natives).

Use example:

(define x (assign r 5 6)) ; Now x is an object, that contains 6 under "core" field, and 5 under "r" field.

(r x) ; => 5.

(assign + (lambda (x y) (+ (+ (r x) (core x)) y)) x) ; we overrided + operation for x.

(+ x 2) ; => 13.

(+ 2 x) ; => 8.

For Google