This post delves into building an interpreter for a Lisp-like language using Rust. No knowledge beyond Rust basics is required to follow this post.
Inspiration and Project Overview
Inspired by Stepan Parunashvili’s article Risp (in (Rust) (Lisp)), I created lip, an interpreted language designed for logical operations with a Lisp-like syntax. It supports logical operations (not, and, or), branching (the if expression), lambda functions, and variable definition.
This post guides you through the process of building an interpreter, focusing on the core functionalities of tokenizing, parsing, and evaluating expressions. While it may sound complex, the process only requires basic Rust knowledge. The language we’ll build is lp, a distilled version of lip designed for illustration. The live demo of an lp interpreter is accessible via a browser, and the complete code is available on GitHub.
Here are some examples of lp code:
Literal (T = true, F = false)
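For instance (the => annotations just show the evaluated result; they are not part of lp):

```
T   => true
F   => false
```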
Logical operations (not, and, or)
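Here & stands for and, | for or, and ! is assumed to be the spelling of not:

```
(! T)           => false
(& T T F)       => false
(| F F T)       => true
(& T (| F T))   => true
```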
If expression
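Assuming the form (if condition then-clause else-clause), which matches how if is parsed later in the post:

```
(if T F T)         => false
(if (& T T) T F)   => true
```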
The logical operations may look weird, especially if you’re not familiar with Lisp. Lisp uses prefix notation, where the operator comes first, followed by as many operands as you want.
The original Lisp has a number type: (+ 1 2 3 4) means 1 + 2 + 3 + 4, and interestingly, (+) is 0. The same thing applies to lp: (& T T F F) means (true & true & false & false), and (&) is true.
Implementation Steps
To understand how lp code is evaluated, let’s break down the process into three steps: tokenize, parse, and evaluate. We’ll use the expression (& T (| F T)) as an example.
Imagine the lp code as a sentence. Tokenization is like splitting the sentence into individual words and punctuation marks. In our example, (& T (| F T)) is tokenized into an array like [left paren, and, true, left paren, or, ...].
Once we have the tokens, we need to understand their structure and meaning. Parsing involves arranging the tokens in a way that reflects their relationships. This is similar to how we understand the grammar of a sentence.
For our example, parsing creates an abstract syntax tree (AST), which is a tree-like representation of the expression. The AST for (& T (| F T)) looks like this:
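```
      &
     / \
    T   |
       / \
      F   T
```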
The last task is evaluation. Here, the AST is used to compute the final value of the expression. We traverse the AST from the leaves to the root: the inner (| F T) evaluates to true, so the whole expression reduces to (& T true), which is true.
Now that we have grasped the overview of the implementation steps, let’s get started with tokenize.
Tokenize
Let’s commence by defining valid tokens:
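A minimal sketch of what the token type could look like; the exact names in lip may differ:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Token {
    LParen, // (
    RParen, // )
    And,    // &
    Or,     // |
    Not,    // not (assumed to be spelled !)
    True,   // T
    False,  // F
}
```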
Write a unit test to clarify the goal:
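A test along these lines, assuming tokenize returns a Result so that unknown input can be reported as an error:

```rust
#[test]
fn test_tokenize() {
    assert_eq!(
        tokenize("(& T (| F T))"),
        Ok(vec![
            Token::LParen,
            Token::And,
            Token::True,
            Token::LParen,
            Token::Or,
            Token::False,
            Token::True,
            Token::RParen,
            Token::RParen,
        ])
    );
}
```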
Tokenizing lp is simple because it is almost tokenized from the start! Each token is separated by whitespace except for ( and ). To achieve tokenization, add whitespace around the parentheses and split the expression:
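A sketch of tokenize along those lines, using a plain String as the error type to keep the example small:

```rust
fn tokenize(code: &str) -> Result<Vec<Token>, String> {
    code.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(|word| match word {
            "(" => Ok(Token::LParen),
            ")" => Ok(Token::RParen),
            "&" => Ok(Token::And),
            "|" => Ok(Token::Or),
            "!" => Ok(Token::Not),
            "T" => Ok(Token::True),
            "F" => Ok(Token::False),
            other => Err(format!("unexpected token: {}", other)),
        })
        .collect()
}
```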
That’s it. By the way, the implementation above shows an effective use of collect(). It transforms an iterator of Result<T, U> into Result<Vec<T>, U>, building the Vec<T> by accumulating the Ok values. If the iterator contains an Err(U), the overall result is that Err(U).
Parse
Before diving into parsing expressions, let’s establish what constitutes an expression in lp. Expressions in lp can be primitives (T and F) or logical operations:
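One plausible shape for these types; lip’s actual definition may differ (for example, separate variants for T and F instead of a single Bool):

```rust
#[derive(Debug, PartialEq, Clone)]
enum Operator {
    And,
    Or,
    Not,
}

#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Bool(bool),                // the primitives T and F
    Call(Operator, Vec<Expr>), // e.g. (& T F) or (| F T)
}
```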
Again, our expectations are clarified by the following test:
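A sketch of such a test, including hypothetical helper definitions (the helpers below are illustrative; the originals may differ):

```rust
// Hypothetical helpers that keep the expected value readable.
const T: Expr = Expr::Bool(true);
const F: Expr = Expr::Bool(false);

fn and(operands: &[Expr]) -> Expr {
    Expr::Call(Operator::And, operands.to_vec())
}

fn or(operands: &[Expr]) -> Expr {
    Expr::Call(Operator::Or, operands.to_vec())
}

#[test]
fn test_parse() {
    let tokens = tokenize("(& T (| F T))").unwrap();
    assert_eq!(parse(&tokens), Ok(and(&[T, or(&[F, T])])));
}
```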
The helper functions and constants (and, or, T, F) simplify the test code. For instance, and(&[T, or(&[F, T])]) looks concise compared to Expr::Call(Operator::And, vec![T, Expr::Call(Operator::Or, vec![F, T])]).
In implementing parse, the crucial insight is that an lp expression has a recursive structure, where Expr::Call contains Expr. This naturally leads us to consider a recursive function for parsing:
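Here is a sketch of that recursive function. It returns the parsed expression together with the position of the next unread token; the names and error handling are illustrative rather than lip’s exact code:

```rust
fn parse_internal(tokens: &[Token], pos: usize) -> Result<(Expr, usize), String> {
    match tokens.get(pos) {
        // Primitives are a single token.
        Some(Token::True) => Ok((Expr::Bool(true), pos + 1)),
        Some(Token::False) => Ok((Expr::Bool(false), pos + 1)),
        // A call starts with `(`, followed by an operator and its operands.
        Some(Token::LParen) => {
            let operator = match tokens.get(pos + 1) {
                Some(Token::And) => Operator::And,
                Some(Token::Or) => Operator::Or,
                Some(Token::Not) => Operator::Not,
                _ => return Err("expected an operator after '('".to_string()),
            };
            // Parse operands one by one until the matching `)`.
            let mut operands = Vec::new();
            let mut p = pos + 2;
            while tokens.get(p) != Some(&Token::RParen) {
                if p >= tokens.len() {
                    return Err("missing ')'".to_string());
                }
                let (operand, next) = parse_internal(tokens, p)?;
                operands.push(operand);
                p = next;
            }
            // Skip the `)` itself.
            Ok((Expr::Call(operator, operands), p + 1))
        }
        _ => Err(format!("unexpected token at position {}", pos)),
    }
}
```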
The initial parts of the function handle non-special cases. If the expression doesn’t start with (, it must be T or F. If the first token is (, the subsequent token is expected to be an operator.
The most intriguing aspect lies in parsing operands: the function parses expressions one after another while keeping track of the current position.
The first call to parse starts from the head, encountering the ( and & tokens:
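```
tokens:    (   &   T   (   |   F   T   )   )
position:  0   1   2   3   4   5   6   7   8
           ^   ^
           the first call reads ( and &, then parses the operands
```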
The second parse is called recursively and starts from tokens[2], processes the single token T, and returns:
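```
tokens:    (   &   T   (   |   F   T   )   )
position:  0   1   2   3   4   5   6   7   8
                   ^
                   the second call reads T and returns position 3
```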
The third recursive call commences from tokens[3]. This time, it marks the start of an or expression, initiating the operand-parsing process:
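```
tokens:    (   &   T   (   |   F   T   )   )
position:  0   1   2   3   4   5   6   7   8
                       ^
                       the third call starts here and parses (| F T)
```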
After parsing or, the position p points to the last token, which should be ):
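```
tokens:    (   &   T   (   |   F   T   )   )
position:  0   1   2   3   4   5   6   7   8
                                           ^
                                           p points at the final )
```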
Finally, we encapsulate the function and expose only the necessary result:
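One way to write the wrapper: it hides the position bookkeeping and rejects leftover tokens:

```rust
fn parse(tokens: &[Token]) -> Result<Expr, String> {
    let (expr, pos) = parse_internal(tokens, 0)?;
    if pos != tokens.len() {
        return Err("unexpected tokens after the expression".to_string());
    }
    Ok(expr)
}
```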
Evaluate
After conquering the challenging part of parse, the remainder is surprisingly straightforward! Before delving into the implementation, let’s get started with a test:
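A test sketch. It assumes a small Value type for evaluation results; the actual definition in lip may differ:

```rust
// The result of evaluating an expression.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
}

#[test]
fn test_eval() {
    let tokens = tokenize("(& T (| F T))").unwrap();
    let expr = parse(&tokens).unwrap();
    assert_eq!(eval(&expr), Value::Bool(true));
}
```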
Given the recursive structure of the AST, it’s only natural that the evaluation function eval mirrors this recursive design:
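A sketch of eval under those assumptions. The empty-operand cases mirror the earlier note that (&) is true (and, by the same logic, (|) is false):

```rust
fn eval(expr: &Expr) -> Value {
    match expr {
        Expr::Bool(b) => Value::Bool(*b),
        Expr::Call(operator, operands) => {
            // Recursively evaluate every operand down to a plain bool.
            let values: Vec<bool> = operands
                .iter()
                .map(|operand| match eval(operand) {
                    Value::Bool(b) => b,
                })
                .collect();
            let result = match operator {
                Operator::And => values.iter().all(|&b| b), // (&) with no operands is true
                Operator::Or => values.iter().any(|&b| b),  // (|) with no operands is false
                // Assumes not takes a single operand; real code may validate arity.
                Operator::Not => !values.first().copied().unwrap_or(false),
            };
            Value::Bool(result)
        }
    }
}
```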
Now, the lp code can be evaluated. To wrap things up, let’s create a REPL, an interactive environment.
REPL (Read-Evaluate-Print Loop)
Many interpreted languages boast a REPL. For instance, running python launches its REPL, Ruby has irb, and even Rust offers one, evcxr. Why not have one for lp?
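A minimal REPL sketch that chains the three steps; the actual lip REPL may differ:

```rust
use std::io::{self, BufRead, Write};

fn main() {
    let stdin = io::stdin();
    loop {
        print!("> ");
        io::stdout().flush().unwrap();

        let mut line = String::new();
        // An empty read means EOF (e.g. Ctrl-D), so the loop ends.
        if stdin.lock().read_line(&mut line).unwrap() == 0 {
            break;
        }
        let line = line.trim();
        if line.is_empty() {
            continue;
        }

        match tokenize(line).and_then(|tokens| parse(&tokens)) {
            Ok(expr) => println!("{:?}", eval(&expr)),
            Err(message) => println!("error: {}", message),
        }
    }
}
```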
Let’s engage with it:
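A possible session with the sketch above (the output is the Debug representation of Value, hence the Bool(...) wrapper):

```
> (& T (| F T))
Bool(true)
> (! T)
Bool(false)
> (&)
Bool(true)
```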
Implementing std::fmt::Display for Value is a good way to provide a more user-friendly output. Feel free to do so.
If Expression
While lp can deal with complex expressions, it still lacks many features found in ordinary languages, such as if, for, or function definition.
Fortunately, incorporating some additional features is not overly complicated. Take if as an example. The expected result is like the following:
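Something along these lines in the REPL, assuming the (if condition then-clause else-clause) form:

```
> (if T F T)
Bool(false)
> (if (& T T) T F)
Bool(true)
```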
Let’s walk through the process from tokenize to eval. First, a new keyword must be added to Token:
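A sketch of the addition, continuing the Token enum from earlier; tokenize also needs a matching arm for the new keyword:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Token {
    LParen,
    RParen,
    And,
    Or,
    Not,
    True,
    False,
    If, // new: the `if` keyword
}

// In tokenize(), one more arm:
//     "if" => Ok(Token::If),
```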
parse also needs modification:
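A sketch of the changes, continuing the earlier naming: Expr gains a variant holding the three clauses, and parse_internal dispatches to parse_if when the token after ( is if:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Bool(bool),
    Call(Operator, Vec<Expr>),
    // condition, then-clause, else-clause
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

// Inside parse_internal's Token::LParen branch, before expecting an operator:
//     if tokens.get(pos + 1) == Some(&Token::If) {
//         return parse_if(tokens, pos + 2);
//     }
```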
The implementation of parse_if is straightforward as it leverages existing code. It calls parse_internal three times to parse the condition, then-clause, and else-clause.
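Here is a sketch along those lines:

```rust
fn parse_if(tokens: &[Token], pos: usize) -> Result<(Expr, usize), String> {
    let (condition, pos) = parse_internal(tokens, pos)?;
    let (then, pos) = parse_internal(tokens, pos)?;
    let (r#else, pos) = parse_internal(tokens, pos)?;
    // The if expression must end with its own closing parenthesis.
    if tokens.get(pos) != Some(&Token::RParen) {
        return Err("expected ')' after the else-clause".to_string());
    }
    Ok((
        Expr::If(Box::new(condition), Box::new(then), Box::new(r#else)),
        pos + 1,
    ))
}
```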
The last part, eval, is also uncomplicated:
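A sketch of the extra match arm added to eval (the Bool and Call arms stay as before):

```rust
Expr::If(condition, then, r#else) => {
    if eval(condition) == Value::Bool(true) {
        eval(then)
    } else {
        eval(r#else)
    }
}
```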
It’s worth mentioning that the raw identifier r#else is used so that else, which is a reserved keyword in Rust, can serve as a variable name.
If is implemented relatively easily. However, adding functions or variable definitions requires more work. Feel free to explore these aspects on your own, and check out the lip repo as a reference.
Conclusion
In this exploration of interpreter implementation, we demystified the process by leveraging Rust’s powerful features and embracing Lisp’s straightforward syntax.
The journey through building an interpreter extends beyond the typical “hello world” or to-do app projects. It serves as a valuable exercise, showcasing the elegance and flexibility that Rust offers.