Fraud: Local bindings and variables

Table of contents

Bindings and variables
Concrete Syntax
Abstract Syntax
Substitution
Interpreter via substitution
Environments
Interpreter with environments
Definitions
Testing

Bindings and variables

Now, let us add local bindings to our language. This will allow us to declare a variable, bind a value to it, and use the variable in expressions.

(let ((id0 e0)) e1)

This form is called a let binding, that binds the identifier id0 to the value of e0 within the scope of the expression e1.

This is a simplification of Racket’s let binding form, that allows binding of multiple variables:

(let ([id0 e0]
      [id1 e1]
      ...
      [idn en]) em)

that allows you to declare multiple identifiers id0 through id1, binding them to values of expressions e0 through e1 respectively in the scope of the expression em.

An example program written using let bindings will look like:

(let ((x (add1 6)))
  (let ((y (+ 6 x)))
    (+ x y)))

This program binds the value of (add1 6) to x and sum of 6 and x to y and then calculates their sum, evaluating to 20.

We will extend our previous language Defend to add let bindings, and call this new language Fraud.

Concrete Syntax

We extend the concrete syntax of Defend with a few forms to support bindings. First, we add x to the language, which denotes any identifier, i.e., any string that can be represented as a symbol in Racket (alpha-numeric, and symbols like -, +, =, <, >, and similar). These can denote the name of any variable of declarations in our language - Fraud. Second, we add a (let ((x e)) e) form to the language. This allows us to declare a new variable x and bind the first expression e to it, and this x is only bound locally in the second expression e.

Fraud Grammar

Abstract Syntax

With the surface syntax of the language defined, we can now turn to define the AST. Because we have two new kinds of expressions in the language that do not fit in the other AST nodes, we will define two new AST nodes. First, (Var x) for representing variables in the AST. Second, we have (Let x e e) for representing let.

Fraud AST

The code for the AST is straightforward:

#lang racket

(provide Val UnOp BinOp If Err Err? Let Var)

;; type Expr =
;; | other expression forms omitted for brevity
;; | (Var x)
;; | (Let x e e)

; full set of struct definitions can be found in ast.rkt
(struct Var (x) #:prefab)
(struct Let (x e1 e2) #:prefab)

The parser is slightly longer than Con, but almost similar:

#lang racket

(require "ast.rkt")

(provide parse)

;; full parser can be found in parser.rkt

;; S-Expr -> Expr
(define (parse s)
  (match s
    ; ... rest of the cases omitted for brevity
    [(? symbol?)  (Var s)]
    [`(let ((,x ,e1)) ,e2) (Let x (parse e1) (parse e2))]
    [_ (error "Parse error!")]))

In the two new cases, the first one matches against a symbol in the concrete syntax, and if so creates a new Var struct. Recall that programs are s-expressions, which are lists. So a program in concrete syntax is written as a list is '(+ x 5), which in equivalent to (list '+ 'x 5). The parser when it comes across the '+ symbol will produce the BinOp node, which will recursively call parse on the following two subexpressions. Then 'x will be parsed as a symbol and represented as a Var node with new match clause we just added. The second clause for let, matches with the form (let (( literally, and then matches with the variable as x followed by the binding expression e1 followed by e2 where the identifier is bound.

This is how the AST looks when parsed:

> (parse '(let ((x 42)) (add1 x)))
'#s(Let x #s(Val 42) #s(UnOp add1 #s(Var x)))

Substitution

We are done with the syntactic part of extending the language and now move on to defining its semantics:

the meaning of a let expression (let ((x e0)) e) is the meaning of e (the body of the let) when x means the value of e0 (the right hand side of the let),
the meaning of a variable x depends on the context in which it is bound. It means the value of the right-hand side of the nearest enclosing let expression that binds x. If there is no such enclosing let expression, the variable is meaningless.

Let us understand these rules with a few examples:

x: this expression is meaningless on its own.
(let ((x 7)) x): this means 7, since the body expression, x, means 7 because the nearest enclosing binding for x is to 7, which means 7.
(let ((x 7)) 2): this means 2 since the body expression, 2, means 2.
(let ((x 7)) (add1 x)): this means 8 since the body expression, (add1 x), means one more than x and x means 7 because the nearest enclosing binding for x is to 7.
(let ((x (add1 7))) x): this means 8 since the body expression, x, means 8 because the nearest enclosing binding for x is to (add1 7) which means 8.
(let ((x 7)) (let ((y 2)) x)): this means 7 since the body expression, (let ((y 2)) x), means 2 since the body expression, x, means 7 since the nearest enclosing binding for x is to 7.
(let ((x 7)) (let ((x 2)) x)): this means 2 since the body expression, (let ((x 2)) x), means 2 since the body expression, x, means 2 since the nearest enclosing binding for x is to 2.
(let ((x (add1 x))) x): this is meaningless, since the right-hand side expression, (add1 x) is meaningless because x has no enclosing let that binds it.
(let ((x 7)) (let ((x (add1 x))) x)): this means 8 because the body expression (let ((x (add1 x))) x) means 8 because the body expression, x, is bound to (add1 x) is in the nearest enclosing let expression that binds x and (add1 x) means 8 because it is one more than x where x is bound to 7 in the nearest enclosing let that binds it.

Make sure you have a good understanding of how binding works in these examples before moving on. Remember: you can always check your understanding by pasting expressions into Racket and seeing what it produces, or better yet, write examples in DrRacket and hover over identifiers to see arrows between variable bindings and their occurrences.

One thing that should be clear from these examples is that the meaning of a sub-expression is not determined by the form of that expression alone. For example, x could mean 7, or it could mean 8, or it could be meaningless, or it could mean 22, etc. It depends on the context in which it occurs. So in formulating the meaning of an expression, this context must be taken into account.

A good analogy where context matters for the meaning of an expression is algebra, where it is common to write expressions as given x = 7 + 3, find the value of expressions like 2x + 7. In such cases, you would first find the value of x to be 10 and substitute into the given expression to find it is 27. Turns out, we can given meaning to let expression in a similar manner as shown in the E-Let rule:

Fraud E-Let

The above rule states, to give meaning to a let expression, we first need to give meaning to the binding expressions e1 as v1, then all we need to do is substitute all occurences of the variable x in body of the let expression. We are using the [x/v1]e2 notation to denote substitute all occurences of x with v1 in e2. Then all we have to do is define how substitution behaves. It was simple for algebra, but for programming language it is more involved.

Let us define substition as a function subst that takes what variable it is substituting, with the value it is substituting, and in the expression it is substituting. It computes a new version of the in expression where all occurences of what has been substituted by with:

Fraud Substitution

The subst function keeps values as-is, as there is nothing to substitute. All the other cases, it does what we expect, recursively applies subst to all subexpressions to replace all occurences. Only two cases are interesting, the ones for Var and Let. For the Var case, we only substitute a variable if it has the same name as the one we are subtituting. For Let, if a let binding is redefining the same variable with a new expression, we only substitute it in the binding expression. We do not substitute in the body of the let, as it should be substituted with the new redefined value, when we give meaning to the redefining let. Conversely, if the let binding is for any other variable, it does not matter and we can substitute in both the binding expression and the body of the let.

Note, that we did not give any meaning to variables x yet. Look at the Var case for subst, and you will notice all occurences of a bound variables will be substituted with their values as we give meaning to an expression. That will leave us only with programs, where we have unbound variables or free variables. For example, (add1 x) or (let ((x 5)) y) where x and y are free variables respectively. Such free variables result in an error.

Fraud E-Var

Interpreter via substitution

Note: Complete implementation of the code shown in this section can be found on the fraud-subst implementation on GitHub.

With the formal semantics defined, we can translate that to our interpreter code trivially. First let us translate the subst definition to a program:

;; subst :: Symbol -> Expr -> Expr -> Expr
(define (subst what with in)
  (match in
    [(Val v) (Val v)]
    [(Var x) (if (eq? x what)
                 with
                 (Var x))]
    [(UnOp u e) (UnOp u (subst what with e))]
    [(BinOp b e1 e2) (BinOp b (subst what with e1)
                              (subst what with e2))]
    [(If e0 e1 e2) (If (subst what with e0)
                       (subst what with e1)
                       (subst what with e2))]
    [(Let x e1 e2) (if (eq? x what)
                       (Let x (subst what with e1) e2)
                       (Let x (subst what with e1) (subst what with e2)))]))

Then we have to add new cases to the interpreter to reflect the above inference rules using subst:

;; interp :: Expr -> Val
(define (interp e)
  (match e
    [(Val v) v]
    [(Var x) (raise (Err "Unbound identifier"))]
    [(UnOp u e) (interp-unop u e)]
    [(BinOp b e1 e2) (interp-binop b e1 e2)]
    [(If e1 e2 e3) (interp-if e1 e2 e3)]
    [(Let x e1 e2) (interp (subst x (Val (interp e1)) e2))]))

The above interpreter only has two new cases for Var and Let directly reflecting the rules with relation. We can try running this on a few examples:

> (interp-err (parse '(let ((x 7)) x)))
7
> (interp-err (parse '(let ((x 7)) (let ((x (add1 x))) x))))
8
> (interp-err (parse '(let ((x 7)) y)))
'#s(Err "Unbound identifier")

Thus we have a working implementation of let and variables. When the interpreter sees an identifier, you might have had a sense that it needs to “look it up”. Not only did this interpreter not look up anything, we defined its behavior on variables to be an error! While absolutely correct, this is also a little surprising. More importantly, we write interpreters to understand and explain languages, and this implementation might strike you as not doing that, because it doesn’t match our intuition.

There’s another problem with substitution, which is the number of times it has traverse the source program AST. It would be nice to have to traverse only those parts of the program that are actually evaluated and only when necessary. But substitution traverses everything, like unvisited branches of conditionals, and forces the program to be traversed once for substitution and once again for interpretation.

Environments

Let us follow our intuition about looking up variables. To look up variables for values, we would need to store them in some directory. We can create a map from variables to their values, which we call the environment. This fits our mental model: when we declare variables we store them in the environment and when we evaluate variables we look up their values the environment. This also addresses our second concern about traversing the source program too many times. Storing a variable in the environment merely denotes our intent to substitute the identifier later on. Only when we evaluate a variable do we substitute it by looking up in the environment. In practice, all commonly used languages use the environment in their implementation.

We will denote the environment as E. Storing a variable in the environment is denoted as , while lookup of a variable x is denoted as . With our environment defined, we can now redefine the meaning of let in terms of the environment:

Fraud E-Let

The E ⊢ is now a part of all rules in our language. The ⊢ symbol is called the turnstile. In this context, you can read this as “Given an environment E, (Let x e1 e2) means v if e1 means v1 in the same enviroment E and e2 means v under a new environment same as E with x mapped to v1”. Now the rule for let matches our intuition of storing a value to a variable.

Similarly, the rule for interpreting a variable would be:

Fraud E-Var

This rule is read in a similar fashion: given an environment E, the meaning of (Var x) is the value of looking up x in the environment E.

Note, how we did not define a new programming language for this change. We did not need to define a new concrete syntax, abstract syntax or the parser. All we are changing is the implementation strategy. We can now write an interpreter that uses environments.

Interpreter with environments

Note: Complete implementation of the code shown in this section can be found on the fraud implementation on GitHub.

We will use Racket’s lists to define our environment. One could use Racket’s hash tables to implement environments as well.

For our implementation, an empty enviroment is '(). Storing a variable to a value stores a list in the environment. So an environment containing x = 2 will be denoted as '((x 2)). Storing y = 5 in the same environment, will result in a new environment '((y 5) (x 2)). The same variables can be redefined as well. If we re-bind x = 42 in the environment, it will be stored as '((x 42) (y 5) (x 2)). We define a function store to define this operation:

;; store :: Env -> Symbol -> Val -> Env 
(define (store env x val)
  (cons (list x val) env))

Similarly, our look up operation on the environment will walk the list and return the first occurence of the identifier. If it is not found it will raise an error:

;; lookup :: Env -> Symbol -> Val
(define (lookup env x)
  (match env
    ['() (raise (Err "Unbound identifier"))]
    [(cons (list y val) rest) (if (eq? x y) val
                                  (lookup rest x))]))

Just like we did for substitution, we have to define two new cases in the interpreter for the Var and the Let case. Note, just like how our inference rules were updated when we defined the meaning of the language to include an environment, our interp and similar functions are also updated to include env as a parameter denoting the environment. For (Var x), all we do is call lookup. For (Let x e1 e2), we interpret the body e2, after we interpret the binding expression e1 in the provided environment and store the result in the new environment.

;; interp :: Env -> Expr -> Val
(define (interp env e)
  (match e
    [(Val v) v]
    [(Var x) (lookup env x)]
    [(UnOp u e) (interp-unop env u e)]
    [(BinOp b e1 e2) (interp-binop env b e1 e2)]
    [(If e1 e2 e3) (interp-if env e1 e2 e3)]
    [(Let x e1 e2) (interp
                    (store env x             ; env will be updated
                           (interp env e1))  ; after e1 is evaled in old env
                    e2)]))                   ; e2 evaluated in updated env

We can try running this on the same set of examples:

> (interp-err (parse '(let ((x 7)) x)))
7
> (interp-err (parse '(let ((x 7)) (let ((x (add1 x))) x))))
8
> (interp-err (parse '(let ((x 7)) y)))
'#s(Err "Unbound identifier")

If substitutions and environments result in the same thing, why did we take the long winded way to learn both? Substitutions take an expression-centric view on bindings, i.e., all operations are done at the expression level. Environment, on the other hand, take an abstract machine view of bindings: there is some environment in your machine that you can store or do lookup in while interpreting you program. Having two ways to realize the same thing is great as we can cross-check our implementation against both to see if they agree. If they do not at least one of them is wrong. We can also prove that both approaches will give the same result for all possible programs in our language, but that is out of scope for this class.

Definitions

Some definitions related to terms used in this article:

Instance: Any usage of an identifier
Binding instance: The instance of an identifier where it is defined and given a value
Scope: The code region where an identifier bound
Bound instance: An instance of an identifier occurring the scope of a binding instance for that identifier
Free: An instance of an identifier occurring outside the scope of a binding instance for that identifier
Substitution: Replacing an identifier by its value
Environment: A map of bindings
Binding: An identifier/value pair added to an environment when evaluating a binding instance

Testing

We can write a few cases for both the interpreters to see if they behave as we would expect:

(check-equal? (interp-err (parse '(let ((x 1)) (+ x 3)))) 4)
(check-equal? (interp-err (parse '(let ((x 1))
                                    (let ((y 2))
                                      (+ x y))))) 3)
(check-equal? (interp-err (parse '(let ((x (add1 6)))
                                    (let ((x (+ 6 x)))
                                      (/ x 2))))) 6)
(check-equal? (interp-err (parse '(let ((x (add1 6)))
                                    (let ((x (+ 6 x)))
                                      (/ x y)))))
              (Err "Unbound identifier")))

We can extend this to check if both our substitution and environment-based interpreter agree for a set of programs.