Dev Tips & Tricks

Building a language interpreter in JavaScript - Bonus Part

Back to posts

Welcome to the bonus part of the JavaScript interpreter. We have finished with our interpreter implementation using pure JavaScript and we got the printly running.

In this bonus part, we will show you a better way to do this - using a library!

Our current implementation will definitely work for a lot of use cases and we can easily extend it, but it would be really nice if we had an easier way of implementing this whole thing. Well, now that we have implemented it we have gained enough understanding so that we can use a library with great success.

The library we will be using is called peggyjs. This is a parser generator for JavaScript where we just define our grammar and it will generate the parser for us.

PeggyJS is using pretty much the same parsing method we used in our implementation called Parsing Expression Grammar (PEG).

To install it we run:

npm install --save-dev peggy

This will install a peggy CLI in our node_modules folder. We can also install it globally by running:

npm install -g peggy

And that is pretty much it. We install it as a dev dependency because we will only need it to generate a parser, it will generate a library JS script for us which will contain a complete implementation of our grammar.

Defining the grammar

PeggyJS uses a language similar to what we used in our grammar.

Here is an example of PeggyJS rule:

Rule = Token1 Space Token2

Token1 = "hello"
Token2 = "world"
Space = " "

This means that our rule needs to have Token1 then Space and after that Token2. Looking at what Token1, Token2, and Space resolve to, we can only match text hello world.

I've written Token1, Token2, and Space separately like this but PeggyJS also allows this to be written directly in the rule so this will work too:

Rule = "hello" " " "world"

Writing zero or more rules is easy:

Rule = Token1*
Token1 = "value"

Meaning we can match text value or valuevaluevaluevalue and so on.

We can also use regex directly as a rule:

Variable = [a-z][a-zA-Z0-9]*

And we implemented regex for finding names in our language :)

Aside from zero or more *, one or more + and optional ?, work exactly as we used them in our example.

For taking one rule or another we use a different character though / instead of |:

Rule = Name / Number

Name = [a-z][a-zA-Z0-9]*
Number = [0-9]+

And one small but important thing. We can write JavaScript directly as a part of the rule!

Rule = Name / Number

// We return complete name as a text.
Name = [a-z][a-zA-Z0-9]* { return text(); }

// We return a number directly as a JS parsed number.
Number = [0-9]+ { return parseFloat(text())}

And since we can use JS we can also mark specific parts of the rule with variable names we can use directly in JavaScript part:

Rule = "(" left:Name " " right:Number ")" { return {
  type: 'our rule',
  left, // we have access to the result of Name as a variable named left here
  right // and also we have access to the result Number rule as a variable right here
} }

// We return complete name as a text.
Name = [a-z][a-zA-Z0-9]* { return text(); }

// We return a number directly as a JS parsed number.
Number = [0-9]+ { return parseFloat(text())}

So, let's define the grammar for our printly:

{
    // PeggyJS allows us to run JS at the beginning to
    // define some functions at the top.
    // We will use it to define one function for all of our
    // binary operations since they need to resolve in the same way.
    const parseExpression = (head, tail) => {
        return tail.reduce(
            (result, element) => ({
                type: 'operation',
                operation: element[1],
                left: result, 
                right: element[3], 
            }), 
            head
        );
    }
}

// We need to have a different initial rule
// because we are parsing all statements at once
Program = _ head:LineStatement _ tail:(LineStatement _)* {
    return [
        head,
        ...tail.map(s => s[0])
    ];
}

// LineStatement -> IfExpressionStatement | AssignmentStatement | FunctionStatement
LineStatement = IfExpressionStatement / AssignmentStatement / FunctionStatement

// IfExpressionStatement -> IfKeyword PStart Expression PEnd CodeBlock
IfExpressionStatement "if statement" = IfKeyword _ PStart  _ check:Expression  _ PEnd _ statements:CodeBlock { return {
    type: 'if',
    check,
    statements
} }

// CodeBlock -> BStart LineStatement* BEnd
CodeBlock = BStart  _ statements:Program  _ BEnd { return statements; }

// FunctionStatement -> FunctionExpression Eol
FunctionStatement "function" = statement:FunctionExpression _ Eol { return statement; }

// FunctionExpression -> Name PStart FunctionParameters? PEnd
FunctionExpression = name:Name _ PStart  _ parameters:FunctionParameters?  _ PEnd { return {
    type: 'function',
    name,
    parameters
}; }

// FunctionParameters -> Expression (Comma Expression)*
// Since tail is an array of expressions each having a comma
// we need to map it to extract only the expression
FunctionParameters = head:Expression _ tail:( _ Comma  _ Expression)* { return [head, ...tail.map(p => p[3]) ]; }

// AssignmentStatement -> Name Equals Expression Eol
AssignmentStatement "assignment" = name:Name _ Equals  _ expression:Expression  _ Eol { return {
    type: 'assignment',
    name,
    expression
}; }

// Expression -> EqualityTerm ((And | Or) EqualityTerm)*
Expression = head:EqualityTerm _ tail:( _ (And / Or)  _ EqualityTerm)*  { return parseExpression(head, tail); }

// EqualityTerm -> RelationTerm ((DoubleEquals | NotEquals) RelationTerm)*
EqualityTerm = head:RelationTerm _ tail:( _ (DoubleEquals / NotEquals)  _ RelationTerm)*  { return parseExpression(head, tail); }

// RelationTerm -> AddSubTerm ((Less | Greater | LessEquals | GreaterEquals) AddSubTerm)*
RelationTerm = head:AddSubTerm _ tail:( _ (Less / Greater / LessEquals / GreaterEquals) _ AddSubTerm)*  { return parseExpression(head, tail); }

// AddSubTerm -> MulDivTerm ((Add | Subtract) MulDivTerm)*
AddSubTerm = head:MulDivTerm _ tail:( _ (Add / Subtract) _ MulDivTerm)*  { return parseExpression(head, tail); }

// MulDivTerm -> UnaryTerm ((Multiply | Divide) UnaryTerm)*
MulDivTerm = head:UnaryTerm _ tail:( _ (Multiply / Divide) _ UnaryTerm)* { return parseExpression(head, tail); }

// UnaryTerm -> Not? Factor
// optional value return null if they are not matched
UnaryTerm = withNot:Not? _ value:Factor { return {
    type: 'unary',
    withNot: withNot !== null,
    value
} }

// Factor -> GroupExpression | FunctionExpression | NumberExpression | VariableExpression | StringExpression
Factor = GroupExpression / FunctionExpression / NumberExpression / VariableExpression / StringExpression

// GroupExpression -> PStart Expression PEnd
GroupExpression = PStart _ Expression _ PEnd

// VariableExpression -> Name
VariableExpression = name:Name { return {
    type: 'variable',
    name
} }

// NumberExpression -> Number
NumberExpression = value:Number { return {
    type: 'number',
    value
} }

// StringExpression -> String
StringExpression = value:String { return {
    type: 'string',
    value
} }

// String parsing
String
  = "'" chars:Characters* "'" { return chars.join('') }

// Characters unless ' or Escaped
Characters
  = !("'" / "\\") char:. { return char }
  / "\\" "'" { return "'" }


// Direct regex check for name
Name =  [a-z][a-zA-Z0-9]* { return text() }

// Direct number check for numbers
Number = [0-9]+ { return text() }

// Other rules which resolve to tokens by exact match
IfKeyword = "if"
And = "&&"
Or = "||"
Not = "!"
DoubleEquals = "=="
NotEquals = "!="
Less = "<"
Greater = ">"
LessEquals = "<="
GreaterEquals = ">="
Multiply = "*"
Divide = "/"
Add = "+"
Subtract = "-"
Comma = ","
Equals = "="
PStart = "("
PEnd = ")"
BStart = "{"
BEnd = "}"
Eol = ";"

_ "whitespace"
  = [ \t\n\r]* { return null }

One thing we needed to add is the whitespace rule _. Since PeggyJS works strictly we need to define what we allow whitespace between certain rules.

And we are done.

Now, we save this rule in a file printly.peggy and run the following command:

npx peggy printly.peggy -o printly-parser.js

PeggyJS parser will run and generate a parser for our language in printly-parser.js and most of our work is done.

We just need to modify our index.js a bit:

// Code which we want to parse
const code = `
firstName = 'John';
lastName = 'Smith';
print('Hello, ' + firstName + ' ' + lastName);
`;


// We include our parser
const { parse } = require('./printly-parser');

// Everything gets parsed here and
// transformed into statements
const statements = parse(code);

// Import the interpreter
const interpret = require('./interpreter');

// We create a virtual state machine object
const vm = {
    variables: {},
    functions: {
        print(message) { // We add a print function so that we can call a function from our code.
            console.log('MESSAGE:', message); 
        }
    }
};
// Interpret the statements and return last result.
const result = interpret(statements, vm);

// And finally we output the result
console.log('Result:')
console.dir(result, {depth: null});

Aaaand we're done!

You can run the code in index.js and our language should work exactly the same.

If you wish to see this part in the repository, check out the branch bonus-chapter1.

As you saw in all of the previous parts (Part1, Part2, Part3), making a programming language takes a lot of work and there are not many shortcuts you can take. Thankfully PeggyJS makes a lot of that way easier to do, and now since we have a better understanding of what goes into making a parser from scratch we can appreciate the work it does for us and it will make it a lot easier to figure out any potential bugs which may arise during program development.

If you want to understand more about the building programming language you can check out https://craftinginterpreters.com/ which has a whole book worth of many details on what goes into creating one.

Thank you for reading! Follow our 2am.blog for more articles like this one.

Share with
Terms of Use | Privacy Policy