Constructing a programming language from scratch in a number of hours
12 hours in the past
The world is filled with programming languages with every kind of various use circumstances. Most of those languages, although, are constructed for very basic functions — generally, we could wish to design a language to suit a really particular use case (e.g. Fb designed React to make it simpler to develop their internet purposes, and Apple not too long ago developed Pkl, a language designed to make configurations simpler. There are various extra examples of this in numerous fields). As such, figuring out how you can construct a programming language is a helpful talent to have in your belt.
On this article, we’ll construct an interpreted programming language from scratch and be taught slightly bit about each the lambda calculus and programming languages as an entire alongside the best way. The language we construct right here can be pretty esoteric, however the course of ought to hopefully provide you with an thought of how you can design your individual use-case-specific languages (and educate you helpful details about how programming languages operate underneath the hood).
Since we’re constructing an interpreted language¹, our overarching move will look one thing like this:
Mainly, we begin with some concrete syntax (code) written in our goal language (the language that we are attempting to put in writing), cross it to some parser that converts it to an summary syntax tree (a tree illustration of the code that’s simple to work with), after which cross that to an interpreter that “runs” the summary syntax tree to offer us a closing consequence. Observe that the parser and interpreter are written in some already current host language — C’s unique parser and compiler, for example, had been written in meeting.
** Observe: My use of “parser” right here encapsulates the whole parsing course of. Normally, lexing is completed previous to “parsing”, however on this case parsing is simply the method of taking concrete syntax to summary syntax, no matter that will appear like.
For example, think about the next specification for a easy language for primary arithmetic:
EXPR = quantity| EXPR + EXPR| EXPR – EXPR| EXPR * EXPR| EXPR / EXPR| (EXPR)
The above, by the best way, is an EBNF for a context-free grammar². I gained’t delve too deeply into what this implies right here, however all programming languages written in a kind like this are parse-able in polynomial time through the CYK algorithm. For this EBNF, one thing like (4 + 4) * 3 is a sound program, however one thing like def f(x): return 5; f(5) just isn’t.
Suppose we at the moment are given the concrete syntax (4 + 4) * 3 . After parsing, we should always get an summary syntax tree (AST) like this:
Then our interpreter will begin on the root and recursively go down the tree till we get our reply, specifically 24.
Observe shortly that this grammar is ambiguous — for example, how ought to the expression 4 + 4 * 3 parse? It might both parse because the above (that’s, (4 + 4) * 3), or it might additionally parse as 4 + (4 * 3) — neither parsing is inherently extra “right” in the best way that now we have specified the language, as each are legitimate parse timber. In circumstances like this, the parser should arbitrarily decide about how you can parse our language.
By the move chart, our first step ought to logically be to design our concrete syntax. What you select to make your syntax is totally as much as you. I made a decision to create EmojiLang, a (horrible) language that ensures a particularly colourful display screen while you’re typing. The grammar is beneath:
grammar EmojiLang;
program: ‘🏃♂️🏃♂️🏃♂️’ expr ‘🛑🛑🛑’ EOF;
expr: ‘(‘ (ID| atom| ifCond| anonFunctionDefn| funApplication| varAssignment| READ_FLOAT| READ_STRING| printExpr| sequentialOp| baseComputation) ‘)’;
atom: NUMBER | BOOLEAN | STRING;ifCond: ‘🤔’ cond=expr ‘❓’ ontrue=expr ‘:’ onfalse=expr;anonFunctionDefn: ‘🧑🏭’ arg=ID ‘⚒️’ physique=expr;funApplication: ‘🏭’ enjoyable=expr arg=expr;varAssignment: ‘📦’ var=ID ‘🔜’ val=expr;printExpr: ‘🖨️’ expr;sequentialOp: ‘📋’ first=expr second=expr;baseComputation: left=expr op=(‘➕’ | ‘➖’ | ‘✖️’ | ‘➗’ | ‘🟰’ | ‘≤’) proper=expr;
ID: [a-zA-Z_][a-zA-Z0-9_]*;NUMBER: [0-9]+ (‘.’ [0-9]+)?;BOOLEAN: ‘👍’ | ‘👎’;STRING: ‘”‘ ~[rn”]* ‘”‘;READ_FLOAT: ‘🔢’;READ_STRING: ‘🔤’;
WHITESPACE: [ trn]+ -> skip;COMMENT: ‘😴’ .*? ‘⏰’ -> skip;LINE_COMMENT: ‘🥱’ ~[rn]* -> skip;
** Observe: the above specification is written for use in a device referred to as ANTLR, we’ll get again to this very quickly.
This language is, after all, ridiculous, however it’s fascinating for a few causes. Firstly, all of our expressions are required to be parenthesized. This makes it extraordinarily annoying to code, but in addition makes our grammar non-ambiguous. Secondly, discover how we are able to solely outline nameless features — there isn’t a equal syntax for one thing like def in Python. Lastly, all features on this language (aside from the bottom computations) have precisely one argument. We’ll discover the implications of the final two design choices once we mess around with our language in a bit.
We are able to, after all, write our personal parser. Fortunately although, there are instruments that may parse arbitrary context-free grammars. For this tutorial, we’ll use ANTLR (you possibly can obtain it right here). ANTLR is a really good and easy-to-use device that takes grammar specs just like the above and creates parsers in quite a lot of goal languages (on this tutorial, we can be utilizing Python).
Utilizing it’s pretty easy, simply observe the next steps:
Obtain the ANTLR Java binaries from hereCreate a .g4 file (just like the above) to your grammar. Observe that ANTLR can’t really deal with emojis very nicely, so in the event you plan to make use of emojis in your language, run the next python script to demojify your grammar:import emojiimport sys
def demojify_file(input_file, output_file):with open(input_file, “r”, encoding=”utf-8″) as f:input_text = f.learn()
input_text = emoji.demojize(input_text)
with open(output_file, “w”, encoding=”utf-8″) as f:f.write(input_text)
if __name__ == “__main__”:input_file = sys.argv[1]output_file = sys.argv[2]demojify_file(input_file, output_file)
3. Run java -Xmx500M -cp <path_to_antlr.jar> org.antlr.v4.Device -Dlanguage=Python3 <your_grammar.g4> to generate your parser
You’ll be able to then import the generated parsing information and use them as follows:
from antlr4 import *from EmojiLangLexer import EmojiLangLexerfrom EmojiLangParser import EmojiLangParserfrom EmojiLangListener import EmojiLangListenerimport emoji
if __name__ == “__main__”:input_file = sys.argv[1]with open(input_file, “r”, encoding=”utf-8″) as f:input_text = f.learn()
input_text = emoji.demojize(input_text)input_stream = InputStream(input_text)lexer = EmojiLangLexer(input_stream)token_stream = CommonTokenStream(lexer)parser = EmojiLangParser(token_stream)tree = parser.program()
if parser.getNumberOfSyntaxErrors() > 0:exit(1)
You most likely gained’t need to do the demojizing step wherein case you should utilize antlr4’s FileStream as an alternative of the InputStream , but it surely actually doesn’t matter. Now, now we have a really good summary syntax tree that’s simple to work with, and we are able to transfer on to the onerous half — interpreting³
As a result of we’re working with timber, our interpreter will naturally be a recursive entity. We do have some selections although on how precisely we implement a few of its options. For this tutorial, we’ll construct an interpreter that makes use of environments that bind variables to addresses after which a mutable retailer that maps addresses to values. This concept is pretty frequent, although not ubiquitous, and it permits us to take care of correct scope and help variable mutation. For ease of implementation, we may also make our interpreter return a standard worth construction.
Values, Shops, and the Surroundings
First, let’s outline what our interpreter can output. Now we have three apparent base circumstances in our EBNF (specifically booleans, strings, and numbers) so let’s create worth objects for these:
class Worth:def __init__(self, worth):self.worth = worth
def __str__(self) -> str:return str(self.worth)
class NumValue(Worth):def __init__(self, worth: float):tremendous().__init__(worth)
class StringValue(Worth):def __init__(self, worth: str):tremendous().__init__(worth)
class BoolValue(Worth):def __init__(self, worth: bool):tremendous().__init__(worth)
To retailer mappings of variables to values, we may also create an setting and a retailer:
class EnvLookupException(Exception):cross
class Surroundings:def __init__(self):self.vars = {}
def set_var(self, title, addr: int):self.vars[name] = addr
def get_var(self, title):if title not in self.vars:elevate EnvLookupException(f”Variable {title} not present in setting”)return self.vars[name]
def copy(self):new_env = Surroundings()new_env.vars = self.vars.copy()return new_env
class Retailer:def __init__(self):self.retailer = []
def alloc(self, worth: Worth):self.retailer.append(worth)return len(self.retailer) – 1
def get(self, addr: int):if addr >= len(self.retailer):elevate EnvLookupException(f”Deal with {addr} not present in retailer”)return self.retailer[addr]
def set(self, addr: int, worth: Worth):if addr >= len(self.retailer):elevate EnvLookupException(f”Deal with {addr} not present in retailer”)self.retailer[addr] = worth
Successfully, our surroundings will retailer variable → tackle bindings, and our retailer will maintain tackle → worth bindings. The shop is probably not mandatory with our present characteristic set, but when we allowed for mutation for pass-by-reference variables, it could be useful⁴.
Ideally, we’d additionally wish to cross features as variables, so we want yet another worth to signify them. To do that, we create a closure, which incorporates the operate’s parameter, physique, and the setting that it was created in:
class ClosureValue(Worth):# Physique ought to be an antlr4 treedef __init__(self, param: str, physique: object, env: ‘Surroundings’):tremendous().__init__(None)self.param = paramself.physique = bodyself.env = env
def __str__(self) -> str:return f”<operate>”
It’s possible you’ll moderately ask about why we want the setting saved within the operate. Suppose as an alternative that we had a operate worth like this:
class FunctionValue(Worth):# Physique ought to be an antlr4 treedef __init__(self, param: str, physique: object):tremendous().__init__(None)self.param = paramself.physique = physique
def __str__(self) -> str:return f”<operate>”
Now, suppose that we had code equal to the next in our language:
# —————-# ENV MUST PERSIST# —————-def f(x):def g(y):return x + yreturn g(x)
print((f(4))(5)) # 9
# —————-# ENV MUST CLEAR# —————-def f2(x):return x + y
def g2(y):return f(5)
print(f(4)) # Ought to crash
To make sure that y remains to be in scope for g within the prime case, we must implement dynamic scoping (scope the place variables are added to the setting as this system runs with out being cleared) with out closures, that means that the underside code would really run and print 9. For the underside code to correctly crash although, we are able to’t implement dynamic scope. Thus we wish the features to successfully bear in mind what setting they had been created in — therefore why we add environments to our closure class⁵.
The Interpreter
Now, we’re prepared to put in writing our precise interpreter. In ANTLR, our interpreter will prolong the EmojiLangListener class. We may also must create a top-level setting and provides the interpreter a retailer:
class EmojiLangException(Exception):cross
TOP_ENV = Surroundings()
class Interpreter(EmojiLangListener):def __init__(self):self.retailer = Retailer()
Now, we have to create an interp technique that handles all of our expression circumstances. It’ll find yourself trying one thing as follows:
def interp(self, prog, env: Surroundings) -> Worth:if prog.ID():return self.interp_id(prog.ID())elif prog.atom():return self.interp_atom(prog.atom())elif prog.anonFunctionDefn():return self.interp_function_defn(prog.anonFunctionDefn())elif prog.READ_FLOAT():return self.interp_read_float()elif prog.READ_STRING():return self.interp_read_string()elif prog.printExpr():return self.interp_print_expr()elif prog.ifCond():return self.interp_if(prog.ifCond(), env)elif prog.sequentialOp():return self.interp_sequential_op(prog.sequentialOp(), env)elif prog.baseComputation():return self.interp_base_computation(prog.baseComputation(), env)elif prog.varAssignment():return self.interp_var_assignment(prog.varAssignment(), env)elif prog.funApplication():return self.interp_fun_application(prog.funApplication(), env)
Our base circumstances (IDs, atoms, operate definitions, studying, and printing) are pretty easy, so we are able to simply write them in:
def interp(self, prog, env: Surroundings) -> Worth:if prog.ID():return self.retailer.get(env.get_var(prog.ID().getText()))elif prog.atom():return self.interp_atom(prog.atom())elif prog.anonFunctionDefn():return ClosureValue(prog.anonFunctionDefn().arg.textual content, prog.anonFunctionDefn().physique, env)elif prog.READ_FLOAT():attempt:return NumValue(float(enter(“> “)))besides ValueError:elevate EmojiLangException(“Anticipated float enter”)elif prog.READ_STRING():return StringValue(enter(“> “))elif prog.printExpr():worth = self.interp(prog.printExpr().expr(), env)if isinstance(worth, StringValue):# print with out quotesprint(str(worth)[1:-1])else:print(worth)return worth# …
def interp_atom(self, atm):if atm.NUMBER():return NumValue(float(atm.NUMBER().getText()))elif atm.BOOLEAN():return BoolValue(atm.BOOLEAN().getText() == “:thumbs_up:”)elif atm.STRING():return StringValue(atm.STRING().getText())# THIS SHOULD NEVER HAPPENraise EmojiLangException(f”Unknown atom {atm.getText()}”)
Our if situation can be pretty easy. We simply must interpret the situation after which return the results of decoding the true case if it is true and the false case if its false. The code is thus simply
def interp_if(self, if_cond, env: Surroundings):cond = self.interp(if_cond.cond, env)if not isinstance(cond, BoolValue):elevate EmojiLangException(f”Anticipated boolean when evaluating if situation, acquired {cond}”)return self.interp(if_cond.ontrue if cond.worth else if_cond.onfalse, env)
Sequential operations are equally easy, they simply must interpret the primary expression after which the second. We are able to thus exchange the code in that block as follows:
def interp(self, prog, env: Surroundings) -> Worth:# …elif prog.sequentialOp():self.interp(prog.sequentialOp().first, env)return self.interp(prog.sequentialOp().second, env)# …
Subsequent are the bottom computations. It is a first rate quantity of code since we have to deal with a number of operations, but it surely’s not tremendous advanced:
def interp_base_computation(self, base_computation, env: Surroundings):left, proper = self.interp(base_computation.left, env), self.interp(base_computation.proper, env)if base_computation.op.textual content == “:plus:”:if isinstance(left, NumValue) and isinstance(proper, NumValue):return NumValue(left.worth + proper.worth)elif isinstance(left, StringValue) and isinstance(proper, StringValue):return StringValue(left.worth + proper.worth)elevate EmojiLangException(f”Can not add {left} and {proper}”)if base_computation.op.textual content == “:heavy_equals_sign:”:if sort(left) != sort(proper):return BoolValue(False)if isinstance(left, ClosureValue):elevate EmojiLangException(“Can not evaluate features”)return BoolValue(left.worth == proper.worth)
# NUMBERS ONLY COMPUTATIONSif not isinstance(left, NumValue) or not isinstance(proper, NumValue):elevate EmojiLangException(f”Anticipated numbers when evaluating base computation, acquired {left} and {proper}”)if base_computation.op.textual content == “:minus:”:return NumValue(left.worth – proper.worth)elif base_computation.op.textual content == “:multiply:”:return NumValue(left.worth * proper.worth)elif base_computation.op.textual content == “:divide:”:if proper.worth == 0:elevate EmojiLangException(“Division by zero”)return NumValue(left.worth / proper.worth)elif base_computation.op.textual content == “≤”:return BoolValue(left.worth <= proper.worth)
Maybe this will also be cleaned up a bit with the usage of e.g. dictionaries, however that’s not tremendous vital. Subsequent now we have variable assignments, that are additionally not too sophisticated. Now we have a pair selections right here on what precisely we wish it to do — specifically, ought to it enable new variables to come back into existence or solely modify current ones? I’ll select the latter case, which makes our code appear like this:
def interp_var_assignment(self, var_assign, env: Surroundings):worth = self.interp(var_assign.val, env)addr = env.get_var(var_assign.var.textual content)self.retailer.retailer[addr] = valuereturn worth
Lastly, now we have operate purposes. Right here, now we have to do 4 steps. First, we interpret the operate we’re calling to get our closure. Then, we interpret our argument. Then, we bind the argument into a duplicate of the closure’s setting. Lastly, we interpret the closure’s physique within the new setting. The code finally ends up trying as follows:
def interp_fun_application(self, fun_app, env: Surroundings):closure = self.interp(fun_app.enjoyable, env)if not isinstance(closure, ClosureValue):elevate EmojiLangException(f”Anticipated operate when evaluating operate software, acquired {closure}”)arg = self.interp(fun_app.arg, env)new_env = closure.env.copy()new_env.set_var(closure.param, self.retailer.alloc(arg))return self.interp(closure.physique, new_env)
Now, our interp is totally useful! We want solely modify our foremost program to run it now on a program:
if __name__ == “__main__”:input_file = sys.argv[1]with open(input_file, “r”, encoding=”utf-8″) as f:input_text = f.learn()
# Preprocess enter to switch emojis with demojized namesinput_text = emoji.demojize(input_text)
input_stream = InputStream(input_text)lexer = EmojiLangLexer(input_stream)token_stream = CommonTokenStream(lexer)parser = EmojiLangParser(token_stream)tree = parser.program()
if parser.getNumberOfSyntaxErrors() > 0:exit(1)
interpreter = Interpreter()interpreter.interp(tree.expr(), TOP_ENV)
We at the moment are lastly prepared to start out writing applications in our language. Right here’s a easy hiya world program within the emoji language (EML):
🏃♂️🏃♂️🏃♂️(🖨️ (“Whats up World!”))🛑🛑🛑
And to run it, we simply do
> python emoji_lang.py helloworld.emlHello World!
(the above, after all, assumes that this system is current in a file referred to as helloworld.eml )
Currying
Again within the first part, I famous that our programming language is fascinating as a result of features can solely take one argument. How then can we create an impact much like multivariate features? Take into account, for example, the next python code:
def f(x, y):return x + y
print(f(3, 4))
f right here has arity 2 — that’s, it takes two arguments. We are able to, nonetheless, write equal code that solely makes use of features of arity one (besides addition) as follows:
def f(x):def g(y):return x + yreturn g
print((f(3))(4))
The above idea of turning greater arity features into unary features known as currying. It really works for features of any arity — for a operate f of arity n, we are able to merely carry out currying n-1 occasions. In emoji language, this permits us to put in writing a program like this:
🏃♂️🏃♂️🏃♂️(📋(🖨️ (“Enter two numbers to compute their sum.”))(🖨️(🏭(🏭 (🧑🏭 x ⚒️ (🧑🏭 y ⚒️((x) ➕ (y))))(🔢))(🔢))))🛑🛑🛑
the Python translation of which is
print(“Enter two numbers to compute their sum.”)print(((lambda x: (lambda y: x + y)))(float(enter()))(float(enter())))
Or, extra legibly,
print(“Enter two numbers to compute their sum.”)
def f(x):def g(y):return x + yreturn g
x = float(enter())y = float(enter())
print(f(x)(y))
Discover additionally how the primary python iteration used no named features. It seems that we don’t really want them, although after all they’re helpful for readability. Then we get
> python emoji_lang.py currying.emlEnter two numbers to compute their sum> 4> 59.0
Recursion
How can we do a loop or recursion on this language although? Now we have no syntax for for or whereas , and with out names for features, it might appear to be recursion is unimaginable.
We are able to, nonetheless, do a neat little trick. Since features are values, we are able to cross features to themselves each time we name them, thus permitting them to name themselves.
Take, for example, the next python code:
n = int(enter())whereas n > 0:print(n)n -= 1
We are able to convert this to a recursive model utilizing one thing like this⁶:
def while_loop(situation, physique):”””A recursive implementation of some time loop.
Arguments————– situation: Some operate of zero arguments that returns a boolean value- physique: Some operate of zero arguments to run whereas the situation returns true”””if situation():physique()while_loop(situation, physique)else:return False
class Field:def __init__(self, n):self.n = n
def set_n(self, n):self.n = n
def get_n(self):return self.n
n = Field(int(enter()))
def physique():print(n.get_n())n.set_n(n.get_n() – 1)
while_loop(lambda: n.get_n() > 0, physique)
However we do have an issue right here — specifically, discover how while_loop calls itself. We are able to’t do this in our language with solely nameless features, so how can we repair that? The reply is that we are able to do one thing like this:
def while_loop(self, situation, physique):if situation():physique()self(self, situation, physique)else:return False
# …# (outline n as a field)# …
def physique():print(n.get_n())n.set_n(n.get_n() – 1)
def call_while(loop_func, situation, physique):loop_func(loop_func, situation, physique)
call_while(while_loop, lambda: n.get_n() > 0, physique)
In impact, we make while_loop take itself as a parameter. Then, we are able to name self inside while_loop , permitting while_loop to name itself recursively.
We nonetheless, after all, must lambda-fy and curry this for our language, so we have to make code equal to the next:
(((lambda while_loop: lambda n: while_loop(while_loop)(lambda bogus: n.get_n() > 0)(lambda bogus: print(n.get_n()) or n.set_n(n.get_n() – 1)))(lambda self:lambda cond:lambda physique:(physique(“BOGUS”) or self(self)(cond)(physique)) if cond(“BOGUS”) else False))(Field(int(enter()))))
It is a little bit of a doozy, but it surely does work. In emoji lang, this then turns into
🏃♂️🏃♂️🏃♂️(🏭(🏭(🧑🏭 whereas ⚒️(🧑🏭 n ⚒️(🏭 (🏭 (🏭 (whereas) (whereas))(🧑🏭 bogus ⚒️ (🤔 ((n) ≤ (0)) ❓ (👎) : (👍))))(🧑🏭 bogus ⚒️ (📋(🖨️ (n))(📦 n 🔜 ((n) ➖ (1))))))))😴Beneath is our whereas operate. Observe that it takesitself as an argument – this permits for recursion(there are different methods to do that, however recursion through selfpassing is pretty easy)
ARGS: 1. self(itself)2. condition_func (operate of zero arguments that returns a boolean)3. physique (operate of zero arguments that returns nothing to run whereas true)
RETURNS:false when completed⏰(🧑🏭 self ⚒️(🧑🏭 condition_func ⚒️(🧑🏭 physique ⚒️(🤔 (🏭 (condition_func) (“BOGUS”)) ❓ (📋 (🏭 (physique) (“BOGUS”)) (🏭 (🏭 (🏭 (self) (self))(condition_func))(physique))) : (👎))))))(🔢)) 🛑🛑🛑
Then we get
> python emoji_lang.py while_loop.eml> 44.03.02.01.0
Bonus: The Y Combinator
It’s considerably annoying to cross in whereas to itself every time we wish to name it, so what if we might create a model of whereas that already had itself curried? It seems we are able to with one thing referred to as the Y Combinator. The Y combinator is a operate that appears as follows:
Y = lambda g: (lambda f: g(lambda arg: f(f)(arg))) (lambda f: g(lambda arg: f(f)(arg)))
It’s fully absurd, but it surely permits us to successfully “retailer” a operate in itself. I gained’t go into an excessive amount of element about it, however in the event you’d wish to be taught extra I counsel taking a look at this glorious article
With the combinator although, we are able to change our code to the next:
🏃♂️🏃♂️🏃♂️(🏭(🏭(🏭(🧑🏭 y_combinator ⚒️(🧑🏭 whereas ⚒️(🧑🏭 n ⚒️(📋🥱y-combinate our whereas(📦 whereas 🔜 (🏭 (y_combinator) (whereas)))(🏭 (🏭 (whereas)(🧑🏭 bogus ⚒️ (🤔 ((n) ≤ (0)) ❓ (👎) : (👍))))(🧑🏭 bogus ⚒️ (📋(🖨️ (n))(📦 n 🔜 ((n) ➖ (1))))))))))😴Our y combinator operate – this permits for recursion with out e.g. self passingby successfully currying the operate and passing it to itself.⏰(🧑🏭 fn_nr ⚒️(🏭(🧑🏭 cc ⚒️(🏭 (fn_nr) (🧑🏭 x ⚒️ (🏭 (🏭 (cc) (cc)) (x)))))(🧑🏭 cc ⚒️(🏭 (fn_nr) (🧑🏭 x ⚒️ (🏭 (🏭 (cc) (cc)) (x))))))))(🧑🏭 whereas ⚒️(🧑🏭 condition_func ⚒️(🧑🏭 physique ⚒️(🤔 (🏭 (condition_func) (“BOGUS”)) ❓ (📋 (🏭 (physique) (“BOGUS”)) (🏭 (🏭 (whereas)(condition_func))(physique))) : (👎)))))) (🔢)) 🛑🛑🛑
Now, discover how our name to whereas after it has been y-combinated⁷ solely entails passing the situation and the physique — we don’t must cross itself. and we nonetheless get
> python emoji_lang.py y_comb_while.eml> 44.03.02.01.0
Congratulations! You’ve gotten now hopefully constructed your individual programming language and coded some enjoyable issues in it. Although one thing like EmojiLang clearly doesn’t have a lot actual use, you possibly can think about some cool use circumstances for constructing your individual languages, e.g. by creating an excellent case-specific scripting language to hurry up frequent duties.
Should you’d like some challenges, attempt the next workout routines:
Train 1: Construct a easy parser and interpreter for the next language with out utilizing ANTLR. Make sure that parenthesis all the time take priority, and that operations in any other case have equal priority (e.g. 4 + 4 * 3 ought to consider to 24 , not 16 )
EXPR = quantity| EXPR + EXPR| EXPR – EXPR| EXPR * EXPR| EXPR / EXPR| (EXPR)
Train 2: Modify your above code so as to add operator priority
Train 3 (Tough): We don’t need to make all of our features nameless. Attempt implementing an interpreter for the next language (you should utilize ANTLR, however you’ll have to put in writing your individual .g4 file):
program = (FUNDEF | EXPR)* // a number of operate definitions or expressions
// NOTE: <one thing> implies that one thing is a string// additionally, be at liberty to disregard whitespace or add semicolons or parenthesize// expressions/fundefs in the event you please
EXPR = quantity| functionApplication| computation
FUNDEF = ‘def’ <title> ‘(‘ <args>* ‘):’ EXPR
functionApplication = <title> ‘(‘ EXPR* ‘)’ // e.g. f(1, 2+2, g(3))computation = EXPR + EXPR| EXPR – EXPR| EXPR * EXPR| EXPR / EXPR| (EXPR)
Train 4 (Simple →Very Very Laborious): .g4 information for a ton of actual languages might be discovered right here. Implement an interpreter for any one in all them.
Please contact mchak@calpoly.edu for any inquiries
P.S. Thanks to Brian Jones, my CS 430 professor, for instructing me a number of these things.
All photographs by creator until in any other case acknowledged.