Coursework 5This coursework

2024-01-11

Coursework 5This coursework is worth 25% and is due on 12 January at 16:00. You are askedto implement a compiler targeting the LLVM-IR. Be careful that this CW needssome material about the LLVM-IR that has not been shown in the lectures andyour own experiments and research might be required. You can find information about the LLVM-IR at• https://bit.ly/3rheZYr• https://llvm.org/docs/LangRef.htmlYou can do the implementation of your compiler in any programming languageyou like, but you need to submit the source code with which you generatedthe LLVM-IR files, otherwise a mark of 0% will be awarded. You are asked tosubmit the code of your compiler, but also the generated .ll files. No PDFis needed for this coursework. You should use the lexer and parser from theprevious courseworks, but you need to make some modifications to them forthe ‘typed’ version of the Fun-language. I will award up to 5% if a lexer and aparser are correctly implemented.You will be marked according to the input files• sqr.fun• fact.fun• mand.fun• mand2.fun• hanoi.funwhich are uploaded to KEATS and Github.Exclamation-Triangle DisclaimerIt should be understood that the work you submit represents your own effort.You have not copied from anyone else. An exception is the Scala code I showedduring the lectures or uploaded to KEATS, which you can both use. You canalso use your own code from the CW 1 – CW 4. But do not be tempted to askGithub Copilot for help or do any other shenanigans like this!TaskThe goal is to lex and parse 5 Fun-programs, including the Mandelbrot programshown in Figure 1, and generate corresponding code for the LLVM-IR. Unfortunately the calculations for the Mandelbrot Set require floating point arithmeticand therefore we cannot be as simple-minded about types as we have been so1far (remember the LLVM-IR is a fully-typed language and needs to know theexact types of each expression). The idea is to deal appropriately with threetypes, namely Int, Double and Void (they are represented in the LLVM-IR asi32, double and void). You need to extend the lexer and parser accordinglyin order to deal with type annotations. The Fun-language includes global constants, such asval Ymin: Double = -1.3;val Maxiters: Int = 1000;where you can assume that they are ‘normal’ identifiers, just starting with acapital letter—all other identifiers should have lower-case letters. Function definitions can take arguments of type Int or Double, and need to specify a returntype, which can be Void, for exampledef foo(n: Int , x: Double) : Double = ...def id(n: Int) : Int = ...def bar () : Void = ...The idea is to record all typing information that is given in the Fun-program,but then delay any further typing inference to after the CPS-translation. Thatmeans the parser should generate ASTs given by the Scala dataypes:abstract class Expabstract class BExpabstract class Declcase class Def(name: String , args: List [( String , String )],ty: String , body: Exp) extends Declcase class Main(e: Exp) extends Declcase class Const(name: String , v: Int) extends Declcase class FConst(name: String , x: Double) extends Declcase class Call(name: String , args: List[Exp ]) extends Expcase class If(a: BExp , e1: Exp , e2: Exp) extends Expcase class Var(s: String) extends Expcase class Num(i: Int) extends Exp // integer numberscase class FNum(i: Double) extends Exp // floating numberscase class ChConst(c: Int) extends Exp // char constantscase class Aop(o: String , a1: Exp , a2: Exp) extends Expcase class Sequence(e1: Exp , e2: Exp) extends Expcase class Bop(o: String , a1: Exp , a2: Exp) extends BExpThis datatype distinguishes whether the global constant is an integer constantor floating constant. Also a function definition needs to record the return typeof the function, namely the argument ty in Def, and the arguments consist ofan pairs of identifier names and types (Int or Double). The hard part of the CWis to design the K-intermediate language and infer all necessary types in orderto generate LLVM-IR code. You can check your LLVM-IR code by running it2with the interpreter lli.Also note that the second version of the Mandelbrot program and also theTower of Hanoi program use character constants, like 'a', '1', '\n' and so on.When they are tokenised, such characters should be interpreted as the corresponding ASCII code (an integer), such that we can use them in calculationslike 'a' + 10 where the result should be 107. As usual, the character '\n' isthe ASCII code 10.LLVM-IRThere are some subtleties in the LLVM-IR you need to be aware of:• Global constants: While global constants such asval Max : Int = 10;can be easily defined in the LLVM-IR as follows@Max = global i32 10they cannot easily be referenced. If you want to use this constant then youneed to generate code such as%tmp_22 = load i32 , i32* @Maxfirst, which treats @Max as an Integer-pointer (type i32*) that needs to beloaded into a local variable, here %tmp_22.• Void-Functions: While integer and double functions can easily be calledand their results can be allocated to a temporary variable:%tmp_23 = call i32 @sqr (i32 %n)void-functions cannot be allocated to a variable. They need to be calledjust ascall void @print_int (i32 %tmp_23)• Floating-Point Operations: While integer operations are specified in theLLVM-IR as3def compile_op (op: String) = op match {case "+" => "add i32 "case "*" => "mul i32 "case "-" => "sub i32 "case "==" => "icmp eq i32 "case "!=" => "icmp ne i32 "case "<=" => "icmp sle i32 " // signed less or equalcase "<" => "icmp slt i32 " // signed less than}the corresponding operations on doubles aredef compile_dop (op: String) = op match {case "+" => "fadd double "case "*" => "fmul double "case "-" => "fsub double "case "==" => "fcmp oeq double "case "!=" => "fcmp one double "case "<=" => "fcmp ole double "case "<" => "fcmp olt double "}• Typing: In order to leave the CPS-translations as is, it makes sense todefer the full type-inference to the K-intermediate-language. For this it isgood to define the KVar constructor ascase class KVar(s: String , ty: Ty = "UNDEF") extends KValwhere first a default type, for example UNDEF, is given. Then you need todefine two typing functionsdef typ_val(v: KVal , ts: TyEnv) = ???def typ_exp(a: KExp , ts: TyEnv) = ???Both functions require a typing-environment that updates the information about what type each variable, operation and so on receives. Oncethe types are inferred, the LLVM-IR code can be generated. Since we aredealing only with simple first-order functions, nothing on the scale asthe ‘Hindley-Milner’ typing-algorithm is needed. I suggest to just lookat what data is avaliable and generate all missing information by “simplemeans”…rather than looking at the literature which solves the problemwith much heavier machinery.• Build-In Functions: The ‘prelude’ comes with several build-in functions:new_line(), skip, print_int(n), print_space(), print_star() and print_char(n).You can find the ‘prelude’ for example in the file sqr.ll.4// Mandelbrot program (without character constants)val Ymin: Double = -1.3;val Ymax: Double = 1.3;val Ystep: Double = 0.05; //0.025;val Xmin: Double = -2.1;val Xmax: Double = 1.1;val Xstep: Double = 0.02; //0.01;val Maxiters: Int = 1000;def m_iter(m: Int , x: Double , y: Double ,zr: Double , zi: Double) : Void = {if Maxiters <= mthen print_star ()else {if 4.0 <= zi*zi+zr*zr then print_space ()else m_iter(m + 1, x, y, x+zr*zr -zi*zi , 2.0* zr*zi+y)}};def x_iter(x: Double , y: Double) : Void = {if x <= Xmaxthen { m_iter (0, x, y, 0.0, 0.0) ; x_iter(x + Xstep , y) }else skip ()};def y_iter(y: Double) : Void = {if y <= Ymaxthen { x_iter(Xmin , y) ; new_line () ; y_iter(y + Ystep) }else skip ()};y_iter(Ymin)Figure 1: The Mandelbrot program in the ‘typed’ Fun-language.5Figure 2: Ascii output of the Mandelbrot program.6