Compilers 101: 2007

Wednesday, April 11, 2007

Exam topics

Here is a preliminary list of exam topics

Analysis
Dynamic - what is?
Static
Abstract interpretation

Garbage Collection
Copying
In-place
Conservative

Overload Resolution
read more

LR(1) grammars
create
prove
describe tables used in parsing

Ambiguous grammars
convert

Continuations

Attribute grammars in linear time??

Implementation of arrays

Implementation of higher order functions

Thursday, April 05, 2007

It's Over!

Ugh, what a term this has been. This compiler project has not been the most joyful experience for sure, but its had its moments. Particularly great was the end where things started working and coming together, feature after feature being added. Its only made me realize that you can just keep going on a project like this. It really is a full time job to work on a compiler.

I'm glad its over, but I still thirst to write a better compiler.

Array Tracing

Grammatically in ADA, there is no way to determine the difference between an array and a function call. So, in our abstract syntax tree, they both share the same type of node. However, as we traverse the abstract syntax tree to build our intermediate representation we run into array and function declarations and allocate space for them on the stack. So there, that's how I determine the difference. We keep a separate list for variables/arrays and functions.

Arrays were simple but converting between an AST, IRT, to Assembly was mind-boggling since implementing arrays was the first assembly chore I had to do.

Anyways, I just wrote checks for our array access it's about 14 lines of assembly code. This includes run-time bounds checking and a function call. I can think of a few small things to bring it down. I've reused registers and all that but I don't know.

So after all this, arrays are done, that's 2 assembly instructions for declarations, 9 for assignment and 14 for access.

By the way these are one dimensional static arrays. Multi-dimensional arrays are rather simple. Dynamic arrays, I still think I can do.

Anywho, I'm going to switch the back-end of this compiler and convert it to x86. I'm going to keep it as ADA and then try to implement other compiler type things. But other than that, that ends compilers for me. It's been interesting.

Sunday, April 01, 2007

You Don't Have To Go Home But You Can't Stay Here

Am I supposed to capitalize all those words? I don't know.

So we're at the end of our compiler. What has happened?

Abstract Syntax Tree

Our compiler has changed a fair amount. Our parser now generates code.

We moved from SLR to LALR because LALR was slightly more robust in allowing us to express one section for a grammar. Ian then modified the provided tool named ILALR to write some lisp code. The tool reads the grammar which has a function attached to each rule. It then processes the grammar into lisp code that create an array where each index of the array corresponds to a rule and the value of the array corresponds to the associated function for that rule.

During the parsing phase, every time we perform a shift on that rule, we execute the function. Each function then performs an actions that would create an abstract syntax tree.

This is a huge step up from dynamically reading in the grammar during run-time and this has greatly increased performance.

Semantic Analysis

This is fun. Since our previous assignment was based on a parse tree and not an abstract syntax tree, our non-existent scoping and typing were crushed. Now we traverse a tree in lisp (which was odd in its own right). Classes in lisp are really classes. They're more sexed up macros. And classes don't have functions. Rather, you create a generic function FOR a base class and overload the method on that class.

btw, there is no way to output the elements of a class without explicitly performing it.

Ian has been on typing for, I think, the best part of 6 days now. Unification has been more challenging that it seems.

Intermediate Representation

There are two purposes for intermediate representation that I can think of. The first is to create a generic structure that abstracts the back-end so that you can create different interpreters for the intermediate representation tree to assembly. The second is for optimization purposes. How? I haven't figured that out yet.

I think we made a reasonbly good decision in going for intermediate representation. If only, to find out that it wasn't a wise choice. I believe this allowed us to modularize the tasks and not have two people stranded in one area.

I currently have the job converting the abstract syntax tree to an intermediate representation tree. (I know Ian will hate that term =P)

Assembly

Dimitri's job is to take care of converting our intermediate representation to assembly. Our target architecture is SPARC V8.

Opinion

So far, I've been a grunt coder. I haven't made that many important design decisions. The abstract syntax tree was architected (wrd?) by Ian. Dimitri had come up with the intermediate representation. My tasks were initially supposed to be switched between Ian and I. However, while Ian was setting up scoping, he ran the nightmare of typing.

The situation may have turned out more ideal than I first thought because since I worked on the abstract syntax tree, I could transition to Dimitri's side. So far, there was a lot of coding with no results on my end. So I was blue-balling coding for a long while.

Anyways, in case you were wondering, Dimitri and I haven't put THAT much consideration in designing our intermediate representation. We realized with a targeted architecture, we didn't need something as abstract as what were specified in our textbooks. We chose instructions that could get us a small set of results that we planned: expressions, ifs, conditionals, procedures, functions, outputting, ...

By the by, we use gcc as our linker and assembler to finish off our compiler. We don't do back-end in this course. Anyways, if you're interested, here is an example of what we produce.

test-basic.adb

-- CS444_04: dgnidash, iwjhalli, rftluong
-- basic test program

package body Main is
b,a : integer;
begin
a := 2;
b := 3;
if (a < b) then
output("%d",b+a);
end if;
--output("%d \n", b);
--some comments
end Main;

Intermediate Representation

0: S1: seq
1: storemem
2: tmpname : (a)
2: storemem
3: tmpname : (__uniq8)
3: intcnst : val (2)
0: S2: seq
1: S1: seq
2: storemem
3: tmpname : (b)
3: storemem
4: tmpname : (__uniq7)
4: intcnst : val (3)
1: S2: seq
2: S1: seq
3: S1: seq
4: storemem
5: tmpname : (__uniq1)
5: condstmt
6: tmpname : (a)
6: LT
6: tmpname : (b)
3: S2: seq
4: ifstmt
5: boolcond : __uniq1
5: if-true :
5: S1: seq
6: storemem
7: tmpname : (__uniq2)
7: S1: seq
8: storemem
9: tmpname : (__uniq3)
9: storemem
10: tmpname : (__uniq4)
10: strcnst : val (%d)
7: S2: seq
8: S1: seq
9: storemem
10: tmpname : (__uniq5)
10: storemem
11: tmpname : (__uniq6)
11: addop : L(b), R(a)
8: S2: seq
9: func : printf ((__uniq3 __uniq5))
5: S2: seq
6: nop
5: if-false :
5: nop
2: S2: seq
3: nop

test-basic.s

.global .umul
.align 4
.global main
main:
save %sp, -232, %sp
mov 2, %l3
st %l3, [%fp-20]
st %l3, [%fp-24]
mov 3, %l3
st %l3, [%fp-28]
st %l3, [%fp-32]
ld [%fp-24], %l1
ld [%fp-32], %l2
cmp %l1, %l2
bge CMP_F_0
set 1, %l3
b CMP_AFT_0
nop
CMP_F_0:
set 0, %l3
CMP_AFT_0:
st %l3, [%fp-36]
ld [%fp-36], %l1
set 1, %l2
cmp %l1, %l2
bne IF_FALSE_1
sethi %hi(string_0), %l3
or %l3, %lo(string_0), %l3
st %l3, [%fp-40]
st %l3, [%fp-44]
ld [%fp-32], %l1
ld [%fp-24], %l2
add %l2, %l1, %l3
st %l3, [%fp-48]
st %l3, [%fp-52]
ld [%fp-44], %o0
ld [%fp-52], %o1
call printf, 0
nop
mov %o0, %l3
st %l3, [%fp-56]
b IF_END_1
nop
IF_FALSE_1:
IF_END_1:

nop
mov 0, %g1
mov %g1, %i0
ret
restore
.align 8
string_0:
.asciz "%d"

Output

> 5

So far we have the following things working:

assignments, expressions, conditionals, and output.

Tuesday, March 06, 2007

Ensign's Log

Here's a rundown of the languages select by the compiler groups of Winter 2007: Lisp, Scheme, Haskell, Python, Java, C++.

I think it's kinda cool.

Tonight's topic are logs and here's a breaking down of one of our teammates thanks to our all-nighter:

Changeset [164] by ianwjhalliday@gmail.com

I'm gonna pull a Dimitri/Rob and not actually comment about my check in... well, ok, symbol_table was rewritten a fair bit, and we now throw package names into the symbol table and can detect the simple error in test-declare-3.adb.

Changeset [173] by ianwjhalliday@gmail.com

Poopy pants

Changeset [174] by ianwjhalliday@gmail.com

Buncha changes. I don't know whats in here, but I know it works on my system.

Despite his breakdown I think this opens an interesting avenue of discussion:

What do you think makes a reasonable log?

I think what makes a good log is to first start by thinking who and how someone would use your log. This leads to me believe a log has two major purposes: 1) to mark progress 2) to note changes that will be useful for bug fixing later on. I'm under the opinion that one should keep an svn log message brief but concise. That means that one would want to note the changes that they've made at a high-level. And if the verbosity of a log is in question then I'd prefer to err on the side of simplicity. Reparation details about how you fixed a bug and why should be left for bug reports. The case number can be referenced from the log itself if needed. And that sums up my thoughts of logs but it reeks of generalization so ...

Here are some of my logs:

04:40 Changeset [172] by robert.luong@gmail.com
Added the final touches to the testsuite.
04:39 Changeset [171] by robert.luong@gmail.com
Added in last few test cases for the night.
04:17 Changeset [170] by robert.luong@gmail.com
Updated scope arrays and scope overloading test cases.
04:11 Changeset [169] by robert.luong@gmail.com
Add duplicate procedure and package test cases.
04:06 Changeset [168] by robert.luong@gmail.com
Added 'panic mode error recovery' test case.

Those are rather poor because those are just updates to our document but here's one:

Changeset [148] robert.luong@gmail.com

Added support for rules to the state struct in parser.lisp. Now complete with debug-messages

And finally, here's one that I'll contrast with one of Ian's own logs

Changeset [134] robert.luong@gmail.com

Fixed bugs
#13 - added char literals, trimmed first and last character from characters and strings, and double "" from strings
#17 - fixed handling the case for scanning 1..9 by adding two buffers: one for handling last acceptable case and the amount that needs to be read

Here's Ian's.

Changeset [151] ianwjhalliday@gmail.com

Turns out that the compile-file function implicitly loads the file it is compiling. This makes sense. Thus, the following (load objfile) in our build-file function was unnecessary. It actually caused errors when using defconstant.

Anyways, what do you think constitutes a good repository log message?

Irreducible Complexity

Ian has spent some time on the grammar thus far. I'm not sure as to what the status of it is but its been grueling dealing with it apparently.

We have had some great issues in dealing with the symbol table, the parse tree, and panic-mode error recovery.

(Draft post, releasing two years later)

Saturday, March 03, 2007

Amateur-Ductivity

Individual Rant:

Snow Day!

I have been totally destroyed because of Snow Day. Thanks to the MS interviews during reading week I am now behind in both Programming Languages and my Compiler. Snow Day has now pushed my midterm and assignment into one fatal week.

Monday - Crypto Assignment
Tuesday - Compilers Assignment
Wednesday - Programming Languages Assignment | Crypto Midterm
Thursday - Programming Languages Midterm

BUT thanks to saving one late for programming languages, I get to do this!

Monday - Crypto Assignment
Tuesday - Compilers Assignment
Wednesday - Crypto Midterm
Thursday - Programming Languages Midterm
Friday - Programming Languages Assignment

I'm only taking three courses this term. That should be indicative of how much crap I have to do. I'm strangely calm about it though. I think after having a talk with Ian; I've tried to tame my stress a lot more.

Items Completed:
I got simple error recovery and the parse tree done. It should have only took a bit of time because I haven't worked much with the parser.

The parse tree is very simple. When we perform a shift, we create a new 'state' and push it onto the parse stack. When we perform a reduce, we pop the number of states that are required and put them into a list. We then create a new state, add those states as its children and then push it back into the parse stack. Bam, simple as that. There's one mention of it in the Compiler in C.

Simple error recovery is basically massaging the parse stack ok nce again. If we reach invalid input, we pop a state from the stack and try the input again. We do this until we run out of states. If we do, we through away that input and then reset our stack to this before hand. Very neat stuff with great results.

-- < fifteen minutes past> --

Someone broke the build.

@*#($&!!

-- < two minutes past > --

Ian ran a function I believe that had some archaic code left by Dim that referenced str. You can view this on changeset #138.

Anyways, Ian has been going through the grammar bit by bit perfecting it. We're hitting a trade-off area. Ian has been doing research for a good form of error recovery on suffix grammars. This allows us to parse in reverse order to find a proper match on the suffix. It's a lot better this way because then we can ignore an error and continue parsing on the suffix. The problem is that this is based on using the SLR tool for our grammars. Ian just ran into a portion in our grammar that requires using LALR grammars to sort it out. We could use the ILALR tool but then we can no longer find an easy way to perform our little fix.

Executive decision? We're going with LALR.

Devjavu Feature:

If you type #33 in a log for SVN. When you go to view it in Devjavu, it will directly link you to the bug list. Nifty!

Tuesday, February 20, 2007

SBCL Was a Problem, but the Run Script?

So I got an email from Tyrel Russell, our CS 444 TA. He says he can't run our program, gets some "fatal error in SBCL pid 12345" "can't find core file". I had no idea what that meant, so I said I'd come to his office to figure out what's wrong.

Well, we spent maybe 20 minutes trying to figure out what was wrong. Eventually, we tried setting the environment variables needed explicitly in his shell and then running the program without using the run script. Worked! Turns out his /usr/bin/sh shell wasn't sh, it was bash, and it required that the env vars be exported before they work.

What does this mean? Quickly hacked programs and scripts need to be tested on the environment they run in, or they need to be better written and understood (i.e. not quickly hacked). For assignment two Tyrel said we can just write in our documentation that he needs to set the environment variables before he is able to run our compiler. That's so much easier.

Sunday, February 11, 2007

Suite Dreams are Made of This ...

Test Suite

I updated the test suite. It was a natural extension of yours Dimitri. Now it loads all test cases from /incorrect/ and /correct/ that are subdirectories of /testsuite/. There are few Lisp resources on how to navigate through the filesystem. I think it's dependent perhaps on platform but nonetheless it took a while to get this working on SBCL. This line of code did not work on clisp the last time I checked:

(setf correct-directory (directory (make-pathname :name :wild :type "adb" :directory '(:relative "testsuite/correct"))))

That pulls the pathname of every file matching adb filetype inside said directory. Now our testsuite checks whether a file passes or fails and then tallies it up at the end. I've added output suppression to the branch in 0.9 so that we can simply see the results of executing our program.

Also testsuite.lisp must be executed from one directory below. The reason is because the load files are executed using a relative path to the executing one. So they're one directory off. I couldn't find a variable that I could edit that'll solve it. I may look into the source for this but as of now, we can only do this this way.

Please note that now we are editing out of the trunk. We're doing small regression tests to ensure that all the changes to the branch have been made.

Saturday, February 10, 2007

Cross-Coding

Hey people,

I'm going forward with my proposal. We already chose Lisp as our implementation language. That CANNOT be changed. So while we're here we may as well make the most of it. So I suggest that we switch tasks. We look at each other's code and critique them. The thing is that code review takes a long time and in school it's hard to justify the time spent. So after the first assignment Ian has went through certain bugs for the scanner and some other areas. What we're going to be doing is fix each other's bugs. The code reviewer is the creator and the reviewee is the person making the changes. So you must justify everything you changed to the previous person.

This makes sense to me and I hope everyone agrees. I think it's a great chance for us to get a better grasp of Lisp.

Anyways, all the best.

Ian's going to be looking at Dimitri's code.
Dimitri's going to be looking at my scanner.
I will be looking at the test suite and whatever else since Ian didn't produce any code ;)

Robert.

Thursday, February 08, 2007

Test Suite

I have modified a test suite to take in a file. This will simplify writing of test cases. Robert's dream come true:)

sbcl --load testsuite/test_parser.lisp

runs all the Ada files and reports the results. Can be modified to look for ALL the files in the directory.

I also fixed a bunch of errors with the grammar.

Will be working on a parse tree next. I really like Compilers when there are no deadlines;)

Tuesday, February 06, 2007

Job Positions Available: 1 Architectual Astronaut; 1 Python Evangelist; 1 Quality Assurer

So we're a weird team. We successfully handed in our first assignment. Here are some random stats:

114 revisions

109 build_table.lisp
34 compiler.lisp
32 error.lisp
90 parse_table.lisp
145 parser.lisp
13 run
309 scanner.lisp
21 start.lisp
101 symbol_table.lisp
117 utility.lisp
971 total lines of code
17 page documentation
29 hours awake
25 test cases

I'm a bit beat.

Anyways we got it done. We're going to have to rewrite the parser and fix a bug in the scanner. We also have to merge the changes from the branch into the trunk and then fix the trunk. The grammar is also broken.

I may suggest that we revolve our tasks. Perhaps have Dimitri look at the scanner, Ian at the parser, and I at the grammar. This is so that everyone is familiar with everyone else's code. We may also be able to code review and make relevant suggestions.

I personally want to write some good unit tests and a sexy testing suite. I know that no one else is on board with this so this will be spent during lobut-time.

So ... teamwork. A group of people working towards a common goal. It's strange. I think we all have a common goal but just different paths to getting there. Ian would be our architectual astronaut at one extreme. Dimitri would be the coder whom wants to get things done. I think I'm somewhere in the middle. So we have conflicting approaches to tasks when it comes to our compiler.

I'm not exactly sure where my astronaut and python stand but here's where I do. There should be no disagreement that you should know what you're doing before you code. The real issue to me is how much is enough thinking before we code? How much time do we take to model and think before we code? How many times should we refactor? My answer? It all depends. If you ample time and the model is unclear then I think it's wise to learn more about the facets of the language that would be helpful. Take some time to push the borders of your knowledge and try something new. If ever you feel as though you know your model then you should start coding. If it's crunch time then you get your ass over the computer and hack like a biotch. Those are the guidelines that I follow. It hardly ever always works out for me but I think it's reasonable.

It's 2AM and She Calls Me 'Cuz I'm Still Awake

Alright, the scanner went through a number of revisions. The latest one that we have right now, I am most satisfied with. It handles error-detection for a lack of a better term sex-tastically.

Unfortunately, Ian and I performed a conversion for the scanner and lisp for greater readability and maintainability. The tokens were originally in lists and we converted them into defstructs. During this conversion Ian ran into an error that bypassed Dimitri's eye in the grammar. Unfortunately, Ian was not able to resolve this error. The changes that Ian and I made were trashed. Huge amount of time was lost due to this.

So we had to branch our code and go back from revision 81 to 74. During this branch I accidentally removed my directory with my copies of an error formatter that were not checked-in and I lost a decent amount of work.

Dimitri realized that he had been running one of Cormack's tool with the command

./slr -c -l [file]

What this does is use the slr tool to generate tables for our grammar. The -l tag outputs human readable output. The problem with that? The errors are printed at the top and not the bottom. Our SLR grammar has been unreadable for who knows how long.

Ian's implementation of the umm ... error recovery has been deemed infeasible given time constraints. Ian's working on writing the document. I personally am working on the writing up the scanner and writing tests for the scanner. Even though Dimitri is against this I would prefer a robust scanner but Dimitri wants me to run tests on the parser. I am alotting myself a 30 minutes to recreate the error reporting module that was destroyed. 20 minutes to make the scanner more robust.

After that, I am going to join Dimitri on the parser until sunrise. After submission however I am not going to compilers. I'm going to be catching up on PL.

Some bad design decisions were made. Ian realized that his archaic executable ways were no longer welcome =). Dimitri is still all about getting it done. I am working along as always. I don't feel so bad though. When I look to the left I see some facebook and cyber-loafing. It's 3am on the dot. By 4am hopefully I'll be joining Dimitri.

Wish me luck.

Sunday, February 04, 2007

It Is A Good Day to Code

We're making decent progress.

I'm working on making the scanner more robust. Ian made a lotta grammars for the parser. The parser's a block box to me right now. I have two more DFAs to finish. Afterwards I'm going to ship it to Dimitri and he can include this into his project. After that I'm going to finish the last two DFAs, then do unit tests, then do error handling.

Poor modelling had cost me over a day. My abstract model for the scanner was a bit poor and Ian helped me out.

(Was a draft post but I am now releasing it ... 2 years later)

Saturday, February 03, 2007

Vent Re: Lisp Suckage; Screw packages

Common Lisp is terrible. Terrible programming language. I don't care if its functional and dynamically typed, its libraries suck and its symbol/package system is a huge pain in the ass to work with.

I understand that its typically used from a lisp top level environment, but no one in the real world uses programs like that. No real programs require you to modify your runtime's initialization configs/scripts. No real programs require you to run them from within a toplevel or require the source/object files to be loaded in the correct order by you (or some script).

Maybe I'm wrong, maybe real world programs do require all that. Maybe thats what deployment means in web applications *shudder*.

Dependencies in Lisp seem to be a nightmare. Building a self contained executable seems impossible, and building an executable that works with multiple modules and packages seems not to work at all. The best I can come up with is not using packages at all. Essentially all our sources will be concatenated together into one big source file. Awful. Just awful.

I'm going to leave it at that for now. There isn't any more time available to spend on figuring out how to do this right. Next step, command line arguments.

Friday, February 02, 2007

A Compiler by any Other Name

So ... this is what brainstorming gives us.

The language that we're compiling is ADA/CS. I've decided to call is DAD. It's ADA spelt backwards!!! (ahem)

Anyways, if you take the first letter of our names: Dimitri, Ian and Robert, you get. DIR.

So, naturally we came to Dirty as our name. You put it all together and you have the "Dirty Dad Compiler" or DDC for short.

Saturday, January 27, 2007

I'm Getting Testy

Alright, here's an update.

CS 444 has took an odd turn. The Ada/CS grammar has not yet been defined. The class decides what is in the grammar and what isn't. I'm under the impression that this is going to be clarified by the test cases that we were supposed to submit.

The original assignment 1 deadline which was schedule for Jan 30th has been changed. Jan 30th is now a deadline to submit test cases. February 6th is now the new date to submit our scanner/parser.

Our current roles are as follows: Robert | Scanner, Ian | Bootstrapper, Dimitri | Parser.

My role seems pretty clear. I'm going to write a few functions that the parser will call to get tokens. I'm going to change it such that I pass the line the token is in, and the type of token it is. Unfortunately, Ian and I have been sidetracked. We've been doing Ada tutorials. I'm on Object-Oriented Programming ... Chapter 7 out of 18.

After that, I plan to come up with about 10 test cases. Although, I'm pretty sure I can think of a hell of a lot more. But my goal is 10 test cases by Sunday noon.

Feelings:

Programming languages is becoming a bigger nuisance than I had hoped. It's not a birdie course and Lushman is pushing this course pretty hard.

Programming skill. I'm worried. I won't lie. In my mind, to program a medium-sized project adequately requires a certain grasp of a language. I don't know how to define it but I can certainly say that we're not there. Ian is the closest, me second, and then Dimitri. The problems that this causes is a waste of time on fundamental concepts. Granted, people can say I can 'look-up' what I need. But that statement is based on one big assumption, that is, I-know-what-I-don't-know. And I don't. I'm clueless. On top of learning Lisp, and Ada we have programming languages handing our asses to us.

I'm far from giving up though. I'm Chapter 8 in Lisp and Chapter 7 in Ada.

Tuesday, January 16, 2007

Do you feel me?

Here's a status update.

I've "started" (and I use that term loosely) the scanner, I've created a transducer that reads in a file and compares it to a literal DFA that I created and turned into a table. It's not fully working yet. The table can read DFAs, but I just haven't cleaned up after I decided to output all my input into another file. I'm not sure how we would want to separate keywords from literals when translating. Unless we decide to put everything in memory which is a tad silly. I don't know fellas.

We may have bitten off more than we can chew. I'm concerned that I have little debugging knowledge. I found some information about how to go about build a unit testing framework. I'll put the zip file up here.

Anyways, I'm going to try and get programming languages done as soon as humanly possible and cryptography so that I may put full effort into this.

Peace.

Friday, January 12, 2007

DevjaVu Permissions Updated

I updated permissions on DevjaVu such that only the three of us can edit the wiki, edit the trac stuff, and only the three of us can access the svn repository. When svn asks for your username and password use your DevjaVu credentials.

I'm looking into whether there are SVN environment variables that can be set so we don't have to keep putting in our login and password to do SVN commands.

Tuesday, January 02, 2007

Go Time

Well, I'm all settled in at Village 1 now. Compilers on Thursday. I did not finish the Lisp book. Bah. Lets not even talk about the compilers text.