Wednesday, April 11, 2007

Exam topics

Here is a preliminary list of exam topics:

Analysis
Dynamic (what is it?)
Static
Abstract interpretation

Garbage Collection
Copying
In-place
Conservative

Overload Resolution
read more

LR(1) grammars
create
prove
describe tables used in parsing


Ambiguous grammars
convert

Continuations

Attribute grammars in linear time??

Implementation of arrays

Implementation of higher order functions

Thursday, April 05, 2007

It's Over!

Ugh, what a term this has been. This compiler project has not been the most joyful experience for sure, but it's had its moments. Particularly great was the end, where things started working and coming together, feature after feature being added. It's only made me realize that you can just keep going on a project like this. It really is a full-time job to work on a compiler.

I'm glad it's over, but I still thirst to write a better compiler.

Array Tracing

Grammatically, in Ada there is no way to tell the difference between an array access and a function call; both use parentheses. So, in our abstract syntax tree, they share the same type of node. However, as we traverse the abstract syntax tree to build our intermediate representation, we run into array and function declarations and allocate space for them on the stack. That's how I tell the difference: we keep separate lists for variables/arrays and for functions.
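The separate-lists trick can be sketched like this (a Python sketch for clarity, not our actual Lisp code; the names are made up for illustration):

```python
# Sketch: in Ada, A(I) could be an array access or a function call.
# The grammar can't tell them apart, so we decide during AST -> IR
# conversion by looking the name up in the declaration lists.
# (All names here are hypothetical.)

def classify_call_or_index(name, arrays, functions):
    """Return 'array-access' or 'function-call' for an A(I)-style node."""
    if name in arrays:
        return "array-access"
    if name in functions:
        return "function-call"
    raise NameError(f"undeclared identifier: {name}")

arrays = {"buffer", "scores"}
functions = {"output", "compute"}

print(classify_call_or_index("buffer", arrays, functions))   # array-access
print(classify_call_or_index("compute", arrays, functions))  # function-call
```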

Arrays were simple, but converting from the AST to the IRT to assembly was mind-boggling, since implementing arrays was the first assembly chore I had to do.

Anyways, I just wrote the checks for our array access; it's about 14 lines of assembly code, including run-time bounds checking and a function call. I can think of a few small things to bring it down. I've reused registers and all that, but I don't know.

So after all this, arrays are done: 2 assembly instructions for declaration, 9 for assignment, and 14 for access.
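The logic that those 14 lines of access code implement can be sketched at a high level (a Python sketch of the idea, not the SPARC output; the bound and offset values are assumptions):

```python
def array_access(base, low, high, index, elem_size=4):
    """Compute the stack address of arr(index) with a run-time
    bounds check, mirroring what the generated assembly does:
    check the index against the declared bounds, then offset
    from the array's base address in the frame."""
    if index < low or index > high:
        # the generated code branches to an error routine here
        raise IndexError("array index out of bounds")
    return base + (index - low) * elem_size

# e.g. arr : array (1 .. 10) of integer at (hypothetical) frame offset -64
addr = array_access(base=-64, low=1, high=10, index=3)
print(addr)  # -56
```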


By the way, these are one-dimensional static arrays. Multi-dimensional arrays would be rather simple to add. Dynamic arrays, I still think I can do.

Anywho, I'm going to switch the back-end of this compiler and convert it to x86. I'm going to keep the source language as Ada and then try to implement other compiler-type things. But other than that, that ends compilers for me. It's been interesting.

Sunday, April 01, 2007

You Don't Have To Go Home But You Can't Stay Here

Am I supposed to capitalize all those words? I don't know.

So we're at the end of our compiler. What has happened?

Abstract Syntax Tree

Our compiler has changed a fair amount. Our parser now generates code.

We moved from SLR to LALR because LALR was slightly more robust in letting us express one section of our grammar. Ian then modified the provided tool, named ILALR, to emit some Lisp code. The tool reads the grammar, which has a function attached to each rule. It then processes the grammar into Lisp code that creates an array where each index corresponds to a rule and the value at that index is the function associated with that rule.

During the parsing phase, every time we perform a reduce by a rule, we execute its function. Each function performs an action that builds a piece of the abstract syntax tree.

This is a huge step up from dynamically reading in the grammar at run-time, and it has greatly increased performance.
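The rule-indexed action array works roughly like this (a Python stand-in for the generated Lisp; the grammar rules and action functions here are invented for illustration):

```python
# Each grammar rule gets an index; actions[i] is the semantic action
# that builds an AST node when the parser reduces by rule i.

def make_binop(children):          # e.g. expr -> expr '+' term
    left, op, right = children
    return ("binop", op, left, right)

def make_num(children):            # e.g. term -> NUMBER
    return ("num", children[0])

actions = [make_binop, make_num]   # array index == rule number

def reduce_by(rule_index, children):
    """Run the semantic action attached to the rule being reduced."""
    return actions[rule_index](children)

node = reduce_by(0, [("num", 2), "+", ("num", 3)])
print(node)  # ('binop', '+', ('num', 2), ('num', 3))
```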

Semantic Analysis

This is fun. Since our previous assignment was based on a parse tree and not an abstract syntax tree, our non-existent scoping and typing were crushed. Now we traverse a tree in Lisp (which was odd in its own right). Classes in Lisp aren't really classes; they're more like sexed-up macros. And classes don't have methods of their own. Rather, you create a generic function for a base class and then specialize methods on particular classes.

btw, there is no way to output the elements of a class without doing it explicitly.
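For readers unfamiliar with CLOS-style generic functions, Python's `functools.singledispatch` gives a rough analogy: behavior attaches to a generic function per class, rather than living inside the class. This is just an illustrative sketch, not our compiler code:

```python
from functools import singledispatch

class Num:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

# One generic function; each class gets its own specialized method,
# CLOS-style, instead of an ordinary method defined inside the class.
@singledispatch
def evaluate(node):
    raise TypeError(f"no method for {type(node).__name__}")

@evaluate.register
def _(node: Num):
    return node.value

@evaluate.register
def _(node: Add):
    return evaluate(node.left) + evaluate(node.right)

print(evaluate(Add(Num(2), Num(3))))  # 5
```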

Ian has been on typing for, I think, the better part of 6 days now. Unification has been more challenging than it seems.

Intermediate Representation

There are two purposes for an intermediate representation that I can think of. The first is to create a generic structure that abstracts the back-end, so that you can write different translators from the intermediate representation tree to assembly. The second is optimization. How? I haven't figured that out yet.

I think we made a reasonably good decision in going for an intermediate representation, even if only to find out that it wasn't a wise choice. I believe this allowed us to modularize the tasks and not have two people stranded in one area.

I currently have the job converting the abstract syntax tree to an intermediate representation tree. (I know Ian will hate that term =P)

Assembly

Dimitri's job is to take care of converting our intermediate representation to assembly. Our target architecture is SPARC V8.

Opinion

So far, I've been a grunt coder. I haven't made that many important design decisions. The abstract syntax tree was architected (wrd?) by Ian. Dimitri came up with the intermediate representation. My tasks were initially supposed to be switched between Ian and me. However, while Ian was setting up scoping, he ran into the nightmare of typing.

The situation may have turned out better than I first thought: since I worked on the abstract syntax tree, I could transition to Dimitri's side. Up to then, there had been a lot of coding with no results on my end. So I was blue-balling coding for a long while.

Anyways, in case you were wondering, Dimitri and I haven't put THAT much consideration into designing our intermediate representation. We realized that with a single target architecture, we didn't need something as abstract as what was specified in our textbooks. We chose instructions that could get us the small set of results that we planned: expressions, ifs, conditionals, procedures, functions, outputting, ...
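The node kinds in our IR dumps (seq, storemem, intcnst, condstmt, and so on) can be sketched as plain data (a Python sketch of the shape only; the real structures are Lisp):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Seq:
    """S1/S2 pair chaining two statements."""
    first: object
    second: Optional[object]

@dataclass
class StoreMem:
    """storemem: write a value into a named temporary."""
    dest: str
    value: object

@dataclass
class IntConst:
    """intcnst: integer literal."""
    val: int

@dataclass
class CondStmt:
    """condstmt: a comparison such as LT producing a boolean temp."""
    left: object
    op: str
    right: object

# "a := 2" shows up in the dump roughly as:
tree = Seq(StoreMem("a", StoreMem("__uniq8", IntConst(2))), None)
print(tree.first.dest)  # a
```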

By the by, we use gcc as our linker and assembler to finish off our compiler. We don't do back-end in this course. Anyways, if you're interested, here is an example of what we produce.


test-basic.adb

-- CS444_04: dgnidash, iwjhalli, rftluong
-- basic test program

package body Main is
   b,a : integer;
begin
   a := 2;
   b := 3;
   if (a < b) then
      output("%d",b+a);
   end if;
   --output("%d \n", b);
   --some comments
end Main;


Intermediate Representation

0: S1: seq
1: storemem
2: tmpname : (a)
2: storemem
3: tmpname : (__uniq8)
3: intcnst : val (2)
0: S2: seq
1: S1: seq
2: storemem
3: tmpname : (b)
3: storemem
4: tmpname : (__uniq7)
4: intcnst : val (3)
1: S2: seq
2: S1: seq
3: S1: seq
4: storemem
5: tmpname : (__uniq1)
5: condstmt
6: tmpname : (a)
6: LT
6: tmpname : (b)
3: S2: seq
4: ifstmt
5: boolcond : __uniq1
5: if-true :
5: S1: seq
6: storemem
7: tmpname : (__uniq2)
7: S1: seq
8: storemem
9: tmpname : (__uniq3)
9: storemem
10: tmpname : (__uniq4)
10: strcnst : val (%d)
7: S2: seq
8: S1: seq
9: storemem
10: tmpname : (__uniq5)
10: storemem
11: tmpname : (__uniq6)
11: addop : L(b), R(a)
8: S2: seq
9: func : printf ((__uniq3 __uniq5))
5: S2: seq
6: nop
5: if-false :
5: nop
2: S2: seq
3: nop



test-basic.s

.global .umul
.align 4
.global main
main:
save %sp, -232, %sp
mov 2, %l3
st %l3, [%fp-20]
st %l3, [%fp-24]
mov 3, %l3
st %l3, [%fp-28]
st %l3, [%fp-32]
ld [%fp-24], %l1
ld [%fp-32], %l2
cmp %l1, %l2
bge CMP_F_0
set 1, %l3
b CMP_AFT_0
nop
CMP_F_0:
set 0, %l3
CMP_AFT_0:
st %l3, [%fp-36]
ld [%fp-36], %l1
set 1, %l2
cmp %l1, %l2
bne IF_FALSE_1
sethi %hi(string_0), %l3
or %l3, %lo(string_0), %l3
st %l3, [%fp-40]
st %l3, [%fp-44]
ld [%fp-32], %l1
ld [%fp-24], %l2
add %l2, %l1, %l3
st %l3, [%fp-48]
st %l3, [%fp-52]
ld [%fp-44], %o0
ld [%fp-52], %o1
call printf, 0
nop
mov %o0, %l3
st %l3, [%fp-56]
b IF_END_1
nop
IF_FALSE_1:
IF_END_1:

nop
mov 0, %g1
mov %g1, %i0
ret
restore
.align 8
string_0:
.asciz "%d"



Output

> 5


So far we have the following things working:

assignments, expressions, conditionals, and output.

Tuesday, March 06, 2007

Ensign's Log

Here's a rundown of the languages selected by the compiler groups of Winter 2007: Lisp, Scheme, Haskell, Python, Java, C++.

I think it's kinda cool.

Tonight's topic is logs, and here's a breakdown of one of our teammates thanks to our all-nighter:


Changeset [164] by ianwjhalliday@gmail.com

I'm gonna pull a Dimitri/Rob and not actually comment about my check in... well, ok, symbol_table was rewritten a fair bit, and we now throw package names into the symbol table and can detect the simple error in test-declare-3.adb.



Changeset [173] by ianwjhalliday@gmail.com

Poopy pants



Changeset [174] by ianwjhalliday@gmail.com

Buncha changes. I don't know whats in here, but I know it works on my system.


Despite his breakdown I think this opens an interesting avenue of discussion:

What do you think makes a reasonable log?

I think the way to write a good log is to start by thinking about who would use your log and how. This leads me to believe a log has two major purposes: 1) to mark progress, and 2) to note changes that will be useful for bug fixing later on. I'm of the opinion that one should keep an svn log message brief but precise. That means noting the changes you've made at a high level. And if the verbosity of a log is in question, I'd prefer to err on the side of simplicity. Details about how you fixed a bug and why should be left for bug reports; the case number can be referenced from the log itself if needed. And that sums up my thoughts on logs, but it reeks of generalization so ...

Here are some of my logs:


04:40 Changeset [172] by robert.luong@gmail.com
Added the final touches to the testsuite.
04:39 Changeset [171] by robert.luong@gmail.com
Added in last few test cases for the night.
04:17 Changeset [170] by robert.luong@gmail.com
Updated scope arrays and scope overloading test cases.
04:11 Changeset [169] by robert.luong@gmail.com
Add duplicate procedure and package test cases.
04:06 Changeset [168] by robert.luong@gmail.com
Added 'panic mode error recovery' test case.


Those are rather poor because they're just updates to our documents, but here's a better one:


Changeset [148] robert.luong@gmail.com

Added support for rules to the state struct in parser.lisp. Now complete with debug-messages


And finally, here's one that I'll contrast with one of Ian's own logs:


Changeset [134] robert.luong@gmail.com

Fixed bugs
#13 - added char literals, trimmed first and last character from characters and strings, and double "" from strings
#17 - fixed handling the case for scanning 1..9 by adding two buffers: one for handling last acceptable case and the amount that needs to be read


Here's Ian's.


Changeset [151] ianwjhalliday@gmail.com

Turns out that the compile-file function implicitly loads the file it is compiling. This makes sense. Thus, the following (load objfile) in our build-file function was unnecessary. It actually caused errors when using defconstant.


Anyways, what do you think constitutes a good repository log message?

Irreducible Complexity

Ian has spent some time on the grammar thus far. I'm not sure what its status is, but dealing with it has apparently been grueling.

We have had some great issues in dealing with the symbol table, the parse tree, and panic-mode error recovery.

(Draft post, releasing two years later)

Saturday, March 03, 2007

Amateur-Ductivity

Individual Rant:

Snow Day!

I have been totally destroyed because of Snow Day. Thanks to the MS interviews during reading week, I am now behind in both Programming Languages and my compiler. Snow Day has now pushed my midterm and assignment into one fatal week.

Monday - Crypto Assignment
Tuesday - Compilers Assignment
Wednesday - Programming Languages Assignment | Crypto Midterm
Thursday - Programming Languages Midterm

BUT thanks to saving one late day for Programming Languages, I get to do this!

Monday - Crypto Assignment
Tuesday - Compilers Assignment
Wednesday - Crypto Midterm
Thursday - Programming Languages Midterm
Friday - Programming Languages Assignment

I'm only taking three courses this term. That should be indicative of how much crap I have to do. I'm strangely calm about it though. I think after having a talk with Ian, I've tried to tame my stress a lot more.

Items Completed:
I got simple error recovery and the parse tree done. It should have taken only a bit of time, but I haven't worked much with the parser.

The parse tree is very simple. When we perform a shift, we create a new 'state' and push it onto the parse stack. When we perform a reduce, we pop the number of states that are required and put them into a list. We then create a new state, add those states as its children and then push it back into the parse stack. Bam, simple as that. There's one mention of it in the Compiler in C.
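The shift/reduce tree building described above can be sketched like this (a Python sketch; a "state" here is just a node with children):

```python
class Node:
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

def shift(stack, token):
    """Shift: push a new leaf state for the incoming token."""
    stack.append(Node(token))

def reduce(stack, rule_name, n):
    """Reduce: pop the n states the rule requires, make them
    children of a new state, and push it back onto the stack."""
    children = stack[-n:]
    del stack[-n:]
    stack.append(Node(rule_name, children))

stack = []
shift(stack, "a")
shift(stack, ":=")
shift(stack, "2")
reduce(stack, "assignment", 3)
print(stack[0].symbol, [c.symbol for c in stack[0].children])
# assignment ['a', ':=', '2']
```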

Simple error recovery is basically massaging the parse stack once again. If we reach invalid input, we pop a state from the stack and try the input again. We do this until we run out of states. If we do, we throw away that input and then reset our stack to what it was beforehand. Very neat stuff with great results.
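That recovery loop can be sketched as follows (Python sketch; `try_parse` is a hypothetical stand-in for the parser's action-table lookup, and the real thing lives in our Lisp parser):

```python
def try_parse(stack, token):
    """Hypothetical stand-in for the action-table lookup: here,
    a token parses only if the top state 'expects' it."""
    return bool(stack) and stack[-1].get("expects") == token

def recover(stack, token):
    """Pop states until the offending input parses; if the stack
    empties first, discard the token and restore the stack."""
    saved = list(stack)
    while stack:
        if try_parse(stack, token):
            return "resumed"
        stack.pop()
    stack[:] = saved      # ran out of states: restore, skip input
    return "discarded"

print(recover([{"expects": "then"}, {"expects": ";"}], "then"))  # resumed
print(recover([{"expects": ";"}], "end"))                        # discarded
```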

-- < fifteen minutes past> --

Someone broke the build.

@*#($&!!

-- < two minutes past > --

Ian ran a function, I believe, that had some archaic code left by Dim that referenced str. You can view this in changeset #138.


Anyways, Ian has been going through the grammar bit by bit, perfecting it. We're hitting a trade-off area. Ian has been researching a good form of error recovery based on suffix grammars. This lets us parse in reverse order to find a proper match on the suffix. It's a lot better this way, because then we can ignore an error and continue parsing on the suffix. The problem is that this is based on using the SLR tool for our grammars. Ian just ran into a portion of our grammar that requires LALR to sort out. We could use the ILALR tool, but then we can no longer find an easy way to perform our little fix.

Executive decision? We're going with LALR.


Devjavu Feature:

If you type #33 in an SVN log message, then when you go to view it in Devjavu, it will link you directly to the bug list. Nifty!