Wednesday, April 11, 2007

Exam topics

Here is a preliminary list of exam topics

Analysis
Dynamic - what is?
Static
Abstract interpretation

Garbage Collection
Copying
In-place
Conservative

Overload Resolution
read more

LR(1) grammars
create
prove
describe tables used in parsing


Ambiguous grammars
convert

Continuations

Attribute grammars in linear time??

Implementation of arrays

Implementation of higher order functions

Thursday, April 05, 2007

It's Over!

Ugh, what a term this has been. This compiler project has not been the most joyful experience for sure, but its had its moments. Particularly great was the end where things started working and coming together, feature after feature being added. Its only made me realize that you can just keep going on a project like this. It really is a full time job to work on a compiler.

I'm glad its over, but I still thirst to write a better compiler.

Array Tracing

Grammatically in ADA, there is no way to determine the difference between an array and a function call. So, in our abstract syntax tree, they both share the same type of node. However, as we traverse the abstract syntax tree to build our intermediate representation we run into array and function declarations and allocate space for them on the stack. So there, that's how I determine the difference. We keep a separate list for variables/arrays and functions.

Arrays were simple but converting between an AST, IRT, to Assembly was mind-boggling since implementing arrays was the first assembly chore I had to do.

Anyways, I just wrote checks for our array access it's about 14 lines of assembly code. This includes run-time bounds checking and a function call. I can think of a few small things to bring it down. I've reused registers and all that but I don't know.

So after all this, arrays are done, that's 2 assembly instructions for declarations, 9 for assignment and 14 for access.


By the way these are one dimensional static arrays. Multi-dimensional arrays are rather simple. Dynamic arrays, I still think I can do.

Anywho, I'm going to switch the back-end of this compiler and convert it to x86. I'm going to keep it as ADA and then try to implement other compiler type things. But other than that, that ends compilers for me. It's been interesting.

Sunday, April 01, 2007

You Don't Have To Go Home But You Can't Stay Here

Am I supposed to capitalize all those words? I don't know.

So we're at the end of our compiler. What has happened?

Abstract Syntax Tree

Our compiler has changed a fair amount. Our parser now generates code.

We moved from SLR to LALR because LALR was slightly more robust in allowing us to express one section for a grammar. Ian then modified the provided tool named ILALR to write some lisp code. The tool reads the grammar which has a function attached to each rule. It then processes the grammar into lisp code that create an array where each index of the array corresponds to a rule and the value of the array corresponds to the associated function for that rule.

During the parsing phase, every time we perform a shift on that rule, we execute the function. Each function then performs an actions that would create an abstract syntax tree.

This is a huge step up from dynamically reading in the grammar during run-time and this has greatly increased performance.

Semantic Analysis

This is fun. Since our previous assignment was based on a parse tree and not an abstract syntax tree, our non-existent scoping and typing were crushed. Now we traverse a tree in lisp (which was odd in its own right). Classes in lisp are really classes. They're more sexed up macros. And classes don't have functions. Rather, you create a generic function FOR a base class and overload the method on that class.

btw, there is no way to output the elements of a class without explicitly performing it.

Ian has been on typing for, I think, the best part of 6 days now. Unification has been more challenging that it seems.

Intermediate Representation

There are two purposes for intermediate representation that I can think of. The first is to create a generic structure that abstracts the back-end so that you can create different interpreters for the intermediate representation tree to assembly. The second is for optimization purposes. How? I haven't figured that out yet.

I think we made a reasonbly good decision in going for intermediate representation. If only, to find out that it wasn't a wise choice. I believe this allowed us to modularize the tasks and not have two people stranded in one area.

I currently have the job converting the abstract syntax tree to an intermediate representation tree. (I know Ian will hate that term =P)

Assembly

Dimitri's job is to take care of converting our intermediate representation to assembly. Our target architecture is SPARC V8.

Opinion

So far, I've been a grunt coder. I haven't made that many important design decisions. The abstract syntax tree was architected (wrd?) by Ian. Dimitri had come up with the intermediate representation. My tasks were initially supposed to be switched between Ian and I. However, while Ian was setting up scoping, he ran the nightmare of typing.

The situation may have turned out more ideal than I first thought because since I worked on the abstract syntax tree, I could transition to Dimitri's side. So far, there was a lot of coding with no results on my end. So I was blue-balling coding for a long while.

Anyways, in case you were wondering, Dimitri and I haven't put THAT much consideration in designing our intermediate representation. We realized with a targeted architecture, we didn't need something as abstract as what were specified in our textbooks. We chose instructions that could get us a small set of results that we planned: expressions, ifs, conditionals, procedures, functions, outputting, ...

By the by, we use gcc as our linker and assembler to finish off our compiler. We don't do back-end in this course. Anyways, if you're interested, here is an example of what we produce.


test-basic.adb

-- CS444_04: dgnidash, iwjhalli, rftluong
-- basic test program

package body Main is
b,a : integer;
begin
a := 2;
b := 3;
if (a < b) then
output("%d",b+a);
end if;
--output("%d \n", b);
--some comments
end Main;


Intermediate Representation

0: S1: seq
1: storemem
2: tmpname : (a)
2: storemem
3: tmpname : (__uniq8)
3: intcnst : val (2)
0: S2: seq
1: S1: seq
2: storemem
3: tmpname : (b)
3: storemem
4: tmpname : (__uniq7)
4: intcnst : val (3)
1: S2: seq
2: S1: seq
3: S1: seq
4: storemem
5: tmpname : (__uniq1)
5: condstmt
6: tmpname : (a)
6: LT
6: tmpname : (b)
3: S2: seq
4: ifstmt
5: boolcond : __uniq1
5: if-true :
5: S1: seq
6: storemem
7: tmpname : (__uniq2)
7: S1: seq
8: storemem
9: tmpname : (__uniq3)
9: storemem
10: tmpname : (__uniq4)
10: strcnst : val (%d)
7: S2: seq
8: S1: seq
9: storemem
10: tmpname : (__uniq5)
10: storemem
11: tmpname : (__uniq6)
11: addop : L(b), R(a)
8: S2: seq
9: func : printf ((__uniq3 __uniq5))
5: S2: seq
6: nop
5: if-false :
5: nop
2: S2: seq
3: nop



test-basic.s

.global .umul
.align 4
.global main
main:
save %sp, -232, %sp
mov 2, %l3
st %l3, [%fp-20]
st %l3, [%fp-24]
mov 3, %l3
st %l3, [%fp-28]
st %l3, [%fp-32]
ld [%fp-24], %l1
ld [%fp-32], %l2
cmp %l1, %l2
bge CMP_F_0
set 1, %l3
b CMP_AFT_0
nop
CMP_F_0:
set 0, %l3
CMP_AFT_0:
st %l3, [%fp-36]
ld [%fp-36], %l1
set 1, %l2
cmp %l1, %l2
bne IF_FALSE_1
sethi %hi(string_0), %l3
or %l3, %lo(string_0), %l3
st %l3, [%fp-40]
st %l3, [%fp-44]
ld [%fp-32], %l1
ld [%fp-24], %l2
add %l2, %l1, %l3
st %l3, [%fp-48]
st %l3, [%fp-52]
ld [%fp-44], %o0
ld [%fp-52], %o1
call printf, 0
nop
mov %o0, %l3
st %l3, [%fp-56]
b IF_END_1
nop
IF_FALSE_1:
IF_END_1:

nop
mov 0, %g1
mov %g1, %i0
ret
restore
.align 8
string_0:
.asciz "%d"



Output

> 5


So far we have the following things working:

assignments, expressions, conditionals, and output.