forked from DoctorWkt/acwj
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Warren Toomey
committed
Dec 4, 2019
1 parent
db8a42f
commit 6136a47
Showing
303 changed files
with
7,924 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Define the location of the include directory | ||
# and the location to install the compiler binary | ||
INCDIR=/tmp/include | ||
BINDIR=/tmp | ||
|
||
HSRCS= data.h decl.h defs.h | ||
SRCS= cg.c decl.c expr.c gen.c main.c misc.c \ | ||
opt.c scan.c stmt.c sym.c tree.c types.c | ||
|
||
ARMSRCS= cg.c decl.c expr.c gen.c main.c misc.c \ | ||
opt.c scan.c stmt.c sym.c tree.c types.c | ||
|
||
cwj: $(SRCS) $(HSRCS) | ||
cc -o cwj -g -Wall -DINCDIR=\"$(INCDIR)\" $(SRCS) | ||
|
||
cwjarm: $(ARMSRCS) $(HSRCS) | ||
cc -o cwjarm -g -Wall $(ARMSRCS) | ||
cp cwjarm cwj | ||
|
||
install: cwj | ||
mkdir -p $(INCDIR) | ||
rsync -a include/. $(INCDIR) | ||
cp cwj $(BINDIR) | ||
chmod +x $(BINDIR)/cwj | ||
|
||
clean: | ||
rm -f cwj cwjarm *.o *.s out a.out | ||
|
||
test: install tests/runtests | ||
(cd tests; chmod +x runtests; ./runtests) | ||
|
||
armtest: cwjarm tests/runtests | ||
(cd tests; chmod +x runtests; ./runtests) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,270 @@ | ||
# Part 53: Mopping Up, part 2 | ||
|
||
In this part of our compiler writing journey, I fix a few annoying things | ||
that we use in the compiler's own source code. | ||
|
||
## Consecutive String Literals | ||
|
||
C allows the declaration of string literals by splitting them across | ||
multiple lines or as multiple strings, e.g. | ||
|
||
``` | ||
char *c= "hello " "there, " | ||
"how " "are " "you?"; | ||
``` | ||
|
||
Now, we could fix this problem up in the lexical scanner. However, I spent | ||
a lot of time trying to do this. The problem is that the code is now | ||
complicated with dealing with the C pre-processor, and I couldn't find a clean | ||
way of allowing consecutive string literals. | ||
|
||
My solution is to do it in the parser, with a bit of help from the code | ||
generator. In `primary()` in `expr.c`, the code that deals with string | ||
literals now looks like this: | ||
|
||
``` | ||
case T_STRLIT: | ||
// For a STRLIT token, generate the assembly for it. | ||
id = genglobstr(Text, 0); // 0 means generate a label | ||
// For successive STRLIT tokens, append their contents | ||
// to this one | ||
while (1) { | ||
scan(&Peektoken); | ||
if (Peektoken.token != T_STRLIT) break; | ||
genglobstr(Text, 1); // 1 means don't generate a label | ||
scan(&Token); // Skip it | ||
} | ||
// Now make a leaf AST node for it. id is the string's label. | ||
genglobstrend(); | ||
n = mkastleaf(A_STRLIT, pointer_to(P_CHAR), NULL, NULL, id); | ||
break; | ||
``` | ||
|
||
`genglobstr()` now takes a second argument which tells it if this | ||
is the first part of the string or a successive part of the string. | ||
Also, `genglobstrend()` now has the job of NUL terminating the string literal. | ||
|
||
## Empty Statements | ||
|
||
C allows both empty statements and empty compound statements, e.g. | ||
|
||
``` | ||
while ((c=getc()) != 'x') ; // ';' is an empty statement | ||
int fred() { } // Function with empty body | ||
``` | ||
|
||
and I use both of these in the compiler, so we need to support both of them. | ||
In `stmt.c`, the code now does this: | ||
|
||
``` | ||
static struct ASTnode *single_statement(void) { | ||
struct ASTnode *stmt; | ||
struct symtable *ctype; | ||
switch (Token.token) { | ||
case T_SEMI: | ||
// An empty statement | ||
semi(); | ||
break; | ||
... | ||
} | ||
... | ||
} | ||
struct ASTnode *compound_statement(int inswitch) { | ||
struct ASTnode *left = NULL; | ||
struct ASTnode *tree; | ||
while (1) { | ||
// Leave if we've hit the end token. We do this first to allow | ||
// an empty compound statement | ||
if (Token.token == T_RBRACE) | ||
return (left); | ||
... | ||
} | ||
... | ||
} | ||
``` | ||
|
||
and that fixes both shortcomings. | ||
|
||
## Redeclared Symbols | ||
|
||
C allows a global variable to later be declared extern, and an extern | ||
variable to be declared later as a global variable, and vice versa. | ||
However, the types of both | ||
declarations have to match. We also want to ensure that only one | ||
version of the symbol ends up in the symbol table: we don't want both a | ||
C_GLOBAL and a C_EXTERN entry! | ||
|
||
In `stmt.c` I've added a new function called `is_new_symbol()`. We call this | ||
after we have parsed the name of a variable and after we have tried | ||
to find it in the symbol table. Thus, `sym` may be NULL (no existing | ||
variable) or not NULL (is an existing variable). | ||
|
||
If the symbol exists, it's actually quite complicated to ensure that | ||
it's a safe redeclaration. | ||
|
||
``` | ||
// Given a pointer to a symbol that may already exist | ||
// return true if this symbol doesn't exist. We use | ||
// this function to convert externs into globals | ||
int is_new_symbol(struct symtable *sym, int class, | ||
int type, struct symtable *ctype) { | ||
// There is no existing symbol, thus is new | ||
if (sym==NULL) return(1); | ||
// global versus extern: if they match that it's not new | ||
// and we can convert the class to global | ||
if ((sym->class== C_GLOBAL && class== C_EXTERN) | ||
|| (sym->class== C_EXTERN && class== C_GLOBAL)) { | ||
// If the types don't match, there's a problem | ||
if (type != sym->type) | ||
fatals("Type mismatch between global/extern", sym->name); | ||
// Struct/unions, also compare the ctype | ||
if (type >= P_STRUCT && ctype != sym->ctype) | ||
fatals("Type mismatch between global/extern", sym->name); | ||
// If we get to here, the types match, so mark the symbol | ||
// as global | ||
sym->class= C_GLOBAL; | ||
// Return that symbol is not new | ||
return(0); | ||
} | ||
// It must be a duplicate symbol if we get here | ||
fatals("Duplicate global variable declaration", sym->name); | ||
return(-1); // Keep -Wall happy | ||
} | ||
``` | ||
|
||
The code is straight-forward but not elegant. Also note that any redeclared | ||
extern symbol is turned into a global symbol. This means we don't have to | ||
remove the symbol from the symbol table and add in a new, global, symbol. | ||
|
||
## Operand Types to Logical Operations | ||
|
||
The next bug I hit was something like this: | ||
|
||
``` | ||
int *x; | ||
int y; | ||
if (x && y > 12) ... | ||
``` | ||
|
||
The compiler evaluates the `&&` operation in `binexpr()`. To do this, | ||
it ensures that the types of each side of the binary operator are | ||
compatible. Well, if the operator above was a `+` then, definitely, | ||
the types are incompatible. But with a logical comparison, we can | ||
*AND* these together. | ||
|
||
I've fixed this by adding some more code to the top of `modify_type()` | ||
in `types.c`. If we are doing an `&&` or an `||` operation, then | ||
we need either integer or pointer types on each side of the operation. | ||
|
||
``` | ||
struct ASTnode *modify_type(struct ASTnode *tree, int rtype, | ||
struct symtable *rctype, int op) { | ||
int ltype; | ||
int lsize, rsize; | ||
ltype = tree->type; | ||
// For A_LOGOR and A_LOGAND, both types have to be int or pointer types | ||
if (op==A_LOGOR || op==A_LOGAND) { | ||
if (!inttype(ltype) && !ptrtype(ltype)) | ||
return(NULL); | ||
if (!inttype(ltype) && !ptrtype(rtype)) | ||
return(NULL); | ||
return (tree); | ||
} | ||
... | ||
} | ||
``` | ||
|
||
I've also realised that I've implemented `&&` and `||` incorrectly, so | ||
I'll have to fix that. Now now, but soon. | ||
|
||
## Return with No Value | ||
|
||
One other C feature that is missing is the ability to return from a | ||
void function, i.e. just leave without returning any value. However, | ||
the current parser expects to see parentheses and an expression after | ||
the `return` keyword. | ||
|
||
So, in `return_statement()` in `stmt.c`, we now have: | ||
|
||
``` | ||
// Parse a return statement and return its AST | ||
static struct ASTnode *return_statement(void) { | ||
struct ASTnode *tree= NULL; | ||
// Ensure we have 'return' | ||
match(T_RETURN, "return"); | ||
// See if we have a return value | ||
if (Token.token == T_LPAREN) { | ||
// Code to parse the parentheses and the expression | ||
... | ||
} else { | ||
if (Functionid->type != P_VOID) | ||
fatal("Must return a value from a non-void function"); | ||
} | ||
// Add on the A_RETURN node | ||
tree = mkastunary(A_RETURN, P_NONE, NULL, tree, NULL, 0); | ||
// Get the ';' | ||
semi(); | ||
return (tree); | ||
} | ||
``` | ||
|
||
If the `return` token isn't followed by a left parenthesis, we leave the | ||
expression `tree` set to NULL. We also check that this is a void returning | ||
function, and print out a fatal error if not. | ||
|
||
Now that we have parsed the `return` function, we may create an A_RETURN | ||
AST node with a NULL child. So now we have to deal with this in the code | ||
generator. The top of `cgreturn()` in `cg.c` now has: | ||
|
||
``` | ||
// Generate code to return a value from a function | ||
void cgreturn(int reg, struct symtable *sym) { | ||
// Only return a value if we have a value to return | ||
if (reg != NOREG) { | ||
.. | ||
} | ||
cgjump(sym->st_endlabel); | ||
} | ||
``` | ||
|
||
If there was no child AST tree, then there is no register with the | ||
expression's value. Thus, we only output the jump to the function's | ||
end label. | ||
|
||
|
||
## Conclusion and What's Next | ||
|
||
We've fixed five minor issues in the compiler: things that we need to work to get | ||
the compiler to compile itself. | ||
|
||
I did identify a problem with `&&` and `||`. However, before we get to | ||
that I need to solve an important, pressing, problem: we have a limited | ||
set of CPU registers and, for large source files, we are running out of | ||
them. | ||
|
||
In the next part of our compiler writing journey, I will have to work | ||
on implementing register spills. I've been delaying this, but now most of | ||
the fatal errors from the compiler (when compiling itself) are register | ||
issues. So now it's time to sort this out. |
Oops, something went wrong.