
Tags: carld/compiler-tutorial

1.9

Chapter 1.9 Heap Allocation

Note: the return type of the extern scheme_entry function prototype changed from
int to long so that 64-bit quad words are returned correctly.

compt master % cat m64.c
#include <stdio.h>

int main(void) {
  printf("%zu\n", sizeof(long));
  printf("%zu\n", sizeof(int));
  printf("%zu\n", sizeof(void *));
  printf("%zu\n", sizeof(long *));
  return 0;
}
compt master % ./sizes
8
4
8
8
compt master %
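
The effect of the int-to-long change noted above can be sketched in C. This is only an illustration, not code from the repository: the helper names are hypothetical, and the sample value is an assumed tagged pointer just above the 4 GiB boundary. It shows that a 32-bit return type keeps only the low 32 bits of a 64-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what survives when a 64-bit result is read
   through a 32-bit return type (only the low 32 bits). */
uint32_t low_32_bits(uint64_t value) {
    return (uint32_t)(value & 0xFFFFFFFFu);
}

/* Returns 1 when the value fits in 32 bits, i.e. nothing would be lost. */
int fits_in_32_bits(uint64_t value) {
    return (uint64_t)low_32_bits(value) == value;
}

/* Narrow only when nothing would be lost; aborts otherwise. */
uint32_t narrow_checked(uint64_t value) {
    assert(fits_in_32_bits(value));
    return low_32_bits(value);
}
```

Small fixnums survive the narrowing, but any heap pointer above 4 GiB does not, which is presumably why the problem shows up once heap allocation is introduced.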

Introducing the begin special form; the language is no longer purely functional...

gdb tip: to show memory (especially when debugging vectors), for example 8 quad
words starting at an address:
  display /8xg 0x100068000

Store the registers marked as preserve (callee-saved) in the context structure,
and restore them after the call to the scheme entry point.

Pairs are two adjacent quad words, and a pointer to a pair is tagged with
0x00000001. Thus, to untag and reference the car, use pair-1, and to untag and
reference the cdr, use pair+7.

1.8

Chapter 1.8 Iteration via Proper Tail Calls

Collapsing the stack does not consider how many locals are on it; it simply
moves the current set of arguments so that they sit adjacent to the cell
containing the return address.
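
A rough model of that collapse in C, under stated assumptions: the stack is represented as an array where slots[0] stands for the return address cell at [rsp] and slots[i] stands for [rsp - 8*i]; collapse_frame and its parameters are hypothetical names, not procedures from the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move argc freshly evaluated arguments, currently starting at
   first_arg_slot, down to the slots right next to the return address.
   Whatever locals sat in between are overwritten without being counted. */
void collapse_frame(long *slots, size_t first_arg_slot, size_t argc) {
    assert(first_arg_slot >= 1);           /* slot 0 is the return address */
    long *src = slots + first_arg_slot;    /* where the arguments are now */
    long *dst = slots + 1;                 /* first cell after the return address */
    memmove(dst, src, argc * sizeof(long)); /* memmove tolerates overlap */
}
```

After the move, the callee sees the same frame shape as for an ordinary call, so a jmp can replace the call and the stack does not grow.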

tests-1.8 appears to contain tests unrelated to tail calls, so these tests have
been removed.

A lot of code can be refactored now

It seems like everything aside from argument evaluation is in a tail
position in this compiler?

passed all 419 tests

1.7

Chapter 1.7 Procedures

Reminder: the stack grows downwards in memory, meaning that decreasing the stack
pointer uses up more stack space, not less.

The call instruction performs the following: (1) computes the return point (i.e.
the address of the instruction following the call instruction), (2) decrements
the value of %esp by 4, (3) saves the return point at 0(%esp), then (4) directs
the execution to the target of the call.

The ret instruction performs the following: (1) it loads the return point
address from 0(%esp), (2) increments the value of %esp by 4, then (3) directs
the execution to the return point.

On entry to a called procedure, rsp points to the return address of the call.
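
The call and ret steps quoted above can be modelled in C as a sketch. The struct machine type and the do_call and do_ret functions are hypothetical, and the word size is set to 8 for the 64-bit case (the quoted text uses 4 with %esp).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORD 8  /* 64-bit word; the quoted 32-bit text uses 4 */

/* Hypothetical machine state: byte-addressed memory, stack pointer, program counter. */
struct machine { uint8_t *mem; uint64_t sp; uint64_t pc; };

void do_call(struct machine *m, uint64_t target, uint64_t insn_len) {
    uint64_t ret = m->pc + insn_len;     /* (1) compute the return point */
    assert(m->sp >= WORD);               /* room for one more word */
    m->sp -= WORD;                       /* (2) decrement the stack pointer */
    memcpy(m->mem + m->sp, &ret, WORD);  /* (3) save the return point at 0(sp) */
    m->pc = target;                      /* (4) jump to the target */
}

void do_ret(struct machine *m) {
    uint64_t ret;
    memcpy(&ret, m->mem + m->sp, WORD);  /* (1) load the return point from 0(sp) */
    m->sp += WORD;                       /* (2) increment the stack pointer */
    m->pc = ret;                         /* (3) jump to the return point */
}
```

Right after do_call, the stack pointer refers to the saved return point, which matches the state a called procedure starts in.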

Regarding earlier confusion,
    not sure about this line:
        (emit "  mov [rsp + 8], rsp") ; stack base argument
    tutorial has: movl 4(%esp), %esp

It looks like the tutorial code picks the stack_base argument directly off the
stack: as it was the last local in the C main function, it could be expected to
be in the machine word above the current stack pointer.

In the 64-bit calling convention, stack_base is passed in a register rather than
on the stack, and as it's the only argument, it is passed in the rdi register.

So this line has been changed to the following:
    (emit "  mov rsp, rdi")

Using 64-bit cells, a Bus Error would occur in the deeply nested procedures
tests because rsp ended up pointing to out-of-bounds memory; more stack memory
is needed...

   int stack_size = (16 * 4096); /* holds 16K cells */

Increasing 16 to 32768 enables those tests to pass, allowing enough stack size:
   5000000 (call depth in test) * 2 (lambdas) * 8 (wordsize) = 80,000,000
   32768 * 4096 = 134,217,728
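
The arithmetic above can be checked with a small C sketch. The function names are hypothetical, and the factor of 2 is taken directly from the note.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096  /* multiplier used in the stack_size expression */
#define WORD_BYTES 8     /* 64-bit cell */

/* Bytes needed: call depth * lambdas * bytes per cell. */
size_t stack_bytes_needed(size_t depth, size_t lambdas) {
    return depth * lambdas * WORD_BYTES;
}

/* Bytes available for a given multiplier (16 originally, 32768 afterwards). */
size_t stack_bytes_available(size_t pages) {
    assert(pages > 0);
    return pages * PAGE_BYTES;
}

/* Returns 1 when the configured stack can hold the deepest test. */
int stack_is_large_enough(size_t pages, size_t depth, size_t lambdas) {
    return stack_bytes_available(pages) >= stack_bytes_needed(depth, lambdas);
}
```

With the original multiplier of 16 only 65,536 bytes are available, far short of the 80,000,000 the test needs.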

Tip: to get gdb to show Intel syntax,
% cat ~/.gdbinit
set disassembly-flavor intel

Tip: to see code in gdb,
layout asm
layout reg

The file tests-1.7-req.scm appeared to have a repeat of earlier binary
primitives tests, so these have been removed.

Add gcc flags: -g to include debugging symbols, -fomit-frame-pointer so gcc
does not use the rbp register as a frame pointer, and
-Wall and -pedantic to provide C language warnings

In this implementation, letrec cannot be nested and has to be the outermost
expression.

Remove reliance on ctype.h for isspace, iscntrl (down the track startup.c can be
ported to assembly)

Emit comments next to some assembly code to help debug Bus Error (see above)

passed all 400 tests

1.6

Chapter 1.6 Local Variables

Variables are implemented by placing values on the stack. The compiler
keeps an environment association list that maps the name of the variable
to its index on the stack.

- emit-stack-save moves the value in rax to the stack at the current stack index
- emit-load-stack moves the value on the stack at the current index into rax
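
The environment described above can be modelled in C as a sketch. The compiler itself keeps a Scheme association list; the struct binding type and the extend_env, lookup and next_stack_index functions here are hypothetical stand-ins, and the 8-byte step assumes 64-bit cells.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD_SIZE 8  /* assumption: one 64-bit cell per variable */

/* One entry of the association list: variable name -> stack index. */
struct binding {
    const char *name;
    long stack_index;
    const struct binding *next;
};

/* Prepend a binding so it shadows any older binding of the same name. */
struct binding extend_env(const char *name, long stack_index,
                          const struct binding *env) {
    struct binding b = { name, stack_index, env };
    return b;
}

/* Returns 1 and stores the index in *out when name is bound, else 0. */
int lookup(const char *name, const struct binding *env, long *out) {
    assert(out != NULL);
    for (; env != NULL; env = env->next) {
        if (strcmp(env->name, name) == 0) {
            *out = env->stack_index;
            return 1;
        }
    }
    return 0;
}

/* The stack grows down, so the next free slot is one word lower. */
long next_stack_index(long stack_index) {
    return stack_index - WORD_SIZE;
}
```

A variable reference then compiles to a single load from rsp plus the looked-up index.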

passed all 381 tests

1.5

Chapter 1.5 Binary Primitives

Notes on System V calling convention:
  - return value in rax, rdx (if it's 128-bit)
  - parameters in rdi, rsi, rdx, rcx, r8, r9, then stack right to left
  - stack aligned to 16 bytes at each call
  - scratch registers rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11
  - preserve registers rbx, rsp, rbp, r12, r13, r14, r15
  - call list rbp
  - the call instruction pushes the address of the next instruction to the stack
    and jumps
  - stack has 128 byte red zone
  - push, pop, call, ret instructions affect rsp
  - in 64-bit the word size is a quad word, 8 bytes
  - the stack grows down in memory, so the stack index starts at -8

not sure about this line:
    (emit "  mov [rsp + 8], rsp") ; stack base argument
tutorial has: movl 4(%esp), %esp

1.4

Chapter 1.4 Conditional Expressions

Exercises 1 & 2: implement if, and, or
TODO: Exercise 3, minimize the number of comparisons performed

1.3

Exercise 1.3 Unary Primitives

passed all 196 tests

- Chez Scheme Version 9.4.1
- NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Feb 10 2016
- gcc-6 (Homebrew gcc 6.1.0) 6.1.0
- Darwin 15.6.0 x86_64 i386

1.2

Exercise 1.2 Immediate Constants, from the compiler tutorial.

Notes:
- compile-program is a Chez Scheme built-in procedure, so use emit-program
  for the compiler procedure instead,
- formatted error messages use the errorf procedure in Chez Scheme,
- use a makefile to build the executable, and set path to gcc, nasm
  in the makefile, outside tests-driver,
- tests-driver had duplicated functions: get-string, test-with-string-output,
  execute, build-program

1.1

Exercise 1 from the compiler tutorial, using nasm instead of gas