Ur-Scheme doesn't have a garbage collector yet. In case you want to
implement one, here are some notes on the thoughts I'd had so far
about garbage collection. Probably the actual garbage collector will
be much smaller than this.
Heap Overflow Checking
----------------------
One of the things needed for garbage collection is heap overflow
checking on every malloc, with a conditional branch to the garbage
collector.
Right now, memory allocation is inlined. Here's an example memory
allocation of 12 bytes:
        # allocate bytes:12
        push %eax                   # save the previous contents of %eax
        movl (arena_pointer), %eax  # %eax: start of the new object
        add $12, (arena_pointer)    # bump the allocation pointer past it
        # now %eax points to newly allocated memory
And here's the only allocation of an arbitrary number of bytes; the
size is 8 plus the value of a tagged integer in %eax:
        # we need 8 bytes more than the string length
        sar %eax                    # shift right twice: convert the tagged
        sar %eax                    #   length to an untagged byte count
        add $8, %eax                # plus the 8 bytes of overhead
        add $3, %eax                # round up to a multiple
        and $~3, %eax               #   of 4 bytes
        movl (arena_pointer), %ebx  # %ebx: start of the new object
        add %eax, (arena_pointer)   # bump the allocation pointer
        movl %ebx, %eax             # leave the new object pointer in %eax
I was thinking that procedure-call overhead on modern CPUs might be
high enough that doing these out-of-line would be slow. But of course
running the latter code in place of the former would also be painful.
x86 doesn't have a conditional CALL instruction, so you need a
conditional jump around a call instruction; the heap-overflow check
would look something like this:
        cmp %eax, (arena_end)
        jbe dont_gc
        call garbage_collector
dont_gc:
But you can't have multiple labels "dont_gc". Either you can use a
different label each time, which means you can't memoize the
allocation sequence as it is now done, or you can use gas's "1f" local
label syntax, which would be an obstacle to portability to non-gas
assemblers.
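With gas local labels, the same check could be written without
inventing a fresh name at each allocation site; this is just a sketch
of the label mechanics, with the comparison kept exactly as above:

        cmp %eax, (arena_end)
        jbe 1f                  # forward reference to the next "1:" below
        call garbage_collector
1:

Every allocation site can define "1:" again, and each "jbe 1f" binds
to the nearest following definition, so the memoized allocation
sequence wouldn't need to change from site to site.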
The alternative is that I could un-inline the memory allocation ---
saying "mov $12, %ecx; call malloc" in place of the first sequence
above --- and just use a single conditional jump in the allocator,
since the return address is already on the stack.
So, either:
- un-inline memory allocation and use conditional jump;
- use local labels;
- allocate a new label for every allocation.
Additionally, un-inlining would imply using a generic
memory-allocation routine even in cases where the size is known,
rather than one specialized for the known sizes.
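Here's a rough sketch of what the un-inlined path might look like.
The details are assumptions: the byte count is passed in %ecx (as in
the "mov $12, %ecx; call malloc" example above), and
garbage_collector is assumed to preserve %ecx and to free enough
space that the allocation can simply be retried.

        # caller side, in place of the inline 12-byte allocation above
        mov $12, %ecx
        call malloc
        # now %eax points to newly allocated memory

        # hypothetical out-of-line allocator: byte count in %ecx,
        # pointer to the newly allocated memory returned in %eax
malloc:
        movl (arena_pointer), %eax
        addl %ecx, %eax             # %eax: proposed new arena_pointer
        cmpl (arena_end), %eax
        jbe 1f                      # still fits in the arena: no collection
        call garbage_collector      # otherwise collect, then retry
        jmp malloc
1:      movl %eax, (arena_pointer)  # commit the bump
        subl %ecx, %eax             # back up to the start of the object
        ret

The retry loop after the collector call is an assumption about how
the collector would hand space back; Ur-Scheme doesn't define that
interface yet.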
GC strategy
-----------
Because the compiler is written in an almost purely functional style,
it tends to create a lot of very short-lived garbage very quickly, and
only a tiny amount of data is long-lived. So it would probably run
fastest with a copying collector. Abdulaziz Ghuloum reports that,
when compiled with Ikarus, on his 2GHz Core Duo, it compiles itself in
450ms, allocating 19 megabytes, and taking 4ms in the garbage
collector (with a 4MB nursery). And it should do well with a
generational collector, because mutation is rare (except, erm, for the
parser's input character buffer (!)) and almost all of the mutation is
to store things permanently.
I suspect a Cheney two-finger collector is the simplest collector
that could work.
If generational collection got implemented, there would be a
nursery-size performance tradeoff. With a nursery of 14MB or more, no
collection would ever need to happen when the compiler compiles
itself. With a smaller nursery, a few short-lived objects would
survive each nursery collection and clutter up the older generation;
this effect is more serious as nurseries get smaller. But if the
nursery is smaller than a cache, then allocation out of the nursery
will rarely cause cache misses on that cache. My current CPU has a
unified L2 cache of 256KiB and an L1 D-cache of 16KiB, so nursery sizes
in the neighborhood of 128KiB or 8KiB would improve locality in these
two caches.
Constants
---------
Some Scheme values are constants, from the .rodata section, rather
than heap-allocated values. Constants cannot be mutated and therefore
cannot point to anything created after the program starts, such as
heap-allocated data, so scanning them is unnecessary. Copying them is
also unnecessary. I think that distinguishing them from heap items
will require pointer comparisons.
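For example, the check might be a pair of comparisons against the
bounds of the allocation arena; arena_end already exists, but
arena_start here is a hypothetical symbol for the other end:

        # classify the pointer in %eax: .rodata constant or heap object?
        cmpl (arena_start), %eax    # arena_start is hypothetical
        jb 1f                       # below the arena: not heap-allocated
        cmpl (arena_end), %eax
        jae 1f                      # at or past arena_end: not heap either
        # in the arena: a heap object the collector would copy and scan
1: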
Scanning the Stack
------------------
The stack contains both Scheme values and non-Scheme values, so
treating it as a mere array of words is likely to be difficult. It
should be tractable if treated as mostly a linked list of stack frames
whose head is in %ebp and whose tail is the value 0x610ba1, which %ebp
is initialized to in "main" or "_start".
The structure of each stack frame is:
                                .
+--------------+                :
| caller stuff | <------------\ |
+--------------+              | |
| argument N   |              | |
+--------------+              | |
| argument N-1 |              | |
.       .      .              | |
:       :      :              | |
| argument 3   |              | |
+--------------+              | |
| argument 2   |              | |
+--------------+              | |
| argument 1   |              | |
+--------------+              | |
| argument 0   | <---- %ebp   | |
+--------------+              | |
| caller %eip  |              | |
+--------------+              | |
| caller %esp  | -------------/ |
+--------------+                |
| caller %ebp  | ---------------/
+--------------+
| heap var 0   |
+--------------+
| heap var 1   |
.       .      .
:       :      :
| temp val 0   |
+--------------+
| temp val 1   |
+--------------+
| temp val 2   |
+--------------+
| temp val 3   | <----- %esp
+--------------+
Normally everything on the stack except for saved %eip, %esp, and %ebp
values will be a run-time tagged value; it's just that you have to
follow the %ebp chain to find out where those are.
(An alternative approach might be pointer comparisons: saved %eips
will point into the .text section, not the heap, and saved %esps and
%ebps will point into the stack, not the heap. As explained above,
you need pointer comparisons to figure out which pointers point to
constants anyway.)
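Here's a sketch of following that chain. The offsets are read off the
diagram above (argument 0 at 0(%ebp), so the saved %eip, %esp, and
%ebp sit at -4, -8, and -12); the register choice and label names are
assumptions:

        # walk the chain of stack frames, starting from the current %ebp
        movl %ebp, %esi             # %esi: frame being examined
scan_frames:
        cmpl $0x610ba1, %esi        # the sentinel %ebp planted in main/_start
        je frames_done
        # -4(%esi), -8(%esi), and -12(%esi) hold the caller's saved %eip,
        # %esp, and %ebp; the other words in the frame are run-time tagged
        # values for the collector to scan
        movl -12(%esi), %esi        # step to the caller's frame
        jmp scan_frames
frames_done: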
Run-time tagged values are either Scheme values or heap vars. The
heap vars can be treated by the GC as if they were Scheme values
--- they're aligned pointers that point to heap-allocated cells with
magic numbers, just as Scheme values are. So even though they can't
appear as the value of a variable in a Scheme program, they don't add
any extra complexity to the GC.
Some assembly-language primitives, such as "display" and
"string->symbol", will push things that are not Scheme values on the
stack. I don't know if any of them currently try to allocate memory
while in this state, but if so, they will have to be changed.
Similarly, some assembly-language primitives will store Scheme values
in other registers that are not part of the stack. I don't know if
there are any cases where the only reference to a Scheme value is in a
register. None of the primitives currently call set-procedure-arg, so
values passed in as arguments are safe; it's only values that might
have been created during the execution of the primitive that are
potentially in danger.
Writing a GC in Scheme
----------------------
If you can write the GC in Scheme, it will be a lot shorter and
probably easier to debug. You'll need at least pointer comparison,
pointer increment, and pointer fetch primitive operations, and you'll
need to write the GC so that it doesn't allocate any memory on the
heap.
Currently the things that allocate memory on the heap are:
- primitives that obviously do so: make-string, cons, string->symbol
- entering functions that create closures (including closures
implicitly created by let-expressions, including let-expressions
implicitly created by "or" and "case")
- actually creating the closures (this is probably irrelevant since
you can't do it without entering the function first)
- invoking variadic functions (including list, append, and error) with
any arguments. (It's safe to invoke them with no arguments!)