-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathclosures
423 lines (349 loc) · 18 KB
/
closures
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
So, closures.
The Lua 5.0 paper says:
The implementation of closures : Lua 5.0 supports first-class
functions with lexical scoping. This mechanism poses a well-known
difficulty for languages that use an array-based stack to store
activation records. Lua uses a novel approach to function closures
that keeps local variables in the (array-based) stack and only
moves them to the heap if they go out of scope while being
referred by nested functions. The implementation of closures is
discussed in Section 5.
...
Each closure has a reference to its corresponding prototype, a
reference to its environment (a table wherein it looks for global
variables), and an array of references to upvalues, which are used
to access outer local variables.
Lua uses a structure called an upvalue to implement closures. Any
outer local variable is accessed indirectly through an
upvalue. The upvalue originally points to the stack slot wherein
the variable lives (Figure 4, left). When the variable goes out of
scope, it migrates into a slot inside the upvalue itself (Figure
4, right). Because access is indirect through a pointer in the
upvalue, this migration is transparent to any code that reads or
writes the variable. Unlike its inner functions, the function that
declares the variable accesses it as it accesses its own local
variables: directly in the stack.
Mutable state is shared correctly among closures by creating at
most one upvalue per variable and reusing it as needed. To ensure
this uniqueness, Lua keeps a linked list with all open upvalues
(that is, those that still point to the stack) of a stack (the
pending vars list in Figure 4). When Lua creates a new closure, it
goes through all its outer local variables. For each one, if it
can find an open upvalue in the list, it reuses that
upvalue. Otherwise, Lua creates a new upvalue and links it in the
list. Notice that the list search typically probes only a few
nodes, because the list contains at most one entry for each local
variable that is used by a nested function. Once a closed upvalue
is no longer referred by any closure, it is eventually garbage
collected.
It is possible for a function to access an outer local variable
that does not belong to its immediately enclosing function, but to
an outer function. In that case, even by the time the closure is
created, the variable may no longer exist in the stack. Lua solves
this case by using flat closures [5]. With flat closures, whenever
a function accesses an outer variable that is not local to its
enclosing function, the variable also goes to the closure of the
enclosing function. Thus, when a function is instantiated, all
variables that go into its closure are either in the enclosing
function's stack or in the enclosing function's closure.
[5] is L. Cardelli. Compiling a functional language. In LISP and
Functional Programming, pages 208--217, 1984.
If the outermost function also used the upvalue to access the
variable, there would be no need to copy the upvalue's value into the
stack at function exit time. I hadn't previously considered putting
each closed-over value into the heap separately, rather than together
in a single record, but it's clear that it's a better idea. To
illustrate, here's a simple program:
(lambda (a b)
(lambda (c d)
(lambda (e f) (+ e f c a))))
I was thinking of something like this:
function activation record
[ local var e ]
[ local var f ]
[ parent ptr ] -> function activation record
[ local var c ]
[ local var d ]
[ parent ptr ] -> function activation record
[ local var a ]
[ local var b ]
The "parent ptr" pointers point to the activation record for the
lexically enclosing function. So accessing a closed-over variable
could involve chasing several pointers. There are also problems with
garbage collection precision with this approach: if there's a variable
that isn't captured by any closures, you can avoid copying it to the
heap, but if you have multiple overlapping closures in the same outer
scope, the survival of one of them may result in retaining references
to the variables the others use as well. However, it has the nice
approach that closure values are small and all the same size.
The approach contemplated by the Intel x86 ENTER instruction is
fairly similar:
function activation record
[ local var e ]
[ local var f ]
[ parent ptr ] -> function activation record
[ grandpa ptr ] - [ local var c ] ---
| [ local var d ] | |
| [ parent ptr ] -- V
--------------------> function activation record
[ local var a ]
[ local var b ]
This avoids the need to chase several pointers. (As implemented by
the ENTER instruction, it also assumes the activation records are on
the stack.)
The upvalue approach instead looks like this:
function activation record
[ local var e ]
[ local var f ]
[ upvalue ptr ] ------------------------------------> [ local var a ]
[ upvalue ptr ] -----------------> [ local var c ] ^ ^
function activation record ^ | |
[ local var d ] | | |
[ upvalue ptr ] ----------------------- | |
[ upvalue ptr ] ------------------------------------------- |
function activation record |
[ local var b ] |
[ upvalue ptr ] ---------------------------------------------
Here the "function activation record"s can all be stack-allocated
rather than heap-allocated. The middle function carries a reference
to "local var a" merely because its inner function does. Now all
closed-over values are accessed in the same way.
Other Approaches
----------------
C# 2.0 creates anonymous classes whose instances correspond to
the closed-over part of function activation records:
> http://www.thinkingms.com/pensieve/CommentView,guid,9fe42970-09e3-44e2-a4d0-32d63139351a.aspx
Analysis Necessary
------------------
Variables must be classified into four types with regard to each
lambda: global variables, stack-allocated arguments, heap-allocated
arguments, and variables inherited from an outer scope.
Stack-allocated arguments are arguments that do not occur free in any
nested lambda; heap-allocated arguments are those that occur free in
some nested lambda; variables inherited from an outer scope are
lexically-bound variables that are not arguments; and global variables
are variables that are none of the above.
There's a term "captured variable" but it's ambiguous as to context
--- it could mean a heap-allocated argument or a variable inherited
from an outer scope. Consequently I will invent the term "artifact"
to mean "variable inherited from an outer scope".
Heap-allocated arguments and artifacts are collectively "heap
variables". In Lua, these would be variables for which upvalues
exist; but the approach I'm going to take is a little bit simpler than
the upvalue approach. I'm just going to allocate them on the heap on
entry to the scope where they are heap-allocated arguments, and access
them indirectly thereafter.
About Instruction Sequences
---------------------------
I'm currently writing a toy Scheme compiler; it stores its current
top-of-stack in %eax and a pointer to the current procedure's
arguments in %ebp. If I use this approach, I'll push the
heap-variable pointers somewhere near %ebp.
Now there are just two instruction sequences to access a local
variable; for a variable actually in the stack:
mov 8(%ebp), %eax
and for a heap variable:
mov -16(%ebp), %ebx
mov 4(%ebx), %eax
On procedure entry, we push pointers to all heap variables --- whether
heap-allocated arguments or artifacts --- so that we can access them
uniformly in the above fashion.
One cost is that the sizes of closure values vary --- they may need to
contain an arbitrary number of artifact pointers --- and to get the
two-instruction heap-variable access above, all those artifact
pointers have to be copied from the closure into the stack at
procedure entry time. The alternative to copying all of them is using
another instruction on each access:
mov -8(%ebp), %ebx # fetch closure pointer
mov 12(%ebx), %ebx # fetch artifact pointer
mov 4(%ebx), %eax # fetch contained value
But that doesn't help with the problem of heap-allocated arguments.
Copying from the closure into the stack isn't terribly expensive in
code; upon procedure entry, the pointer to the closure value is in
%eax:
push 12(%eax) # push first heap-variable pointer
push 16(%eax) # push second heap-variable pointer
(Each of the above instructions happens to be three bytes.)
Heap-allocated arguments are slightly trickier; each one must be
copied into a heap variable during the procedure entry, and then the
pointer to it must be pushed.
mov 8(%ebp), %eax # get argument value
call allocate_heap_variable
push %eax
allocate_heap_variable looks something like this:
allocate_heap_variable:
push %eax
mov $1 + 8<<2, %eax
# memory allocation code goes here, leaves pointer in %eax
mov $0x1abe11ed, (%eax)
pop 4(%eax)
ret
Then at procedure exit time, all the heap-varible pointers must be
popped off, but that's going to be handled by restoring %esp, so it
doesn't take any extra instructions.
Representation of Lexical Environments
--------------------------------------
Up to now, I've represented lexical environments simply as an alist of
getters --- code to generate assembly code to fetch the value of the
name. But now I need to be able to do three different things with
names in the lexical environment:
- (generate code to) fetch the value;
- (generate code to) initialize the value at procedure entry time;
- come up with an element of the lexical environment for an inner
scope. Stack-allocated arguments can't do this; heap-allocated
arguments become variables captured from an outer scope; and
variables captured from an outer scope remain variables captured
from an outer scope.
So the representation of an environment could be something like
'((foo stack 0) (bar heaparg 1 0) (baz artifact 1)).
I could probably write something like this:
(define get-value
(lambda (description)
(if (eq? (car description) 'stack) (get-argument (cadr description))
(if (eq? (car description) 'heaparg) (get-heap-var (caddr description))
(if (eq? (car description) 'artifact)
(get-heap-var (cadr description)))))))
It's a little hairier because you can't write xlate-for-inner-scope in
the same way. The numbers in the cdrs of the heaparg and artifact
need to be assigned inside each particular lambda so they don't
collide with one another and don't leave empty spaces.
So compute-inner-environment, in order to sort variables among the
four categories and nonexistence, would need the enclosing lexical
environment to distinguish between global variables and artifacts; in
order to distinguish between arguments and artifacts or global
variables, would need the argument list; in order to distinguish
between stack-allocated and heap-allocated arguments, would need to
know which variables are captured by inner lambdas; in order to
distinguish between artifacts and variables not used, would need to
know the variables that occur free within the body of the lambda; and
in order to assign heap-pointer and argument slots, would need to know
how the number of slots it has reserved so far. Whew. It's probably
possible to write that function, but it isn't very appealing.
However, we don't need all that information at any given place, do we?
At a variable reference, we only need to know whether the variable is
heap, stack, or global, and what its slot number is; when computing
the result of a lambda expression, we only need to know which
variables it captures (and can use the normal variable reference
apparatus to look them up); when compiling a procedure prologue, we
need to know which variables are heap arguments (and what slot they go
into, and what argument index they come from) and which variables are
artifacts (and what slot they go into).
So the procedure prologue is probably the hardest case. Ideally we'd
like to write it like this:
(define push-heap-var-pointers
(lambda (args artifacts heap-args)
(push-heap-var-pointers-2 args artifacts heap-args 0)))
(define push-heap-var-pointers-2
(lambda (artifacts heap-args slot-num)
(if (null? artifacts) (push-heap-args args heap-args slot-num)
(begin (push-artifact slot-num)
(push-heap-var-pointers-2 args (cdr artifacts)
heap-args (+ slot-num 1))))))
(define push-heap-args
(lambda (args heap-args slot-num)
(if (null? heap-args) '()
(begin (push-heap-arg args (car heap-args) slot-num)
(push-heap-args args (cdr heap-args) (+ slot-num 1))))))
(define push-artifact
(lambda (slot-num)
(asm-push (offset eax (+ 12 (quadruple slot-num))))))
(define push-heap-arg
(lambda (args arg slot-num)
(get-variable arg args)
(asm-pop ebx) ; get-variable will have pushed %eax
(asm-push eax)))
Incidentally, in the above, we also compute the heap-variable slot
number for each thing we push; so it might be good to return that,
like so:
(define push-heap-var-pointers
(lambda (args artifacts heap-args)
(push-heap-var-pointers-2 args artifacts heap-args 0 args)))
(define push-heap-var-pointers-2
(lambda (artifacts heap-args slot-num heap-vars)
(if (null? artifacts) (push-heap-args args heap-args slot-num heap-vars)
(begin (push-artifact slot-num)
(push-heap-var-pointers-2
args
(cdr artifacts)
heap-args
(+ slot-num 1)
(cons (heap-var-slot (car artifacts) slot-num) heap-vars))))))
(define push-heap-args
(lambda (args heap-args slot-num heap-vars)
(if (null? heap-args) heap-vars
(begin (push-heap-arg args (car heap-args) slot-num)
(push-heap-args args (cdr heap-args) (+ slot-num 1)
(cons (heap-var-slot (car heap-args) slot-num) heap-vars)))))
(define heap-var-slot (lambda (name num) (list 'heapvar name num)))
This conses all the heap variables onto the front of the args list
(which is presumably an alist explaining how to get arguments off the
stack).
Perhaps a nicer approach would first compute all the slot numbers and
types, then push them. Then the pushing code could look like this:
(define push-heap-vars
(lambda (args heap-vars)
(for-each (lambda (var) (push-heap-var args var)) heap-vars)))
(define push-heap-var
(lambda (args var) (push-heap-var-2 args (car var) (cadr var) (caddr var))))
(define push-heap-var-2
(lambda (args type name slot-num)
(if (eq? type 'artifact) (push-artifact slot-num)
(push-heap-arg arg name))))
(define push-artifact
(lambda (slot-num)
(asm-push (offset eax (+ 12 (quadruple slot-num))))))
(define push-heap-arg
(lambda (args arg)
(get-variable arg args)
(asm-pop ebx) ; get-variable will have pushed %eax
(asm-push eax)))
That does depend on the slot-nums in the list not being out of order.
Anyway, then you could compute the slot numbers and types as follows:
(define heap-var-slot-defns
(lambda (artifacts heap-args) (heap-var-slot-defns-2 artifacts heap-args 0)))
(define heap-var-slot-defns-2
(lambda (artifacts heap-args slot-num)
(if (null? artifacts) (heap-arg-slot-defns heap-args slot-num)
(cons (artifact (car artifacts) slot-num)
(heap-var-slot-defns-2 (cdr artifacts) heap-args
(+ slot-num 1))))))
(define heap-arg-slot-defns
(lambda (heap-args slot-num)
(if (null? heap-args) '()
(cons (heap-arg (car heap-args) slot-num)
(heap-arg-slot-defns (cdr heap-args) (+ slot-num 1))))))
(define artifact (lambda (name slot-num) (list 'artifact name slot-num)))
(define heap-arg (lambda (name slot-num) (list 'heap-arg name slot-num)))
So then maybe you could toss the artifacts and heap-args into the
environment for compiling the interior of the procedure, along with
the stack arguments. You'd probably want to reformat them a bit,
though, so that you can look them up by name with assq.
This brings us back to the scary old compute-inner-environment
function, which is only slightly different from the above
heap-var-slot-defns function.
The artifact list has to be computed for compiling the lambda as well,
and it has to be computed as having the same order as the artifact
list that the prologue is using. So probably best to pass it in.
Computing the artifact list requires just the lambda-expression and
the surrounding environment:
(define artifacts-used-by
(lambda (expr env)
(filter (lambda (var) (assq var env)) (free-vars expr))))
And the heap arguments are similarly easy:
(define heap-args
(lambda (args body) (set-intersect args (captured-vars body))))
So, to compile a lambda, today we do this:
(jmp jumplabel)
(compile-procedure-labeled proclabel nargs
(lambda () (compile-begin body (lambda-environment env vars 0) #t)))
(label jumplabel)
(push-const proclabel)
The "lambda-environment" part provides the environment for the
compilation of the body, and it is defined as follows:
(define lambda-environment
(lambda (env vars idx)
(if (null? vars) env
(cons (cons (car vars) (lambda () (get-procedure-arg idx)))
(lambda-environment env (cdr vars) (+ idx 1))))))
This is incorrect; it should not include "env", nor should it need it
as an argument. Only artifacts-used-by should depend on "env".