title | author | output |
---|---|---|
Protecting the BC Stack from Mutation |
Luke Tierney |
html_document |
In computing an expression like
<expr1> + <expr2>
the value of <expr1>
is computed first and then the value of <expr2>
.
It is possible, though unusual, that the expression for computing
<expr2>
might contain complex assignments. These assignments must not
mutate the computed value of <expr1>
. In R 3.4.4 this was not handled
properly and this example produced the wrong result:
x <- 1 + 0
x + { x[1] <- 2; 1}
## [1] 3 # should be 2
This issue was present in all calls to multi-argument BUILTIN
functions from the AST interpreter. A similar issue existed in subset
operations, implemented by a SPECIAL
:
x <- matrix(1:4, 2)
i <- 1 + 0
x[i, { i[1]<-2; 1}]
## [1] 2 # this is m[2, 1]; should be 1
Similar issues also existed in byte compiled code where a complex assignment could mutate values on the stack from earlier computations.
A variant of this issue was reported to the R-devel list by Lukas Stadler in August 2017.
These issues were partially addressed in R 3.5.0 and more completely
in R 3.6.0. The issue for BUILTIN
calls is handled in eval.c
and
needs no further action or maintenance elsewhere.
Implementation changes to SPECIAL
functions that evaluate more than
one argument need to pay attention to this issue and make sure values
computed earlier cannot be mutated while evaluating later
arguments. Ths should be done using INCREMENT_LINKS
and
DECREMENT_LINKS
, e.g. for two arguments as
val1 = eval(arg1, rho);
INCREMENT_LINKS(val1);
val2 = eval(arg2, rho);
DECREMENT_LINKS(val1);
The fix for protecting the byte code stack used in R 3.6.0 required the compiler to emit additional instructions protecting and unprotecting values on the stack. This note describes an alternative approach with less overhead that will be incorporated in R 4.0.0. (Committed to R_devel in r77275.)
Legitimate mutations through R code can only occur in a few places:
-
In byte code between
STARTASSIGN/ENDASSIGN
orSTARTASSIGN2/ENDASSIGN2
instructions. -
In calls to
applyDefine
from the AST interpreter.
The byte code interpreter also mutates values on the stack in a few
situations where this is known to be safe. One place is- in updating
the value of the loop variable in for
loops for some data
types. This is safe for values that are not shared as they cannot also
appear lower on the stack.
To make sure that these operations do not mutate any boxed value on the BC stack all values on the stack need to have their link counts (NAMED or REFCNT) incremented and then decrement around these possible mutation points. This is done by calling
INCLNK_stack
before entering possibly mutating code;DECLNK_stack
after leaving this code.
These functions use a stack pointer R_BCProtStack
to keep track of
how much of the stack has been protected. The pointer is stored in
contexts and used to decrement link counts on a LONGJMP.
A key assumption is that none of the values on the protected stack
will be changed between matching INCLNK_stack
and DECLNK_stack
operations. The binding cache and the variable value for a for
loop
may need to change, but these also do not need to be protected. Their
stack fields are marked with the NLNKSXP
tag so they will not have
their links incremented and decremented.
The stack is protected around every call to applydefine
. This
ensures that values on the stack are protected during complex
assignment operations during calls into the AST interpreter.
To reduce link count adjustments for objects lower on the stack, the stack protection pointer adjustment during byte compiled complex assignments has several components:
-
The stack protection pointer is raised at the beginning of a
bcEval
call and restored at the end. This means that top level complex assignments in byte compiled code need no further protection. -
The protection pointer does need to be raised around non-top-level complex assignments. This is done by
INCLNKSTK
andDECLNKSTK
instructions emitted by the compiler for non-top-level complex assignments.
Together these ensure that the stack is protected during all byte compiled complex assignments.
To further reduce unnecessary link count adjustments:
-
For
for
loops the stack protection pointer is adjusted around the loop body to avoid processing the loop data for every stack protection pointer adjustment needed while executing the body. -
Similarly, the stack protection pointer is adjusted for loops needing a jump context to avoid repeated processing of the context data.
A final efficiency improvement is to defer actually committing the link count increments until they are needed. This means that code written in a functional style that does not use complex assignments does not need to actually perform the link count adjustments. The points where count increments are committed are
-
in
applydefine
for the AST interpreter; -
in the
STARTASSIGN
andSTARTASSIGN
instructions.
Raising the minimal BC level will allow redoing the binding cache to be much smaller, and allowing the non-small-cache option to be dropped. The hacks for supporting old byte code can also be dropped at that point.