Skip to content

A lifetime inference algorithm for the Rust programming language written in Soufflé.

Notifications You must be signed in to change notification settings

rljacobson/lifetimes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is a lifetime inference algorithm for Rust written in Soufflé. It is a proof of concept and will not actually work with your source code automatically—but it could in principle if we hooked it up to Rust’s AST. The To Do section below illustrates how unnecessarily restrictive the Rust compiler is regarding lifetimes.

Usage

Since it’s a proof of concept, it’s very unsophisticated. In “real life,“ the input comes from the AST generated by the compiler. In this script I just manually typed the input into the script itself. The input is labeled as such.

The inference algorithm is just a Soufflé script, so the only dependency is Soufflé. Once Soufflé is installed, run the script like so:

$ souffle -D. lifetimes.dl

What’s nice about Soufflé is that it will tell you how it deduced something. Here’s how:

$ souffle -D. -t explain lifetimes.dl
>      explain <<thing that was deduced here>>

The -t explain flag drops you into a command line. Type explain followed by the thing you want explaining. I got tired of doing this, so I rewrote my “error” predicate to tell me where it comes from together with an error_msg that produces a human readable error message string.

.output error_msg

Lifetime Inference in Rust

Last Monday I decided to take a look at Rust for the first time. Rust has the notion of lifetimes which is a mechanism to prohibit errors like use after free and race conditions. However, the Rust compiler requires the programmer to figure out the lifetimes manually. Programmers new to Rust find the “borrow checker,“ the component of the compiler that checks lifetime correctness, to be a challenging learning curve. The internet is full of people “fighting the borrow checker.”

A lifetime is a type. The Rust compiler infers types, but it doesn’t infer lifetimes.1 Why not? I don’t know. It seems like such a thing would be a priority considering how big of a barrier it is to new users. I decided to write my own lifetime inference engine.

How does it work?

It works the same way type inference works—literally. A lifetime is a type. Consider this example from The Book:

fn longest(x: &str, y: &str) -> &str { // ━━━━━━┱───┐ -x and y borrows here.
    if x.len() > y.len() { //   x's borrow    S4┃ S5│ 
        x     //                overlaps with   ┃   │ -loan_if = x, x drops.
    } else {  //                y's borrow      ┃   │
        y     // ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋───┤ -loan_if = y, y drops.
    } // either x or y is still borrowing here  ┃   │
} // ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┹───┘ -result borrows here, while
                                                   //  x and y go out of scope.
fn main() {
    let string1 = String::from(  // ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ -string1 born
        "long string is long");  //   string1's scope              S1┃
    let result;                  // ┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉╂┉┉┐
    { //    result's scope, but not assigned to, so no lifetime yet  ┃S4┋
        let string2 = String::from("xyz"); //─────────────────────┐  ┃  ┋ -string2 born.
        result = longest(string1.as_str(), // string2's scope   S2│  ┃  ┋
                        string2.as_str());       // ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┼┄┄╂┄┄┼┄┄┄┐ -string1 or string2 borrowed here.
    }                                            // ──────────────┘  ┃  ┋ S3┆ -string2 dies.
    println!("The longest string is {}", result);// - S5             ┃  ┋   ┆ -result drops.
                                                 // ━━━━━━━━━━━━━━━━━┹┉┉┘┄┄┄┘ 
}                                                // - S6, global scope.

The boxes labeled with S# represent either lifetimes or regions in which a reference holds a “loan,” which is what I will call a borrow when I need a noun to refer to the abstraction of a reference holding a particular heap-allocated data element.

Program input

The sourcecode data needs to be input into the program. In particular, the input consists of:

  1. The control flow graph. The control flow graph tells us whether program control can flow from point A to point B during execution. The points are program statements, which we identify with line numbers in the source code for simplicity. We write CFG.edge("L3","L4”){:.rust} to mean control flows from line L3 to line L4. The control flow graph itself can be constructed by computing reachability along edges, but we don’t compute it directly, so CFG is only an abstraction. What we actually construct is the data flow graph in our application, because it describes whether a piece of data's lifetime, via borrowing or scope, is "live" at any given point.
  2. Where a loan is created together with some kind of identifier. We write object_born("string1", "L10"){:.rust} and, for simplicity, use the name of the original variable that created the loan as the name of the loan.
  3. Where the loan dies, i.e. where it’s owner drops the loan. We write object_dies("string1", "L20"){:.rust}.
  4. Where which reference borrows from which other reference. We write borrows("loan_if", "x", "L3”){:.rust}. Note that which loan is borrowed is unnecessary—that is inferred by the algorithm.
  5. Where which reference releases their loan, e.g., because it went out of scope. We write drops("x", "string1", "L3”){:.rust}.

Computations

From the program input, we compute the set of all points a given referenceis borrowing a given loan. The reference starts holding the loan exactly when it borrows it:

borrowing(ref, ln, p) :- borrows(ref, ln, p).

It continues borrowing the loan along the control flow graph until it drops the loan, perhaps by going out of scope:

borrowing(ref, ln, p2) :- CFG.edge(p1, p2),
                        borrowing(ref, ln, p1),
                        !drops(ref, ln, p2).

This computes the region of source code that the reference ref holds the loan ln—sort of: crucially, despite the fact that I’ve called ln a loan, it is actually a reference to a loan. So borrowing tells us which reference is borrowing from which other reference at which points in the source code. But which loan is it referencing?

To bootstrap, we declare (perhaps awkwardly) that the loan’s owner references itself. (Recall that we are naming the loan after its owner, so it makes a bit more sense to read “references” as, “is held by” or something.)

references(var, var, pt) :- object_born(var, pt).

Next, whatever reference is borrowing the loan, we need the reference to propogate to the next point on the control graph (the next statement) if it’s still borrowing from the same thing:

references(ref2, ln, pt) :- CFG.edge(p1, p2),
                        borrowing(ref2, ref1, p1),
                        references(ref2, ln, p1),
                        borrowing(ref2, ref1, p2).

The crucial difference between references and borrowing is, references doesn’t update its referrent. That is, the thing being referred to never changes from when it came into being at an object_born:

references(ref2, ln, pt) :- CFG.edge(p1, p2),
                        borrowing(ref1, _, p1),
                        references(ref1, ln, p1),
                        borrowing(ref2, ref1, p2).

Detecting Errors

Conceptually, if ref2 borrows from ref1, then we need the set S2 := { p ∈ Program | borrowing(ref2, ref1, p) } to be a subset of the set S1 := { p  | borrowing(ref1, _, p) }, that is, S1⊂ S2 . Said another way, the lifetime of ref2's borrow must be a subset of the lifetime of ref1's borrow.2

A dangling pointer occurs when the loan dies while something is still referencing it, perhaps because its owner goes out of scope. From the conceptual perspective of sets, an error occurs when we find a point p, a reference ref, and a loan ln such that references(ref, ln, p){:.prolog} but NOT borrowing(ln, ln, p){:.prolog} (the owner drops ln). In symbols, S1⊂ S2 is a constraint, but p ∈ S1∖ S2, contradicting the constraint. To be fancy, we also compute the point at which the reference first borrows the object.

error_use_freed(var, ln, errpt, borpt) :- references(var, ln, errpt),
                        object_dies(ln, errpt),
                        references(var, ln, borpt),
                        borrows(var, _, borpt).

It is also an error to have distinct (mutable) references to the same object. I don't actually make a distinction between mutable and immutable borrows, so this error isn't true to life. It is just to show how it would work. Again, we're fancy and compute the reference already pointing to the object and where that reference borrowed from the object.

error_already_borrowed(ref1, ln, errpt, ref2, borpt) :- references(ref1, ln, errpt),
                                borrows(ref1, ln, errpt),
                                references(ref2, ln, errpt), ref2!=ref1,
                                borrows(ref2, ref3, borpt), ref2!=ref3,
                                references(ref2, ln, borpt).

Et voilà!

Error: 'string2' is dropped at L17, but 'result' still holds the loan it borrowed at L15.

And that’s pretty much it, modulo a few details you can find in the file lifetimes.dl. From here it’s not too hard to compute lifetime constraints, for example, lifetime(ref2)lifetime(ref1).

Mathematical Points

  • The body of a function completely determines the most general lifetime annotations of the function signature.
  • The lifetimes in a function signature affect the code at the call site but not the other way around. In other words, lifetimes are synthesized from the bottom up.
  • Lifetimes are types, and lifetime inference is exactly the same as type inference. To determine the most general lifetime annotations for a function signature, one just unifies the lifetimes of the return value and program arguments.
  • I suspect lifetime inference is complete and can be done in linear time.

To Do

The last three items below demonstrate how much the Rust compiler could do that it currently isn’t doing. The Rust compiler is far more restrictive than necessary.

  1. Make a distinction between mutable and immutable borrows.
  2. Better examples of multiple mutable borrows.
  3. Is it easy to hook this into a Rust parser? If so, do it.
  4. Generate inferred lifetimes.
  5. Allow a (mutable) borrow if the previous (immutable) borrow will no longer be used.
  6. Allow a borrow after a loan die if the borrow will no longer be used.

1: This is a convenient lie.

2: This is also a convenient lie.


License

Created by Robert Jacobson on 16 June 2019. I consider this too trivial to need a license, but I’m giving it the MIT License just in case someone needs one. The example Rust source code is from The Rust Programming Language, which is part of The Rust Project. The Rust Project is dual-licensed under Apache 2.0 and MIT terms.

Copyright (c) 2019 Robert Jacobson.

The MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

A lifetime inference algorithm for the Rust programming language written in Soufflé.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages