Rust Borrowing, Lifetimes, and Ownership

Introduction

References, lifetimes, borrowing, ownership, the borrow checker: it can all be very confusing for someone new to Rust, especially without experience in C, C++, or another low-level systems language. This guide does not try to replace the Rust Book; it adds supplementary material to help you understand what is really going on and to reinforce what you learned from the book.

If you are coming to Rust without C/C++ (or other low-level language) experience, Rust's borrow checker will kick your a$$. So let's give you a fighting chance.

Terms:

First let's define a few essential terms.

String Literal - A string whose bytes are compiled into the executable itself, so it lives inside the program rather than on the heap like most strings. It is typically placed in a read-only data section (such as .rodata on ELF targets), if you are familiar with assembly. When your program runs, the executable is loaded into memory, and the string literal gets an address that refers to the part of the loaded program containing its bytes.
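
As a small sketch of what this means in practice, a string literal has the type &'static str: the reference stays valid for the whole run of the program because the bytes are part of the program itself. The function name greeting is made up for this example.

```rust
// A string literal is baked into the executable, so a reference to it
// is valid for the entire run of the program (the 'static lifetime).
fn greeting() -> &'static str {
    "hello" // no heap allocation happens here
}

fn main() {
    let s = greeting();
    println!("{} ({} bytes)", s, s.len());
}
```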

Reference - A reference can point to data stored on the heap, on the stack, or in the program itself (string and byte literals); in every case it holds an address. A reference is more than an address, though: the compiler checks that whenever the reference is used, the data it points to is still valid, and rejects the program with a compile error otherwise.

Raw Pointer - Like a reference without the compiler's safety checks. It is just an address, nothing more. A raw pointer is similar to a pointer in C/C++. Raw pointers have the type *const T or *mut T; the asterisk is part of the type's name and does not dereference anything, it merely indicates a raw pointer.
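
The following is a minimal sketch of the difference between the two. The helper read_through_raw_pointer is a hypothetical name invented for illustration; the point is only that creating a raw pointer is safe, while reading through it needs an unsafe block because the compiler no longer vouches for the address.

```rust
fn read_through_raw_pointer(value: &i32) -> i32 {
    // Turning a reference into a raw pointer is safe: it is just an address.
    let ptr: *const i32 = value as *const i32;

    // Dereferencing needs `unsafe` because the compiler no longer checks
    // validity. It is sound here only because `value` is still borrowed
    // and therefore still alive.
    unsafe { *ptr }
}

fn main() {
    let n = 42;
    println!("{}", read_through_raw_pointer(&n));
}
```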

Memory

The Heap

The heap is an area of memory that is allocated in chunks as needed by programs. The heap stores strings, vectors, and other complex objects whose length is not known at compile time. The heap is a simple concept, but constant allocations and deallocations leave data scattered around, with holes of free space between allocations. To reserve an appropriately sized chunk of free space, the allocator runs an algorithm (which one varies between allocators) that can take a relatively long time. Because an allocation can be a complex and costly process, Rust only uses the heap when needed: for example, integers are stored on the stack while the contents of Strings are stored on the heap.

When a string or vector is full, it usually allocates a new, larger chunk of memory, moves its contents there, and frees the old chunk. This is why using with_capacity() is highly recommended when you can estimate the size. Otherwise the collection starts with a small capacity and each reallocation makes it larger, so adding a large number of items without specifying a capacity causes repeated reallocations.
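
One way to see this is to watch the capacity change while pushing items. The sketch below uses a hypothetical helper, count_capacity_changes, and assumes a vector of 1000 integers; the exact growth pattern of Vec::new() is an implementation detail, so only the with_capacity() case has a guaranteed count.

```rust
// Counts how many times a Vec's buffer capacity changes while pushing `n` items.
fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last_capacity {
            // The buffer was (re)allocated to make room.
            changes += 1;
            last_capacity = v.capacity();
        }
    }
    changes
}

fn main() {
    let grown = count_capacity_changes(Vec::new(), 1000);
    let reserved = count_capacity_changes(Vec::with_capacity(1000), 1000);
    println!("Vec::new(): {} capacity changes", grown);
    println!("Vec::with_capacity(1000): {} capacity changes", reserved);
}
```

With the capacity reserved up front, the vector never has to grow while the 1000 items are added.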

Allocating on the heap is much slower than using the stack because the allocator has to find a free chunk of memory of the requested size (which can be relatively costly when many allocations are made). The term 'allocation' usually refers to heap allocation, not stack allocation.

Stack

The stack is an area of memory that behaves like the stack data structure. A stack is a last-in, first-out structure: if you push the characters 'a', 'b', and 'c' onto a stack and then pop() it, the returned value is 'c'; to remove 'a' you would first have to remove 'c' and 'b'.
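
The last-in, first-out behaviour can be sketched with a Vec used as a stack. This only illustrates the data structure, not the call stack itself, and drain_stack is a hypothetical helper name.

```rust
// Pops every item off a stack and returns them in the order they came off.
fn drain_stack(mut stack: Vec<char>) -> Vec<char> {
    let mut popped = Vec::new();
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}

fn main() {
    let mut stack = Vec::new();
    stack.push('a');
    stack.push('b');
    stack.push('c');
    // 'c' was pushed last, so it comes off first.
    println!("{:?}", drain_stack(stack)); // prints ['c', 'b', 'a']
}
```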

Any function with local variables places them on the stack. Primitive values such as integers live there entirely. For a Vec, String, Box, Rc, or Arc, only a small fixed-size handle (a pointer, plus bookkeeping such as length and capacity where applicable) lives on the stack, while the contents it points to are allocated on the heap.

Each function call will create a new entry, or chunk of memory, on the stack and when it is finished the entry will be removed. In Rust all programs start with a specific function main() (or another specified entry point) which will create an entry on the stack, and all function calls inside main() will create additional entries (which are removed when the function returns). When main() reaches the end it will remove its stack entry.

You can read from earlier stack entries, but you can only add or remove entries at the end. The compiler works out a memory layout for each function, which is used for that function's stack entry when the program executes. This is why items on the stack have fixed sizes, and why their sizes must be known in advance.

Strings and &str's

Strings

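A String is a heap-allocated, growable sequence of UTF-8 bytes. Internally the String object is just a vector of bytes (a Vec<u8>) that is guaranteed to hold valid UTF-8; it is not a vector of chars, since a Rust char is always four bytes wide. A String owns its contents.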
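
A quick way to check what a String actually stores is to compare its length in bytes with its number of chars. In this sketch build_name is a hypothetical helper, and the example text is an assumption chosen because it contains one non-ASCII character.

```rust
fn build_name() -> String {
    let mut name = String::from("caf"); // allocates a heap buffer
    name.push('\u{e9}'); // 'é' (U+00E9) occupies two bytes in UTF-8
    name
}

fn main() {
    let name = build_name();
    // len() counts bytes, chars().count() counts Unicode scalar values.
    println!("{}: {} bytes, {} chars", name, name.len(), name.chars().count());
}
```

The byte count is larger than the char count because the contents are stored as UTF-8, where a single character can take between one and four bytes.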

&strs

A &str is basically a reference to a sequence, or a subsequence, of UTF-8 text. It is a "fat" pointer made of two parts: a pointer to where the text starts, and a length in bytes. The bytes themselves are not stored in the &str; they live wherever the pointer points. Conceptually, a &str looks something like the following (a sketch only: the real type is built into the language, and StrRef is a made-up name):

use std::marker::PhantomData;

struct StrRef<'a> {
    ptr: *const u8,               // address of the first UTF-8 byte
    len: usize,                   // length in bytes, not in chars
    _borrow: PhantomData<&'a u8>, // ties the pointer to the lifetime of the borrowed data
}

Because the text it describes belongs to something else, a &str only borrows. This makes sense, because its contents refer to an area of memory that has already been allocated somewhere. It can point to part of a String (whose bytes are on the heap) or to a string literal stored in the program's binary (typically in a read-only data section).
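
The sketch below shows both sources of a &str side by side, and checks its size. The helper prefix is a hypothetical name; the size comparison reflects how references to str are laid out on current Rust targets (an address plus a length).

```rust
use std::mem::size_of;

// Borrows the first `n` bytes of any string data as a &str.
// Panics if `n` is out of range or does not fall on a char boundary.
fn prefix(text: &str, n: usize) -> &str {
    &text[..n]
}

fn main() {
    let owned = String::from("hello world"); // bytes live on the heap
    let literal = "hello world"; // bytes live in the program's binary

    // Both results have the same type even though the bytes live in different places.
    let from_heap: &str = prefix(owned.as_str(), 5);
    let from_binary: &str = prefix(literal, 5);
    println!("{} {}", from_heap, from_binary);

    // A &str carries an address plus a length, so it is two machine words wide.
    println!("&str: {} bytes, usize: {} bytes", size_of::<&str>(), size_of::<usize>());
}
```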

Str

A bare str, without the reference, cannot be stored in a variable on its own. This is because str is a dynamically sized type: it is the sequence of bytes itself, and its length is not known at compile time. Dynamically sized types include slices such as str and [T], as well as structs whose last field is one. Since the compiler needs a fixed size for anything it places on the stack, a str is always used behind some kind of pointer, most commonly a &str, which borrows existing data rather than owning it.

&String vs &str

A &str is a fat pointer: an address plus a length, pointing directly at a sequence of UTF-8 bytes. A &String is a reference to a String: a single address that points at the String object, which in turn holds the pointer, length, and capacity of its heap buffer. The difference is in the memory layout. A &String is one pointer wide and reaches the text through the String, whereas a &str is two words wide and points straight at the text.

Confused?

It's OK. The important thing to know is that &String and &str have different memory layouts, so they are not the same type. A &String converts to a &str automatically wherever a &str is expected (this is called deref coercion), but going from a &str to a String requires an explicit call such as to_owned(), which allocates and copies. It is not very important to know how a &str works internally; just know that it is a reference to a sequence of characters, whereas a &String is a reference to an owned* String (*: more on owned later).
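
In practice the two types meet most often at function boundaries. The sketch below, with a hypothetical function shout, shows that a function taking &str also accepts a borrowed String, because the compiler applies deref coercion from &String to &str; the opposite direction needs an explicit, allocating conversion.

```rust
// Accepting &str lets callers pass string literals, slices, or borrowed Strings.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned = String::from("hello");

    let by_coercion = shout(&owned); // &String is coerced to &str automatically
    let explicit = shout(owned.as_str()); // the same conversion written out
    let from_literal = shout("hello"); // a literal is already a &str
    println!("{} {} {}", by_coercion, explicit, from_literal);

    // Going the other way always allocates a new heap buffer.
    let copied: String = "hello".to_owned();
    println!("{}", copied);
}
```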

Memory Layout

In many systems the stack is located toward one end of the address space while the heap is located toward the other. Typically the stack sits near the high end and grows downward, while the heap sits lower and grows upward; however, this is platform dependent.

Ownership

Ownership is a simple concept really. Every value has exactly one owner, and a variable either owns its value or borrows one. Owned means it is not a reference; it owns its contents. Borrowed means it refers to data owned elsewhere, either in a read-only or a mutable manner. If the type has a & it is not owned. A function taking an owned String would have a definition like fn foo(a_name_here: String), while a function taking a reference to a String would look like fn foo(a_name_here: &String).

Given two variables, a and b, if you assign a to b with let b = a; the variable a is no longer valid: b was given ownership of what was formerly a's contents, and the compiler treats a as uninitialized.

If you pass an owned variable into a function, that variable can no longer be used outside the function.

So essentially, when you pass an owned variable (not a reference) into a function or assign it to another variable, the original variable is moved from, and every later attempt to use it results in a compile error.

There is an exception, however. For small, cheaply copied types (like integers, booleans, etc.) the new variable just receives a copy of the value, and the original stays usable. This is due to the Copy trait, which is implemented for primitive types. Strings do not implement Copy, and neither do structs unless you derive it for them (which is only possible when every field is Copy).

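let owned_name = String::from("Goliath"); // a bare "Goliath" would be a &str, which is Copy
let goliath = owned_name;
let error = owned_name; // error[E0382]: use of moved value: `owned_name`
// Ownership of owned_name was already passed to goliath
// so it cannot be used any longer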
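
For contrast, here is a sketch of the two usual ways to keep using the original variable: a type that implements Copy, and an explicit clone() for a type that does not. Point is a hypothetical struct that opts in to Copy with a derive, which is allowed because all of its fields are Copy.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn copy_then_reuse() -> (Point, Point) {
    let a = Point { x: 1, y: 2 };
    let b = a; // Point is Copy, so this copies the value instead of moving it
    (a, b) // `a` is still usable here
}

fn clone_then_reuse() -> (String, String) {
    let a = String::from("Goliath");
    let b = a.clone(); // explicit copy: `b` gets its own heap buffer
    (a, b) // `a` was never moved, so it is still usable
}

fn main() {
    println!("{:?}", copy_then_reuse());
    println!("{:?}", clone_then_reuse());
}
```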

References and Lifetimes

References and lifetimes are related concepts. A lifetime is the portion of a program for which a given reference is valid. A lifetime cannot be extended and is determined in advance by the compiler. The only way to change one is to alter the flow of your program.

In Rust a reference can only refer to a variable that is still alive, or valid. Every reference has a lifetime, which can be given a name like 'a; however, Rust often lets you omit the name and fills one in behind the scenes (this is called lifetime elision). All functions that have not finished still have entries on the stack, so references to variables within those functions are still valid. So if you have variables defined in your main() and then call another function, that function may receive references to those variables in main(), as they are still valid.
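
As an illustration, the sketch below shows one function with a named lifetime and one where the name is omitted. The function names longest and first_word are hypothetical, chosen only for this example; both are called with references to variables that live in main().

```rust
// Named lifetime: the result borrows from `a` or `b`, so it may not
// outlive either of them.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Omitted lifetime: with a single reference parameter, the compiler infers
// that the result borrows from `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    // Both Strings are owned by main(), which has not finished yet,
    // so the functions below may safely receive references to them.
    let sentence = String::from("borrow checker");
    let other = String::from("lifetimes");
    println!("{}", longest(&sentence, &other));
    println!("{}", first_word(&sentence));
}
```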

To keep references valid you may need to reorganize your program, nesting function calls so that certain data stays alive until it is no longer needed. Since a lifetime is just a portion of the program, it can only be changed by reorganizing code. So if you want a &str to be valid for some later function, arrange the calls so that the data it borrows from stays alive on the stack and every later use of the reference happens further down the stack: later in the same function, or in a function call nested inside it.
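
A minimal sketch of this kind of nesting, assuming a small text-processing task invented for the example (analyze and count_long_words are hypothetical names). The owner sits in the outermost function, and every borrow happens in calls nested beneath it.

```rust
// Innermost call: borrows the words, which are still alive in `analyze`.
fn count_long_words(words: &[&str], min_len: usize) -> usize {
    words.iter().filter(|w| w.len() >= min_len).count()
}

// Middle call: `words` borrows from `text`, which is still alive in the caller.
fn analyze(text: &str) -> usize {
    let words: Vec<&str> = text.split_whitespace().collect();
    count_long_words(&words, 5)
}

fn main() {
    // `text` is owned here, so it outlives every call nested below.
    let text = String::from("the borrow checker tracks lifetimes");
    println!("{} long words", analyze(&text));
}
```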

Borrowing

Using references is called borrowing. When you pass a reference to a variable into a function or store it in another variable, you are borrowing its contents. Borrowing rules are enforced by a part of the compiler called the borrow checker.

The borrow checker ensures that all borrowing rules are followed and that all references are valid. Lifetimes exist only for this check: once the borrow checker has finished, they are thrown away and have no effect on the compiled program. The borrowing rules are:

  • A mutable reference requires that the object have no other references, mutable or read-only, at the same time.
  • An object can have any number of read-only references at the same time, as long as it has no mutable reference.

You can think of the borrowing rules as a read-write lock. You can have multiple readers and no writers, or only a single writer.
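
The rules can be sketched in a few lines. The function readers_then_writer is a hypothetical name; it first takes two read-only borrows, and only once they are no longer used does it take a single mutable borrow.

```rust
fn readers_then_writer() -> Vec<i32> {
    let mut values = vec![1, 2, 3];

    // Any number of read-only borrows may exist at the same time.
    let first = &values[0];
    let last = &values[2];
    let sum = *first + *last; // last use of `first` and `last`

    // The read-only borrows have ended, so one mutable borrow is now allowed.
    let writer = &mut values;
    writer.push(sum);

    // Using `first` at this point would be rejected by the borrow checker,
    // because it would overlap with the mutable borrow above.
    values
}

fn main() {
    println!("{:?}", readers_then_writer());
}
```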

Moves

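When ownership of data changes, for example by passing it into a function, the data is said to be moved. Conceptually, the value's bytes are copied to a new location, such as the stack entry of the function being called, and the old location may no longer be used. For a String, only the small handle (pointer, length, capacity) is moved this way; the heap buffer it points to stays where it is.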

A move can be demonstrated with the following:

fn main() {
  // take_owned_string() does not take a reference to a String;
  // if it did, it would no longer be taking an owned String
  fn take_owned_string(a: String) -> usize {
    a.len()
  }

  let owned_string = "Hello kitty.".to_owned();
  // to_owned() is just another way of creating a String from a &str;
  // you could just as easily use from() or into()

  let length = take_owned_string(owned_string);

  // The following line will produce a compile error
  // because owned_string is no longer valid:
  // the ownership of owned_string was passed
  // into take_owned_string()
  println!("The length of '{}' is {}", owned_string, length);

}
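
A related point worth checking is what physically moves. In the sketch below (buffer_address and move_keeps_buffer are hypothetical helper names), the address of the String's heap buffer is recorded before and after the String is moved into a function. Only the small handle is handed over; the text on the heap is not copied.

```rust
// Takes ownership of the String and reports where its heap buffer lives.
fn buffer_address(s: String) -> usize {
    s.as_ptr() as usize
}

fn move_keeps_buffer() -> bool {
    let owned = String::from("Hello kitty.");
    let before = owned.as_ptr() as usize; // heap buffer address before the move
    let after = buffer_address(owned); // `owned` is moved into the function
    before == after
}

fn main() {
    println!("same heap buffer after move: {}", move_keeps_buffer());
}
```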

Moves can be very useful. If you wish to destructure a data structure into separate variables, you can accomplish this with a function that takes the owned data and returns a tuple of its parts.

struct Employee {
  first_name: String,
  last_name: String,
  salary: f32,
  warnings: u8,
}

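// Note: destructure_employee() takes ownership of emp,
// so each field can be moved out into the returned tuple
fn destructure_employee(emp: Employee) -> (String, String, f32, u8) {
  (emp.first_name, emp.last_name, emp.salary, emp.warnings)
}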