Starting to Corrode:
Pointers, Memory, Strings, and I/O


This section of the tutorial introduces one of the "rustiest" aspects of Rust: memory management.

Rust provides memory references in a way that is quite different from other languages you may be familiar with. It takes some effort to understand, but it has the big advantage of offering both explicit control and safety. We'll also cover strings and vectors, file and user I/O, and basic error handling in Rust. At the end of this section, you'll write a simple encryption program that employs many of the new concepts.

Memory Management

Boxes

The box in Rust is the fundamental abstraction of memory. References in Rust can point to boxes. Because Rust is designed to emphasize safety, any allocated memory in Rust is boxed, and thus any box can be thought of as simply a chunk of memory.

Pointer Types

Rust provides two main pointer types: the owned pointer (indicated by the tilde: ~ ) and the borrowed reference (denoted with ampersand: & ). Both kinds of pointers point to a box, and the pointer types are orthogonal to the type of data the box contains. For example, ~[&str] is an owned pointer to a vector of borrowed strings.

Dereferencing a Rust pointer is done with the star operator ( * ). The next section on ownership will go into more detail on the usage and interaction of both pointer types.

Note: Legacy Rust code (before 0.9) also has a "managed pointer", represented by @ , which was automatically managed (garbage collected). As of Rust 0.9 this syntax is deprecated, although it is planned to be reintroduced in future releases. Managed pointers made pointer management considerably simpler, but the other pointer types can be used more efficiently. Automatically managed pointers may be implemented as library types.

Ownership

In Rust, there is a notion of ownership of an object. The owner of an object, which could be a variable that refers to that object, manages the object's lifetime (that is, when its memory is allocated and reclaimed). Programmers do not have to explicitly allocate and deallocate storage. Rather, it is done by the Rust compiler and runtime based on how the object references are used.

Owned Boxes

A declaration of an owned ( ~ ) pointer can be thought of as giving the declared variable ownership of the box. What does this mean in code? The following block works as anticipated, printing out "10".

    let x = ~10;
    println!("{:d}", *x);

However the following block gives a compilation error:

    let x = ~10;
    let y = x;
    println!("{:d}", *x);

tut2.rs:3:23: 3:24 error: use of moved value: `x`
tut2.rs:3     println!("{:d}", *x);

The error message is a bit unclear, but what it is reporting is a violation of ownership rules. When we create the owned box of ~10 , that box is owned by the variable x . The initialization let y = x; moves the pointer to y . Because of Rust's emphasis on safety, using x after that point is not allowed: it would create two references to an owned box.

In other languages, such as C and Java, there are no restrictions on pointer sharing. We could do, y = x; *y = 3; thus changing the value of *x through the alias. With Rust, the assignment, y = x transfers ownership of the box x refers to from the x reference to the y reference. From this point on, any attempts to use x to manipulate that box will result in a compiler error.

We can make a copy of a box using the clone method which copies over the content of a box and creates a new owned pointer to the copy. So the following code will assign y to a new owned pointer pointing to a new copy of the value "10". The dereferencing of x no longer causes a compiler error, as x still has proper ownership of its box.

    let x = ~10;
    let y = x.clone();
    println!("{:d}", *x);

However, x and y now refer to different boxes, so a modification of *x will not be visible through y .
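
For readers on a current Rust toolchain, the move and clone behaviour described above can be sketched as follows. This assumes the modern spelling Box<T> for the owned pointer ~T and {} in place of {:d} , neither of which appears in this tutorial; move_and_clone is a hypothetical helper name.

```rust
// Sketch in current Rust: Box<i32> plays the role of the tutorial's ~int.
// `move_and_clone` is a hypothetical name used only for illustration.
fn move_and_clone() -> (i32, i32) {
    let x = Box::new(10);
    let mut y = x.clone(); // y owns a separate box holding a copy of 10
    *y = 3;                // modifies only y's box; x's box still holds 10
    let z = x;             // ownership of the first box moves from x to z
    // Using `x` from here on would be rejected at compile time (moved value).
    (*z, *y)
}

fn main() {
    let (original, copy) = move_and_clone();
    println!("{} {}", original, copy); // prints "10 3"
}
```

The clone leaves the original box untouched, while the plain assignment to z ends x's ownership.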

Borrowed References

All that copying would be expensive, and eliminate the benefits of data sharing for uses like pass-by-reference and data structures. Rust's solution is to provide a way for owners to loan out their objects.

Using the & operator creates a temporary reference to some memory that has already been allocated. These references are referred to as "borrowed" because they can access the contents of the box during the borrowing period, but do not own the box. This is most commonly used to pass a reference to a function.

Because of Rust's emphasis on safety, there are many restrictions on what can be done with a reference. Most notably, if a pointer has a borrowed reference in scope (more on scope in the next section), any function which would free the memory or change the type being pointed to would cause a compile-time error. Think of it this way: creating a reference to a box "lends" that box's contents to the reference until the loaned box is "returned". Thus, a reference owner gives up full control over a loaned box while it is being borrowed.

Borrowed references are frequently used as function parameters. For example,

fn borrow(r : &int) -> int {
    *r
}

declares the function borrow to take a borrowed pointer to an int. We can call it by passing in an owned pointer:

    let x = ~10;
    println!("borrow(x): {:d}", borrow(x));

Borrowed pointers can also be loaned by the borrowee, within the lifetime of the original loan. We can pass in an owned pointer to borrow2, which borrows that pointer and passes it as a borrowed pointer to borrow:

fn borrow2(r : &int) -> int {
   borrow(r)
}
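
A minimal sketch of the same two functions in current Rust is given here, assuming i32 in place of int and Box<i32> in place of ~int (both are translations, not part of this tutorial). One difference worth noting: current Rust expects the loan to be written explicitly with & at the call site.

```rust
// Sketch in current Rust: &i32 is a borrowed reference, Box<i32> an owned box.
fn borrow(r: &i32) -> i32 {
    *r
}

// The borrower may lend the reference onward within the original loan.
fn borrow2(r: &i32) -> i32 {
    borrow(r)
}

fn main() {
    let x = Box::new(10);
    // &x has type &Box<i32>; deref coercion turns it into &i32 at the call.
    println!("borrow(&x): {}", borrow(&x));
    println!("borrow2(&x): {}", borrow2(&x));
    // x still owns the box once both loans have ended.
    println!("x: {}", *x);
}
```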

Pointer Mutability

As with scalar variables, boxes are immutable by default. The mut qualifier is used to indicate that a box is mutable. Mutability applies to the contents of a box, and not the pointer itself.

For example,

fn increment(r: &int) {
    *r = *r + 1;
}

is not permitted:


borrowed.rs:2:4: 2:6 error: cannot assign to immutable dereference of & pointer
borrowed.rs:2     *r = *r + 1;
                   ^~

To allow the modification, we need to declare the parameter as a borrowed, mutable reference: r: &mut int . Now the compiler produces an error at the call site, since the passed object is not mutable:

borrowed.rs:16:14: 16:15 error: cannot borrow immutable dereference of ~ pointer as mutable
borrowed.rs:16     increment(x);

To change this, we need to add mut to the declaration:

fn main() {
    let mut x = ~10;
    increment(x);
}

The mut here applies to the reference x and is also inherited by the box (so no separate mut annotation is needed for the ~ ). It allows the variable x to be reassigned, and also allows the contents of the owned box to be modified:

    let mut x = ~10;
    x = ~20;
    *x = 30;

The first assignment statement creates a new box owned by x holding the value 20 . The second assignment statement modifies the value in that box to be 30 . Without the mut neither assignment would be permitted by the Rust compiler.

Borrowed references do not inherit mutability. For example, this code is invalid:

    let mut val1 = 10;
    let mut val2 = 20;
    let mut borrowed = &val1;
    borrowed = &val2;
    *borrowed = 11;


borrowed.rs:23:4: 23:13 error: cannot assign to immutable dereference of & pointer
borrowed.rs:23     *borrowed = 11;
                   ^~~~~~~~~

To allow both modifications, we need:

    let mut borrowed = &mut val1;

which allows both modifying the referenced box and changing what borrowed references.
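
The mutability rules in this section can be sketched in current Rust as follows. This is a translation under assumptions: i32 stands in for int , Box<i32> for ~int , and the mutable loan is written explicitly as &mut at the call site.

```rust
// Sketch in current Rust: a mutable borrowed reference allows modification.
fn increment(r: &mut i32) {
    *r += 1;
}

fn main() {
    // `mut` on x covers both reassigning x and modifying the box contents.
    let mut x = Box::new(10);
    increment(&mut x); // &mut Box<i32> coerces to &mut i32
    println!("{}", *x); // prints "11"
    x = Box::new(20);
    *x = 30;
    println!("{}", *x); // prints "30"

    // `mut borrowed` lets the reference be re-pointed;
    // `&mut` lets the referenced value be modified.
    let mut val1 = 10;
    let mut val2 = 20;
    let mut borrowed = &mut val1;
    *borrowed = 11;
    borrowed = &mut val2;
    *borrowed = 21;
    println!("{} {}", val1, val2); // prints "11 21"
}
```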

Keep in mind that references are type checked (yet another safety measure of Rust!) so the following code produces an error at compile time due to trying to reassign an int reference to a float reference.

    let mut val1 = 10;
    let mut val3 = 10.0;
    let mut borrowed = &val1;
    borrowed = &val3;

tut2.rs:4:16: 6:21 error: mismatched types: expected `&<VI0>` but found `&<VF0>` (expected integral variable but found floating-point variable)
tut2.rs:4     borrowed = &val3;

Freezing Objects

When an object is loaned through a borrowed reference, the lender loses the ability to modify that object. This is especially important for multi-tasking (which we'll get to soon), but still relevant with a single thread.

For example, the following program produces a compiler error:

    let mut x = 10;
    {
        let y = &x;
        x = 11; // Error
    }
    x = 12; //This is fine

example.rs:4:9: 4:10 error: cannot assign to `x` because it is borrowed
example.rs:4         x = 11;

When the reference y borrows the value x refers to, the value of x is frozen until the reference to it goes out of scope. Thus, the first reassignment here is invalid, but the second (after y is out of scope) is fine.
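
A minimal sketch of freezing in current Rust follows; freeze_demo is a hypothetical name. One assumption to flag: the current compiler ends a loan at the last use of the reference rather than at the end of its scope, so this sketch reads y after the point where the assignment would occur to keep the loan alive.

```rust
// Sketch in current Rust; `freeze_demo` is a hypothetical helper name.
fn freeze_demo() -> (i32, i32) {
    let mut x = 10;
    let seen;
    {
        let y = &x;
        // x = 11; // rejected if uncommented: x is borrowed and y is used below
        seen = *y;
    }
    x = 12; // fine: the loan to y has ended
    (seen, x)
}

fn main() {
    let (seen, x) = freeze_demo();
    println!("{} {}", seen, x); // prints "10 12"
}
```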

Lifetimes

Memory allocated in Rust is automatically freed when its owner either goes out of scope or is reassigned. This eliminates the need to explicitly free storage (as in C or C++), and guarantees safety without giving up control to a garbage collector.

Owned pointers follow the same deallocation rules, so borrowing a reference to a box and trying to use it after the box has gone out of scope produces a compiler error.

    let mut reference: &~int;
    {
        let val: ~int = ~10;
        reference = &val;
    } //val deallocated here
    println!("{:d}", **reference); //Referencing something that's gone!

example.rs:4:21: 4:25 error: borrowed value does not live long enough
example.rs:4         reference = &val;

Vectors and Strings

Rust provides built-in vector and string types, consistent with Rust's focus on safety.

Vectors

A vector is defined by using comma-separated values within brackets.

A specific element can be accessed via foo[index] . Any vector definition where the elements may change after the initial definition should be an owned box, so the methods of std::vec::OwnedVector may be used to mutate the vector.

These examples show simple vector functions:

    let mut vec = ~[0, 1, 2];
    vec.push(3); // Appends to end: [0, 1, 2, 3]

    vec.insert(2, 10); 
    // Inserts 10 at position 2: [0, 1, 10, 2, 3]

    let last = vec.pop(); 
    // Returns last element, removing it from the vector: [0, 1, 10, 2]

    let element1 = vec.remove(1); 
    // Returns and removes element at specified index: [0, 10, 2]

    // An iterator for going through all elements in order: 
    for &x in vec.iter() { // note the use of & for borrowing
        println!("{}", x);
    }
    
    // len() returns the number of elements in the vector
    for i in range(0, vec.len()) { 
        println!("{}", vec[i]);
    }
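
The same operations can be sketched in current Rust, assuming Vec<i32> and the vec! macro as the modern counterparts of ~[int] (a translation, not part of this tutorial). In current Rust pop returns an Option , since the vector may be empty; vector_demo is a hypothetical name.

```rust
// Sketch in current Rust; `vector_demo` is a hypothetical helper name.
fn vector_demo() -> (Vec<i32>, Option<i32>, i32) {
    let mut vec = vec![0, 1, 2];
    vec.push(3);                 // [0, 1, 2, 3]
    vec.insert(2, 10);           // [0, 1, 10, 2, 3]
    let last = vec.pop();        // Some(3); vec is now [0, 1, 10, 2]
    let removed = vec.remove(1); // 1; vec is now [0, 10, 2]
    (vec, last, removed)
}

fn main() {
    let (vec, last, removed) = vector_demo();
    println!("{:?} {:?} {}", vec, last, removed);

    // Iterating by borrowed reference, in order.
    for x in vec.iter() {
        println!("{}", x);
    }

    // Indexing with len(); a range is written 0..n in current Rust.
    for i in 0..vec.len() {
        println!("{}", vec[i]);
    }
}
```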

Strings

Rust supports a string type, which is a vector of bytes holding UTF-8 text. Each character is a UTF-8 sequence of one or more bytes, and each byte is represented by the u8 type in Rust.

A notable construct that is very important for strings is the slice . This represents a view into a string, but not a string itself. Its type is &str . Methods that produce a slice include slice(&self, begin: int, end: int) , which returns a slice of characters between begin and end - 1, slice_from(&self, begin: int) , which returns a slice starting at the index provided and continuing to the end, and slice_to(&self, end: int) , which returns a slice starting at the beginning and ending at index end - 1.

These slice methods produce immutable values, so they cannot be modified.

To do something such as taking a substring, a slice may be converted to an owned string via the to_owned(&self) method. These slice methods work on any vector, but string manipulation is the most common use of slices.

Other notable string methods include str::eq(&~str, &~str) which checks two strings for bytewise equality, and split(&self, char) which returns an iterator that splits the string the method is called on into slices delimited by the parameter character.

For string concatenation, the + operator is used.

These methods are used in the example below:

    let string = ~"This is a string";
    let subst1 = string.slice(10, 16).to_owned();
    let subst2 = string.slice_from(10).to_owned();
    println!("{}", std::str::eq(&subst1, &subst2));
    let doublesub = subst1 + subst2;
    println!("{}", doublesub);
    // Using split iterator to print word by word
    for tempstr in string.split(' ') {
        println!("{}", tempstr);
    }
    // Another useful function of the split iterator
    // collect() Creates a vector ~["This", "is", "a", "string"]
    let wordvec: ~[&str] = string.split(' ').collect(); 
    for &s in wordvec.iter() {
        println!("{}", s);
    }
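
A sketch of the same string operations in current Rust follows. It assumes String in place of ~str and range indexing ( string[10..16] ) in place of the slice methods; both are translations rather than part of this tutorial, and string_demo is a hypothetical name. The indices are byte offsets, so they must fall on character boundaries.

```rust
// Sketch in current Rust; `string_demo` is a hypothetical helper name.
fn string_demo() -> (String, Vec<String>) {
    let string = String::from("This is a string");

    // Slices are views (&str); to_owned() copies them into owned Strings.
    let subst1 = string[10..16].to_owned(); // like slice(10, 16)
    let subst2 = string[10..].to_owned();   // like slice_from(10)
    assert!(subst1 == subst2);              // bytewise equality

    // Concatenation takes the left side by value and the right by reference.
    let doublesub = subst1 + &subst2;

    // split() yields slices; here each one is copied into an owned String.
    let words: Vec<String> = string.split(' ').map(|s| s.to_owned()).collect();
    (doublesub, words)
}

fn main() {
    let (doublesub, words) = string_demo();
    println!("{}", doublesub); // prints "stringstring"
    for word in words.iter() {
        println!("{}", word);
    }
}
```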

Exercises

Exercise 2.1. Implement a function, increment that takes as input a vector of integers and returns a new vector of integers that has the values of the original list each incremented by one. For example:

fn main() {
   let p = ~[1, 2, 3];
   let q = increment(p);
   for &x in q.iter() {
      print!("{:d} ", x);
   }
}

should print out 2 3 4.

Exercise 2.2. Implement a function, incrementMut that takes as input a vector of integers and modifies the values of the original list by incrementing each value by one. For example:

fn main() {
   let mut p = ~[1, 2, 3];
   incrementMut(p);
   for &x in p.iter() {
      print!("{:d} ", x);
   }
}

should print out 2 3 4.

Basic I/O

Rust handles file and user input/output using the std::io module.

The most intuitive way to work with these is through the use of a BufferedReader, which will be explained more in depth in the following sections. Take note that many methods in BufferedReader return vectors of bytes, not strings, so appropriate conversions may be necessary.

Working with Standard Input (stdin)

Standard input is provided through std::io::stdin , and is read by wrapping it in a buffered reader. The following example (taken from the Rust documentation) prints out a series of lines from standard input, automatically stopping at the end of input. The lines method returns an iterator that goes through each line until the end of input.

use std::io::buffered::BufferedReader;
use std::io::stdin;

fn main(){
    let mut stdin = BufferedReader::new(stdin());
    for line in stdin.lines() {
        print(line);
    }
}

Reading line by line is a bit more difficult, as the read_line method returns an Option<~str> . Option types are explained in more depth in the "Error Handling" section later on this page of the tutorial. The gist is this: a call to read_line can either succeed or fail, so we need a match statement to determine whether it succeeded (returning Some(line) in the following example) or failed (returning None ). The example code below reads the next 5 lines from stdin and prints them. If there are fewer than 5 lines, the loop ends without crashing the program.

use std::io::buffered::BufferedReader;
use std::io::stdin;
fn main(){
    let mut stdin = BufferedReader::new(stdin());
    for i in range(0, 5) {
        match stdin.read_line() {
            Some(line) => {
                print(line);
            }
            None => {
                println("End of input!");
                break;
            }
        }
    }
}
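
For comparison, reading at most five lines can be sketched in current Rust as follows. This assumes the std::io::BufRead trait, which is not part of the Rust 0.9 API used in this tutorial; first_lines is a hypothetical helper, written against any buffered reader so that it can be exercised without a terminal.

```rust
use std::io::{self, BufRead};

// Sketch in current Rust; `first_lines` is a hypothetical helper name.
// Collects up to `n` lines, stopping early at end of input or on a read error.
fn first_lines<R: BufRead>(reader: R, n: usize) -> Vec<String> {
    let mut out = Vec::new();
    for line in reader.lines().take(n) {
        match line {
            Ok(text) => out.push(text), // the trailing newline is already stripped
            Err(_) => break,
        }
    }
    out
}

fn main() {
    let stdin = io::stdin();
    for line in first_lines(stdin.lock(), 5) {
        println!("{}", line);
    }
}
```

In this version each line arrives as a Result rather than an Option , and end of input simply ends the iterator.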

Files

The standard library functions for working with files in Rust are handled via the std::io::File module. This module contains methods for opening and writing to a file.

A file is opened using Path objects, which are included in any Rust program by default. Creating a Path object is done through Path::new("path/to/file.foo") .

To simply open a new file for writing, File::create(&Path) is the simplest call. This returns an Option type, so it must be matched to ensure that a file was successfully opened.

The two most common ways to write to the file once opened are:
File::write(&mut self, &[u8]) which writes a buffer of bytes to a file, and:
File::write_str(&mut self, &str) which writes a string to the file.

This example demonstrates opening a file for writing and using both of these methods:

use std::io::File;

fn main()
{
    match File::create(&Path::new("message.txt")) {
        Some(mut file) => {
            file.write(bytes!("line one\n"));
            file.write_str("line two\n");
        }
        None =>{
            println("Opening message.txt failed!");
        }
    }
}

Reading a file can be accomplished easily using a BufferedReader, much like reading from standard input.

Opening a file to a BufferedReader is done using the File::open(&Path) function, which opens the file at a given path in read-only mode, followed by BufferedReader::new(File) .

Note that the file needs to be matched before converting to a BufferedReader , as File::open returns an Option<File> type. Once converted to a BufferedReader , a file is read in an identical manner to reading from stdin in the previous example, and all relevant methods are outlined on the BufferedReader documentation.

Here's a simple example of reading a file:

use std::io::buffered::BufferedReader;
use std::io::File;

fn main()
{
    match File::open(&Path::new("message.txt")) {
        Some(file) => {
            let reader = BufferedReader::new(file);
            //reading from file
        }
        None =>{
            println("Opening message.txt failed!");
        }
    }
}
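
Writing a file and reading it back can be sketched in current Rust as follows. The assumptions here go beyond this tutorial: files live in std::fs , the calls return Result rather than Option , and the ? operator passes an error back to the caller. write_message and read_lines are hypothetical helper names, and the file is placed in the system temporary directory for illustration.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

// Sketch in current Rust; both helper names are hypothetical.
fn write_message(path: &Path) -> io::Result<()> {
    let mut file = File::create(path)?;       // creates or truncates the file
    file.write_all(b"line one\n")?;           // a buffer of bytes
    file.write_all("line two\n".as_bytes())?; // a string, written as bytes
    Ok(())
}

fn read_lines(path: &Path) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?); // opened read-only
    reader.lines().collect() // stops at the first read error, if any
}

fn main() {
    let path = std::env::temp_dir().join("message.txt");
    match write_message(&path).and_then(|_| read_lines(&path)) {
        Ok(lines) => println!("{:?}", lines),
        Err(e) => println!("File operation failed: {}", e),
    }
}
```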

The File::open_mode(&Path, FileMode, FileAccess) method allows specifying the mode in which a file is accessed. FileMode and FileAccess are enumerated types imported from std::io::FileMode and std::io::FileAccess respectively. For example, to open a file with read and write access, positioned to append to the end of the file, import the Append and ReadWrite variants and pass them to open_mode :

use std::io::{File, Append, ReadWrite};
/*
...
*/
let file = File::open_mode(&Path::new("message.txt"), Append, ReadWrite);

Like the other file methods, this returns an Option object that must be matched to check for opening failures.

Error Handling

Error handling in Rust is designed to catch as many errors as possible at compile time, and so reduce runtime errors. We've seen the Option<T> type used for functions which either create an object or fail (represented by None ). A similar construct, Result<T,E> , is like an Option , but reports whether a function succeeds as intended on a pre-existing object (as opposed to the creation of a new object with Option ).

A Result requires a match statement to use the values obtained. The Ok(T) branch represents a successful call, whereas Err(E) represents a failed call. The error result may provide specific information about the failure, depending on the function.

If a None or Err(E) branch is reached and the program can no longer continue, the fail!() macro can be used to terminate the program with an optional message.

The following code terminates with a message if "foo.txt" cannot be created:

match File::create(&Path::new("foo.txt")) {
    Some(mut file) => { /* ... */ }
    None => {
      fail!("Creating foo.txt failed!");
    }
}
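
To show Option and Result side by side, here is a small sketch in current Rust. Both functions are hypothetical and exist only for illustration. It also assumes the modern panic! macro, which plays the role fail! plays in this tutorial.

```rust
// Sketch in current Rust; `first_byte` and `common_length` are hypothetical.

// Option: there either is a first byte or there is not.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    bytes.first().copied()
}

// Result: the Err branch carries information about what went wrong.
fn common_length(a: &[u8], b: &[u8]) -> Result<usize, String> {
    if a.len() == b.len() {
        Ok(a.len())
    } else {
        Err(format!("length mismatch: {} vs {}", a.len(), b.len()))
    }
}

fn main() {
    match first_byte(&[]) {
        Some(b) => println!("first byte: {}", b),
        None => println!("no bytes"),
    }
    match common_length(&[1, 2, 3], &[4, 5, 6]) {
        Ok(n) => println!("both inputs have {} bytes", n),
        // Terminates the program with a message, as fail! does in this tutorial.
        Err(msg) => panic!("{}", msg),
    }
}
```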

Example: Secret Sharing

The XOR operation is a simple, but effective method for hiding meaning. It's derived from the identity A = (A ^ B) ^ B. XOR-ing a message with a random key provides perfect encryption, known as a one-time pad. It is essentially the only form of encryption that is information-theoretically secure, but is impractical for most purposes since it requires a perfectly-random key as long as the message and that key can never be reused.

It works by taking in a message and a random bit sequence (key) of equal length. Each bit in the message is XORed with the corresponding bit in the key to produce the encrypted message. The original message can then be recovered by applying the XOR process to the key and the encrypted message.
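
The round trip can be checked with a short sketch in current Rust, assuming Vec<u8> in place of ~[u8] . The key below is a fixed, made-up byte pattern chosen only so the example is repeatable; a real one-time pad needs random key bytes.

```rust
// Sketch in current Rust: XOR two byte sequences element by element.
// zip() stops at the shorter input, so equal lengths are assumed.
fn xor(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b.iter()).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let message = b"attack at dawn";
    // Illustrative, non-random key of the same length as the message.
    let key: Vec<u8> = (0..message.len())
        .map(|i| (i as u8).wrapping_mul(37).wrapping_add(11))
        .collect();

    let encrypted = xor(message, &key);
    let recovered = xor(&encrypted, &key); // (A ^ B) ^ B == A

    assert_eq!(recovered, message.to_vec());
    println!("{}", String::from_utf8_lossy(&recovered));
}
```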

For this exercise, we will use this strategy to convert a plaintext file into two secret shares, each of which discloses no information about the original file (other than its length), but if someone has both shares they can put them together to produce the original file. A paranoid individual might use such an approach to store a file in the cloud by storing one share using Dropbox and the other share using Google Docs. The user could acquire both shares to obtain the file, and if the NSA can hypothetically (of course) obtain all of the data from one of the services but not both of them, it would not be enough to learn the contents of the file.

Note that our implementation uses std::rand::random to generate the random key. This is not cryptographically strong randomness, so should be used for entertainment purposes only! (If you are interested in better random numbers, see tRustees: True Random Number Generation.)

We'll provide the code to do the splitting, and leave it as an exercise for you to write the joining code.

Splitter

The following Rust code implements the splitter. It takes as input the name of a file, and writes out two share files.

use std::rand::random;
use std::os;
use std::io::File;

fn main() {
    let args: ~[~str] = os::args();
    if args.len() != 2 {
        println!("Usage: {:s} <inputfile>", args[0]); 
    } else {
        let fname = args[1];
        let path = Path::new(fname.clone());
        let msg_file = File::open(&path);

        match (msg_file) {
            Some(mut msg) => {
                let msg_bytes: ~[u8] = msg.read_to_end();
                let share1_file 
                       = File::create(&Path::new(fname + ".share1"));
                let share2_file 
                       = File::create(&Path::new(fname + ".share2"));
                
                match (share1_file, share2_file) {
                    (Some(share1), Some(share2)) => { 
                        split(msg_bytes, share1, share2); 
                        } ,
                    (_, _) => fail!("Error opening output files!"),
                }
            } ,
            None => fail!("Error opening message file: {:s}", fname)
        }
    }
}

fn xor(a: &[u8], b: &[u8]) -> ~[u8] {
    let mut ret = ~[];
    for i in range(0, a.len()) {
        ret.push(a[i] ^ b[i]);
    }
    ret
}

fn split(msg_bytes: &[u8], mut share1: File, mut share2: File) {
    let mut random_bytes: ~[u8] = ~[];
    // This is not cryptographically strong randomness! 
    // (For entertainment purposes only.)
    for _ in range(0, msg_bytes.len()) {
        let random_byte = random();
        random_bytes.push(random_byte);
    }
    
    let encrypted_bytes = xor(msg_bytes, random_bytes);
    share1.write(random_bytes);
    share2.write(encrypted_bytes);
}

Joiner

The joiner reverses the splitting process, taking the two shares and combining them to produce the original message.

Exercise 2.3. Implement the joiner. It should take two file names as its inputs, and output to standard output the result of XOR-ing the bytes in those files. The input files must be the same length.

Your joiner should be able to produce the plaintext from the example files: msg.share1 and msg.share2. You should download these files (look at them to confirm they appear to be random bytes).

Then, try executing:


joiner msg.share1 msg.share2

If you implemented the joiner correctly, you should see a (somewhat) meaningful message.


This tutorial was created by Alex Lamana, Rob Michaels, Wil Thomason, and David Evans for University of Virginia cs4414 Spring 2014.