trpl1 PDF
The Rust Programming Language
I Introduction
1 The Rust Programming Language
II Getting Started
1 Getting Started
2 Installing Rust
3 Hello, world!
4 Hello, Cargo!
5 Closing Thoughts
III Tutorial: Guessing Games
1 Guessing Game
2 Set up
3 Processing a Guess
4 Generating a secret number
5 Comparing guesses
6 Looping
7 Complete!
IV Syntax and Semantics
VI Appendix
1 Glossary
3 Bibliography
Part I
Introduction
Chapter 1
The Rust Programming Language
Welcome! This book will teach you about the Rust Programming Language. Rust is a systems programming language focused on three goals: safety, speed, and concurrency. It maintains these goals without having a garbage collector. That makes it useful for a number of use cases other languages aren't good at: embedding in other languages, programs with specific space and time requirements, and writing low-level code, like device drivers and operating systems. It improves on current languages targeting this space by having a number of compile-time safety checks that produce no runtime overhead, while eliminating all data races. Rust also aims to achieve 'zero-cost abstractions', even though some of these abstractions feel like those of a high-level language. Even so, Rust still allows precise control like a low-level language would.
“The Rust Programming Language” is split into chapters. This
introduction is the first. After this:
• Syntax and Semantics - Each bit of Rust, broken down into small
chunks.
Contributing
The source files from which this book is generated can be found on
GitHub.
Part II
Getting Started
Chapter 1
Getting Started
This first chapter of the book will get us going with Rust and its tooling. First, we'll install Rust. Then, the classic 'Hello World' program. Finally, we'll talk about Cargo, Rust's build system and package manager.
We'll be showing off a number of commands using a terminal, and those lines all start with $. You don't need to type in the $s; they are there to indicate the start of each command. We'll see many tutorials and examples around the web that follow this convention: $ for commands run as our regular user, and # for commands we should be running as an administrator.
Chapter 2
Installing Rust
The first step to using Rust is to install it. Generally speaking, you’ll
need an Internet connection to run the commands in this section, as
we’ll be downloading Rust from the Internet.
The Rust compiler runs on, and compiles to, a great number of platforms. It is best supported on Linux, Mac, and Windows, on the x86 and x86-64 CPU architectures. There are official builds of the Rust compiler and standard library for these platforms and more. For full details on Rust platform support, see the website.
Installing Rust
All you need to do on Unix systems like Linux and macOS is open a
terminal and type this:
Uninstalling
Uninstalling Rust is as easy as installing it:
Troubleshooting
If we’ve got Rust installed, we can open up a shell, and type this:
$ rustc --version
You should see the version number, commit hash, and commit date.
If you do, Rust has been installed successfully! Congrats!
If you don't, that probably means that the PATH environment variable doesn't include Cargo's binary directory, ~/.cargo/bin on Unix, or %USERPROFILE%\.cargo\bin on Windows. This is the directory where Rust development tools live, and most Rust developers keep it in their PATH environment variable, which makes it possible to run rustc on the command line. Due to differences in operating systems, command shells, and bugs in installation, you may need to restart your shell, log out of the system, or configure PATH manually as appropriate for your operating environment.
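If you're unsure whether Cargo's binary directory made it onto your PATH, a small Rust program can check for you. This is a minimal sketch that assumes the default .cargo/bin location described above. The helper name has_cargo_bin is hypothetical. It relies on std::env::split_paths, which splits a PATH-style value using the platform's separator (: on Unix, ; on Windows).

```rust
use std::env;
use std::ffi::OsStr;
use std::path::Path;

// Hypothetical helper: returns true if any entry in a PATH-style value
// ends with the `.cargo/bin` directory components.
fn has_cargo_bin(path_var: &OsStr) -> bool {
    let cargo_bin = Path::new(".cargo").join("bin");
    env::split_paths(path_var).any(|dir| dir.ends_with(&cargo_bin))
}

fn main() {
    // `var_os` returns None if PATH is not set at all.
    match env::var_os("PATH") {
        Some(path) if has_cargo_bin(&path) => println!("Cargo's bin directory is on your PATH."),
        Some(_) => println!("Cargo's bin directory was not found on your PATH."),
        None => println!("PATH is not set."),
    }
}
```

Path::ends_with compares whole path components rather than raw text, so a directory such as /opt/my.cargo/bin does not count as a match.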
Rust does not do its own linking, so you'll need to have a linker installed. How you install one depends on your specific system. For Linux-based systems, Rust will attempt to call cc for linking. On windows-msvc (Rust built on Windows with Microsoft Visual Studio), this depends on having Microsoft Visual C++ Build Tools installed. These do not need to be in %PATH%, as rustc will find them automatically. In general, if you have your linker in a non-traditional location, you can call rustc -C linker=/path/to/cc, where /path/to/cc should point to your linker.
If you are still stuck, there are a number of places where we can get
help. The easiest is the #rust-beginners IRC channel on irc.mozilla.org
and for general discussion the #rust IRC channel on irc.mozilla.org,
which we can access through Mibbit. Then we’ll be chatting with other
Rustaceans (a silly nickname we call ourselves) who can help us out.
Other great resources include the user’s forum and Stack Overflow.
This installer also installs a copy of the documentation locally, so
we can read it offline. It’s only a rustup doc away!
Chapter 3
Hello, world!
Now that you have Rust installed, we’ll help you write your first Rust
program. It’s traditional when learning a new language to write a little
program to print the text “Hello, world!” to the screen, and in this
section, we’ll follow that tradition.
The nice thing about starting with such a simple program is that
you can quickly verify that your compiler is installed, and that it’s
working properly. Printing information to the screen is also a pretty
common thing to do, so practicing it early on is good.
Open a terminal and enter the following commands to make a directory for this particular project:
$ mkdir ~/projects
$ cd ~/projects
$ mkdir hello_world
$ cd hello_world
fn main() {
    println!("Hello, world!");
}
$ rustc main.rs
$ ./main
Hello, world!
fn main() {
These lines define a function in Rust. The main function is special: it’s
the beginning of every Rust program. The first line says, “I’m declaring
a function named main that takes no arguments and returns nothing.”
If there were arguments, they would go inside the parentheses (( and )). Because we aren't returning anything from this function, we can omit the return type entirely.
Also note that the function body is wrapped in curly braces ({ and
}). Rust requires these around all function bodies. It’s considered good
style to put the opening curly brace on the same line as the function
declaration, with one space in between.
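For comparison, here is a sketch of a function that does take arguments and return a value. The name add is only an illustration. Parameters go inside the parentheses, each with its type, and the return type follows ->.

```rust
// Takes two 32-bit integers and returns their sum.
// The return type comes after `->`; `main` omits it because it returns nothing.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    println!("2 + 3 = {}", add(2, 3)); // prints "2 + 3 = 5"
}
```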
Inside the main() function:
    println!("Hello, world!");
This line does all of the work in this little program: it prints text to
the screen. There are a number of details that are important here. The
first is that it’s indented with four spaces, not tabs.
The second important part is the println!() line. This is calling a
Rust macro, which is how metaprogramming is done in Rust. If it were
calling a function instead, it would look like this: println() (without
the !). We’ll discuss Rust macros in more detail later, but for now you
only need to know that when you see a ! that means that you’re calling
a macro instead of a normal function.
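To see both kinds of call side by side, here is a sketch with a hypothetical greet function. It is a normal function, so it is called without a !. Inside it, the format! macro is called with a !. format! takes the same kind of arguments as println!, but it returns a String instead of printing.

```rust
// A normal function: called as `greet(...)`, with no `!`.
fn greet(name: &str) -> String {
    // `format!` is a macro, like `println!`, so it is called with a `!`.
    format!("Hello, {}!", name)
}

fn main() {
    println!("{}", greet("world")); // prints "Hello, world!"
}
```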
Next is "Hello, world!", which is a string. Strings are a surprisingly complicated topic in a systems programming language, and this is a statically allocated string. We pass this string as an argument to println!, which prints the string to the screen. Easy enough!
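A string literal like "Hello, world!" has the type &'static str. It is a reference to text stored in the compiled program, and that text lives for the program's whole run. Here is a minimal sketch contrasting it with the growable String type that the guessing game uses later. The function name greeting is illustrative.

```rust
// String literals are `&'static str`: the text is stored in the program's binary.
fn greeting() -> &'static str {
    "Hello, world!"
}

fn main() {
    let literal: &'static str = greeting();
    // `String` is heap-allocated and growable; `to_string` copies the text.
    let mut owned: String = literal.to_string();
    owned.push_str(" Again!");
    println!("{}", literal); // prints "Hello, world!"
    println!("{}", owned); // prints "Hello, world! Again!"
}
```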
The line ends with a semicolon (;). Rust is an expression-oriented
language, which means that most things are expressions, rather than
statements. The ; indicates that this expression is over, and the next
one is ready to begin. Most lines of Rust code end with a ;.
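Because most things are expressions, a block whose last line has no ; takes that line's value. Here is a small sketch; the function name is illustrative.

```rust
fn block_value() -> i32 {
    let y = {
        let x = 3;
        x + 1 // no semicolon: this is the block's value
    }; // this semicolon ends the `let` statement
    y
}

fn main() {
    println!("{}", block_value()); // prints "4"
}
```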
Before running a Rust program, you have to compile it. You can
use the Rust compiler by entering the rustc command and passing it
the name of your source file, like this:
$ rustc main.rs
$ ls
main main.rs
On Windows, enter dir instead:
$ dir
main.exe
main.rs
This shows we have two files: the source code, with an .rs extension, and the executable (main.exe on Windows, main everywhere else). All that's left to do from here is run the main or main.exe file, like this:
$ ./main
On Windows, run .\main.exe instead. If main.rs were your "Hello, world!" program, this would print Hello, world! to your terminal.
If you come from a dynamic language like Ruby, Python, or JavaScript, you may not be used to compiling and running a program as separate steps. Rust is an ahead-of-time compiled language. You can compile a program, give it to someone else, and they can run it even without Rust installed. If you give someone a .rb or .py or .js file, on the other hand, they need to have a Ruby, Python, or JavaScript implementation installed (respectively). In exchange, those languages need only one command to both compile and run your program. Everything is a tradeoff in language design.
Just compiling with rustc is fine for simple programs, but as your
project grows, you’ll want to be able to manage all of the options your
project has, and make it easy to share your code with other people and
projects. Next, I’ll introduce you to a tool called Cargo, which will help
you write real-world Rust programs.
Chapter 4
Hello, Cargo!
To check whether Cargo is installed, type
$ cargo --version
into a terminal. If you see a version number, great! If you see an error like 'command not found', check the documentation for the system on which you installed Rust to find out whether Cargo is installed separately.
Converting to Cargo
Let’s convert the Hello World program to Cargo. To Cargo-fy a project,
you need to do three things:
[package]
name = "hello_world"
version = "0.0.1"
authors = [ "Your name <you@example.com>" ]
The first line, [package], indicates that the following statements are
configuring a package. As we add more information to this file, we’ll
add other sections, but for now, we only have the package configuration.
The other three lines set the three bits of configuration that Cargo
needs to know to compile your program: its name, what version it is,
and who wrote it.
Once you’ve added this information to the Cargo.toml file, save it
to finish creating the configuration file.
$ cargo run
Compiling hello_world v0.0.1 (file:///home/yourname/projects/hello_world)
Running `target/debug/hello_world`
Hello, world!
Cargo checks to see if any of your project’s files have been modified,
and only rebuilds your project if they’ve changed since the last time
you built it.
With simple projects, Cargo doesn’t bring a whole lot over just
using rustc, but it will become useful in the future. This is especially
true when you start using crates; these are synonymous with a ‘library’
or ‘package’ in other programming languages. For complex projects
composed of multiple crates, it’s much easier to let Cargo coordinate
the build. Using Cargo, you can run cargo build, and it should work
the right way.
[root]
name = "hello_world"
version = "0.0.1"
Cargo uses the Cargo.lock file to keep track of dependencies in your application. This is the Hello World project's Cargo.lock file. This project doesn't have dependencies, so the file is a bit sparse. Realistically, you won't ever need to touch this file yourself; just let Cargo handle it.
That's it! If you've been following along, you should have successfully built hello_world with Cargo.
Even though the project is simple, it now uses much of the real tooling you'll use for the rest of your Rust career. In fact, you can expect to start virtually all Rust projects with some variation on the following commands:
This command passes --bin because the goal is to get straight to making an executable application, as opposed to a library. Executables are often called binaries (as in /usr/bin, if you're on a Unix system).
Cargo has generated two files and one directory for us: a Cargo.toml and a src directory with a main.rs file inside. These should look familiar; they're exactly what we created by hand, above.
This output is all you need to get started. First, open Cargo.toml.
It should look something like this:
[package]
name = "hello_world"
version = "0.1.0"
authors = ["Your Name <you@example.com>"]
[dependencies]
fn main() {
    println!("Hello, world!");
}
Cargo has generated a “Hello World!” for you, and you’re ready to
start coding!
Closing Thoughts
This chapter covered the basics that will serve you well through the rest of this book, and the rest of your time with Rust. Now that you've got the tools down, we'll cover more about the Rust language itself.
You have two options: Dive into a project with 'Tutorial: Guessing Game', or start from the bottom and work your way up with 'Syntax and Semantics'. More experienced systems programmers will probably prefer 'Tutorial: Guessing Game', while those from dynamic backgrounds may enjoy either. Different people learn differently! Choose whatever's right for you.
Part III
Tutorial: Guessing Games
Chapter 1
Guessing Game
Let’s learn some Rust! For our first project, we’ll implement a classic
beginner programming problem: the guessing game. Here’s how it
works: Our program will generate a random integer between one and
a hundred. It will then prompt us to enter a guess. Upon entering
our guess, it will tell us if we’re too low or too high. Once we guess
correctly, it will congratulate us. Sounds good?
Along the way, we’ll learn a little bit about Rust. The next chapter,
‘Syntax and Semantics’, will dive deeper into each part.
Chapter 2
Set up
We pass the name of our project to cargo new, and then the --bin
flag, since we’re making a binary, rather than a library.
Check out the generated Cargo.toml:
[package]
name = "guessing_game"
version = "0.1.0"
authors = ["Your Name <you@example.com>"]
Cargo gets this information from your environment. If it’s not correct,
go ahead and fix that.
Finally, Cargo generated a 'Hello, world!' for us. Check out src/main.rs:
fn main() {
    println!("Hello, world!");
}
$ cargo build
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.53 secs
Great! Our game is just the kind of project cargo run is good for: we need to quickly test each iteration before moving on to the next one.
Chapter 3
Processing a Guess
Let’s get to it! The first thing we need to do for our guessing game is
allow our player to input a guess. Put this in your src/main.rs:
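use std::io;
fn main() {
    println!("Guess the number!");
    println!("Please input your guess.");
    let mut guess = String::new();
    io::stdin().read_line(&mut guess)
        .expect("Failed to read line");
    println!("You guessed: {}", guess);
}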
We'll need to take user input, and then print the result as output. For that, we need the io library from the standard library. Rust imports only a few things by default into every program; this set is called the 'prelude'. Anything that's not in the prelude has to be brought in explicitly with a use statement. There is also a second 'prelude', the io prelude, which serves a similar function: you import it, and it imports a number of useful, io-related things.
fn main() {
As you’ve seen before, the main() function is the entry point into your
program. The fn syntax declares a new function, the ()s indicate that
there are no arguments, and { starts the body of the function. Because
we didn’t include a return type, it’s assumed to be (), an empty tuple.
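A quick sketch of that rule: the two declarations below are equivalent. Leaving out -> gives a return type of (), the empty tuple. The function names are illustrative.

```rust
// With no `->` written, the return type is `()`, the empty tuple.
fn implicit_unit() {}

// Spelling it out means the same thing (though it is rarely written).
fn explicit_unit() -> () {}

fn main() {
    let a: () = implicit_unit();
    let b: () = explicit_unit();
    println!("{:?} {:?}", a, b); // prints "() ()"
}
```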
Now we're getting interesting! There's a lot going on in this little line:
let mut guess = String::new();
The first thing to notice is that this is a let statement, which is used to create 'variable bindings'. They take this form:
let foo = bar;
This will create a new binding named foo, and bind it to the value
bar. In many languages, this is called a ‘variable’, but Rust’s variable
bindings have a few tricks up their sleeves.
For example, they're immutable by default. That's why our example uses mut: it makes a binding mutable, rather than immutable. let doesn't take just a name on the left-hand side of the assignment; it actually accepts a 'pattern'. We'll use patterns later. They're easy enough to use for now:
let foo = 5; // foo is immutable.
let mut bar = 5; // bar is mutable.
Oh, and // will start a comment, until the end of the line. Rust ignores
everything in comments.
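Here is a short sketch of both ideas from the paragraphs above. mut makes a binding mutable. Because the left-hand side of let is a pattern, a single let can also destructure a tuple into several bindings. The function name bindings is illustrative.

```rust
// Returns the values produced by a few `let` bindings.
fn bindings() -> (i32, i32, i32) {
    let foo = 5; // immutable: assigning to `foo` again would not compile
    let mut bar = 5; // mutable
    bar += 1;

    // The left-hand side is a pattern, so this binds `a` and `b` at once.
    let (a, b) = (foo, bar);
    (a, b, a + b)
}

fn main() {
    let (a, b, sum) = bindings();
    println!("a = {}, b = {}, sum = {}", a, b, sum); // prints "a = 5, b = 6, sum = 11"
}
```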
So now we know that let mut guess will introduce a mutable
binding named guess, but we have to look at the other side of the =
for what it’s bound to: String::new().
String is a string type, provided by the standard library. A String
is a growable, UTF-8 encoded bit of text.
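The ::new() syntax uses :: because this is an 'associated function' of a particular type. That is to say, it's associated with String itself, rather than a particular instance of a String.
Let's move forward:
io::stdin().read_line(&mut guess)
    .expect("Failed to read line");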
That’s a lot more! Let’s go bit-by-bit. The first line has two parts.
Here’s the first:
io::stdin()
Remember how we used std::io on the first line of the program?
We’re now calling an associated function on it. If we didn’t use std:
:io, we could have written this line as std::io::stdin().
This particular function returns a handle to the standard input for
your terminal. More specifically, a std::io::Stdin.
The next part will use this handle to get input from the user:
.read_line(&mut guess)
Here, we call the read_line method on our handle. Methods are like
associated functions, but are only available on a particular instance of
a type, rather than the type itself. We’re also passing one argument to
read_line(): &mut guess.
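To make the distinction concrete, here is a hypothetical Counter type (not part of the standard library). Its associated function new is called on the type, with ::. Its method increment is called on an instance, with a dot.

```rust
struct Counter {
    count: u32,
}

impl Counter {
    // Associated function: called on the type itself, as `Counter::new()`.
    fn new() -> Counter {
        Counter { count: 0 }
    }

    // Method: takes `self`, so it is called on an instance, as `c.increment()`.
    fn increment(&mut self) -> u32 {
        self.count += 1;
        self.count
    }
}

fn main() {
    let mut c = Counter::new();
    c.increment();
    println!("count = {}", c.increment()); // prints "count = 2"
}
```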
Remember how we bound guess above? We said it was mutable.
However, read_line doesn’t take a String as an argument: it takes a
&mut String. Rust has a feature called ‘references’, which allows you to
have multiple references to one piece of data, which can reduce copying.
References are a complex feature, as one of Rust’s major selling points
is how safe and easy it is to use references. We don’t need to know a
lot of those details to finish our program right now, though. For now,
all we need to know is that like let bindings, references are immutable
by default. Hence, we need to write &mut guess, rather than &guess.
Why does read_line() take a mutable reference to a string? Its
job is to take what the user types into standard input, and place that
into a string. So it takes that string as an argument, and in order to
add the input, it needs to be mutable.
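read_line isn't specific to standard input. It is a method of the std::io::BufRead trait, and a byte slice also implements BufRead. That makes it possible to watch what read_line does to the &mut String without typing anything. The sketch below uses a hypothetical helper named read_one_line.

```rust
use std::io::{self, BufRead};

// Reads one line from any buffered reader into a fresh String.
// `read_line` needs `&mut` because it appends to the String it is given.
fn read_one_line<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}

fn main() {
    // A byte slice stands in for the keyboard here.
    let mut input: &[u8] = b"42\nextra\n";
    let line = read_one_line(&mut input).expect("Failed to read line");
    // Debug formatting shows the trailing newline is kept: "42\n"
    println!("{:?}", line);
}
```

The kept newline is why the game later calls trim() before parsing.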
We're not quite done with this line of code, though. While it's a single line of text, it's only the first part of the single logical line of code:
    .expect("Failed to read line");
When you call a method with the .foo() syntax, you may introduce a
newline and other whitespace. This helps you split up long lines. We
could have done:
io::stdin().read_line(&mut guess).expect("Failed to read line");
But that gets hard to read. So we’ve split it up, two lines for two
method calls. We already talked about read_line(), but what about
expect()? Well, we already mentioned that read_line() puts what
the user types into the &mut String we pass it. But it also returns a
value: in this case, an io::Result. Rust has a number of types named
Result in its standard library: a generic Result, and then specific
versions for sub-libraries, like io::Result.
The purpose of these Result types is to encode error handling infor-
mation. Values of the Result type, like any type, have methods defined
on them. In this case, io::Result has an expect() method that takes
a value it’s called on, and if it isn’t a successful one, panic!s with a
message you passed it. A panic! like this will cause our program to
crash, displaying the message.
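Here is a small sketch of the same pattern with a Result you can produce without any input. parse() returns a Result, and is_ok() and is_err() inspect it. expect() either unwraps the success value or panics with your message. The helper name to_number is hypothetical.

```rust
use std::num::ParseIntError;

// `parse` returns a Result: Ok with the number, or Err with a ParseIntError.
fn to_number(text: &str) -> Result<u32, ParseIntError> {
    text.parse::<u32>()
}

fn main() {
    let good = to_number("42");
    let bad = to_number("forty-two");
    println!("good is ok: {}, bad is err: {}", good.is_ok(), bad.is_err());

    // `expect` returns the Ok value, or panics with this message on Err.
    let n = good.expect("Failed to parse number");
    println!("n = {}", n); // prints "n = 42"
}
```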
If we do not call expect(), our program will compile, but we’ll get
a warning:
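$ cargo build
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
warning: unused result which must be used, #[warn(unused_must_use)] on by default
--> src/main.rs:10:5
|
10 | io::stdin().read_line(&mut guess);
| ^
Rust is warning us that we haven't used the Result value returned by read_line(), which means we haven't handled a possible error. Since we're happy to crash if there's a problem, we use expect(). If we could recover from the error somehow, we'd do something else, but we'll save that for a future project.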
There's only one line of this first example left:
println!("You guessed: {}", guess);
This prints out the string we saved our input in. The {}s are a placeholder, so we pass guess as an argument. If we had multiple {}s, we would pass multiple arguments:
let x = 5;
let y = 10;
println!("x and y: {} and {}", x, y);
Easy.
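The same placeholder rules apply to format!, which builds a String instead of printing. That makes it an easy way to check what println! would output. A minimal sketch, with an illustrative function name:

```rust
// Each `{}` is filled by the next argument, in order.
fn describe(x: i32, y: i32) -> String {
    format!("x and y: {} and {}", x, y)
}

fn main() {
    let x = 5;
    let y = 10;
    println!("{}", describe(x, y)); // prints "x and y: 5 and 10"
}
```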
Anyway, that’s the tour. We can run what we have with cargo run:
$ cargo run
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.44 secs
Running `target/debug/guessing_game`
Guess the number!
Please input your guess.
6
You guessed: 6
All right! Our first part is done: we can get input from the keyboard,
and then print it back out.
Chapter 4
Generating a secret number
Next, we need to generate a secret number. Rust does not yet include
random number functionality in its standard library. The Rust team
does, however, provide a rand crate. A ‘crate’ is a package of Rust code.
We’ve been building a ‘binary crate’, which is an executable. rand is
a ‘library crate’, which contains code that’s intended to be used with
other programs.
Using external crates is where Cargo really shines. Before we can
write the code using rand, we need to modify our Cargo.toml. Open
it up, and add these few lines at the bottom:
[dependencies]
rand = "0.3.0"
The [dependencies] section of Cargo.toml works like the [package] section: everything that follows it is part of it, until the next section starts. Cargo uses the dependencies section to learn which external crates you depend on and which versions you require. In this case, we've specified version 0.3.0, which Cargo understands to mean any release that's compatible with this specific version. Cargo understands Semantic Versioning, which is a standard for writing version numbers. A bare number like the one above is actually shorthand for ^0.3.0, meaning "anything compatible with 0.3.0". If we wanted to use only 0.3.0 exactly, we could say rand = "=0.3.0" (note the two equal signs). We could also use a range of versions. Cargo's documentation contains more details.
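To illustrate what "compatible" means for a caret requirement, here is a simplified sketch of the rule Cargo documents. The version must be at least the requirement. It must also agree with the requirement up to and including the leftmost non-zero component. The sketch ignores pre-release tags and all other requirement syntax. caret_matches is a hypothetical name, not Cargo's actual code.

```rust
// A version as (major, minor, patch).
type Version = (u64, u64, u64);

// Simplified caret (`^`) check: `version` satisfies `^req` if it is at least
// `req` and agrees with `req` up to and including the leftmost non-zero part.
fn caret_matches(req: Version, version: Version) -> bool {
    // Tuples compare lexicographically: major, then minor, then patch.
    if version < req {
        return false;
    }
    let (rmaj, rmin, rpat) = req;
    let (vmaj, vmin, vpat) = version;
    if rmaj > 0 {
        vmaj == rmaj // ^1.2.3 allows >=1.2.3, <2.0.0
    } else if rmin > 0 {
        vmaj == 0 && vmin == rmin // ^0.3.0 allows >=0.3.0, <0.4.0
    } else {
        vmaj == 0 && vmin == 0 && vpat == rpat // ^0.0.3 allows only 0.0.3
    }
}

fn main() {
    let req = (0, 3, 0); // rand = "0.3.0" is shorthand for ^0.3.0
    for &v in &[(0, 3, 14), (0, 4, 0), (0, 2, 9)] {
        println!("{:?} matches ^0.3.0: {}", v, caret_matches(req, v));
    }
}
```

This is why the build output below downloads rand v0.3.14: it is a 0.3.x release, so it satisfies ^0.3.0.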
Now, without changing any of our code, let’s build our project:
$ cargo build
Updating registry `https://github.com/rust-lang/crates.io-index`
Downloading rand v0.3.14
Downloading libc v0.2.17
Compiling libc v0.2.17
Compiling rand v0.3.14
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 5.88 secs
If we run cargo build again, we get different output:
$ cargo build
Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
That’s right, nothing was done! Cargo knows that our project has been
built, and that all of its dependencies are built, and so there’s no reason
to do all that stuff. With nothing to do, it simply exits. If we open
up src/main.rs again, make a trivial change, and then save it again,
we’ll only see two lines:
$ cargo build
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.45 secs
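extern crate rand;
use std::io;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    println!("The secret number is: {}", secret_number);
    println!("Please input your guess.");
    let mut guess = String::new();
    io::stdin().read_line(&mut guess).expect("Failed to read line");
    println!("You guessed: {}", guess);
}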
The first thing we've done is change the first line. It now says extern crate rand. Because we declared rand in our [dependencies], we can use extern crate to let Rust know we'll be making use of it. This also does the equivalent of a use rand;, so we can make use of anything in the rand crate by prefixing it with rand::.
Next, we added another use line: use rand::Rng. We’re going to
use a method in a moment, and it requires that Rng be in scope to
work. The basic idea is this: methods are defined on something called
‘traits’, and for the method to work, it needs the trait to be in scope.
For more about the details, read the traits section.
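Here is a minimal sketch of why the use line matters. The trait Describe and the module shapes are hypothetical stand-ins for rand::Rng. The describe method can only be called where the trait is in scope.

```rust
mod shapes {
    // A trait declares methods that types can implement.
    pub trait Describe {
        fn describe(&self) -> String;
    }

    pub struct Square(pub u32);

    impl Describe for Square {
        fn describe(&self) -> String {
            format!("a square with side {}", self.0)
        }
    }
}

// Without this line, calling `.describe()` below fails to compile,
// because the trait that defines the method isn't in scope.
use shapes::Describe;

fn main() {
    let sq = shapes::Square(3);
    println!("{}", sq.describe()); // prints "a square with side 3"
}
```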
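There are two other lines we added, in the middle:
let secret_number = rand::thread_rng().gen_range(1, 101);
println!("The secret number is: {}", secret_number);
We use rand::thread_rng() to get a copy of the random number generator. Its gen_range() method, which comes from the Rng trait, gives us a number between 1 and 100, because the lower bound is inclusive and the upper bound is exclusive. The second line prints out the secret number, which is useful while we develop our program. Let's try running it a few times: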
$ cargo run
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.55 secs
Running `target/debug/guessing_game`
Guess the number!
The secret number is: 7
Please input your guess.
4
You guessed: 4
$ cargo run
Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/guessing_game`
Guess the number!
The secret number is: 83
Please input your guess.
5
You guessed: 5
Chapter 5
Comparing guesses
Now that we’ve got user input, let’s compare our guess to the secret
number. Here’s our next step, though it doesn’t quite compile yet:
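extern crate rand;
use std::io;
use std::cmp::Ordering;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    println!("The secret number is: {}", secret_number);
    println!("Please input your guess.");
    let mut guess = String::new();
    io::stdin().read_line(&mut guess)
        .expect("Failed to read line");
    println!("You guessed: {}", guess);
    match guess.cmp(&secret_number) {
        Ordering::Less => println!("Too small!"),
        Ordering::Greater => println!("Too big!"),
        Ordering::Equal => println!("You win!"),
    }
}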
A few new bits here. The first is another use. We bring a type called
std::cmp::Ordering into scope. Then, five new lines at the bottom
that use it:
match guess.cmp(&secret_number) {
    Ordering::Less => println!("Too small!"),
    Ordering::Greater => println!("Too big!"),
    Ordering::Equal => println!("You win!"),
}
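Ordering is an enum, short for 'enumeration'. Enums look like this:
enum Foo {
    Bar,
    Baz,
}
With this definition, anything of type Foo can be either a Foo::Bar or a Foo::Baz. We use :: to indicate the namespace for a particular enum variant. The Ordering enum has three possible variants: Less, Equal, and Greater. The cmp() method returns an Ordering, and match lets us create an 'arm' for each possible variant: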
match guess.cmp(&secret_number) {
    Ordering::Less => println!("Too small!"),
    Ordering::Greater => println!("Too big!"),
    Ordering::Equal => println!("You win!"),
}
If it’s Less, we print Too small!, if it’s Greater, Too big!, and if
Equal, You win!. match is really useful, and is used often in Rust.
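Pulling the comparison out into its own function makes it easy to check each arm. Here is a sketch; the name judge is hypothetical.

```rust
use std::cmp::Ordering;

// `cmp` returns an Ordering: Less, Greater, or Equal.
// `match` must handle every variant, or the code won't compile.
fn judge(guess: u32, secret_number: u32) -> &'static str {
    match guess.cmp(&secret_number) {
        Ordering::Less => "Too small!",
        Ordering::Greater => "Too big!",
        Ordering::Equal => "You win!",
    }
}

fn main() {
    println!("{}", judge(45, 59)); // prints "Too small!"
}
```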
I did mention that this won’t quite compile yet, though. Let’s try
it:
$ cargo build
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
error[E0308]: mismatched types
--> src/main.rs:23:21
|
23 | match guess.cmp(&secret_number) {
| ^^^^^^^^^^^^^^ expected struct `std::string::String`, found integral variable
|
= note: expected type `&std::string::String`
= note: found type `&{integer}`
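The core of this error is 'mismatched types'. guess is a String, but secret_number is an integer, and cmp() can only compare two values of the same type. We need to convert the String we read into a number before comparing:
extern crate rand;
use std::io;
use std::cmp::Ordering;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    println!("The secret number is: {}", secret_number);
    println!("Please input your guess.");
    let mut guess = String::new();
    io::stdin().read_line(&mut guess)
        .expect("Failed to read line");
    let guess: u32 = guess.trim().parse()
        .expect("Please type a number!");
    println!("You guessed: {}", guess);
    match guess.cmp(&secret_number) {
        Ordering::Less => println!("Too small!"),
        Ordering::Greater => println!("Too big!"),
        Ordering::Equal => println!("You win!"),
    }
}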
The two new lines:
let guess: u32 = guess.trim().parse()
    .expect("Please type a number!");
Wait a minute, I thought we already had a guess? We do, but Rust allows us to 'shadow' the previous guess with a new one. This is often used in this exact situation, where guess starts as a String, but we want to convert it to a u32. Shadowing lets us reuse the guess name, rather than forcing us to come up with two unique names like guess_str and guess, or something else.
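Here is a small sketch of shadowing on its own. Each let guess creates a new binding that hides the previous one, and the new binding may have a different type. The function name shadow is illustrative.

```rust
// Shadowing: a later `let` with the same name hides the earlier binding.
fn shadow(input: &str) -> u32 {
    let guess = String::from(input); // `guess` is a String here
    let guess = guess.len(); // now a usize: the number of bytes
    let guess = guess as u32 * 2; // now a u32
    guess
}

fn main() {
    println!("{}", shadow("abc")); // prints "6"
}
```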
We bind guess to an expression that looks like something we wrote
earlier:
guess.trim().parse()
Here, guess refers to the old guess, the one that was a String with
our input in it. The trim() method on Strings will eliminate any
white space at the beginning and end of our string. This is important,
as we had to press the ‘return’ key to satisfy read_line(). This means
that if we type 5 and hit return, guess looks like this: 5\n. The \n
represents ‘newline’, the enter key. trim() gets rid of this, leaving our
string with only the 5. The parse() method on strings parses a string
into some kind of number. Since it can parse a variety of numbers,
we need to give Rust a hint as to the exact type of number we want.
Hence, let guess: u32. The colon (:) after guess tells Rust we’re
going to annotate its type. u32 is an unsigned, thirty-two bit integer.
Rust has a number of built-in number types, but we’ve chosen u32. It’s
a good default choice for a small positive number.
Just like read_line(), our call to parse() could cause an error. What if our string contained A👍%? There'd be no way to convert that to a number. As such, we'll do the same thing we did with read_line(): use the expect() method to crash if there's an error.
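The sketch below wraps the same trim().parse() steps in a hypothetical parse_guess helper. It shows both a successful parse and what parse() returns for input it can't convert.

```rust
use std::num::ParseIntError;

// Trim surrounding whitespace (including the newline from read_line),
// then parse. The declared return type tells `parse` to produce a u32.
fn parse_guess(input: &str) -> Result<u32, ParseIntError> {
    input.trim().parse()
}

fn main() {
    println!("{:?}", parse_guess("   76\n")); // prints "Ok(76)"
    println!("{}", parse_guess("A👍%").is_err()); // prints "true"
}
```

Because u32 is unsigned, negative input such as -5 is also rejected as an error rather than wrapped around.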
Let’s try our program out!
$ cargo run
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.57 secs
Running `target/guessing_game`
Guess the number!
The secret number is: 58
Please input your guess.
    76
You guessed: 76
Too big!
Nice! You can see I even added spaces before my guess, and it still
figured out that I guessed 76. Run the program a few times, and verify
that guessing the number works, as well as guessing a number too small.
Now we’ve got most of the game working, but we can only make
one guess. Let’s change that by adding loops!
Chapter 6
Looping
The loop keyword gives us an infinite loop. Let’s add that in:
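extern crate rand;
use std::io;
use std::cmp::Ordering;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    println!("The secret number is: {}", secret_number);
    loop {
        println!("Please input your guess.");
        let mut guess = String::new();
        io::stdin().read_line(&mut guess)
            .expect("Failed to read line");
        let guess: u32 = guess.trim().parse()
            .expect("Please type a number!");
        println!("You guessed: {}", guess);
        match guess.cmp(&secret_number) {
            Ordering::Less => println!("Too small!"),
            Ordering::Greater => println!("Too big!"),
            Ordering::Equal => println!("You win!"),
        }
    }
}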
And try it out. But wait, didn’t we just add an infinite loop? Yup.
Remember our discussion about parse()? If we give a non-number
answer, we’ll panic! and quit. Observe:
$ cargo run
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.58 secs
Running `target/guessing_game`
Guess the number!
The secret number is: 59
Please input your guess.
45
You guessed: 45
Too small!
Please input your guess.
60
You guessed: 60
Too big!
Please input your guess.
59
You guessed: 59
You win!
Please input your guess.
quit
thread 'main' panicked at 'Please type a number!'
Ha! quit actually quits. So does any other non-number input. Well, this is suboptimal to say the least. First, let's actually quit when you win the game:
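extern crate rand;
use std::io;
use std::cmp::Ordering;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    println!("The secret number is: {}", secret_number);
    loop {
        println!("Please input your guess.");
        let mut guess = String::new();
        io::stdin().read_line(&mut guess)
            .expect("Failed to read line");
        let guess: u32 = guess.trim().parse()
            .expect("Please type a number!");
        println!("You guessed: {}", guess);
        match guess.cmp(&secret_number) {
            Ordering::Less => println!("Too small!"),
            Ordering::Greater => println!("Too big!"),
            Ordering::Equal => {
                println!("You win!");
                break;
            }
        }
    }
}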
By adding the break line after the You win!, we’ll exit the loop when
we win. Exiting the loop also means exiting the program, since it’s
the last thing in main(). We have only one more tweak to make: when
someone inputs a non-number, we don’t want to quit, we want to ignore
it. We can do that like this:
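extern crate rand;
use std::io;
use std::cmp::Ordering;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    println!("The secret number is: {}", secret_number);
    loop {
        println!("Please input your guess.");
        let mut guess = String::new();
        io::stdin().read_line(&mut guess)
            .expect("Failed to read line");
        let guess: u32 = match guess.trim().parse() {
            Ok(num) => num,
            Err(_) => continue,
        };
        println!("You guessed: {}", guess);
        match guess.cmp(&secret_number) {
            Ordering::Less => println!("Too small!"),
            Ordering::Greater => println!("Too big!"),
            Ordering::Equal => {
                println!("You win!");
                break;
            }
        }
    }
}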
This is how you generally move from 'crash on error' to 'actually handle the error': by switching from expect() to a match expression. parse() returns a Result, which is an enum like Ordering, but in this case each variant has some data associated with it: Ok is a success, and Err is a failure. Each contains more information: the successfully parsed integer, or an error type. In this case, we match on Ok(num), which sets the name num to the unwrapped Ok value (the integer), and then we return it on the right-hand side. In the Err case, we don't care what kind of error it is, so we just use the catch-all _ instead of a name. This catches everything that isn't Ok, and continue lets us move to the next iteration of the loop. In effect, this lets us ignore all errors and continue with our program.
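The same control flow can be tried without a keyboard by feeding the loop a list of pretend inputs. Here is a sketch under that assumption. The play function and its parameters are hypothetical. Instead of printing, it reports how many valid guesses it took, which also shows that break can carry a value out of a loop.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for the game loop: `inputs` plays the role of the
// lines a player would type. Returns Some(n) if the secret is found on the
// n-th valid guess, or None if the inputs run out first.
fn play(secret_number: u32, inputs: &[&str]) -> Option<u32> {
    let mut lines = inputs.iter();
    let mut valid_guesses = 0;
    loop {
        // Stop when there is no more input.
        let input = match lines.next() {
            Some(text) => text,
            None => break None,
        };
        // Same shape as the game: skip anything that isn't a number.
        let guess: u32 = match input.trim().parse() {
            Ok(num) => num,
            Err(_) => continue,
        };
        valid_guesses += 1;
        match guess.cmp(&secret_number) {
            // `break` with a value ends the `loop` and makes it evaluate to that value.
            Ordering::Equal => break Some(valid_guesses),
            // Too small or too big: go around again.
            Ordering::Less | Ordering::Greater => {}
        }
    }
}

fn main() {
    let inputs = ["45", "quit", "60", "59"];
    println!("{:?}", play(59, &inputs)); // prints "Some(3)"
}
```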
Now we should be good! Let’s try:
$ cargo run
Compiling guessing_game v0.1.0 (file:///home/you/projects/guessing_game)
Finished debug [unoptimized + debuginfo] target(s) in 0.57 secs
Running `target/guessing_game`
Guess the number!
The secret number is: 61
Please input your guess.
10
You guessed: 10
Too small!
Awesome! With one tiny last tweak, we have finished the guessing
game. Can you think of what it is? That’s right, we don’t want to
print out the secret number. It was good for testing, but it kind of
ruins the game. Here’s our final source:
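extern crate rand;
use std::io;
use std::cmp::Ordering;
use rand::Rng;
fn main() {
    println!("Guess the number!");
    let secret_number = rand::thread_rng().gen_range(1, 101);
    loop {
        println!("Please input your guess.");
        let mut guess = String::new();
        io::stdin().read_line(&mut guess)
            .expect("Failed to read line");
        let guess: u32 = match guess.trim().parse() {
            Ok(num) => num,
            Err(_) => continue,
        };
        println!("You guessed: {}", guess);
        match guess.cmp(&secret_number) {
            Ordering::Less => println!("Too small!"),
            Ordering::Greater => println!("Too big!"),
            Ordering::Equal => {
                println!("You win!");
                break;
            }
        }
    }
}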
Chapter 7
Complete!
This project showed you a lot: let, match, methods, associated func-
tions, using external crates, and more.
At this point, you have successfully built the Guessing Game! Con-
gratulations!
Part IV
Syntax and Semantics
This chapter breaks Rust down into small chunks, one for each concept.
If you’d like to learn Rust from the bottom up, reading this in order
is a great way to do that.
These sections also form a reference for each concept, so if you’re
reading another tutorial and find something confusing, you can find it
explained somewhere in here.
Variable Bindings
Virtually every non-‘Hello World’ Rust program uses variable bindings.
They bind some value to a name, so it can be used later. let is used
to introduce a binding, like this:
fn main() {
let x = 5;
}
Patterns
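In many languages, a variable binding would be called a variable, but Rust’s variable bindings have a few tricks up their sleeves. For example, the left-hand side of a let statement is a ‘pattern’, not just a variable name, so a single let can bind several names at once.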
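A tuple pattern on the left-hand side of let binds each part of the tuple to its own name. This minimal sketch shows this in main and in a hypothetical swap helper:

```rust
/// Hypothetical helper: swap the two halves of a pair using a tuple pattern.
fn swap(pair: (i32, i32)) -> (i32, i32) {
    // The left-hand side of this `let` is a pattern, not a single name,
    // so it binds both `a` and `b` at once.
    let (a, b) = pair;
    (b, a)
}

fn main() {
    let (x, y) = (1, 2); // Binds x = 1 and y = 2 in one statement.
    println!("x = {}, y = {}", x, y);
    println!("{:?}", swap((x, y)));
}
```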
Type annotations
Rust is a statically typed language, which means that we specify our
types up front, and they’re checked at compile time. So why does our
first example compile? Well, Rust has this thing called ‘type inference’.
If it can figure out what the type of something is, Rust doesn’t require
you to explicitly type it out.
We can add the type if we want to, though. Types come after a
colon (:):
let x: i32 = 5;
If I asked you to read this out loud to the rest of the class, you’d say
“x is a binding with the type i32 and the value 5.”
In this case we chose to represent x as a 32-bit signed integer. Rust
has many different primitive integer types. They begin with i for signed
integers and u for unsigned integers. The possible integer sizes are 8,
16, 32, and 64 bits.
In future examples, we may annotate the type in a comment. The
examples will look like this:
fn main() {
let x = 5; // x: i32
}
Note the similarity between this annotation and the syntax you use with let. Including these kinds of comments is not idiomatic Rust, but we’ll occasionally include them to help you see which types Rust infers.
Mutability
By default, bindings are immutable. This code will not compile:
let x = 5;
x = 10;
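Adding mut to the binding makes the reassignment legal. A minimal sketch, using a hypothetical function name:

```rust
/// Hypothetical helper: returns the value of a mutable binding
/// after it has been reassigned.
fn reassign() -> i32 {
    let mut x = 5; // `mut` is what allows `x` to be reassigned.
    x = x * 2;     // Without `mut` above, this line would not compile.
    x
}

fn main() {
    println!("{}", reassign());
}
```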
Initializing bindings
Rust variable bindings have one more aspect that differs from other
languages: bindings are required to be initialized with a value before
you’re allowed to use them.
Let’s try it out. Change your src/main.rs file to look like this:
fn main() {
let x: i32;
println!("Hello world!");
}
You can use cargo build on the command line to build it. You’ll get a warning about the unused binding x, but the program compiles, and running it with cargo run still prints “Hello world!”.
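Now change the program so that it tries to print x before x has been given a value:
fn main() {
    let x: i32;
    println!("The value of x is: {}", x);
}
If you build this version, you’ll get an error: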
$ cargo build
Compiling hello_world v0.0.1 (file:///home/you/projects/hello_world)
src/main.rs:4:39: 4:40 error: use of possibly uninitialized variable: `x`
src/main.rs:4 println!("The value of x is: {}", x);
^
note: in expansion of format_args!
<std macros>:2:23: 2:77 note: expansion site
<std macros>:1:1: 3:2 note: in expansion of println!
src/main.rs:4:5: 4:42 note: expansion site
error: aborting due to previous error
Could not compile `hello_world`.
Rust will not let us use a value that has not been initialized.
Let us take a minute to talk about this stuff we’ve added to println!.
If you include two curly braces ({}, some call them moustaches...) in
your string to print, Rust will interpret this as a request to interpolate
some sort of value. String interpolation is a computer science term that
means “stick in the middle of a string.” We add a comma, and then
x, to indicate that we want x to be the value we’re interpolating. The
comma is used to separate arguments we pass to functions and macros,
if you’re passing more than one.
When you use the curly braces, Rust will attempt to display the value in a meaningful way by checking its type. If you want to specify the format in more detail, there is a wide range of options available. For now, we’ll stick to the default: integers aren’t very complicated to print.
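Suppose a program binds x to 17 in main, binds y to 3 inside an inner block, and prints both values once inside that block and once after it. The first println! would print “The value of x is 17 and the value of y is 3”, but this program cannot be compiled successfully, because the second println! cannot access the value of y: y is not in scope anymore. Instead we get this error: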
$ cargo build
Compiling hello v0.1.0 (file:///home/you/projects/hello_world)
main.rs:7:62: 7:63 error: unresolved name `y`. Did you mean `x`? [E0425]
main.rs:7 println!("The value of x is {} and value of y is {}", x, y); // This won't work.
^
Shadowing and mutable bindings may appear as two sides of the same
coin, but they are two distinct concepts that can’t always be used
interchangeably. For one, shadowing enables us to rebind a name to a
value of a different type. It is also possible to change the mutability of
a binding. Note that shadowing a name does not alter or destroy the
value it was bound to, and the value will continue to exist until it goes
out of scope, even if it is no longer accessible by any means.
let mut x: i32 = 1;
x = 7;
let x = x; // `x` is now immutable and is bound to `7`.
let y = 4;
let y = "I can also be bound to text!"; // `y` is now of a different type.
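As a minimal sketch of both uses of shadowing described above, the hypothetical count_spaces rebinds a name to a value of a different type, and main rebinds a mutable name as immutable:

```rust
/// Hypothetical helper: shadowing lets `spaces` change type from `&str` to `usize`.
fn count_spaces() -> usize {
    let spaces = "   ";        // `spaces`: &str
    let spaces = spaces.len(); // A new binding shadows the old one; now a usize.
    spaces
}

fn main() {
    let mut x: i32 = 1;
    x += 6;
    let x = x; // Shadowed as immutable: `x = 8;` here would not compile.
    println!("x = {}, spaces = {}", x, count_spaces());
}
```

The original &str value is not destroyed by the shadowing; it simply can no longer be reached by name.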
Functions
Every Rust program has at least one function, the main function:
fn main() {
}
This is the simplest possible function declaration. As we mentioned before, fn says ‘this is a function’, followed by the name, some parentheses because this function takes no arguments, and then some curly braces to indicate the body. Here’s a function named foo:
fn foo() {
}
So, what about taking arguments? Here’s a function that prints a
number:
fn print_number(x: i32) {
println!("x is: {}", x);
}
Here’s a complete program that uses print_number:
fn main() {
print_number(5);
}
fn print_number(x: i32) {
println!("x is: {}", x);
}
As you can see, function arguments work very similarly to let declarations: you add a type to the argument name, after a colon.
Here’s a complete program that adds two numbers together and
prints them:
fn main() {
    print_sum(5, 6);
}
fn print_sum(x: i32, y: i32) {
    println!("sum is: {}", x + y);
}
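In some languages, such as Ruby, assignment is an expression, so bindings can be chained: x = y = 5. In Rust, let is a statement, so the equivalent let x = (let y = 5); does not compile.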
The compiler is telling us here that it was expecting to see the begin-
ning of an expression, and a let can only begin a statement, not an
expression.
Note that assigning to an already-bound variable (e.g. y = 5) is still an expression, although its value is not particularly useful. Unlike in other languages, where an assignment evaluates to the assigned value (e.g. 5 in the previous example), in Rust the value of an assignment is an empty tuple (), because the assigned value can have only one owner, and any other returned value would be too surprising:
let mut y = 5;
let x = (y = 6); // `x` has the value `()`, not `6`.
Early returns
But what about early returns? Rust does have a keyword for that,
return:
fn foo(x: i32) -> i32 {
    return x;
    // We never run this code!
    x + 1
}
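return is most useful when a function can finish before it reaches its final expression. The following is a minimal sketch built around a hypothetical helper, first_negative. It stops at the first negative number it finds and otherwise falls through to its last expression:

```rust
/// Hypothetical helper: returns the first negative number in `values`,
/// or 0 if there is none.
fn first_negative(values: &[i32]) -> i32 {
    for &v in values {
        if v < 0 {
            return v; // Leave the function immediately.
        }
    }
    0 // Reached only when no element was negative.
}

fn main() {
    println!("{}", first_negative(&[3, -1, -7]));
}
```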
Diverging functions
Rust has some special syntax for ‘diverging functions’, which are func-
tions that do not return:
fn diverges() -> ! {
panic!("This function never returns!");
}
panic! is a macro, similar to println!() that we’ve already seen.
Unlike println!(), panic!() causes the current thread of execution
to crash with the given message. Because this function will cause a
crash, it will never return, and so it has the type ‘!’, which is read
‘diverges’.
If you add a main function that calls diverges() and run it, you’ll
get some output that looks like this:
thread 'main' panicked at 'This function never returns!', hello.rs:2
If you want more information, you can get a backtrace by setting the
RUST_BACKTRACE environment variable:
$ RUST_BACKTRACE=1 ./diverges
thread 'main' panicked at 'This function never returns!', hello.rs:2
Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
hello::diverges
at ./hello.rs:2
hello::main
at ./hello.rs:6
If you want the complete backtrace and filenames:
$ RUST_BACKTRACE=full ./diverges
thread 'main' panicked at 'This function never returns!', hello.rs:2
stack backtrace:
1: 0x7f402773a829 - sys::backtrace::write::h0942de78b6c028
2: 0x7f402773d7fc - panicking::on_panic::h3f23f9d0b5f4c91b
3: 0x7f402773960e - rt::unwind::begin_unwind_inner::h2844b8c5e81e79558Bw
4: 0x7f4027738893 - rt::unwind::begin_unwind::h43752794474
5: 0x7f4027738809 - diverges::h2266b4c4b850236beaa
6: 0x7f40277389e5 - main::h19bb1149c2f00ecfBaa
7: 0x7f402773f514 - rt::unwind::try::try_fn::h131868834791
8: 0x7f402773d1d8 - __rust_try
9: 0x7f402773f201 - rt::lang_start::ha172a3ce74bb453aK5w
10: 0x7f4027738a19 - main
11: 0x7f402694ab44 - __libc_start_main
12: 0x7f40277386c8 - <unknown>
13: 0x0 - <unknown>
If you need to override an already-set RUST_BACKTRACE, for example when you cannot simply unset the variable, set it to 0 to avoid getting a backtrace. Any other value (even no value at all) turns backtraces on.
$ export RUST_BACKTRACE=1
...
$ RUST_BACKTRACE=0 ./diverges
thread 'main' panicked at 'This function never returns!', hello.rs:2
note: Run with `RUST_BACKTRACE=1` for a backtrace.
A diverging function can be used as any type:
fn diverges() -> ! {
    panic!("This function never returns!");
}
let x: i32 = diverges();
let x: String = diverges();
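Because ! can stand in for any type, a diverging call can fill a match arm whose other arms produce a real value. A minimal sketch with two hypothetical helpers, fail and parse_or_fail:

```rust
/// Hypothetical helper that never returns.
fn fail(msg: &str) -> ! {
    panic!("{}", msg);
}

/// Because `fail` has type `!`, it can stand in for an `i32` in the `Err` arm.
fn parse_or_fail(s: &str) -> i32 {
    match s.parse() {
        Ok(n) => n,
        Err(_) => fail("not a number"),
    }
}

fn main() {
    println!("{}", parse_or_fail("42"));
}
```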
Function pointers
We can also create variable bindings which point to functions:
let f: fn(i32) -> i32;
f is a variable binding which points to a function that takes an i32 as
an argument and returns an i32. For example:
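fn plus_one(i: i32) -> i32 {
    i + 1
}
// Without type inference:
let f: fn(i32) -> i32 = plus_one;
// With type inference:
let f = plus_one;
We can then use f to call the function:
let six = f(5);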
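Function pointers can also be passed to other functions. The following is a minimal sketch built around a hypothetical helper, apply_twice. It accepts any function with the type fn(i32) -> i32 and calls it twice:

```rust
fn plus_one(i: i32) -> i32 {
    i + 1
}

/// Hypothetical helper: calls the function pointer `f` twice.
fn apply_twice(f: fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

fn main() {
    let f: fn(i32) -> i32 = plus_one; // A binding that points to a function.
    println!("{}", apply_twice(f, 5));
}
```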
Primitive Types
The Rust language has a number of types that are considered ‘primitive’. This means that they’re built into the language. The standard library also provides a number of useful types built on top of these, but these are the most primitive.
Booleans
Rust has a built-in boolean type, named bool. It has two values, true
and false:
let x = true;
let y: bool = false;
char
The char type represents a single Unicode scalar value. You can create
chars with a single tick: (')
let x = 'x';
let two_hearts = '💕';
This means that, unlike in some other languages, Rust’s char is not a single byte, but four.
You can find more documentation for chars in the standard library
documentation.
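A char always occupies four bytes in memory, even though encoding it as UTF-8 (as inside a str) may take fewer. This is a minimal sketch using a hypothetical helper, char_sizes, to compare the two:

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (bytes a `char` occupies in memory,
/// bytes needed to encode `c` as UTF-8).
fn char_sizes(c: char) -> (usize, usize) {
    (size_of::<char>(), c.len_utf8())
}

fn main() {
    let x = 'x';
    let two_hearts = '💕';
    println!("{:?} {:?}", char_sizes(x), char_sizes(two_hearts));
}
```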
Numeric types
Rust has a variety of numeric types in a few categories: signed and
unsigned, fixed and variable, floating-point and integer.
These types consist of two parts: the category, and the size. For example, u16 is an unsigned type with sixteen bits of size. More bits let you have bigger numbers.
If a number literal has nothing to cause its type to be inferred, it defaults:
let x = 42; // `x` has type `i32`.
let y = 1.0; // `y` has type `f64`.
Here’s a list of the different numeric types, with links to their documentation in the standard library:
• i8
• i16
• i32
• i64
• u8
• u16
• u32
• u64
• isize
• usize
• f32
• f64
Fixed-size types
Fixed-size types have a specific number of bits in their representation.
Valid bit sizes are 8, 16, 32, and 64. So, u32 is an unsigned, 32-bit
integer, and i64 is a signed, 64-bit integer.
Variable-size types
Rust also provides types whose particular size depends on the underlying machine architecture. Their range is sufficient to express the size of any collection, so these types have ‘size’ as the category. They come in signed and unsigned varieties which account for two types: isize and usize.
Floating-point types
Rust also has two floating point types: f32 and f64. These correspond
to IEEE-754 single and double precision numbers.
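The sizes described above can be checked directly. This minimal sketch uses a hypothetical bytes_of helper around std::mem::size_of. It assumes a compiler recent enough to provide associated constants such as u8::MAX:

```rust
use std::mem::size_of;

/// Hypothetical helper: the size of a type in bytes.
fn bytes_of<T>() -> usize {
    size_of::<T>()
}

fn main() {
    let x = 42;  // Nothing else constrains it, so `x` is an `i32`.
    let y = 1.5; // Floating-point literals default to `f64`.
    println!("x = {}, y = {}", x, y);
    println!("u8: {} to {}", u8::MIN, u8::MAX);
    println!("i8: {} to {}", i8::MIN, i8::MAX);
    println!("i32 is {} bytes, f32 is {} bytes", bytes_of::<i32>(), bytes_of::<f32>());
    // `usize` has the width of a pointer on the machine you compile for.
    println!("usize is {} bytes on this machine", bytes_of::<usize>());
}
```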
Arrays
Like many programming languages, Rust has list types to represent
a sequence of things. The most basic is the array, a fixed-size list of
elements of the same type. By default, arrays are immutable.
Arrays have type [T; N]. We’ll talk about this T notation in the generics
section. The N is a compile-time constant, for the length of the array.
There’s a shorthand for initializing each element of an array to the same value. In this example, each element of a will be initialized to 0:
let a = [0; 20]; // a: [i32; 20]
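As a minimal sketch of the array basics above, the following shows an immutable and a mutable array, the [value; length] shorthand (wrapped in a hypothetical zeros helper), len(), and zero-based indexing:

```rust
/// Hypothetical helper: builds an array with the `[value; length]` shorthand.
fn zeros() -> [i32; 20] {
    [0; 20]
}

fn main() {
    let a = [1, 2, 3];     // a: [i32; 3], immutable by default
    let mut m = [1, 2, 3]; // m: [i32; 3], mutable
    m[0] = 10;             // Allowed only because `m` is `mut`.
    let z = zeros();
    println!("a has {} elements, z has {}", a.len(), z.len());
    println!("a[0] = {}, m[0] = {}", a[0], m[0]); // Indexing starts at 0.
}
```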
Slices
A ‘slice’ is a reference to (or “view” into) another data structure. They
are useful for allowing safe, efficient access to a portion of an array
without copying. For example, you might want to reference only one
line of a file read into memory. By nature, a slice is not created directly,
but from an existing variable binding. Slices have a defined length, and
can be mutable or immutable.
Internally, slices are represented as a pointer to the beginning of the
data and a length.
Slicing syntax
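You can use a combination of & and [] to create a slice from various things. The & indicates that slices are similar to references, which we will cover in detail later in this section. The []s, with a range, let you define the length of the slice:
let a = [0, 1, 2, 3, 4];
let complete = &a[..]; // A slice containing all of the elements in `a`.
let middle = &a[1..4]; // A slice of `a`: only the elements `1`, `2`, and `3`.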
Slices have type &[T]. We’ll talk about that T when we cover generics.
You can find more documentation for slices in the standard library
documentation.
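A common use of slices is as function parameters. In this minimal sketch, a hypothetical sum function takes a &[i32], so it accepts a whole array, part of one, or a vector, without copying any elements:

```rust
/// Hypothetical helper: sums any slice of `i32`s.
fn sum(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += v; // `v` is an `&i32`; `+=` accepts a reference here.
    }
    total
}

fn main() {
    let a = [0, 1, 2, 3, 4];
    let complete = &a[..]; // All five elements.
    let middle = &a[1..4]; // Elements 1, 2 and 3 (the end index is excluded).
    println!("{} {} {}", sum(complete), sum(middle), middle.len());
}
```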
str
Rust’s str type is the most primitive string type. As an unsized type,
it’s not very useful by itself, but becomes useful when placed behind a
reference, like &str. We’ll elaborate further when we cover Strings and
references.
You can find more documentation for str in the standard library
documentation.
Tuples
A tuple is an ordered list of fixed size. Like this:
let x = (1, "hello");
The parentheses and commas form this two-length tuple. Here’s the same code, but with the type annotated:
let x: (i32, &str) = (1, "hello");
As you can see, the type of a tuple looks like the tuple, but with each
position having a type name rather than the value. Careful readers will
also note that tuples are heterogeneous: we have an i32 and a &str in
this tuple. In systems programming languages, strings are a bit more
complex than in other languages. For now, read &str as a string slice,
and we’ll learn more soon.
You can assign one tuple into another, if they have the same contained types and arity. Tuples have the same arity when they have the same length.
let mut x = (1, 2); // x: (i32, i32)
let y = (2, 3); // y: (i32, i32)
x = y;
You can access the fields in a tuple through a destructuring let. Here’s an example:
let (x, y, z) = (1, 2, 3);
println!("x is {}", x);
Remember before when I said the left-hand side of a let statement was
more powerful than assigning a binding? Here we are. We can put a
pattern on the left-hand side of the let, and if it matches up to the
right-hand side, we can assign multiple bindings at once. In this case,
let “destructures” or “breaks up” the tuple, and assigns the bits to
three bindings.
This pattern is very powerful, and we’ll see it repeated more later.
You can disambiguate a single-element tuple from a value in parentheses with a comma:
(0,); // A single-element tuple.
(0); // A zero in parentheses.
Tuple Indexing
You can also access fields of a tuple with indexing syntax:
let tuple = (1, 2, 3);
let x = tuple.0;
let y = tuple.1;
let z = tuple.2;
Like array indexing, it starts at zero, but unlike array indexing, it uses
a ., rather than []s.
You can find more documentation for tuples in the standard library
documentation.
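Tuples are a convenient way for a function to return more than one value. A minimal sketch with a hypothetical min_max helper, showing both a destructuring let and tuple indexing on the result:

```rust
/// Hypothetical helper: returns the smallest and largest values at once.
/// Panics if `values` is empty, because `values[0]` would be out of bounds.
fn min_max(values: &[i32]) -> (i32, i32) {
    let mut min = values[0];
    let mut max = values[0];
    for &v in values {
        if v < min { min = v; }
        if v > max { max = v; }
    }
    (min, max)
}

fn main() {
    let (lo, hi) = min_max(&[3, -2, 9, 4]); // Destructuring let.
    let pair = min_max(&[5, 1]);
    println!("{} {} {} {}", lo, hi, pair.0, pair.1); // Tuple indexing.
}
```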
Functions
Functions also have a type! They look like this:
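fn foo(x: i32) -> i32 { x }
let x: fn(i32) -> i32 = foo;
In this case, x is a ‘function pointer’ to a function that takes an i32 and returns an i32.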
Comments
Now that we have some functions, it’s a good idea to learn about comments. Comments are notes that you leave for other programmers to help explain things about your code. The compiler mostly ignores them.
Rust has two kinds of comments that you should care about: line comments and doc comments.
// Line comments are anything after ‘//’ and extend to the end of the line.
The other kind of comment is a doc comment. Doc comments use ///
instead of //, and support Markdown notation inside:
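Here is a minimal sketch of both kinds of comment on a small function. The /// lines document the item that follows and may contain Markdown; the // line is only for readers of the source. The function add_one is purely illustrative:

```rust
/// Adds one to the number given.
///
/// Markdown works here: *emphasis*, `inline code`, and lists:
///
/// - `add_one(5)` returns `6`
/// - `add_one(-1)` returns `0`
fn add_one(x: i32) -> i32 {
    // A line comment: ignored by the compiler.
    x + 1
}

fn main() {
    println!("{}", add_one(5));
}
```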
if
Rust’s take on if is not particularly complex, but it’s much more like
the if you’ll find in a dynamically typed language than in a more
traditional systems language. So let’s talk about it, to make sure you
grasp the nuances.
if is a specific form of a more general concept, the ‘branch’, whose
name comes from a branch in a tree: a decision point, where depending
on a choice, multiple paths can be taken.
In the case of if, there is one choice that leads down two paths:
let x = 5;
if x == 5 {
println!("x is five!");
}
If we changed the value of x to something else, this line would not print.
More specifically, if the expression after the if evaluates to true, then
the block is executed. If it’s false, then it is not.
If you want something to happen in the false case, use an else:
let x = 5;
if x == 5 {
println!("x is five!");
} else {
println!("x is not five :(");
}
If there is more than one case, use an else if:
let x = 5;
if x == 5 {
println!("x is five!");
} else if x == 6 {
println!("x is six!");
} else {
println!("x is not five or six :(");
}
Because if is an expression, you can also use its value, for example to initialize a binding:
let x = 5;
let y = if x == 5 {
10
} else {
15
}; // y: i32
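Since if is an expression, a function can return the value of an if/else if/else chain directly, as long as every branch has the same type. A minimal sketch with a hypothetical describe function:

```rust
/// Hypothetical helper: every branch yields a `&'static str`,
/// so the whole `if` expression has that type.
fn describe(x: i32) -> &'static str {
    if x == 5 {
        "five"
    } else if x == 6 {
        "six"
    } else {
        "neither five nor six"
    }
}

fn main() {
    println!("{}", describe(5));
}
```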
Loops
Rust currently provides three approaches to performing some kind of
iterative activity. They are: loop, while and for. Each approach has
its own set of uses.
loop
The infinite loop is the simplest form of loop available in Rust. Using
the keyword loop, Rust provides a way to loop indefinitely until some
terminating statement is reached. Rust’s infinite loops look like this:
loop {
println!("Loop forever!");
}
while
Rust also has a while loop. It looks like this:
let mut x = 5; // mut x: i32
let mut done = false; // mut done: bool
while !done {
x += x - 3;
println!("{}", x);
if x % 5 == 0 {
done = true;
}
}
while loops are the correct choice when you’re not sure how many
times you need to loop.
If you need an infinite loop, you may be tempted to write this:
while true {
    // ...
}
However, loop is far better suited to this case:
loop {
    // ...
}
for
The for loop is used to loop a particular number of times. Rust’s for
loops work a bit differently than in other systems languages, however.
Rust’s for loop doesn’t look like this “C-style” for loop:
for (x = 0; x < 10; x++) {
    printf( "%d\n", x );
}
Instead, it looks like this:
for x in 0..10 {
println!("{}", x); // x: i32
}
Enumerate
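When you need to keep track of how many times you have already looped, you can use the .enumerate() method.
On ranges:
for (index, value) in (5..10).enumerate() {
    println!("index = {} and value = {}", index, value);
}
On iterators:
let lines = "hello\nworld".lines();
for (linenumber, line) in lines.enumerate() {
    println!("{}: {}", linenumber, line);
}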
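enumerate pairs each item with a zero-based counter, which is handy when you need to report a position. A minimal sketch with a hypothetical find_line helper:

```rust
/// Hypothetical helper: returns the (zero-based) number of the first line
/// that contains `word`, using `enumerate` to track the position.
fn find_line(text: &str, word: &str) -> Option<usize> {
    for (number, line) in text.lines().enumerate() {
        if line.contains(word) {
            return Some(number);
        }
    }
    None
}

fn main() {
    println!("{:?}", find_line("hello\nworld", "wor"));
}
```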
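Ending iteration early
Let’s take another look at the while loop we saw earlier:
let mut x = 5;
let mut done = false;
while !done {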
x += x - 3;
println!("{}", x);
if x % 5 == 0 {
done = true;
}
}
We had to keep a dedicated mut boolean variable binding, done, to
know when we should exit out of the loop. Rust has two keywords to
help us with modifying iteration: break and continue.
In this case, we can write the loop in a better way with break:
let mut x = 5;
loop {
x += x - 3;
println!("{}", x);
if x % 5 == 0 { break; }
}
We now loop forever with loop and use break to break out early.
Issuing an explicit return statement will also serve to terminate the
loop early.
continue is similar, but instead of ending the loop, it goes to the
next iteration. This will only print the odd numbers:
for x in 0..10 {
if x % 2 == 0 { continue; }
println!("{}", x);
}
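Both loops above can be wrapped in functions that return a result instead of printing. This is a minimal sketch. The function names are hypothetical. Note that run_until_multiple_of_five never terminates for some starting values (with 3, for example, x never changes). It is only meant for the starting value 5 used in the text:

```rust
/// The `loop`/`break` version of the earlier loop, returning the final `x`.
fn run_until_multiple_of_five(start: i32) -> i32 {
    let mut x = start;
    loop {
        x += x - 3;
        if x % 5 == 0 { break; }
    }
    x
}

/// Collects the odd numbers below `limit`, skipping even ones with `continue`.
fn odd_numbers_below(limit: i32) -> Vec<i32> {
    let mut odds = Vec::new();
    for x in 0..limit {
        if x % 2 == 0 { continue; }
        odds.push(x);
    }
    odds
}

fn main() {
    println!("{}", run_until_multiple_of_five(5));
    println!("{:?}", odd_numbers_below(10));
}
```

Starting from 5, x goes through 7, 11, 19 and 35, and the loop stops at 35.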
Loop labels
You may also encounter situations where you have nested loops and need to specify which one your break or continue statement is for. As in most other languages, break and continue apply to the innermost loop by default. When you want to break or continue one of the outer loops instead, you can use labels to specify which loop the break or continue statement applies to.
In the example below, we continue to the next iteration of outer
loop when x is even, while we continue to the next iteration of inner
loop when y is even. So it will execute the println! when both x and
y are odd.
'outer: for x in 0..10 {
'inner: for y in 0..10 {
if x % 2 == 0 { continue 'outer; } // Continues the loop over `x`.
if y % 2 == 0 { continue 'inner; } // Continues the loop over `y`.
println!("x: {}, y: {}", x, y);
}
}
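Labels work with break as well as continue. This minimal sketch uses a hypothetical find_pair helper, which searches two nested loops. On its first match, it leaves both loops at once with break 'outer:

```rust
/// Hypothetical helper: finds the first pair (x, y), both in 1..10,
/// whose product is `target`.
fn find_pair(target: i32) -> Option<(i32, i32)> {
    let mut found = None;
    'outer: for x in 1..10 {
        for y in 1..10 {
            if x * y == target {
                found = Some((x, y));
                break 'outer; // Exits both loops, not just the inner one.
            }
        }
    }
    found
}

fn main() {
    println!("{:?}", find_pair(12));
}
```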
Vectors
A ‘vector’ is a dynamic or ‘growable’ array, implemented as the standard library type Vec<T>. The T means that we can have vectors of any type (see the chapter on generics for more). Vectors always allocate their data on the heap. You can create them with the vec! macro:
let v = vec![1, 2, 3, 4, 5]; // v: Vec<i32>
(Notice that, unlike the println! macro we’ve used in the past, we use square brackets [] with the vec! macro. Rust allows you to use either kind of delimiter with either macro; this is just convention.)
There’s an alternate form of vec! for repeating an initial value:
let v = vec![0; 10]; // A vector of ten zeroes.
Accessing elements
To get the value at a particular index in the vector, we use []s:
let v = vec![1, 2, 3, 4, 5];
let i: usize = 0;
let j: i32 = 0;
// Works:
v[i];
// Doesn’t:
v[j];
Indexing with a non-usize type gives an error that looks like this:
error: the trait bound `collections::vec::Vec<_> : core::ops::Index<i32>` is not satisfied [E0277]
v[j];
^~~~
note: the type `collections::vec::Vec<_>` cannot be indexed by `i32`
error: aborting due to previous error
Out-of-bounds Access
If you try to access an index that doesn’t exist:
let v = vec![1, 2, 3];
println!("Item 7 is {}", v[7]);
then the current thread will panic with a message like this:
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 7'
If you want to handle out-of-bounds errors without panicking, you can
use methods like get or get_mut that return None when given an invalid
index:
let v = vec![1, 2, 3];
match v.get(7) {
Some(x) => println!("Item 7 is {}", x),
None => println!("Sorry, this vector is too short.")
}
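The same match on get can be packaged as a lookup that falls back to a default. A minimal sketch with a hypothetical get_or helper. It also shows converting an i32 index with as usize. A negative i32 would become a very large usize, which get then reports as None:

```rust
/// Hypothetical helper: look up `index`, falling back to `default`
/// instead of panicking when the index is out of bounds.
fn get_or(v: &[i32], index: usize, default: i32) -> i32 {
    match v.get(index) {
        Some(&x) => x,
        None => default,
    }
}

fn main() {
    let v = vec![1, 2, 3];
    let j: i32 = 2;
    println!("{}", get_or(&v, 7, 0));          // Out of bounds: prints 0.
    println!("{}", get_or(&v, j as usize, 0)); // Converted index: prints 3.
}
```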
Iterating
Once you have a vector, you can iterate through its elements with for.
There are three versions:
let mut v = vec![1, 2, 3, 4, 5];
for i in &v {
println!("A reference to {}", i);
}
for i in &mut v {
println!("A mutable reference to {}", i);
}
for i in v {
println!("Take ownership of the vector and its element {}", i);
}
Note: You cannot use the vector again once you have iterated by taking ownership of the vector. You can iterate the vector multiple times by taking a reference to the vector while iterating. For example, the following code does not compile:
for i in v {
    println!("Take ownership of the vector and its element {}", i);
}
for i in v { // Error: `v` was moved by the previous loop.
    println!("Take ownership of the vector and its element {}", i);
}
Whereas the following works perfectly:
for i in &v {
println!("This is a reference to {}", i);
}
for i in &v {
println!("This is a reference to {}", i);
}
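Iterating through references is what lets a function use a vector without consuming it. A minimal sketch with two hypothetical helpers: one reads elements through shared references, the other changes them in place through mutable references:

```rust
/// Hypothetical helper: reads each element through a shared reference.
fn sum_all(v: &[i32]) -> i32 {
    let mut total = 0;
    for i in v {
        total += *i; // `i` is an `&i32`, so we dereference it.
    }
    total
}

/// Hypothetical helper: changes each element through a mutable reference.
fn double_all(v: &mut [i32]) {
    for i in v.iter_mut() {
        *i *= 2;
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    double_all(&mut v);
    println!("{:?} sums to {}", v, sum_all(&v)); // `v` is still usable here.
}
```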
Vectors have many more useful methods, which you can read about in
their API documentation.
Ownership
This is the first of three sections presenting Rust’s ownership system. This is one of Rust’s most distinct and compelling features, with which Rust developers should become quite acquainted. Ownership is how Rust achieves its largest goal, memory safety. There are a few distinct concepts, each with its own chapter:
• ownership, the concept covered here
• borrowing, and its associated feature, ‘references’
• lifetimes, an advanced concept of borrowing
These three chapters are related, and in order. You’ll need all three to
fully understand the ownership system.
Meta
Before we get to the details, two important notes about the ownership
system.
Rust has a focus on safety and speed. It accomplishes these goals through many ‘zero-cost abstractions’, which means that in Rust, abstractions cost as little as possible in order to make them work. The ownership system is a prime example of a zero-cost abstraction. All of the analysis we’ll talk about in this guide is done at compile time. You do not pay any run-time cost for any of these features.
However, this system does have a certain cost: a learning curve. Many new Rust users experience something we like to call ‘fighting with the borrow checker’, where the Rust compiler refuses to compile a program that the author thinks is valid. This often happens because the programmer’s mental model of how ownership should work doesn’t match the actual rules that Rust implements. You will probably experience similar things at first. There is good news, however: more experienced Rust developers report that once they work with the rules of the ownership system for a period of time, they fight the borrow checker less and less.
With that in mind, let’s learn about ownership.
Ownership
Variable bindings have a property in Rust: they ‘have ownership’ of
what they’re bound to. This means that when a binding goes out of
scope, Rust will free the bound resources. For example:
fn foo() {
let v = vec![1, 2, 3];
}
When v comes into scope, a new vector is created on the stack, and it
allocates space on the heap for its elements. When v goes out of scope
at the end of foo(), Rust will clean up everything related to the vector,
even the heap-allocated memory. This happens deterministically, at the
end of the scope.
Move semantics
There’s some more subtlety here, though: Rust ensures that there is
exactly one binding to any given resource. For example, if we have a
vector, we can assign it to another binding:
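let v = vec![1, 2, 3];
let v2 = v;
But if we try to use v afterwards, the program does not compile: the compiler reports that v was used after being moved.
let v = vec![1, 2, 3];
let v2 = v;
println!("v[0] is: {}", v[0]); // Error: `v` has been moved into `v2`.
A similar thing happens if we define a function that takes ownership of its argument, and try to use the vector after we’ve passed it in:
fn take(v: Vec<i32>) {
    // What happens here isn’t important.
}
let v = vec![1, 2, 3];
take(v);
println!("v[0] is: {}", v[0]); // Error: `v` has been moved into `take`.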
The details
The reason that we cannot use a binding after we’ve moved it is subtle,
but important.
When we write code like this:
let x = 10;
Rust allocates memory for an integer i32 on the stack, copies the bit
pattern representing the value of 10 to the allocated memory and binds
the variable name x to this memory region for future reference.
Now consider the following code fragment:
let v = vec![1, 2, 3];
let mut v2 = v;
The first line allocates memory for the vector object v on the stack
like it does for x above. But in addition to that it also allocates some
memory on the heap for the actual data ([1, 2, 3]). Rust copies the
address of this heap allocation to an internal pointer, which is part of
the vector object placed on the stack (let’s call it the data pointer).
It is worth pointing out (even at the risk of stating the obvious) that the vector object and its data live in separate memory regions instead of being a single contiguous memory allocation (for reasons we will not go into at this point). These two parts of the vector (the one on the stack and the one on the heap) must agree with each other at all times with regard to things like the length, capacity, etc.
When we move v to v2, Rust actually does a bitwise copy of the
vector object v into the stack allocation represented by v2. This shallow
copy does not create a copy of the heap allocation containing the actual
data. Which means that there would be two pointers to the contents of
the vector both pointing to the same memory allocation on the heap.
It would violate Rust’s safety guarantees by introducing a data race if
one could access both v and v2 at the same time.
For example, if we truncated the vector to just two elements through v2:
v2.truncate(2);
and v were still accessible, we’d end up with an invalid vector, since v would not know that the heap data has been truncated. Now the part of the vector v on the stack does not agree with the corresponding part on the heap. v still thinks there are three elements in the vector and will happily let us access the non-existent element v[2], but as you might already know, this is a recipe for disaster, especially because it might lead to a segmentation fault or, worse, allow an unauthorized user to read from memory to which they don’t have access.
This is why Rust forbids using v after we’ve done the move.
It’s also important to note that optimizations may remove the actual
copy of the bytes on the stack, depending on circumstances. So it may
not be as inefficient as it initially seems.
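The claim that a move copies only the stack part of the vector can be observed with Vec::as_ptr, which returns the address of the heap buffer. This minimal sketch uses a hypothetical helper. It moves a vector and checks that the new binding points at the same heap data:

```rust
/// Hypothetical helper: moves a vector and reports whether the heap
/// buffer stayed where it was.
fn move_keeps_buffer() -> bool {
    let v = vec![1, 2, 3];
    let before = v.as_ptr(); // Address of the heap data.
    let v2 = v;              // Move: only pointer, length and capacity are copied.
    v2.as_ptr() == before    // Same heap buffer, now owned by `v2`.
}

fn main() {
    println!("{}", move_keeps_buffer());
}
```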
Copy types
We’ve established that when ownership is transferred to another binding, you cannot use the original binding. However, there’s a trait that changes this behavior, and it’s called Copy. We haven’t discussed traits yet, but for now, you can think of them as an annotation to a particular type that adds extra behavior. For example:
let v = 1;
let v2 = v;
In this case, v is an i32, which implements the Copy trait. This means that, just like a move, when we assign v to v2, a copy of the data is made. But, unlike a move, we can still use v afterward. Because an i32 has no pointers to data somewhere else, copying it is a full copy.
The primitive numeric types, bool, and char all implement the Copy trait, so their ownership is not moved the way the ‘ownership rules’ would otherwise suggest. For example, the following two snippets compile only because the i32 and bool types implement the Copy trait.
fn main() {
    let a = 5;
    let _y = double(a);
    println!("{}", a);
}
fn double(x: i32) -> i32 {
    x * 2
}
fn main() {
    let a = true;
    let _y = change_truth(a);
    println!("{}", a);
}
fn change_truth(x: bool) -> bool {
    !x
}
If we had used types that do not implement the Copy trait, we would
have gotten a compile error because we tried to use a moved value.
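Of course, if we had to hand ownership back with every function we wrote, it would look like this:
fn foo(v: Vec<i32>) -> Vec<i32> {
    // Do stuff with `v`.
    // Hand back ownership.
    v
}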
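This would get very tedious. It gets worse the more things we want to take ownership of:
fn foo(v1: Vec<i32>, v2: Vec<i32>) -> (Vec<i32>, Vec<i32>, i32) {
    // Do stuff with `v1` and `v2`.
    // Hand back ownership, and the result of our function.
    (v1, v2, 42)
}
let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3];
let (v1, v2, answer) = foo(v1, v2);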
Ugh! The return type, return line, and calling the function gets way
more complicated.
Luckily, Rust offers a feature which helps us solve this problem. It’s
called borrowing and is the topic of the next section!
Borrowing
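At the end of the ownership section, we had a nasty function that had to hand ownership of its vectors back to the caller. Borrowing avoids this: instead of taking a Vec<i32> as an argument, a function can take a reference, &Vec<i32>, which borrows the vector instead of owning it. The caller keeps ownership, and nothing has to be handed back.
References are immutable, like bindings. This means that this code does not compile:
fn foo(v: &Vec<i32>) {
    v.push(5); // Error: cannot borrow `*v` as mutable.
}
let v = vec![];
foo(&v);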
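Borrowing is what makes the earlier ‘hand ownership back’ pattern unnecessary. In this minimal sketch, a hypothetical total_of function only reads the vector through a reference. The caller can therefore keep using its vector after the call:

```rust
/// Hypothetical helper: borrows the vector instead of taking ownership.
fn total_of(v: &Vec<i32>) -> i32 {
    let mut total = 0;
    for x in v {
        total += x; // `x` is an `&i32` borrowed from the vector.
    }
    total
}

fn main() {
    let v = vec![1, 2, 3];
    let total = total_of(&v);
    println!("{} {:?}", total, v); // `v` is still usable after the call.
}
```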
&mut references
There’s a second kind of reference: &mut T. A ‘mutable reference’ allows
you to mutate the resource you’re borrowing. For example:
let mut x = 5;
{
let y = &mut x;
*y += 1;
}
println!("{}", x);
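This will print 6. We make y a mutable reference to x, then add one to the value y points at. The asterisk (*) in front of y dereferences it, giving access to the i32 it refers to. Note that x has to be marked mut as well: you cannot take a mutable borrow of an immutable binding. The extra pair of curly braces limits how long y’s borrow lasts; the rules below explain why that matters.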
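Mutable references are also how a function can change a value that its caller owns. A minimal sketch with a hypothetical increment function:

```rust
/// Hypothetical helper: takes a mutable reference and changes the caller's value.
fn increment(n: &mut i32) {
    *n += 1; // `*` dereferences to reach the borrowed i32.
}

fn main() {
    let mut x = 5; // Must be `mut` to be lent out mutably.
    increment(&mut x);
    println!("{}", x);
}
```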
The Rules
Here are the rules for borrowing in Rust:
First, any borrow must last for a scope no greater than that of the
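owner. Second, you may have one or the other of these two kinds of borrows, but not both at the same time:
• one or more references (&T) to a resource,
• exactly one mutable reference (&mut T).
You may notice that this is very similar to, though not exactly the same as, the definition of a data race:
There is a ‘data race’ when two or more pointers access the same memory location at the same time, where at least one of them is writing, and the operations are not synchronized.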
Thinking in scopes
Here’s the code:
fn main() {
let mut x = 5;
let y = &mut x;
*y += 1;
println!("{}", x);
}
This code gives us this error:
error: cannot borrow `x` as immutable because it is also borrowed as mutable
println!("{}", x);
^
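This is because we’ve violated the rules: we have a &mut T pointing to x, and so we aren’t allowed to create any &Ts. It’s one or the other. (This describes the lexical borrow checker this book was written for. Newer compilers, which use non-lexical lifetimes, end a borrow after its last use, so they accept this particular program.)
The note hints at how to think about this problem: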
note: previous borrow ends here
fn main() {
}
^
In other words, the mutable borrow is held through the rest of our
example. What we want is for the mutable borrow by y to end so that
the resource can be returned to the owner, x. x can then provide an
immutable borrow to println!. In Rust, borrowing is tied to the scope
that the borrow is valid for. And our scopes look like this:
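fn main() {
    let mut x = 5;
    {
        let y = &mut x; // -+ &mut borrow starts here.
        *y += 1;        //  |
    }                   // -+ ... and ends here.
    println!("{}", x);  // <- Try to borrow x here.
}
There’s no problem here: with the extra braces, the mutable borrow ends before println! creates an immutable one. So scope is the key to seeing how long a borrow lasts.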
Iterator invalidation
One example is ‘iterator invalidation’, which happens when you try to
mutate a collection that you’re iterating over. Rust’s borrow checker
prevents this from happening:
let mut v = vec![1, 2, 3];
for i in &v {
println!("{}", i);
}
This prints out one through three. As we iterate through the vector,
we’re only given references to the elements. And v is itself borrowed as
immutable, which means we can’t change it while we’re iterating:
for i in &v {
println!("{}", i);
v.push(34);
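}
This does not compile: v.push(34) needs a mutable borrow of v while the for loop is still holding an immutable one.
References must also not outlive the resource they refer to, and Rust checks the scopes of references to make sure of this. For example, this code does not compile: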
let y: &i32;
{
let x = 5;
y = &x;
}
println!("{}", y);
In other words, y is only valid for the scope where x exists. As soon as
x goes away, it becomes invalid to refer to it. As such, the error says
that the borrow ‘doesn’t live long enough’ because it’s not valid for the
right amount of time.
The same problem occurs when the reference is declared before the variable it refers to. This is because resources within the same scope are freed in the opposite order they were declared:
let y: &i32;
let x = 5;
y = &x;
println!("{}", y);
In the above example, y is declared before x, meaning that y lives longer than x, which is not allowed. (Compilers with non-lexical lifetimes accept this particular example, because y is never used after x is freed.)
Lifetimes
This is the last of three sections presenting Rust’s ownership system.
This is one of Rust’s most distinct and compelling features, with which
Rust developers should become quite acquainted. Ownership is how
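Rust achieves its largest goal, memory safety. There are a few distinct concepts, each with its own chapter:
• ownership, the key concept
• borrowing, and its associated feature ‘references’
• lifetimes, which you’re reading now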
These three chapters are related, and in order. You’ll need all three to
fully understand the ownership system.
Meta
Before we get to the details, two important notes about the ownership
system.
Rust has a focus on safety and speed. It accomplishes these goals
through many ‘zero-cost abstractions’, which means that in Rust, ab-
stractions cost as little as possible in order to make them work. The
ownership system is a prime example of a zero-cost abstraction. All of
the analysis we’ll talk about in this guide is done at compile time. You
do not pay any run-time cost for any of these features.
However, this system does have a certain cost: a learning curve. Many new Rust users experience something we like to call ‘fighting with the borrow checker’, where the Rust compiler refuses to compile a program
that the author thinks is valid. This often happens because the pro-
grammer’s mental model of how ownership should work doesn’t match
the actual rules that Rust implements. You probably will experience
similar things at first. There is good news, however: more experienced
Rust developers report that once they work with the rules of the own-
ership system for a period of time, they fight the borrow checker less
and less.
With that in mind, let’s learn about lifetimes.
Lifetimes
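Lending out a reference to a resource that someone else owns can be complicated. For example, imagine this set of operations:
1. I acquire a handle to some kind of resource.
2. I lend you a reference to the resource.
3. I decide I’m done with the resource, and deallocate it, while you still have your reference.
4. You decide to use the resource.
Uh oh! Your reference is pointing to an invalid resource. When the resource is memory, this is called a dangling pointer or ‘use after free’. A small example of this:
let r; // Introduce reference: `r`.
{
let i = 1; // Introduce scoped value: `i`.
r = &i; // Store reference of `i` in `r`.
} // `i` goes out of scope and is dropped.
println!("{}", r); // `r` still refers to `i`.
To fix this, we have to make sure that step four never happens after step three. In a small example like this, the Rust compiler is able to report the issue because it can see the lifetimes of all the values in the function.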
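When we have a function that takes arguments by reference, the situation becomes more complex. Consider the following example:
fn skip_prefix(line: &str, prefix: &str) -> &str { // Error: missing lifetime specifier.
// ...
}
let line = "lang:en=Hello World!";
let lang = "en";
let v;
{
let p = format!("lang:{}=", lang); // -+ `p` comes into scope.
v = skip_prefix(line, p.as_str()); // |
} // -+ `p` goes out of scope.
println!("{}", v);
Here, skip_prefix takes two &str references and returns a single &str reference. We call it with references to line and p, two values with different lifetimes. Whether the println! line is safe depends on whether the returned reference points into line, which is still alive, or into p, which has already been dropped. Because of this ambiguity, Rust refuses to compile the example. To make it compile, we need to tell the compiler more about the lifetimes of the references by making them explicit in the function declaration:
fn skip_prefix<'a, 'b>(line: &'a str, prefix: &'b str) -> &'a str {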
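A complete, runnable version of the example with explicit lifetimes might look like the following sketch. The signature follows the text; the body of skip_prefix (strip the prefix if present, otherwise return the whole line) is our own assumption.

```rust
// 'a ties the returned slice to `line`; `prefix` gets an unrelated lifetime 'b.
fn skip_prefix<'a, 'b>(line: &'a str, prefix: &'b str) -> &'a str {
    // Hypothetical body: strip `prefix` if present, otherwise return `line` unchanged.
    if line.starts_with(prefix) {
        &line[prefix.len()..]
    } else {
        line
    }
}

fn main() {
    let line = "lang:en=Hello World!";
    let lang = "en";

    let v;
    {
        let p = format!("lang:{}=", lang); // `p` comes into scope.
        v = skip_prefix(line, p.as_str()); // `v` borrows from `line`, not from `p`.
    } // `p` goes out of scope; `v` is still valid.
    println!("{}", v);
}
```

The return type mentions only 'a, so the compiler knows that v borrows from line and may outlive p.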
Syntax
The 'a reads ‘the lifetime a’. Technically, every reference has some
lifetime associated with it, but the compiler lets you elide (i.e. omit,
see “Lifetime Elision” below) them in common cases. Before we get to
that, though, let’s look at a short example with explicit lifetimes:
fn bar<'a>(...)
We previously talked a little about function syntax, but we didn’t discuss the <>s after a function’s name. A function can have ‘generic parameters’ between the <>s, of which lifetimes are one kind. We’ll discuss other kinds of generics later in the book, but for now, let’s focus on the lifetimes aspect.
We use <> to declare our lifetimes. This says that bar has one lifetime, 'a. If we had two reference parameters with different lifetimes, it would look like this:
fn bar<'a, 'b>(...)
In the parameter list, we then use the lifetimes we’ve named, for example x: &'a i32, or x: &'a mut i32 for a mutable reference.
If you compare &mut i32 to &'a mut i32, they’re the same, except that the lifetime 'a has snuck in between the & and the mut i32. We read &mut i32 as ‘a mutable reference to an i32’ and &'a mut i32 as ‘a mutable reference to an i32 with the lifetime 'a’.
In structs
You’ll also need explicit lifetimes when working with structs that con-
tain references:
struct Foo<'a> {
x: &'a i32,
}
fn main() {
let y = &5; // This is the same as `let _y = 5; let y = &_y;`.
let f = Foo { x: y };
println!("{}", f.x);
}
As you can see, structs can also have lifetimes. In a similar way to functions,
struct Foo<'a> {
declares a lifetime, and
x: &'a i32,
uses it. So why do we need a lifetime here? We need to ensure that any reference to a Foo cannot outlive the reference to an i32 it contains.
impl blocks
Let’s implement a method on Foo:
struct Foo<'a> {
x: &'a i32,
}
impl<'a> Foo<'a> {
fn x(&self) -> &'a i32 { self.x }
}
fn main() {
let y = &5; // This is the same as `let _y = 5; let y = &_y;`.
let f = Foo { x: y };
println!("x is: {}", f.x());
}
As you can see, we need to declare a lifetime for Foo in the impl line. We repeat 'a twice, like on functions: impl<'a> defines a lifetime 'a, and Foo<'a> uses it.
Multiple lifetimes
If you have multiple references, you can use the same lifetime multiple
times:
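For example, a function that returns one of two string slices can give both parameters the same lifetime 'a, so the result is valid only as long as both inputs are. A minimal sketch; the names x_or_y and x_only and the selection rule are illustrative assumptions:

```rust
// Both inputs share 'a, so the returned slice may borrow from either one.
fn x_or_y<'a>(x: &'a str, y: &'a str) -> &'a str {
    // Hypothetical rule: return the longer slice (ties go to `x`).
    if y.len() > x.len() { y } else { x }
}

// If only `x` can ever be returned, `y` can get its own, unrelated lifetime.
fn x_only<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

fn main() {
    let a = String::from("short");
    let b = String::from("longer one");
    println!("{}", x_or_y(&a, &b));
    println!("{}", x_only(&a, &b));
}
```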
Thinking in scopes
A way to think about lifetimes is to visualize the scope that a reference
is valid for. For example:
fn main() {
let y = &5; // -+ `y` comes into scope.
// |
// Stuff... // |
// |
} // -+ `y` goes out of scope.
struct Foo<'a> {
x: &'a i32,
}
fn main() {
let y = &5; // -+ `y` comes into scope.
let f = Foo { x: y }; // -+ `f` comes into scope.
// |
// Stuff... // |
// |
} // -+ `f` and `y` go out of scope.
struct Foo<'a> {
x: &'a i32,
}
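fn main() {
let x; // -+ `x` comes into scope.
// |
{ // |
let y = &5; // ---+ `y` comes into scope.
let f = Foo { x: y }; // ---+ `f` comes into scope.
x = &f.x; // | | This causes an error.
} // ---+ `f` and `y` go out of scope.
// |
println!("{}", x); // |
} // -+ `x` goes out of scope.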
Whew! As you can see here, the scopes of f and y are smaller than
the scope of x. But when we do x = &f.x, we make x a reference to
something that’s about to go out of scope.
Named lifetimes are a way of giving these scopes a name. Giving
something a name is the first step towards being able to talk about it.
'static
The lifetime named 'static is a special lifetime. It signals that something has the lifetime of the entire program. Most Rust programmers first come across 'static when dealing with strings:
let x: &'static str = "Hello, world.";
String literals have the type &'static str because the reference is always alive: they are baked into the data segment of the final binary. Another example is globals:
static FOO: i32 = 5;
let x: &'static i32 = &FOO;
This adds an i32 to the data segment of the binary, and x is a reference to it.
Lifetime Elision
Rust supports powerful local type inference in the bodies of functions,
but it deliberately does not perform any reasoning about types for item
signatures. However, for ergonomic reasons, a very restricted secondary
inference algorithm called “lifetime elision” does apply when judging
lifetimes. Lifetime elision is concerned solely with inferring lifetime pa-
rameters using three easily memorizable and unambiguous rules. This
means lifetime elision acts as a shorthand for writing an item signature,
while not hiding away the actual types involved as full local inference
would if applied to it.
When talking about lifetime elision, we use the terms input lifetime
and output lifetime. An input lifetime is a lifetime associated with a
parameter of a function, and an output lifetime is a lifetime associated
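with the return value of a function. For example, this function has an input lifetime:
fn foo<'a>(bar: &'a str)
This one has an output lifetime:
fn foo<'a>() -> &'a str
This one has a lifetime in both positions:
fn foo<'a>(bar: &'a str) -> &'a str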
Examples
Here are some examples of functions with elided lifetimes. We’ve paired
each example of an elided lifetime with its expanded form.
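The three elision rules are: (1) each elided lifetime in a function’s parameters becomes a distinct lifetime parameter; (2) if there is exactly one input lifetime, elided or not, it is assigned to all elided lifetimes in the return value; (3) if there are multiple input lifetimes but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes. Otherwise, eliding an output lifetime is an error. The sketch below pairs elided signatures with their expanded forms; all function and type names are hypothetical, and each pair means the same thing to the compiler.

```rust
// Rule 2: exactly one input lifetime, so the output gets it.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}
fn first_word_expanded<'a>(s: &'a str) -> &'a str {
    s.split(' ').next().unwrap_or("")
}

// `lvl` is not a reference, so only `s` needs a lifetime (rule 1).
fn debug(lvl: u32, s: &str) -> String {
    format!("[{}] {}", lvl, s)
}
fn debug_expanded<'a>(lvl: u32, s: &'a str) -> String {
    format!("[{}] {}", lvl, s)
}

struct Config {
    name: String,
}

impl Config {
    // Rule 3: with &self among several inputs, the output borrows from self.
    // Hypothetical method: it ignores `_key` and always returns the name.
    fn lookup(&self, _key: &str) -> &str {
        &self.name
    }
    fn lookup_expanded<'a, 'b>(&'a self, _key: &'b str) -> &'a str {
        &self.name
    }
}

fn main() {
    let c = Config { name: String::from("cfg") };
    println!("{} {}", first_word("hello world"), first_word_expanded("hello world"));
    println!("{} {}", debug(1, "msg"), debug_expanded(1, "msg"));
    println!("{} {}", c.lookup("k"), c.lookup_expanded("k"));
}
```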
In the preceding example, lvl doesn’t need a lifetime because it’s not
a reference (&). Only things relating to references (such as a struct
which contains a reference) need lifetimes.
Mutability
Mutability, the ability to change something, works a bit differently in
Rust than in other languages. The first aspect of mutability is its non-
default status:
let x = 5;
x = 6; // Error!
We can introduce mutability with the mut keyword:
let mut x = 5;
x = 6; // No problem!
This is a mutable variable binding. When a binding is mutable, it
means you’re allowed to change what the binding points to. So in the
above example, it’s not so much that the value at x is changing, but
that the binding changed from one i32 to another.
You can also create a reference to it, using &x, but if you want to
use the reference to change it, you will need a mutable reference:
let mut x = 5;
let y = &mut x;
y is an immutable binding to a mutable reference, which means that
you can’t bind ‘y’ to something else (y = &mut z), but y can be used
to bind x to something else (*y = 5). A subtle distinction.
Of course, if you need both:
let mut x = 5;
let mut y = &mut x;
Now y can be bound to another value, and the value it’s referencing
can be changed.
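The difference between an immutable binding to a mutable reference and a mutable binding to one can be seen side by side. A minimal sketch; the second variable z and the function name demo are our additions:

```rust
// Returns the final values of `x` and `z` so the result can be checked.
fn demo() -> (i32, i32) {
    let mut x = 5;
    let mut z = 10;

    {
        let y = &mut x; // immutable binding to a mutable reference
        *y = 6;         // OK: changes the value `y` refers to
        // y = &mut z;  // Error: `y` itself cannot be re-bound
    }

    let mut y = &mut x; // mutable binding to a mutable reference
    *y += 1;            // changes `x` (now 7) through `y`
    y = &mut z;         // OK: `y` now refers to `z` instead
    *y += 1;            // changes `z` (now 11)

    (x, z)
}

fn main() {
    let (x, z) = demo();
    println!("x = {}, z = {}", x, z);
}
```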
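It’s important to note that mut is part of a pattern, so you can do things like this:
let (mut x, y) = (5, 6);
Note that here, x is mutable, but y is not. The same works for function parameters:
fn foo(mut x: i32) {
}
However, when we say something is ‘immutable’ in Rust, that doesn’t mean that it can’t change: we are referring to its ‘exterior mutability’, which in this case is immutable. Consider, for example, Arc<T>: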
use std::sync::Arc;
let x = Arc::new(5);
let y = x.clone();
When we call clone(), the Arc<T> needs to update the reference count.
Yet we’ve not used any muts here, x is an immutable binding, and we
didn’t take &mut 5 or anything. So what gives?
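To understand this, we have to go back to the core of Rust’s guiding philosophy, memory safety, and the mechanism by which Rust guarantees it, the ownership system, and more specifically, borrowing. You may have one or the other of these two kinds of borrows, but not both at the same time:
• one or more references (&T) to a resource,
• exactly one mutable reference (&mut T).
So, that’s the real definition of ‘immutability’: is it safe to have two pointers to this? In Arc<T>’s case, yes: the mutation is entirely contained inside the structure itself, and it’s not user facing. For this reason, Arc<T> only ever gives out shared access (&T) to its contents; clone() produces another Arc<T>, not a &mut T. If it handed out &mut Ts, though, that would be a problem.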
Other types, like the ones in the std::cell module, have the op-
posite: interior mutability. For example:
use std::cell::RefCell;
let x = RefCell::new(42);
let y = x.borrow_mut();
RefCell hands out &mut references to what’s inside of it with the borrow_
mut() method. Isn’t that dangerous? What if we do:
use std::cell::RefCell;
let x = RefCell::new(42);
let y = x.borrow_mut();
let z = x.borrow_mut(); // Panics at runtime: `x` is already mutably borrowed by `y`.
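The second borrow_mut() call above panics at runtime, because RefCell enforces the borrowing rules while the program runs instead of at compile time. When a panic is not acceptable, RefCell::try_borrow_mut returns an Err instead. A minimal sketch; the function name bump_twice is ours:

```rust
use std::cell::RefCell;

// Returns true if the second, overlapping mutable borrow was refused.
fn bump_twice(cell: &RefCell<i32>) -> bool {
    let mut first = cell.borrow_mut(); // first mutable borrow succeeds
    *first += 1;

    // A second mutable borrow while `first` is alive is refused with Err.
    let second_refused = cell.try_borrow_mut().is_err();

    drop(first); // release the first borrow

    // Now a new mutable borrow is allowed again.
    *cell.borrow_mut() += 1;
    second_refused
}

fn main() {
    let x = RefCell::new(42);
    let refused = bump_twice(&x);
    println!("second borrow refused: {}, value: {}", refused, x.borrow());
}
```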
Field-level mutability
Mutability is a property of either a borrow (&mut) or a binding (let
mut). This means that, for example, you cannot have a struct with
some fields mutable and some immutable:
struct Point {
x: i32,
mut y: i32, // Nope.
}
The mutability of a struct is in its binding:
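struct Point {
x: i32,
y: i32,
}
let mut a = Point { x: 5, y: 6 };
a.x = 10;
let b = Point { x: 5, y: 6 };
b.x = 10; // Error: cannot assign to `b.x`, because `b` is not mutable.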
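However, by using Cell<T>, you can emulate field-level mutability:
use std::cell::Cell;
struct Point {
x: i32,
y: Cell<i32>,
}
let point = Point { x: 5, y: Cell::new(6) };
point.y.set(7);
// `point` is an immutable binding, but its `y` field now holds 7.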
Structs
structs are a way of creating more complex data types. For example, if
we were doing calculations involving coordinates in 2D space, we would
need both an x and a y value:
let origin_x = 0;
let origin_y = 0;
A struct lets us combine these two into a single, unified datatype with
x and y as field labels:
struct Point {
x: i32,
y: i32,
}
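fn main() {
let origin = Point { x: 0, y: 0 }; // origin: Point
println!("The origin is at ({}, {})", origin.x, origin.y);
}
You can use mut to make a struct mutable and then change its fields with dot notation:
struct Point {
x: i32,
y: i32,
}
fn main() {
let mut point = Point { x: 0, y: 0 };
point.x = 5;
println!("The point is at ({}, {})", point.x, point.y);
}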
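Rust does not support field mutability at the language level, so you cannot write something like this:
struct Point {
mut x: i32, // This causes an error.
y: i32,
}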
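Mutability is a property of the binding, not of the structure itself. This even lets you make a struct mutable only temporarily:
struct Point {
x: i32,
y: i32,
}
fn main() {
let mut point = Point { x: 0, y: 0 };
point.x = 5;
let point = point; // `point` is now immutable.
point.y = 6; // This causes an error.
}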
Your structure can still contain &mut references, which will let you do
some kinds of mutation:
struct Point {
x: i32,
y: i32,
}
struct PointRef<'a> {
x: &'a mut i32,
y: &'a mut i32,
}
fn main() {
let mut point = Point { x: 0, y: 0 };
{
let r = PointRef { x: &mut point.x, y: &mut point.y };
*r.x = 5;
*r.y = 6;
}
assert_eq!(5, point.x);
assert_eq!(6, point.y);
}
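You can also initialize a struct with field init shorthand, writing name instead of name: name when a variable has the same name as the field:
#[derive(Debug)]
struct Person<'a> { name: &'a str, age: u8 }
fn main() {
let name = "Peter";
let age = 27;
let peter = Person { name, age };
// Debug-print struct
println!("{:?}", peter);
}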
Update syntax
A struct can include .. to indicate that you want to use a copy of
some other struct for some of the values. For example:
struct Point3d {
x: i32,
y: i32,
z: i32,
}
let mut point = Point3d { x: 0, y: 0, z: 0 };
point = Point3d { y: 1, .. point };
This gives point a new y, but keeps the old x and z values. It doesn’t
have to be the same struct either, you can use this syntax when making
new ones, and it will copy the values you don’t specify:
struct Point3d {
x: i32,
y: i32,
z: i32,
}
let origin = Point3d { x: 0, y: 0, z: 0 };
let point = Point3d { z: 1, x: 2, .. origin };
Tuple structs
Rust has another data type that’s like a hybrid between a tuple and
a struct, called a ‘tuple struct’. Tuple structs have a name, but their
fields don’t. They are declared with the struct keyword, and then
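with a name followed by a tuple:
struct Color(i32, i32, i32);
struct Point(i32, i32, i32);
let black = Color(0, 0, 0);
let origin = Point(0, 0, 0);
Here, black and origin are not the same type, even though they contain the same values.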
The members of a tuple struct may be accessed by dot notation or
destructuring let, just like regular tuples:
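let black_r = black.0;
let Point(_, origin_y, origin_z) = origin;
A tuple struct with only one element is especially useful for creating a new type that is distinct from the type it contains. This is called the ‘newtype’ pattern:
struct Inches(i32);
let length = Inches(10);
let Inches(integer_length) = length;
println!("length is {} inches", integer_length);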
As above, you can extract the inner integer type through a destruc-
turing let. In this case, the let Inches(integer_length) assigns 10
to integer_length. We could have used dot notation to do the same
thing:
struct Inches(i32);
let length = Inches(10);
let integer_length = length.0;
It’s always possible to use a struct instead of a tuple struct, and doing so can be clearer. We could write Color and Point like this instead:
struct Color {
red: i32,
blue: i32,
green: i32,
}
struct Point {
x: i32,
y: i32,
z: i32,
}
Good names are important, and while values in a tuple struct can be
referenced with dot notation as well, a struct gives us actual names,
rather than positions.
Unit-like structs
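You can define a struct with no members at all:
struct Electron {} // Use empty braces...
struct Proton; // ...or just a semicolon.
// Use the same notation when creating an instance.
let x = Electron {};
let y = Proton;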
Enums
An enum in Rust is a type that represents data that is one of several
possible variants. Each variant in the enum can optionally have data
associated with it:
enum Message {
Quit,
ChangeColor(i32, i32, i32),
Move { x: i32, y: i32 },
Write(String),
}
The syntax for defining variants resembles the syntaxes used to define
structs: you can have variants with no data (like unit-like structs),
variants with named data, and variants with unnamed data (like tuple
structs). Unlike separate struct definitions, however, an enum is a single
type. A value of the enum can match any of the variants. For this
reason, an enum is sometimes called a ‘sum type’: the set of possible
values of the enum is the sum of the sets of possible values for each
variant.
We use the :: syntax to use the name of each variant: they’re
scoped by the name of the enum itself. This allows both of these to
work:
enum Message {
Move { x: i32, y: i32 },
}
let x: Message = Message::Move { x: 3, y: 4 };
enum BoardGameTurn {
Move { squares: i32 },
Pass,
}
let y: BoardGameTurn = BoardGameTurn::Move { squares: 1 };
Both variants are named Move, but since they’re scoped to the name of
the enum, they can both be used without conflict.
A value of an enum type contains information about which variant
it is, in addition to any data associated with that variant. This is
sometimes referred to as a ‘tagged union’, since the data includes a
‘tag’ indicating what type it is. The compiler uses this information to
enforce that you’re accessing the data in the enum safely. For instance,
you can’t simply try to destructure a value as if it were one of the
possible variants:
fn process_color_change(msg: Message) {
let Message::ChangeColor(r, g, b) = msg; // This causes a compile-time error.
}
Not supporting these operations may seem rather limiting, but it’s a
limitation which we can overcome. There are two ways: by implement-
ing equality ourselves, or by pattern matching variants with match ex-
pressions, which you’ll learn in the next section. We don’t know enough
about Rust to implement equality yet, but we’ll find out in the traits
section.
Constructors as functions
An enum constructor can also be used like a function. For example:
enum Message {
Write(String),
}
let m = Message::Write("Hello, world".to_string());
is the same as
enum Message {
Write(String),
}
fn foo(x: String) -> Message {
Message::Write(x)
}
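This is most useful with iterator adapters such as map, which accept any function. A minimal sketch, with Message trimmed down to the one variant we need and a helper name of our own:

```rust
#[derive(Debug, PartialEq)]
enum Message {
    Write(String),
}

// `Message::Write` is passed where a `fn(String) -> Message` is expected.
fn to_messages(words: Vec<String>) -> Vec<Message> {
    words.into_iter().map(Message::Write).collect()
}

fn main() {
    let v = vec!["Hello".to_string(), "World".to_string()];
    let messages = to_messages(v);
    println!("{:?}", messages);
}
```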
Match
Often, a simple if/else isn’t enough, because you have more than
two possible options. Also, conditions can get quite complex. Rust
has a keyword, match, that allows you to replace complicated if/else
groupings with something more powerful. Check it out:
let x = 5;
match x {
1 => println!("one"),
2 => println!("two"),
3 => println!("three"),
4 => println!("four"),
5 => println!("five"),
_ => println!("something else"),
}
match takes an expression and then branches based on its value. Each
‘arm’ of the branch is of the form val => expression. When the value
matches, that arm’s expression will be evaluated. It’s called match be-
cause of the term ‘pattern matching’, which match is an implementation
of. There’s a separate section on patterns that covers all the patterns
that are possible here.
One of the many advantages of match is it enforces ‘exhaustiveness
checking’. For example if we remove the last arm with the underscore
_, the compiler will give us an error:
Rust is telling us that we forgot some value. The compiler infers from x that it can have any 32-bit integer value, from -2,147,483,648 to 2,147,483,647. The _ acts as a ‘catch-all’ and catches every value that isn’t specified in another arm of the match. In the previous example, we provide match arms for the integers 1 to 5; if x is 6 or any other value, it is caught by _.
match is also an expression, which means we can use it on the right-
hand side of a let binding or directly where an expression is used:
let x = 5;
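For example, a match can produce a value that is then bound with let. A minimal sketch that turns the numbers from the earlier example into words; the function name is ours:

```rust
fn number_name(x: i32) -> &'static str {
    // Every arm evaluates to a &str, so the whole match is a &str.
    let number = match x {
        1 => "one",
        2 => "two",
        3 => "three",
        4 => "four",
        5 => "five",
        _ => "something else",
    };
    number
}

fn main() {
    let x = 5;
    println!("{}", number_name(x));
}
```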
Matching on enums
Another important use of the match keyword is to process the possible
variants of an enum:
enum Message {
Quit,
ChangeColor(i32, i32, i32),
Move { x: i32, y: i32 },
Write(String),
}
fn quit() { /* ... */ }
fn change_color(r: i32, g: i32, b: i32) { /* ... */ }
fn move_cursor(x: i32, y: i32) { /* ... */ }
fn process_message(msg: Message) {
match msg {
Message::Quit => quit(),
Message::ChangeColor(r, g, b) => change_color(r, g, b),
Message::Move { x, y: new_name_for_y } => move_cursor(x, new_name_for_y),
Message::Write(s) => println!("{}", s),
};
}
Again, the Rust compiler checks exhaustiveness, so it demands that you have a match arm for every variant of the enum. If you leave one off, it will give you a compile-time error unless you add a catch-all _ arm.
Unlike the previous uses of match, you can’t use the normal if
statement to do this. You can use the if let statement, which can be
seen as an abbreviated form of match.
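When only one variant matters, if let handles that variant and skips the rest. A minimal sketch that reuses the Message enum from above; the helper name color_of is our own:

```rust
enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

// Returns the color only when `msg` is a ChangeColor; any other variant gives None.
fn color_of(msg: &Message) -> Option<(i32, i32, i32)> {
    if let Message::ChangeColor(r, g, b) = msg {
        Some((*r, *g, *b))
    } else {
        None
    }
}

fn main() {
    let msgs = vec![
        Message::Quit,
        Message::ChangeColor(255, 0, 0),
        Message::Move { x: 1, y: 2 },
        Message::Write(String::from("hi")),
    ];
    for m in &msgs {
        println!("{:?}", color_of(m));
    }
}
```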
Patterns
Patterns are quite common in Rust. We use them in variable bindings,
match expressions, and other places, too. Let’s go on a whirlwind tour
of all of the things patterns can do!
A quick refresher: you can match against literals directly, and _
acts as an ‘any’ case:
let x = 1;
match x {
1 => println!("one"),
2 => println!("two"),
3 => println!("three"),
_ => println!("anything"),
}
This prints one.
It’s possible to create a binding for the value in the any case:
let x = 1;
match x {
y => println!("x: {} y: {}", x, y),
}
This prints:
x: 1 y: 1
There’s one pitfall with patterns: like anything that introduces a new
binding, they introduce shadowing. For example:
let x = 1;
let c = 'c';
match c {
x => println!("x: {} c: {}", x, c),
}
println!("x: {}", x);
This prints:
x: c c: c
x: 1
In other words, x => matches the pattern and introduces a new binding
named x. This new binding is in scope for the match arm and takes
on the value of c. Notice that the value of x outside the scope of the
match has no bearing on the value of x within it. Because we already
have a binding named x, this new x shadows it.
Multiple patterns
You can match multiple patterns with |:
let x = 1;
match x {
1 | 2 => println!("one or two"),
3 => println!("three"),
_ => println!("anything"),
}
Destructuring
If you have a compound data type, like a struct, you can destructure
it inside of a pattern:
struct Point {
x: i32,
y: i32,
}
let origin = Point { x: 0, y: 0 };
match origin {
Point { x, y } => println!("({},{})", x, y),
}
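If we want to give a value a different name, we can use a colon (:):
struct Point {
x: i32,
y: i32,
}
let origin = Point { x: 0, y: 0 };
match origin {
Point { x: x1, y: y1 } => println!("({},{})", x1, y1),
}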
If we only care about some of the values, we don’t have to give them
all names:
struct Point {
x: i32,
y: i32,
}
let point = Point { x: 2, y: 3 };
match point {
Point { y, .. } => println!("y is {}", y),
}
This prints y is 3.
This ‘destructuring’ behavior works on any compound data type,
like tuples or enums.
Ignoring bindings
You can use _ in a pattern to disregard the type and value. For example,
here’s a match against a Result<T, E>:
let some_value: Result<i32, &'static str> = Err("There was an error");
match some_value {
Ok(value) => println!("got a value: {}", value),
Err(_) => println!("an error occurred"),
}
In the first arm, we bind the value inside the Ok variant to value. But
in the Err arm, we use _ to disregard the specific error, and print a
general error message.
_ is valid in any pattern that creates a binding. This can be useful
to ignore parts of a larger structure:
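fn coordinate() -> (i32, i32, i32) {
// Generate and return some sort of triple tuple.
(1, 2, 3)
}
let (x, _, z) = coordinate();
Here, we bind the first and last elements of the tuple to x and z, but ignore the middle element. Note that _ never binds the value in the first place, which means that the value is not moved.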
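Because _ does not bind, matching a value against _ leaves the value where it is, whereas a named binding such as _s moves it. A minimal sketch with a (u32, String) tuple; the function name demo is ours:

```rust
fn demo() -> String {
    let tuple: (u32, String) = (5, String::from("five"));

    // `_` does not bind, so the String is not moved out of `tuple`...
    let (n, _) = tuple;

    // ...which means `tuple` is still fully usable here.
    let described = format!("{} {:?}", n, tuple);

    // By contrast, `_s` is a real binding: this moves the String out,
    // and using `tuple` as a whole afterwards would be a compile error.
    let (_, _s) = tuple;

    described
}

fn main() {
    println!("{}", demo());
}
```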
This also means that any temporary variables will be dropped at the end of the statement:
// Here, the String created will be dropped immediately, as it’s not bound:
let _ = String::from("  hello  ").trim();
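You can also use .. in a pattern to disregard multiple values:
enum OptionalTuple {
Value(i32, i32, i32),
Missing,
}
let x = OptionalTuple::Value(5, -2, 3);
match x {
OptionalTuple::Value(..) => println!("Got a tuple!"),
OptionalTuple::Missing => println!("No such luck."),
}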
This prints Got a tuple!.
If you want to get a reference, use the ref keyword:
let x = 5;
match x {
ref r => println!("Got a reference to {}", r),
}
This prints Got a reference to 5.
Here, the r inside the match has the type &i32. In other words, the
ref keyword creates a reference, for use in the pattern. If you need a
mutable reference, ref mut will work in the same way:
let mut x = 5;
match x {
ref mut mr => println!("Got a mutable reference to {}", mr),
}
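Ranges
You can match an inclusive range of values with ..=:
let x = 1;
match x {
1..=5 => println!("one through five"),
_ => println!("anything"),
}
Ranges are mostly used with integers and chars:
let x = '💅';
match x {
'a'..='j' => println!("early letter"),
'k'..='z' => println!("late letter"),
_ => println!("something else"),
}
Older Rust code writes these patterns with ... (for example 1 ... 5); that form is deprecated and is rejected in the 2021 edition.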
Bindings
You can bind values to names with @:
let x = 1;
match x {
e @ 1..=5 => println!("got a range element {}", e),
_ => println!("anything"),
}
This prints got a range element 1. This is useful when you want to
do a complicated match of part of a data structure:
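#[derive(Debug)]
struct Person {
name: Option<String>,
}
let name = "Steve".to_string();
let x: Option<Person> = Some(Person { name: Some(name) });
match x {
Some(Person { name: ref a @ Some(_), .. }) => println!("{:?}", a),
_ => {}
}
This prints Some("Steve"): we’ve bound the inner name to a.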
If you use @ with |, you need to make sure the name is bound in
each part of the pattern:
let x = 5;
match x {
e @ 1..=5 | e @ 8..=10 => println!("got a range element {}", e),
_ => println!("anything"),
}
Guards
You can introduce ‘match guards’ with if:
enum OptionalInt {
Value(i32),
Missing,
}
let x = OptionalInt::Value(5);
match x {
OptionalInt::Value(i) if i > 5 => println!("Got an int bigger than five!"),
OptionalInt::Value(..) => println!("Got an int!"),
OptionalInt::Missing => println!("No such luck."),
}
This prints Got an int!.
If you’re using if with multiple patterns, the if applies to both
sides:
let x = 4;
let y = false;
match x {
4 | 5 if y => println!("yes"),
_ => println!("no"),
}
This prints no, because the if applies to the whole of 4 | 5, and not
to only the 5. In other words, the precedence of if behaves like this:
(4 | 5) if y => ...
not this:
4 | (5 if y) => ...
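Whew! That’s a lot of different ways to match things, and they can all be mixed and matched, depending on what you’re doing:
match x {
Foo { x: Some(ref name), y: None } => ...
}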
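A runnable sketch of that idea, assuming a hypothetical Foo with two optional fields, combines nested destructuring, ref, a range with @, .., and _ in one match:

```rust
struct Foo {
    x: Option<String>,
    y: Option<i32>,
}

fn describe(foo: Foo) -> String {
    match foo {
        // Nested destructuring plus `ref`, so `name` borrows instead of moving.
        Foo { x: Some(ref name), y: None } => format!("{} alone", name),
        // A range pattern with an `@` binding inside a nested pattern.
        Foo { x: Some(ref name), y: Some(n @ 1..=9) } => format!("{} with digit {}", name, n),
        // `..` ignores the remaining fields.
        Foo { x: None, .. } => String::from("no name"),
        // `_` catches everything else.
        _ => String::from("something else"),
    }
}

fn main() {
    let foo = Foo { x: Some(String::from("Steve")), y: Some(7) };
    println!("{}", describe(foo));
}
```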
Patterns are very powerful. Make good use of them.
Method Syntax
Functions are great, but if you want to call a bunch of them on some
data, it can be awkward. Consider this code:
baz(bar(foo));
We would read this left-to-right, and so we see ‘baz bar foo’. But this isn’t the order in which the functions are called: that’s inside-out, ‘foo bar baz’. Wouldn’t it be nice if we could do this instead?
foo.bar().baz();
Luckily, as you may have guessed with the leading question, you can!
Rust provides the ability to use this ‘method call syntax’ via the impl
keyword.
Method calls
Here’s how it works:
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn area(&self) -> f64 {
std::f64::consts::PI * (self.radius * self.radius)
}
}
fn main() {
let c = Circle { x: 0.0, y: 0.0, radius: 2.0 };
println!("{}", c.area());
}
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn reference(&self) {
println!("taking self by reference!");
}
fn mutable_reference(&mut self) {
println!("taking self by mutable reference!");
}
fn takes_ownership(self) {
println!("taking ownership of self!");
}
}
You can use as many impl blocks as you’d like. The previous example
could have also been written like this:
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn reference(&self) {
println!("taking self by reference!");
}
}
impl Circle {
fn mutable_reference(&mut self) {
println!("taking self by mutable reference!");
}
}
impl Circle {
fn takes_ownership(self) {
println!("taking ownership of self!");
}
}
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn area(&self) -> f64 {
std::f64::consts::PI * (self.radius * self.radius)
}
fn grow(&self, increment: f64) -> Circle {
Circle { x: self.x, y: self.y, radius: self.radius + increment }
}
}
fn main() {
let c = Circle { x: 0.0, y: 0.0, radius: 2.0 };
println!("{}", c.area());
let d = c.grow(2.0).area();
println!("{}", d);
}
Check the return type of grow: fn grow(&self, increment: f64) -> Circle. It returns a new Circle, which is why we can call area() directly on its result.
Associated functions
You can also define associated functions that do not take a self pa-
rameter. Here’s a pattern that’s very common in Rust code:
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn new(x: f64, y: f64, radius: f64) -> Circle {
Circle {
x: x,
y: y,
radius: radius,
}
}
}
fn main() {
let c = Circle::new(0.0, 0.0, 2.0);
}
This ‘associated function’ builds a new Circle for us. Note that associ-
ated functions are called with the Struct::function() syntax, rather
than the ref.method() syntax. Some other languages call associated
functions ‘static methods’.
Builder Pattern
Let’s say that we want our users to be able to create Circles, but
we will allow them to only set the properties they care about. Other-
wise, the x and y attributes will be 0.0, and the radius will be 1.0.
Rust doesn’t have method overloading, named arguments, or variable
arguments. We employ the builder pattern instead. It looks like this:
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn area(&self) -> f64 {
std::f64::consts::PI * (self.radius * self.radius)
}
}
struct CircleBuilder {
x: f64,
y: f64,
radius: f64,
}
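impl CircleBuilder {
fn new() -> CircleBuilder {
CircleBuilder { x: 0.0, y: 0.0, radius: 1.0, }
}
fn x(&mut self, coordinate: f64) -> &mut CircleBuilder {
self.x = coordinate;
self
}
fn y(&mut self, coordinate: f64) -> &mut CircleBuilder {
self.y = coordinate;
self
}
fn radius(&mut self, radius: f64) -> &mut CircleBuilder {
self.radius = radius;
self
}
fn finalize(&self) -> Circle {
Circle { x: self.x, y: self.y, radius: self.radius }
}
}
fn main() {
let c = CircleBuilder::new()
.x(1.0)
.y(2.0)
.radius(2.0)
.finalize();
println!("area: {}", c.area());
println!("x: {}", c.x);
println!("y: {}", c.y);
}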
Strings
Strings are an important concept for any programmer to master. Rust’s
string handling system is a bit different from other languages, due to
its systems focus. Any time you have a data structure of variable
size, things can get tricky, and strings are a re-sizable data structure.
That being said, Rust’s strings also work differently than in some other
systems languages, such as C.
Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar
values encoded as a stream of UTF-8 bytes. All strings are guaranteed
to be a valid encoding of UTF-8 sequences. Additionally, unlike some
systems languages, strings are not NUL-terminated and can contain
NUL bytes.
Rust has two main types of strings: &str and String. Let’s talk
about &str first. These are called ‘string slices’. A string slice has a
fixed size, and cannot be mutated. It is a reference to a sequence of
UTF-8 bytes.
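String literals can span multiple lines. There are two forms. The first keeps the newline and the leading spaces:
let s = "foo
    bar";
assert_eq!("foo\n    bar", s);
The second, with a trailing \, trims the spaces and the newline:
let s = "foo\
    bar";
assert_eq!("foobar", s);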
Note that you normally cannot access a str directly, but only through
a &str reference. This is because str is an unsized type which requires
additional runtime information to be usable. For more information see
the chapter on unsized types.
Rust has more than only &strs though. A String is a heap-
allocated string. This string is growable, and is also guaranteed to
be UTF-8. Strings are commonly created by converting from a string
slice using the to_string method.
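let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);
s.push_str(", world.");
println!("{}", s);
Strings will coerce into &str with an &: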
fn takes_slice(slice: &str) {
println!("Got: {}", slice);
}
fn main() {
let s = "Hello".to_string();
takes_slice(&s);
}
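This coercion does not happen for functions that accept a trait that &str implements instead of &str itself. For example, TcpStream::connect takes a generic parameter A: ToSocketAddrs. A &str works as is; with older versions of Rust, a String had to be explicitly converted using &* (current versions also implement ToSocketAddrs for String directly).
use std::net::TcpStream;
TcpStream::connect("192.168.0.1:3000"); // Parameter is of type &str.
let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // Convert addr_string to &str.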
Indexing
Because strings are valid UTF-8, they do not support indexing:
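let s = "hello";
println!("The first letter of s is {}", s[0]); // ERROR: a str cannot be indexed by an integer.
Usually, access to a vector with [] is very fast. But because each character in a UTF-8 encoded string can be multiple bytes, you have to walk over the string to find the nth letter. There’s a byte-based view and a char-based view:
let hachiko = "忠犬ハチ公";
for b in hachiko.as_bytes() {
print!("{}, ", b);
}
println!("");
for c in hachiko.chars() {
print!("{}, ", c);
}
println!("");
This prints:
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
As you can see, there are more bytes than chars.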
This emphasizes that we have to walk from the beginning of the list of
chars.
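You can get something similar to an index by walking the chars with nth; the cost grows with the position, because the string has to be decoded from the start. A minimal sketch using the same string; the helper name nth_char is ours:

```rust
fn nth_char(s: &str, n: usize) -> Option<char> {
    // Walks from the start of the string: O(n), unlike indexing a Vec.
    s.chars().nth(n)
}

fn main() {
    let hachiko = "忠犬ハチ公";
    println!("{} bytes, {} chars", hachiko.len(), hachiko.chars().count());
    println!("{:?}", nth_char(hachiko, 1)); // kind of like hachiko[1]
}
```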
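Slicing
You can get a slice of a string with the slicing syntax:
let dog = "hachiko";
let hachi = &dog[0..5];
But note that these are byte offsets, not character offsets. So this will panic at runtime, because byte 2 falls in the middle of the first character:
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];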
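If a range might not fall on char boundaries, str::get returns an Option instead of panicking. A minimal sketch; the helper name safe_prefix is ours:

```rust
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    // `get` returns None when `end` is out of range or splits a multi-byte char.
    s.get(0..end)
}

fn main() {
    let dog = "忠犬ハチ公";
    println!("{:?}", safe_prefix(dog, 2)); // None: byte 2 is inside '忠'
    println!("{:?}", safe_prefix(dog, 3)); // Some("忠")
}
```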
Concatenation
If you have a String, you can concatenate a &str to the end of it:
let hello = "Hello ".to_string();
let world = "world!";
let hello_world = hello + world;
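The + operator takes the String on its left by value and a &str on its right. If both sides are Strings, borrow the right one with &, which coerces &String to &str; format! is an alternative that leaves both operands usable. A minimal sketch; the helper name concat is ours:

```rust
// `+` takes ownership of the left String and borrows a &str on the right.
fn concat(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `format!` only borrows its arguments, so both Strings stay usable.
    let greeting = format!("{}{}", hello, world);

    // `&world` is a &String, which coerces to &str; `hello` is moved here.
    let hello_world = hello + &world;

    assert_eq!(greeting, hello_world);
    let shouted = concat(hello_world, "!");
    println!("{}", shouted);
}
```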
Generics
Sometimes, when writing a function or data type, we may want it to
work for multiple types of arguments. In Rust, we can do this with
generics. Generics are called ‘parametric polymorphism’ in type the-
ory, which means that they are types or functions that have multi-
ple forms (‘poly’ is multiple, ‘morph’ is form) over a given parameter
(‘parametric’).
Anyway, enough type theory, let’s check out some generic code.
Rust’s standard library provides a type, Option<T>, that’s generic:
enum Option<T> {
Some(T),
None,
}
The <T> part, which you’ve seen a few times before, indicates that this
is a generic data type. Inside the declaration of our enum, wherever
we see a T, we substitute that type for the same type used in the
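generic. Here’s an example of using Option<T>, with some extra type annotations:
let x: Option<i32> = Some(5);
In the type declaration, we say Option<i32>, so in this particular Option, T is i32. On the right-hand side, Some(5) holds an i32 as well, so the two sides match. If they didn’t, we’d get a mismatched types error; Some(5) cannot be an Option<f64>, for example.
That doesn’t mean we can’t make Option<T>s that hold an f64! They have to match up:
let x: Option<i32> = Some(5);
let y: Option<f64> = Some(5.0f64);
Generics don’t have to be generic over only one type. Consider another type from Rust’s standard library, Result<T, E>:
enum Result<T, E> {
Ok(T),
Err(E),
}
This type is generic over two types: T and E. By the way, the capital letters can be any letter you’d like. We could define Result<T, E> as:
enum Result<A, Z> {
Ok(A),
Err(Z),
}
if we wanted to. Convention says that the first generic parameter should be T, for ‘type’, and that we use E for ‘error’. Rust doesn’t care, however.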
The Result<T, E> type is intended to be used to return the result
of a computation, and to have the ability to return an error if it didn’t
work out.
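For example, a function that parses a number can return Result<i32, String>, with Ok carrying the value and Err carrying a message. A minimal sketch; the function name and error texts are our own:

```rust
// Returns Ok with the parsed value, or Err with a message describing the problem.
fn parse_positive(s: &str) -> Result<i32, String> {
    match s.trim().parse::<i32>() {
        Ok(n) if n > 0 => Ok(n),
        Ok(n) => Err(format!("{} is not positive", n)),
        Err(e) => Err(format!("could not parse {:?}: {}", s, e)),
    }
}

fn main() {
    for input in ["42", "-7", "abc"] {
        match parse_positive(input) {
            Ok(n) => println!("got {}", n),
            Err(msg) => println!("error: {}", msg),
        }
    }
}
```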
Generic functions
We can write functions that take generic types with a similar syntax:
fn takes_anything<T>(x: T) {
// Do something with `x`.
}
The syntax has two parts: the <T> says “this function is generic over
one type, T”, and the x: T says “x has the type T.”
Multiple arguments can have the same generic type:
fn takes_two_of_the_same_things<T>(x: T, y: T) {
// ...
}
We could write a version that takes multiple types:
fn takes_two_things<T, U>(x: T, y: U) {
// ...
}
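Because T and U carry no bounds, such functions can only move, store, or return their arguments, not inspect them. A minimal runnable sketch; the function names are ours:

```rust
// Works for any two types; the values are only moved around, never inspected.
fn swap_pair<T, U>(x: T, y: U) -> (U, T) {
    (y, x)
}

// Both arguments must have the same type T.
fn first_of_two<T>(x: T, _y: T) -> T {
    x
}

fn main() {
    let swapped = swap_pair(1, "one"); // T = i32, U = &str
    let first = first_of_two(2.5, 3.5); // T = f64
    println!("{:?} {}", swapped, first);
    // first_of_two(1, "one"); // Error: mismatched types
}
```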
Generic structs
You can store a generic type in a struct as well:
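struct Point<T> {
x: T,
y: T,
}
let int_origin = Point { x: 0, y: 0 };
let float_origin = Point { x: 0.0, y: 0.0 };
Similar to functions, the <T> is where we declare the generic parameters, and we then use x: T in the type declaration, too. When you want to add an implementation for the generic struct, you declare the type parameter after the impl:
impl<T> Point<T> {
fn swap(&mut self) {
std::mem::swap(&mut self.x, &mut self.y);
}
}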
So far you’ve seen generics that take absolutely any type. These are
useful in many cases: you’ve already seen Option<T>, and later you’ll
meet universal container types like Vec<T>. On the other hand, often
you want to trade that flexibility for increased expressive power. Read
about trait bounds to see why and how.
Resolving ambiguities
Most of the time when generics are involved, the compiler can infer the
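generic parameters automatically:
// v: Vec<bool>
let v = vec![true, false];
// or
let mut v = Vec::new();
// v: Vec<bool>
v.push(true);
Sometimes though, the compiler needs a little help. For example, had we omitted the last line, we would get a compile error:
let v = Vec::new();
// ^^^^^^^^ cannot infer type for `T`
//
// note: type annotations or generic parameter binding required
println!("{:?}", v);
We can solve this either with a type annotation, let v: Vec<bool> = Vec::new();, or by binding the generic parameter T with the so-called ‘turbofish’ ::<> syntax:
let v = Vec::<bool>::new();
println!("{:?}", v);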
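The same ::<> syntax works on generic functions and methods, for example when asking str::parse or Iterator::collect for a specific result type. A minimal sketch; the helper names are ours:

```rust
// The turbofish tells `parse` which type to produce.
fn parse_i32(s: &str) -> i32 {
    s.parse::<i32>().expect("not an i32")
}

// `collect` can build many collection types; the turbofish picks Vec<char>.
fn letters(s: &str) -> Vec<char> {
    s.chars().collect::<Vec<char>>()
}

fn main() {
    // A type annotation on the binding is the equivalent alternative.
    let m: u8 = "7".parse().expect("not a u8");
    println!("{} {:?} {}", parse_i32("42"), letters("abc"), m);
}
```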
Traits
A trait is a language feature that tells the Rust compiler about func-
tionality a type must provide.
Recall the impl keyword, used to call a function with method syn-
tax:
struct Circle {
x: f64,
y: f64,
radius: f64,
}
impl Circle {
fn area(&self) -> f64 {
std::f64::consts::PI * (self.radius * self.radius)
}
}
Traits are similar, except that we first define a trait with a method
signature, then implement the trait for a type. In this example, we
implement the trait HasArea for Circle:
struct Circle {
x: f64,
y: f64,
radius: f64,
}
trait HasArea {
fn area(&self) -> f64;
}
impl HasArea for Circle {
fn area(&self) -> f64 {
std::f64::consts::PI * (self.radius * self.radius)
}
}
As you can see, the trait block looks very similar to the impl block,
but we don’t define a body, only a type signature. When we impl a
trait, we use impl Trait for Item, rather than only impl Item.
Self may be used in a type annotation to refer to the type implementing the trait, for example as the type of a parameter. Self, &Self or &mut Self may be used depending on the level of ownership required.
struct Circle {
x: f64,
y: f64,
radius: f64,
}
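trait HasArea {
fn area(&self) -> f64;
fn is_larger(&self, other: &Self) -> bool;
}
impl HasArea for Circle {
fn area(&self) -> f64 {
std::f64::consts::PI * (self.radius * self.radius)
}
fn is_larger(&self, other: &Self) -> bool {
self.area() > other.area()
}
}
Trait bounds on generic functions
Traits are useful because they allow a type to make certain promises about its behavior. Generic functions can exploit this to constrain, or bound, the types they accept. Consider this function, which does not compile: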
fn print_area<T>(shape: T) {
println!("This shape has an area of {}", shape.area());
}
Rust complains:
error: no method named `area` found for type `T` in the current scope
Because T can be any type, we can’t be sure that it implements the
area method. But we can add a trait bound to our generic T, ensuring
that it does:
trait HasArea {
    fn area(&self) -> f64;
}
fn print_area<T: HasArea>(shape: T) {
    println!("This shape has an area of {}", shape.area());
}
The syntax <T: HasArea> means “any type that implements the HasArea
trait.” Because traits define function type signatures, we can be sure
that any type which implements HasArea will have an .area() method.
Here’s an extended example of how this works:
trait HasArea {
    fn area(&self) -> f64;
}
struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}
impl HasArea for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }
}
struct Square {
    x: f64,
    y: f64,
    side: f64,
}
impl HasArea for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}
fn print_area<T: HasArea>(shape: T) {
    println!("This shape has an area of {}", shape.area());
}
fn main() {
    let c = Circle {
        x: 0.0f64,
        y: 0.0f64,
        radius: 1.0f64,
    };
    let s = Square {
        x: 0.0f64,
        y: 0.0f64,
        side: 1.0f64,
    };
    print_area(c);
    print_area(s);
}
This program outputs:
This shape has an area of 3.141592653589793
This shape has an area of 1
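As you can see, print_area is now generic, but also ensures that we
have passed in the correct types. If we pass in a type that does not
implement HasArea, such as an integer:
print_area(5);
the program is rejected at compile time, because no implementation of
HasArea exists for that type.
Trait bounds work on generic structs, too: append the bound when
declaring the type parameters of an impl block. Here is a generic
Rectangle<T> with an is_square() method:
struct Rectangle<T> {
    x: T,
    y: T,
    width: T,
    height: T,
}
impl<T: PartialEq> Rectangle<T> {
    fn is_square(&self) -> bool {
        self.width == self.height
    }
}
fn main() {
    let mut r = Rectangle {
        x: 0,
        y: 0,
        width: 47,
        height: 47,
    };
    assert!(r.is_square());
    r.height = 42;
    assert!(!r.is_square());
}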
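is_square() needs to check that the sides are equal, so the sides must
be of a type that implements the core::cmp::PartialEq trait; that is
why its impl block is declared as impl<T: PartialEq> Rectangle<T>. A
rectangle can now be defined in terms of any type that can be compared
for equality.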
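Because the only requirement is PartialEq, the same impl serves integer and floating-point rectangles alike. A small sketch:

```rust
struct Rectangle<T> {
    x: T,
    y: T,
    width: T,
    height: T,
}

// Only PartialEq is required, so any comparable T works.
impl<T: PartialEq> Rectangle<T> {
    fn is_square(&self) -> bool {
        self.width == self.height
    }
}

fn main() {
    let ints = Rectangle { x: 0, y: 0, width: 47, height: 47 };
    let floats = Rectangle { x: 0.0, y: 0.0, width: 2.5, height: 4.0 };
    println!("{} {}", ints.is_square(), floats.is_square());
}
```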
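Traits can be implemented for any type, including built-in types such
as f32. Given a trait that provides an approx_equal() method for f32,
we can call it on a float value:
println!("{}", 1.0f32.approx_equal(&1.00000001));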
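Such a trait might look like the sketch below. The name ApproxEqual and the f32::EPSILON tolerance are assumptions made for illustration. The f32 suffix on the receiver matters: on an unsuffixed float literal the compiler cannot pick a concrete type to look the method up on.

```rust
// Hypothetical trait for this sketch.
trait ApproxEqual {
    fn approx_equal(&self, other: &Self) -> bool;
}

// We define the trait ourselves, so we may implement it for the built-in f32.
impl ApproxEqual for f32 {
    fn approx_equal(&self, other: &Self) -> bool {
        (*self - *other).abs() <= f32::EPSILON
    }
}

fn main() {
    // 1.00000001 rounds to 1.0 as an f32, so the two values compare equal.
    println!("{}", 1.0f32.approx_equal(&1.00000001));
}
```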
This may seem like the Wild West, but there are two restrictions around
implementing traits that prevent this from getting out of hand. The
first is that if the trait isn’t in scope, it doesn’t apply. Here’s an
example: the standard library provides a Write trait which adds extra
functionality to File, for doing file I/O. By default, a File won’t
have its methods:
let mut f = std::fs::File::create("foo.txt").expect("Couldn’t create foo.txt");
let buf = b"whatever"; // buf: &[u8; 8], a byte string literal.
let result = f.write(buf);
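This fails to compile: the compiler reports that no method named write
is found for File. We need to bring the trait into scope first, with
use std::io::Write; at the top of the file, and the call then works.
This means that even if someone does something bad like adding methods
to i32, it won’t affect you unless you use that trait.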
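A sketch of the fix. To keep it free of file-system side effects it writes into a Vec<u8>, which also implements Write; the name write_greeting is hypothetical.

```rust
// Without this line, the call to `write` below does not compile:
// the method belongs to the Write trait, which must be in scope.
use std::io::Write;

// Hypothetical helper: writes a byte string into an in-memory buffer.
fn write_greeting() -> std::io::Result<Vec<u8>> {
    // Vec<u8> implements Write, so it stands in for a File here.
    let mut buf: Vec<u8> = Vec::new();
    let written = buf.write(b"whatever")?;
    assert_eq!(written, 8);
    Ok(buf)
}

fn main() {
    let bytes = write_greeting().expect("writing to a Vec cannot fail");
    println!("{}", String::from_utf8(bytes).unwrap());
}
```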
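The second restriction is that either the trait or the type you’re
implementing it for must be defined by you, in your own crate. We could
implement HasArea for i32, because HasArea is defined in our crate, but
we could not implement ToString, a trait provided by Rust, for i32,
because neither the trait nor the type is defined in our crate.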
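Both halves of the rule can be sketched in a few lines. HasArea is the trait from earlier in this chapter; Meters is a hypothetical local type, and treating an integer as a square's side is an arbitrary choice for the example. The commented-out impl at the end is the kind of implementation the rule forbids.

```rust
use std::fmt;

// A trait defined in our own crate...
trait HasArea {
    fn area(&self) -> f64;
}

// ...may be implemented for a standard-library type such as i32.
// (Treating the integer as a square's side is an arbitrary choice here.)
impl HasArea for i32 {
    fn area(&self) -> f64 {
        let side = *self as f64;
        side * side
    }
}

// A type defined in our own crate (hypothetical)...
struct Meters(f64);

// ...may implement a standard-library trait such as Display.
impl fmt::Display for Meters {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} m", self.0)
    }
}

// Rejected by the compiler: neither Display nor Vec is defined in our crate.
// impl fmt::Display for Vec<i32> { ... }

fn main() {
    println!("{} {}", 3i32.area(), Meters(2.5));
}
```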
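You’ve seen that you can bound a generic type parameter with a trait:
fn foo<T: Clone>(x: T) {
    x.clone();
}
If you need more than one bound, you can use +:
use std::fmt::Debug;
fn foo<T: Clone + Debug>(x: T) {
    x.clone();
    println!("{:?}", x);
}
T now needs to be both Clone and Debug.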
Where clause
Writing functions with only a few generic types and a small number
of trait bounds isn’t too bad, but as the number increases, the syntax
gets increasingly awkward:
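use std::fmt::Debug;
fn foo<T: Clone, K: Clone + Debug>(x: T, y: K) {
    x.clone();
    y.clone();
    println!("{:?}", y);
}
fn bar<T, K>(x: T, y: K) where T: Clone, K: Clone + Debug {
    x.clone();
    y.clone();
    println!("{:?}", y);
}
fn main() {
    foo("Hello", "world");
    bar("Hello", "world");
}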
foo() uses the syntax we showed earlier, and bar() uses a where clause.
All you need to do is leave off the bounds when defining your type
parameters, and then add where after the parameter list. For longer
lists, whitespace can be added:
use std::fmt::Debug;
fn bar<T, K>(x: T, y: K)
where T: Clone,
K: Clone + Debug {
x.clone();
y.clone();
println!("{:?}", y);
}
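A where clause is also more powerful than the simpler syntax. For
example:
trait ConvertTo<Output> {
    fn convert(&self) -> Output;
}
impl ConvertTo<i64> for i32 {
    fn convert(&self) -> i64 { *self as i64 }
}
// Can be called with T == i32.
fn normal<T: ConvertTo<i64>>(x: &T) -> i64 {
    x.convert()
}
// Can be called with T == i64.
fn inverse<T>(x: i32) -> T
    // This uses ConvertTo as if it were "ConvertTo<i64>".
    where i32: ConvertTo<T> {
    x.convert()
}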
This shows off the additional feature of where clauses: they allow
bounds on the left-hand side not only of type parameters T, but also
of types (i32 in this case). In this example, i32 must implement
ConvertTo<T>. Rather than constraining i32 itself (we already know
what i32 is), the where clause uses this bound to constrain T.
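A runnable sketch of the where-clause example. The second impl, ConvertTo<f64> for i32, is an addition for this illustration: with it, the same inverse function can produce either an i64 or an f64, chosen by the caller's expected type.

```rust
trait ConvertTo<Output> {
    fn convert(&self) -> Output;
}

impl ConvertTo<i64> for i32 {
    fn convert(&self) -> i64 {
        *self as i64
    }
}

// Added for this sketch, so that `inverse` has a second target type.
impl ConvertTo<f64> for i32 {
    fn convert(&self) -> f64 {
        *self as f64
    }
}

// Bound on the type parameter T.
fn normal<T: ConvertTo<i64>>(x: &T) -> i64 {
    x.convert()
}

// Bound on the concrete type i32, which ends up constraining T.
fn inverse<T>(x: i32) -> T
where
    i32: ConvertTo<T>,
{
    x.convert()
}

fn main() {
    let a = normal(&5i32);
    let b: i64 = inverse(7);
    let c: f64 = inverse(2);
    println!("{} {} {}", a, b, c);
}
```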
Default methods
A default method can be added to a trait definition if it is already
known how a typical implementor will define a method. For example,
is_invalid() is defined as the opposite of is_valid():
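trait Foo {
    fn is_valid(&self) -> bool;
    fn is_invalid(&self) -> bool { !self.is_valid() }
}
Implementors of Foo need to implement is_valid(), but they get
is_invalid() for free. They can still override the default:
struct OverrideDefault;
impl Foo for OverrideDefault {
    fn is_valid(&self) -> bool {
        true
    }
    fn is_invalid(&self) -> bool {
        true // Overrides the expected value of is_invalid().
    }
}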
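A sketch comparing an implementor that keeps the default with one that overrides it (UseDefault and OverrideDefault are illustrative names):

```rust
trait Foo {
    fn is_valid(&self) -> bool;

    // Default method: defined once in terms of the required method.
    fn is_invalid(&self) -> bool {
        !self.is_valid()
    }
}

struct UseDefault;

impl Foo for UseDefault {
    fn is_valid(&self) -> bool {
        true
    }
    // is_invalid() comes from the default.
}

struct OverrideDefault;

impl Foo for OverrideDefault {
    fn is_valid(&self) -> bool {
        true
    }

    // Overrides the default, deliberately breaking the usual relationship.
    fn is_invalid(&self) -> bool {
        true
    }
}

fn main() {
    println!("{} {}", UseDefault.is_invalid(), OverrideDefault.is_invalid());
}
```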
Inheritance
Sometimes, implementing a trait requires implementing another trait:
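trait Foo {
    fn foo(&self);
}
trait FooBar : Foo {
    fn foobar(&self);
}
Implementors of FooBar must also implement Foo, like this:
struct Baz;
impl Foo for Baz {
    fn foo(&self) { println!("foo"); }
}
impl FooBar for Baz {
    fn foobar(&self) { println!("foobar"); }
}
If we forget to implement Foo, Rust reports that the trait bound
Baz: Foo is not satisfied.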
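A sketch showing the practical payoff: a generic function bounded only by FooBar may still call Foo's methods. The methods return String here, an assumption made so the result can be checked; the function name both is hypothetical.

```rust
trait Foo {
    fn foo(&self) -> String;
}

// Supertrait: every FooBar must also implement Foo.
trait FooBar: Foo {
    fn foobar(&self) -> String;
}

struct Baz;

impl Foo for Baz {
    fn foo(&self) -> String {
        "foo".to_string()
    }
}

impl FooBar for Baz {
    fn foobar(&self) -> String {
        "foobar".to_string()
    }
}

// The FooBar bound alone is enough to call Foo's method too.
fn both<T: FooBar>(x: &T) -> String {
    format!("{} {}", x.foo(), x.foobar())
}

fn main() {
    println!("{}", both(&Baz));
}
```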
Deriving
Implementing traits like Debug and Default repeatedly can become
quite tedious. For that reason, Rust provides an attribute that allows
you to let Rust automatically implement traits for you:
#[derive(Debug)]
struct Foo;
fn main() {
println!("{:?}", Foo);
}
However, deriving is limited to a certain set of traits:
• Clone
• Copy
• Debug
• Default
• Eq
• Hash
• Ord
• PartialEq
• PartialOrd
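A sketch deriving every trait in the list on a hypothetical Point struct. Note that the derived PartialOrd and Ord compare fields in declaration order.

```rust
use std::collections::HashSet;

// Every trait from the list above, derived at once.
#[derive(Clone, Copy, Debug, Default, Eq, Hash, Ord, PartialEq, PartialOrd)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p; // Copy: p remains usable after this.
    println!("{:?}", p);
    // Default sets each field to its own default (0 for i32).
    assert_eq!(Point::default(), Point { x: 0, y: 0 });
    // Ord compares x first, then y.
    assert!(Point { x: 0, y: 9 } < Point { x: 1, y: 0 });
    // Hash + Eq let Point go into a HashSet; p and q are equal.
    let set: HashSet<Point> = [p, q].iter().cloned().collect();
    assert_eq!(set.len(), 1);
}
```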
Drop
Now that we’ve discussed traits, let’s talk about a particular trait
provided by the Rust standard library, Drop. The Drop trait provides a
way to run some code when a value goes out of scope. For example:
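struct HasDrop;
impl Drop for HasDrop {
    fn drop(&mut self) {
        println!("Dropping!");
    }
}
fn main() {
    let x = HasDrop;
    // Do stuff.
} // x goes out of scope here.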
When x goes out of scope at the end of main(), the code for Drop will
run. Drop has one method, which is also called drop(). It takes a
mutable reference to self.
That’s it! The mechanics of Drop are very simple, but there are
some subtleties. For example, values are dropped in the opposite order
they are declared. Here’s another example:
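struct Firework {
    strength: i32,
}
impl Drop for Firework {
    fn drop(&mut self) {
        println!("BOOM times {}!!!", self.strength);
    }
}
fn main() {
    let firecracker = Firework { strength: 1 };
    let tnt = Firework { strength: 100 };
}
This will output:
BOOM times 100!!!
BOOM times 1!!!
The tnt goes off before the firecracker does, because it was declared
afterwards. Last in, first out.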
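The drop order can also be checked programmatically. In this sketch each Firework pushes its message into a shared log (an Rc<RefCell<Vec<String>>>, an addition for testability) instead of printing it; explode_both is a hypothetical helper.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Each firework records its message in a shared log when dropped.
struct Firework {
    strength: i32,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Firework {
    fn drop(&mut self) {
        self.log
            .borrow_mut()
            .push(format!("BOOM times {}!!!", self.strength));
    }
}

fn explode_both() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _firecracker = Firework { strength: 1, log: Rc::clone(&log) };
        let _tnt = Firework { strength: 100, log: Rc::clone(&log) };
    } // Both go out of scope here, the last-declared one first.
    let result = log.borrow().clone();
    result
}

fn main() {
    for line in explode_both() {
        println!("{}", line);
    }
}
```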
if let
if let permits pattern matching within the condition of an if
statement. This allows us to reduce the overhead of certain kinds of
pattern matches and express them in a more convenient way.
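For example, let’s say we have some sort of Option<T>. We want to
call a function on it if it’s Some(T), but do nothing if it’s None. That
looks like this:
let option = Some(5);
fn foo(x: i32) { }
match option {
    Some(x) => { foo(x) },
    None => {},
}
We don’t have to use match here; for example, we could use if:
if option.is_some() {
    let x = option.unwrap();
    foo(x);
}
Neither of these options is particularly appealing. We can use if let
to do the same thing in a nicer way:
if let Some(x) = option {
    foo(x);
}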
while let
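In a similar fashion, while let can be used when you want to
conditionally loop as long as a value matches a certain pattern. It
turns code like this:
let mut v = vec![1, 3, 5, 7, 11];
loop {
    match v.pop() {
        Some(x) => println!("{}", x),
        None => break,
    }
}
into code like this:
let mut v = vec![1, 3, 5, 7, 11];
while let Some(x) = v.pop() {
    println!("{}", x);
}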
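A small sketch combining both forms; add_if_some and drain_in_reverse are hypothetical helper names chosen for this illustration.

```rust
// if let: act on Some, fall through on None, without a full match.
fn add_if_some(total: i32, option: Option<i32>) -> i32 {
    if let Some(x) = option {
        total + x
    } else {
        total
    }
}

// while let: keep looping as long as pop() returns Some.
fn drain_in_reverse(mut v: Vec<i32>) -> Vec<i32> {
    let mut out = Vec::new();
    while let Some(x) = v.pop() {
        out.push(x);
    }
    out
}

fn main() {
    println!("{}", add_if_some(1, Some(5)));
    println!("{:?}", drain_in_reverse(vec![1, 3, 5, 7, 11]));
}
```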
Trait Objects
When code involves polymorphism, there needs to be a mechanism to
determine which specific version is actually run. This is called
‘dispatch’. There are two major forms of dispatch: static dispatch and
dynamic dispatch. While Rust favors static dispatch, it also supports
dynamic dispatch through a mechanism called ‘trait objects’.
Background
For the rest of this chapter, we’ll need a trait and some
implementations. Let’s make a simple one, Foo. It has one method that
is expected to return a String.
trait Foo {
fn method(&self) -> String;
}
We’ll also implement this trait for u8 and String:
impl Foo for u8 {
    fn method(&self) -> String { format!("u8: {}", *self) }
}
impl Foo for String {
    fn method(&self) -> String { format!("string: {}", *self) }
}
Static dispatch
We can use this trait to perform static dispatch with trait bounds:
trait Foo { fn method(&self) -> String; }
impl Foo for u8 { fn method(&self) -> String { format!("u8: {}", *self) } }
impl Foo for String { fn method(&self) -> String { format!("string: {}", *self) } }
fn do_something<T: Foo>(x: T) {
    x.method();
}
fn main() {
let x = 5u8;
let y = "Hello".to_string();
do_something(x);
do_something(y);
}
Rust uses ‘monomorphization’ to perform static dispatch here. This
means that Rust will create a special version of do_something() for
both u8 and String, and then replace the call sites with calls to these
specialized functions. In other words, Rust generates something like
this:
trait Foo { fn method(&self) -> String; }
impl Foo for u8 { fn method(&self) -> String { format!("u8: {}", *self) } }
impl Foo for String { fn method(&self) -> String { format!("string: {}", *self) } }
fn do_something_u8(x: u8) {
x.method();
}
fn do_something_string(x: String) {
x.method();
}
fn main() {
let x = 5u8;
let y = "Hello".to_string();
do_something_u8(x);
do_something_string(y);
}
This has a great upside: static dispatch allows function calls to be
inlined because the callee is known at compile time, and inlining is the
key to good optimization. Static dispatch is fast, but it comes at a
tradeoff: ‘code bloat’, due to many copies of the same function existing
in the binary, one for each type.
Furthermore, compilers aren’t perfect and may “optimize” code to
become slower. For example, functions inlined too eagerly will bloat
the instruction cache (cache rules everything around us). This is part
of the reason that #[inline] and #[inline(always)] should be used
carefully, and one reason why using a dynamic dispatch is sometimes
more efficient.
However, the common case is that it is more efficient to use static
dispatch, and one can always have a thin statically-dispatched wrapper
function that does a dynamic dispatch, but not vice versa, meaning
static calls are more flexible. The standard library tries to be statically
dispatched where possible for this reason.
Dynamic dispatch
Rust provides dynamic dispatch through a feature called ‘trait objects’.
Trait objects, like &Foo or Box<Foo> (spelled &dyn Foo and
Box<dyn Foo> in current Rust), are normal values that store a value of
any type that implements the given trait, where the precise type can
only be known at runtime.
A trait object can be obtained from a pointer to a concrete type
that implements the trait by casting it (e.g. &x as &Foo) or coercing
it (e.g. using &x as an argument to a function that takes &Foo).
These trait object coercions and casts also work for pointers like
&mut T to &mut Foo and Box<T> to Box<Foo>, but that’s all at the
moment. Coercions and casts are identical.
This operation can be seen as ‘erasing’ the compiler’s knowledge
about the specific type of the pointer, and hence trait objects are
sometimes referred to as ‘type erasure’.
Coming back to the example above, we can use the same trait to
perform dynamic dispatch with trait objects by casting:
fn do_something(x: &Foo) {
    x.method();
}
fn main() {
    let x = 5u8;
    do_something(&x as &Foo);
}
or by coercing:
fn main() {
    let x = "Hello".to_string();
    do_something(&x);
}
A function that takes a trait object is not specialized to each of the
types that implements Foo: only one copy is generated, often (but not
always) resulting in less code bloat. However, this comes at the cost
of requiring slower virtual function calls, and effectively inhibiting any
chance of inlining and related optimizations from occurring.
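A sketch of the dynamic-dispatch version, written with the dyn keyword that current Rust requires for trait objects. The helper describe_all is a hypothetical name; it shows a second benefit of trait objects, storing values of different concrete types in one collection.

```rust
trait Foo {
    fn method(&self) -> String;
}

impl Foo for u8 {
    fn method(&self) -> String {
        format!("u8: {}", *self)
    }
}

impl Foo for String {
    fn method(&self) -> String {
        format!("string: {}", *self)
    }
}

// Compiled once; each call looks up `method` in the value's vtable.
fn do_something(x: &dyn Foo) -> String {
    x.method()
}

// Hypothetical helper: one Vec holding values of different concrete types.
fn describe_all(items: &[Box<dyn Foo>]) -> Vec<String> {
    items.iter().map(|item| item.method()).collect()
}

fn main() {
    let mut items: Vec<Box<dyn Foo>> = Vec::new();
    items.push(Box::new(5u8));
    items.push(Box::new("Hello".to_string()));
    println!("{}", do_something(&5u8));
    println!("{:?}", describe_all(&items));
}
```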
Why pointers?
Rust does not put things behind a pointer by default, unlike many
managed languages, so types can have different sizes. Knowing the size
of the value at compile time is important for things like passing it as
an argument to a function, moving it about on the stack and allocating
(and deallocating) space on the heap to store it.
For Foo, we would need to have a value that could be at least either
a String (24 bytes on a 64-bit platform) or a u8 (1 byte), as well as any other type for
which dependent crates may implement Foo (any number of bytes at
all). There’s no way to guarantee that this last point can work if the
values are stored without a pointer, because those other types can be
arbitrarily large.
Putting the value behind a pointer means the size of the value is
not relevant when we are tossing a trait object around, only the size of
the pointer itself.
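The size argument can be checked directly with std::mem::size_of. This sketch assumes nothing beyond the standard library; it confirms that a plain reference is one pointer wide, whatever it points to, while a trait-object reference is two (data plus vtable).

```rust
use std::mem::size_of;

trait Foo {
    fn method(&self) -> String;
}

fn main() {
    let word = size_of::<usize>();
    // Ordinary references are a single pointer, whatever they point to.
    assert_eq!(size_of::<&u8>(), word);
    assert_eq!(size_of::<&String>(), word);
    // A trait object reference is a data pointer plus a vtable pointer.
    assert_eq!(size_of::<&dyn Foo>(), 2 * word);
    assert_eq!(size_of::<Box<dyn Foo>>(), 2 * word);
    println!("word = {} bytes", word);
}
```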
Representation
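The methods of the trait can be called on a trait object via a special
record of function pointers traditionally called a ‘vtable’ (created and
managed by the compiler).
At runtime, a trait object such as &Foo is a pair of pointers: a ‘data’
pointer to the value itself, and a ‘vtable’ pointer to the record of
function pointers for that value’s concrete type. Besides one entry per
trait method, the vtable also records the type’s destructor, size and
alignment. Conceptually, each method entry casts the data pointer back
to the concrete type and calls the real method:
// u8:
fn call_method_on_u8(x: *const ()) -> String {
    // The compiler guarantees that this function is only called
    // with `x` pointing to a u8.
    let byte: &u8 = unsafe { &*(x as *const u8) };
    byte.method()
}
// String:
fn call_method_on_String(x: *const ()) -> String {
    // The compiler guarantees that this function is only called
    // with `x` pointing to a String.
    let string: &String = unsafe { &*(x as *const String) };
    string.method()
}
Given let a: String = "foo".to_string(); and let x: u8 = 1;, creating a
trait object pairs a data pointer with the matching vtable, and a method
call goes through the vtable. In pseudo-code:
// let b: &Foo = &a;
let b = TraitObject {
    // Store the data:
    data: &a,
    // Store the methods:
    vtable: &Foo_for_String_vtable
};
// let y: &Foo = &x;
let y = TraitObject {
    // Store the data:
    data: &x,
    // Store the methods:
    vtable: &Foo_for_u8_vtable
};
// b.method();
(b.vtable.method)(b.data);
// y.method();
(y.vtable.method)(y.data);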
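The pseudo-code above can be turned into a runnable imitation. Everything here is hand-written and hypothetical (FooVtable, RawTraitObject, demo and the FOO_FOR_*_VTABLE statics are names chosen for the sketch); the compiler's real vtable layout is internal and not something user code can rely on.

```rust
trait Foo {
    fn method(&self) -> String;
}

impl Foo for u8 {
    fn method(&self) -> String {
        format!("u8: {}", *self)
    }
}

impl Foo for String {
    fn method(&self) -> String {
        format!("string: {}", *self)
    }
}

// Hypothetical hand-written vtable: one function pointer per trait method.
// (The compiler's real vtables also hold drop glue, size and alignment.)
struct FooVtable {
    method: fn(*const ()) -> String,
}

// Each entry casts the type-erased data pointer back to its concrete type.
fn call_method_on_u8(x: *const ()) -> String {
    // Safety: only ever paired with a pointer to a live u8 (see `demo`).
    let byte: &u8 = unsafe { &*(x as *const u8) };
    byte.method()
}

fn call_method_on_string(x: *const ()) -> String {
    // Safety: only ever paired with a pointer to a live String.
    let string: &String = unsafe { &*(x as *const String) };
    string.method()
}

static FOO_FOR_U8_VTABLE: FooVtable = FooVtable { method: call_method_on_u8 };
static FOO_FOR_STRING_VTABLE: FooVtable = FooVtable { method: call_method_on_string };

// Hypothetical stand-in for a trait object: data pointer plus vtable pointer.
struct RawTraitObject {
    data: *const (),
    vtable: &'static FooVtable,
}

impl RawTraitObject {
    // What `object.method()` amounts to, conceptually.
    fn method(&self) -> String {
        (self.vtable.method)(self.data)
    }
}

fn demo() -> (String, String) {
    let a: String = "foo".to_string();
    let x: u8 = 1;
    let b = RawTraitObject { data: &a as *const String as *const (), vtable: &FOO_FOR_STRING_VTABLE };
    let y = RawTraitObject { data: &x as *const u8 as *const (), vtable: &FOO_FOR_U8_VTABLE };
    (b.method(), y.method())
}

fn main() {
    let (from_string, from_u8) = demo();
    println!("{} / {}", from_string, from_u8);
    // A real trait object gives the same answer.
    let real: &dyn Foo = &1u8;
    assert_eq!(real.method(), from_u8);
}
```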
Object Safety
Not every trait can be used to make a trait object. For example, vectors
implement Clone, but if we try to make a trait object:
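let v = vec![1, 2, 3];
let o = &v as &Clone;
We get an error:
error: cannot convert to a trait object because trait `core::clone::Clone` is not object-safe [E0038]
let o = &v as &Clone;
        ^~
note: the trait cannot require that `Self : Sized`
In order to make a trait object, the trait must be object-safe. A trait
is object-safe if both of these are true:
• the trait does not require that Self: Sized
• all of its methods are object-safe
So what makes a method object-safe? Each method must require that
Self: Sized or all of the following:
• must not have any type parameters
• must not use Self (other than in the receiver, such as &self)
Whew! As we can see, almost all of these rules talk about Self. A good
intuition is “except in special circumstances, if your trait’s method uses
Self, it is not object-safe.”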
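One common escape hatch, sketched below with a hypothetical Shape trait: a method that returns Self can be kept in an otherwise object-safe trait by adding where Self: Sized to it. That method is then unavailable on trait objects, while the rest of the trait works through Box<dyn Shape>.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returning Self would normally block `dyn Shape`; the
    // `where Self: Sized` bound leaves this method out of trait objects.
    fn duplicate(&self) -> Self
    where
        Self: Sized;
}

#[derive(Clone)]
struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }

    fn duplicate(&self) -> Self {
        self.clone()
    }
}

// Accepting trait objects compiles because Shape is still object-safe.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes = vec![
        Box::new(Square(2.0)) as Box<dyn Shape>,
        Box::new(Square(3.0)) as Box<dyn Shape>,
    ];
    println!("{}", total_area(&shapes));
    println!("{}", Square(1.5).duplicate().area());
}
```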
Closures
Sometimes it is useful to wrap up a function and free variables for
better clarity and reuse. The free variables that can be used come from
the enclosing scope and are ‘closed over’ when used in the function.
From this, we get the name ‘closures’ and Rust provides a really great
implementation of them, as we’ll see.
Syntax
Closures look like this:
let plus_one = |x: i32| x + 1;
assert_eq!(2, plus_one(1));
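If a closure’s body needs more than one expression, use braces:
let plus_two = |x| {
    let mut result: i32 = x;
    result += 1;
    result += 1;
    result
};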
assert_eq!(4, plus_two(2));
You’ll notice a few things about closures that are a bit different from
regular named functions defined with fn. The first is that we did not
need to annotate the types of arguments the closure takes or the values
it returns. We can:
let plus_one = |x: i32| -> i32 { x + 1 };
assert_eq!(2, plus_one(1));
But we don’t have to. Why is this? Basically, it was chosen for
ergonomic reasons. While specifying the full type for named functions
is helpful with things like documentation and type inference, the full
type signatures of closures are rarely documented since they’re
anonymous, and they don’t cause the kinds of error-at-a-distance
problems that inferring named function types can.
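The second is that the syntax is similar, but a bit different. I’ve
added spaces here for easier comparison:
fn  plus_one_v1   (x: i32) -> i32 { x + 1 }
let plus_one_v2 = |x: i32| -> i32 { x + 1 };
let plus_one_v3 = |x: i32|          x + 1  ;
Small differences, but they’re similar.
A closure’s environment can include bindings from its enclosing scope
in addition to its parameters and local bindings. It looks like this: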
let num = 5;
let plus_num = |x: i32| x + num;
assert_eq!(10, plus_num(5));
This closure, plus_num, refers to a let binding in its scope: num. More
specifically, it borrows the binding. If we do something that would
conflict with that borrow while the closure is still in use, we get an
error, like this one:
let mut num = 5;
let plus_num = |x: i32| x + num;
let y = &mut num; // Error: num is already borrowed by plus_num.
plus_num(1);
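If your closure requires it, however, Rust will take ownership and move
the environment instead. This doesn’t work:
let nums = vec![1, 2, 3];
let takes_nums = || nums;
println!("{:?}", nums);
Vec<T> has ownership over its contents, so when the closure uses nums by
value, it has to take ownership of nums, and we can no longer use it
afterwards.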
move closures
We can force our closure to take ownership of its environment with the
move keyword:
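let num = 5;
let owns_num = move |x: i32| x + num;
Even with the move keyword, the variables follow normal move semantics.
Here, i32 implements Copy, so owns_num takes ownership of a copy of num.
So what’s the difference? Compare a closure that borrows num mutably:
let mut num = 5;
{
    let mut add_num = |x: i32| num += x;
    add_num(5);
}
assert_eq!(10, num);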
So in this case, our closure took a mutable reference to num, and then
when we called add_num, it mutated the underlying value, as we’d
expect. We also needed to declare add_num as mut, because we’re
mutating its environment.
If we change to a move closure, it’s different:
let mut num = 5;
{
let mut add_num = move |x: i32| num += x;
add_num(5);
}
assert_eq!(5, num);
We only get 5. Rather than taking a mutable borrow out on our num,
we took ownership of a copy.
Another way to think about move closures: they give a closure its
own stack frame. Without move, a closure may be tied to the stack
frame that created it, while a move closure is self-contained. This means
that you cannot generally return a non-move closure from a function,
for example.
But before we talk about taking and returning closures, we should
talk some more about the way that closures are implemented. As a
systems language, Rust gives you tons of control over what your code
does, and closures are no different.
Closure implementation
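Rust’s implementation of closures is a bit different than other
languages. They are effectively syntax sugar for traits. You’ll want to
make sure to have read the traits section before this one, as well as the
section on trait objects.
Got all that? Good.
The key to understanding how closures work under the hood is that
calling a function with (), like foo(), is an overloadable operator, and
Rust uses the trait system to overload operators. There are three
separate traits to overload with: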
• Fn
• FnMut
• FnOnce
There are a few differences between these traits, but a big one is self:
Fn takes &self, FnMut takes &mut self, and FnOnce takes self. This
covers all three kinds of self via the usual method call syntax. But
we’ve split them up into three traits, rather than having a single one.
This gives us a large amount of control over what kind of closures we
can take.
The || {} syntax for closures is sugar for these three traits. Rust
will generate a struct for the environment, impl the appropriate trait,
and then use it.
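To make "generate a struct for the environment, impl the appropriate trait" concrete, here is a hand-desugared sketch. The trait CallMut and struct AddNum are hypothetical stand-ins: stable Rust does not allow user code to implement Fn, FnMut or FnOnce directly, and the real generated types are anonymous.

```rust
// Hypothetical stand-in for FnMut (the real trait cannot be
// implemented by user code on stable Rust).
trait CallMut {
    fn call_mut(&mut self, x: i32);
}

// Roughly the environment struct for `|x: i32| num += x`:
// it holds a mutable borrow of the captured `num`.
struct AddNum<'a> {
    num: &'a mut i32,
}

impl<'a> CallMut for AddNum<'a> {
    fn call_mut(&mut self, x: i32) {
        *self.num += x;
    }
}

fn desugared() -> i32 {
    let mut num = 5;
    {
        let mut add_num = AddNum { num: &mut num };
        add_num.call_mut(5);
    }
    num
}

// The same behavior written as an ordinary closure.
fn with_closure() -> i32 {
    let mut num = 5;
    {
        let mut add_num = |x: i32| num += x;
        add_num(5);
    }
    num
}

fn main() {
    println!("{} {}", desugared(), with_closure());
}
```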
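Because closures are traits, a function can take a closure as an
argument by bounding a generic type parameter with one of them:
fn call_with_one<F>(some_closure: F) -> i32
    where F: Fn(i32) -> i32 {
    some_closure(1)
}
let answer = call_with_one(|x| x + 2);
assert_eq!(3, answer);
This uses static dispatch. We can also take a trait object instead:
fn call_with_one(some_closure: &Fn(i32) -> i32) -> i32 {
    some_closure(1)
}
let answer = call_with_one(&|x| x + 2);
assert_eq!(3, answer);
Now we take a trait object, a &Fn. And we have to make a reference to
our closure when we pass it to call_with_one, so we use &||.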
A quick note about closures that use explicit lifetimes. Sometimes
you might have a closure that takes a reference like so:
fn call_with_ref<F>(some_closure:F) -> i32
where F: Fn(&i32) -> i32 {
let value = 0;
some_closure(&value)
}
Normally you can specify the lifetime of the parameter to our closure.
We could annotate it on the function declaration:
fn call_with_ref<'a, F>(some_closure:F) -> i32
where F: Fn(&'a i32) -> i32 {
However, this presents a problem in our case. When a function has
an explicit lifetime parameter, that lifetime must be at least as long
as the entire call to that function. The borrow checker will complain
that value doesn’t live long enough, because it is only in scope after
its declaration inside the function body.
What we need is a closure that can borrow its argument only for
its own invocation scope, not for the outer function’s scope. In order
to say that, we can use Higher-Ranked Trait Bounds with the for<...>
syntax:
fn call_with_ref<F>(some_closure:F) -> i32
where F: for<'a> Fn(&'a i32) -> i32 {
This lets the Rust compiler find the minimum lifetime to invoke our
closure and satisfy the borrow checker’s rules. Our function then
compiles and executes as we expect.
fn call_with_ref<F>(some_closure:F) -> i32
where F: for<'a> Fn(&'a i32) -> i32 {
let value = 0;
some_closure(&value)
}
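A function pointer is kind of like a closure that has no environment.
As such, you can pass a function pointer to any function expecting a
closure argument, and it will work:
fn call_with_one(some_closure: &Fn(i32) -> i32) -> i32 {
    some_closure(1)
}
fn add_one(i: i32) -> i32 {
    i + 1
}
let f = add_one;
let answer = call_with_one(&f);
assert_eq!(2, answer);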
In this example, we don’t strictly need the intermediate variable f, the
name of the function works just fine too:
let answer = call_with_one(&add_one);
Returning closures
It’s very common for functional-style code to return closures in various
situations. If you try to return a closure, you may run into an error.
At first, it may seem strange, but we’ll figure it out. Here’s how you’d
probably try to return a closure from a function:
fn factory() -> (Fn(i32) -> i32) {
let num = 5;
|x| x + num
}
let f = factory();
let answer = f(1);
assert_eq!(6, answer);
This gives us long, related errors, such as:
error: the trait bound `core::ops::Fn(i32) -> i32 : core::marker::Sized` is not satisfied [E0277]
let f = factory();
    ^
note: `core::ops::Fn(i32) -> i32` does not have a constant size known at compile-time
let f = factory();
    ^
In order to return something from a function, Rust needs to know what
size the return type is. But since Fn is a trait, it could be various things
of various sizes: many different types can implement Fn. An easy way
to give something a size is to take a reference to it, as references have
a known size. So we’d write this:
fn factory() -> &(Fn(i32) -> i32) {
let num = 5;
|x| x + num
}
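let f = factory();
But we get another error: missing lifetime specifier [E0106]. Because
we return a reference, we need to give it a lifetime, but factory()
takes no arguments, so lifetime elision has nothing to go on. Even
&'static doesn’t help: we are returning the closure itself, not a
reference to it, and the closure borrows num, which only lives as long
as the call to factory().
The way out is to return a Box, which gives the closure a known size by
putting it on the heap, and to make it a move closure, so that it owns
num instead of borrowing it:
fn factory() -> Box<Fn(i32) -> i32> {
    let num = 5;
    Box::new(move |x| x + num)
}
let f = factory();
let answer = f(1);
assert_eq!(6, answer);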
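In current Rust (1.26 and later) there is also a way to return a closure without boxing it: impl Trait in return position. A sketch comparing the two; boxed_factory and impl_factory are hypothetical names.

```rust
// Boxing gives the closure a known size (a pointer to the heap).
fn boxed_factory() -> Box<dyn Fn(i32) -> i32> {
    let num = 5;
    Box::new(move |x| x + num)
}

// `impl Trait` returns the closure unboxed; callers only know
// that it implements Fn(i32) -> i32.
fn impl_factory() -> impl Fn(i32) -> i32 {
    let num = 5;
    move |x| x + num
}

fn main() {
    let f = boxed_factory();
    let g = impl_factory();
    println!("{} {}", f(1), g(1));
}
```

The boxed form is still needed when one function must return different closure types from different branches, since impl Trait stands for a single concrete type.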
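Universal Function Call Syntax
Sometimes, functions can have the same names. Consider this code:
trait Foo {
    fn f(&self);
}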
trait Bar {
    fn f(&self);
}
struct Baz;
impl Foo for Baz {
    fn f(&self) { println!("Baz’s impl of Foo"); }
}
impl Bar for Baz {
    fn f(&self) { println!("Baz’s impl of Bar"); }
}
let b = Baz;
If we were to try to call b.f(), we’d get an error:
error: multiple applicable items in scope [E0034]
We need a way to disambiguate which method we need. This feature is
called ‘universal function call syntax’, and it looks like this:
Foo::f(&b);
Bar::f(&b);
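A runnable version of the example, modified (an assumption for testability) so that each f returns a string instead of printing it. It also uses the angle-bracket form, <Baz as Bar>::f(&b):

```rust
trait Foo {
    fn f(&self) -> &'static str;
}

trait Bar {
    fn f(&self) -> &'static str;
}

struct Baz;

impl Foo for Baz {
    fn f(&self) -> &'static str {
        "Baz's impl of Foo"
    }
}

impl Bar for Baz {
    fn f(&self) -> &'static str {
        "Baz's impl of Bar"
    }
}

fn main() {
    let b = Baz;
    // b.f() would be ambiguous; naming the trait selects the impl.
    println!("{}", Foo::f(&b));
    // The angle-bracket form names both the type and the trait.
    println!("{}", <Baz as Bar>::f(&b));
}
```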
Let’s break it down.
Foo::
Bar::
These halves of the invocation are the types of the two traits: Foo and
Bar. This is what ends up actually doing the disambiguation between
the two: Rust calls the one from the trait name you use.
f(&b)
When we call a method like b.f() using method syntax, Rust will
automatically borrow b if f() takes &self. In this case, Rust will not,
and so we need to pass an explicit &b.
Angle-bracket Form
The form of UFCS we just talked about:
Trait::method(args);
Is a short-hand. There’s an expanded form of this that’s needed in
some situations:
<Type as Trait>::method(args);
The <>:: syntax is a means of providing a type hint. The type goes
inside the <>s. In this case, the type is Type as Trait, indicating that
we want Trait’s version of method to be called here. The as Trait
part is optional if it’s not ambiguous. Same with the angle brackets,
hence the shorter form.
Here’s an example of using the longer form.
trait Foo {
fn foo() -> i32;
}
struct Bar;
impl Bar {
    fn foo() -> i32 {
        20
    }
}
impl Foo for Bar {
    fn foo() -> i32 {
        10
    }
}
fn main() {
assert_eq!(10, <Bar as Foo>::foo());
assert_eq!(20, Bar::foo());
}
Using the angle bracket syntax lets you call the trait method instead
of the inherent one.
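As an example, let’s make a phrases crate, which will give us various
phrases in different languages. To keep things simple, we’ll stick to
‘greetings’ and ‘farewells’ as two kinds of phrases, and use English and
Japanese (日本語) as two languages for those phrases to be in. We’ll use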
this module layout:
+-----------+
+---| greetings |
+---------+ | +-----------+
+---| english |---+
| +---------+ | +-----------+
| +---| farewells |
+---------+ | +-----------+
| phrases |---+
+---------+ | +-----------+
| +---| greetings |
| +----------+ | +-----------+
+---| japanese |--+
+----------+ | +-----------+
+---| farewells |
+-----------+
In this example, phrases is the name of our crate. All of the rest are
modules. You can see that they form a tree, branching out from the
crate root, which is the root of the tree: phrases itself.
Now that we have a plan, let’s define these modules in code. To
start, generate a new library crate with Cargo:
$ cargo new --lib phrases
$ cd phrases
$ tree .
.
├── Cargo.toml
└── src
    └── lib.rs
1 directory, 2 files
Defining Modules
To define each of our modules, we use the mod keyword. Let’s make
our src/lib.rs look like this:
mod english {
mod greetings {
}
mod farewells {
}
}
mod japanese {
mod greetings {
}
mod farewells {
}
}
After the mod keyword, you give the name of the module. Module
names follow the conventions for other Rust identifiers: lower_snake_case.
The contents of each module are within curly braces ({}).
Within a given mod, you can declare sub-mods. We can refer to
sub-modules with double-colon (::) notation: our four nested modules are
english::greetings, english::farewells, japanese::greetings,
and japanese::farewells. Because these sub-modules are namespaced
under their parent module, the names don’t conflict: english::greetings
and japanese::greetings are distinct, even though their
names are both greetings.
Because this crate does not have a main() function, and is called
lib.rs, Cargo will build this crate as a library:
$ cargo build
   Compiling phrases v0.0.1 (file:///home/you/projects/phrases)
$ ls target/debug
build  deps  examples  libphrases-a7448e02a0468eaa.rlib  native
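libphrases-<hash>.rlib is the compiled crate. Before we see how to
use this crate from another crate, let’s break it up into multiple files.
Instead of declaring a module inline like this:
mod english {
    // Contents of our module go here.
}
we can declare it with a semicolon:
mod english;
Rust then looks for the module’s contents in either src/english.rs or
src/english/mod.rs. Using the mod.rs form for both languages, our src
directory looks like this:
src
├── english
│   ├── farewells.rs
│   ├── greetings.rs
│   └── mod.rs
├── japanese
│   ├── farewells.rs
│   ├── greetings.rs
│   └── mod.rs
└── lib.rs
src/lib.rs is our crate root, and looks like this:
mod english;
mod japanese;
Both src/english/mod.rs and src/japanese/mod.rs look like this:
mod greetings;
mod farewells;
Again, these declarations tell Rust to look for:
• src/english/greetings.rs or src/english/greetings/mod.rs,
• src/english/farewells.rs or src/english/farewells/mod.rs,
• src/japanese/greetings.rs or src/japanese/greetings/mod.rs,
• and src/japanese/farewells.rs or src/japanese/farewells/mod.rs.
Each of these files defines a function, such as hello() in
src/english/greetings.rs or goodbye() in src/english/farewells.rs.
To use our library from an executable, make a src/main.rs containing: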
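extern crate phrases;
fn main() {
    println!("Hello in English: {}", phrases::english::greetings::hello());
    println!("Goodbye in English: {}", phrases::english::farewells::goodbye());
    println!("Hello in Japanese: {}", phrases::japanese::greetings::hello());
    println!("Goodbye in Japanese: {}", phrases::japanese::farewells::goodbye());
}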
The extern crate declaration tells Rust that we need to compile and
link to the phrases crate. We can then use phrases’ modules in this
one. As we mentioned earlier, you can use double colons to refer to
sub-modules and the functions inside of them.
(Note: when importing a crate that has dashes in its name “like-this”,
which is not a valid Rust identifier, it will be converted by changing
the dashes to underscores, so you would write extern crate like_this;.)
Also, Cargo assumes that src/main.rs is the crate root of a binary
crate, rather than a library crate. Our package now has two
crates: src/lib.rs and src/main.rs. This pattern is quite common
for executable crates: most functionality is in a library crate, and the
executable crate uses that library. This way, other programs can also
use the library crate, and it’s also a nice separation of concerns.
This doesn’t quite work yet, though. We get four errors that look
similar to this:
$ cargo build
   Compiling phrases v0.0.1 (file:///home/you/projects/phrases)
src/main.rs:4:38: 4:72 error: function `hello` is private
src/main.rs:4     println!("Hello in English: {}", phrases::english::greetings::hello());
                                                   ^~~~~~~~~~~~~~
note: in expansion of format_args!
<std macros>:2:25: 2:58 note: expansion site
<std macros>:1:1: 2:62 note: in expansion of print!
<std macros>:3:1: 3:54 note: expansion site
<std macros>:1:1: 3:58 note: in expansion of println!
phrases/src/main.rs:4:5: 4:76 note: expansion site
By default, everything in Rust is private. To make things public, you
use the pub keyword. Let’s focus on the english module first, so let’s
reduce our src/main.rs to only this:
extern crate phrases;
fn main() {
    println!("Hello in English: {}", phrases::english::greetings::hello());
    println!("Goodbye in English: {}", phrases::english::farewells::goodbye());
}
In our src/lib.rs, let’s add pub to the english module declaration:
pub mod english;
mod japanese;
And in our src/english/mod.rs, let’s make both pub:
pub mod greetings;
pub mod farewells;
In our src/english/greetings.rs, let’s add pub to our fn declara-
tion:
pub fn hello() -> String {
"Hello!".to_string()
}
And also in src/english/farewells.rs:
pub fn goodbye() -> String {
"Goodbye.".to_string()
}
Now, our crate compiles, albeit with warnings about not using the
japanese functions:
$ cargo run
   Compiling phrases v0.0.1 (file:///home/you/projects/phrases)
src/japanese/greetings.rs:1:1: 3:2 warning: function is never used: `hello`, #[warn(dead_code)] on by default
src/japanese/greetings.rs:1 fn hello() -> String {
src/japanese/greetings.rs:2     "こんにちは".to_string()
src/japanese/greetings.rs:3 }
pub also applies to structs and their member fields. In keeping with
Rust’s tendency toward safety, simply making a struct public won’t
automatically make its members public: you must mark the fields
individually with pub.
Now that our functions are public, we can use them. Great! However,
typing out phrases::english::greetings::hello() is very long
and repetitive. Rust has another keyword for importing names into the
current scope, so that you can refer to them with shorter names. Let’s
talk about use. Change src/main.rs to look like this:
extern crate phrases;
use phrases::english::greetings;
use phrases::english::farewells;
fn main() {
    println!("Hello in English: {}", greetings::hello());
    println!("Goodbye in English: {}", farewells::goodbye());
}
The two use lines import each module into the local scope, so we can
refer to the functions by a much shorter name. By convention, when
importing functions, it’s considered best practice to import the module,
rather than the function directly. In other words, you can do this:
use phrases::english::greetings::hello;
use phrases::english::farewells::goodbye;
fn main() {
println!("Hello in English: {}", hello());
println!("Goodbye in English: {}", goodbye());
}
But it is not idiomatic. This is significantly more likely to introduce
a naming conflict. In our short program, it’s not a big deal, but as
it grows, it becomes a problem. If we have conflicting names, Rust
will give a compilation error. For example, if we made the japanese
functions public, and tried to do this:
extern crate phrases;
use phrases::english::greetings::hello;
use phrases::japanese::greetings::hello;
fn main() {
println!("Hello in English: {}", hello());
println!("Hello in Japanese: {}", hello());
}
Rust will reject this at compile time with error E0252, because the
name hello is imported twice into the same scope.
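If we’re importing multiple names from the same module, we don’t have
to type it out twice. Instead of this:
use phrases::english::greetings;
use phrases::english::farewells;
we can use this shortcut:
use phrases::english::{greetings, farewells};
Re-exporting with pub use
You don’t only use use to shorten identifiers. You can also use it
inside of your crate to re-export a function inside another module.
This allows you to present an external interface that may not directly
map to your internal code organization. Modify your src/main.rs to
read like this:
extern crate phrases;
use phrases::english::{greetings, farewells};
use phrases::japanese;
fn main() {
    println!("Hello in English: {}", greetings::hello());
    println!("Goodbye in English: {}", farewells::goodbye());
    println!("Hello in Japanese: {}", japanese::hello());
    println!("Goodbye in Japanese: {}", japanese::goodbye());
}
Then make the japanese module public in src/lib.rs (pub mod japanese;),
add pub to the hello() and goodbye() functions in
src/japanese/greetings.rs and src/japanese/farewells.rs, and change
src/japanese/mod.rs to read like this:
pub use self::greetings::hello;
pub use self::farewells::goodbye;
mod greetings;
mod farewells;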
The pub use declaration brings the function into scope at this part
of our module hierarchy. Because we’ve pub used this inside of our
japanese module, we now have a phrases::japanese::hello() function
and a phrases::japanese::goodbye() function, even though the
code for them lives in phrases::japanese::greetings::hello() and
phrases::japanese::farewells::goodbye(). Our internal organization
doesn’t define our external interface.
Here we have a pub use for each function we want to bring into
the japanese scope. We could alternatively use the wildcard syntax
to include everything from greetings into the current scope: pub use
self::greetings::*.
What about the self? Well, by default, use declarations are absolute
paths, starting from your crate root. self makes that path relative
to your current place in the hierarchy instead. There’s one more special
form of use: you can use super:: to reach one level up the tree
from your current location. Some people like to think of self as . and
super as .., from many shells’ display for the current directory and
the parent directory.
Outside of use, paths are relative: foo::bar() refers to a function
inside of foo relative to where we are. If that’s prefixed with ::, as in
::foo::bar(), it refers to a different foo, an absolute path from your
crate root.
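The same ideas fit in a single file, which makes them easy to try out. In this sketch the module names mirror the phrases example, but the bodies (including the private punctuation helper) are hypothetical; it shows a pub use re-export, self:: and super::.

```rust
// Single-file stand-in for part of the phrases crate.
mod phrases {
    pub mod english {
        // Re-export: callers can write english::hello() even though the
        // function lives in the private greetings submodule.
        pub use self::greetings::hello;

        // Private to english, but visible to its child modules.
        fn punctuation() -> &'static str {
            "!"
        }

        mod greetings {
            pub fn hello() -> String {
                // super:: climbs one level, to the english module.
                format!("Hello{}", super::punctuation())
            }
        }
    }
}

fn main() {
    println!("{}", phrases::english::hello());
}
```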
This will build and run:
$ cargo run
   Compiling phrases v0.0.1 (file:///home/you/projects/phrases)
     Running `target/debug/phrases`
Hello in English: Hello!
Goodbye in English: Goodbye.
Complex imports
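Rust offers several advanced options that can add compactness and
convenience to your extern crate and use statements. Here is an
example (it assumes that the greetings and farewells submodules of
japanese are public):
extern crate phrases as sayings;
use sayings::japanese::greetings as ja_greetings;
use sayings::japanese::farewells::*;
use sayings::english::{self, greetings as en_greetings, farewells as en_farewells};
fn main() {
    println!("Hello in English: {}", en_greetings::hello());
    println!("And in Japanese: {}", ja_greetings::hello());
    println!("Goodbye in English: {}", english::farewells::goodbye());
    println!("Again: {}", en_farewells::goodbye());
    println!("And in Japanese: {}", goodbye());
}
First, both extern crate and use allow renaming the thing being
imported: the crate is still called phrases, but here we refer to it as
sayings. The second use statement uses a star glob to bring in every
public item from sayings::japanese::farewells, so goodbye() can be
called without a module name.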
The third use statement bears more explanation. It’s using “brace
expansion” globbing to compress three use statements into one (this
sort of syntax may be familiar if you’ve written Linux shell scripts
before). The uncompressed form of this statement would be:
use sayings::english;
use sayings::english::greetings as en_greetings;
use sayings::english::farewells as en_farewells;
As you can see, the curly brackets compress use statements for several
items under the same path, and in this context self refers back to that
path. Note: in older versions of Rust, the curly brackets could not be
nested or mixed with star globbing; since Rust 1.25, nested groups such
as use a::{b::{c, d}, e::*}; are accepted.
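A single-file sketch of the brace-expansion form, with a hypothetical inline sayings module standing in for the external crate so the example compiles on its own:

```rust
// Hypothetical inline stand-in for the external phrases crate.
mod sayings {
    pub mod english {
        pub mod greetings {
            pub fn hello() -> String {
                "Hello!".to_string()
            }
        }
        pub mod farewells {
            pub fn goodbye() -> String {
                "Goodbye.".to_string()
            }
        }
    }
}

// One statement: `self` imports english itself, `as` renames the submodules.
use sayings::english::{self, farewells as en_farewells, greetings as en_greetings};

fn main() {
    println!("Hello in English: {}", en_greetings::hello());
    println!("Goodbye in English: {}", english::farewells::goodbye());
    println!("Again: {}", en_farewells::goodbye());
}
```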
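const and static
Rust has a way of defining constants with the const keyword:
const N: i32 = 5;
Unlike let bindings, you must annotate the type of a const. Constants
live for the entire lifetime of a program. More specifically, they have
no fixed address in memory, because they’re effectively inlined to each
place that they’re used.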
static
Rust provides a ‘global variable’ sort of facility in static items. They’re
similar to constants, but static items aren’t inlined upon use. This
means that there is only one instance for each value, and it’s at a fixed
location in memory.
Here’s an example:
static N: i32 = 5;
Mutability
You can introduce mutability with the mut keyword:
static mut N: i32 = 5;
Because this is mutable, one thread could be updating N while another
is reading it, causing memory unsafety. As such both accessing and
mutating a static mut is unsafe, and so must be done in an unsafe
block:
static mut N: i32 = 5;
unsafe {
    N += 1;
}
Initializing
Both const and static have requirements for giving them a value.
They must be given a value that’s a constant expression. In other
words, you cannot use the result of a function call (other than a call
to a const fn) or anything similarly complex that can only be computed
at runtime.
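A short sketch, assuming Rust 1.31 or later, where const fn is available: a const initializer may call a const fn, and a static is a single item at a fixed address. square and GREETING are hypothetical names.

```rust
// A const fn can be evaluated at compile time.
const fn square(n: i32) -> i32 {
    n * n
}

// The initializer calls a const fn, which a constant expression may do.
const N: i32 = square(3);

// A static has a single, fixed address for the whole program.
static GREETING: &str = "hello";

fn main() {
    println!("{} {}", N, GREETING);
    // Both references point at the one GREETING item.
    assert!(std::ptr::eq(&GREETING, &GREETING));
}
```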
Attributes
Declarations can be annotated with ‘attributes’ in Rust. They look like
this:
#[test]
fn foo() {}
or like this:
mod foo {
    #![test]
}
The difference between the two is the !, which changes what the
attribute applies to:
#[foo]
struct Foo;
mod bar {
#![bar]
}
The #[foo] attribute applies to the next item, which is the struct
declaration. The #![bar] attribute applies to the item enclosing it,
which is the mod declaration. Otherwise, they’re the same. Both change
the meaning of the item they’re attached to somehow.
For example, consider a function like this:
#[test]
fn check() {
assert_eq!(2, 1 + 1);
}
It is marked with #[test]. This means it’s special: when you run tests,
this function will execute. When you compile as usual, it won’t even
be included. This function is now a test function.
Attributes may also have additional data:
#[inline(always)]
fn super_fast_fn() {
}
Or even keys and values:
#[cfg(target_os = "macos")]
mod macos_only {
}
Type Aliases
The type keyword lets you declare an alias of another type:
type Num = i32;
Note, however, that this is an alias, not a new type entirely. In other
words, because Rust is strongly typed, you’d expect a comparison
between two different types to fail:
let x: i32 = 5;
let y: i64 = 5;
if x == y {
// ...
}
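This gives a compile-time error (mismatched types: expected i32, found
i64). But if we compare an i32 with a Num:
type Num = i32;
let x: i32 = 5;
let y: Num = 5;
if x == y {
    // ...
}
This compiles without error. Values of a Num type are the same as a
value of type i32, in every way. You can use a tuple struct to really get
a new type.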
You can also use type aliases with generics:
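use std::result;
enum ConcreteError {
    Foo,
    Bar,
}
type Result<T> = result::Result<T, ConcreteError>;
This creates a specialized version of the Result type, which always
has a ConcreteError for the E part of Result<T, E>.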
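Here is a sketch of the alias in use. It assumes a hypothetical parse_flag function that reports failures as ConcreteError values. Because of the alias, its signature can say Result<bool> rather than result::Result<bool, ConcreteError>:

```rust
use std::result;

#[derive(Debug, PartialEq)]
enum ConcreteError {
    Foo,
    Bar,
}

// A generic alias: the error type is fixed to `ConcreteError`.
type Result<T> = result::Result<T, ConcreteError>;

// Hypothetical function: "yes"/"no" map to a bool, anything else to an error.
fn parse_flag(s: &str) -> Result<bool> {
    match s {
        "yes" => Ok(true),
        "no" => Ok(false),
        "" => Err(ConcreteError::Foo),
        _ => Err(ConcreteError::Bar),
    }
}

fn main() {
    println!("{:?}", parse_flag("yes"));
}
```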
Coercion
Coercion between types is implicit and has no syntax of its own, but
can be spelled out with as.
Coercion occurs in let, const, and static statements; in function
call arguments; in field values in struct initialization; and in a function
result.
Among the coercions Rust performs are these conversions between pointer
types:
• &mut T to &T
• *mut T to *const T
• &T to *const T
• &mut T to *mut T
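Each of these coercions happens at one of the coercion sites listed above, without an as. Here is a minimal sketch with a hypothetical read helper. The raw pointers are created in safe code, and only reading through them needs unsafe:

```rust
// Hypothetical helper that expects a shared reference.
fn read(r: &i32) -> i32 {
    *r
}

fn coerced(n: i32) -> (i32, i32, i32) {
    // &mut T to &T: a function argument is a coercion site.
    let mut a = n;
    let via_ref = read(&mut a);

    // &T to *const T: a `let` with an explicit type is a coercion site.
    let b = n;
    let p: *const i32 = &b;
    let via_const = unsafe { *p };

    // &mut T to *mut T, then *mut T to *const T.
    let mut c = n;
    let pm: *mut i32 = &mut c;
    let pc: *const i32 = pm;
    let via_mut = unsafe { *pc };

    (via_ref, via_const, via_mut)
}

fn main() {
    println!("{:?}", coerced(7));
}
```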
as
The as keyword does safe casting:
let x: i32 = 5;
let y = x as i64;
There are three major categories of safe cast: explicit coercions, casts
between numeric types, and pointer casts.
Casting is not transitive: even if e as U1 as U2 is a valid expression,
e as U2 is not necessarily so (in fact it will only be valid if U1 coerces
to U2).
Explicit coercions
A cast e as U is valid if e has type T and T coerces to U.
Numeric casts
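A cast e as U is also valid in any of the following cases:
• e has type T, and T and U are any numeric types (a numeric cast)
• e is a C-like enum (one with no data attached to its variants), and U
is an integer type (an enum cast)
• e has type bool or char, and U is an integer type
• e has type u8, and U is char
Some notable properties of numeric casts:
• Casting between two integers of the same size (e.g. i32 -> u32)
is a no-op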
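A few casts illustrate these rules. The values in the comments follow Rust's documented semantics. Same-size integer casts reinterpret the bits, and narrowing integer casts truncate. Float-to-integer casts round toward zero and, since Rust 1.45, saturate at the target type's bounds. The function name casts is hypothetical.

```rust
fn casts() -> (u8, char, u8, u8, i32, i32) {
    // bool to integer, and u8 to char.
    let one = true as u8; // 1
    let at_sign = 64u8 as char; // '@'
    // Same-size integer cast: the bits are reinterpreted.
    let two_hundred = -56i8 as u8; // 200
    // Casting to a smaller integer truncates: 300 mod 256.
    let truncated = 300i32 as u8; // 44
    // Float to integer rounds toward zero...
    let rounded = 3.9f64 as i32; // 3
    // ...and saturates at the target's bounds (since Rust 1.45).
    let saturated = 1e10f64 as i32; // i32::MAX
    (one, at_sign, two_hundred, truncated, rounded, saturated)
}

fn main() {
    println!("{:?}", casts());
}
```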
Pointer casts
Perhaps surprisingly, it is safe to cast raw pointers to and from inte-
gers, and to cast between pointers to different types subject to some
constraints. It is only unsafe to dereference the pointer:
• e has type *T, U has type *U_0, and either U_0: Sized or unsize_kind(T) == unsize_kind(U_0); a ptr-ptr-cast
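Here is a sketch of pointer casts using a hypothetical roundtrip function. Casting a pointer to an integer, back to a pointer, and to a pointer of another type are all safe. Only the final dereference needs unsafe.

```rust
fn roundtrip(x: &i32) -> (bool, i32) {
    let p = x as *const i32;
    // Pointer to integer and back: both casts are safe.
    let addr = p as usize;
    let back = addr as *const i32;
    // Pointer to a pointer of a different type: also safe.
    let bytes = p as *const u8;
    // Only dereferencing requires `unsafe`; `back` still points at `*x`.
    let value = unsafe { *back };
    (bytes as usize == addr, value)
}

fn main() {
    println!("{:?}", roundtrip(&42));
}
```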
transmute
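as only allows safe casting, and will for example reject an attempt to
cast four bytes into a u32:
let a = [0u8, 0u8, 0u8, 0u8];
let b = a as u32; // Four u8s makes a u32.
The compiler rejects this as a non-primitive cast. Casts like this are
dangerous because they make assumptions about the way that multiple
underlying structures are implemented. For this, we need something more
dangerous.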
The transmute function is very simple, but very scary. It tells Rust
to treat a value of one type as though it were another type. It does
this regardless of the typechecking system, and completely trusts you.
In our previous example, we know that an array of four u8s repre-
sents a u32 properly, and so we want to do the cast. Using transmute
instead of as, Rust lets us:
use std::mem;
fn main() {
    unsafe {
        let a = [0u8, 1u8, 0u8, 0u8];
        let b = mem::transmute::<[u8; 4], u32>(a);
        println!("{}", b); // 256 on a little-endian machine
        // Or, more concisely:
        let c: u32 = mem::transmute(a);
        println!("{}", c); // 256 on a little-endian machine
    }
}
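transmute does perform one check: the two types must be the same size.
For example, this attempt to turn four bytes into a u64 is rejected at
compile time, because [u8; 4] is 32 bits wide and u64 is 64 bits wide:
use std::mem;
unsafe {
    let a = [0u8, 0u8, 0u8, 0u8];
    let b = mem::transmute::<[u8; 4], u64>(a);
}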
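For this particular conversion, current Rust also offers a safe alternative that makes the byte order explicit. The functions are u32::from_le_bytes, u32::from_be_bytes, and u32::from_ne_bytes, where the last uses the machine's native order (the order transmute effectively reads). Here is a sketch comparing them, with a hypothetical decode helper:

```rust
use std::mem;

// Hypothetical helper: the same four bytes read in different byte orders.
fn decode(a: [u8; 4]) -> (u32, u32, u32, u32) {
    let le = u32::from_le_bytes(a); // least significant byte first
    let be = u32::from_be_bytes(a); // most significant byte first
    let ne = u32::from_ne_bytes(a); // this machine's native order
    let t: u32 = unsafe { mem::transmute(a) };
    (le, be, ne, t)
}

fn main() {
    println!("{:?}", decode([0, 1, 0, 0]));
}
```

Because transmute uses the native byte order, the 256 in the earlier example only holds on little-endian machines. from_le_bytes gives 256 everywhere.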
Associated Types
Associated types are a powerful part of Rust’s type system. They’re
related to the idea of a ‘type family’, in other words, grouping multiple
types together. That description is a bit abstract, so let’s dive right
into an example. If you want to write a Graph trait, you have two
types to be generic over: the node type and the edge type. So you
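might write a trait, Graph<N, E>, that looks like this:
trait Graph<N, E> {
    fn has_edge(&self, a: &N, b: &N) -> bool;
    fn edges(&self, n: &N) -> Vec<E>;
}
While this sort of works, it ends up being awkward. For example, any
function that wants to take a Graph as a parameter now also needs to
be generic over the node and edge types too:
// `distance` is an illustrative function; its body is elided.
fn distance<N, E, G: Graph<N, E>>(graph: &G, start: &N, end: &N) -> u32 { ... }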
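With associated types, the node and edge types become part of the trait
itself:
trait Graph {
    type N;
    type E;
    fn has_edge(&self, a: &Self::N, b: &Self::N) -> bool;
    fn edges(&self, n: &Self::N) -> Vec<Self::E>;
}
Now a function can be abstract over a given Graph with a single type
parameter, and refer to its types as G::N and G::E.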
Simple enough. Associated types use the type keyword, and go inside
the body of the trait, with the functions.
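Here is a sketch of the payoff. With associated types, a function over any graph takes a single type parameter G and names the node type as G::N. AdjGraph (with usize nodes and (usize, usize) edges) and out_degree are hypothetical:

```rust
trait Graph {
    type N;
    type E;
    fn has_edge(&self, a: &Self::N, b: &Self::N) -> bool;
    fn edges(&self, n: &Self::N) -> Vec<Self::E>;
}

// Hypothetical graph: `adj[i]` lists the nodes reachable from node `i`.
struct AdjGraph {
    adj: Vec<Vec<usize>>,
}

impl Graph for AdjGraph {
    type N = usize;
    type E = (usize, usize);

    fn has_edge(&self, a: &usize, b: &usize) -> bool {
        self.adj[*a].contains(b)
    }

    fn edges(&self, n: &usize) -> Vec<(usize, usize)> {
        self.adj[*n].iter().map(|&m| (*n, m)).collect()
    }
}

// One type parameter: the node and edge types come from `G` itself.
fn out_degree<G: Graph>(graph: &G, node: &G::N) -> usize {
    graph.edges(node).len()
}

fn main() {
    let g = AdjGraph { adj: vec![vec![1, 2], vec![2], vec![]] };
    println!("{}", out_degree(&g, &0));
}
```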
Associated type declarations can carry trait bounds, just as a function’s
type parameters can. For example, if we wanted our N type to implement
Display, so we can print the nodes out, we could do this:
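use std::fmt;
trait Graph {
    type N: fmt::Display;
    type E;
    fn has_edge(&self, a: &Self::N, b: &Self::N) -> bool;
    fn edges(&self, n: &Self::N) -> Vec<Self::E>;
}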
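Like any trait, a trait with associated types is implemented with impl,
and each associated type is given a concrete type with type. Here’s a
simple implementation of the Graph trait (the version without the Display
bound, since Node does not implement Display):
struct Node;
struct Edge;
struct MyGraph;
impl Graph for MyGraph {
    type N = Node;
    type E = Edge;
    fn has_edge(&self, n1: &Node, n2: &Node) -> bool {
        true
    }
    fn edges(&self, n: &Node) -> Vec<Edge> {
        Vec::new()
    }
}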
A trait object for a trait with associated types must name those types,
so Box<dyn Graph> on its own is rejected. Using the definitions above,
we write:
let graph = MyGraph;
let obj = Box::new(graph) as Box<dyn Graph<N = Node, E = Edge>>;
Unsized Types
Most types have a particular size, in bytes, that is knowable at compile
time. For example, an i32 is thirty-two bits big, or four bytes. However,
there are some types which are useful to express, but do not have a
defined size. These are called ‘unsized’ or ‘dynamically sized’ types.
One example is [T]. This type represents a certain number of T in
sequence. But we don’t know how many there are, so the size is not
known.
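Rust understands a few of these types, but they have some restrictions.
There are three:
1. We can only manipulate an instance of an unsized type via a pointer.
An &[T] works fine, but a [T] does not.
2. Variables and arguments cannot have dynamically sized types.
3. Only the last field in a struct may have a dynamically sized type;
the other fields must not. Enum variants must not have dynamically
sized types as data.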
?Sized
If you want to write a type or function that accepts a dynamically sized
type, you can use the special bound syntax, ?Sized:
struct Foo<T: ?Sized> {
f: T,
}
The ?Sized bound, read as “T may or may not be Sized”, allows us
to match both sized and unsized types. All generic type parameters
implicitly have the Sized bound, so ?Sized is used to opt out of that
implicit bound.
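Here is a sketch of ?Sized in both positions. The function describe (hypothetical) accepts a reference to an unsized str or [i32]. The struct Wrapper (hypothetical, like Foo above but with an extra field) keeps its possibly-unsized field last, so a &Wrapper<[i32; 3]> can coerce to a &Wrapper<[i32]>:

```rust
use std::fmt::Debug;

// Without `?Sized`, `T` could not be `str` or `[i32]`.
fn describe<T: ?Sized + Debug>(value: &T) -> String {
    format!("{:?}", value)
}

// The possibly-unsized field must be the last one.
struct Wrapper<T: ?Sized> {
    tag: u8,
    data: T,
}

fn main() {
    println!("{} {}", describe("hi"), describe(&[1, 2, 3][..]));

    let sized = Wrapper { tag: 1u8, data: [1, 2, 3] };
    // Unsizing coercion: &Wrapper<[i32; 3]> to &Wrapper<[i32]>.
    let unsized_ref: &Wrapper<[i32]> = &sized;
    println!("{} {}", unsized_ref.tag, unsized_ref.data.len());
}
```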
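Operators and Overloading
Rust allows a limited form of operator overloading: certain operators
can be overloaded by implementing the corresponding trait. For example,
the + operator is overloaded with the Add trait:
use std::ops::Add;
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}
impl Add for Point {
    type Output = Point;
    fn add(self, other: Point) -> Point {
        Point { x: self.x + other.x, y: self.y + other.y }
    }
}
fn main() {
    let p1 = Point { x: 1, y: 0 };
    let p2 = Point { x: 2, y: 3 };
    let p3 = p1 + p2;
    println!("{:?}", p3);
}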
In main, we can use + on our two Points, since we’ve implemented
Add<Output=Point> for Point.
There are a number of operators that can be overloaded this way,
and all of their associated traits live in the std::ops module. Check
out its documentation for the full list.
Implementing these traits follows a pattern. Let’s look at Add in
more detail:
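pub trait Add<RHS = Self> {
    type Output;
    fn add(self, rhs: RHS) -> Self::Output;
}
There are three types involved here: the type you impl Add for, RHS,
which defaults to Self, and Output. For an expression let z = x + y,
x has the Self type, y has the RHS type, and z has the Self::Output type.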
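Because RHS is a type parameter, a type can implement Add more than once, with different right-hand sides. Here is a sketch that reuses the Point type from above and adds an extra, hypothetical impl that adds an i32 to both coordinates:

```rust
use std::ops::Add;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// RHS defaults to Self: Point + Point.
impl Add for Point {
    type Output = Point;
    fn add(self, other: Point) -> Point {
        Point { x: self.x + other.x, y: self.y + other.y }
    }
}

// RHS = i32: Point + i32 shifts both coordinates (hypothetical behavior).
impl Add<i32> for Point {
    type Output = Point;
    fn add(self, d: i32) -> Point {
        Point { x: self.x + d, y: self.y + d }
    }
}

fn main() {
    let p = Point { x: 1, y: 0 } + Point { x: 2, y: 3 };
    println!("{:?} {:?}", p, p + 10i32);
}
```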
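Operator traits can also be used as bounds in generic code. Here,
Square<T> can compute its area for any T that can be multiplied and
copied:
use std::ops::Mul;
trait HasArea<T> {
    fn area(&self) -> T;
}
struct Square<T> {
    x: T,
    y: T,
    side: T,
}
impl<T> HasArea<T> for Square<T>
        where T: Mul<Output = T> + Copy {
    fn area(&self) -> T {
        self.side * self.side
    }
}
fn main() {
    let s = Square {
        x: 0.0f64,
        y: 0.0f64,
        side: 12.0f64,
    };
    println!("Area of s: {}", s.area());
}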
Deref coercions
The standard library provides a special trait, Deref. It’s normally used
to overload *, the dereference operator:
use std::ops::Deref;
struct DerefExample<T> {
    value: T,
}
impl<T> Deref for DerefExample<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}
fn main() {
    let x = DerefExample { value: 'a' };
    assert_eq!('a', *x);
}
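Deref is also used for ‘deref coercions’: if a type U implements
Deref<Target = T>, then a &U is automatically coerced to a &T where
one is expected. String implements Deref<Target = str>, so this works:
fn foo(s: &str) {
    // Borrow a string for a second.
}
let owned = "Hello".to_string();
foo(&owned);
Rc<T> implements Deref<Target = T>, so the same call also works through
an Rc:
use std::rc::Rc;
fn foo(s: &str) {
    // Borrow a string for a second.
}
let owned = "Hello".to_string();
let counted = Rc::new(owned);
foo(&counted);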
All we’ve done is wrap our String in an Rc<T>. But we can now pass
the Rc<String> around anywhere we’d have a String. The signature of
foo didn’t change, but works just as well with either type. This example
has two conversions: &Rc<String> to &String and then &String to
&str. Rust will do this as many times as possible until the types
match.
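Another very common implementation provided by the standard
library is for Vec<T>, which implements Deref<Target = [T]>. So a
&Vec<T> coerces to a &[T]:
fn foo(s: &[i32]) {
    // Borrow a slice for a second.
}
let owned = vec![1, 2, 3];
foo(&owned);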
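Deref coercions chain, so one &-expression can pass through several Deref impls. Here is a sketch with hypothetical len_of, sum_of, and lengths helpers, covering String, Rc<String>, Box<String>, and Vec<i32>:

```rust
use std::rc::Rc;

fn len_of(s: &str) -> usize {
    s.len()
}

fn sum_of(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn lengths() -> (usize, usize, usize, i32) {
    let owned = String::from("hello");
    let counted = Rc::new(String::from("hi"));
    let boxed: Box<String> = Box::new(String::from("hey!"));
    let v = vec![1, 2, 3];
    (
        len_of(&owned),   // &String -> &str
        len_of(&counted), // &Rc<String> -> &String -> &str
        len_of(&boxed),   // &Box<String> -> &String -> &str
        sum_of(&v),       // &Vec<i32> -> &[i32]
    )
}

fn main() {
    println!("{:?}", lengths());
}
```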
Deref also kicks in when calling a method:
struct Foo;
impl Foo {
fn foo(&self) { println!("Foo"); }
}
let f = &&Foo;
f.foo();
Even though f is a &&Foo and foo takes &self, this works. That’s
because these things are the same:
f.foo();
(&f).foo();
(&&f).foo();
(&&&&&&&&f).foo();
Macros
By now you’ve learned about many of the tools Rust provides for ab-
stracting and reusing code. These units of code reuse have a rich se-
mantic structure. For example, functions have a type signature, type
parameters have trait bounds, and overloaded functions must belong
to a particular trait.
This structure means that Rust’s core abstractions have powerful
compile-time correctness checking. But this comes at the price of re-
duced flexibility. If you visually identify a pattern of repeated code,
you may find it’s difficult or cumbersome to express that pattern as a
generic function, a trait, or anything else within Rust’s semantics.
Macros allow us to abstract at a syntactic level. A macro invocation
is shorthand for an “expanded” syntactic form. This expansion happens
early in compilation, before any static checking. As a result, macros
can capture many patterns of code reuse that Rust’s core abstractions
cannot.
The drawback is that macro-based code can be harder to under-
stand, because fewer of the built-in rules apply. Like an ordinary func-
tion, a well-behaved macro can be used without understanding its im-
plementation. However, it can be difficult to design a well-behaved
macro! Additionally, compiler errors in macro code are harder to in-
terpret, because they describe problems in the expanded code, not the
source-level form that developers use.
These drawbacks make macros something of a “feature of last re-
sort”. That’s not to say that macros are bad; they are part of Rust be-
cause sometimes they’re needed for truly concise, well-abstracted code.
Just keep this tradeoff in mind.
Defining a macro
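You may have seen the vec! macro, used to initialize a vector with
any number of elements.
let x: Vec<u32> = vec![1, 2, 3];
This can’t be an ordinary function, because it takes any number of
arguments. But we can imagine it as syntactic shorthand for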
let x: Vec<u32> = {
let mut temp_vec = Vec::new();
temp_vec.push(1);
temp_vec.push(2);
temp_vec.push(3);
temp_vec
};
# assert_eq!(x, [1, 2, 3]);
We can implement this shorthand using a macro:[1]
macro_rules! vec {
( $( $x:expr ),* ) => {
{
let mut temp_vec = Vec::new();
$(
temp_vec.push($x);
)*
temp_vec
}
};
}
# fn main() {
# assert_eq!(vec![1,2,3], [1, 2, 3]);
# }
This says we’re defining a macro named vec, much as fn vec would
define a function named vec. In prose, we informally write a macro’s
name with an exclamation point, e.g. vec!. The exclamation point is
part of the invocation syntax and serves to distinguish a macro from
an ordinary function.
[1] The actual definition of vec! in libcollections differs from the one presented here.
Matching
The macro is defined through a series of rules, which are pattern-
matching cases. Above, we had
( $( $x:expr ),* ) => { ... };
This is like a match expression arm, but the matching happens on
Rust syntax trees, at compile time. The semicolon is optional on the
last (here, only) case. The “pattern” on the left-hand side of => is
known as a ‘matcher’. These have their own little grammar within the
language.
The matcher $x:expr will match any Rust expression, binding that
syntax tree to the ‘metavariable’ $x. The identifier expr is a ‘fragment
specifier’; the full possibilities are enumerated later in this chapter.
Surrounding the matcher with $(...),* will match zero or more ex-
pressions, separated by commas.
Aside from the special matcher syntax, any Rust tokens that appear
in a matcher must match exactly. For example,
macro_rules! foo {
(x => $e:expr) => (println!("mode X: {}", $e));
(y => $e:expr) => (println!("mode Y: {}", $e));
}
fn main() {
foo!(y => 3);
}
will print
mode Y: 3
With
foo!(z => 3);
we get the compiler error
error: no rules expected the token `z`
Expansion
The right-hand side of a macro rule is ordinary Rust syntax, for the
most part. But we can splice in bits of syntax captured by the matcher.
From the original example:
$(
temp_vec.push($x);
)*
Each matched expression $x will produce a single push statement in the
macro expansion. The repetition in the expansion proceeds in “lock-
step” with repetition in the matcher (more on this in a moment).
Because $x was already declared as matching an expression, we
don’t repeat :expr on the right-hand side. Also, we don’t include a
separating comma as part of the repetition operator. Instead, we have
a terminating semicolon within the repeated block.
Another detail: the vec! macro has two pairs of braces on the
right-hand side. They are often combined like so:
macro_rules! foo {
() => {{
...
}}
}
The outer braces are part of the syntax of macro_rules!. In fact, you
can use () or [] instead. They simply delimit the right-hand side as a
whole.
The inner braces are part of the expanded syntax. Remember, the
vec! macro is used in an expression context. To write an expression
with multiple statements, including let-bindings, we use a block. If
your macro expands to a single expression, you don’t need this extra
layer of braces.
Note that we never declared that the macro produces an expression.
In fact, this is not determined until we use the macro as an expression.
With care, you can write a macro whose expansion works in several
contexts. For example, shorthand for a data type could be valid as
either an expression or a pattern.
Repetition
The repetition operator follows two principal rules:
1. $(...)* walks through one “layer” of repetitions, for all of the
$names it contains, in lockstep, and
2. each $name must be under at least as many $(...)*s as it was
matched against. If it is under more, it’ll be duplicated, as ap-
propriate.
macro_rules! o_O {
(
$(
$x:expr; [ $( $y:expr ),* ]
);*
) => {
&[ $($( $x + $y ),*),* ]
}
}
fn main() {
    let a: &[i32]
        = o_O!(10; [1, 2, 3];
               20; [4, 5, 6]);
    assert_eq!(a, [11, 12, 13, 24, 25, 26]);
}
That’s most of the matcher syntax. These examples use $(...)*, which
is a “zero or more” match. Alternatively you can write $(...)+ for a
“one or more” match. Both forms optionally include a separator, which
can be any token except a delimiter or one of the repetition operators
(*, +, and ?).
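Here is a sketch of a “one or more” repetition with a comma separator. sum is a hypothetical macro. The matcher requires at least one expression, and the transcriber emits + $x once per match (with no separator) after a leading 0:

```rust
// Hypothetical macro: adds one or more expressions.
macro_rules! sum {
    ($($x:expr),+) => {
        0 $(+ $x)+
    };
}

fn main() {
    println!("{}", sum!(1, 2, 3));
}
```

Calling sum!() with no arguments does not match the rule, so it fails to compile.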
This system is based on “Macro-by-Example”.
Hygiene
Some languages implement macros using simple text substitution, which
leads to various problems. For example, this C program prints 13 in-
stead of the expected 25.
#include <stdio.h>
#define FIVE_TIMES(x) 5 * x
int main() {
    printf("%d\n", FIVE_TIMES(2 + 3));
    return 0;
}
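After textual substitution, FIVE_TIMES(2 + 3) becomes 5 * 2 + 3, and
multiplication has higher precedence than addition. If you’ve used C
macros a lot, you probably know the standard idioms for avoiding this
problem, as well as five or six others. In Rust, we don’t have to worry
about it.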
macro_rules! five_times {
($x:expr) => (5 * $x);
}
fn main() {
assert_eq!(25, five_times!(2 + 3));
}
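Another common problem in macro systems is ‘variable capture’. Here’s
a C macro that uses a block with multiple statements:
#define LOG(msg) do { \
    int state = get_log_state(); \
    if (state > 0) { \
        printf("log(%d): %s\n", state, msg); \
    } \
} while (0)
Here’s a simple use case that goes terribly wrong:
const char *state = "reticulating splines";
LOG(state);
This expands to
const char *state = "reticulating splines";
do {
    int state = get_log_state();
    if (state > 0) {
        printf("log(%d): %s\n", state, state);
    }
} while (0);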
The second variable named state shadows the first one. This is a
problem because the print statement should refer to both of them.
The equivalent Rust macro has the desired behavior.
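macro_rules! log {
    ($msg:expr) => {{
        let state: i32 = get_log_state();
        if state > 0 { println!("log({}): {}", state, $msg); }
    }};
}
fn get_log_state() -> i32 { 3 } // stand-in for a real lookup
fn main() {
    let state: &str = "reticulating splines";
    log!(state);
}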
This works because Rust has a hygienic macro system. Each macro
expansion happens in a distinct ‘syntax context’, and each variable is
tagged with the syntax context where it was introduced. It’s as though
the variable state inside main is painted a different “color” from the
variable state inside the macro, and therefore they don’t conflict.
This also restricts the ability of macros to introduce new bindings
at the invocation site. Code such as the following will not work:
macro_rules! foo {
() => (let x = 3;);
}
fn main() {
foo!();
println!("{}", x);
}
Instead you need to pass the variable name into the invocation, so that
it’s tagged with the right syntax context.
macro_rules! foo {
($v:ident) => (let $v = 3;);
}
fn main() {
foo!(x);
println!("{}", x);
}
This holds for let bindings and loop labels, but not for items. So the
following code does compile:
macro_rules! foo {
() => (fn x() { });
}
fn main() {
foo!();
x();
}
Recursive macros
A macro’s expansion can include more macro invocations, including
invocations of the very same macro being expanded. These recursive
macros are useful for processing tree-structured input, as illustrated by
this (simplistic) HTML shorthand:
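#![allow(unused_must_use)]
macro_rules! write_html {
    ($w:expr, ) => (());
    ($w:expr, $e:tt) => (write!($w, "{}", $e));
    ($w:expr, $tag:ident [ $($inner:tt)* ] $($rest:tt)*) => {{
        write!($w, "<{}>", stringify!($tag));
        write_html!($w, $($inner)*);
        write!($w, "</{}>", stringify!($tag));
        write_html!($w, $($rest)*);
    }};
}
fn main() {
    use std::fmt::Write;
    let mut out = String::new();
    write_html!(&mut out,
        html[
            head[title["Macros guide"]]
            body[h1["Macros are the best!"]]
        ]);
    assert_eq!(out,
        "<html><head><title>Macros guide</title></head>\
         <body><h1>Macros are the best!</h1></body></html>");
}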
Syntactic requirements
Even when Rust code contains un-expanded macros, it can be parsed
as a full syntax tree. This property can be very useful for editors and
other tools that process code. It also has a few consequences for the
design of Rust’s macro system.
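One consequence is that Rust must determine, when it parses a
macro invocation, whether the macro stands in for
• zero or more items,
• zero or more methods,
• an expression,
• a statement, or
• a pattern.
A macro invocation within a block could stand for some items, or for
an expression / statement. Rust uses a simple rule to resolve this
ambiguity. A macro invocation that stands for items must be either
• delimited by curly braces, e.g. foo! { ... }, or
• terminated by a semicolon, e.g. foo!(...);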
There are additional rules regarding the next token after a metavari-
able:
• expr and stmt variables may only be followed by one of: => , ;
• ty and path variables may only be followed by one of: => , = |
; : > [ { as where
• pat variables may only be followed by one of: => , = | if in
• Other variables may be followed by any token.
These rules provide some flexibility for Rust’s syntax to evolve without
breaking existing macros.
The macro system does not deal with parse ambiguity at all. For
example, the grammar $($i:ident)* $e:expr will always fail to parse,
because the parser would be forced to choose between parsing $i and
parsing $e. Changing the invocation syntax to put a distinctive token
in front can solve the problem. In this case, you can write $(I $i:ident)* E $e:expr.
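Here is a sketch of the disambiguated grammar. names_then_value is a hypothetical macro that collects the identifiers as strings and evaluates the trailing expression. The literal tokens I and E tell the parser which fragment comes next:

```rust
// Hypothetical macro: `I name` repeated, then `E expr`.
macro_rules! names_then_value {
    ($(I $i:ident)* E $e:expr) => {
        (vec![$(stringify!($i)),*], $e)
    };
}

fn main() {
    let (names, value): (Vec<&str>, i32) = names_then_value!(I a I b E 1 + 2);
    println!("{:?} {}", names, value);
}
```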
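Macros defined with macro_rules! are scoped textually: a macro can be
used only after its definition, including inside modules declared after
it. If a module has the #[macro_use] attribute, its macros also remain
visible in its parent module after the child’s mod item. So for the
client module to use the macros defined in the macros module, macros
must come first:
#[macro_use]
mod macros;
mod client;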
The opposite order would result in a compilation failure:
mod client;
#[macro_use]
mod macros;
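Here is how textual scoping plays out across several modules:
macro_rules! m1 { () => (()) }
mod foo {
    // Visible here: `m1`.
    #[macro_export]
    macro_rules! m2 { () => (()) }
    // Visible here: `m1`, `m2`.
}
// Visible here: `m1`.
macro_rules! m3 { () => (()) }
#[macro_use]
mod bar {
    // Visible here: `m1`, `m3`.
    macro_rules! m4 { () => (()) }
}
// Visible here: `m1`, `m3`, `m4`.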
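A further difficulty occurs when a macro is used in multiple crates. Say
that a library crate named mylib defines a public increment function
and exports two macros that call it:
pub fn increment(x: u32) -> u32 {
    x + 1
}
#[macro_export]
macro_rules! inc_a {
    ($x:expr) => ( crate::increment($x) )
}
#[macro_export]
macro_rules! inc_b {
    ($x:expr) => ( ::mylib::increment($x) )
}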
inc_a only works within mylib, while inc_b only works outside the
library. Furthermore, inc_b will break if the user imports mylib under
another name.
Rust does not (yet) have a hygiene system for crate references, but
it does provide a simple workaround for this problem. The special macro
variable $crate always refers to the root of the crate that defines the
macro. If a macro imported from a crate named foo is used elsewhere,
$crate resolves to foo, just like ::foo. If the macro is used inside
foo itself, $crate resolves to that crate’s own root. This means we
can write
#[macro_export]
macro_rules! inc {
($x:expr) => ( $crate::increment($x) )
}
to define a single macro that works both inside and outside our library.
In either case, the call resolves to mylib’s increment function.
To keep this system simple and correct, #[macro_use] extern
crate ... may only appear at the root of your crate, not inside
mod.
Common macros
Here are some common macros you’ll see in Rust code.
panic!
This macro causes the current thread to panic. You can give it a
message to panic with:
panic!("oh no!");
vec!
The vec! macro is used throughout the book, so you’ve probably seen
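it already. It creates Vec<T>s with ease:
let v = vec![1, 2, 3, 4, 5];
It also lets you make vectors with repeating values. For example, a
hundred zeroes:
let v = vec![0; 100];
assert! and assert_eq!
These two macros are used in tests. assert! takes a boolean.
assert_eq! takes two values and checks them for equality. true passes,
false panic!s. Like this:
// A-ok!
assert!(true);
assert_eq!(5, 3 + 2);
// Nope :(
assert!(5 < 3);
assert_eq!(5, 3);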
try!
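try! is used for error handling. It takes something that can return a
Result<T, E>, and gives T if it’s an Ok(T), and returns with the Err(E)
if it’s that. In current Rust, try! is deprecated (try became a reserved
keyword in the 2018 edition), and the ? operator does the same job:
use std::fs::File;
fn foo() -> std::io::Result<()> {
    let f = File::create("foo.txt")?;
    Ok(())
}
This is cleaner than doing this:
use std::fs::File;
fn foo() -> std::io::Result<()> {
    let f = File::create("foo.txt");
    let f = match f {
        Ok(t) => t,
        Err(e) => return Err(e),
    };
    Ok(())
}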
unreachable!
This macro is used when you think some code should never execute:
if false {
unreachable!();
}
Sometimes, the compiler may make you have a different branch that
you know will never, ever run. In these cases, use this macro, so that
if you end up wrong, you’ll get a panic! about it.
let x: Option<i32> = None;
match x {
Some(_) => unreachable!(),
None => println!("I know x is None!"),
}
unimplemented!
The unimplemented! macro can be used when you’re trying to get your
functions to typecheck, and don’t want to worry about writing out the
body of the function. One example of this situation is implementing
a trait with multiple required methods, where you want to tackle one
at a time. Define the others as unimplemented! until you’re ready to
write them.
Raw Pointers
Rust has a number of different smart pointer types in its standard
library, but there are two types that are extra-special. Much of Rust’s
safety comes from compile-time checks, but raw pointers don’t have
such guarantees, and are unsafe to use.
*const T and *mut T are called ‘raw pointers’ in Rust. Sometimes,
when writing certain kinds of libraries, you’ll need to get around Rust’s
safety guarantees for some reason. In this case, you can use raw pointers
to implement your library, while exposing a safe interface for your users.
For example, raw pointers are allowed to alias, which lets them be used
to write shared-ownership types, and even thread-safe shared memory
types (the Rc<T> and Arc<T> types are both implemented entirely in
Rust).
Here are some things to remember about raw pointers that are
different from other pointer types. They:
• are not guaranteed to point to valid memory and are not even
guaranteed to be non-NULL (unlike both Box and &);
• do not have any automatic clean-up, unlike Box, and so require
manual resource management;
• are plain-old-data, that is, they don’t move ownership, again un-
like Box, hence the Rust compiler cannot protect against bugs like
use-after-free;
• lack any form of lifetimes, unlike &, and so the compiler cannot
reason about dangling pointers; and
• have no guarantees about aliasing or mutability other than mu-
tation not being allowed directly through a *const T.
Basics
Creating a raw pointer is perfectly safe:
let x = 5;
let raw = &x as *const i32;
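However, dereferencing one outside of unsafe code won’t compile:
let x = 5;
let raw = &x as *const i32;
println!("raw points at {}", *raw);
The compiler reports that dereferencing a raw pointer requires an unsafe
function or block (error E0133).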
When you dereference a raw pointer, you’re taking responsibility that
it’s not pointing somewhere that would be incorrect. As such, you need
unsafe:
let x = 5;
let raw = &x as *const i32;
let points_at = unsafe { *raw };
println!("raw points at {}", points_at);
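Because a raw pointer may be null, code that dereferences one usually checks first. Here is a sketch with a hypothetical read_or_default function. It is declared unsafe because callers must still promise that any non-null pointer points at a live i32:

```rust
use std::ptr;

/// Hypothetical helper.
///
/// # Safety
/// `p` must be null or point to a live, properly aligned `i32`.
unsafe fn read_or_default(p: *const i32, default: i32) -> i32 {
    if p.is_null() {
        default
    } else {
        // Dereferencing is the part that needs `unsafe`.
        unsafe { *p }
    }
}

fn main() {
    let x = 5;
    let raw = &x as *const i32;
    let (a, b) = unsafe { (read_or_default(raw, -1), read_or_default(ptr::null(), -1)) };
    println!("{} {}", a, b);
}
```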
FFI
Raw pointers are useful for FFI: Rust’s *const T and *mut T are similar
to C’s const T* and T*, respectively. For more about this use, consult
the FFI chapter.
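At runtime, a raw pointer and a reference pointing to the same data have
an identical representation. A &T coerces implicitly to a *const T, and
a &mut T to a *mut T; an explicit as cast works too. Going the other way,
from a raw pointer to a reference, requires dereferencing the pointer
inside unsafe and borrowing the result:
// Explicit cast:
let i: u32 = 1;
let p_imm: *const u32 = &i as *const u32;
// Implicit coercion:
let mut m: u32 = 2;
let p_mut: *mut u32 = &mut m;
unsafe {
    let ref_imm: &u32 = &*p_imm;
    let ref_mut: &mut u32 = &mut *p_mut;
}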
Unsafe
Rust’s main draw is its powerful static guarantees about behavior. But
safety checks are conservative by nature: there are some programs that
are actually safe, but the compiler is not able to verify this is true. To
write these kinds of programs, we need to tell the compiler to relax its
restrictions a bit. For this, Rust has a keyword, unsafe. Code using
unsafe has fewer restrictions than normal code does.
Let’s go over the syntax, and then we’ll talk semantics. unsafe is
used in four contexts. The first one is to mark a function as unsafe:
unsafe fn danger_will_robinson() {
// Scary stuff...
}
For example, foreign functions, which are declared in an extern block and called through FFI, are treated as unsafe to call.
The second use of unsafe is an unsafe block:
unsafe {
// Scary stuff...
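}
The third is for unsafe traits:
unsafe trait Scary { }
And the fourth is for implementing one of those traits:
unsafe impl Scary for i32 {}
Safe code is not necessarily bug-free code. For example, the following
are not considered unsafe in Rust:
• Deadlocks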
• Leaks of memory or other resources
• Exiting without calling destructors
• Integer overflow
Rust cannot prevent all kinds of software problems. Buggy code can
and will be written in Rust. These things aren’t great, but they don’t
qualify as unsafe specifically.
In addition, the following are all undefined behaviors in Rust, and
must be avoided, even when writing unsafe code:
• Data races
• Dereferencing a NULL/dangling raw pointer
• Reads of undef (uninitialized) memory
• Breaking the pointer aliasing rules with raw pointers.
• &mut T and &T follow LLVM’s scoped noalias model, except if the
&T contains an UnsafeCell<U>. Unsafe code must not violate
these aliasing guarantees.
• Mutating an immutable value/reference without UnsafeCell<U>
• Invoking undefined behavior via compiler intrinsics:
– Indexing outside of the bounds of an object with std::ptr::offset
(offset intrinsic), with the exception of one byte past the end
which is permitted.
Unsafe Superpowers
In both unsafe functions and unsafe blocks, Rust will let you do three
things that you normally cannot do. Here they are:
1. Access or update a static mutable variable.
2. Dereference a raw pointer.
3. Call unsafe functions. This is the most powerful ability.
That’s it. It’s important that unsafe does not, for example, ‘turn off
the borrow checker’. Adding unsafe to some random Rust code doesn’t
change its semantics; it won’t start accepting code the compiler would
otherwise reject. But it will let you write things that do break some
of the rules.
You will also encounter the unsafe keyword when writing bindings
to foreign (non-Rust) interfaces. You’re encouraged to write a safe,
native Rust interface around the methods provided by the library.
Let’s go over the basic three abilities listed, in order.
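Before going through the abilities one at a time, here is a compact sketch that uses all three. COUNTER, add_one, and demo are hypothetical names:

```rust
// 1. A mutable static (hypothetical).
static mut COUNTER: u32 = 0;

/// 3. A hypothetical unsafe function.
///
/// # Safety
/// `p` must point to a live, writable `u32`.
unsafe fn add_one(p: *mut u32) {
    // 2. Dereferencing a raw pointer.
    unsafe { *p += 1 };
}

fn demo() -> (u32, u32) {
    let mut local: u32 = 41;
    let p: *mut u32 = &mut local;
    unsafe {
        COUNTER = 10; // update a static mut
        COUNTER += 1;
        add_one(p); // call an unsafe fn
        let c = COUNTER; // read the static mut by value
        (local, c)
    }
}

fn main() {
    println!("{:?}", demo());
}
```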
Effective Rust
Chapter 1
Effective Rust
So you’ve learned how to write some Rust code. But there’s a difference
between writing any Rust code and writing good Rust code.
This chapter consists of relatively independent tutorials which show
you how to take your Rust to the next level. Common patterns and
standard library features will be introduced. Read these sections in
any order of your choosing.
Memory management
The stack and the heap are two terms you’ll often hear in discussions
of memory management. They are abstractions that help you determine
when to allocate and deallocate memory.
Here’s a high-level comparison:
The stack is very fast, and is where memory is allocated in Rust by
default. But the allocation is local to a function call, and is limited in
size. The heap, on the other hand, is slower, and is explicitly allocated
by your program. But it’s effectively unlimited in size, and is globally
accessible. Note this meaning of heap, which allocates arbitrary-sized
blocks of memory in arbitrary order, is quite different from the heap
data structure.
The Stack
Let’s talk about this Rust program:
fn main() {
let x = 42;
}
This program has one variable binding, x. This memory needs to be
allocated from somewhere. Rust ‘stack allocates’ by default, which
means that basic values ‘go on the stack’. What does that mean?
Well, when a function gets called, some memory gets allocated for
all of its local variables and some other information. This is called
a ‘stack frame’, and for the purpose of this tutorial, we’re going to
ignore the extra information and only consider the local variables we’re
allocating. So in this case, when main() is run, we’ll allocate a single
32-bit integer for our stack frame. This is automatically handled for
you, as you can see; we didn’t have to write any special Rust code or
anything.
When the function exits, its stack frame gets deallocated. This
happens automatically as well.
That’s all there is for this simple program. The key thing to under-
stand here is that stack allocation is very, very fast. Since we know all
the local variables we have ahead of time, we can grab the memory all
at once. And since we’ll throw them all away at the same time as well,
we can get rid of it very fast too.
The downside is that we can’t keep values around if we need them
for longer than a single function. We also haven’t talked about what the
word ‘stack’ means. To do that, we need a slightly more complicated
example:
fn foo() {
let y = 5;
let z = 100;
}
fn main() {
let x = 42;
foo();
}
This program has three variables total: two in foo(), one in main().
Just as before, when main() is called, a single integer is allocated
for its stack frame. But before we can show what happens when foo()
is called, we need to visualize what’s going on with memory. Your
operating system presents a view of memory to your program that’s
pretty simple: a huge list of addresses, from 0 to a large number,
representing how much RAM your computer has. For example, if you have
a gigabyte of RAM, your addresses go from 0 to 1,073,741,823. That
number comes from 2^30, the number of bytes in a gigabyte.[1]
[1] ‘Gigabyte’ can mean two things: 10^9, or 2^30. The IEC standard resolved this
by stating that ‘gigabyte’ is 10^9, and ‘gibibyte’ is 2^30. However, very few people use
this terminology, and most rely on context to differentiate. We follow in that tradition
here.
This memory is kind of like a giant array: addresses start at zero and go up to
the final number. So here’s a diagram of our first stack frame:
Address Name Value
0 x 42
When foo() is called, a new stack frame is allocated:
Address Name Value
2 z 100
1 y 5
0 x 42
Because 0 was taken by the first frame, 1 and 2 are used for foo()’s stack frame.
It grows upward, the more functions we call. Notice that we are not taking into
account the size of each variable (for example, a 32 bit variable would use the
memory addresses from 0 to 3, or 4 bytes).
There are some important things we have to take note of here. The
numbers 0, 1, and 2 are all solely for illustrative purposes, and bear no
relationship to the address values the computer will use in reality. In
particular, the series of addresses are in reality going to be separated by
some number of bytes that separate each address, and that separation
may even exceed the size of the value being stored.
After foo() is over, its frame is deallocated:
Address Name Value
0 x 42
And then, after main(), even this last value goes away. Easy!
It’s called a ‘stack’ because it works like a stack of dinner plates:
the first plate you put down is the last plate to pick back up. Stacks
are sometimes called ‘last in, first out queues’ for this reason, as the
last value you put on the stack is the first one you retrieve from it.
Let’s try a three-deep example:
fn italic() {
let i = 6;
}
fn bold() {
let a = 5;
let b = 100;
let c = 1;
italic();
}
fn main() {
let x = 42;
bold();
}
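When main() is called, x goes onto the stack. Calling bold() puts a, b,
and c on top of it, and calling italic() puts i on top of those. As each
function returns, its frame comes off the top: first italic()’s, then
bold()’s, and finally main()’s. And then we’re done. Getting the hang
of it? It’s like piling up dishes: you add to the top, you take away
from the top.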
The Heap
Now, this works pretty well, but not everything can work like this.
Sometimes, you need to pass some memory between different functions,
or keep it alive for longer than a single function’s execution. For this,
we can use the heap.
In Rust, you can allocate memory on the heap with the Box<T>
type. Here’s an example:
fn main() {
let x = Box::new(5);
let y = 42;
}
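When main() returns, x is dropped, and the heap memory it points to is freed. A box can also be made to live longer by transferring its ownership elsewhere, which is sometimes called ‘moving out of the box’. More complex examples will be covered later.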
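Here is a sketch of a heap value outliving the function that created it, using a hypothetical make_boxed helper. The Box is moved out to the caller, so its heap allocation is freed only when the caller's binding goes out of scope:

```rust
// Hypothetical helper: the `Box` is created here, but ownership moves to
// the caller, so the heap value outlives this function's stack frame.
fn make_boxed(n: i32) -> Box<i32> {
    let b = Box::new(n);
    b
}

fn main() {
    let x = make_boxed(5);
    let y = 42;
    println!("{} {}", x, y);
} // `x` goes out of scope here, and its heap allocation is freed.
```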
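Arguments and borrowing
Let’s look at what happens when a reference is passed to a function:
fn foo(i: &i32) {
    let z = 42;
}
fn main() {
    let x = 5;
    let y = &x;
    foo(y);
}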
When we enter main(), memory looks like this:
Address Name Value
1 y →0
0 x 5
x is a plain old 5, and y is a reference to x. So its value is the
memory location that x lives at, which in this case is 0.
What about when we call foo(), passing y as an argument?
Address Name Value
3 z 42
2 i →0
1 y →0
0 x 5
Stack frames aren’t only for local bindings, they’re for arguments
too. So in this case, we need to have both i, our argument, and z, our
local variable binding. i is a copy of the argument, y. Since y’s value
is 0, so is i’s.
This is one reason why borrowing a variable doesn’t deallocate any
memory: the value of a reference is a pointer to a memory location. If
we got rid of the underlying memory, things wouldn’t work very well.
A complex example
Okay, let’s go through this complex program step-by-step:
fn foo(x: &i32) {
    let y = 10;
    let z = &y;
    baz(z);
    bar(x, z);
}
fn bar(a: &i32, b: &i32) {
    let c = 5;
    let d = Box::new(5);
    let e = &d;
    baz(e);
}
fn baz(f: &i32) {
    let g = 100;
}
fn main() {
    let h = 3;
    let i = Box::new(20);
    let j = &h;
    foo(j);
}
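First, we call main(), which allocates h, the box i, and the reference j.
The box itself lives on the stack, but the value 20 it points to is
allocated on the heap.
Next, main() calls foo(). Space gets allocated for x, y, and z. The
argument x has the same value as j, since that’s what we passed in.
It’s a pointer to the 0 address, since j points at h.
Next, foo() calls baz(), passing z, which allocates space for f and g.
After baz() is over, we get rid of f and g. Then foo() calls bar() with
x and z, and bar() in turn calls baz() again.
With this, we’re at our deepest point! Whew! Congrats for following
along this far.
After baz() is over, we get rid of f and g. Then bar() and foo() return
in turn, and their frames are deallocated.
And then, finally, main(), which cleans the rest up. When i is
Dropped, it will clean up the last of the heap too.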
Which to use?
So if the stack is faster and easier to manage, why do we need the heap?
A big reason is that stack allocation alone means you only have ‘Last
In First Out (LIFO)’ semantics for reclaiming storage. Heap allocation
is strictly more general, allowing storage to be taken from and returned
to the pool in arbitrary order, but at a complexity cost.
Generally, you should prefer stack allocation, and so, Rust stack-
allocates by default. The LIFO model of the stack is simpler, at a
fundamental level. This has two big impacts: runtime efficiency and
semantic impact.
Runtime Efficiency
Managing the memory for the stack is trivial: The machine increments
or decrements a single value, the so-called “stack pointer”. Managing
memory for the heap is non-trivial: heap-allocated memory is freed
at arbitrary points, and each block of heap-allocated memory can be
of arbitrary size, so the memory manager must generally work much
harder to identify memory for reuse.
If you’d like to dive into this topic in greater detail, this paper is a
great introduction.
Semantic impact
Stack-allocation impacts the Rust language itself, and thus the devel-
oper’s mental model. The LIFO semantics is what drives how the Rust
language handles automatic memory management. Even the deallo-
cation of a uniquely-owned heap-allocated box can be driven by the
stack-based LIFO semantics, as discussed throughout this chapter. The
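flexibility (i.e. expressiveness) of non-LIFO semantics means that in
general the compiler cannot automatically infer at compile-time where
memory should be freed; it has to rely on dynamic protocols to drive
deallocation, such as the reference counting used by Rc<T> and Arc<T>.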
Testing
Program testing can be a very effective way to show the
presence of bugs, but it is hopelessly inadequate for showing
their absence.
Edsger W. Dijkstra, “The Humble Programmer” (1972)
Let’s talk about how to test Rust code. What we will not be talking
about is the right way to test Rust code. There are many schools of
thought regarding the right and wrong way to write tests. All of these
approaches use the same basic tools, and so we’ll show you the syntax
for using them.
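When Cargo creates a new library project, it generates a simple test in src/lib.rs:
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
}
}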
For now, let’s remove the mod bit, and focus on just the function:
#[test]
fn it_works() {
}
Note the #[test]. This attribute indicates that this is a test function.
It currently has no body. That’s good enough to pass! We can run the
tests with cargo test:
$ cargo test
Compiling adder v0.1.0 (file:///home/you/projects/adder)
running 1 test
test it_works ... ok
Doc-tests adder
running 0 tests
Cargo compiled and ran our tests. There are two sets of output here:
one for the test we wrote, and another for documentation tests. We’ll
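talk about those later. For now, note the line test it_works ... ok
above, which shows that our test passed.
A test fails if it panics. With assert!(false) as the body of
it_works, cargo test reports the failure: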
running 1 test
test it_works ... FAILED
failures:
thread 'it_works' panicked at 'assertion failed: false', src/lib.rs:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
failures:
it_works
#[test]
#[should_panic]
fn it_works() {
assert!(false);
}
This test will now succeed if we panic! and fail if we complete. Let’s
try it:
$ cargo test
Compiling adder v0.1.0 (file:///home/you/projects/adder)
running 1 test
test it_works ... ok
Doc-tests adder
running 0 tests
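#[should_panic] also accepts an expected string; the test then passes only if the panic message contains that text, which guards against a test passing because of some unrelated panic. A minimal sketch for src/lib.rs, assuming a hypothetical checked_get function:

```rust
/// Returns the element at `index`, panicking with a descriptive message if
/// `index` is out of range. (`checked_get` is a hypothetical helper.)
pub fn checked_get(values: &[i32], index: usize) -> i32 {
    if index >= values.len() {
        panic!("index {} out of range for length {}", index, values.len());
    }
    values[index]
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn in_range() {
        assert_eq!(checked_get(&[10, 20, 30], 1), 20);
    }

    // Passes only if the function panics *and* the panic message
    // contains the `expected` substring.
    #[test]
    #[should_panic(expected = "out of range")]
    fn out_of_range() {
        checked_get(&[10, 20, 30], 5);
    }
}
```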
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
#[test]
#[ignore]
fn expensive_test() {
// Code that takes an hour to run...
}
Now we run our tests and see that it_works is run, but expensive_test
is not:
$ cargo test
Compiling adder v0.1.0 (file:///home/you/projects/adder)
running 2 tests
test expensive_test ... ignored
test it_works ... ok
Doc-tests adder
running 0 tests
We can run the ignored tests explicitly with cargo test -- --ignored:
$ cargo test -- --ignored
running 1 test
test expensive_test ... ok
Doc-tests adder
running 0 tests
#[cfg(test)]
mod tests {
use super::add_two;
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
}
There are a few changes here. The first is the introduction of a mod
tests with a cfg attribute. The module allows us to group all of our
tests together, and to define helper functions, if needed, that don’t
become part of the rest of our crate. The cfg attribute compiles
our test code only when we’re running the tests. This can save
compile time, and also ensures that our tests are entirely left out of a
normal build.
The second change is the use declaration. Because we’re in an inner
module, we need to bring the tested function into scope. This can be
annoying if you have a large module, and so this is a common use of
globs. Let’s change our src/lib.rs to make use of it:
pub fn add_two(a: i32) -> i32 {
a + 2
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
}
$ cargo test
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling adder v0.1.0 (file:///home/you/projects/adder)
Running target/debug/deps/adder-91b3e234d4ed382a
running 1 test
test tests::it_works ... ok
Doc-tests adder
running 0 tests
It works!
The current convention is to use the tests module to hold your
“unit-style” tests. Anything that tests one small bit of functionality
makes sense to go here. But what about “integration-style” tests instead?
For that, we have the tests directory.
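To write an integration test, we create a tests directory with a file
tests/integration_test.rs containing:
extern crate adder;
#[test]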
fn it_works() {
assert_eq!(4, adder::add_two(2));
}
This looks similar to our previous tests, but slightly different. We now
have an extern crate adder at the top. This is because each test
in the tests directory is an entirely separate crate, and so we need to
import our library. This is also why tests is a suitable place to write
integration-style tests: they use the library like any other consumer of
it would.
Let’s run them:
$ cargo test
Compiling adder v0.1.0 (file:///home/you/projects/adder)
Running target/debug/deps/adder-91b3e234d4ed382a
running 1 test
test tests::it_works ... ok
Running target/debug/integration_test-68064b69521c828a
running 1 test
test it_works ... ok
Doc-tests adder
running 0 tests
Documentation tests
Nothing is better than documentation with examples. Nothing is worse
than examples that don’t actually work, because the code has changed
since the documentation has been written. To this end, Rust supports
automatically running examples in your documentation (note: this
only works in library crates, not binary crates). Here’s a fleshed-out
src/lib.rs with examples:
//! The `adder` crate provides functions that add numbers to other numbers.
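//!
//! # Examples
//!
//! ```
//! assert_eq!(4, adder::add_two(2));
//! ```
/// This function adds two to its argument.
///
/// # Examples
///
/// ```
/// use adder::add_two;
///
/// assert_eq!(4, add_two(2));
/// ```
pub fn add_two(a: i32) -> i32 {
a + 2
}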
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
}
$ cargo test
Compiling adder v0.1.0 (file:///home/you/projects/adder)
Running target/debug/deps/adder-91b3e234d4ed382a
running 1 test
test tests::it_works ... ok
Running target/debug/integration_test-68064b69521c828a
running 1 test
test it_works ... ok
Doc-tests adder
running 2 tests
test add_two_0 ... ok
test _0 ... ok
Test output
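By default Rust’s test library captures output to standard
output/error, e.g. output from println!(), and shows it only for failing
tests. This too can be controlled using the environment or a switch:
cargo test -- --nocapture          # Preserves stdout
RUST_TEST_NOCAPTURE=1 cargo test   # Same as above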
Conditional Compilation
Rust has a special attribute, #[cfg], which allows you to compile code
based on a flag passed to the compiler. It has two forms:
#[cfg(foo)]
#[cfg(bar = "baz")]
They also have some helpers, which can be nested arbitrarily:
#[cfg(any(unix, windows))]
#[cfg(not(foo))]
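If you’re using Cargo, features are declared in the [features] section
of your Cargo.toml:
[features]
# no features by default
default = []
# a feature named "foo" that depends on nothing else
foo = []
When you enable a feature, Cargo passes a flag to rustc:
--cfg feature="${feature_name}"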
The sum of these cfg flags will determine which ones get activated,
and therefore, which code gets compiled. Let’s take this code:
#[cfg(feature = "foo")]
mod foo {
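}
If we compile it with cargo build --features "foo", Cargo passes
--cfg feature="foo" to rustc, and the output will have mod foo in it.
With a plain cargo build, no such flag is passed, and mod foo is left out.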
cfg_attr
You can also set another attribute based on a cfg variable with
cfg_attr:
#[cfg_attr(a, b)]
This will be the same as #[b] if a is set by a cfg attribute, and nothing
otherwise.
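As a small, self-contained sketch, the predicates below are the degenerate forms all() (always true) and any() (always false), chosen only so the example behaves the same on every machine. In real code, a would usually be something like feature = "..." or test. The Point type is illustrative.

```rust
// `all()` with no predicates is always true, so this derive is always applied.
#[cfg_attr(all(), derive(Debug, PartialEq))]
// `any()` with no predicates is always false, so `Clone` is never derived.
#[cfg_attr(any(), derive(Clone))]
struct Point {
    x: i32,
    y: i32,
}
```

Because the first cfg_attr expands to #[derive(Debug, PartialEq)], two Point values can be compared with == and printed with {:?}; calling .clone() on one would not compile.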
cfg!
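The cfg! macro lets you use these kinds of flags elsewhere in your
code, too. Each cfg! invocation is replaced by true or false at compile
time, depending on the configuration settings.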
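A minimal sketch, where the os_family function is illustrative: because cfg! expands to a plain true or false, it can be used in ordinary if expressions. Unlike #[cfg], it removes no code, so every branch must still compile on every platform.

```rust
/// Returns a label for the operating-system family this binary targets.
fn os_family() -> &'static str {
    // Each `cfg!` is a compile-time boolean; all branches are type-checked.
    if cfg!(unix) {
        "unix"
    } else if cfg!(windows) {
        "windows"
    } else {
        "other"
    }
}
```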
Documentation
Documentation is an important part of any software project, and it’s
first-class in Rust. Let’s talk about the tooling Rust gives you to document
your project.
About rustdoc
The Rust distribution includes a tool, rustdoc, that generates documentation.
rustdoc is also used by Cargo through cargo doc.
Documentation can be generated in two ways: from source code,
and from standalone Markdown files.
/// Constructs a new `Rc<T>`.
///
/// ```
/// use std::rc::Rc;
///
/// let five = Rc::new(5);
/// ```
pub fn new(value: T) -> Rc<T> {
// Implementation goes here.
}
This code generates documentation that looks like this. I’ve left the
implementation out, with a regular comment in its place.
The first thing to notice about this annotation is that it uses ///
instead of //. The triple slash indicates a documentation comment.
Documentation comments are written in Markdown.
Rust keeps track of these comments, and uses them when generating
documentation. This is important when documenting things like
enums:
/// The `Option` type. See [the module level documentation](index.html) for more.
enum Option<T> {
/// No value
None,
/// Some value `T`
Some(T),
}
The above works, but this does not:
/// The `Option` type. See [the module level documentation](index.html) for more.
enum Option<T> {
None, /// No value
Some(T), /// Some value `T`
}
You’ll get an error:
hello.rs:4:1: 4:2 error: expected ident, found `}`
hello.rs:4 }
^
This unfortunate error is correct; documentation comments apply to
the thing after them, and there’s nothing after that last comment.
/// Constructs a new `Rc<T>`.
///
/// Other details about constructing `Rc<T>`s, maybe describing complicated
/// semantics, maybe additional options, all kinds of stuff.
///
Our original example had just a summary line, but if we had more
things to say, we could have added more explanation in a new paragraph.
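Special sections
Next are special sections. These are indicated with a header, #.
There are four kinds of headers that are commonly used. They aren’t
special syntax, just convention, for now.
/// # Panics
This section describes the situations in which the function panics.
/// # Errors
For a function that returns a Result<T, E>, this section describes the
conditions under which it returns Err(E).
/// # Safety
For an unsafe function, this section explains which invariants the
caller is responsible for upholding.
The fourth, Examples, shows how to use the item: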
/// # Examples
///
/// ```
/// use std::rc::Rc;
///
/// let five = Rc::new(5);
/// ```
/// # Examples
///
/// Simple `&str` patterns:
///
/// ```
/// let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
/// assert_eq!(v, vec!["Mary", "had", "a", "little", "lamb"]);
/// ```
///
/// More complex patterns with a lambda:
///
/// ```
/// let v: Vec<&str> = "abc1def2ghi".split(|c: char| c.is_numeric()).collect();
/// assert_eq!(v, vec!["abc", "def", "ghi"]);
/// ```
To write some Rust code in a comment, use the triple graves:
/// ```
/// println!("Hello, world");
/// ```
This will add code highlighting. If you are only showing plain text, put
text instead of rust after the triple graves (see below).
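Putting these conventions together, here is a sketch of two documented functions. The names divide and checked_divide are illustrative, and my_crate in the doc examples is a placeholder for your library’s crate name.

```rust
/// Divides `numerator` by `denominator`, rounding toward zero.
///
/// # Panics
///
/// Panics if `denominator` is zero, or if the division overflows
/// (`i32::MIN / -1`).
///
/// # Examples
///
/// ```
/// use my_crate::divide; // `my_crate` is a placeholder crate name
///
/// assert_eq!(divide(7, 2), 3);
/// ```
pub fn divide(numerator: i32, denominator: i32) -> i32 {
    if denominator == 0 {
        panic!("attempted to divide by zero");
    }
    // Integer division rounds toward zero and panics on overflow.
    numerator / denominator
}

/// Divides `numerator` by `denominator`, reporting failure instead of panicking.
///
/// # Errors
///
/// Returns `Err` if `denominator` is zero or if the division overflows.
///
/// # Examples
///
/// ```
/// use my_crate::checked_divide; // `my_crate` is a placeholder crate name
///
/// assert_eq!(checked_divide(7, 2), Ok(3));
/// assert!(checked_divide(1, 0).is_err());
/// ```
pub fn checked_divide(numerator: i32, denominator: i32) -> Result<i32, String> {
    // `checked_div` returns `None` for both failure cases.
    numerator
        .checked_div(denominator)
        .ok_or_else(|| format!("cannot divide {} by {}", numerator, denominator))
}
```

The Panics and Errors sections mirror each other: one function documents when it panics, the other documents when it returns Err instead.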
Documentation as tests
Let’s discuss our sample example documentation:
/// ```
/// println!("Hello, world");
/// ```
You’ll notice that you don’t need a fn main() or anything here. rustdoc
will automatically add a main() wrapper around your code, using
heuristics to attempt to put it in the right place. For example:
/// ```
/// use std::rc::Rc;
///
/// let five = Rc::new(5);
/// ```
This will end up testing:
fn main() {
use std::rc::Rc;
let five = Rc::new(5);
}
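Here’s the full algorithm rustdoc uses to preprocess examples:
• Any leading #![foo] attributes are left intact as crate attributes.
• Some common allow attributes are inserted, including unused_variables,
unused_assignments, unused_mut, unused_attributes, and dead_code.
Small examples often trigger these lints.
• If the example does not contain extern crate, then extern crate
<mycrate>; is inserted (note the lack of #[macro_use]).
• Finally, if the example does not contain fn main, the remainder of the
text is wrapped in fn main() { your_code }.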
Beyond that, you can add lines that start with #: they will be hidden
from the output, but will be used when compiling your code.
You can use this to your advantage. In this case, documentation comments
need to apply to some kind of function, so if I want to show you
just a documentation comment, I need to add a little function definition
below it. At the same time, it’s only there to satisfy the compiler, so
hiding it makes the example clearer. You can use this technique to
explain longer examples in detail, while still preserving the testability
of your documentation.
For example, imagine that we wanted to document this code:
let x = 5;
let y = 6;
println!("{}", x + y);
To keep each code block testable, we want the whole program in each
block, but we don’t want the reader to see every line every time. Here’s
what we put in our source code:
First, we set `x` to five:
```rust
let x = 5;
# let y = 6;
# println!("{}", x + y);
```
Next, we set `y` to six:
```rust
# let x = 5;
let y = 6;
# println!("{}", x + y);
```
Finally, we print the sum of `x` and `y`:
```rust
# let x = 5;
# let y = 6;
println!("{}", x + y);
```
By repeating all parts of the example, you can ensure that your example
still compiles, while only showing the parts that are relevant to that
part of your explanation.
Documenting macros
A documentation example for a macro needs three extra things. First,
we need to add our own extern crate line, so that we can add the
#[macro_use] attribute. Second, we need to add our own main() as well
(for reasons discussed above). Finally, we make judicious use of # to
comment out those two things, so they don’t show up in the output.
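Here is a sketch of such a documented macro. The macro panic_unless and the crate name foo are hypothetical. The lines starting with # inside the example are the hidden extern crate line and main() described above; readers see only the panic_unless! call.

```rust
/// Panic with a given message unless an expression evaluates to true.
///
/// # Examples
///
/// ```
/// # #[macro_use] extern crate foo;
/// # fn main() {
/// panic_unless!(1 + 1 == 2, "Math is broken.");
/// # }
/// ```
#[macro_export]
macro_rules! panic_unless {
    // Everything after the condition is forwarded to `panic!` unchanged.
    ($condition:expr, $($arg:tt)+) => {
        if !$condition {
            panic!($($arg)+);
        }
    };
}
```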
Another case where the use of # is handy is when you want to ignore
error handling. Let’s say you want an example that calls
try!(io::stdin().read_line(&mut input)) directly.
The problem is that on error, try! returns early with a Result<T, E>, and
test functions don’t return anything, so this will give a mismatched types
error.
You can get around this by wrapping the code in a function. This
catches and swallows the Result<T, E> when running tests on the
docs. This pattern appears regularly in the standard library.
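A sketch of that wrapping pattern follows. The hidden # lines inside the doc comment put the example’s code in a function returning io::Result<()>, so the ? operator (the modern replacement for try!) has a Result to return into. Because the hidden fn foo is never called, running the doc test never actually reads from stdin. The first_line function itself is a hypothetical helper, included so the file compiles and can be exercised.

```rust
use std::io::{self, BufRead};

/// Reads the first line from `reader`, without its line ending.
///
/// # Examples
///
/// ```
/// use std::io;
/// # fn foo() -> io::Result<()> {
/// let mut input = String::new();
/// io::stdin().read_line(&mut input)?;
/// # Ok(())
/// # }
/// ```
pub fn first_line<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    // `?` forwards any I/O error to our caller instead of panicking.
    reader.read_line(&mut line)?;
    // Strip a trailing "\n" or "\r\n", if present.
    Ok(line.trim_end_matches(&['\r', '\n'][..]).to_string())
}
```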
/// ```rust,ignore
/// fn foo() {
/// ```
The ignore directive tells Rust to ignore your code. This is almost
never what you want, as it’s the most generic. Instead, consider annotating
it with text if it’s not code, or using #s to get a working example
that only shows the part you care about.
/// ```rust,should_panic
/// assert!(false);
/// ```
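The should_panic annotation tells rustdoc that the code should compile
correctly, but panic when run; the test passes only if it does panic.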
/// ```rust,no_run
/// loop {
/// println!("Hello, world");
/// }
/// ```
The no_run attribute will compile your code, but not run it. This is
important for examples such as “Here’s how to retrieve a web page,”
which you would want to ensure compiles, but might be run in a test
environment that has no network access.
Documenting modules
Rust has another kind of doc comment, //!. This comment doesn’t
document the next item, but the enclosing item. In other words:
mod foo {
//! This is documentation for the `foo` module.
//!
//! # Examples
// ...
}
This is where you’ll see //! used most often: for module documenta-
tion. If you have a module in foo.rs, you’ll often open its code and
see this:
//! A module for using `foo`s.
//!
//! The `foo` module contains a lot of useful functionality.
Crate documentation
Crates can be documented by placing an inner doc comment (//!) at
the beginning of the crate root, aka lib.rs:
//! This is documentation for the `foo` crate.
//!
//! The foo crate is meant to be used for bar.
Other documentation
All of this behavior works in non-Rust source files too. Because comments
are written in Markdown, they’re often .md files.
When you write documentation in Markdown files, you don’t need
to prefix the documentation with comments. For example:
/// # Examples
///
/// ```
/// use std::rc::Rc;
///
/// let five = Rc::new(5);
/// ```
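is written like this in a Markdown file:
# Examples
```
use std::rc::Rc;
let five = Rc::new(5);
```
Markdown files also need a title, given on a line that starts with %:
% The title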
doc attributes
At a deeper level, documentation comments are syntactic sugar for
documentation attributes:
/// this
is equivalent to:
#[doc="this"]
and the inner form
//! this
is equivalent to:
#![doc="this"]
You won’t often see this attribute used for writing documentation, but
it can be useful when changing some options, or when writing a macro.
Re-exports
rustdoc will show the documentation for a public re-export in both
places:
extern crate foo;
pub use foo::bar;
This will create documentation for bar both inside the documentation
for the crate foo, as well as the documentation for your crate. It will
use the same documentation in both places.
This behavior can be suppressed with no_inline:
#[doc(no_inline)]
pub use foo::bar;
Missing documentation
Sometimes you want to make sure that every single public thing in your
project is documented, especially when you are working on a library.
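Rust allows you to generate warnings or errors when an item is
missing documentation. To generate warnings you use warn:
#![warn(missing_docs)]
To generate errors you use deny:
#![deny(missing_docs)]
There are cases where you want to leave an item undocumented on purpose.
allow disables the lint for that item:
#[allow(missing_docs)]
struct Undocumented;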
You might even want to hide items from the documentation completely:
#[doc(hidden)]
struct Hidden;
Controlling HTML
You can control a few aspects of the HTML that rustdoc generates
through the #![doc] version of the attribute:
#![doc(html_logo_url = "https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png",
html_favicon_url = "https://www.rust-lang.org/favicon.ico",
html_root_url = "https://doc.rust-lang.org/")]
This sets a few different options, with a logo, favicon, and a root URL.
You can also configure how rustdoc tests your documentation examples
through the #![doc(test(..))] attribute:
#![doc(test(attr(allow(unused_variables), deny(warnings))))]
This allows unused variables within the examples, but will fail the test
for any other lint warning thrown.
Generation options
rustdoc also contains a few other options on the command line, for
further customization:
Security note
The Markdown in documentation comments is placed without process-
ing into the final webpage. Be careful with literal HTML:
/// <script>alert(document.cookie)</script>
Iterators
Let’s talk about loops.
Remember Rust’s for loop? Here’s an example:
for x in 0..10 {
println!("{}", x);
}
Now that you know more Rust, we can talk in detail about how this
works. Ranges (the 0..10) are ‘iterators’. An iterator is something
that we can call the .next() method on repeatedly, and it gives us a
sequence of things.
A range with two dots like 0..10 is inclusive on the left (so it starts
at 0) and exclusive on the right (so it ends at 9). A mathematician
would write “[0, 10)”.
A for loop over a range works roughly like this loop, which calls next()
by hand:
let mut range = 0..10;
loop {
match range.next() {
Some(x) => {
println!("{}", x);
},
None => { break }
}
}
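The desugared loop can be exercised directly. In this sketch (collect_by_hand is an illustrative name), we record the values instead of printing them, which also confirms that 0..10 stops at 9:

```rust
/// Drains a range by calling `next()` by hand, the way a `for` loop does.
fn collect_by_hand() -> Vec<i32> {
    let mut range = 0..10;
    let mut seen = Vec::new();
    loop {
        match range.next() {
            Some(x) => seen.push(x),
            None => break, // the iterator is exhausted
        }
    }
    seen
}
```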
You’ll sometimes see ranges used to index into a vector:
let nums = vec![1, 2, 3];
for i in 0..nums.len() {
println!("{}", nums[i]);
}
This is strictly worse than using an actual iterator. You can iterate
over vectors directly, so write this:
let nums = vec![1, 2, 3];
for num in &nums {
println!("{}", num);
}
There are two reasons for this. First, this more directly expresses what
we mean. We iterate through the entire vector, rather than iterating
through indexes, and then indexing the vector. Second, this version
is more efficient: the first version will have extra bounds checking because
it used indexing, nums[i]. But since we yield a reference to each
element of the vector in turn with the iterator, there’s no bounds checking
in the second example. This is very common with iterators: we can
ignore unnecessary bounds checks, but still know that we’re safe.
There’s another detail here that’s not 100% clear because of how
println! works. num is actually of type &i32. That is, it’s a reference
to an i32, not an i32 itself. println! handles the dereferencing for
us, so we don’t see it. This code works fine too:
let nums = vec![1, 2, 3];
for num in &nums {
println!("{}", *num);
}
Now we’re explicitly dereferencing num. Why does &nums give us references?
Firstly, because we explicitly asked it to with &. Secondly, if
it gave us the data itself, we would have to be its owner, which would
involve making a copy of the data and giving us the copy. With references,
we’re only borrowing the data, so nothing needs to be copied or moved.
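Both loop styles can be compared side by side. In this sketch (sum_by_index and sum_by_iter are illustrative names), the second function never indexes, and it dereferences the &i32 it receives explicitly, as discussed above.

```rust
/// Sums a slice by indexing: every `nums[i]` is a checked index.
fn sum_by_index(nums: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..nums.len() {
        total += nums[i];
    }
    total
}

/// Sums a slice by iterating over references: no indexing at all.
fn sum_by_iter(nums: &[i32]) -> i32 {
    let mut total = 0;
    for num in nums {
        // `num` has type `&i32`, so we dereference it explicitly.
        total += *num;
    }
    total
}
```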
So, now that we’ve established that ranges are often not what you
want, let’s talk about what you do want instead.
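There are three broad classes of things that are relevant here: iterators,
iterator adaptors, and consumers. Here are some definitions:
• iterators give you a sequence of values.
• iterator adaptors operate on an iterator, producing a new iterator with
a different output sequence.
• consumers operate on an iterator, producing some final set of values.
Let’s talk about consumers first, since you’ve already seen an iterator,
ranges.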
Consumers
A consumer operates on an iterator, returning some kind of value or
values. The most common consumer is collect(). This code doesn’t
quite compile, but it shows the intention:
let one_to_one_hundred = (1..101).collect();
As you can see, we call collect() on our iterator. collect() takes as
many values as the iterator will give it, and returns a collection of the
results. So why won’t this compile? Rust can’t determine what type of
things you want to collect, and so you need to let it know. Here’s the
version that does compile:
let one_to_one_hundred = (1..101).collect::<Vec<i32>>();
If you remember, the ::<> syntax allows us to give a type hint that
tells the compiler we want a vector of integers. You don’t always need
to use the whole type, though. Using a _ will let you provide a partial
hint:
let one_to_one_hundred = (1..101).collect::<Vec<_>>();
This says “Collect into a Vec<T>, please, but infer what the T is for
me.” _ is sometimes called a “type placeholder” for this reason.
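The three ways of telling collect() what to build can be checked against each other. In this sketch (collect_examples is an illustrative name), the annotated binding, the full turbofish, and the partial Vec<_> hint all produce the same vector:

```rust
/// Three equivalent ways to tell `collect()` what to build.
fn collect_examples() -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    // 1. A type annotation on the binding.
    let annotated: Vec<i32> = (1..101).collect();
    // 2. A full turbofish.
    let turbofish = (1..101).collect::<Vec<i32>>();
    // 3. A partial hint: `Vec<_>` lets the compiler infer the element type.
    let partial = (1..101).collect::<Vec<_>>();
    (annotated, turbofish, partial)
}
```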
collect() is the most common consumer, but there are others too.
find() is one:
let greater_than_forty_two = (0..100)
.find(|x| *x > 42);
match greater_than_forty_two {
Some(_) => println!("Found a match!"),
None => println!("No match found :("),
}
find takes a closure, and works on a reference to each element of an
iterator. This closure returns true if the element is the element we’re
looking for, and false otherwise. find returns the first element satisfying
the specified predicate. Because we might not find a matching
element, find returns an Option rather than the element itself.
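Wrapping the example in a function makes the Option visible in the return type. In this sketch, first_greater_than is an illustrative name:

```rust
/// Returns the first number in `0..100` greater than `threshold`, if any.
fn first_greater_than(threshold: i32) -> Option<i32> {
    // `find` hands the closure a reference, hence `*x`.
    (0..100).find(|x| *x > threshold)
}
```

first_greater_than(42) gives Some(43), while first_greater_than(99) gives None, because the range stops at 99.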
Another important consumer is fold. Here’s what it looks like:
let sum = (1..4).fold(0, |sum, x| sum + x);
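fold() takes a starting value (the base) and a closure of the form |accumulator, element| ..., whose result becomes the accumulator for the next element. For (1..4) starting from 0, the accumulator goes 0 + 1 = 1, then 1 + 2 = 3, then 3 + 3 = 6. This sketch (fold_with_trace is an illustrative name) records each intermediate accumulator to make those steps visible:

```rust
/// Sums `1..4` with `fold`, recording the accumulator after each step.
fn fold_with_trace() -> (i32, Vec<i32>) {
    let mut trace = Vec::new();
    let sum = (1..4).fold(0, |acc, x| {
        let next = acc + x;
        trace.push(next); // accumulator after combining with `x`
        next
    });
    (sum, trace)
}
```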
Iterators
As we’ve said before, an iterator is something that we can call the
.next() method on repeatedly, and it gives us a sequence of things.
Because you need to call the method, this means that iterators can be
lazy and not generate all of the values upfront. This code, for example,
does not actually generate the numbers 1-99, instead creating a value
that merely represents the sequence:
let nums = 1..100;
Since we didn’t do anything with the range, it didn’t generate the
sequence. Let’s add the consumer:
let nums = (1..100).collect::<Vec<i32>>();
Now, collect() will require that the range gives it some numbers, and
so it will do the work of generating the sequence.
Ranges are one of two basic iterators that you’ll see. The other is
iter(). iter() can turn a vector into a simple iterator that gives you
each element in turn:
let nums = vec![1, 2, 3];
for num in nums.iter() {
println!("{}", num);
}
These two basic iterators should serve you well. There are some more
advanced iterators, including ones that are infinite.
That’s enough about iterators. Iterator adaptors are the last concept
we need to talk about with regards to iterators. Let’s get to it!
Iterator adaptors
Iterator adaptors take an iterator and modify it somehow, producing a
new iterator. The simplest one is called map:
(1..100).map(|x| x + 1);
map is called on another iterator, and produces a new iterator that
calls the given closure on each element. So this would give us the
numbers from 2-100. Well, almost! If you compile the example, you’ll
get a warning:
(1..100).map(|x| x + 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Laziness strikes again! That closure will never execute. This example
doesn’t print any numbers:
(1..100).map(|x| println!("{}", x));
If you are trying to execute a closure on an iterator for its side effects,
use for instead.
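The laziness can be observed directly. In this sketch (the function names are illustrative), a Cell counts how many times the map closure runs: building the adaptor alone runs it zero times, while consuming it with collect() runs it once per element.

```rust
use std::cell::Cell;

/// Builds a `map` adaptor but never consumes it; returns how often the closure ran.
fn calls_without_consumer() -> u32 {
    let calls = Cell::new(0);
    let _lazy = (1..100).map(|x| {
        calls.set(calls.get() + 1);
        x + 1
    });
    // Nothing has pulled values from `_lazy`, so the closure never ran.
    calls.get()
}

/// Same adaptor, consumed by `collect()`; returns the call count and results.
fn calls_with_consumer() -> (u32, Vec<i32>) {
    let calls = Cell::new(0);
    let results: Vec<i32> = (1..100)
        .map(|x| {
            calls.set(calls.get() + 1);
            x + 1
        })
        .collect();
    (calls.get(), results)
}
```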
There are tons of interesting iterator adaptors. take(n) will return
an iterator over the next n elements of the original iterator. Let’s try
it out with an infinite iterator:
for i in (1..).take(5) {
println!("{}", i);
}
This will print:
1
2
3
4
5
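filter() is an adaptor that takes a closure as an argument. The closure
returns true or false, and the new iterator that filter() produces yields
only the elements for which the closure returns true:
for i in (1..100).filter(|&x| x % 2 == 0) {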
println!("{}", i);
}
This will print all of the even numbers between one and a hundred.
(Note that, unlike map, the closure passed to filter is passed a reference
to the element instead of the element itself. The filter predicate
here uses the &x pattern to extract the integer. The filter closure is
passed a reference because it returns true or false instead of the element,
so the filter implementation must retain ownership to put the
elements into the newly constructed iterator.)
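Both adaptors can be checked without printing. In this sketch (first_five and evens_two_ways are illustrative names), take(5) makes the infinite range 1.. safe to collect, and the two filter closures show that destructuring with &x and dereferencing with *x are equivalent:

```rust
/// The first five values of the unbounded range `1..`.
fn first_five() -> Vec<i32> {
    // `take(5)` stops after five items, so the infinite range is safe to use.
    (1..).take(5).collect()
}

/// Even numbers below 100, written with both closure styles `filter` accepts.
fn evens_two_ways() -> (Vec<i32>, Vec<i32>) {
    // The `&x` pattern destructures the `&i32` that `filter` passes in.
    let by_pattern: Vec<i32> = (1..100).filter(|&x| x % 2 == 0).collect();
    // Equivalently, take the reference and dereference it explicitly.
    let by_deref: Vec<i32> = (1..100).filter(|x| *x % 2 == 0).collect();
    (by_pattern, by_deref)
}
```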
You can chain all three things together: start with an iterator, adapt
it a few times, and then consume the result. Check it out:
(1..)
.filter(|&x| x % 2 == 0)
.filter(|&x| x % 3 == 0)
.take(5)
.collect::<Vec<i32>>();
This will give you a vector containing 6, 12, 18, 24, and 30.
This is just a small taste of what iterators, iterator adaptors, and
consumers can help you with. There are a number of really useful
iterators, and you can write your own as well. Iterators provide a safe,
efficient way to manipulate all kinds of lists. They’re a little unusual
at first, but if you play with them, you’ll get hooked. For a full list
of the different iterators and consumers, check out the iterator module
documentation.
Concurrency
Concurrency and parallelism are incredibly important topics in computer
science, and are also a hot topic in industry today. Computers are
gaining more and more cores, yet many programmers aren’t prepared
to fully utilize them.
Rust’s memory safety features also apply to its concurrency story.
Even concurrent Rust programs must be memory safe, having no data
races. Rust’s type system is up to the task, and gives you powerful
ways to reason about concurrent code at compile time.
Before we talk about the concurrency features that come with Rust,
it’s important to understand something: Rust is low-level enough that
the vast majority of this is provided by the standard library, not by
the language. This means that if you don’t like some aspect of the
way Rust handles concurrency, you can implement an alternative way
of doing things. mio is a real-world example of this principle in action.
Send
The first trait we’re going to talk about is Send. When a type T implements
Send, it indicates that something of this type is able to have
ownership transferred safely between threads.
This is important to enforce certain restrictions. For example, if we
have a channel connecting two threads, we would want to be able to
send some data down the channel and to the other thread. Therefore,
we’d ensure that Send was implemented for that type.
Conversely, if we were wrapping a library with FFI that
isn’t thread-safe, we wouldn’t want to implement Send, and so the
compiler will help us enforce that it can’t leave the current thread.
Sync
The second of these traits is called Sync. When a type T implements
Sync, it indicates that something of this type has no possibility of introducing
memory unsafety when used from multiple threads concurrently
through shared references. This implies that types which don’t have
interior mutability are inherently Sync, which includes simple primitive
types (like u8) and aggregate types containing them.
For sharing references across threads, Rust provides a wrapper type
called Arc<T>. Arc<T> implements Send and Sync if and only if T
implements both Send and Sync. For example, an object of type
Arc<RefCell<U>> cannot be transferred across threads: RefCell
does not implement Sync, so Arc<RefCell<U>> does not
implement Send.
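The compiler checks these traits through ordinary trait bounds, so a generic function with a T: Send + Sync bound acts as a compile-time assertion. In this sketch, is_send_and_sync and arc_mutex_is_thread_safe are hypothetical helpers; the commented-out function shows the Arc<RefCell<_>> case from the paragraph above, which the compiler rejects.

```rust
use std::sync::{Arc, Mutex};

/// Compiles only for types that are both `Send` and `Sync`; the bound is the
/// whole point, and the body is trivial.
fn is_send_and_sync<T: Send + Sync>() -> bool {
    true
}

/// `Arc<Mutex<T>>` can be shared across threads when `T: Send`.
fn arc_mutex_is_thread_safe() -> bool {
    is_send_and_sync::<Arc<Mutex<Vec<i32>>>>()
}

// Uncommenting this fails to compile: `RefCell` is not `Sync`,
// so `Arc<RefCell<_>>` is neither `Send` nor `Sync`.
// fn arc_refcell_is_thread_safe() -> bool {
//     is_send_and_sync::<Arc<std::cell::RefCell<i32>>>()
// }
```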
These two traits allow you to use the type system to make strong
guarantees about the properties of your code under concurrency. Before
we demonstrate why, we need to learn how to create a concurrent Rust
program in the first place!
Threads
Rust’s standard library provides a library for threads, which allow you
to run Rust code in parallel. Here’s a basic example of using std::thread:
use std::thread;
fn main() {
thread::spawn(|| {
println!("Hello from a thread!");
});
}
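The thread::spawn() method accepts a closure, which is executed in a
new thread. It returns a handle to the thread that can be used to wait
for the child thread to finish and extract its result:
use std::thread;
fn main() {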
let handle = thread::spawn(|| {
"Hello from a thread!"
});
println!("{}", handle.join().unwrap());
}
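Because closures can capture variables from their environment, we can
also try to bring some data into the new thread. This does not compile:
the closure captures x by reference, and the spawned thread may outlive
the scope that owns x:
use std::thread;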
fn main() {
let x = 1;
thread::spawn(|| {
println!("x is {}", x);
});
}
To fix this, we use a move closure, which moves x into the new thread:
use std::thread;
fn main() {
let x = 1;
thread::spawn(move || {
println!("x is {}", x);
});
}
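Combining spawn, move, and join, here is a sketch (squares_in_threads is an illustrative name) that computes each square on its own thread. The handles are collected into a Vec before any join happens: because map is lazy, joining inside the same map chain would spawn and wait for one thread at a time.

```rust
use std::thread;

/// Spawns one thread per input, squares it there, and joins the results
/// back in spawn order.
fn squares_in_threads(inputs: Vec<i32>) -> Vec<i32> {
    let handles: Vec<thread::JoinHandle<i32>> = inputs
        .into_iter()
        // `move` gives each thread ownership of its own `n`.
        .map(|n| thread::spawn(move || n * n))
        .collect();
    handles
        .into_iter()
        // `join` waits for the thread; `unwrap` propagates a panic, if any.
        .map(|h| h.join().unwrap())
        .collect()
}
```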
Many languages can run code on multiple threads, but doing so is often wildly
unsafe. There are entire books about how to prevent errors that occur
from shared mutable state. Rust helps out with its type system here as
well, by preventing data races at compile time. Let’s talk about how
you actually share things between threads.
The same ownership system that helps prevent using pointers incorrectly
also helps rule out data races, one of the worst kinds of concurrency
bugs.
As an example, here is a Rust program that would have a data race
in many languages. It will not compile:
use std::thread;
use std::time::Duration;
fn main() {
let mut data = vec![1, 2, 3];
for i in 0..3 {
thread::spawn(move || {
data[0] += i;
});
}
thread::sleep(Duration::from_millis(50));
}
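One idea is to give each thread its own owned handle to the data, using
the reference-counted Rc<T>:
use std::rc::Rc;
use std::thread;
use std::time::Duration;
fn main() {
let mut data = Rc::new(vec![1, 2, 3]);
for i in 0..3 {
// Create a new owned reference:
let data_ref = data.clone();
// Use it in a thread: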
thread::spawn(move || {
data_ref[0] += i;
});
}
thread::sleep(Duration::from_millis(50));
}
This won’t work, however: Rc<T> does not implement Send, so the
compiler refuses to let it be moved into another thread.
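To fix this, we use Arc<T>, the standard library’s atomically
reference-counted pointer, whose handles can be safely sent to and
shared between threads:
use std::sync::Arc;
use std::thread;
use std::time::Duration;
fn main() {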
let mut data = Arc::new(vec![1, 2, 3]);
for i in 0..3 {
let data = data.clone();
thread::spawn(move || {
data[0] += i;
});
}
thread::sleep(Duration::from_millis(50));
}
Similarly to last time, we use clone() to create a new owned handle.
This handle is then moved into the new thread.
And... still gives us an error.
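Arc<T> only provides shared access to its contents, so it will not let us
mutate the vector. To do that safely across threads, we wrap the vector
in a Mutex<T>:
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {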
let data = Arc::new(Mutex::new(vec![1, 2, 3]));
for i in 0..3 {
let data = data.clone();
thread::spawn(move || {
let mut data = data.lock().unwrap();
data[0] += i;
});
}
thread::sleep(Duration::from_millis(50));
}
Note that the value of i is bound (copied) to the closure and not shared
among the threads.
We’re “locking” the mutex here. A mutex (short for “mutual exclusion”),
as mentioned, only allows one thread at a time to access a
value. When we wish to access the value, we use lock() on it. This
will “lock” the mutex, and no other thread will be able to lock it (and
hence, do anything with the value) until we’re done with it. If a thread
attempts to lock a mutex which is already locked, it will wait until the
other thread releases the lock.
The lock “release” here is implicit; when the result of the lock (in
this case, data) goes out of scope, the lock is automatically released.
Note that the lock method of Mutex has this signature:
fn lock(&self) -> LockResult<MutexGuard<T>>
First, we call lock(), which acquires the mutex’s lock. Because this
may fail, it returns a Result<T, E>, and because this is just an example,
we unwrap() it to get a reference to the data. Real code would have
more robust error handling here. We’re then free to mutate it, since
we have the lock.
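The timer can also be replaced by the join() handles we saw earlier. A minimal sketch, where count_with_threads is an illustrative name: every thread increments a shared Mutex-protected counter, and the parent joins each handle before reading the total.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from `n` threads and waits for all of them
/// with `join` instead of sleeping.
fn count_with_threads(n: usize) -> i32 {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();
    for _ in 0..n {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // The guard unlocks the mutex when it goes out of scope.
            let mut value = counter.lock().unwrap();
            *value += 1;
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the result so the guard is dropped before `counter`.
    let total = *counter.lock().unwrap();
    total
}
```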
Lastly, while the threads are running, we wait on a short timer. But
this is not ideal: we may have picked a reasonable amount of time to
wait but it’s more likely we’ll either be waiting longer than necessary or
not long enough, depending on just how much time the threads actually
take to finish computing when the program runs.
A more precise alternative to the timer would be to use one of the
mechanisms provided by the Rust standard library for synchronizing
threads with each other. Let’s talk about one of them: channels.
Channels
Here’s a version of our code that uses channels for synchronization,
rather than waiting for a specific time:
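use std::sync::{Arc, Mutex};
use std::sync::mpsc;
use std::thread;
fn main() {
let data = Arc::new(Mutex::new(0));
// `tx` is the sending half of the channel, `rx` the receiving half.
let (tx, rx) = mpsc::channel();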
for _ in 0..10 {
let (data, tx) = (data.clone(), tx.clone());
thread::spawn(move || {
let mut data = data.lock().unwrap();
*data += 1;
tx.send(()).unwrap();
});
}
for _ in 0..10 {
rx.recv().unwrap();
}
}
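A channel can carry any data that is Send, not just a signal. Here each
thread sends back the square of its index:
use std::sync::mpsc;
use std::thread;
fn main() {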
let (tx, rx) = mpsc::channel();
for i in 0..10 {
let tx = tx.clone();
thread::spawn(move || {
let answer = i * i;
tx.send(answer).unwrap();
});
}
for _ in 0..10 {
println!("{}", rx.recv().unwrap());
}
}
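A channel can also be used to gather results into a collection. In this sketch (squares_via_channel is an illustrative name), each worker sends its square, and the receiver iterates until every sender has been dropped. The original tx is dropped explicitly; otherwise rx.iter() would wait forever. Threads finish in no particular order, so the results are sorted before returning.

```rust
use std::sync::mpsc;
use std::thread;

/// Computes squares of `0..n` in worker threads and gathers them through a
/// channel. Arrival order depends on scheduling, so the result is sorted.
fn squares_via_channel(n: i32) -> Vec<i32> {
    let (tx, rx) = mpsc::channel();
    for i in 0..n {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(i * i).unwrap();
        });
    }
    // Drop the original sender so the receiving loop ends once every
    // worker's clone has been dropped.
    drop(tx);
    let mut results: Vec<i32> = rx.iter().collect();
    results.sort();
    results
}
```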
Panics
A panic! will crash the currently executing thread. You can use Rust’s
threads as a simple isolation mechanism:
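use std::thread;
let handle = thread::spawn(move || {
panic!("oops!");
});
// join() returns a Result, which lets us check whether the thread panicked.
let result = handle.join();
assert!(result.is_err());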
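The same check can be wrapped in a small helper. This sketch (survives is a hypothetical name) runs a closure on a new thread and reports whether it completed; a panic inside the closure is turned into an Err by join() instead of taking down the caller.

```rust
use std::thread;

/// Runs `work` on its own thread and reports whether it finished without
/// panicking. A panic stops only that thread; the caller keeps running.
fn survives<F>(work: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(work).join().is_ok()
}
```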
Error Handling
Like most programming languages, Rust encourages the programmer to
handle errors in a particular way. Generally speaking, error handling is
divided into two broad categories: exceptions and return values. Rust
opts for return values.
In this section, we intend to provide a comprehensive treatment of
how to deal with errors in Rust. More than that, we will attempt to
introduce error handling one piece at a time so that you’ll come away
with a solid working knowledge of how everything fits together.
When done naïvely, error handling in Rust can be verbose and annoying.
This section will explore those stumbling blocks and demonstrate
how to use the standard library to make error handling concise
and ergonomic.
Table of Contents
This section is very long, mostly because we start at the very beginning
with sum types and combinators, and try to motivate the way Rust does
error handling incrementally. As such, programmers with experience
in other expressive type systems may want to jump around.
• The Basics
– Unwrapping explained
– The Option type
* Composing Option<T> values
– Initial setup
– Argument parsing
– Writing the logic
– Error handling with Box<Error>
– Reading from stdin
– Error handling with a custom type
– Adding functionality
The Basics
You can think of error handling as using case analysis to determine
whether a computation was successful or not. As you will see, the key
to ergonomic error handling is reducing the amount of explicit case
analysis the programmer has to do while keeping code composable.
Keeping code composable is important, because without that requirement,
we could panic whenever we come across something unexpected.
(panic causes the current thread to unwind, and in most cases,
the entire program aborts.) Here’s an example:
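// Guess a number between 1 and 10.
// If it matches the number we had in mind, return `true`. Else, return `false`.
fn guess(n: i32) -> bool {
if n < 1 || n > 10 {
panic!("Invalid number: {}", n);
}
n == 5
}
fn main() {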
guess(11);
}
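If you try running this code, the program will crash with a panic
message.
Here’s another example that is slightly less contrived: a program that
accepts an integer as an argument, doubles it and prints it.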
use std::env;
fn main() {
let mut argv = env::args();
let arg: String = argv.nth(1).unwrap(); // error 1
let n: i32 = arg.parse().unwrap(); // error 2
println!("{}", 2 * n);
}
If you give this program zero arguments (error 1) or if the first argument
isn’t an integer (error 2), the program will panic just like in the first
example.
You can think of this style of error handling as similar to a bull
running through a china shop. The bull will get to where it wants to
go, but it will trample everything in the process.
Unwrapping explained
In the previous example, we claimed that the program would simply
panic if it reached one of the two error conditions, yet the program
does not include an explicit call to panic like the first example. This
is because the panic is embedded in the calls to unwrap.
To “unwrap” something in Rust is to say, “Give me the result of the
computation, and if there was an error, panic and stop the program.”
We’d like to show the code for unwrapping because it is so simple, but to do that, we will first need to explore the Option and Result types. Both of these types have a method called unwrap defined on them.
The Option type
The Option type is defined in the standard library:
enum Option<T> {
None,
Some(T),
}
The Option type is a way to use Rust’s type system to express the
possibility of absence. Encoding the possibility of absence into the type
system is an important concept because it will cause the compiler to
force the programmer to handle that absence. Let’s take a look at an
example that tries to find a character in a string:
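// Searches `haystack` for the character `needle`. If one is found, the
// byte offset of the character is returned. Otherwise, `None` is returned.
fn find(haystack: &str, needle: char) -> Option<usize> {
for (offset, c) in haystack.char_indices() {
if c == needle {
return Some(offset);
}
}
None
}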
Notice that when this function finds a matching character, it doesn’t return just the offset. Instead, it returns Some(offset). Some is a variant or a value constructor for the Option type. You can think of it as a function with the type fn<T>(value: T) -> Option<T>. Correspondingly, None is also a value constructor, except it has no arguments. You can think of None as a function with the type fn<T>() -> Option<T>.
This might seem like much ado about nothing, but this is only half
of the story. The other half is using the find function we’ve written.
Let’s try to use it to find the extension in a file name.
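For example, here is a sketch that uses find to print the extension of a file name. The match does the case analysis on the Option<usize> that find returns. The file name foobar.rs is just an example value:

```rust
// Searches `haystack` for `needle`; returns the byte offset if found.
fn find(haystack: &str, needle: char) -> Option<usize> {
    for (offset, c) in haystack.char_indices() {
        if c == needle {
            return Some(offset);
        }
    }
    None
}

fn main() {
    let file_name = "foobar.rs";
    // Explicit case analysis: both "found" and "absent" must be handled.
    match find(file_name, '.') {
        None => println!("No file extension found."),
        Some(i) => println!("File extension: {}", &file_name[i + 1..]),
    }
}
```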
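Whatever we do with the value find returns, we have to do case analysis on that Option<usize>. What about unwrap, which we used previously? There was no case analysis there! Instead, the case analysis was put inside the unwrap method for you. You could define it yourself if you want:
enum Option<T> {
None,
Some(T),
}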
impl<T> Option<T> {
fn unwrap(self) -> T {
match self {
Option::Some(val) => val,
Option::None => panic!("called `Option::unwrap()` on a `None` value"),
}
}
}
The unwrap method abstracts away the case analysis. This is precisely the thing that makes unwrap ergonomic to use. Unfortunately, that panic! means that unwrap is not composable: it is the bull in the china shop.
Composing Option<T> values
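Recall how we used find to discover the extension in a file name. Not every file name contains a ".", so the extension may be absent, and the Option<usize> from find forces us to handle that. Since getting an extension is a common operation, it makes sense to put it in a function. This sketch assumes the extension is everything after the first "." (a simplification). find is repeated so the block stands alone:

```rust
fn find(haystack: &str, needle: char) -> Option<usize> {
    for (offset, c) in haystack.char_indices() {
        if c == needle {
            return Some(offset);
        }
    }
    None
}

// Returns the extension of `file_name`: everything after the first `.`.
// If `file_name` has no `.`, returns `None`.
fn extension_explicit(file_name: &str) -> Option<&str> {
    match find(file_name, '.') {
        None => None,
        Some(i) => Some(&file_name[i + 1..]),
    }
}

fn main() {
    println!("{:?}", extension_explicit("foobar.rs"));
}
```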
(Pro-tip: in real code, don’t hand-roll extension parsing; use the extension method in the standard library instead.)
The code stays simple, but the important thing to notice is that the type of find forces us to consider the possibility of absence. This is a good thing because it means the compiler won’t let us accidentally forget about the case where a file name doesn’t have an extension. On the other hand, doing explicit case analysis like we’ve done in extension_explicit every time can get a bit tiresome.
In fact, the case analysis in extension_explicit follows a very common pattern: map a function onto the value inside an Option<T>, unless the option is None, in which case return None.
Rust has parametric polymorphism, so it is very easy to define a
combinator that abstracts this pattern:
fn map<F, T, A>(option: Option<T>, f: F) -> Option<A> where
F: FnOnce(T) -> A {
match option {
None => None,
Some(value) => Some(f(value)),
}
}
Indeed, map is defined as a method on Option<T> in the standard li-
brary. As a method, it has a slightly different signature: methods take
self, &self, or &mut self as their first argument.
Armed with our new combinator, we can rewrite our extension_explicit function to get rid of the case analysis:
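Here is a sketch of that rewrite using the standard library’s Option::map method. It behaves like extension_explicit, but map performs the case analysis. find is repeated so the block stands alone:

```rust
fn find(haystack: &str, needle: char) -> Option<usize> {
    for (offset, c) in haystack.char_indices() {
        if c == needle {
            return Some(offset);
        }
    }
    None
}

// `map` applies the closure only when `find` returned `Some`;
// a `None` passes through unchanged.
fn extension(file_name: &str) -> Option<&str> {
    find(file_name, '.').map(|i| &file_name[i + 1..])
}

fn main() {
    println!("{:?}", extension("foobar.rs"));
}
```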
Another common task is getting the file name out of a file path. While most file paths have a file name, not all of them do: for example, ., .. or /.
So, we are tasked with the challenge of finding an extension given
a file path. Let’s start with explicit case analysis:
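Here is a sketch of that explicit version. The file_name helper is an assumption: it is built on the standard library’s Path::file_name, which returns None for paths such as .. and /. extension is the map-based function from the previous step, repeated so the block stands alone:

```rust
use std::path::Path;

fn find(haystack: &str, needle: char) -> Option<usize> {
    for (offset, c) in haystack.char_indices() {
        if c == needle {
            return Some(offset);
        }
    }
    None
}

fn extension(file_name: &str) -> Option<&str> {
    find(file_name, '.').map(|i| &file_name[i + 1..])
}

// Assumed helper: the final path component, if there is one and it is UTF-8.
fn file_name(file_path: &str) -> Option<&str> {
    match Path::new(file_path).file_name() {
        None => None,
        Some(name) => name.to_str(),
    }
}

// Explicit case analysis at every step that can fail.
fn file_path_ext_explicit(file_path: &str) -> Option<&str> {
    match file_name(file_path) {
        None => None,
        Some(name) => match extension(name) {
            None => None,
            Some(ext) => Some(ext),
        },
    }
}

fn main() {
    println!("{:?}", file_path_ext_explicit("src/main.rs"));
}
```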
You might think that we could use the map combinator to reduce the
case analysis, but its type doesn’t quite fit...
The map function here wraps the value returned by the extension function inside an Option<_>, and since the extension function itself returns an Option<&str>, the expression file_name(file_path).map(|x| extension(x)) actually returns an Option<Option<&str>>. But since file_path_ext just returns Option<&str> (and not Option<Option<&str>>), we get a compilation error.
The result of the function taken by map as input is always rewrapped with Some. Instead, we need something like map, but which allows the caller to return an Option<_> directly without wrapping it in another Option<_>. The standard library calls this combinator and_then. Its generic implementation is even simpler than map:
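Here is a sketch of that combinator as a free function (in the standard library it is the method Option::and_then), along with file_path_ext rewritten to use it. For comparison, file_path_ext_nested shows the Option<Option<&str>> that map would produce. find, extension and the Path-based file_name (an assumed helper) are repeated so the block stands alone:

```rust
use std::path::Path;

// Generic `and_then`: like `map`, except that `f` already returns an
// `Option`, so its result is returned as-is instead of being rewrapped.
fn and_then<F, T, A>(option: Option<T>, f: F) -> Option<A>
where
    F: FnOnce(T) -> Option<A>,
{
    match option {
        None => None,
        Some(value) => f(value),
    }
}

fn find(haystack: &str, needle: char) -> Option<usize> {
    for (offset, c) in haystack.char_indices() {
        if c == needle {
            return Some(offset);
        }
    }
    None
}

fn extension(file_name: &str) -> Option<&str> {
    find(file_name, '.').map(|i| &file_name[i + 1..])
}

// Assumed helper: the final path component, if any, as UTF-8.
fn file_name(file_path: &str) -> Option<&str> {
    match Path::new(file_path).file_name() {
        None => None,
        Some(name) => name.to_str(),
    }
}

// Each step may fail; `and_then` chains them without explicit case analysis.
fn file_path_ext(file_path: &str) -> Option<&str> {
    file_name(file_path).and_then(extension)
}

// With `map` instead, the result ends up wrapped twice.
fn file_path_ext_nested(file_path: &str) -> Option<Option<&str>> {
    file_name(file_path).map(extension)
}

fn main() {
    println!("{:?}", file_path_ext("src/main.rs"));
    println!("{:?}", file_path_ext_nested("src/main.rs"));
    println!("{:?}", and_then(Some(4), |n: i32| 12i32.checked_div(n)));
}
```

Because extension already returns an Option, and_then lets us pass it by name without wrapping its result a second time.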
Side note: Since and_then essentially works like map but returns an
Option<_> instead of an Option<Option<_>> it is known as flatmap
in some other languages.
The Option type has many other combinators defined in the standard library. It is a good idea to skim the list in the Option documentation and familiarize yourself with what’s available: they can often reduce case analysis for you. Familiarizing yourself with these combinators will pay dividends because many of them are also defined (with similar semantics) for Result, which we will talk about next.
Combinators make using types like Option ergonomic because they
reduce explicit case analysis. They are also composable because they
permit the caller to handle the possibility of absence in their own
way. Methods like unwrap remove choices because they will panic if
Option<T> is None.
Parsing integers
The Rust standard library makes converting strings to integers dead
simple. It’s so easy in fact, that it is very tempting to write something
like the following:
fn double_number(number_str: &str) -> i32 {
2 * number_str.parse::<i32>().unwrap()
}
fn main() {
let n: i32 = double_number("10");
assert_eq!(n, 20);
}
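At this point, you should be skeptical of calling unwrap here: if the string doesn’t parse as a number, the program panics. To handle that case instead, we first need to know what parse returns. Its signature, simplified, looks like this:
impl str {
fn parse<F: FromStr>(&self) -> Result<F, F::Err>;
}
So parse returns a Result rather than an Option. A string either parses as a number or it doesn’t, but the implementation also knows why it didn’t (for example, an empty string or an invalid digit), and a Result lets it report that. You should try to emulate this reasoning when choosing between Option and Result: if you can provide detailed error information, then you probably should. (We’ll see more on this later.)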
OK, but how do we write our return type? The parse method as
defined above is generic over all the different number types defined in
the standard library. We could (and probably should) also make our
function generic, but let’s favor explicitness for the moment. We only
care about i32, so we need to find its implementation of FromStr (do a
CTRL-F in your browser for “FromStr”) and look at its associated type
Err. We did this so we can find the concrete error type. In this case,
it’s std::num::ParseIntError. Finally, we can rewrite our function:
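use std::num::ParseIntError;
fn double_number(number_str: &str) -> Result<i32, ParseIntError> {
match number_str.parse::<i32>() {
Ok(n) => Ok(2 * n),
Err(err) => Err(err),
}
}
fn main() {
match double_number("10") {
Ok(n) => assert_eq!(n, 20),
Err(err) => println!("Error: {:?}", err),
}
}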
This is a little better, but now we’ve written a lot more code! The case
analysis has once again bitten us.
Combinators to the rescue! Just like Option, Result has lots of
combinators defined as methods. There is a large intersection of com-
mon combinators between Result and Option. In particular, map is
part of that intersection:
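use std::num::ParseIntError;
fn double_number(number_str: &str) -> Result<i32, ParseIntError> {
number_str.parse::<i32>().map(|n| 2 * n)
}
fn main() {
match double_number("10") {
Ok(n) => assert_eq!(n, 20),
Err(err) => println!("Error: {:?}", err),
}
}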
The usual suspects are all there for Result, including unwrap_or and
and_then. Additionally, since Result has a second type parameter,
there are combinators that affect only the error type, such as map_err
(instead of map) and or_else (instead of and_then).
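Here is a short sketch of several of these combinators on Result<i32, ParseIntError>. The helper names parse_doubled and parse_doubled_msg are hypothetical:

```rust
use std::num::ParseIntError;

// `map` transforms the `Ok` value and leaves an `Err` untouched.
fn parse_doubled(s: &str) -> Result<i32, ParseIntError> {
    s.trim().parse::<i32>().map(|n| 2 * n)
}

// `map_err` transforms only the `Err` value, here into a `String`.
fn parse_doubled_msg(s: &str) -> Result<i32, String> {
    parse_doubled(s).map_err(|e| format!("could not parse {:?}: {}", s, e))
}

fn main() {
    // `unwrap_or` supplies a default instead of panicking.
    println!("{}", parse_doubled("oops").unwrap_or(0));
    // `and_then` chains another fallible step onto a success.
    println!("{:?}", parse_doubled("21").and_then(|n| parse_doubled(&n.to_string())));
    // `or_else` gets a chance to recover from an error.
    println!("{:?}", parse_doubled("oops").or_else(|_| parse_doubled("5")));
    println!("{:?}", parse_doubled_msg("oops"));
}
```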
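Now let’s return to the argument-doubling program from the beginning of this section:
use std::env;
fn main() {
let mut argv = env::args();
let arg: String = argv.nth(1).unwrap(); // error 1
let n: i32 = arg.parse().unwrap(); // error 2
println!("{}", 2 * n);
}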
Given our newfound knowledge of Option, Result and their various combinators, we should try to rewrite this so that errors are handled properly and the program doesn’t panic if there’s an error.
The tricky aspect here is that argv.nth(1) produces an Option
while arg.parse() produces a Result. These aren’t directly compos-
able. When faced with both an Option and a Result, the solution is
usually to convert the Option to a Result. In our case, the absence of
a command line parameter (from env::args()) means the user didn’t
invoke the program correctly. We could use a String to describe the
error. Let’s try:
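use std::env;
fn double_arg(mut argv: env::Args) -> Result<i32, String> {
argv.nth(1)
.ok_or("Please give at least one argument".to_owned())
.and_then(|arg| arg.parse::<i32>().map_err(|err| err.to_string()))
.map(|n| 2 * n)
}
fn main() {
match double_arg(env::args()) {
Ok(n) => println!("{}", n),
Err(err) => println!("Error: {}", err),
}
}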
There are a couple of new things in this example. The first is the use of the Option::ok_or combinator. This is one way to convert an Option into a Result. The conversion requires you to specify what error to use if Option is None. Like the other combinators we’ve seen, its definition is very simple:
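Here is a sketch of ok_or written as a free function. In the standard library it is a method on Option:

```rust
// Convert an `Option` into a `Result`, using `err` when the option is `None`.
fn ok_or<T, E>(option: Option<T>, err: E) -> Result<T, E> {
    match option {
        Some(val) => Ok(val),
        None => Err(err),
    }
}

fn main() {
    // The first command-line argument, or an error message if there is none.
    let arg = ok_or(std::env::args().nth(1), "Please give at least one argument");
    println!("{:?}", arg);
}
```

The example above also uses map_err, which does to a Result’s Err value what map does to its Ok value.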
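Now consider a program that reads an integer from a file and doubles it, calling unwrap wherever something can fail:
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn file_double<P: AsRef<Path>>(file_path: P) -> i32 {
let mut file = File::open(file_path).unwrap(); // error 1
let mut contents = String::new();
file.read_to_string(&mut contents).unwrap(); // error 2
let n: i32 = contents.trim().parse().unwrap(); // error 3
2 * n
}
fn main() {
let doubled = file_double("foobar");
println!("{}", doubled);
}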
(N.B. The AsRef<Path> is used because those are the same bounds
used on std::fs::File::open. This makes it ergonomic to use any
kind of string as a file path.)
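There are three different errors that can occur here:
1. A problem opening the file.
2. A problem reading data from the file.
3. A problem parsing the data as a number.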
The first two problems are described via the std::io::Error type. We know this because of the return types of std::fs::File::open and std::io::Read::read_to_string. (Note that they both use the Result type alias idiom described previously. If you look up the Result type in their documentation, you’ll see the type alias, and consequently, the underlying io::Error type.) The third problem is described by the std::num::ParseIntError type. The io::Error type in particular is pervasive throughout the standard library. You will see it again and again.
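Let’s start the process of refactoring the file_double function. To make this function composable with other components of the program, it should not panic if any of the above error conditions are met. Effectively, this means that the function should return an error if any of its operations fail. A return type of i32 gives us no way to report an error, so we’ll change it to Result<i32, String>, using a String to describe whatever went wrong.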
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn main() {
match file_double("foobar") {
Ok(n) => println!("{}", n),
Err(err) => println!("Error: {}", err),
}
}
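The listing above shows only main. Here is a sketch of a file_double that matches the description that follows: two and_then calls, a final map, and map_err to turn every error into a String. It is one way to write it, not necessarily the original:

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, String> {
    File::open(file_path)
        .map_err(|err| err.to_string())
        // Read the whole file; on success, pass the contents along.
        .and_then(|mut file| {
            let mut contents = String::new();
            file.read_to_string(&mut contents)
                .map_err(|err| err.to_string())
                .map(|_| contents)
        })
        // Parse the trimmed contents as an integer.
        .and_then(|contents| contents.trim().parse::<i32>().map_err(|err| err.to_string()))
        // Only reached if every previous step succeeded.
        .map(|n| 2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {}", err),
    }
}
```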
This code looks a bit hairy. It can take quite a bit of practice before code like this becomes easy to write. The way we write it is by following
the types. As soon as we changed the return type of file_double to
Result<i32, String>, we had to start looking for the right combina-
tors. In this case, we only used three different combinators: and_then,
map and map_err.
and_then is used to chain multiple computations where each computation could return an error. After opening the file, there are two more computations that could fail: reading from the file and parsing the contents as a number. Correspondingly, there are two calls to and_then.
map is used to apply a function to the Ok(...) value of a Result.
For example, the very last call to map multiplies the Ok(...) value
(which is an i32) by 2. If an error had occurred before that point, this
operation would have been skipped because of how map is defined.
map_err is the trick that makes all of this work. map_err is like
map, except it applies a function to the Err(...) value of a Result.
In this case, we want to convert all of our errors to one type: String.
Since both io::Error and num::ParseIntError implement ToString,
we can call the to_string() method to convert them.
With all of that said, the code is still hairy. Mastering use of com-
binators is important, but they have their limits. Let’s try a different
approach: early returns.
Early returns
I’d like to take the code from the previous section and rewrite it using early returns. Early returns let you exit the function early. We can’t return early from file_double inside a closure, so we’ll need to revert to explicit case analysis.
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn main() {
match file_double("foobar") {
Ok(n) => println!("{}", n),
Err(err) => println!("Error: {}", err),
}
}
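Again, the listing shows only main. Here is a sketch of file_double written with early returns, using match and if let as described below. It is one possible version, not necessarily the original:

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, String> {
    // Each failure converts its error to a String and returns immediately.
    let mut file = match File::open(file_path) {
        Ok(file) => file,
        Err(err) => return Err(err.to_string()),
    };
    let mut contents = String::new();
    if let Err(err) = file.read_to_string(&mut contents) {
        return Err(err.to_string());
    }
    let n = match contents.trim().parse::<i32>() {
        Ok(n) => n,
        Err(err) => return Err(err.to_string()),
    };
    Ok(2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {}", err),
    }
}
```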
Reasonable people can disagree over whether this code is better than the code that uses combinators, but if you aren’t familiar with the combinator approach, this code is arguably simpler to read. It uses explicit case analysis with match and if let. If an error occurs, it simply stops executing the function and returns the error (after converting it to a string).
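Isn’t this a step backwards though? Previously, we said that the key to ergonomic error handling is reducing explicit case analysis, yet we’ve reverted to explicit case analysis here. It turns out, there are multiple ways to reduce explicit case analysis. Combinators aren’t the only way.
The try! macro
A cornerstone of error handling in Rust is the try! macro. Like combinators, it abstracts case analysis, but unlike combinators, it also abstracts control flow: it can express the early-return pattern seen above. Here is a simplified definition:
macro_rules! try {
($e:expr) => (match $e {
Ok(val) => val,
Err(err) => return Err(err),
});
}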
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn main() {
match file_double("foobar") {
Ok(n) => println!("{}", n),
Err(err) => println!("Error: {}", err),
}
}
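Here is a sketch of file_double rewritten with that simplified macro. Because try is a reserved keyword in Rust 2018 and later, the macro gets the hypothetical name my_try! here. Current Rust expresses the same pattern with the ? operator:

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

// Simplified `try!`: unwrap an `Ok`, or return the `Err` unchanged.
macro_rules! my_try {
    ($e:expr) => {
        match $e {
            Ok(val) => val,
            Err(err) => return Err(err),
        }
    };
}

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, String> {
    // `map_err` is still needed: the macro returns errors unconverted,
    // so each one must already be a `String`.
    let mut file = my_try!(File::open(file_path).map_err(|e| e.to_string()));
    let mut contents = String::new();
    my_try!(file.read_to_string(&mut contents).map_err(|e| e.to_string()));
    let n = my_try!(contents.trim().parse::<i32>().map_err(|e| e.to_string()));
    Ok(2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {}", err),
    }
}
```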
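The map_err calls are still necessary given our definition of try!. This is because the error types still need to be converted to String. The good news is that we will soon learn how to remove those map_err calls! The bad news is that we will need to learn a bit more about a couple of important traits in the standard library before we can remove the map_err calls.
Defining your own error type
Converting every error to a String throws information away: the caller can read the message, but can’t easily tell what kind of error occurred. A better approach is to define an enum, CliError, with one variant per kind of error (Io and Parse). Note that main now prints errors with {:?}, using their Debug representation:
use std::io;
use std::num;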
fn main() {
match file_double("foobar") {
Ok(n) => println!("{}", n),
Err(err) => println!("Error: {:?}", err),
}
}
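Here is a sketch of the missing pieces, consistent with the text: a CliError enum with Io and Parse variants that derives Debug, and a file_double that uses map_err(CliError::Io) and map_err(CliError::Parse). It uses the ? operator, current Rust’s replacement for try!. Since the errors already have type CliError here, ? simply returns them:

```rust
use std::fs::File;
use std::io::{self, Read};
use std::num;
use std::path::Path;

// One variant per kind of failure; `Debug` lets callers print it with `{:?}`.
#[derive(Debug)]
enum CliError {
    Io(io::Error),
    Parse(num::ParseIntError),
}

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, CliError> {
    // Tuple-variant constructors are functions, so they can go straight to `map_err`.
    let mut file = File::open(file_path).map_err(CliError::Io)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents).map_err(CliError::Io)?;
    let n = contents.trim().parse::<i32>().map_err(CliError::Parse)?;
    Ok(2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {:?}", err),
    }
}
```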
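The only change here is switching map_err(|e| e.to_string()) (which converts errors to strings) to map_err(CliError::Io) or map_err(CliError::Parse). The caller now gets to decide how much detail to report to the user.
The Error trait
The standard library defines the std::error::Error trait, which is meant to be implemented by every type that represents an error. It lets you:
• Obtain a Debug representation of the error.
• Obtain a user-facing Display representation of the error.
• Obtain a short description of the error (via the description method).
• Inspect the causal chain of an error, if one exists (via the cause method).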
The first two are a result of Error requiring impls for both Debug
and Display. The latter two are from the two methods defined on
Error. The power of Error comes from the fact that all error types
impl Error, which means errors can be existentially quantified as a trait
object. This manifests as either Box<Error> or &Error. Indeed, the
cause method returns an &Error, which is itself a trait object. We’ll
revisit the Error trait’s utility as a trait object later.
For now, it suffices to show an example implementing the Error
trait. Let’s use the error type we defined in the previous section:
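use std::fmt;
use std::io;
use std::num;
#[derive(Debug)]
enum CliError {
Io(io::Error),
Parse(num::ParseIntError),
}
impl fmt::Display for CliError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
// Both underlying errors already impl `Display`, so we defer to
// their implementations.
CliError::Io(ref err) => write!(f, "IO error: {}", err),
CliError::Parse(ref err) => write!(f, "Parse error: {}", err),
}
}
}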
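Here is a sketch of the Error impl for CliError in current Rust. The text above describes the description and cause methods. In current Rust, cause is deprecated in favor of source, and description is deprecated in favor of Display, so this sketch implements only source. The enum and Display impl are repeated so the block stands alone:

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::num;

#[derive(Debug)]
enum CliError {
    Io(io::Error),
    Parse(num::ParseIntError),
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            // Both underlying errors already implement `Display`; defer to them.
            CliError::Io(ref err) => write!(f, "IO error: {}", err),
            CliError::Parse(ref err) => write!(f, "Parse error: {}", err),
        }
    }
}

impl Error for CliError {
    // The underlying error: one link in the "causal chain" mentioned above.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match *self {
            CliError::Io(ref err) => Some(err),
            CliError::Parse(ref err) => Some(err),
        }
    }
}

fn main() {
    let err = CliError::Parse("x".parse::<i32>().unwrap_err());
    println!("{}", err);
    // Walk the chain of underlying causes.
    let mut cause = err.source();
    while let Some(e) = cause {
        println!("caused by: {}", e);
        cause = e.source();
    }
}
```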
The real try! macro
Previously, we presented this definition of try!:
macro_rules! try {
($e:expr) => (match $e {
Ok(val) => val,
Err(err) => return Err(err),
});
}
This is not its real definition. Its real definition is in the standard
library:
macro_rules! try {
($e:expr) => (match $e {
Ok(val) => val,
Err(err) => return Err(::std::convert::From::from(err)),
});
}
There’s one tiny but powerful change: the error value is passed through
From::from. This makes the try! macro a lot more powerful because
it gives you automatic type conversion for free.
Armed with our more powerful try! macro, let’s take a look at
code we wrote previously to read a file and convert its contents to an
integer:
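Here is a sketch of that code with the error type changed to Box<Error> (written Box<dyn Error> in current Rust) and no map_err calls. The macro gets the hypothetical name my_try! because try is a reserved keyword in current Rust. It mirrors the real definition above by passing the error through From::from, and current Rust’s ? operator performs the same conversion:

```rust
use std::error::Error;
use std::fs::File;
use std::io::Read;
use std::path::Path;

// Mirrors the real `try!`: convert the error with `From::from` before returning.
macro_rules! my_try {
    ($e:expr) => {
        match $e {
            Ok(val) => val,
            Err(err) => return Err(::std::convert::From::from(err)),
        }
    };
}

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, Box<dyn Error>> {
    // io::Error -> Box<dyn Error> via `From`; no map_err needed.
    let mut file = my_try!(File::open(file_path));
    let mut contents = String::new();
    my_try!(file.read_to_string(&mut contents));
    // ParseIntError -> Box<dyn Error>, also via `From`.
    let n = my_try!(contents.trim().parse::<i32>());
    Ok(2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {}", err),
    }
}
```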
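Because the real try! passes errors through From::from, and the standard library can convert any error type into a Box<Error>, we can give file_double the return type Result<i32, Box<Error>> and drop every map_err call: io::Error and num::ParseIntError are converted automatically. The try! macro now encapsulates three things simultaneously:
1. Case analysis.
2. Control flow.
3. Error type conversion.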
When all three things are combined, we get code that is unencumbered
by combinators, calls to unwrap or case analysis.
There’s one little nit left: the Box<Error> type is opaque. If we return a Box<Error> to the caller, the caller can’t (easily) inspect the underlying error type. The situation is certainly better than String because the caller can call methods like description and cause, but the limitation remains: Box<Error> is opaque. (N.B. This isn’t entirely true because Rust does have runtime reflection, which is useful in some scenarios that are beyond the scope of this section.)
It’s time to revisit our custom CliError type and tie everything
together.
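use std::fs::File;
use std::io::{self, Read};
use std::num;
use std::path::Path;
#[derive(Debug)]
enum CliError {
Io(io::Error),
Parse(num::ParseIntError),
}
To remove the remaining map_err calls from file_double, try! needs to know how to turn an io::Error or a num::ParseIntError into a CliError. The real try! does this with From::from, but there is no From impl for our own type yet. Since we defined CliError, we can write those impls ourselves; each one simply wraps the underlying error in the matching variant. With them in place, file_double no longer needs map_err at all.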
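Here is a sketch of those two From impls and the resulting file_double. It uses the ? operator, which, like the real try!, converts the error with From::from:

```rust
use std::fs::File;
use std::io::{self, Read};
use std::num;
use std::path::Path;

#[derive(Debug)]
enum CliError {
    Io(io::Error),
    Parse(num::ParseIntError),
}

// Teach `From` how to build a `CliError` from each underlying error type.
impl From<io::Error> for CliError {
    fn from(err: io::Error) -> CliError {
        CliError::Io(err)
    }
}

impl From<num::ParseIntError> for CliError {
    fn from(err: num::ParseIntError) -> CliError {
        CliError::Parse(err)
    }
}

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, CliError> {
    let mut file = File::open(file_path)?; // io::Error -> CliError::Io
    let mut contents = String::new();
    file.read_to_string(&mut contents)?; // io::Error -> CliError::Io
    let n = contents.trim().parse::<i32>()?; // ParseIntError -> CliError::Parse
    Ok(2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {:?}", err),
    }
}
```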
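Suppose file_double also had to convert a string to a float. Then we’d need to add a new variant to our error type:
use std::io;
use std::num;
enum CliError {
Io(io::Error),
ParseInt(num::ParseIntError),
ParseFloat(num::ParseFloatError),
}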
And add a new From impl:
use std::num;
impl From<num::ParseFloatError> for CliError {
fn from(err: num::ParseFloatError) -> CliError {
CliError::ParseFloat(err)
}
}
Initial setup
We’re not going to spend a lot of time on setting up a project with Cargo
because it is already covered well in the Cargo section and Cargo’s
documentation.
To get started from scratch, run cargo new --bin city-pop and
make sure your Cargo.toml looks something like this:
[package]
name = "city-pop"
version = "0.1.0"
authors = ["Andrew Gallant"]
[[bin]]
name = "city-pop"
[dependencies]
csv = "0.*"
rustc-serialize = "0.*"
getopts = "0.*"
Argument parsing
Let’s get argument parsing out of the way. We won’t go into too much
detail on Getopts, but there is some good documentation describing it.
The short story is that Getopts generates an argument parser and a help message from a vector of options. (The fact that it is a vector is hidden behind a struct and a set of methods.) Once the parsing is done,
the parser returns a struct that records matches for defined options,
and remaining “free” arguments. From there, we can get information
about the flags, for instance, whether they were passed in, and what
arguments they had. Here’s our program with the appropriate extern
crate statements, and the basic argument setup for Getopts:
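extern crate getopts;
extern crate rustc_serialize;
use getopts::Options;
use std::env;
fn print_usage(program: &str, opts: Options) {
println!("{}", opts.usage(&format!("Usage: {} [options] <data-path> <city>", program)));
}
fn main() {
let args: Vec<String> = env::args().collect();
let program = &args[0];
let mut opts = Options::new();
opts.optflag("h", "help", "Show this usage message.");
let matches = match opts.parse(&args[1..]) {
Ok(m) => m,
Err(e) => panic!("{}", e),
};
if matches.opt_present("h") {
print_usage(&program, opts);
return;
}
let data_path = &matches.free[0];
let city: &str = &matches.free[1];
// Do stuff with information.
}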
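Writing the logic
Next, we read the CSV data with the csv crate, decoding each row into a Row struct, and print the population of each row whose city matches:
extern crate csv;
use std::fs::File;
// This struct represents the data in each row of the CSV file.
#[derive(Debug, RustcDecodable)]
struct Row {
country: String,
city: String,
accent_city: String,
region: String,
// Not every row has data for the population, latitude or longitude,
// so these are `Option`s; the CSV parser fills in the correct value for us.
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>,
}
fn main() {
let args: Vec<String> = env::args().collect();
let program = &args[0];
let mut opts = Options::new();
opts.optflag("h", "help", "Show this usage message.");
let matches = match opts.parse(&args[1..]) {
Ok(m) => m,
Err(e) => panic!("{}", e),
};
if matches.opt_present("h") {
print_usage(&program, opts);
return;
}
let data_path = &matches.free[0];
let city: &str = &matches.free[1];
let file = File::open(data_path).unwrap();
let mut rdr = csv::Reader::from_reader(file);
for row in rdr.decode::<Row>() {
let row = row.unwrap();
if row.city == city {
println!("{}, {}: {:?}",
row.city, row.country,
row.population.expect("population count")
);
}
}
}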
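Let’s outline the errors. We can start with the obvious: the three places where unwrap (or its cousin expect) is called:
1. File::open can return an io::Error.
2. csv::Reader::decode decodes one record at a time, and decoding a record can produce an error.
3. If row.population is None, then calling expect will panic.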
Are there any others? What if we can’t find a matching city? Tools
like grep will return an error code, so we probably should too. So we
have logic errors specific to our problem, IO errors and CSV parsing
errors. We’re going to explore two different ways to approach handling
these errors.
I’d like to start with Box<Error>. Later, we’ll see how defining our
own error type can be useful.
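Error handling with Box<Error>
Box<Error> is convenient because it just works: you don’t need to define your own error types, and you don’t need any From implementations. The downside is that since Box<Error> is a trait object, it erases the type, which means the compiler can no longer reason about its underlying type.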
Previously we started refactoring our code by changing the type
of our function from T to Result<T, OurErrorType>. In this case,
OurErrorType is only Box<Error>. But what’s T? And can we add a
return type to main?
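The answer to the second question is no, we can’t: in the version of Rust this section was written for, main could not return a value. (Since Rust 1.26, main may return a Result, but we’ll keep the error handling explicit here.) That means we’ll need to write a new function. But what is T? The simplest thing we can do is to return the matching rows as a Vec<PopulationCount>. (Better code would return an iterator, but that is left as an exercise to the reader.)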
Let’s refactor our code into its own function, but keep the calls
to unwrap. Note that we opt to handle the possibility of a missing
population count by simply ignoring that row.
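use std::path::Path;
struct Row {
// This struct remains unchanged.
}
struct PopulationCount {
city: String,
country: String,
// This is no longer an `Option` because values of this type are only
// constructed if they have a population count.
count: u64,
}
fn search<P: AsRef<Path>>(file_path: P, city: &str) -> Vec<PopulationCount> {
let mut found = vec![];
let file = File::open(file_path).unwrap();
let mut rdr = csv::Reader::from_reader(file);
for row in rdr.decode::<Row>() {
let row = row.unwrap();
match row.population {
None => { } // Skip it.
Some(count) => if row.city == city {
found.push(PopulationCount {
city: row.city,
country: row.country,
count: count,
});
},
}
}
found
}
fn main() {
...
let data_path = &matches.free[0];
let city: &str = &matches.free[1];
for pop in search(data_path, city) {
println!("{}, {}: {:?}", pop.city, pop.country, pop.count);
}
}
While we got rid of one use of expect, we still need to handle errors properly, including the case where the search finds nothing. To convert this to proper error handling, we need to:
1. Change the return type of search to Result<Vec<PopulationCount>, Box<Error>>.
2. Use the try! macro so that errors are returned to the caller instead of panicking the program.
3. Handle the error in main.
Let’s try it: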
fn search<P: AsRef<Path>>
(file_path: P, city: &str)
-> Result<Vec<PopulationCount>, Box<Error>> {
let mut found = vec![];
let file = try!(File::open(file_path));
let mut rdr = csv::Reader::from_reader(file);
for row in rdr.decode::<Row>() {
let row = try!(row);
match row.population {
None => { } // Skip it.
Some(count) => if row.city == city {
found.push(PopulationCount {
city: row.city,
country: row.country,
count: count,
});
},
}
}
if found.is_empty() {
Err(From::from("No matching cities with a population were found."))
} else {
Ok(found)
}
}
Instead of x.unwrap(), we now have try!(x). Since our function returns a Result<T, E>, the try! macro will return early from the function if an error occurs.
At the end of search we also convert a plain string to an error type
by using the corresponding From impls:
// We are making use of this impl in the code above, since we call `From::from`
// on a `&'static str`.
impl<'a> From<&'a str> for Box<Error>
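Here is a small sketch of that conversion on its own. The function first_positive is hypothetical. The point is that From::from (or .into()) turns a string slice into a Box<dyn Error> whose Display output is the original string:

```rust
use std::error::Error;

// Hypothetical helper: the first positive number in `xs`, or an error.
fn first_positive(xs: &[i32]) -> Result<i32, Box<dyn Error>> {
    match xs.iter().find(|&&x| x > 0) {
        Some(&x) => Ok(x),
        // `&'static str` -> `Box<dyn Error>` via the `From` impl above.
        None => Err(From::from("no positive numbers were found")),
    }
}

fn main() {
    match first_positive(&[-1, -2]) {
        Ok(x) => println!("{}", x),
        Err(err) => println!("Error: {}", err),
    }
}
```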
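Reading from stdin
Our program accepts a single file for input and makes one pass over the data, so it would be natural to also accept input on stdin. But maybe we like the current format too, so let’s have both!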
Adding support for stdin is actually quite easy. There are only three
things we have to do:
1. Tweak the program arguments so that a single parameter, the city, can be accepted while the population data is read from stdin.
2. Modify the program so that an option, -f, can name the input file when the data is not passed on stdin.
3. Modify the search function to take an optional file path. When None, it should know to read from stdin.
First, here’s the new usage:
fn print_usage(program: &str, opts: Options) {
println!("{}", opts.usage(&format!("Usage: {} [options] <city>", program)));
}
Of course we need to adapt the argument handling code:
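...
let mut opts = Options::new();
opts.optopt("f", "file", "Choose an input file, instead of using STDIN.", "NAME");
opts.optflag("h", "help", "Show this usage message.");
...
let data_path = matches.opt_str("f");
let city = if !matches.free.is_empty() {
&matches.free[0]
} else {
print_usage(&program, opts);
return;
};
match search(&data_path, city) {
Ok(pops) => {
for pop in pops {
println!("{}, {}: {:?}", pop.city, pop.country, pop.count);
}
}
Err(err) => println!("{}", err)
}
...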
We’ve made the user experience a bit nicer by showing the usage mes-
sage, instead of a panic from an out-of-bounds index, when city, the
remaining free argument, is not present.
Modifying search is slightly trickier. The csv crate can build a parser out of any type that implements io::Read. But how can we use the same code for both a File and stdin? There are a couple of ways we could go about this. One way is to write search such that it is generic on some type parameter R that satisfies io::Read. Another way is to use trait objects:
use std::io;
fn search<P: AsRef<Path>>
(file_path: &Option<P>, city: &str)
-> Result<Vec<PopulationCount>, Box<Error>> {
let mut found = vec![];
let input: Box<io::Read> = match *file_path {
None => Box::new(io::stdin()),
Some(ref file_path) => Box::new(try!(File::open(file_path))),
};
let mut rdr = csv::Reader::from_reader(input);
// The rest remains unchanged!
}
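The text also mentions a generic alternative: making the function generic over a type parameter R: io::Read. Here is a sketch contrasting the two styles without the csv crate. count_lines_containing and open_input are hypothetical helpers:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

// Generic version: compiled separately for each concrete reader type `R`.
fn count_lines_containing<R: Read>(input: R, city: &str) -> io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(input).lines() {
        if line?.contains(city) {
            count += 1;
        }
    }
    Ok(count)
}

// Trait-object version: one return type covers both stdin and a file.
fn open_input(file_path: Option<&str>) -> io::Result<Box<dyn Read>> {
    let input: Box<dyn Read> = match file_path {
        None => Box::new(io::stdin()),
        Some(path) => Box::new(File::open(path)?),
    };
    Ok(input)
}

fn main() {
    let path = std::env::args().nth(1);
    match open_input(path.as_deref()).and_then(|input| count_lines_containing(input, "Boston")) {
        Ok(n) => println!("{} matching lines", n),
        Err(err) => println!("Error: {}", err),
    }
}
```

The two combine naturally: Box<dyn Read> itself implements Read, so the boxed input can be passed to the generic function.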
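Error handling with a custom type
Previously, we learned how to compose errors using a custom error type: we defined the error type as an enum and implemented Error and From for it. Since this program has three distinct errors (IO, CSV parsing and not found), let’s define an enum with three variants:
#[derive(Debug)]
enum CliError {
Io(io::Error),
Csv(csv::Error),
NotFound,
}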
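As before, CliError should also implement fmt::Display and Error; those impls follow the same pattern as for the earlier CliError.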
Before we can use our CliError type in our search function, we need to
provide a couple From impls. How do we know which impls to provide?
Well, we’ll need to convert from both io::Error and csv::Error to
CliError. Those are the only external errors, so we’ll only need two
From impls for now:
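Here is a sketch of those two From impls, together with Display and Error impls and a toy search. To keep it self-contained without the csv crate, a hypothetical RowError type stands in for csv::Error. The toy parse_row and search functions are also hypothetical and read "city,population" lines from a string:

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical stand-in for `csv::Error`.
#[derive(Debug)]
struct RowError(String);

impl fmt::Display for RowError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "bad row: {}", self.0)
    }
}

#[derive(Debug)]
enum CliError {
    Io(io::Error),
    Csv(RowError),
    NotFound,
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            CliError::Io(ref err) => write!(f, "{}", err),
            CliError::Csv(ref err) => write!(f, "{}", err),
            CliError::NotFound => write!(f, "No matching cities with a population were found."),
        }
    }
}

impl Error for CliError {}

// The two `From` impls: one per external error type.
impl From<io::Error> for CliError {
    fn from(err: io::Error) -> CliError {
        CliError::Io(err)
    }
}

impl From<RowError> for CliError {
    fn from(err: RowError) -> CliError {
        CliError::Csv(err)
    }
}

// Hypothetical row parser for lines like "Boston,600000".
fn parse_row(line: &str) -> Result<(String, u64), RowError> {
    let mut fields = line.split(',');
    match (fields.next(), fields.next().and_then(|p| p.trim().parse::<u64>().ok())) {
        (Some(city), Some(pop)) => Ok((city.to_string(), pop)),
        _ => Err(RowError(line.to_string())),
    }
}

// `?` converts `RowError` into `CliError::Csv` through `From`.
fn search(data: &str, city: &str) -> Result<Vec<u64>, CliError> {
    let mut found = vec![];
    for line in data.lines() {
        let (name, pop) = parse_row(line)?;
        if name == city {
            found.push(pop);
        }
    }
    if found.is_empty() { Err(CliError::NotFound) } else { Ok(found) }
}

fn main() {
    match search("Boston,600000\nParis,2100000", "Tokyo") {
        Ok(pops) => println!("{:?}", pops),
        Err(err) => println!("{}", err),
    }
}
```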
Adding functionality
Writing generic code is great, because generalizing stuff is cool, and it can then be useful later. But sometimes, the juice isn’t worth the squeeze. Look at what we just did in the previous step:
1. Defined a new error type.
2. Added impls for Error, Display and two for From.
The big downside here is that our program didn’t improve a whole
lot. There is quite a bit of overhead to representing errors with enums,
especially in short programs like this.
One useful aspect of using a custom error type like we’ve done here
is that the main function can now choose to handle errors differently.
Previously, with Box<Error>, it didn’t have much of a choice: just print
the message. We’re still doing that here, but what if we wanted to,
say, add a --quiet flag? The --quiet flag should silence any verbose
output.
Right now, if the program doesn’t find a match, it will output a
message saying so. This can be a little clumsy, especially if you intend
for the program to be used in shell scripts.
So let’s start by adding the flags. Like before, we need to tweak the usage string and add a flag to our Options value, opts. Once we’ve done that, Getopts does the rest:
...
let mut opts = Options::new();
opts.optopt("f", "file", "Choose an input file, instead of using STDIN.", "NAME");
opts.optflag("h", "help", "Show this usage message.");
opts.optflag("q", "quiet", "Silences errors and warnings.");
...
Now we only need to implement our “quiet” functionality. This requires
us to tweak the case analysis in main:
use std::process;
...
match search(&data_path, city) {
Err(CliError::NotFound) if matches.opt_present("q") => process::exit(1),
Err(err) => panic!("{}", err),
Ok(pops) => for pop in pops {
println!("{}, {}: {:?}", pop.city, pop.country, pop.count);
}
}
...
Certainly, we don’t want to be quiet if there was an IO error or if the
data failed to parse. Therefore, we use case analysis to check if the
error type is NotFound and if --quiet has been enabled. If the search
failed, we still quit with an exit code (following grep’s convention).
If we had stuck with Box<Error>, then it would be pretty tricky to
implement the --quiet functionality.
This pretty much sums up our case study. From here, you should
be ready to go out into the world and write your own programs and
libraries with proper error handling.
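Box<T>
Box<T> is an owned pointer, or a “box”. While it can hand out references to the contained data, it is the only owner of the data. In particular, consider the following:
let x = Box::new(1);
let y = x;
// `x` is no longer accessible here.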
Here, the box was moved into y. As x no longer owns it, the compiler
will no longer allow the programmer to use x after this. A box can
similarly be moved out of a function by returning it.
When a box (that hasn’t been moved) goes out of scope, destructors
are run. These destructors take care of deallocating the inner data.
This is a zero-cost abstraction for dynamic allocation. If you want to
allocate some memory on the heap and safely pass around a pointer to
that memory, this is ideal. Note that you will only be allowed to share
references to this by the regular borrowing rules, checked at compile
time.
*const T and *mut T
These are C-like raw pointers with no lifetime or ownership attached to them. The only guarantee that these provide is that they cannot be dereferenced except in code marked unsafe.
These are useful when building safe, low cost abstractions like Vec<T>, but should be avoided in safe code.
Rc<T>
This is the first wrapper we will cover that has a runtime cost.
Rc<T> is a reference counted pointer. In other words, this lets us
have multiple “owning” pointers to the same data, and the data will be
dropped (destructors will be run) when all pointers are out of scope.
Internally, it contains a shared “reference count” (also called “ref-
count”), which is incremented each time the Rc is cloned, and decre-
mented each time one of the Rcs goes out of scope. The main responsi-
bility of Rc<T> is to ensure that destructors are called for shared data.
The internal data here is immutable, and if a cycle of references
is created, the data will be leaked. If we want data that doesn’t leak
when there are cycles, we need a garbage collector.
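Here is a small sketch of the reference count at work, using Rc::clone and Rc::strong_count from the standard library:

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(vec![1, 2, 3]);
    println!("count after new: {}", Rc::strong_count(&a)); // 1
    {
        // Cloning the `Rc` bumps the count; the Vec itself is not copied.
        let b = Rc::clone(&a);
        println!("count after clone: {}", Rc::strong_count(&a)); // 2
        println!("shared data: {:?}", b);
    } // `b` goes out of scope here, so the count drops
    println!("count after scope: {}", Rc::strong_count(&a)); // 1
}
```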
Guarantees The main guarantee provided here is that the data will
not be destroyed until all references to it are out of scope.
This should be used when we wish to dynamically allocate and share some data (read-only) between various portions of our program, where it is not certain which portion will finish using the pointer last. It’s a viable alternative to &T when &T is either impossible to statically check for correctness, or would make the code so unergonomic that the development cost isn’t worth paying.
This pointer is not thread safe, and Rust will not let it be sent or
shared with other threads. This lets one avoid the cost of atomics in
situations where they are unnecessary.
There is a sister smart pointer to this one, Weak<T>. This is a non-
owning, but also non-borrowed, smart pointer. It is also similar to &T,
but it is not restricted in lifetime—a Weak<T> can be held on to forever.
However, it is possible that an attempt to access the inner data may fail
and return None, since this can outlive the owned Rcs. This is useful
for cyclic data structures and other things.
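Here is a sketch of Weak<T> in action. Rc::downgrade creates a Weak, and Weak::upgrade returns an Option<Rc<T>>, which becomes None once the last Rc is gone:

```rust
use std::rc::{Rc, Weak};

fn main() {
    let strong = Rc::new(String::from("shared"));
    let weak: Weak<String> = Rc::downgrade(&strong);

    // While an `Rc` is alive, `upgrade` succeeds.
    if let Some(s) = weak.upgrade() {
        println!("still alive: {}", s);
    }

    drop(strong); // the last owning pointer is gone; the String is dropped

    // The weak pointer outlived the data, so access now fails.
    match weak.upgrade() {
        Some(s) => println!("unexpectedly alive: {}", s),
        None => println!("data was dropped"),
    }
}
```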
Cell types
Cells provide interior mutability. In other words, they contain data
which can be manipulated even if the type cannot be obtained in a
mutable form (for example, when it is behind an &-ptr or Rc<T>).
The documentation for the cell module has a pretty good expla-
nation for these.
These types are generally found in struct fields, but they may be
found elsewhere too.
Cell<T>
Cell<T> is a type that provides zero-cost interior mutability by moving
data in and out of the cell. Since the compiler knows that all the data
owned by the contained value is on the stack, there’s no worry of leaking
any data behind references (or worse!) by simply replacing the data.
It is still possible to violate your own invariants using this wrapper,
so be careful when using it. If a field is wrapped in Cell, it’s a nice
indicator that the chunk of data is mutable and may not stay the same
between the time you first read it and when you intend to use it.
use std::cell::Cell;
let x = Cell::new(1);
let y = &x;
let z = &x;
x.set(2);
y.set(3);
z.set(4);
println!("{}", x.get());
Note that here we were able to mutate the same value from various
immutable references.
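This has the same runtime cost as the following code, which is shown only for comparison and would not compile, since it takes two simultaneous &mut borrows of x:
// Illustration only: rejected by the borrow checker.
let mut x = 1;
let y = &mut x;
let z = &mut x;
x = 2;
*y = 3;
*z = 4;
println!("{}", x);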
RefCell<T>
RefCell<T> also provides interior mutability, but doesn’t move data in
and out of the cell.
However, it has a runtime cost. RefCell<T> enforces the read-
write lock pattern at runtime (it’s like a single-threaded mutex), unlike
&T/&mut T which do so at compile time. This is done by the borrow()
and borrow_mut() functions, which modify an internal reference count
and return smart pointers which can be dereferenced immutably and
mutably respectively. The refcount is restored when the smart pointers
go out of scope. With this system, we can dynamically ensure that
there are never any other borrows active when a mutable borrow is
active. If the programmer attempts to make such a borrow, the thread
will panic.
use std::cell::RefCell;
let x = RefCell::new(vec![1,2,3,4]);
{
println!("{:?}", *x.borrow())
}
{
let mut my_ref = x.borrow_mut();
my_ref.push(1);
}
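Here is a sketch of the runtime check. borrow_mut panics if the value is already borrowed, while the standard library’s try_borrow_mut reports the conflict as an Err instead, which makes the check easy to observe:

```rust
use std::cell::RefCell;

fn main() {
    let x = RefCell::new(vec![1, 2, 3, 4]);
    {
        let reader = x.borrow(); // a shared borrow is now active...
        // ...so a mutable borrow is refused at runtime.
        assert!(x.try_borrow_mut().is_err());
        println!("{:?}", *reader);
    } // `reader` is dropped here, releasing the shared borrow
    x.borrow_mut().push(5); // succeeds now
    println!("{:?}", x.borrow());
}
```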
Similar to Cell, this is mainly useful for situations where it’s hard or
impossible to satisfy the borrow checker. Generally we know that such
mutations won’t happen in a nested form, but it’s good to check.
For large, complicated programs, it becomes useful to put some
things in RefCells to make things simpler. For example, a lot of the
maps in the ctxt struct in the Rust compiler internals are inside this
wrapper. These are only modified once (during creation, which is not
right after initialization) or a couple of times in well-separated places.
However, since this struct is pervasively used everywhere, juggling mu-
table and immutable pointers would be hard (perhaps impossible) and
probably form a soup of &-ptrs which would be hard to extend. On
the other hand, the RefCell provides a cheap (not zero-cost) way of
safely accessing these. In the future, if someone adds some code that
attempts to modify the cell when it’s already borrowed, it will cause a
(usually deterministic) panic which can be traced back to the offending
borrow.
Similarly, in Servo’s DOM there is a lot of mutation, most of which
is local to a DOM type, but some of which crisscrosses the DOM and
modifies various things. Using RefCell and Cell to guard all mutation
lets us avoid worrying about mutability everywhere, and it simultane-
ously highlights the places where mutation is actually happening.
Note that RefCell should be avoided if a mostly simple solution is
possible with & pointers.
Synchronous types
Many of the types above cannot be used in a threadsafe manner. Par-
ticularly, Rc<T> and RefCell<T>, which both use non-atomic refer-
ence counts (atomic reference counts are those which can be incre-
mented from multiple threads without causing a data race), cannot be
used this way. This makes them cheaper to use, but we need thread
safe versions of these too. They exist, in the form of Arc<T> and Mutex<T>/RwLock<T>.
Note that the non-threadsafe types cannot be sent between threads,
and this is checked at compile time.
There are many useful wrappers for concurrent programming in the
sync module, but only the major ones will be covered below.
Arc<T>
Arc<T> is a version of Rc<T> that uses an atomic reference count (hence,
“Arc”). This can be sent freely between threads.
C++’s shared_ptr is similar to Arc; however, in the case of C++
the inner data is always mutable. For semantics similar to those of
C++, we should use Arc<Mutex<T>>, Arc<RwLock<T>>, or Arc<UnsafeCell<T>>3
(UnsafeCell<T> is a cell type that can be used to hold any data and
has no runtime cost, but accessing it requires unsafe blocks). The last
one should only be used if we are certain that the usage won’t cause
any memory unsafety. Remember that writing to a struct is not an
atomic operation, and many functions like vec.push() can reallocate
internally and cause unsafe behavior, so even monotonicity may not be
enough to justify UnsafeCell.
3 Arc<UnsafeCell<T>> can’t actually be shared between threads, since UnsafeCell<T>
isn’t Sync (and Arc<T> is only Send when T is Send + Sync), but we can wrap it in a
type and implement Send/Sync for it manually (with unsafe impl) to get
Arc<Wrapper<T>> where Wrapper is struct Wrapper<T>(UnsafeCell<T>).
Guarantees
Like Rc, this provides the (thread safe) guarantee that the destructor for the internal
data will be run when the last Arc goes out of scope (barring any cycles).
Cost
This has the added cost of using atomics for changing the refcount (which will
happen whenever it is cloned or goes out of scope). When sharing data from an Arc
in a single thread, it is preferable to share & pointers whenever possible.
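The Arc<Mutex<T>> option listed above can be sketched like this; the counter and function name are illustrative assumptions, and std::sync::Mutex is the standard library type:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Each of `n_threads` threads adds 1 to a shared counter `increments` times.
fn count_in_threads(n_threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();
    for _ in 0..n_threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments {
                // lock() blocks until the mutex is free; the guard is a temporary,
                // so the lock is released at the end of this statement.
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Bind the value first so the guard is dropped before `counter` is.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_threads(4, 1000)); // 4000
}
```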
Mutex<T> and RwLock<T>
Mutex<T> and RwLock<T> provide mutual exclusion via RAII guards: the
inner data can only be reached through the guard returned by locking,
and the lock is released when the guard’s destructor runs.
{
    let mut guard = mutex.lock().unwrap();
    // `guard` dereferences mutably to the inner type.
    *guard += 1;
} // Lock is released when destructor runs.
RwLock has the added benefit of being efficient for multiple reads. It
is always safe to have multiple readers to shared data as long as there
are no writers, and RwLock lets readers acquire a “read lock”. Such
locks can be acquired concurrently and are kept track of via a reference
count. Writers must obtain a “write lock” which can only be obtained
when all readers have gone out of scope.
Composition
A common gripe when reading Rust code is with types like Rc<RefCell<Vec<T>>>
(or even more complicated compositions of such types). It’s not always
clear what the composition does, or why the author chose one like this
(and when one should be using such a composition in one’s own code).
Usually, it’s a case of composing together the guarantees that you
need, without paying for stuff that is unnecessary.
For example, Rc<RefCell<T>> is one such composition. Rc<T> itself
can’t be dereferenced mutably: because Rc<T> provides sharing, and
shared mutability can lead to unsafe behavior, we put RefCell<T>
inside to get dynamically verified shared mutability. Now we have
shared mutable data, but it’s shared in a way that there can only be
one mutator (and no readers) or multiple readers.
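A short illustration of Rc<RefCell<T>> (hypothetical names, standard library only): several handles share one Vec, readers may coexist, and a would-be mutator is refused while any reader is alive:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Two extra handles share ownership of one mutable log (hypothetical example).
fn shared_log() -> Vec<String> {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let writer_a = Rc::clone(&log);
    let writer_b = Rc::clone(&log);

    // Each push takes a short-lived mutable borrow through a different handle.
    writer_a.borrow_mut().push("from a".to_string());
    writer_b.borrow_mut().push("from b".to_string());

    {
        // Several readers may coexist...
        let r1 = log.borrow();
        let r2 = writer_a.borrow();
        assert_eq!(r1.len(), r2.len());
        // ...but a mutator is refused while any reader is alive.
        assert!(writer_b.try_borrow_mut().is_err());
    }

    // Bind first so the Ref temporary is dropped before `log`.
    let snapshot = log.borrow().clone();
    snapshot
}

fn main() {
    println!("{:?}", shared_log()); // ["from a", "from b"]
}
```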
Now, we can take this a step further, and have Rc<RefCell<Vec<T>>>
or Rc<Vec<RefCell<T>>>. These are both shareable, mutable vectors,
but they’re not the same.
With the former, the RefCell<T> is wrapping the Vec<T>, so the
Vec<T> in its entirety is mutable. At the same time, there can only
be one mutable borrow of the whole Vec at a given time. This means
that your code cannot simultaneously work on different elements of the
vector from different Rc handles. However, we are able to push and
pop from the Vec<T> at will. This is similar to a &mut Vec<T> with
the borrow checking done at runtime.
With the latter, the borrowing is of individual elements, but the
overall vector is immutable. Thus, we can independently borrow separate
elements, but we cannot push or pop from the vector. This is
similar to a &mut [T]4, but, again, the borrow checking is at runtime.
In concurrent programs, we have a similar situation with Arc<Mutex<T>>,
which provides shared mutability and ownership.
When reading code that uses these, go through it step by step and look
at the guarantees/costs provided.
When choosing a composed type, we must do the reverse: figure out
which guarantees we want, and at which point of the composition we
need them. For example, if there is a choice between Vec<RefCell<T>>
and RefCell<Vec<T>>, we should figure out the tradeoffs as done above
and pick one.
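The difference between Rc<RefCell<Vec<T>>> and Rc<Vec<RefCell<T>>> can be seen in a small sketch (illustrative names, standard library only), using try_borrow_mut to observe conflicts instead of panicking:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Contrasts where the RefCell sits; returns the final contents of both vectors.
fn compare_compositions() -> (Vec<i32>, Vec<i32>) {
    // Rc<RefCell<Vec<T>>>: one RefCell guards the whole Vec.
    let whole: Rc<RefCell<Vec<i32>>> = Rc::new(RefCell::new(vec![1, 2]));
    let other = Rc::clone(&whole);
    other.borrow_mut().push(3); // growing the Vec is allowed
    {
        let _guard = whole.borrow_mut();
        // Any second mutable borrow, even to touch a different element, is refused.
        assert!(other.try_borrow_mut().is_err());
    }

    // Rc<Vec<RefCell<T>>>: each element has its own RefCell; the Vec itself is fixed.
    let per_elem: Rc<Vec<RefCell<i32>>> = Rc::new(vec![RefCell::new(1), RefCell::new(2)]);
    let handle = Rc::clone(&per_elem);
    {
        let mut a = per_elem[0].borrow_mut();
        let mut b = handle[1].borrow_mut(); // different element, same time: allowed
        *a += 10;
        *b += 20;
    }
    // per_elem.push(...) would not compile: Rc only gives shared access to the Vec.

    let whole_out = whole.borrow().clone();
    let per_elem_out: Vec<i32> = per_elem.iter().map(|c| *c.borrow()).collect();
    (whole_out, per_elem_out)
}

fn main() {
    println!("{:?}", compare_compositions()); // ([1, 2, 3], [11, 22])
}
```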
Introduction
This guide will use the snappy compression/decompression library as
an introduction to writing bindings for foreign code. Rust is currently
unable to call directly into a C++ library, but snappy includes a C
interface (documented in snappy-c.h).
4 &[T] and &mut [T] are slices; they consist of a pointer and a length and can
refer to a portion of a vector or array. &mut [T] can have its elements mutated,
however its length cannot be touched.
[dependencies]
libc = "0.2.0"
use libc::size_t;
#[link(name = "snappy")]
extern {
    fn snappy_max_compressed_length(source_length: size_t) -> size_t;
}
fn main() {
    let x = unsafe { snappy_max_compressed_length(100) };
    println!("max compressed length of a 100 byte buffer: {}", x);
}
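snappy may not be installed everywhere, so the same declare-then-call pattern can be tried against the C standard library, which Rust programs on mainstream hosted targets (Linux, macOS, Windows) already link. This is a sketch under that assumption; abs and strlen are standard C functions, and c_strlen is a hypothetical helper:

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Two functions from the C standard library. No #[link] attribute is needed
// here because the standard library already links the platform's libc.
extern "C" {
    fn abs(x: c_int) -> c_int;
    fn strlen(s: *const c_char) -> usize; // size_t matches usize on mainstream targets
}

/// Safe wrapper: every CString is NUL-terminated, which is what strlen requires.
fn c_strlen(s: &CString) -> usize {
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let n = unsafe { abs(-42) };
    let s = CString::new("snappy").unwrap();
    println!("abs(-42) = {}, strlen = {}", n, c_strlen(&s)); // abs(-42) = 42, strlen = 6
}
```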
use libc::{c_int, size_t};
#[link(name = "snappy")]
extern {
    fn snappy_compress(input: *const u8,
                       input_length: size_t,
                       compressed: *mut u8,
                       compressed_length: *mut size_t) -> c_int;
    fn snappy_uncompress(compressed: *const u8,
                         compressed_length: size_t,
                         uncompressed: *mut u8,
                         uncompressed_length: *mut size_t) -> c_int;
    fn snappy_max_compressed_length(source_length: size_t) -> size_t;
    fn snappy_uncompressed_length(compressed: *const u8,
                                  compressed_length: size_t,
                                  result: *mut size_t) -> c_int;
    fn snappy_validate_compressed_buffer(compressed: *const u8,
                                         compressed_length: size_t) -> c_int;
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn valid() {
let d = vec![0xde, 0xad, 0xd0, 0x0d];
let c: &[u8] = &compress(&d);
assert!(validate_compressed_buffer(c));
assert!(uncompress(c) == Some(d));
}
#[test]
fn invalid() {
let d = vec![0, 0, 0, 0];
assert!(!validate_compressed_buffer(&d));
assert!(uncompress(&d).is_none());
}
#[test]
fn empty() {
let d = vec![];
assert!(!validate_compressed_buffer(&d));
assert!(uncompress(&d).is_none());
let c = compress(&d);
assert!(validate_compressed_buffer(&c));
assert!(uncompress(&c) == Some(d));
}
}
Destructors
Foreign libraries often hand off ownership of resources to the calling
code. When this occurs, we must use Rust’s destructors to provide
safety and guarantee the release of these resources (especially in the
case of panic).
For more about destructors, see the Drop trait.
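A minimal sketch of such an owning wrapper, assuming a hosted target whose C library provides malloc and free. Here they stand in for a foreign library's allocate/release pair, and CBuffer is a hypothetical name. Because the release happens in Drop, it also runs when a panic unwinds through the owner:

```rust
use std::os::raw::c_void;
use std::ptr;
use std::slice;

extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

/// Owns `len` bytes obtained from C's malloc; Drop gives them back with free.
pub struct CBuffer {
    ptr: *mut u8,
    len: usize,
}

impl CBuffer {
    pub fn new(len: usize) -> Option<CBuffer> {
        // Ask for at least one byte: malloc(0) may legally return NULL.
        let ptr = unsafe { malloc(len.max(1)) } as *mut u8;
        if ptr.is_null() {
            return None;
        }
        // Zero the bytes so the slices below never expose uninitialized memory.
        unsafe { ptr::write_bytes(ptr, 0, len) };
        Some(CBuffer { ptr, len })
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for CBuffer {
    fn drop(&mut self) {
        // Runs on normal scope exit and also while unwinding from a panic.
        unsafe { free(self.ptr as *mut c_void) };
    }
}

fn main() {
    let mut buf = CBuffer::new(4).expect("malloc failed");
    buf.as_mut_slice().copy_from_slice(&[1, 2, 3, 4]);
    println!("{:?}", buf.as_slice()); // [1, 2, 3, 4]
} // `buf` goes out of scope here, and Drop calls free.
```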
Rust code:
extern fn callback(a: i32) {
    println!("I'm called from C with value {0}", a);
}
#[link(name = "extlib")]
extern {
    fn register_callback(cb: extern fn(i32)) -> i32;
    fn trigger_callback();
}
fn main() {
    unsafe {
        register_callback(callback);
        trigger_callback(); // Triggers the callback.
    }
}
C code:
#include <stdint.h>
typedef void (*rust_callback)(int32_t);
rust_callback cb;
int32_t register_callback(rust_callback callback) {
    cb = callback;
    return 1;
}
void trigger_callback() {
    cb(7); // Will call callback(7) in Rust.
}
In this example Rust’s main() will call trigger_callback() in C,
which would, in turn, call back to callback() in Rust.
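The same direction of call (C calling back into Rust) can be tried without writing any C, using the standard C library's qsort, which takes a comparison callback. A sketch, assuming a hosted target whose libc exports qsort; compare_i32 and sort_with_c are hypothetical names:

```rust
use std::os::raw::{c_int, c_void};

extern "C" {
    // void qsort(void *base, size_t nmemb, size_t size,
    //            int (*compar)(const void *, const void *));
    fn qsort(
        base: *mut c_void,
        nmemb: usize,
        size: usize,
        compar: extern "C" fn(*const c_void, *const c_void) -> c_int,
    );
}

// Called from inside C's qsort for each comparison. It should not panic:
// a panic must not unwind through the C frames of qsort.
extern "C" fn compare_i32(a: *const c_void, b: *const c_void) -> c_int {
    let (a, b) = unsafe { (*(a as *const i32), *(b as *const i32)) };
    a.cmp(&b) as c_int // Ordering is -1, 0 or 1, as qsort expects
}

fn sort_with_c(values: &mut [i32]) {
    unsafe {
        qsort(
            values.as_mut_ptr() as *mut c_void,
            values.len(),
            std::mem::size_of::<i32>(),
            compare_i32,
        );
    }
}

fn main() {
    let mut v = vec![5, -1, 3, 0];
    sort_with_c(&mut v);
    println!("{:?}", v); // [-1, 0, 3, 5]
}
```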
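#[repr(C)]
struct RustObject {
    a: i32,
    // Other members...
}
extern "C" fn callback(target: *mut RustObject, a: i32) {
    println!("I'm called from C with value {0}", a);
    unsafe {
        // Update the value in RustObject with the value received from the callback:
        (*target).a = a;
    }
}
#[link(name = "extlib")]
extern {
    fn register_callback(target: *mut RustObject,
                         cb: extern fn(*mut RustObject, i32)) -> i32;
    fn trigger_callback();
}
fn main() {
    // Create the object that will be referenced in the callback:
    let mut rust_object = Box::new(RustObject { a: 5 });
    unsafe {
        register_callback(&mut *rust_object, callback);
        trigger_callback();
    }
}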
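C code:
#include <stdint.h>
typedef void (*rust_callback)(void*, int32_t);
void* cb_target;
rust_callback cb;
int32_t register_callback(void* callback_target, rust_callback callback) {
    cb_target = callback_target;
    cb = callback;
    return 1;
}
void trigger_callback() {
    cb(cb_target, 7); // Will call callback(&rustObject, 7) in Rust.
}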
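To see the pointer round trip without compiling any C, the C side can be mimicked in Rust. In this sketch FakeCLib is a hypothetical stand-in for the C library, and the callback receives *mut c_void (as a C void* would be declared) rather than *mut RustObject, casting back to the concrete type itself:

```rust
use std::os::raw::c_void;

/// Hypothetical stand-in for the C side: it keeps an untyped target pointer
/// and a callback, like `cb_target` and `cb` in the C code.
struct FakeCLib {
    cb_target: *mut c_void,
    cb: Option<extern "C" fn(*mut c_void, i32)>,
}

impl FakeCLib {
    fn register_callback(&mut self, target: *mut c_void, cb: extern "C" fn(*mut c_void, i32)) {
        self.cb_target = target;
        self.cb = Some(cb);
    }

    fn trigger_callback(&self) {
        if let Some(cb) = self.cb {
            cb(self.cb_target, 7);
        }
    }
}

#[repr(C)]
struct RustObject {
    a: i32,
}

extern "C" fn callback(target: *mut c_void, a: i32) {
    // Recover the type that was erased when the pointer was registered.
    let obj = unsafe { &mut *(target as *mut RustObject) };
    obj.a = a;
}

fn run_demo() -> i32 {
    let mut obj = Box::new(RustObject { a: 5 });
    let mut lib = FakeCLib { cb_target: std::ptr::null_mut(), cb: None };
    // The Box keeps the object at a stable heap address while `lib` holds the pointer.
    lib.register_callback(&mut *obj as *mut RustObject as *mut c_void, callback);
    lib.trigger_callback();
    obj.a
}

fn main() {
    println!("{}", run_demo()); // 7
}
```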
Asynchronous callbacks
In the previously given examples the callbacks are invoked as a direct
reaction to a function call to the external C library. The control over the
current thread is switched from Rust to C to Rust for the execution
of the callback, but in the end the callback is executed on the same
thread that called the function which triggered the callback.
Things get more complicated when the external library spawns its
own threads and invokes callbacks from there. In these cases access
to Rust data structures inside the callbacks is especially unsafe and
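proper synchronization mechanisms must be used. Besides classical
synchronization mechanisms like mutexes, one possibility in Rust is to
use channels (in std::sync::mpsc) to forward data from the C thread
that invoked the callback into a Rust thread.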
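One way to apply this is to have the callback forward each value through a std::sync::mpsc channel. In the sketch below the external library is simulated by a Rust thread; fake_library_run and EVENT_TX are hypothetical, and a real library would start its own threads. The callback does only the minimum (sending the value), and all further processing happens on the receiving Rust thread:

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::thread;

// Where the callback finds its Sender. A real C API would more likely pass a
// void* user-data pointer; a global keeps this sketch short.
// (Mutex::new in a static needs Rust 1.63 or later.)
static EVENT_TX: Mutex<Option<Sender<i32>>> = Mutex::new(None);

// Invoked on the library's thread: do as little as possible, just forward.
extern "C" fn on_event(value: i32) {
    if let Some(tx) = EVENT_TX.lock().unwrap().as_ref() {
        let _ = tx.send(value); // ignore the error if the receiver is gone
    }
}

// Hypothetical stand-in for a C library that fires callbacks from its own thread.
fn fake_library_run(cb: extern "C" fn(i32)) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for v in 1..=3 {
            cb(v);
        }
    })
}

fn collect_events() -> Vec<i32> {
    let (tx, rx) = mpsc::channel();
    *EVENT_TX.lock().unwrap() = Some(tx);
    fake_library_run(on_event).join().unwrap();
    // Drop the stored Sender so the channel closes and the iterator below ends.
    *EVENT_TX.lock().unwrap() = None;
    let events: Vec<i32> = rx.iter().collect();
    events
}

fn main() {
    println!("{:?}", collect_events()); // [1, 2, 3]
}
```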
Linking
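The link attribute on extern blocks provides the basic building block
for instructing rustc how it will link to native libraries. There are two
accepted forms of the link attribute today:
• #[link(name = "foo")]
• #[link(name = "foo", kind = "bar")]
In both of these cases, foo is the name of the native library that we’re
linking to, and in the second case bar is the type of native library that
the compiler is linking to. There are currently three known types of
native libraries:
• Dynamic - #[link(name = "readline")]
• Static - #[link(name = "my_build_dependency", kind = "static")]
• Frameworks - #[link(name = "CoreFoundation", kind = "framework")]
Note that frameworks are only available on macOS targets.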
Unsafe blocks
Some operations, like dereferencing raw pointers or calling functions
that have been marked unsafe, are only allowed inside unsafe blocks.
Unsafe blocks isolate unsafety and are a promise to the compiler that
the unsafety does not leak out of the block.
Unsafe functions, on the other hand, advertise their unsafety to the
world. An unsafe function is written like this:
unsafe fn kaboom(ptr: *const i32) -> i32 { *ptr }
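A sketch of how kaboom is typically used: the unsafe call is confined to a small safe wrapper whose argument type guarantees what the unsafe function needs (read_via_kaboom is a hypothetical name):

```rust
// The function from the text: dereferencing a raw pointer is unsafe,
// so the whole function is marked `unsafe`.
unsafe fn kaboom(ptr: *const i32) -> i32 {
    *ptr
}

// Safe wrapper: the pointer comes from a live reference, so it is
// non-null, aligned, and valid for reads for the duration of the call.
fn read_via_kaboom(value: &i32) -> i32 {
    unsafe { kaboom(value as *const i32) }
}

fn main() {
    let x = 10;
    println!("{}", read_via_kaboom(&x)); // 10
}
```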
#[link(name = "readline")]
extern {
static rl_readline_version: libc::c_int;
}
fn main() {
println!("You have readline version {} installed.",
unsafe { rl_readline_version as i32 });
}
use std::ffi::CString;
use std::ptr;
#[link(name = "readline")]
extern {
    static mut rl_prompt: *const libc::c_char;
}
fn main() {
    let prompt = CString::new("[my-awesome-shell] $").unwrap();
    unsafe {
        rl_prompt = prompt.as_ptr();
        // Copy the pointer out first: println! would otherwise take a reference to the static mut.
        let current = rl_prompt;
        println!("{:?}", current);
        rl_prompt = ptr::null();
    }
}
Note that all interaction with a static mut is unsafe, both reading
and writing. Dealing with global mutable state requires a great deal of
care.
This applies to the entire extern block. The list of supported ABI
constraints is:
• stdcall
• aapcs
• cdecl
• fastcall
• Rust
• rust-intrinsic
• system
• C
• win64
• sysv64
Most of the ABIs in this list are self-explanatory, but the system ABI
may seem a little odd. This constraint selects whatever the appropriate
ABI is for interoperating with the target’s libraries. For example, on
Win32 with an x86 architecture, this means that the ABI used would be
stdcall. On x86_64, however, Windows uses the C calling convention,
so C would be used. This means that in our previous example, we could
have used extern "system" { ... } to define a block for all Windows
systems, not only x86 ones.
Variadic functions
In C, functions can be ‘variadic’, meaning they accept a variable number
of arguments. This can be achieved in Rust by specifying ...
within the argument list of a foreign function declaration:
extern {
    fn foo(x: i32, ...);
}
fn main() {
    unsafe {
        foo(10, 20, 30, 40, 50);
    }
}
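A runnable variant that calls a real variadic C function, snprintf, assuming a Unix-like target where libc exports it (with the Windows MSVC runtime it may exist only as a header inline). Variadic arguments must already have C-compatible types; format_pair is a hypothetical helper:

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

extern "C" {
    // int snprintf(char *str, size_t size, const char *format, ...);
    fn snprintf(buf: *mut c_char, size: usize, format: *const c_char, ...) -> c_int;
}

/// Formats two integers with C's "%d + %d".
fn format_pair(a: c_int, b: c_int) -> String {
    let mut buf = [0 as c_char; 64];
    unsafe {
        // The variadic arguments are c_int, matching the two %d conversions.
        snprintf(buf.as_mut_ptr(), buf.len(), b"%d + %d\0".as_ptr() as *const c_char, a, b);
        // snprintf always NUL-terminates within `size`, so this read stays in bounds.
        CStr::from_ptr(buf.as_ptr()).to_string_lossy().into_owned()
    }
}

fn main() {
    println!("{}", format_pair(10, 20)); // 10 + 20
}
```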
use std::os::raw::c_int;
/// This fairly odd function is the callback.
extern "C" fn apply(process: Option<extern "C" fn(c_int) -> c_int>, int: c_int) -> c_int {
    match process {
        Some(f) => f(int),
        None => int * int,
    }
}
extern "C" {
    /// Registers the callback.
    fn register(cb: Option<extern "C" fn(Option<extern "C" fn(c_int) -> c_int>, c_int) -> c_int>);
}
fn main() {
    unsafe {
        register(Some(apply));
    }
}
No transmute required!
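The reason Option<extern "C" fn(...)> can stand in for a nullable C function pointer is that None is represented as the null value, so the Option is no larger than the bare pointer. A small check, with its own apply of the same shape as the callback above and a hypothetical double callback:

```rust
use std::mem::size_of;
use std::os::raw::c_int;

type Transform = extern "C" fn(c_int) -> c_int;

// None plays the role of a NULL function pointer coming from C.
extern "C" fn apply(process: Option<Transform>, x: c_int) -> c_int {
    match process {
        Some(f) => f(x),
        None => x * x,
    }
}

extern "C" fn double(x: c_int) -> c_int {
    x * 2
}

fn main() {
    // No extra tag is needed: Option<fn> is exactly pointer-sized.
    assert_eq!(size_of::<Option<Transform>>(), size_of::<Transform>());
    println!("{} {}", apply(Some(double), 5), apply(None, 5)); // 10 25
}
```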
#[no_mangle]
pub extern fn hello_rust() -> *const u8 {
"Hello, world!\0".as_ptr()
}
use std::panic::catch_unwind;
#[no_mangle]
pub extern fn oh_no() -> i32 {
let result = catch_unwind(|| {
panic!("Oops!");
});
match result {
Ok(_) => 0,
Err(_) => 1,
}
}
fn main() {}
Please note that catch_unwind will only catch unwinding panics, not
those that abort the process. See the documentation of catch_unwind
for more information.
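The same idea as a sketch of an exported function that reports a panic as an error code instead of letting it unwind into C. checked_div and its out-parameter convention are hypothetical, and this relies on the default panic = "unwind" strategy:

```rust
use std::panic::catch_unwind;

/// Hypothetical export: on success writes a / b to *out and returns 0;
/// if the division panics (b == 0, or i32::MIN / -1), returns 1 and leaves *out alone.
/// A real library export would normally also carry #[no_mangle].
pub extern "C" fn checked_div(a: i32, b: i32, out: *mut i32) -> i32 {
    match catch_unwind(|| a / b) {
        Ok(v) => {
            if !out.is_null() {
                unsafe { *out = v };
            }
            0
        }
        // The panic stops here instead of unwinding into the C caller.
        Err(_) => 1,
    }
}

fn main() {
    let mut out = 0;
    let status = checked_div(7, 2, &mut out);
    println!("status {}, out {}", status, out); // status 0, out 3
    let status = checked_div(1, 0, &mut out);
    println!("status {}", status); // status 1 (the panic message goes to stderr)
}
```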
extern "C" {
pub fn foo(arg: *mut libc::c_void);
pub fn bar(arg: *mut libc::c_void);
}
This is a perfectly valid way of handling the situation. However, we
can do a bit better. Some C libraries will instead declare a struct
whose details and memory layout are private. This gives some amount
of type safety. These structures are called ‘opaque’. Here’s an
example, in C:
struct Foo; /* Foo is a structure, but its contents are
not part of the public interface */
struct Bar;
void foo(struct Foo *arg);
void bar(struct Bar *arg);
To do this in Rust, let’s create our own opaque types with enum:
pub enum Foo {}
pub enum Bar {}
extern "C" {
pub fn foo(arg: *mut Foo);
pub fn bar(arg: *mut Bar);
}
By using an enum with no variants, we create an opaque type that we
can’t instantiate, as it has no variants. But because our Foo and Bar
types are different, we’ll get type safety between the two of them, so
we cannot accidentally pass a pointer to Foo to bar().
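Later guidance, in the FFI chapter of the Rustonomicon, recommends a zero-sized #[repr(C)] struct instead of an empty enum, because an empty enum is uninhabited and the compiler may assume no value of it ever exists. A sketch of that alternative, keeping the Foo/Bar names from above (foo and bar are only declared, never called, so nothing needs to be linked):

```rust
use std::marker::{PhantomData, PhantomPinned};
use std::mem::size_of;

// Zero-sized and #[repr(C)]; the private fields mean it can't be constructed
// outside this module, and the marker makes it !Send, !Sync and !Unpin.
#[repr(C)]
pub struct Foo {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

#[repr(C)]
pub struct Bar {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

extern "C" {
    pub fn foo(arg: *mut Foo);
    pub fn bar(arg: *mut Bar); // passing a *mut Foo here is still a type error
}

fn main() {
    assert_eq!(size_of::<Foo>(), 0);
}
```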
Borrow
The Borrow trait is used when you’re writing a data structure, and you
want to use either an owned or borrowed type as synonymous for some
purpose.
For example, HashMap has a get method which uses Borrow:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where K: Borrow<Q>,
          Q: Hash + Eq
The K parameter is the type of key the HashMap uses. So, looking at the
signature of get() again, we can use get() when the key implements
Borrow<Q>. That way, we can make a HashMap which uses String keys,
but use &strs when we’re searching:
use std::collections::HashMap;
let mut map = HashMap::new();
map.insert("Foo".to_string(), 42);
assert_eq!(map.get("Foo"), Some(&42));
This is because the standard library has impl Borrow<str> for String.
For most types, when you want to take an owned or borrowed type,
a &T is enough. But one area where Borrow is effective is when there’s
more than one kind of borrowed value. This is especially true of references
and slices: you can have either a &T or a &mut T. If we wanted
to accept both of these types, Borrow is up for it:
use std::borrow::Borrow;
use std::fmt::Display;
fn foo<T: Borrow<i32> + Display>(a: T) {
    println!("a is borrowed: {}", a);
}
let mut i = 5;
foo(&i);
foo(&mut i);
This will print out a is borrowed: 5 twice.
AsRef
The AsRef trait is a conversion trait. It’s used for converting some
value to a reference in generic code. Like this:
let s = "Hello".to_string();
fn foo<T: AsRef<str>>(s: T) {
let slice = s.as_ref();
}
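A sketch of a generic function with an AsRef<str> bound, accepting owned and borrowed strings alike (word_count is a hypothetical name). Compared with Borrow, AsRef makes no promise that the owned and borrowed forms hash or compare the same way, which is why HashMap::get is built on Borrow instead:

```rust
/// Accepts anything viewable as `&str`: String, &str, &String, Box<str>, ...
fn word_count<T: AsRef<str>>(text: T) -> usize {
    text.as_ref().split_whitespace().count()
}

fn main() {
    let s = "Hello".to_string();
    println!("{}", word_count(&s)); // 1 (borrows the String)
    println!("{}", word_count("two words")); // 2 (a &str literal)
    println!("{}", word_count(s)); // 1 (moves the String)
}
```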
Release Channels
The Rust project uses a concept called ‘release channels’ to manage
releases. It’s important to understand this process to choose which
version of Rust your project should use.
Overview
There are three channels for Rust releases:
• Nightly
• Beta
• Stable
New nightly releases are created once a day. Every six weeks, the latest
nightly release is promoted to ‘Beta’. At that point, it will only receive
patches to fix serious errors. Six weeks later, the beta is promoted to
‘Stable’, and becomes the next release of 1.x.
This process happens in parallel. So every six weeks, on the same
day, nightly goes to beta, beta goes to stable. When 1.x is released, at
the same time, 1.(x + 1)-beta is released, and the nightly becomes
the first version of 1.(x + 2)-nightly.
Choosing a version
Generally speaking, unless you have a specific reason, you should be
using the stable release channel. These releases are intended for a
general audience.
However, depending on your interest in Rust, you may choose to use
nightly instead. The basic tradeoff is this: in the nightly channel, you
can use unstable, new Rust features. However, unstable features are
subject to change, and so any new nightly release may break your code.
If you use the stable release, you cannot use experimental features,
but the next release of Rust will not cause significant issues through
breaking changes.
language: rust
rust:
  - nightly
  - beta
  - stable
matrix:
  allow_failures:
    - rust: nightly
With this configuration, Travis will test all three channels, but if
something breaks on nightly, it won’t fail your build. A similar configuration
is recommended for any CI system; check the documentation of the one
you’re using for more details.
#![no_std]
#[derive(Debug)]
struct Point {
x: i32,
y: i32,
}
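struct Point {
    x: i32,
    y: i32,
}
use std::fmt;
impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Point {{ x: {}, y: {} }}", self.x, self.y)
    }
}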
Hello World
So the first thing we need to do is start a new crate for our project.
#[derive(HelloWorld)]
struct Pancakes;
fn main() {
Pancakes::hello_world();
}
With some kind of nice output, like Hello, World! My name is
Pancakes.
Let’s go ahead and write up what we think our macro will look like
from a user perspective. In src/main.rs we write:
#[macro_use]
extern crate hello_world_derive;
trait HelloWorld {
fn hello_world();
}
#[derive(HelloWorld)]
struct FrenchToast;
#[derive(HelloWorld)]
struct Waffles;
fn main() {
FrenchToast::hello_world();
Waffles::hello_world();
}
Great. So now we just need to actually write the procedural macro. At
the moment, procedural macros need to be in their own crate. Eventually,
this restriction may be lifted, but for now, it’s required. As such,
there’s a convention: for a crate named foo, a custom derive procedural
macro is called foo-derive. Let’s start a new crate called
hello-world-derive inside our hello-world project.
$ cargo new hello-world-derive
To make sure that our hello-world crate is able to find this new crate
we’ve created, we’ll add it to our Cargo.toml:
[dependencies]
hello-world-derive = { path = "hello-world-derive" }
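use proc_macro::TokenStream;
#[proc_macro_derive(HelloWorld)]
pub fn hello_world(input: TokenStream) -> TokenStream {
    // Construct a string representation of the type definition
    let s = input.to_string();
    let ast = syn::parse_derive_input(&s).unwrap(); // Parse the string representation
    let gen = impl_hello_world(&ast); // Build the impl
    gen.parse().unwrap() // Return the generated impl
}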
So there is a lot going on here. We have introduced two new crates: syn
and quote. As you may have noticed, input: TokenStream is immediately
converted to a String. This String is a string representation of
the Rust code for which we are deriving HelloWorld. At the moment,
the only thing you can do with a TokenStream is convert it to a string.
A richer API will exist in the future.
So what we really need is to be able to parse Rust code into something
usable. This is where syn comes into play. syn is a crate for parsing
Rust code. The other crate we’ve introduced is quote. It’s essentially
the dual of syn, as it will make generating Rust code really easy. We
could write this stuff on our own, but it’s much simpler to use these
libraries. Writing a full parser for Rust code is no simple task.
The comments seem to give us a pretty good idea of our overall
strategy. We are going to take a String of the Rust code for the type
we are deriving, parse it using syn, construct the implementation of
hello_world (using quote), then pass it back to the Rust compiler.
One last note: you’ll see some unwrap()s there. If you want to
provide an error for a procedural macro, then you should panic! with
the error message. In this case, we’re keeping it as simple as possible.
Great, so let’s write impl_hello_world(&ast).
So this is where quote comes in. The ast argument is a struct that
gives us a representation of our type (which can be either a struct
or an enum). Check out the docs; there is some useful information
there. We are able to get the name of the type using ast.ident.
The quote! macro lets us write up the Rust code that we wish to
return and convert it into Tokens. quote! lets us use some really cool
templating mechanics; we simply write #name and quote! will replace
it with the variable named name. You can even do some repetition
similar to how regular macros work. You should check out the docs for a
good introduction.
So I think that’s it. Oh, well, we do need to add dependencies for
syn and quote in the Cargo.toml for hello-world-derive.
[dependencies]
syn = "0.11.11"
quote = "0.3.15"
[lib]
proc-macro = true
Custom Attributes
In some cases it might make sense to allow users some kind of
configuration. For example, the user might want to override the name that
is printed in the hello_world() method.
This can be achieved with custom attributes:
#[derive(HelloWorld)]
#[HelloWorldName = "the best Pancakes"]
struct Pancakes;
fn main() {
Pancakes::hello_world();
}
If we try to compile this though, the compiler will respond with an
error:
error: The attribute `HelloWorldName` is currently unknown
to the compiler and may have meaning added to it in the
future (see issue #29642)
The compiler needs to know that we’re handling this attribute and to
not respond with an error. This is done in the hello-world-derive
crate by adding attributes to the proc_macro_derive attribute:
#[proc_macro_derive(HelloWorld, attributes(HelloWorldName))]
pub fn hello_world(input: TokenStream) -> TokenStream
Raising Errors
Let’s assume that we do not want to accept enums as input to our
custom derive method.
This condition can be easily checked with the help of syn. But how
do we tell the user that we do not accept enums? The idiomatic way
to report errors in procedural macros is to panic:
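fn impl_hello_world(ast: &syn::DeriveInput) -> quote::Tokens {
    let name = &ast.ident;
    // Check if derive(HelloWorld) was specified for a struct
    if let syn::Body::Struct(_) = ast.body {
        // Yes, this is a struct
        quote! {
            impl HelloWorld for #name {
                fn hello_world() {
                    println!("Hello, World! My name is {}", stringify!(#name));
                }
            }
        }
    } else {
        // Nope. This is an enum; we cannot handle these!
        panic!("#[derive(HelloWorld)] is only defined for structs, not for enums!");
    }
}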
Appendix
Chapter 1
Glossary
Arity
Arity refers to the number of arguments a function or operation takes.
Bounds
Bounds are constraints on a type or trait. For example, if a bound is
placed on the argument a function takes, types passed to that function
must abide by that constraint.
Combinators
Combinators are higher-order functions that apply only functions and
earlier defined combinators to provide a result from its arguments.
They can be used to manage control flow in a modular fashion.
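For instance, the combinators on Option and Result chain small steps without explicit matching. A sketch (parse_and_double is a hypothetical name):

```rust
/// Parses a decimal integer and doubles it; None on bad input or overflow.
fn parse_and_double(input: &str) -> Option<i32> {
    input
        .trim()
        .parse::<i32>()
        .ok() // Result<i32, _> -> Option<i32>
        .and_then(|n| n.checked_mul(2)) // None if doubling overflows
}

fn main() {
    println!("{:?}", parse_and_double(" 21 ")); // Some(42)
    println!("{:?}", parse_and_double("abc")); // None
}
```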
Expression
In computer programming, an expression is a combination of values,
constants, variables, operators and functions that evaluate to a single
value. For example, 2 + (3 * 4) is an expression that returns the
value 14. It is worth noting that expressions can have side-effects. For
example, a function included in an expression might perform actions
other than simply returning a value.
Expression-Oriented Language
In early programming languages, expressions and statements were two
separate syntactic categories: expressions had a value and statements
did things. However, later languages blurred this distinction, allowing
expressions to do things and statements to have a value. In an
expression-oriented language, (nearly) every statement is an expression
and therefore returns a value. Consequently, these expression statements
can themselves form part of larger expressions.
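Rust is expression-oriented in this sense: if and blocks produce values. A small sketch (describe is a hypothetical name):

```rust
/// Builds a description using `if` and a block as value-producing expressions.
fn describe(n: i32) -> String {
    // `if` is an expression: both branches yield a &str.
    let sign = if n < 0 { "negative" } else { "non-negative" };
    // A block is an expression too; its value is its final expression.
    let parity = {
        let r = n % 2; // -3 % 2 == -1 in Rust, so compare against 0
        if r == 0 { "even" } else { "odd" }
    };
    format!("{} and {}", sign, parity)
}

fn main() {
    println!("{}", describe(-3)); // negative and odd
}
```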
Statement
In computer programming, a statement is the smallest standalone element
of a programming language that commands a computer to perform
an action.
Chapter 2
Syntax Index
Keywords
• as: primitive casting, or disambiguating the specific trait containing
an item. See Casting Between Types (as), Universal Function
Call Syntax (Angle-bracket Form), Associated Types.
• const: constant items and constant raw pointers. See const and
static, Raw Pointers.
• type: type alias, and associated type definition. See type Aliases,
Associated Types.
• use: import symbols into scope. See Crates and Modules (Importing
Modules with use).
• & (&type, &mut type, &'a type, &'a mut type): borrowed
pointer type. See References and Borrowing.
• * (*expr): dereference.
• -> (fn(…) -> type, |…| -> type): function and closure return
type. See Functions, Closures.
• => (pat => expr): part of match arm syntax. See Match.
Other Syntax
• 'ident: named lifetime or loop label. See Lifetimes, Loops
(Loops Labels).
• T: 'a: generic type T must outlive lifetime 'a. When we say that
a type ‘outlives’ the lifetime, we mean that it cannot transitively
contain any references with lifetimes shorter than 'a.
Bibliography
Type system
• Region based memory management in Cyclone
Concurrency
• Singularity: rethinking the software stack
• Epoch-based reclamation.
Others
• Crash-only software