Strings

Strings are an important concept for any programmer to master. Rust’s string handling system is a bit different from other languages, due to its systems focus. Any time you have a data structure of variable size, things can get tricky, and strings are a re-sizable data structure. That being said, Rust’s strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid encoding of UTF-8 sequences. Additionally, unlike some systems languages, strings are not null-terminated and can contain null bytes.

Rust has two main types of strings: &str and String. Let’s talk about &str first. These are called ‘string slices’. A string slice has a fixed size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.

let greeting = "Hello there."; // greeting: &'static str

"Hello there." is a string literal and its type is &'static str. A string literal is a string slice that is statically allocated, meaning that it’s saved inside our compiled program, and exists for the entire duration it runs. The greeting binding is a reference to this statically allocated string. Any function expecting a string slice will also accept a string literal.

String literals can span multiple lines. There are two forms. The first will include the newline and the leading spaces:

let s = "foo
    bar";

assert_eq!("foo\n        bar", s);

The second, with a \, trims the spaces and the newline:

let s = "foo\
    bar";

assert_eq!("foobar", s);

Rust has more than only &strs though. A String, is a heap-allocated string. This string is growable, and is also guaranteed to be UTF-8. Strings are commonly created by converting from a string slice using the to_string method.

let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);

Strings will coerce into &str with an &:

fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}

This coercion does not happen for functions that accept one of &str’s traits instead of &str. For example, [TcpStream::connect]connect has a parameter of type ToSocketAddrs. A &str is okay but a String must be explicitly converted using &*.

use std::net::TcpStream;

TcpStream::connect("192.168.0.1:3000"); // &str parameter

let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str

Viewing a String as a &str is cheap, but converting the &str to a String involves allocating memory. No reason to do that unless you have to!

Indexing

Because strings are valid UTF-8, strings do not support indexing:

let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!

Usually, access to a vector with [] is very fast. But, because each character in a UTF-8 encoded string can be multiple bytes, you have to walk over the string to find the nᵗʰ letter of a string. This is a significantly more expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’ isn’t something defined in Unicode, exactly. We can choose to look at a string as individual bytes, or as codepoints:

let hachiko = "忠犬ハチ公";

for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!("");

for c in hachiko.chars() {
    print!("{}, ", c);
}

println!("");

This prints:

229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,

As you can see, there are more bytes than chars.

You can get something similar to an index like this:

# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]

This emphasizes that we have to walk from the beginning of the list of chars.

Slicing

You can get a slice of a string with slicing syntax:

let dog = "hachiko";
let hachi = &dog[0..5];

But note that these are byte offsets, not character offsets. So this will fail at runtime:

let dog = "忠犬ハチ公";
let hachi = &dog[0..2];

with this error:

thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'

Concatenation

If you have a String, you can concatenate a &str to the end of it:

let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;

But if you have two Strings, you need an &:

let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;

This is because &String can automatically coerce to a &str. This is a feature called ‘Deref coercions’.