Arrays and Strings
Arrays
Array
- Owned type
- Type includes size
- Types like [u8; 5], values like [0u8; 5] or [0u8, 1u8].
Slice
- Reference to a contiguous run of elements (all or part of an array, Vec, etc.): contents not owned
- Reference is a "fat pointer" that remembers the length
- Types like &[u8], values like &[0u8; 5] or &[0u8, 1u8] (which coerce to &[u8]). Can index with a range to get a "slice" of the underlying array, e.g.

    let a = [0u8, 1, 2, 3, 4];
    let s = &a[1..4];
    assert_eq!(s.len(), 3);
    assert_eq!(s[0], 1);
    assert_eq!(s[2], 3);
There are lots of cool operations on slices: see the textbook, The Book, and the docs.
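A small sampler of those slice operations, as a sketch. The helper name pairwise_sums is made up for illustration; sort, contains, first, last, split_at, windows and binary_search are standard methods on slices (and arrays get them through coercion).

```rust
// Hypothetical helper: sums each pair of neighbours using windows(2).
// Widens to u16 so the addition cannot overflow.
fn pairwise_sums(s: &[u8]) -> Vec<u16> {
    s.windows(2).map(|w| u16::from(w[0]) + u16::from(w[1])).collect()
}

fn main() {
    let mut a = [5u8, 3, 1, 4, 2];

    // sort() is a slice method; the array is coerced to &mut [u8].
    a.sort();
    assert_eq!(a, [1, 2, 3, 4, 5]);

    let s = &a[..];
    assert!(s.contains(&3));
    assert_eq!(s.first(), Some(&1));
    assert_eq!(s.last(), Some(&5));

    // split_at divides one slice into two non-overlapping slices.
    let (left, right) = s.split_at(2);
    assert_eq!(left, [1, 2]);
    assert_eq!(right, [3, 4, 5]);

    // windows yields overlapping sub-slices of the given length.
    assert_eq!(pairwise_sums(s), vec![3, 5, 7, 9]);

    // binary_search assumes the slice is sorted.
    assert_eq!(s.binary_search(&4), Ok(3));
}
```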
Vec
- Heap-allocated array-like object
- Length is tracked at runtime; storage grows as needed and is freed automatically
- Types like Vec<u8>, values like Vec::new() or vec![1u8, 2, 3]
- Can append to a Vec with push(), remove the last element with pop() (which returns an Option, since the Vec may be empty)
Can slice a Vec, e.g.

    let mut a = vec![0u8, 1, 2, 3, 4];
    assert_eq!(a.len(), 5);
    a.push(5);
    assert_eq!(a.len(), 6);
    let s = &a[1..4];
    assert_eq!(s.len(), 3);
    assert_eq!(s[0], 1);
    assert_eq!(s[2], 3);
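Because a Vec derefs to a slice, a function that takes &[u8] works for arrays, Vecs, and sub-slices alike. A sketch of that, plus pop() handing back an Option; the function name total is hypothetical.

```rust
// Hypothetical helper: sums any run of u8 values. Taking &[u8]
// (not &Vec<u8> or &[u8; N]) lets one function serve every caller.
fn total(xs: &[u8]) -> u32 {
    xs.iter().map(|&x| u32::from(x)).sum()
}

fn main() {
    let arr = [1u8, 2, 3];
    let mut v = vec![10u8, 20, 30];

    assert_eq!(total(&arr), 6);     // &[u8; 3] coerces to &[u8]
    assert_eq!(total(&v), 60);      // &Vec<u8> derefs to &[u8]
    assert_eq!(total(&v[1..]), 50); // a sub-slice of the Vec

    // pop() returns Some(last) until the Vec is empty, then None.
    assert_eq!(v.pop(), Some(30));
    assert_eq!(v.pop(), Some(20));
    assert_eq!(v.pop(), Some(10));
    assert_eq!(v.pop(), None);
}
```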
Indexing
Can only index with a value of type usize
Indices are bounds-checked at runtime (an out-of-range index panics); the compiler can often hoist or eliminate the checks
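When an index might be out of range, get() returns an Option instead of panicking. A sketch; lookup is a hypothetical wrapper name.

```rust
// Hypothetical helper: returns None instead of panicking when i is out of range.
fn lookup(a: &[u8], i: usize) -> Option<u8> {
    a.get(i).copied()
}

fn main() {
    let a = [10u8, 20, 30];
    assert_eq!(a[2], 30);             // in range: plain indexing is fine
    assert_eq!(lookup(&a, 2), Some(30));
    assert_eq!(lookup(&a, 3), None);  // indexing with 3 would panic instead

    // Indices must be usize; other integer types need an explicit conversion.
    let j: i32 = 1;
    assert_eq!(a[j as usize], 20);
}
```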
It may be more efficient and readable to iterate over values or references than to index:

    let mut a: Vec<u8> = (0..5).collect();
    for i in 0..a.len() {
        a[i] += 1;
    }

vs

    let mut a: Vec<u8> = (0..5).collect();
    for v in a.iter_mut() {
        *v += 1;
    }
Strings (ugh)
Lots of stuff here
tl;dr:
- There's char, which is a Unicode code point (strictly, a Unicode "scalar value")
- There's str, which is a sequence of UTF-8 encoded bytes
- There's String, which is the "owned" version of str
- There's
Chars
"Character" can mean a lot of things. There's 7-bit ASCII, 8-bit "Latin-1", and a million per-language coding standards.
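In many languages there can be some debate about what constitutes a single "character". Even in English, is a ligature like ff a character?
Rust's char is a Unicode "code point" (strictly, a scalar value). It's a 32-bit quantity, but not all possible values are legal: only 0 through 0x10FFFF, excluding the surrogate range 0xD800 through 0xDFFF
There are the usual character classifiers and converters (is_alphabetic(), is_whitespace(), to_uppercase(), ...), which work on full Unicode
There's a bit of ASCII-only support (is_ascii_digit() and friends), but you probably shouldn't normally use it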
You can always cast a char to an integer type big enough to hold it (char as u32 always works; as to a narrower type silently truncates)
You can't cast an integer to char with as, except for u8: otherwise you need to use std::char::from_u32() or something like it. It returns an Option: Some if the input is a legal Unicode code point, None otherwise
Note that case conversions can't return a single character, because uppercase ←→ lowercase is not always 1:1 ('ß' uppercases to "SS"). So to_uppercase() and to_lowercase() return a char iterator, which is super-annoying
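A sketch of the char conversions above. It uses char::from_u32, the associated-function form of std::char::from_u32; the helper name upper is made up for illustration. Non-ASCII characters are written as \u{...} escapes so the source survives any encoding mishaps.

```rust
// Hypothetical helper: collects the (possibly multi-char) uppercase form of c.
fn upper(c: char) -> String {
    c.to_uppercase().collect()
}

fn main() {
    // Classifiers understand all of Unicode, not just ASCII.
    assert!('\u{e9}'.is_alphabetic());   // 'é'
    assert!('\u{3000}'.is_whitespace()); // ideographic space

    // char -> integer: `as u32` (or u32::from) never loses information...
    assert_eq!('A' as u32, 65);
    assert_eq!(u32::from('\u{20ac}'), 0x20AC); // '€'
    // ...but `as u8` keeps only the low 8 bits.
    assert_eq!('\u{20ac}' as u8, 0xAC);

    // integer -> char: only u8 may use `as`; anything else goes through
    // from_u32, which returns an Option.
    assert_eq!(65u8 as char, 'A');
    assert_eq!(char::from_u32(0x41), Some('A'));
    assert_eq!(char::from_u32(0xD800), None);   // surrogate: not legal
    assert_eq!(char::from_u32(0x110000), None); // beyond the Unicode range

    // Case conversion yields an iterator because the result may be
    // more than one char.
    assert_eq!(upper('a'), "A");
    assert_eq!(upper('\u{df}'), "SS"); // 'ß'
    assert_eq!('\u{df}'.to_uppercase().count(), 2);
}
```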
str
It is deemed "not best practice" to store strings as sequences of 32-bit code points. So a variable-length encoding called UTF-8 is used for strings. This encoding stores ASCII characters as themselves (one byte each) and uses multibyte sequences (two to four bytes) for non-ASCII code points. A UTF-8 string never takes more than 4 bytes per code point, and is almost always much smaller than that
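To see the variable-length encoding at work, compare byte length (len()) with code-point count (chars().count()). A sketch; sizes is a hypothetical helper, and the sample text uses \u{...} escapes for the non-ASCII characters.

```rust
// Hypothetical helper: (bytes in UTF-8, number of code points).
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    assert_eq!(sizes("hello"), (5, 5));             // ASCII: 1 byte each
    assert_eq!(sizes("h\u{e9}llo"), (6, 5));        // "héllo": 'é' is 2 bytes
    assert_eq!(sizes("\u{65e5}\u{672c}"), (6, 2));  // "日本": 3 bytes each
    assert_eq!(sizes("\u{1F980}"), (4, 1));         // crab emoji: 4 bytes
    assert_eq!('\u{e9}'.len_utf8(), 2);
}
```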
A Rust str is like an array (really, a slice) of bytes, except that the bytes are guaranteed to be valid UTF-8 text. A str is unsized, so it is really only useful behind a pointer: &str, Box<str>, and so on
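Like a slice, a str can be indexed with a range, but the range counts bytes, not characters, and both ends must fall on character boundaries. A sketch (the sample string is an assumption for illustration):

```rust
fn main() {
    let s = "h\u{e9}llo"; // "héllo": 'é' occupies bytes 1..3

    // Ranges index bytes, not characters.
    assert_eq!(&s[0..1], "h");
    assert_eq!(&s[1..3], "\u{e9}");

    // Byte 2 is in the middle of 'é', so &s[0..2] would panic.
    assert!(!s.is_char_boundary(2));
    // get() returns None instead of panicking.
    assert_eq!(s.get(0..2), None);
    assert_eq!(s.get(3..), Some("llo"));

    // Being unsized, str lives behind a pointer such as Box<str>.
    let boxed: Box<str> = Box::from("abc");
    assert_eq!(boxed.len(), 3);
}
```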
String and &str
Let's just refer to &str and String values collectively as "strings"
An &str is a reference to a str. It is a fat pointer that contains the length of the str in bytes. The normal borrow rules apply
A String is an owned, growable, heap-allocated str. It holds a pointer, the length of the str in bytes, and the allocated capacity
Because String is owned, you can modify the contained bytes. However, all the safe methods provided for this are guaranteed to preserve UTF-8 encoding of the bytes. This is carefully tuned to avoid trouble
You can get an iterator over a string's code points (.chars()) or u8 bytes (.bytes())
There is no convenient way to go to a given character (code point) position in a string: chars().nth(n) has to scan from the start. If you plan to do that a lot, use chars().collect() to get a Vec<char>
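A sketch tying these together: growing a String, borrowing it as &str, and the difference between bytes(), chars() and a collected Vec<char>. The helper char_at is hypothetical, and the size_of check reflects how current compilers lay out &str (pointer plus length).

```rust
// Hypothetical helper: the n-th code point, found by scanning from the start.
fn char_at(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    // A String owns its bytes and can grow; the safe methods keep it UTF-8.
    let mut s = String::from("caf");
    s.push('\u{e9}');       // push one char: now "café"
    s.push_str(" au lait"); // push a &str
    assert_eq!(s, "caf\u{e9} au lait");
    assert!(s.capacity() >= s.len());

    // Borrow the String as &str: a (pointer, length) fat pointer.
    let r: &str = &s;
    assert_eq!(std::mem::size_of::<&str>(), 2 * std::mem::size_of::<usize>());
    assert_eq!(r.len(), 13);           // bytes: 'é' takes 2
    assert_eq!(r.chars().count(), 12); // code points

    // bytes() sees the raw UTF-8; chars() decodes it.
    let first: Vec<u8> = r.bytes().take(5).collect();
    assert_eq!(first, vec![b'c', b'a', b'f', 0xC3, 0xA9]);
    assert_eq!(char_at(r, 3), Some('\u{e9}'));

    // For repeated random access, decode once into a Vec<char>.
    let cs: Vec<char> = r.chars().collect();
    assert_eq!(cs[3], '\u{e9}');
    assert_eq!(cs.len(), 12);
}
```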
String Methods
There are the obvious methods for converting between these types. Read the book carefully to learn the vocabulary
Many of the string manipulation functions take a "pattern". There are many kinds (a char, a &str, a slice of chars, a char-testing closure, ...): read p. 402 of the book for the details. You will use them all, eventually
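A sketch of several pattern kinds passed to the same family of methods (find, split, trim_matches); the input strings are made up for illustration.

```rust
fn main() {
    let s = "a,b;;c";

    // A &str pattern.
    assert_eq!(s.find(";;"), Some(3));

    // A char pattern.
    let parts: Vec<&str> = s.split(',').collect();
    assert_eq!(parts, vec!["a", "b;;c"]);

    // A slice of chars matches any one of them.
    let parts: Vec<&str> = s.split(&[',', ';'][..]).collect();
    assert_eq!(parts, vec!["a", "b", "", "c"]);

    // A closure taking a char and returning bool.
    assert_eq!("123abc456".trim_matches(|c: char| c.is_ascii_digit()), "abc");
}
```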
There's a regex crate (regex, on crates.io rather than in the standard library). It's OK.
You can use from_str() (from the FromStr trait) or .parse() to convert a string into something else; both return a Result. You can use to_string() to convert other things (anything implementing Display) into String
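A sketch of parsing and to_string(). Note that parse() needs to know the target type, via an annotation or a turbofish, and reports failure as an Err rather than panicking.

```rust
use std::str::FromStr;

fn main() {
    // Target type from the annotation...
    let n: i32 = "42".parse().unwrap();
    assert_eq!(n, 42);
    // ...or from a turbofish.
    assert_eq!("2.5".parse::<f64>(), Ok(2.5));

    // Bad input gives an Err.
    assert!("forty-two".parse::<i32>().is_err());

    // from_str is the FromStr trait method that parse() calls.
    assert_eq!(i32::from_str("-7"), Ok(-7));

    // to_string() works for anything implementing Display.
    assert_eq!(42.to_string(), "42");
    assert_eq!('x'.to_string(), "x");
    assert_eq!(true.to_string(), "true");
}
```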
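You can use .as_bytes() (borrows) and .into_bytes() (consumes the String) to grab the bytes of a string for free, without copying
The ::from_utf8() methods come in checked (from_utf8(), returns a Result), lossy (from_utf8_lossy(), substitutes U+FFFD for invalid bytes) and unsafe (from_utf8_unchecked()) flavors. Choose wisely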
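A sketch of the round trip from String to bytes and back, with each from_utf8 flavor. The byte values are chosen for illustration: 0xFF never appears in valid UTF-8.

```rust
fn main() {
    let s = String::from("h\u{e9}"); // "hé"

    // as_bytes borrows; into_bytes consumes the String. Neither copies.
    assert_eq!(s.as_bytes(), &[0x68, 0xC3, 0xA9]);
    let bytes: Vec<u8> = s.into_bytes();

    // Checked: returns a Result.
    let back = String::from_utf8(bytes).unwrap();
    assert_eq!(back, "h\u{e9}");

    let bad = vec![0x68, 0xFF];
    assert!(String::from_utf8(bad.clone()).is_err());
    assert!(std::str::from_utf8(&bad).is_err()); // &str flavor, also checked

    // Lossy: invalid sequences become U+FFFD; the result is a Cow<str>.
    assert_eq!(String::from_utf8_lossy(&bad), "h\u{FFFD}");

    // Unchecked: unsafe, so the caller must guarantee valid UTF-8.
    let ok = vec![0x68, 0x69];
    let hi = unsafe { String::from_utf8_unchecked(ok) };
    assert_eq!(hi, "hi");
}
```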