## Arrays

• Array

• Owned type
• Type includes size
• Types like [u8;5], values like [0u8;5] or [0u8, 1u8].
• Slice

• Reference to an array — contents not owned
• Reference is "smart pointer" that remembers the length
• Types like &[u8], values like &[0u8;5] or &[0u8, 1u8].
• Can index with a range to get a "slice" of the underlying array, e.g.

let a = [0u8, 1, 2, 3, 4];
let s = &a[1..4];
assert_eq!(s.len(), 3);
assert_eq!(s[0], 1);
assert_eq!(s[2], 3);

• There are lots of cool operations on slices; see the textbook and The Book and the docs

• Vec

• Heap-allocated array-like object
• Length is tracked at runtime: storage is managed
• Types like Vec<u8>, values like Vec::new() or vec![1u8, 2, 3]
• Can append to a Vec with push(), extract last with pop()
• Can slice a Vec, e.g.

let mut a = vec![0u8, 1, 2, 3, 4];
assert_eq!(a.len(), 5);
a.push(5);
assert_eq!(a.len(), 6);
let s = &a[1..4];
assert_eq!(s.len(), 3);
assert_eq!(s[0], 1);
assert_eq!(s[2], 3);
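A minimal sketch of how the three fit together, assuming a made-up helper named `sum`: a function that takes `&[u8]` accepts an array, a slice of an array, or a Vec. It also shows a handful of the slice operations mentioned above.

```rust
/// Sum the bytes of any contiguous sequence. Taking `&[u8]` lets the
/// caller pass an array, a slice of an array, or a Vec.
/// (`sum` is a hypothetical helper for this example.)
fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    // Array: the length 5 is part of the type.
    let a: [u8; 5] = [0, 1, 2, 3, 4];
    // Slice: borrows part of the array and carries its own length.
    let s: &[u8] = &a[1..4];
    // Vec: growable and heap-allocated.
    let mut v: Vec<u8> = a.to_vec();
    v.push(5);

    assert_eq!(sum(&a), 10);
    assert_eq!(sum(s), 6);
    assert_eq!(sum(&v), 15);

    // pop() returns an Option because the Vec might be empty.
    assert_eq!(v.pop(), Some(5));

    // A few slice operations; these also work on arrays and Vecs.
    assert!(s.contains(&2));
    assert_eq!(s.first(), Some(&1));
    let (left, right) = a.split_at(2);
    assert_eq!(left, [0, 1]);
    assert_eq!(right, [2, 3, 4]);
    let mut w = vec![3u8, 1, 2];
    w.sort();
    assert_eq!(w, [1, 2, 3]);
}
```

Writing functions against `&[T]` rather than `&Vec<T>` or `&[T; N]` is usually the more flexible choice.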


## Indexing

• Can only index with value of type usize

• Indices are bounds-checked at runtime, although a clever compiler can often hoist the checks out of loops or omit them entirely

• May be more efficient and readable to iterate over values or references than to do the indexing

    let mut a: Vec<u8> = (0..5).collect();
    for i in 0..a.len() {
        a[i] += 1;
    }


vs

    let mut a: Vec<u8> = (0..5).collect();
    for v in a.iter_mut() {
        *v += 1;
    }
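A small sketch of the usize rule and of checked access. The helper name `lookup` is made up for this example; `get()` and `usize::try_from()` are standard library calls.

```rust
use std::convert::TryFrom;

/// Look up position `i` in `data`, returning None instead of panicking
/// when the index is negative or out of bounds.
/// (`lookup` is a hypothetical helper for this example.)
fn lookup(data: &[u8], i: i64) -> Option<u8> {
    // try_from fails for negative values and for values too big for usize.
    let i = usize::try_from(i).ok()?;
    // get() does the bounds check and returns an Option.
    data.get(i).copied()
}

fn main() {
    let a = [10u8, 20, 30];

    // Indices must be usize, so other integer types need a conversion.
    let i: u32 = 2;
    assert_eq!(a[i as usize], 30);

    assert_eq!(lookup(&a, 1), Some(20));
    assert_eq!(lookup(&a, 3), None);
    assert_eq!(lookup(&a, -1), None);

    // enumerate() supplies the index when a loop needs both index and value.
    for (i, v) in a.iter().enumerate() {
        assert_eq!(a[i], *v);
    }
}
```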


## Strings (ugh)

• Lots of stuff here

• tl;dr:

• There's char which is a Unicode code point
• There's str which is a UTF-8 string of bytes
• There's String which is the "owned" version of str
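A quick sketch of the three types side by side. The function `shout` is made up for this example; it borrows a `&str` and hands back an owned `String`.

```rust
/// Borrow some text, return a new owned String.
/// (`shout` is a hypothetical helper for this example.)
fn shout(s: &str) -> String {
    let mut out = s.to_uppercase();
    out.push('!');
    out
}

fn main() {
    let c: char = 'é'; // one code point
    let literal: &str = "caf"; // borrowed UTF-8 text
    let mut owned: String = literal.to_string(); // owned, growable copy
    owned.push(c);
    assert_eq!(owned, "café");

    // A String can be borrowed as a &str at no cost.
    let borrowed: &str = &owned;
    assert_eq!(shout(borrowed), "CAFÉ!");
}
```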

## Chars

• "Character" can mean a lot of things. There's 7-bit ASCII, 8-bit "latin-1". There's a million per-language coding standards

• In many languages there can be some debate about what constitutes a single "character". Even in English, is a ligature like ff a character?

• Rust's char is a Unicode "code point" (strictly, a Unicode scalar value: any code point except the surrogates U+D800 to U+DFFF). It's a 32-bit quantity, but not all possible values are legal

• There are the usual character classifiers and converters, which work on full Unicode

• There's a bit of ASCII support, but you probably shouldn't normally use it

• You can always cast a char to an integer type with as. If the type is too small to hold the value, the cast silently truncates

• You can't cast an integer to char, with the exception of u8: for anything wider you need to use std::char::from_u32() or something like it. It returns an Option depending on whether the particular input is a legal Unicode code point

• Note that case conversions can't return a single character because uppercase ←→ lowercase is not always 1:1. So they return a char iterator, which is super-annoying
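A sketch of the casting and case-conversion rules above. The wrappers `to_char` and `upper` are made up for this example; the calls inside them are standard library functions.

```rust
/// Convert a raw u32 to a char, if it is a legal value.
/// (`to_char` is a hypothetical wrapper for this example.)
fn to_char(n: u32) -> Option<char> {
    std::char::from_u32(n)
}

/// Uppercase one char. The result is a String because the
/// conversion can produce more than one char.
/// (`upper` is a hypothetical wrapper for this example.)
fn upper(c: char) -> String {
    c.to_uppercase().collect()
}

fn main() {
    // char to integer: `as` always works, but small types truncate.
    assert_eq!('a' as u32, 97);
    assert_eq!('€' as u32, 0x20AC);
    assert_eq!('€' as u8, 0xAC);

    // Integer to char: only u8 can be cast directly.
    assert_eq!(97u8 as char, 'a');

    // Anything wider goes through a checked conversion.
    assert_eq!(to_char(0x41), Some('A'));
    assert_eq!(to_char(0xD800), None); // surrogate
    assert_eq!(to_char(0x11_0000), None); // past the end of Unicode

    // Classifiers work on full Unicode.
    assert!('é'.is_alphabetic());

    // Case conversion is not always 1:1.
    assert_eq!(upper('a'), "A");
    assert_eq!(upper('ß'), "SS");
}
```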

## str

• It is deemed "not best practice" to store strings as sequences of 32-bit code points. So a compressed encoding called UTF-8 is used for strings. This encoding stores ASCII characters as themselves, one byte each, and uses an escape convention to encode each non-ASCII character as two to four bytes. A UTF-8 string is almost always much smaller than 4x the number of code points

• A Rust str is like a slice of bytes, except that the bytes are guaranteed to be UTF-8 text. A str is unsized, so it almost always appears behind a reference such as &str
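To make the size difference concrete, here is a small sketch comparing byte counts with code point counts. The helper `counts` is made up for this example.

```rust
/// Return (number of UTF-8 bytes, number of code points) for a string.
/// (`counts` is a hypothetical helper for this example.)
fn counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII is stored as itself: one byte per character.
    assert_eq!(counts("hello"), (5, 5));
    // 'é' takes two bytes, so the byte count is one more than the char count.
    assert_eq!(counts("héllo"), (6, 5));
    // Each of these characters takes three bytes.
    assert_eq!(counts("日本語"), (9, 3));

    // len_utf8() reports the encoded size of a single char.
    assert_eq!('a'.len_utf8(), 1);
    assert_eq!('é'.len_utf8(), 2);
    assert_eq!('€'.len_utf8(), 3);
    assert_eq!('😀'.len_utf8(), 4);
}
```

Note that `len()` on a string is always the byte count, not the character count.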

## String and &str

• Let's just refer to &str and String values collectively as "strings"

• An &str is a reference to a str. It is a fat pointer that contains the size of the str in bytes. The normal borrow rules apply

• A String owns a heap-allocated str, much as a Vec<u8> owns its bytes. It stores a pointer, the size of the str in bytes, and the capacity of the allocation.

• Because String is owned, you can modify the contained bytes. However, all the methods provided for this are guaranteed to preserve UTF-8 encoding of the bytes. This is carefully tuned to avoid trouble

• You can get an iterator over a string's code points (.chars()) or u8 bytes (.bytes())

• There is no convenient way to go to a given character (code point) position in a string: indexing and slicing work on byte offsets. If you plan to do that a lot, use chars().collect() to build a Vec<char>
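A sketch of the points above: iterating by code point or by byte, slicing by byte range, and collecting into a `Vec<char>` for positional access. The helper `nth_char` is made up for this example.

```rust
/// Return the n-th code point of a string, if there is one.
/// This walks the string from the start, so it is O(n).
/// (`nth_char` is a hypothetical helper for this example.)
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    // String is owned, so it can grow; the result is still valid UTF-8.
    let mut s = String::from("naïve");
    s.push_str(" café");
    assert_eq!(s.len(), 12); // bytes
    assert_eq!(s.chars().count(), 10); // code points

    // Iterating by code point versus by byte.
    assert_eq!(nth_char(&s, 2), Some('ï'));
    assert_eq!(s.bytes().nth(2), Some(0xC3)); // first byte of 'ï'

    // Slicing uses byte offsets, which must land on char boundaries.
    assert_eq!(&s[0..2], "na");
    assert_eq!(s.get(2..4), Some("ï"));
    assert_eq!(s.get(2..3), None); // would split 'ï' in half

    // For lots of positional access, pay once for a Vec<char>.
    let chars: Vec<char> = s.chars().collect();
    assert_eq!(chars[2], 'ï');
    assert_eq!(chars[9], 'é');
}
```

Using `get()` with a range avoids the panic that `&s[2..3]` would cause here.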

## String Methods

• There are the obvious methods for converting these things around. Read the book carefully to learn about the vocabulary

• Many of the string manipulation functions take a "pattern". There are a lot of kinds (a char, a &str, a closure on char, and more): read p. 402 of the book for the details. You will use them all, eventually

• There's a regex crate. It is not part of the standard library, so you need to add it as a dependency. It's OK.

• You can use from_str() or .parse() to convert a string into something else. You can use to_string() to convert other things into String

• You can use .as_bytes() and .into_bytes() to grab the bytes of a string for free

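• The ::from_utf8() methods come in checked, lossy and unsafe flavors: String::from_utf8() returns a Result, String::from_utf8_lossy() replaces invalid bytes with U+FFFD, and String::from_utf8_unchecked() is unsafe. Choose wisely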
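A sketch that exercises several of the methods above. The helper `parse_list` is made up for this example; everything it calls is in the standard library. The unsafe `from_utf8_unchecked()` flavor is deliberately left out.

```rust
use std::str::FromStr;

/// Parse a comma-separated list of integers, ignoring spaces around fields.
/// (`parse_list` is a hypothetical helper for this example.)
fn parse_list(s: &str) -> Result<Vec<i32>, std::num::ParseIntError> {
    s.split(',').map(|field| field.trim().parse::<i32>()).collect()
}

fn main() {
    // Patterns: a char, a &str, or a closure on char.
    let text = "one, two;three";
    let by_char: Vec<&str> = text.split(';').collect();
    let by_str: Vec<&str> = text.split(", ").collect();
    let by_fn: Vec<&str> = text.split(|c: char| c == ',' || c == ';').collect();
    assert_eq!(by_char, ["one, two", "three"]);
    assert_eq!(by_str, ["one", "two;three"]);
    assert_eq!(by_fn, ["one", " two", "three"]);

    // String to something else, and back again.
    assert_eq!("42".parse::<u8>(), Ok(42));
    assert_eq!(u8::from_str("42"), Ok(42));
    assert!("256".parse::<u8>().is_err());
    assert_eq!(42.to_string(), "42");
    assert_eq!(parse_list("1, 2,3"), Ok(vec![1, 2, 3]));

    // Grabbing the bytes: as_bytes() borrows, into_bytes() consumes.
    let s = String::from("hé");
    assert_eq!(s.as_bytes(), [0x68, 0xC3, 0xA9]);
    let bytes: Vec<u8> = s.into_bytes();

    // Checked conversion back from bytes.
    assert_eq!(String::from_utf8(bytes), Ok(String::from("hé")));

    // 0xFF never appears in valid UTF-8.
    let bad = vec![0x68, 0xFF];
    assert!(String::from_utf8(bad.clone()).is_err());
    // The lossy flavor substitutes U+FFFD for the invalid byte.
    assert_eq!(String::from_utf8_lossy(&bad), "h\u{FFFD}");
}
```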

Last modified: Monday, 1 July 2019, 1:37 AM