Author: worcsrcsgg
background
Jiacai Liu, a colleague on our team, mentioned in a previous article that a pointer to a trait object is a fat pointer:
Rust uses fat pointers (two pointers) to represent references to trait objects: one points to the data and the other to the vtable.
In addition, the team uses some C libraries, such as rust-rocksdb, whose wrappers around C data structures frequently use #[repr(C)].
Starting from these two points, this article explores the memory layout of Rust data types.
It has two main parts: the basic memory layout of Rust data types, and the representations that control memory layout.
Common types
The layout of a type is its size, alignment, and the relative offsets of its fields. For enums, how the discriminant is laid out and interpreted is also part of the type layout. Type layout can change with each compilation. For a Sized type, the memory layout is known at compile time, and its size and alignment can be obtained with std::mem::size_of and std::mem::align_of.
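As a small sketch of "known at compile time": size_of and align_of are const fn, so a Sized type's layout can be used in const items and checked when the program is built. The constant names below are our own, and the 4-byte check assumes a mainstream target.

```rust
use std::mem::{align_of, size_of};

// size_of and align_of are const fn, so these are evaluated at compile time.
const U32_SIZE: usize = size_of::<u32>();
const U32_ALIGN: usize = align_of::<u32>();

// Compile-time check: the build fails if this does not hold.
const _: () = assert!(U32_SIZE == 4 && U32_ALIGN == 4);

fn main() {
    println!("u32: size = {}, align = {}", U32_SIZE, U32_ALIGN);
    println!("u64: size = {}, align = {}", size_of::<u64>(), align_of::<u64>());
}
```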
Numeric types
Integer types
Type | Minimum | Maximum | size(bytes) | align(bytes) |
---|---|---|---|---|
u8 | 0 | 2^8 - 1 | 1 | 1 |
u16 | 0 | 2^16 - 1 | 2 | 2 |
u32 | 0 | 2^32 - 1 | 4 | 4 |
u64 | 0 | 2^64 - 1 | 8 | 8 |
u128 | 0 | 2^128 - 1 | 16 | 16 |

Type | Minimum | Maximum | size(bytes) | align(bytes) |
---|---|---|---|---|
i8 | -(2^7) | 2^7 - 1 | 1 | 1 |
i16 | -(2^15) | 2^15 - 1 | 2 | 2 |
i32 | -(2^31) | 2^31 - 1 | 4 | 4 |
i64 | -(2^63) | 2^63 - 1 | 8 | 8 |
i128 | -(2^127) | 2^127 - 1 | 16 | 16 |
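The ranges in the tables follow from the bit width: 0 to 2^n - 1 for unsigned types and -(2^(n-1)) to 2^(n-1) - 1 for two's-complement signed types. The helper functions below are hypothetical and only cover widths up to 64 bits, so the arithmetic fits in i128/u128.

```rust
/// Largest value of an n-bit unsigned integer: 2^n - 1 (n <= 64 here).
fn unsigned_max(bits: u32) -> u128 {
    (1u128 << bits) - 1
}

/// Smallest value of an n-bit two's-complement integer: -(2^(n-1)).
fn signed_min(bits: u32) -> i128 {
    -(1i128 << (bits - 1))
}

/// Largest value of an n-bit two's-complement integer: 2^(n-1) - 1.
fn signed_max(bits: u32) -> i128 {
    (1i128 << (bits - 1)) - 1
}

fn main() {
    // Compare the formulas with the constants from the standard library.
    assert_eq!(unsigned_max(8), u8::MAX as u128);
    assert_eq!(unsigned_max(64), u64::MAX as u128);
    assert_eq!(signed_min(8), i8::MIN as i128);
    assert_eq!(signed_max(32), i32::MAX as i128);
    println!("u16: {}..={}, i16: {}..={}", u16::MIN, u16::MAX, i16::MIN, i16::MAX);
}
```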
Floating-point numbers
The f32 and f64 types are the IEEE 754-2008 “binary32” and “binary64” floating-point types, respectively.
Type | size(bytes) | align(bytes) |
---|---|---|
f32 | 4 | 4 |
f64 | 8 | 8 |
On 32-bit x86 targets, f64 is aligned to 4 bytes.
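A minimal sketch of the binary32/binary64 encodings, using the standard to_bits methods; the helper name one_bits is our own. The alignment printed at the end depends on the target (8 on x86_64, 4 on 32-bit x86 as noted above).

```rust
use std::mem::{align_of, size_of};

/// Raw IEEE 754 bit patterns of 1.0 in binary32 and binary64.
fn one_bits() -> (u32, u64) {
    (1.0f32.to_bits(), 1.0f64.to_bits())
}

fn main() {
    assert_eq!(size_of::<f32>(), 4);
    assert_eq!(size_of::<f64>(), 8);
    let (b32, b64) = one_bits();
    // binary32: sign 0, biased exponent 127, fraction 0
    assert_eq!(b32, 0x3F80_0000);
    // binary64: sign 0, biased exponent 1023, fraction 0
    assert_eq!(b64, 0x3FF0_0000_0000_0000);
    println!("align_of::<f64>() = {}", align_of::<f64>());
}
```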
usize & isize
usize is an unsigned integer and isize a signed integer; both are as wide as a pointer: 8 bytes on a 64-bit system and 4 bytes on a 32-bit system.
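One way to check this on the current target is to compare usize with a raw pointer and with the standard target_pointer_width cfg. The helper expected_usize_bytes is hypothetical; the 16-bit fallback is an assumption for targets that are neither 32- nor 64-bit.

```rust
use std::mem::size_of;

/// Pointer width reported by the standard `target_pointer_width` cfg, in bytes.
fn expected_usize_bytes() -> usize {
    if cfg!(target_pointer_width = "64") {
        8
    } else if cfg!(target_pointer_width = "32") {
        4
    } else {
        2
    }
}

fn main() {
    // usize and isize have the same size as a (thin) pointer.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
    assert_eq!(size_of::<usize>(), expected_usize_bytes());
    println!("usize is {} bytes on this target", size_of::<usize>());
}
```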
bool
A bool can be true or false. Its size and alignment are both 1 byte.
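Only the bit patterns 0x00 (false) and 0x01 (true) are valid for bool. The hypothetical helper below converts a raw byte safely instead of reinterpreting it, which becomes relevant again in the union section.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical checked conversion: only 0 and 1 are valid bool bit patterns.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(align_of::<bool>(), 1);
    assert_eq!(true as u8, 1);
    assert_eq!(bool_from_byte(3), None);
    println!("{:?}", bool_from_byte(1));
}
```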
array
let array: [i32; 3] = [1, 2, 3];
An array stores its elements contiguously and in order, with no padding between them.
Its size is n * size_of::<T>() and its alignment is align_of::<T>().
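A short sketch confirming both formulas for [i32; 3], and that consecutive elements sit exactly size_of::<i32>() bytes apart:

```rust
use std::mem::{align_of, size_of};

fn main() {
    let array: [i32; 3] = [1, 2, 3];
    // size = n * size_of::<T>(), align = align_of::<T>()
    assert_eq!(size_of::<[i32; 3]>(), 3 * size_of::<i32>());
    assert_eq!(align_of::<[i32; 3]>(), align_of::<i32>());
    // Elements are contiguous: neighbouring addresses differ by size_of::<i32>().
    let first = &array[0] as *const i32 as usize;
    let second = &array[1] as *const i32 as usize;
    assert_eq!(second - first, size_of::<i32>());
    println!("{:?} occupies {} bytes", array, size_of::<[i32; 3]>());
}
```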
str
The char type
A char is a Unicode scalar value stored in 32 bits (4 bytes, 4-byte aligned). A Unicode scalar value lies in the range 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF.
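The range rule can be checked against the standard char::from_u32, which accepts exactly the scalar values (the surrogate range 0xD800 to 0xDFFF is excluded). The helper is_scalar_value is our own.

```rust
/// True if `v` is a Unicode scalar value (0x0..=0xD7FF or 0xE000..=0x10FFFF).
fn is_scalar_value(v: u32) -> bool {
    v <= 0xD7FF || (0xE000..=0x10FFFF).contains(&v)
}

fn main() {
    assert_eq!(std::mem::size_of::<char>(), 4);
    // char::from_u32 returns Some exactly for the scalar values.
    for v in [0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000] {
        assert_eq!(is_scalar_value(v), char::from_u32(v).is_some());
    }
    // A char always occupies 4 bytes, though its UTF-8 encoding may be shorter.
    let c = '\u{4E2D}';
    println!("U+{:04X} takes {} UTF-8 bytes", c as u32, c.len_utf8());
}
```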
The str type
Like [u8], str represents a slice of u8. The Rust standard library assumes that a str is always valid UTF-8. Its memory layout is the same as that of [u8].
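Because a str is just UTF-8 bytes, len() counts bytes rather than characters, and going from arbitrary bytes back to a str requires validation. The helper byte_and_char_len is hypothetical.

```rust
/// Returns (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "h\u{e9}llo"; // "héllo": 'é' (U+00E9) takes 2 bytes in UTF-8
    assert_eq!(byte_and_char_len(s), (6, 5));
    // The bytes behind a str are an ordinary [u8] ...
    let bytes: &[u8] = s.as_bytes();
    // ... but turning bytes back into a str requires UTF-8 validation.
    assert!(std::str::from_utf8(bytes).is_ok());
    assert!(std::str::from_utf8(&[0xFF, 0xFE]).is_err());
    println!("{:?}", bytes);
}
```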
slice
A slice is a DST: a view into a sequence of elements of type T. Slices must be used through pointers. &[T] is a fat pointer that holds the address of the data and the number of elements. The memory layout of a slice is the same as that of the part of the array it points to.
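A quick sketch of the fat pointer: &[T] is two words (pointer and length), while a reference to a fixed-size array is a single word, and a slice points into the array's own storage without copying.

```rust
use std::mem::size_of;

fn main() {
    // &[T] is (pointer, length): two machine words.
    assert_eq!(size_of::<&[u64]>(), 2 * size_of::<usize>());
    // A reference to a sized array is a thin pointer: one word.
    assert_eq!(size_of::<&[u64; 4]>(), size_of::<usize>());
    let array = [10u64, 20, 30, 40];
    let slice: &[u64] = &array[1..3];
    // The slice points into the array's storage; nothing is copied.
    assert_eq!(slice.as_ptr(), &array[1] as *const u64);
    assert_eq!(slice.len(), 2);
    println!("{:?}", slice);
}
```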
The difference between &str and String
The following compares the memory layout of String and &str:
let mut my_name = "Pascal".to_string();
my_name.push_str(" Precht");
let last_name = &my_name[7..];
String
                     buffer
                   /   capacity
                 /   /   length
               /   /   /
            +---+---+---+
stack frame | * | 8 | 6 | <- my_name: String
            +-|-+---+---+
              |
            [-|-------- capacity -----------]
              |
            +-V-+---+---+---+---+---+---+---+
       heap | P | a | s | c | a | l |   |   |
            +---+---+---+---+---+---+---+---+

            [-------- length -------]
String vs &str
            my_name: String last_name: &str
            [-------------] [-------]
            +---+----+----+---+---+
stack frame | * | 16 | 13 | * | 6 |
            +-|-+----+----+-|-+---+
              |             |
              |             +-------------+
              |                           |
              |                         [-|------- str ---------]
            +-V-+---+---+---+---+---+---+---+-V-+---+---+---+---+---+---+---+
       heap | P | a | s | c | a | l |   | P | r | e | c | h | t |   |   |   |
            +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
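The diagram can be checked in code: String is three words (pointer, capacity, length), &str is two (pointer, length), and last_name points 7 bytes into my_name's heap buffer. The helper offset_in is hypothetical; the actual capacity printed is allocator- and implementation-dependent, so the 16 in the diagram is only illustrative.

```rust
use std::mem::size_of;

/// Byte offset of `part` inside `whole`; assumes `part` was sliced from `whole`.
fn offset_in(whole: &str, part: &str) -> usize {
    part.as_ptr() as usize - whole.as_ptr() as usize
}

fn main() {
    let mut my_name = "Pascal".to_string();
    my_name.push_str(" Precht");
    let last_name = &my_name[7..];
    // String: (pointer, capacity, length); &str: (pointer, length).
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!((my_name.len(), last_name.len()), (13, 6));
    // last_name points into my_name's heap buffer; no bytes were copied.
    assert_eq!(offset_in(&my_name, last_name), 7);
    println!("capacity = {}", my_name.capacity());
}
```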
struct
Structs are named compound types. There are several kinds of structs:
StructExprStruct
struct A {
    a: u8,
}
StructExprTuple
struct Position(i32, i32, i32);
StructExprUnit
struct Gamma;
See Section 2, Data Layout – Data Alignment, for the detailed memory layout.
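Before the detailed discussion, a quick size check of the three struct kinds above: a unit-like struct is zero-sized, and Position is three i32 values.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct A {
    a: u8,
}
struct Position(i32, i32, i32);
struct Gamma;

fn main() {
    assert_eq!(size_of::<A>(), 1);
    assert_eq!(size_of::<Position>(), 12);
    assert_eq!(size_of::<Gamma>(), 0); // unit-like struct: zero-sized
    let p = Position(1, 2, 3);
    println!("{} {} {}", p.0, p.1, p.2);
}
```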
tuple
A tuple is an anonymous compound type. Some examples of tuple types:
()                                     // unit
(f64, f64)
(String, i32)
(i32, String)                          // a different type from the previous one
(i32, f64, Vec<String>, Option<bool>)
A tuple is laid out like a struct; the only difference is that its elements are accessed by index (t.0, t.1, ...) instead of by name.
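A minimal sketch of the struct-like rules for tuples: the alignment is the largest element alignment and the size is a multiple of it. Exact sizes of default-layout tuples are not guaranteed by the language; the values in the test reflect current rustc behaviour.

```rust
use std::mem::{align_of, size_of};

fn main() {
    let t: (i32, f64, String) = (1, 2.5, String::from("x"));
    // Elements are accessed by position instead of by name.
    println!("{} {} {}", t.0, t.1, t.2);
    assert_eq!(size_of::<()>(), 0);
    // Like a struct: alignment is the largest element alignment,
    // and the size is rounded up to a multiple of it.
    assert_eq!(align_of::<(u8, u32)>(), 4);
    assert_eq!(size_of::<(u8, u32)>() % align_of::<(u8, u32)>(), 0);
}
```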
closure
A closure is equivalent to an anonymous struct that holds its captured variables and implements FnOnce, FnMut, or Fn.
fn f<F: FnOnce() -> String>(g: F) {
    println!("{}", g());
}
let mut s = String::from("foo");
let t = String::from("bar");
f(|| {
    s += &t;
    s
});
// Prints "foobar".
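Conceptually, the compiler generates a closure type like this:
// Conceptual desugaring. Implementing FnOnce by hand like this only compiles on
// nightly with #![feature(unboxed_closures, fn_traits)].
struct Closure<'a> {
    s: String,
    t: &'a String,
}
impl<'a> FnOnce<()> for Closure<'a> {
    type Output = String;
    extern "rust-call" fn call_once(mut self, _args: ()) -> String {
        self.s += &*self.t;
        self.s
    }
}
f(Closure { s: s, t: &t });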
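Since a closure is a struct of its captures, its size follows from what it captures. A sketch under the assumption that capture layout matches current rustc (the language does not formally specify it); closure_sizes is a hypothetical helper.

```rust
use std::mem::size_of_val;

/// Returns the sizes of three closures with different capture modes.
fn closure_sizes() -> (usize, usize, usize) {
    let s = String::from("foo");
    let t = String::from("bar");
    // Captures nothing: the generated struct has no fields.
    let no_capture = || 42;
    // Captures `t` by shared reference: one pointer.
    let by_ref = || t.len();
    // `move` captures `s` by value: the String itself lives in the closure.
    let by_value = move || s.len();
    assert_eq!(no_capture() + by_ref() + by_value(), 48);
    (size_of_val(&no_capture), size_of_val(&by_ref), size_of_val(&by_value))
}

fn main() {
    let (a, b, c) = closure_sizes();
    println!("no capture: {a}, by ref: {b}, by value: {c}");
}
```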
union
The key feature of a union is that all of its fields share common storage. As a result, writing to one field of a union can overwrite its other fields, and the size of a union is determined by the size of its largest field.
#[repr(C)]
union MyUnion {
f1: u32,
    f2: f32,
}
Each access to a union field interprets the storage as the type of the field being accessed: reading a field reads the union's bits as that field's type. Fields may have non-zero offsets (unless the C representation is used); in that case, the bits starting at the field's offset are read. It is the programmer's responsibility to ensure that the data is valid for the field's type; failing to do so is undefined behavior. For example, reading the integer 3 through a bool field is undefined behavior, because 3 is not a valid bool.
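A sketch using the MyUnion definition above: writing f2 and reading f1 reinterprets the f32 bits as a u32. This read is sound because every 32-bit pattern is a valid u32; the helper float_bits_via_union is our own.

```rust
#[repr(C)]
union MyUnion {
    f1: u32,
    f2: f32,
}

/// Reinterpret the bits of an f32 as a u32 by writing one field and reading the other.
fn float_bits_via_union(x: f32) -> u32 {
    let u = MyUnion { f2: x };
    // SAFETY: both fields are 4 bytes at offset 0 (repr(C)), and every
    // 32-bit pattern is a valid u32, so reading f1 is sound.
    unsafe { u.f1 }
}

fn main() {
    // Both fields share the same 4 bytes.
    assert_eq!(std::mem::size_of::<MyUnion>(), 4);
    assert_eq!(float_bits_via_union(1.0), 0x3F80_0000);
    // Same result as the safe standard-library method.
    assert_eq!(float_bits_via_union(-2.5), (-2.5f32).to_bits());
    println!("0x{:08X}", float_bits_via_union(1.0));
}
```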
enum
enum Animal {
    Dog(String, f64),
    Cat { name: String, weight: f64 },
}
let mut a: Animal = Animal::Dog("Cocoa".to_string(), 37.2);
a = Animal::Cat { name: "Spotty".to_string(), weight: 2.7 };
An enum item declares a type together with a number of variants, each of which is named independently and has the syntax of a struct, tuple struct, or unit-like struct. An enum is essentially a tagged union, so the memory a value occupies is roughly the size of its largest variant plus the space needed to store the discriminant, rounded up for alignment.
use std::mem;
enum Foo {
    A(&'static str),
    B(i32),
    C(i32),
}
// Values of the same variant have the same discriminant, whatever their payload.
assert_eq!(mem::discriminant(&Foo::A("bar")), mem::discriminant(&Foo::A("baz")));
assert_eq!(mem::discriminant(&Foo::B(1)), mem::discriminant(&Foo::B(2)));
// Different variants have different discriminants, even with equal payloads.
assert_ne!(mem::discriminant(&Foo::B(3)), mem::discriminant(&Foo::C(3)));
Conceptually, the following enum is laid out like the FooRepr struct after it:
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}
struct FooRepr {
    data: u64, // u64, u32, or u8, depending on the tag
    tag: u8,   // 0 = A, 1 = B, 2 = C
}
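A quick way to compare the enum with its conceptual FooRepr is to print both sizes. On x86_64 both come out as 16 (largest payload 8 + tag 1, rounded up to 8-byte alignment), but default enum layout is not guaranteed, and niche optimizations can make some enums smaller than this model predicts.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}

#[allow(dead_code)]
struct FooRepr {
    data: u64, // large enough for the biggest payload
    tag: u8,   // discriminant
}

fn main() {
    // Largest payload (8) + tag (1), rounded up to the alignment of u64.
    println!("Foo: {}, FooRepr: {}", size_of::<Foo>(), size_of::<FooRepr>());
    assert_eq!(size_of::<Foo>(), size_of::<FooRepr>());
}
```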
trait obj
Official definition:
A trait object is an opaque value of another type that implements a set of traits.
The set of traits is made up of an object safe base trait plus any number of auto traits.
A trait object is a DST. A pointer to a trait object is a fat pointer, pointing to the data and to the vtable respectively. A more detailed description is available
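A small sketch of the fat pointer: a reference to a concrete type is one word, while &dyn Trait and Box<dyn Trait> are two (data pointer plus vtable pointer). The Shape trait and Square type are hypothetical.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn main() {
    // &Square is a thin pointer; &dyn Shape carries (data pointer, vtable pointer).
    assert_eq!(size_of::<&Square>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<dyn Shape>>(), 2 * size_of::<usize>());
    // Calls through the trait object are dispatched via the vtable.
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Square(3.0))];
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    println!("total area = {total}");
}
```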
Dynamically Sized Types (DST)
For most types, size and alignment can be determined at compile time, which is what the Sized trait guarantees. Types whose size is not known at compile time are called dynamically sized types (DSTs); generic code opts into them with ?Sized. DSTs include slices and trait objects, and they must always be used behind a pointer. Note:
- A DST can be used as a generic type argument, but generic parameters are Sized by default. To accept a DST, the parameter must be declared ?Sized.
struct S {
    s: i32
}
impl S {
    fn new(i: i32) -> S {
        S { s: i }
    }
}
trait T {
    fn get(&self) -> i32;
}
impl T for S {
    fn get(&self) -> i32 {
        self.s
    }
}
fn test<R: T>(t: Box<R>) -> i32 {
    t.get()
}
fn main() {
    let t: Box<dyn T> = Box::new(S::new(1));
    let _ = test(t);
}
This produces a compiler error:
error[E0277]: the size for values of type `dyn T` cannot be known at compilation time
   |
21 | fn test<R: T>(t: Box<R>) -> i32 {
   |         - required by this bound in `test`
...
28 |     let _ = test(t);
   |             ^ doesn't have a size known at compile-time
   |
   = help: the trait `Sized` is not implemented for `dyn T`
help: consider relaxing the implicit `Sized` restriction
   |
21 | fn test<R: T + ?Sized>(t: Box<R>) -> i32 {
   |              ^^^^^^^^
The fix:
fn test<R: T + ?Sized>(t: Box<R>) -> i32 {
    t.get()
}
- Inside a trait definition, Self is implicitly ?Sized (there is no implicit Self: Sized bound), so traits can be implemented for DSTs.
- A struct can store a DST directly as its last field, but this makes the struct itself a DST. Refer to the DST documentation to learn more about user-defined DSTs.
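A minimal sketch of a user-defined DST, assuming a hypothetical Packet struct whose last field is generic and ?Sized. A reference to the sized Packet<[u8; 4]> coerces to &Packet<[u8]>, and the array length moves into the fat pointer's metadata.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical struct whose last field may be unsized.
struct Packet<T: ?Sized> {
    id: u32,
    payload: T,
}

/// Works on any Packet<[u8]>, whatever its payload length.
fn payload_len(p: &Packet<[u8]>) -> usize {
    p.payload.len()
}

fn main() {
    let sized: &Packet<[u8; 4]> = &Packet { id: 7, payload: [1, 2, 3, 4] };
    // Unsizing coercion: &Packet<[u8; 4]> -> &Packet<[u8]>.
    let dst: &Packet<[u8]> = sized;
    assert_eq!(size_of::<&Packet<[u8]>>(), 2 * size_of::<usize>());
    assert_eq!(payload_len(dst), 4);
    assert_eq!(size_of_val(dst), size_of::<Packet<[u8; 4]>>());
    println!("id = {}, payload = {:?}", dst.id, &dst.payload);
}
```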
ZST, Zero Sized Type
struct Nothing; // No fields = no size
// All fields have no size = no size
struct LotsOfNothing {
    foo: Nothing,
    qux: (),      // empty tuple has no size
    baz: [u8; 0], // empty array has no size
}
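The zero sizes can be confirmed with size_of; a collection of ZSTs also occupies no element storage, as the size of the slice view shows.

```rust
use std::mem::{size_of, size_of_val};

struct Nothing;

#[allow(dead_code)]
struct LotsOfNothing {
    foo: Nothing,
    qux: (),
    baz: [u8; 0],
}

fn main() {
    assert_eq!(size_of::<Nothing>(), 0);
    assert_eq!(size_of::<LotsOfNothing>(), 0);
    // 1000 elements, zero bytes of element data.
    let v: Vec<Nothing> = (0..1000).map(|_| Nothing).collect();
    assert_eq!(size_of_val(&v[..]), 0);
    println!("len = {}", v.len());
}
```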
One of the most extreme examples of ZSTs is Set and Map. Given a Map<Key, Value> type, a common way to implement Set<Key> is to simply wrap a Map<Key, UselessJunk>. Many languages would have to allocate space for UselessJunk, store it, and load it, only to discard it without ever using it, and it is hard for the compiler to prove that these operations are unnecessary. In Rust, we can simply say Set<Key> = Map<Key, ()>. Rust knows statically that every load and store of () is useless, and allocates no space for it. As a result, this generic code is effectively a HashSet implementation, with none of the overhead of the HashMap's values.
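A sketch of the idea with a hypothetical MySet built on std::collections::HashMap<K, ()>. It is not the standard HashSet, but it shows that a (key, ()) entry costs no more than the key alone.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::mem::size_of;

// Hypothetical set type built on HashMap<K, ()>; the () values take no space.
struct MySet<K> {
    map: HashMap<K, ()>,
}

impl<K: Eq + Hash> MySet<K> {
    fn new() -> Self {
        MySet { map: HashMap::new() }
    }
    /// Returns true if the key was not present before.
    fn insert(&mut self, key: K) -> bool {
        self.map.insert(key, ()).is_none()
    }
    fn contains(&self, key: &K) -> bool {
        self.map.contains_key(key)
    }
    fn len(&self) -> usize {
        self.map.len()
    }
}

fn main() {
    // A (key, ()) entry is no bigger than the key alone.
    assert_eq!(size_of::<(u64, ())>(), size_of::<u64>());
    let mut set = MySet::new();
    assert!(set.insert(1u64));
    assert!(!set.insert(1u64));
    assert!(set.contains(&1));
    println!("len = {}", set.len());
}
```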
Empty Types
enum Void {} // No variants = EMPTY
A major use of empty types is to mark code as unreachable at the type level. For example, an API may need to return a Result even though, in a particular case, it can never fail. By setting the return type to Result<T, Void>, the API lets callers confidently use unwrap: a Void value can never be produced, so the result can never be an Err.
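A sketch of this pattern with a hypothetical Void and a hypothetical infallible function. The standard library provides std::convert::Infallible, an empty enum, for the same purpose.

```rust
// Hypothetical empty type: with no variants, no value of it can ever exist.
#[derive(Debug)]
enum Void {}

/// Hypothetical API that returns a Result but can never fail.
fn count_digits(s: &str) -> Result<usize, Void> {
    Ok(s.chars().filter(|c| c.is_ascii_digit()).count())
}

fn main() {
    // unwrap cannot panic here: an Err(Void) value cannot be constructed.
    let n = count_digits("a1b22").unwrap();
    // Matching an empty enum needs no arms, so the Err branch is statically unreachable.
    let m = match count_digits("42") {
        Ok(m) => m,
        Err(void) => match void {},
    };
    println!("{n} {m}");
}
```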
Data layout
Data alignment
Data alignment benefits both CPU operations and caching. The alignment of a struct in Rust is the largest alignment among its fields. Rust inserts padding where necessary so that every field is properly aligned and the size of the whole type is a multiple of its alignment. For example:
struct A {
    a: u8,
    b: u32,
    c: u16,
}
Printing the field addresses shows that the struct is 4-byte aligned and that the compiler has reordered the fields (b first, then c, then a):
fn main() {
    let a = A {
        a: 1,
        b: 2,
        c: 3,
    };
    println!(
        "0x{:X} 0x{:X} 0x{:X}",
        &a.a as *const u8 as usize,
        &a.b as *const u32 as usize,
        &a.c as *const u16 as usize
    );
}
Output:
0x7FFEE6769276 0x7FFEE6769270 0x7FFEE6769274
Data alignment in Rust
Based on these addresses, the compiler has laid out A as if it were:
struct A {
    b: u32,
    c: u16,
    a: u8,
    _pad: u8, // pads the size to 8, a multiple of the 4-byte alignment
}
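Instead of reading absolute addresses, the offsets can be computed relative to the start of the struct. This is a sketch: the order rustc picks is not guaranteed, so the code only checks properties that must hold in any order (each field aligned, total size a multiple of the alignment). The helper field_offsets is our own.

```rust
use std::mem::{align_of, size_of};

struct A {
    a: u8,
    b: u32,
    c: u16,
}

/// Byte offset of each field from the start of the struct, as (a, b, c).
fn field_offsets(x: &A) -> (usize, usize, usize) {
    let base = x as *const A as usize;
    (
        &x.a as *const u8 as usize - base,
        &x.b as *const u32 as usize - base,
        &x.c as *const u16 as usize - base,
    )
}

fn main() {
    let x = A { a: 1, b: 2, c: 3 };
    let (oa, ob, oc) = field_offsets(&x);
    println!("size = {}, align = {}", size_of::<A>(), align_of::<A>());
    println!("offsets: a = {oa}, b = {ob}, c = {oc}");
    // Whatever order the compiler picks, every field stays aligned.
    assert_eq!(ob % align_of::<u32>(), 0);
    assert_eq!(oc % align_of::<u16>(), 0);
    assert_eq!(size_of::<A>() % align_of::<A>(), 0);
    println!("{} {} {}", x.a, x.b, x.c);
}
```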
Compiler optimization
Consider this generic struct:
struct Foo<T, U> {
count: u16,
data1: T,
data2: U,
}
fn main() {
    let foo1 = Foo::<u16, u32> {
        count: 1,
        data1: 2,
        data2: 3,
    };
    let foo2 = Foo::<u32, u16> {
        count: 1,
        data1: 2,
        data2: 3,
    };
    println!("0x{:X} 0x{:X} 0x{:X}", &foo1.count as *const u16 as usize, &foo1.data1 as *const u16 as usize, &foo1.data2 as *const u32 as usize);
    println!("0x{:X} 0x{:X} 0x{:X}", &foo2.count as *const u16 as usize, &foo2.data1 as *const u32 as usize, &foo2.data2 as *const u16 as usize);
}
Output:
0x7FFEDFDD61C4 0x7FFEDFDD61C6 0x7FFEDFDD61C0
0x7FFEDFDD61CC 0x7FFEDFDD61C8 0x7FFEDFDD61CE
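In foo1, data2 (the u32) is at offset 0, count at offset 4, and data1 at offset 6; in foo2, data1 (the u32) is at offset 0, count at offset 4, and data2 at offset 6. Because the compiler is free to reorder fields to minimize padding, different instantiations of the same generic struct can have different field orders. Without this optimization, the layouts could look like the following, and Foo<u32, u16> would waste 4 bytes on padding:
// Foo<u16, u32> without reordering: 8 bytes, no padding needed
struct FooU16U32 {
    count: u16,
    data1: u16,
    data2: u32,
}
// Foo<u32, u16> without reordering: 12 bytes, 4 of them padding
struct FooU32U16 {
    count: u16,
    _pad1: u16,
    data1: u32,
    data2: u16,
    _pad2: u16,
}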
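The cost of not reordering can be measured by adding #[repr(C)], which keeps declaration order. A sketch: FooC is our own name, and the 8 bytes for the default layout reflect current rustc behaviour rather than a language guarantee.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

// Same fields, but repr(C) forbids reordering.
#[allow(dead_code)]
#[repr(C)]
struct FooC<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

fn main() {
    println!("Foo<u16, u32>  = {}", size_of::<Foo<u16, u32>>());
    println!("Foo<u32, u16>  = {}", size_of::<Foo<u32, u16>>());
    println!("FooC<u32, u16> = {}", size_of::<FooC<u32, u16>>());
    // repr(C): count(2) + pad(2) + data1(4) + data2(2) + pad(2) = 12
    assert_eq!(size_of::<FooC<u32, u16>>(), 12);
}
```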
repr(C)
The purpose of repr(C) is simple: to make the memory layout match C. Every type that crosses an FFI boundary should be repr(C). repr(C) is also necessary for data-layout tricks such as reinterpreting data as another type. For more information, see repr(C).
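As an example of "reinterpreting data as another type", here is a sketch that reads a hypothetical wire header directly out of a byte buffer. repr(C) fixes the field offsets at 0, 4, and 6; the bytes are assumed to be in native endianness.

```rust
use std::mem::size_of;

// Hypothetical wire header; repr(C) fixes field order and offsets (0, 4, 6).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    magic: u32,
    len: u16,
    flags: u16,
}

/// Reinterpret the first bytes of `bytes` as a Header (native endianness).
fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < size_of::<Header>() {
        return None;
    }
    // SAFETY: the length was checked, read_unaligned accepts any alignment,
    // and every bit pattern is a valid Header because all fields are integers.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const Header) })
}

fn main() {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&0xCAFE_BABEu32.to_ne_bytes());
    bytes.extend_from_slice(&42u16.to_ne_bytes());
    bytes.extend_from_slice(&1u16.to_ne_bytes());
    let h = parse_header(&bytes).unwrap();
    assert_eq!(h, Header { magic: 0xCAFE_BABE, len: 42, flags: 1 });
    println!("{h:?}");
}
```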
repr(u*) and repr(i*)
These primitive representations set the size of an enum's discriminant. The allowed types are u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, and isize.
enum Enum {
Variant0(u8),
Variant1,
}
#[repr(C)]
enum EnumC {
Variant0(u8),
Variant1,
}
#[repr(u8)]
enum Enum8 {
Variant0(u8),
Variant1,
}
#[repr(u16)]
enum Enum16 {
Variant0(u8),
Variant1,
}
fn main() {
assert_eq!(std::mem::size_of::<Enum>(), 2);
    // The size of the C representation is platform dependent
assert_eq!(std::mem::size_of::<EnumC>(), 8);
// One byte for the discriminant and one byte for the value in Enum8::Variant0
assert_eq!(std::mem::size_of::<Enum8>(), 2);
// Two bytes for the discriminant and one byte for the value in Enum16::Variant0
// plus one byte of padding.
assert_eq!(std::mem::size_of::<Enum16>(), 4);
}
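For a fieldless enum, repr(u8) makes the whole enum a single byte and lets you assign explicit discriminants and cast them with as. The Level enum and the level_from_u8 helper are hypothetical; the reverse conversion is done by matching, so unknown bytes are rejected instead of producing an invalid enum value.

```rust
use std::mem::size_of;

// Hypothetical fieldless enum with explicit discriminants stored as u8.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Low = 1,
    Mid = 5,
    High = 10,
}

/// Convert a raw byte back into a Level, rejecting unknown values.
fn level_from_u8(v: u8) -> Option<Level> {
    match v {
        1 => Some(Level::Low),
        5 => Some(Level::Mid),
        10 => Some(Level::High),
        _ => None,
    }
}

fn main() {
    assert_eq!(size_of::<Level>(), 1);
    assert_eq!(Level::Mid as u8, 5);
    assert_eq!(level_from_u8(10), Some(Level::High));
    assert_eq!(level_from_u8(7), None);
    println!("{:?}", Level::Low);
}
```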
repr(align(x)) and repr(packed(x))
The align and packed modifiers can be used to raise or lower, respectively, the alignment of structs and unions. packed can also change the padding between fields. align enables tricks such as ensuring that adjacent elements of an array never share the same cache line (which can speed up some kinds of concurrent code). packed is hard to use correctly and should be avoided unless it is strictly required.
#[repr(C)]
struct A {
    a: u8,
    b: u32,
    c: u16,
}
#[repr(C, align(8))]
struct A8 {
    a: u8,
    b: u32,
    c: u16,
}
fn main() {
    let a = A { a: 1, b: 2, c: 3 };
    println!("{}", std::mem::align_of::<A>());
    println!("{}", std::mem::size_of::<A>());
    println!("0x{:X} 0x{:X} 0x{:X}", &a.a as *const u8 as usize, &a.b as *const u32 as usize, &a.c as *const u16 as usize);
    let a = A8 { a: 1, b: 2, c: 3 };
    println!("{}", std::mem::align_of::<A8>());
    println!("{}", std::mem::size_of::<A8>());
    println!("0x{:X} 0x{:X} 0x{:X}", &a.a as *const u8 as usize, &a.b as *const u32 as usize, &a.c as *const u16 as usize);
}
Result:
4
12
0x7FFEE7F0B070 0x7FFEE7F0B074 0x7FFEE7F0B078
8
16
0x7FFEE7F0B1A0 0x7FFEE7F0B1A4 0x7FFEE7F0B1A8
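A sketch of the cache-line trick mentioned above, assuming a 64-byte cache line (common on x86_64 but not universal). PaddedCounter is a hypothetical type; align(64) makes each array element start on its own 64-byte boundary.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical padded counter; 64 is an assumed cache-line size.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

fn main() {
    let counters = [PaddedCounter(AtomicU64::new(0)), PaddedCounter(AtomicU64::new(0))];
    counters[0].0.fetch_add(1, Ordering::Relaxed);
    let a0 = &counters[0] as *const PaddedCounter as usize;
    let a1 = &counters[1] as *const PaddedCounter as usize;
    // Each element starts on its own 64-byte boundary.
    assert_eq!(size_of::<PaddedCounter>(), 64);
    assert_eq!(align_of::<PaddedCounter>(), 64);
    assert_eq!(a1 - a0, 64);
    assert_eq!(a0 % 64, 0);
    println!("counter0 = {}", counters[0].0.load(Ordering::Relaxed));
}
```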
#[repr(C)]
struct A {
    a: u8,
    b: u32,
    c: u16,
}
#[repr(C, packed(1))]
struct A1 {
    a: u8,
    b: u32,
    c: u16,
}
fn main() {
    let a = A { a: 1, b: 2, c: 3 };
    println!("{}", std::mem::align_of::<A>());
    println!("{}", std::mem::size_of::<A>());
    println!("0x{:X} 0x{:X} 0x{:X}", &a.a as *const u8 as usize, &a.b as *const u32 as usize, &a.c as *const u16 as usize);
    let a = A1 { a: 1, b: 2, c: 3 };
    println!("{}", std::mem::align_of::<A1>());
    println!("{}", std::mem::size_of::<A1>());
    // Fields of a packed struct may be unaligned, so take raw pointers with
    // addr_of! instead of references (&a.b is rejected by the compiler).
    println!("0x{:X} 0x{:X} 0x{:X}", std::ptr::addr_of!(a.a) as usize, std::ptr::addr_of!(a.b) as usize, std::ptr::addr_of!(a.c) as usize);
}
Result:
4
12
0x7FFEED627078 0x7FFEED62707C 0x7FFEED627080
1
7
0x7FFEED6271A8 0x7FFEED6271A9 0x7FFEED6271AD
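One reason packed is hard to use is that references to its fields are not allowed. This sketch shows the two usual workarounds: copy the field out by value, or read it through a raw pointer with read_unaligned. The Packed type and read_b helper are our own.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[repr(C, packed(1))]
struct Packed {
    a: u8,
    b: u32,
    c: u16,
}

/// Read the possibly unaligned field `b` through a raw pointer.
fn read_b(p: &Packed) -> u32 {
    // SAFETY: addr_of! creates a raw pointer without an intermediate reference,
    // and read_unaligned does not require the pointer to be aligned.
    unsafe { addr_of!(p.b).read_unaligned() }
}

fn main() {
    let p = Packed { a: 1, b: 0xDEAD_BEEF, c: 3 };
    assert_eq!(size_of::<Packed>(), 7);
    // `&p.b` would be rejected by the compiler: references must be aligned.
    // Copying fields out by value is allowed:
    let (a, b, c) = (p.a, p.b, p.c);
    assert_eq!(b, read_b(&p));
    println!("a = {a}, b = 0x{b:X}, c = {c}");
}
```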
repr(transparent)
repr(transparent) can be used on a struct, or an enum with a single variant, that has exactly one non-zero-sized field (plus any number of zero-sized fields). It tells the Rust compiler that the new type exists only at the Rust level and should be ignored for ABI purposes: not only is it laid out the same in memory as its single field, it is also passed identically in function calls. In other words, structs and enums with this representation have the same layout and ABI as their single non-zero-sized field.
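A sketch with two hypothetical newtypes: Meters wraps an f64, and Tagged adds a zero-sized PhantomData marker next to its single real field. The extern "C" function shows the newtype being used where the ABI would otherwise expect a plain f64.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// Hypothetical newtype: same layout and ABI as f64.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

// Zero-sized fields such as PhantomData are allowed next to the one real field.
#[repr(transparent)]
struct Tagged<T> {
    value: u32,
    _marker: PhantomData<T>,
}

// An extern "C" function can take the newtype wherever it would take an f64.
extern "C" fn double_it(m: Meters) -> Meters {
    Meters(m.0 * 2.0)
}

fn main() {
    assert_eq!(size_of::<Meters>(), size_of::<f64>());
    assert_eq!(align_of::<Meters>(), align_of::<f64>());
    assert_eq!(size_of::<Tagged<String>>(), size_of::<u32>());
    assert_eq!(double_it(Meters(1.5)), Meters(3.0));
    let t: Tagged<String> = Tagged { value: 7, _marker: PhantomData };
    println!("{}", t.value);
}
```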
conclusion
This article has described the memory layout of common Rust data types and the repr attributes that control it.
reference
- Type system
- Data Layout
- Item
- String vs &str in Rust
- Data layout
- enter-reprtransparent
About us
We are the time series storage team at the Ant Intelligence Monitoring Technology Center. We are using Rust to build a new generation of time series databases with high performance, low cost and real-time analysis capabilities. Please contact: [email protected]