String

Let’s think about how much memory the String variable takes up.

var str1 = "0123456789"
print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0x3736353433323130 0xea00000000003938

The output shows that the String variable occupies 16 bytes, and the memory dump splits them into two 8-byte words: the first 8 bytes and the last 8 bytes
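Mems is a third-party helper. As a rough stand-in, here is a minimal sketch that uses only the standard library and assumes a 64-bit platform. The helper name hexWords is made up for illustration and is not part of Mems or of Swift itself.

```swift
// Hypothetical helper: returns the raw 8-byte words of a value as hex strings.
func hexWords<T>(of value: inout T) -> [String] {
    return withUnsafeBytes(of: &value) { (raw: UnsafeRawBufferPointer) -> [String] in
        let wholeWords = raw.count / 8
        return (0..<wholeWords).map { (index: Int) -> String in
            // Assemble the word least significant byte first,
            // which is how a little-endian CPU reads it from memory.
            var word: UInt64 = 0
            for byteIndex in 0..<8 {
                word |= UInt64(raw[index * 8 + byteIndex]) << UInt64(8 * byteIndex)
            }
            let hex = String(word, radix: 16)
            // Left-pad to 16 hex digits so every word has the same width.
            return "0x" + String(repeating: "0", count: 16 - hex.count) + hex
        }
    }
}

var sample = "0123456789"
let sampleSize = MemoryLayout<String>.size   // size of the String value itself
let sampleWords = hexWords(of: &sample)
print(sampleSize, sampleWords)
```

The sketch only looks at the bytes of the variable itself, so it cannot follow any address stored inside it.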

Let’s take a look at the disassembly

You can see that these instructions write values into the first and the last eight bytes of the String variable

So what is stored underneath the String variable?

We can see from the above that the 16 bytes of the String variable actually hold the ASCII codes of its characters

An ASCII code table is available at www.ascii-code.com

As we can see from the figure above, the characters 0~9 correspond to the hexadecimal ASCII codes 0x30~0x39 shown on the left side. Because little-endian mode puts the high-order byte at the high address and the low-order byte at the low address, these codes match exactly the data stored in the 16 bytes we printed

0x3736353433323130 0xea00000000003938

Then we look at the e and the a at the front of the last 8 bytes: they represent the type and the length, respectively

If the String’s data is stored directly in the variable, the type is marked with e; if it is stored elsewhere, the type is marked with a different letter

Our string is exactly 10 characters long, so the length is A in hexadecimal
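To make the byte order concrete, here is a small sketch that decodes the two words from the dump by hand. It is plain arithmetic on the printed values and does not inspect a live String; the names are illustrative.

```swift
// Values copied from the memory dump above.
let firstWord: UInt64 = 0x3736353433323130
let secondWord: UInt64 = 0xea00000000003938

// Split a 64-bit word into bytes, lowest address (least significant byte) first.
func littleEndianBytes(_ word: UInt64) -> [UInt8] {
    return (0..<8).map { UInt8((word >> UInt64(8 * $0)) & 0xff) }
}

let bytes = littleEndianBytes(firstWord) + littleEndianBytes(secondWord)
let tagByte = bytes[15]              // highest byte of the second word
let typeNibble = tagByte >> 4        // the "e" that marks inline storage
let length = Int(tagByte & 0x0f)     // the "a", i.e. the length
let text = String(decoding: bytes[0..<length], as: UTF8.self)
print(String(typeNibble, radix: 16), length, text)
```

Reading the bytes from the low address upward gives the characters in their original order.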

var str1 = "0123456789ABCDE"
print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0x3736353433323130 0xef45444342413938

Printing the String variable above, we find that the length value is exactly F and the remaining seven bytes of the last word are all filled, so at most 15 bytes of data can be stored this way
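The 15-byte limit follows from reserving one of the 16 bytes for the type and length. The sketch below models that layout for ASCII input only. packInline is a hypothetical function written for this article's observations, not the standard library's implementation.

```swift
// Hypothetical model of the inline layout observed above (ASCII input assumed).
func packInline(_ text: String) -> (UInt64, UInt64)? {
    let utf8 = Array(text.utf8)
    guard utf8.count <= 15 else { return nil }   // the 16th byte is reserved
    var bytes = [UInt8](repeating: 0, count: 16)
    bytes.replaceSubrange(0..<utf8.count, with: utf8)
    bytes[15] = 0xe0 | UInt8(utf8.count)         // type nibble e, then the length

    // Combine 8 bytes into one word, treating the first byte as least significant.
    func word(_ slice: ArraySlice<UInt8>) -> UInt64 {
        return slice.reversed().reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
    }
    return (word(bytes[0..<8]), word(bytes[8..<16]))
}

let fifteen = packInline("0123456789ABCDE")
let sixteen = packInline("0123456789ABCDEF")   // one byte too long for this layout
print(fifteen as Any, sixteen as Any)
```

With 15 characters every content byte is used, and a 16th character has nowhere to go.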

This is very similar to the way Tagged Pointer storage works in OC (Objective-C)

What does a String variable look like if the data stored is more than 15 characters long?

We change the value of the String variable, and then we print it

var str1 = "0123456789ABCDEF"
print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0xd000000000000010 0x80000001000079a0

We found that the memory footprint of the String variable was still 16 bytes, but the memory layout was completely different

At this point we need to use disassembly to analyze further

If you look at the figure above, you can see that the two 8-byte halves of the String variable are still assigned, but this time a function is called first and its return value is given to the first eight bytes of the String variable

The value of the string and the length of the string are passed in as arguments, so let’s see what happens inside the called function

We can see that the function internally adds a mask value to the address of the string data and stores the result in the last 8 bytes of the String variable

So we can work backwards to figure out the actual address of the stored data

0x80000001000079a0 - 0x7fffffffffffffe0 = 0x1000079C0

This is essentially the value that was originally stored in the rdi register
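The subtraction above can be checked with a few lines of Swift. The two constants are the values seen in the dump and in the disassembly; the variable names are illustrative.

```swift
let storedWord: UInt64 = 0x80000001000079a0   // last 8 bytes of the String variable
let mask: UInt64 = 0x7fffffffffffffe0         // value the function adds, per the disassembly

// Undo the addition with a wrapping subtraction to recover the content address.
let contentAddress = storedWord &- mask
print("0x" + String(contentAddress, radix: 16))

// Going the other way reproduces the stored word.
let roundTrip = contentAddress &+ mask
```

The wrapping operators are used so the sketch cannot trap on overflow, whatever values are plugged in.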

Printing the memory at the real address shows that those 16 bytes do store the corresponding ASCII codes

So where is the real data stored?

Judging by the address, we can guess that it is in the data segment. To verify this, use MachOView to see directly where this string is actually stored in the executable file

We find the executable in the project, right-click it, and choose Show in Finder

Then we right-click the file and open it with MachOView

In the end we find that the string sits in the string constant area of the code segment

Comparing the storage locations of two strings

Let’s now check to see if the two strings are stored in the same location

var str1 = "0123456789"
var str2 = "0123456789ABCDEF"

We open the executable again with MachOView and find that the real contents of both strings are placed in the string constant area of the code segment, 16 bytes apart

Then we look at the first eight bytes of the printed memory

0xd000000000000010 0x80000001000079a0

Presumably 10 is again the length in hexadecimal, and the d in front represents the type

We change the value of the string and find that, sure enough, the length value changes as well

var str2 = "0123456789ABCDEFGH"
print(Mems.size(ofVal: &str2)) // 16
print(Mems.memStr(ofVal: &str2)) // 0xd000000000000012 0x80000001000079a0

What happens if we append to each of the two String variables?

var str1 = "0123456789"
str1.append("G")

print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0x3736353433323130 0xeb00000000473938

var str2 = "0123456789ABCDEF"
str2.append("G")

print(Mems.size(ofVal: &str2)) // 16
print(Mems.memStr(ofVal: &str2)) // 0xf000000000000011 0x0000000100776ed0

We find that the last 8 bytes of str1 still have room for the new character, so the contents continue to be stored inside the variable’s own memory

The memory layout of str2 is different. The first 8 bytes now start with f and hold the string length, 11 in hexadecimal. The last 8 bytes look very much like a heap address

Verify that the String variable is stored in heap space

To verify our conjecture, let’s observe the disassembly

Before verifying, we create an instance of a class, step into the allocation, and set a breakpoint on the instruction that calls malloc internally

class Person { }

var p = Person()

We then disable that breakpoint (gray it out) and step through the disassembly for the String variable

Then we re-enable the grayed-out malloc breakpoint and continue

Execution does stop at the malloc breakpoint we set earlier, which verifies that heap memory is indeed allocated to store the value of the String variable

We can also use the LLDB command bt to print the call stack

The malloc call appears after the append method is called, which confirms our conjecture from this angle as well

What are the values of str2 stored in the heap space?

We step over the append call, print the memory of str2, and then print the memory at the heap address held in its last 8 bytes

At an offset of 32 bytes from that address are the ASCII codes of our String’s characters

Conclusion

1. If the string length is less than or equal to 0xF (15 in decimal notation), the string content is stored directly in the memory of the String variable as ASCII codes in little-endian order

The last (16th) byte stores the type and character length of the String variable

var str1 = "0123456789"
print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0x3736353433323130 0xea00000000003938

After the string concatenation operation

If the string length after concatenation is equal to or less than 0xF (15 in decimal notation), the storage location is the same as before the concatenation

var str1 = "0123456789"
str1.append("ABCDE")

print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0x3736353433323130 0xef45444342413938

If the concatenated string length is greater than 0xF (15 in decimal), heap space is allocated to store the string contents

In the memory of the String variable, the first 8 bytes store the type and character length, and the last 8 bytes store the heap address. The heap address + 0x20 gives the location of the real string content

The first 32 bytes at the heap address are used to store descriptive information

The constant area is fixed before the program runs, whereas string concatenation is a run-time operation whose result cannot be placed in the constant area, so heap space is allocated to store it

var str1 = "0123456789"
str1.append("ABCDEF")

print(Mems.size(ofVal: &str1)) // 16
print(Mems.memStr(ofVal: &str1)) // 0xf000000000000010 0x000000010051d600

2. If the string length is greater than 0xF (15 in decimal), the contents of the string are stored in the __cstring section of the __TEXT segment (the constant area)

In the memory of the String variable, the first 8 bytes store the type and character length, and the last 8 bytes store an address value. Removing the mask from that value gives the real address of the string content in the constant area

var str2 = "0123456789ABCDEF"
print(Mems.size(ofVal: &str2)) // 16
print(Mems.memStr(ofVal: &str2)) // 0xd000000000000010 0x80000001000079a0

After a string concatenation operation, heap space is allocated for storage, as above

var str2 = "0123456789ABCDEF"
str2.append("G")

print(Mems.size(ofVal: &str2)) // 16
print(Mems.memStr(ofVal: &str2)) // 0xf000000000000011 0x0000000106232230
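The three cases in this conclusion can be summarized as a small classifier over the two printed words. This is a sketch built only from the observations in this article: the names are hypothetical, the rule that the length sits in the low 48 bits of the first word is an assumption, and the real String type has more cases than these.

```swift
// Hypothetical summary of the three layouts observed in this article.
enum Storage: Equatable {
    case inline(length: Int)
    case constantArea(length: Int, contentAddress: UInt64)
    case heap(length: Int, contentAddress: UInt64)
    case unknown
}

func classify(_ first: UInt64, _ second: UInt64) -> Storage {
    // Inline strings carry the type nibble e at the top of the second word.
    if second >> 60 == 0xe {
        return .inline(length: Int((second >> 56) & 0xf))
    }
    // Assumption: the length occupies the low 48 bits of the first word.
    let length = Int(first & 0x0000_ffff_ffff_ffff)
    switch first >> 60 {
    case 0xd:
        // Constant area: remove the mask seen in the disassembly.
        return .constantArea(length: length, contentAddress: second &- 0x7fffffffffffffe0)
    case 0xf:
        // Heap: the content starts 0x20 bytes past the stored address.
        return .heap(length: length, contentAddress: second &+ 0x20)
    default:
        return .unknown
    }
}

print(classify(0x3736353433323130, 0xef45444342413938))
print(classify(0xd000000000000010, 0x80000001000079a0))
print(classify(0xf000000000000011, 0x0000000106232230))
```

The inline check comes first because, for an inline string, the first word is character data and its top nibble carries no type information.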

dyld_stub_binder

Disassembling, we see that the underlying String.init method is actually a method in a dynamic library, and dynamic libraries are placed in memory at higher addresses than the Mach-O file, as shown below

So the address value we see here is actually a fake address value, used only as a placeholder

Following the call, we find that it jumps to another address internally and fetches the address stored there, which is the one that really needs to be called

The address of the next instruction is typically 6 bytes further on

0x10000774E + 0x6 = 0x100007754
0x100007754 + 0x48BC (%rip) = 0x10000C010
Finally, find the address value 0x100007858 in 0x10000C010
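The address arithmetic above can be written as a tiny helper. The function name and parameter labels are made up for illustration; the numbers are the ones from the calculation above.

```swift
// Hypothetical helper: address referenced by a rip-relative operand.
func ripRelativeAddress(instruction: UInt64, length: UInt64, displacement: UInt64) -> UInt64 {
    // While an instruction executes, rip already points at the next instruction.
    let nextInstruction = instruction + length
    return nextInstruction + displacement
}

let slot = ripRelativeAddress(instruction: 0x10000774E, length: 0x6, displacement: 0x48BC)
print("0x" + String(slot, radix: 16, uppercase: true))
```

The result is the address of the slot that holds the pointer to jump to, not the jump target itself.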

Execution then continues into dyld_stub_binder, which binds the symbol from the dynamic library

Only after that does it actually enter String.init in the dynamic library. Its real address value is very large, which also confirms that the dynamic library sits at a higher address than the executable

Then we step into the next call to String.init

Stepping in, we find that the address it jumps to is already the real address of String.init in the dynamic library

This also shows that dyld_stub_binder runs only once for the function, and only when the function is first needed, that is, lazy binding

The main job of dyld_stub_binder is to replace placeholder addresses at run time with the real addresses of the functions that need to be called
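The idea of binding once on first use can be illustrated with a toy model in plain Swift. This is only an analogy: LazyPointerTable and everything in it are hypothetical, and dyld's real mechanism patches pointers in the Mach-O image rather than using dictionaries.

```swift
// Toy model of lazy binding. Every name here is hypothetical.
final class LazyPointerTable {
    typealias Function = (String) -> String

    private let library: [String: Function]       // stands in for the dynamic library
    private var bound: [String: Function] = [:]   // slots that have been patched
    private(set) var bindCount = 0                // how often the "binder" ran

    init(library: [String: Function]) {
        self.library = library
    }

    func call(_ symbol: String, _ argument: String) -> String? {
        if let function = bound[symbol] {
            return function(argument)             // later calls skip the binder
        }
        // First call: look the symbol up and patch the slot.
        guard let function = library[symbol] else { return nil }
        bound[symbol] = function
        bindCount += 1
        return function(argument)
    }
}

let table = LazyPointerTable(library: ["String.init": { text in "String(\(text))" }])
let first = table.call("String.init", "abc")
let second = table.call("String.init", "def")
print(first ?? "nil", second ?? "nil", table.bindCount)
```

Both calls return a result, but the lookup step runs only for the first one.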

Array

Let’s think about how much memory an Array variable takes up.

var array = [1, 2, 3, 4]

print(Mems.size(ofVal: &array)) // 8
print(Mems.ptr(ofVal: &array)) // 0x000000010000c1c8
print(Mems.ptr(ofRef: array)) // 0x0000000105862270

We can see from the output that the Array variable takes up 8 bytes, and that the variable itself lives at an address in the global area

However, the value stored at that address looks more like a heap address
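Without Mems, the size observation can be reproduced with the standard library alone. The sketch below compares the size of an Array value with the size of a pointer, so it does not hard-code 8; the variable names are illustrative.

```swift
let small = [1, 2, 3, 4]
let large = Array(1...1000)
let pointerSize = MemoryLayout<UnsafeRawPointer>.size

// The Array value itself is one pointer wide, however many elements it has.
let smallSize = MemoryLayout.size(ofValue: small)
let largeSize = MemoryLayout.size(ofValue: large)
print(smallSize, largeSize, pointerSize)
```

Since the size does not grow with the element count, the elements must live somewhere other than in the variable.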

Where is the Array variable stored?

With this question in mind, let’s observe the disassembly and set a breakpoint on the malloc call

malloc is indeed called, which proves that the Array variable allocates heap space internally

After the return value is assigned to the Array variable, we print the memory at the address stored in the Array variable and find that elements 1, 2, 3, and 4 are stored starting at an offset of 32 bytes

We can also observe this directly by printing the memory structure

var array = [1, 2, 3, 4]
print(Mems.memStr(ofRef: array))

//0x00007fff88a8dd18
//0x0000000200000003
//0x0000000000000004
//0x0000000000000008
//0x0000000000000001
//0x0000000000000002
//0x0000000000000003
//0x0000000000000004

Let’s adjust the number of elements, print and observe

var array = [Int]()
for i in 1...8 {
    array.append(i)
}
print(Mems.memStr(ofRef: array))

//0x00007fff88a8e460
//0x0000000200000003
//0x0000000000000008
//0x0000000000000010
//0x0000000000000001
//0x0000000000000002
//0x0000000000000003
//0x0000000000000004
//0x0000000000000005
//0x0000000000000006
//0x0000000000000007
//0x0000000000000008

Notice that the third 8-byte word has become 8, equal to the number of elements we added

The fourth 8-byte word has become 0x10 (16), double its previous value, so it presumably stores the capacity
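The public count and capacity properties give another view of the same growth without reading raw memory. This is a minimal sketch: the exact capacities depend on the standard library version, so the code records them rather than assuming particular values.

```swift
var numbers = [Int]()
var capacities = [numbers.capacity]   // capacity before any element is added

for i in 1...8 {
    numbers.append(i)
    // Record the capacity each time the buffer is reallocated.
    if numbers.capacity != capacities.last {
        capacities.append(numbers.capacity)
    }
}
print(numbers.count, capacities)
```

The recorded values show how often the storage was regrown while only eight elements were appended.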

Based on our disassembly and speculation, the internal structure of the Array variable is shown below
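As a sketch of that structure, the dump printed earlier can be read field by field. The field names are guesses that follow the speculation in this article: treating the first two words as an object header (type information and reference counts) is an assumption, and ArrayStorageDump is a hypothetical type, not the standard library's.

```swift
// Hypothetical reading of the heap words printed for [1, 2, 3, 4].
struct ArrayStorageDump {
    let words: [UInt64]

    var header: ArraySlice<UInt64> { words[0..<2] }   // assumed: type info and reference counts
    var count: Int { Int(words[2]) }                  // third word: number of elements
    var fourthWord: UInt64 { words[3] }               // presumably related to capacity
    var elements: [Int] {                             // elements start at byte offset 32
        words[4..<(4 + count)].map { Int($0) }
    }
}

let dump: [UInt64] = [
    0x00007fff88a8dd18, 0x0000000200000003,
    0x0000000000000004, 0x0000000000000008,
    0x0000000000000001, 0x0000000000000002,
    0x0000000000000003, 0x0000000000000004,
]
let storage = ArrayStorageDump(words: dump)
print(storage.count, storage.fourthWord, storage.elements)
```

Four 8-byte words come before the elements, which matches the 32-byte offset seen in the disassembly.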

The underlying structures of Array and String are more like those of reference types, but on the surface both are used as value types
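The value-type behaviour on the surface is easy to check: assigning an array or a string to a new variable and mutating the copy leaves the original untouched. A minimal sketch with illustrative names:

```swift
let originalArray = [1, 2, 3, 4]
var copiedArray = originalArray
copiedArray.append(5)            // only the copy changes

let originalString = "0123456789ABCDEF"
var copiedString = originalString
copiedString.append("G")         // only the copy changes

print(originalArray, copiedArray)
print(originalString, copiedString)
```

So even though the contents may live on the heap, each variable behaves as if it owned its own value.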