Introduction

There are two ways to store the contents of a file: as binary data or as text. When data is stored in binary form, reading it back raises the problem of converting those raw bytes into Python data types. Such binary data has a structure, and because Python (CPython) is implemented in C, that structure is described in terms of C structs.

Lib/struct.py is the module responsible for this structural transformation.

Methods in struct

The public names exported by the module (__all__ in Lib/struct.py):

```python
__all__ = [
    # Functions
    'calcsize', 'pack', 'pack_into', 'unpack', 'unpack_from',
    'iter_unpack',

    # Classes
    'Struct',

    # Exceptions
    'error'
    ]
```

The module exports six functions, one class (Struct), and one exception (error).

We will focus on how to use the six functions:

| Function | Description |
| --- | --- |
| struct.pack(format, v1, v2, ...) | Returns a bytes object containing the values v1, v2, ... packed according to the format string format. The number of arguments must exactly match the values required by the format string. |
| struct.pack_into(format, buffer, offset, v1, v2, ...) | Packs the values v1, v2, ... according to the format string format and writes the packed bytes into the writable buffer buffer, starting at position offset. Note that offset is a required argument. |
| struct.unpack(format, buffer) | Unpacks from the buffer buffer (presumably packed by pack(format, ...)) according to the format string format. The result is a tuple, even if it contains exactly one item. The buffer's size in bytes must match the size required by the format. |
| struct.unpack_from(format, /, buffer, offset=0) | Unpacks from buffer according to the format string format, starting at position offset. The result is a tuple, even if it contains exactly one item. The buffer must hold at least as many bytes, counted from offset, as the format requires. |
| struct.iter_unpack(format, buffer) | Iteratively unpacks from the buffer buffer according to the format string format. Returns an iterator that reads equally sized chunks from the buffer until its contents are exhausted; the buffer's size must be a multiple of the size required by the format. |
| struct.calcsize(format) | Returns the size of the struct corresponding to the format string format, i.e. the size of the bytes object produced by pack(format, ...). |
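A minimal sketch exercising all six functions on a tiny two-field record. The format '<hi' (a little-endian short followed by an int, 6 bytes with no padding) and the variable names are arbitrary choices made for illustration.

```python
import struct

FMT = '<hi'  # little-endian: short (2 bytes) + int (4 bytes), no padding

size = struct.calcsize(FMT)           # 6
packed = struct.pack(FMT, 1, 1000)    # a 6-byte bytes object
values = struct.unpack(FMT, packed)   # (1, 1000)

# pack_into writes into an existing writable buffer at a given offset
buf = bytearray(2 * size)
struct.pack_into(FMT, buf, 0, 1, 1000)
struct.pack_into(FMT, buf, size, 2, 2000)

# unpack_from reads starting at an offset; the buffer may be longer than needed
second = struct.unpack_from(FMT, buf, offset=size)   # (2, 2000)

# iter_unpack walks the buffer in calcsize(FMT)-sized chunks
records = list(struct.iter_unpack(FMT, buf))  # [(1, 1000), (2, 2000)]

print(size, values, second, records)
```

Note that pack_into and unpack_from work on an existing buffer, which avoids creating intermediate bytes objects when several records share one buffer.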

The most important argument is format, the format string, which specifies how each value is packed.

Format string

Format strings are the mechanism used to specify the data layout when packing and unpacking data. They are built from format characters, which specify the type of data being packed or unpacked. In addition, special characters control the byte order, size, and alignment.

Byte order, size, and alignment

By default, C types are represented in the machine’s native format and byte order, and are properly aligned by padding bytes when necessary (according to the rules used by the C compiler).

We can also specify the byte order, size, and alignment explicitly through the first character of the format string:

| Character | Byte order | Size | Alignment |
| --- | --- | --- | --- |
| @ | native | native | native |
| = | native | standard | none |
| < | little-endian | standard | none |
| > | big-endian | standard | none |
| ! | network (= big-endian) | standard | none |
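To see the effect of the first character, the sketch below packs the same value 10 with each prefix. Only '<', '>' and '!' give the same bytes on every machine; '=' (and '@') follow the host's byte order, reported by sys.byteorder.

```python
import struct
import sys

value = 10
for prefix in ('@', '=', '<', '>', '!'):
    # Same value, same 'i' format; only the first character changes.
    print(prefix, struct.pack(prefix + 'i', value))

little = struct.pack('<i', value)   # least significant byte first
big = struct.pack('>i', value)      # most significant byte first
network = struct.pack('!i', value)  # network order is big-endian
native = struct.pack('=i', value)   # depends on the host's byte order
print(sys.byteorder)
```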

Big-endian and little-endian are two ways of ordering the bytes of a multi-byte value in memory.

Big-endian stores the most significant byte at the starting (lowest) address.

Little-endian stores the least significant byte at the starting (lowest) address.

Big-endian matches the way humans write numbers (most significant digit first), while little-endian is the order many machines use natively.

PowerPC processors have traditionally used big-endian byte order, while x86 processors use little-endian.

If machines with different byte orders exchange raw binary data directly, one side can misread the other's values unless both agree on a byte order in advance.
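A small sketch of that failure mode, under a hypothetical scenario: a sender packs a 32-bit unsigned int in big-endian order and a receiver unpacks it assuming little-endian. Agreeing on one explicit prefix (here '!') on both sides avoids the problem.

```python
import struct

# Hypothetical sender: packs the value 1 in big-endian order.
payload = struct.pack('>I', 1)           # b'\x00\x00\x00\x01'

# A receiver that wrongly assumes little-endian reads 0x01000000.
(wrong,) = struct.unpack('<I', payload)
# A receiver that uses the agreed network order reads the intended value.
(right,) = struct.unpack('!I', payload)

print(wrong, right)                      # 16777216 1
```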

Padding is only added automatically between successive structure members; no padding is added at the beginning or the end of the encoded structure.

No padding is added when non-native size and alignment is used, i.e. with '<', '>', '=', or '!'.

Format characters

Here are the available format characters:

| Format | C type | Python type | Standard size (bytes) |
| --- | --- | --- | --- |
| x | pad byte | no value | |
| c | char | bytes of length 1 | 1 |
| b | signed char | integer | 1 |
| B | unsigned char | integer | 1 |
| ? | _Bool | bool | 1 |
| h | short | integer | 2 |
| H | unsigned short | integer | 2 |
| i | int | integer | 4 |
| I | unsigned int | integer | 4 |
| l | long | integer | 4 |
| L | unsigned long | integer | 4 |
| q | long long | integer | 8 |
| Q | unsigned long long | integer | 8 |
| n | ssize_t | integer | (native mode only) |
| N | size_t | integer | (native mode only) |
| e | IEEE 754 half precision (binary16) | float | 2 |
| f | float | float | 4 |
| d | double | float | 8 |
| s | char[] | bytes | |
| p | char[] | bytes | |
| P | void * | integer | (native mode only) |
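The standard sizes in the last column can be checked with calcsize by forcing standard mode with '<'. The helper is_rejected is a hypothetical name introduced here; it simply reports whether struct raises struct.error for a format, which is what happens for 'n', 'N' and 'P' outside native mode.

```python
import struct

FORMATS = 'cbB?hHiIlLqQefd'

# With '<' (standard size, no padding) the sizes match the table.
standard_sizes = {ch: struct.calcsize('<' + ch) for ch in FORMATS}
print(standard_sizes)

def is_rejected(fmt):
    """Hypothetical helper: True if struct refuses the format string."""
    try:
        struct.calcsize(fmt)
    except struct.error:
        return True
    return False

# 'n', 'N' and 'P' have no standard size, so only native mode accepts them.
print(is_rejected('<n'), is_rejected('<P'), is_rejected('@n'))
```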

Numeric formats

For example, to pack an int we can write:

```python
In [101]: from struct import *

In [102]: pack('i', 10)
Out[102]: b'\n\x00\x00\x00'

In [103]: unpack('i', b'\n\x00\x00\x00')
Out[103]: (10,)

In [105]: calcsize('i')
Out[105]: 4
```

In the example above, we packed the int 10 and then unpacked it again; calcsize reports that the 'i' format takes 4 bytes.

The packed result is b'\n\x00\x00\x00'. The leading b marks a bytes literal, and the four bytes follow. '\n' is simply how Python displays the byte 0x0A (decimal 10); it comes first because the machine used here is little-endian, so the least significant byte is stored first.
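To look inside such a bytes object, the sketch below uses an explicit '<i' so the result is the same on every machine (it matches plain 'i' on x86). Listing the bytes shows their integer values, and int.from_bytes rebuilds the original integer.

```python
import struct

data = struct.pack('<i', 10)   # explicit little-endian

print(list(data))                       # each byte as an integer
print(data == b'\x0a\x00\x00\x00')      # '\n' is just how Python shows 0x0A
print(int.from_bytes(data, 'little'))   # rebuild the integer by hand
```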

A format character can be preceded by an integer repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Here is how to pack four short values:

```python
In [106]: pack('4h', 2, 3, 4, 5)
Out[106]: b'\x02\x00\x03\x00\x04\x00\x05\x00'

In [107]: unpack('4h', b'\x02\x00\x03\x00\x04\x00\x05\x00')
Out[107]: (2, 3, 4, 5)
```

Whitespace between format characters is ignored; however, a repeat count and its format character must not be separated by whitespace.
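A quick check of both rules. The helper accepts is a hypothetical name used only for this demonstration; it returns False when struct raises struct.error for the format.

```python
import struct

def accepts(fmt):
    """Hypothetical helper: True if struct can parse the format string."""
    try:
        struct.calcsize(fmt)
    except struct.error:
        return False
    return True

# Spaces between format characters are ignored...
print(struct.calcsize('h h'), struct.calcsize('hh'))
# ...but a count followed by a space is rejected.
print(accepts('4h'), accepts('4 h'))
```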

When packing a value x with one of the integer formats ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q'), struct.error is raised if x is outside the valid range for that format.
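A sketch of the range check at the edges of a few formats. The helper fits is a hypothetical name; it returns False when pack raises struct.error.

```python
import struct

def fits(fmt, value):
    """Hypothetical helper: True if value can be packed with fmt."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True

print(fits('b', 127), fits('b', 128))        # signed char: -128..127
print(fits('B', 255), fits('B', -1))         # unsigned char: 0..255
print(fits('<H', 65535), fits('<H', 65536))  # unsigned short: 0..65535
```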

Character format

Besides numbers, the most commonly packed data are characters and strings.

Let's start with the 'c' format. Each 'c' packs exactly one byte, so packing four characters looks like this:

```python
In [109]: pack('4c', b'a', b'b', b'c', b'd')
Out[109]: b'abcd'

In [110]: unpack('4c', b'abcd')
Out[110]: (b'a', b'b', b'c', b'd')

In [111]: calcsize('4c')
Out[111]: 4
```

The b prefix makes each argument a bytes object. The 'c' format requires a bytes object of length 1; passing a str such as 'a' raises struct.error.
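A short sketch of the str/bytes distinction. The helper packs is a hypothetical name that returns None instead of raising. It also shows one way to feed a whole bytes object to '4c': iterating over bytes yields ints, so the sketch slices out 1-byte pieces instead.

```python
import struct

def packs(fmt, *values):
    """Hypothetical helper: pack, or return None if struct.error is raised."""
    try:
        return struct.pack(fmt, *values)
    except struct.error:
        return None

print(packs('c', 'a'))                   # a str is rejected
print(packs('c', 'a'.encode('ascii')))   # encode it to bytes first

# Split a bytes object into 1-byte slices (iterating would give ints).
word = b'abcd'
chars = [word[i:i + 1] for i in range(len(word))]
print(packs('4c', *chars))
```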

String format

Now let's look at the 's' format for strings:

```python
In [114]: pack('4s', b'abcd')
Out[114]: b'abcd'

In [115]: unpack('4s', b'abcd')
Out[115]: (b'abcd',)

In [116]: calcsize('4s')
Out[116]: 4

In [117]: calcsize('s')
Out[117]: 1
```

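For 's', the number in front is the length of the string rather than a repeat count: '4s' is a single 4-byte string (unpacked as one item), whereas '4c' is four separate characters. In both cases calcsize returns the size in bytes, so calcsize('4s') is 4 and calcsize('s') is 1.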
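The sketch below contrasts '4s' with '4c' and shows what happens when the data does not match the declared length: when packing, 's' pads a shorter value with null bytes and truncates a longer one.

```python
import struct

print(struct.pack('6s', b'abc'))     # padded with null bytes to 6
print(struct.pack('2s', b'abcd'))    # truncated to 2
print(struct.unpack('4c', b'abcd'))  # four 1-byte items
print(struct.unpack('4s', b'abcd'))  # one 4-byte item
```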

Effect of padding

The order of the format characters can affect the size, because different orders need different amounts of padding to satisfy the alignment requirements:

```python
>>> from struct import pack, calcsize
>>> pack('ci', b'*', 0x12131415)   # output on a little-endian machine such as x86
b'*\x00\x00\x00\x15\x14\x13\x12'
>>> pack('ic', 0x12131415, b'*')
b'\x15\x14\x13\x12*'
>>> calcsize('ci')
8
>>> calcsize('ic')
5
```
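The padding above comes from native mode ('@', the default). A sketch comparing it with a standard-mode prefix, which never pads: with '<ci' the int directly follows the character.

```python
import struct

# Native mode inserts pad bytes so that the int is aligned.
native = struct.pack('@ci', b'*', 0x12131415)
# Standard modes ('<', '>', '=', '!') never pad.
packed_le = struct.pack('<ci', b'*', 0x12131415)

print(struct.calcsize('@ci'), struct.calcsize('<ci'))
print(native)
print(packed_le)   # the char followed immediately by a little-endian int
```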

The next example shows how to influence padding manually:

```python
In [120]: pack('llh', 1, 2, 3)
Out[120]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00'
```

In the example above, we pack the numbers 1, 2, and 3 as long, long, and short.

On the machine used here a native long is 8 bytes (as the output shows) while a short is only 2 bytes, so the 18-byte result is not a multiple of long's alignment.

To pad the end of the structure up to the alignment of long, append '0l', i.e. the 'l' format with a repeat count of zero:

```python
In [118]: pack('llh0l', 1, 2, 3)
Out[118]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'

In [122]: unpack('llh0l', b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00')
Out[122]: (1, 2, 3)
```
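The same effect can be measured with calcsize. This sketch compares native mode, where '0l' rounds the size up to long's alignment (on a machine with an 8-byte long this prints 18 24), with standard mode, where there is no alignment and '0l' adds nothing.

```python
import struct

# Native mode: '0l' pads the end up to long's alignment.
print(struct.calcsize('llh'), struct.calcsize('llh0l'))
# Standard mode: no alignment, so '0l' contributes zero bytes.
print(struct.calcsize('<llh'), struct.calcsize('<llh0l'))
```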

A more complex application

Finally, a more complex example: unpacking a fixed-width record directly into variables, and then into a named tuple:

```python
>>> from struct import unpack
>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
```
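When many records share one layout, the Struct class (also exported by the module) compiles the format once and offers the same methods. A sketch, assuming hypothetical data made of two fixed-width records laid out back to back: Struct.iter_unpack walks them, and the space-padded name field is stripped and decoded.

```python
import struct
from collections import namedtuple

Student = namedtuple('Student', 'name serialnum school gradelevel')
RECORD = struct.Struct('<10sHHb')   # compile the format once, reuse it

# Hypothetical data: two 15-byte records back to back.
data = (b'raymond   \x32\x12\x08\x01\x08'
        b'alice     \x33\x12\x09\x01\x07')

students = []
for name, serialnum, school, gradelevel in RECORD.iter_unpack(data):
    # 's' fields keep their padding, so strip trailing spaces and decode.
    students.append(Student(name.rstrip().decode('ascii'),
                            serialnum, school, gradelevel))

print(RECORD.size)
for student in students:
    print(student)
```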

This article is available at www.flydean.com/13-python-s…
