If a list will contain only numbers, an array.array is more efficient than a list. Arrays support all mutable sequence operations, such as removing an element (.pop), inserting an element (.insert), and appending several values from another iterable at once (.extend). In addition, arrays provide fast methods for loading and saving data, such as .frombytes and .tofile.
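As a minimal sketch of the operations mentioned above (the variable names and sample values are illustrative, not from the original example), an array can be edited like a list and converted to and from its raw bytes:

```python
from array import array

numbers = array('i', [10, 20, 30])

numbers.insert(1, 15)       # insert 15 at index 1
numbers.extend([40, 50])    # append several values at once
last = numbers.pop()        # remove and return the last item

# tobytes/frombytes convert to and from the raw machine representation.
raw = numbers.tobytes()
copy = array('i')
copy.frombytes(raw)

print(numbers.tolist())
print(last)
print(copy == numbers)
```

The byte string produced by tobytes has exactly len(numbers) * numbers.itemsize bytes, which is why loading it back with frombytes involves no parsing.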
Creating an array requires a type code, as in array('d'). The type code is a single character that determines the underlying C type used to store each item. (The standard Python implementation is written in C, which is why it is also called CPython.)
Python defines the following type codes:
Type code | C type | Python type | Minimum size in bytes | Notes |
---|---|---|---|---|
'b' | signed char | int | 1 | |
'B' | unsigned char | int | 1 | |
'u' | Py_UNICODE | Unicode character | 2 | (1) |
'h' | signed short | int | 2 | |
'H' | unsigned short | int | 2 | |
'i' | signed int | int | 2 | |
'I' | unsigned int | int | 2 | |
'l' | signed long | int | 4 | |
'L' | unsigned long | int | 4 | |
'q' | signed long long | int | 8 | |
'Q' | unsigned long long | int | 8 | |
'f' | float | float | 4 | |
'd' | double | float | 8 | |
Note (1): The 'u' type code corresponds to Python's obsolete Unicode character type (Py_UNICODE, which is wchar_t). Depending on the platform, it may be 16 or 32 bits wide.
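The sizes in the table are minimums, and the actual size depends on the platform's C compiler. A small sketch (the dictionary name `sizes` is just for illustration) prints the real item size of each numeric type code on the machine running it:

```python
from array import array

# itemsize reports how many bytes one item occupies on this platform.
sizes = {code: array(code).itemsize for code in 'bBhHiIlLqQfd'}

for code, size in sizes.items():
    print(code, size)
```

On many current platforms 'i' reports 4 bytes rather than the minimum of 2, so it is worth checking itemsize before relying on a particular width.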
For example, the type code 'b' stands for signed char, so array('b') creates an array in which each item is an integer stored in a single byte, ranging from -128 to 127. This saves a lot of memory when the sequence is long, because the array stores packed bytes rather than full Python int objects.
An array is typed: it will not accept a value that does not match its type code.
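A short sketch of this behaviour, using illustrative values that are not part of the original example: an array('b') accepts integers from -128 to 127 and rejects anything else with an exception.

```python
from array import array

small = array('b', [-128, 0, 127])   # all values fit in one signed byte

errors = []
for bad_value in (128, 1.5, 'x'):
    try:
        small.append(bad_value)
    except (OverflowError, TypeError) as exc:
        # 128 is out of range; 1.5 and 'x' are not integers.
        errors.append(type(exc).__name__)

print(errors)
print(len(small), small.itemsize)
```

An out-of-range integer raises OverflowError, while a value of the wrong type raises TypeError; in both cases the array is left unchanged.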
Luciano Ramalho gives an example that illustrates the efficiency of arrays: an array of 10 million random floating-point numbers is created, written to a file, and read back.
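```python
import logging
from array import array
from random import random

logging.basicConfig(level=logging.INFO, format='%(levelname)s-%(message)s')

# Build an array of 10 million doubles from a generator expression.
floats = array('d', (random() for i in range(10 ** 7)))
logging.info('floats[-1] -> %s', floats[-1])

# Write the raw bytes to a binary file.
with open('floats.bin', 'wb') as fp:
    floats.tofile(fp)

# Read the 10 million doubles back into an empty array.
floats2 = array('d')
with open('floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10 ** 7)

logging.info('floats2[-1] -> %s', floats2[-1])
logging.info('floats2==floats -> %s', floats2 == floats)
```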
Running results:
```
INFO-floats[-1] -> 0.9160358679542017
INFO-floats2[-1] -> 0.9160358679542017
INFO-floats2==floats -> True
```
Profiling the code with the cProfile module produces the following output:
```
INFO-192 function calls (180 primitive calls) in 0.098 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.061    0.061    0.061    0.061 {method 'fromfile' of 'array.array' objects}
        1    0.030    0.030    0.030    0.030 {method 'tofile' of 'array.array' objects}
        2    0.007    0.003             0.003 {built-in method io.open}
   ...
```
As you can see, writing 10 million floating-point numbers to a file and reading them back takes about 0.1 seconds in total (0.030 s for tofile and 0.061 s for fromfile). The resulting file holds 10 million 8-byte doubles, so it is 80,000,000 bytes, or about 76 MiB.
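The article does not show how the profile was collected. One possible way, sketched here under assumptions, is to wrap the round trip in a function and run it with cProfile.Profile. The helper name `round_trip`, the temporary file location, and the smaller size N are all choices made for this illustration so that it runs quickly.

```python
import cProfile
import os
import pstats
import tempfile
from array import array
from random import random

N = 10 ** 5  # smaller than the article's 10 ** 7 so the sketch runs quickly


def round_trip(path, n):
    """Write n random doubles to path, then read them back."""
    floats = array('d', (random() for _ in range(n)))
    with open(path, 'wb') as fp:
        floats.tofile(fp)
    floats2 = array('d')
    with open(path, 'rb') as fp:
        floats2.fromfile(fp, n)
    return floats, floats2


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'floats.bin')
    profiler = cProfile.Profile()
    floats, floats2 = profiler.runcall(round_trip, path, N)
    size = os.path.getsize(path)   # measured before the directory is removed

# Show the five most expensive entries, ordered by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)
print(size, 'bytes')
```

The file size always equals the number of items times itemsize, which is how the size of the 10-million-item file can be predicted without running the full example.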
1. Start by creating an iterable with a generator expression (`**` is the power operator), then build a double-precision floating-point array (type code 'd') from it;
2. Index -1 retrieves the last element of the array;
3. 'wb' opens the file in binary write mode: w is short for write, and b is short for binary;
4. When creating an array, you can initialize it with values or create an empty array, such as array('d');
5. The second argument to the fromfile() method specifies how many items to read from the file;
6. You can see that the array read from the file is exactly the same as the array that was saved.
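To make the role of the second fromfile() argument concrete, here is a small sketch with a five-item file (the file name and values are illustrative). It reads three items, then asks for more items than the file contains:

```python
import os
import tempfile
from array import array

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'five.bin')
    with open(path, 'wb') as fp:
        array('d', [1.0, 2.0, 3.0, 4.0, 5.0]).tofile(fp)

    first_three = array('d')
    with open(path, 'rb') as fp:
        first_three.fromfile(fp, 3)       # read exactly three items

    too_many = array('d')
    hit_eof = False
    with open(path, 'rb') as fp:
        try:
            too_many.fromfile(fp, 10)     # only five items are available
        except EOFError:
            hit_eof = True                # the five items are still appended

print(first_three.tolist())
print(hit_eof, too_many.tolist())
```

When fewer items are available than requested, fromfile() raises EOFError but keeps the items it managed to read, so the count should match what was written.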
Because array.tofile writes the raw binary representation of the data, it is much faster than writing the numbers to a text file. Ramalho reports that saving with array.tofile is about seven times faster than writing one float per line to a text file.
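A rough way to check this on your own machine is sketched below. It is not the benchmark behind the figure quoted above: the size N, the file names, and the one-float-per-line text format are assumptions made for the illustration, and the measured ratio will vary by system.

```python
import os
import tempfile
import time
from array import array
from random import random

N = 10 ** 5  # kept small so the comparison runs quickly
floats = array('d', (random() for _ in range(N)))

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, 'floats.bin')
    txt_path = os.path.join(tmp, 'floats.txt')

    # Binary: dump the raw bytes in one call.
    start = time.perf_counter()
    with open(bin_path, 'wb') as fp:
        floats.tofile(fp)
    binary_seconds = time.perf_counter() - start

    # Text: format each float and write one per line.
    start = time.perf_counter()
    with open(txt_path, 'w') as fp:
        for x in floats:
            fp.write(f'{x!r}\n')
    text_seconds = time.perf_counter() - start

    # Reading the text file back requires parsing every line.
    with open(txt_path) as fp:
        from_text = array('d', (float(line) for line in fp))

    bin_size = os.path.getsize(bin_path)
    txt_size = os.path.getsize(txt_path)

print(f'binary: {binary_seconds:.4f}s, {bin_size} bytes')
print(f'text:   {text_seconds:.4f}s, {txt_size} bytes')
```

Besides the time difference, the text file is usually larger, since each float needs many characters while the binary file stores exactly 8 bytes per item.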