If a sequence will contain only numbers, an array (array.array) is more efficient than a list. Arrays support all the operations of mutable sequences, such as removing an element (.pop), inserting an element (.insert), and appending several values from another iterable at once (.extend). In addition, arrays provide fast methods for reading from files (.fromfile) and writing to files (.tofile).
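A minimal sketch of the mutable-sequence operations mentioned above, using a small array of signed ints. The variable names are illustrative only.

```python
from array import array

# An array of signed ints supports the usual mutable-sequence operations.
numbers = array('i', [1, 2, 3])

numbers.append(4)        # add a single value at the end
numbers.insert(0, 0)     # insert the value 0 at index 0
numbers.extend([5, 6])   # append several values from another iterable
last = numbers.pop()     # remove and return the last element

print(numbers.tolist())  # [0, 1, 2, 3, 4, 5]
print(last)              # 6
```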

Creating an array requires a type code, as in array('d'). The type code determines the underlying C data type used to store each item. C types are involved because the Python interpreter most of us use is implemented in C, which is why it is also called CPython.

Python defines the following type codes:

Type code  C type              Python type        Minimum size in bytes  Notes
'b'        signed char         int                1
'B'        unsigned char       int                1
'u'        Py_UNICODE          Unicode character  2                      (1)
'h'        signed short        int                2
'H'        unsigned short      int                2
'i'        signed int          int                2
'I'        unsigned int        int                2
'l'        signed long         int                4
'L'        unsigned long       int                4
'q'        signed long long    int                8
'Q'        unsigned long long  int                8
'f'        float               float              4
'd'        double              float              8

Note (1): The 'u' type code corresponds to Python's obsolete Unicode character type (Py_UNICODE, which is wchar_t). Depending on the platform, it may be 16 or 32 bits wide.
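The sizes in the table are minimums, so the actual size depends on the platform. As a small sketch, the itemsize attribute of an array reports the real size in bytes on the machine running the code. The 'u' code is left out here because it is obsolete, and the dictionary name sizes is just for illustration.

```python
from array import array

# Numeric type codes from the table above ('u' is skipped as obsolete).
codes = 'bBhHiIlLqQfd'

# Map each type code to the actual item size in bytes on this platform.
sizes = {code: array(code).itemsize for code in codes}

for code, size in sizes.items():
    print(f"'{code}': {size} byte(s)")
```

On many 64-bit Linux and macOS builds 'l' reports 8 bytes, while on Windows it typically reports 4, which is why the table can only list minimum sizes.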

For example, the type code 'b' represents a signed char, so array('b') creates an array whose items are one-byte integers ranging from -128 to 127. Storing a single byte per item saves a great deal of memory when the sequence is long.
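A short sketch of that range limit: the extremes -128 and 127 fit in a 'b' array, while 128 is rejected with an OverflowError. The names small and overflowed are illustrative.

```python
from array import array

small = array('b', [-128, 0, 127])  # both extremes of a signed char fit
print(small.itemsize)               # each item occupies 1 byte

overflowed = False
try:
    small.append(128)               # one past the maximum value
except OverflowError as exc:
    overflowed = True
    print('rejected:', exc)
```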

Because an array is typed, it cannot hold values that do not match its type code.
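To illustrate, the sketch below tries to append a string, a float, and None to an array of ints. Each attempt raises TypeError and the array is left unchanged. The list rejected is only there to record what failed.

```python
from array import array

ints = array('i', [1, 2, 3])

rejected = []
for value in ('4', 4.5, None):
    try:
        ints.append(value)      # none of these is an int
    except TypeError:
        rejected.append(value)  # remember what the array refused

print(ints.tolist())            # [1, 2, 3]
print(rejected)                 # ['4', 4.5, None]
```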

Luciano Ramalho gives an example to illustrate the efficiency of arrays: create an array of 10 million random floating-point numbers, write it to a file, and read it back.

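import logging
from array import array
from random import random

# Assumed log format, inferred from the output shown below.
logging.basicConfig(level=logging.INFO, format='%(levelname)s-%(message)s')

# Build an array of 10 million doubles from a generator expression.
floats = array('d', (random() for i in range(10 ** 7)))
logging.info('floats[-1] -> %s', floats[-1])

# Write the raw binary values to disk.
with open('floats.bin', 'wb') as fp:
    floats.tofile(fp)

# Read the 10 million values back into an empty array.
floats2 = array('d')
with open('floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10 ** 7)
logging.info('floats2[-1] -> %s', floats2[-1])
logging.info('floats2==floats -> %s', floats2 == floats)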

Running results:

INFO-floats[-1] -> 0.9160358679542017
INFO-floats2[-1] -> 0.9160358679542017
INFO-floats2==floats -> True

Profiling the code with the cProfile module produces the following output:

INFO-192 function calls (180 primitive calls) in 0.098 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.061    0.061    0.061    0.061 {method 'fromfile' of 'array.array' objects}
        1    0.030    0.030    0.030    0.030 {method 'tofile' of 'array.array' objects}
        2    0.007    0.003    0.007    0.003 {built-in method io.open}
   ...

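As you can see, writing the 10 million floating-point numbers to a file and reading them back takes about 0.1 seconds in total: roughly 0.03 seconds for tofile and 0.06 seconds for fromfile. The resulting file holds 10 million doubles at 8 bytes each, that is 80,000,000 bytes, or about 76 MiB.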
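The size of the binary file can be checked from the element count and the item size, since tofile writes the raw machine values with no overhead. The sketch below uses a much smaller, hypothetical count of 1,000 items and a temporary directory so that it runs quickly and leaves no file behind.

```python
import os
import tempfile
from array import array

count = 1000  # hypothetical small count; the example in the text uses 10 ** 7
floats = array('d', (i / count for i in range(count)))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'floats.bin')
    with open(path, 'wb') as fp:
        floats.tofile(fp)
    size_on_disk = os.path.getsize(path)

# The file size is exactly the number of items times the bytes per item.
expected = len(floats) * floats.itemsize
print(size_on_disk, expected)

# The same arithmetic for 10 million doubles.
print(10 ** 7 * floats.itemsize, 'bytes')
```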

Notes on the code:

  1. Start by creating an iterable with a generator expression. The ** operator means "to the power of", so 10 ** 7 is 10 million. The iterable is then used to build an array of double-precision floating-point numbers (type code 'd');
  2. The index -1 retrieves the last element of the array;
  3. 'wb' opens the file in binary write mode: w is short for write and b is short for binary, a number system that uses only 0 and 1;
  4. When creating an array, you can pass initial values or create an empty array without them, such as array('d');
  5. The second argument of the fromfile() method is the number of items to read from the file. If the file holds fewer items, EOFError is raised, but the items that were available are still added to the array;
  6. You can see that the array read from the file is exactly the same as the array that was saved.
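The behavior of the second argument of fromfile() can be sketched as follows. A file with three doubles is read once with the exact count and once with a count that is too large. The file name three.bin and the variable names are made up for this illustration.

```python
import os
import tempfile
from array import array

source = array('d', [1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'three.bin')
    with open(path, 'wb') as fp:
        source.tofile(fp)

    # Asking for exactly the number of stored items succeeds.
    exact = array('d')
    with open(path, 'rb') as fp:
        exact.fromfile(fp, 3)

    # Asking for more items than the file holds raises EOFError,
    # but the items that were available are still appended.
    partial = array('d')
    hit_eof = False
    with open(path, 'rb') as fp:
        try:
            partial.fromfile(fp, 5)
        except EOFError:
            hit_eof = True

print(exact == source)   # True
print(hit_eof)           # True
print(partial.tolist())  # [1.0, 2.0, 3.0]
```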

Because array.tofile writes the raw values to a binary file, it is much faster than writing the numbers to a text file. Ramalho reports that saving with array.tofile is about seven times faster than writing one float per line to a text file.
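A rough way to compare the two approaches yourself is sketched below. It is not the benchmark behind the figure quoted above: the count of 100,000 items, the helper timed(), and the file names are assumptions made for this illustration, and the timings will vary from machine to machine.

```python
import os
import tempfile
import time
from array import array
from random import random

count = 10 ** 5  # hypothetical size, kept small so the sketch runs quickly
floats = array('d', (random() for _ in range(count)))


def timed(func):
    """Return the wall-clock seconds taken by a single call to func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, 'floats.bin')
    txt_path = os.path.join(tmp, 'floats.txt')

    def write_binary():
        # Dump the raw 8-byte values in one call.
        with open(bin_path, 'wb') as fp:
            floats.tofile(fp)

    def write_text():
        # Write one float per line as text.
        with open(txt_path, 'w') as fp:
            for x in floats:
                fp.write(f'{x!r}\n')

    binary_seconds = timed(write_binary)
    text_seconds = timed(write_text)
    binary_size = os.path.getsize(bin_path)
    text_size = os.path.getsize(txt_path)

print(f'binary: {binary_seconds:.4f} s, {binary_size} bytes')
print(f'text:   {text_seconds:.4f} s, {text_size} bytes')
```

Besides the difference in speed, the text file is also larger, because each float needs many more characters as text than the 8 bytes it occupies in binary form.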