Articles in this series

  • NumPy array
  • NumPy Tutorial (2): Data types
  • NumPy Tutorial (3): ndarray internals and advanced iteration

NumPy data types

1. Data types in NumPy

NumPy supports a wider variety of numeric types than Python. The data types listed in the following table are all built into NumPy. To distinguish them from Python's native data types, bool, int, float, complex, and str each take a trailing underscore (bool_, int_, float_, complex_, str_).

Printing an array's dtype, for example print(a.dtype), always shows a NumPy data type, not a Python native data type.

Type name                               Description
bool_                                   Boolean type
unicode_ / unicode / str_ / str0        Unicode string (str0 ends in the digit zero, not the letter O)
int8 / byte                             8-bit signed integer
int16 / short                           16-bit signed integer
int32 / intc / int_ / long              32-bit signed integer
int64 / longlong / intp / int0          64-bit signed integer (int0 ends in the digit zero, not the letter O)
uint8 / ubyte                           8-bit unsigned integer
uint16 / ushort                         16-bit unsigned integer
uint32 / uintc                          32-bit unsigned integer
uint64 / ulonglong / uintp / uint0      64-bit unsigned integer (uint0 ends in the digit zero, not the letter O)
float16 / half                          Half-precision floating point number: 1 sign bit, 5 exponent bits, 10 mantissa bits
float32 / single                        Single-precision floating point number: 1 sign bit, 8 exponent bits, 23 mantissa bits
float64 / float_ / double               Double-precision floating point number: 1 sign bit, 11 exponent bits, 52 mantissa bits
complex64 / singlecomplex               Complex number made of two 32-bit floating point numbers (real and imaginary parts)
complex128 / complex_ / cfloat / cdouble / longcomplex / clongfloat / clongdouble
                                        Complex number made of two 64-bit floating point numbers (real and imaginary parts)
datetime64                              Date and time type, supported since NumPy 1.7
timedelta64                             The interval between two times

One thing here I do not fully understand: I am on a 64-bit Windows 7 system, and the aliases above are what I measured there, but the source code defines them as follows. To be safe, use unambiguous types such as int32 and int64.

int_ = long
intp = long
int64 = long
int0 = long

class long(signedinteger):
    """ 64-bit integer. Character code 'l'. Python int compatible. """
    pass
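As a rough check of the advice to prefer unambiguous types, the sketch below prints the storage size of the fixed-width integer types and of two platform-dependent aliases. The variable names are only illustrative, and the sizes reported for int_ and intp depend on the platform and the NumPy version, so no output is shown for them.

```python
import numpy as np

# Fixed-width types: the size is part of the name, so it never varies.
fixed = ("int8", "int16", "int32", "int64")
sizes = {name: np.dtype(name).itemsize for name in fixed}
print(sizes)  # sizes in bytes

# Platform-dependent aliases: the size can differ between systems.
for alias in (np.int_, np.intp):
    dt = np.dtype(alias)
    print(alias.__name__, "->", dt.name, dt.itemsize, "bytes")
```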

Addendum: the concept of complex numbers

We call numbers of the form z = a + bi (where a and b are both real numbers) complex numbers; a is the real part, b is the imaginary part, and i is the imaginary unit. When b = 0, z is a real number; when b ≠ 0, z is an imaginary number; when b ≠ 0 and a = 0, z is a pure imaginary number.
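The definition can be tried out directly with NumPy's complex types. This is a small sketch with illustrative variable names; it also shows that complex64 and complex128 store the real and imaginary parts as two floats of 32 and 64 bits respectively.

```python
import numpy as np

z = np.complex128(3 + 4j)      # a = 3 (real part), b = 4 (imaginary part)
print(z.real, z.imag)          # 3.0 4.0
print(abs(z))                  # modulus sqrt(3**2 + 4**2) = 5.0

# complex64 stores two 32-bit floats, complex128 two 64-bit floats.
print(np.dtype(np.complex64).itemsize)    # 8
print(np.dtype(np.complex128).itemsize)   # 16

pure_imaginary = np.complex128(2j)        # a = 0, b != 0
print(pure_imaginary.real == 0)           # True
```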

2. Use of datetime64

ⅰ. Simple example

Example 1:

import numpy as np

a = np.datetime64('2019-03-01')
print(a)

Output:

2019-03-01

Example 2:

import numpy as np

a = np.datetime64('2019-03')
print(a)

Output:

2019-03

As you can see, a value can be given with a precision of only a month, which is convenient.

ⅱ. Using units

A datetime64 can be created with an explicit unit. The date units are years ('Y'), months ('M'), weeks ('W'), and days ('D'); the time units are hours ('h'), minutes ('m'), seconds ('s'), milliseconds ('ms'), microseconds ('us'), nanoseconds ('ns'), picoseconds ('ps'), femtoseconds ('fs'), and attoseconds ('as').

Example 3: The week unit ('W') behaves oddly: it shows the date itself if that date is a Thursday, and the preceding Thursday otherwise. I later realized this is probably because 1970-01-01 was a Thursday.

import numpy as np

a = np.datetime64('2019-03-07', 'W')
b = np.datetime64('2019-03-08', 'W')
print(a, b)

Output (2019-03-07 was a Thursday):

2019-03-07 2019-03-07
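The Thursday behavior of the week unit can be checked with a short sketch. It converts every day of one week (Monday 2019-03-04 through Sunday 2019-03-10, dates chosen only for illustration) to the 'W' unit and also confirms the weekday of the epoch date.

```python
import numpy as np

# The epoch date; .item() converts a day-unit datetime64 to datetime.date.
epoch = np.datetime64('1970-01-01')
print(epoch.item().weekday())   # 3, i.e. Thursday (Monday is 0)

# One full week, Monday 2019-03-04 through Sunday 2019-03-10.
days = np.arange('2019-03-04', '2019-03-11', dtype='datetime64[D]')
weeks = days.astype('datetime64[W]')
for d, w in zip(days, weeks):
    print(d, '->', w)
```

Monday to Wednesday map to the preceding Thursday (2019-02-28), while Thursday to Sunday map to 2019-03-07, which is consistent with weeks being counted from 1970-01-01.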

Example 4: When a datetime64 is created from a string, NumPy by default selects the unit automatically based on the string.

import numpy as np

a = np.datetime64('2019-03-08 20:00')
print(a.dtype)

Output:

datetime64[m]

Example 5: You can also specify the units to use forcibly.

import numpy as np

a = np.datetime64('2019-03', 'D')
print(a)

Output:

2019-03-01

Example 6: The example above shows that 2019-03 and 2019-03-01 represent the same moment. In fact, two datetime64 objects with different units may still represent the same moment, and converting from a larger unit (such as months) to a smaller unit (such as days) is safe.

import numpy as np

print(np.datetime64('2019-03') == np.datetime64('2019-03-01'))

Output:

True

Example 7: When a datetime array is created from strings whose units differ, all elements are converted to the smallest unit that appears in the array.

import numpy as np

a = np.array(['2019-03', '2019-03-08', '2019-03-08 20:00'], dtype='datetime64')
print(a)
print(a.dtype)

Output:

['2019-03-01T00:00' '2019-03-08T00:00' '2019-03-08T20:00']
datetime64[m]

ⅲ. Use with the arange function

Example 8: All days in a month

import numpy as np

a = np.arange('2019-02', '2019-03', dtype='datetime64[D]')
print(a)

Output:

['2019-02-01' '2019-02-02' '2019-02-03' '2019-02-04' '2019-02-05'
 '2019-02-06' '2019-02-07' '2019-02-08' '2019-02-09' '2019-02-10'
 '2019-02-11' '2019-02-12' '2019-02-13' '2019-02-14' '2019-02-15'
 '2019-02-16' '2019-02-17' '2019-02-18' '2019-02-19' '2019-02-20'
 '2019-02-21' '2019-02-22' '2019-02-23' '2019-02-24' '2019-02-25'
 '2019-02-26' '2019-02-27' '2019-02-28']

The interval can also be 3 days (‘3D’).

import numpy as np

a = np.arange('2019-02', '2019-03', dtype='datetime64[3D]')
print(a)

Output:

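['2019-02-01' '2019-02-04' '2019-02-07' '2019-02-10' '2019-02-13'
 '2019-02-16' '2019-02-19' '2019-02-22' '2019-02-25']

Notice that 2019-02-28 is missing here. This looks like a bug at first, but it is probably expected behavior: the stop value 2019-03-01 is itself converted to the 3-day unit, which rounds it down to 2019-02-28, and arange always excludes the stop value.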
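One possible explanation for the missing 2019-02-28 can be explored with the sketch below. It assumes that arange converts the stop date to the requested 3-day unit before generating values; the second part shows an alternative, not taken from the original example, that keeps the unit at days and passes the 3-day step explicitly.

```python
import numpy as np

# Converting the stop date to 3-day units rounds it down to a multiple
# of 3 days counted from 1970-01-01.
stop_3d = np.datetime64('2019-03-01').astype('datetime64[3D]')
print(stop_3d)

# Alternative: keep the unit at days and pass the 3-day step explicitly.
b = np.arange(np.datetime64('2019-02-01'),
              np.datetime64('2019-03-01'),
              np.timedelta64(3, 'D'))
print(b)
```

With the explicit step, the stop value stays at 2019-03-01, so 2019-02-28 is included as the last element.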

ⅳ. datetime64 and timedelta64 operations

Example 1: A timedelta64 represents the difference between two datetime64 values. A timedelta64 also has a unit, which matches the smaller of the two units involved in the subtraction.

import numpy as np

a = np.datetime64('2019-03-08') - np.datetime64('2019-03-07')
b = np.datetime64('2019-03-08') - np.datetime64('2019-03-07 08:00')
c = np.datetime64('2019-03-08') - np.datetime64('2019-03-07 23:00', 'D')

print(a, a.dtype)
print(b, b.dtype)
print(c, c.dtype)

Output:

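1 days timedelta64[D]
960 minutes timedelta64[m]
1 days timedelta64[D]

Look at the expression for c: because the unit is forced to days, np.datetime64('2019-03-07 23:00', 'D') is truncated to 2019-03-07, so the difference is 1 day.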
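A timedelta64 can also be turned into a plain number in a unit of your choice. The sketch below, using illustrative variable names, divides the 960-minute difference by a one-hour timedelta64 and compares that with astype, which keeps only whole units.

```python
import numpy as np

delta = np.datetime64('2019-03-08') - np.datetime64('2019-03-07 08:00')
print(delta)                                # 960 minutes

# Dividing by a one-unit timedelta64 gives a plain float in that unit.
hours = delta / np.timedelta64(1, 'h')
print(hours)                                # 16.0

# astype truncates to whole units instead.
print(delta.astype('timedelta64[h]'))       # 16 hours
```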

Example 2:

import numpy as np

a = np.datetime64('2019-03') + np.timedelta64(20, 'D')
print(a)

Output:

2019-03-21

ⅴ. Operations on timedelta64 alone

Example 1: Creating a timedelta64

import numpy as np

a = np.timedelta64(1, 'Y')    # Method 1
b = np.timedelta64(a, 'M')    # Method 2
print(a)
print(b)

Output:

1 years
12 months

Example 2: Addition, subtraction, multiplication, and division

import numpy as np

a = np.timedelta64(1, 'Y')
b = np.timedelta64(6, 'M')

print(a + b)
print(a - b)
print(2 * a)
print(a / b)

Output:

18 months
6 months
2 years
2.0

Example 3: The units year ('Y') and month ('M') are treated specially, however: they cannot be converted to or combined with the other units. How many days are there in a year? How many hours are there in a month? Neither has a fixed answer.

import numpy as np

a = np.timedelta64(1, 'M')
b = np.timedelta64(a, 'D')

Output:

TypeError: Cannot cast NumPy timedelta64 scalar from metadata [M] to [D] according to the rule 'same_kind'
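When the number of days in a particular month is needed, one workaround is to anchor the month to real dates and subtract them, since the length of a specific month is well defined. This is only a sketch with illustrative variable names; it also wraps the failing conversion in try/except so the script keeps running.

```python
import numpy as np

# Converting a month-based timedelta to days is rejected by NumPy.
try:
    np.timedelta64(np.timedelta64(1, 'M'), 'D')
    cast_failed = False
except TypeError:
    cast_failed = True
print(cast_failed)

# Anchoring the month to real dates removes the ambiguity.
feb_2019 = np.datetime64('2019-03', 'D') - np.datetime64('2019-02', 'D')
feb_2020 = np.datetime64('2020-03', 'D') - np.datetime64('2020-02', 'D')
print(feb_2019, feb_2020)   # 28 days 29 days
```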

ⅵ. Converting between numpy.datetime64 and datetime.datetime

import numpy as np
import datetime

dt = datetime.datetime(2018, 9, 1)
dt64 = np.datetime64(dt, 'D')
print(dt64, dt64.dtype)

dt2 = dt64.astype(datetime.datetime)
print(dt2)

Output:

2018-09-01 datetime64[D]
2018-09-01
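The Python type you get back depends on the unit of the datetime64. The sketch below uses the scalar method item() rather than astype, with illustrative variable names, to show the three cases: a date, a datetime, and a plain integer.

```python
import datetime
import numpy as np

# Day unit -> datetime.date
d = np.datetime64('2018-09-01').item()
print(type(d).__name__, d)          # date 2018-09-01

# Second unit -> datetime.datetime
s = np.datetime64('2018-09-01T12:30:00').item()
print(type(s).__name__, s)          # datetime 2018-09-01 12:30:00

# Nanosecond unit is finer than datetime.datetime supports,
# so NumPy falls back to an integer count since the epoch.
ns = np.datetime64('2018-09-01', 'ns').item()
print(type(ns).__name__)            # int
```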

ⅶ. Working day functions (busday)

By default, the busday functions treat Monday through Friday as working days. The implementation is based on a weekmask that contains seven Boolean flags, one for each day of the week.

Example 1: busday_offset applies the specified offset, counted in working days, to a date with unit 'D'. For example, to calculate the next working day:

import numpy as np

a = np.busday_offset('2019-03-08', 1)
print(a)

Output:

2019-03-11

Example 2: If the current date is a non-working day, an error is reported by default.

import numpy as np

a = np.busday_offset('2019-03-09', 1)
print(a)

Output:

ValueError: Non-business day date in busday_offset

Example 3: You can specify a forward or backward rule to avoid errors.

import numpy as np

a = np.busday_offset('2019-03-09', 1, roll='forward')
b = np.busday_offset('2019-03-09', 1, roll='backward')
print(a)
print(b)

c = np.busday_offset('2019-03-09', 0, roll='forward')
d = np.busday_offset('2019-03-09', 0, roll='backward')
print(c)
print(d)

Output:

2019-03-12
2019-03-11
2019-03-11
2019-03-08

With an offset of 0, you get the nearest working day after or before the given date; if the given date is itself a working day, it is returned unchanged.

Example 4:

import numpy as np

a = np.busday_offset('2019-05', 1, roll='forward', weekmask='Sun')
print(a)

Output:

2019-05-12

Mother's Day is the second Sunday in May, and this example returns its exact date. The weekmask parameter (Mon, Tue, Wed, Thu, Fri, Sat, Sun) selects which days of the week count as valid days. The code above therefore means: the second Sunday on or after 2019-05-01 (remember that the offset starts from 0).

This function may be useful for Americans, but in China, who will tell me the date of the Dragon Boat Festival?
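NumPy cannot compute lunar-calendar holidays, but the busday functions do accept a holidays argument, so a date that is already known can be excluded from the working days. The sketch below uses a hypothetical, hand-maintained list that contains only the 2019 Dragon Boat Festival (2019-06-07, a Friday).

```python
import numpy as np

# Hypothetical, hand-maintained holiday list; NumPy has no built-in calendar
# of public holidays. 2019-06-07 was the Dragon Boat Festival (a Friday).
holidays = ['2019-06-07']

print(np.is_busday('2019-06-07'))                       # True
print(np.is_busday('2019-06-07', holidays=holidays))    # False

# The next working day after Thursday 2019-06-06 skips the holiday
# and the weekend.
next_day = np.busday_offset('2019-06-06', 1, holidays=holidays)
print(next_day)                                         # 2019-06-10
```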

Example 5: is_busday returns whether the specified date is a working day.

import numpy as np

a = np.is_busday(np.datetime64('2019-03-08'))
b = np.is_busday('2019-03-09')
print(a)
print(b)

Output:

True
False

Example 6: busday_count returns the number of working days between two dates.

import numpy as np

a = np.busday_count(np.datetime64('2019-03-01'), np.datetime64('2019-03-10'))
b = np.busday_count('2019-03-10', '2019-03-01')
print(a)
print(b)

Output:

6
-6

Example 7: count_nonzero can be combined with is_busday to count the working days in a datetime64[D] array.

import numpy as np

c = np.arange('2019-03-01', '2019-03-10', dtype='datetime64')
d = np.count_nonzero(np.is_busday(c))
print(d)

Output:

6

Example 8: Customize the weekly mask value, that is, specify the working days in a week.

import numpy as np

a = np.is_busday('2019-03-08', weekmask=[1, 1, 1, 1, 0, 1, 0])
b = np.is_busday('2019-03-09', weekmask='1111010')
print(a)
print(b)

Output:

False
True

The weekmask can also be written as a list of weekday abbreviations naming every working day. The following weekmask makes Monday, Tuesday, Wednesday, Thursday, Saturday, and Sunday working days, with Friday as the rest day.

weekmask='Mon Tue Wed Thu Sat Sun'
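The three ways of writing a weekmask are interchangeable. The following sketch, with illustrative variable names, expresses the "Friday is the rest day" mask as abbreviations, as a bit string, and as a list, and checks them against the same range of dates.

```python
import numpy as np

mask_str = 'Mon Tue Wed Thu Sat Sun'
mask_bits = '1111011'
mask_list = [1, 1, 1, 1, 0, 1, 1]

dates = np.arange('2019-03-01', '2019-03-10', dtype='datetime64[D]')
by_str = np.is_busday(dates, weekmask=mask_str)
by_bits = np.is_busday(dates, weekmask=mask_bits)
by_list = np.is_busday(dates, weekmask=mask_list)
print(by_str)

# With Friday as the only rest day, 7 of these 9 days are working days.
print(np.busday_count('2019-03-01', '2019-03-10', weekmask=mask_str))  # 7
```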

3. Data type object: dtype

Data type objects are used to describe how the memory region corresponding to an array is used. This depends on several aspects:

  • The type of the data (integer, floating point, or Python object)
  • The size of the data (for example, how many bytes are used to store an integer)
  • The byte order of the data (little-endian "<" or big-endian ">"; big-endian stores the high-order byte first, little-endian stores the low-order byte first)
  • For structured types, the name of each field, the data type of each field, and the part of the memory block each field occupies (see Example 3)
  • If the data type is a subarray, its shape and data type

The byte order is set by prefixing "<" or ">" to the data type.
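Each of these aspects can be read back from a dtype object through its attributes. The sketch below uses illustrative variable names; the byte order character that is printed depends on the machine, because NumPy reports the native order as '='.

```python
import numpy as np

# Kind, size in bytes, and byte order of a simple type.
dt = np.dtype('>i4')
print(dt.kind, dt.itemsize, dt.byteorder)

# Structured type: field names, plus (dtype, offset) for each field.
student = np.dtype([('name', 'S20'), ('age', 'i1')])
print(student.names)            # ('name', 'age')
print(student.fields['age'])    # (dtype('int8'), 20)
print(student.itemsize)         # 21

# Subarray type: a 2x3 block of int32 treated as one element.
block = np.dtype(('i4', (2, 3)))
print(block.shape, block.base, block.itemsize)
```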

ⅰ. Instantiating a dtype

Syntax for constructing a dtype object:

numpy.dtype(obj, align=False, copy=False)

Parameter   Description
obj         The object to be converted to a data type object
align       If True, pad the fields so that the layout resembles a C struct; can be True only if obj is a dictionary or a comma-separated string
copy        If True, make a new copy of the dtype object; if False, the result is a reference to the built-in data type object
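The effect of align is easiest to see by comparing field offsets. This sketch builds the same two-field type from a comma-separated string with and without align=True; the field names f0 and f1 are the ones NumPy assigns automatically, and the padded sizes shown are what typical platforms produce.

```python
import numpy as np

packed = np.dtype('i1,i4')                 # fields are named f0 and f1
aligned = np.dtype('i1,i4', align=True)

# Offset of the second field and total size, without and with padding.
print(packed.fields['f1'][1], packed.itemsize)     # 1 5
print(aligned.fields['f1'][1], aligned.itemsize)   # 4 8 on typical platforms
```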

Example 1: The four data types int8, int16, int32, and int64 can be written as the strings 'i1', 'i2', 'i4', and 'i8' (see the character codes).

import numpy as np

dt = np.dtype('i4')
print(dt)

Output:

int32

Example 2:

import numpy as np

dt = np.dtype('<i4')
print(dt)

Output:

int32

Example 3: This example defines a structured data type student with a string field name and an integer field age, and applies this dtype to an ndarray object.

import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1')])
print(student)

a = np.array([('tom', 21), ('Jerry', 18)], dtype=student)
print(a)

Output:

[('name', 'S20'), ('age', 'i1')]
[(b'tom', 21) (b'Jerry', 18)]

ⅱ. Character codes

Character code      Corresponding type
b                   Boolean (as a type string write 'b1' or '?'; a bare 'b' means int8)
i                   Signed integer; 'i1', 'i2', 'i4', 'i8' correspond to int8, int16, int32, int64
u                   Unsigned integer; 'u1', 'u2', 'u4', 'u8' correspond to uint8, uint16, uint32, uint64
f                   Floating point; 'f2', 'f4', 'f8' correspond to float16, float32, float64
c                   Complex; 'c8', 'c16' correspond to complex64, complex128
m                   timedelta64, which is essentially an int64
M (uppercase)       datetime64
O (uppercase)       Python object
S (uppercase), a    Byte string; can contain only ASCII characters. A number after S or a sets the length, and longer strings are truncated, for example S20 and a10
U (uppercase)       Unicode string. A number after U sets the length, and longer strings are truncated, for example U20
V (uppercase)       Raw byte array. A number after V sets the length; longer data is truncated and shorter data is padded with zeros
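The codes in the table can be checked directly against the named types. This is a small sketch with illustrative variable names; it also shows the one trap in the list, namely that a bare 'b' passed as a type string is read as int8 rather than Boolean.

```python
import numpy as np

print(np.dtype('i4') == np.int32)        # True
print(np.dtype('u2') == np.uint16)       # True
print(np.dtype('f8') == np.float64)      # True
print(np.dtype('c16') == np.complex128)  # True

# A bare 'b' as a type string means int8; use 'b1' or '?' for Boolean.
print(np.dtype('b1'), np.dtype('b'))     # bool int8

# Sizes: S counts bytes, U counts 4-byte code points.
print(np.dtype('S20').itemsize, np.dtype('U20').itemsize)   # 20 80

# kind reports the one-letter category from the table.
kinds = {code: np.dtype(code).kind for code in ('m8[D]', 'M8[D]', 'O', 'V8')}
print(kinds)
```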

The rest of this section focuses on M and V; the other codes are simple and easy to understand from the examples above.

Examples of use of the character code M:

import numpy as np

student = np.dtype([('name', 'S4'), ('age', 'M8[D]')])
print(student)

a = np.array([('tom', '2011-01-01'), ('Jerry', np.datetime64('2012-05-17'))], dtype=student)
print(a)
print(a['age'].dtype)

Output:

[('name', 'S4'), ('age', '<M8[D]')]
[(b'tom', '2011-01-01') (b'Jerr', '2012-05-17')]
datetime64[D]

Note that the unit has to be written out here, as in M8[D]. Without a unit, an error is reported: Cannot cast NumPy timedelta64 scalar from metadata [D] to according to the rule 'same_kind'.

Examples of use of the character code V:

import numpy as np

student = np.dtype([('name', 'V8'), ('age', 'i1')])
print(student)

a = np.array([(b'tom', 21), (b'Jerry', 18)], dtype=student)
print(a)
print(a['name'].dtype)

Output:

[('name', 'V8'), ('age', 'i1')]
[(b'\x74\x6F\x6D\x00\x00\x00\x00\x00', 21)
 (b'\x4A\x65\x72\x72\x79\x00\x00\x00', 18)]
|V8

4. numpy.datetime_data

Syntax:

numpy.datetime_data(dtype, /)

Parameter: dtype, which must be a datetime64 or timedelta64 type.
Return value: a tuple (unit, step size).

Example 1:

import numpy as np

dt_25s = np.dtype('timedelta64[25s]')
print(np.datetime_data(dt_25s))

Output:

('s', 25)

Example 2:

import numpy as np

dt_25s = np.dtype('timedelta64[25s]')
b = np.array([1, 2, 3, 4, 5], dt_25s).astype('timedelta64[s]')
print(b)
print(b.dtype)

Output:

[ 25  50  75 100 125]
timedelta64[s]

In this example, b is an ndarray whose data type is converted from timedelta64[25s] to timedelta64[s], so each number in the array is multiplied by 25.
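datetime_data works the same way on datetime64 types, which makes it a convenient way to find out which unit NumPy chose for an array. A short sketch with illustrative variable names:

```python
import numpy as np

a = np.array(['2019-03', '2019-03-08 20:00'], dtype='datetime64')
unit, step = np.datetime_data(a.dtype)
print(unit, step)                                      # m 1

print(np.datetime_data(np.dtype('datetime64[3D]')))    # ('D', 3)
```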

5. numpy.datetime_as_string

Converts an array of datetimes to an array of strings.

Syntax:

numpy.datetime_as_string(arr, unit=None, timezone='naive', casting='same_kind')

Parameter   Description
arr         An array of datetime64 values
unit        None, 'auto', or a datetime64 unit
timezone    The time zone to use when formatting
casting     The kind of casting allowed when changing between datetime units. Possible values: 'no', 'equiv', 'safe', 'same_kind', 'unsafe'

Example 1:

import numpy as np

dt_array = np.arange('2019-03-01', '2019-03-10', dtype='datetime64[D]')
str_array = np.datetime_as_string(dt_array)

print(str_array)
print(str_array.dtype)

Output:

['2019-03-01' '2019-03-02' '2019-03-03' '2019-03-04' '2019-03-05'
 '2019-03-06' '2019-03-07' '2019-03-08' '2019-03-09']
<U28

By default, unit=None, and every element is printed in the array's own unit, which is the smallest unit among the values it was created from. With unit='auto', each element is printed only as precisely as it needs to be, so values that fall exactly on midnight are shown as plain dates. If a unit is specified, the output uses that unit's format.

import numpy as np

dt_array = np.array(['2019-03', '2019-03-08', '2019-03-08 20:00'], dtype='datetime64')

str_array1 = np.datetime_as_string(dt_array)
str_array2 = np.datetime_as_string(dt_array, unit='auto')
str_array3 = np.datetime_as_string(dt_array, unit='D')
print(str_array1)
print(str_array2)
print(str_array3)

Output:

['2019-03-01T00:00' '2019-03-08T00:00' '2019-03-08T20:00']
['2019-03-01' '2019-03-08' '2019-03-08T20:00']
['2019-03-01' '2019-03-08' '2019-03-08']
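The timezone parameter is not exercised in the examples so far. The sketch below, with illustrative variable names, formats a single minute-precision value with timezone='UTC', which appends a 'Z' suffix, and with a coarser unit.

```python
import numpy as np

dt_array = np.array(['2019-03-08 20:00'], dtype='datetime64[m]')

utc = np.datetime_as_string(dt_array, timezone='UTC')
print(utc)      # ['2019-03-08T20:00Z']

hours = np.datetime_as_string(dt_array, unit='h')
print(hours)    # ['2019-03-08T20']
```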

The Path of Python for older code farmers