This is the 14th day of my participation in the First Challenge 2022.
Preface
In the previous section on NumPy record array helper functions, we took a first look at the recarray helpers. They are provided by the recfunctions module in numpy.lib, a collection of functions for creating and manipulating structured arrays. So far we have covered:
- The apply_along_fields() method applies a function as a reduction across the fields of a structured array
- The append_fields() method adds new fields to a structured array
- The drop_fields() method removes the specified fields from a structured array and returns a new array
- The join_by() method joins two arrays on a key
- The merge_arrays() method merges a sequence of arrays field by field
When the arrays passed to merge_arrays() have different lengths and no mask is used, the shorter arrays are padded with a missing value that depends on the type of the field.
Fill value | Type |
---|---|
-1 | Integer types |
-1.0 | Floating-point types |
'-' | Character |
'-1' | String |
True | Boolean |
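As a quick illustration of the padding rule above, here is a minimal sketch that merges two arrays of different lengths. The sample values and variable names are made up for the example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Two plain arrays of different lengths; the merged fields are named f0 and f1.
short_ints = np.array([1, 2])
long_floats = np.array([10.0, 20.0, 30.0])

# With usemask=False (the default) the short array is padded, not masked.
merged = rfn.merge_arrays((short_ints, long_floats), usemask=False)

print(merged.dtype.names)
print(merged["f0"])  # the integer column, padded with -1
```

Passing usemask=True instead returns a masked array in which the missing entry is masked rather than replaced by a fill value.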
In this installment, we continue with the recfunctions module's functions for manipulating structured arrays. Let's go~
1. Getting the field names of a structured array
The recfunctions module also provides functions for retrieving the field names of a structured data type.
1.1 Return the field names as a dictionary
The recfunctions module provides the get_fieldstructure() method, which returns the fields of a structured data type as a dictionary that maps each field name to the list of its parent fields.
It is mainly useful for nested structured data types, where it simplifies access to fields that are nested inside other fields.
get_fieldstructure(adtype, lastname=None, parents=None)
Parameter Description:
Parameter | Description |
---|---|
adtype | Structured data type (an np.dtype) |
lastname | Name of the last processed field, optional |
parents | Dictionary of parent fields, optional |
>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8")])
>>> rfn.get_fieldstructure(arr_dtype)
{'A': [], 'B': []}
get_fieldstructure() is especially helpful for nested structured data types, because it records the parent fields of every nested field.
>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> stu_dtype = np.dtype([("school", "S16"), ("class", [("classA", "S6"), ("classB", "S6")])])
>>> rfn.get_fieldstructure(stu_dtype)
{'school': [], 'class': [], 'classA': ['class'], 'classB': ['class']}
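One possible use of this dictionary, sketched here purely as an illustration, is to walk the list of parents to reach a nested field by its bare name. The helper get_nested_field and the sample data are hypothetical and are not part of recfunctions.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

stu_dtype = np.dtype([("school", "S16"),
                      ("class", [("classA", "S6"), ("classB", "S6")])])
parents = rfn.get_fieldstructure(stu_dtype)


def get_nested_field(arr, name):
    """Hypothetical helper: follow the parent chain down to `name`."""
    view = arr
    for parent in parents[name]:
        view = view[parent]
    return view[name]


# Made-up sample data for the illustration.
students = np.zeros(2, dtype=stu_dtype)
students["class"]["classA"] = [b"a1", b"a2"]

class_a = get_nested_field(students, "classA")
print(class_a)
```

The same loop works for deeper nesting, because each entry in the dictionary lists the parents from the outermost field inward.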
1.2 Return the field names as a tuple
The recfunctions module provides the get_names() method, which returns the field names of a structured data type as a tuple. Nested fields are returned as nested tuples.
get_names(adtype)
Parameter Description:
Parameter | Description |
---|---|
adtype | Input data type |
>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8"), ("C", [("C1", "i8"), ("C2", "i4")])])
>>> rfn.get_names(arr_dtype)
('A', 'B', ('C', ('C1', 'C2')))
1.3 Return the field names as a flat tuple
The recfunctions module provides the get_names_flat() method, which returns the field names of a structured data type as a tuple, with any nesting flattened.
get_names_flat(adtype)
Parameter Description:
Parameter | Description |
---|---|
adtype | Structured data type |
Unlike get_names(), get_names_flat() flattens nested field names, so the result is a single flat tuple with no nested tuples.
>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8"), ("C", [("C1", "i8"), ("C2", "i4")])])
>>> rfn.get_names(arr_dtype)
('A', 'B', ('C', ('C1', 'C2')))
>>> rfn.get_names_flat(arr_dtype)
('A', 'B', 'C', 'C1', 'C2')
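A flat tuple is handy when you only need to know whether a name occurs anywhere in a nested dtype. The following is a small sketch of that idea; the variable names are chosen for the example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr_dtype = np.dtype([("A", "i8"), ("B", "i8"),
                      ("C", [("C1", "i8"), ("C2", "i4")])])

top_level = arr_dtype.names               # only the outermost field names
all_names = rfn.get_names_flat(arr_dtype)  # nested names included

print("C1" in top_level)   # False: C1 is nested inside C
print("C1" in all_names)   # True
```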
2. Find duplicates in structured arrays
The recfunctions module provides the find_duplicates() method, which finds the duplicate entries in a structured array along a given key.
find_duplicates(a, key=None, ignoremask=True, return_index=False)
Parameter Description:
Parameter | Description |
---|---|
a | Input array |
key | Name of the field along which to check for duplicates. The default is None, in which case whole records are compared |
ignoremask | Whether masked data should be discarded (True) or treated as duplicates (False) |
return_index | Whether to also return the indices of the duplicated values |
find_duplicates() expects a masked array, so create the input with np.ma.array().
>>> arr = np.ma.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> rfn.find_duplicates(arr)
masked_array(data=[(1,), (1,), (1,), (1,), (3,), (3,)],
             mask=[(False,), (False,), (False,), (False,), (False,),
                   (False,)],
       fill_value=(999999,),
            dtype=[('A', '<i8')])
If an ordinary array created with np.array() is passed in instead, an AttributeError is raised.
>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> rfn.find_duplicates(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in find_duplicates
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\numpy\lib\recfunctions.py", line 1388, in find_duplicates
    sorteddata = sortedbase.filled()
AttributeError: 'numpy.ndarray' object has no attribute 'filled'
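If the data already lives in an ordinary ndarray, one workaround is to wrap it as a masked array before calling find_duplicates(). The sketch below assumes np.ma.asarray() is an acceptable way to do that in your code, and it also shows the key and return_index parameters. The sample values are the same as above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

plain = np.array([(1,), (1,), (2,), (3,), (1,), (3,), (1,)],
                 dtype=[("A", "i8")])

# Wrap the ndarray as a masked array so that .filled() is available.
masked = np.ma.asarray(plain)

dups, idx = rfn.find_duplicates(masked, key="A", return_index=True)

# The relative order of equal values is not guaranteed, so sort before printing.
dup_values = sorted(dups["A"].tolist())
dup_index = sorted(idx.tolist())
print(dup_values)
print(dup_index)
```

The indices refer to positions in the input array, so they can be used to select or drop the duplicated records.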
3. Assigning fields by name
The recfunctions module also provides the assign_fields_by_name() method, which assigns field values from one structured array to another.
- assign_fields_by_name() copies by field name: each field in the source array is assigned to the field with the same name in the destination array.
- The method works recursively, so it also handles structured arrays with nested fields
assign_fields_by_name(dst, src, zero_unassigned=True)
Parameter Description:
Parameter | Description |
---|---|
dst | The destination array, modified in place |
src | The source array |
zero_unassigned | Optional. If True, fields in dst that have no matching field in src are filled with 0 |
>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr2 = np.array([10, 10, 20, 30, 1, 3, 1], dtype=[("A", "i8")])
>>> rfn.assign_fields_by_name(arr, arr2)
>>> arr
array([(10,), (10,), (20,), (30,), ( 1,), ( 3,), ( 1,)],
      dtype=[('A', '<i8')])
Note that the dst and src arrays must have the same shape, otherwise a ValueError is raised.
>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])
>>> rfn.assign_fields_by_name(arr2, arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in assign_fields_by_name
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\numpy\lib\recfunctions.py", line 1200, in assign_fields_by_name
    zero_unassigned)
  File "<__array_function__ internals>", line 6, in assign_fields_by_name
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\numpy\lib\recfunctions.py", line 1191, in assign_fields_by_name
    dst[...] = src
ValueError: could not broadcast input array from shape (7) into shape (3)
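When the shapes do match but the destination has a field that the source lacks, the zero_unassigned parameter decides what happens to that field. Here is a minimal sketch with made-up data, where dst has an extra field "B".

```python
import numpy as np
from numpy.lib import recfunctions as rfn

src = np.array([(10,), (20,), (30,)], dtype=[("A", "i8")])

# dst has an extra field "B" that src lacks; both arrays have shape (3,).
two_fields = [("A", "i8"), ("B", "i8")]
dst_zeroed = np.array([(1, 1), (1, 1), (1, 1)], dtype=two_fields)
dst_kept = np.array([(1, 1), (1, 1), (1, 1)], dtype=two_fields)

# Default zero_unassigned=True: the unmatched field "B" is set to 0.
rfn.assign_fields_by_name(dst_zeroed, src)

# zero_unassigned=False: the unmatched field "B" keeps its old values.
rfn.assign_fields_by_name(dst_kept, src, zero_unassigned=False)

print(dst_zeroed)
print(dst_kept)
```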
4. Stacking fields
The recfunctions module provides the stack_arrays() method, which stacks a sequence of arrays field by field and returns a new array.
stack_arrays(arrays, defaults=None, usemask=True, asrecarray=False,
autoconvert=False)
Parameter Description:
Parameter | Description |
---|---|
arrays | An array or a sequence of arrays |
defaults | Dictionary that maps field names to the corresponding default values |
usemask | Whether to return a masked array |
asrecarray | Whether to return a record array |
autoconvert | Whether to automatically cast a field's type to the largest type involved |
>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])
>>> new_arr = rfn.stack_arrays((arr, arr2))
>>> new_arr
masked_array(data=[(1, --), (1, --), (2, --), (3, --), (1, --), (3, --), (1, --), (1, 2), (3, 3), (1, 2)],
             mask=[(False,  True), (False,  True), (False,  True),
                   (False,  True), (False,  True), (False,  True),
                   (False,  True), (False, False), (False, False),
                   (False, False)],
       fill_value=(999999, 999999),
            dtype=[('A', '<i8'), ('B', '<i8')])
Note: when the arrays are stacked, entries for fields that an input array does not have are masked, and masked entries are displayed as "--" by default.
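To work with the masked entries afterwards, you can inspect the mask of a field or fill it with a value of your own. This is a small sketch with shortened sample data, and the fill value 0 is an arbitrary choice for the example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([(1,), (1,), (2,)], dtype=[("A", "i8")])
arr2 = np.array([(1, 2), (3, 3)], dtype=[("A", "i8"), ("B", "i8")])

# usemask=True by default, so the result is a masked array.
stacked = rfn.stack_arrays((arr, arr2))

# Rows that came from arr have no "B" value, so they are masked.
b_mask = np.ma.getmaskarray(stacked["B"])

# Replace the masked entries with a value of our choosing.
b_filled = stacked["B"].filled(0)

print(b_mask)
print(b_filled)
```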
5. Structured and unstructured transformations
The recfunctions module also supports converting between structured and unstructured arrays.
5.1 Transformation from structured to unstructured
The recfunctions module provides the structured_to_unstructured() method to convert structured arrays into unstructured arrays.
- structured_to_unstructured() converts an n-dimensional structured array into an (n+1)-dimensional unstructured array
- The new array gets a new last dimension whose size equals the number of field elements in the input array
- If no output data type is provided, it is determined by applying NumPy's type promotion rules to all the field data types
structured_to_unstructured(arr, dtype=None, copy=False, casting='unsafe')
Parameter Description:
Parameter | Description |
---|---|
arr | Structured array |
dtype | The dtype of the output unstructured array |
copy | Defaults to False. If True, a copy is always returned; otherwise a view is returned when possible |
casting | One of "no", "equiv", "safe", "same_kind", "unsafe"; controls what kind of data type conversion may occur |
>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr
array([(1,), (1,), (2,), (3,), (1,), (3,), (1,)], dtype=[('A', '<i8')])
>>> rfn.structured_to_unstructured(arr)
array([[1],
       [1],
       [2],
       [3],
       [1],
       [3],
       [1]], dtype=int64)
>>> arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])
>>> rfn.structured_to_unstructured(arr2)
array([[1, 2],
       [3, 3],
       [1, 2]], dtype=int64)
>>>
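A typical reason to convert is to run an ordinary NumPy reduction across the fields of each record. The sketch below averages the fields per record and also requests an explicit output dtype. The variable names are chosen for the example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])

# The shape goes from (3,) to (3, 2): one column per field.
plain = rfn.structured_to_unstructured(arr2)

# Reduce over the new last axis to average the fields of each record.
row_means = plain.mean(axis=-1)

# Request a different output dtype explicitly.
as_float = rfn.structured_to_unstructured(arr2, dtype=np.float64)

print(plain.shape)
print(row_means)
print(as_float.dtype)
```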
5.2 Transforming unstructured into structured
The recfunctions module also provides the unstructured_to_structured() method, which converts unstructured arrays to structured arrays.
- This method converts an n-dimensional unstructured array into an (n-1)-dimensional structured array
- The last dimension of the input array is converted into a structure, with the number of field elements equal to the size of that last dimension
- By default, all output fields have the dtype of the input array
- Alternatively, you can supply an output structured dtype that describes the fields
unstructured_to_structured(arr, dtype=None, names=None, align=False,
copy=False, casting='unsafe')
Parameter Description:
Parameter | Description |
---|---|
arr | Unstructured array |
dtype | The structured dtype of the output array |
names | List of strings; the field names to use when dtype is not supplied |
align | Whether to create an aligned memory layout |
copy | Whether to always return a copy; if False, a view is returned when possible |
casting | Controls what kind of data type conversion may occur |
>>> arr = np.array([(1, 2), (3, 3), (1, 2)])
>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8")])
>>> rfn.unstructured_to_structured(arr)
array([(1, 2), (3, 3), (1, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> rfn.unstructured_to_structured(arr, arr_dtype)
array([(1, 2), (3, 3), (1, 2)], dtype=[('A', '<i8'), ('B', '<i8')])
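If you only want to name the fields and keep the input dtype, the names parameter is enough. The following sketch uses the made-up field names "x" and "y" and then converts back to check that the round trip preserves the data.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([(1, 2), (3, 3), (1, 2)])

# Name the fields without spelling out a full dtype;
# each field keeps the dtype of the input array.
named = rfn.unstructured_to_structured(arr, names=["x", "y"])

# Converting back should reproduce the original 2-D array.
round_trip = rfn.structured_to_unstructured(named)

print(named.dtype.names)
print(named["x"])
print(np.array_equal(round_trip, arr))
```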
Conclusion
In this installment we covered more of the structured array operations in the recfunctions module: retrieving field names, finding duplicates, assigning fields by name, and stacking a sequence of arrays field by field with stack_arrays() to get a new array. We can also use unstructured_to_structured() to convert unstructured arrays into structured arrays, and structured_to_unstructured() to convert structured arrays into unstructured ones.
The recarray helper functions make structured arrays easier to work with, and they are worth using in practice.
That's all for this installment. Likes and comments are welcome. See you next time!