Make sure you have configured a Python environment and installed the third-party libraries pandas and NumPy before you begin.
1. Introduction to the pandas library
What is pandas? pandas is a third-party library that provides high-performance, easy-to-use data types and data analysis tools. It is built on NumPy and is often used together with NumPy and Matplotlib.

Python's native data types fall well short of what data analysis needs. The data types in NumPy can be regarded as the basic data types of data analysis. They focus on the structural representation of data, which shows up in the relationships between data elements (dimensions). The data types in pandas extend those of NumPy and focus on how data is used in applications, which shows up in the relationship between data and its index. To some extent, manipulating an index is manipulating the data.
2. The Series type of the pandas library
pandas has two main data types: the one-dimensional Series type, and the DataFrame type for two-dimensional and higher-dimensional data. Let's start with the Series type.
2.1 What is a Series type?
A Series consists of a set of data values and the index labels associated with them. Let's look at a few lines of code:
import pandas as pd
a = pd.Series([7, 8, 9, 10])
print(a)
The output is as follows:
0 7
1 8
2 9
3 10
dtype: int64
The output shows that a Series has three parts:
- the automatically generated index on the left
- the data on the right
- the data type (a NumPy dtype) at the bottom
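You can also read each of these three parts directly from the object. This is a minimal sketch that reuses the values from the example above. The dtype is checked only as "some integer type", because the exact width (int32 or int64) depends on the platform.

```python
import pandas as pd

a = pd.Series([7, 8, 9, 10])

# The automatically generated index holds the positions 0..3
print(list(a.index))   # [0, 1, 2, 3]
# The data part is a NumPy ndarray
print(a.values)        # [ 7  8  9 10]
# The data type of the values (int64 on most platforms, int32 on some Windows setups)
print(a.dtype)
```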
Besides the automatic index, we can also specify a custom index:
import pandas as pd
a = pd.Series([7, 8, 9, 10], index=["a", "b", "c", "d"])  # specify a custom index
print(a)
The output is as follows:
a 7
b 8
c 9
d 10
dtype: int64
As you can see, the index is now our custom labels a, b, c, d.
2.2 How to Create a Series type?
There are several ways to create a Series type, mainly the following:
- Create from a scalar value
- Create from a dictionary
- Create from an ndarray
- Create from a list
Each is explained below.
(1) Create from a scalar value
Pass a scalar, and every element of the Series takes that value:
import pandas as pd
a = pd.Series(5,index=["a","b","c","d","e","f"])
print(a)
The output is as follows:
a 5
b 5
c 5
d 5
e 5
f 5
dtype: int64
Note that the index parameter decides how many elements are generated and what their labels are. Without it, pd.Series(5) produces a single-element Series.
(2) Create from a dictionary
The dictionary keys become the index and the dictionary values become the data:
import pandas as pd
my_dir={
"a":1,
"b":2,
"c":3
}
b = pd.Series(my_dir)
print(b)
The output is as follows:
a 1
b 2
c 3
dtype: int64
When you build a Series from a dictionary, you can also pass index to choose and order the labels. Values are looked up by key, and any label that is not a key in the dictionary gets NaN.
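Here is a small sketch of how index interacts with the dictionary keys. The dictionary and labels are made up for illustration. Key "b" is left out of index, so it is dropped. Label "d" has no matching key, so it becomes NaN, which also turns the dtype into float.

```python
import pandas as pd

my_dict = {"a": 1, "b": 2, "c": 3}

# index selects and orders the labels; "d" is not a key, so its value is NaN
s = pd.Series(my_dict, index=["c", "a", "d"])
print(s)
```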
(3) Create from an ndarray
ndarray is NumPy's array type, and you can pass one directly:
import pandas as pd
import numpy as np
c = np.arange(4)
d = pd.Series(c)
print(d)
The output is as follows:
0 0
1 1
2 2
3 3
dtype: int32
Similarly, you can customize an index using the index parameter.
(4) Create from a list
You can pass a Python list directly, as in the examples in section 2.1.
2.3 Basic use of the Series type
A Series has two parts, the index and the values, and most operations work on one of these. The basic accessors are:
- a.index: gets the index
- a.values: gets the data
- a["a"]: gets the element whose label is "a"
- a[0]: gets the element at position 0 (recent pandas versions prefer a.iloc[0] for this). Note that the automatic positional index and a custom index coexist, but you cannot mix them in one operation.
Because Series is built on ndarray, operations on a Series work much like operations on an ndarray:
- NumPy operators and functions can be applied to a Series
- A Series can be sliced by position or by custom label
import pandas as pd
a = pd.Series([1, 2, 3, 4, 5, 6], index=["a", "b", "c", "d", "e", "f"])
print("a values:", a.values)
print("a index:", a.index)
print("a[0]:", a.iloc[0])  # positional access; plain a[0] is deprecated in recent pandas for a labelled Series
print("a['a']:", a["a"])
print("a slice:")
print(a[::-1])  # reverse the Series
The output is as follows:
a values: [1 2 3 4 5 6]
a index: Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')
a[0]: 1
a['a']: 1
a slice:
f    6
e    5
d    4
c    3
b    2
a    1
dtype: int64
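Because a[0] mixes up positions and labels, recent pandas versions (2.x) emit a FutureWarning when an integer key is used as a position on a Series with a non-integer index. The explicit accessors avoid the ambiguity: .loc selects by label and .iloc selects by position. This is a short sketch using the same data as above.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6], index=["a", "b", "c", "d", "e", "f"])

first_by_label = a.loc["a"]   # label-based: the element labelled "a"
first_by_pos = a.iloc[0]      # position-based: the element at position 0
last_by_pos = a.iloc[-1]      # negative positions count from the end
print(first_by_label, first_by_pos, last_by_pos)  # 1 1 6
```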
In arithmetic, a Series also aligns automatically on its index, as the following example shows:
import pandas as pd
a = pd.Series([1, 2, 3], index=["c", "d", "e"])
b = pd.Series([4, 5, 6, 7, 8], index=["a", "b", "e", "f", "g"])
c = a + b
print(c)
The output is as follows:
a    NaN
b    NaN
c    NaN
d    NaN
e    9.0
f    NaN
g    NaN
dtype: float64
We added the two Series. The output contains every label from both, but only labels present in both a and b (here just "e") are actually added. Every other label gets NaN. This is index alignment: values are matched by label, not by position. It also confirms that pandas operations are driven by the index.
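If you want the unmatched labels to keep a value instead of becoming NaN, the method form .add() accepts a fill_value that stands in for the missing side. This is a minimal sketch using the same two Series, assuming 0 is an acceptable stand-in for a missing value.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["c", "d", "e"])
b = pd.Series([4, 5, 6, 7, 8], index=["a", "b", "e", "f", "g"])

# Aligns exactly like a + b, but a label missing from one operand counts as 0
c = a.add(b, fill_value=0)
print(c)
```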
A Series also has a name attribute, and its index can be named too. Use .name to read or set either name:
import pandas as pd
a = pd.Series([1, 2, 3], index=["c", "d", "e"])
print(a.name)          # None until a name is set
a.name = "mySeries"
print(a.name)
print(a.index.name)    # the index has no name yet either
a.index.name = "Index column"
print(a.index.name)
print("*" * 20)
print(a)
The output is as follows:
None
mySeries
None
Index column
********************
Index column
c    1
d    2
e    3
Name: mySeries, dtype: int64
3. The DataFrame type of the pandas library
Having covered the Series type, let's look at the DataFrame type for two-dimensional and higher-dimensional data.
3.1 What is a DataFrame type?
A DataFrame is a table-like data type made up of a set of columns that share the same row index. Each column can hold a different value type, and all values in one row share the same row label. Let's start with a small example:
import pandas as pd
import numpy as np
a = np.arange(10).reshape(2, 5)
b = pd.DataFrame(a)
print(b)
Here NumPy generates a two-dimensional ndarray, which is passed to the DataFrame constructor. The output is as follows:
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
The output has three parts:
- the row index on the left (vertical, index, axis=0)
- the column index on top (horizontal, columns, axis=1)
- the data itself
A DataFrame is most often used for two-dimensional data. With a hierarchical index, it can also represent higher-dimensional data.
3.2 How do I create a DataFrame?
A DataFrame can be created from four kinds of input:
- A two-dimensional ndarray
- A dictionary
- A Series
- Another DataFrame
The first two are shown below.
(1) Create from a two-dimensional ndarray
import pandas as pd
import numpy as np
a = np.arange(16).reshape(4, 4)
print("ndarray:")
print(a)
b = pd.DataFrame(a)
print("Converted DataFrame:")
print(b)
NumPy generates a 4x4 ndarray, which we pass to DataFrame to convert it. The output is as follows:
ndarray:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
Converted DataFrame:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
(2) Create from a dictionary
First create a dictionary whose values are Series (a small score sheet), then pass it to DataFrame:
import pandas as pd
subjects = ["Chinese", "Math", "English", "Physics", "Chemistry", "Biology"]
a = {
    "Xiao Ming": pd.Series([100, 99, 98, 100, 95, 99], index=subjects),
    "Xiao Hong": pd.Series([100, 99, 98, 100, 95, 99], index=subjects),
    "Xiao Lan": pd.Series([100, 99, 98, 100, 95, 99], index=subjects),
    "Xiao Huang": pd.Series([100, 99, 98, 100, 95, 99], index=subjects),
    "Xiao Lv": pd.Series([100, 99, 98, 100, 95, 99], index=subjects),
}
b = pd.DataFrame(a)
print(b)
The output is as follows:
           Xiao Ming  Xiao Hong  Xiao Lan  Xiao Huang  Xiao Lv
Chinese          100        100       100         100      100
Math              99         99        99          99       99
English           98         98        98          98       98
Physics          100        100       100         100      100
Chemistry         95         95        95          95       95
Biology           99         99        99          99       99
Each dictionary key becomes a column label, and the index of each Series value becomes the row labels.
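When the Series in the dictionary do not share the same index, the DataFrame takes the union of their indexes and fills the gaps with NaN. This follows the same alignment rule as Series arithmetic. This is a small sketch with made-up labels r1 to r3 and columns x and y.

```python
import pandas as pd

# Two columns whose Series have different, overlapping indexes
data = {
    "x": pd.Series([1, 2], index=["r1", "r2"]),
    "y": pd.Series([3, 4], index=["r2", "r3"]),
}
df = pd.DataFrame(data)
print(df)  # rows r1, r2, r3; x is NaN at r3 and y is NaN at r1
```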
3.3 Relationship between DataFrame and Series
When you extract a single row or column from a DataFrame, the result is a Series. In other words, a DataFrame can be seen as a container of Series.
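The short sketch below checks the type of a single extracted column and a single extracted row, using a cut-down version of the score sheet with made-up values. The column keeps its column label as the Series name, and the row keeps its row label.

```python
import pandas as pd

scores = pd.DataFrame(
    {"Xiao Ming": [100, 99], "Xiao Hong": [98, 95]},
    index=["Chinese", "Math"],
)

col = scores["Xiao Ming"]   # one column
row = scores.loc["Math"]    # one row
print(type(col).__name__, type(row).__name__)  # Series Series
print(col.name, row.name)                      # Xiao Ming Math
```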
4. Data operations in pandas
The previous sections introduced the two pandas data types, Series and DataFrame. This section focuses on operations on them.
4.1 Basic DataFrame attributes
- df.shape: the number of rows and columns
- df.dtypes: the data type of each column
- df.ndim: the number of dimensions
- df.index: the row index
- df.columns: the column index
- df.values: the data, as an ndarray
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=list("ABCD"))
print(df)
print("Rows and columns:", df.shape)
print("Column data types:\n", df.dtypes)
print("Data dimensions:", df.ndim)
print("Row index:", df.index, "data type:", type(df.index))
print("Column index:", df.columns, "data type:", type(df.columns))
print("Object values:\n", df.values, "data type:", type(df.values))
The output is as follows:
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
Rows and columns: (4, 4)
Column data types:
 A    int32
B    int32
C    int32
D    int32
dtype: object
Data dimensions: 2
Row index: Index(['a', 'b', 'c', 'd'], dtype='object') data type: <class 'pandas.core.indexes.base.Index'>
Column index: Index(['A', 'B', 'C', 'D'], dtype='object') data type: <class 'pandas.core.indexes.base.Index'>
Object values:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]] data type: <class 'numpy.ndarray'>
4.2 Querying DataFrame information
- df.head(n): shows the first n rows (default 5)
- df.tail(n): shows the last n rows (default 5)
- df.info(): basic information, including the index range, column labels, the non-null count and data type of each column, and memory usage
- df.describe(): summary statistics for each numeric column, including count, mean, standard deviation, minimum, quartiles, and maximum
- df.sort_values(by="column label", ascending=True): sorts by a column, ascending by default
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=list("ABCD"))
print(df)
print("First three rows:\n", df.head(3))
print("-" * 50)
print("Last three rows:\n", df.tail(3))
print("-" * 50)
print("Basic information:")
df.info()  # info() prints its report itself and returns None
print("-" * 50)
print("Statistics:\n", df.describe())
print("Descending order by B:")
print(df.sort_values(by="B", ascending=False))
The output is as follows:
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
First three rows:
    A  B   C   D
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
--------------------------------------------------
Last three rows:
     A   B   C   D
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
--------------------------------------------------
Basic information:
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, a to d
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       4 non-null      int32
 1   B       4 non-null      int32
 2   C       4 non-null      int32
 3   D       4 non-null      int32
dtypes: int32(4)
memory usage: 96.0+ bytes
--------------------------------------------------
Statistics:
                A          B          C          D
count   4.000000   4.000000   4.000000   4.000000
mean    6.000000   7.000000   8.000000   9.000000
std     5.163978   5.163978   5.163978   5.163978
min     0.000000   1.000000   2.000000   3.000000
25%     3.000000   4.000000   5.000000   6.000000
50%     6.000000   7.000000   8.000000   9.000000
75%     9.000000  10.000000  11.000000  12.000000
max    12.000000  13.000000  14.000000  15.000000
Descending order by B:
    A   B   C   D
d  12  13  14  15
c   8   9  10  11
b   4   5   6   7
a   0   1   2   3
4.3 Selecting values
Selecting values is a basic operation in pandas. The following examples use this 4x4 DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=list("ABCD"))
print(df)
The output is as follows:
A B C D
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
d 12 13 14 15
pandas offers many ways to select values. The two most common and useful are:
- df.loc[]: selects data by label
- df.iloc[]: selects data by integer position
(1) df.loc[]
print("Row a, column A:", df.loc["a", "A"], "data type:", type(df.loc["a", "A"]))
print("Columns A and B:\n", df.loc[:, ["A", "B"]], "data type:", type(df.loc[:, ["A", "B"]]))
The output is as follows:
Row a, column A: 0 data type: <class 'numpy.int32'>
Columns A and B:
     A   B
a   0   1
b   4   5
c   8   9
d  12  13 data type: <class 'pandas.core.frame.DataFrame'>
df.loc[] selects by label. Those are the custom labels, or the default integer labels when no custom index is set. Note that a label slice with : includes both ends, so df.loc["a":"c"] includes row c.
(2) df.iloc[]
print("First two rows and first two columns:\n", df.iloc[:2, :2])
Output:
First two rows and first two columns:
    A  B
a  0  1
b  4  5
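The two accessors treat slice ends differently. This is a quick check on the same 4x4 DataFrame. A label slice "a":"c" includes c, while a position slice 0:3 excludes position 3, so both select the same three rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=list("ABCD"))

# Label slice: both ends included (rows a..c, columns A..B)
by_label = df.loc["a":"c", "A":"B"]
# Position slice: end excluded (rows 0..2, columns 0..1)
by_position = df.iloc[0:3, 0:2]

print(by_label.equals(by_position))  # True
```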
(3) Boolean index
Besides selecting by label or position, you can filter rows with a Boolean condition:
print("Rows where B is greater than 7 and less than 13:\n", df.loc[(df["B"] > 7) & (df["B"] < 13)])
Output:
Rows where B is greater than 7 and less than 13:
    A  B   C   D
c  8  9  10  11
When you combine several conditions, wrap each one in parentheses. The reason is that & and | bind more tightly than comparison operators such as > and <.
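Conditions combine with & (and), | (or), and ~ (not). Indexing with df[mask] filters rows the same way as df.loc[mask]. This is a small sketch on the same 4x4 DataFrame, with thresholds chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=list("ABCD"))

both = df[(df["B"] > 7) & (df["B"] < 13)]      # B == 9 -> row c
either = df[(df["A"] == 0) | (df["D"] == 15)]  # rows a and d
negated = df[~(df["C"] > 5)]                   # C == 2 -> row a
print(list(both.index), list(either.index), list(negated.index))  # ['c'] ['a', 'd'] ['a']
```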
4.4 Changing the data structure
Changing the data structure means reordering, adding, or deleting rows and columns. The main operations are as follows:
(1) .reindex(index=..., columns=...): reorders the data by changing its index. Let's use the score sheet as an example:
import pandas as pd
subjects = ["Chinese", "Math", "English", "Physics", "Chemistry", "Biology"]
a = {
    "Xiao Ming": pd.Series([100, 99, 91, 90, 85, 69], index=subjects),
    "Xiao Hong": pd.Series([100, 93, 92, 100, 65, 93], index=subjects),
    "Xiao Lan": pd.Series([100, 94, 93, 70, 55, 92], index=subjects),
    "Xiao Huang": pd.Series([100, 95, 88, 80, 85, 89], index=subjects),
    "Xiao Lv": pd.Series([100, 92, 78, 89, 75, 79], index=subjects),
}
b = pd.DataFrame(a)
print(b)
# Reorder the columns: swap Xiao Ming and Xiao Hong
c = b.reindex(columns=["Xiao Hong", "Xiao Ming", "Xiao Lan", "Xiao Huang", "Xiao Lv"])
print("Swap Xiao Ming and Xiao Hong:\n", c)
# Reorder the rows: swap Math and Chinese
d = b.reindex(index=["Math", "Chinese", "English", "Physics", "Chemistry", "Biology"])
print("Swap Math and Chinese:\n", d)
The output is as follows:
           Xiao Ming  Xiao Hong  Xiao Lan  Xiao Huang  Xiao Lv
Chinese          100        100       100         100      100
Math              99         93        94          95       92
English           91         92        93          88       78
Physics           90        100        70          80       89
Chemistry         85         65        55          85       75
Biology           69         93        92          89       79
Swap Xiao Ming and Xiao Hong:
            Xiao Hong  Xiao Ming  Xiao Lan  Xiao Huang  Xiao Lv
Chinese          100        100       100         100      100
Math              93         99        94          95       92
English           92         91        93          88       78
Physics          100         90        70          80       89
Chemistry         65         85        55          85       75
Biology           93         69        92          89       79
Swap Math and Chinese:
            Xiao Ming  Xiao Hong  Xiao Lan  Xiao Huang  Xiao Lv
Math              99         93        94          95       92
Chinese          100        100       100         100      100
English           91         92        93          88       78
Physics           90        100        70          80       89
Chemistry         85         65        55          85       75
Biology           69         93        92          89       79
The columns parameter reorders the columns, and the index parameter reorders the rows. Other parameters of .reindex() include:
- index, columns: the new row or column labels
- fill_value: the value used to fill positions that would otherwise be missing (NaN) after reindexing
- limit: the maximum number of consecutive positions to fill
- method: "ffill" fills forward by carrying the previous value, and "bfill" fills backward by taking the next value
- copy: defaults to True, producing a new object
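The sketch below shows fill_value and method on a Series with gaps in its integer index. The values are made up. Note that method requires the original index to be sorted.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[0, 2, 4])

# Labels 1 and 3 are new: fill them with a constant...
print(s.reindex(range(5), fill_value=0).tolist())    # [10, 0, 20, 0, 30]
# ...or carry the previous value forward...
print(s.reindex(range(5), method="ffill").tolist())  # [10, 10, 20, 20, 30]
# ...or take the next value backward
print(s.reindex(range(5), method="bfill").tolist())  # [10, 20, 20, 30, 30]
```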
(2) pandas also provides methods for handling missing values:
- pd.isnull(df): marks missing values as True
- pd.notnull(df): marks non-missing values as True
- df.dropna(axis=0/1, how="any"/"all", inplace=True/False): drops the rows (axis=0) or columns (axis=1) that contain NaN
- df.fillna(n): fills missing values with n
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("ABCD"))
df.loc[3, "C"] = None   # column C becomes float to hold NaN
df.loc[2, "C"] = None
print(df)
print(df.isnull())
print(df.notnull())
print(df.dropna(axis=0, inplace=False))
print(df.fillna(df.mean()))  # fill each column's NaN with that column's mean
The output is:
    A   B    C   D
0   0   1  2.0   3
1   4   5  6.0   7
2   8   9  NaN  11
3  12  13  NaN  15
       A      B      C      D
0  False  False  False  False
1  False  False  False  False
2  False  False   True  False
3  False  False   True  False
      A     B      C     D
0  True  True   True  True
1  True  True   True  True
2  True  True  False  True
3  True  True  False  True
   A  B    C  D
0  0  1  2.0  3
1  4  5  6.0  7
    A   B    C   D
0   0   1  2.0   3
1   4   5  6.0   7
2   8   9  4.0  11
3  12  13  4.0  15
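The how parameter of dropna decides how much missing data is needed before a row or column is dropped. With how="any", one NaN is enough. With how="all", every value must be NaN. This is a small sketch with made-up data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 5.0, np.nan],
})

print(df.dropna(how="any").index.tolist())             # [0]: any NaN drops the row
print(df.dropna(how="all").index.tolist())             # [0, 1]: only the all-NaN row is dropped
print(df.dropna(axis=1, how="any").columns.tolist())   # []: both columns contain a NaN
print(df.dropna(axis=1, how="all").columns.tolist())   # ['A', 'B']: neither column is all NaN
```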
4.5 Index Operations
Index objects support the following common methods:
- .append(idx): joins another Index object, producing a new Index
- .difference(idx): computes the set difference, producing a new Index
- .intersection(idx): computes the intersection
- .union(idx): computes the union
- .delete(loc): deletes the element at position loc
- .insert(loc, e): inserts element e at position loc
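Index objects are immutable, so each of these methods returns a new Index rather than changing the original. This is a short sketch with two made-up string indexes.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])
other = pd.Index(["b", "c", "d"])

print(idx.append(other).tolist())        # ['a', 'b', 'c', 'b', 'c', 'd']
print(idx.difference(other).tolist())    # ['a']
print(idx.intersection(other).tolist())  # ['b', 'c']
print(idx.union(other).tolist())         # ['a', 'b', 'c', 'd']
print(idx.delete(0).tolist())            # ['b', 'c']
print(idx.insert(1, "x").tolist())       # ['a', 'x', 'b', 'c']
```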
4.6 Dropping specified index labels
.drop() removes the specified row or column labels from a Series or DataFrame.
import pandas as pd
a = pd.Series([4, 5, 6, 7], index=["a", "b", "c", "d"])
print(a)
b = a.drop(["b", "d"])
print(b)
The output is as follows:
a 4
b 5
c 6
d 7
dtype: int64
a 4
c 6
dtype: int64
For a DataFrame, .drop() works on axis 0 (rows) by default. To drop columns (axis 1), pass axis=1.
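This sketch shows both axes on the familiar 4x4 DataFrame, along with the columns= keyword, which is equivalent to axis=1. .drop() returns a new object and leaves df unchanged unless inplace=True is passed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=list("ABCD"))

rows_dropped = df.drop(["a", "c"])          # axis=0 (rows) by default
cols_dropped = df.drop(["B", "D"], axis=1)  # axis=1 -> columns
same_thing = df.drop(columns=["B", "D"])    # keyword form, equivalent to axis=1

print(rows_dropped.index.tolist())    # ['b', 'd']
print(cols_dropped.columns.tolist())  # ['A', 'C']
```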
5. Operations on pandas data types
5.1 Arithmetic Operation
pandas performs arithmetic by aligning the row and column indexes. Values are combined only where the labels match, and positions that exist in only one operand become NaN, which turns integer results into floating point. Operations between objects of different dimensions use broadcasting. Let's first look at operations between objects of the same dimension:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(10).reshape(2, 5))
b = pd.DataFrame(np.arange(15).reshape(3, 5))
print(a)
print(b)
print(a * b)
The output is as follows:
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
      0     1     2     3     4
0   0.0   1.0   4.0   9.0  16.0
1  25.0  36.0  49.0  64.0  81.0
2   NaN   NaN   NaN   NaN   NaN
Operations between objects of different dimensions look like this:
import pandas as pd
import numpy as np
a = pd.Series([1, 2, 3, 4, 5])
b = pd.DataFrame(np.arange(10).reshape(2, 5))
print(a)
print(b)
print("________")
c = b - a
print(c)
Here we create a one-dimensional Series a and a two-dimensional DataFrame b, then compute b - a. The result is:
0 1
1 2
2 3
3 4
4 5
dtype: int64
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
________
0 1 2 3 4
0 -1 -1 -1 -1 -1
1 4 4 4 4 4
The index of a is matched against the columns of b (axis 1), so a is subtracted from every row of b. By default, then, an operation between a DataFrame and a Series broadcasts the Series along axis 1.
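To broadcast along the rows instead, use the method form with axis=0. The Series is then matched against the row index. This is a minimal sketch reusing b from above, with a made-up column-wise Series.

```python
import numpy as np
import pandas as pd

b = pd.DataFrame(np.arange(10).reshape(2, 5))
row_wise = pd.Series([1, 2, 3, 4, 5])  # matched against the columns (axis 1, the default)
col_wise = pd.Series([10, 20])         # matched against the row index (axis 0)

print((b - row_wise).values.tolist())           # [[-1, -1, -1, -1, -1], [4, 4, 4, 4, 4]]
print(b.sub(col_wise, axis=0).values.tolist())  # [[-10, -9, -8, -7, -6], [-15, -14, -13, -12, -11]]
```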
5.2 Comparison Operation
Comparison operators compare only elements with the same index. Unlike arithmetic, they do not fill in missing labels. Comparisons between a two-dimensional and a one-dimensional object, or between a one-dimensional object and a scalar, are broadcast. The result is an object of Boolean values.
Same dimension:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4, 4))
b = pd.DataFrame(np.arange(4, 20).reshape(4, 4))
print(a, "\n")
print(b, "\n")
print(a == b)
The output is as follows:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

    0   1   2   3
0   4   5   6   7
1   8   9  10  11
2  12  13  14  15
3  16  17  18  19

       0      1      2      3
0  False  False  False  False
1  False  False  False  False
2  False  False  False  False
3  False  False  False  False
When you compare two DataFrames, they must have the same shape and labels. Otherwise pandas raises a ValueError. What about comparing objects of different dimensions? Let's take a look:
import pandas as pd
import numpy as np
a = pd.Series([1, 2, 3, 4])
b = pd.DataFrame(np.arange(1, 17).reshape(4, 4))
print(a, "\n")
print(b, "\n")
print(a == b)
The output is:
0 1
1 2
2 3
3 4
dtype: int64
0 1 2 3
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
3 13 14 15 16
0 1 2 3
0 True True True True
1 False False False False
2 False False False False
3 False False False False
Comparing objects of different dimensions triggers broadcasting, by default along axis 1. Besides using operators directly, you can also use methods to perform arithmetic:
- .add(d, **kwargs): addition
- .sub(d, **kwargs): subtraction
- .mul(d, **kwargs): multiplication
- .div(d, **kwargs): division
Besides the basic operation, these methods accept optional arguments that make them more flexible, for example fill_value:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(16).reshape(4, 4))
b = pd.DataFrame(np.arange(20).reshape(5, 4))
print(a)
print(b)
c = b.add(a, fill_value=100)
print(c)
The output is as follows:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
       0      1      2      3
0    0.0    2.0    4.0    6.0
1    8.0   10.0   12.0   14.0
2   16.0   18.0   20.0   22.0
3   24.0   26.0   28.0   30.0
4  116.0  117.0  118.0  119.0
Here a and b are both two-dimensional but have different shapes. Adding them with + would not raise an error. pandas would align them and put NaN in row 4, which exists only in b. Passing fill_value=100 to .add() first fills the positions missing from a with 100 and then adds, so row 4 becomes b's values plus 100.
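This sketch puts the three behaviours side by side for the same a and b: + produces NaN, .add() with fill_value fills the gaps, and == raises because the shapes differ.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(16).reshape(4, 4))
b = pd.DataFrame(np.arange(20).reshape(5, 4))

plain = b + a                      # row 4 exists only in b -> NaN
filled = b.add(a, fill_value=100)  # missing entries of a are treated as 100

comparison_error = None
try:
    b == a                         # comparison requires identically-labelled frames
except ValueError as err:
    comparison_error = str(err)

print(plain.loc[4].isna().all())   # True
print(filled.loc[4].tolist())      # [116.0, 117.0, 118.0, 119.0]
print(comparison_error is not None)  # True
```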
6. Working with CSV files in pandas
6.1 Reading CSV files with pandas
First create a 15-row, 4-column file, a.csv, for the examples that follow:
import pandas as pd
import numpy as np
a = np.arange(60).reshape(15, 4)
a = pd.DataFrame(a, columns=["a", "b", "c", "d"])
print(a, "\n")
a.to_csv("./a.csv", index=False)
The file content is as follows (middle rows omitted):
a,b,c,d
0,1,2,3
4,5,6,7
...
56,57,58,59
The common parameters for pd.read_csv() are as follows:
- filepath_or_buffer: str, the file path, either a local path or a URL
- sep: the delimiter, default ","
- header: int, the row number to use as column names. The default behaves like header=0, which uses the first line of the file as column names. header=None means the file has no header line: columns are named 0, 1, 2, ..., and the original header line becomes data row 0.
Case 1:
import pandas as pd
path = "./a.csv"  # the file created above
def demo01():
    """header parameter"""
    print("=== header=0 (default) ===")
    df = pd.read_csv(path, sep=",", header=0)
    print(df.head(), "\n")
    print("=== header=None ===")
    df = pd.read_csv(path, sep=",", header=None)
    print(df.head(), "\n")

demo01()
The output is as follows:
=== header=0 (default) ===
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

=== header=None ===
    0   1   2   3
0   a   b   c   d
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15
- names: list, the column names to use. When names is given, the file's own header line is read as data. Pass header=0 as well to replace it.
Case 2:
def demo02():
    """names parameter"""
    df = pd.read_csv(path, names=["A", "B", "C", "D"])
    print("Respecified column names:")
    print(df.head(), "\n")

demo02()
The output is as follows:
Respecified column names:
    A   B   C   D
0   a   b   c   d
1   0   1   2   3
2   4   5   6   7
3   8   9  10  11
4  12  13  14  15
- encoding: the file encoding, default UTF-8. It is often used to fix garbled text in files saved with another platform's encoding.
- index_col: str/list, one column (or several columns) of the table to use as the index
Case 3:
def demo03():
    """index_col parameter"""
    print("Use column b as the index:")
    df = pd.read_csv(path, index_col="b")
    print(df.head(), "\n")
    df.to_csv("text.csv")  # save this version (indexed by b) to text.csv
    print("Use columns a and b as the index:")
    df = pd.read_csv(path, index_col=["a", "b"])
    print(df.head())

demo03()
The output is as follows:
Use column b as the index:
     a   c   d
b
1    0   2   3
5    4   6   7
9    8  10  11
13  12  14  15
17  16  18  19

Use columns a and b as the index:
        c   d
a  b
0  1    2   3
4  5    6   7
8  9   10  11
12 13  14  15
16 17  18  19
- usecols: str/list, the columns to read. All columns are read by default.
- nrows: int, read only the first n rows, not counting the header line
- skiprows: int, skip this many lines at the start of the file, including the header line, before reading. Specify names when necessary. Otherwise, with header=0, the first line that is read becomes the column names.
Case 4:
def demo04():
    """usecols, nrows, skiprows parameters"""
    print("Read only columns b and c:")
    df = pd.read_csv(path, usecols=["b", "c"])
    print(df.head(), "\n")
    print("Read only the first 6 rows:")
    df = pd.read_csv(path, nrows=6)
    print(df, "\n")
    print("Skip the first 6 lines:")
    df = pd.read_csv(path, skiprows=6, names=list("abcd"))
    print(df.head(), "\n")

demo04()
The output is as follows:
Read only columns b and c:
    b   c
0   1   2
1   5   6
2   9  10
3  13  14
4  17  18

Read only the first 6 rows:
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
5  20  21  22  23

Skip the first 6 lines:
    a   b   c   d
0  20  21  22  23
1  24  25  26  27
2  28  29  30  31
3  32  33  34  35
4  36  37  38  39
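You can try these parameters without writing a file. read_csv accepts any file-like object as filepath_or_buffer, so an io.StringIO can stand in for a.csv. The CSV text below is a made-up four-row excerpt in the same layout.

```python
import io
import pandas as pd

# In-memory stand-in for a small CSV file (hypothetical data)
csv_text = "a,b,c,d\n0,1,2,3\n4,5,6,7\n8,9,10,11\n12,13,14,15\n"

cols = pd.read_csv(io.StringIO(csv_text), usecols=["b", "c"])
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=2, names=list("abcd"))
indexed = pd.read_csv(io.StringIO(csv_text), index_col="a")

print(cols.columns.tolist())     # ['b', 'c']
print(len(first_two))            # 2
print(skipped.iloc[0].tolist())  # [4, 5, 6, 7]: header and first data line skipped
print(indexed.index.tolist())    # [0, 4, 8, 12]
```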
6.2 Saving CSV files with pandas
To save a CSV file in pandas, call the to_csv method of a DataFrame. Its common parameters are:
- path_or_buf: the path to save to
- sep: the delimiter, default ","
- na_rep: the string written for missing values (NaN), an empty string by default
- float_format: the format for floating-point numbers, for example "%.2f" keeps two decimal places
- header: whether to write the column names. They are written by default; pass header=False to omit them.
- index: whether to write the index (Boolean). It is written by default.