Manipulating data with Python is common practice, but a few handy tricks are less well known. This article shares six fun and efficient ones to help you improve your productivity.
1. Pandas Profiling
Pandas Profiling generates a comprehensive report on a dataset, which helps us understand the data. It makes exploratory data analysis of a pandas DataFrame quick and easy.
The df.describe() and df.info() functions in pandas are often the first step in data exploration, but they give only a very basic overview. Pandas Profiling is a separate package that, with a single line of code, produces far more information and renders it as an interactive HTML report.
For a given dataset, the package calculates histograms, modes, correlation coefficients, quantiles, and descriptive statistics, along with other information such as column types, unique values, and missing values.
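To make the list of statistics more concrete, here is a minimal sketch that computes a few of them for a single column using only the standard library. It is not how Pandas Profiling computes its report; the function name summarize_column and the sample values are made up for illustration, and None stands in for a missing value.

```python
import statistics
from collections import Counter


def summarize_column(values):
    """Return a few profile-style statistics for one column (None = missing)."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        # Most frequent value; ties resolve to the value seen first.
        "mode": Counter(present).most_common(1)[0][0] if present else None,
    }
    # Numeric columns additionally get the mean and the quartiles.
    numeric = all(isinstance(v, (int, float)) for v in present)
    if numeric and len(present) >= 2:
        q1, median, q3 = statistics.quantiles(present, n=4)
        summary.update(mean=statistics.mean(present), q1=q1, median=median, q3=q3)
    return summary


ages = [22, 38, 26, 35, None, 54, 2, 27]  # made-up sample values
age_summary = summarize_column(ages)
print(age_summary)
```

A profiling report repeats this kind of summary for every column and adds charts and correlations on top.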
Installation
Either pip or conda can be used:
pip install pandas-profiling
conda install -c anaconda pandas-profiling
Usage
The Titanic dataset is used to demonstrate the profiling capabilities.
import pandas as pd
import pandas_profiling
df = pd.read_csv('titanic/train.csv')
pandas_profiling.ProfileReport(df)
Apart from the import, a single line of code is enough to display the full data report, including the relevant charts.
You can also export the report to an interactive HTML file using the following code.
profile = pandas_profiling.ProfileReport(df)
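# The file name is passed positionally because the keyword differs between releases (outputfile in 1.x, output_file later).
profile.to_file("Titanic data profiling.html")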
2. Pretty print
pprint is a module in Python's standard library. It prints arbitrary data structures in a clear, readable layout. Here is print compared with pprint.
import pprint
my_dict = {'Student_ID': 34, 'Student_name': 'Tom', 'Student_class': 5, 'Student_marks': {'maths': 92, 'science': 95, 'social_science': 65, 'English': 88}}
print
print(my_dict)
{'Student_ID': 34, 'Student_name': 'Tom', 'Student_class': 5, 'Student_marks': {'maths': 92, 'science': 95, 'social_science': 65, 'English': 88}}
pprint
pprint.pprint(my_dict)
{'Student_ID': 34,
 'Student_class': 5,
 'Student_marks': {'English': 88,
                   'maths': 92,
                   'science': 95,
                   'social_science': 65},
 'Student_name': 'Tom'}
The advantage of pprint is easy to see: the structure of the data is much clearer.
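Beyond the default call, pprint accepts a few keyword arguments that control the layout. The sketch below shows width, depth, and sort_dicts (the last one needs Python 3.8 or later), plus pformat, which returns the string instead of printing it. The record dictionary is a shortened, illustrative version of the one above.

```python
from pprint import pformat, pprint

record = {
    "Student_ID": 34,
    "Student_name": "Tom",
    "Student_marks": {"maths": 92, "science": 95, "English": 88},
}

# width caps the line length; a small value forces one key per line.
pprint(record, width=40)

# depth=1 collapses nested containers to '...'.
pprint(record, depth=1)

# sort_dicts=False keeps insertion order instead of sorting the keys.
pprint(record, sort_dicts=False, width=40)

# pformat returns the formatted text, which is handy for logging.
collapsed = pformat(record, depth=1)
unsorted_text = pformat(record, sort_dicts=False, width=40)
```

By default the keys are sorted, which is why 'English' appears before 'maths' in the output shown earlier.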
3. Python debugger
The interactive debugger is another magic command in Jupyter. If an exception is raised while running a code cell, type %debug in a new cell and run it. This opens an interactive debugging session at the location where the exception occurred, where you can inspect the values of variables and execute statements. To exit the debugger, type q and press Enter. Take this example.
x, y, z = [1, 2, 3], 2, 5  # assumed values; the original listing omits the definitions
result = y + z
print(result)
result2 = x + y
print(result2)
As you can see, x + y raises an error because the two operands are not of the same type. We then run %debug.
%debug
A prompt appears in which we can enter debugger commands interactively, for example to print the values of x and y.
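%debug only exists inside IPython and Jupyter. In a plain script, a similar post-mortem session can be opened with the standard library's pdb.post_mortem. The sketch below is one way to wire that up; the helper names run_cell and run_with_post_mortem and the values of x, y, and z are assumptions made for illustration.

```python
import pdb


def run_cell():
    """Stand-in for the failing cell: adding a list and an int raises TypeError."""
    x, y, z = [1, 2, 3], 2, 5  # illustrative values (an assumption)
    print(y + z)
    return x + y  # raises TypeError


def run_with_post_mortem(func, interactive=False):
    """Call func; on failure, optionally open pdb in the frame that raised."""
    try:
        return func()
    except Exception as exc:
        if interactive:
            # Opens a prompt at the failing line: try "p x", "p y", then "q".
            pdb.post_mortem(exc.__traceback__)
        return exc


# interactive is left off here so the script does not block waiting for input.
error = run_with_post_mortem(run_cell)
print(type(error).__name__, error)
```

Passing interactive=True drops into the debugger at the x + y line, where the mismatched types can be inspected.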
4. Cufflinks
Cufflinks is very handy for visual data exploration: it produces attractive visualizations with very little code, often a single line.
Cufflinks is a wrapper around Plotly with a uniform interface and simple parameter configuration. It plots directly from a pandas DataFrame, so it can be described as pandas-style visualization.
Take the line chart below.
import pandas as pd
import cufflinks as cf
import numpy as np
cf.set_config_file(offline=True)
cf.datagen.lines(1, 500).ta_plot(study='sma', periods=[13, 21, 55])
Another example is the box plot.
cf.datagen.box(20).iplot(kind='box',legend=False)
5. Pyforest
Pyforest is a lazy-import helper. You can list the third-party libraries you want imported in a configuration file ahead of time, so you no longer have to write a long block of import statements at the top of every script. For anyone who works with the same set of libraries all the time, it is a real time saver.
Pyforest supports most popular data science libraries such as pandas, NumPy, Matplotlib, seaborn, scikit-learn, and TensorFlow, as well as common helper modules such as os, sys, re, and pickle.
This is convenient for your own day-to-day work, but less useful when scripts move between environments, for example when you share them with other people, because they may not have pyforest installed.
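The idea of a lazy import can be illustrated with a few lines of standard-library code. This is only a sketch of the general technique, not pyforest's actual implementation; the LazyModule class is hypothetical, and json is used as the target only because it ships with Python.

```python
import importlib


class LazyModule:
    """Placeholder that imports the real module on first attribute access."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None  # nothing is imported until it is needed

    def __getattr__(self, attribute):
        # Called only for names not found on the placeholder itself,
        # which in practice means the wrapped module's attributes.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attribute)


js = LazyModule("json")  # cheap: no import has happened yet
loaded_before = js._module is not None
encoded = js.dumps({"a": 1})  # first real use triggers the import
loaded_after = js._module is not None
print(loaded_before, loaded_after, encoded)
```

The placeholder costs almost nothing to create, so a tool can define many of them up front and only pay for the libraries a script actually touches.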
6. Highlighted notes in Jupyter Notebook
This method only works with Jupyter Notebook and is great when you want to highlight your notes and make them look nice.
Highlighted notes come in several colors for different situations. The only difference between them is the class on the div tag; the text of the note simply goes inside the tag. Here is how to use each one.
Blue is info
<div class="alert alert-block alert-info">
<b>Tip:</b> Use blue boxes (alert-info) for tips and notes. If it's a note, you don't have to include the word "Note".
</div>
Yellow means Warning
<div class="alert alert-block alert-warning">
<b>Example:</b> Yellow Boxes are generally used to include additional examples or mathematical formulas.
</div>
Green is success
<div class="alert alert-block alert-success">
Use green box only when necessary like to display links to related content.
</div>
Red is Danger
<div class="alert alert-block alert-danger">
It is good to avoid red boxes but can be used to alert users to not delete some important part of code etc.
</div>
One small tip: if you paste these snippets straight into Jupyter Notebook you may get an error, because new cells are code cells by default. Select the cell, press Esc to enter command mode, then press M to turn it into a Markdown cell, and run it with Shift + Enter.
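Since the four boxes differ only in the class name, the markup can also be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of Jupyter; it simply builds the same HTML string as the snippets above.

```python
from html import escape

ALERT_KINDS = {"info", "warning", "success", "danger"}


def alert_box(kind, text, title=None):
    """Build the HTML for one highlighted note (an alert-block div)."""
    if kind not in ALERT_KINDS:
        raise ValueError(f"unknown alert kind: {kind!r}")
    # escape() keeps characters such as < and & from breaking the markup.
    heading = f"<b>{escape(title)}:</b> " if title else ""
    return (
        f'<div class="alert alert-block alert-{kind}">\n'
        f"{heading}{escape(text)}\n"
        "</div>"
    )


box = alert_box("warning", "Back up the data first.", title="Warning")
print(box)
```

The printed string can be pasted into a Markdown cell, or rendered from a code cell by passing it to IPython.display.HTML.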
That is everything for this article. Likes, comments, and bookmarks are welcome.