This article lists some tips for improving or speeding up your daily data analysis work, including:
- Pandas Profiling
- Plotting Pandas data with Cufflinks and Plotly
- IPython magic commands
- Formatting in Jupyter
- Jupyter shortcuts
- Displaying multiple outputs from one cell in Jupyter (or IPython)
- Instant slide creation for Jupyter Notebook
1. Pandas Profiling
This tool is very effective. The following figure shows the result of calling the simple method df.profile_report():
[Figure: report generated by df.profile_report()]
To use the tool, you only need to install and import the Pandas Profiling package.
This article won't dwell on this tool any further; if you would like to learn more, please read: towardsdatascience.com/exploring-y…
2. Use Cufflinks and Plotly to draw Pandas data
Most “experienced” data scientists and analysts are familiar with Matplotlib and Pandas: you can quickly plot a simple pd.DataFrame or pd.Series by calling its .plot() method.
[Figure: a static chart produced by .plot()]
A little boring?
That’s all well and good, but what about interactive charts that you can zoom and pan? It’s time for Cufflinks to step up! (Cufflinks is a wrapper built on top of Plotly.)
To install Cufflinks in your environment, just run pip install cufflinks --upgrade in a terminal. Here is the result:
[Figure: the same data plotted interactively with Cufflinks]
Much better!
Note that the only changes in the figure above are importing Cufflinks and calling cf.go_offline(), after which the .plot() call becomes .iplot().
Other methods, such as .scatter_matrix(), also produce great visualizations:
[Figure: output of .scatter_matrix()]
If you need to do a lot of data visualization, read the Cufflinks and Plotly documentation to find out more.
- Cufflinks documentation: plot.ly/ipython-not…
- Plotly documentation: plot.ly/
3. IPython magic commands
IPython’s “magics” are a set of enhancements that IPython adds on top of standard Python syntax. Magic commands come in two forms: line magics, prefixed with %, operate on a single line of input; cell magics, prefixed with %%, operate on multiple lines of input (the whole cell). Here are some useful features provided by IPython magic commands:
%lsmagic: Lists all magic commands
If you only remember one magic command, make it this one: executing %lsmagic prints a list of all available magic commands.
[Figure: output of %lsmagic]
%debug: Interactive debugging
This is probably the magic command I use most often.
Most data scientists have encountered this situation: the block of code being executed keeps breaking, and you desperately write 20 print() statements, trying to print out the contents of each variable. Then, when you finally fix the problem, you have to go back and delete all print() statements again.
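But I don’t have to do that anymore. When you hit an error, simply run the %debug command: it opens an interactive debugger at the point where the exception was raised, where you can inspect variables and run any code you want:
[Figure: a %debug session]
What happened in the figure above?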
We have a function that takes a list as input and squares all even numbers.
We ran the function, but something went wrong, and we didn't know why!
We ran the %debug command right after the failure.
We asked the debugger for the values of x and type(x).
The problem became obvious: we had passed ‘6’ into the function as a string!
This technique is even more useful for complex functions.
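The session above can be reproduced roughly in plain Python. In this sketch the function name square_even and its loop body are hypothetical, since the function's source is not given in the text; the code reads x from the frame where the exception was raised, which is what typing p x and p type(x) at the ipdb> prompt would show. Note that '6' % 2 does not fail with a numeric error: for a string, % means string formatting, so Python raises a TypeError.

```python
def square_even(numbers):
    """Return the squares of the even numbers in `numbers` (hypothetical reconstruction)."""
    result = []
    for x in numbers:
        if x % 2 == 0:  # for the string "6", % means string formatting and raises TypeError
            result.append(x ** 2)
    return result


try:
    square_even([1, 2, 3, 4, 5, "6"])
except TypeError as err:
    # Walk to the innermost traceback frame: the one inside square_even,
    # which is where %debug would drop you.
    tb = err.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    x = tb.tb_frame.f_locals["x"]
    print(err)               # not all arguments converted during string formatting
    print(repr(x), type(x))  # '6' <class 'str'>

# Fix: convert the inputs to numbers before calling the function.
print(square_even([int(v) for v in [1, 2, 3, 4, 5, "6"]]))  # [4, 16, 36]
```

In a notebook none of this scaffolding is needed: run the failing cell, then run %debug in a new cell and type p x, p type(x) and finally q at the ipdb> prompt.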
%store: Passes variables between notebooks
This command is also cool. Suppose you have spent some time cleaning data in one notebook, and now you want to test some functionality in another notebook. Do you implement that functionality in the same notebook, or do you save the data and load it in the other one? With the %store command, neither is necessary! It stores a variable so that you can retrieve it from any other notebook:
%store [variable] stores a variable.
%store -r [variable] reads (restores) a stored variable.
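%store persists variables in IPython's own database under your profile directory, using pickle. Outside IPython, for example in a plain Python script, a rough stand-in is to pickle the object to a file yourself. The helper names store and restore and the file path below are hypothetical choices for this sketch, not part of IPython.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical location for the saved variable; %store itself uses
# IPython's profile database rather than a file you choose.
STORE_PATH = Path(tempfile.gettempdir()) / "cleaned_data.pkl"


def store(value, path=STORE_PATH):
    """Pickle `value` to `path` (roughly what `%store value` does)."""
    with open(path, "wb") as f:
        pickle.dump(value, f)


def restore(path=STORE_PATH):
    """Load a previously pickled value (roughly what `%store -r value` does)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# "Notebook A": clean some data and save it.
cleaned = {"city": ["Paris", "Oslo"], "temp_c": [21.5, 14.0]}
store(cleaned)

# "Notebook B": load it back without redoing the cleaning.
data = restore()
print(data["city"])  # ['Paris', 'Oslo']
```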
%who: Lists all global variables.
Have you ever assigned a value to a variable and then forgotten its name? Or accidentally deleted the cell that assigned it? The %who command gives you a list of all global variables:
[Figure: output of %who]
%%time: Times the execution of a cell
Use this command to get timing information for a cell. Simply put %%time at the top of any cell and you get output like this:
[Figure: CPU and wall time reported by %%time]
%%writefile: Writes cell content to a file
This magic command is useful when you write complex functions or classes in a notebook and want to save them to a separate file. Simply start the cell containing the function or class with %%writefile followed by the file name:
[Figure: a cell beginning with %%writefile utils.py]
As shown above, we can save the function to the utils.py file and import it whenever we like, including from other notebooks, as long as they are in the same directory as utils.py.
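If you want the same workflow without IPython, the standard library can do it: write the source to utils.py, make its directory importable, and import it. In this sketch the function square_even is a hypothetical example, and a temporary directory stands in for the notebook's working directory so that nothing real gets overwritten.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# What you would otherwise type in a cell that starts with `%%writefile utils.py`.
# square_even is a hypothetical example function.
SOURCE = '''\
def square_even(numbers):
    """Return the squares of the even numbers in `numbers`."""
    return [x ** 2 for x in numbers if x % 2 == 0]
'''

# A temporary directory stands in for the notebook's working directory,
# so this sketch never overwrites a real utils.py.
workdir = Path(tempfile.mkdtemp())
(workdir / "utils.py").write_text(SOURCE)

# Make the directory importable (a notebook's own directory usually already is),
# and forget any previously imported module that happens to be called `utils`.
sys.path.insert(0, str(workdir))
sys.modules.pop("utils", None)
importlib.invalidate_caches()

utils = importlib.import_module("utils")
print(utils.square_even([1, 2, 3, 4]))  # [4, 16]
```

In a notebook you would simply run a cell that starts with %%writefile utils.py, then use from utils import square_even in any notebook in the same directory.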
4. Formatting in Jupyter
This one is cool! Jupyter renders HTML/CSS inside Markdown cells. Here are the styles I use most often:
Blue, fashionable:
<div class="alert alert-block alert-info">
This is <b>fancy</b>!
</div>
Red, slightly flustered:
<div class="alert alert-block alert-danger">
This is <b>baaaaad</b>!
</div>
Green and calm:
<div class="alert alert-block alert-success">
This is <b>gooood</b>!
</div>
The following image shows them in action:
[Figure: the blue, red and green alert boxes rendered in Markdown cells]
This is useful when you want to present findings in a notebook!
5. Jupyter Shortcut keys
To explore the keyboard shortcuts, open the command palette with Ctrl + Shift + P to get a list of all the notebook’s commands. Here are a few basic ones:
- Esc: Enters command mode. In command mode, you can use the arrow keys to navigate through the notebook.
In command mode:
- A and B: Insert a new cell Above (A) or Below (B) the current cell.
- M: Changes the current cell to Markdown.
- Y: Changes the current cell to code.
- D, D (press D twice): Deletes the current cell.
- Enter: Returns the current cell to edit mode.
In edit mode:
- Shift + Tab: Shows the docstring of the object you are typing in the current cell. Press it repeatedly to cycle through the documentation display modes.
- Ctrl + Shift + - : Splits the current cell at the cursor position.
- Esc + F: Find and replace code (excluding output).
- Esc + O: Toggles cell output.
Select multiple cells:
- Shift + Down and Shift + Up: Extend the selection to the cell below or above.
- Shift + M: Merges the selected cells.
Note that after multiple cells are selected, you can delete/copy/cut/paste/run them in a batch.
6. Make a cell display multiple outputs at once in Jupyter (or IPython)
Have you ever wanted to show both DataFrame.head() and DataFrame.tail(), but given up because creating an extra cell just to run .tail() felt too cumbersome? By default, Jupyter only displays the value of the last expression in a cell. Don’t worry: run the following lines and every expression in a cell will be displayed:
from IPython.core.interactiveshell import InteractiveShell
# Display the result of every expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = "all"
The following figure shows a cell with multiple outputs:
[Figure: head() and tail() both displayed from one cell]
7. Create slides from a Jupyter Notebook instantly
With RISE, you can turn a Jupyter Notebook into a slide show with a single keystroke. The notebook stays live, so you can code while presenting the slides!
To use the tool, simply install RISE with conda or pip:
conda install -c conda-forge rise
or
pip install RISE
Now you can click the new toolbar button to turn your notebook into a nice slide show:
[Figure: a notebook presented as slides with RISE]