- Data Visualization with Bokeh in Python, Part III: Making a Complete Dashboard
- Will Koehrsen
- The Nuggets translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: YueYong
Create an interactive visual application in Bokeh
Sometimes I learn a data science technique to solve a specific problem. Other times, as with Bokeh, I try out a new tool because I see some cool project on Twitter and think, “That looks pretty neat. I’m not sure when, but it will probably come in handy.” Nearly every time I say this, I end up finding a use for the tool. Data science requires knowledge of many different skills, and you never know where the next idea you use will come from!
In the case of Bokeh, I found a perfect use case in my work as a data science researcher a few weeks after first trying it out. My research project involves using data science to improve the energy efficiency of commercial buildings. For a recent conference, we needed a way to show off the results of the many techniques we apply. The usual suggestion of a PowerPoint gets the job done but doesn’t really stand out: most people at a conference have lost patience by the time they reach their third slide deck. Although I didn’t know Bokeh well, I volunteered to try to make an interactive application with the library, thinking it would expand my skills and create an engaging way to present our project. To be safe, our team prepared a backup presentation, but after I showed them part of the first draft, they gave it their full support. The resulting interactive dashboard stood out at the conference and will be used by our team in the future:
Example of Bokeh dashboard built for my research
While not every idea you see on Twitter will help your career, I think it’s safe to say that knowing more data science techniques can’t hurt. Along those lines, I started this series to share the capabilities of Bokeh, a powerful plotting library in Python that lets you make interactive plots and dashboards. Although I can’t share the dashboard from my research, I can show the basics of building visualizations in Bokeh using a publicly available dataset. This third post continues my Bokeh series: Part I focused on building a simple graph, and Part II showed how to add interactions to a Bokeh plot. In this post, we’ll see how to set up a full Bokeh application and run a local Bokeh server accessible in your browser!
This article focuses on the structure of a Bokeh application rather than the plot details, but you can find the full code for everything on GitHub. We will use the NYCFlights13 dataset, a real-world dataset of flights departing from three New York airports in 2013. It contains over 300,000 flights, and for our dashboard we will focus on arrival delay statistics.
To run the full application yourself, make sure you have Bokeh installed (using pip install bokeh), download the [bokeh_app.zip](https://github.com/WillKoehrsen/Bokeh-Python-Visualization/blob/master/bokeh_app.zip) folder from GitHub, unzip it, open a command window in that directory, and type bokeh serve --show bokeh_app. This sets up a local Bokeh server and opens the application in your browser (you can also host Bokeh applications online, but for now local hosting is good enough for us).
The final product
Before we dive into the details, let’s take a look at the final product so we can see how the pieces fit together. Here is a short clip showing how we can interact with the complete dashboard:
- YouTube video link: youtu.be/VWi3HAlKOUQ
The final version of the Bokeh Flight app
Here I am using the Bokeh application in a browser (in Chrome’s full-screen mode), served by a local server. At the top we see a number of tabs, each of which contains a different section of the application. The idea of a dashboard is that while each tab can stand on its own, we can join many of them together to enable a complete exploration of the data. The video shows the range of charts we can make with Bokeh, from histograms and density plots, to data tables that can be sorted by column, to fully interactive maps. Besides the range of figures we can create, another benefit of using Bokeh is interaction. Each tab has an interactive element that lets users engage with the data and make their own discoveries. From experience, when exploring a dataset, people like to reach insights on their own, and we can support that by letting them select and filter the data through various controls.
Now that we have an idea of the dashboard we are aiming for, let’s look at how to create a Bokeh application. I strongly encourage you to download the code so you can follow along!
Bokeh application structure
Before writing any code, it’s important to establish a framework for our application. In any project, it’s easy to get carried away with coding and soon become lost in a mess of half-finished scripts and misplaced data files, so we create a framework beforehand for all our code and data to slot into. This organization will help us keep track of all the elements in our application and assist in debugging when things inevitably go wrong. We can also reuse this framework for future projects, so our initial investment in the planning stage will pay off later.
To set up a Bokeh app, I create one parent directory called bokeh_app to hold everything. Within this directory, we have one subdirectory for data (called data), one subdirectory for scripts (called scripts), and a main.py script that pulls everything together. In general, to manage all the code, I have found it best to keep the code for each tab in a separate Python script and call them all from a single main script. Here is the file structure I use for a Bokeh application, adapted from the official documentation.
bokeh_app
|
+--- data
|    +--- info.csv
|    +--- info2.csv
|
+--- scripts
|    +--- plot.py
|    +--- plot2.py
|
+--- main.py
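As a rough illustration, the empty skeleton above could be scaffolded with a few lines of Python. This is only a sketch: the helper name make_skeleton is hypothetical, the data files are left out, and the blank __init__.py mirrors the one the flight app keeps in its scripts directory.

```python
import tempfile
from pathlib import Path


def make_skeleton(root):
    """Create the empty data/, scripts/ and main.py layout under `root`."""
    root = Path(root)
    for sub in ("data", "scripts"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Blank marker file so `scripts` can be imported as a package
    (root / "scripts" / "__init__.py").touch()
    (root / "main.py").touch()
    # Report what now exists, as sorted POSIX-style relative paths
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# Try it in a throwaway directory so nothing is written to a real project
with tempfile.TemporaryDirectory() as tmp:
    created = make_skeleton(Path(tmp) / "bokeh_app")
    print(created)
```

Running the scaffold in a temporary directory keeps the example free of side effects; pointing it at a real path would create the folders there instead.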
For the Flight application, the structure looks like this:
The folder structure of flight dashboard
There are three main parts under the bokeh_app directory: data, scripts, and main.py. When it comes time to run the server, we tell Bokeh to serve the bokeh_app directory, and it automatically searches for and runs the main.py script. With the overall structure in place, let’s take a look at the main.py file, which I like to call the launcher of the Bokeh application (not a technical term)!
main.py
The main.py script is the entry point of the Bokeh application. It loads the data, passes it to the other scripts, gets back the resulting plots, and organizes them into a single display. This is the only script I’ll show in its entirety, because of how critical it is to the application:
# Pandas for data management
import pandas as pd

# os methods for manipulating paths
from os.path import dirname, join

# Bokeh basics
from bokeh.io import curdoc
from bokeh.models.widgets import Tabs

# Each tab is drawn by one script
from scripts.histogram import histogram_tab
from scripts.density import density_tab
from scripts.table import table_tab
from scripts.draw_map import map_tab
from scripts.routes import route_tab

# Using included state data from Bokeh for map
from bokeh.sampledata.us_states import data as states

# Read data into dataframes
flights = pd.read_csv(join(dirname(__file__), 'data', 'flights.csv'),
                      index_col=0).dropna()

# Formatted flight delay data for map (two header rows, first column as index)
map_data = pd.read_csv(join(dirname(__file__), 'data', 'flights_map.csv'),
                       header=[0, 1], index_col=0)

# Create each of the tabs
tab1 = histogram_tab(flights)
tab2 = density_tab(flights)
tab3 = table_tab(flights)
tab4 = map_tab(map_data, states)
tab5 = route_tab(flights)

# Put all the tabs into one application
tabs = Tabs(tabs = [tab1, tab2, tab3, tab4, tab5])

# Put the tabs in the current document for display
curdoc().add_root(tabs)
We begin with the necessary imports, including the functions that make the tabs, each of which is stored in a separate script within the scripts directory. If you look at the file structure, notice that there is an __init__.py file in the scripts directory. This is a completely blank file that needs to be placed in the directory so that we can import the appropriate functions with statements such as from scripts.histogram import histogram_tab. I’m not quite sure why this is needed, but it works (I have worked through this problem before with the help of a Stack Overflow answer).
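For what it is worth, the blank __init__.py marks the directory as a regular Python package, which is what lets the dotted import resolve. The sketch below reproduces the idea in a throwaway directory; the package name demo_scripts and the stand-in histogram_tab body are hypothetical, not the article's code. (On Python 3.3 and later, namespace packages can make such imports work without the file in some setups, but keeping it is the conventional, explicit choice.)

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_package_import():
    """Build a tiny package on disk and import a function from it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_scripts"
        pkg.mkdir()
        # The blank file that turns the directory into a regular package
        (pkg / "__init__.py").touch()
        # A stand-in for scripts/histogram.py
        (pkg / "histogram.py").write_text(
            "def histogram_tab(data):\n"
            "    return f'tab built from {len(data)} rows'\n"
        )
        # Make the parent directory importable, as the app directory would be
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_scripts.histogram")
            return module.histogram_tab([1, 2, 3])
        finally:
            # Clean up so the demo leaves no trace in the interpreter
            sys.path.remove(tmp)
            sys.modules.pop("demo_scripts.histogram", None)
            sys.modules.pop("demo_scripts", None)


result = demo_package_import()
print(result)
```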
After the library and script imports, we read in the necessary data with help from the Python [__file__](https://stackoverflow.com/questions/9271464/what-does-the-file-variable-mean-do/9271617) attribute. In this case, we are using two pandas dataframes (flights and map_data) as well as the U.S. state data that is included in Bokeh. Once the data has been read in, the script proceeds by delegation: it passes the appropriate data to each function, each function draws and returns a tab, and the main script organizes all of these tabs in a single layout called tabs. As an example of what each of these separate tab functions does, let’s look at the function that draws the map tab, map_tab.
This function takes map_data (a formatted version of flight data) and U.S. state data and generates a route map for the selected airline:
Map tab
We covered interactive plots in Part II of this series, and this plot is just an implementation of that idea. The overall structure of the function is:
def map_tab(map_data, states):
    ...

    def make_dataset(airline_list):
        ...
        return new_src

    def make_plot(src):
        ...
        return p

    def update(attr, old, new):
        ...
        new_src = make_dataset(airline_list)
        src.data.update(new_src.data)

    controls = ...
    tab = Panel(child = layout, title = 'Flight Map')

    return tab
We see the familiar make_dataset, make_plot, and update functions used to [draw the plot with interactive controls](https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-ii-interactions-a4cf994e2512). Once we have the plot set up, the last line returns the entire tab to the main script. Each individual script (five scripts for the five tabs) follows the same pattern.
Returning to the main script, the final step is to gather the tabs and add them to a single document.
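The reason this nesting works is that update closes over the same src object the plot was built from, so overwriting its contents in place is enough for anything holding a reference to see the new data. Here is a library-free sketch of that pattern, with a plain dictionary standing in for Bokeh's ColumnDataSource; the function name make_tab, the carrier codes, and the toy delays are all made up for illustration.

```python
def make_tab(delays):
    """`delays` maps a carrier code to a list of arrival delays (toy data)."""
    labels = sorted(delays)

    def make_dataset(carriers):
        # Build a fresh column-oriented dataset for the selected carriers
        rows = [(name, d) for name in carriers for d in delays[name]]
        return {"carrier": [r[0] for r in rows], "delay": [r[1] for r in rows]}

    def update(attr, old, new):
        # `new` holds the indices of the selected labels, as a checkbox
        # widget's `active` property would report them
        selected = [labels[i] for i in new]
        new_src = make_dataset(selected)
        # Overwrite the existing source in place instead of rebinding it
        src.update(new_src)

    src = make_dataset(labels[:1])
    return src, update


src, update = make_tab({"AA": [5, -3], "UA": [12]})
plot_view = src                 # stands in for a plot that references src
update("active", [0], [1])      # simulate the user ticking the second box
print(plot_view)
```

In the real tab, the plot's glyphs hold the reference that plot_view stands in for here, and the controls call update through their on_change hooks.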
# Put all the tabs into one application
tabs = Tabs(tabs = [tab1, tab2, tab3, tab4, tab5])
# Put the tabs in the current document for display
curdoc().add_root(tabs)
Tabs are displayed at the top of the application, just like tabs in any browser, and we can easily switch between them to view data.
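To make the assembly step concrete without the flight data, here is a minimal sketch that builds two placeholder tabs and gathers them into one Tabs layout. The build_tabs helper and the Div placeholders are assumptions for illustration, not part of the flight app. The try/except is there because Panel was renamed TabPanel in Bokeh 3.0, so the right name depends on the installed version.

```python
from bokeh.models import Div, Tabs

try:
    from bokeh.models import TabPanel  # Bokeh 3.x name
except ImportError:
    from bokeh.models import Panel as TabPanel  # older Bokeh releases


def build_tabs(titles):
    """Return a Tabs layout with one placeholder panel per title."""
    panels = [
        TabPanel(child=Div(text=f"<p>{title} goes here</p>"), title=title)
        for title in titles
    ]
    return Tabs(tabs=panels)


tabs = build_tabs(["Histogram", "Table"])

# Inside a served app, the layout would then be attached with:
#     from bokeh.io import curdoc
#     curdoc().add_root(tabs)
```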
Running the Bokeh server
After all the setup and coding required to make the plots, running the Bokeh server locally is quite simple. We open up a command line interface (I prefer Git Bash, but any one will work), change to the directory containing bokeh_app, and run bokeh serve --show bokeh_app. Assuming everything is coded correctly, the application will automatically open in our browser at the address http://localhost:5006/bokeh_app. We can then access the application and explore our dashboard!
The final version of the Bokeh Flight app
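The address in the browser follows directly from the command: the default port is 5006 and the route is the name of the served directory. The small sketch below spells out that relationship; the helper names serve_command and app_url are hypothetical, while --show and --port are options of bokeh serve.

```python
import os


def serve_command(app_dir, show=True, port=None):
    """Return the argument list for serving a directory-style app."""
    cmd = ["bokeh", "serve"]
    if show:
        cmd.append("--show")  # two hyphens: open a browser tab on start
    if port is not None:
        cmd += ["--port", str(port)]
    cmd.append(app_dir)
    return cmd


def app_url(app_dir, port=5006):
    """Return the local URL under which the app directory is served."""
    name = os.path.basename(os.path.normpath(app_dir))
    return f"http://localhost:{port}/{name}"


cmd = serve_command("bokeh_app")
print(" ".join(cmd))
print(app_url("bokeh_app"))
```

The argument list could be handed to subprocess.run to launch the server from a script, although typing the command in a terminal is usually simpler.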
Debugging in Jupyter Notebook
If something goes wrong (as it undoubtedly will the first few times we write a dashboard), it is frustrating to have to stop the server, make changes to the files, and restart the server to see whether our changes had the desired effect. To iterate quickly and resolve problems, I generally develop plots in a Jupyter Notebook. The Jupyter Notebook is a great development environment for Bokeh because you can create and test fully interactive plots from within the notebook. The syntax is a little different, but once you have a completed plot, the code can be copied and pasted into a standalone .py script with only minor modifications. To see this in action, take a look at the [Jupyter Notebook](github.com/willkoehrse… – pyth – the visualization/blob/master/application/app_development ipynb) I used to develop the application.
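One common way to get this quick loop is to wrap the plot-building code in a function that receives a document, which Bokeh can then display inside the notebook. The sketch below uses made-up toy data and an assumed function name, modify_doc; it is not taken from the notebook linked above.

```python
from bokeh.document import Document
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def modify_doc(doc):
    """Build a plot and attach it to the document handed in by Bokeh."""
    src = ColumnDataSource(data={"x": [1, 2, 3], "delay": [4, -2, 9]})
    p = figure(title="Toy arrival delays")
    p.line("x", "delay", source=src)
    doc.add_root(p)


# In a notebook cell, the app would be displayed with:
#     from bokeh.io import output_notebook, show
#     output_notebook()
#     show(modify_doc)

# Outside a notebook, the same function can be exercised directly
doc = Document()
modify_doc(doc)
print(len(doc.roots))
```

Moving to a standalone script then mostly means replacing the doc argument with curdoc(), which is the kind of minor modification mentioned above.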
Conclusions
A fully interactive Bokeh dashboard makes any data science project stand out. I often see my colleagues do a lot of great statistical work but then fail to communicate the results clearly, which means all that work doesn’t get the recognition it deserves. From personal experience, I have also seen how effective Bokeh applications can be at communicating results. While making a full dashboard is a lot of work (this one is over 600 lines of code), the results are well worth it. Moreover, once we have an application, we can quickly share it using GitHub, and if we are smart about our structure, we can reuse the framework for other projects.
The key points from this project apply to many routine data science projects:
- Having the proper framework/structure (Bokeh or another) in place is crucial before embarking on a data science task. That way, you won’t find yourself lost in a forest of code trying to find errors. Also, once we develop an effective framework, it can be reused with minimal effort, resulting in future benefits.
- Finding a debugging cycle that lets you iterate on ideas quickly is critical. The loop of writing code, viewing the results, and fixing errors that Jupyter Notebook supports makes for a productive development cycle (at least for small projects).
- Interactive applications in Bokeh will enhance your project and encourage user participation. Dashboards can be standalone exploratory projects, or they can highlight all the hard analysis work you’ve done!
- You never know where you will find the next tool you end up using at work or in side projects. So keep your eyes open, and don’t be afraid to experiment with new software and techniques!
That’s all for this article and for this series, although I plan on releasing additional standalone tutorials on Bokeh in the future. Presenting data science results in a compelling way is crucial, and with libraries like Bokeh and plot.ly, making interactive figures is becoming easier. You can check out all my work in the Bokeh GitHub repo; feel free to fork it and get started on your own projects. For now, I’m eager to see what everyone else can create!
As always, I welcome feedback and constructive criticism. You can reach me on Twitter @koehrsen_will.
- Data Visualization with Bokeh in Python, Part 1: Getting started
- Data Visualization with Bokeh in Python, Part 2: Interaction
- Data Visualization with Bokeh in Python, Part 3: Making a complete dashboard