• Data Visualization with Bokeh in Python, Part II: Interactions
  • Will Koehrsen
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Starrier
  • Proofread by: TrWestdoor

Going beyond static plots

In the first part of this series, we walked through creating a basic histogram in Bokeh, a powerful visualization library for Python. The final result, which shows the distribution of arrival delays for flights departing New York City in 2013, is shown below (with a nice tooltip!):

This chart gets the job done, but it's not very engaging! Viewers can see that the distribution of flight delays is nearly normal (with a slight positive skew), but they have no reason to spend more than a few seconds on the figure.

If we want to create more engaging visualizations, we can let users explore the data on their own through interactions. For example, in this histogram, one valuable feature is the ability to select specific airlines for comparison; another is the option to change the width of the bins to examine the data in finer detail. Fortunately, we can add both features on top of our existing plot using Bokeh. The initial development of the histogram may have seemed involved for a simple plot, but now we get to see the payoff of using a powerful library like Bokeh!

All of the code for this series is available on GitHub. Anyone interested can check out all the data cleaning details (an uninspiring but necessary part of data science) and run the code themselves! (For interactive Bokeh plots, we can still use a Jupyter Notebook to show the results, or we can write Python scripts and run a Bokeh server. I usually use a Jupyter Notebook for development because it is easy to iterate quickly and change plots without having to restart the server. I then move to a server to display the final results. You can see both a standalone script and the full notebook on GitHub.)

Active interaction

In Bokeh, there are two classes of interactions: passive and active. Passive interactions, covered in Part I, are also known as inspectors because they allow users to examine a plot in more detail but do not change the information displayed. One example is a tooltip that appears when a user hovers over a data point:

Tooltip, a passive interaction

The second class of interaction is called active because it changes the actual data displayed on the plot. This can be anything from selecting a subset of the data (for example, specific airlines) to changing the degree of a polynomial regression fit. There are multiple types of active interactions in Bokeh, but here we will focus on what are called "widgets": elements that can be clicked on and that give the user control over some aspect of the plot.

Sample widgets (drop-down buttons and radio button groups)

When I view graphs, I enjoy playing with active interactions (such as those on FlowingData) because they allow me to do my own exploration of the data. I find it more insightful to discover conclusions from the data on my own (with some direction from the designer) than from a completely static chart. Moreover, giving users some amount of freedom allows them to come away with slightly different interpretations, which can generate useful discussion about the dataset.

Interaction overview

Once we start adding active interactions, we need to move beyond single lines of code and into functions that encapsulate specific actions. For a Bokeh widget interaction, there are three main functions to implement:

  • make_dataset(): formats the specific data to be displayed
  • make_plot(): draws the plot with the specified data
  • update(): updates the plot based on the user's selections

Formatting data

Before we can make the plot, we need to plan out the data that will be displayed. For our interactive histogram, we will offer users three controllable parameters:

  1. The airlines displayed (called carriers in the code)
  2. The range of delays on the plot, for example -60 to 120 minutes
  3. The width of the histogram bins, 5 minutes by default

For the function that makes the dataset for the plot, we need to allow each of these parameters to be specified. To inform how we will transform the data in our make_dataset function, let's load in all the relevant data and inspect it.

Histogram data

In this dataset, each row is one separate flight. The arr_delay column is the arrival delay of the flight in minutes (negative numbers mean the flight arrived early). In Part I, we did some data exploration and learned that there were 327,236 flights, with a minimum delay of -86 minutes and a maximum delay of 1,272 minutes. In the make_dataset function, we want to select airlines based on the name column in the dataframe and limit the flights by the arr_delay column.

To make the data for the histogram, we use the NumPy function histogram, which counts the number of data points in each bin. In our case, this is the number of flights in each specified delay interval. For Part I, we made a histogram for all flights, but now we will make one for each carrier. Because the number of flights varies significantly from airline to airline, we display the delay not in raw counts but in proportions. That is, the height on the plot corresponds to the fraction of a specific airline's flights that have a delay in the corresponding bin. To go from a count to a proportion, we divide by the total count for that airline.
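
Before the full function, here is a minimal sketch of the count-to-proportion step on its own. The delay values are invented for illustration and the variable names are mine, not taken from the series' code:

```python
import numpy as np

# Hypothetical arrival delays in minutes for a single carrier
delays = np.array([-10, -3, 0, 2, 4, 7, 12, 200])

range_start, range_end, bin_width = -10, 15, 5

# Count the flights in each 5-minute bin between -10 and 15 minutes
counts, edges = np.histogram(
    delays,
    bins=int((range_end - range_start) / bin_width),
    range=[range_start, range_end])

# Convert the counts to proportions of the flights that were counted
proportions = counts / counts.sum()

print(counts)
print(edges)
print(proportions)
```

Note that the 200-minute delay falls outside the requested range, so np.histogram ignores it. Dividing by the sum of the counts therefore gives proportions relative to the flights inside the plotted range, not to every flight the carrier made.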

Below is the full code for making the dataset. The function takes in a list of carriers we want to include, the minimum and maximum delays to be plotted, and the specified bin width in minutes.

import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category20_16

# `flights` is the dataframe of flight records loaded earlier (one row per flight)

def make_dataset(carrier_list, range_start = -60, range_end = 120, bin_width = 5):

    # Check to make sure the start is less than the end
    assert range_start < range_end, "Start must be less than end!"

    columns = ['proportion', 'left', 'right', 'f_proportion', 'f_interval', 'name', 'color']
    frames = []
    range_extent = range_end - range_start

    # Iterate through all the carriers
    for i, carrier_name in enumerate(carrier_list):

        # Subset to the carrier
        subset = flights[flights['name'] == carrier_name]

        # Create a histogram with the specified bins and range
        arr_hist, edges = np.histogram(subset['arr_delay'],
                                       bins = int(range_extent / bin_width),
                                       range = [range_start, range_end])

        # Divide the counts by the total to get a proportion and create df
        arr_df = pd.DataFrame({'proportion': arr_hist / np.sum(arr_hist),
                               'left': edges[:-1], 'right': edges[1:]})

        # Format the proportion
        arr_df['f_proportion'] = ['%0.5f' % proportion for proportion in arr_df['proportion']]

        # Format the interval
        arr_df['f_interval'] = ['%d to %d minutes' % (left, right) for left,
                                right in zip(arr_df['left'], arr_df['right'])]

        # Assign the carrier for labels
        arr_df['name'] = carrier_name

        # Color each carrier differently
        arr_df['color'] = Category20_16[i]

        # Collect the per-carrier dataframes
        frames.append(arr_df)

    # Overall dataframe (DataFrame.append was removed in pandas 2.0, so use concat)
    if frames:
        by_carrier = pd.concat(frames).sort_values(['name', 'left'])
    else:
        by_carrier = pd.DataFrame(columns=columns)

    # Convert dataframe to column data source
    return ColumnDataSource(by_carrier)

(I know this is a post about Bokeh, but you can't make a graph without formatted data, so I included the code to demonstrate my approach!)
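
One detail of make_dataset is worth a quick check: the number of bins is int(range_extent / bin_width), so a bin width that does not divide the range evenly is effectively adjusted. The sketch below illustrates this with a width of 7 minutes. The np.arange alternative at the end is my own suggestion for keeping the requested width, not something the original code does:

```python
import numpy as np

range_start, range_end, bin_width = -60, 120, 7

# 180 / 7 is about 25.7, which int() truncates to 25 bins
n_bins = int((range_end - range_start) / bin_width)

# np.histogram spreads those 25 bins evenly over the full range
_, edges = np.histogram([0], bins=n_bins, range=[range_start, range_end])
actual_width = edges[1] - edges[0]
print(n_bins, actual_width)

# Possible alternative (an assumption, not the author's method):
# explicit edges keep the requested width but overshoot range_end
exact_edges = np.arange(range_start, range_end + bin_width, bin_width)
print(exact_edges[-1], len(exact_edges) - 1)
```

With the default 5-minute width and the -60 to 120 range the division is exact, so the two approaches agree. The difference only shows up once the slider introduced later lets users pick arbitrary widths.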

Running the function with the desired carriers gives the following result:

As a reminder, we are using the Bokeh quad glyphs to make the histogram, so we need to provide the left, right, and top of each glyph (the bottom is fixed at 0). These are in the left, right, and proportion columns respectively. The color column gives each carrier a unique color, and the f_ columns provide formatted text for the tooltips.
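
To make the f_ columns concrete, here is a small sketch that applies the same two format strings used in make_dataset to a few made-up bin edges and proportions (the numbers are invented for illustration):

```python
# Hypothetical bin edges (minutes) and proportions for three bins
left_edges = [-10, -5, 0]
right_edges = [-5, 0, 5]
proportions = [0.125, 0.25, 0.625]

# Same format strings as in make_dataset
f_interval = ['%d to %d minutes' % (left, right)
              for left, right in zip(left_edges, right_edges)]
f_proportion = ['%0.5f' % proportion for proportion in proportions]

for interval, proportion in zip(f_interval, f_proportion):
    print(interval, '->', proportion)
```

Storing the formatted strings as their own columns means a tooltip can show them directly, without any formatting logic in the plot itself.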

The next function to implement is make_plot. The function should take in a ColumnDataSource (a specific type of object used in Bokeh for plotting) and return the plot object:

from bokeh.models import HoverTool
from bokeh.plotting import figure

def make_plot(src):
    # Blank plot with correct labels
    # (Bokeh 3.0 and later use width/height instead of plot_width/plot_height)
    p = figure(plot_width = 700, plot_height = 700,
               title = 'Histogram of Arrival Delays by Carrier',
               x_axis_label = 'Delay (min)', y_axis_label = 'Proportion')

    # Quad glyphs to create a histogram
    # (Bokeh 1.4 and later provide legend_field / legend_group for column-based legends)
    p.quad(source = src, bottom = 0, top = 'proportion', left = 'left', right = 'right',
           color = 'color', fill_alpha = 0.7, hover_fill_color = 'color', legend = 'name',
           hover_fill_alpha = 1.0, line_color = 'black')

    # Hover tool with vline mode
    hover = HoverTool(tooltips=[('Carrier', '@name'),
                                ('Delay', '@f_interval'),
                                ('Proportion', '@f_proportion')],
                      mode='vline')

    p.add_tools(hover)

    # Styling; style() is a helper defined elsewhere in the project's code
    p = style(p)

    return p

If we pass in a source containing all airlines, this code gives us the following plot:

This histogram is very cluttered because 16 airlines are plotted on the same graph! Because the information overlaps, it is not realistic to compare airlines. Fortunately, we can add widgets to make the plot clearer and enable quick comparisons.

Create interactive widgets

Once we have created a basic figure in Bokeh, adding interactions via widgets is relatively straightforward. The first widget we want is a selection box that allows viewers to select the airlines to display. This is a checkbox control, known in Bokeh as a CheckboxGroup, which allows as many selections as desired. To make the selection tool, we import the CheckboxGroup class and create an instance with two parameters: labels, the values we want displayed next to each box, and active, the boxes that are initially checked. Here is the code to create a CheckboxGroup with all the carriers.

from bokeh.models.widgets import CheckboxGroup

# Create the checkbox selection element; available_carriers is a
# list of all airlines in the data
carrier_selection = CheckboxGroup(labels=available_carriers,
                                  active = [0, 1])

CheckboxGroup widget

The labels in a Bokeh checkbox must be strings, while the active values are integers. This means that in the image, 'AirTran Airways Corporation' maps to the active value 0 and 'Alaska Airlines Inc.' maps to the active value 1. When we want to match the selected checkboxes to the airlines, we need to find the string names associated with the selected integer active values. We can do this using the .labels and .active attributes of the widget:

# Select the airline names from the selection values
[carrier_selection.labels[i] for i in carrier_selection.active]

['AirTran Airways Corporation', 'Alaska Airlines Inc.']

With the widget made, we now need to link the selected airline checkboxes to the information displayed on the plot. This is accomplished using the .on_change method of the CheckboxGroup and an update function that we define. The update function always takes three arguments: attr, old, new, and updates the plot based on the selected controls. The way we change the data displayed on the graph is by altering the data source that we passed to the glyphs in the make_plot function. That might sound a little abstract, so here is an example of an update function that changes the histogram to display the selected airlines:

# Update function takes three default parameters
def update(attr, old, new):
    # Get the list of carriers for the graph
    carriers_to_plot = [carrier_selection.labels[i] for i in
                        carrier_selection.active]

    # Make a new dataset based on the selected carriers and the
    # make_dataset function defined earlier
    new_src = make_dataset(carriers_to_plot,
                           range_start = -60,
                           range_end = 120,
                           bin_width = 5)

    # Update the source used in the quad glyphs
    src.data.update(new_src.data)

Here, we retrieve the list of airlines to display based on the selected airlines from the CheckboxGroup. This list is passed to the make_dataset function, which returns a new column data source. We update the data of the source used in the glyphs by calling src.data.update and passing in the data from the new source. Finally, in order to link changes in the carrier_selection widget to the update function, we have to use the .on_change method (called an event handler).

# Link a change in the selected buttons to the update function
carrier_selection.on_change('active', update)
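
To see the whole chain (checkbox indices, labels, new data, updated source) in one place, here is a small self-contained sketch. It swaps the flights data for a hypothetical lookup table, and make_toy_source is an illustrative stand-in for make_dataset, so treat it as a demonstration of the wiring rather than the series' actual code:

```python
from bokeh.models import CheckboxGroup, ColumnDataSource

# Hypothetical toy data: one precomputed value per carrier
toy_values = {'Carrier A': 0.2, 'Carrier B': 0.5, 'Carrier C': 0.3}

def make_toy_source(names):
    """Stand-in for make_dataset: a source with only the requested carriers."""
    return ColumnDataSource(data={
        'name': list(names),
        'proportion': [toy_values[n] for n in names],
    })

checkbox = CheckboxGroup(labels=list(toy_values), active=[0, 1])

# Source the glyphs would read from, built from the initial selection
src = make_toy_source([checkbox.labels[i] for i in checkbox.active])

def update(attr, old, new):
    # Map the integer active values back to their string labels
    selected = [checkbox.labels[i] for i in checkbox.active]
    # Build fresh data and copy it into the existing source
    new_src = make_toy_source(selected)
    src.data.update(new_src.data)

checkbox.on_change('active', update)

# Simulate the user ticking only the third box. In a running app
# Bokeh calls update itself; here we also call it by hand so the
# sketch does not depend on a server.
checkbox.active = [2]
update('active', [0, 1], [2])
print(src.data['name'])
```

The key point is that src itself is never replaced. The glyphs keep pointing at the same ColumnDataSource, and only its contents change.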

The update function is called whenever an airline is selected or deselected. The end result is that only the glyphs corresponding to the selected airlines are drawn on the histogram, as seen below:

More controls

Now that we know the basic workflow for creating a control, we can add more elements. Each time, we create the widget, write an update function to change the data displayed on the plot, and link the update function to the widget with an event handler. We can even use the same update function for multiple elements by rewriting the function to extract the values we need from the widgets. To practice, we will add two additional controls: a Slider, which selects the bin width for the histogram, and a RangeSlider, which sets the minimum and maximum delays to display. Here is the code to make both of these widgets and the new update function:

from bokeh.models.widgets import Slider, RangeSlider

# Slider to select the bin width; value is the selected number
binwidth_select = Slider(start = 1, end = 30,
                         step = 1, value = 5,
                         title = 'Bin Width (min)')

# RangeSlider to change the minimum and maximum values on the histogram
range_select = RangeSlider(start = -60, end = 180, value = (-60, 120),
                           step = 5, title = 'Delay Range (min)')


# Update function that accounts for all 3 controls
def update(attr, old, new):

    # Find the selected carriers
    carriers_to_plot = [carrier_selection.labels[i] for i in carrier_selection.active]

    # Change the bin width to the selected value
    bin_width = binwidth_select.value

    # Value for the range slider is a tuple (start, end)
    range_start = range_select.value[0]
    range_end = range_select.value[1]

    # Create the new column data source
    new_src = make_dataset(carriers_to_plot,
                           range_start = range_start,
                           range_end = range_end,
                           bin_width = bin_width)

    # Update the data on the plot
    src.data.update(new_src.data)

# Update the plot when either value is changed
binwidth_select.on_change('value', update)
range_select.on_change('value', update)

The standard slider and range slider are shown below:

Besides changing the data displayed, the update function can also change other aspects of the plot. For example, to make the title text match the bin width, we could do this:

# Change the drawing title to match selection
bin_width = binwidth_select.value
p.title.text = 'Delays with %d Minute Bin Width' % bin_width

There are many other types of interactions in Bokeh, but for now, our three controls give users plenty to "play" with on the chart!

Put it all together

We now have all the pieces for our interactive plot. We have the three necessary functions (make_dataset, make_plot, and update), which change the plot based on the controls, and the widgets themselves. We join all of these elements onto one page by defining a layout.

from bokeh.layouts import column, row, WidgetBox
from bokeh.models import Panel
from bokeh.models.widgets import Tabs

# Put the controls in a single element
controls = WidgetBox(carrier_selection, binwidth_select, range_select)

# Create a row layout
layout = row(controls, p)

# Make a tab with the layout
# (Bokeh 3.0 and later name this class TabPanel)
tab = Panel(child=layout, title = 'Delay Histogram')
tabs = Tabs(tabs=[tab])

I put the entire layout onto a tab, and when we make a full application, we can put each plot on its own tab. The final result is as follows:

Feel free to check out the code on GitHub and make your own plots.
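
For readers who want to try the pieces outside a notebook, here is a minimal sketch of a standalone script that could be run with `bokeh serve --show` followed by the script name. It is a simplified stand-in for the full application, not the code from the repository: the delays are randomly generated, there is a single slider, and the layout uses only row and column:

```python
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

# Hypothetical stand-in for the flights data: random delays in minutes
rng = np.random.default_rng(0)
delays = rng.normal(loc=5, scale=30, size=1000)

def make_dataset(bin_width=5, range_start=-60, range_end=120):
    """Return histogram columns (as a plain dict) for the toy delays."""
    counts, edges = np.histogram(
        delays,
        bins=int((range_end - range_start) / bin_width),
        range=[range_start, range_end])
    return {'proportion': counts / counts.sum(),
            'left': edges[:-1],
            'right': edges[1:]}

src = ColumnDataSource(data=make_dataset())

p = figure(title='Histogram of Arrival Delays (toy data)',
           x_axis_label='Delay (min)', y_axis_label='Proportion')
p.quad(source=src, bottom=0, top='proportion', left='left', right='right',
       fill_alpha=0.7, line_color='black')

binwidth_select = Slider(start=1, end=30, step=1, value=5,
                         title='Bin Width (min)')

def update(attr, old, new):
    # Rebuild the histogram with the selected bin width
    src.data.update(make_dataset(bin_width=binwidth_select.value))
    # Keep the title in sync with the selection
    p.title.text = 'Delays with %d Minute Bin Width' % binwidth_select.value

binwidth_select.on_change('value', update)

# Controls on the left, plot on the right
layout = row(column(binwidth_select), p)
curdoc().add_root(layout)
```

Adding the layout to curdoc() is what hands the page to the Bokeh server. The callbacks then run in Python each time the slider moves in the browser.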

Next steps and conclusions

The next part of this series will look at how to make a complete application with multiple plots. We will show our work on a server, where it can be accessed in a browser, and create a full dashboard to explore the dataset.

As we can see, the final interactive plot is much more useful than the original! We can now compare delays between airlines and change the bin width and range to see how the distributions are affected. Adding interactivity raises the value of a plot because it increases engagement with the data and allows users to arrive at conclusions through their own explorations. Although setting up the initial plot was involved, we can now see how easy it is to add elements and control widgets to an existing figure. The customizability of plots and interactions is the benefit of using a heavier plotting library like Bokeh compared to something quick and simple like Matplotlib. Different visualization libraries have different advantages and use cases, but when we want to add the extra dimension of interaction, Bokeh is a great choice. Hopefully at this point you are confident enough to start developing your own visualizations, and please share what you create!

Feedback and constructive criticism are welcome. Contact me on Twitter @koehrsen_will.


  1. Data Visualization with Bokeh in Python, Part 1: Getting started
  2. Data Visualization with Bokeh in Python, Part 2: Interaction
  3. Data Visualization with Bokeh in Python, Part 3: Making a complete dashboard

If you find mistakes in this translation or other areas that could be improved, you are welcome to revise it and submit a PR through the Nuggets Translation Project, which also earns reward points. The permanent link at the beginning of this article is the Markdown link to this article on GitHub.

