A discussion of visualization choices

1. Introduction

Recently, a friend posted a question in our group: “Of the domestic epidemic trend charts drawn by DXY, Tencent and Baidu, which one is better?” Lay was surprised at how much discussion the question generated; it set off heated debates in several data analysis groups. When Lay shared the topic in WeChat Moments, it drew dozens of comments. This topic clearly struck a nerve!

As Lay said in the group, “Ask me to say a few words on this topic and I'll give you an article of more than 1,000 words.” So here Lay summarizes and analyzes the topic and offers some personal suggestions.

To let everyone see what was actually said, Lay has also selected 6 of the many viewpoints that were shared and added a comment on each.

2. Tencent and DXY vs. Baidu

To answer which domestic epidemic trend chart is better, Tencent's or Baidu's, Lay compared the two charts and organized the discussion around them.

This article looks at the question from three perspectives: the user perspective, the data analysis perspective and the business perspective.

The following is the domestic epidemic trend chart used by DXY and Tencent. It plots the data directly: the values are not transformed in any way and are drawn as a line chart.

[Figure: DXY/Tencent domestic epidemic trend chart]

User perspective: this method shows the real data directly, so users can intuitively understand the actual situation.

Data analysis perspective: because of a single outlier, the fluctuations in the rest of the line chart are flattened and hard to see, so patterns in the data are not easy to find. However, since the audience is the general public, this presentation matches how the public thinks. Moreover, because this is epidemic information, any transformation of the data may bring unpredictable policy risks, so this approach is relatively safe.

Business perspective: although the outlier strongly affects how the rest of the data appears, this method is recommended because it shows the real data as they are.
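To see why a single outlier flattens the rest of a line chart, a quick calculation helps. The sketch below uses made-up illustrative numbers (not the real epidemic data) and assumes a linear y-axis that starts at 0; it computes what fraction of the axis height the ordinary points span once the axis has to stretch to fit the outlier.

```python
def visible_span_fraction(values, outlier_index):
    """Fraction of a linear y-axis (from 0 to max(values)) that is spanned by
    the non-outlier points. Assumes the axis starts at 0."""
    others = [v for i, v in enumerate(values) if i != outlier_index]
    axis_height = max(values)  # the axis must stretch to fit the outlier
    return (max(others) - min(others)) / axis_height


# Hypothetical daily counts: ordinary days between 2000 and 4000, one spike of 15000
daily = [2000, 2500, 3000, 3500, 4000, 15000, 3000]
print(visible_span_fraction(daily, outlier_index=5))
```

With these numbers the ordinary days span 2000 / 15000, about 13% of the axis height, whereas without the spike they would span 2000 / 4000, half of it. That is why their day-to-day changes look almost flat.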

The following is Baidu's domestic epidemic trend chart. It uses an unequal-interval mapping: the data are first mapped onto a y-axis whose tick intervals are not evenly spaced, and then drawn as a line chart.

[Figure: Baidu domestic epidemic trend chart]

User perspective: because the original data are transformed, this method makes it hard for users to intuitively grasp the differences between values and the real situation.

Data analysis perspective: this method removes the influence of the outlier on the rest of the data, so the trend is easy to see.

Business perspective: this method changes how part of the data is mapped, which can visually mislead users, but it makes it easy for the public to see how the domestic epidemic is changing. Its advantages and disadvantages are both obvious.
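The exact breakpoints of Baidu's axis are not given here, so the sketch below uses hypothetical ones (`BREAKS`). It shows the idea behind an unequal-interval axis: each value range between consecutive breaks is mapped linearly onto one equally tall band of the plot. As a result, a small range (0 to 1000) gets as much height as a large one (5000 to 20000).

```python
from bisect import bisect_right

# Hypothetical tick values; each consecutive pair gets one equally tall band
BREAKS = [0, 1000, 2000, 5000, 20000]


def map_unequal(value, breaks=BREAKS):
    """Map a raw value to a plotting position where each interval between
    consecutive breaks occupies exactly one unit of height."""
    if value <= breaks[0]:
        return 0.0
    if value >= breaks[-1]:
        return float(len(breaks) - 1)
    k = bisect_right(breaks, value) - 1  # index of the band containing value
    lo, hi = breaks[k], breaks[k + 1]
    return k + (value - lo) / (hi - lo)  # linear interpolation inside the band


positions = [map_unequal(v) for v in [500, 1500, 3500, 15000]]
print(positions)
```

To plot, you would draw the mapped positions and label the y-ticks 0, 1, 2, 3, 4 with the original break values. That relabelling keeps the chart readable, but it is also why equal heights no longer mean equal differences in the underlying numbers.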

3. Is there a better way to visualize?

First of all, Lay wants to stress that there is no universally best way to visualize data, only a way that suits the data at hand.

Below are three other ways to visualize the national trend.

3.1 Logarithmic plotting method

As the name suggests, this method takes the logarithm of the original data and then visualizes the result.

The following code takes the logarithm of the data.

from math import log


# Take the natural logarithm of each positive value; values <= 0 are kept unchanged
def data_to_log(data):
    a = []
    for i in data:
        if i > 0:
            a.append(log(i))
        else:
            a.append(i)
    return a
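Note that `data_to_log` leaves values less than or equal to 0 unchanged, so a day with 0 new deaths and a day with 1 new death both end up at 0, because log(1) = 0. A common alternative is log(1 + x), computed with `math.log1p`. It is offered here as a suggestion rather than the author's method. It maps 0 to 0 and preserves the order of all non-negative counts. The sketch assumes the counts are never negative.

```python
import math


def data_to_log1p(data):
    """Return log(1 + x) for each count. 0 maps to 0 and 1 maps to log(2),
    so zero-count and one-count days stay distinguishable.
    Assumes non-negative counts: math.log1p raises ValueError for x <= -1."""
    return [math.log1p(x) for x in data]


print(data_to_log1p([0, 1, 9, 99]))
```

The y-axis would then show log(1 + x) rather than counts, so the readability concern raised in the user perspective still applies; `math.expm1` converts a value back to the original count.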

The data is then visualized.

import matplotlib.pyplot as plt

# day_add_pd is assumed to be a pandas DataFrame of daily new counts (its preparation is not shown here)
f = plt.figure(figsize=[20, 12])
ax = f.subplots(1, 1)
# data_to_log is used for every series so that a zero count cannot raise a math domain error
a = data_to_log(day_add_pd['everyday_addconfirm'])
b = data_to_log(day_add_pd['everyday_addsuspect'])
c = data_to_log(day_add_pd['everyday_adddead'])
d = data_to_log(day_add_pd['everyday_addheal'])
ax.plot(day_add_pd['date_addlist'], a, color='#E54646', marker='o', linestyle='solid', label='New confirmed')
ax.plot(day_add_pd['date_addlist'], b, color='#F1AF00', marker='o', linestyle='solid', label='New suspected')
ax.plot(day_add_pd['date_addlist'], c, color='#707070', marker='o', linestyle='solid', label='New deaths')
ax.plot(day_add_pd['date_addlist'], d, color='#00B2BF', marker='o', linestyle='solid', label='New recoveries')
plt.xticks(day_add_pd['date_addlist'], rotation=45, fontsize=12)  # date tick labels
plt.legend(fontsize=20)
plt.yticks(fontsize=20)
plt.show()

The resulting chart:

[Figure: epidemic trend drawn with the logarithm method]

User perspective: because this method plots the logarithms of the values, it is hard for users to intuitively understand the real values and how they relate to one another.

Data analysis perspective: this method removes the influence of the outlier on the rest of the data, so patterns are easy to find.

Business perspective: although patterns can be found with this method, it is not recommended. The original data are transformed, and most of the public cannot convert logarithms back into actual numbers, so they would not know the real values.

3.2 Sliding time axis method

Simply put, with this method users can choose which time range is displayed.

Here is the visualization code.

import pyecharts.options as opts
from pyecharts.charts import Line


def line_smooth(data) -> Line:
    # Dates are assumed to be strings such as "01.28" (month.day); they become "2020/1/28"
    x_labels = ["2020/{}/{}".format(int(d[0:2]), int(d[3:5])) for d in data['date_addlist']]
    # Mark the maximum of every series
    max_point = opts.MarkPointOpts(data=[opts.MarkPointItem(type_="max")])
    c = (
        Line()
        .add_xaxis(x_labels)
        .add_yaxis("New confirmed", data['everyday_addconfirm'], is_smooth=True,
                   itemstyle_opts=opts.ItemStyleOpts(color="#E54646"), markpoint_opts=max_point)
        .add_yaxis("New suspected", data['everyday_addsuspect'], is_smooth=True,
                   itemstyle_opts=opts.ItemStyleOpts(color="#F1AF00"), markpoint_opts=max_point)
        .add_yaxis("New deaths", data['everyday_adddead'], is_smooth=True,
                   itemstyle_opts=opts.ItemStyleOpts(color="#707070"), markpoint_opts=max_point)
        .add_yaxis("New recoveries", data['everyday_addheal'], is_smooth=True,
                   itemstyle_opts=opts.ItemStyleOpts(color="#00B2BF"), markpoint_opts=max_point)
        # Title and slider are set in one call: in pyecharts 1.x a second
        # set_global_opts call would reset the title to its default
        .set_global_opts(
            title_opts=opts.TitleOpts(title="Line-smooth"),
            datazoom_opts=[opts.DataZoomOpts()],  # time slider under the x-axis
        )
        .set_series_opts(
            label_opts=opts.LabelOpts(is_show=False),
            linestyle_opts=opts.LineStyleOpts(width=2, type_='solid'),
        )
    )
    return c


line_smooth(day_add_pd).render()  # writes render.html to the working directory
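With a slider, it can help to choose the initial window explicitly. pyecharts' `opts.DataZoomOpts` takes `range_start` and `range_end` as percentages of the axis; in pyecharts 1.x their defaults are 20 and 80, so the chart above opens on the middle part of the dates. The hypothetical helper below computes percentages that open the slider on roughly the last n points of a category axis. It assumes ECharts maps 0% to the first category and 100% to the last.

```python
def last_n_days_range(total_points, n):
    """Return (range_start, range_end) percentages that show roughly the last n
    of total_points categories, assuming 0% is the first and 100% the last one."""
    if total_points <= 1 or n >= total_points:
        return 0.0, 100.0  # nothing to hide: show everything
    n = max(n, 1)
    start = (total_points - n) * 100 / (total_points - 1)  # index of first shown point, as a percentage
    return start, 100.0


start, end = last_n_days_range(31, 7)
print(start, end)
```

The result could then be passed as `opts.DataZoomOpts(range_start=start, range_end=end)` inside `line_smooth`, so readers see the most recent week first and can still drag the slider back.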

The resulting chart:

[Figure: epidemic trend with a sliding time axis]

User perspective: by selecting a time range, users can zoom in on local data and study its patterns; by selecting the full range, they can see the overall trend.

Data analysis perspective: by selecting only part of the time range, the influence of the outlier on the local trend can be removed, so local patterns are easy to find.

Business perspective: this method displays the real data directly and lets users observe local patterns by selecting a sub-range, so it is fairly well recommended. However, because it requires an interactive component, it is hard to use in static reports.

3.3 Broken-axis method (truncating the outlier)

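Put plainly, this method draws the outlier separately from the rest of the data: the y-axis is split into two panels, with the lower panel covering the range of the ordinary values and the upper panel covering the range that contains the outlier.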

Here is the visualization code.

import matplotlib.pyplot as plt

p = plt.figure(figsize=[20, 14])
ax1 = p.add_subplot(212)  # lower subplot: the ordinary values
plt.subplots_adjust(wspace=0, hspace=0.1)  # set the spacing between subplots
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_addconfirm'], color='#E54646', marker='o', linestyle='solid', label='New confirmed')
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_addsuspect'], color='#F1AF00', marker='o', linestyle='solid', label='New suspected')
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_adddead'], color='#707070', marker='o', linestyle='solid', label='New deaths')
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_addheal'], color='#00B2BF', marker='o', linestyle='solid', label='New recoveries')
plt.xticks(rotation=45, fontsize=12)  # date tick labels
plt.yticks(fontsize=20)


ax = p.add_subplot(211)  # upper subplot: the range containing the outlier
ax.xaxis.set_major_locator(plt.NullLocator())  # hide the x-axis ticks of the upper subplot
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_addsuspect'], color='#F1AF00', marker='o', linestyle='solid', label='New suspected')
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_adddead'], color='#707070', marker='o', linestyle='solid', label='New deaths')
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_addheal'], color='#00B2BF', marker='o', linestyle='solid', label='New recoveries')
plt.plot(day_add_pd['date_addlist'], day_add_pd['everyday_addconfirm'], color='#E54646', marker='o', linestyle='solid', label='New confirmed')
plt.xticks(rotation=45, fontsize=12)
plt.yticks(fontsize=20)
ax.set_ylim(6000, 20000)  # y-range of the upper subplot
ax1.set_ylim(0, 6000)  # y-range of the lower subplot


ax.grid(axis='both', linestyle='-')  # show grid lines
ax1.grid(axis='y', linestyle='-')  # show grid lines


ax.legend(fontsize=20)  # show the legend
# plt.xlabel("Date")  # x-axis label
# plt.ylabel("Epidemic data")  # y-axis label


ax1.spines['top'].set_visible(False)  # hide the top and right borders of the lower subplot
ax1.spines['bottom'].set_visible(True)
ax1.spines['right'].set_visible(False)


ax.spines['top'].set_visible(False)  # keep only the left border of the upper subplot
ax.spines['bottom'].set_visible(False)
ax.spines['right'].set_visible(False)


ax1.tick_params(labeltop=False)  # no tick labels along the top of the lower subplot


# Draw the break marks on the left spine where the two subplots meet
d = 0.02  # size of the break marks, in axes coordinates
kwargs = dict(transform=ax.transAxes, color='k', clip_on=False)
ax.plot((-d, +d), (-d, +d), **kwargs)  # mark at the bottom-left of the upper subplot


kwargs.update(transform=ax1.transAxes, color='k')  # switch to the lower subplot's coordinates
ax1.plot((-d, +d), (1 - d, 1 + d), **kwargs)  # mark at the top-left of the lower subplot


plt.show()
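The broken-axis code hardcodes the break at 6000. The hypothetical helper below picks the break from the data instead. It swaps in the common Tukey-fence rule (a point is a high outlier if it lies above Q3 + 1.5 × IQR); this is not the original author's method. The example uses made-up counts, not the real epidemic data.

```python
import statistics


def split_high_outliers(values, k=1.5):
    """Split daily counts into ordinary points and high outliers using the upper
    Tukey fence Q3 + k * IQR. Returns (ordinary, outliers, fence)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    fence = q3 + k * (q3 - q1)
    ordinary = [v for v in values if v <= fence]
    outliers = [v for v in values if v > fence]
    return ordinary, outliers, fence


# Hypothetical daily counts with one spike
daily = [2000, 2500, 3000, 3500, 4000, 15000, 3000]
ordinary, outliers, fence = split_high_outliers(daily)
# A break at `fence` keeps every ordinary point in the lower panel and the spike above it
lower_ylim = (0, fence)
upper_ylim = (fence, max(outliers) * 1.1) if outliers else None
print(lower_ylim, upper_ylim)
```

These tuples could replace the hardcoded `ax1.set_ylim(0, 6000)` and `ax.set_ylim(6000, 20000)`. If no outliers are found, a single ordinary chart is enough and no break is needed.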

The resulting chart:

[Figure: epidemic trend drawn with a broken y-axis]

User perspective: the outlier is drawn in its own panel, so users can still intuitively understand the real data and the overall trend.

Data analysis perspective: cutting out the outlier removes its influence on the overall pattern, so patterns in the data are easy to find.

Business perspective: this method lets the public see the original data while keeping patterns easy to find, so it is recommended. However, it differs slightly from the charts people are used to seeing.

4. Everyone’s opinions

The group shared many opinions. Here are 6 of them, each followed by Lay's comment; if you recognize yours, feel free to leave a message ~

Point 1

Logarithmic plotting is not recommended; I feel it would confuse some people's understanding.

I agree. Taking logarithms before plotting changes the data dramatically.

Point 2

Show the last 7 days by default and let users adjust the time range, or adjust the axis scale.

This view is similar to the sliding time axis method.

Point 3

I think most people are used to a uniform scale, where the shape of the curve directly shows which values are high and which are low.

This view coincides with the approach taken by the Tencent chart.

Point 4

To be honest, this domestic epidemic trend chart is a good example of the data analyst's dilemma. Even among data analysis professionals there are many different opinions about it, and none is fully convincing. And even if someone came up with a brilliant solution, I don't think the average person or the boss would think much of it.

I quite agree. Indeed, there is no single correct visualization method, only one that is appropriate for the data.

Point 5

Visualization is about conveying a message. But if you use visualizations that not everyone can understand, the message will not get across.

Right, visualization should make it easier to understand the real situation and the trend of the data.

Point 6

When setting standards for an indicator system, the efficiency with which users acquire information must be the core criterion, further refined into indicator management standards, development standards, visualization standards and so on. Simply pursuing comprehensive, flashy charts is not what enterprises actually need!

Agreed. The efficiency with which users acquire information is indeed the ultimate goal of visualization. Everyone wants to achieve it, but everyone knows it is not easy.


Finally, if you found this useful, feel free to share it with your friends ~
