Author: Cola

Source: Cola’s road to data analysis

For reprint authorization, please contact the author (WeChat ID: data_COLA)

Hello, I’m Cola.

Today, as we use charts and graphs to visualize data, have you ever wondered:

Where do these graphs come from?

It is easy to forget that using graphics in place of dry words, and using shapes to represent quantities, is a relatively recent development.

One generation plants the trees and the next enjoys the shade. This article takes you through the history of these charts.

1. Line chart

In 1786, William Playfair drew a graph of England’s imports and exports between 1700 and 1780. It is considered the earliest known line chart, with time (in years) on the horizontal axis and values on the vertical axis.

(As an aside, 1786 fell in the reign of the Qianlong Emperor of the Qing Dynasty. We always say that data analysis should be built on comparative thinking, so here is a comparison.)

A line chart is a diagram made up of rectangular coordinates, points, and lines. It is often used to show numerical data over time, and it shows trends better than a bar chart.

Note when using:

  • Start the vertical axis at 0
  • Choose a relatively thick line
  • Try not to exceed 5 lines
  • Use dashed lines for predicted values

2. Bar chart

In that same year, in The Commercial and Political Atlas, William Playfair creatively used the bar chart to show a discrete comparison of Scotland’s imports and exports between 1780 and 1781. As his original drawing shows, the horizontal axis gives the values of imports and exports and the vertical axis lists the different countries, which is no different from the bar chart we use today.

Bar charts can also be used to show proportion. They can be vertical or horizontal: horizontal bars are used to show category data, and vertical bars are used to show numerical data.

Note when using:

  • Use the same color for the same data series
  • Try not to use slanted labels
  • If data labels are added, remove the grid lines
  • Arrange data from largest to smallest

3. Pie chart

Playfair believed that a picture is worth a thousand words, and he invented the pie chart, the line chart, and more. He worked in many professions over his life, including businessman, statistician, postman, translator, and accountant; in today’s terms he was a true “slash” careerist, a man of many trades. Perhaps these varied professional experiences were what inspired his graphic inventions.

In 1801, 15 years after the bar chart was invented, Playfair used a pie chart to describe the proportions of the Ottoman Empire’s territory in Asia, Europe, and Africa. His original manuscript shows that Europe accounted for 25% (a right angle), Asia for 60%, and Africa for 15%. This was the debut of the pie chart.

(In 1801, China was in the reign of the Jiaqing Emperor of the Qing Dynasty.)

Pie charts are circular statistical charts that divide data into several distinct sectors. In a pie chart, the arc length of each sector (and likewise its central angle and area) indicates that category’s share of the whole, and the sectors together make up a full circle.

The pie chart mainly shows proportion. However, because the eye is less sensitive to angles than to lengths, a pie chart conveys little when all sectors are of similar size. Column or bar charts are recommended instead.

As the chart below shows, you cannot make out the small differences with a pie chart, but you can see them with a bar chart.

Note when using:

  • Make sure that the values of all sectors add up to 100%.
  • Avoid having more than 5 sectors and keep the chart simple.
  • Pay attention to the order of the sectors. In general, place the largest sector at the 12 o’clock position and arrange the remaining sectors by area.
  • Finally, use color carefully to highlight the sectors that need emphasis without making the chart confusing.
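
The arithmetic behind these notes can be sketched in a few lines of Python. This is only an illustration: it reuses the Ottoman Empire shares quoted above (Europe 25%, Asia 60%, Africa 15%), and the function name sector_angles and its tolerance parameter are made up for this example rather than taken from any charting library.

```python
def sector_angles(shares, tol=1e-9):
    """Return (label, start_deg, sweep_deg) tuples, largest sector first.

    shares maps a label to its percentage of the whole. Angles are measured
    clockwise from the 12 o'clock position.
    """
    total = sum(shares.values())
    if abs(total - 100.0) > tol:
        raise ValueError(f"shares add up to {total}, not 100")
    # Largest sector first, then the rest in descending order of size.
    ordered = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    sectors, start = [], 0.0
    for label, pct in ordered:
        sweep = pct * 360.0 / 100.0  # a 25% share spans a right angle
        sectors.append((label, start, sweep))
        start += sweep
    return sectors


ottoman = {"Europe": 25, "Asia": 60, "Africa": 15}
for label, start, sweep in sector_angles(ottoman):
    print(f"{label:<7} starts at {start:5.1f} deg, spans {sweep:5.1f} deg")
```

Rejecting input that does not sum to 100% is a deliberate choice here: a pie chart silently rescales whatever it is given, which hides exactly the mistake the first note warns about.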

4. Scatter plot

In 1833, John Herschel published an article on observing the orbits of binary stars, in which he used a scatter plot to show the relationship between observation time and position angle. It is regarded as the first modern scatter plot. John Herschel was the son of William Herschel, who discovered Uranus and infrared light.

(1833 fell in the Daoguang period of the Qing Dynasty, and the First Opium War was soon to break out.)

The charts mentioned above are one-dimensional, while the scatter plot is a typical two-dimensional chart. It plots many coordinate points formed from two sets of data, and it is mainly used to show trends in the data and to explain the correlation between variables.

Note when using:

  • Scatter plots are suitable for exploring the relationship between variables
  • When using a scatter plot for correlation analysis, too little data leaves the result with little explanatory value
  • Too many data points also hurt the readability of the chart
  • Do not split the data into too many categories, or the comparison loses its meaning
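
To make the point about sample size concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The article does not name a specific correlation measure, so the choice of Pearson is an assumption, and the ad_spend and sales figures are invented purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented data: the two variables that would go on the x and y axes.
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
print(round(pearson_r(ad_spend, sales), 3))

# With only two points the coefficient is always exactly +1 or -1,
# which is why a tiny sample says very little about the relationship.
print(pearson_r([1, 2], [5, 3]))
```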

5. Nightingale rose chart

The Nightingale rose chart is a variation of the pie chart invented by Florence Nightingale. It is also known as the polar area chart or the coxcomb chart.

Nightingale herself is a legendary figure. She was a nurse first and a statistician second, and she was the first woman to become a member of the Royal Statistical Society.

In the 1850s, Britain, France, and Turkey fought Russia in the Crimean War, and Nightingale volunteered to serve as a war nurse. At the time, the death rate among the wounded was 42 percent. Only after the Sanitary Commission was brought in during 1855 to improve overall sanitary conditions in the hospitals did the death rate drop dramatically, to 2.5 percent. Nightingale took note of this and argued that the government should improve conditions in field hospitals to save more young lives.

Worried that the results of her statistics would be ignored, she developed a colorful graphic format to make the data more impressive.

The figure is the chart Nightingale used when reporting on this matter to show seasonal mortality in the military hospitals. It compares the number of deaths in the field hospitals by cause: each wedge represents one month, and the larger the area, the more deaths.

(The 1850s correspond to the Xianfeng era of the Qing Dynasty. China had been reduced to a semi-colonial, semi-feudal society after the First Opium War, and the Second Opium War broke out in 1856.)

The figure consists of a large rose and a small rose. The larger rose on the right covers April 1854 to March 1855; the smaller rose on the left covers April 1855 to March 1856. The 24 months are split into two plots at April 1855 and joined by a black line, because that is roughly when the Sanitary Commission began improving conditions. This lets us compare the number of deaths in the two years, along with the rough ratio of causes.

  • The gray areas are significantly larger than the other colors. This means that most deaths came not directly from combat but from infections under poor medical conditions.
  • After the sanitary commissioners arrived (March 1855), the number of deaths dropped significantly.

Her approach impressed the highest officials of the day, including military leaders and Queen Victoria herself, and her proposal for medical reform won support. Because the chart resembles a blooming rose, it became known as the Nightingale rose.

I give detailed instructions on how to make a Nightingale rose chart in a separate article.

Note when using:

  • A pie chart uses the size of the angle to reflect a value or proportion.
  • A Nightingale rose chart shows the size of each value through the radius of the wedge, while the angles of the wedges all stay the same.
  • In effect, the Nightingale rose chart is a column chart drawn in polar coordinates. It exaggerates the visual differences between values, so it suits data whose values are close to one another.
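
The exaggeration comes from simple geometry: a wedge with a fixed angle has an area proportional to the square of its radius. The sketch below compares two possible encodings, radius proportional to the value (as in the notes above) and area proportional to the value (as in the earlier description of Nightingale’s figure). The helper names and the two sample values are assumptions made up for this example, not historical data.

```python
import math


def wedge_area(radius, n_wedges):
    """Area of one wedge when a full circle is split into n equal angles."""
    angle = 2 * math.pi / n_wedges
    return 0.5 * angle * radius ** 2


def radius_for_area(value, n_wedges):
    """Radius that makes the wedge area equal to the given value."""
    angle = 2 * math.pi / n_wedges
    return math.sqrt(2 * value / angle)


values = [100, 200]  # hypothetical monthly counts
n = 12               # one wedge per month

# Encoding 1: radius proportional to the value.
areas = [wedge_area(v, n) for v in values]
print(areas[1] / areas[0])  # area ratio is the square of the value ratio

# Encoding 2: area proportional to the value.
radii = [radius_for_area(v, n) for v in values]
print(radii[1] / radii[0])  # radius ratio is the square root of the value ratio
```

With the first encoding, a value that is twice as large covers about four times the area, which is where the visual exaggeration comes from. With the second, the areas stay proportional to the values.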

6. Snow’s cholera map

Cholera is an acute diarrheal infection caused by ingesting food or water contaminated with the bacterium Vibrio cholerae. It can cause severe dehydration and kill within hours.

When cholera broke out in London’s Soho district in 1854, people did not know why it had started or what to do about it. Faced with an infectious disease that spread quickly and killed many of those it infected, people at the time were helpless.

John Snow, a British anesthesiologist and epidemiologist, visited the affected areas, mapped the correlation between cholera cases and the surrounding water pumps, and used statistics to show the correlation between water quality and cholera, eventually pinpointing a public well.

(When the Second Opium War broke out in 1856, the Qing government was forced to sign a series of unequal treaties.)

John Snow did not discover the pathogen that causes cholera, but he creatively used spatial statistics to find the source of infection, an approach of lasting value to later generations.
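
The idea of tying cases to water pumps can be sketched as a nearest-pump count. This is a deliberately simplified illustration, not a reconstruction of Snow’s actual analysis: the coordinates, the pump names, and the straight-line distance rule are all assumptions invented for the example.

```python
import math
from collections import Counter


def nearest_pump(case, pumps):
    """Return the name of the pump closest (straight-line) to a case location."""
    return min(pumps, key=lambda name: math.dist(case, pumps[name]))


# Toy coordinates invented for illustration; NOT Snow's actual data.
pumps = {"pump_A": (0.0, 0.0), "pump_B": (10.0, 0.0), "pump_C": (5.0, 8.0)}
cases = [(1, 1), (0.5, 2), (2, 0), (1.5, 1.5), (9, 1), (5, 7), (1, 0.5)]

# Count how many cases fall closest to each pump.
cases_per_pump = Counter(nearest_pump(c, pumps) for c in cases)
for name, count in cases_per_pump.most_common():
    print(name, count)
```

A pump that collects far more nearby cases than its neighbors stands out immediately, which is the kind of spatial pattern a map makes visible.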

As public health systems have matured and antibiotics have been used to treat cholera, it has become less frightening.

7. Sankey diagram

In 1812, Napoleon declared war on Russia, set out from France, and marched on Moscow, but the campaign failed.

Charles Joseph Minard, a French civil engineer, published a statistical chart on November 20, 1869, which combined the Sankey diagram with cartography and a temperature line chart to give a vivid picture of the course of the war. It shows how an army of 422,000 was worn down by battle, geography, and freezing cold until only about 10,000 remained. The chart is known as the map of Napoleon’s Russian campaign and is regarded as the original Sankey diagram.

The chart shows the size of Napoleon’s army, the distance traveled, the temperature, the latitude and longitude, the direction of travel, and the location of the army on particular dates or at particular events.

(In 1869, during the reign of the Tongzhi Emperor of the Qing Dynasty, Empress Dowager Cixi held power.)

A Sankey diagram represents the flow of values from one set of values to another. The width of each branch corresponds to the size of the flow. The figure below uses user conversion as an example, showing how many users add items to the shopping cart and how many go on to pay.

Note when using:

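  • Avoid colors so loud that the chart becomes hard to read
  • Sankey diagrams are characterized by conservation (originally of energy), so each edge should keep a constant width along its length, and the total width flowing into a node should equal the total width flowing out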
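
Conservation is easy to check before drawing anything. The sketch below is a minimal illustration using the shopping-cart funnel mentioned above; the node names, the numbers, and the function unbalanced_nodes are all hypothetical and do not come from any Sankey library.

```python
from collections import defaultdict


def unbalanced_nodes(flows, sources, sinks):
    """Return nodes whose incoming total differs from their outgoing total.

    flows is a list of (from_node, to_node, value) tuples. Pure sources and
    sinks are skipped, since they only emit or only receive flow.
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for src, dst, value in flows:
        outflow[src] += value
        inflow[dst] += value
    nodes = set(inflow) | set(outflow)
    return {
        n: (inflow[n], outflow[n])
        for n in nodes
        if n not in sources and n not in sinks and inflow[n] != outflow[n]
    }


# Hypothetical funnel numbers, for illustration only.
flows = [
    ("visit", "cart", 400),
    ("visit", "left_site", 600),
    ("cart", "paid", 150),
    ("cart", "abandoned", 250),
]
sinks = {"left_site", "paid", "abandoned"}
print(unbalanced_nodes(flows, sources={"visit"}, sinks=sinks))  # prints {}
```

An empty result means every intermediate node passes on exactly what it receives. If the 250 abandoned carts were left out, the "cart" node would be reported with 400 flowing in and only 150 flowing out.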
