The difference between data science, machine learning and AI

Most of the time, I can't tell these three fields apart. When I don't know the difference, I tend to work in the wrong area, build an application that misses the point, and struggle to explain to customers, friends and colleagues what problem the application solves. Here is how the three differ:

Suppose we are building a self-driving car and working out how it should stop at a stop sign. This task needs skills from all three areas.

Machine learning: The car must recognize stop signs through its cameras. We build a data set containing photos of millions of street-side objects and train an algorithm to predict which photos contain stop signs.

Artificial intelligence: Once the car can recognize stop signs, it needs to decide when to brake. Braking too early or too late is dangerous, and the car must cope with different road conditions (for example, it must know that on a slippery road it will not slow down as quickly). This is a problem of control theory.

Data science: In street tests, we find that the car performs worse than it should because it misses some stop signs. Analyzing the street-test data, we conclude that stop signs are missed more often before sunrise and after sunset. This reveals that most of our training data consisted of daytime images, so we build a better data set that includes nighttime images and then return to the machine learning step for further testing.

What is data visualization

Data visualization is an important part of any data science or machine learning project. Its main aim is to communicate information clearly and effectively through graphics. In other words, visualization exists to help us convey information and discover patterns in data.

Projects often begin with exploratory data analysis to gain insight into the data, and visualizations make problems clearer and easier to understand, especially with large, high-dimensional data sets. At the end of a project, it is important to present the final results in a clear, concise and compelling way that your users can understand.

Data visualization is the presentation of data in graphs or tables, condensing a lot of information into a coherent, short and focused report. Although data visualization can include written information, the emphasis is usually on pictures and graphics that communicate with the viewer. Moreover, data visualization is not limited to numerical data: it can depict all kinds of information, so you can also use it to communicate your ideas and assumptions to others. Today you can apply a variety of techniques to data visualization, including interactive methods. Visual representation of information is also an ancient way of sharing ideas and experiences; charts and maps are important examples of early data visualization techniques.

The importance of data visualization

Humans have used data visualization for a long time, and images and charts have proved to be an effective way to convey and teach new information. Studies are often cited as showing that people remember 80% of what they see but only 20% of what they read. Images can even pass ideas and events on to future generations, and advances in technology have further expanded what data visualization can do.

Perhaps the most important benefit of data visualization is that it helps people understand data faster. A chart can summarize a large amount of data so that people quickly spot the key points, whereas working through all the data and its connections in writing can take hours.

Another advantage is the ability to display huge amounts of data at once. A single chart may highlight several different things, and people can form different opinions about the data. This can open up new business opportunities, since someone might find something unexpected in the data.

Data visualization also improves the ability to interpret information. Making connections across mountains of data is not easy, but graphs and charts can convey the needed information at a glance, in seconds.

Data visualization is generally regarded as a simple and effective way to summarize data, so it helps people share information and learn.

What problems data visualization solves

Data visualization is ultimately about enabling action, so make sure decision makers can understand it! Charts are more expressive than data tables. Each type of chart was born out of a clear and pressing need, so when choosing among the chart types you know, think about the problem you are trying to solve.

Traditional visualization can be roughly divided into exploratory visualization and explanatory visualization. Depending on the application, visualization has several goals:

  • Effectively present important features
  • Reveal objective patterns and regularities
  • Assist in understanding concepts and processes
  • Support quality control of simulations and measurements
  • Improve the efficiency of scientific research and development
  • Promote communication and collaboration

From a macro perspective, visualization has three functions:

  • Recording information
  • Reasoning about and analyzing information
  • Disseminating information and supporting collaboration

Branches of data visualization

Data visualization consists of three branches: scientific visualization (SciVis) and information visualization (InfoVis), plus a third branch that evolved later, visual analytics. This division is reflected in how the IEEE VIS conference has organized its tracks.

Scientific visualization deals with scientific and engineering data, such as three-dimensional spatial measurements with coordinates and geometric information, computer simulation data and medical imaging data. It focuses on presenting the patterns in the data through geometric, topological and shape features.

Information visualization deals with unstructured, non-geometric abstract data, such as financial transactions, social networks and text. Its core challenge is reducing visual clutter when displaying large-scale, high-dimensional, complex data.

In recent years, with the rise of artificial intelligence, people have gradually found that machines do some things better than humans, while other things still call for human perception, the product of some 300 million years of evolution. Combining visualization with analysis has therefore given rise to a new discipline: visual analytics. Visual analytics is defined as the science of analytical reasoning facilitated by interactive visual interfaces. It integrates graphics, data mining, human-computer interaction and other techniques so that human intelligence and machine intelligence complement and strengthen each other.

Types of data visualization

In general, most data visualizations fall into two types: exploratory and explanatory. Exploratory visualizations help people discover the story behind the data, while explanatory visualizations make that story easy for others to see.

There are also different methods for creating these two types. The most common data visualization methods include:

  • 2D regions – This method uses geospatial visualization techniques, which typically relate data to locations on a surface. An example is a dot map showing crime in a given area.

  • Temporal – Temporal visualizations present data linearly, and each has a clear beginning and end. An example is a connected scatter plot showing temperature readings for certain areas over time.

  • Multidimensional – Presents data with two or more dimensions using common multidimensional methods. An example is a pie chart showing, say, government spending.

  • Hierarchical – Hierarchical methods render multiple groups of data, often showing smaller groups nested within larger ones. An example is a tree diagram showing language families.

  • Network – Network visualizations show the relationships between data points and are a common way to display large amounts of connected data.

Principles for good data visualization

Anyone who has seen many data visualizations knows that some designs are good and some are bad. If the information is not presented in an appropriate way, the benefits of data visualization are easily lost, and each project calls for its own approach.

Whatever your message, keep the following ideas in mind when visualizing data.

Know your audience

The first thing to do before presenting data is to think about who is going to view it. In order to find the right data visualization method, it is critical to know your audience.

Although data visualization usually simplifies data, audiences still differ in background knowledge, and you need to prepare for that. For a professional audience, you can use more specialized methods and terminology. A general audience, on the other hand, may need a clearer explanation of the same data.

It is also important to know what your audience expects from the data. What key points are they looking for? Those points should come through clearly in your visualization. You also need to be clear about the purpose of your data.

Know the data well enough

In addition to knowing your target audience, you also need to understand the implications of the data. If you don’t fully understand your data, you won’t be able to communicate it effectively to your audience.

You cannot show everything in the data either, so identify the key information and present it consistently. Also make sure the data is correct and not fabricated: don't visualize false data!

If you understand it correctly, you can also get unique and interesting information from the data.

Tell a story

Your data visualization should also strive to tell a story. Rather than presenting the data as a set of facts that speak for themselves, convey the information behind the data. This may mean introducing a particular narrative and painting a specific picture for the viewer.

Using a story often means that the audience gets more insight from the data. It helps the audience understand and dig into new information.

In fact, data visualization is a great storytelling tool. The old saying “a picture is worth a thousand words” holds true, and you should use it to your advantage. Telling a story with a data set is not difficult, because colors, fonts and text annotations can all be part of your storytelling. To combine visualization and storytelling well, understanding the data is crucial.

Keep it simple

Data visualization has evolved rapidly in recent years, and as mentioned above, many tools and systems are at your disposal. But having access to many unusual methods doesn't mean you need to use them. Likewise, a large amount of data does not mean that all of it is essential.

In short, keep your data visualization simple. Don't include too much data or use too many different techniques.

If you are telling a story through visuals, every element of the visualization should be an essential part of the story. If a data point or element, such as a picture, adds nothing significant to the story, leave it out of your report.

Visualizations with too many elements can actually damage the finished product and distort the data. Remember that the benefit of data visualization is presenting large amounts of data visually. If your visualization is hard to read, go back and check whether you have chosen the wrong presentation or included too much information.

Avoid serious errors in visualizing data

While the principles above can help you develop a data visualization strategy, you also need to be aware of some common mistakes.

Incorrect information

As mentioned above, errors in the data can mislead the audience. Make sure that the people viewing your visualization see the right information. It's your job to ensure that people can rely on the data in your charts and images without having to double-check it.

Incomplete information

Besides making sure all the information is correct, you also need to provide complete data. Viewers should be able to find the relevant data within the overall picture; do not use data visualization to deceive or to present incomplete information.

Data visualization can and should tell a story, but the story needs to have complete and correct information, not numbers that look right in a report.

Oversimplified data

While you should present your data simply, that doesn't mean you should oversimplify it. First, remember your audience: don't use overly basic language when presenting to data professionals. On the other hand, if the audience knows little about the topic, don't fill the text with jargon.

In addition, you can't expect your audience to understand the connections between data points without clear descriptions in the visualization. Don't omit information because it seems obvious: remember, your audience sees only the data you present, not the full data set you worked with.

Inappropriate visualization

When presenting data, think about the context. Choices such as fonts, colors and images matter a great deal. For example, if you are presenting information about deaths from a specific disease, brightly colored, cheerful images would be inappropriate.

Inappropriate visualization also covers the techniques used, which can make the data hard to view and understand. For example, you could use bubbles to represent spending levels across departments, but if you don't account for how bubble size is perceived, the bubbles will be misleading. If the radius rather than the area is proportional to the value, a department that spends twice as much gets a bubble with four times the area.
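The bubble pitfall comes down to geometry: a circle's area grows with the square of its radius. A minimal sketch, assuming your plotting code takes a radius, is to scale the radius by the square root of the value so that area stays proportional. The function name `radius_for`, the 50-unit maximum radius and the spending figures are hypothetical.

```python
import math

def radius_for(value, max_value, max_radius=50.0):
    """Return a radius whose circle AREA is proportional to value.

    Area = pi * r^2, so r must scale with sqrt(value / max_value).
    """
    return max_radius * math.sqrt(value / max_value)

# Hypothetical departmental spending
spending = {"Sales": 400.0, "R&D": 100.0}
largest = max(spending.values())

for dept, amount in spending.items():
    naive_r = 50.0 * amount / largest        # radius proportional to value (misleading)
    correct_r = radius_for(amount, largest)  # area proportional to value
    print(f"{dept}: naive r={naive_r:.1f}, area-correct r={correct_r:.1f}")
```

With the naive mapping, Sales (four times the spending of R&D) is drawn with 16 times the area of the R&D bubble; with the square-root mapping the area ratio is 4, matching the data.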

Missing annotations

Oversimplification can also lead to missing annotations. When presenting data, it's easy to assume the audience understands every part of the image. Simply adding annotations can improve the viewing experience and ensure the audience notices all the key data points.

As an example, you might have a chart showing how many bikes your business has sold over the past decade. If there is a big drop or rise in the data, a note explaining the reason behind the sudden change will ensure that the viewer gets this extra information.

Data visualization development process

First, analyze the existing data, draw your own conclusions, and clarify the message and theme (that is, what you are trying to illustrate with the chart).

Then choose a chart that serves this purpose, either from an existing chart catalogue or from the chart types you already know. Finally, build the chart, then refine and check it until it is finished.

Common types of data

For visualization purposes, we divide data into four categories: time series data, categorical data, multivariate data and spatial data.

Time series data

Time series data, also known as temporal data, is a sequence of values of the same indicator recorded in chronological order. Examples include the number of new users per month or a company's annual sales over the past ten years.
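To turn raw event records into a time series such as “new users per month”, you count events per time period and keep the periods in order. A minimal Python sketch using only the standard library; the sign-up dates and the `monthly_counts` helper are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical sign-up dates, one entry per new user
signups = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 2, 3),
           date(2023, 3, 14), date(2023, 3, 15), date(2023, 3, 30)]

def monthly_counts(dates):
    """Count events per (year, month) and return them in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())

for (year, month), n in monthly_counts(signups):
    print(f"{year}-{month:02d}: {n}")
```

Months with no sign-ups are simply absent from the result; for a line chart you would usually fill them in with zero so that the time axis stays evenly spaced.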

Categorical data

Categorical data indicates which category something belongs to. For example, users can be divided by device into iPhone users and Android users, and payment methods can be divided into Alipay, WeChat Pay and cash. Data produced by such classifications is called categorical data.
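Categorical data is usually summarized by counting how often each category occurs, which gives the values for a bar or pie chart. A small sketch with `collections.Counter`; the payment records are made up for illustration.

```python
from collections import Counter

# Hypothetical payment records, one label per transaction
payments = ["Alipay", "WeChat Pay", "Cash", "Alipay", "Alipay", "WeChat Pay"]

category_counts = Counter(payments)

# most_common() orders categories by frequency, a typical order for a bar chart
for method, n in category_counts.most_common():
    print(method, n)
```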

Multivariate data

Multivariate data is usually presented as a table with multiple columns, each representing a variable. It is often used to study the correlation between variables, for example to find out which factors affect an indicator.
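One common way to study the relationship between two columns of multivariate data is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand so that it runs on older Python versions (Python 3.10+ also offers `statistics.correlation`); the table and its column names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical table: each key is a column (variable), each index is a row
table = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "visits":   [10.0, 20.0, 30.0, 40.0, 50.0],
    "returns":  [5.0, 4.0, 3.0, 2.0, 1.0],
}

target = "visits"
for column, values in table.items():
    if column != target:
        print(f"corr({column}, {target}) = {pearson(values, table[target]):.2f}")
```

A coefficient near 1 or -1 flags a variable worth a closer look; it does not by itself show that the variable causes changes in the indicator.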

Spatial data

Spatial data represents the location, shape, size and distribution of spatial entities. It describes objects in the real world and carries positional, attribute, temporal and spatial-relationship information.

Spatial data uses basic structures such as points, lines, areas and volumes to represent the natural world people live in.
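The basic structures above (points, lines and areas) can be sketched as simple data types. This version assumes planar, already-projected coordinates; with raw latitude and longitude, lengths and areas would need a map projection or geodesic formulas. The class names and sample shapes are hypothetical, and the area uses the shoelace formula.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

@dataclass
class Line:
    points: list  # ordered Point vertices of a polyline

    def length(self):
        # Sum of straight segments between consecutive vertices
        return sum(math.dist((a.x, a.y), (b.x, b.y))
                   for a, b in zip(self.points, self.points[1:]))

@dataclass
class Polygon:
    ring: list  # Point vertices in order; the ring is closed implicitly

    def area(self):
        # Shoelace formula on planar coordinates
        s = 0.0
        n = len(self.ring)
        for i in range(n):
            a, b = self.ring[i], self.ring[(i + 1) % n]
            s += a.x * b.y - b.x * a.y
        return abs(s) / 2.0

# Hypothetical features: a road (line) and a park (area)
road = Line([Point(0, 0), Point(3, 4), Point(3, 10)])
park = Polygon([Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 3)])
print(road.length(), park.area())
```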

Chart types classified by use case

Comparison class

Comparison charts show differences and similarities between values. The length, width, position, area, angle and color of graphical elements encode the size of the values. They are typically used to compare values across categories or at different points in time.

Chart list: bar chart, bubble chart, bidirectional bar chart, bullet chart, color block chart, funnel chart, histogram, candlestick (K-line) chart, mosaic chart, grouped bar chart, radar chart, radial bar chart (“Jade Jue” chart), Nightingale rose chart, spiral chart, stacked area chart, stacked bar chart, treemap, word cloud.
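Most comparison charts encode magnitude as length. A minimal text-only sketch of a horizontal bar chart, assuming non-negative values, makes the encoding explicit: each bar's length is proportional to its value, measured from zero. The `text_bar_chart` helper and the sales figures are hypothetical.

```python
def text_bar_chart(data, width=40, mark="#"):
    """Render values as horizontal bars whose length is proportional to the value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = mark * round(width * value / largest)  # bar length starts from zero
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical quarterly sales by region
sales = {"North": 120, "South": 60, "East": 90}
print(text_bar_chart(sales, width=20))
```

Starting every bar at zero matters: if the axis is truncated, bar lengths stop being proportional to the values and differences look larger than they are.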

Distribution class

Distribution charts show how often values occur, how data is spread over an interval, or how it is grouped. Position, size and color gradients represent the distribution. They are typically used to show how the values of continuous data are distributed.

Chart list: box plot, bubble chart, color block chart, contour plot, distribution curve, dot map, heat map, histogram, scatter plot, stem-and-leaf plot.
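A histogram, one of the distribution charts listed above, groups continuous values into equal-width bins and counts each bin. A standard-library sketch, assuming a non-empty list of numbers; the delivery times and the `histogram` helper are hypothetical.

```python
def histogram(values, bin_width, start=0.0):
    """Count values falling into consecutive bins [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    lo_bin, hi_bin = min(counts), max(counts)
    # Include empty bins so the x-axis stays continuous
    return [(start + k * bin_width, counts.get(k, 0)) for k in range(lo_bin, hi_bin + 1)]

# Hypothetical delivery times in minutes
BIN = 10
times = [12, 14, 18, 21, 22, 25, 27, 41]
for left_edge, n in histogram(times, bin_width=BIN):
    print(f"[{left_edge:g}, {left_edge + BIN:g}): {'*' * n}")
```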

Process class

Process charts show how a process flows from step to step. A process usually has several stages with flow relationships between them, and these charts represent those relationships well.

Chart list: funnel diagram, Sankey diagram.
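A funnel chart is typically read through conversion rates between consecutive stages. A minimal sketch, assuming the stages are listed in order with positive counts; the stage names, numbers and the `funnel_conversion` helper are hypothetical.

```python
def funnel_conversion(stages):
    """For each stage, return (name, count, rate vs. previous stage, rate vs. first stage)."""
    first = stages[0][1]
    rows = []
    prev = first
    for name, count in stages:
        rows.append((name, count, count / prev, count / first))
        prev = count
    return rows

# Hypothetical e-commerce funnel
funnel = [("Visited", 1000), ("Added to cart", 300), ("Paid", 120)]
for name, count, step_rate, overall in funnel_conversion(funnel):
    print(f"{name:<14}{count:>6}  step {step_rate:.0%}  overall {overall:.0%}")
```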

Proportion class

Proportion charts show the share of each part within the same dimension.

Chart list: donut chart, mosaic chart, pie chart, stacked area chart, stacked column chart, treemap.
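In a pie or donut chart, each part's share of the total becomes an angle. This sketch converts raw amounts into shares and start/end angles in degrees, assuming all amounts are non-negative and the total is positive; the budget figures and the `pie_slices` helper are hypothetical.

```python
def pie_slices(parts):
    """Convert raw amounts into (label, share, start_angle, end_angle) in degrees."""
    total = sum(parts.values())
    slices = []
    angle = 0.0
    for label, amount in parts.items():
        share = amount / total
        sweep = 360.0 * share  # each slice's angle is proportional to its share
        slices.append((label, share, angle, angle + sweep))
        angle += sweep
    return slices

# Hypothetical budget split
budget = {"Health": 50, "Education": 30, "Defense": 20}
for label, share, start, end in pie_slices(budget):
    print(f"{label}: {share:.0%} ({start:.0f} deg to {end:.0f} deg)")
```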

Interval class

Interval charts show the gap between the upper and lower limits of values in the same dimension. The size and position of a shape indicate these limits, usually the maximum and minimum of the data for a given category (or point in time).

Chart list: gauge (dashboard) chart, stacked area chart.
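Interval charts need the lower and upper limit for each category. A small sketch that computes them from raw (category, value) records; the cities, temperatures and the `value_ranges` helper are hypothetical.

```python
def value_ranges(records):
    """Group (category, value) pairs and return {category: (min, max)}."""
    ranges = {}
    for category, value in records:
        lo, hi = ranges.get(category, (value, value))
        ranges[category] = (min(lo, value), max(hi, value))
    return ranges

# Hypothetical daily temperature readings per city
readings = [("Beijing", -3), ("Beijing", 8), ("Shanghai", 4),
            ("Shanghai", 12), ("Beijing", 2)]
for city, (lo, hi) in value_ranges(readings).items():
    print(f"{city}: {lo} to {hi} (span {hi - lo})")
```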

Association class

Association charts show relationships between data. The nesting and placement of shapes represent these relationships, often sequential, parent-child or correlation relationships.

Chart list: arc diagram, chord diagram, Sankey diagram, treemap, Venn diagram.
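The regions of a Venn diagram correspond directly to set operations. This sketch, with hypothetical user-ID sets for three products, computes the members of each of the seven regions of a three-set Venn diagram.

```python
# Hypothetical user IDs for three products
web = {1, 2, 3, 4, 5}
app = {4, 5, 6, 7}
mini = {5, 7, 8}

regions = {
    "web only": web - app - mini,
    "app only": app - web - mini,
    "mini only": mini - web - app,
    "web & app only": (web & app) - mini,
    "web & mini only": (web & mini) - app,
    "app & mini only": (app & mini) - web,
    "all three": web & app & mini,
}
for name, members in regions.items():
    print(f"{name}: {len(members)}")
```

Because the seven regions do not overlap, their sizes add up to the size of the union of the three sets.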

Trend class

Trend charts analyze how data changes. The position of shapes shows the distribution of data over a continuous range, usually to reveal how values change across that range.

Chart list: area chart, candlestick (K-line) chart, Kagi chart, line chart, regression curve, stacked area chart.
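A regression curve summarizes a trend with a fitted function. The simplest case is an ordinary least-squares straight line; the sketch below fits one with the standard formulas. The yearly sales figures and the `linear_fit` helper are hypothetical, and a straight line is only an assumption about the trend's shape.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly sales (year index, units sold)
years = [0, 1, 2, 3, 4]
units = [100, 112, 119, 133, 141]
slope, intercept = linear_fit(years, units)
print(f"trend: {slope:.1f} units/year, fitted start {intercept:.1f}")

# Points on the regression line, ready to plot alongside the raw line chart
trend_line = [slope * x + intercept for x in years]
```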

Time class

Time charts display data along the time dimension. The position of shapes shows the distribution of data over time, usually to show trends and changes.

Chart list: area chart, candlestick (K-line) chart, Kagi chart, line chart, spiral chart, stacked area chart.

Map class

Map charts display data over geographic areas. With a map as the background, the position of shapes represents the geographic location of the data, usually to show how data is distributed across regions.

Chart list: bubble map, choropleth map (graded statistical map), dot density map.

Chart components

Data-based charts are made up of visual cues, a coordinate system, scales, background information, and combinations of these four.

Visual cues

Visual cues are the graphical properties that let readers intuitively grasp what a chart represents just by looking at it. Commonly used visual cues include position (height), length, angle, direction (up or down), shape (different shapes represent different categories), area (size), volume (size), saturation (color intensity, i.e. how deep the color is) and hue (different colors).
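Saturation and color depth are often used as a visual cue through a sequential color scale, where larger values get darker colors. A minimal sketch that blends linearly between two RGB colors; the endpoint colors and the `value_to_hex` helper are arbitrary, hypothetical choices. Simple RGB blending is only an approximation, since perceptually uniform color spaces give more even-looking steps.

```python
def value_to_hex(value, vmin, vmax, light=(239, 243, 255), dark=(8, 81, 156)):
    """Map a value to a color between `light` and `dark` (a sequential color scale)."""
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    rgb = [round(a + (b - a) * t) for a, b in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

for v in (0, 50, 100):
    print(v, value_to_hex(v, 0, 100))
```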

Coordinate system

The coordinate system here is the same as the one we know from mathematics, although the axes may carry slightly different meanings. Common types are the Cartesian (rectangular) coordinate system, the polar coordinate system and the geographic coordinate system.
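Moving between coordinate systems is a small calculation. This sketch converts polar coordinates to Cartesian ones and places longitude/latitude on a plain equirectangular map, one of the simplest map projections (real maps often use others, such as Web Mercator). The function names and the 800 by 400 map size are hypothetical.

```python
import math

def polar_to_cartesian(r, theta_deg):
    """Polar (radius, angle in degrees from the positive x-axis) to Cartesian (x, y)."""
    theta = math.radians(theta_deg)
    return r * math.cos(theta), r * math.sin(theta)

def equirectangular(lon_deg, lat_deg, width, height):
    """Map longitude/latitude to pixel (x, y) on a simple equirectangular map.

    Longitude -180..180 maps to 0..width; latitude 90..-90 maps to 0..height
    (y grows downward, as in most screen coordinate systems).
    """
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y

print(polar_to_cartesian(2.0, 90.0))        # approximately (0.0, 2.0)
print(equirectangular(0.0, 0.0, 800, 400))  # center of the map: (400.0, 200.0)
```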

Scale

The three coordinate systems above only define the dimensions and directions in which data is displayed; a scale measures magnitude along each direction and dimension, much like the rulers we are familiar with.
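A scale is essentially a function from data values (the domain) to positions on screen (the range); this vocabulary follows the convention of libraries such as D3. A minimal sketch of a linear scale and a base-10 logarithmic scale; the helper names and example numbers are hypothetical.

```python
import math

def linear_scale(domain, range_):
    """Return a function mapping data values in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_
    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)
    return scale

def log_scale(domain, range_):
    """Like linear_scale, but equal ratios in the data get equal distances.

    The domain must be strictly positive, since log10 is undefined at or below zero.
    """
    d0, d1 = domain
    lin = linear_scale((math.log10(d0), math.log10(d1)), range_)
    return lambda value: lin(math.log10(value))

# Map sales of 0..500 onto a 300-pixel-tall axis; screen y grows downward,
# so the range is reversed to put larger values higher up.
y = linear_scale((0, 500), (300, 0))
print(y(0), y(250), y(500))  # 300.0 150.0 0.0

logy = log_scale((1, 1000), (0, 300))
print(logy(10), logy(100))   # roughly 100 and 200
```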

Background information

Background here means context, just as in writing: it explains the relevant facts about the data (who, what, when, where and why) so that the data is clearer and easier for readers to understand.

Combination of components

Combining components means putting the four elements above together according to the intended use. The result is the final chart we render, and its form depends on your goal.