The half-year mark is coming up, the boss said, so we should make a good plan: going forward, our direction is not machine learning but artificial intelligence. I don’t remember the exact words, but I remember that they grated on my ears, so I asked right there and then: what is machine learning? What is artificial intelligence? A girl from BD said eagerly, “I know, I know,” and drew three squares on the whiteboard. Something like this:
This is not the original picture either, just something like it, but it shows a very simple relationship. (Memory is such an unreliable thing!) I didn’t agree. The first picture that came to my mind was the title picture, and I roughly sketched it on the whiteboard. Everyone chimed in with their own opinions, and there was a lot of laughter. That meeting is over, but for me the matter is not “over”, because I am a person who gets hung up on concepts.
Many of the people arguing over these concepts are themselves scholars and researchers in this field. That is to say, even the professionals hold different opinions and do not convince each other. It is so funny that there is even an article, Battle of the Data Science Venn Diagrams [1], which collects a lot of Venn diagrams. I do not intend to reprint them one by one; you can look at them yourself.
Here are a few pictures that interest me personally.
This diagram, which is also the title picture, is said to come from SAS’s data mining foundation course at KDD 1998 [2]. It has been analyzed and quoted in many slides and by many technical people in these fields. SAS is one of the world’s largest software companies and a leader in business intelligence and data analysis software, so their view cannot be dismissed as lacking authority.
In this picture, Data Mining is the core. Of course, since this is a lecture note from the KDD conference, there is also a KDD circle in the picture, ha. Pattern Recognition and Neurocomputing are two things I know absolutely nothing about, so I won’t discuss them. What is interesting in this diagram is the relationship among Statistics, Machine Learning and Artificial Intelligence. In SAS’s view, artificial intelligence completely contains machine learning, which is basically a sub-discipline of artificial intelligence. Statistics has almost no intersection with these two, but if you look closely it is not completely unrelated. “Completely unrelated” looks like Databases, which sits far away. In other words, machine learning and artificial intelligence do have something to do with statistics, but only a little bit. Ha!
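The relationships read off the diagram above can be sketched with plain Python sets. This is only an illustration of my reading of the picture: the topic labels inside each set are hypothetical placeholders, not taken from the SAS slides.

```python
# Hypothetical topic labels, chosen only to illustrate the set relationships.
artificial_intelligence = {
    "search", "planning", "knowledge representation",
    "decision trees", "neural networks", "bayesian inference",
}
machine_learning = {"decision trees", "neural networks", "bayesian inference"}
statistics = {"hypothesis testing", "survey sampling", "bayesian inference"}
databases = {"indexing", "query optimization", "transactions"}

# Machine learning sits entirely inside artificial intelligence.
ml_inside_ai = machine_learning <= artificial_intelligence

# Statistics touches machine learning, but only a little bit.
stats_ml_overlap = statistics & machine_learning

# Databases share nothing with artificial intelligence or machine learning.
db_disjoint = databases.isdisjoint(artificial_intelligence | machine_learning)

print(ml_inside_ai, sorted(stats_ml_overlap), db_disjoint)
```

The subset test (`<=`) expresses "completely contains", the intersection (`&`) expresses "a little bit", and `isdisjoint` expresses "far away".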
There’s another diagram, derived from this diagram.
I looked up the author of this chart, Brendan Tierney: he is an industry veteran who used to work at Oracle and is now a consultant. Many Quora answers and industry articles reference this chart. As you can see from its title, it is not really about boundaries, but it has still caused a lot of discussion (hehe). Brendan blogged about it in 2012 [3] and came back in 2016 to say that the Venn diagram in the middle is a very common figure in data mining that he had quoted casually (I can’t find the original one), and that he thought it was derived from the KDD 1998 diagram above (I don’t think it is). The outer part is what he added, and that was what he was actually trying to say.
In this image, the intersection of data mining and pattern recognition has been removed, and data has been placed in the middle. Unfortunately, visualization has been added, which is really… A blog post from 2010 [4] uses this diagram to distinguish data science, data mining, and machine learning, and is also worth a look.
The Venn diagram below defines what data science is.
This one was drawn by a fellow named Drew Conway, something of an opinion leader in the New York tech community. According to his Wikipedia profile [5], he actually became famous for drawing the diagram above, and that is what got him into Wikipedia. Another factor is his application of big data to terrorism research, though, of course, the diagram comes ahead of the terrorism research, you see?
According to Battle, this chart was created in 2010 but only published on the blog in 2013 [6]. The blog’s archive date and its stated publication date do bear out both points. It is said to be the image that inspired Battle, and it is the first image in the Battle article.
Two things stand out about this chart. The first is “Substantive Expertise”. The second is “Danger Zone!”, ha ha. My reaction to the former was “damn, what on earth is this, I don’t know this English”, and to the latter, “how mysterious!”. Don’t be embarrassed if you don’t understand the English: the Battle writer also feels that these two labels are not well chosen, which is pretty messed up. In his words: “all I can say, is if Conway meant something other than what I would call domain knowledge (e.g. physics),” and so on. Of course, the more likely explanation is that none of us is as educated as Brother Conway. He has a PhD in Political Science from New York University.
Less educated skilled workers then modified Conway’s picture to make it a bit cuter [7].
This one looks a little nicer to me. To its credit, the three circles are easier to understand with my CET-4 level English. But why on earth has data science been removed from the center? And what on earth is that in the middle? All right. “Math and statistics knowledge” replaced with “quantitative methods”? “Hacking skills” replaced with “computer science”? You can judge the quality of those two substitutions for yourself, and the third one is no better, so this guy did not get into Wikipedia on the strength of this picture. As for the question mark, it is said that the author could not accept “Danger Zone!”, so it was replaced with a question mark.
Here is a figure that sums things up [8].
They just added a circle called “Evil.” Of course, it is full of baffling English that I mostly can’t understand. Is that Bond making a cameo too?
I am still wallowing in the Battle of data science, helplessly. This picture is also from that article.
Big data, data mining, machine learning and artificial intelligence all appear in this picture, which makes it the most relevant one here; I had no other choice [9]. I feel this image offers a whole new way of looking at things: it cuts away the more subjective, fuzzy material that Conway is famous for and keeps the objective, data-related and technology-related terms, which really are comparable on the same level. The article claims to have solved one piece of the puzzle.
The article distinguishes the meanings of several terms and gives their brief history, and it also provides another picture that I like very much.
In fact, this is my favorite picture so far. It gives the author’s explanation of the meaning and scope of each concept, and it also explains the relationships between the concepts. That’s great!
Another article I like is this one [10], which examines the similarities and differences between machine learning and statistics. It argues that statistics and machine learning share the same goal, “what can we learn from data”, but differ in their methods. The picture is also quoted in this article. One view cited in the article is that machine learning methods require no prior assumptions and don’t care about the internal relationships among variables: you just throw the data into an algorithm, which works more like a black box. The more data you have, the better your prediction. Machine learning methods are usually applied to high-dimensional data sets.
Statistics, by contrast, focuses on how the data is collected and how the various attributes of the sample are distributed; you have to know exactly what you’re doing and exactly which variables provide predictive power. Statistical methods are usually applied to low-dimensional data sets.
Of course, in our current work it’s easy to see that even though we use machine learning methods, we care a great deal about the things statistics demands: how the data is obtained, how the attributes are distributed, and so on. So, as the article concludes, the difference between the two is shrinking and they may become harder to tell apart in the future. (The last part is my own addition; I think that is how it will go in industry.)
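To make the contrast concrete, here is a minimal sketch in plain Python. It is my own illustration, not something from the cited article: the toy data (y = 2x + 1 plus noise), the helper names `ols_fit` and `knn_predict`, and the choice of k-nearest neighbours as the "black box" are all assumptions made for the example.

```python
import math
import random

random.seed(0)
# Toy data, assumed for this sketch only: y = 2x + 1 plus Gaussian noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]


def ols_fit(xs, ys):
    """Statistics style: assume y = a + b*x and estimate b with a standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The residual sum of squares gives the noise estimate behind the standard error.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope_se


def knn_predict(xs, ys, x_new, k=5):
    """Black-box style: no assumed form, just average the k nearest observed ys."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


intercept, slope, slope_se = ols_fit(xs, ys)
knn_at_5 = knn_predict(xs, ys, 5.0)
print(f"OLS: y = {intercept:.2f} + {slope:.2f} x (slope s.e. {slope_se:.3f})")
print(f"k-NN prediction at x = 5: {knn_at_5:.2f}")
```

The least-squares fit commits to a model form and reports how uncertain the slope is, which is the "know exactly what you’re doing" side. The nearest-neighbour predictor only returns a number, and it tends to improve simply because more data means closer neighbours.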
There’s another one I like, too.
This picture introduces the whole process of machine learning [11]. I think it’s also very important.
Conclusion
Basically, the boundaries of what data science covers are blurred. For an emerging discipline, this is no surprise. One thing we do know is that it is a highly interdisciplinary subject. Big data is also a broad concept. Machine learning and deep learning have relatively clear meaning and scope, and are generally agreed to stand in an inclusion relationship. AI is a much larger category, but AI is not the same thing as big data or data science.
Giiso Information Technology All Rights Reserved www.giiso.com