Figure Eight: The 2018 Data Scientist Report
Introduction
Figure Eight has been tracking developments in data science in recent years, and a lot has changed in the data science community since the last edition of the Data Science Report was published in 2015 (when we were still CrowdFlower). Machine learning is booming and requires more and more data to support it.
Today, the Internet churns out more than 100 terabytes of data a day for data science and machine learning analysis. As a result, data science and machine learning roles are among the fastest-growing jobs on LinkedIn.
Another trend that has emerged since 2015 is that the data science community is paying more attention to ethics than ever, with data privacy issues becoming more visible. As artificial intelligence is used to make decisions in areas such as medical diagnosis and legal sentencing, these ethical issues need to be examined more carefully.
It is important to know what practitioners in various fields think about cutting-edge technologies. We surveyed more than 500 ethics experts, including health care professionals, clergy and law enforcement officials.
The rest of this report will be devoted to contrasting the views of ethicists and data scientists.
Without further ado, here are the findings of this report.
Data scientists love their work
Data scientists who consider themselves happy or very happy
Many people have heard the saying that success means doing what you love and getting paid for it. Assuming that’s true, it’s hard to find a more successful career than data scientist.
We’ve been tracking this question for a few years now, and we’ve found that data scientists love this line of work, even though real data scientists might point out that a 1% increase isn’t statistically significant.
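To illustrate why a 1% change between two surveys is unlikely to be statistically significant, here is a minimal sketch of a two-proportion z-test. The happiness rates (89% and 90%) and the sample size of 240 per year are hypothetical inputs chosen for illustration, not figures from this report, and the function name is our own.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under the null hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal CDF, computed via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical inputs: 89% vs. 90% "happy", 240 respondents in each year.
z, p_value = two_proportion_z_test(0.89, 240, 0.90, 240)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With samples of this size, the p-value is far above the usual 0.05 threshold, so a one-point difference would be indistinguishable from sampling noise.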
Love data science? Don’t miss your chance
In recent years, data and data science have generated plenty of buzz. Google artificial intelligence expert Peter Norvig co-authored the famous essay “The Unreasonable Effectiveness of Data”, Harvard Business Review called data scientist “the sexiest job of the 21st century”, and The Economist even said that “data is the new oil”. Most of you remember how big data became a global phenomenon seemingly overnight.
Market demand for data scientists
How often you receive job offers
While data science is hot today, it’s worth remembering that it wasn’t always. After all, just a little over a decade ago, most companies didn’t track and store user interaction data at all. Today, these same companies jealously guard such data as a core asset.
As servers become cheaper and it becomes possible to store large amounts of data and information at low cost, most companies are realizing that data can achieve many previously unimaginable goals.
With so much data to crunch and so much desire to create value for the company, it’s no surprise that data scientists are in high demand.
We asked data scientists how often they typically receive new job offers, and the data below speaks volumes. About 50 percent of data scientists receive job offers once a week, 30 percent receive offers several times a week, and 85 percent receive offers at least once a month.
In other words, elite data scientists are in high demand. So if you have a high-caliber data scientist at your company, make sure you keep them happy, because they have plenty of options.
What’s holding data scientists back: data, not science
Let me tell you a little secret about data scientists: they are insatiable. That’s not to say anything bad about them. In fact, many data scientists send us some pretty nice gifts during the holidays. But when it comes to data, no matter how much they already have, it’s never enough.
We’ve been working in the data science community for several years now, and getting enough high-quality data remains its biggest challenge. Last year about 50 percent of data scientists said it was one of the top three headaches in their day job, and this year that number has grown to 55 percent, keeping it among the top headaches.
Data experts are well aware that only with large quantities of high-quality data can they build accurate models and make smart decisions. The more high-quality data they have, the more confident they are in their models.
The most important thing a company can do for a data scientist is provide data. The quality of the data a machine learning team has can make a huge difference to its results, so data quality is paramount.
But keep in mind that data scientists want high-quality data, and years of research have shown that data scientists don’t really like cleaning data, either. They think it’s a waste of their lives.
Machine learning using data
Until now, we had never asked data scientists what they were actually doing with data. But as our platform has grown, we’ve been able to unlock some of the mysteries of machine learning, and more and more data is being passed directly from our platform to various AI and machine learning projects. So we thought: shouldn’t we ask these data scientists what percentage of their work is actually devoted to artificial intelligence?
About 10% of data scientists say their work is unrelated to artificial intelligence. Still, almost 40 percent said their work was related to artificial intelligence.
Given how much the investment community is spending on AI right now, we’re particularly excited to see what that number looks like next year. We expect it to keep rising.
It’s no wonder data scientists are happy: instead of cleaning logs, they get to work on the most cutting-edge technology solutions in the company.
How much R&D?
What tools do data scientists use?
In 2015, we focused on what tools data scientists use. At the time, Excel was still the dominant tool for processing data, but there were plenty of data tools and processing options available to data scientists. In fact, Partially Derivative mentioned the problem in an episode of its “Weird Data Science” podcast.
Their point is that data science is such a new field that no single language, tool, or framework can become mainstream, that even now it’s hard to say which tools are the best, and that data scientists must be extraordinarily creative in figuring out the best tools and strategies to tackle the data science projects at hand.
Machine learning today is much the same as data science was back then. There is no universally accepted strategy, but there are many options for solving previously intractable problems. Still, the majority of the data science community (about 61%) now chooses Python, although most of the common Python libraries listed below are not machine learning frameworks.
Open source software dominates these tools and frameworks. pandas and NumPy have been around for a long time, as have scikit-learn and Matplotlib, both of which are older Python libraries.
Although developed by Google, TensorFlow is also open source software. The numbers are not the whole story, but the sheer number of users of these tools speaks to the data science community’s enthusiasm for open source and community-driven software.
Because these frameworks have been around for so long, early adopters are already familiar with them, and replacing this established open source software would take a great deal of time, effort, and marketing.
What data did data scientists crunch in 2018?
This year, the media has focused on machine learning projects like self-driving cars or home assistants, but it’s important to realize that the vast majority of data scientists are not processing lidar and speech audio data.
We surveyed a number of data scientists and found that text and time series data dominate their daily work. Sensor, audio, and video data are rarely involved, with still images ranking fourth.
How much structured versus unstructured data is handled?
Ethical issues in data science
In recent years, the ethics of artificial intelligence applications have come under fire, with numerous cases of algorithmic discrimination in areas such as face recognition, hiring reviews and voice assistants. Last year, the Supreme Court had a chance to take up a case on algorithmic sentencing (see Loomis v. Wisconsin), but it declined to hear it. It is reasonable to assume that a case on machine learning could reach the court within a decade.
This report is not concerned with long-term, science-fiction questions about the boundaries of consciousness, such as future agents and general intelligence. It is concerned with practical issues in areas that are of real concern to the public today.
As mentioned earlier, the survey interviewed ethics experts from various professions, including health care professionals, clergy and law enforcement officials. In this section, we’ll contrast their views with those of data scientists.
Generally speaking, data scientists are bullish on the growth of artificial intelligence. Both groups agreed that AI would do more good than harm, and the biggest difference between them was that ethicists were understandably indifferent to the potential challenges AI might pose to society. After all, everyone knows that data scientists must know more about artificial intelligence than judges do.
Data scientists work in this field and have invested a lot of energy in the development of AI, so it is no surprise that they expect AI to revolutionize society.
Still not admitting algorithmic discrimination?
In the previous section, we mentioned some very well-known cases of algorithmic discrimination. In fact, MIT Technology Review recently noted that “biased algorithms are everywhere, and no one seems to care.”
But when we asked data scientists and ethics experts if they thought AI was more prone to discrimination than humans, we got this:
In fact, given what we know about human nature, the question of whether technology is more discriminatory than humans is a little funny. Ultimately, algorithmic discrimination comes down to human programmers, data, and a few odd causes.
What’s interesting is that many respondents said algorithms don’t discriminate that much, or even at all, yet we have plenty of examples of algorithmic discrimination in the real world.
The real question we have to answer is: why does this happen? Remember that in most cases, it’s not the model itself that’s the problem, but the data the model uses.
Algorithmic discrimination is latent and unintentional, but it is real. Solving it takes real effort and targeted remedies. First, data annotation should be responsible and impartial. Then, iterate on the model by continually updating the data, and put yourself in the position of the end user.
What can AI do in the real world
The vast majority of Internet users now use AI every day. Product and entertainment content recommendations, search engines, news recommendations, you name it: machine learning has spread to more and more areas.
In fact, most data scientists think it’s normal for AI to be involved in decision-making. But the more complicated the decision, the less comfortable data scientists become.
AI has been used successfully in some marginal situations. But when it comes to big, critical questions, the results of AI so far are not enough to give an affirmative answer. Suffice it to say that data scientists just don’t have the appetite to apply AI to every corner of society. If AI experts are pushing for more robust or sensible solutions, you’d better sit back and listen to what they have to say.
Ethics: Artificial intelligence decision making
In which of the following situations can an AI make its own decisions without human intervention?
Artificial intelligence or not, that’s the question
Voice interfaces are becoming more popular every day. Comscore predicts that 50% of all searches will be voice searches by 2020. Even now, there are almost a billion voice searches a month. But even the most advanced voice assistants still struggle with the voices they encounter every day. The problem is especially acute when speakers are not using their first language, or have an accent or dialect.
On this issue, we asked data scientists what should happen when a home voice assistant is ready to launch but cannot understand accents and dialects very well: should the product launch anyway, should it launch with a warning label telling people it may not work for them, or should rules restrict its sale in some areas?
Frankly, we expected the data science community to want to launch these products. After all, only by selling them can we collect more speech audio data, iterate on the product’s model, and improve its recognition so that it can understand more users’ speech. But the survey results were not what we expected.
While we were surprised by these results, they are consistent with previous research: the data science community is very cautious about using AI. They figure out what they like, and then they do it. If you think back to the data science community’s love of open source platforms and open source data, you can see why they made the choice they did.
For autonomous driving, the two sides differ greatly
We asked ethics experts and data scientists a very simple question. If statistics show that the latest AI is, on average, safer than human-driven cars, would you still want to drive yourself? Or would you rather ride in a self-driving car?
Throughout the rest of the report, the responses from both groups were largely similar, and overall, they agreed that AI would do more good than harm. Even for some sensitive AI products, they largely agreed on which uses are acceptable and which are not. For example, people generally accept AI-powered product recommendations and have reservations about AI-powered loan approvals or case decisions.
But when it comes to autonomous driving, the two groups were sharply polarized, which may simply reflect that data scientists know more about how driverless technology works than clergy do. We really didn’t expect the polarization to be so severe. It’s hard to explain why the two groups responded so differently, but if you’re in the self-driving car business, you should know by now who you’re marketing to.
Autopilot, or manual?
Report background
This year, we interviewed 240 data scientists via email and in person.
To obtain the 2015 edition of the Data Science Report, please download it from the resource center on our official website.
Company profile
Figure Eight is a human-in-the-loop AI platform for data science teams. We provide high-quality custom training data for our customers’ machine learning models, as well as AI models that are easy to deploy and use, and workflows that keep humans in the loop.
Our software platform supports many use cases, including self-driving vehicles, intelligent personal assistants, medical image classification, content classification, customer support ticket classification, social data analysis, CRM data enrichment, product classification and search relevance analysis.
Headquartered in San Francisco, our investors are Canvas Ventures, Trinity Ventures and Microsoft Ventures. Figure Eight is a fast-growing, multi-industry, data-driven company serving data science teams at Fortune 500 companies.