Your blockbuster GitHub project may never even come up in the interview

Does a resume full of successful projects really count with an interviewer? Experience says: not necessarily. Haebichan Jung, project director at Towards Data Science and data scientist at Recurly, recently wrote about his experience. He says that doing many projects, and doing them well, can help at the resume screening stage, but the interviewer may not care about your projects at all and will instead decide whether you stay or go through an “intelligence test”.

Selected from Towards Data Science, by Haebichan Jung, Panda W, Zhang Qian.

Project mentality

How do aspiring data scientists land high-paying jobs? There’s a big misconception that it all comes down to projects.

The “projects” in question are side projects that implement the latest machine learning or deep learning algorithm in a Jupyter Notebook and get uploaded to GitHub, in the hope of impressing the interviewer.

But guess what? The interviewers won’t actually read much of your personal project code. If you believe projects are important, you have what I call a “project mentality.”

Project mentality (noun): the belief that the more machine learning projects you complete and list on your resume, the better your chances of landing a high-paying data science position. In truth, projects impress far fewer people than you think.

Why do I say that? Because that is exactly how I used to think. I spent a lot of valuable time working on various projects to pad my resume, some of which were endorsed by some of the most prominent figures in the data field.

But now, as a data scientist in San Francisco, I see that I did the wrong thing, and what’s worse is that many others will follow in my footsteps. The purpose of this article is to give you a sense of how much your project will help you (early warning: not much).

PS: Remember, I only applied for data scientist positions in San Francisco, California, so my opinion may not hold for your location or the position you are applying for. And this is just one person’s experience (actually two people’s, more on that later). But there’s something universal about this story, because I’ve seen so many people around the world fall for the (false) appeal and potential of “projects.”

The project was on fire, but the interviewer didn’t care

Spending a few weeks on a project before interviewing

Before I applied for data science positions, I spent 4-5 weeks working on my own project because it felt like the right thing to do at the time. As a professional pianist, I wanted to do something with music. That drew my attention to neural networks, especially LSTMs, which I wanted to use to generate new music.

I spent a full two weeks reading academic papers on the subject and, looking back, I understood about 30% of them. But something in that 30% really troubled me. I don’t think some of the researchers working on AI-generated music have a deep understanding of the basics of music. You can tell because they use very complex neural network architectures to create new sounds, but those architectures don’t reflect the way real musicians compose.

Examples of academic research using LSTM to create music

This bothered me so much that I decided to build an algorithm from scratch based on a hidden Markov model. I wrote about 800 lines of pure Python code and developed my own music generation algorithm. I called it Pop Music Maker: github.com/haebichan/P…

Pop Music Maker takes musical data as input, breaks down musical notes, looks for statistical relationships between those notes, and then recreates a new Pop song based on those statistics.

The architectural foundation of Pop Music Maker
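The pipeline described above (split the music into notes, measure how notes follow one another, then sample a new sequence from those statistics) can be illustrated with a much simpler stand-in. The sketch below is a plain first-order Markov chain over note names, not the hidden Markov model or the actual code of Pop Music Maker; the function names and the toy training sequence are assumptions made up for illustration.

```python
import random
from collections import defaultdict


def build_transitions(notes):
    """Count how often each note is followed by each other note."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(notes, notes[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, rng):
    """Walk the transition table, sampling each next note by observed frequency."""
    melody = [start]
    while len(melody) < length:
        options = counts.get(melody[-1])
        if not options:  # dead end: this note was never followed by anything
            break
        choices = list(options)
        weights = [options[note] for note in choices]
        melody.append(rng.choices(choices, weights=weights, k=1)[0])
    return melody


# Hypothetical toy input; a real project would parse notes from music files.
training_notes = ["C", "E", "G", "E", "C", "G", "A", "G", "E", "C"]
transitions = build_transitions(training_notes)
melody = generate(transitions, start="C", length=8, rng=random.Random(0))
print(melody)
```

Every step in the generated melody is a transition that occurred in the training data, which is the sense in which the output is "based on those statistics". A hidden Markov model would add unobserved states on top of this idea.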

The site crashed because the project was too popular

I posted an article about my project on TowardsDataScience.com. Within a few days, the article had gone viral. Thousands of people read it every day, especially after someone posted it to Hacker News. By the time I realized what was happening, it had spread across Twitter and LinkedIn. Then NumPy creator and Anaconda founder Travis Oliphant and O’Reilly Media’s Ben Lorica both shared it on their social media feeds.

As my project became better known, hundreds of people used my algorithm every day through a Flask website I had set up. This caused my site to crash repeatedly, because the AWS EC2 instance I had deployed the code on was too small to handle the traffic. Some people on the Internet started accusing me of being a fraud because they tried my algorithm, only to find that the site didn’t work.

Here is the article that went viral: towardsdatascience.com/making-musi…

It wasn’t long before the criticism exploded into a full-fledged debate on many social media sites. Some researchers with PhDs angrily pointed out that my Bayes-based approach was simply wrong. Others (including Ben Lorica) defended me and my work. In short, I had reignited the holy war between Bayesian and frequentist statistics in some corners of the Internet.

Initially, I decided to apologize to those who felt offended for some reason, and I politely asked them how I could improve my methods. But after a few days of apologizing, I couldn’t take it anymore. The debate was exhausting, and I just wanted to avoid the Internet. I turned off all my electronic devices.

However, no one asked me about the project when I applied

You might think that, controversial as it was, having this project on my resume would help me land a job in data science. It turns out it didn’t. No one cared, except for one person at a small startup who asked me about it. In the grand scheme of things, the blazing fire I was facing was just a spark blown out by the hurricane of technology in the Bay Area.

What’s more, the members of the hiring committees didn’t test me on these projects, because the hiring process isn’t about how many projects you’ve done. Yet I see many candidates for data science jobs who believe it is.

I’m not alone in this view. In an interview, DoorDash’s Jeffrey Li shared the biggest weakness he sees in aspiring data scientists:

“The biggest gap I see in most data scientists is tying the machine learning model to business impact. A lot of very, very smart people will build a very complex five-layer neural network. It makes good predictions and scores very high. But when we dig into the business impact of that specific model, they often struggle to answer.”

If data science recruitment is not project-based, what is it based on? It is based on what recruiters call “intelligence testing.”

The infamous “intelligence test”

I don’t like the word “intelligence” because it implies innate, biological aptitude (you either have it or you don’t). Unfortunately, I see it used a lot, usually quietly, in the tech recruiting world. I keep hearing, in one form or another and behind closed doors, “That person is not smart enough for this technical job.” I first heard it from software engineer friends in California’s Bay Area.

At first, the phrase struck me as oppressive and empty. But after thinking long and hard about how the term “intelligence” is used in tech, I began to understand what it actually means. Once I did, I realized it has nothing to do with “biology”: anyone can improve by preparing. More importantly, I discovered the secret to passing a data science interview.

Intelligence testing underlies the entire hiring process. It is the basis for technical screening questions, take-home assignments, and interview questions. It has four main elements, namely:

Analytical thinking
Variable extraction
Edge case detection
Process optimization

The first three are the most important, and the fourth is the icing on the cake. Once interviewers have a sense of how you do in the first three areas, they will move on to the fourth. All four are designed to gauge your potential and capabilities in a future technical position.

A quick note: these four skills are important, but so are understanding statistics, writing code, and SQL. I think that is obvious to everyone, so I won’t get into the basics here.

Analytical thinking

Analytical thinking is the ability to break a big problem down into easily solvable parts. In a nutshell, it’s about creating a mental road map with multiple checkpoints that lead to the final solution.

This part of intelligence can be measured through either practical coding puzzles or theoretical business/product problems. The interviewer will present you with a question that, at first glance, feels open-ended. This is intentional, because the answer itself is not the purpose of the test. It doesn’t really matter whether your solution actually works. The point is to assess your ability to lay out a multi-step plan for solving a complex problem.

Why test this ability? Because in actual data science work, some problems are too complex to be solved in a single step. A strategic road map must therefore be developed, outlining the advantages and disadvantages of each step in terms of its impact on the business and on the technical solution. To do this, a data scientist needs a flexible, strategic mind that can come up with effective solutions with identifiable checkpoints.

For candidates who want to improve this skill, solve as many LeetCode problems as possible. Also practice data science product questions. Here is an example of a product question:

A food delivery company is launching a new app with a new UI. The goal is to boost delivery workers’ earnings by increasing their miles. Please suggest a testing strategy to see if the new app is better than the old one.
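One possible testing strategy for a question like this is a randomized A/B test: assign delivery workers at random to the old or the new app, record an outcome such as weekly earnings, and check whether the difference could plausibly be chance. The following is a minimal sketch of that last step using a permutation test. The earnings figures, the function name, and the choice of a permutation test are all assumptions for illustration, not part of the original question.

```python
import random
from statistics import mean


def permutation_p_value(control, treatment, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical weekly earnings (USD) for workers randomly assigned to each app.
old_app = [410, 395, 430, 402, 388, 415, 420, 399]
new_app = [450, 441, 467, 438, 455, 472, 449, 460]
lift, p_value = permutation_p_value(old_app, new_app)
print(f"observed lift: {lift:.2f}, p-value: {p_value:.4f}")
```

In an interview, the code matters less than the checkpoints behind it: how workers are randomized, which metric is compared, how long the test runs, and what result would justify shipping the new app.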

Variable extraction

Variable extraction is about how many relevant variables you can come up with to solve the problem at hand. For example, consider this scenario: “There are two elevators in a building, and some people complain that one is slower than the other. What do you need to know to determine whether these complaints are justified?”

This kind of thought experiment is usually posed by product or other non-data people who don’t know much about data science and want to get a sense of your “intelligence”. Intelligence here refers to your ability to come up with the variables that solve the problem (the ones the interviewer has in mind).

But how can you think the same way as a stranger? The good news (I think) is that 99% of the variables in these thought experiments fall into one of these categories:

1. Time (Does rush hour affect the speed of the elevator?)

2. Location (Maybe some floors have more elevators than others?)

3. Technology (Maybe one elevator has a real technical problem, independent of how people perceive it.)

4. User statistics (Who is in the building? Do visitors use one elevator and workers use another?)
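To see why such variables matter, here is a small sketch of the elevator scenario in which the time variable changes the conclusion. The ride log, the rush-hour definition, and the helper names are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ride log: (elevator, hour of day, seconds from call to arrival).
rides = [
    ("A", 9, 62), ("A", 9, 58), ("A", 14, 25), ("A", 14, 27),
    ("B", 9, 60), ("B", 14, 24), ("B", 14, 26), ("B", 14, 23),
]


def period(hour):
    """Assumed rush-hour definition; a real building would need its own."""
    return "rush" if hour in (8, 9, 17, 18) else "off-peak"


def mean_wait_by(rides, key):
    """Average the wait time within each group defined by key(elevator, hour)."""
    groups = defaultdict(list)
    for elevator, hour, seconds in rides:
        groups[key(elevator, hour)].append(seconds)
    return {group: mean(values) for group, values in groups.items()}


overall = mean_wait_by(rides, lambda elevator, hour: elevator)
by_period = mean_wait_by(rides, lambda elevator, hour: (elevator, period(hour)))
print(overall)
print(by_period)
```

In this made-up log, elevator A looks slower overall only because more of its rides happen during rush hour; within each period the two elevators are nearly identical. Location, technology, and user type could be added as further grouping keys in the same way.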

Why is variable extraction important? Because it goes right to the heart of experimentation. Running an experiment requires relevant variables to test, and if you can come up with more appropriate variables that improve the accuracy of the test, including ones the interviewer had not thought of, the skill is extremely valuable.

You can improve in this area by studying as many different kinds of data as possible, such as temporal data, geographic data, and so on. Anything that broadens your knowledge of data across different domains is worth a try.

Edge case detection

The edge case test usually comes after your interviewer has covered the first two items. Having learned enough about the first two types of intelligence, the interviewer will give you a hard time. He or she will find some way to completely overturn the road map and variables you came up with to solve the problem.

This is a difficult part of the interview, because you will feel uneasy once holes have been found in your logic. You need to stay calm and listen carefully to the hints the interviewer throws at you. Usually they already have an answer in mind, and they drop clues to steer you toward it.

They create puzzles like this to knock you out of your train of thought and see how well you handle situations you’ve never encountered before. In real data science workflows there are many edge cases you might not think of in advance, especially when developing products.

How do you practice? This one is really hard to practice. When it happens, take a deep breath, ask questions, figure out what you need to do, and follow the clues.

Process optimization

This last item is optional and usually comes at the end of the technical interview if there is enough time. It builds on the first element (analytical thinking). Once you have come up with a particular approach, the interviewer will ask whether you can think of a better way to solve the problem.

Why do they ask this? Because all data science work in industry starts out rough and takes many iterations to improve. But that work can only begin once the first rough version is complete, so I don’t think it is as high a priority as the first three.

Where exactly is the project useful?

I believe projects can be useful in the early stages of job hunting. In my opinion, projects help in three ways:

1. Build confidence. Many people see completing a project as a necessary prerequisite (a kind of personal rite of passage) before applying for a corporate job.

2. Practice variable extraction and optimization. Projects let you experiment with many different types of data, and with workflows for optimizing data processing, and so on.

3. Give you a chance to win over the initial recruiter. The initial recruiter’s job is not to conduct an intelligence test, but to screen candidates for the interviewer, who then administers the test. Projects show the initial recruiter your enthusiasm for and commitment to data science, and they show it well.

But after the initial screening, projects matter much less. There are three reasons:

1. Projects don’t help you pass technical quizzes.

2. Projects don’t provide external validation of your potential as a data scientist. They only show that you can copy or remember existing code very well.

3. Interviewers don’t have time to read pages and pages of your notebooks. They process hundreds of applications every day, and they also have to manage their own teams, which is enough to take up all their working hours.

One last point: the external validation you need comes from work that other people have actually used. In other words, getting an 83% AUC on your machine learning project doesn’t give the interviewer much insight into your potential as a data scientist. But if you can say that hundreds of people have tried what you built, you have a much more powerful story.

Still not convinced? Finally, listen to the director of the Data Science Institute at Columbia University:

Director Jeannette Wing: “There are certain problem-solving techniques and methods that computer scientists use every day. And they include:

1. How do you design an algorithm to solve a particular problem

2. How do you break a particular problem down into smaller parts

3. How do you define an abstraction layer

4. How do you define interfaces between components

It’s a collection of techniques for solving problems, but also for implementing big systems and solving big problems — that’s what I mean by thinking like a computer scientist.”

I also asked a senior data scientist at IBM, “What is the most important skill you have as a data scientist?”

He replied: “The one thing that everyone at IBM has in common is that they are consultants. They need to be able to work with customers. They need to meet with senior executives and talk intelligently about solutions.”

The original link: towardsdatascience.com/sorry-proje…
