How do you effectively learn data science knowledge that seems endless, esoteric and constantly updated? I hope this article offers some help.

Confusion

On Friday afternoon, I held a group meeting for my graduate students. The format was a hands-on workshop in which they tried to build their first deep neural network.

The reference was my article “How to Find Departing Customers with Python and Deep Neural Networks”. I took my students from downloading the latest Anaconda installer to finishing their first neural network classifier.

The process involved setting up a virtual environment, for which they referred to “How to Use Python Virtual Environments in Jupyter Notebook?”. With that article, they learned how to install packages and run commands in a virtual environment.

I asked them to raise problems as soon as they ran into them. While I helped one student, everyone gathered around to watch the solution, which saved time.

I introduced the layered structure of the neural network to the students and used TensorBoard to visualize it. They did not quite understand the difference between neural networks and traditional machine learning algorithms (which they had heard about, and formed an impression of, during thesis defenses), so I took them to play in a deep learning playground.

They were excited to see what had been a clumsy straight line bend into a curve, then go from open to closed, separating the points on the plane into inside and outside. I also recorded a video and posted it on WeChat Moments.

Amid the excitement, one of my students asked me, a little worried:

Teacher, I can run the example now, but there is a lot in it that I don’t understand yet. How do you learn so much?

I think that’s a very good question.

For graduate students without an IT background, especially “liberal arts students” (as defined here), using data science methods in their research means picking up a great deal of new knowledge and skills. Many of them are anxious about it.

But anxiety doesn’t help. It doesn’t give you any improvement. Learning how to break things down and deal with them is how you make progress.

In this article, I’m going to talk about how to learn effectively when data science knowledge seems endless, esoteric, and constantly updated.

Many readers have left me messages asking similar questions. So I am sharing some of the advice I have given my own students, in the hope that it will help you too.

Goals

You feel lost in the ocean of data science knowledge because you’re using the wrong learning model.

Ever since elementary school, you have been used to treating what you have to learn as a tree of subject knowledge and working through it systematically. If you do not learn the earlier material well, it inevitably affects how you understand and digest what comes later.

Learning by knowledge tree also demands full coverage. Otherwise, as soon as the exam touches something you have not studied, you lose marks.

Throughout the process, a syllabus, textbooks and teachers feed you step by step and push you to keep previewing, studying and reviewing.

Now you are suddenly alone in a new field of study. There is no syllabus and no teacher to set direction and pace, and with so many textbooks it is hard to know which one to read.

In fact, if data science were a frozen, static body of knowledge and you had unlimited time to learn it, learning it the old way would be fine.

The reality is that your time is limited and data science knowledge changes fast. This year’s hot topics may be out of fashion next year. Commenting on the different machine learning frameworks, deep learning expert Andrej Karpathy said:

Matlab is so 2012. Caffe is so 2013. Theano is so 2014. Torch is so 2015. TensorFlow is so 2016. 😀

What to do?

You need to learn in a goal-oriented way.

For example, if the paper you are writing requires you to classify data, then study classification models.

Classification is a supervised learning task. In traditional machine learning, KNN, logistic regression and decision trees are all classic classification models. If you have a lot of data and want a more complex model that may be more accurate, you can try deep neural networks.
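To make the three classic models concrete, here is a minimal sketch that fits each of them on the same data with scikit-learn. The synthetic dataset from make_classification and all variable names are assumptions for illustration; with your own labeled data, the accuracy figures will differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real labeled dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three classic classifiers named in the text, mostly with default settings.
models = {
    "KNN": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Every scikit-learn classifier exposes the same fit/score interface.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Because all three share the same interface, swapping one model for another is a one-line change, which is part of why a goal such as "classify this data" is enough to get started.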

If you need to recognize images, you need to learn convolutional neural networks, which process two-dimensional image data efficiently.

If your research is about finding the right model for time series data, such as financial asset price movements, you need to pay attention to recurrent neural networks, especially long short-term memory (LSTM) models. Then you can use artificial intelligence to play with a crystal ball for the stock market.

But what if you don’t have a specific research topic right now?

It doesn’t matter. You can study case by case and build up your abilities steadily.

In recent years, data science courses on MOOC platforms have become more and more case-based.

Platforms like Udacity, Udemy, etc., have traditionally been known for technical training. Even Coursera, which grew out of universities, loads its problem sets with real-world scenarios. Andrew Ng’s latest Deep Neural Network course is a good example.

The University of Washington’s machine learning course series, which I recommended earlier, goes even further: its first course consists entirely of case studies that preview the main content of the later courses.

Note that during the first course, students do not yet know the techniques (or even the terminology) involved!

But once you have run the code and produced the results, do you really gain nothing just because you have not understood and mastered the details?

Of course not.

At the very least, you have seen that this kind of problem can be solved in this way. This is called awareness.

Here’s a tip: in life, work, and study, you are essentially competing with others on awareness.

Once you gain awareness, you can quickly get an overview of the entire field. You will know which knowledge matters most for your current needs and can learn it first.

An even more effective way to “find goals” than case study is to join projects and get hands-on practice.

I analyzed the hands-on, iterative principle in detail in “How to Learn Python Effectively?” and “How to Teach Innovation?”; feel free to refer to them.

Let me give you a real example.

One of my third-year graduate students majored in business administration as an undergraduate. When he first enrolled, he took the University of Michigan’s Python courses at my request and earned the series of certificates. But for a long time he did not know how to apply this knowledge in practice, and he made no progress on his thesis.

By chance, I brought him into another teacher’s research project, where he was responsible for the technical part and did the text mining. Because the project had a real application and a strict deadline, he studied attentively and worked hard. Only then were the skills he had learned earlier truly activated.

When the project was successfully completed, he came to me on his own initiative to discuss whether he could apply these techniques to research in his own discipline and write a short paper.

So we decided on the topic together and designed the experiment. Then I handed him the data acquisition and analysis, and he did it perfectly.

With this experience, he realized that the data analysis in his graduation thesis was lacking, so he deepened it.

It happened to be that Friday, the day of the workshop, that we received the official acceptance notice from the journal.

I could see he was excited and happy.

Depth

Once you set your goals, you know what to learn and what not to learn.

But the next question is: how deeply, and in how much detail, should you learn it?

In the article “To Borrow or Not to Borrow: How to Use Python and Machine Learning to Help You Make Decisions?”, we tried the decision tree model.

Applying the decision tree model is essentially calling a package.

from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train_trans, y_train)

In just three lines, we trained the decision tree.
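The three lines above assume that X_train_trans and y_train already exist. As a minimal, self-contained sketch, the version below substitutes synthetic data from make_classification (an assumption for illustration, not the loan data from the original article) so that the same calls can be run end to end. The names X_test_trans, predictions and accuracy are introduced here only for the example.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed training data (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train_trans, X_test_trans, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The same three steps as in the text: import, create, fit.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train_trans, y_train)

# Using the trained black box: predict labels and measure accuracy.
predictions = clf.predict(X_test_trans)
accuracy = clf.score(X_test_trans, y_test)
print(f"Accuracy on held-out data: {accuracy:.3f}")
```

Holding out part of the data before fitting is a common way to check whether the trained model generalizes beyond the examples it has seen.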

Here we are using the default parameters. If you need to know which parameters can be adjusted, in Jupyter Notebook press Shift + Tab with the cursor inside the function’s parentheses to see a detailed list of parameters and their default values.
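The default values can also be read programmatically. The sketch below uses scikit-learn’s get_params method, which every estimator provides; the variable names params and shallow are illustrative.

```python
from sklearn.tree import DecisionTreeClassifier

# An estimator created without arguments carries all default settings.
clf = DecisionTreeClassifier()
params = clf.get_params()

# Print each adjustable parameter with its default value.
for name, value in sorted(params.items()):
    print(f"{name} = {value!r}")

# Overriding a default is just a keyword argument to the constructor.
shallow = DecisionTreeClassifier(max_depth=3)
print(shallow.get_params()["max_depth"])
```

Seeing the full list in one place makes it easier to decide which parameters are worth looking up in the documentation.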

If you need more detailed instructions, you can refer to the documentation. Search for “sklearn tree DecisionTreeClassifier” in a search engine; the first result leads to the official documentation for the latest version of scikit-learn.

Once you understand what each function does and the types and ranges of its adjustable parameters, can you claim to have mastered it?

You don’t seem very confident.

Because you feel that this is only “knowing what”, not “knowing why”.

But do you really need to learn how this function is implemented?

Notice that the function definition section of the documentation page contains a link to the source.

Click on it and you will be taken to the source code for the function, hosted on GitHub.

If you are a professional who wants to study, evaluate, or modify the function, a careful reading of the source code is not just helpful but essential.

But as a liberal arts student, you don’t have to go into such details if it’s just for application. Treat someone else’s well-written, well-received software package as a black box and use it correctly.

It’s like you don’t need to know the circuitry to watch TV. You don’t need to know the techniques and traditions of Sichuan cuisine to eat mapo tofu. As long as you can use the remote control and chopsticks, you can enjoy these benefits.

As more and more great software packages are created, the barrier to entry for data science keeps getting lower. So low, in fact, that some people denounce it. One post, for example, screams that “low barriers to entry are ruining deep learning’s reputation!”

But don’t celebrate too early, thinking that you have finally found a subject in which you can take shortcuts.

You must lay a solid foundation.

Data science applications are based on programming, mathematics and English.

Mathematics (including basic calculus and linear algebra) and English are offered in many undergraduate programs. What liberal arts students mainly need to add is programming knowledge.

Only by understanding basic syntax can you communicate with your computer without obstacles.

A ridiculously simple programming language saves you a lot of learning time and lets you get to applications quickly.

In the programmer community, there is a popular saying:

Life is short. I use Python.

How simple is Python? In my class, an accounting undergraduate completed a course and earned a certificate in basic Python syntax in 24 hours. That figure includes the time spent on problem sets, projects, and waiting for the system to grade them.

How do you get started with Python and master it efficiently? See “How to Learn Python Effectively?”; I hope it helps you get started quickly.

Collaboration

Now that you know what to learn and how much to learn, let’s talk about the ultimate secret weapon to increase your learning efficiency.

That weapon is the power of collaboration.

The benefits of collaboration, it seems, are already well known.

But, in practice, too many people simply don’t do it.

Because we have all been trained for too long to “do it alone.”

For example, communicating with others during the exam is called cheating.

But even if you get used to doing things on your own, you’ll have to face a real, harsh world where you can’t achieve much by yourself. You’ll have to learn to collaborate.

It’s like the Stark family saying from Game of Thrones:

When the cold winds blow, the lone wolf dies and the pack survives.

Programming alone in front of a screen, liberal arts students often feel lonely and helpless, as if the world had abandoned them.

This mistaken mindset makes you anxious and panicky, and you give up easily.

The right mindset can save you: you are already collaborating. And you need to collaborate more proactively and more effectively.

The computer or mobile terminal in front of you is the collaborative effort of millions of people.

The operating system you use is the work of millions of people.

The programming language you use is also the work of millions of people.

Every package you call is still the work of millions of people.

Collaboration doesn’t have to be a small group of people communicating and working together. Collaboration already takes place on a planetary scale.

When you download and use an open source package from GitHub, you establish a collaborative relationship with the package’s author. Think about it: these people may well be employed by large IT companies and earning six figures (in US dollars) a month. Isn’t it a rare opportunity to work with them?

When you post a technical question and get answers, you create a collaborative relationship with other users. They may well be senior IT specialists who bill by the second for consulting.

It is because of division of labor and cooperation that society becomes more efficient.

The same goes for data science. Why did Google, Microsoft and others open source their deep learning frameworks and give them to the world for free? Precisely because they understand the ultimate meaning of collaboration: the rewards of this seemingly foolish giveaway are immeasurable.

This worldwide collaboration speeds up the generation of knowledge, makes the needs of users more clear and thorough, and makes the scope and depth of technology applications unprecedented.

If you’re in this collaborative system, you’re evolving with the system. If you’re unlucky enough to be outside the system, you’re left to watch others soar.

How do you work better with others in times like these?

First, you need to find partners to work with. This requires mastering search engines, Q&A platforms and social media. Keep your awareness up to date, find better problem-solving tools, and put your questions to the people most likely to answer them. Check GitHub and Stack Overflow regularly, and you may be surprised by what you find.

Second, you need clear logic and clear expression. Whether you are searching for answers or asking questions, logic helps you avoid detours, and how well you express yourself determines the effectiveness and depth of your collaboration with others. For a detailed explanation, please refer to “Python Programming Problems: What Should Liberal Arts Students Do?”.

Third, don’t only be someone who asks for help. Take the initiative to help others solve their problems, open source your code on GitHub, and write articles to share your knowledge and insights. This not only makes deposits in your social capital account (asking for help is a withdrawal from it), but also raises your awareness through feedback. The power of the crowd can correct your misconceptions and push you forward through likes, comments and so on.

The links that make collaboration possible are out there.

If you don’t know they exist, they are not real to you.

Once you know them, master them and use them, the enormous benefits they bring you are real.

Summary

We talked about goals, which help you figure out what you need to learn and what you don’t. You now know effective ways to find your goals: project practice or case study.

We talked about depth. For most functionality, it is enough to understand the black box’s interface rather than its internal details. For foundational knowledge and skills, however, be sure to build a solid base so that you can go further.

We emphasized collaboration. Take advantage of the good work of others, share what you know, and connect with more good people. Get rid of the dilemma of fighting alone and turn yourself into a key node in a quality collaborative system.

May you gain cognitive growth and enjoy the pleasure of updating your knowledge and skills in the process of studying data science. Let go of anxiety and experience flow.

Discussion

What data science knowledge and skills have you acquired so far? How long did it take you? Was it painful? Are there any lessons the rest of us can learn? Feel free to leave a comment and share your experience so we can discuss it together.

If you liked this article, please give it a thumbs up. You can also follow and pin my official account “Nkwangshuyi” on WeChat.

If you’re interested in data science, check out the index post for my tutorial series, “How to Get Started in Data Science Effectively”. It has more interesting problems and solutions.