They say the best time to do something is “now,” but knowing where to start confuses a lot of people, especially those who want to get into data science and machine learning. In this article, the author shares some tips and resources for beginners who find themselves in that situation.
From Towards Data Science, written by Daniel Bourke, compiled by The Heart of the Machine, with participation by Han Fang and Yiming.
Let’s start here.
Two years ago, I started teaching myself machine learning online and shared my learning process through YouTube and blogs. I had no idea what I was doing, and I had never written code before I decided to start learning machine learning.
Someone told me he had already started learning Python and was going to learn machine learning, but didn’t know what to do next.
“I’ve learned Python, what do I do next?”
I replied with a list of learning steps, which I have copied here. If you want to become a machine learning practitioner but don’t know how to write code, use this article as an outline. My learning style is code first: get the code up and running, then learn the theory, math, statistics and probability as needed, rather than starting with theory.
Learn Python, data science tools, and machine learning concepts
The person who emailed me said he had already learned some Python. But this step also works for complete novices. Spend a few months learning Python programming and different machine learning concepts. You’ll need both.
Practice using data science tools like Jupyter and Anaconda while learning Python programming. Spend a few hours researching what they are used for and why.
Learning resources
- Elements of AI (https://www.elementsofai.com/) – an overview of the main concepts of artificial intelligence and machine learning.
- Python for Everybody on Coursera (https://bit.ly/pythoneverybodycoursera) – learn Python from scratch.
- Learn Python by freeCodeCamp (https://youtu.be/rfscVS0vtbw) – a single video covering all the main Python concepts.
- Corey Schafer’s Anaconda tutorial (https://youtu.be/YJC6ldI3hWk) – a video on using Anaconda to set up the environment you need for data science and machine learning.
- Dataquest’s Jupyter Notebook tutorial for beginners (https://www.dataquest.io/blog/jupyter-notebook-tutorial/) – an article on how to start and run a Jupyter Notebook.
- Corey Schafer’s Jupyter Notebook tutorial (https://www.youtube.com/watch?v=HW29067qVWk) – a video on how to use Jupyter Notebook.
Learn data analysis, manipulation, and visualization using pandas, NumPy, and Matplotlib
Once you’ve picked up some Python skills, you’ll want to learn how to work with and manipulate data. To do this, you’ll need to be familiar with pandas, NumPy, and Matplotlib.
- pandas lets you work with two-dimensional data, like the tables of rows and columns in an Excel file. This type of data is called structured data.
- NumPy helps you do numerical calculations. Machine learning takes everything you can think of, turns it into numbers, and looks for patterns in those numbers.
- Matplotlib helps you draw graphs and visualize data. A bunch of numbers in a table can be hard for humans to understand. We prefer to see a graph with a line running through it. Visualization helps you communicate your findings.
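To make the division of labor between the three libraries concrete, here is a minimal sketch. The table of study hours and exam scores is made up for illustration, and the column names are assumptions, not data from the article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical structured data: each row is an observation, each column a feature.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5],
        "exam_score": [52, 58, 65, 71, 78],
    }
)

# pandas: filter rows and summarize a column.
high_scores = df[df["exam_score"] > 60]
mean_score = df["exam_score"].mean()

# NumPy: fit a straight line (degree-1 polynomial) to the numbers.
slope, intercept = np.polyfit(df["hours_studied"], df["exam_score"], deg=1)

# Matplotlib: draw the points with the fitted line running through them.
fig, ax = plt.subplots()
ax.scatter(df["hours_studied"], df["exam_score"], label="data")
ax.plot(df["hours_studied"], slope * df["hours_studied"] + intercept, label="fitted line")
ax.set_xlabel("hours_studied")
ax.set_ylabel("exam_score")
ax.legend()
```

In a Jupyter Notebook the figure appears under the cell. In a plain script you could call fig.savefig("scores.png") to write it to a file.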
Learning resources
- Applied Data Science with Python on Coursera (http://bit.ly/courseraDS) – start honing your Python skills for data science.
- 10 Minutes to pandas (https://pandas.pydata.org/pandas-docs/stable/gettingstarted/10min.html) – a quick overview of the pandas library and some of its most useful functions.
- Codebasics’ Python pandas tutorial (https://youtu.be/CmorAWRsCAw) – a YouTube series covering all the main features of pandas.
- freeCodeCamp’s NumPy tutorial (https://youtu.be/QUT1VHiLmmI) – a YouTube video for learning NumPy.
- Sentdex’s Matplotlib tutorial (https://www.youtube.com/watch?v=q7Bo_J8x_dw&list=PLQVvvaa0QuDfefDfXb9Yf0la1fPDKluPF) – a YouTube series that teaches the most useful features of Matplotlib.
Learn machine learning with scikit-learn
scikit-learn is a Python library with many machine learning algorithms built in. Focus on learning what kinds of machine learning problems exist, such as classification and regression, and which algorithms are best suited to each. You don’t need to understand each algorithm from scratch just yet; first learn how to apply them.
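As a rough illustration of the difference between classification and regression, the sketch below fits one model of each kind. The datasets (iris and diabetes, both bundled with scikit-learn) and the choice of logistic and linear regression are assumptions made for the example, not recommendations from the article.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: predict a category (the species of an iris flower).
X_cls, y_cls = load_iris(return_X_y=True)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_cls, y_cls, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=1000)
clf.fit(Xc_train, yc_train)
clf_accuracy = clf.score(Xc_test, yc_test)  # fraction of correct predictions

# Regression: predict a number (a disease progression measure).
X_reg, y_reg = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)
reg = LinearRegression()
reg.fit(Xr_train, yr_train)
reg_r2 = reg.score(Xr_test, yr_test)  # R^2 on the held-out data

print(f"classification accuracy: {clf_accuracy:.2f}")
print(f"regression R^2: {reg_r2:.2f}")
```

Both models follow the same pattern: split the data, call fit on the training part, then score on the part the model has not seen. That shared interface is what lets you apply an algorithm before you understand its internals.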
Learning resources
- Data School’s machine learning in Python with scikit-learn (https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A) – a YouTube playlist that teaches all the major functions of scikit-learn.
- A gentle introduction to exploratory data analysis by Daniel Bourke (https://towardsdatascience.com/a-gentle-introduction-to-exploratory-data-analysis-f11d843b8184) – combines what you learned in the two steps above into one project. Provides code and videos to help you get started with your first Kaggle competition.
- Daniel Formoso’s exploratory data analysis notebook with scikit-learn (https://github.com/dformoso/sklearn-classification) – a more in-depth version of the resource above, with an end-to-end project that puts it all into practice.
Learn deep learning neural networks
Tip: In most cases, you’ll want to use an ensemble of decision trees (algorithms like random forest or XGBoost) for structured data, while for unstructured data, you’ll want to use deep learning or transfer learning (taking a pre-trained neural network and applying it to your problem).
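For the structured-data half of this tip, a minimal sketch with scikit-learn’s RandomForestClassifier might look like the following. The breast cancer dataset bundled with scikit-learn stands in for your own table of rows and columns, and the settings are illustrative assumptions rather than tuned values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Structured (tabular) data: 30 numeric columns per row, one binary label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A random forest is an ensemble of decision trees that vote on each prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

forest_accuracy = forest.score(X_test, y_test)
print(f"random forest accuracy: {forest_accuracy:.2f}")
```

XGBoost lives in a separate library and is not shown here.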
Learning resources
- Andrew Ng’s deeplearning.ai on Coursera (https://bit.ly/courseradl) – a deep learning course taught by one of the most commercially successful practitioners.
- Jeremy Howard’s fast.ai deep learning course (https://course.fast.ai/) – a hands-on, practical approach to deep learning taught by one of the best practitioners in the industry.
Other courses and books
Once you’re familiar with how to use different machine learning and deep learning frameworks, you can try to consolidate your knowledge by building them from scratch. You won’t always need to do this in production or in day-to-day machine learning work, but knowing how things work on the inside will help you build on them in your own work.
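As one small example of building something from scratch, the sketch below trains a one-variable linear model with gradient descent using only NumPy. The synthetic data, learning rate, and number of steps are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): y is roughly 3 * x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=100)

weight, bias = 0.0, 0.0
learning_rate = 0.1

for _ in range(1000):
    predictions = weight * x + bias
    errors = predictions - y
    # Gradients of the mean squared error with respect to weight and bias.
    grad_weight = 2 * np.mean(errors * x)
    grad_bias = 2 * np.mean(errors)
    # Step against the gradient to reduce the error.
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

print(f"learned weight: {weight:.2f}, learned bias: {bias:.2f}")
```

The learned values should land close to the 3 and 2 used to generate the data. Frameworks run this same predict, measure, adjust loop at a much larger scale.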
Learning resources
- How to Start Your Own Machine Learning Projects (https://towardsdatascience.com/how-to-start-your-own-machine-learning-projects-4872a41e4e9c) – starting your own project can be hard; this article gives you some guidance.
- Jeremy Howard’s fast.ai course, part 2 (https://course.fast.ai/part2) – once you’ve learned top-down, this course helps you fill in the gaps from the bottom up.
- Grokking Deep Learning by Andrew Trask (https://amzn.to/2H497My) – this book teaches you how to build neural networks from scratch and why you should know how to build them.
- Machine learning books recommended by Daniel Bourke (https://www.youtube.com/watch?v=7R08MPXxiFQ) – a YouTube video that goes through some of the best machine learning books.
Answering questions
It could take you six months or more. Take it easy. Learning new things takes time. As a data scientist or machine learning engineer, the main skill you’re developing is how to ask good questions about data and then use your tools to try to find the answers.
Some days you’ll feel like you haven’t learned anything, or even that you’re going backwards. Ignore it. Don’t measure your progress day by day; look at how far you’ve come after a year.
Where can I learn these skills?
I’ve listed a few resources above, all online and mostly free, and there are many more.
Remember, a big part of being a data scientist or machine learning engineer is solving problems. Treat exploring each of the steps here as your first assignment, and create your own curriculum to guide your learning.
What about statistics? What about math? Probability?
You’ll learn these things as you need them. Start with the code. Get the code up and running. Trying to learn all of statistics, math, and probability before running any code is like trying to boil the ocean. It will only hold you back.
Certificate?
Certificates are nice, but you aren’t studying for them; you’re studying for skills. Don’t make the mistake I did and assume that more certificates mean more skills. They don’t. Build a foundation of knowledge through the courses and resources described above, and then develop your expertise through your own projects (the kind of knowledge that can’t be taught in a course).
Reference link: towardsdatascience.com/5-beginner-…