IBM Watson, Kaggle, Bertelsmann, INSIGHT, Figure Eight
Now, more than half a year has passed, and in this harvest season we welcome the world's first Data Scientist Nanodegree graduate. He not only overcame the barrier of English-only instruction, but also completed two semesters of challenges. Kyle Chen is from China; congratulations on sailing through Udacity!
Today, he shares how he became the world's first graduate of the program after becoming thoroughly absorbed in his studies.
Udacity students say
Student name: Kyle Chen
Job Offer: SA Developer (Automation R&D Engineer)
Company: MBCloud
Motto, hobbies, etc.: Mostly writing code, reading books, listening to music, and blogging. Besides raising my kid, I spend my time coding and writing.
Github: github.com/kylechenoO
Blog: hacking-linux.com
Wechat official account: AINailN
Hello, everyone. My name is Kyle Chen. I work at a bank in Shenzhen and have been engaged in automation research and development for eight years. Considering both the needs of my work and my own interests, I began trying to transition to AI at the end of this past February.
Data science, artificial intelligence, machine learning, deep learning: just hearing the names, they sound high-end and impressive. When you study them in depth, however, you will find that any single topic you pull out, whether statistics, probability theory, or calculus, is a sizable discipline in its own right. Don't let that scare you off; take a closer look, and it is not as difficult as you might think.
Data science is a fairly broad subject. When data scientists receive new data, the first step is to clean it, doing some basic processing on the raw values. For example, if a feature has too many missing values, we can consider dropping that feature entirely, or imputing the gaps with the median or mean. Next, we pre-process the data: we screen for anomalous records and, after repeated confirmation, either eliminate them or keep a certain proportion of the outliers. With these two steps combined, we have "clean" data.
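The two cleaning choices described above, dropping a feature with too many gaps versus imputing with the median, can be sketched as follows. This is a minimal illustration with pandas; the column names and the 50% missing-value threshold are arbitrary examples, not anything from the original article.

```python
import numpy as np
import pandas as pd

def clean(df, missing_thresh=0.5):
    """Drop features with too many gaps, then impute the rest with the median."""
    # Drop columns where more than `missing_thresh` of the values are missing.
    keep = df.columns[df.isna().mean() <= missing_thresh]
    df = df[keep].copy()
    # Impute the remaining numeric gaps with each column's median.
    num_cols = df.select_dtypes(include=np.number).columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    return df

raw = pd.DataFrame({
    "age":            [25, np.nan, 40, 31],
    "income":         [5000, 5200, np.nan, 6100],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
})
cleaned = clean(raw)
```

Here `mostly_missing` is 75% empty and gets dropped, while the isolated gaps in `age` and `income` are filled with their column medians.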
During pre-processing, we also need to pay attention to the distributions within the data set. While plotting the distributions, we can do a further round of anomaly screening (for example, if an image's dimensions differ too much from those of the other images, we can consider removing it from the data set). This is also the stage where the data can be normalized or standardized if desired. Next, the data set needs to be split into a training set, a validation set, and a test set; a common ratio to start from is training : validation : test = 6 : 2 : 2.
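The 6:2:2 split mentioned above can be sketched in a few lines of NumPy. This is a generic illustration, not code from the article; the ratio and random seed are just example defaults.

```python
import numpy as np

def train_val_test_split(X, y, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle the data, then split it into train/validation/test by `ratios`."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle indices reproducibly
    n_train = int(len(X) * ratios[0])
    n_val = int(len(X) * ratios[1])
    # Cut the shuffled indices at the 60% and 80% marks.
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
train, val, test = train_val_test_split(X, y)
```

Shuffling before splitting matters: if the data is ordered (say, by date or by class), an unshuffled split would give the three sets different distributions.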
Then, under normal circumstances, if there are too many features, we may screen the features first; this is part of what is called feature engineering, and it determines the input to our model. Next, we select a model for the project. When experience or knowledge is lacking, we can give priority to building on some existing models; this approach is called transfer learning. With transfer learning, we can feed our input straight through the pre-trained model and attach our own output layers, or we can unfreeze parts of the pre-trained model and fine-tune them.
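The freeze-then-attach-a-head pattern described above can be sketched in PyTorch. Note the "backbone" here is a toy stand-in; in practice you would load a genuinely pre-trained network (e.g. a torchvision ResNet), and the layer sizes and 10-class head are arbitrary examples.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone (hypothetical; in a real project
# you would load saved weights, e.g. an ImageNet-trained model).
backbone = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)

# Step 1: freeze the transferred layers so their weights stay fixed.
for param in backbone.parameters():
    param.requires_grad = False

# Step 2: attach a fresh, trainable head for our own task (10 classes).
model = nn.Sequential(backbone, nn.Linear(64, 10))

# Only the new head's weight and bias will be updated by an optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
```

Unfreezing for fine-tuning is the same loop with `requires_grad = True` on whichever backbone layers you want to adapt, usually with a smaller learning rate.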
This stage involves a large amount of hyperparameter tuning, and how well the tuning is done often directly affects the accuracy of the model. If you are a novice, it is recommended to try several parameter combinations here (such as the learning rate, the optimizer, and so on). After settling on a model, we also need to evaluate and validate it. The simplest and crudest scoring criteria are accuracy and error rate; some samples can also be randomly selected for spot checks and display.
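"Trying several parameter combinations" often amounts to a simple grid search like the sketch below. The `train_and_score` helper is entirely hypothetical, a fake stand-in for a real training run, so the search loop can be shown self-contained; in practice it would train a model and return validation accuracy.

```python
from itertools import product

def train_and_score(lr, optimizer):
    """Hypothetical stand-in for a real training run; returns a fake
    validation accuracy that happens to peak at lr=0.01 with 'adam'."""
    base = {"sgd": 0.80, "adam": 0.85}[optimizer]
    return base - abs(lr - 0.01)

learning_rates = [0.1, 0.01, 0.001]
optimizers = ["sgd", "adam"]

# Evaluate every (learning rate, optimizer) pair and keep the best one.
best = max(product(learning_rates, optimizers),
           key=lambda combo: train_and_score(*combo))
```

The same loop structure works with any scoring function; libraries such as scikit-learn offer `GridSearchCV` for the cross-validated version of this idea.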
The above is just a brief introduction to the process of using machine learning / deep learning models for data prediction. In the real world, of course, it is not always that simple. Some of these modules (for example, the model itself, back-end interfaces, and front-end presentation) may need to be broken out and exposed as RESTful interfaces or in other forms. Those, of course, are software engineering problems.
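Exposing a model as a RESTful interface can be as small as the Flask sketch below. The route name and the toy `predict` function are assumptions for illustration; a real service would load saved model weights and validate its input.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    """Toy stand-in for a trained model; a real service would load
    saved weights and run inference here."""
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Expect a JSON body like {"features": [1, 2, 3]}.
    features = request.get_json()["features"]
    return jsonify({"prediction": predict(features)})

# To serve locally: app.run(port=5000)
```

A client would then POST its feature vector as JSON and read the prediction back from the response, keeping the model decoupled from any front-end presentation layer.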
After finishing the Data Scientist Nanodegree, I not only gained a detailed understanding of data processing methods, model handling, and interface design, but also a deeper understanding of the AI industry as a whole. Machine learning, deep learning, and the development of various algorithms would otherwise live only in the world of research; their actual implementation and deployment depend on how data scientists integrate them with existing data and business logic, so as to give accurate guidance to decision makers.
Of course, some of this work will depend heavily on your particular industry. Again: if you want to do something, do it well, invest your time in it, settle into an industry, and combine your experience with the technology; there will always be pleasant surprises.
Finally, we wish you all success in your studies, and may you set sail in your career just like the author of this article!
(Chinese version: Sina Weibo @Love life love Cocoa)