A note up front: the blogger is a hands-on developer who later moved into training, nicknamed "Pumbaa" after the warthog in the cartoon The Lion King, always optimistic and positive about the things around him. My technical path has run from Java full-stack engineer all the way to big data development and data mining, where I have now made some modest achievements. I would like to share what I have learned along the way, in the hope that it helps you in your own studies. Through this effort the blogger also hopes to build out a complete technical library; any anomalies, errors, and points of caution related to an article's technical content will be listed at its end, and everyone is welcome to contribute material in whatever way suits them.
- If you spot any mistakes in the article, please point them out so they can be corrected promptly.
- If you have questions you would like to discuss or study together, please contact me at [email protected].
- The style of the published articles varies from column to column, and each is self-contained. Please point out any deficiencies.
The Right Way to Get Started with Big Data
Keywords: big data, application fields, talent demand, learning route
Table of Contents

- The Right Way to Get Started with Big Data
  - I. The Origin of the Concept of Big Data
  - II. Applications of Big Data
  - III. Enterprise Talent Demand
  - IV. A Learning Route for Big Data
Thanks again to the staff of One-Man University for sharing this article.
I. The Origin of the Concept of Big Data
First, let's establish what big data actually is. The term is no longer as hyped as it once was, and many people already have some understanding of it; you can also open Baidu and look up definitions and characteristics directly.
So let me tell a little story from a developer's perspective. In ancient times, people could record information only in writing; in other words, the function of data was to record the information people needed.
Back then, data did not circulate easily, it did not grow quickly, and its types were relatively simple. As technology developed, both sound and images could be captured by the corresponding devices and stored on computer disks in the corresponding formats; in other words, data was no longer limited to a single text type.
"Two exciting things are happening in the software industry: object-oriented programming and the Web," Jobs said in Steve Jobs: The Lost Interview. "The Web will realize the dream we've all been waiting for, where the computer is no longer just a computing tool but a communications tool." That interview took place in 1995, and we have to admire the great man's foresight.
Thanks to the development of the Internet, data began to circulate simply and frequently, and many e-commerce platforms, social networking sites, and major search engines emerged, all of them holding enormous value.
At the same time, large amounts of data accumulated in fields such as financial securities. When data volumes are small, analyzing them is not difficult: with modeling tools and database software, combined with statistical analysis, we can reach a satisfactory result.
But when the volume of data grows and its types become complicated, we still expect to extract the latent value from massive historical data within an acceptable time frame. That is the problem to be solved, and it corresponds exactly to the defining characteristics of big data.
Thus an enormous big data software ecosystem was born, containing components and frameworks large and small to meet the needs of every link in the data processing pipeline.
II. Applications of Big Data
I have just covered the basic concepts of big data and the problems it is committed to solving; its core value lies in creating profit and improving efficiency. Since no industry can do without data, one can say that data is everywhere.
As individuals, we use electronic devices that constantly exchange data; at the macro level, the providers of all kinds of software services collect especially large amounts of it.
Examples include transaction data from shopping platforms, price movements in financial securities, user behavior collected in all kinds of applications, and the information carried in traffic.
Big data covers a wide range of fields, and as a country with a large population, we have an obvious advantage in sheer data volume.
Whether it is genetic big data in biology, smart education in science and teaching, the smart cities that touch our daily lives, or data analysis in any specific field, you can find big data at work.
Friends who work in data analysis will know that the result of a computation depends more on data quality than on the algorithm; the algorithm can only correct and tune. Data quality, in turn, is determined by factors such as the number of data dimensions and whether the data itself is distorted.
The more comprehensive the data dimensions, the more detailed and specific the profile of a subject becomes, and the easier it is to make accurate predictions. A data dimension can be understood as an attribute of the subject or an indicator of some behavior: height, weight, monthly income, monthly expenditure, and so on.
III. Enterprise Talent Demand
Since big data is applied in so many fields, are the requirements on practitioners in every one of them so high? In fact, no. Although the fields differ, the data processing workflow is basically the same; the differences lie in the data source, the data types, the algorithms used, and the goal of the research. For developers, the difference can be summed up in two words: business logic. A team needs an expert in the relevant field to steer the overall direction, but not everyone needs to study that field deeply.

Current talent demand splits into two broad directions: big data development and data analysis. You have probably heard the titles development engineer and algorithm engineer, but they are very general. In the big data field, a development engineer mainly builds and maintains the cluster environment, encapsulates and develops applications, and connects the business logic across the whole analysis pipeline. An algorithm engineer is responsible for the core of the analysis: given What I want, determining What I need and finally How to do it. Typically, someone with a mathematics background, rich business experience, and a large number of papers studied can fill the role well.
IV. A Learning Route for Big Data
Having summarized the talent demand in the big data field, I will now introduce, from the perspective of a developer and learner, how to transition into big data and how to open its door.
Before settling on a learning route, we should focus on the current mainstream technologies. A fairly direct approach is to read the job responsibilities and technical requirements posted on the major recruitment websites, or to compare Baidu Index trends to decide what to learn first.
Now for some professional knowledge. When the amount of data to process exceeds what one machine can handle, the core idea is divide and conquer: split the work across many machines.
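To make the divide-and-conquer idea concrete, here is a minimal sketch in plain Python: the dataset is split into chunks, each "worker" computes a partial result, and the partials are merged. The function names and the toy word-count task are illustrative assumptions, not part of any real framework.

```python
# Divide and conquer on data too large for one worker (toy illustration).

def split_into_chunks(data, num_chunks):
    """Divide the dataset into roughly equal chunks, one per worker."""
    chunk_size = (len(data) + num_chunks - 1) // num_chunks
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def partial_count(chunk):
    """Each 'worker' counts word frequencies in its own chunk."""
    counts = {}
    for word in chunk:
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge_counts(partials):
    """Combine the partial results into the final answer."""
    total = {}
    for counts in partials:
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total

words = ["spark", "hive", "spark", "hdfs", "hive", "spark"]
chunks = split_into_chunks(words, 3)
result = merge_counts(partial_count(c) for c in chunks)
print(result)  # {'spark': 3, 'hive': 2, 'hdfs': 1}
```

This split/compute/merge shape is exactly what frameworks such as MapReduce automate across a cluster, adding scheduling and fault tolerance.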
Years ago, Google published its paper on GFS, proposing a distributed, scalable design that remains the core idea of big data storage: keep multiple copies of each piece of data.
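The replication idea can be sketched in a few lines. This is an illustrative toy, not GFS or HDFS internals: the node names and the round-robin placement policy are assumptions. The point is only that each block lands on several distinct machines, so losing one machine loses no data.

```python
# Toy replica placement: each block is stored on `replication` distinct nodes.

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_0", "blk_1"], nodes)
print(placement["blk_0"])  # ['node1', 'node2', 'node3']
```

Real systems layer rack awareness and re-replication on node failure on top of this basic idea.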
This requires many computers working together, and Windows has never performed well on commodity servers. So besides the concepts of big data themselves, the first thing to learn is the Linux operating system.
There are many kinds of big data processing software, each suited to different needs. Viewed as a whole, the data analysis pipeline has three parts: data collection, data analysis, and result presentation.
Data collection is handled differently for different data types. Hadoop's distributed file storage system is HDFS, so the task is to get the data into HDFS. Text files are the simple case: we can upload them directly.
Data generated by applications is usually stored in databases; we use the Sqoop component to pull it out, and use the Hive data warehouse and the HBase distributed database to manage it. For lack of space I cannot introduce every piece of software here; if you are interested, you can ask questions on my Knowledge Planet.
The analysis stage begins with preparatory work called data cleaning, which can usually be done in HQL. For the analysis itself, if it is simple statistical analysis, we can use the MapReduce computation model that Hadoop encapsulates, or again HQL.
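What data cleaning actually does can be shown in plain Python. In practice the same logic would be an HQL statement over a Hive table; the field names and sample rows below are made up for illustration.

```python
# Toy data cleaning: drop rows with missing keys or malformed values,
# then deduplicate. In production this would be HQL over a Hive table.

raw_rows = [
    {"user_id": "u1", "amount": "35.0"},
    {"user_id": "u1", "amount": "35.0"},   # duplicate record
    {"user_id": None, "amount": "12.5"},   # missing key field
    {"user_id": "u2", "amount": "bad"},    # malformed value
    {"user_id": "u3", "amount": "20.0"},
]

def clean(rows):
    """Filter out invalid rows, parse values, and deduplicate."""
    seen, result = set(), []
    for row in rows:
        if row["user_id"] is None:
            continue                        # missing key: drop the row
        try:
            amount = float(row["amount"])   # parse, dropping malformed values
        except ValueError:
            continue
        key = (row["user_id"], amount)
        if key in seen:
            continue                        # duplicate: keep first occurrence
        seen.add(key)
        result.append({"user_id": row["user_id"], "amount": amount})
    return result

print(clean(raw_rows))
```

Only the two valid, distinct records survive; everything downstream (statistics, modeling) then works on trustworthy input.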
If predictive analysis is needed, you need a computing framework that comes with a machine learning library, such as Spark, and the whole analysis process changes accordingly. Clustering and classification algorithms follow different procedures, which you can explore on your own as you learn further.
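To give a feel for what a clustering algorithm looks like, here is a tiny one-dimensional k-means written in plain Python. This is a teaching sketch under simplifying assumptions (1-D data, fixed iteration count, hand-picked starting centers); real work would use a library such as Spark's MLlib.

```python
# Tiny 1-D k-means: alternate "assign to nearest center" and
# "move center to cluster mean" steps.

def kmeans_1d(points, centers, iterations=10):
    """Run a fixed number of assignment/update iterations on 1-D data."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # assign each point to its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its cluster (keep it if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(kmeans_1d(points, centers=[0.0, 5.0]))  # [2.0, 11.0]
```

The two centers settle on the means of the two obvious groups; classification algorithms differ in that they learn from labeled examples instead of discovering groups.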
Data visualization mostly means using Web components for chart display, which is familiar territory for developers. The main tool is Baidu's open-source project ECharts; the new version in particular provides much better support for rendering datasets in the tens of millions.
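The backend's job in this part is usually just to produce the chart configuration as JSON. Below is a minimal ECharts "option" object built in Python; the front end would pass it to ECharts' `setOption()`. The sample categories and values are made up for illustration.

```python
import json

# Build a minimal ECharts bar-chart option and serialize it to JSON.

def bar_chart_option(categories, values, title):
    """Return an ECharts option dict for a simple bar chart."""
    return {
        "title": {"text": title},
        "xAxis": {"type": "category", "data": categories},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": values}],
    }

option = bar_chart_option(["Mon", "Tue", "Wed"], [120, 200, 150], "Daily visits")
print(json.dumps(option))
```

A typical flow: an HQL or MapReduce job writes aggregated results to a database, a small web service shapes them into an option like this, and the page renders it with ECharts.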
Of course, the above is only the must-master core. Depending on the business scenario, there are also stream data processing, low-latency data analysis, deep learning frameworks, and so on; related tools include Flume, Kafka, Storm, Elasticsearch, and CBoard.
For students still in school: if you are a math major and want to move toward a big data development role, congratulations on a very wise choice. You may be limited by coding ability at first, but mathematical thinking works its influence subtly, and its advantages become obvious later on.
Besides doing well in your major's own courses, put some effort into the related fundamentals: Linux, Java, databases, software engineering, and data structures.
If you want to go further in data analysis itself, then as far as I currently understand the market, enterprises still prefer graduate students who majored in statistics or mathematics.
For graduates of less prestigious universities, it may be hard to land a data analyst job right away; after all, the work requires not only algorithm knowledge but also business experience.
The above is just my personal understanding, offered for your study and reference. Where it falls short, let's discuss and improve it together. Thank you.