About data
In recent years, the data produced by human beings has grown explosively. From the rise of smart mobile devices more than a decade ago to the wearable sensor devices people carry today, large amounts of data are generated around the clock. This data includes text, voice, images, video, and more.
Big data
The term “big data” first appeared in the 1990s, when it merely described very large volumes of data without any precise definition or conceptual meaning. For years after it emerged, big data attracted little attention; it was not until around 2012 that it drew interest from all walks of life, and today many disciplines and industries are involved with it.
In technical terms, the amount of data involved in big data generally exceeds the memory capacity of a single computer, often by hundreds or thousands of times, so specialized tools are needed to process such massive data sets. MapReduce, proposed by Google, can be considered the pioneering work in this area, and the open-source Hadoop that followed became a classic big data processing tool.
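To give a feel for the idea behind MapReduce, here is a minimal, single-machine sketch of the classic word-count example in Python. The function names, the toy documents, and the in-memory "shuffle" are invented for illustration and are not tied to any real framework's API.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in a document."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Toy input standing in for documents spread across many machines.
documents = ["big data needs big tools", "data beats intuition"]
pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(pairs))  # e.g. {'big': 2, 'data': 2, 'needs': 1, ...}
```

In a real MapReduce or Hadoop job, the map and reduce steps run in parallel across many machines, with the framework handling the shuffle between them; the logic per record, however, stays this simple.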
Big data first developed inside large Internet and e-commerce companies. Around 2008, the data these companies collected had become too large for traditional technologies to process and could no longer meet the needs of business growth, so concepts and technologies related to big data were proposed one after another. In 2010, with the advent of Web 2.0 and the popularity of intelligent terminals, the amount of data generated surged further, and big data became woven into everyday social life. In 2012, big data became one of the hottest fields in the world, and many companies at home and abroad put forward big data strategies. Big data formally entered the national development strategy in 2015 and has been developing rapidly ever since.
The core work of big data is prediction: applying mathematical models and algorithms to massive amounts of data to estimate the likelihood that an event will occur.
Features of big data
- Large volume: the amount of data is enormous.
- Many types: the data includes structured, semi-structured, and unstructured data.
- Authenticity: big data must be authentic, otherwise it has no value.
- Timeliness: big data is usually time-sensitive.
Data engineering
Once data has been collected, we process and analyze it from an engineering perspective to extract valuable information and generate business benefits. This process is called data engineering, and it generally proceeds through the following stages (a small end-to-end sketch follows the list):
- Data acquisition: collecting data from different sources and bringing it into a unified system.
- Data storage: persisting the collected data on a storage medium, such as a hard disk.
- Data cleaning: processing data that does not conform to the specification so that it becomes accurate, complete, and consistent.
- Data modeling: defining the processes needed to meet the data requirements of the business, typically with the involvement of a business modeler.
- Data processing: operations such as collection, storage, retrieval, transformation, and transmission that extract valuable data from massive amounts of data.
- Data analysis: using data mining techniques to obtain valuable information from the massive data.
- Data visualization: presenting the data to users in an intuitive, visual way.
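As a toy illustration of the cleaning, processing, and analysis steps above, here is a minimal sketch using pandas. The column names and records are invented for the example and do not come from any real data set.

```python
import pandas as pd

# Hypothetical raw records "acquired" from different sources; fields are made up.
raw = pd.DataFrame({
    "user_id": [1, 2, 2, 3, None],
    "purchase_amount": [35.0, 12.5, 12.5, None, 20.0],
    "channel": ["web", "app", "app", "web", "web"],
})

# Cleaning: drop records with missing key fields and remove exact duplicates.
clean = raw.dropna(subset=["user_id", "purchase_amount"]).drop_duplicates()

# Processing/analysis: aggregate spending per channel.
summary = clean.groupby("channel")["purchase_amount"].agg(["count", "sum", "mean"])
print(summary)  # a simple analysis result that could then be visualized
```

In a real pipeline each stage would be far larger (distributed storage, scheduled jobs, dashboards), but the shape of the work is the same: ingest, clean, aggregate, and present.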
Artificial intelligence (AI)
Artificial intelligence was formally proposed at the Dartmouth workshop in 1956. It studies how to build intelligent machines or how to simulate intelligent human behavior. For an introduction to artificial intelligence and its development, see the earlier article “Understanding artificial intelligence – subject introduction, history of development, three schools of thought”.
Major areas of AI
- Pattern recognition: using a computer to extract features from data samples, learn a model, and then classify new samples according to that model.
- Machine learning: giving machines the ability to learn, and thereby intelligence; it draws on cognitive science, neuropsychology, logic, and other fields.
- Machine translation: a branch of computational linguistics involving linguistics, computer science, cognitive science, information theory, and other disciplines.
- Natural language processing: enabling machines to understand and generate natural language in the same way that humans do.
- Computer vision: enabling a computer to perceive its environment through images, for example recognizing objects and determining their shape, position, posture, and movement, and then understanding the scene further.
- Expert system: a system that encodes a large amount of knowledge and experience in a specific field and, like a human expert with rich professional knowledge and experience, can quickly solve problems in that field.
Big data and AI
Big data and artificial intelligence are inseparable. The development of big data cannot do without artificial intelligence: without AI, big data cannot become intelligent. Conversely, the development of artificial intelligence cannot do without data, since it needs massive amounts of data as the basis for reasoning and decision-making. It is generally believed that the three foundations of artificial intelligence are data, algorithms, and computing power, with computing power as a foundation of a different dimension: without the rapid development of hardware and parallel computing, this wave of artificial intelligence would not have happened, because no matter how good an algorithm is, it has no practical value without sufficient computing power.
Machine learning vs artificial intelligence
Generally speaking, machine learning is a subset of artificial intelligence and one way to achieve it. Machine learning, in turn, inevitably involves deep learning, a subset of machine learning that has become popular in recent years. Their relationship is thus like a set of Russian dolls, nested layer within layer.
Machine learning
Machine learning began with the study of how to perform a task without writing explicit instructions for it, instead letting a machine acquire the capability by learning from data. Machine learning starts from known data features, uses mathematical methods such as probability and statistics to derive a rule, and then uses that rule to carry out a prediction task. Put simply, it uses a mathematical expression over data features to represent something.
Machine learning is formally defined as: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Machine learning focuses on how to make a machine learn rules from past data samples through programming, so that it can predict or make decisions about the future; in other words, the goal is a task-executing program that can optimize itself according to experience (data) under the guidance of certain criteria. For example, we collect many different pictures of cats and dogs, and the machine learns patterns from those pictures on its own, so that it can then recognize cats and dogs. A minimal code sketch of this learn-then-predict loop follows.
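To make the learn-then-predict idea concrete, here is a minimal sketch using scikit-learn. The two made-up features, the labels, and the choice of logistic regression are all assumptions for illustration; real image features would come from a much richer representation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Experience E: each row is an invented feature vector for one animal photo,
# e.g. [ear pointiness, snout length]; labels are 0 = cat, 1 = dog.
X_train = np.array([[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.3, 0.8]])
y_train = np.array([0, 0, 1, 1])

# Task T: classify cat vs. dog. Fitting derives a rule from the data.
model = LogisticRegression()
model.fit(X_train, y_train)

# Performance P would be measured on held-out samples; here we just predict.
X_new = np.array([[0.85, 0.25], [0.25, 0.85]])
print(model.predict(X_new))        # expected: [0 1], i.e. cat then dog
print(model.predict_proba(X_new))  # predicted class probabilities
```

The pattern is the same regardless of model: collect experience (data), fit a rule, and use the rule to predict unseen cases.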