First of all, big data is reflected in the sheer size of the data

In 2011, the total amount of data held by China’s Internet industry reached 1.9 EB (1 EB is equivalent to 1 billion GB). In the same year, the total amount of data created and replicated worldwide was 1.8 ZB (1.8 trillion GB). By 2015, that figure had reached 8.6 ZB, and the amount of data stored in the world’s electronic devices is predicted to soar to 40 ZB.
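To keep these units straight, here is a quick sketch of the decimal storage prefixes involved, using nothing but plain Python arithmetic:

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
GB = 10**9   # gigabyte, in bytes
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte:   1 EB = 1 billion GB
ZB = 10**21  # zettabyte: 1 ZB = 1 trillion GB

print(EB // GB)             # 1000000000 -> 1 EB is a billion GB
print(18 * ZB // 10 // GB)  # 1800000000000 -> 1.8 ZB is 1.8 trillion GB
```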

So where do these huge numbers come from?

With the acceleration of global digitalization and networking, the Internet has spread into every industry, and the amount of accumulated data keeps growing. All of this data comes from our daily lives and aggregates into big data.

What are the characteristics of big data?

Big data is characterized not only by quantity, but also by speed, diversity, and value.

Volume – large quantities

According to IDC’s estimates, the volume of data has been growing at **50%** a year, which means it multiplies by 1.5 × 1.5 = 2.25, roughly doubling every two years (a “Moore’s Law” of big data).
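The “doubling every two years” claim follows directly from compounding the 50% annual rate, as this small sketch shows:

```python
GROWTH_RATE = 0.50  # IDC's estimated 50% annual growth

def volume_after(years, start=1.0):
    """Data volume after compounding `years` of 50% annual growth."""
    return start * (1 + GROWTH_RATE) ** years

print(volume_after(2))   # 2.25 -> a bit more than doubled in two years
print(volume_after(10))  # about 57.7x after a decade
```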

Velocity – fast

The one-second law: a large amount of data must be processed within one second of arriving, or it loses its value to the business. This is also a fundamental difference from traditional data-mining techniques.
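The one-second law can be illustrated with a toy stream handler that refuses to process records older than its one-second budget. The names and structure here are illustrative only, not taken from any real streaming framework:

```python
import time

ONE_SECOND = 1.0  # processing budget, in seconds

def process_stream(records, handle):
    """Handle each (arrival_time, payload) record only while it is still
    fresh; stale data has lost its value, so it is skipped, not processed
    late."""
    processed, dropped = 0, 0
    for arrived_at, payload in records:
        if time.monotonic() - arrived_at <= ONE_SECOND:
            handle(payload)
            processed += 1
        else:
            dropped += 1  # older than one second: no longer valuable
    return processed, dropped

# One fresh record and one that "arrived" two seconds ago.
now = time.monotonic()
records = [(now, "fresh click"), (now - 2.0, "stale click")]
print(process_stream(records, handle=print))  # (1, 1)
```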

Variety – diverse

Big data is made up of structured and unstructured data

  • 10% structured data, stored in databases
  • 90% unstructured data, closely tied to everyday human information

There are many types of unstructured data: emails, videos, tweets, phone calls, web clicks, and so on.

Value – valuable

Low value density, but high commercial value: in continuous, uninterrupted monitoring, the useful data may span only a second or two, yet that data can carry high commercial value.

What can Big Data do?

Big data is a new capability

It represents a way of thinking completely different from traditional “small data”: it does not demand a precise answer, but a macro-level view. A single data point has no value, but as more and more data accumulates, quantity turns into quality. This new capability has advantages that traditional data analysis and data storage cannot match. Moving from megabytes of data to petabytes of data requires a complete refactoring of storage and computation from the bottom up, and that is what this new capability represents.

Big data applications

By analyzing large amounts of data, we can predict trends, gauge the popularity of products, support macro-level regulation of the market economy, build smart transportation and smart homes, deliver precisely targeted advertising, and so on.

Summary

To sum up, big data means both after-the-fact comparison of data and real-time processing of data. Big data analysis has three features:

  • Full sample instead of sampling
  • Efficiency rather than precision
  • Correlation, not causation
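The first feature, full sample instead of sampling, can be made concrete: when computing over all the data is cheap, you no longer estimate a statistic from a small sample, you simply compute it over the whole population. A sketch with made-up data:

```python
import random

random.seed(42)
# A "full sample" population: one value per user (e.g. daily page views).
population = [random.randint(0, 100) for _ in range(1_000_000)]

# Traditional approach: estimate the mean from a small random sample.
sample = random.sample(population, 1_000)
estimate = sum(sample) / len(sample)

# Big data approach: just compute over everything.
exact = sum(population) / len(population)

print(round(estimate, 2), round(exact, 2))  # close, but not identical
```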

The love-hate relationship between big data and related technologies

From databases to big data

== Pond fishing (database) vs. ocean fishing (big data) ==

The data size

The most obvious difference between the “pond” and the “sea” is size. Ponds are relatively small; even what used to be considered a large pond, such as a Very Large Database (VLDB), remains small compared to the ocean (XLDB, an extremely large database). A “pond” is usually measured in MB, while the “sea” is often measured in GB, or even TB and PB.

The data type

In the past, the “pond” held only one or a few types of data, mainly structured data. The “sea” holds thousands of kinds of data, including structured, semi-structured, and unstructured data, with semi-structured and unstructured data accounting for an ever-growing share.

Relationships between schemas and data

In traditional databases, the schema exists before the data is generated. It is like choosing the right “pond” first and then releasing the right “fish”, fish that will thrive in that “pond”. In the era of big data, however, the schema often cannot be determined in advance; it emerges only after the data appears, and it keeps evolving as the data volume grows. It is like starting with a small number of fish whose number and species keep growing over time; the changing fish keep the composition and environment of the sea in constant flux.
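This contrast is often called schema-on-write versus schema-on-read. A minimal sketch of both, where the `SCHEMA` dict and the helper functions are hypothetical and for illustration only:

```python
import json

# Schema-on-write (the "pond"): the schema exists before the data,
# and records that do not fit it are rejected at insert time.
SCHEMA = {"name": str, "age": int}

def insert(row):
    if set(row) != set(SCHEMA) or any(
        not isinstance(row[k], t) for k, t in SCHEMA.items()
    ):
        raise ValueError(f"row does not match schema: {row!r}")
    return row  # a real database would write this to a table

# Schema-on-read (the "sea"): store raw records as they arrive, and
# impose a structure only when reading, evolving it as the data evolves.
raw_store = ['{"name": "Ann", "age": 30}', '{"name": "Bo", "city": "Oslo"}']

def read(field, default=None):
    return [json.loads(r).get(field, default) for r in raw_store]

print(read("city"))  # [None, 'Oslo'] -- a field the original schema never had
```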

The processing object

In “pond” fishing, the “fish” is merely the object of the catch. In the “sea”, the “fish” is not only the object of fishing; the presence of some “fish” is also used to infer the existence of other kinds of “fish”. That is, data in a traditional database is only a processing object, while in the era of big data, data is a resource that helps solve problems in many other fields.

Big data and cloud computing

Two sides of the same coin

Big data and cloud computing are closely related and complement each other, and the two share key technologies. “Cloud computing” appeared first.

Massive data storage, massive data management, and the MapReduce programming model are the key big data technologies within cloud computing.
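The MapReduce model just mentioned can be sketched in a few lines: a map step emits key-value pairs, a shuffle groups values by key, and a reduce step aggregates each group. This is a single-process toy using the classic word-count example, not a distributed implementation:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a ("word", 1) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would do
    # when moving intermediate results between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big value", "cloud computing and big data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(shuffle(pairs)))  # {'big': 3, 'data': 2, ...}
```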

The relationship between big data and cloud computing is as inseparable as the two sides of a coin. Big data cannot be processed by a single computer and must adopt a distributed architecture; its defining feature is distributed mining of massive data. That, in turn, relies on cloud computing’s distributed processing, distributed databases, cloud storage, and virtualization technology.

You can understand the relationship like this: cloud computing technology is a container, and big data is the water stored in that container; big data is stored and computed by means of cloud computing technology.

Different goals

  • Big data aims to discover value; cloud computing aims to save IT costs.
  • Cloud computing focuses more on “computing mode”, while big data focuses more on “data resources”.

Big data challenges

Storage

In real production, data in some industries involves hundreds of parameters. Its complexity lies not only in the data samples themselves but also in the dynamic interactions among multiple heterogeneous sources, multiple entities, and multiple spaces, which traditional methods struggle to describe and measure. Storing such highly heterogeneous data effectively is a challenge.

Processing

With the advent of the big data era, the rapid growth of semi-structured and unstructured data has posed great challenges to traditional analysis techniques, including:

  • Timeliness of data processing
  • Index design in dynamically changing environments
  • The lack of prior knowledge

Reference

Big Data and cloud computing