Introduction: 40 years of technological change, and the trends, status quo, and challenges of the Internet of Things industry
Infrastructure improvements drive changes in application forms
We divide the past four decades into five phases of technology development along a timeline: 1980-2000, 2000-2005, 2005-2010, 2010-2020, and 2020-2025. Today’s press conference is about the fifth stage, looking toward the future from the past. So let’s first review the history of technological development. What were the technological developments of the first four stages? What were the main scenarios for applying technology? What were the mainstream application forms? What new technologies and products emerged?
The stage from 1980 to 2000 was the stage of computer technology development and application. Computers helped enterprises manage their data better and greatly improved process efficiency, so the applications of this stage were mainly enterprise information systems. Many enterprise-oriented technology companies were born in this stage, and great commercial products were created. Application data was mainly stored in databases, chiefly relational databases. This stage was the period in which relational database theory and products matured, and commercial database products such as Oracle, IBM DB2, and Microsoft SQL Server were born.
The stage from 2000 to 2005 was the initial development stage of Internet technology. Information could be transmitted more effectively through the Internet, so a large number of portal sites were born at this stage, the Web 1.0 era. LAMP (Linux + Apache + MySQL + PHP) was the most popular site-building stack at the time, a low-cost solution made entirely of open source products. Application data was still stored in relational databases. MySQL, with its advantages of being open source and low cost, replaced commercial relational databases and became widely used. As the amount of information on the Internet grew, people had an increasingly strong need to find useful information, so the search engine was born as a new application form. The search engine was the first application to face the challenge of large-scale data processing. As a great technology company, Google pioneered many big data processing technologies. The well-known troika (GFS, MapReduce, and Bigtable) laid the foundation for the development of NoSQL and big data technologies over the next decade.
Between 2005 and 2010, more and more people were connected to the Internet as personal computers became more popular and the cost of Internet access fell. As a result, some new application forms were born. First, people were no longer satisfied with one-way acquisition of information from the Internet and were more eager to exchange information with each other online, which promoted the development of social networks. Second, applications centered on “crowds”, such as B2C and C2C e-commerce sites, began to develop. This is the so-called Web 2.0 era, when “people” became a new data source and began to generate large amounts of data on the Internet. This period produced some very large Internet companies, mainly in e-commerce and social networking. These companies faced the challenge of how to serve such massive data online and how to process and analyze it. At that time there were no mature, usable solutions, so these companies had to develop their own systems. Some popular NoSQL and big data systems were therefore born or incubated inside the large Internet companies of this era. For example, Hadoop was first incubated inside Yahoo, and Cassandra was first used in the Facebook inbox search scenario.
From 2010 to 2020, with the development of 4G technology and the popularity of smartphones, the mobile Internet began to develop. People could connect to the Internet anytime and anywhere, and mobile applications reached a wider range of people and penetrated more everyday scenarios, such as payment and ride hailing. Traditional Internet applications gradually shifted to mobile, creating a large demand for building applications. Cloud platforms were accepted by more enterprises as low-cost, easily accessible data centers, and this decade was also a golden decade for cloud computing. Cloud computing radically changed the environment in which applications run. Unlike in traditional IDCs, computing and storage resources are pooled, and multiple types of storage and computing resources can be obtained flexibly. Applications built on elastic cloud resources are called “cloud native” applications. More and more big data and database products are built in a cloud native way. To be elastic and scalable, they also make wide use of distributed technologies. For modern big data and database products, distributed and cloud native capabilities are a must.
Finally, in the period from 2020 to 2025, we can see 5G and IoT technology gradually maturing, and a new application form, the Internet of Things, will be born. New application scenarios we can already see include the Internet of Vehicles, the Industrial Internet of Things, and the smart home.
To summarize the evolution of technologies and products over the past decades (infrastructure technology -> wider reach of information -> more scenarios and larger data scale -> technology and product development):
Each stage starts with the improvement and popularization of a piece of “infrastructure”, whose core role is to further widen the reach of informatization. For example, the Internet allowed applications to connect to more terminals. The mobile Internet broke down the barrier of terminals and connected applications directly to more people, while the Internet of Things will bring more devices into the connection.
As the reach of informatization widens, more new application scenarios are born, and a larger number of “individuals” produce data on a larger scale, which becomes the driving force for the development of basic technology.
In this process, basic technology often lags behind the development of application forms. However, with the popularization of distributed technology and cloud computing, basic technology is evolving and spreading faster and faster. We can also see a change in the form of basic technology products, from the earliest commercial products, to open source products, to today’s cloud native products.
So in the new stage of the Internet of Things, the number of devices and the volume of data generated will be larger, and the challenges will be greater. What kind of technological development will these challenges drive?
The Internet of Things industry is about to develop rapidly: what technical challenges will it face?
Let’s look at the rapid development and overall growth trend of the Internet of Things through the following two market reports:
Massive growth in devices: Gartner predicts that the number of devices in the Internet of Things will grow to 25 billion by 2021. Managing so many devices is the first challenge.
Large-scale growth of device data: an IDC report shows that the data scale of the Internet of Things will reach 79.4 ZB by 2025, with an average annual growth rate of 34.91%. How to store and analyze such a huge amount of data is the second challenge.
Defining data storage requirements, using the Internet of Vehicles as an example
The theme of this conference is Internet of Things storage solutions, so let’s look at the specific requirements for data storage in Internet of Things scenarios, taking a real application scenario from the Internet of Vehicles as an example. Suppose you are a new energy vehicle company that provides an online ride-hailing service and manages hundreds of thousands of new energy vehicles every day. You will encounter the following specific scenarios.
To make these vehicles easier to manage, each vehicle has to report its status in real time, including location, remaining battery, mileage, speed, and so on. In addition to this dynamic information, each car also has its own static information, such as model and owner, which the back end needs to acquire and manage in real time.
With the real-time status information of these vehicles, real-time status query services can be provided to owners, passengers, or the management back end. Some computing tasks in the back end also depend on real-time status, such as selecting vehicles by location or other specific conditions to carry out particular tasks, or scheduling vehicles according to their real-time status.
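To make the vehicle selection step concrete, here is a minimal Python sketch that filters the latest reported status of each vehicle by distance and remaining battery. The VehicleStatus fields, the select_candidates function, and the thresholds are assumptions made for illustration only. A real back end would run this kind of multi-condition query against a store with suitable indexes rather than scanning a list in memory.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass
class VehicleStatus:
    """Latest reported status of one vehicle (hypothetical fields)."""
    vehicle_id: str
    lat: float
    lon: float
    battery_pct: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def select_candidates(statuses, lat, lon, radius_km, min_battery_pct):
    """Return vehicles within radius_km that have enough battery, nearest first."""
    scored = []
    for s in statuses:
        if s.battery_pct < min_battery_pct:
            continue  # not enough charge for the task
        distance = haversine_km(lat, lon, s.lat, s.lon)
        if distance <= radius_km:
            scored.append((distance, s))
    scored.sort(key=lambda pair: pair[0])
    return [s for _, s in scored]
```

The two filter conditions here (location and battery) stand in for the multi-field conditions that scheduling tasks typically combine.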
In addition to reporting its own real-time status, each vehicle also needs to maintain a message channel with the management back end. Through this channel, the vehicle reports abnormal events, and the back end can send messages or control instructions.
In addition, some of the vehicle’s driving information is reported and stored as track data, and some sensor data collected while driving also needs to be stored. With this data, on the one hand, the driving track can be queried; on the other hand, calculation and analysis can be carried out on the data to extract more value. For example, the scheduling algorithm can be optimized by analyzing historical driving data.
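The access pattern of track data (append points as they arrive, then query a time range for one vehicle) can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not a description of any particular storage product: the TrackStore class and its methods are hypothetical, and points are assumed to arrive in timestamp order for each vehicle.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TrackStore:
    """Append-only track points per vehicle, queryable by time range."""

    def __init__(self):
        self._times = defaultdict(list)   # vehicle_id -> sorted timestamps
        self._points = defaultdict(list)  # vehicle_id -> (ts, lat, lon) tuples

    def append(self, vehicle_id, ts, lat, lon):
        """Append one point; assumes in-order arrival per vehicle."""
        times = self._times[vehicle_id]
        if times and ts < times[-1]:
            raise ValueError("out-of-order point")
        times.append(ts)
        self._points[vehicle_id].append((ts, lat, lon))

    def query(self, vehicle_id, start_ts, end_ts):
        """Return all points with start_ts <= ts <= end_ts for one vehicle."""
        times = self._times.get(vehicle_id, [])
        lo = bisect_left(times, start_ts)
        hi = bisect_right(times, end_ts)
        return self._points.get(vehicle_id, [])[lo:hi]
```

Because each vehicle’s timestamps stay sorted, a range query only needs two binary searches, which is the property that storage engines for this kind of data generally rely on at much larger scale.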
These scenarios show that vehicle-related data falls into three main categories. The first is real-time status data, which we classify as “metadata”. The second is message channel data, which we classify as “message data”. The third is trajectory data, which we classify as “time series data”. The three types of data place different requirements on the underlying storage. “Metadata” is updated frequently and places high demands on query capability, because data needs to be queried or filtered by conditions on multiple fields. “Message data” resembles a message queue: there are a large number of queues, and an independent queue needs to be maintained for each device. “Time series data” is characterized by high write throughput and large data scale, with more emphasis on analysis scenarios.
In traditional schemes, metadata is stored in MySQL. However, the biggest problem with MySQL here is that it cannot flexibly support multi-field filtering, so Elasticsearch is generally added to provide multi-field retrieval. Although message data has the characteristics of a message queue, a traditional message queue cannot be used, because it cannot support such a large number of topics, so MySQL is generally used to simulate the queues. HBase is generally used to store time series data, since it provides high-throughput writes and supports large-scale storage. However, HBase does not have analysis capabilities.
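As an illustration of what simulating per-device queues on a relational database can look like, here is a minimal Python sketch. It uses SQLite from the standard library as a stand-in for MySQL so that it runs without a server, and the table names, column names, and functions (open_store, publish, poll, ack) are all assumptions made for this example rather than a scheme taken from the source.

```python
import sqlite3


def open_store():
    """Create an in-memory database with message and consumer-offset tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE device_messages (
               device_id TEXT NOT NULL,
               seq INTEGER NOT NULL,
               payload TEXT NOT NULL,
               PRIMARY KEY (device_id, seq))"""
    )
    conn.execute(
        """CREATE TABLE device_offsets (
               device_id TEXT PRIMARY KEY,
               last_seq INTEGER NOT NULL)"""
    )
    return conn


def publish(conn, device_id, payload):
    """Append a message to one device's queue and return its sequence number."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) FROM device_messages WHERE device_id = ?",
            (device_id,),
        ).fetchone()
        seq = row[0] + 1  # single-writer assumption; no locking in this sketch
        conn.execute(
            "INSERT INTO device_messages VALUES (?, ?, ?)",
            (device_id, seq, payload),
        )
    return seq


def poll(conn, device_id, limit=10):
    """Return up to limit unacknowledged (seq, payload) rows for one device."""
    row = conn.execute(
        "SELECT last_seq FROM device_offsets WHERE device_id = ?", (device_id,)
    ).fetchone()
    last_seq = row[0] if row else 0
    return conn.execute(
        "SELECT seq, payload FROM device_messages "
        "WHERE device_id = ? AND seq > ? ORDER BY seq LIMIT ?",
        (device_id, last_seq, limit),
    ).fetchall()


def ack(conn, device_id, seq):
    """Record that the device has consumed everything up to seq."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO device_offsets VALUES (?, ?)", (device_id, seq)
        )
```

Each device is effectively its own topic, keyed by device_id, with its own consumption position. The sketch also hints at the cost: ordering, offsets, and cleanup of consumed rows all become the application’s responsibility.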
Basic technologies often lag behind the development of application forms. The traditional architecture builds the entire IoT storage system by combining multiple products. This multi-component architecture is complex and expensive to operate and maintain. Developers need to understand and use multiple products, and distributed components are difficult to operate and maintain, resulting in high overall costs. Moreover, none of these components was designed or optimized for IoT scenarios. In IoT scenarios, device metadata, message data, and time series data have very typical characteristics, and overall scale grows far faster than it did in the Internet era, so the older products cannot cope with the larger data scale of the Internet of Things.
What kind of storage products does the Internet of Things need?
Following the pattern of technology product development over the past decades, the era of the Internet of Things has arrived, and the current technical architecture cannot support its future growth in scale. Facing the new application form of the Internet of Things, with everything connected and with the challenges of massive numbers of devices and massive data, what new basic products do we need to create on cloud computing, the new generation of basic platform, using cloud native, distributed, and other basic technologies?
We want this new basic product to have the following features:
Built on cloud native and distributed technology, it is scalable and elastic
It can meet the requirements of one-stop storage, retrieval, and analysis of device metadata, message data, and time series data
The cost is low enough to support such a huge amount of data
This article is original content of Alibaba Cloud and may not be reproduced without permission.