The era of big data is dying as the focus shifts from collecting data to processing it in real time. Big data is now a business asset in its own right, laying the foundation for the coming eras of cloud, machine learning, and real-time analytics.
The era of raw big data came to an end on June 5, 2019, when Tom Reilly announced he was stepping down from Cloudera and the company's market value dropped. MapR, meanwhile, has announced that it may close its doors, and its continued operation depends on finding a buyer. Together, these events strongly suggest that the original big data era, driven by Hadoop, ended in June 2019.
Big data will be remembered for enabling social media's rise to dominance and for fundamentally changing how businesses think about handling large amounts of data. It also established data analytics, data quality, and data management as yardsticks for measuring enterprise assets.
Even as we eulogize the era of big data, it is important to emphasize that big data technology is not actually dead. Rather, now that it has established itself in the enterprise, the Hadoop-driven era of raw big data has matured: big data is no longer part of a never-ending, high-speed hype cycle but a mature technology.
Changes in Google search volume for "Big Data" and "Hadoop"
Birth of big data
In 2006, Apache Hadoop went into use, ushering in the era of big data. At the time, developers and architects saw it as a tool that could help process and store large volumes of structured and semi-structured data. It marked a fundamental shift in how people thought about enterprise data, moving beyond the guarantees of traditional enterprise databases such as ACID (atomicity, consistency, isolation, and durability). The change in data use cases came from companies realizing that data they had previously discarded or locked away might actually help them understand customer behavior, propensity to act, risk factors, and complex organizational, environmental, and business practices.
Hadoop's commercial value first became apparent in 2009, when Cloudera released a commercial distribution, followed by MapR, Hortonworks, and EMC Greenplum (now Pivotal HD). But while analysts projected big data as a potential $50-billion-plus market, Hadoop as an analytics tool eventually ran into challenges over the decade that followed.
Hadoop's challenges in the enterprise
While Hadoop is useful for large-scale storage, ETL (extract, transform, and load) jobs, and batch-oriented machine learning tasks, it is not the best choice for the more traditional analytical work that enterprises and large organizations rely on for daily decision-making. Tools such as Hive, Dremel, and Spark are better suited to analytics than raw Hadoop, and Hadoop isn't fast enough to truly replace the data warehouse.
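To make the contrast concrete, here is a minimal sketch of the kind of interactive analytical rollup that SQL-style engines such as Spark handle far more naturally than hand-written MapReduce jobs. The input path and column names (orders.parquet, order_date, amount) are hypothetical, used only for illustration.

```python
# Minimal PySpark sketch: a daily-revenue rollup expressed declaratively.
# The same logic as a hand-coded Hadoop MapReduce job would need a mapper,
# a reducer, and a driver; here it is a few lines. Path and columns are
# hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_revenue").getOrCreate()

orders = spark.read.parquet("hdfs:///data/orders.parquet")  # hypothetical path

daily_revenue = (
    orders
    .groupBy("order_date")                  # one row per day
    .agg(F.sum("amount").alias("revenue"))  # total sales per day
    .orderBy("order_date")
)

daily_revenue.show(10)
spark.stop()
```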
Hadoop also faces other challenges. NoSQL database and object storage providers have made significant strides in addressing the storage and management problems Hadoop was originally designed to solve. And over time, its gaps in business continuity, flexibility, and support for real-time, geospatial, and other emerging analytics use cases have made it difficult for Hadoop to grow beyond batch processing of large data volumes.
In addition, enterprises have found over time that more and more big data problems involve a wide range of data sources; rapid adjustments to schemas, queries, and definitions; and scenarios tied to new applications, platforms, and cloud vendors. Addressing these challenges requires analytics, integration, and replication that are faster and more agile. A number of vendors emerged to meet this need, including:
· Analytics solution providers such as ClearStory Data, Domo, Incorta, Looker, Microsoft Power BI, Qlik, Sisense, Tableau and ThoughtSpot
· Data pipeline providers such as Alooma, Attunity, Alteryx, Fivetran and Matillion
· Data integration providers including Informatica, MuleSoft, SnapLogic, Talend and TIBCO (which also competes in analytics through its Spotfire portfolio)
It is no coincidence that companies like these are in the spotlight for both acquisitions and funding. Recent examples include, but are not limited to:
· ThoughtSpot raised $145 million in Series D funding in May 2018
· Sisense raised $80 million in Series E funding in September 2018
· Incorta extended a $15 million Series B funding round in October 2018
· Fivetran raised $15 million in Series A funding in December 2018
· Looker raised $103 million in Series E in December 2018
· TIBCO acquired Orchestra Networks in December 2018
· Logi Analytics acquired Jinfonet in February 2019
· Google acquired Alooma in February 2019
· Qlik acquired Attunity in February 2019
· Informatica acquired AllSight in February 2019
· TIBCO acquired SnappyData in March 2019
· Alteryx acquired ClearStory Data in April 2019
· Matillion raised $35 million in Series C financing in June 2019
· Google plans to acquire Looker in June 2019
· Salesforce plans to acquire Tableau in June 2019
· Logi Analytics acquired Zoomdata in June 2019
The success of these companies reflects the market's need for analysts, data, and flexible platforms that can extract analytical value, in context, from data spread across different clouds and sources. Expect more of this activity in 2019: some of these companies are private-equity-owned or have taken on significant VC funding and need to exit soon enough for their backers to recycle capital into future investments.
As the era of big data dies, we will increasingly enjoy the benefits of the eras that follow it: cloud, machine learning, and real-time, ubiquitous context.
In the multi-cloud era, there is a growing need to support existing applications and platforms across multiple clouds, along with continuous service and business continuity. The "there's already an app for that" mentality has left enterprises averaging roughly one SaaS application per employee, which means each large enterprise supports the data and traffic of thousands of SaaS applications. The growth of back-end containerization has driven a trend toward decentralized, specialized storage and workload environments that support on-demand and peak usage.
The era of machine learning is characterized by an emphasis on analytical models, algorithms, model training, deep learning, and the ethics of algorithmic and deep learning techniques. Much of the groundwork for machine learning is the same data cleansing required for analytics, but additional mathematical, business, and ethical work is needed to create lasting, long-term value.
The era of real-time and ubiquitous context raises the bar for timeliness from both an analytical and an interactive perspective. From an analytics standpoint, weekly or daily updates to a company's analytical processes are no longer sufficient; employees need near-real-time updates or they risk making decisions on outdated information. Using real-time analytics effectively requires a broad range of business data to provide the right context and to drive analysis based on the data and the question at hand. Ubiquity also demands interaction, requiring the Internet of Things to provide more edge-level views of environments and machine activity, and requiring the ongoing augmentation of the real world, including augmented and virtual reality, to give users immersive context. Delivering this level of interaction means analyzing data at interaction speeds, on the order of 300-500 milliseconds, to provide effective behavioral feedback.
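As a rough, self-contained illustration of that latency budget (the 500 ms window, thresholds, and simulated event values below are invented assumptions, not a prescribed design), a feedback loop of this kind keeps a short rolling window of events and turns it into a simple behavioral cue on every new event:

```python
# Minimal sketch of a sub-second feedback loop: events arrive continuously,
# a ~500 ms rolling window is aggregated, and a simple "buy now" / "slow down"
# cue is emitted. All values and thresholds are hypothetical.
import time
import random
from collections import deque

WINDOW_SECONDS = 0.5          # keep roughly the last 500 ms of events
ALERT_THRESHOLD = 100.0       # hypothetical "buy now" trigger

window = deque()              # (timestamp, value) pairs inside the window

def ingest(value: float, now: float) -> None:
    """Add an event and evict anything older than the window."""
    window.append((now, value))
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()

def feedback() -> str:
    """Turn the current window into a simple behavioral cue."""
    if not window:
        return "no data"
    avg = sum(v for _, v in window) / len(window)
    return "buy now" if avg > ALERT_THRESHOLD else "slow down"

# Simulated event stream; in practice events would come from an IoT gateway,
# clickstream, or message bus rather than random numbers.
for _ in range(20):
    ingest(random.uniform(80, 120), time.monotonic())
    print(feedback())
    time.sleep(0.05)          # ~50 ms between events, well under the budget
```

In a production system the plain loop would be replaced by a stream processor or message bus, but the shape of the problem stays the same: a small window, a fast aggregate, and an immediate cue.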
As the era of big data comes to an end, attention can shift from how large amounts of data are collected to the myriad challenges of processing, analyzing, and interacting with that data in real time. Here are a few points to keep in mind when stepping into the eras that follow big data.
First, Hadoop still has a place in enterprise data. Amalgam Insights predicts that MapR will eventually become an IT-management-software company in the mold of BMC, CA, or Micro Focus, and believes Cloudera has taken steps to improve enterprise Hadoop to support the next era of data. But technology is relentless, and Cloudera's sticking point is whether it can transform quickly enough. Cloudera faces its own digital-transformation challenge as it develops its enterprise data platform for the next generation of research and machine learning. In past decades, companies could set their own timelines for such transitions; today, successful technology companies such as Amazon, Facebook, and Microsoft must be prepared to reinvent themselves every decade, even cannibalizing parts of their own business to stay viable.
Second, the need for cloud analytics and data visualization is greater than ever. Google and Salesforce recently spent a combined $18 billion on Looker and Tableau, essentially mark-to-market acquisitions priced on scale and revenue growth. Billions more will be spent on analytics programs that work with data from a variety of sources and support the increasingly fragmented and diverse storage, compute, and integration needs that come with the cloud. This means companies must decide to what extent their strategic data integration, data modeling, analytics, and machine learning/data science teams will tackle these problems, because processing and analyzing heterogeneous data keeps getting harder and more complicated, yet it still has to support strategic business requirements if data is to deliver real strategic advantage.
Third, machine learning and data science are the next generation of analytics, and they require distinctly new kinds of data management work. Creating test data, synthetic data, and masked data at scale, along with lineage, governance, parameter and hyperparameter definitions, and records of algorithm use, requires going beyond traditional big data practices. The biggest concern is using data that is unfit for the business because of small sample sizes, insufficient data sources, unclear data definitions, poor data context, or inaccurate algorithmic and classification assumptions. In other words, don't use data that lies. Lying data can produce biased, non-compliant, and inaccurate results, with consequences comparable to Nick Leeson's destruction of Barings Bank in 1995 or the roughly $7 billion trading loss Société Générale suffered at the hands of Jérôme Kerviel. AI is the new potential "rogue trader" and needs to be properly monitored, managed, and supported.
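As a hedged sketch of what "don't use data that lies" can mean in practice (the thresholds, column names, and pandas-based approach are all illustrative assumptions, not a prescribed method), a pipeline can simply refuse to train when the sample is too small, too incomplete, or too skewed:

```python
# Hypothetical pre-training data checks: block training on samples that are
# too small, too incomplete, or too imbalanced. Thresholds and column names
# ("label") are illustrative assumptions.
import pandas as pd

MIN_ROWS = 10_000            # assumed minimum sample size
MAX_MISSING_RATIO = 0.05     # assumed tolerance for missing values per column
MIN_CLASS_RATIO = 0.10       # assumed floor for the rarest label's share

def validate_training_data(df: pd.DataFrame, label_col: str = "label") -> list:
    """Return a list of data-quality problems; an empty list means 'train'."""
    problems = []
    if len(df) < MIN_ROWS:
        problems.append(f"only {len(df)} rows; need at least {MIN_ROWS}")
    for col, ratio in df.isna().mean().items():
        if ratio > MAX_MISSING_RATIO:
            problems.append(f"column '{col}' is {ratio:.1%} missing")
    if label_col in df.columns:
        class_share = df[label_col].value_counts(normalize=True)
        if class_share.min() < MIN_CLASS_RATIO:
            problems.append(f"rarest class covers only {class_share.min():.1%} of rows")
    else:
        problems.append(f"label column '{label_col}' not found")
    return problems

# Example usage with a toy frame; real pipelines would pull from governed sources.
df = pd.DataFrame({"label": [0, 1] * 50, "feature": range(100)})
for issue in validate_training_data(df):
    print("BLOCK TRAINING:", issue)
```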
Fourth, the real-time and ubiquitous environment is a challenge not only for data but also for collaboration and technology. We are moving into a world where every object, process, and conversation can be tagged, captioned, or augmented with additional context, and where billions of bytes of data are processed in real time just to generate a simple "slow down" or "buy now" alert. Companies such as Gong, Tact, and Voicera are attempting to digitally record, analyze, and augment analog conversations, effectively creating a "digital twin" of the conversation, echoing what PTC, GE, and other product-lifecycle and manufacturing companies have done for physical products.
Conclusion
In short, the era of big data is over. But along the way, big data became a core part of IT in its own right, ushering in a series of new eras, each with a bright future. Companies that invested in big data should see those investments as an important foundation for the real-time, augmented, and interactive engagement to come. As the era of big data ends, enterprises are ready to embrace big data as a business asset rather than hype, supporting contextualized work, machine learning, and real-time interaction.