Following the 25 cases in the previous installment, this article introduces the remaining typical cases of companies mining value from big data…



26. The behavior of the crowd


Zhongqu is the first social media data management platform in China. At present, China's major open social platforms remain conservative about opening up user data, and as a third-party data analysis company, Zhongqu needs users’ permission to use such data.


Zhongqu filters user data using operational statistics and other data analysis methods, and ultimately builds a description of each user’s behavior, actions, and other individual characteristics. These descriptions can help brand marketers understand consumers’ habits and needs; they can also help business leaders understand their employees better. Beyond describing individual and group behavior, the analysis results can be used to predict the behavior of user groups, giving marketers forward-looking market analysis.


The results of Zhongqu’s data analysis are accurate only at the group level, not for individuals. Besides having some reference value in marketing, this kind of user data research is mostly used alongside small-scale surveys. The data can also support credit ratings for users and even enterprises, so it has some application in the financial field as well.


27. The future of Tuolawang


Tuolawang, an online shopping guide, created the “What to wear tomorrow” app. In the app, numerous fashion authorities post outfit combinations and style items, which users rate as they browse. Based on a user’s rating preferences, Tuolawang can guess what she wants to wear tomorrow, recommend items for her from among hundreds of thousands of online fashion items, and let her place an order directly. Once customer data has been acquired, back-end analytics come into play.


Tuolawang has added more variables to its recommendation model. For example, a consumer is going to a party tomorrow, does not know what style to wear, and has not checked the weather forecast. She hopes the shopping guide website can combine these circumstances with her own information and offer a complete solution.


Date, location, occasion, and style thus become variables in an outfit-matching solution, and different combinations are continually presented to the user. According to Tuolawang’s data, when users see higher-quality outfit combinations accompanied by scenario-based guidance, the conversion rate from click-through to completed purchase on the final page is 40% higher than for recommendations of single items.


28. Genetic health of SeeChange


The ability to sequence a person’s full genetic profile allows doctors and scientists to predict a patient’s susceptibility to certain diseases and other adverse conditions, reducing the time and cost of treatment.


SeeChange, a San Francisco-based company, has created a new health-insurance model. The company analyzes customers’ personal health records, medical reimbursement records, and pharmacy data to determine their susceptibility to chronic diseases and whether they are likely to benefit from customized recovery packages.


SeeChange also designs wellness programs and creates incentives that encourage customers to complete wellness activities on their own initiative, all monitored by its data analytics engine.


29. Given Imaging for diagnosis


The Israeli company Given Imaging invented a capsule with a built-in camera. After the patient swallows it, the capsule photographs the digestive tract at about 14 frames per second and simultaneously transmits the images to an external receiver, where the accompanying software records the images of the patient’s condition in a database. Within four to six hours the capsule leaves the body through normal excretion.


Generally speaking, doctors judge symptoms on the basis of their own experience, so when they encounter a suspicious shadow it is hard to avoid misjudgments or even delays in treatment. Now, with Given Imaging’s database, when a doctor finds a suspicious tumor, double-clicking the current image retrieves all similar images taken by other doctors in the past, together with their diagnoses.


A patient’s problem, so to speak, is no longer examined by a single doctor; it is as if thousands of doctors were giving their opinions at once, backed by images from a large number of other patients. Such comparison improves both the efficiency and the accuracy of diagnosis.


30. Entelo’s “Former Headhunter”


Truly skilled technical talent is always in high demand. Don’t wait for such people to send you their resumes, because other companies will snap them up before they have a chance to write one. Entelo recommends to employers highly skilled people who are just beginning to think about jumping ship.


Entelo currently has 300 million resumes in its database and a patent-pending algorithm, with more than 70 indicators, for gauging how likely top talent is to change jobs. A falling stock price, a shake-up at the top, or a recent acquisition by a larger company are all signals Entelo treats as potential reasons for an employer’s talent to leave.


When such signals appear, Entelo promptly sends information about that company’s top talent to the employers who subscribe to its service. What subscribers receive does not look like an ordinary resume: Entelo gathers information about these people from social networks, so employers can see what code a person has submitted, what questions they have answered online, and what they have posted on Twitter.


In short, employers ready to “poach” can see a living, breathing candidate standing in front of them.



31. Flight delay forecasts from FlightCaster and Passur


Every minute counts in the airline industry, especially when a flight arrives. If a flight arrives early and the ground crew is not ready, passengers and flight attendants are stuck on the plane for nothing; if it arrives late, the ground crew sit idle, and that costs money.


One major U.S. airline found from its internal reports that about 10 percent of flights arrived more than 10 minutes off the expected arrival time, and 30 percent arrived more than 5 minutes off. FlightCaster is a company that forecasts flight delays from data on how flights are actually performing.


The company holds data comparable to airlines’ proprietary flight performance information: a wealth of historical data on domestic flights together with real-time flight status. FlightCaster’s secret is its efficient use of big data analytics and of suitable software tools to manage the output in real time.


Passur Aerospace is a technology company specializing in decision support for the aviation industry. It forecasts flight arrival times by combining public data, such as weather and flight schedules, with non-public data that it collects itself on other factors affecting flights. As of 2012, Passur had more than 155 radar stations, each collecting a stream of information about every aircraft on its radar every 4.6 seconds and so generating huge amounts of data continuously.


Moreover, after a long period of data collection, Passur holds a huge multi-dimensional data set spanning more than a decade, which makes thorough analysis and well-fitted data models possible. Passur believes that airlines that plan around its arrival-time forecasts can save millions of dollars a year at each airport.


32. Climate’s agricultural insurance


A startup called The Climate Corporation runs more than 10,000 simulations a day of weather conditions over the next two years at more than a million locations across the United States, working with huge volumes of dynamic, real-time data. The company then combines the simulation results with data on root structure and soil porosity to offer crop insurance to thousands of farmers.


Remotely sensed soil data is quite different from the web-service user behavior data we have been familiar with, and it greatly expands the notion of what counts as data. To provide accurate insurance for each field, the company must also draw on data about agricultural futures, climate forecasts, international trade, international political and military security, the national economy, industry competition, and so on.


The business model launched on the basis of such massive and complex data is innovative, highly competitive with existing crop insurance methods, and sustainable and scalable. Even better, the company is operating on big data without making any expensive network infrastructure investments, just renting Amazon’s public cloud services for tens of thousands of dollars a month.


33. Hiptype’s reading records


Almost all paid e-books offer sample chapters, but to sell more e-books publishers need to know how far people read, whether they buy after finishing the sample, and other details of the reading experience.


Hiptype, an American startup that has developed an e-book reading analytics tool, has built its business on solving this problem. It calls itself “Google Analytics for ebooks” and offers a wealth of data about e-books. It not only counts how many times an e-book is sampled and bought, but also builds a “reader profile” covering age, income, and geographic location.


It can also tell publishers whether readers bought the book after reading the free chapters, how many finished it, how many pages the average reader read, which chapters readers most often started with, and where they dropped off.


Hiptype is embedded in the e-book itself, so publishers have access to user data whichever sales channel they choose. All the data Hiptype collects is anonymous, and when you download an e-book with Hiptype built in, you are given the option to opt out.


34. Acxiom’s “person-network integration”


There is a huge problem in Internet marketing: how do you know whether someone who uses several different names online is the same person? Acxiom launched a technical solution called the “Audience Operating System” (AOS) to solve it. AOS lets marketers link to a person’s digital identity: even if you change your name after marriage, use a nickname, or only occasionally use your middle name, it can still tell whether someone who has changed address or phone number is the same person.


AOS aggregates information that companies may have collected about an individual in different settings and stored in different databases, offline or online. Using AbiliTec, a digital “identity” technology also owned by Acxiom, AOS reduces customer information to a single, simple result. AOS also helps Acxiom’s advertisers use their data to find target users on Facebook.
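
The matching problem described in this case can be illustrated with a toy sketch. It is not the AbiliTec algorithm, whose details the text does not give; it simply assumes that two records belong to the same person when they share a normalized email address or phone number, and merges them with union-find. All field names and sample records are hypothetical.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Lower-case and trim an email so trivial variants compare equal."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only so '555-0199' and '(555) 0199' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def resolve_identities(records):
    """Group record indices that share an email or a phone number.

    Assumption: a shared contact key means "same person". A real
    system would weigh many more signals and tolerate errors.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # contact key -> index of the first record carrying it
    for idx, rec in enumerate(records):
        keys = []
        if rec.get("email"):
            keys.append(("email", normalize_email(rec["email"])))
        if rec.get("phone"):
            keys.append(("phone", normalize_phone(rec["phone"])))
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical records: a name change after marriage and a nickname.
records = [
    {"name": "Anna Smith", "email": "Anna.S@example.com", "phone": "555-0100"},
    {"name": "Anna Jones", "email": "anna.s@example.com", "phone": "555-0199"},
    {"name": "Annie J.", "email": "", "phone": "(555) 0199"},
    {"name": "Bob Lee", "email": "bob@example.com", "phone": "555-0123"},
]
print(resolve_identities(records))
```

The first three records end up in one group even though no two of them share a name: the first and second share an email address, and the second and third share a phone number.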




Part 2: Data Correlation, Data Exhaust, and Dark Data


Big data is mainly used for correlation analysis rather than causal judgment. Many correlation analyses need no complex model, just an awareness of big data.


Many organizations hold data exhaust, data that is used once and then discarded. The value of reusing it may not be clear now, but at some point in the future it may suddenly emerge and turn waste into treasure.


Dark data is data collected for a single purpose, usually used once and then filed away without its true value being fully exploited. Used in the right way, dark data can brighten a company’s prospects.


35. Data correlation analysis


One company’s team used location data from mobile phones to estimate how many people parked in Macy’s parking lots at the start of the Christmas shopping season in the United States, and from that to predict Macy’s sales for the day well before Macy’s itself had tallied them. For a Wall Street analyst or a senior executive in a traditional industry alike, such sharp insight is a huge competitive advantage.


For tax authorities, tax fraud is a growing concern, and big data can speed up the process of identifying it. Where privacy rules permit, government departments can combine data from several sources, such as vehicle registrations and overseas travel, to reveal an individual’s spending patterns. When spending does not square with the taxes paid, a red flag is raised. This is not direct evidence of fraud, and such conclusions cannot be used to prosecute individuals, but they can help departments focus their audits and other review processes.


36. Data exhaust


Logistics companies’ data used to serve only operational needs, but once it is reused, a logistics company can become a financial company: the data is used to assess customers’ credit and to offer unsecured loans or loans secured by goods in transit. Logistics companies can even turn into financial information service companies that judge the condition and trends of each economic subsector.


Some companies are already approaching a “God’s-eye view” with big data. A Los Angeles-based company says it has modeled historical night-time imagery of the globe, filtered out fluctuations, and produced research reports on real estate and consumer investment.


McDonald’s, on the other hand, gets the exact address of its customers when it sells burgers, which, when aggregated, turns into an excellent internal data set for the real estate industry.



37. Dark data


In certain circumstances, dark data can be put to new uses. Infinity Property & Casualty used algorithms to analyze fraud cases in its accumulated claims adjusters’ reports and recovered $12 million in subrogation. An electrical equipment sales company analyzed 10 years of accumulated ERP sales data and, based on the life cycle of electrical equipment, visited its customers from five years earlier one by one. It won more than 10 million yuan in equipment maintenance orders and successfully entered the MRO market.


38. Customer churn analysis


American Express previously could produce only after-the-fact reports and lagging forecasts; traditional BI could no longer meet the needs of its growing business.


So AmEx set out to build a model that actually predicted customer loyalty, using 115 variables based on historical transaction data. The company says it has been able to identify 24 per cent of Australian customers who will leave in the next four months. Such churn analysis can certainly be used to retain customers.


The hotel industry can customize rooms to each guest’s personality, even printing the travel moods a guest has posted on a microblog onto the wallpaper. Using big data, the tourism industry can offer visitors local products, activities, and small but charming niche attractions to win back their hearts.


39. Video analysis in fast food


Fast food companies can use video to analyze the length of queues and then automatically change the contents of electronic menus. If the queue is long, the food that can be served quickly is displayed. If the queue is short, foods that are more profitable but take relatively long to prepare are displayed.
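
The rule in this case is simple enough to write down directly. The sketch below assumes the queue length has already been estimated by a video-analysis step, which is not shown; the threshold of 8 people and the menu items are illustrative assumptions, not figures from any fast food company.

```python
def choose_menu(queue_length, fast_items, high_margin_items, long_queue_threshold=8):
    """Return the items to show on the electronic menu board.

    queue_length would come from a video-analysis step (not shown).
    The default threshold of 8 people is an illustrative assumption.
    """
    if queue_length >= long_queue_threshold:
        return fast_items          # long queue: promote quick-to-serve food
    return high_margin_items       # short queue: promote higher-margin food


# Hypothetical menu items.
FAST = ["fries", "soft drink", "pre-wrapped burger"]
HIGH_MARGIN = ["grilled chicken salad", "specialty burger"]

print(choose_menu(12, FAST, HIGH_MARGIN))
print(choose_menu(3, FAST, HIGH_MARGIN))
```

In practice the threshold would be tuned per store, and a small hysteresis band would keep the board from flickering when the queue hovers around the cut-off.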


40. Big data campaigns


In 2012, the Obama campaign set three fundamental goals: get more people to donate more money, get more people to vote for Obama, and get more people to participate!


This requires understanding at the “micro” level: what is each voter most likely to be persuaded by? Under what circumstances is each voter most likely to open his or her wallet? What are the most effective advertising channels to reach targeted voters? As Jim Messina, the campaign manager, said, “Throughout the campaign, you can’t make assumptions without data.”


Obama’s data-mining team spent two years collecting, storing, and analyzing vast amounts of data in an effort to raise $1 billion for his campaign. They found that actor George Clooney had a strong appeal to West Coast women aged 40 to 49: they were far and away the group most likely to pay for dinner with Clooney and Obama in Hollywood. Clooney raised millions of dollars for Obama at a fundraising dinner at his mansion.


Then, when the Obama team decided to find a star on the East Coast who had the same appeal to this group of women, the data team found that Sarah Jessica Parker fans also enjoyed contests, small parties and celebrities. The Clooney effect has been successfully replicated on the East Coast.


The Obama team spent less than $300 million on ads throughout the campaign, while the Romney team spent nearly $400 million and lost, not least because Obama’s data team made ad-buying decisions based on rigorous data analysis. One poll showed that 80% of American voters thought Obama made them feel more valued than Romney did.


As a result, 98 percent of the Obama team’s first $100 million came from small donations of less than $250, compared with 31 percent for the Romney team when it raised the same amount.



41. Monitoring illegal conversions


Illegal building conversions are a nuisance in any country and a common cause of fires: buildings with illegal partitions are much more likely to catch fire than others. New York City receives 25,000 complaints a year about overcrowded housing, but has only 200 inspectors to deal with them.


A team of analysts in the mayor’s office believed big data could help close this gap between demand and resources. The team created a database of all 900,000 buildings in the city and added data collected by 19 city departments: records of tax arrears, anomalies in utility usage, missed payments, service cuts, ambulance use, local crime rates, rat complaints, and so on.


Next, they compared this database with records of building fires over the past five years, ranked by severity, hoping to find correlations. Sure enough, building type and year of construction were associated with fire. A less expected result was that buildings that had obtained permits for exterior brickwork were associated with a lower incidence of serious fires.


Using all this data, the team built a system that helped them decide which overcrowding complaints needed urgent attention. None of the building features they recorded caused fires, but each correlated with a higher or lower fire risk. This knowledge proved invaluable: in the past only 13 per cent of on-site inspections led to a vacate order, but with the new approach the figure rose to 70 per cent.
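
The idea of ranking complaints by correlated features, without claiming that any feature causes fires, can be sketched as follows. This is not the model the New York team built, which the text does not describe in detail; it is a minimal illustration that scores each building by the average historical fire rate of its features. The history, feature names, and complaint ids are all hypothetical.

```python
from collections import defaultdict


def feature_fire_rates(history):
    """Share of past buildings with each feature that had a serious fire.

    history: list of (set_of_features, had_serious_fire) pairs.
    """
    seen = defaultdict(int)
    fires = defaultdict(int)
    for features, had_fire in history:
        for f in features:
            seen[f] += 1
            fires[f] += int(had_fire)
    return {f: fires[f] / seen[f] for f in seen}


def rank_complaints(complaints, rates, baseline):
    """Sort complaint ids so buildings with riskier features come first.

    complaints: dict of complaint id -> set of building features.
    Features never seen in the history fall back to the baseline rate.
    """
    def score(features):
        if not features:
            return baseline
        return sum(rates.get(f, baseline) for f in features) / len(features)

    return sorted(complaints, key=lambda cid: score(complaints[cid]), reverse=True)


# Hypothetical history; the feature names echo the data sources in the text.
history = [
    ({"tax_arrears", "rat_complaints"}, True),
    ({"tax_arrears"}, True),
    ({"tax_arrears", "brickwork_permit"}, False),
    ({"brickwork_permit"}, False),
    ({"rat_complaints"}, False),
    ({"brickwork_permit", "rat_complaints"}, False),
]
rates = feature_fire_rates(history)
baseline = sum(had_fire for _, had_fire in history) / len(history)

complaints = {
    "A": {"brickwork_permit"},
    "B": {"tax_arrears", "rat_complaints"},
    "C": {"rat_complaints"},
}
print(rank_complaints(complaints, rates, baseline))
```

With only 200 inspectors for 25,000 complaints a year, the useful output is the ordering rather than the scores themselves: inspectors start at the top of the list.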


42. The zhacai index


Officials at the planning department of the National Development and Reform Commission (NDRC), who were responsible for drafting the National Plan for Promoting Healthy Urbanization (2011-2020), needed to know exactly how the population was moving, and counting it was a problem.


Zhacai (pickled mustard tuber) is a low-value consumable, and income growth has almost no effect on how much of it people eat. Under normal circumstances, permanent urban residents’ consumption of convenience foods such as instant noodles and zhacai is basically constant, so changes in sales volume are mainly caused by the floating population.


NDRC officials concluded that changes in Fuling zhacai’s regional shares of national sales in recent years reflect trends in population mobility, and a macroeconomic indicator known as the “zhacai index” was born. They found that Fuling zhacai’s sales share in South China fell from 49% in 2007 to 48% in 2008, 47.58% in 2009, 38.50% in 2010, and 29.99% in 2011.


These figures show that population is flowing out of South China very quickly. Based on the zhacai index, the officials divided the country into two parts, inflow areas and outflow areas, with policies differing according to the population structure of each.
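
The arithmetic behind the index can be reproduced from the South China figures quoted in this case. The sketch below computes the year-on-year change in sales share and labels the region by its net change. The 1-point tolerance and the extra "stable" label are illustrative assumptions, not part of the NDRC's method.

```python
# Fuling zhacai sales share in South China (percent), as quoted in the text.
south_china_share = {2007: 49.0, 2008: 48.0, 2009: 47.58, 2010: 38.50, 2011: 29.99}


def yearly_changes(share_by_year):
    """Percentage-point change in share from each year to the next."""
    years = sorted(share_by_year)
    return {
        later: round(share_by_year[later] - share_by_year[earlier], 2)
        for earlier, later in zip(years, years[1:])
    }


def classify_region(share_by_year, tolerance=1.0):
    """Label a region by the net change in share over the whole period.

    The 1-point tolerance is an illustrative assumption, not an NDRC rule.
    """
    years = sorted(share_by_year)
    net = share_by_year[years[-1]] - share_by_year[years[0]]
    if net < -tolerance:
        return "outflow"
    if net > tolerance:
        return "inflow"
    return "stable"


print(yearly_changes(south_china_share))
print(classify_region(south_china_share))
```

The yearly changes show that most of the 19-point drop happened in 2010 and 2011, which is what the officials read as a rapid outflow of population.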



43. WeatherBill


As the saying goes, “The weather is unpredictable.” Have you ever had a trip, an important outdoor road show, a wedding or other important moment, but the bad weather ruined your mood or even caused financial loss?


WeatherBill, the world’s first weather insurance company, offers a wide range of weather guarantees. Customers log on to the WeatherBill website and specify the temperature or rainfall range they do not want to encounter at a particular time. Within 100 milliseconds, the site retrieves the forecast for the area in question along with 30 years of National Weather Service data for that area, runs the numbers, and quotes a policy price on behalf of the insurer. The service is useful not only to individuals but also to companies such as travel agencies.
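
How a price might be derived from historical weather data can be shown with a deliberately simple sketch. It is an assumption about the general approach, not WeatherBill's actual pricing model: it uses only the historical frequency of bad weather and ignores the forecast, and the 20% loading, the payout, and the rainfall readings are all hypothetical.

```python
def price_policy(history_mm, threshold_mm, payout, loading=0.2):
    """Price a rain policy from the empirical frequency of bad days.

    history_mm: past rainfall readings (mm) for the same place and date.
    threshold_mm: the customer is paid if rainfall reaches this level.
    loading: margin added on top of the expected payout (assumed 20%).
    """
    if not history_mm:
        raise ValueError("need at least one historical observation")
    bad_days = sum(1 for mm in history_mm if mm >= threshold_mm)
    probability = bad_days / len(history_mm)
    expected_payout = probability * payout
    return round(expected_payout * (1 + loading), 2)


# 30 hypothetical yearly readings for one date: 6 of them at or above 10 mm.
history = [0, 0, 2, 12, 0, 1, 0, 15, 0, 0, 3, 0, 11, 0, 0,
           0, 5, 0, 20, 0, 0, 0, 10, 0, 4, 0, 0, 13, 0, 0]
print(price_policy(history, threshold_mm=10, payout=1000))
```

Here 6 of the 30 readings cross the threshold, so the estimated probability is 20%, the expected payout on a 1,000 policy is 200, and the loading is added on top of that.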


A global beverage company integrates daily weather forecasts from an external partner into its demand and inventory planning. By analyzing three data points (temperature, precipitation, and hours of sunshine on a given day), the company reduced inventory in a key European market while improving forecast accuracy by about 5 per cent.


44. Historical scenes reappear


Researchers at Microsoft and the Technion-Israel Institute of Technology have developed software that can predict when and where epidemics or other social problems are likely to break out, based on more than 20 years of New York Times articles and other online data.


When tested using historical data, the system performed surprisingly well. For example, the system predicted a high risk of cholera in Angola, based on reports of the country’s drought in 2006. This is because, through previous events, the system knows that the likelihood of cholera outbreaks increases after a few years of drought.


In addition, the system renewed a cholera alert in Angola based on reports of major hurricanes in Africa in early 2007. Less than a week later, there were reports of cholera in Angola. In other tests, such as predicting disease, violence and casualties, the system is 70 to 90 per cent accurate.


The system draws on 22 years of archived New York Times reporting, from 1986 to 2007, but it also uses other data on the web to learn which kinds of events lead to specific social problems. These sources provide valuable context that does not appear in news articles and help the system work out causal or contextual links between events.


For example, the system can infer a relationship between events in cities in Rwanda and Angola, because both countries are in Africa and have similar GDP and other characteristics. Using this approach, the system concluded that in predicting cholera outbreaks it should take into account a country’s or city’s location, the proportion of its land covered by water, population density, GDP, and whether there has been a drought in recent years.


Many aspects of the world have changed in recent decades, but many aspects of human nature and the environment remain the same, so software can learn patterns from past data to predict what will happen in the future, said Horvitz, who led the development. “I’m personally interested in data going further back,” he says.


A market for such forecasting tools is emerging. For example, a startup called Recorded Future, whose clients include government intelligence agencies, predicts future events based on forward-looking reports and other sources of information online. Christopher Ahlberg, the company’s CEO, says it is possible to use “hard data” to make predictions, but there is still a long way to go from a prototype system to a commercial product.



45. Nike+ sensor shoes


Nike has turned itself into a big data marketing innovator with a product called Nike+: “Nike running shoes or wristband + sensor”. As long as a runner wears Nike+ shoes, an iPod can store and display the date, time, distance, calories burned, and other data for each workout. Users upload the data to the Nike community, where they can share and discuss it with their peers.


Nike has struck a deal with Facebook that lets your running status be posted to your account in real time. Friends can comment and click a “clap” button and, remarkably, you hear their applause over your music as you run.


As runners upload their routes, Nike accumulates a database of the best running routes in major cities. Nike+ also makes Nike-organized city running events more effective: participants upload their running data within a specified period to see which city has accumulated the longest distance.


With data uploaded by athletes, Nike has built the world’s largest online sports community, with more than 5 million active users uploading data every day, forging an unprecedentedly close relationship with consumers. This mass of data plays an irreplaceable role in helping Nike understand user habits, improve its products, and target its distribution and marketing precisely.


46. Volvo’s Industrial Internet


By installing sensors and embedded CPUs in its trucks, Volvo Group streams information about vehicle use, from the brakes to the central locking system, back to its headquarters.


“Analyzing this data will not only help us build better cars, but also help our customers have a better experience,” said Volvo Group CIO Rich Strader. The data is being used to optimize manufacturing processes, improve the customer experience, and improve safety.


Analyzing usage data from different customers allows product departments to identify potential problems early and alert customers before they occur. “A design flaw that previously might not have been discovered until 500,000 units had been sold can now be found after 1,000 units.”


47. McKesson’s dynamic supply chain


McKesson, the largest pharmaceutical distributor in the United States, is also far ahead of most companies in its use of big data. It has integrated advanced analytics into a supply chain business that handles 2 million orders a day and oversees more than $8 billion in inventory.


For in-transit inventory management, McKesson has developed a supply chain model that provides an extremely accurate view of maintenance costs based on product lines, transportation costs, and even carbon emissions. According to Robert Gooby, the company’s vice president of process transformation, this level of detail gives the company a truer picture of its operations at any point in time.


Another area where McKesson applies advanced analytics is the simulation and automation of physical inventory layouts within its distribution centers. The ability to assess policy and supply chain changes has helped the company respond faster to customers while reducing working capital. Overall, McKesson’s supply chain transformation has saved the company more than $100 million in working capital.



48. House of Cards and the movie industry


What distinguishes House of Cards from earlier TV dramas is that it is an “online drama” with a different production process. In short, not only is it distributed through Internet viewing, but the series itself is a product designed according to “big data”, that is, the tastes of the Internet audience.


Netflix’s success lies in its powerful recommendation system, Cinematch. The system stores basic data from users’ video-on-demand activity, such as ratings, plays, fast-forwards, time, location, and device, and analyzes it to work out which movies a user is likely to enjoy and to provide customized recommendations. To improve it, Netflix created the Netflix Prize, offering US$1 million to anyone who could improve the accuracy of its movie recommendation algorithm by at least 10%.


In the future the cost of making movies will be much lower, and a thousand fans will be enough to make a movie a success. Or, as The Technium puts it, “Where eyes go, money follows.”


49. Reviews and the catering industry


Many state governments in the United States have teamed up with Yelp, a restaurant review site, to monitor hygiene in the restaurant industry, with good results. Instead of peering through the restaurant window as before, people now look at reviews in an app on their phone. On localized O2O review services in China, such as Dianping and Tomodian.com, consumers can rate any business, and businesses can use these reviews to improve their service and optimize the efficiency of their operations.


The future catering industry will be thoroughly driven by the data generated and carried by the Internet and social media. More and more people will join in the review, and the survival of the fittest will be greatly accelerated.
