In this guest post, Mark Hurd, CEO of Oracle, says the big data revolution has only just begun and you ain’t seen nothing yet…
The magic of data mining and analytics is in discovering patterns and correlations that intuitive judgment could never divine: the finding, say, that 30-somethings who shop for silk shirts also tend to buy hybrid cars. Or that late-morning credit card purchases are more likely to be fraudulent than those made in the afternoon.
This data-based discovery is getting more productive as data sets grow larger, and as the science moves beyond the commercial frontiers of product placement, customer acquisition, and fraud detection into the realms of disease prevention, education, agriculture, healthcare, and even professional sports.
Big data analytics isn’t just changing the ways companies and industries do business. It is changing the world for the better.
Consider these five eye-opening advances:
- Tracking epidemics. The US Centers for Disease Control and Prevention is using big data to save lives by identifying areas where pandemics might take hold.
The CDC’s tool of choice is BioMosaic, a web-accessible interactive map that displays information in a readily digestible format to health officials nationwide. The map is the front end of an enormous data-crunching machine that analyses different kinds of information, including US census data, socio-economic data (particularly on foreign-born people, as well as their English-speaking ability), international travel data, demographic data, and migration health data on foreign-born populations in the US from 105 countries.
Local officials can use the health maps to identify potential trouble spots, down to the county level. The main purpose of the tool is to “distill simple patterns from large amounts of complex information,” says the CDC’s Martin Cetron in a recent report. The emphasis on foreign-born people reflects the fact that infectious diseases such as avian flu are often carried into the US by foreign-born residents returning from visits to relatives or friends in their home countries.
After the 2010 earthquake in Haiti, which led to a cholera epidemic, BioMosaic mapped where clusters of Haitian-born US residents were most likely to live, along with air and sea travel routes to and from Haiti, to pinpoint where anti-cholera measures in the States would be most useful.
According to Dr. Cetron, BioMosaic mapping discovered that Massachusetts was at higher risk than other states of experiencing an outbreak of avian flu, which originated in China. That’s because the Bay State has attracted an unusually large number of Chinese nationals during the past decade, due in part to its reputation for superior education.
Thanks to tools such as BioMosaic and Google Flu Trends, the time it takes for international health authorities to discover an outbreak of an infectious disease following an initial report has been slashed from an average of 167 days in 1996 to 20 days today.
- Boosting student retention. Georgia’s Valdosta State University is using advanced analytics to identify students at risk of failing or dropping out. It’s part of a wider effort by colleges and universities nationwide to boost student-retention rates and help students succeed — while keeping their tuition revenue streams intact amid state funding cuts and intense competition.
Analysing student data from multiple sources, including surveys and ID card usage, Valdosta State discovered a previously unknown correlation: It had a 10 per cent higher retention rate for students who ate breakfast on campus compared with those who didn’t. Based on that finding, the university promoted on-campus eateries to specific student segments.
The midsize university also discovered an 85 per cent retention rate for freshmen who work on campus — 30 per cent higher than for the general freshman class. So, Valdosta State invested $US200,000 in campus jobs for students, an investment it estimated would save about $US2 million in retention costs in four years.
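The Valdosta State findings rest on a simple comparison: split students by a behaviour, then compare the share of each group that stays enrolled. The Python sketch below illustrates that comparison under stated assumptions; the `Student` record, its fields and the `compare_segment` helper are hypothetical, not the university’s actual system, and the sample numbers are invented. It reports both the percentage-point gap and the relative lift, because a claim such as “30 per cent higher” can mean either.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Student:
    # Hypothetical fields; the article does not describe Valdosta State's data schema.
    works_on_campus: bool
    retained: bool  # True if the student re-enrolled the following year


def retention_rate(students: List[Student]) -> float:
    """Share of students retained; 0.0 for an empty group."""
    if not students:
        return 0.0
    return sum(s.retained for s in students) / len(students)


def compare_segment(
    students: List[Student], in_segment: Callable[[Student], bool]
) -> Tuple[float, float, float, float]:
    """Compare a segment with everyone else.

    Returns (segment rate, rest rate, gap in percentage points, relative lift).
    """
    segment = [s for s in students if in_segment(s)]
    rest = [s for s in students if not in_segment(s)]
    seg_rate, rest_rate = retention_rate(segment), retention_rate(rest)
    gap_points = (seg_rate - rest_rate) * 100
    lift = seg_rate / rest_rate - 1 if rest_rate else float("inf")
    return seg_rate, rest_rate, gap_points, lift


# Invented example: 17 of 20 campus workers retained, 44 of 80 other students retained.
cohort = [Student(True, i < 17) for i in range(20)] + [
    Student(False, i < 44) for i in range(80)
]
seg, rest, gap, lift = compare_segment(cohort, lambda s: s.works_on_campus)
print(f"{seg:.1%} vs {rest:.1%}: {gap:.1f} points, {lift:.1%} relative lift")
```

With those invented numbers the gap is 30 percentage points but the relative lift is about 55 per cent, which is why it helps to state which measure a headline figure uses. A difference like this is also a correlation; on its own it does not show that campus jobs cause students to stay.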
- Increasing food supplies. With the world population projected to rise from seven billion today to nine billion in 2050, boosting food supplies and agricultural productivity is critically important.
Farmers have long kept a keen eye on weather patterns — trying their best to time planting, irrigation, pesticide use, and harvesting with nature. Today, farmers have unprecedented access to data that provide insights into weather, soil conditions, pest threats, plant health problems, and moisture levels. Various technologies also are helping them decide where to focus their resources.
For example, drones equipped with sensors can assess crop health down to the individual plant, showing where expensive fertilisers, pesticides, and water are needed. Auto-guidance systems in tractors help farmers optimise the pattern of rows in their fields, as well as seed and chemical distribution.
One farmer in the Midwest, analysing data collected from a range of sensors, identified areas of his field that could support seeds planted closer together — and increased his revenue by $US150,000.
In Africa, soil depletion severely affects crop yields, and many farmers can’t afford fertiliser. But today’s technologies let farmers in Africa monitor a range of conditions in their fields and alert them to when fertiliser and irrigation are needed, and in what quantities. That makes it possible to channel financial aid more efficiently, which could dramatically increase crop production across the continent.
- Re-evaluating baseball metrics. Statistical correlations unearthed in the still-evolving data analytics field of sabermetrics — the stuff of Moneyball fame — have even the most leathery Major League Baseball managers and GMs questioning long-held truisms of the sport. These include the value of stealing bases, laying down sacrifice bunts, issuing intentional walks, and paying top dollar for players whose offensive numbers are inflated by their bandbox home ballparks.
Take the sacrifice bunt (please!). Extensive analyses of historical data have shown that giving up an out to advance a runner generally isn’t a smart baseball move. One such analysis of Baseball Prospectus data from the 2012 season, reported by MLB.com’s Anthony Castrovince, found that in the two most common sacrifice-bunt situations, teams were more likely to score a run if they let the batter swing away. Teams with a runner on first base and no outs had a 24.4 per cent better chance of scoring a run than teams with a runner on second base and one out, which is the situation a successful bunt creates. Likewise, teams with runners on first and second bases and no outs had a 10.4 per cent better chance of scoring than teams with runners on second and third and one out.
Yet for decades the sacrifice bunt was a staple managerial tactic in such situations — because without the historical data and advanced software to analyse that data, sac bunts just seemed to make intuitive sense.
Also losing favour with sabermetric geeks are traditional baseball player metrics such as AVG (batting average), RBIs (runs batted in), and pitcher ERA (earned run average). They are being nudged aside by such alien-sounding acronyms as wOBA (weighted on-base average), OPS (on-base plus slugging percentage), VORP (value over replacement player), and WHIP (walks plus hits per inning pitched), stats that data analyses have shown are better measures of a player’s contribution to team wins.
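Several of these newer metrics are simple ratios of counting stats. The Python sketch below computes batting average, OBP, SLG, OPS and WHIP from their standard definitions; the `BattingLine` type and the stat line in the example are hypothetical, not any real player’s numbers. wOBA and VORP are left out because they depend on league- and season-specific weights and replacement levels.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    # Hypothetical container for a season's counting stats.
    ab: int  # at-bats
    h: int  # hits of all types
    doubles: int
    triples: int
    hr: int  # home runs
    bb: int  # walks
    hbp: int  # times hit by pitch
    sf: int  # sacrifice flies


def batting_avg(b: BattingLine) -> float:
    return b.h / b.ab


def on_base_pct(b: BattingLine) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    return (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)


def slugging_pct(b: BattingLine) -> float:
    # SLG = total bases / AB, counting a single as 1 base up to a home run as 4.
    singles = b.h - b.doubles - b.triples - b.hr
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.hr
    return total_bases / b.ab


def ops(b: BattingLine) -> float:
    # OPS is simply OBP plus SLG.
    return on_base_pct(b) + slugging_pct(b)


def whip(walks: int, hits: int, outs_recorded: int) -> float:
    # WHIP = (BB + H) / IP, with innings pitched derived from outs (3 outs per inning).
    innings = outs_recorded / 3
    return (walks + hits) / innings


line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=25, bb=60, hbp=5, sf=5)
print(f"AVG {batting_avg(line):.3f}  OBP {on_base_pct(line):.3f}  "
      f"SLG {slugging_pct(line):.3f}  OPS {ops(line):.3f}")
print(f"WHIP {whip(walks=50, hits=180, outs_recorded=600):.2f}")
```

Innings are passed as outs recorded because box scores write 200 1/3 innings as “200.1”, which would give the wrong answer if used directly as a decimal.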
- Improving patient outcomes. Scientists at the University of Pittsburgh Medical Center’s insurance division discovered a correlation between the letters “ww” found in patient records and a trip to the emergency room.
Turns out “ww” stands for “wheeled walker.”
“What is interesting here is not that the patient has a wheeled walker,” says Pamela Peele, the UPMC division’s chief analytics officer. “What is interesting is that the clinician called out the wheeled walker in the clinical notes.” In other words, whether the patient has one (or not) is irrelevant; what is relevant is that the clinician noted it in the record.
Conversely, patient records containing the word “mother” — not “mom” or “mommy” or any other synonym for mother — are “highly associated with a lack of future use of the ER,” Peele says.
Without speculating as to the cause of those correlations, it’s clear that healthcare organisations can become more efficient by identifying such patterns and developing strategies accordingly.
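The UPMC examples come down to checking whether a term appears in a patient’s notes and comparing outcomes for the two groups. A minimal Python sketch follows, assuming each record is a (note text, had ER visit) pair; the function name `er_rate_by_term` and the sample notes are hypothetical, and a production system would also need to handle de-identification, negation (“no walker”) and abbreviation variants. Whole-word matching keeps “mother” from matching “mom” or “grandmother”, mirroring the exact-word distinction Peele describes.

```python
import re
from typing import Iterable, List, Tuple


def er_rate_by_term(records: Iterable[Tuple[str, bool]], term: str) -> Tuple[float, float]:
    """Return (ER-visit rate when term appears, rate when it does not).

    Matching is case-insensitive and whole-word, so "mother" does not match
    "mom" or "grandmother".
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    with_term: List[bool] = []
    without_term: List[bool] = []
    for note, had_er_visit in records:
        # Route each outcome to the group defined by whether the term appears.
        (with_term if pattern.search(note) else without_term).append(had_er_visit)

    def rate(outcomes: List[bool]) -> float:
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    return rate(with_term), rate(without_term)


# Invented sample notes for illustration only.
notes = [
    ("Pt uses ww at home", True),
    ("ww noted on exam", True),
    ("ww, gait stable", False),
    ("Lives with mother", False),
    ("Mother visits daily", False),
    ("Mom called clinic", True),
    ("Grandmother present", True),
]
print(er_rate_by_term(notes, "ww"))
print(er_rate_by_term(notes, "mother"))
```

As the article notes, a gap between the two rates is a signal worth investigating, not an explanation of why it exists.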
As these examples show, the pattern-drawing potential of big data analytics can yield a variety of stunning results. In some cases the questions are narrow (how closely together can I plant these seeds on my farm?), while in others they are wide open (is there a new type of good being imported to the US that I can build a business around?).
Big data has been with us for years, but we’re only beginning to see a burgeoning number of important, far-reaching results. I suspect that by this time next year, the types of examples I’ve highlighted above will be commonplace. And organisations that haven’t already begun asking questions of big data will find themselves on the outside of the information economy looking in.
This article originally appeared on B&T’s sister business site www.which-50.com