Big Data: Different From Small Data

Three factors distinguish big data from the analytics that many executive leaders are familiar with: volume, velocity, and variety. In a recent Harvard Business Review article, McAfee and Brynjolfsson¹ draw the distinction and open a window on how two companies are harnessing big data to make more accurate predictions, better decisions, and more precise interventions, all on an accelerated timetable.
To convey the sheer volume of data now available, the authors explain that more data cross the Internet each second than were stored anywhere on the Internet in 1992. The retailer Wal-Mart Stores, Inc, for instance, collects more than 2.5 petabytes of customer data every day from its checkout registers. How much information does a petabyte represent? It is equivalent to 20 million filing cabinets of text, the authors explain; multiply that by 1,000 for an exabyte. The authors estimate that 2.5 exabytes of data are created each day.
Velocity, or speed, is the second key differentiator, and in many applications it is more important than volume. The authors report that a colleague at the Massachusetts Institute of Technology Media Lab used location data from mobile phones to estimate Black Friday sales at Macy’s by inferring how many people were in Macy’s parking lot that day. “Rapid insights like that can provide an obvious competitive advantage to Wall Street analysts and Main Street managers,” the authors write.
Variety is the third characteristic that distinguishes big data from traditional analytic activities. Big data draw on many sources that didn’t exist 10 years ago, such as the messages, updates, and images posted to social networks; readings from sensors; and GPS data from cell phones.
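The volume figures quoted above can be put on a common scale with a little arithmetic. The following is a minimal sketch in Python that uses only the numbers given in the article; the constant and function names are hypothetical and chosen purely for illustration.

```python
# Figures as quoted in the article (decimal units: 1 exabyte = 1,000 petabytes).
PETABYTES_PER_EXABYTE = 1_000
# The authors' rule of thumb: 1 petabyte is about 20 million filing cabinets of text.
CABINETS_PER_PETABYTE = 20_000_000


def filing_cabinets(petabytes: float) -> float:
    """Convert a volume in petabytes to the equivalent number of filing cabinets."""
    return petabytes * CABINETS_PER_PETABYTE


# Wal-Mart's checkout data, as quoted: 2.5 petabytes.
retailer_pb = 2.5
# Data created worldwide each day, as quoted: 2.5 exabytes.
worldwide_pb = 2.5 * PETABYTES_PER_EXABYTE

print(f"Retailer:  {filing_cabinets(retailer_pb):,.0f} filing cabinets")
print(f"Worldwide: {filing_cabinets(worldwide_pb):,.0f} filing cabinets")
```

By this rule of thumb, 2.5 petabytes corresponds to 50 million filing cabinets of text, and 2.5 exabytes to 50 billion.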
Purely through the tools and activities that we engage with today (cell phones, social networks, GPS, and online shopping), each of us is now a walking data generator, the authors point out. Because the data are unstructured, the traditional structured databases that store much corporate and health-care information are unsuited to analyzing big data.

Data in Action

To answer skeptics of the notion that having data improves business results, the authors interviewed executives at 330 public North American companies about their organizational and management practices and compared those results with performance data. They found that the most data-driven companies were, on average, 5% more productive and 6% more profitable than their competitors.
Specifically, how are managers using big data to improve performance? In time-sensitive industries such as aviation (and health care, for that matter), improving productivity often turns on finding and eliminating wasted minutes. Historically, airlines have relied on pilots, who are distracted by the responsibilities of landing an airplane, to provide estimated arrival times. If the plane lands early, pilots and passengers sit on the tarmac, waiting for the ground crew; if it’s late, the ground crew stands idle, waiting for the passengers. PASSUR Aerospace, a provider of decision-support technologies to the aviation industry, is helping airlines eliminate this disconnect by providing more precise estimated arrival times. It collects data from public sources such as weather reports and flight schedules, as well as proprietary data that include feeds from a network of 155 radar stations that it installed near airports. The company believes that enabling an airline to know exactly when its planes will land results in savings of several million dollars at each airport.
Several years ago, Sears Holdings Corp, a combination brick-and-mortar and online retailer, began an initiative to generate greater value from data collected from sales of its Sears, Craftsman, and Lands’ End brands. It ran into an obstacle familiar to health care: the data required to make decisions were highly fragmented, housed in many databases and data warehouses maintained by the various brands. “Sears required about eight weeks to generate personalized promotions, at which point many of them were no longer optimal for the company,” the authors explain. For Sears, the answer was to borrow techniques from big data: it set up an Apache Hadoop cluster, a group of inexpensive, off-the-shelf servers coordinated by an emerging software framework (Hadoop), and started feeding data from each of its brands, including data from existing data warehouses, into the cluster. The time needed to plan a promotion dropped from eight weeks to one, and the promotions themselves are of higher quality because they