Big data means big work ahead for imaging informaticists if they are to accomplish the potential transformation of radiology, says Katherine P. Andriole, PhD, FSIIM, director of imaging informatics, Brigham & Women's Hospital, Boston. She presented the 2014 Dwyer Lecture, “Transforming Healthcare and Biomedical Research with Informatics–Bit by Byte,” marking the official opening of the 2014 meeting of the Society for Imaging Informatics in Medicine.
Andriole traced the history of big data back to 1944. “Some credit NASA with coining the term big data, but if you look back in the archives, there were people who were thinking about big data much earlier than that,” she said. “In 1944, it was library scientists who realized they were going to be increasing volumes of books and knowledge, out of bounds.”
In 1944, the challenge was finding a way to store the contents of the Library of Congress, calculated to be 100 trillion bits of data. That collection was in analog format, as was most of the world’s data as recently as 1986. Today, 94% of the world’s data is on digital media, paving the way for the embrace of even bigger data.
Andriole’s point was that big data is contextual: what counts as big depends on the capability of an era to manage the data. In the past, we processed information in batch mode on mainframe computers. Today, as described by Gartner data analyst Doug Laney, the challenge is to manage the three Vs of big data: volume (measured in exabytes); velocity; and variety, including photos, web information, audio, social media, and, in the context of healthcare, patient data and images.
Healthcare has the additional challenge of overcoming issues that are unique to the sector, Andriole says, including poor access to biomedical data and security and privacy requirements.
Big Data Healthcare Tools
Healthcare will move to cloud storage, virtual computing, and distributed computing, with better databases for searching and better computing platforms, Andriole predicts. A schematic she shared showed a combination of cross-platform desktop computing, remote computing, handhelds and web computing in which software-as-a-service is the model.
The big data platforms currently in use include the open source Hadoop from Apache®, a Java-based cross-platform solution. Andriole described it as a software framework for the storage and processing of large amounts of data on clusters of commodity hardware. It consists of libraries, utilities and a distributed file system that stores data on the clusters and provides very high bandwidth in aggregate.
It also includes a resource manager that monitors resources in the clusters and schedules them for the various applications, and MapReduce, a programming model for large-scale data processing that originated at Google.
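The MapReduce model mentioned above can be illustrated with a minimal, single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API; the exam records and the function names (map_phase, shuffle, reduce_phase, map_reduce) are hypothetical and exist only to show the three steps: map each record to key-value pairs, group the pairs by key, and reduce each group to a result. On a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: (modality, body part) pairs standing in for exam records.
EXAMS = [
    ("CT", "chest"),
    ("MR", "brain"),
    ("CT", "abdomen"),
    ("US", "thyroid"),
    ("CT", "head"),
]


def map_phase(record):
    """Map step: emit one (key, value) pair per input record."""
    modality, _body_part = record
    yield (modality, 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = map_reduce(EXAMS)
print(counts)
```

Because each map call looks at only one record and each reduce call at only one key, the work can be split across commodity machines without the steps needing to coordinate with one another, which is the property that makes the model suit large data sets.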
Brigham & Women’s has deployed the open source SMART platform, which stands for Substitutable Medical Applications and Reusable Technology and is funded by the Office of the National Coordinator for Health IT. It provides a standard language and contextual information structure to facilitate access to information, against which developers write apps. “The idea is to create a healthcare app store,” Andriole explained. “The idea is write one set of data and run everywhere.”
SMART-enabled servers know how to get the data and aggregate the data. Software engineers write apps against a container of the data; the apps are required to have a user interface and to follow an explicit data model.
“The goal here is to transform healthcare into a data-driven enterprise, more than it is today,” she said.
A big data management platform might include three elements, Andriole suggests: storage, big data appliances that write to the data warehouse and pull information, and analytics, where data is analyzed, “looking for patterns that we don’t even know exist; that’s the promise.”
The Fourth V
A fourth management challenge that has emerged along with volume, velocity and variety is veracity. “There are lots of areas of research where we have to work with data that is sparse, dirty and maybe inaccurate, and we still have to say something about it,” Andriole says. “This is another area of research and data analysis where people can spend their lives coming up with algorithms to analyze data. It’s going to be a very good time for informaticists and data scientists going forward.”
While data analytics are not as prevalent in healthcare as they are in business, Andriole has observed some activity in radiology. At Brigham & Women’s Hospital, exam