Big data means big work ahead for imaging informaticists if they are to realize the potential transformation of radiology, says Katherine P. Andriole, PhD, FSIIM, director of imaging informatics, Brigham & Women's Hospital, Boston. She presented the 2014 Dwyer Lecture, “Transforming Healthcare and Biomedical Research with Informatics–Bit by Byte,” marking the official opening of the 2014 meeting of the Society for Imaging Informatics in Medicine.
Andriole traced the history of big data back to 1944. “Some credit NASA with coining the term big data, but if you look back in the archives, there were people who were thinking about big data much earlier than that,” she said. “In 1944, it was library scientists who realized there were going to be increasing volumes of books and knowledge, out of bounds.”
In 1944, the challenge was finding a way to store the contents of the Library of Congress, calculated to be 100 trillion bits of data. That collection was in analog format, as was most of the world’s data as recently as 1986; today, 94% of the world’s data is on digital media, paving the way for the embrace of even bigger data.
Andriole’s point was that big data is contextual and related to the capability of an era to manage the data; in the past, we processed information in batch mode on mainframe computers. Today, as described by Gartner data analyst Doug Laney, the challenge is to manage the three Vs of big data: volume (measured in exabytes); velocity; and variety, including photos, web information, audio, social media, and, in the context of healthcare, patient data and images.
Healthcare has the additional challenge of overcoming issues that are unique to the sector, Andriole says, including poor access to biomedical data and security and privacy requirements.
Big Data Healthcare Tools
Healthcare will move to cloud storage, virtual computing, and distributed computing, with better databases for searching and better computing platforms, Andriole predicts. A schematic she shared showed a combination of cross-platform desktop computing, remote computing, handhelds and web computing in which software-as-a-service is the model.
The big data platforms currently in use include the open source Hadoop from Apache®, a Java-based cross-platform solution. Andriole described it as a software framework for the storage and processing of large amounts of data on clusters of commodity hardware. It consists of libraries, utilities and a distributed file system that stores data on the clusters and provides very high bandwidth in aggregate.
It also includes a manager that monitors resources in the clusters and schedules them for various applications, and MapReduce, a programming model for large-scale data processing that originated at Google.
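The MapReduce programming model mentioned above can be illustrated with a minimal single-machine sketch. This is not Hadoop code and does not use any Hadoop API; it only mimics the three stages (map, group by key, reduce) in plain Python. The exam records and all function names are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    modality, _description = record
    yield (modality, 1)


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_mapreduce(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical exam records: (modality, description). Not real data.
EXAMS = [
    ("CT", "chest"),
    ("MR", "brain"),
    ("CT", "abdomen"),
    ("CR", "chest"),
    ("CT", "head"),
]

if __name__ == "__main__":
    print(run_mapreduce(EXAMS))
```

On a real cluster the map and reduce functions would run in parallel on many nodes, with the framework handling the grouping step and the movement of data between machines.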
Brigham & Women’s has deployed the open source SMART platform, which stands for Substitutable Medical Applications and Reusable Technology and is funded by the Office of the National Coordinator for Health Information Technology. It provides a standard language and contextual information structure against which people write apps. “The idea is to create a healthcare app store,” Andriole explained. “The idea is write one set of data and run everywhere.”
SMART-enabled servers know how to get the data and aggregate the data. Software engineers write apps against a container of the data; the apps are required to have a user interface and to follow an explicit data model.
“The goal here is to transform healthcare into a data-driven enterprise, more than it is today,” she said.
A big data management platform might include three elements, Andriole suggests: storage, big data appliances that write to the data warehouse and pull information, and analytics, where data is analyzed, “looking for patterns that we don’t even know exist; that’s the promise.”
The Fourth V
A fourth management challenge that has emerged along with volume, velocity and variety is veracity. “There are lots of areas of research where we have to work with data that is sparse, dirty and maybe inaccurate, and we still have to say something about it,” Andriole says. “This is another area of research and data analysis where people can spend their lives coming up with algorithms to analyze data. It’s going to be a very good time for informaticists and data scientists going forward.”
While data analytics are not as prevalent in healthcare as they are in business, Andriole has observed some activity in radiology. At Brigham & Women’s Hospital, exam CTDIvol readings were used to assess radiation dose with different protocols; it was determined that some protocols were out of bounds and they were removed from the scanners.
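A retrospective check of this kind might look something like the sketch below. The article does not say how "out of bounds" was determined at Brigham & Women’s, so the rule used here (a protocol's median CTDIvol exceeding a fixed limit) is an assumption, and the protocol names, readings, and limit are hypothetical values with no clinical meaning.

```python
from statistics import median


def flag_protocols(readings, limit_mgy):
    """Return names of protocols whose median CTDIvol exceeds limit_mgy.

    readings: iterable of (protocol_name, ctdi_vol_mgy) pairs.
    The median-versus-limit rule is an illustrative assumption.
    """
    by_protocol = {}
    for protocol, ctdi_vol in readings:
        by_protocol.setdefault(protocol, []).append(ctdi_vol)
    return sorted(
        name for name, values in by_protocol.items()
        if median(values) > limit_mgy
    )


# Hypothetical readings in mGy; not real dose data.
READINGS = [
    ("head_routine", 55.0),
    ("head_routine", 58.0),
    ("head_routine", 61.0),
    ("head_legacy", 92.0),
    ("head_legacy", 88.0),
    ("head_legacy", 95.0),
]
LIMIT_MGY = 75.0  # illustrative threshold only

if __name__ == "__main__":
    print(flag_protocols(READINGS, LIMIT_MGY))
```

Flagged protocols would then be candidates for review or removal, which matches the retrospective use of analytics described above.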
“That’s retrospective business analytics,” Andriole noted. “Going forward, we will have more direct interventions and predictive powers using analytics.” Data analytics should not be limited to assessing metadata, but must extend to the imaging data itself.
“We’ve been talking a lot about metadata and analyzing that, but we have to get the pixel data content and be able to search that,” she said. “Right now, this is a huge problem: How do we tag features in images? How do we make the information we produce in radiology quantitative?”
Andriole described the big data effort in quantitative imaging as huge, in that researchers are attempting to connect genomic information, patient history, labs, social history, and imaging data. In order to create algorithms to analyze the data, however, large amounts of data are required.
“I think this is where imaging needs to focus going forward,” she said.
The potential of big data to transform biomedical research is equally promising, Andriole says. She described an especially evocative project in which two researchers built a computer model to sift through discarded EKGs that yielded a method of predicting which heart attack patients had double and triple the risk of dying from a second heart attack within the first year. “By doing big data—large amounts of data in aggregate and analyzing it with these tools—they were able to say something that had not been known before,” she said.
The challenges of big data in healthcare are not small. In addition to access problems, privacy issues, and data integrity, there is also the issue of large amounts of unstructured data in clinical notes and medical imaging reports. But the potential impact is huge.
“The promise of big data in healthcare is safety and high quality, for everyone, efficiency and cost effectiveness and predictive analytics,” Andriole said. “We will see more movement to personalized medicine and, in fact, quantitative medicine, and evidence-based decision support at the point of care.”
Andriole, who did a medical imaging fellowship at the University of California, Los Angeles, when Dwyer was chief of the division of medical imaging, opened her lecture with a heartfelt tribute to the “father of PACS.”
“Be like Sam,” she urged in conclusion. “Work across disciplines. Understand the application environment. He engaged with industry partners and he worked with them to bring these technologies into the clinic. He collaborated with everyone. Think about the possibilities, and enjoy the journey.”