Machine learning (ML) can help providers extract all relevant facts from radiology reports in real time, according to a new study published in the Journal of Digital Imaging.
What set this research apart from the work of others, the authors explained, is that it focused on providing additional context that is often left out during other information extraction (IE) tasks.
“Previous IE work in radiology has focused on a limited set of information, and extracts isolated entities (i.e., single words such as ‘lesion’ or ‘cyst’) rather than complete facts, which require the linking of multiple entities and modifiers,” wrote author Jackson M. Steinkamp, department of radiology at the Hospital of the University of Pennsylvania in Philadelphia, and colleagues. “Here, we develop a prototype system to extract all useful information in abdominopelvic radiology reports (findings, recommendations, clinical history, procedures, imaging indications and limitations, etc.), in the form of complete, contextualized facts.”
Steinkamp et al. developed their information schema using a sample of 120 abdominal and pelvic radiology reports collected at a single institution from 2013 to 2018. The dataset included 50 CT examinations, 48 MRI examinations and 22 ultrasound examinations. Custom-built labeling software was used to annotate each report with its “complete factual content,” linking every piece of information to a specific part of the document. A two-part neural network architecture was then trained to extract that information automatically from report text.
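The study's description suggests a pipeline in which one stage identifies entity spans and a second stage links entities and modifiers into complete facts. The toy sketch below illustrates that two-stage structure only; it is a rule-based stand-in, not the authors' neural model, and the vocabularies are hypothetical.

```python
# Toy stand-in for a two-stage fact-extraction pipeline (NOT the authors' neural model):
# stage 1 tags candidate entity tokens, stage 2 links modifiers to a head entity to form a fact.

FINDING_TERMS = {"lesion", "cyst", "mass"}       # hypothetical finding vocabulary, for illustration
MODIFIER_TERMS = {"hepatic", "small", "simple"}  # hypothetical modifier vocabulary

def tag_entities(tokens):
    """Stage 1: label each token as a finding, a modifier, or other (O)."""
    tags = []
    for tok in tokens:
        word = tok.lower().strip(".,")
        if word in FINDING_TERMS:
            tags.append("FINDING")
        elif word in MODIFIER_TERMS:
            tags.append("MODIFIER")
        else:
            tags.append("O")
    return tags

def link_facts(tokens, tags):
    """Stage 2: attach preceding modifiers to each finding, yielding complete facts."""
    facts, pending = [], []
    for tok, tag in zip(tokens, tags):
        word = tok.lower().strip(".,")
        if tag == "MODIFIER":
            pending.append(word)
        elif tag == "FINDING":
            facts.append({"entity": word, "modifiers": pending})
            pending = []
    return facts

tokens = "There is a small simple hepatic cyst.".split()
print(link_facts(tokens, tag_entities(tokens)))
# [{'entity': 'cyst', 'modifiers': ['small', 'simple', 'hepatic']}]
```

In the actual study, both stages would be learned neural components rather than word lists, which is what lets the system extract facts containing words it never saw during training.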
Overall, the team’s schema labeled more than 5,000 facts and more than 15,000 separate pieces of information. More than 86% of radiology reports’ text was connected to at least one fact labeled by the system.
“We did not have to add any new facts after approximately 50 reports were fully labeled, suggesting that we had achieved some degree of content saturation within the limited domain of abdominopelvic reports,” the authors added.
The system worked so well, in fact, that it was able to key in on facts it had never even “seen” before. This was a key finding, the researchers explained, because it shows the system is not limited by “pre-specified vocabularies and ontologies.”
“Our study demonstrates the feasibility of near-complete information extraction from radiologic texts using only a small corpus of 120 abdominopelvic cross-sectional imaging reports,” the authors wrote. “It introduces a novel information schema which is capable of handling the vast majority of information in radiologic text and a neural network model for extracting that information.”
Steinkamp and colleagues did explain that their study had certain limitations. For instance, “rare types of information” may still be missed from time to time, and the system “is unable to handle some types of ‘implied’ information which are not directly referenced in the text.” Also, the training set was somewhat small, something the team plans to address by increasing its size as time goes on.
Overall, however, the researchers see their work as a step forward for IE tasks in radiology. Future research, they concluded, will involve “more sophisticated language models” and “building downstream applications.”