AI tracks when radiology reports include follow-up recommendations

Natural language processing (NLP) and machine learning can help track when free-text radiology reports include follow-up imaging recommendations, according to a new study published in the Journal of Digital Imaging.

“While structured reporting could explicitly indicate the need for follow-up, the fact is many radiology reports remain unstructured or loosely structured,” wrote Robert Lou, of the Perelman School of Medicine at the University of Pennsylvania in Philadelphia, and colleagues. “Automated identification of follow-up recommendations in radiology reports would allow for automated tracking of patients requiring follow-up and help to decrease the number of patients who experience adverse outcomes due to missed follow-up.”

The researchers noted that structured reporting is gaining traction, but many radiology reports—including those at their own institution—are “loosely structured” and still include a lot of free text. NLP, they added, “allows for processing of large amounts of text that would not be possible using manual efforts.”

Lou et al. explored data from 6,000 randomly sampled abdominal MRI, CT and ultrasound examinations performed in 2016 and 2017 at a single urban health system. Of these, 4,800 examinations were used as a training dataset and the remaining 1,200 served as a test set.
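The 80/20 proportions above are from the study; a split of that shape can be sketched in plain Python (the shuffling and fixed seed are illustrative assumptions, not details from the paper):

```python
import random

# 6,000 examination IDs, matching the study's sample size;
# the IDs themselves are synthetic placeholders.
exam_ids = list(range(6000))

rng = random.Random(42)  # fixed seed for reproducibility (an assumption)
rng.shuffle(exam_ids)

train_ids = exam_ids[:4800]  # 80% training set
test_ids = exam_ids[4800:]   # 20% held-out test set

print(len(train_ids), len(test_ids))  # 4800 1200
```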

If a radiology report was for an intermediate lesion (follow-up imaging is recommended) or suspicious lesion (biopsy or surgical resection is recommended), it was labeled as requiring follow-up. This ended up being the case for 735 (12.3%) of the 6,000 randomly selected imaging examinations. NLP was then used to extract 1,500 features, and three different machine learning models—naïve Bayes, decision tree, and maximum entropy—were used to automatically detect when follow-up recommendations were present.  
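The study's code and data are not public, but a pipeline of this shape can be sketched with scikit-learn. Everything below is an assumption for illustration: the toy report texts and labels are invented, bag-of-words counts stand in for the unspecified NLP features (capped at 1,500 as in the study), and `LogisticRegression` stands in for maximum entropy, its standard implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for report impressions; the study's data are not public.
reports = [
    "Indeterminate 8 mm hepatic lesion. Recommend follow-up MRI in 3 months.",
    "Suspicious renal mass. Biopsy is recommended.",
    "No acute abdominal abnormality.",
    "Normal appendix. No follow-up needed.",
    "Stable simple cyst. Follow-up imaging recommended in 6 months.",
    "Unremarkable examination.",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = follow-up recommended

# Bag-of-words features, capped at 1,500 as in the study.
vectorizer = CountVectorizer(max_features=1500)
X = vectorizer.fit_transform(reports)

# The three models compared: naive Bayes, decision tree, and
# maximum entropy (logistic regression).
models = {
    "naive_bayes": MultinomialNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "max_entropy": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X, labels)
    print(name, model.predict(X))
```

In practice each model would be fit on the 4,800-report training set and scored on the 1,200-report test set rather than on the data it was trained on.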

The team determined that the decision tree model had the highest F1 score (45.8%) and accuracy (86.2%) of the three machine learning methods. Naïve Bayes had an F1 score of 38.1% and accuracy of 74.5%, while maximum entropy had an F1 score of 38.7% and accuracy of 81.2%.
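The gap between accuracy and F1 here reflects the class imbalance: with only about 12% of reports requiring follow-up, a model can score high accuracy while still missing many positives. A short worked example makes this concrete; the confusion-matrix counts below are illustrative, not figures from the paper.

```python
def metrics(tp, fp, fn, tn):
    """Compute accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical counts on a 1,200-report test set with ~12% positives
# (147 of 1,200): accuracy stays high even though nearly half the
# follow-up recommendations are missed.
acc, f1 = metrics(tp=70, fp=90, fn=77, tn=963)
print(f"accuracy={acc:.1%}  f1={f1:.1%}")  # accuracy=86.1%  f1=45.6%
```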

“While perhaps not robust enough yet for clinical usage, this study demonstrates proof of concept and underlines the strength of the machine learning decision tree algorithm,” the authors wrote. “Decision trees are well suited to tasks in which hierarchical categorical distinctions can be made.”

Overall, though they said there was still work to be done, the authors saw significant potential in their findings.

“Follow-up recommendation detection is a challenging task that can certainly be addressed by explicit structured reporting of follow-up recommendations, but until a standardized system of doing so becomes prevalent in radiology, NLP- and machine learning-powered automated detection algorithms may assist in tracking the many patients who are at risk for adverse events due to delayed or missed follow-up imaging,” they wrote.