AI detects more variation in free-text radiology reports than structured reports

A natural language processing (NLP) and machine learning algorithm was trained to evaluate variation in both free-text radiology reports and structured radiology reports, according to new research published in Current Problems in Diagnostic Radiology. The variation was more prevalent in free-text reports.

“The power of NLP lies in its ability to analyze the vast amounts of, what is commonly referred to as, ‘unstructured data’ and make decisions based on the content,” wrote lead author Lane F. Donnelly, MD, of the Lucile Packard Children’s Hospital at Stanford and Stanford University School of Medicine in California, and colleagues. “Common NLP tasks can be as diverse as automatic text classification, sentiment analysis, machine translation, automatic triage of emergency cases based on a clinical report content, to customer support conversational agents (or chatbots) that can answer customer’s questions and resolve common problems.”

To determine whether NLP and a machine learning algorithm can evaluate report variations, the authors analyzed more than 28,000 radiology reports for four metrics: verbosity, observational terms only, unwarranted negative findings and repeated language in different sections of the report.

Radiology reports for two imaging examinations, an appendicitis ultrasound and a single-view chest x-ray, were examined. The appendicitis reports were more structured, while the chest x-ray reports had a free-text format. A total of 23 radiologists created reports for the appendicitis ultrasound and 28 radiologists created reports for the chest x-ray. Meanwhile, 20 other radiologists dictated for both.

The authors calculated the mean and standard deviation of each metric, both for individual radiologists and for the group as a whole. Radiologists were then ranked based on the number of metrics identified in each report.
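For readers who want a concrete picture of this kind of summary, the short Python sketch below computes a per-radiologist and group mean and standard deviation, then ranks radiologists by their mean metric count per report. It is only an illustration under stated assumptions: the radiologist IDs, the counts, and the `summarize` function are hypothetical, and the article does not describe the authors' actual code or which form of standard deviation they used (the population form is assumed here).

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical data: (radiologist_id, metrics flagged in one report).
# These values are invented for illustration, not taken from the study.
reports = [
    ("rad_a", 2), ("rad_a", 4),
    ("rad_b", 10), ("rad_b", 14),
    ("rad_c", 6), ("rad_c", 6),
]

def summarize(reports):
    """Return per-radiologist stats, group stats, and a low-to-high ranking."""
    by_radiologist = defaultdict(list)
    for radiologist, count in reports:
        by_radiologist[radiologist].append(count)

    # Mean and population standard deviation for each radiologist.
    individual = {
        radiologist: (mean(counts), pstdev(counts))
        for radiologist, counts in by_radiologist.items()
    }

    # The same two statistics across every report in the group.
    all_counts = [count for _, count in reports]
    group = (mean(all_counts), pstdev(all_counts))

    # Rank radiologists from lowest to highest mean metrics per report.
    ranking = sorted(individual, key=lambda r: individual[r][0])
    return individual, group, ranking

individual, group, ranking = summarize(reports)
print(ranking)
```

In this toy data the ranking places the radiologist with the fewest flagged metrics per report first, which mirrors the low-to-high scoring described in the article.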

Metric values and variability were greater in radiology reports that used free-text reporting than in structured reports. The lowest-scoring radiologist scored nine and the highest scored 81, a roughly 10-fold difference.

“There was great variability in radiologist dictation styles—metrics per report varied greatly between radiologists with the maximum 10 times higher than the minimum score,” the researchers wrote. “Metric values were greater on the standardized reports using free text than the more structured reports.”

The researchers noted that previous studies have established that standardized and structured reporting improves communication with referring physicians, which may be attributed to the decrease in variability in dictation style. Furthermore, they noted that a “lack of data uniformity and structure related to non-standardized lengthy narratives can hinder clear communication.”

Though standardized reports improve communication in comparison to free-text reports, the authors wrote that variability persists even with standardized templates.

“This study demonstrates that natural language processing and machine learning algorithms can be used to evaluate significant volumes of radiology reports for metrics which could be used for tasks such as quality control, teaching, and as feedback and learning materials for practicing radiologists,” Donnelly and colleagues wrote. “This study also demonstrates and confirms that there is high variability in radiologist dictation styles based on the parameters evaluated.”