In concert: Combining multiple machine learning models in radiology boosts prediction performance

Pooling multiple machine learning models to predict pediatric bone age is more effective than relying on any one model on its own, according to a new study.

Researchers with several academic institutions recently made that discovery using dozens of submissions from the RSNA Pediatric Bone Age Machine Learning Challenge. Brown University’s Ian Pan and colleagues combined several AI algorithms, all designed to complete the same task, using a method called “ensembling.”

The resulting ensemble decreased the generalization error of bone age prediction from a mean absolute deviation of 4.55 months to 3.79 months, according to their study, published Wednesday, Nov. 20, in Radiology: Artificial Intelligence. In clinical practice, this could help radiologists rapidly assess skeletal maturation in children with a variety of conditions, they noted.
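To make the idea concrete, here is a minimal sketch of ensembling scored by mean absolute deviation. It assumes the simplest possible combining rule, an unweighted average of each model's prediction; the article does not say how the study combined its models, and the function names and toy numbers below are illustrative, not taken from the study.

```python
import numpy as np


def mean_absolute_deviation(predicted, reference):
    """Average absolute gap, in months, between predicted and reference bone ages."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


def ensemble_mean(predictions):
    """Combine models by averaging their predictions for each case.

    predictions: array-like of shape (n_models, n_cases).
    Assumption: unweighted averaging, chosen here only for illustration.
    """
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


# Toy data (hypothetical): reference bone ages in months for four X-rays,
# plus predictions from three made-up models.
reference = np.array([120.0, 96.0, 150.0, 60.0])
model_a = np.array([126.0, 92.0, 155.0, 57.0])
model_b = np.array([116.0, 101.0, 146.0, 64.0])
model_c = np.array([123.0, 94.0, 147.0, 62.0])

individual_mads = [
    mean_absolute_deviation(m, reference) for m in (model_a, model_b, model_c)
]
ensemble_mad = mean_absolute_deviation(
    ensemble_mean([model_a, model_b, model_c]), reference
)

print("individual:", individual_mads)
print("ensemble:", ensemble_mad)
```

In this toy case the averaged prediction has a lower error than any single model because the models' mistakes point in different directions and partly cancel. An unweighted average is never worse than the average of its members' errors, but it is not guaranteed to beat the best single member.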

“Our results call attention to a concept that has substantial practical implications, as computer vision and other machine learning algorithms begin to move from research to the clinical environment,” Pan said in a statement. “Namely, that the best results are likely to be achieved by combining multiple accurate and diverse models rather than from single models alone.”

Pan and co-investigators reached their conclusions using 48 submissions to the 2017 bone-age competition. For the challenge, RSNA shared more than 12,600 pediatric hand X-rays, with bone ages determined by a radiologist, and challenged teams to create their own prediction models. For the recent study, researchers evaluated numerous possible model combinations, ranging from two to 10 of the 48 submissions, using the mean absolute deviation.
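A search over model combinations of this kind might look roughly like the sketch below. It is not the study's procedure: it assumes an unweighted average as the combining rule, uses synthetic predictions, and enumerates every subset of a small five-model pool. The function name `best_combination` and all data are hypothetical.

```python
from itertools import combinations

import numpy as np


def mad(predicted, reference):
    """Mean absolute deviation between predicted and reference values."""
    return float(np.mean(np.abs(predicted - reference)))


def best_combination(predictions, reference, min_size=2, max_size=3):
    """Try every subset of models within the size range and return the
    subset whose averaged prediction has the lowest mean absolute deviation.

    predictions: array-like of shape (n_models, n_cases).
    Assumption: subsets are combined by unweighted averaging.
    """
    predictions = np.asarray(predictions, dtype=float)
    reference = np.asarray(reference, dtype=float)
    best_members, best_score = (), float("inf")
    largest = min(max_size, len(predictions))
    for size in range(min_size, largest + 1):
        for members in combinations(range(len(predictions)), size):
            averaged = predictions[list(members)].mean(axis=0)
            score = mad(averaged, reference)
            if score < best_score:
                best_members, best_score = members, score
    return best_members, best_score


# Synthetic stand-in data: 50 cases, 5 models with independent noise.
rng = np.random.default_rng(0)
reference = rng.uniform(12, 216, size=50)
predictions = reference + rng.normal(0, 6, size=(5, 50))

members, score = best_combination(predictions, reference)
print("best subset:", members, "MAD:", round(score, 2))
```

Exhaustive enumeration is practical only for small pools, since the number of subsets grows very quickly with pool size. Choosing the best subset and reporting its error on the same cases also gives an optimistic figure, so a selection like this is normally checked against held-out data.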

A key takeaway from the study is the need for practitioners who are incorporating AI algorithms into their own workflows to seek out predictions from other similar models, the authors noted. They compared this practice to a radiologist seeking out a second opinion in the reading room.

The authors also believe that such AI competitions are fertile ground for further development of ensemble prediction methods.

“Machine learning competitions within radiology should be encouraged to spur development of heterogeneous models whose predictions can be combined to achieve optimal performance,” Pan added.

Eliot Siegel, MD, a radiologist and professor at the University of Maryland, further explored the ensemble model and Pan’s research in an accompanying commentary published Nov. 20.