Augmented datasets improve AI accuracy

Augmented datasets can improve the overall accuracy of deep convolutional neural networks (DCNNs), according to new findings published in Clinical Radiology.

“Image classification performance using DCNNs is dependent on the data available in the training datasets, with large and diverse datasets providing the best results; however, correcting large datasets is usually time and effort intensive,” wrote lead author R. Ogawa, Saiseikai Matsuyama Hospital in Japan, and colleagues. “Furthermore, the availability of medical images to train DCNNs is limited. To improve training datasets, images can be processed to increase the number of samples by employing a method called data augmentation.”

The researchers used 735 chest x-rays for the study, with two experienced radiologists classifying the findings as normal or abnormal. Images were converted into the JPEG format in multiple sizes, grouped into training and validation sets, and then augmented using “rotation, Gaussian blur, brightness variation, and horizontal and vertical flipping.”
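For readers unfamiliar with these operations, the five augmentation types named in the study can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the authors' pipeline: the function names, the blur sigma, the brightness factor and the restriction to 90-degree rotations are all assumptions, since the article does not report the parameters the team used.

```python
import numpy as np

def gaussian_blur(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur on a 2-D grayscale image (sigma is an assumed value)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so overall brightness is preserved
    # Pad with edge values so the output keeps the input's height and width.
    padded = np.pad(img.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities and clip to the 8-bit range [0, 255]."""
    return np.clip(img.astype(float) * factor, 0, 255)

def augment(img: np.ndarray) -> dict:
    """Return one variant per augmentation type named in the article."""
    return {
        "rot90": np.rot90(img),              # assumption: 90-degree rotation only
        "blur": gaussian_blur(img, 1.0),     # assumption: sigma = 1.0
        "bright": adjust_brightness(img, 1.2),  # assumption: +20% brightness
        "hflip": np.fliplr(img),             # horizontal flip (left-right mirror)
        "vflip": np.flipud(img),             # vertical flip (top-bottom mirror)
    }

# Hypothetical stand-in for a grayscale chest radiograph.
image = np.arange(12, dtype=float).reshape(3, 4)
variants = augment(image)
```

Each variant keeps the label of the image it came from, so a single labeled radiograph yields several training samples. Which variants actually help is an empirical question, as the study's results show.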

The team’s DCNN then classified all images, again as normal or abnormal, and the authors tracked its performance. Overall, accuracy improved with augmented datasets, though applying Gaussian blur sometimes had the opposite effect. Noting that a resolution of 128x128 pixels and “combining rotation and horizontal flipping” produced the best results, the team applied that configuration when evaluating the DCNN on a test set from the National Institutes of Health.

Using that set, the authors noted, the DCNN had a sensitivity of 85%, specificity of 81%, positive predictive value of 82%, negative predictive value of 85% and accuracy of 83%.
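The five figures above are all derived from the same four counts: true positives, false positives, true negatives and false negatives. The sketch below shows the standard definitions in Python. The function name and the example counts are hypothetical and are not taken from the study, which the article does not break down into raw counts.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of abnormal cases correctly flagged
        "specificity": tn / (tn + fp),  # share of normal cases correctly cleared
        "ppv": tp / (tp + fp),          # share of "abnormal" calls that were right
        "npv": tn / (tn + fn),          # share of "normal" calls that were right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for illustration only (not the study's data).
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Sensitivity and specificity describe the classifier itself, while the predictive values also depend on how many abnormal cases the test set contains.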

“In conclusion, augmentation of training datasets was useful for the binary classification of chest radiographs using a DCNN,” the authors wrote. “Classification performance was highly dependent on the type of augmentation techniques employed.”