Deep learning matches performance of radiologists in diagnosis of thyroid nodules

Background

The thyroid is a gland in the neck consisting of two lobes. Thyroid nodules are estimated to affect as much as 50% of the population. Radiologists triage them for biopsy based on their assessment of ultrasound (US) imaging. However, interpretation of thyroid ultrasound suffers from high inter-reader variability and leads to overdiagnosis.

Data

Our dataset included US images for 1377 thyroid nodules from 1230 patients (Figure 1). All nodules were assigned ground truth labels (benign or cancer) proven by either biopsy or surgery. For testing, we selected 99 cases that were additionally annotated by multiple radiologists. The remaining 1278 cases were used for algorithm development.

Figure 1. Deidentified ultrasound images of a thyroid nodule.

Deep learning algorithm

We developed a deep learning algorithm to provide management recommendations for thyroid nodules observed on ultrasound images. The code for the methods applied in this project is available at github.com/mateuszbuda/thyroid-us.

The main steps of the algorithm are:

Extraction of a region of interest (ROI) based on calliper marks, using a Faster R-CNN network.
Prediction of malignancy using a multi-task CNN.
Stratification into risk levels and a biopsy recommendation.

ROI extraction

The ROI containing a nodule was extracted from each thyroid US image by detecting calliper marks with a Faster R-CNN network. The ROI was defined as the rectangle enclosing all detected calliper marks, as shown in Figure 2.

Figure 2. Thyroid US with detected calliper marks (red boxes) and ROI enclosing them (blue box).
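
The ROI definition above comes down to taking the union bounding box of the detections. Below is a minimal sketch of that step only, assuming each detected calliper mark is given as an `(x_min, y_min, x_max, y_max)` box in pixel coordinates. The function name `enclosing_roi` and the optional `margin` parameter are illustrative and are not taken from the project's repository.

```python
from typing import Iterable, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def enclosing_roi(calliper_boxes: Iterable[Box], margin: float = 0.0) -> Box:
    """Return the smallest rectangle containing every detected calliper box.

    `margin` (hypothetical, defaults to 0) pads the rectangle on all sides.
    """
    boxes = list(calliper_boxes)
    if not boxes:
        raise ValueError("at least one calliper detection is required")
    x_min = min(b[0] for b in boxes) - margin
    y_min = min(b[1] for b in boxes) - margin
    x_max = max(b[2] for b in boxes) + margin
    y_max = max(b[3] for b in boxes) + margin
    return (x_min, y_min, x_max, y_max)


# Two made-up detections at opposite corners of a nodule.
roi = enclosing_roi([(10, 20, 14, 24), (50, 60, 54, 64)])
```

In practice the detections would come from the object detector's output, and the resulting rectangle would be used to crop the image before classification.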

Multi-task CNN

For prediction of malignancy, we developed a multi-task CNN (Figure 3). The main task for the network was to predict the malignancy of thyroid nodules based on US images. The auxiliary tasks, trained jointly with the main task, were to predict visual features of the nodule that are highly relevant to its malignancy status. The shared weights were updated based on the training signal from all tasks. As a result, the network was able to extract generalizable features, even when trained on a small dataset.

Figure 3. Multi-task CNN for malignancy prediction of thyroid nodules based on US images.
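
To illustrate how a training signal from all tasks can reach the shared weights, here is a minimal sketch of a combined objective for a single nodule. It rests on several assumptions that are not stated in this post: every task is treated as binary, the per-task loss is binary cross-entropy, and the auxiliary losses are summed and scaled by a hypothetical `aux_weight`. The feature names are placeholders.

```python
import math
from typing import Dict


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multi_task_loss(
    main_pred: float,
    main_label: int,
    aux_preds: Dict[str, float],
    aux_labels: Dict[str, int],
    aux_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> float:
    """Main-task loss plus a weighted sum of the auxiliary-task losses."""
    main = binary_cross_entropy(main_pred, main_label)
    aux = sum(
        binary_cross_entropy(aux_preds[name], aux_labels[name])
        for name in aux_labels
    )
    return main + aux_weight * aux


# Placeholder auxiliary tasks; the real visual features are not listed here.
loss = multi_task_loss(
    main_pred=0.8,
    main_label=1,
    aux_preds={"feature_a": 0.3, "feature_b": 0.9},
    aux_labels={"feature_a": 0, "feature_b": 1},
)
```

Because a single scalar is minimized, gradients from each auxiliary term flow into the layers shared with the malignancy head, which is the mechanism the paragraph above describes.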

Results

On the test nodules, the proposed deep learning algorithm achieved an AUC of 0.87, which was similar to that of a committee of three expert radiologists (AUC = 0.91).

Figure 4. ROC curves comparing deep learning and radiologists on a test set [1].
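
For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half. The sketch below computes it directly from that definition. It is not the evaluation code used in the study, and the labels and scores are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    labels: 1 for cancer, 0 for benign. scores: higher means more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 malignant/benign pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of cases, which is fine for a test set of 99 nodules. A rank-based implementation would be preferable for large datasets.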

The proposed deep learning algorithm achieved 87% sensitivity and 52% specificity. The sensitivity and specificity of the deep learning algorithm for thyroid nodule biopsy recommendations were similar to those of expert radiologists.
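
Sensitivity is the fraction of malignant nodules for which biopsy was recommended, and specificity is the fraction of benign nodules for which it was not. The sketch below shows how both can be computed from binary recommendations. The function name and the toy data are illustrative and do not reproduce the study's results.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], recommendations: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity).

    labels: 1 for cancer, 0 for benign.
    recommendations: 1 if biopsy is recommended, 0 otherwise.
    """
    tp = fn = tn = fp = 0
    for y, r in zip(labels, recommendations):
        if y == 1 and r == 1:
            tp += 1  # cancer, biopsy recommended
        elif y == 1:
            fn += 1  # cancer, biopsy not recommended
        elif r == 0:
            tn += 1  # benign, biopsy not recommended
        else:
            fp += 1  # benign, biopsy recommended
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with 2 malignant and 4 benign nodules.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
```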

References

[1] Mateusz Buda, Benjamin Wildman-Tobriner, Jenny K Hoang, David Thayer, Franklin N Tessler, William D Middleton, Maciej A Mazurowski. “Management of thyroid nodules seen on US images: deep learning may match performance of radiologists.” Radiology, 2019.