Evaluation of the dataset quality in gamma passing rate predictions using machine learning methods

Quintero, Paulo, Benoit, David, Yongqiang, Cheng, Moore, Craig and Beavis, Andrew (2023) Evaluation of the dataset quality in gamma passing rate predictions using machine learning methods. British Journal of Radiology, 96 (1147). ISSN 0007-1285

Item Type:	Article

Abstract

Objective: Gamma passing rate (GPR) predictions using machine learning methods have been explored for treatment verification of radiotherapy plans. However, these methods have relied on datasets with unbalanced numbers of plans across different treatment conditions (heterogeneous datasets), such as anatomical site or dose per fraction, leading to lower model interpretability and prediction performance.

Methods: We investigated the impact of dataset composition on GPR binary classification (pass/fail) using random forest (RF), XG-Boost, and neural network (NN) models. A total of 945 plans were used to create one reference dataset (randomly assembled) and 24 customized datasets that considered four heterogeneity factors independently (anatomical region, number of arcs, dose per fraction, and treatment unit). A total of 309 predictor features were extracted and calculated from plan parameters, modulation complexity metrics, and radiomic analysis (leaf-trajectory maps, 3D dose distributions, and portal dosimetry images). Model performance was measured using the area under the receiver operating characteristic curve (ROC-AUC).
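
As an illustration of the ROC-AUC metric named above, the following minimal sketch computes it for a binary pass/fail classifier as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). It is not the authors' code: the function name `roc_auc`, the convention that label 1 marks the positive class, and the toy labels and scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class, an assumed convention).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six plans with made-up labels and model scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(roc_auc(toy_labels, toy_scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, which is the scale on which the RF, XG-Boost, and NN results below are reported.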

Results: For the reference models, radiomics features increased ROC-AUC values by up to 13%, 15%, and 5% for RF, XG-Boost, and NN, respectively. The datasets with more heterogeneous conditions yielded lower ROC-AUC values (RF: 0.72 ± 0.11, XG-Boost: 0.67 ± 0.1, NN: 0.89 ± 0.05) than those with less heterogeneous treatment conditions (RF: 0.88 ± 0.06, XG-Boost: 0.89 ± 0.07, NN: 0.98 ± 0.01). The ten most important features for each heterogeneity dataset group demonstrated their correlation with the treatments’ physical aspects and GPR prediction.

Conclusion: Improvements in data generalization and model performance can be associated with datasets having similar treatment conditions. This analysis could be used to evaluate the dataset quality and model consistency of further ML applications in radiotherapy.

Advances in knowledge: Dataset heterogeneities decrease ML model performance and reliability.

PDF
Evaluation of the dataset quality in gamma passing rate predictions using machine learning methods.pdf - Accepted Version

More Information

Related URLs: Publisher

Depositing User: Yongqiang Cheng

Identifiers

Item ID: 16814

Identification Number: 10.1259/bjr.20220302

ISSN: 0007-1285

URI: http://sure.sunderland.ac.uk/id/eprint/16814

Official URL: https://www.birpublications.org/doi/10.1259/bjr.20...

Users with ORCIDS

ORCID for Cheng Yongqiang:

orcid.org/0000-0001-7282-7638

Catalogue record

Date Deposited: 11 Jan 2024 11:50

Last Modified: 04 Jun 2025 14:58

Contributors

Author:	Cheng Yongqiang
Author:	Paulo Quintero
Author:	David Benoit
Author:	Craig Moore
Author:	Andrew Beavis

University Divisions

Faculty of Business and Technology > School of Computer Science and Engineering

Subjects

Computing > Data Science
Computing > Artificial Intelligence
