Histopathology refers to the examination of tissue samples under a microscope to identify and characterise signs of disease. Manual inspection of histology slides by pathologists is time-consuming and prone to misinterpretation. Deep learning models have demonstrated outstanding performance in digital histopathology, offering doctors and clinicians immediate and reliable decision support in their workflow. In this study, deep learning models, namely vision transformers (ViT) and convolutional neural networks (CNN), were compared on a patch-level classification task over feature annotations of glioblastoma multiforme in H&E-stained histology whole slide images (WSI).

The dataset used in this study was obtained from the Ivy Glioblastoma Atlas Project (IvyGAP). Pre-processing included stain normalisation of the images, and patches of 256x256 pixels were extracted from the WSIs. In addition, a per-subject split was applied to prevent data leakage between the training, validation and test sets. Three models were employed for the classification task on the IvyGAP data: two scratch-trained models, a ViT and a CNN (a variant of VGG16), and a pre-trained ViT. The models were assessed using accuracy, F1-score, confusion matrices, the Matthews correlation coefficient (MCC), the area under the curve (AUC) and receiver operating characteristic (ROC) curves. In addition, experiments were conducted to calibrate the models so that their predicted probabilities better reflect the ground truth, using the temperature scaling technique, and their uncertainty was estimated through the Monte Carlo dropout approach. Lastly, the models were compared statistically using the Wilcoxon signed-rank test.

Among the evaluated models, the scratch-trained ViT achieved the best test accuracy of 67%, with an MCC of 0.45. The scratch-trained CNN obtained a test accuracy of 49% and an MCC of 0.15, whereas the pre-trained ViT reached only a test accuracy of 28% and an MCC of 0.034. The reliability diagrams and calibration metrics indicated that the scratch-trained ViT was the best calibrated. After applying temperature scaling, only the scratch-trained CNN showed improved calibration; therefore, the calibrated CNN was used in the subsequent experiments. The models exhibited different levels of uncertainty: the scratch-trained ViT showed moderate uncertainty, the calibrated CNN moderate to high uncertainty across classes, and the pre-trained ViT high uncertainty overall. Finally, the models were compared pairwise at a significance level of approximately 0.0167 after applying the Bonferroni correction, with the scratch-trained ViT performing best among the three models.

In conclusion, the scratch-trained ViT achieved the highest test accuracy and the best class discrimination, whereas the scratch-trained CNN and the pre-trained ViT performed poorly, comparably to random classifiers. The scratch-trained ViT was also the best calibrated, while the calibrated CNN showed varying levels of uncertainty. The statistical tests demonstrated no statistically significant difference among the models.
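The temperature scaling step mentioned above can be sketched as follows. This is a minimal illustration in PyTorch, assuming a trained classifier and held-out validation logits and labels; the function name, optimiser settings and variable names are illustrative assumptions, not taken from the thesis code.

```python
# Minimal sketch of temperature scaling calibration, assuming validation-set
# logits and integer class labels from an already-trained classifier.
import torch
import torch.nn as nn
import torch.optim as optim


def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Learn a single scalar T that minimises the NLL on validation logits."""
    temperature = nn.Parameter(torch.ones(1))          # start at T = 1 (no change)
    nll = nn.CrossEntropyLoss()
    optimizer = optim.LBFGS([temperature], lr=0.01, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = nll(val_logits / temperature, val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return temperature.item()


# Usage: divide test-time logits by the fitted temperature before the softmax.
# The argmax predictions are unchanged; only the confidence estimates are rescaled.
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = torch.softmax(test_logits / T, dim=1)
```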
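A second sketch illustrates the Monte Carlo dropout uncertainty estimate: repeated stochastic forward passes with the dropout layers kept active at inference time. Again, the helper name and the number of samples are assumptions made for illustration, assuming a PyTorch model that contains nn.Dropout layers.

```python
# Minimal sketch of Monte Carlo dropout uncertainty estimation.
import torch
import torch.nn as nn


@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 30):
    """Run repeated stochastic forward passes with dropout kept active."""
    model.eval()
    # Re-enable dropout layers only, keeping batch-norm statistics frozen.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()

    probs = torch.stack(
        [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
    )                                   # shape: (n_samples, batch, n_classes)
    mean_probs = probs.mean(dim=0)      # predictive distribution
    std_probs = probs.std(dim=0)        # per-class uncertainty estimate
    return mean_probs, std_probs
```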
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-195364
Date | January 2023
Creators | Spyretos, Christoforos
Publisher | Linköpings universitet, Statistik och maskininlärning
Source Sets | DiVA Archive at Upsalla University
Language | English
Detected Language | English
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format | application/pdf
Rights | info:eu-repo/semantics/openAccess