Frontotemporal dementia is a neurodegenerative disorder that is highly heterogeneous at the genetic, pathological and clinical levels. The familial form of the disease is mainly caused by pathogenic variants in three genes: C9orf72, MAPT and GRN. As there is no clear correlation between the mutation and the clinical phenotype, symptom severity or age of onset, the demand for predictive biomarkers is high. Although no fluid biomarker for frontotemporal dementia is in use yet, there is strong hope that changes in protein concentrations in blood or cerebrospinal fluid can aid prognosis many years before symptoms develop. Long-term studies of families affected by familial frontotemporal dementia are making increasing amounts of data available, but analyzing these data is time-consuming and labor-intensive. In this project, a pipeline was built for the automated analysis of proteomics data. Specifically, it aims to identify proteins that are useful for differentiating between two groups by using random forest, a supervised machine learning method. The pipeline's results for a data set containing blood plasma protein concentrations of healthy controls and participants affected by frontotemporal dementia were promising, and the general applicability of the pipeline was demonstrated with an independent breast cancer proteomics data set.
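The idea of using a random forest to rank proteins by how well they separate two groups can be illustrated with a minimal sketch. It is not the thesis pipeline: it uses only the Python standard library, depth-one trees (stumps) instead of full decision trees, and synthetic data. The protein names, group sizes, effect size, and the function names `gini`, `best_split` and `stump_forest_importance` are all hypothetical and chosen only for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels, features):
    """Return (feature, impurity_decrease) for the best single-threshold split."""
    parent = gini(labels)
    n = len(rows)
    best_feature, best_gain = None, 0.0
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if gain > best_gain:
                best_feature, best_gain = f, gain
    return best_feature, best_gain

def stump_forest_importance(rows, labels, n_trees=50, n_candidates=2, seed=0):
    """Simplified random forest of stumps: each tree sees a bootstrap sample
    and a random subset of features; importance is the summed impurity
    decrease per feature, normalised to sum to 1."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    importance = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        features = rng.sample(range(n_features), n_candidates)
        feature, gain = best_split(sample_rows, sample_labels, features)
        if feature is not None:
            importance[feature] += gain
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance

# Synthetic data (assumption): 60 samples, 5 hypothetical proteins,
# and only protein_A differs between controls (0) and cases (1).
data_rng = random.Random(42)
proteins = ["protein_A", "protein_B", "protein_C", "protein_D", "protein_E"]
rows, labels = [], []
for i in range(60):
    label = i % 2
    row = [data_rng.gauss(0.0, 1.0) for _ in proteins]
    row[0] += 3.0 * label
    rows.append(row)
    labels.append(label)

importance = stump_forest_importance(rows, labels)
ranking = sorted(zip(proteins, importance), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Proteins with a high importance score are candidates for distinguishing the two groups. A real analysis would more likely rely on an established random forest implementation with full-depth trees and validation on held-out samples.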
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-416827 |
Date | January 2020 |
Creators | Waury, Katharina |
Publisher | Uppsala universitet, Institutionen för biologisk grundutbildning, Karolinska Institutet |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |