191. Uncertainty quantification for neural network predictions / Kvantifiering av osäkerhet för prediktioner av neurala nätverk
Borgström, Jonas, January 2022
Since their inception, machine learning methods have proven useful, and their usability continues to grow as new methods are introduced. However, as these methods are used for decision-making in many fields, such as weather forecasting, medicine, and stock market prediction, their reliability must be properly evaluated before the models are deployed. Uncertainty in machine learning and neural networks usually stems from two primary sources: the data used or the model itself. Most statistical and machine learning methods come with inherent ways to quantify uncertainty, but neural networks lack such built-in methods, which makes uncertainty more problematic for them. Furthermore, as the dimension of the network architecture grows, so does the number of parameters to be estimated, so modeling prediction uncertainty through parameter uncertainty can become an impossible task. There are, however, methods that can quantify uncertainty in neural networks using Bayesian approximation. One such method is Monte Carlo Dropout, where the same input data is passed through different network structures; the results are assumed to follow a normal distribution, from which the uncertainty can be quantified. The second method tests a new approach where the neural network is first considered a dimension reduction tool. The input feature space, which is often large, is mapped to the state space of the neurons in the last hidden layer, which can be chosen to be smaller. Using the information from this reduced feature space, a reduced parameter set for the neural network prediction can be defined, and an assumption of, for example, a multinomial-Dirichlet probability model for discrete classification can be made. Importantly, this reduced feature space can generate predictions for hypothetical inputs, which quantifies prediction uncertainty for the network predictions. This thesis aims to see whether the uncertainty of neural network predictions can be quantified statistically by evaluating this new method. The results of the two methods are then compared to see whether the predictive uncertainties they quantify differ. The results show that, using the new method, predictive uncertainty could be quantified by first gathering the output range of each ReLU activation function; using these ranges, new data could then be uniformly simulated and fed into the softmax layer for classification. From these results, the multinomial-Dirichlet distribution could be used to quantify the uncertainty. The two methods offer comparable results when used to quantify predictive uncertainty.
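The Monte Carlo Dropout idea described above can be illustrated in a few lines. The sketch below is not from the thesis: it is a minimal PyTorch example with an invented network and input, showing how keeping dropout active at prediction time yields a distribution of outputs whose spread quantifies uncertainty.

```python
import torch
import torch.nn as nn

# A small classifier with dropout; the architecture is made up for illustration.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(32, 3),
)
model.train()  # keep dropout active at prediction time: the MC Dropout trick

x = torch.randn(1, 10)  # one hypothetical input
with torch.no_grad():
    # each forward pass samples a different thinned network structure
    samples = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(100)])

mean = samples.mean(dim=0)  # predictive mean over 100 stochastic passes
std = samples.std(dim=0)    # per-class spread, used as the uncertainty measure
print(mean, std)
```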
192. Modification of the RusBoost algorithm : A comparison of classifiers on imbalanced data / Modifikation av RusBoost algoritmen : En jämförelse av klassificeringsmetoder på obalanserad data
Forslund, Isak, January 2022
In many situations data is imbalanced, meaning the proportion of one class is larger than the other(s). Standard classifiers often produce undesirable results when the data is imbalanced, and different methods have been developed to improve classification under such conditions. Examples are the algorithms AdaBoost, RusBoost, and SmoteBoost, which modify the cost of misclassified observations; the latter two also reduce the class imbalance when training the classifier. This thesis presents a new method, Modified RusBoost, in which the RusBoost algorithm is modified so that observations that are harder to classify correctly are assigned a lower probability of being removed in the under-sampling process. The performance of this method was compared with AdaBoost, RusBoost, and SmoteBoost on 20 real imbalanced data sets, and it was investigated how imbalance affects the different classifiers. Overall, Modified RusBoost performed better than or comparably to the other methods, indicating that this algorithm can be a good alternative when classifying imbalanced data. The results also showed that an increase in ρ, the ratio of majority to minority observations in a data set, has a negative impact on the performance of the algorithms; this negative impact, however, affects all methods similarly.
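The core modification, as described in the abstract, is to make the under-sampling probability depend on how hard an observation is to classify. The sketch below is my own illustration of that idea, not the thesis's code: it uses boosting-style sample weights (large weight = hard example) and removes majority-class rows with probability inversely related to their weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_undersample(X, y, w, majority_label, n_keep):
    """Drop majority-class rows so that hard examples (large boosting weight w)
    have a lower chance of being removed. Function and names are illustrative."""
    maj = np.flatnonzero(y == majority_label)
    removal_p = 1.0 / w[maj]              # large weight -> small removal probability
    removal_p /= removal_p.sum()
    removed = rng.choice(maj, size=len(maj) - n_keep, replace=False, p=removal_p)
    keep = np.setdiff1d(np.arange(len(y)), removed)
    return X[keep], y[keep], w[keep]

# Toy imbalanced data: 90 majority (label 0) vs 10 minority (label 1) observations.
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
w = np.full(100, 0.01)                    # uniform initial AdaBoost-style weights
Xb, yb, wb = weighted_undersample(X, y, w, majority_label=0, n_keep=10)
print(np.bincount(yb))                    # balanced sample for the next boosting round
```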
193. Sleep apnea prediction in a Swedish cohort : Can the STOP-Bang questionnaire be improved? / Sömnapnéprediktion i en svensk kohort : Kan STOP-Bang enkäten förbättras?
Gladh, Miriam, January 2022
Obstructive sleep apnea (OSA) is defined as more than five breathing pauses per hour of sleep, i.e. an apnea-hypopnea index (AHI) > 5. STOP-Bang is a questionnaire that predicts the risk of sleep apnea based on risk factors such as snoring, hypertension, and a neck circumference greater than 40 cm. Many individuals with OSA are undiagnosed, and patients with sleep apnea have an increased risk of complications after surgery, so it is important to identify these patients. This thesis aims to create models that predict the degree of sleep apnea, defined as no to mild sleep apnea (AHI < 15) or moderate to severe sleep apnea (AHI ≥ 15), using different methods: Random Forests, logistic regression, and linear discriminant analysis (LDA). Beyond these three methods, the STOP-Bang questionnaire, a weighted STOP-Bang, and a modified STOP-Bang are used to predict the degree of sleep apnea. The modified STOP-Bang uses the same feature variables as STOP-Bang, but the categorical feature variables are divided differently and some feature variables are given more weight. A STOP-Bang model with other feature variables, the SCAPIS STOP-Bang, was also constructed to see whether prediction accuracy would improve. Prediction performance is also compared by gender for all models, using accuracy, specificity, and sensitivity. Among the models using the STOP-Bang feature variables, the highest area under the curve (AUC), with confidence interval in parentheses, was achieved by the LDA and logistic regression models, with an AUC of 0.81 (0.78, 0.84). The confidence intervals for AUC, sensitivity, and accuracy overlapped for all models. The SCAPIS STOP-Bang model did not achieve better prediction accuracy. For all models, accuracy was higher for females than for males, but here, too, all confidence intervals overlapped.
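As a hedged illustration of the comparison the abstract describes, the sketch below fits logistic regression and LDA on synthetic stand-in data and reports AUC. The features and labels are simulated; the thesis uses real SCAPIS/STOP-Bang variables that are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 8))  # 8 stand-in STOP-Bang-style risk factors
# Simulated indicator for moderate-to-severe apnea (AHI >= 15).
y = (X @ rng.normal(size=8) + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
for model in (LogisticRegression(max_iter=1000), LinearDiscriminantAnalysis()):
    auc = roc_auc_score(y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
    print(type(model).__name__, round(auc, 3))
```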
194. Correlation coefficient based feature screening : With applications to microarray data / Korrelationsbaserad dimensionsreduktion med tillämpning på data från mikromatriser
Holma, Agnes, January 2022
Measuring dependency between variables is of great importance in statistical analysis and can, for instance, be used for feature screening. It is therefore interesting to find measures that can quantify dependencies even when they are complex. Recently, the correlation coefficient ξn was proposed [1]; it is fast to compute and works particularly well when dependencies show an oscillatory or wiggly pattern. In this thesis, the coefficient ξn was applied as a feature screening tool, and a comprehensive simulation study investigated how well it could find dependencies between predictor variables and a response variable. The results showed that ξn was better than two other fairly new and popular dependence measures, the Hilbert-Schmidt Independence Criterion and Distance Correlation (DC), at detecting dependencies when variables were connected through sine or cosine functions, and worse when variables were connected through some other functions, such as exponential functions. As feature screening tools, ξn and DC were also applied to real microarray data to investigate whether they could give better results than a t-test. For this particular data set, the t-test turned out to be more efficient than DC or ξn.
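For the no-ties case, ξn has a simple closed form: sort the pairs by X, take the ranks r_i of the corresponding Y values, and compute ξn = 1 − 3 Σ|r_{i+1} − r_i| / (n² − 1). The sketch below implements that formula (my own code, not the thesis's) and shows the behaviour the abstract describes on an oscillatory example.

```python
import numpy as np

def xi_n(x, y):
    """xi_n rank correlation (no-ties version): 1 - 3*sum|r_{i+1}-r_i| / (n^2 - 1)."""
    order = np.argsort(x)                      # sort the pairs by x
    r = np.argsort(np.argsort(y[order])) + 1   # ranks of y in that order
    n = len(x)
    return 1 - 3 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 500)
print(xi_n(x, np.sin(4 * x)))           # wiggly but deterministic: value close to 1
print(xi_n(x, rng.normal(size=500)))    # independent noise: value close to 0
```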
195. Classification Models for Activity Recognition using Smart Phone Accelerometers / Klassificeringsmodeller för aktivitetsigenkänning använder sig av accelerometrar för smarta telefoner
Kumar, Biswas, January 2022
The huge amount of data generated by accelerometers in smartphones creates new opportunities for useful data mining applications, and machine learning algorithms can be used effectively for tasks such as the classification and clustering of physical activity patterns. This paper builds and evaluates a system that uses real-world, labeled data from smartphone-based tri-axial accelerometers to perform activity recognition. Over a million data points, recorded at a frequency of 20 Hz, were filtered and pre-processed to extract relevant features for the classification task, and the features were selected to obtain higher classification accuracy. Five supervised classification models, namely random forest, support vector machines, decision tree, naïve Bayes, and multinomial logistic regression, are evaluated and compared with unsupervised models such as k-means and the self-organizing map (SOM) technique built on an unlabelled dataset. Statistical evaluation metrics such as accuracy, precision, and recall are used to compare the classification performance of the models. Interestingly, all supervised learning methods achieved very high accuracy (over 95%) on the labeled dataset, against 65% for the unsupervised SOM. Moreover, the methods registered very low similarity (23%) among themselves on unlabelled datasets with the same selected features.
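The pipeline in the abstract, windowing the raw tri-axial signal, extracting per-window features, then classifying, can be sketched as follows. The signal, window length, and features here are invented for illustration; the thesis's actual feature set is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
fs, win = 20, 200  # 20 Hz sampling (as in the thesis), 10-second windows

def make_windows(freq, label, n=100):
    """Generate n toy tri-axial windows dominated by a given movement frequency."""
    t = np.arange(win) / fs
    X = [np.sin(2 * np.pi * freq * t)[None, :] * [[1.0], [0.5], [0.2]]
         + rng.normal(0, 0.3, (3, win)) for _ in range(n)]
    return X, [label] * n

walk, yw = make_windows(2.0, 0)  # ~2 Hz rhythm stands in for "walking"
jog, yj = make_windows(3.0, 1)   # ~3 Hz rhythm stands in for "jogging"
windows, y = walk + jog, np.array(yw + yj)

# Per-window features: mean, standard deviation, and mean absolute successive
# difference per axis (the last one is frequency-sensitive) -> 9 features.
X = np.array([np.hstack([w.mean(axis=1), w.std(axis=1),
                         np.abs(np.diff(w, axis=1)).mean(axis=1)])
              for w in windows])
print(cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean())
```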
196. Modeling COPD with a Generalized Linear Mixed Effect Model : Results from the OLIN-study / Modellering av KOL med en Generalized Linear Mixed Effect Model : Resultat från OLIN-studien
Sjödin, Jenny, January 2022
The purpose of this thesis is to analyze which factors are associated with physicians' diagnosis of Chronic Obstructive Pulmonary Disease (COPD), with time being of primary interest. The data used in the analysis come from the OLIN studies, a longitudinal epidemiological research project focusing on obstructive lung diseases. The study population contains two groups: one group with COPD according to a lung function test criterion at inclusion in the study, and one reference group matched on gender and age. All subjects were invited to annual examinations with a basic program including structured interviews, health-related questionnaires, and lung function testing. The analysis is performed with a generalized linear mixed effect model that accounts for dependencies among within-subject observations, which makes the selected model ideal for analyzing longitudinal data with repeated measurements on the same subject. The results show that smoking and poor performance on the lung function tests increase the risk of receiving a COPD diagnosis from a physician. The thesis also concludes that time has a different effect depending on which group the subject belongs to: for subjects who had COPD according to the lung function test criterion at inclusion, the risk of getting the diagnosis increases with time, while for subjects who did not have COPD at inclusion, the risk decreases with time.
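A logistic GLMM with a random intercept per subject and a time-by-group interaction, in the spirit of the model described above, can be sketched with statsmodels. The data are simulated (the OLIN data are not public) and the variable names are my own; note that statsmodels fits this class of model by Bayesian approximation (variational Bayes) rather than by maximum likelihood.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(4)
n_subj, n_visits = 50, 8
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_visits),
    "time": np.tile(np.arange(n_visits), n_subj),
    "copd_group": np.repeat(rng.integers(0, 2, n_subj), n_visits),  # 1 = COPD at inclusion
})
u = np.repeat(rng.normal(0, 1, n_subj), n_visits)  # subject-level random intercepts
eta = (-2 + 0.3 * df["time"] * df["copd_group"]
       - 0.1 * df["time"] * (1 - df["copd_group"]) + u)
df["diagnosis"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# The time x group interaction mirrors the finding that time acts differently per group.
model = BinomialBayesMixedGLM.from_formula(
    "diagnosis ~ time * copd_group",
    {"subject": "0 + C(subject)"},  # random intercept per subject
    df,
)
print(model.fit_vb().summary())
```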
197. Jämförelse av män och kvinnors risk att återinsjukna i stroke : En överlevnadsanalys som tar hänsyn till konkurrerande utfall / Comparing the risk of stroke recurrence in men and women : A survival analysis in the presence of competing risks
Grundberg, Anton; Inge, Erik, January 2022
Stroke is the third most common cause of death in Sweden. Each year around 25,000 people are affected, and about four out of five of them suffer a stroke for the first time. Roughly as many men as women are affected, but women on average suffer more severe strokes and are usually older at onset: women are on average 78 years old and men 73. Of those affected each year, about 20% die within the first three months, and just over 20% suffer a recurrence later. In our study we investigated whether men and women differ in their risk of stroke recurrence. We worked with data from Riksstroke, a national quality registry that collects, provides, and analyzes data on Swedish stroke patients and Swedish stroke care; in total we studied 27,981 stroke events. The analysis used Kaplan-Meier estimation, the cause-specific Cox proportional hazards model, and the Fine and Gray subdistribution hazard model. The last method was used to account for competing risks, with death as the competing outcome to recurrence. The results show no significant difference between the sexes in the risk of stroke recurrence from 60 days (after the first stroke) onward, whether or not competing risks are taken into account. There is, however, a significant sex difference in the risk of dying after a stroke over the same interval: according to our study, men have a higher risk than women of dying 60 days or later after a stroke. We also found that models that do not account for competing risks in some cases overestimate the risks of the respective outcomes.
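One of the analyses above, the cause-specific Cox model with death as a competing event, can be sketched in Python with lifelines: competing deaths are treated as censored when modelling the recurrence hazard. The data are simulated stand-ins, not Riksstroke data, and the Fine and Gray subdistribution model is omitted here since it is typically fitted with R's cmprsk package.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 2000
sex = rng.integers(0, 2, n)                          # 1 = male
t_rec = rng.exponential(10, n)                       # latent time to recurrence
t_death = rng.exponential(8, n) / np.exp(0.3 * sex)  # men die somewhat earlier (invented effect)
t_cens = rng.uniform(0, 12, n)

time = np.minimum.reduce([t_rec, t_death, t_cens])
df = pd.DataFrame({
    "time": time,
    "sex": sex,
    # Cause-specific setup: only recurrence counts as an event;
    # death (the competing outcome) and administrative censoring are both 0.
    "recurrence": (t_rec == time).astype(int),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="recurrence")
cph.print_summary()  # hazard ratio for sex on the cause-specific recurrence hazard
```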
198. Variabelselektion för högdimensionella data : En jämförande simuleringsstudie av variabelselektionsmetoder / Variable selection for high dimensional data : a comparative simulation study between variable selection methods
Lindberg, Jesper; Lidström, Oscar, January 2022
High-dimensional data are becoming increasingly common in fields such as economics, medicine, and geology, and can often be difficult to handle. It is therefore important to know how different methods for estimating regression models work and perform, so that the method best suited to the purpose at hand can be chosen. The aim of this study is to compare methods for estimating regression models on high-dimensional data in terms of predictive ability, variable selection, and coefficient estimation. The study compares Lasso, Ridge, Elastic net, adaptive Lasso, and adaptive Elastic net across eight simulation scenarios with different conditions for linear regression. Random forest is also compared with the above methods for variable selection on high-dimensional data, where the risk of Bardet-Biedl syndrome is examined based on the expression levels of different genes in mammalian eyes. The results show that Elastic net is the method that most often gives the best predictions in our simulations. It performs well in both variable selection and coefficient estimation for the influential variables, but is worse at removing and estimating the non-influential ones. Pointing to one method that always produces the best model is difficult, however: which method produces the best model varies with the properties of the data, and the purpose for which the model is built also strongly influences which method will yield the optimal one.
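A single simulation cell of the kind the study describes, fitting an elastic net on p >> n data and checking which coefficients survive, can be sketched as below. The dimensions and effect sizes are invented; the study itself runs eight scenarios and also includes the adaptive variants and Random forest.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(6)
n, p, k = 100, 500, 10            # 100 observations, 500 predictors, 10 truly active
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 2.0                    # the influential variables
y = X @ beta + rng.normal(size=n)

# Cross-validate the penalty strength and the L1/L2 mix.
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
selected = np.flatnonzero(enet.coef_)
print("true positives:", np.sum(selected < k),
      "false positives:", np.sum(selected >= k))
```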
199. Investigating the relationship between dementia and cognitive tests performance : do better scores on cognitive tests relate to a lower risk of developing dementia? / Undersökning av sambandet mellan demens och prestation på kognitiva tester : innebär bättre resultat på kognitiva tester en lägre risk för demens?
Kwon, Emma; Lindvall, Markus, January 2022
In 2021, dementia was the seventh leading cause of death among all diseases in the world, according to the World Health Organization (2021). Dementia is an umbrella medical term indicating deteriorated brain health associated with loss of memory, cognitive abnormality, and difficulties in daily activities. There is no medical cure for dementia yet; however, if early brain changes can be detected through various cognitive tests, the progression of dementia may be delayed. The purpose of this study is to investigate the relationship between dementia and cognitive tests, using two methods. The first is an extended Cox model: since we use longitudinal data on cognitive aging from the Betula Project, collected from 1988 to 2010 with repeated cognitive examinations in four different sample groups, we introduce time-varying covariates into the usual Cox proportional hazards model. Second, regularization is applied to the extended Cox model. Twelve cognitive tests and four additional covariates, genetic information (APOE e4), age, sex, and education, are fitted in the extended Cox model, which leaves many variables in the regression. If the model can be shrunk, we can examine which variables have the most important relations to dementia; with this aim, Elastic net regularization is applied. In sum, an extended Cox regression and a regularized extended Cox model are fitted to investigate the relationship between dementia and cognitive test results. The study finds that a person who performs better on the Episodic Memory test, the Prospective Memory test, and/or the Mini-Mental State Examination has a lower risk of developing dementia.
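A hedged sketch of this kind of model, an extended Cox regression with a time-varying covariate and an elastic-net penalty, is shown below using lifelines' CoxTimeVaryingFitter on invented long-format data (the Betula data are not public, and whether CoxTimeVaryingFitter accepts an l1_ratio argument in your lifelines version should be checked; that parameter is an assumption here).

```python
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(7)
rows = []
for i in range(300):
    score, event = rng.normal(), 0
    for start in range(0, 20, 5):                # measurement waves every 5 years
        score += rng.normal(0, 0.3)              # cognitive score drifts between waves
        event = int(rng.random() < 0.03 * np.exp(-score))  # better score -> lower hazard
        rows.append({"id": i, "start": start, "stop": start + 5,
                     "score": score, "event": event})
        if event:
            break

df = pd.DataFrame(rows)
ctv = CoxTimeVaryingFitter(penalizer=0.1, l1_ratio=0.5)  # elastic-net-style penalty
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
```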
200. Control charts for statistical quality control of Swedish stroke care using Riksstroke data / Statistiska kontrolldiagram för kvalitetskontroll av strokevård i Sverige med data från Riksstroke register
Morin, Edvin; Novossad, Martiina, January 2022
The aim of this study was to apply statistical quality control to stroke care in Sweden by designing control charts for data from the Riksstroke registry, in order to detect potential unnatural, or special-cause, variation in the years 2019-2020. Suitable control charts were designed for three quality indicators: the time elapsed from hospital admission to receiving reperfusion therapy (door-to-needle time), the proportion of patients directly admitted to a stroke unit, and the fatality rate. The data were sourced from three anonymous hospitals in the Riksstroke registry and split into two phases: one for calibrating the control charts (phase I, data from 2015-2018) and one for monitoring the process (phase II, data from 2019-2020). X-bar and s charts were designed for the door-to-needle time, p charts for the proportion of patients directly admitted to a stroke unit, and p charts together with EWMA charts for the fatality rate. The X-bar and s charts for the two larger hospitals signalled special-cause variation in some months in 2019-2020, whereas the process appeared to be in control during the same period at the smallest hospital. The p chart for the proportion of directly admitted patients at the largest hospital signalled special-cause variation lasting throughout 2019-2020; as a consequence, this p chart was recalibrated with new control limits, and it could be seen that the proportion of patients directly admitted had increased in 2019-2020 compared with previous years. None of the p and EWMA charts for the fatality rate signalled any special-cause variation at any hospital. In conclusion, this study shows how control charts can be useful tools for detecting and evaluating changes in quality indicators for stroke care. To design adequate control charts, the data should be collected at each time unit and the process should be in control during calibration; this way the control charts retain good sensitivity for detecting special-cause variation.
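The phase-I/phase-II logic for a p chart is simple enough to show in full: estimate the centre line and 3-sigma limits from the calibration period, then flag monitoring-period points that fall outside. The monthly counts below are invented, not Riksstroke data, and the subgroup size is held fixed for simplicity (in practice it varies by month, and the limits with it).

```python
import numpy as np

n = 80  # admissions per month (held fixed here; a varying n gives varying limits)
phase1 = np.array([62, 58, 65, 60, 63, 59, 61, 64, 57, 66, 60, 62])  # directly admitted, calibration
phase2 = np.array([68, 71, 70, 73, 69, 74])                          # monitoring period

p_bar = phase1.sum() / (n * len(phase1))         # phase-I centre line
sigma = np.sqrt(p_bar * (1 - p_bar) / n)
ucl, lcl = p_bar + 3 * sigma, p_bar - 3 * sigma  # 3-sigma control limits

for month, count in enumerate(phase2, start=1):
    p = count / n
    status = "signal" if not (lcl <= p <= ucl) else "in control"
    print(f"month {month}: p = {p:.3f} ({status})")
```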