171

Detecting early-stage Alzheimer’s disease with Machine Learning algorithms

Mukka, Jakob January 2023 (has links)
Alzheimer's disease (AD) accounts for the majority of all cases of dementia and can be characterized as a disease that causes a progressive decline of cognitive functions. Detecting the disease at its earliest stage is important, as medical treatments can be more effective if they are applied before the disease has caused irreparable brain damage. However, making a correct diagnosis of AD can be difficult, especially in the early stage when the symptoms are still mild. Machine learning algorithms can help in this process, and the purpose of this study is to investigate just how accurately machine learning algorithms can detect early-stage AD. Three algorithms were selected for the study: Random Forest, AdaBoost, and Logistic Regression, which were then evaluated on the accuracy of their predictions. The results showed that Random Forest had the best overall performance with an accuracy of 79.78%, while AdaBoost attained an accuracy of 76.40% and Logistic Regression an accuracy of 74.16%. These results suggest that machine learning algorithms can be used to make relatively accurate predictions of AD even when the disease is in its early stage.
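For context, a minimal sketch of the kind of comparison the abstract describes, using scikit-learn; the dataset file and column names are placeholders, not the thesis data:

```python
# Sketch only: comparing the three classifiers on a labelled tabular dataset.
# The CSV path and the "diagnosis" column are hypothetical stand-ins.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("alzheimers_clinical_data.csv")          # hypothetical dataset
X, y = df.drop(columns=["diagnosis"]), df["diagnosis"]    # hypothetical columns

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```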
172

A Knowledge Based Approach of Toxicity Prediction for Drug Formulation. Modelling Drug Vehicle Relationships Using Soft Computing Techniques

Mistry, Pritesh January 2015 (has links)
This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are utilised to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion on all aspects of drug formulation and toxicity. Discussions are focused around the principal mechanisms of drug toxicity and how drug toxicity is studied and reported in the literature. Furthermore, a review of the current technologies available for formulating drugs for toxicity reduction is provided, along with examples of studies reported in the literature that have used these technologies to reduce drug toxicity. The thesis also provides an overview of the computational approaches currently employed in the field of in silico predictive toxicology. This overview focuses on the machine learning approaches used to build predictive QSAR classification models, with examples drawn from the literature. Two methodologies have been developed as part of the main work of this thesis. The first is focused on the use of directed bipartite graphs and Venn diagrams for visualising and extracting, from large un-curated datasets, drug-vehicle relationships that show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second methodology involves mining large datasets for the extraction of drug-vehicle toxicity data. It uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer, and from these comparisons predictive classification models based on random forests and decision trees are built. The results of this methodology are reported in chapter 6.
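As an illustration of the drug-vehicle relationship structure described above, a hedged sketch using a directed bipartite graph in networkx; the records and attribute names are invented for the example and are not taken from the thesis:

```python
# Illustrative only: drug-vehicle pairs as a directed bipartite graph, with an
# edge attribute recording whether the vehicle increased or decreased the
# observed toxicity. All values below are invented.
import networkx as nx

records = [
    ("drug_A", "vehicle_saline", "decreased"),
    ("drug_A", "vehicle_dmso", "increased"),
    ("drug_B", "vehicle_saline", "decreased"),
]

G = nx.DiGraph()
for drug, vehicle, effect in records:
    G.add_node(drug, bipartite="drug")
    G.add_node(vehicle, bipartite="vehicle")
    G.add_edge(drug, vehicle, toxicity_change=effect)

# Which vehicles were associated with reduced toxicity for drug_A?
protective = [v for _, v, d in G.out_edges("drug_A", data=True)
              if d["toxicity_change"] == "decreased"]
print(protective)
```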
173

Spatial Patterns and the Socioeconomic Determinants of COVID-19 Infections in Ottawa, Canada.

Laadhar, Brahim 15 December 2023 (has links)
This study uncovered the pattern of, and spatial relationships between, socio-economic factors and aggregated COVID-19 rates in Ottawa, Canada, from July 2020 to December 2021 at the neighbourhood scale. Both top-down and bottom-up data mining approaches were used to predict COVID-19 rates. The top-down approach employed ordinary least squares regression (OLS), a spatial error model (SEM), geographically weighted regression (GWR) and multi-scale geographically weighted regression (MGWR), and a model intercomparison was undertaken. The pattern of COVID-19 in Ottawa exhibited a significant, moderately positive spatial structure among neighbourhoods (Moran's I = 0.39; p = 0.0001). Local Moran's analysis identified areas of low and high COVID-19 clustering, interspersed with cold spots. The OLS model used determinants drawn from a literature review. Determinants were tested for normality using the Shapiro-Wilk test, and those that failed the test were transformed toward normality. Next, an OLS-based backward stepwise approach was used to select the optimal set of determinants based on goodness of fit, selecting the model with the lowest Akaike Information Criterion (AIC). The percentage of people who take public transit to work, the percentage of people with no high school diploma, the percentage of people over 65 years old, and the percentage of people with a Bachelor's degree or above comprised the final set of determinants. An SEM was created to account for spatial autocorrelation in the OLS model's residuals and yielded an adjusted R² = 0.63. Based on the SEM, a one-unit increase in the square root of the percentage of people with a bachelor's degree or above was associated with a 3.2% increase in COVID-19 rates, while the same unit increase in the square root of the percentage of people with no high school diploma was associated with a 10.6% increase in COVID-19 rates. Conversely, a one percent increase in the percentage of people aged 65 and older was linked to a 34.6% decrease in COVID-19 rates. To examine local variations in the relationships between the determinants and COVID-19, an MGWR with a bisquare kernel and an adaptive bandwidth was used to improve upon the overall explained variance of the SEM; the MGWR model yielded an adjusted R² = 0.75, its residuals exhibited no significant spatial autocorrelation (Moran's I = -0.04; p = 0.62), and the residuals were approximately normal (W = 0.98; p > 0.25). Taking a data mining, bottom-up approach, an optimized Random Forest model identified a very different set of determinants as important compared to the top-down regression approaches and accounted for 47.34% of the COVID-19 variance.
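A minimal sketch of the OLS-based backward-stepwise selection step described above, choosing the determinant set with the lowest AIC; the DataFrame and column names are assumptions, not the thesis data:

```python
# Hedged sketch of backward-stepwise OLS selection by AIC with statsmodels.
# The commented call at the bottom uses hypothetical column names.
import statsmodels.api as sm

def backward_stepwise_aic(df, response, candidates):
    """Drop one predictor at a time while doing so lowers the model AIC."""
    selected = list(candidates)
    best_aic = sm.OLS(df[response], sm.add_constant(df[selected])).fit().aic
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for var in list(selected):
            trial = [v for v in selected if v != var]
            aic = sm.OLS(df[response], sm.add_constant(df[trial])).fit().aic
            if aic < best_aic:
                best_aic, selected, improved = aic, trial, True
    return selected, best_aic

# determinants, aic = backward_stepwise_aic(neighbourhoods, "covid_rate",
#                                           ["pct_transit", "pct_no_diploma",
#                                            "pct_over_65", "pct_bachelor"])
```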
174

A comparison of forecasting techniques: Predicting the S&P500

Neikter, Axel, Sjöberg, Nils January 2023 (has links)
Accurately predicting the S&P 500 index means knowing where the US economy is heading. If there were a model that could predict the S&P 500 with even some accuracy, it would be extremely valuable. Machine learning techniques such as neural networks and random forests have become more popular in forecasting. This thesis compares the more traditional forecasting methods ARIMA, exponential smoothing, and Naïve with a Random forest regression model for predicting the S&P 500 index. The models are compared using the scale measures MAE and RMSE. The Diebold-Mariano test is used to evaluate whether a model's forecasts have significantly better accuracy than the last known observation (the Naïve method). The results showed that the Random forest model outperformed the other models in terms of RMSE and MAE, especially on a two-day forecast. Furthermore, the Random forest model was significantly better on all horizons at a five percent significance level, meaning that the model had better forecast accuracy than the last known observation. However, further research on this subject is needed to ensure the effectiveness of the Random forest model when forecasting stock market indices.
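A hedged sketch of the comparison logic described above, scoring a naïve last-value forecast against a random forest trained on lagged values with MAE and RMSE; the price series is synthetic and the feature setup is an assumption, not the thesis code:

```python
# Sketch: naïve forecast vs. random forest on lagged prices, scored by MAE/RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
prices = 4000 + np.cumsum(rng.normal(0, 10, 600))   # synthetic stand-in for the index

n_lags = 5
X = np.column_stack([prices[i:len(prices) - n_lags + i] for i in range(n_lags)])
y = prices[n_lags:]                                  # next-day value
split = int(0.8 * len(y))

rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X[:split], y[:split])
rf_pred = rf.predict(X[split:])
naive_pred = X[split:, -1]                           # naïve: tomorrow equals today

for name, pred in [("Random forest", rf_pred), ("Naïve", naive_pred)]:
    mae = mean_absolute_error(y[split:], pred)
    rmse = np.sqrt(mean_squared_error(y[split:], pred))
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")
```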
175

Automatic processing of LiDAR point cloud data captured by drones / Automatisk bearbetning av punktmolnsdata från LiDAR infångat av drönare

Li Persson, Leon January 2023 (has links)
As automation is on the rise in the world at large, the ability to automatically differentiate objects in datasets via machine learning is of growing interest. This report details an experimental evaluation of supervised learning on point cloud data using random forests with varying setups. Acquired via airborne LiDAR using drones, the data holds a 3D representation of a landscape area containing power line corridors. Segmentation was performed with the goal of isolating data points belonging to power line objects from the rest of the surroundings. Pre-processing was performed to extend the machine learning features with geometry-based features that are not inherent to the LiDAR data itself. Because of the scale of the data, the labels were generated by the customer, Airpelago, and supervised learning was applied using this data. With their labels as benchmark, F1 scores of over 90% could be obtained for both of the classes pertaining to power line objects. The best results were obtained when the data classes were balanced and both relevant intrinsic and extended features were used to train the classification models.
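A hedged sketch of the feature-extension idea: deriving simple geometry-based features (local planarity from neighbourhood PCA eigenvalues, height above the local minimum) per point before training a random forest. The point cloud and labels below are random placeholders, not the Airpelago data:

```python
# Sketch only: neighbourhood-based geometric features for point-wise classification.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
points = rng.uniform(size=(5000, 3)) * [100, 100, 30]   # x, y, z placeholders
labels = rng.integers(0, 2, 5000)                        # placeholder classes

# For each point, inspect its 20 nearest neighbours and derive geometric features.
nn = NearestNeighbors(n_neighbors=20).fit(points)
_, idx = nn.kneighbors(points)

features = []
for neigh in idx:
    local = points[neigh]
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(local.T)))[::-1]   # l1 >= l2 >= l3
    planarity = (eigvals[1] - eigvals[2]) / eigvals[0]
    height = local[0, 2] - local[:, 2].min()    # height above the local minimum z
    features.append([planarity, height])

X = np.hstack([points, np.array(features)])     # intrinsic + extended features
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
```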
176

Detecting Fraudulent User Behaviour : A Study of User Behaviour and Machine Learning in Fraud Detection

Gerdelius, Patrik, Hugo, Sjönneby January 2024 (has links)
This study aims to create a Machine Learning model and investigate its performance in detecting fraudulent user behaviour on an e-commerce platform. The user data was analysed to identify and extract critical features distinguishing regular users from fraudulent users. Two different types of user data were used, Event Data and Screen Data, spanning four weeks. A Principal Component Analysis (PCA) was applied to the Screen Data to reduce its dimensionality, and Feature Engineering was conducted on both Event Data and Screen Data. A Random Forest model, a supervised ensemble method, was used for classification. The data was imbalanced due to the significant difference in the number of frauds compared to regular users. Therefore, two different balancing methods were used: oversampling (SMOTE) and changing the Probability Threshold (PT) of the classification model.  The best result was achieved with the resampled data where the threshold was set to 0.4. With this model, 80.88% of actual frauds were predicted as such, while 0.73% of regular users were falsely predicted as frauds. While this result is promising, questions are raised regarding its validity, since there is a possibility that the model was over-fitted on the data set; an indication of this is that the result was significantly less accurate without resampling. However, the overall conclusion from the results is that this study indicates that it is possible to distinguish frauds from regular users, with or without resampling. For future research, it would be interesting to use data over a more extended period of time and to train the model on real-time data to counter changes in fraudulent behaviour.
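A minimal sketch of the two balancing strategies described above, assuming a scikit-learn / imbalanced-learn setup; the synthetic dataset stands in for the engineered Event and Screen features:

```python
# Sketch: SMOTE oversampling vs. lowering the classification threshold to 0.4.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic imbalanced data standing in for the engineered user features.
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.99, 0.01], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Strategy 1: oversample the minority (fraud) class with SMOTE before fitting.
X_res, y_res = SMOTE(random_state=1).fit_resample(X_train, y_train)
rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_res, y_res)

# Strategy 2: keep the model but call "fraud" whenever P(fraud) >= 0.4.
fraud_prob = rf.predict_proba(X_test)[:, 1]
y_pred = (fraud_prob >= 0.4).astype(int)
print("fraud recall:", recall_score(y_test, y_pred))
```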
177

Comparative Analysis of Surrogate Models for the Dissolution of Spent Nuclear Fuel

Awe, Dayo 01 May 2024 (has links) (PDF)
This thesis presents a comparative analysis of surrogate models for the dissolution of spent nuclear fuel, with a focus on the use of deep learning techniques. The study explores the accuracy and efficiency of different machine learning methods in predicting the dissolution behavior of nuclear waste, and compares them to traditional modeling approaches. The results show that deep learning models can achieve high accuracy in predicting the dissolution rate, while also being computationally efficient. The study also discusses the potential applications of surrogate modeling in the field of nuclear waste management, including the optimization of waste disposal strategies and the design of more effective containment systems. Overall, this research highlights the importance of surrogate modeling in improving our understanding of nuclear waste behavior and developing more sustainable waste management practices.
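As a hedged illustration of what a surrogate model is, a small neural-network regressor fitted to input-output pairs from a stand-in "simulator"; the toy function below is not an actual spent-fuel dissolution model:

```python
# Sketch: a cheap learned surrogate for an expensive simulator.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 4))              # e.g. temperature, pH, surface area, time
y = np.exp(-X[:, 0]) * X[:, 1] + 0.1 * X[:, 2] * X[:, 3]   # toy "dissolution rate"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out simulator runs:", surrogate.score(X_te, y_te))
```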
178

Android Malware Detection Using Machine Learning

Kesani, Rahul Sai January 2024 (has links)
Background. The Android smartphone, with its wide range of uses and excellent performance, has attracted numerous users, but the dominance of the Android platform has also motivated attackers to develop malware. Traditional methods that detect malware based on signatures are unfit to discover unknown applications. In this thesis, Static Analysis (SA) is used to detect whether an application is malware or not: all the permissions that an application asks for are considered and taken as input to feed the machine learning models.  Objectives. The objectives that address and fulfil the aim of this thesis are: to find or create the necessary data set containing Android malware; to build different classifiers using machine learning (ML) algorithms such as Support Vector Machine (SVM) (Linear and RBF), Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB) and Decision Tree (DT), and to compare their performance; and to evaluate and compare each of the chosen models using Accuracy, Precision, F1-Score and Recall, in order to detect Android malware with better accuracy in real-time scenarios.  Methods. To answer the research question, one method was chosen: an experiment to identify malware in the Android system.  Results. The Sequential Neural Network (SNN) performed better than the other Machine Learning (ML) algorithms on the dataset, with 98.82 percent accuracy, making it the most fruitful algorithm for Android malware detection. Random Forest (RF) and Decision Tree (DT) were the second-best algorithms on the dataset, with 97 percent.  Conclusions. Among Logistic Regression, KNN, SVM Linear, SVM RBF, Decision Tree, Random Forest, Gaussian Naive Bayes and the Sequential Neural Network, Random Forest is declared the most efficient algorithm after comparing all the models based on the performance metrics Precision, Recall, F1-Score and Accuracy.
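A hedged sketch of the permission-based static-analysis setup described above: each app is reduced to the set of permissions it requests, one-hot encoded, and passed to the compared classifiers. The permission lists and labels are invented examples, not the thesis dataset:

```python
# Sketch: requested-permission sets as binary features for malware classification.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

apps = [
    ["INTERNET", "READ_SMS", "SEND_SMS", "READ_CONTACTS"],   # malware-like example
    ["INTERNET", "ACCESS_FINE_LOCATION"],                    # benign-like example
    ["INTERNET", "READ_SMS", "RECEIVE_BOOT_COMPLETED"],
    ["CAMERA", "INTERNET"],
]
labels = [1, 0, 1, 0]    # 1 = malware, 0 = benign (toy labels)

X = MultiLabelBinarizer().fit_transform(apps)    # binary permission matrix
for clf in (RandomForestClassifier(random_state=0), SVC(kernel="rbf")):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.score(X, labels))
```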
179

An Investigation of How Well Random Forest Regression Can Predict Demand : Is Random Forest Regression better at predicting the sell-through of close to date products at different discount levels than a basic linear model?

Jonsson, Estrid, Fredrikson, Sara January 2021 (has links)
As the climate crisis progresses, companies are becoming increasingly engaged in sustainability. Greenhouse gases are one of the biggest problems, and food waste has therefore received a great deal of attention since it was named the third largest contributor to global emissions. To reduce their contribution, many grocery stores discount products with short best-before dates, which has created a need to understand how price-sensitive the demand for this type of product is. Price optimization is usually carried out with so-called Generalized Linear Models, but since demand is a complex concept, machine learning methods have begun to challenge the traditional models. One such method is Random Forest Regression, and the purpose of this thesis is to investigate whether the model is better at estimating demand based on discount level than a classical linear model. It is also investigated whether a clear linear relationship exists between discount level and demand, and whether this depends on product type. The results show that Random Forest better accounts for the complex relationship that turned out to exist and, in this specific case, performs better. Furthermore, the results showed that overall there is no linear relationship, but that some product categories exhibit weak linearity. / As the climate crisis continues to evolve, many companies focus their development on becoming more sustainable. With greenhouse gases being highlighted as the main problem, food waste has obtained a great deal of attention after being named the third largest contributor to global emissions. One way retailers have attempted to improve is by offering close-to-date produce at a discount, hence decreasing the amount of food being thrown away. To minimize waste, the level of discount must be optimized, and as the products can be seen as flawed, the known price-to-demand relation of the products may be insufficient. The optimization process historically involves generalized linear regression models; however, demand is a complex concept influenced by many factors. This report investigates whether a Machine Learning model, Random Forest Regression, is better at estimating the demand for close-to-date products at different discount levels than a basic linear regression model. The discussion also includes an analysis of whether discounts always increase the will to buy and whether this depends on product type. The results show that Random Forest to a greater extent considers the many factors influencing demand and is superior as a predictor in this case. Furthermore, it was concluded that there is generally no clear linear relation; however, this does depend on product type, as certain categories showed some linearity.
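A minimal sketch of the model comparison, assuming discount level as the main feature; the S-shaped demand curve below is synthetic and only illustrates why a tree ensemble can outperform a linear fit on a non-linear relationship:

```python
# Sketch: Random Forest regression vs. linear regression on a non-linear demand curve.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
discount = rng.uniform(0, 0.6, size=(3000, 1))                   # 0-60 % discount
sell_through = 1 / (1 + np.exp(-12 * (discount[:, 0] - 0.3)))    # S-shaped response
sell_through += rng.normal(0, 0.05, 3000)                        # noise

X_tr, X_te, y_tr, y_te = train_test_split(discount, sell_through, random_state=0)
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "MAE:",
          round(mean_absolute_error(y_te, model.predict(X_te)), 4))
```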
180

STUDENT ATTENTIVENESS CLASSIFICATION USING GEOMETRIC MOMENTS AIDED POSTURE ESTIMATION

Gowri Kurthkoti Sridhara Rao (14191886) 30 November 2022 (has links)
Body posture provides enough information regarding the current state of mind of a person. This idea is used to implement a system that provides feedback to lecturers on how engaging a class has been by identifying the attentiveness levels of students. This is carried out using posture information extracted with the help of Mediapipe. A novel method of extracting features from the key points returned by Mediapipe is proposed: classification using geometric-moments-aided features performs better than classification using general distance and angle features. In order to extend single-person pose classification to multi-person pose classification, object detection is implemented. Feedback is generated regarding the entire lecture and provided as the output of the system.
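A hedged sketch assuming the classic MediaPipe "solutions" Pose API: 2D pose landmarks are extracted from an image and summarised with a few geometric moments that could feed a posture classifier; the image path and the particular moment set are illustrative assumptions:

```python
# Sketch: pose landmarks -> simple geometric moments as classifier features.
import cv2
import numpy as np
import mediapipe as mp

image = cv2.imread("student_frame.jpg")                      # placeholder path
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    pts = np.array([(lm.x, lm.y) for lm in results.pose_landmarks.landmark])

    def raw_moment(points, p, q):
        # Raw point-set moment sum(x^p * y^q) over the landmark coordinates.
        return np.sum(points[:, 0] ** p * points[:, 1] ** q)

    m00 = raw_moment(pts, 0, 0)
    cx, cy = raw_moment(pts, 1, 0) / m00, raw_moment(pts, 0, 1) / m00
    centred = pts - [cx, cy]
    mu20 = np.sum(centred[:, 0] ** 2)      # spread along x
    mu02 = np.sum(centred[:, 1] ** 2)      # spread along y
    mu11 = np.sum(centred[:, 0] * centred[:, 1])
    features = [mu20, mu02, mu11]          # example feature vector for a classifier
    print(features)
```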
