Global ETD Search

21	Early Stratification of Gestational Diabetes Mellitus (GDM) by building and evaluating machine learning models Sharma, Vibhor January 2020 Gestational diabetes mellitus (GDM), a condition involving abnormal levels of glucose in the blood plasma, has seen a rapid surge among gestating mothers of different regions and ethnicities around the world. The current method of screening for and diagnosing GDM is restricted to the Oral Glucose Tolerance Test (OGTT). With the advent of machine learning algorithms, healthcare has seen a surge of machine learning methods for disease diagnosis, which are increasingly being employed in clinical settings. Yet in the area of GDM, these algorithms have not been widely used to generate multi-parametric diagnostic models that aid clinicians in diagnosing the condition. In the literature, there is an evident scarcity of applications of machine learning algorithms to GDM diagnosis; it has been limited to the proposed use of some very simple algorithms such as logistic regression. Hence, we have attempted to address this research gap by employing a wide array of machine learning algorithms, known to be effective for binary classification, for early GDM classification among gestating mothers. This can aid clinicians in the early diagnosis of GDM and offers a chance to mitigate the adverse outcomes related to GDM among gestating mothers and their progeny. We set up an empirical study to examine the performance of different machine learning algorithms used specifically for the task of GDM classification. These algorithms were trained on a set of predictor variables chosen by experts. We then compared the results with the existing machine learning methods in the literature for GDM classification, based on a set of performance metrics. Our model could not outperform the previously proposed machine learning models for GDM classification. We attribute this to our chosen set of predictor variables and to the under-reporting of performance metrics such as precision in the existing literature, which prevents an informed comparison. GDM machine learning algorithms binary classification tree-based models XGBoost performance metrics Computer and Information Sciences Data- och informationsvetenskap
22	SYSTEMATICALLY LEARNING OF INTERNAL RIBOSOME ENTRY SITE AND PREDICTION BY MACHINE LEARNING Junhui Wang (5930375) 15 May 2019 Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the more widely used 5’ cap-dependent translation initiation mechanism. IRES play an important role in conditions where 5’ cap-dependent translation initiation has been blocked or repressed. They have been found to play important roles in viral infection, cellular apoptosis, and response to other external stimuli. It has been suggested that about 10% of mRNAs, both viral and cellular, can utilize IRES. However, due to the limitations of the IRES bicistronic assay, which is a gold standard for identifying IRES, relatively few IRES have been definitively described and functionally validated compared to the potential overall population. Viral and cellular IRES may be mechanistically different, but this is difficult to analyze because the mechanistic differences are still not clearly defined. Identifying additional IRES is an important step towards better understanding IRES mechanisms. Development of a new bioinformatics tool that can accurately predict IRES from sequence would be a significant step forward in identifying IRES-based regulation and in elucidating IRES mechanism. This dissertation systematically studies the features that can distinguish IRES from non-IRES sequences. Sequence features such as k-mer words, and structural features such as the predicted MFE of folding, Q_MFE, and sequence/structure triplets, are evaluated as possible discriminative features. These potential features are incorporated into an IRES classifier based on XGBoost, a machine learning model, to classify novel sequences as belonging to the IRES or non-IRES group. The XGBoost model performs better than previous predictors, with higher accuracy and lower computational time. The number of features in the model has been greatly reduced, compared to previous predictors, by adding global k-mer and structural features. The trained XGBoost model has been implemented as the first high-throughput bioinformatics tool for IRES prediction, IRESpy. This website provides a public tool for all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression. Bioinformatics bioinformatic study machine learning-based Internal Ribosome Entry Sites XGBoost classification model
23	Predicting the Movement Direction of OMXS30 Stock Index Using XGBoost and Sentiment Analysis Elena, Podasca January 2021 Background. Stock market prediction is an active yet challenging research area. Both academia and practitioners have put considerable effort into producing accurate stock market prediction models in an attempt to maximize investment objectives. Tree-based ensemble machine learning methods such as XGBoost have proven successful in practice. At the same time, there is a growing trend to incorporate multiple data sources in prediction models, such as historical prices and text, in order to achieve superior forecasting performance. However, most applications and research have so far focused on the American or Asian stock markets, while the Swedish stock market has not been studied extensively from the perspective of hybrid models using both price-derived and text-derived features. Objectives. The purpose of this thesis is to investigate whether augmenting a numerical dataset based on historical prices with sentiment features extracted from financial news improves classification performance when predicting the daily price trend of the Swedish stock market index, OMXS30. Methods. A dataset of 3,517 samples from 2006 to 2020 was collected from two sources, historical prices and financial news. XGBoost was used as the classifier, and four different metrics were employed to compare model performance on three complementary datasets: the dataset containing only the sentiment feature, the dataset with only price-derived features, and the price dataset augmented with the sentiment feature extracted from financial news. Results. Results show that XGBoost performs well in classifying the daily trend of OMXS30 given historical price features, achieving an accuracy of 73% on the test set. 
A small improvement across all metrics is recorded on the test set when augmenting the numerical dataset with sentiment features extracted from financial news. Conclusions. XGBoost is a powerful ensemble method for stock market prediction, reflected in a satisfactory classification performance of the daily movement direction of OMXS30. However, augmenting the numerical input set with sentiment features extracted from text did not have a powerful impact on classification performance in this case, as the improvements across all employed metrics were small. Machine learning XGBoost Sentiment analysis Stock market prediction OMXS30 Computer Sciences Datavetenskap (datalogi)
24	Machine Learning for Outcome Prediction of High-Risk Trauma Patients in the Emergency Department Cardosi, Joshua David January 2021 No description available. Mechanical Engineering machine learning emergency department critical care mortality missing data neural network XGBoost LightGBM
25	CAN STATISTICAL MODELS BEAT BENCHMARK PREDICTIONS BASED ON RANKINGS IN TENNIS? Svensson, William January 2021 The aim of this thesis is to beat a benchmark prediction accuracy of 64.58 percent based on player rankings on the ATP tour in tennis. The benchmark predicts that the player with the better rank in a tennis match is the winner. Three statistical models are used: logistic regression, random forest and XGBoost. The data cover the years 2000-2010 and comprise over 60 000 observations with 49 variables each. After the data were prepared, new variables were created, and the differences between the two players in each match were taken, all three statistical models outperformed the benchmark prediction. All three models had an accuracy of around 66 percent, with logistic regression performing best at 66.45 percent. The most important variables overall for the models are the total win rate on different surfaces, the total win rate and rank. Logistic Regression Random Forest XGBoost ATP tour Probability Theory and Statistics Sannolikhetsteori och statistik
26	ASSESSING PREDICTION CONDITIONS AND SEQUENTIAL CLASSIFICATION IN ICU SEPSIS PREDICTION Lind, Petter January 2023 Patients admitted to intensive care units (ICUs) often have a higher risk of sepsis due to weakened immune systems. Early sepsis diagnosis is crucial for timely treatment, emphasizing the need to improve the predictive capabilities of sepsis prediction models. Although machine learning models have demonstrated success in predicting sepsis onset, there is limited work on how model assessment is affected by sequential prediction rather than evaluation on one prediction per patient. This thesis assesses the effectiveness of the evaluation procedures employed by such models and explores different prediction conditions to enhance sepsis prediction. Data were collected from the MIMIC-IV data set and include variables commonly used in real ICU settings that are relevant to sepsis diagnosis. Random onset matching is used to select time points for patients with and without sepsis, and the data are analyzed using XGBoost. Evaluation metrics are calculated once per patient and compared with sequential measurements for all patients from 40 hours before sepsis up until sepsis onset. Results show that a model trained on data close to sepsis onset has strong predictive performance up to 25 hours before sepsis onset. In addition, different restrictive conditions on predictions are considered and evaluated. As the test set is limited, it is important that the results are validated further, as they could provide insights regarding interpretation in the practical implementation of similar prediction models for support of healthcare professionals through timely interventions. Sepsis Prediction Sequential Prediction Conditional Predictions XGBoost Probability Theory and Statistics Sannolikhetsteori och statistik
27	Predicting and classifying atrial fibrillation from ECG recordings using machine learning Bogstedt, Carl January 2023 Atrial fibrillation is one of the most common types of heart arrhythmia, which can cause irregular, weak and fast atrial contractions of up to 600 beats per minute. The prevalence of atrial fibrillation increases with age, and the condition is associated with increased risk of ischemia, as blood clots can form due to the weak contractions. During prolonged periods of atrial fibrillation, the atria can undergo a process called atrial remodelling. This causes electrophysiological and structural changes to the atria, such as increased atrial size and changes to calcium ion densities. These changes themselves promote the initiation and propagation of atrial fibrillation, which makes early detection crucial. Fortunately, atrial fibrillation can be detected on an electrocardiogram. Electrocardiograms measure the electrical activity of the heart during its cardiac cycle. This includes the initiation of the action potential, the depolarization of the atria and ventricles, and their repolarization. On the electrocardiogram recording, these are seen as peaks and valleys, where each peak and valley can be traced back to one of these events. This means that during atrial fibrillation, the weak, irregular and fast atrial contractions can all be detected and measured. The aim of this project was to develop a machine learning model that could predict the onset of atrial fibrillation and classify ongoing atrial fibrillation. This was achieved by training one multiclass classification machine learning model using XGBoost, and three binary classification machine learning models using ROSETTA, on electrocardiogram recordings of people with and without atrial fibrillation. 
XGBoost is a tree boosting system which uses tree-like structures to classify data, while ROSETTA is a rule-based classification model which creates rules in an IF and THEN format to make decisions. The recordings were labelled according to three different classes: no atrial fibrillation, atrial fibrillation or preceding atrial fibrillation. The XGBoost model had a prediction accuracy of 99.3%, outperforming the three ROSETTA models and other atrial fibrillation classification and prediction models found. The ROSETTA models had high accuracies on the learning set, however, the predictions were subpar, indicating faulty settings for this type of data. The results in this project indicate that the models created can be used to accurately classify and predict onset of and ongoing atrial fibrillation, serving as a tool for early detection and verification of diagnosis. Bioinformatics Machine Learning Electrocardiogram Classification Rough Sets XGBoost Atrial Fibrillation Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
28	Recommendation System for Insurance Policies : An Investigation of Unsupervised and Supervised Learning Techniques Palmgren, Andreas January 2023 Recommendation systems have significantly influenced user experiences across various industries, yet their application in the insurance sector remains relatively unexplored. This thesis focuses on developing a car insurance recommendation system that implements a 'consumers like you' feature. The study initially employs a clustering-based recommendation system due to missing labels in an offline environment. However, challenges emerge, such as determining the optimal number of clusters and managing complex data. Additionally, the inability to update effectively based on feedback, and lower predictive performance compared to supervised methods, necessitated exploring supervised alternatives. In response, this thesis proposes a methodology where the unsupervised approach simulates consumer behavior in an offline environment. Supervised alternatives are pre-trained on the clustering-based system to replicate it and can then be fine-tuned based on live traffic. Three supervised alternatives (KNN, XGBoost, and a neural network) are developed and compared. Given the supervised recommendation system's adaptability based on feedback, supervised methods can provide more accurate, personalized recommendations in the insurance domain. The XGBoost and neural network-based recommendation systems were able to replicate the unsupervised approach, and their expressive power makes them valid candidate models for further evaluation on live traffic. The thesis concludes with the potential to both improve and adapt these recommendation systems to other insurance types, marking a significant step toward more personalized, user-friendly insurance services. Recommendation System Car Insurance Machine Learning Cluster Analysis KNN XGBoost Neural Network Mathematics Matematik
29	Improving House Price Prediction Models: Exploring the Impact of Macroeconomic Features Holmqvist, Martin, Hansson, Max January 2023 This thesis investigates if house price prediction models perform better when adding macroeconomic features to a data set with only house-specific features. Previous research has shown that tree-based models perform well when predicting house prices, especially the algorithms random forest and XGBoost. It is common to rely entirely on house-specific features when training these models. However, studies show that macroeconomic variables such as interest rate, inflation, and GDP affect house prices. Therefore it makes sense to include them in these models and study if they outperform the more traditional models with only house-specific features. The thesis also investigates which algorithm, out of random forest and XGBoost, is better at predicting house prices. The results show that the mean absolute error is lower for the XGBoost and random forest models trained on data with macroeconomic features. Furthermore, XGBoost outperformed random forest regardless of the set of features. In conclusion, the suggestion is to include macroeconomic features and use the XGBoost algorithm when predicting house prices. Machine learning random forest xgboost macroeconomic features house prices Probability Theory and Statistics Sannolikhetsteori och statistik
30	Application of Machine Learning to Financial Trading Horemuz, Michal January 2018 Machine learning methods have become powerful tools used in multiple industries. They have been successfully applied to problems such as image recognition, speech recognition and machine translation, among others. In this report, we investigated several machine learning methods for forecasting five different bond indexes. We implemented and analyzed Feedforward Neural Nets, LSTMs, Q-Networks and Gradient Boosted Trees, and compared them to the Buy&Hold strategy. We performed manual feature extraction based on some popular features used in the industry. The features were extracted from several financial instruments and were used as predictor variables. The results showed that XGBoost and Feedforward Neural Networks were consistently able to beat the Buy&Hold strategy for three of five bond indexes. Machine Learning Financial Trading XGBoost Bond index Michal Horemuz Computer Sciences Datavetenskap (datalogi)
