11.
Using supervised learning methods to predict the stop duration of heavy vehicles. Oldenkamp, Emiel, January 2020
In this thesis project, we attempt to predict the stop duration of heavy vehicles using data based on GPS positions collected in a previous project. All training and prediction are done in AWS SageMaker, where we explore Linear Learner, K-Nearest Neighbors and XGBoost, each of which is explained in the thesis. Although we were not able to construct a production-grade model within the time frame of the thesis, we show that the potential for such a model exists given more time, and we propose several paths for improving on the results of this project.
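Of the three algorithms, k-nearest neighbors is the easiest to show in a few lines. The sketch below is a plain pure-Python k-NN regressor, not SageMaker's built-in K-Nearest Neighbors algorithm, and both features and all values are hypothetical. Features should be scaled to comparable ranges, because the Euclidean distance otherwise lets the widest-ranging feature dominate the choice of neighbors.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k training points closest to `query`."""
    # Pair every training point's Euclidean distance to the query with its target, nearest first
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up examples. Hypothetical features: [hour of day / 24, stopped at a known depot (1) or not (0)]
train_X = [[0.25, 1], [0.30, 1], [0.50, 0], [0.55, 0], [0.75, 0], [0.80, 1]]
train_y = [45.0, 50.0, 10.0, 12.0, 8.0, 40.0]  # stop durations in minutes (made up)

print(knn_regress(train_X, train_y, [0.28, 1], k=2))  # → 47.5
```

The prediction is simply the average stop duration of the two most similar past stops; k trades off noise (small k) against over-smoothing (large k).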
12.
Introduction à l’apprentissage automatique en pharmacométrie : concepts et applications (Introduction to machine learning in pharmacometrics: concepts and applications). Leboeuf, Paul-Antoine, 05 1900
Machine learning offers tools to tackle the problems of today and tomorrow. Recent breakthroughs in computational science and the emergence of big data have brought machine learning to the forefront in both academia and society. Its recent achievements in natural language processing, computer vision and medicine speak for themselves, and the list of sciences and fields that benefit from machine learning techniques is long.

However, attempts to bring machine learning into pharmacometrics and related sciences remain timid and few. The aim of this Master's thesis is to explore the potential of machine learning in the pharmaceutical sciences by applying machine learning techniques and methods to situations from clinical pharmacology and pharmacometrics. The project was divided into three parts. The first part proposes an algorithm to strengthen the reliability of the covariate pre-selection step of a population pharmacokinetic (PK) model. A random forest and XGBoost were used to support covariate screening, and the relative variable-importance indicators of both models correctly identified every covariate that affected the parameters of the reference PK model. The second part shows that plasma concentrations can be estimated with methods other than those currently used in pharmacokinetics: the same two algorithms were applied and fitted the task well. The third part confirms that machine learning methods can be used to predict complex relationships typical of clinical pharmacology; once again, the random forest and XGBoost achieved a good fit.
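The covariate screening in the first part relies on the relative-importance indicators of tree ensembles. As a rough illustration of a gain-type indicator, the sketch below fits gradient-boosted regression stumps in pure Python and sums, per covariate, the squared-error reduction of every split that uses it. It is neither the thesis's pipeline nor XGBoost itself: XGBoost grows deeper trees and its split gain adds regularization terms (λ, γ). The covariates `weight` and `age` and the simulated clearance values are hypothetical; clearance depends only on weight here, so weight should dominate the ranking.

```python
import random


def fit_stump(X, r, j):
    """Best single split on feature j for residuals r: (gain, threshold, left_mean, right_mean) or None."""
    n = len(r)
    order = sorted(range(n), key=lambda i: X[i][j])
    total = sum(r)
    left_sum = 0.0
    best = None
    for pos in range(n - 1):
        i, nxt = order[pos], order[pos + 1]
        left_sum += r[i]
        if X[i][j] == X[nxt][j]:
            continue  # cannot split between identical feature values
        n_left, n_right = pos + 1, n - pos - 1
        right_sum = total - left_sum
        # Squared-error reduction when each side predicts its mean residual
        gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / n
        if best is None or gain > best[0]:
            threshold = (X[i][j] + X[nxt][j]) / 2
            best = (gain, threshold, left_sum / n_left, right_sum / n_right)
    return best


def boost_stumps(X, y, rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    n_features = len(X[0])
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    gain_per_feature = [0.0] * n_features
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        candidates = []
        for j in range(n_features):
            stump = fit_stump(X, residuals, j)
            if stump is not None:
                candidates.append((stump, j))
        if not candidates:
            break
        (gain, thr, left, right), j = max(candidates, key=lambda c: c[0][0])
        gain_per_feature[j] += gain  # accumulate this covariate's total gain
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        pred = [p + (learning_rate * left if x[j] <= thr else learning_rate * right)
                for p, x in zip(pred, X)]
    return base, stumps, gain_per_feature


def predict(model, x):
    base, stumps, _ = model
    return base + sum(lv if x[j] <= thr else rv for j, thr, lv, rv in stumps)


# Hypothetical population data: clearance depends on weight only; age is an irrelevant covariate
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    weight, age = rng.uniform(40, 120), rng.uniform(20, 80)
    X.append([weight, age])
    y.append(0.1 * weight + rng.gauss(0, 0.5))

model = boost_stumps(X, y)
for name, g in zip(["weight", "age"], model[2]):
    print(f"{name}: total gain {g:.1f}")
```

A covariate whose total gain is near zero would be a candidate for exclusion before building the population PK model.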
13.
Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics Sweden. Unnebäck, Tea, January 2023
The objective of this thesis is to investigate whether Statistics Sweden could use machine learning within the Foreign Trade in Services (FTS) statistic to predict the likelihood that a unit conducts foreign trade in services. The FTS survey is a sample survey for which there is no natural frame to sample from. Therefore, before sampling, a frame is constructed manually each year: starting from a register of all Swedish companies and agencies, it is narrowed down in a rule-based manner to contain only units classified as likely to trade in services during the coming year. An automatic procedure that would enable reliable predictions is requested. To this end, three machine learning methods have been analyzed: two rule-based methods (random forest and extreme gradient boosting) and one distance-based method (k-nearest neighbors). The resulting models are trained and tested on historically sampled units, for which it is known whether they traded or not. The results indicate that the two rule-based methods perform well in classifying likely traders: the random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. Comparing the models across different evaluation metrics also reveals interesting patterns. Furthermore, the year in which the training data was sampled needs to be taken into account when training the rule-based models. Cross-validation with random folds should therefore not be used; instead, the folds should be grouped by year. By including a feature that mirrors the state of the economy, the model can adapt its rules to economic conditions, so that rules learned on the training data can be extended to years beyond it. Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model.
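"Better at finding traders" and "better at finding non-traders" can be read as recall on each class (sensitivity and specificity when traders are the positive class). A small sketch, assuming binary labels with 1 = trader and 0 = non-trader; the predictions are invented to show how two models can trade these metrics off against each other and are not results from the thesis.

```python
def class_recalls(y_true, y_pred):
    """Return (share of actual traders found, share of actual non-traders found); 1 = trader, 0 = non-trader."""
    traders = [p for t, p in zip(y_true, y_pred) if t == 1]
    non_traders = [p for t, p in zip(y_true, y_pred) if t == 0]
    recall_traders = sum(p == 1 for p in traders) / len(traders)  # sensitivity
    recall_non_traders = sum(p == 0 for p in non_traders) / len(non_traders)  # specificity
    return recall_traders, recall_non_traders


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 1, 1, 0, 0]  # hypothetical predictions: finds every trader
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical predictions: finds every non-trader
print(class_recalls(y_true, model_a))  # → (1.0, 0.5)
print(class_recalls(y_true, model_b))  # → (0.5, 1.0)
```

A single accuracy figure would rate both hypothetical models equally (6 of 8 correct), which is why looking at several metrics side by side is informative.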
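The year-grouped cross-validation recommended for the FTS models can be sketched as a leave-one-year-out split: every unit sampled in a given year lands in the same test fold, so each model is always evaluated on a year it has not seen during training. The sampling years below are hypothetical. scikit-learn's `LeaveOneGroupOut`, given the sampling years as `groups`, produces equivalent splits.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices), holding out every unit sampled in one year."""
    indices_by_year = defaultdict(list)
    for i, year in enumerate(years):
        indices_by_year[year].append(i)
    for held_out in sorted(indices_by_year):
        test_idx = indices_by_year[held_out]
        train_idx = [i for i, year in enumerate(years) if year != held_out]
        yield held_out, train_idx, test_idx


sample_years = [2019, 2019, 2020, 2021, 2020, 2021]  # hypothetical sampling year of each unit
for year, train_idx, test_idx in leave_one_year_out(sample_years):
    print(year, "train:", train_idx, "test:", test_idx)
```

With random folds, units from the same year would appear on both sides of a split, letting the model exploit year-specific conditions and overstating how well its rules carry over to a new year.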