Global ETD Search

41	Quantitative Retrieval of Organic Soil Properties from Visible Near-Infrared Shortwave Infrared (Vis-NIR-SWIR) Spectroscopy Using Fractal-Based Feature Extraction. Liu, Lanfa, Buchroithner, Manfred, Ji, Min, Dong, Yunyun, Zhang, Rongchung 27 March 2017 Visible and near-infrared diffuse reflectance spectroscopy has been demonstrated to be a fast and cheap tool for estimating a large number of chemical and physical soil properties, and effective features extracted from spectra are crucial to correlating with these properties. We adopt a novel methodology for feature extraction of soil spectroscopy based on fractal geometry. The spectrum is divided into multiple segments with different step–window pairs. For each segmented spectral curve, the fractal dimension value is calculated using variation estimators with power indices 0.5, 1.0 and 2.0. The fractal feature is then generated by multiplying the fractal dimension value by the spectral energy. To assess and compare the performance of the newly generated features, we used organic soil samples from the large-scale European Land Use/Land Cover Area Frame Survey (LUCAS). Gradient-boosting regression models were built with the XGBoost library on the soil spectral library to estimate N, pH and soil organic carbon (SOC) contents. Features generated by a variogram estimator performed better than those from the two other estimators and from principal component analysis (PCA). The estimation results for SOC were coefficient of determination (R²) = 0.85, root mean square error (RMSE) = 56.7 g/kg, the ratio of percent deviation (RPD) = 2.59; for pH: R² = 0.82, RMSE = 0.49, RPD = 2.31; and for N: R² = 0.77, RMSE = 3.01 g/kg, RPD = 2.09. Even better results could be achieved when fractal features were combined with PCA components. Fractal features generated by the proposed method can improve estimation accuracies of soil properties and simultaneously maintain the original spectral curve shape.
info:eu-repo/classification/ddc/620 ddc:620
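The three figures reported for record 41 (R², RMSE, RPD) can be computed from any set of observed and predicted values. The following is a minimal sketch, not the authors' code. It assumes RPD is computed in the common way, as the sample standard deviation of the observed values divided by the RMSE; the function name `regression_metrics` and the sample numbers are illustrative only.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Assumption: RPD = sample standard deviation of observations / RMSE.
    rpd = statistics.stdev(observed) / rmse
    return r2, rmse, rpd


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred = [1.5, 1.5, 3.0, 4.5, 4.5]
    r2, rmse, rpd = regression_metrics(obs, pred)
    print(f"R2={r2:.2f} RMSE={rmse:.3f} RPD={rpd:.2f}")
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of the property being estimated.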
42	Using supervised learning methods to predict the stop duration of heavy vehicles. Oldenkamp, Emiel January 2020 In this thesis project, we attempt to predict the stop duration of heavy vehicles using data based on GPS positions collected in a previous project. All of the training and prediction is done in AWS SageMaker, and we explore possibilities with Linear Learner, K-Nearest Neighbors and XGBoost, all of which are explained in this thesis. Although we were not able to construct a production-grade model within the time frame of the thesis, we were able to show that the potential for such a model exists given more time, and we propose some paths one can take to improve on the endpoint of this project. Mathematics Applied Mathematics Machine Learning Supervised Learning Regression Linear Learner Linear Regression K-Nearest neighbors Extreme Gradient Boosting XGBoost AWS SageMaker Scania Data Science Data Analysis Other Mathematics Annan matematik
43	Introduction à l’apprentissage automatique en pharmacométrie : concepts et applications [Introduction to machine learning in pharmacometrics: concepts and applications] Leboeuf, Paul-Antoine 05 1900 Machine learning offers tools to deal with the problems of today and tomorrow. Recent breakthroughs in computational sciences and the emergence of the big data phenomenon have brought machine learning to the forefront in both academia and society. The recent achievements of machine learning in natural language, computer vision and medicine speak for themselves. The list of sciences and fields that benefit from machine learning techniques is long. However, attempts to cooperate with pharmacometrics and related sciences are timid and few. The aim of this master's thesis is to explore the potential of machine learning in pharmaceutical sciences. This was done by applying machine learning techniques and methods to situations in clinical pharmacology and pharmacometrics. The project was divided into three parts. The first part proposes an algorithm to enhance the reliability of the covariate pre-selection step of a population pharmacokinetic model. Random forest and XGBoost were used to support the screening of covariates. The relative variable importance indicators for the random forest and for XGBoost correctly identified the importance of all the covariates that influenced the various parameters of the reference PK model. The second part confirms that plasma concentrations can be estimated with methods other than those currently used in pharmacokinetics. The same algorithms were selected and their fit for the task was appreciable.
The third part confirms that machine learning methods can be used to predict complex relationships typical of clinical pharmacology. Again, random forest and XGBoost yielded an appreciable fit.
Apprentissage automatique Méthodes ensemblistes Pharmacométrie Sciences pharmaceutiques Forêts aléatoires eXtreme Gradient Boosting Machine learning Ensemble methods Pharmacometrics Pharmaceutical sciences Random forest
44	Data Driven Energy Efficiency of Ships Taspinar, Tarik January 2022 Decreasing the fuel consumption and thus greenhouse gas emissions of vessels has emerged as a critical topic for both ship operators and policy makers in recent years. The speed of vessels has long been recognized to have the highest impact on fuel consumption. Proposed solutions such as "speed optimization" and "speed reduction" are ongoing discussion topics for the International Maritime Organization. The aim of this study is to develop a speed optimization model using time-constrained genetic algorithms (GA). In addition, it presents the application of machine learning (ML) regression methods to build a model for predicting the fuel consumption of vessels. The local outlier factor algorithm is used to eliminate outliers in the prediction features. In boosting and tree-based regression methods, overfitting is observed after hyperparameter tuning, and early stopping is applied to the overfitted models. In this study, speed is also found to be the most important feature for fuel consumption prediction models. On the other hand, GA evaluation results showed that random modifications in the default speed profile can increase GA performance, and thus fuel savings, more than constant speed limits during voyages. The GA results also indicate that using high crossover rates and low mutation rates can increase fuel savings. Further research is recommended to include fuel and bunker prices to determine fuel efficiency more accurately. Local outlier factor k-nearest neighbors random forest gradient boosting support vector machines ensemble learning ship speed optimization genetic algorithm DEAP HyperOpt Annan elektroteknik och elektronik
45	Multi-level Safety Performance Functions For High Speed Facilities Ahmed, Mohamed 01 January 2012 High speed facilities are considered the backbone of any successful transportation system; Interstates, freeways, and expressways carry the majority of daily trips on the transportation network. Although these roads are considered relatively safe compared with other road types, they still experience many crashes, many of which are severe, which not only affect human lives but can also have tremendous economic and social impacts. These facts signify the necessity of enhancing the safety of these high speed facilities to ensure better and more efficient operation. Safety problems can be assessed through several approaches that help mitigate crash risk on both a long-term and a short-term basis. Therefore, the main focus of the research in this dissertation is to provide a risk assessment framework to promote safety and enhance mobility on freeways and expressways. Multi-level Safety Performance Functions (SPFs) were developed at the aggregate level using historical crash data and the corresponding exposure and risk factors to identify and rank sites with promise (hot-spots). Additionally, SPFs were developed at the disaggregate level using real-time weather data collected from meteorological stations located at the freeway section as well as traffic flow parameters collected from different detection systems such as Automatic Vehicle Identification (AVI) and Remote Traffic Microwave Sensors (RTMS). These disaggregate SPFs can identify real-time risks due to turbulent traffic conditions and their interactions with other risk factors. In this study, two main datasets were obtained from two different regions. Those datasets comprise historical crash data, roadway geometrical characteristics, aggregate weather and traffic parameters as well as real-time weather and traffic data.
At the aggregate level, Bayesian hierarchical models with spatial and random effects were compared to Poisson models to examine the safety effects of roadway geometrics on crash occurrence along freeway sections that feature mountainous terrain and adverse weather. At the disaggregate level, a main framework for a proactive safety management system was provided, using traffic data collected from AVI and RTMS, real-time weather data and geometrical characteristics. Different statistical techniques were implemented. These ranged from classical frequentist classification approaches, which explain the relationship between an event (crash) occurring at a given time and a set of risk factors in real time, to more advanced models. A Bayesian updating approach, which revises beliefs about the behavior of the parameters using prior knowledge in order to achieve more reliable estimation, was also implemented. In addition, a relatively recent and promising machine learning technique (Stochastic Gradient Boosting) was used to calibrate several models on different datasets collected from mixed detection systems as well as real-time meteorological stations. The results from this study suggest that both levels of analysis are important: the aggregate level helps in providing a good understanding of different safety problems and in developing policies and countermeasures to reduce the total number of crashes. At the disaggregate level, real-time safety functions support a more proactive traffic management system that will not only enhance the performance of the high speed facilities and the whole traffic network but also provide safer mobility for people and goods. In general, the proposed multi-level analyses are useful in providing roadway authorities with detailed information on where countermeasures must be implemented and when resources should be devoted.
The study also proves that traffic data collected from different detection systems could be a useful asset that should be utilized appropriately not only to alleviate traffic congestion but also to mitigate increased safety risks. The overall proposed framework can maximize the benefit of the existing archived data for freeway authorities as well as for road users. Traffic safety real time crash analysis freeway expressway active traffic management intelligent transportation systems advanced traveler information systems data mining statistical modeling full bayesian hierarchical stochastic gradient boosting Civil Engineering Engineering
46	Machine Learning based Predictive Data Analytics for Embedded Test Systems Al Hanash, Fayad January 2023 Organizations gather enormous amounts of data and analyze them to extract insights that help them make better decisions. Predictive data analytics is a crucial subfield of data analytics that aims to make accurate predictions by extracting insights from data with machine learning algorithms. This thesis applies supervised learning algorithms to perform predictive data analytics in the Embedded Test System at the Nordic Engineering Partner company. Predictive maintenance is a concept often used in manufacturing industries that refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifiers were fitted, and cross-validation, sampling techniques, and a confusion matrix were used to measure their performance accurately. In addition to accuracy, the recall, precision, F1, kappa, MCC, and ROC AUC metrics are reported. The fitted prediction models achieve high accuracy. Machine learning Artificial Intelligence Predictive data analytics Embedded test systems Confusion matrix Predictive maintenance Support vector machines Random forest Gradient Boosting Multi-layer perceptron Binary classification Multi-class classification Computer Sciences Datavetenskap (datalogi)
47	Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics Sweden Unnebäck, Tea January 2023 The objective of this thesis is to investigate the possibility of using machine learning at Statistics Sweden within the Foreign Trade in Services (FTS) statistic, to predict the likelihood that a unit conducts foreign trade in services. The FTS survey is a sample survey for which there is no natural frame to sample from. Therefore, prior to sampling, a frame is manually constructed each year, starting with a register of all Swedish companies and agencies and narrowing it down in a rule-based manner to contain only units classified as likely to trade in services during the year to come. An automatic procedure that would enable reliable predictions is requested. To this end, three different machine learning methods have been analyzed: two rule-based methods (random forest and extreme gradient boosting) and one distance-based method (k nearest neighbors). The models arising from these methods are trained and tested on historically sampled units, for which it is known whether they traded or not. The results indicate that the two rule-based methods perform well in classifying likely traders. The random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. The results also indicate interesting patterns when studying different metrics for the models. The results further indicate that when training the rule-based models, the year in which the training data was sampled needs to be taken into account. This entails that cross-validation with random folds should not be used, but rather grouped cross-validation based on year. By including a feature that mirrors the state of the economy, the model can adapt its rules accordingly, meaning that the rules learned on training data can be extended to years beyond the training data.
Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model. foreign trade in services sampling sampling frame statistics machine learning random forest predicting extreme gradient boosting k nearest neighbors k-nn official statistics statistics sweden Probability Theory and Statistics Sannolikhetsteori och statistik
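The grouped cross-validation by year described in record 47 can be illustrated with a short sketch. The abstract does not say how the folds were built, so this example assumes the simplest variant, in which each sampling year is held out in turn; the function name `year_grouped_folds` and the sample years are hypothetical.

```python
from collections import defaultdict


def year_grouped_folds(years):
    """Yield (held_out_year, train_indices, test_indices).

    Each fold holds out every unit sampled in one year, so no year
    contributes to both the training and the test side of a fold.
    """
    by_year = defaultdict(list)
    for idx, year in enumerate(years):
        by_year[year].append(idx)
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        train_idx = [
            i for year in sorted(by_year) if year != held_out for i in by_year[year]
        ]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical sampling years for six units.
    sample_years = [2019, 2019, 2020, 2021, 2021, 2020]
    for held_out, train_idx, test_idx in year_grouped_folds(sample_years):
        print(held_out, train_idx, test_idx)
```

With random folds, units from the same year would land on both sides of a split, which is the leakage the abstract warns against.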
48	Analýza a klasifikace dat ze snímače mozkové aktivity / Data Analysis and Clasification from the Brain Activity Detector Jileček, Jan January 2019 This thesis aims to implement methods for recording EEG data obtained with the OpenBCI Ultracortex IV headset, a neural activity sensor. It also describes neurofeedback and methods of obtaining data from the motor cortex for further analysis, and examines the machine learning algorithms best suited to the presented problem. Multiple training and testing datasets are created, as well as a tool for recording the brain activity of a headset-wearing test subject who is visually presented with cognitive challenges on a screen in front of them. A neurofeedback demo app has been developed, presented and later used for calibration of new test subjects. The next part is data analysis, which aims to discriminate the signatures of left and right hand movement intention in the motor cortex. Multiple classification methods are used and their utility reviewed.
49	How Certain Are You of Getting a Parking Space? : A deep learning approach to parking availability prediction / Maskininlärning för prognos av tillgängliga parkeringsplatser Nilsson, Mathias, von Corswant, Sophie January 2020 Traffic congestion is a severe problem in urban areas and it leads to the emission of greenhouse gases and air pollution. In general, drivers lack knowledge of the location and availability of free parking spaces in cities. This leads to people driving around searching for parking places, and about one-third of traffic congestion in cities is due to drivers searching for an available parking space. In recent years, various solutions to provide parking information in advance have been proposed. The vast majority of these solutions have been applied in large cities, such as Beijing and San Francisco. This thesis has been conducted in collaboration with Knowit and Dukaten to predict parking occupancy in car parks one hour ahead in the relatively small city of Linköping. To make the predictions, this study has investigated the possibility of using long short-term memory and gradient boosting regression trees, trained on historical parking data. To enhance decision making, the predictive uncertainty was estimated using the novel approach Monte Carlo dropout for the former, and quantile regression for the latter. This study reveals that both models can predict parking occupancy ahead of time and that they excel in different contexts. The inclusion of exogenous features can improve prediction quality. More specifically, we found that incorporating hour of the day improved the models’ performances, while weather features did not contribute much. As for uncertainty, Monte Carlo dropout was shown to require careful parameter tuning to obtain good uncertainty estimates.
monte carlo dropout mc dropout long short term memory lstm neural network recurrent neural network gradient tree boosting rnn gradient boosting regression tree gbrt quantile regression traffic congestion parking parking occupancy parking availability parking space parking lot machine learning Other Computer and Information Science Annan data- och informationsvetenskap
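Record 49 pairs gradient boosting regression trees with quantile regression to express uncertainty. Quantile models are generally fitted by minimizing the pinball (quantile) loss, sketched below in plain Python. This is an illustration of the loss only, not the thesis implementation; the function name `pinball_loss` and the occupancy numbers are hypothetical.

```python
def pinball_loss(observed, predicted, quantile):
    """Mean pinball (quantile) loss for a quantile strictly between 0 and 1."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    total = 0.0
    for obs, pred in zip(observed, predicted):
        diff = obs - pred
        if diff >= 0:
            # Under-prediction is weighted by the quantile.
            total += quantile * diff
        else:
            # Over-prediction is weighted by (1 - quantile).
            total += (quantile - 1.0) * diff
    return total / len(observed)


if __name__ == "__main__":
    # Hypothetical occupancy counts: the forecast is 2 cars too low.
    print(pinball_loss([10.0], [8.0], 0.9))
    print(pinball_loss([10.0], [8.0], 0.1))
```

Because the loss is asymmetric, a model trained at the 0.9 quantile is pushed to predict high and one trained at 0.1 to predict low; the two together bound a prediction interval.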
50	Peak shaving optimisation in school kitchens : A machine learning approach Alhoush, George, Edvardsson, Emil January 2022 With the increasing electrification of today's society, the electrical grid is experiencing increasing pressure from demand. One factor that affects the stability of the grid is the time intervals at which power demand is at its highest, referred to as peak demand. This project was conducted to reduce peak demand through a process called peak shaving, relieving some of this pressure through the use of batteries and renewable energy. By doing so, the users of such systems could reduce the installation cost of their electrical infrastructure as well as their electricity bills. Peak shaving in this project was implemented using machine learning algorithms that predicted the daily power consumption in school kitchens with the help of their food menus; the predictions were then fed to an algorithm that steers a battery accordingly. The project findings are compared to another system installed by a company to decide whether the algorithm has the right accuracy and performance. The results of the simulations were promising, as the algorithm was able to detect the vast majority of the peaks and perform peak shaving intelligently. Based on the graphs and values presented in this report, it can be concluded that the algorithm is ready to be implemented in the real world, with the potential to contribute to a long-term sustainable electrical grid while saving money for the user. Peak shaving Machine learning Peak-shaving Battery optimisation Random Forest Gradient Boosting Peak demand Peak-demand Sustainable Electrification AI A.I Artificial intelligence Energy system Maskininlärning Utjämnings av effekttoppar Batterioptimering Hållbarhet Elektrifiering Artificiell intelligens Energisystem Energy Systems Energisystem Software Engineering Programvaruteknik Computer Sciences Datavetenskap (datalogi)
