41

Machine learning strategies for multi-step-ahead time series forecasting

Ben Taieb, Souhaib 08 October 2014 (has links)
How much electricity is going to be consumed over the next 24 hours? What will the temperature be for the next three days? How many units of a certain product will be sold over the next few months? Answering these questions often requires forecasting several future observations from a given sequence of historical observations, called a time series.

Historically, time series forecasting has mainly been studied in econometrics and statistics. In the last two decades, machine learning, a field concerned with the development of algorithms that can automatically learn from data, has become one of the most active areas of predictive modeling research. This success is largely due to the superior performance of machine learning prediction algorithms in applications as diverse as natural language processing, speech recognition and spam detection. However, there has been very little research at the intersection of time series forecasting and machine learning.

The goal of this dissertation is to narrow this gap by addressing the problem of multi-step-ahead time series forecasting from the perspective of machine learning. To that end, we propose a series of forecasting strategies based on machine learning algorithms.

Multi-step-ahead forecasts can be produced recursively, by iterating a one-step-ahead model, or directly, using a specific model for each horizon. As a first contribution, we conduct an in-depth study comparing recursive and direct forecasts generated with different learning algorithms for different data generating processes. More precisely, we decompose the multi-step mean squared forecast error into its bias and variance components, and analyze their behavior over the forecast horizon for different time series lengths. The results and observations made in this study then guide the development of new forecasting strategies.

In particular, we find that choosing between recursive and direct forecasts is not an easy task, since it involves a trade-off between bias and estimation variance that depends on many interacting factors: the learning model, the underlying data generating process, the time series length and the forecast horizon. As a second contribution, we develop multi-stage forecasting strategies that do not treat the recursive and direct strategies as competitors, but seek to combine their best properties. More precisely, the multi-stage strategies generate recursive linear forecasts, and then adjust these forecasts by modeling the multi-step forecast residuals with direct nonlinear models at each horizon, called rectification models. We propose a first multi-stage strategy, which we call the rectify strategy, that estimates the rectification models using the nearest neighbors model. However, because recursive linear forecasts often need only small adjustments on real-world time series, we also consider a second multi-stage strategy, called the boost strategy, that estimates the rectification models using gradient boosting algorithms based on so-called weak learners.

Generating multi-step forecasts with a different model at each horizon provides great modeling flexibility. However, selecting these models independently can lead to irregularities in the forecasts that increase the forecast variance. The problem is exacerbated with nonlinear machine learning models estimated from short time series. To address this issue, and as a third contribution, we introduce and analyze multi-horizon forecasting strategies that exploit the information contained in other horizons when learning the model for each horizon. In particular, to select the lag order and the hyperparameters of each model, multi-horizon strategies minimize forecast errors over multiple horizons rather than just the horizon of interest.

We compare all the proposed strategies with both the recursive and direct strategies. We first carry out a bias and variance study, then evaluate the different strategies on real-world time series from two past forecasting competitions. For the rectify strategy, in addition to avoiding the choice between recursive and direct forecasts, the results demonstrate that it performs better than, or at least close to, the best of the recursive and direct forecasts in different settings. For the multi-horizon strategies, the results emphasize the decrease in variance compared to single-horizon strategies, especially with linear or weakly nonlinear data generating processes. Overall, we found that the accuracy of multi-step-ahead forecasts based on machine learning algorithms can be significantly improved if an appropriate forecasting strategy is used to select the model parameters and to generate the forecasts.

Lastly, as a fourth contribution, we participated in the Load Forecasting track of the Global Energy Forecasting Competition 2012. The competition involved a hierarchical load forecasting problem in which participants were required to backcast and forecast hourly loads for a US utility with twenty geographical zones. Our team, TinTin, ranked fifth out of 105 participating teams, and we received an IEEE Power & Energy Society award. / Doctorate in Sciences, Computer Science specialization
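The contrast between the recursive and direct strategies discussed in this abstract can be sketched in a few lines. The following is a minimal illustration on a toy autoregressive series, not the dissertation's actual implementation; the lag order, horizon and choice of scikit-learn's `LinearRegression` are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Toy AR(1) series: x_t = 0.8 * x_{t-1} + noise
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.1)

LAGS, H = 3, 5  # hypothetical lag order and forecast horizon

def embed(series, lags, lead):
    """Build (lag-vector, value `lead` steps ahead) training pairs."""
    X, y = [], []
    for t in range(lags, len(series) - lead + 1):
        X.append(series[t - lags:t])
        y.append(series[t + lead - 1])
    return np.array(X), np.array(y)

# Recursive strategy: one one-step model, iterated H times.
Xr, yr = embed(x, LAGS, 1)
one_step = LinearRegression().fit(Xr, yr)
window = list(x[-LAGS:])
recursive = []
for _ in range(H):
    pred = one_step.predict([window[-LAGS:]])[0]
    recursive.append(pred)
    window.append(pred)  # feed the forecast back as an input

# Direct strategy: a separate model for each horizon h = 1..H.
direct = []
for h in range(1, H + 1):
    Xd, yd = embed(x, LAGS, h)
    model_h = LinearRegression().fit(Xd, yd)
    direct.append(model_h.predict([x[-LAGS:]])[0])

print(recursive)
print(direct)
```

The recursive loop feeds its own predictions back as inputs (accumulating error but reusing one model), while the direct loop trains H independent models, which is exactly the bias/variance trade-off the study decomposes.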
42

Gradient Boosting Machine and Artificial Neural Networks in R and H2O

Sabo, Juraj January 2016 (has links)
Artificial neural networks are fascinating machine learning algorithms. They used to be considered unreliable and computationally very expensive. It is now known that modern neural networks can be quite useful, but their computational cost unfortunately remains. Statistical boosting is considered one of the most important machine learning ideas. It is based on an ensemble of weak models that together create a powerful learning system. The goal of this thesis is the comparison of these machine learning models on three use cases. The first use case deals with modeling the probability of burglary in the city of Chicago. The second is a typical example of customer churn prediction in the telecommunications industry, and the last is related to computer vision. The second goal of this thesis is to introduce an open-source machine learning platform called H2O. It includes, among other things, an interface for R, and it is designed to run in standalone mode or on Hadoop. The thesis also includes an introduction to Apache Hadoop, an open-source software library that allows for distributed processing of big data, specifically its open-source distribution, the Hortonworks Data Platform.
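The comparison described above can be reproduced in miniature. This sketch uses scikit-learn rather than R/H2O (the thesis's actual stack), so the dataset, estimators and parameters here are illustrative assumptions, not the thesis's models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a churn-style binary classification task.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Gradient boosting: an ensemble of shallow trees (weak learners).
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
# A small feed-forward neural network.
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)

for name, model in [("GBM", gbm), ("ANN", ann)]:
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```

In H2O the analogous estimators would be its GBM and Deep Learning models driven from the R interface; the trade-off the thesis examines (boosted weak trees versus a trained network) is the same.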
43

Quantitative Retrieval of Organic Soil Properties from Visible Near-Infrared Shortwave Infrared (Vis-NIR-SWIR) Spectroscopy Using Fractal-Based Feature Extraction.

Liu, Lanfa, Buchroithner, Manfred, Ji, Min, Dong, Yunyun, Zhang, Rongchung 27 March 2017 (has links)
Visible and near-infrared diffuse reflectance spectroscopy has been demonstrated to be a fast and cheap tool for estimating a large number of chemical and physical soil properties, and effective features extracted from spectra are crucial to correlating with these properties. We adopt a novel methodology for feature extraction in soil spectroscopy based on fractal geometry. The spectrum can be divided into multiple segments with different step–window pairs. For each segmented spectral curve, the fractal dimension value was calculated using variation estimators with power indices 0.5, 1.0 and 2.0. The fractal feature can then be generated by multiplying the fractal dimension value with the spectral energy. To assess and compare the performance of the new features, we used organic soil samples from the large-scale European Land Use/Land Cover Area Frame Survey (LUCAS). Gradient-boosting regression models, built with the XGBoost library on the soil spectral library, were developed to estimate N, pH and soil organic carbon (SOC) contents. Features generated by the variogram estimator performed better than the two other estimators and than principal component analysis (PCA). The estimation results for SOC were coefficient of determination (R2) = 0.85, root mean square error (RMSE) = 56.7 g/kg, ratio of percent deviation (RPD) = 2.59; for pH: R2 = 0.82, RMSE = 0.49, RPD = 2.31; and for N: R2 = 0.77, RMSE = 3.01 g/kg, RPD = 2.09. Even better results could be achieved when fractal features were combined with PCA components. Fractal features generated by the proposed method can improve the estimation accuracy of soil properties while maintaining the original spectral curve shape.
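A variation-estimator fractal dimension of the kind described above can be sketched with plain NumPy. The estimator below (log-log slope of the p-th order variation, with D = 2 − H for a self-affine curve) is an assumption about the exact formulation the paper uses, and the synthetic "spectrum" stands in for real LUCAS reflectance curves:

```python
import numpy as np

def fractal_dimension(signal, power=2.0, max_lag=8):
    """Variation estimator: V_p(lag) = mean(|x[t+lag] - x[t]|**p) scales
    like lag**(p*H) for a self-affine curve, and D = 2 - H."""
    lags = np.arange(1, max_lag + 1)
    v = [np.mean(np.abs(signal[lag:] - signal[:-lag]) ** power) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(v), 1)
    hurst = slope / power
    return 2.0 - hurst

rng = np.random.default_rng(0)
# Brownian-motion-like stand-in curve; its fractal dimension is about 1.5.
spectrum = np.cumsum(rng.normal(size=2000))

# Segment the curve, then pair each segment's dimension with its spectral
# energy, mirroring the fractal-feature construction described above.
segments = np.array_split(spectrum, 10)
features = [fractal_dimension(s) * np.sum(s ** 2) for s in segments]
print(round(fractal_dimension(spectrum), 2))
```

Changing `power` to 1.0 gives the madogram estimator and 0.5 the rodogram, matching the three power indices the paper compares.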
44

Using supervised learning methods to predict the stop duration of heavy vehicles.

Oldenkamp, Emiel January 2020 (has links)
In this thesis project, we attempt to predict the stop duration of heavy vehicles using data based on GPS positions collected in a previous project. All training and prediction is done in AWS SageMaker, and we explore the Linear Learner, K-Nearest Neighbors and XGBoost algorithms, all of which are explained in this paper. Although we were not able to construct a production-grade model within the time frame of the thesis, we show that the potential for such a model exists given more time, and we propose some paths one could take to improve on the endpoint of this project.
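The three model families named above can be compared locally before committing to managed training jobs. This sketch uses scikit-learn equivalents (plain linear regression, k-NN, and gradient boosting) as stand-ins for SageMaker's Linear Learner, KNN and XGBoost built-ins; the features and synthetic stop durations are hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(7)
# Hypothetical features per stop: hour of day, day of week, distance to depot.
X = np.column_stack([
    rng.integers(0, 24, 500),
    rng.integers(0, 7, 500),
    rng.uniform(0, 100, 500),
])
# Synthetic stop duration in minutes, loosely tied to the features.
y = 30 + 2 * X[:, 0] + 5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 5, 500)

models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=10),
    "boosting": GradientBoostingRegressor(random_state=7),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(name, round(r2, 3))
```

Running such a comparison on a sample of the real GPS-derived features is a cheap way to decide which SageMaker algorithm deserves the tuning budget.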
45

Introduction à l’apprentissage automatique en pharmacométrie : concepts et applications

Leboeuf, Paul-Antoine 05 1900 (has links)
Machine learning offers tools to deal with current and future problems. Recent breakthroughs in computational sciences and the emergence of the big data phenomenon have brought machine learning to the forefront in both academia and society. The recent achievements of machine learning in natural language processing, computer vision and medicine speak for themselves. The list of sciences and fields that benefit from machine learning techniques is long. However, attempts to cooperate with pharmacometrics and related sciences have been timid and few. The aim of this Master's thesis is to explore the potential of machine learning in pharmaceutical sciences. This was done through the application of machine learning techniques and methods to situations in clinical pharmacology and pharmacometrics. The project was divided into three parts. The first part proposes an algorithm to enhance the reliability of the covariate pre-selection step of a population pharmacokinetic model. A random forest and XGBoost were used to support the screening of covariates. The relative-importance indicators of the variables for the random forest and for XGBoost correctly identified all the covariates that influenced the various parameters of the reference PK model. The second part confirms that plasma concentrations can be estimated with methods different from those currently used in pharmacokinetics. The same algorithms were selected, and their fit for the task was appreciable. The third part confirms the possibility of applying machine learning methods to the prediction of complex relationships typical of clinical pharmacology. Again, the random forest and XGBoost yielded an appreciable fit.
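Covariate screening with relative-importance indicators, as used in the first part above, can be sketched as follows. The covariate names, the simulated PK parameter and its dependence on them are hypothetical stand-ins, and a random forest replaces the thesis's forest/XGBoost pair for self-containment:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 400
# Hypothetical candidate covariates: weight, age, creatinine clearance, and a
# pure-noise column; only the first three drive the simulated PK parameter.
covariates = rng.normal(size=(n, 4))
clearance = 5 + 1.5 * covariates[:, 0] + 0.8 * covariates[:, 1] \
    + 1.2 * covariates[:, 2] + rng.normal(0, 0.5, n)

forest = RandomForestRegressor(n_estimators=200, random_state=1)
forest.fit(covariates, clearance)

names = ["weight", "age", "crcl", "noise"]
ranking = sorted(zip(names, forest.feature_importances_),
                 key=lambda kv: -kv[1])
print(ranking)
```

The noise column should rank last, which is the behavior the pre-selection step relies on: covariates with negligible importance can be dropped before the formal stepwise covariate modeling.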
46

Data Driven Energy Efficiency of Ships

Taspinar, Tarik January 2022 (has links)
Decreasing the fuel consumption, and thus the greenhouse gas emissions, of vessels has emerged as a critical topic for both ship operators and policy makers in recent years. The speed of vessels has long been recognized to have the highest impact on fuel consumption. Suggested solutions such as "speed optimization" and "speed reduction" are ongoing discussion topics at the International Maritime Organization. The aim of this study is to develop a speed optimization model using time-constrained genetic algorithms (GA). Subsequently, this paper also presents the application of machine learning (ML) regression methods to build a model for predicting the fuel consumption of vessels. The local outlier factor algorithm is used to eliminate outliers in the prediction features. In the boosting and tree-based regression methods, an overfitting problem was observed after hyperparameter tuning, so an early stopping technique was applied to the overfitted models. In this study, speed is also found to be the most important feature in the fuel consumption prediction models. On the other hand, the GA evaluation results showed that random modifications of the default speed profile can increase GA performance, and thus fuel savings, more than constant speed limits during voyages. The GA results also indicate that using high crossover rates and low mutation rates can increase fuel savings. Further research is recommended to include fuel and bunker prices to determine fuel efficiency more accurately.
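The outlier-removal and early-stopping steps mentioned above can be combined in a short pipeline. The voyage features and the cubic speed–fuel relationship below are illustrative assumptions (scikit-learn's gradient boosting stands in for the tree-based regressors of the study):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)
# Hypothetical voyage records: speed (knots) and a load factor.
speed = rng.uniform(8, 20, 500)
load = rng.uniform(0.5, 1.0, 500)
# Cubic speed law as a rough stand-in for hourly fuel consumption.
fuel = 0.01 * speed ** 3 * load + rng.normal(0, 1, 500)
X = np.column_stack([speed, load])

# Drop the points the local outlier factor flags as anomalous (label -1).
mask = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == 1
X_clean, fuel_clean = X[mask], fuel[mask]

# Early stopping: halt boosting once the validation score stops improving.
model = GradientBoostingRegressor(
    n_estimators=1000,
    validation_fraction=0.2,
    n_iter_no_change=10,  # stop after 10 rounds without improvement
    random_state=3,
)
model.fit(X_clean, fuel_clean)
print("trees actually fitted:", model.n_estimators_)
```

`n_iter_no_change` caps the ensemble well below the nominal 1000 trees when the held-out score plateaus, which is precisely the remedy the study applies to its overfitted tuned models.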
47

Multi-level Safety Performance Functions For High Speed Facilities

Ahmed, Mohamed 01 January 2012 (has links)
High speed facilities are considered the backbone of any successful transportation system; Interstates, freeways, and expressways carry the majority of daily trips on the transportation network. Although these roads are considered relatively the safest among road types, they still experience many crashes, many of them severe, which not only affect human lives but can also have tremendous economic and social impacts. These facts signify the necessity of enhancing the safety of these high speed facilities to ensure better and more efficient operation. Safety problems can be assessed through several approaches that help mitigate crash risk on a long and short term basis. Therefore, the main focus of the research in this dissertation is to provide a risk assessment framework to promote safety and enhance mobility on freeways and expressways. Multi-level Safety Performance Functions (SPFs) were developed at the aggregate level using historical crash data and the corresponding exposure and risk factors to identify and rank sites with promise (hot-spots). Additionally, SPFs were developed at the disaggregate level utilizing real-time weather data collected from meteorological stations located along the freeway section, as well as traffic flow parameters collected from different detection systems such as Automatic Vehicle Identification (AVI) and Remote Traffic Microwave Sensors (RTMS). These disaggregate SPFs can identify real-time risks due to turbulent traffic conditions and their interactions with other risk factors. In this study, two main datasets were obtained from two different regions, comprising historical crash data, roadway geometric characteristics, aggregate weather and traffic parameters, as well as real-time weather and traffic data.

At the aggregate level, Bayesian hierarchical models with spatial and random effects were compared to Poisson models to examine the safety effects of roadway geometrics on crash occurrence along freeway sections that feature mountainous terrain and adverse weather. At the disaggregate level, a main framework for a proactive safety management system was provided, using traffic data collected from AVI and RTMS, real-time weather, and geometric characteristics. Different statistical techniques were implemented, ranging from classical frequentist classification approaches, which explain the relationship between an event (a crash) occurring at a given time and a set of risk factors in real time, to more advanced models. Bayesian statistics with an updating approach, which updates beliefs about the behavior of a parameter using prior knowledge in order to achieve more reliable estimation, was implemented. A relatively recent and promising machine learning technique, Stochastic Gradient Boosting, was also utilized to calibrate several models on different datasets collected from mixed detection systems and real-time meteorological stations. The results of this study suggest that both levels of analysis are important: the aggregate level helps provide a good understanding of different safety problems and supports developing policies and countermeasures to reduce the total number of crashes, while at the disaggregate level, real-time safety functions support a more proactive traffic management system that will not only enhance the performance of high speed facilities and the whole traffic network but also provide safer mobility for people and goods. In general, the proposed multi-level analyses are useful in providing roadway authorities with detailed information on where countermeasures must be implemented and when resources should be devoted.

The study also proves that traffic data collected from different detection systems can be a useful asset that should be utilized appropriately, not only to alleviate traffic congestion but also to mitigate increased safety risks. The overall proposed framework can maximize the benefit of the existing archived data for freeway authorities as well as for road users.
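Stochastic Gradient Boosting, as mentioned above, differs from plain gradient boosting in that each tree is fitted on a random subsample of the training data. A minimal sketch with scikit-learn (`subsample < 1.0` is what makes it stochastic); the risk factors and the synthetic crash mechanism are hypothetical, not the dissertation's data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 2000
# Hypothetical real-time risk factors: average speed, speed variance, visibility.
X = np.column_stack([
    rng.uniform(40, 80, n),    # average speed (mph)
    rng.uniform(0, 15, n),     # speed standard deviation (turbulence proxy)
    rng.uniform(0.1, 10, n),   # visibility (miles)
])
# Synthetic crash indicator: turbulence and low visibility raise the odds.
logit = -4 + 0.25 * X[:, 1] - 0.2 * X[:, 2]
y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)
# subsample < 1.0 makes this *stochastic* gradient boosting: each tree is
# fitted on a random half of the training data, which reduces overfitting.
sgb = GradientBoostingClassifier(subsample=0.5, n_estimators=200, random_state=5)
sgb.fit(X_tr, y_tr)
print(round(sgb.score(X_te, y_te), 3))
```

Note that crash events are rare, so in practice accuracy alone overstates performance; the dissertation's disaggregate models would be judged on how well they separate crash from non-crash traffic regimes.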
48

Machine Learning based Predictive Data Analytics for Embedded Test Systems

Al Hanash, Fayad January 2023 (has links)
Organizations gather enormous amounts of data and analyze them to extract insights that help them make better decisions. Predictive data analytics is a crucial subfield of data analytics that aims to make accurate predictions by extracting insights from data with machine learning algorithms. This thesis applies supervised learning algorithms to perform predictive data analytics for an embedded test system at the Nordic Engineering Partner company. Predictive maintenance is a concept often used in manufacturing industries that refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifiers were fitted, and cross-validation, sampling techniques, and a confusion matrix were used to measure their performance accurately. In addition to accuracy, the recall, precision, F1, kappa, MCC, and ROC AUC metrics are used as well. The fitted prediction models achieve high accuracy.
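The multi-metric evaluation described above matters most on imbalanced data, where accuracy alone is misleading. A short scikit-learn sketch (the synthetic pass/fail data stands in for the company's embedded-test records, and a random forest represents the model family):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (confusion_matrix, f1_score,
                             matthews_corrcoef, cohen_kappa_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for pass/fail test records (20% failures).
X, y = make_classification(n_samples=800, n_features=12, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# The confusion matrix exposes both error types; F1, kappa and MCC summarize
# it in ways that are robust to the class imbalance.
print(confusion_matrix(y_te, pred))
print("F1:", round(f1_score(y_te, pred), 3))
print("kappa:", round(cohen_kappa_score(y_te, pred), 3))
print("MCC:", round(matthews_corrcoef(y_te, pred), 3))
```

Stratified splitting preserves the failure rate in both partitions, which is one of the sampling techniques the thesis pairs with cross-validation.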
49

Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics Sweden

Unnebäck, Tea January 2023 (has links)
The objective of this thesis is to investigate the possibility of using machine learning at Statistics Sweden within the Foreign Trade in Services (FTS) statistic, to predict the likelihood that a unit conducts foreign trade in services. The FTS survey is a sample survey for which there is no natural frame to sample from. Therefore, prior to sampling, a frame is manually constructed each year, starting with a register of all Swedish companies and agencies and, in a rule-based manner, narrowing it down to contain only units classified as likely to trade in services during the year to come. An automatic procedure that would enable reliable predictions is requested. To this end, three different machine learning methods have been analyzed: two rule-based methods (random forest and extreme gradient boosting) and one distance-based method (k nearest neighbors). The models arising from these methods are trained and tested on historically sampled units, for which it is known whether they traded or not. The results indicate that the two rule-based methods perform well in classifying likely traders: the random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. The results also reveal interesting patterns when studying different metrics for the models. Furthermore, when training the rule-based models, the year in which the training data was sampled needs to be taken into account. This entails that cross-validation with random folds should not be used, but rather grouped cross-validation based on year. By including a feature that mirrors the state of the economy, the model can adapt its rules accordingly, meaning that the rules learned on training data can be extended to years beyond the training data. Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model.
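Grouped cross-validation by year, as recommended above, can be expressed directly with scikit-learn's `GroupKFold`; the features, labels and year assignments in this sketch are hypothetical stand-ins for the FTS register data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(11)
n = 600
X = rng.normal(size=(n, 5))           # hypothetical unit-level features
y = rng.integers(0, 2, n)             # trader / non-trader label
years = rng.choice([2018, 2019, 2020, 2021], size=n)  # sampling year per unit

# GroupKFold keeps all units from the same year in the same fold, so each
# validation fold contains only years the model never saw during training.
cv = GroupKFold(n_splits=4)
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X, y, cv=cv, groups=years)
print(np.round(scores, 3))
```

With random folds, units from the same year would leak across the train/validation boundary and the estimated performance would overstate how well the rules transfer to a new year, which is exactly the failure mode the thesis flags.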
50

Designing a raw material forecasting algorithm (master's thesis)

Mikhaylichenko, L. A. January 2024 (has links)
This work addresses the current business problem of designing a raw material forecasting algorithm for a cosmetics manufacturing enterprise, based on machine learning; an extreme gradient boosting model (XGBoost) demonstrated high accuracy and stability of forecasts. Data sets were collected and analyzed, and methods for forecasting demand and raw materials were investigated, including the ARIMA, SARIMA, Holt-Winters and Prophet forecasting models as well as various machine learning models. The metrics used were MAE, MSE, MAPE and R2. Statistical models and neural network models such as LSTM showed less stable results than the machine learning models. A comprehensive raw material forecasting algorithm was developed, including the stages of forecasting demand and calculating the raw material requirement. A prototype of the algorithm was implemented using Streamlit. Recommendations are offered for deploying the algorithm, including integration with existing systems and calculation of economic efficiency.
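The two-stage algorithm above (forecast demand, then convert it to a raw material requirement) can be sketched with lag features and gradient boosting. Scikit-learn stands in for XGBoost here, and the seasonal demand curve and bill-of-materials coefficient are hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(9)
# Synthetic monthly demand with trend and yearly seasonality.
months = np.arange(120)
demand = 100 + 0.5 * months + 20 * np.sin(2 * np.pi * months / 12) \
    + rng.normal(0, 5, 120)

LAGS = 12  # use the past year of demand as features

X = np.array([demand[t - LAGS:t] for t in range(LAGS, len(demand))])
y = demand[LAGS:]

# Stage 1: forecast demand; hold out the last 12 months for evaluation.
model = GradientBoostingRegressor(random_state=9).fit(X[:-12], y[:-12])
forecast = model.predict(X[-12:])

# Stage 2: convert the demand forecast into a raw material requirement with
# a hypothetical bill-of-materials coefficient (kg of material per unit).
material_per_unit = 0.35
material_needed = forecast * material_per_unit
print(np.round(material_needed.sum(), 1))
```

In production the second stage would draw the per-unit coefficients from the enterprise's actual recipes, and the prototype UI around it would be the Streamlit app the thesis describes.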
