Global ETD Search

31	Road-traffic accident prediction model : Predicting the Number of Casualties Andeta, Jemal Ahmed January 2021 Efficient and effective road traffic prediction and management techniques are crucial in intelligent transportation systems. They can positively influence road development, safety enhancement, regulation formulation, and route planning, and so help protect living things from road traffic accidents in advance. This thesis addresses road safety by predicting the number of casualties if an accident occurs, using multiple traffic accident attributes. This helps individuals (drivers) and traffic offices to adjust and control their contributions to the occurrence of an accident before it happens. Three candidate algorithms from different regression fit patterns are proposed and evaluated: the bagging, linear, and non-linear fitting patterns. The selected algorithms are gradient boosting machines (GBoost) from the bagging pattern, linear support vector regression (LinearSVR) from the linear pattern, and extreme learning machines (ELM) from the non-linear pattern. The RMSE and MAE performance metrics are used to evaluate the models. GBoost performed better than the other two, with a low error rate and the smallest prediction interval value for the 95% prediction interval. The SHAP (SHapley Additive exPlanations) technique is applied to interpret each model at the global level using SHAP's beeswarm plots. Finally, suggestions for future improvements concerning the dataset and hyperparameter tuning are presented. Road traffic accident Accident Casualties Gradient Boosting Extreme Learning Machines Prediction Interval Computer Sciences Datavetenskap (datalogi) Transport Systems and Logistics Transportteknik och logistik Control Engineering Reglerteknik
32	Maskininlärning med konform förutsägelse för prediktiva underhållsuppgifter i industri 4.0 / Machine Learning with Conformal Prediction for Predictive Maintenance tasks in Industry 4.0 : Data-driven Approach Liu, Shuzhou, Mulahuko, Mpova January 2023 This thesis is a cooperation with Knowit, Östrand & Hansen, and Orkla. It aimed to explore the application of Machine Learning and Deep Learning models with Conformal Prediction to a predictive maintenance situation at Orkla. Predictive maintenance is essential in numerous industrial manufacturing scenarios. It can help to reduce machine downtime, improve equipment reliability, and avoid unnecessary costs. In this thesis, various Machine Learning and Deep Learning models, including Decision Tree, Random Forest, Support Vector Regression, Gradient Boosting, and Long Short-Term Memory (LSTM), are applied to a real-world predictive maintenance dataset. The Orkla dataset was originally planned to be used in this thesis project. However, due to challenges encountered and time limitations, a NASA C-MAPSS dataset with a similar data structure was chosen to study how Machine Learning models could be applied to predict the remaining useful lifetime (RUL) in manufacturing. In addition, conformal prediction, a recently developed framework for measuring the prediction uncertainty of Machine Learning models, is integrated into the models for more reliable RUL prediction. The results show that both the Machine Learning and Deep Learning models with conformal prediction could predict RUL close to the true RUL, with LSTM outperforming the Machine Learning models. Also, the conformal prediction intervals provide informative and reliable information about the uncertainty of the predictions, which can help inform personnel at factories in advance so that they can take the necessary maintenance actions. 
Overall, this thesis demonstrates the effectiveness of utilizing machine learning and Deep Learning models with Conformal Prediction for predictive maintenance situations. Moreover, based on the modeling results of the NASA dataset, some insights are discussed on how to transfer these experiences into Orkla data for RUL prediction in the future. Machine Learning Deep Learning Uncertainty estimation Conformal prediction Predictive maintenance RUL Probabilistic predictions Decision Tree Random Forest Support Vector Regression Gradient Boosting LSTM Computer Sciences Datavetenskap (datalogi)
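The abstract of entry 32 describes conformal prediction as a way to attach uncertainty intervals to RUL predictions. The following is a minimal sketch of one common variant, split conformal prediction for regression, and is not taken from the thesis. The function names, the calibration residuals, and the 80% coverage level are hypothetical and chosen only for illustration.

```python
import math


def conformal_quantile(residuals, coverage):
    """Return the interval half-width from held-out calibration residuals.

    Uses the finite-sample rank ceil((n + 1) * coverage) over the sorted
    absolute residuals, as in split conformal prediction.
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        # Too few calibration points for the requested coverage.
        return math.inf
    return scores[rank - 1]


def conformal_interval(prediction, half_width):
    """Symmetric interval around a point prediction."""
    return prediction - half_width, prediction + half_width


# Hypothetical calibration residuals: true RUL minus predicted RUL.
calibration_residuals = [3.0, -1.0, 4.0, -2.0, 0.5, -6.0, 2.5, 1.5, -3.5]
half_width = conformal_quantile(calibration_residuals, coverage=0.8)
low, high = conformal_interval(100.0, half_width)
print(half_width, low, high)
```

The half-width depends only on the calibration residuals, so the same wrapper can sit around any point predictor (a tree ensemble or an LSTM, for example), provided the calibration data were not used for training.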
33	Modelización integrada con aprendizaje automático para evaluar la contaminación por nutrientes en las masas de agua actual y bajo el efecto del cambio climático. Aplicación a la Demarcación Hidrográfica del Júcar Dorado Guerra, Diana Yaritza 26 February 2024 (has links) Tesis por compendio / [ES] La contaminación del agua representa un desafío ambiental crítico a nivel global y en la Unión Europea (UE), particularmente en la región mediterránea de España. El crecimiento poblacional, la demanda creciente de alimentos y combustibles, junto con el cambio climático, intensifican la contaminación por nutrientes en los cuerpos de agua. Esta contaminación amenaza la calidad del agua y los ecosistemas acuáticos, así como la salud humana. La complejidad de las vías de transporte de nutrientes hace que su monitoreo y mitigación sean complicados. Se requieren modelos integrales que vinculen procesos y relaciones de causa y efecto para controlar eficazmente la contaminación. En la región mediterránea, como la Demarcación Hidrográfica del Júcar (DHJ), la interacción entre agua superficial y subterránea es clave, pero los modelos tradicionales presentan limitaciones. Esta tesis aborda estos desafíos al caracterizar la contribución de nutrientes a las masas de agua superficiales de la DHJ, evaluar medidas de reducción de la contaminación, considerando el cambio climático a largo plazo y aplicar técnicas de aprendizaje supervisado para predecir la concentración de nitratos. El acoplamiento de modelos hidrológicos y de calidad del agua, junto con el aprendizaje automático, ofrece una comprensión profunda y valiosa de los factores detrás de la contaminación por nutrientes y proporciona una base sólida para la toma de decisiones y la gestión sostenible del agua en la DHJ y regiones similares. Esta tesis fue estructurada como un compendio de tres artículos que abarcan estos desafíos. 
El primer artículo profundiza en la compleja interacción entre las aguas superficiales y las subterráneas en las cuencas de la DHJ, centrándose en la dinámica de la contaminación por nitratos. Los resultados muestran una correlación directa entre las concentraciones de nitratos en ríos y acuíferos a lo largo del eje principal de los ríos Júcar y Turia, lo cual destaca el papel fundamental de las aportaciones de agua subterránea en la contribución a los niveles de nitratos de los ríos. Además, el estudio identifica regiones aguas abajo con actividades agrícolas y urbanas intensificadas como focos de contaminación por nitratos. El segundo artículo aborda la vulnerabilidad de la calidad de las aguas superficiales al cambio climático y escenarios de reducción de la contaminación difusa y puntual en las cuencas de la DHJ a largo plazo. Los resultados indican que, en los escenarios de cambio climático, se espera que aumenten significativamente las masas de agua con un mal estado de amonio, fósforo y DBO5, y en menor proporción las masas en mal estado de nitratos. En concreto, las concentraciones medias de amonio y fósforo podrían duplicarse durante los meses de bajo caudal. Para mantener la calidad actual del agua, se requieren reducciones sustanciales de al menos el 25% de la contaminación difusa por nitratos y del 50% de las cargas puntuales de amonio, fósforo y DBO5. El tercer artículo presenta un enfoque innovador para simular la concentración de nitratos en masas de agua superficiales mediante modelos de aprendizaje automático. Aprovechando los métodos de selección de características y los algoritmos random forest (RF) y eXtreme Gradient Boosting (XGBoost), el estudio logró una gran precisión en la predicción de la concentración de nitratos. 
Estos modelos analizaron 19 variables de entrada, que abarcan factores ecológicos, hidrológicos y ambientales, junto con datos de concentración de nitratos procedentes de estaciones de aforo de la calidad de las aguas superficiales. En particular, la investigación destaco que la localización desempeña un papel dominante, explicando el 87% de la variabilidad de los nitratos en relación con la concentración de nitrógeno y fósforo. Esta investigación destaco el potencial del aprendizaje automático en la predicción de la calidad del agua y la evaluación de riesgos. / [CA] La contaminació de l'aigua representa un desafiament ambiental crític a nivell global i a la Unió Europea (UE), particularment a la regió mediterrània d'Espanya. El creixement poblacional, la demanda creixent d'aliments i combustibles, juntament amb el canvi climàtic, intensifiquen la contaminació per nutrients en els cossos d'aigua. Aquesta contaminació amenaça la qualitat de l'aigua i els ecosistemes aquàtics, així com la salut humana. La complexitat de les vies de transport de nutrients fa que el seu monitoratge i mitigació siguin complicats. Es requereixen models integrals que vinculin processos i relacions de causa i efecte per a controlar eficaçment la contaminació. A la regió mediterrània, com la Demarcació Hidrogràfica del Xúquer (DHJ), la interacció entre aigua superficial i subterrània és clau, però els models tradicionals presenten limitacions. Aquesta tesi aborda aquests desafiaments en caracteritzar la contribució de nutrients a les masses d'aigua superficials de la DHJ, avaluar mesures de reducció de la contaminació, considerant el canvi climàtic a llarg termini i aplicar tècniques d'aprenentatge supervisat per a predir la concentració de nitrats. 
L'acoblament de models hidrològics i de qualitat de l'aigua, juntament amb l'aprenentatge automàtic, ofereix una comprensió profunda i valuosa dels factors darrere de la contaminació per nutrients i proporciona una base sòlida per a la presa de decisions i la gestió sostenible de l'aigua en la DHJ i regions similars. Aquesta tesi va ser estructurada com un compendi de tres articles que abasten aquests desafiaments. El primer article aprofundeix en la complexa interacció entre les aigües superficials i les subterrànies en les conques de la DHJ, centrant-se en la dinàmica de la contaminació per nitrats. Els resultats mostren una correlació directa entre les concentracions de nitrats en rius i aqüífers al llarg de l'eix principal dels rius Xúquer i Túria, la qual cosa destaca el paper fonamental de les aportacions d'aigua subterrània en la contribució als nivells de nitrats dels rius. A més, l'estudi identifica regions aigües avall amb activitats agrícoles i urbanes intensificades com a focus de contaminació per nitrats. El segon article aborda la vulnerabilitat de la qualitat de les aigües superficials al canvi climàtic i escenaris de reducció de la contaminació difusa i puntual en les conques de la DHJ a llarg termini. Els resultats indiquen que, en els escenaris de canvi climàtic, s'espera que augmentin significativament les masses d'aigua amb un mal estat d'amoni, fòsfor i DBO5, i en menor proporció les masses en mal estat de nitrats. En concret, les concentracions mitjanes d'amoni i fòsfor podrien duplicar-se durant els mesos de baix cabal. Per a mantenir la qualitat actual de l'aigua, es requereixen reduccions substancials d'almenys el 25% de la contaminació difusa per nitrats i del 50% de les càrregues puntuals d'amoni, fòsfor i DBO5. El tercer article presenta un enfocament innovador per a simular la concentració de nitrats en masses d'aigua superficials mitjançant models d'aprenentatge automàtic. 
Aprofitant els mètodes de selecció de característiques i els algorismes random forest (RF) i extremi Gradient Boosting (XGBoost), l'estudi va aconseguir una gran precisió en la predicció de la concentració de nitrats. Aquests models van analitzar 19 variables d'entrada, que abasten factors ecològics, hidrològics i ambientals, juntament amb dades de concentració de nitrats procedents d'estacions d'aforament de la qualitat de les aigües superficials. En particular, la recerca destaco que la localització exerceix un paper dominant, explicant el 87% de la variabilitat dels nitrats en relació amb la concentració de nitrogen i fòsfor. Aquesta recerca destaco el potencial de l'aprenentatge automàtic en la predicció de la qualitat de l'aigua i l'avaluació de riscos. / [EN] Water pollution poses a critical environmental challenge globally and in the European Union (EU), particularly in the Mediterranean region of Spain. Population growth, increasing demand for food and fuels, coupled with climate change, intensify nutrient pollution in water bodies. This pollution threatens water quality, aquatic ecosystems, and human health. The complexity of nutrient transport pathways makes monitoring and mitigation challenging. Comprehensive models that link processes and cause-and-effect relationships are required to effectively control pollution. In the Mediterranean region, such as the Júcar River Basin District (RBD), the interaction between surface and groundwater is crucial, but traditional models have limitations. This thesis addresses these challenges by characterising the contribution of nutrients to surface waters in the Júcar RBD, evaluating pollution reduction measures considering long-term climate change, and applying supervised learning techniques to predict nitrate concentrations. 
The coupling of hydrological and water quality models, along with machine learning, provides a deep and valuable understanding of the factors behind nutrient pollution and establishes a solid foundation for decision-making and sustainable water management in the Júcar RBD and similar regions. This thesis is structured as a compendium of three articles that encompass these challenges. The first article delves into the complex interaction between surface and groundwater in the Júcar RBD basins, focusing on nitrate pollution dynamics. The results reveal a direct linear correlation between nitrate concentrations in rivers and aquifers along the main axes of the Júcar and Turia rivers, highlighting the fundamental role of groundwater contributions to river nitrate levels. Additionally, the study identifies downstream regions with intensified agricultural and urban activities as nitrate pollution hotspots. This research not only identifies pollution sources but also offers a means to predict nitrate concentrations and assess the effectiveness of pollution prevention measures. The second article addresses the vulnerability of surface water quality to climate change and long-term diffuse and point source pollution reduction scenarios in the Júcar RBD basins. In a region where nutrient concentrations are of particular concern, the study investigates how changing climatic conditions, including rising temperatures and altered precipitation patterns, affect nitrate, ammonium, phosphorus, and biochemical oxygen demand (BOD5) levels. The results indicate that under climate change scenarios, significantly more water bodies are expected to be in poor condition for ammonium, phosphorus, and BOD5, and to a lesser extent, nitrate. Specifically, average concentrations of ammonium and phosphorus could double during low-flow months. 
To maintain current water quality, substantial reductions of at least 25% in diffuse nitrate pollution and 50% in point source loads of ammonium, phosphorus, and BOD5 are required. This research underscores the importance of water quality management strategies. The third article introduces an innovative approach to simulate nitrate concentrations in surface water bodies using machine learning models. Leveraging feature selection methods and artificial intelligence algorithms, including random forest (RF) and eXtreme Gradient Boosting (XGBoost), the study achieved high precision in predicting nitrate concentrations. These models analysed 19 input variables spanning ecological, hydrological, and environmental factors, along with nitrate concentration data from surface water quality gauging stations. In particular, the research highlighted the dominant role of location, explaining 87% of nitrate variability in relation to nitrogen and phosphorus concentration. This research showcased the potential of machine learning in water quality prediction and risk assessment. / We appreciate the help provided by the Júcar River Basin District Authority (CHJ), who gathered field data. The first author’s research was partially funded by a PhD scholarship from the food research stream of the programme “Colombia Científica—Pasaporte a la Ciencia”, granted by the Colombian Institute for Educational Technical Studies Abroad (Instituto Colombiano de Crédito Educativo y Estudios Técnicos en el Exterior, ICETEX). The authors thank the Spanish Research Agency (AEI) for the financial support to RESPHIRA project (PID2019-106322RB- 100)/AEI/10.13039/501100011033. The contributors gratefully acknowledge funding for open access charge: CRUE-Universitat Politècnica de València / Dorado Guerra, DY. (2024). Modelización integrada con aprendizaje automático para evaluar la contaminación por nutrientes en las masas de agua actual y bajo el efecto del cambio climático. 
Aplicación a la Demarcación Hidrográfica del Júcar [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202898 / Compendio Contaminación del agua Contaminación difusa Contaminación puntual Hidrología Cambio climático XGBoost Climate change Hydrology eXtreme Gradient Boosting (XGBoost) Random forest (RF) Water pollution Diffuse pollution INGENIERIA HIDRAULICA
34	Predicting Location-Dependent Structural Dynamics Using Machine Learning Zink, Markus January 2022 Machining chatter is an undesirable phenomenon in material removal processes and is hard to control or avoid. Its occurrence and extent essentially depend on the kinematics of the machine tool, which change with the position of the Tool Centre Point. Chatter has been researched widely, but rarely with respect to changing structural dynamics during manufacturing. This thesis applies intelligent methods to learn the underlying functions of modal parameters – natural frequency, damping ratio, and mode shape – and, for the first time to this extent, defines the dynamic properties of a system. To do so, it comprises three steps: first, the elaboration of the necessary dynamic parameters; second, the acquisition of the data via a simulation; and third, the prediction of the modal parameters with two kinds of Machine Learning techniques: Gradient Boosting Machine and Multilayer Perceptron. In total, it investigates three types of kinematics: cross bed, gantry, and overhead gantry. It becomes apparent that Light Gradient Boosting Machine outperforms Multilayer Perceptron throughout all studies. It achieves a prediction error of at most 1.7 % for natural frequency and damping ratio for all kinematics. However, it cannot yet reliably predict the participation factor, which might be due to the complexity and size of the data. As expected, the error rises with noisy data and fewer measurement points, but to a tenable extent for both natural frequency and damping ratio. / Bearbetningsvibrationer är ett oönskat fenomen i materialborttagningsprocesser och är svåra att kontrollera eller undvika. Dess förekomst och omfattning beror i huvudsak på kinematiken, som förändras med positionen för verktygets centrumpunkt på verktygsmaskinen. 
Det har gjorts mycket forskning om bearbetningsvibrationer, men sällan om förändrad strukturell dynamik under tillverkningen. I denna avhandling tillämpas intelligenta metoder för att lära sig de underliggande funktionerna hos modalparametrar – egenfrekvens, dämpningsgrad och modalform – och definierar systemets dynamiska egenskaper för första gången i denna omfattning. För att göra detta omfattar den tre steg: för det första utarbetandet av de nödvändiga dynamiska parametrarna, för det andra insamling av data via en simulering och för det tredje förutsägelse av modalparametrarna med hjälp av två typer av tekniker för maskininlärning: Gradient Boosting Machine och Multilayer Perceptron. Sammanlagt undersöks tre typer av kinematik: crossbed, gantry och overhead gantry. Det framgår tydligt att Light Gradient Boosting Machine överträffar Multilayer Perceptron i alla studier. Den uppnår ett prediktionsfel på högst 1,7 % för egenfrekvens och dämpningsförhållande för alla kinematiker. Den kan dock ännu inte riktigt kontrollera förutsägelsen av deltagarfaktorn, vilket kan bero på datans komplexitet och datastorlek. Som väntat ökar felet med bullrig data och färre mätpunkter, men i en acceptabel omfattning för både naturfrekvens och dämpningsförhållande. machine learning artificial intelligence gradient boosting LightGBM multilayer perceptron prediction chatter vibration structural dynamics modal parameters machine tool tool centre point work envelope Mechanical Engineering Maskinteknik
35	Comparative Analysis of Machine Learning Algorithms for Cryptocurrency Price Prediction Kurtagic, Leila January 2024 (has links) As the cryptocurrency markets continuously grow, so does the need for reliable analytical tools for price prediction. This study conducted a comparative analysis of machine learning (ML) algorithms for cryptocurrency price prediction. Through a literature review, three common and reliable ML algorithms for cryptocurrency price prediction were identified: Long Short-Term Memory (LSTM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Utilizing the Bitcoin All Time History dataset from TradingView, the study assessed both the individual performance of each algorithm and the potential of ensemble methods to enhance predictive accuracy. The results reveal that the LSTM algorithm outperformed RF and XGBoost in terms of predictive accuracy according to the metrics Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Additionally, two ensemble approaches were tested: Ensemble 1, which enhanced the LSTM model with the combined predictions from RF and XGBoost, and Ensemble 2, which integrated predictions from all three models. Ensemble 2 demonstrated the highest predictive performance among all models, highlighting the advantages of using ensemble approaches for more robust predictions. Machine Learning Cryptocurrency Price Prediction LSTM (Long Short-Term Memory) Random Forest XGBoost (eXtreme Gradient Boosting) Ensemble Methods Feature Importance Financial Analytics Computer and Information Sciences Data- och informationsvetenskap
36	Maskininlärning för Prediktion av Fartygsvibrationer : En jämförelsestudie av Random forest, Gradient boosting och Neurala nätverk Tvinghagen, Fredrik, Queckfeldt, Jonathan January 2024 The aim of this project is to develop a predictive model for forecasting ship vibrations based on historical measurement data from cargo ships. The project focuses on using machine learning methods to predict the amplitude of vibrations and to identify the variables most relevant to the model's predictive ability. The methods investigated include random forest, gradient boosting, and neural networks. The results show that the random forest model performs best according to the performance measures mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The project contributes to a deeper understanding of the uses of machine learning for more sustainable shipping. The aim is to provide a basis for potentially reducing vibrations that affect the comfort and structural integrity of ships. machine learning regression gradient boosting neural network random forest Maskininlärning vibrationer fartyg fartygsvibrationer beräkningsvetenskap prediktion Vehicle Engineering Farkostteknik Computer Sciences Datavetenskap (datalogi) Computer Engineering Datorteknik
37	Quantitative Retrieval of Organic Soil Properties from Visible Near-Infrared Shortwave Infrared (Vis-NIR-SWIR) Spectroscopy Using Fractal-Based Feature Extraction. Liu, Lanfa, Buchroithner, Manfred, Ji, Min, Dong, Yunyun, Zhang, Rongchung 27 March 2017 Visible and near-infrared diffuse reflectance spectroscopy has been demonstrated to be a fast and cheap tool for estimating a large number of chemical and physical soil properties, and effective features extracted from spectra are crucial to correlating with these properties. We adopt a novel methodology for feature extraction of soil spectroscopy based on fractal geometry. The spectrum can be divided into multiple segments with different step–window pairs. For each segmented spectral curve, the fractal dimension value was calculated using variation estimators with power indices 0.5, 1.0 and 2.0. The fractal feature is then generated by multiplying the fractal dimension value by the spectral energy. To assess and compare the performance of the newly generated features, we took advantage of organic soil samples from the large-scale European Land Use/Land Cover Area Frame Survey (LUCAS). Gradient-boosting regression models were built with the XGBoost library on the soil spectral library to estimate N, pH and soil organic carbon (SOC) contents. Features generated by a variogram estimator performed better than the two other estimators and principal component analysis (PCA). The estimation results for SOC were coefficient of determination (R2) = 0.85, root mean square error (RMSE) = 56.7 g/kg, the ratio of percent deviation (RPD) = 2.59; for pH: R2 = 0.82, RMSE = 0.49 g/kg, RPD = 2.31; and for N: R2 = 0.77, RMSE = 3.01 g/kg, RPD = 2.09. Even better results could be achieved when fractal features were combined with PCA components. Fractal features generated by the proposed method can improve estimation accuracies of soil properties and simultaneously maintain the original spectral curve shape. 
Fraktale Dimension Merkmalsextraktion LUCAS Bodenspektroskopie TU Dresden Publikationsfonds fractal dimension feature extraction gradient-boosting regression model LUCAS soil spectroscopy TU Dresden Publishing Fund ddc:620 rvk:ZI 0001
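The abstract of entry 37 builds a fractal feature by multiplying a fractal dimension, obtained from variation estimators with power indices 0.5, 1.0 and 2.0, by the spectral energy of a segment. The sketch below illustrates that idea under stated assumptions and is not the authors' code: it uses the standard two-lag variation estimator for curves, and it assumes that "spectral energy" means the sum of squared reflectance values, which the abstract does not define. All function names are hypothetical.

```python
import math


def variation(curve, lag, p):
    """Empirical variation of order p at the given lag."""
    n = len(curve)
    total = sum(abs(curve[i + lag] - curve[i]) ** p for i in range(n - lag))
    return total / (2 * (n - lag))


def fractal_dimension(curve, p):
    """Two-lag variation estimator of the fractal dimension of a curve.

    p = 2 is the variogram, p = 1 the madogram, p = 0.5 the rodogram.
    The curve must not be constant at lag 1 or lag 2 (log of zero).
    """
    log_ratio = math.log(variation(curve, 2, p)) - math.log(variation(curve, 1, p))
    return 2 - log_ratio / (p * math.log(2))


def fractal_feature(segment, p):
    """Fractal dimension times spectral energy for one spectral segment."""
    # Assumption: spectral energy is the sum of squared reflectances.
    energy = sum(value * value for value in segment)
    return fractal_dimension(segment, p) * energy


# A perfectly smooth (linear) segment should have dimension close to 1.
segment = [0.1 * i for i in range(10)]
dimensions = {p: fractal_dimension(segment, p) for p in (0.5, 1.0, 2.0)}
feature = fractal_feature(segment, 2.0)
print(dimensions, feature)
```

Rougher segments give estimates closer to 2, so the feature grows with both the irregularity and the overall magnitude of the segment.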
38	Predicting inter-frequency measurements in an LTE network using supervised machine learning : a comparative study of learning algorithms and data processing techniques / Att prediktera inter-frekvensmätningar i ett LTE-nätverk med hjälp av övervakad maskininlärning Sonnert, Adrian January 2018 With increasing demands on network reliability and speed, network suppliers need to make their communications algorithms more efficient. Frequency measurements are a core part of mobile network communications; making them more effective would improve many network processes such as handovers, load balancing, and carrier aggregation. This study examines the possibility of using supervised learning to predict the signal of inter-frequency measurements by investigating various learning algorithms and pre-processing techniques. We found that random forests have the highest predictive performance on this data set, at 90.7% accuracy. In addition, we have shown that undersampling and varying the discriminator are effective techniques for increasing the performance on the positive class on frequencies where the negative class is prevalent. Finally, we present hybrid algorithms in which the learning algorithm for each model depends on attributes of the training data set. These algorithms are much more efficient in terms of memory and run-time without heavily sacrificing predictive performance. Telecommunications Telecom Mobile networks 4G LTE LTE-A Machine learning Random forest Gradient boosting Neural network Multi-layer perceptron Logistic regression Frequency measurements Handover Load balancing Carrier aggregation Computer and Information Sciences Data- och informationsvetenskap
39	Machine learning strategies for multi-step-ahead time series forecasting Ben Taieb, Souhaib 08 October 2014 How much electricity is going to be consumed for the next 24 hours? What will be the temperature for the next three days? What will be the number of sales of a certain product for the next few months? Answering these questions often requires forecasting several future observations from a given sequence of historical observations, called a time series.
Historically, time series forecasting has been mainly studied in econometrics and statistics. In the last two decades, machine learning, a field that is concerned with the development of algorithms that can automatically learn from data, has become one of the most active areas of predictive modeling research. This success is largely due to the superior performance of machine learning prediction algorithms in many different applications as diverse as natural language processing, speech recognition and spam detection. However, there has been very little research at the intersection of time series forecasting and machine learning.
The goal of this dissertation is to narrow this gap by addressing the problem of multi-step-ahead time series forecasting from the perspective of machine learning. To that end, we propose a series of forecasting strategies based on machine learning algorithms.
Multi-step-ahead forecasts can be produced recursively by iterating a one-step-ahead model, or directly using a specific model for each horizon. As a first contribution, we conduct an in-depth study to compare recursive and direct forecasts generated with different learning algorithms for different data generating processes. More precisely, we decompose the multi-step mean squared forecast errors into the bias and variance components, and analyze their behavior over the forecast horizon for different time series lengths. 
The results and observations made in this study then guide the development of new forecasting strategies.
In particular, we find that choosing between recursive and direct forecasts is not an easy task since it involves a trade-off between bias and estimation variance that depends on many interacting factors, including the learning model, the underlying data generating process, the time series length and the forecast horizon. As a second contribution, we develop multi-stage forecasting strategies that do not treat the recursive and direct strategies as competitors, but seek to combine their best properties. More precisely, the multi-stage strategies generate recursive linear forecasts, and then adjust these forecasts by modeling the multi-step forecast residuals with direct nonlinear models at each horizon, called rectification models. We propose a first multi-stage strategy, which we call the rectify strategy, that estimates the rectification models using the nearest neighbors model. However, because recursive linear forecasts often need only small adjustments with real-world time series, we also consider a second multi-stage strategy, called the boost strategy, that estimates the rectification models using gradient boosting algorithms that use so-called weak learners.
Generating multi-step forecasts using a different model at each horizon provides great modeling flexibility. However, selecting these models independently can lead to irregularities in the forecasts that can increase the forecast variance. The problem is exacerbated with nonlinear machine learning models estimated from short time series. To address this issue, and as a third contribution, we introduce and analyze multi-horizon forecasting strategies that exploit the information contained in other horizons when learning the model for each horizon. 
In particular, to select the lag order and the hyperparameters of each model, multi-horizon strategies minimize forecast errors over multiple horizons rather than just the horizon of interest.
We compare all the proposed strategies with both the recursive and direct strategies. We first apply a bias and variance study, then we evaluate the different strategies using real-world time series from two past forecasting competitions. For the rectify strategy, in addition to avoiding the choice between recursive and direct forecasts, the results demonstrate that it performs better than, or at least close to, the best of the recursive and direct forecasts in different settings. For the multi-horizon strategies, the results emphasize the decrease in variance compared to single-horizon strategies, especially with linear or weakly nonlinear data generating processes. Overall, we found that the accuracy of multi-step-ahead forecasts based on machine learning algorithms can be significantly improved if an appropriate forecasting strategy is used to select the model parameters and to generate the forecasts.
Lastly, as a fourth contribution, we have participated in the Load Forecasting track of the Global Energy Forecasting Competition 2012. The competition involved a hierarchical load forecasting problem where we were required to backcast and forecast hourly loads for a US utility with twenty geographical zones. 
Our team, TinTin, ranked fifth out of 105 participating teams, and we have been awarded an IEEE Power & Energy Society award. / Doctorat en sciences, Spécialisation Informatique / info:eu-repo/semantics/nonPublished Informatique générale Sciences exactes et naturelles Machine learning Time-series analysis -- Data processing Apprentissage automatique Série chronologique -- Informatique forecasting competitions load forecasting nearest neighbors neural networks gradient boosting direct forecasts forecasting strategies recursive forecasts machine learning time series forecasting
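The abstract of entry 39 contrasts recursive forecasts, which iterate a one-step-ahead model, with direct forecasts, which fit a separate model for each horizon. The following is a minimal sketch of that distinction, not the dissertation's code. It assumes a simple linear model on the single most recent value fitted by least squares; the function names and the toy series are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recursive_forecast(series, horizon):
    """Fit one one-step-ahead model and feed its forecasts back in."""
    slope, intercept = fit_line(series[:-1], series[1:])
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = slope * last + intercept
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """Fit a separate model for each horizon h, mapping y[t] to y[t + h]."""
    forecasts = []
    for h in range(1, horizon + 1):
        slope, intercept = fit_line(series[:-h], series[h:])
        forecasts.append(slope * series[-1] + intercept)
    return forecasts


# Toy series generated by y[t + 1] = 0.5 * y[t] + 1, so both strategies agree.
series = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25, 2.125]
recursive = recursive_forecast(series, horizon=3)
direct = direct_forecast(series, horizon=3)
print(recursive, direct)
```

On this noise-free linear series the two strategies coincide. With noisy or nonlinear data they generally differ, since recursive forecasts propagate the errors of the one-step model while each direct model is fitted from fewer training pairs, which is the bias and variance trade-off the abstract describes.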
40	Gradient Boosting Machine and Artificial Neural Networks in R and H2O Sabo, Juraj January 2016 Artificial neural networks are fascinating machine learning algorithms. They used to be considered unreliable and computationally very expensive. Now it is known that modern neural networks can be quite useful, but unfortunately they remain computationally expensive. Statistical boosting is considered to be one of the most important machine learning ideas. It is based on an ensemble of weak models that together create a powerful learning system. The goal of this thesis is to compare these machine learning models on three use cases. The first use case deals with modeling the probability of burglary in the city of Chicago. The second use case is the typical example of customer churn prediction in the telecommunication industry, and the last use case is related to the problem of computer vision. The second goal of this thesis is to introduce an open-source machine learning platform called H2O. It includes, among other things, an interface for R, and it is designed to run in standalone mode or on Hadoop. The thesis also includes an introduction to Apache Hadoop, an open-source software library that allows for distributed processing of big data, and specifically to its open-source distribution Hortonworks Data Platform.
