Global ETD Search

1	Strojové učení v algoritmickém obchodování / Machine Learning in Algorithmic Trading Bureš, Michal January 2021 (has links) This thesis is dedicated to the application of machine learning methods to algorithmic trading. We take inspiration from intraday traders and implement a system that predicts future prices based on candlestick patterns and technical indicators. Using forex and US stocks tick data, we create multiple aggregated bar representations. From these bars we construct original features based on candlestick pattern clustering by K-Means and long-term features derived from standard technical indicators. We then set up regression and classification tasks for Extreme Gradient Boosting models. From their predictions we extract buy and sell trading signals. We perform experiments with eight different configurations over multiple assets and trading strategies using walk-forward validation. The results report Sharpe ratios and mean profits of all the combinations. We discuss the results and recommend suitable configurations. Overall, our strategies outperform randomly selected strategies. Furthermore, we provide and discuss multiple opportunities for further research.
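The abstract above names walk-forward validation and Sharpe ratios but gives no details. The following is a minimal sketch of both ideas, not the thesis's implementation: the function names, the rolling-window scheme, and the window sizes are assumptions made for illustration, and the Sharpe ratio is computed per period without annualization.

```python
from statistics import mean, stdev


def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) for rolling walk-forward validation.

    Hypothetical helper: each test block follows its training window in time,
    and the window then slides forward by one test block.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


# Ten bars, train on four, test on the next two, then slide forward.
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
```

Because every test block lies strictly after its training window, a model evaluated this way never sees future bars during training.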
2	Sentimentanalys av svenskt aktieforum för att förutspå aktierörelse / Sentiment analysis of Swedish stock trading forum for predicting stock market movement Ouadria, Michel Sebastian, Ciobanu, Ann-Stephanie January 2020 (has links) Förevarande studie undersöker möjligheten att förutsäga aktierörelse på en dagligbasis med sentimentanalys av inlägg från ett svenskt aktieforum. Sentimentanalys används för att finna subjektivitet i form av känslor (sentiment) ur text. Textdata extraherades från ett svenskt aktieforum för att förutsäga aktierörelsen för den relaterade aktien. All data aggregerades inom en bestämd tidsperiod på två år. Undersökningen utnyttjade maskininlärning för att träna tre maskininlärningsmodeller med textdata och aktiedata. Resultatet påvisade ingen tydlig korrelation mellan sentiment och aktierörelse. Vidare uppnåddes inte samma resultat som tidigare arbeten inom området. Den högst uppnådda noggrannheten med modellerna beräknades till 64%. / The present study examines the possibility of predicting stock movement on a daily basis with sentiment analysis of posts in a Swedish stock trading forum. Sentiment analysis is used to find subjectivity in the form of emotions (sentiment) in text. Text data was extracted from a stock forum to predict the share movement of the related share. All data was aggregated within a fixed period of two years. The analysis used machine learning to train three machine learning models with text data and stock data. The result showed no clear correlation between sentiment and stock movement. Furthermore, the study did not reach the same results as previous work in the field. The highest accuracy achieved with the models was calculated at 64%. Sentiment analysis Stock market Machine Learning Support Vector Machine Naive Bayes Extreme Gradient Boosting Sentimentanalys Aktiemarknad Maskininlärning Stödvektormaskin Naive Bayes Extreme Gradient Boosting Computer and Information Sciences Data- och informationsvetenskap
3	Modeling Melodic Accents in Jazz Solos / Modellering av melodiska accenter i jazzsolon Berrios Salas, Misael January 2023 (has links) This thesis looks at how accurately one can model accents in jazz solos, more specifically the sound level. Further understanding the structure of jazz solos can give a way of pedagogically presenting differences within music styles and even between performers. Some studies have tried to model perceived accents in different music styles. In other words, model how listeners perceive some tones as somehow accentuated and more important than others. Other studies have looked at how the sound level correlates to other attributes of the tone. But to our knowledge, no other studies have been made modeling actual accents within jazz solos, nor have other studies had such a big amount of training data. The training data used is a set of 456 solos from the Weimar Jazz Database. This is a database containing tone data and metadata from monophonic solos performed with multiple instruments. The features used for the training algorithms are features obtained from the software Director Musices created at the Royal Institute of Technology in Sweden; features obtained from the software "melfeature" created at the University of Music Franz Liszt Weimar in Germany; and features built upon tone data or solo metadata from the Weimar Jazz Database. A comparison between these is made. Three learning algorithms are used, Multiple Linear Regression (MLR), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). The first two are simpler regression models while the last is an award-winning tree boosting algorithm. The tests resulted in eXtreme Gradient Boosting (XGBoost) having the highest accuracy when combining all the available features minus some features that were removed since they did not improve the accuracy. The accuracy was around 27% with a high standard deviation. 
This indicates considerable variation between solos: some had an accuracy of about 67%, while for others not a single tone was predicted correctly in the entire solo. As a general model, however, the accuracy is too low for practical use. Either the methods were not the optimal ones or jazz solos differ too much to find a general pattern. / Detta examensarbete undersöker hur väl man kan modellera accenter i jazz-solos, mer specifikt ljudnivån. En bredare förståelse för strukturen i jazzsolos kan ge ett sätt att pedagogiskt presentera skillnaderna mellan olika musikstilar och även mellan olika artister. Andra studier har försökt modellera uppfattade accenter inom olika musik-stilar. Det vill säga, modellera hur åhörare upplever vissa toner som accentuerade och viktigare än andra. Andra studier har undersökt hur ljudnivån är korrelerad till andra attribut hos tonen. Men såvitt vi vet, så finns det inga andra studier som modellerar faktiska accenter inom jazzsolos, eller som haft samma stora mängd träningsdata. Träningsdatan som använts är ett set av 456 solos tagna från Weimar Jazz Database. Databasen innehåller data på toner och metadata från monofoniska solos genomförda med olika instrument. Särdragen som använts för tränings-algoritmerna är särdrag erhållna från mjukvaran Director Musices skapad på Kungliga Tekniska Högskolan i Sverige; särdrag erhållna från mjukvaran ”melfeature” skapad på University of Music Franz Liszt Weimar i Tyskland; och särdrag skapade utifrån datat i Weimar Jazz Database. En jämförelse mellan dessa har också gjorts. Tre inlärningsalgoritmer har använts, Multiple Linear Regression (MLR), Support Vector Regression (SVR), och eXtreme Gradient Boosting (XGBoost). De första två är enklare regressionsalgoritmer, medan den senare är en prisbelönt trädförstärkningsalgoritm. 
Testen resulterade i att eXtreme Gradient Boosting (XGBoost) skapade en modell med högst noggrannhet givet alla tillgängliga särdrag som träningsdata minus vissa särdrag som tagits bort då de inte förbättrar noggrannheten. Den erhållna noggrannheten låg på runt 27% med en hög standardavvikelse. Detta pekar på att det finns stora skillnader mellan att förutsäga ljudnivån mellan de olika solin. Vissa solin gav en noggrannhet på runt 67% medan andra erhöll inte en endaste ljudnivå korrekt i hela solot. Men som en generell modell är noggrannheten för låg för att användas i praktiken. Antingen är de valda metoderna inte de bästa, eller så är jazzsolin för olika för att hitta ett generellt mönster som går att förutsäga. Accents Jazz Solo Support Vector Regression (SVR) eXtreme Gradient Boosting (XGBoost) Multiple Linear Regression (MLR) Dynamic Accenter Jazz Solos Support Vector Regression (SVR) eXtreme Gradient Boosting (XGBoost) Multiple Linear Regression (MLR) Dynamisk Computer and Information Sciences Data- och informationsvetenskap
4	Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods Al-Mter, Yusur January 2020 (has links) Heart rate variability (HRV) is the time variation between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and parasympathetic nervous systems. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis focuses on ECG recordings during nocturnal rest to estimate the influence of HRV in predicting the age decade of healthy individuals. Time and frequency domains, as well as non-linear methods, are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy achieved in capturing the actual class was relatively low (lower than 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, the random forest had the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled and used as a test set, and the performance was compared with that of other related research. Supervised learning Classification Ensemble Support Vector Machines Heart Rate Variability Extreme Gradient Boosting Random Forest Computer and Information Sciences Data- och informationsvetenskap
5	Strategies for Combining Tree-Based Ensemble Models Zhang, Yi 01 January 2017 (has links) Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal is to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best performing tree-based ensemble methods – random forest, extremely randomized tree, and eXtreme gradient boosting model – were used to generate a set of base models. Outputs from classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public domain data sets which have been extensively used for benchmarking classification models. The research established that applying random forest as the final ensemble method to integrate selected base models and factor scores of multiple correspondence analysis turned out to be the best ensemble approach. ensemble models model selection multiple correspondence analysis predictive models random forest extremely randomized tree and eXtreme gradient boosting model tree based ensemble model Computer Sciences
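The record above describes two steps: selecting diverse base models and combining their outputs. As a rough illustration only, the sketch below measures diversity as the pairwise disagreement rate and combines predictions by simple majority vote. Both choices are assumptions swapped in for brevity; the dissertation itself reports random forest with multiple correspondence analysis factor scores as its best combiner, which is not reproduced here. The prediction lists are made-up example data.

```python
from collections import Counter
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of examples on which two base models predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def majority_vote(prediction_lists):
    """Combine base-model predictions example by example with a simple majority vote."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*prediction_lists)]


# Made-up class predictions from three hypothetical base models on four examples.
base_predictions = {
    "random_forest": [1, 0, 1, 1],
    "extra_trees": [1, 1, 1, 0],
    "xgboost": [0, 0, 1, 1],
}

# Pairwise disagreement as a crude diversity measure between base models.
diversity = {
    (a, b): disagreement(base_predictions[a], base_predictions[b])
    for a, b in combinations(base_predictions, 2)
}
combined = majority_vote(list(base_predictions.values()))
```

A higher disagreement rate between two models suggests they err in different regions of the feature space, which is the property the abstract says makes ensembles effective.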
6	Forecasting anomalies in time series data from online production environments Sseguya, Raymond January 2020 (has links) Anomaly detection on time series forecasts can be used by many industries, especially in forewarning systems that can predict anomalies before they happen. Infor (Sweden) AB is a software company that provides Enterprise Resource Planning cloud solutions. Infor is interested in predicting anomalies in their data, and that is the motivation for this thesis work. The general idea is first to forecast the time series and then to detect and classify anomalies in the forecast. The first part is time series forecasting and the second part is anomaly detection and classification done on the forecasted values. In this thesis work, the time series forecasting to predict anomalous behaviour is done using two strategies, namely the recursive strategy and the direct strategy. The recursive strategy includes two methods: AutoRegressive Integrated Moving Average and Neural Network AutoRegression. The direct strategy is done with ForecastML-eXtreme Gradient Boosting. The three methods are then compared on forecasting performance. The anomaly detection and classification is done by setting a decision rule based on a threshold. In this thesis work, since the true anomaly thresholds were not previously known, an arbitrary initial anomaly threshold is set by using a combination of statistical methods for outlier detection and then human judgement by the company commissioners. These statistical methods include Seasonal and Trend decomposition using Loess + InterQuartile Range, Twitter + InterQuartile Range and Twitter + GESD (Generalized Extreme Studentized Deviate). After defining what an anomaly threshold is in the usage context of Infor (Sweden) AB, a decision rule is set and used to classify anomalies in time series forecasts. 
The results from comparing the classifications of the forecasts from the three time series forecasting methods are unsatisfactory, and no recommendation is made concerning which model or algorithm Infor (Sweden) AB should use. However, the thesis work concludes by recommending other methods that can be tried in future research. Infor (Sweden) AB time series forecasting anomaly detection ARIMA neural network autoregression eXtreme Gradient Boosting package Computer and Information Sciences Data- och informationsvetenskap Probability Theory and Statistics Sannolikhetsteori och statistik
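The threshold-based decision rule described in the record above can be sketched as follows. This is a simplified illustration, not the thesis's procedure: it applies a plain interquartile-range rule directly to raw historical values, without the seasonal-trend decomposition or the human review the abstract mentions. The function names, the multiplier `k=1.5`, and the sample numbers are assumptions.

```python
from statistics import quantiles


def iqr_fences(history, k=1.5):
    """Return (lower, upper) anomaly thresholds from historical values via the IQR rule."""
    q1, _, q3 = quantiles(history, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def classify_forecast(forecast, lower, upper):
    """Decision rule: a forecasted value is anomalous if it falls outside the fences."""
    return [value < lower or value > upper for value in forecast]


# Made-up historical observations and a three-step forecast.
history = [10, 12, 11, 13, 12, 11, 10, 12]
lower, upper = iqr_fences(history)
flags = classify_forecast([11, 30, 12], lower, upper)
```

Applying the rule to forecasted rather than observed values is what turns outlier detection into a forewarning system, at the cost of inheriting any forecasting error.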
7	Prediction of Credit Risk using Machine Learning Models Isaac, Philip January 2022 (has links) This thesis aims to investigate different machine learning (ML) models and their performance to find the best performing model to predict credit risk at a specific company. Since granting credit to corporate customers is a part of this company's core business, managing the credit risk is of high importance. The company currently has only one credit risk measurement, which is obtained through an external company, and the goal is to find a model that outperforms this measurement. The study covers two ML models, Logistic Regression (LR) and eXtreme Gradient Boosting. This thesis shows that both methods perform better than the external risk measurement and that the LR method achieves the overall best performance. One of the most important analyses done in this thesis was handling the dataset and finding the best-suited combination of features that the ML models should use. Credit Risk Credit Risk Scorecard Machine Learning Artificial Intelligence AI Logistic Regression eXtreme Gradient Boosting ROC-AUC Binning Cross-Validation Correlation Computer Sciences Datavetenskap (datalogi)
8	Customer acquisition and onboarding at an online grocery company Borg, Ida January 2022 (has links) This master's thesis is carried out in collaboration with a Swedish online grocery company. The goal of the thesis is to investigate whether it is possible to explain the underlying factors that influence whether new customers are retained. Because of the difficulties of defining churn and retention in non-contractual settings, most of the literature is focused on contractual and subscription settings. Few studies try to predict customer churn in non-contractual businesses, and even fewer emphasize retention. This thesis aims to contribute to the field of retention in non-contractual business and also to highlight the assumptions and drawbacks of churn-related tasks. To achieve the goal of the thesis, a literature review is carried out together with two statistical learning approaches: a logistic regression model and an extreme gradient boosting model. The results show that it is possible to find the underlying factors that drive customers to be retained. The greatest drivers that could increase the probability of retaining new customers are the days between the first and second order, the second order value, and the total order value. / Examensarbetet är genomfört som ett samarbete med ett svenskt matvaruföretag på nätet. Målet med examensarbetet är att undersöka om det är möjligt att förklara de bakomliggande faktorer som påverkar nya kunder att stanna kvar som kunder. På grund av svårigheterna med att definiera kundbortfall och bibehållande av kunder i icke-kontraktuella affärer fokuserar den mesta av litteraturen på avtals- och prenumerationsmiljöer. Det finns ett begränsat antal studier där man försöker förutsäga kundbortfall i icke-kontraktuella verksamheter och ännu färre studier som fokuserar på bibehållande av kunder. 
Denna uppsats syftar till att bidra till området bibehållande av kunder i icke-kontraktuella affärer och även belysa antagandena och nackdelarna med analyser inom kundbortfall. För att uppnå målet med avhandlingen genomförs en litteraturgenomgång tillsammans med två statistiska lärandemetoder; logistisk regressionsmodell och extreme gradient boosting model. Resultaten visar att det är fullt möjligt att hitta de bakomliggande faktorerna som driver kunderna att stanna kvar. De största drivkrafterna som kan öka sannolikheten för att kunder ska bibehållas är dagarna mellan första och andra ordern, andra ordervärdet och det totala ordervärdet. retention churn customer acquisition customer onboarding logistic regression extreme gradient boosting model bibehållande av kunder kundbortfall kundförvärv kundonboarding logistisk regression extreme gradient boosting model Mathematics Matematik
9	Modelización integrada con aprendizaje automático para evaluar la contaminación por nutrientes en las masas de agua actual y bajo el efecto del cambio climático. Aplicación a la Demarcación Hidrográfica del Júcar Dorado Guerra, Diana Yaritza 26 February 2024 (has links) Tesis por compendio / [ES] La contaminación del agua representa un desafío ambiental crítico a nivel global y en la Unión Europea (UE), particularmente en la región mediterránea de España. El crecimiento poblacional, la demanda creciente de alimentos y combustibles, junto con el cambio climático, intensifican la contaminación por nutrientes en los cuerpos de agua. Esta contaminación amenaza la calidad del agua y los ecosistemas acuáticos, así como la salud humana. La complejidad de las vías de transporte de nutrientes hace que su monitoreo y mitigación sean complicados. Se requieren modelos integrales que vinculen procesos y relaciones de causa y efecto para controlar eficazmente la contaminación. En la región mediterránea, como la Demarcación Hidrográfica del Júcar (DHJ), la interacción entre agua superficial y subterránea es clave, pero los modelos tradicionales presentan limitaciones. Esta tesis aborda estos desafíos al caracterizar la contribución de nutrientes a las masas de agua superficiales de la DHJ, evaluar medidas de reducción de la contaminación, considerando el cambio climático a largo plazo y aplicar técnicas de aprendizaje supervisado para predecir la concentración de nitratos. El acoplamiento de modelos hidrológicos y de calidad del agua, junto con el aprendizaje automático, ofrece una comprensión profunda y valiosa de los factores detrás de la contaminación por nutrientes y proporciona una base sólida para la toma de decisiones y la gestión sostenible del agua en la DHJ y regiones similares. Esta tesis fue estructurada como un compendio de tres artículos que abarcan estos desafíos. 
El primer artículo profundiza en la compleja interacción entre las aguas superficiales y las subterráneas en las cuencas de la DHJ, centrándose en la dinámica de la contaminación por nitratos. Los resultados muestran una correlación directa entre las concentraciones de nitratos en ríos y acuíferos a lo largo del eje principal de los ríos Júcar y Turia, lo cual destaca el papel fundamental de las aportaciones de agua subterránea en la contribución a los niveles de nitratos de los ríos. Además, el estudio identifica regiones aguas abajo con actividades agrícolas y urbanas intensificadas como focos de contaminación por nitratos. El segundo artículo aborda la vulnerabilidad de la calidad de las aguas superficiales al cambio climático y escenarios de reducción de la contaminación difusa y puntual en las cuencas de la DHJ a largo plazo. Los resultados indican que, en los escenarios de cambio climático, se espera que aumenten significativamente las masas de agua con un mal estado de amonio, fósforo y DBO5, y en menor proporción las masas en mal estado de nitratos. En concreto, las concentraciones medias de amonio y fósforo podrían duplicarse durante los meses de bajo caudal. Para mantener la calidad actual del agua, se requieren reducciones sustanciales de al menos el 25% de la contaminación difusa por nitratos y del 50% de las cargas puntuales de amonio, fósforo y DBO5. El tercer artículo presenta un enfoque innovador para simular la concentración de nitratos en masas de agua superficiales mediante modelos de aprendizaje automático. Aprovechando los métodos de selección de características y los algoritmos random forest (RF) y eXtreme Gradient Boosting (XGBoost), el estudio logró una gran precisión en la predicción de la concentración de nitratos. 
Estos modelos analizaron 19 variables de entrada, que abarcan factores ecológicos, hidrológicos y ambientales, junto con datos de concentración de nitratos procedentes de estaciones de aforo de la calidad de las aguas superficiales. En particular, la investigación destacó que la localización desempeña un papel dominante, explicando el 87% de la variabilidad de los nitratos en relación con la concentración de nitrógeno y fósforo. Esta investigación destacó el potencial del aprendizaje automático en la predicción de la calidad del agua y la evaluación de riesgos. / [CA] La contaminació de l'aigua representa un desafiament ambiental crític a nivell global i a la Unió Europea (UE), particularment a la regió mediterrània d'Espanya. El creixement poblacional, la demanda creixent d'aliments i combustibles, juntament amb el canvi climàtic, intensifiquen la contaminació per nutrients en els cossos d'aigua. Aquesta contaminació amenaça la qualitat de l'aigua i els ecosistemes aquàtics, així com la salut humana. La complexitat de les vies de transport de nutrients fa que el seu monitoratge i mitigació siguin complicats. Es requereixen models integrals que vinculin processos i relacions de causa i efecte per a controlar eficaçment la contaminació. A la regió mediterrània, com la Demarcació Hidrogràfica del Xúquer (DHJ), la interacció entre aigua superficial i subterrània és clau, però els models tradicionals presenten limitacions. Aquesta tesi aborda aquests desafiaments en caracteritzar la contribució de nutrients a les masses d'aigua superficials de la DHJ, avaluar mesures de reducció de la contaminació, considerant el canvi climàtic a llarg termini i aplicar tècniques d'aprenentatge supervisat per a predir la concentració de nitrats. 
L'acoblament de models hidrològics i de qualitat de l'aigua, juntament amb l'aprenentatge automàtic, ofereix una comprensió profunda i valuosa dels factors darrere de la contaminació per nutrients i proporciona una base sòlida per a la presa de decisions i la gestió sostenible de l'aigua en la DHJ i regions similars. Aquesta tesi va ser estructurada com un compendi de tres articles que abasten aquests desafiaments. El primer article aprofundeix en la complexa interacció entre les aigües superficials i les subterrànies en les conques de la DHJ, centrant-se en la dinàmica de la contaminació per nitrats. Els resultats mostren una correlació directa entre les concentracions de nitrats en rius i aqüífers al llarg de l'eix principal dels rius Xúquer i Túria, la qual cosa destaca el paper fonamental de les aportacions d'aigua subterrània en la contribució als nivells de nitrats dels rius. A més, l'estudi identifica regions aigües avall amb activitats agrícoles i urbanes intensificades com a focus de contaminació per nitrats. El segon article aborda la vulnerabilitat de la qualitat de les aigües superficials al canvi climàtic i escenaris de reducció de la contaminació difusa i puntual en les conques de la DHJ a llarg termini. Els resultats indiquen que, en els escenaris de canvi climàtic, s'espera que augmentin significativament les masses d'aigua amb un mal estat d'amoni, fòsfor i DBO5, i en menor proporció les masses en mal estat de nitrats. En concret, les concentracions mitjanes d'amoni i fòsfor podrien duplicar-se durant els mesos de baix cabal. Per a mantenir la qualitat actual de l'aigua, es requereixen reduccions substancials d'almenys el 25% de la contaminació difusa per nitrats i del 50% de les càrregues puntuals d'amoni, fòsfor i DBO5. El tercer article presenta un enfocament innovador per a simular la concentració de nitrats en masses d'aigua superficials mitjançant models d'aprenentatge automàtic. 
Aprofitant els mètodes de selecció de característiques i els algorismes random forest (RF) i eXtreme Gradient Boosting (XGBoost), l'estudi va aconseguir una gran precisió en la predicció de la concentració de nitrats. Aquests models van analitzar 19 variables d'entrada, que abasten factors ecològics, hidrològics i ambientals, juntament amb dades de concentració de nitrats procedents d'estacions d'aforament de la qualitat de les aigües superficials. En particular, la recerca va destacar que la localització exerceix un paper dominant, explicant el 87% de la variabilitat dels nitrats en relació amb la concentració de nitrogen i fòsfor. Aquesta recerca va destacar el potencial de l'aprenentatge automàtic en la predicció de la qualitat de l'aigua i l'avaluació de riscos. / [EN] Water pollution poses a critical environmental challenge globally and in the European Union (EU), particularly in the Mediterranean region of Spain. Population growth, increasing demand for food and fuels, coupled with climate change, intensify nutrient pollution in water bodies. This pollution threatens water quality, aquatic ecosystems, and human health. The complexity of nutrient transport pathways makes monitoring and mitigation challenging. Comprehensive models that link processes and cause-and-effect relationships are required to effectively control pollution. In the Mediterranean region, such as the Júcar River Basin District (RBD), the interaction between surface and groundwater is crucial, but traditional models have limitations. This thesis addresses these challenges by characterising the contribution of nutrients to surface waters in the Júcar RBD, evaluating pollution reduction measures considering long-term climate change, and applying supervised learning techniques to predict nitrate concentrations. 
The coupling of hydrological and water quality models, along with machine learning, provides a deep and valuable understanding of the factors behind nutrient pollution and establishes a solid foundation for decision-making and sustainable water management in the Júcar RBD and similar regions. This thesis is structured as a compendium of three articles that encompass these challenges. The first article delves into the complex interaction between surface and groundwater in the Júcar RBD basins, focusing on nitrate pollution dynamics. The results reveal a direct linear correlation between nitrate concentrations in rivers and aquifers along the main axes of the Júcar and Turia rivers, highlighting the fundamental role of groundwater contributions to river nitrate levels. Additionally, the study identifies downstream regions with intensified agricultural and urban activities as nitrate pollution hotspots. This research not only identifies pollution sources but also offers a means to predict nitrate concentrations and assess the effectiveness of pollution prevention measures. The second article addresses the vulnerability of surface water quality to climate change and long-term diffuse and point source pollution reduction scenarios in the Júcar RBD basins. In a region where nutrient concentrations are of particular concern, the study investigates how changing climatic conditions, including rising temperatures and altered precipitation patterns, affect nitrate, ammonium, phosphorus, and biochemical oxygen demand (BOD5) levels. The results indicate that under climate change scenarios, significantly more water bodies are expected to be in poor condition for ammonium, phosphorus, and BOD5, and to a lesser extent, nitrate. Specifically, average concentrations of ammonium and phosphorus could double during low-flow months. 
To maintain current water quality, substantial reductions of at least 25% in diffuse nitrate pollution and 50% in point source loads of ammonium, phosphorus, and BOD5 are required. This research underscores the importance of water quality management strategies. The third article introduces an innovative approach to simulate nitrate concentrations in surface water bodies using machine learning models. Leveraging feature selection methods and artificial intelligence algorithms, including random forest (RF) and eXtreme Gradient Boosting (XGBoost), the study achieved high precision in predicting nitrate concentrations. These models analysed 19 input variables spanning ecological, hydrological, and environmental factors, along with nitrate concentration data from surface water quality gauging stations. In particular, the research highlighted the dominant role of location, explaining 87% of nitrate variability in relation to nitrogen and phosphorus concentration. This research showcased the potential of machine learning in water quality prediction and risk assessment. / We appreciate the help provided by the Júcar River Basin District Authority (CHJ), who gathered field data. The first author’s research was partially funded by a PhD scholarship from the food research stream of the programme “Colombia Científica—Pasaporte a la Ciencia”, granted by the Colombian Institute for Educational Technical Studies Abroad (Instituto Colombiano de Crédito Educativo y Estudios Técnicos en el Exterior, ICETEX). The authors thank the Spanish Research Agency (AEI) for the financial support to RESPHIRA project (PID2019-106322RB- 100)/AEI/10.13039/501100011033. The contributors gratefully acknowledge funding for open access charge: CRUE-Universitat Politècnica de València / Dorado Guerra, DY. (2024). Modelización integrada con aprendizaje automático para evaluar la contaminación por nutrientes en las masas de agua actual y bajo el efecto del cambio climático. 
Aplicación a la Demarcación Hidrográfica del Júcar [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202898 / Compendio Contaminación del agua Contaminación difusa Contaminación puntual Hidrología Cambio climático XGBoost Climate change Hydrology eXtreme Gradient Boosting (XGBoost) Random forest (RF) Water pollution Diffuse pollution INGENIERIA HIDRAULICA
10	Comparative Analysis of Machine Learning Algorithms for Cryptocurrency Price Prediction Kurtagic, Leila January 2024 (has links) As the cryptocurrency markets continuously grow, so does the need for reliable analytical tools for price prediction. This study conducted a comparative analysis of machine learning (ML) algorithms for cryptocurrency price prediction. Through a literature review, three common and reliable ML algorithms for cryptocurrency price prediction were identified: Long Short-Term Memory (LSTM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Utilizing the Bitcoin All Time History dataset from TradingView, the study assessed both the individual performance of each algorithm and the potential of ensemble methods to enhance predictive accuracy. The results reveal that the LSTM algorithm outperformed RF and XGBoost in terms of predictive accuracy according to the metrics Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Additionally, two ensemble approaches were tested: Ensemble 1, which enhanced the LSTM model with the combined predictions from RF and XGBoost, and Ensemble 2, which integrated predictions from all three models. Ensemble 2 demonstrated the highest predictive performance among all models, highlighting the advantages of using ensemble approaches for more robust predictions. Machine Learning Cryptocurrency Price Prediction LSTM (Long Short-Term Memory) Random Forest XGBoost (eXtreme Gradient Boosting) Ensemble Methods Feature Importance Financial Analytics Computer and Information Sciences Data- och informationsvetenskap
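The three error metrics named in the record above, together with one possible way of integrating predictions from several models, can be sketched as follows. The abstract does not say how the ensembles combine their inputs, so the unweighted average used here is purely an assumption, and the price values are made up for illustration.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error, in the same unit as the target."""
    return sqrt(mse(actual, predicted))


def average_ensemble(*prediction_lists):
    """Hypothetical ensemble: unweighted mean of the models' predictions per time step."""
    return [sum(column) / len(column) for column in zip(*prediction_lists)]


# Made-up closing prices and per-model predictions.
actual = [100.0, 110.0, 120.0]
lstm = [102.0, 108.0, 121.0]
rf = [95.0, 115.0, 118.0]
xgb = [103.0, 111.0, 125.0]

ensemble = average_ensemble(lstm, rf, xgb)
scores = {
    name: rmse(actual, preds)
    for name, preds in [("lstm", lstm), ("rf", rf), ("xgb", xgb), ("ensemble", ensemble)]
}
```

RMSE penalizes large misses more heavily than MAE, so ranking models by both can expose a model that is usually close but occasionally far off.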
