Global ETD Search

31	Stock Price Movement Prediction Using Sentiment Analysis and Machine Learning Wang, Jenny Zheng 01 June 2021 (has links) (PDF) Stock price prediction is of strong interest but a challenging task to both researchers and investors. Recently, sentiment analysis and machine learning have been adopted in stock price movement prediction. In particular, retail investors’ sentiment from online forums has shown their power to influence the stock market. In this paper, a novel system was built to predict stock price movement for the following trading day. The system includes a web scraper, an enhanced sentiment analyzer, a machine learning engine, an evaluation module, and a recommendation module. The system can automatically select the best prediction model from four state-of-the-art machine learning models (Long Short-Term Memory, Support Vector Machine, Random Forest, and Extreme Boost Gradient Tree) based on the acquired data and the models’ performance. Moreover, stock market lexicons were created using large-scale text mining on the Yahoo Finance Conversation boards and natural language processing. Experiments using the top 30 stocks on the Yahoo users’ watchlists and a randomly selected stock from NASDAQ were performed to examine the system performance and proposed methods. The experimental results show that incorporating sentiment analysis can improve the prediction for stocks with a large daily discussion volume. Long Short-Term Memory model outperformed other machine learning models when using both price and sentiment analysis as inputs. In addition, the Extreme Boost Gradient Tree (XGBoost) model achieved the highest accuracy using the price-only feature on low-volume stocks. Last but not least, the models using the enhanced sentiment analyzer outperformed the VADER sentiment analyzer by 1.96%. Stock Price Movement Prediction Machine Learning NLP sentiment analysis LSTM SVM RF XGBoost Computer and Systems Architecture
32	Physics-guided Machine Learning Approaches for Applications in Geothermal Energy Prediction Shahdi, Arya 03 June 2021 (has links) In the area of geothermal energy mapping, scientists have used physics-based models and bottom-hole temperature measurements from oil and gas wells to generate heat flow and temperature-at-depth maps. Given the uncertainties and simplifying assumptions associated with the current state of physics-based models used in this field, this thesis explores an alternate approach for locating geothermally active regions using machine learning methods coupled with physics knowledge of geothermal energy problems, in the emerging field of physics-guided machine learning. There are two primary contributions of this thesis. First, we present a thorough analysis of using state-of-the-art machine learning models to predict a subsurface geothermal parameter, temperature-at-depth, using a rich geo-spatial dataset across the Appalachian Basin. Specifically, we explore a suite of machine learning algorithms such as neural networks (DNN), Ridge regression (R-reg) models, and decision-tree-based models (e.g., XGBoost and Random Forest). We found that XGBoost and Random Forests result in the highest accuracy for subsurface temperature prediction. We also ran our model on a fine spatial grid to provide 2D continuous temperature maps at three different depths using the XGBoost model, which can be used to locate prospective geothermally active regions. Second, we develop a physics-guided machine learning model for predicting subsurface temperatures that not only uses surface temperature, thermal conductivity coefficient, and depth as input parameters, but also the heat-flux parameter that is known to be a potent indicator of temperature-at-depth values according to physics knowledge of geothermal energy problems. Since, there is no independent easy-to-use method for observing heat-flux directly or inferring it from other observed variables. We develop an innovative approach to take into account heat-flux parameters through a physics-guided clustering-regression model. Specifically, the bottom-hole temperature data is initially clustered into multiple groups based on the heat-flux parameter using Gaussian mixture model (GMM). This is followed by training neural network regression models using the data within each constant heat-flux region. Finally, a KNN classifier is trained for cluster membership prediction. Our preliminary results indicate that our proposed approach results in lower errors as the number of clusters increases because the heat-flux parameter is indirectly accounted for in the machine learning model. / Master of Science / Machine learning and artificial intelligence have transformed many research fields and industries. In this thesis, we investigate the applicability of machine learning and data-driven approaches in the field of geothermal energy exploration. Given the uncertainties and simplifying assumptions associated with the current state of physics-based models, we show that machine learning can provide viable alternative solutions for geothermal energy mapping. First, we explore a suite of machine learning algorithms such as neural networks (DNN), Ridge regression (R-reg) models, and decision-tree based models (e.g., XGBoost and Random Forest). We find that XGBoost and Random Forests result in the highest accuracy for subsurface temperature prediction. Accuracy measures show that machine learning models are at par with physics-based models and can even outperform the thermal conductivity model. Second, we incorporate the thermal conductivity theory with machine learning and propose an innovative clustering-regression approach in the emerging area of physics-guided machine learning that results in a smaller error than black-box machine learning methods. Renewable Energy Geothermal Energy Machine learning XGBoost Subsurface temperature geothermal gradient
33	Enhancing NFL Game Insights: Leveraging XGBoost For Advanced Football Data Analytics To Quantify Multifaceted Aspects Of Gameplay Schoborg, Christopher P 01 January 2024 (has links) (PDF) XGBoost, renowned for its efficacy in various statistical domains, offers enhanced precision and efficiency. Its versatility extends to both regression and categorization tasks, rendering it a valuable asset in predictive modeling. In this dissertation, I aim to harness the power of XGBoost to forecast and rank performances within the National Football League (NFL). Specifically, my research focuses on predicting the next play in NFL games based on pre-snap data, optimizing the draft ranking process by integrating data from the NFL combine, and collegiate statistics, creating a player rating system that can be compared across all positions, and evaluating strategic decisions for NFL teams when crossing the 50-yard line, including the feasibility of attempting a first down conversion versus opting for a field goal attempt. NFL Analytics XGBoost Prediction Fourth Down Categorical Data Analysis Data Science
34	Advancing Credit Risk Analysis through Machine Learning Techniques : Utilizing Predictive Modeling to Enhance Financial Decision-Making and Risk Assessment Lampinen, Henrik, Nyström, Isac January 2024 (has links) Assessment of credit risk is crucial for the financial stability of banks, directly influencing their lending policies and economic resilience. This thesis explores advanced techniques for predictive modeling of Loss Given Default (LGD) and credit losses within major Swedish banks, with a focus on sophisticated methods in statistics and machine learning. The study specifically evaluates the effectiveness of various models, including linear regression, quantile regression, extreme gradient boosting, and ANN, to address the complexity of LGD’s bimodal distribution and the non-linearity in credit loss data. Key findings highlight the robustness of ANN and XGBoost in modeling complex data patterns, offering significant improvements over traditional linear models. The research identifies critical macroeconomic indicators—such as real estate prices, inflation, and unemployment rates—through an Elastic Net model, underscoring their predictive power in assessing credit risks. Credit Risk Loss Given Default Credit Loss Quantile Regression Machine Learning ANN XGBoost Mathematics Matematik
35	Weather Impact on Energy Consumption For Electric Trucks : Predictive modelling with Machine Learning / Väders påverkan på energikonsumption för elektriska lastbilar : Prediktiv modellering med maskininlärning Carlsson, Robert, Nordgren, Emrik January 2024 (has links) Companies in the transporting sector are undergoing an important transformation of electrifyingtheir fleets to meet the industry’s climate targets. To meet customer’s requests, keep its marketposition, and to contribute to a sustainable transporting industry, Scania needs to be in frontof the evolution. One aspect of this is to attract customers by providing accurate information anddetecting customer’s opportunities for electrification. Understanding the natural behavior of weatherparameters and their impact on energy consumption is crucial for providing accurate simulations ofhow daily operations would appear with an electric truck. The aim of this thesis is to map weatherparameters impact on energy consumption and to get an understanding of the correlations betweenenergy consumption and dynamic weather data. ML and deep learning models have undergone training using historical data from operations per-formed by Scania’s Battery Electric Vehicles(BEV). These models have been assessed against eachother to ensure that they are robust and accurate. Utilizing the trained models ability to providereliable consumption predictions based on weather, we can extract information and patterns aboutconsumption derived from customised weather parameters. The results show several interesting correlations and can quantify the impact of weather parametersunder certain conditions. Temperature is a significant factor that has a negative correlation withenergy consumption while other factors like precipitation and humidity prove less clear results. Byinteracting parameters with each other, some new results were found. For instance, the effect ofhumidity is clarified under certain temperatures. Wind speed also turns out to be an importantfactor with a positive correlation to energy consumption. Energy Consumption Weather Parameters Machine Learning XGBoost LSTM Convolutional Neural Network Mathematics Matematik
36	Försäljningsprediktion : en jämförelse mellan regressionsmodeller / Sales prediction : a comparison between regression models Fridh, Anton, Sandbecker, Erik January 2021 (has links) Idag finns mängder av företag i olika branscher, stora som små, som vill förutsäga sin försäljning. Det kan bland annat bero på att de vill veta hur stort antal produkter de skall köpa in eller tillverka, och även vilka produkter som bör investeras i över andra. Vilka varor som är bra att investera i på kort sikt och vilka som är bra på lång sikt. Tidigare har detta gjorts med intuition och statistik, de flesta vet att skidjackor inte säljer så bra på sommaren, eller att strandprylar inte säljer bra under vintern. Det här är ett simpelt exempel, men hur blir det när komplexiteten ökar, och det finns ett stort antal produkter och butiker? Med hjälp av maskininlärning kan ett sånt här problem hanteras. En maskininlärningsalgoritm appliceras på en tidsserie, som är en datamängd med ett antal ordnade observationer vid olika tidpunkter under en viss tidsperiod. I den här studiens fall är detta försäljning av olika produkter som säljs i olika butiker och försäljningen ska prediceras på månadsbasis. Tidsserien som behandlas är ett dataset från Kaggle.com som kallas för “Predict Future Sales”. Algoritmerna som används i för den här studien för att hantera detta tidsserieproblem är XGBoost, MLP och MLR. XGBoost, MLR och MLP har i tidigare forskning gett bra resultat på liknande problem, där bland annat bilförsäljning, tillgänglighet och efterfrågan på taxibilar och bitcoin-priser legat i fokus. Samtliga algoritmer presterade bra utifrån de evalueringsmått som användes för studierna, och den här studien använder samma evalueringsmått. Algoritmernas prestation beskrivs enligt så kallade evalueringsmått, dessa är R², MAE, RMSE och MSE. Det är dessa mått som används i resultat- och diskussionskapitlen för att beskriva hur väl algoritmerna presterar. Den huvudsakliga forskningsfrågan för studien lyder därför enligt följande: Vilken av algoritmerna MLP, XGBoost och MLR kommer att prestera bäst enligt R², MAE, RMSE och MSE på tidsserien “Predict Future Sales”. Tidsserien behandlas med ett känt tillvägagångssätt inom området som kallas CRISP-DM, där metodens olika steg följs. Dessa steg innebär bland annat dataförståelse, dataförberedelse och modellering. Denna metod är vad som i slutändan leder till resultatet, där resultatet från de olika modellerna som skapats genom CRISP-DM presenteras. I slutändan var det MLP som fick bäst resultat enligt mätvärdena, följt av MLR och XGBoost. MLP fick en RMSE på 0.863, MLR på 1.233 och XGBoost på 1.262 / Today, there are a lot of companies in different industries, large and small, that want to predict their sales. This may be due, among other things, to the fact that they want to know how many products they should buy or manufacture, and also which products should be invested in over others. In the past, this has been done with intuition and statistics. Most people know that ski jackets do not sell so well in the summer, or that beach products do not sell well during the winter. This is a simple example, but what happens when complexity increases, and there are a large number of products and stores? With the help of machine learning, a problem like this can be managed easier. A machine learning algorithm is applied to a time series, which is a set of data with several ordered observations at different times during a certain time period. In the case of this study, it is the sales of different products sold in different stores, and sales are to be predicted on a monthly basis. The time series in question is a dataset from Kaggle.com called "Predict Future Sales". The algorithms used in this study to handle this time series problem are XGBoost, MLP and MLR. XGBoost, MLR and MLP. These have in previous research performed well on similar problems, where, among other things, car sales, availability and demand for taxis and bitcoin prices were in focus. All algorithms performed well based on the evaluation metrics used by the studies, and this study uses the same evaluation metrics. The algorithms' performances are described according to so-called evaluation metrics, these are R², MAE, RMSE and MSE. These measures are used in the results and discussion chapters to describe how well the algorithms perform. The main research question for the study is therefore as follows: Which of the algorithms MLP, XGBoost and MLR will perform best according to R², MAE, RMSE and MSE on the time series "Predict Future Sales". The time series is treated with a known approach called CRISP-DM, where the methods are followed in different steps. These steps include, among other things, data understanding, data preparation and modeling. This method is what ultimately leads to the results, where the results from the various models created by CRISP-DM are presented. In the end, it was the MLP algorithm that got the best results according to the measured values, followed by MLR and XGBoost. MLP got an RMSE of 0.863, MLR of 1,233 and XGBoost of 1,262 Time series XGBoost MLP MLR Evaluation Metrics Crisp-DM “Predict Future Sales” Machine Learning Regression Sales Prediction Tidsserie XGBoost MLP MLR Mätvärden Crisp-DM “Predict Future Sales” Maskininlärning Regression Prediktion av försäljning. Computer and Information Sciences Data- och informationsvetenskap
37	Improving End-Of-Line Quality Control of Fuel Cell Manufacturing Through Machine Lerning Enabled Data Analysis Sasse, Fabian, Fischer, Georg, Eschner, Niclas, Lanza, Gisela 27 May 2022 (has links) For an economically sustainable fuel cell commercialization, robust manufacturing processes are essential. As current quality control is time-consuming and costly for manufacturers, standardized solutions are required that reduce cycle times needed to determine cell quality. With existing studies examining durability in field use, little is known about end-of-line detection of cell malfunctions. Applying machine learning algorithms to analyse performance measures of 3600 PEM fuel cells, this work presents a concept to automatically classify produced fuel cells according to cell performance indicators. Using a deep learning autoencoder and the extreme gradient boosting algorithm for anomaly detection and cell classification, models are created that detect cells associated with potential cell malfunctions. The work shows that the models developed predict key performance features in an early stage of the quality control phase and contributes to the overall goal of achieving cycle time reduction for manufacturers quality control procedures. / Für eine wirtschaftlich nachhaltige Kommerzialisierung von Brennstoffzellen sind robuste Herstellungsprozesse unerlässlich. Da die derzeitige Qualitätskontrolle zeitaufwändig und kostenintensiv ist, sind standardisierte Lösungen erforderlich. Während bisherige Arbeiten vorwiegend Lebensdaueruntersuchungen durchführen, ist nur wenig über die Erkennung von Zellfehlfunktionen am Ende der Produktionslinie bekannt. Durch die Anwendung von Algorithmen des maschinellen Lernens zur Analyse der Leistungsdaten von 3600 PEM-Brennstoffzellen wird in dieser Arbeit ein Konzept zur automatischen Klassifizierung produzierter Brennstoffzellen anhand von Leistungsindikatoren der Zellen vorgestellt. Unter Verwendung eines Deep-Learning-Autoencoders und des Extreme-Gradient-Boosting-Algorithmus zur Erkennung von Anomalien und zur Klassifizierung von Zellen werden Modelle erstellt, die Zellen erkennen, die mit potenziellen Zellfehlfunktionen in Verbindung stehen. Die Arbeit zeigt, dass die entwickelten Modelle wichtige Leistungsmerkmale in einem frühen Stadium der Qualitätskontrollphase vorhersagen und zum Gesamtziel der Reduzierung der Zykluszeit für die Qualitätskontrollverfahren der Hersteller beitragen. XGBOOST, AUTOENCODER info:eu-repo/classification/ddc/600 ddc:600 info:eu-repo/classification/ddc/620 ddc:620 info:eu-repo/classification/ddc/629 ddc:629
38	Using Gradient Boosting to Identify Pricing Errors in GLM-Based Tariffs for Non-life Insurance / Identifiering av felprissättningar i GLM-baserade skadeförsäkringstariffer genom Gradient boosting Greberg, Felix, Rylander, Andreas January 2022 (has links) Most non-life insurers and many creditors use regressions, more specifically Generalized Linear Models (GLM), to price their liabilities. One limitation with GLMs is that interactions between predictors are handled manually, which makes finding interactions a tedious and time-consuming task. This increases the cost of rate making and, more importantly, actuaries can miss important interactions resulting in sub-optimal customer prices. Several papers have shown that Gradient Tree Boosting can outperform GLMs in insurance pricing since it handles interactions automatically. Insurers and creditors are however reluctant to use so-called ”Black-Box” solutions for both regulatory and technical reasons. Tree-based methods have been used to identify pricing errors in regressions, albeit only as ad-hoc solutions. The authors instead propose a systematic approach to automatically identify and evaluate interactions between predictors before adding them to a traditional GLM. The model can be used in three different ways: Firstly, it can create a table of statistically significant candidate interactions to add to a GLM. Secondly, it can automatically and iteratively add new interactions to an old GLM until no more statistically significant interactions can be found. Lastly, it can automatically create a new GLM without an existing pricing model. All approaches are tested on two motor insurance data sets from a Nordic P&C insurer and the results show that all methods outperform the original GLMs. Although the two iterative modes perform better than the first, insurers are recommended to mainly use the first mode since this results in a reasonable trade-off between automating processes and leveraging actuaries’ professional judgment. / De flesta skadeförsäkringsbolag och många långivare använder regressioner, mer specifikt generaliserade linjära modeller (GLM), för att prissätta sina skulder. En begräsning med GLM:er är att interaktioner mellan exogena variabler hanteras manuellt, vilket innebär att hanteringen av dessa är tidskrävande. Detta påverkar försäkringsbolags lönsamhet på flera sätt. För det första ökar kostnaderna för att skapa tariffer och för det andra kan aktuarier missa viktiga interaktioner, vilket resulterar i suboptimala kundpriser. Tidigare forskning visar att Gradient Boosting kan överträffa GLM:er inom försäkringsprissättning eftersom denna metod hanterar interaktioner automatiskt. Försäkringsbolag och kreditgivare är dock motvilliga till att använda så kallade ”Black-box-lösningar” på grund av både regulatoriska och tekniska skäl. Trädbaserade metoder har tidigare använts för att hitta felprissättningar i regressioner, dock endast genom situationsanpassade lösningar. Författarna föreslår i stället en systematisk metod för att automatiskt identifiera och evaluera interaktioner innan de inkluderas i en traditionell GLM. Modellen kan användas på tre olika sätt: Först och främst kan den användas för att skapa en tabell med statistiskt signifikanta interaktioner att addera till en existerande GLM. Utöver detta kan den iterativt och automatiskt lägga till sådana interaktioner tills inga fler återstår. Slutligen kan modellen också användas för att skapa en helt ny GLM från grunden, utan en existerande prissättningsmodell. Metoderna testas på två motorförsäkringsdataset från ett nordiskt skadeförsäkringsbolag och resultaten visar att alla överträffar originalregressionen. Även om de två iterativa metoderna överträffar den första metoden rekommenderas försäkringsbolag att använda den första metoden. Detta eftersom den resulterar i en rimlig avvägning mellan att automatisera processer och att nyttja aktuariers omdömesförmåga. GLM Gradient Boosting XGBoost Non-life insurance Property & Casualty Rate making Insurance Tariff MTPL insurance Machine learning Regression trees Tweedie regression Credit risk GLM Gradient Boosting XGBoost Skadeförsäkring Prissättning Försäkringstariff Trafikförsäkring Regressionsträd Maskininlärning Tweedie-regression Kreditrisk Other Mathematics Annan matematik
39	Analytisk Studie av Avancerade Gradientförstärkningsalgoritmer för Maskininlärning : En jämförelse mellan XGBoost, CatBoost, LightGBM, SnapBoost, KTBoost, AdaBoost och GBDT för klassificering- och regressionsproblem Wessman, Filip January 2021 (has links) Maskininlärning (ML) är idag ett mycket aktuellt, populärt och aktivt forskat område. Därav finns det idag en stor uppsjö av olika avancerade och moderna ML-algoritmer. Svårigheten är att bland dessa identifiera den mest optimala att applicera på ens tillämpningsområde. Algoritmer som bygger på Gradientförstärkning (eng. Gradient Boosting (GB)) har visat sig ha ett väldigt brett spektrum av appliceringsområden, flexibilitet, hög förutsägelseprestanda samt låga tränings- och förutsägelsetider. Huvudsyftet med denna studie är på klassificerings- och regressiondataset utvärdera och belysa prestandaskillnaderna av 5 moderna samt 2 äldre GB-algoritmer. Målet är att avgöra vilken av dessa moderna algoritmer som presterar i genomsnitt bäst utifrån på flera utvärderingsmått. Initialt utfördes en teoretisk förstudie inom det aktuella forskningsområdet. Algoritmerna XGBoost, LightGBM, CatBoost, AdaBoost, SnapBoost, KTBoost, GBDT implementerades på plattformen Google Colab. Där utvärderades dess respektive, tränings- och förutsägelsestid samt prestandamåtten, uppdelat i ROCAUC och Log Loss för klassificering samt R2 och RMSE för regression. Resultaten visade att det generellt var små skillnader mellan dom olika testade algoritmerna. Med undantag för AdaBoost som i allmänhet, med större marginal, hade den sämsta prestandan. Därmed gick det inte i denna jämförelse utse en klar vinnare. Däremot presterade SnapBoost väldigt bra på flera utvärderingsmått. Modellresultaten är generellt sätt väldigt begränsade och bundna till det applicerade datasetet vilket gör att det överlag är väldigt svårt att generalisera det till andra datauppsättningar. Detta speglar sig från resultaten med svårigheten att identifiera ett ML-ramverk som utmärker sig och presterar bra i alla scenarier. / Machine learning (ML) is today a very relevent, popular and actively researched area. As a result, today there exits a large numer of different advanced and modern ML algorithms. The difficulty is to identify among these the most optimal to apply to one’s area of application. Algorithms based on Gradient Boosting (GB) have been shown to have a very wide range of application areas, flexibility, high prediction performance and low training and prediction times. The main purpose of this study is on classification and regression datasets evaluate and illustrate the performance differences of 5 modern and 2 older GB algorithms. The goal is to determine which of these modern algorithms, on average, performs best on the basis of several evaluation metrics. Initially, a theoretical feasibility study was carried out in the current research area. The algorithms XGBoost, LightGBM, CatBoost, AdaBoost, SnapBoost, KTBoost, GBDT were implemented on the Google Colab platform. There, respective training and prediction time as well as the performance metrics were evaluated, divided into ROC-AUC and Log Loss for classification and R2 and RMSE for regression. The results showed that there were generally small differences between the different algorithms tested. With the exception of AdaBoost which in general, by a larger margin, had the worst performance. Thus, it was not possible in this comparison to nominate a clear winner. However, SnapBoost performed very well in several evaluation metrics. The model results are generally very limited and bound to the applied dataset, which makes it generally very difficult to generalize it to other data sets. This is reflected in the results with the difficulty of identifying an ML framework that excels and performs well in all scenarios. Machine learning Classification Regression XGBoost LightGBM CatBoost AdaBoost SnapBoost KTBoost GBDT ROC-AUC Log Loss R2 RMSE Maskininlärning Klassificering Regression XGBoost LightGBM CatBoost AdaBoost SnapBoost KTBoost GBDT ROC-AUC Log Loss R2 RMSE Software Engineering Programvaruteknik
40	Modeling Melodic Accents in Jazz Solos / Modellering av melodiska accenter i jazzsolon Berrios Salas, Misael January 2023 (has links) This thesis looks at how accurately one can model accents in jazz solos, more specifically the sound level. Further understanding the structure of jazz solos can give a way of pedagogically presenting differences within music styles and even between performers. Some studies have tried to model perceived accents in different music styles. In other words, model how listeners perceive some tones as somehow accentuated and more important than others. Other studies have looked at how the sound level correlates to other attributes of the tone. But to our knowledge, no other studies have been made modeling actual accents within jazz solos, nor have other studies had such a big amount of training data. The training data used is a set of 456 solos from the Weimar Jazz Database. This is a database containing tone data and metadata from monophonic solos performed with multiple instruments. The features used for the training algorithms are features obtained from the software Director Musices created at the Royal Institute of Technology in Sweden; features obtained from the software "melfeature" created at the University of Music Franz Liszt Weimar in Germany; and features built upon tone data or solo metadata from the Weimar Jazz Database. A comparison between these is made. Three learning algorithms are used, Multiple Linear Regression (MLR), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). The first two are simpler regression models while the last is an award-winning tree boosting algorithm. The tests resulted in eXtreme Gradient Boosting (XGBoost) having the highest accuracy when combining all the available features minus some features that were removed since they did not improve the accuracy. The accuracy was around 27% with a high standard deviation. This tells that there was quite some difference when predicting the different solos, some had an accuracy of about 67% while others did not predict one tone correctly in the entire solo. But as a general model, the accuracy is too low for actual practical use. Either the methods were not the optimal ones or jazz solos differ too much to find a general pattern. / Detta examensarbete undersöker hur väl man kan modellera accenter i jazz-solos, mer specifikt ljudnivån. En bredare förståelse för strukturen i jazzsolos kan ge ett sätt att pedagogiskt presentera skillnaderna mellan olika musikstilar och även mellan olika artister. Andra studier har försökt modellera uppfattade accenter inom olika musik-stilar. Det vill säga, modellera hur åhörare upplever vissa toner som accentuerade och viktigare än andra. Andra studier har undersökt hur ljudnivån är korrelerad till andra attribut hos tonen. Men såvitt vi vet, så finns det inga andra studier som modellerar faktiska accenter inom jazzsolos, eller som haft samma stora mängd träningsdata. Träningsdatan som använts är ett set av 456 solos tagna från Weimar Jazz Database. Databasen innehåller data på toner och metadata från monofoniska solos genomförda med olika instrument. Särdragen som använts för tränings-algoritmerna är särdrag erhållna från mjukvaran Director Musices skapad på Kungliga Tekniska Högskolan i Sverige; särdrag erhållna från mjukvaran ”melfeature” skapad på University of Music Franz Liszt Weimar i Tyskland; och särdrag skapade utifrån datat i Weimar Jazz Database. En jämförelse mellan dessa har också gjorts. Tre inlärningsalgoritmer har använts, Multiple Linear Regression (MLR), Support Vector Regression (SVR), och eXtreme Gradient Boosting (XGBoost). De första två är enklare regressionsalgoritmer, medan den senare är en prisbelönt trädförstärkningsalgoritm. Testen resulterade i att eXtreme Gradient Boosting (XGBoost) skapade en modell med högst noggrannhet givet alla tillgängliga särdrag som träningsdata minus vissa särdrag som tagits bort då de inte förbättrar noggrannheten. Den erhållna noggrannheten låg på runt 27% med en hög standardavvikelse. Detta pekar på att det finns stora skillnader mellan att förutsäga ljudnivån mellan de olika solin. Vissa solin gav en noggrannhet på runt 67% medan andra erhöll inte en endaste ljudnivå korrekt i hela solot. Men som en generell modell är noggrannheten för låg för att användas i praktiken. Antingen är de valda metoderna inte de bästa, eller så är jazzsolin för olika för att hitta ett generellt mönster som går att förutsäga. Accents Jazz Solo Support Vector Regression (SVR) eXtreme Gradient Boosting (XGBoost) Multiple Linear Regression (MLR) Dynamic Accenter Jazz Solos Support Vector Regression (SVR) eXtreme Gradient Boosting (XGBoost) Multiple Linear Regression (MLR) Dynamisk Computer and Information Sciences Data- och informationsvetenskap

Search results