31 |
Automation of price prediction using machine learning in a large furniture company. Ghorbanali, Mojtaba. January 2022 (has links)
Accurate prediction of product prices can be highly beneficial for procurers, both business-wise and production-wise. Many companies today, across various fields of operation and of various sizes, have access to vast amounts of data from which valuable information can be extracted. In this master thesis, several large databases of products in different categories have been analyzed. For confidentiality reasons, the labels from the database are replaced in this thesis by generic titles, and the real titles are not mentioned. The company is also not referred to by name, but the entire work was carried out on a real product data set. As a real-world data set, the data was messy and full of nulls and missing values, so data wrangling took considerable time. The modeling approaches used were regression methods and gradient boosting models. The main purpose of this master thesis was to build price prediction models based on the features of each item, to assist with the initial positioning of the product and its initial price. The best result achieved during this thesis came from the XGBoost machine learning model, with about 96% accuracy, which can help the producer accelerate their pricing strategies.
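A minimal sketch of the gradient-boosting idea behind models like the XGBoost regressor used in this thesis: fit a sequence of weak learners (here, depth-1 "stumps") to the residuals of the running prediction. The thesis data set is confidential, so the two "product features" and the price formula below are entirely invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic product features (invented: a material-cost index and a size).
X = rng.uniform(0, 10, size=(200, 2))
y = 50 + 12 * X[:, 0] + 4 * X[:, 1] ** 2 + rng.normal(0, 5, 200)  # "price"

def fit_stump(X, residual):
    """Find the (feature, threshold) split minimizing squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.sum() == 0 or (~left).sum() == 0:
                continue
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

# Gradient boosting for squared loss: repeatedly fit stumps to residuals.
lr, stumps = 0.3, []
pred = np.full(len(y), y.mean())
for _ in range(100):
    stump = fit_stump(X, y - pred)
    stumps.append(stump)
    pred += lr * predict_stump(stump, X)

r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Real XGBoost adds regularization, second-order gradients, and much deeper trees, but the residual-fitting loop is the core of the method.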
|
32 |
Using Machine Learning as a Tool to Improve Train Wheel Overhaul Efficiency. Gert, Oskar. January 2020 (has links)
This thesis develops a method for using machine learning in an industrial process. The implementation of this machine learning model aimed to reduce costs and increase the efficiency of train wheel overhaul, in partnership with the Austrian Federal Railways (ÖBB). Different machine learning models as well as category encodings were tested to find which performed best on the data set. In addition, differently sized training sets were used to determine whether the size of the training set affected the results. The implementation shows that ÖBB can save money and increase the efficiency of train wheel overhaul by using machine learning, and that continuous retraining of the prediction models is necessary because of variations in the data set.
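The category encodings compared in such a study typically include one-hot and target (mean) encoding. A small sketch of both on invented overhaul-style data (the categories, rates, and the binary "reprofile needed" label below are all assumptions, not ÖBB data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy records: a categorical wheel type and a binary outcome.
categories = ["A", "B", "C"]
base_rate = {"A": 0.2, "B": 0.5, "C": 0.8}   # assumed failure propensities
cats = rng.choice(categories, 500)
y = (rng.random(500) < np.array([base_rate[c] for c in cats])).astype(int)

# One-hot encoding: one binary column per category.
onehot = np.array([[c == k for k in categories] for c in cats], dtype=float)

# Target (mean) encoding: replace each category by its observed target mean,
# which compresses high-cardinality categoricals into a single column.
target_mean = {k: float(y[cats == k].mean()) for k in categories}
target_enc = np.array([target_mean[c] for c in cats])
```

In practice target encoding is computed on training folds only, to avoid leaking the label into the features.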
|
33 |
Stock Price Movement Prediction Using Sentiment Analysis and Machine Learning. Wang, Jenny Zheng. 01 June 2021 (has links) (PDF)
Stock price prediction is of strong interest but a challenging task for both researchers and investors. Recently, sentiment analysis and machine learning have been adopted in stock price movement prediction. In particular, retail investors' sentiment from online forums has shown its power to influence the stock market. In this paper, a novel system was built to predict stock price movement for the following trading day. The system includes a web scraper, an enhanced sentiment analyzer, a machine learning engine, an evaluation module, and a recommendation module. The system can automatically select the best prediction model from four state-of-the-art machine learning models (Long Short-Term Memory, Support Vector Machine, Random Forest, and Extreme Gradient Boosting) based on the acquired data and the models' performance. Moreover, stock market lexicons were created using large-scale text mining on the Yahoo Finance Conversation boards and natural language processing. Experiments using the top 30 stocks on the Yahoo users' watchlists and a randomly selected stock from NASDAQ were performed to examine the system performance and the proposed methods. The experimental results show that incorporating sentiment analysis can improve prediction for stocks with a large daily discussion volume. The Long Short-Term Memory model outperformed the other machine learning models when using both price and sentiment analysis as inputs. In addition, the Extreme Gradient Boosting (XGBoost) model achieved the highest accuracy using the price-only feature on low-volume stocks. Last but not least, the models using the enhanced sentiment analyzer outperformed the VADER sentiment analyzer by 1.96%.
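The core of a lexicon-based sentiment analyzer like the one described is a dictionary of domain terms with polarity weights, averaged over a post. The thesis mines its lexicons from Yahoo Finance boards; the handful of entries below are invented purely to illustrate the mechanism:

```python
# Toy stock-market lexicon; the real one is built by large-scale text mining.
lexicon = {"moon": 1.5, "bullish": 1.0, "calls": 0.5,
           "bearish": -1.0, "puts": -0.5, "bagholder": -1.5}

def sentiment(post: str) -> float:
    """Average lexicon score over the words of a post (0.0 if none match)."""
    words = post.lower().split()
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

bullish_post = "to the moon buying calls tomorrow"
bearish_post = "bearish on this one loading puts"
```

A per-day aggregate of these scores over a ticker's posts is then one input feature alongside price history for the downstream models.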
|
34 |
Physics-guided Machine Learning Approaches for Applications in Geothermal Energy Prediction. Shahdi, Arya. 03 June 2021 (links)
In the area of geothermal energy mapping, scientists have used physics-based models and bottom-hole temperature measurements from oil and gas wells to generate heat flow and temperature-at-depth maps. Given the uncertainties and simplifying assumptions associated with the current state of physics-based models used in this field, this thesis explores an alternate approach for locating geothermally active regions using machine learning methods coupled with physics knowledge of geothermal energy problems, in the emerging field of physics-guided machine learning.
There are two primary contributions of this thesis. First, we present a thorough analysis of using state-of-the-art machine learning models to predict a subsurface geothermal parameter, temperature-at-depth, using a rich geospatial dataset across the Appalachian Basin. Specifically, we explore a suite of machine learning algorithms such as deep neural networks (DNN), Ridge regression (R-reg) models, and decision-tree-based models (e.g., XGBoost and Random Forest). We found that XGBoost and Random Forest result in the highest accuracy for subsurface temperature prediction. We also ran the XGBoost model on a fine spatial grid to provide continuous 2D temperature maps at three different depths, which can be used to locate prospective geothermally active regions.
Second, we develop a physics-guided machine learning model for predicting subsurface temperatures that uses not only surface temperature, thermal conductivity coefficient, and depth as input parameters, but also the heat-flux parameter, which physics knowledge of geothermal energy problems identifies as a potent indicator of temperature-at-depth values. Since there is no independent, easy-to-use method for observing heat flux directly or inferring it from other observed variables, we develop an innovative approach that takes the heat-flux parameter into account through a physics-guided clustering-regression model. Specifically, the bottom-hole temperature data is initially clustered into multiple groups based on the heat-flux parameter using a Gaussian mixture model (GMM). This is followed by training neural network regression models on the data within each constant heat-flux region. Finally, a KNN classifier is trained for cluster membership prediction. Our preliminary results indicate that our proposed approach yields lower errors as the number of clusters increases, because the heat-flux parameter is indirectly accounted for in the machine learning model. / Master of Science / Machine learning and artificial intelligence have transformed many research fields and industries. In this thesis, we investigate the applicability of machine learning and data-driven approaches in the field of geothermal energy exploration. Given the uncertainties and simplifying assumptions associated with the current state of physics-based models, we show that machine learning can provide viable alternative solutions for geothermal energy mapping. First, we explore a suite of machine learning algorithms such as deep neural networks (DNN), Ridge regression (R-reg) models, and decision-tree-based models (e.g., XGBoost and Random Forest). We find that XGBoost and Random Forest result in the highest accuracy for subsurface temperature prediction.
Accuracy measures show that the machine learning models are on par with physics-based models and can even outperform the thermal conductivity model. Second, we incorporate thermal conductivity theory with machine learning and propose an innovative clustering-regression approach, in the emerging area of physics-guided machine learning, that results in a smaller error than black-box machine learning methods.
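The clustering-regression idea (cluster by heat-flux regime, then fit one regression per cluster) can be sketched in a few lines. This is a simplified stand-in: quantile binning replaces the thesis's GMM, plain least squares replaces its neural network regressors, the KNN membership classifier is omitted, and all well data below is synthetic with invented numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600

# Synthetic wells: depth plus a noisy heat-flux regime (values invented).
depth = rng.uniform(1.0, 5.0, n)                                      # km
heat_flux = rng.choice([40.0, 60.0, 80.0], n) + rng.normal(0, 3, n)   # mW/m^2
temp = 10 + 0.4 * heat_flux * depth + rng.normal(0, 2, n)  # bottom-hole T

# Step 1: group wells by heat flux (quantile binning standing in for a GMM).
edges = np.quantile(heat_flux, [1 / 3, 2 / 3])
cluster = np.digitize(heat_flux, edges)

# Step 2: fit one linear regression temp ~ depth within each cluster.
models = {}
for c in np.unique(cluster):
    m = cluster == c
    A = np.column_stack([depth[m], np.ones(m.sum())])
    models[c], *_ = np.linalg.lstsq(A, temp[m], rcond=None)

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Baseline: a single global regression that ignores the heat-flux regimes.
A_all = np.column_stack([depth, np.ones(n)])
coef, *_ = np.linalg.lstsq(A_all, temp, rcond=None)
global_rmse = rmse(A_all @ coef, temp)

# Clustered model: predict each well with its own cluster's regression.
pred = np.empty(n)
for c, (slope, intercept) in models.items():
    m = cluster == c
    pred[m] = slope * depth[m] + intercept
clustered_rmse = rmse(pred, temp)
```

Because each regime has a different effective temperature gradient, the per-cluster fits track the data far more closely than the single global line, mirroring the error reduction the abstract reports as the number of clusters grows.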
|
35 |
Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies. Liu, Xiaoyang. 08 August 2019 (links)
Machine learning methods are now extensively applied in various scientific research areas to build models. Unlike conventional models, machine learning based models use a data-driven approach: machine learning algorithms can learn, from available data, patterns that are otherwise hard to recognize. Data-driven approaches enhance the role of algorithms and computers and accelerate computation by offering alternative views of a problem. In this thesis, we explore the possibility of applying machine learning models to the prediction of chromatographic retention behaviors. Chromatographic separation is a key technique for the discovery and analysis of fullerenes. In previous studies, differential equation models have achieved great success in predicting chromatographic retention. However, most differential equation models require experimental measurements or theoretical computations for many parameters, which are not easy to obtain. Fullerenes/metallofullerenes are rigid, near-spherical carbon-cage molecules, which makes predicting their chromatographic retention behaviors, as well as other properties, much simpler than for flexible molecules with greater conformational variation. In this thesis, I propose that the polarizability of a fullerene molecule can be estimated directly from its structure. Structural motifs are used to simplify the model, and the motif-based models provide satisfactory predictions. The data set contains 31,947 isomers with their polarizability data and is split into a training set with 90% of the data points and a complementary testing set. In addition, a second testing set of large fullerene isomers is prepared and used to test whether a model trained on small fullerenes gives accurate predictions on large fullerenes. / Machine learning models can be applied in a wide range of areas, including scientific research.
In this thesis, machine learning models are applied to predict the chromatography behaviors of fullerenes based on their molecular structures. Chromatography is a common technique for separating mixtures; the separation arises from differences in the interactions between molecules and a stationary phase. In real experiments, a mixture usually contains a large family of different compounds, and it requires a lot of work and resources to isolate the target compound. Models are therefore extremely important for studies of chromatography. Traditional models are built on physical principles and involve several parameters. These physical parameters are measured experimentally or computed theoretically; however, both approaches are time-consuming and difficult to carry out. For fullerenes, my previous studies have shown that the chromatography model can be simplified so that only one parameter, polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes from their structures. The structure of a fullerene is represented by several local structures. Several types of machine learning models are built and tested on our data set, and the results show that the neural network gives the best predictions.
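The simplest form of a motif-based polarizability model treats the molecular property as a sum of per-motif contributions learned from data. The sketch below uses invented motif counts and contribution values (the thesis uses real isomer structures and a neural network rather than this linear least-squares fit), but it shows the representation: each isomer becomes a vector of local-structure counts.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each isomer is described by counts of a few local structural motifs
# (e.g., pentagon adjacencies); the counts and per-motif contributions
# below are invented for illustration.
n_isomers, n_motifs = 400, 4
counts = rng.integers(0, 12, size=(n_isomers, n_motifs)).astype(float)
true_contrib = np.array([1.2, 0.8, 2.5, 0.3])
polarizability = counts @ true_contrib + 60.0 + rng.normal(0, 0.5, n_isomers)

# 90/10 split, mirroring the thesis's training/testing protocol.
split = int(0.9 * n_isomers)
A_train = np.column_stack([counts[:split], np.ones(split)])
coef, *_ = np.linalg.lstsq(A_train, polarizability[:split], rcond=None)

A_test = np.column_stack([counts[split:], np.ones(n_isomers - split)])
pred = A_test @ coef
mae = float(np.mean(np.abs(pred - polarizability[split:])))
```

Because the features are counts of local structures rather than whole-molecule descriptors, a model trained on small cages can in principle be evaluated on larger ones, which is exactly what the second testing set probes.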
|
36 |
Enhancing NFL Game Insights: Leveraging XGBoost For Advanced Football Data Analytics To Quantify Multifaceted Aspects Of Gameplay. Schoborg, Christopher P. 01 January 2024 (links) (PDF)
XGBoost, renowned for its efficacy across statistical domains, offers enhanced precision and efficiency. Its versatility extends to both regression and classification tasks, making it a valuable asset in predictive modeling. In this dissertation, I aim to harness the power of XGBoost to forecast and rank performances within the National Football League (NFL). Specifically, my research focuses on predicting the next play in NFL games from pre-snap data; optimizing the draft ranking process by integrating data from the NFL Combine and collegiate statistics; creating a player rating system that can be compared across all positions; and evaluating strategic decisions for NFL teams after crossing the 50-yard line, including the feasibility of attempting a first-down conversion versus opting for a field goal.
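Next-play prediction from pre-snap data is, at its simplest, a classification over game situations. As a toy baseline for the task the dissertation solves with XGBoost, the sketch below builds a conditional-frequency model keyed on down and distance; the situations and play proportions are invented, not NFL data.

```python
from collections import Counter, defaultdict

# Toy pre-snap situations -> play call; the proportions are made up.
plays = (
    [(1, "long", "run")] * 45 + [(1, "long", "pass")] * 55
    + [(3, "long", "run")] * 10 + [(3, "long", "pass")] * 90
    + [(3, "short", "run")] * 70 + [(3, "short", "pass")] * 30
)

# Conditional-frequency model: the most common call for each situation.
table = defaultdict(Counter)
for down, dist, call in plays:
    table[(down, dist)][call] += 1

def predict_play(down, dist, default="pass"):
    counts = table.get((down, dist))
    return counts.most_common(1)[0][0] if counts else default
```

A gradient-boosted model improves on this lookup table by handling many more pre-snap features (formation, personnel, score, clock) and by generalizing to situations it has never seen exactly.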
|
37 |
Advancing Credit Risk Analysis through Machine Learning Techniques : Utilizing Predictive Modeling to Enhance Financial Decision-Making and Risk Assessment. Lampinen, Henrik; Nyström, Isac. January 2024 (links)
Assessment of credit risk is crucial for the financial stability of banks, directly influencing their lending policies and economic resilience. This thesis explores advanced techniques for predictive modeling of Loss Given Default (LGD) and credit losses within major Swedish banks, with a focus on sophisticated methods from statistics and machine learning. The study evaluates the effectiveness of several models, including linear regression, quantile regression, extreme gradient boosting (XGBoost), and artificial neural networks (ANN), in addressing the complexity of LGD's bimodal distribution and the non-linearity in credit loss data. Key findings highlight the robustness of ANN and XGBoost in modeling complex data patterns, offering significant improvements over traditional linear models. The research identifies critical macroeconomic indicators, such as real estate prices, inflation, and unemployment rates, through an Elastic Net model, underscoring their predictive power in assessing credit risk.
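Quantile regression is included in such studies precisely because LGD is bimodal: a conditional mean sits between the two modes, while quantiles describe the distribution's shape. The sketch below shows the pinball loss that quantile regression minimizes, on a stylized bimodal LGD sample (the beta-mixture data is an invented stand-in for the confidential bank data).

```python
import numpy as np

rng = np.random.default_rng(4)

# Stylized bimodal LGD: one mode near full recovery, one near total loss.
lgd = np.concatenate([rng.beta(2, 8, 600), rng.beta(8, 2, 400)])

def pinball(y, q_hat, tau):
    """Pinball (quantile) loss minimized by quantile regression."""
    diff = y - q_hat
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

# Minimizing the pinball loss over a constant recovers the empirical
# tau-quantile -- which is why quantile regression can describe a bimodal
# target where a single conditional mean is misleading.
tau = 0.9
grid = np.linspace(0.0, 1.0, 201)
best = grid[np.argmin([pinball(lgd, g, tau) for g in grid])]
```

Fitting the same loss with covariates (or with a gradient-boosted learner) gives conditional quantiles of LGD, e.g. a 90th-percentile loss estimate per exposure.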
|
38 |
Weather Impact on Energy Consumption For Electric Trucks : Predictive modelling with Machine Learning / Väders påverkan på energikonsumption för elektriska lastbilar : Prediktiv modellering med maskininlärning. Carlsson, Robert; Nordgren, Emrik. January 2024 (links)
Companies in the transport sector are undergoing an important transformation, electrifying their fleets to meet the industry's climate targets. To meet customers' requests, keep its market position, and contribute to a sustainable transport industry, Scania needs to be at the forefront of this evolution. One aspect of this is attracting customers by providing accurate information and identifying customers' opportunities for electrification. Understanding the natural behavior of weather parameters and their impact on energy consumption is crucial for providing accurate simulations of how daily operations would look with an electric truck. The aim of this thesis is to map the impact of weather parameters on energy consumption and to understand the correlations between energy consumption and dynamic weather data. Machine learning and deep learning models were trained on historical data from operations performed by Scania's Battery Electric Vehicles (BEV). These models were assessed against each other to ensure that they are robust and accurate. Utilizing the trained models' ability to provide reliable consumption predictions based on weather, we can extract information and patterns about consumption derived from customized weather parameters. The results reveal several interesting correlations and quantify the impact of weather parameters under certain conditions. Temperature is a significant factor with a negative correlation to energy consumption, while other factors such as precipitation and humidity yield less clear results. By interacting parameters with each other, some new results were found; for instance, the effect of humidity becomes clearer at certain temperatures. Wind speed also turns out to be an important factor, with a positive correlation to energy consumption.
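"Interacting parameters with each other" means adding product terms to the feature set so a model can express, e.g., that humidity matters mainly when it is cold. A minimal sketch on synthetic trip data (the consumption formula and all numbers are invented, not Scania measurements):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500

# Synthetic trips: ambient temperature (deg C) and relative humidity.
temp = rng.uniform(-20, 30, n)
humidity = rng.uniform(0.2, 1.0, n)
# Assumed relation: humidity adds load mostly in sub-zero conditions.
consumption = 120 - 1.5 * temp + 8 * humidity * (temp < 0) + rng.normal(0, 3, n)

def fit_r2(X, y):
    """Least-squares fit with intercept; returns in-sample R^2."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1 - resid @ resid / ((y - y.mean()) @ (y - y.mean())))

r2_plain = fit_r2(np.column_stack([temp, humidity]), consumption)
r2_inter = fit_r2(
    np.column_stack([temp, humidity, humidity * (temp < 0)]), consumption
)
```

The interaction column lets the linear model capture the temperature-dependent humidity effect, which the plain two-feature model averages away.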
|
39 |
Försäljningsprediktion : en jämförelse mellan regressionsmodeller / Sales prediction : a comparison between regression models. Fridh, Anton; Sandbecker, Erik. January 2021 (links)
Today, many companies in different industries, large and small, want to predict their sales. Among other things, they want to know how many products they should buy or manufacture, and which products to invest in over others, in both the short and the long term. In the past, this has been done with intuition and statistics: most people know that ski jackets do not sell well in the summer, or that beach products do not sell well during the winter. This is a simple example, but what happens when complexity increases and there are a large number of products and stores? With the help of machine learning, such a problem can be managed more easily. A machine learning algorithm is applied to a time series, which is a data set with a number of ordered observations at different points during a certain time period. In this study, the data are the sales of different products sold in different stores, and sales are to be predicted on a monthly basis. The time series in question is a dataset from Kaggle.com called "Predict Future Sales". The algorithms used in this study to handle the time series problem are XGBoost, MLP, and MLR. These have performed well on similar problems in previous research, focused on, among other things, car sales, availability and demand for taxis, and bitcoin prices. All algorithms performed well based on the evaluation metrics used in those studies, and this study uses the same metrics. The algorithms' performance is described by these evaluation metrics: R², MAE, RMSE, and MSE. These measures are used in the results and discussion chapters to describe how well the algorithms perform. The main research question for the study is therefore: which of the algorithms MLP, XGBoost, and MLR will perform best according to R², MAE, RMSE, and MSE on the time series "Predict Future Sales"? The time series is processed with a well-known approach in the field called CRISP-DM, following the method's steps, which include data understanding, data preparation, and modeling. This method is what ultimately leads to the results, where the results from the various models created through CRISP-DM are presented. In the end, the MLP algorithm achieved the best results according to the metrics, followed by MLR and XGBoost: MLP achieved an RMSE of 0.863, MLR 1.233, and XGBoost 1.262.
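The four evaluation metrics named above are straightforward to compute; a minimal stdlib implementation on a tiny made-up actual/predicted pair:

```python
import math

def mse(y, p):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(mse(y, p))

def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def r2(y, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

actual    = [3.0, 5.0, 2.0, 7.0]   # illustrative monthly sales
predicted = [2.5, 5.0, 3.0, 6.5]
```

RMSE penalizes large errors more than MAE, which is one reason studies report both.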
|
40 |
Improving End-Of-Line Quality Control of Fuel Cell Manufacturing Through Machine Learning Enabled Data Analysis. Sasse, Fabian; Fischer, Georg; Eschner, Niclas; Lanza, Gisela. 27 May 2022 (links)
For economically sustainable fuel cell commercialization, robust manufacturing processes are essential. As current quality control is time-consuming and costly for manufacturers, standardized solutions are required that reduce the cycle times needed to determine cell quality. While existing studies examine durability in field use, little is known about end-of-line detection of cell malfunctions. Applying machine learning algorithms to analyse performance measures of 3600 PEM fuel cells, this work presents a concept to automatically classify produced fuel cells according to cell performance indicators. Using a deep learning autoencoder and the extreme gradient boosting algorithm for anomaly detection and cell classification, models are created that detect cells associated with potential cell malfunctions. The work shows that the developed models predict key performance features at an early stage of the quality control phase, contributing to the overall goal of cycle time reduction in manufacturers' quality control procedures.
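Autoencoder-based anomaly detection flags items whose measurements the model cannot reconstruct well. The sketch below uses a linear PCA reconstruction as a simplified stand-in for the paper's deep autoencoder, on synthetic "cell" measurements (the dimensions, noise levels, and defect model are all invented; the 3600-cell data set is not public).

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic end-of-line measurements: good cells lie near a 2-D manifold
# inside an 8-D measurement space; defective cells deviate from it.
n_good, n_bad, d = 300, 10, 8
latent = rng.normal(size=(n_good, 2))
mixing = rng.normal(size=(2, d))
good = latent @ mixing + rng.normal(0, 0.05, (n_good, d))
bad = rng.normal(0, 2.0, (n_bad, d))        # off-manifold "malfunctions"
X = np.vstack([good, bad])

# Reconstruct through the top-2 principal components of the good cells;
# the anomaly score is the reconstruction-residual norm.
mu = good.mean(axis=0)
_, _, vt = np.linalg.svd(good - mu, full_matrices=False)
basis = vt[:2]                               # principal subspace
recon = (X - mu) @ basis.T @ basis + mu
score = np.linalg.norm(X - recon, axis=1)

# Flag cells whose score exceeds the 99th percentile of good-cell scores
# (in production the threshold would come from a held-out calibration set).
threshold = np.percentile(score[:n_good], 99)
flagged = score > threshold
```

A deep autoencoder replaces the linear projection with a learned nonlinear encoder/decoder, but the decision rule, thresholding the reconstruction error, is the same.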
|