Global ETD Search

1	A Comparative Study of Machine Learning Models for Multivariate NextG Network Traffic Prediction with SLA-based Loss Function Baykal, Asude 20 October 2023 (has links) As Next Generation (NextG) networks become more complex, the need to develop a robust, reliable network traffic prediction framework for intelligent network management increases. This study compares the performance of machine learning models in network traffic prediction using a custom Service-Level Agreement (SLA) - based loss function to ensure SLA violation constraints while minimizing overprovisioning. The proposed SLA-based parametric custom loss functions are used to maintain the SLA violation rate percentages the network operators require. Our approach is multivariate, spatiotemporal, and SLA-driven, incorporating 20 Radio Access Network (RAN) features, custom peak traffic time features, and custom mobility-based clustering to leverage spatiotemporal relationships. In this study, five machine learning models are considered: one recurrent neural network (LSTM) model, two encoder-decoder architectures (Transformer and Autoformer), and two gradient-boosted tree models (XGBoost and LightGBM). The prediction performance of the models is evaluated based on different metrics such as SLA violation rate constraints, overprovisioning, and the custom SLA-based loss function parameter. According to our evaluations, Transformer models with custom peak time features achieve the minimum overprovisioning volume at 3% SLA violation constraint. Gradient-boosted tree models have lower overprovisioning volumes at higher SLA violation rates. / Master of Science / As the Next Generation (NextG) networks become more complex, the need to develop a robust, reliable network traffic prediction framework for intelligent network management increases. This study compares the performance of machine learning models in network traffic prediction using a custom loss function to ensure SLA violation constraints. 
The proposed SLA-based custom loss functions are used to maintain the SLA violation rate percentages required by the network operators while minimizing overprovisioning. Our approach is multivariate, spatiotemporal, and SLA-driven, incorporating 20 Radio Access Network (RAN) features, custom peak traffic time features, and mobility-based clustering to leverage spatiotemporal relationships. We use five machine learning and deep learning models for our comparative study: one recurrent neural network (RNN) model, two encoder-decoder architectures, and two gradient-boosted tree models. The prediction performance of the models was evaluated based on different metrics such as SLA violation rate constraints, overprovisioning, and the custom SLA-based loss function parameter. Cellular traffic prediction 5G and beyond LSTM Transformer Autoformer XGBoost LightGBM
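The abstract in entry 1 does not state the form of its SLA-based loss function. Purely as an illustration of the trade-off it describes, the sketch below uses a pinball (quantile) loss as a stand-in: under-prediction, which would cause an SLA violation, is weighted by `tau`, and over-prediction, which is overprovisioning, by `1 - tau`. The function names, the parameter `tau`, and the two helper metrics are assumptions made for this example and are not taken from the thesis.

```python
def pinball_loss(y_true, y_pred, tau=0.97):
    """Mean pinball (quantile) loss, used here as a stand-in asymmetric loss.

    Under-prediction (actual > predicted) costs tau per unit of error;
    over-prediction costs (1 - tau) per unit. A tau near 1 therefore pushes
    forecasts upward, trading overprovisioning for fewer violations.
    """
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        total += tau * error if error >= 0 else (tau - 1.0) * error
    return total / len(y_true)


def violation_rate(y_true, y_pred):
    """Fraction of points where traffic exceeded the forecast (hypothetical metric)."""
    violations = sum(1 for a, p in zip(y_true, y_pred) if a > p)
    return violations / len(y_true)


def overprovisioning(y_true, y_pred):
    """Total forecast volume above actual traffic (hypothetical metric)."""
    return sum(max(p - a, 0.0) for a, p in zip(y_true, y_pred))
```

With `tau=0.97`, an under-forecast of a given size costs roughly 32 times as much as an over-forecast of the same size, which is one simple way to aim at a low violation rate.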
2	Сравнение реализаций бустинг моделей на различных данных : магистерская диссертация / Comparison of boosting model implementations on different data Онуфриенко, В. И., Onufrienko, V. I. January 2024 (has links) The article compares different implementations of boosting models, such as XGBoost, LightGBM, and CatBoost, on various datasets to assess their effectiveness and accuracy. MASTER'S THESIS BOOSTING MODELS XGBOOST LIGHTGBM CATBOOST DATASETS EFFECTIVENESS ACCURACY
3	Machine Learning for Outcome Prediction of High-Risk Trauma Patients in the Emergency Department Cardosi, Joshua David January 2021 (has links) No description available. Mechanical Engineering machine learning emergency department critical care mortality missing data neural network XGBoost LightGBM
4	Automation of price prediction using machine learning in a large furniture company Ghorbanali, Mojtaba January 2022 (has links) Accurate prediction of product prices can be highly beneficial for procurers, both business-wise and production-wise. Many companies today, across fields of operation and sizes, have access to vast amounts of data from which valuable information can be extracted. In this master thesis, some large databases of products in different categories have been analyzed. Because of confidentiality, the labels from the database that appear in this thesis are replaced by general titles, and the real titles are not mentioned. Also, the company is not referred to by name, but the whole job is carried out on the real data set of products. As a real-world data set, the data was messy and full of nulls and missing data, so the data wrangling took some more time. The approaches used for the model were regression methods and gradient boosting models. The main purpose of this master thesis was to build price prediction models based on the features of each item to assist with the initial positioning of the product and its initial price. The best result achieved during this master thesis came from the XGBoost machine learning model, with about 96% accuracy, which can help the producer accelerate their pricing strategies. Price Prediction Machine learning Regression analysis Gradient boosting algorithms LightGBM XGBoost Computer Sciences
5	Analytisk Studie av Avancerade Gradientförstärkningsalgoritmer för Maskininlärning : En jämförelse mellan XGBoost, CatBoost, LightGBM, SnapBoost, KTBoost, AdaBoost och GBDT för klassificering- och regressionsproblem / Analytical Study of Advanced Gradient Boosting Algorithms for Machine Learning : A comparison of XGBoost, CatBoost, LightGBM, SnapBoost, KTBoost, AdaBoost and GBDT for classification and regression problems Wessman, Filip January 2021 (has links) Machine learning (ML) is today a very relevant, popular and actively researched area. As a result, there exists a large number of advanced, modern ML algorithms. The difficulty is to identify among these the best one to apply to one's area of application. Algorithms based on Gradient Boosting (GB) have been shown to have a very wide range of application areas, flexibility, high prediction performance, and low training and prediction times. The main purpose of this study is to evaluate and illustrate, on classification and regression datasets, the performance differences of 5 modern and 2 older GB algorithms. The goal is to determine which of these modern algorithms performs best on average across several evaluation metrics. Initially, a theoretical pre-study was carried out in the current research area. The algorithms XGBoost, LightGBM, CatBoost, AdaBoost, SnapBoost, KTBoost, and GBDT were implemented on the Google Colab platform. There, their respective training and prediction times were evaluated, as well as the performance metrics: ROC-AUC and Log Loss for classification, and R2 and RMSE for regression. The results showed generally small differences between the tested algorithms, with the exception of AdaBoost, which in general had the worst performance by a larger margin. Thus, it was not possible in this comparison to nominate a clear winner. However, SnapBoost performed very well on several evaluation metrics. The model results are generally very limited and bound to the applied dataset, which makes it very difficult to generalize them to other datasets. This is reflected in the difficulty of identifying an ML framework that excels and performs well in all scenarios. Machine learning Classification Regression XGBoost LightGBM CatBoost AdaBoost SnapBoost KTBoost GBDT ROC-AUC Log Loss R2 RMSE Software Engineering
6	Shoppin’ in the Rain : An Evaluation of the Usefulness of Weather-Based Features for an ML Ranking Model in the Setting of Children’s Clothing Online Retailing / Handla i regnet : En utvärdering av användbarheten av väderbaserade variabler för en ML-rankningsmodell inom onlineförsäljning av barnkläder Lorentz, Isac January 2023 (has links) Online shopping offers numerous benefits, but large product catalogs make it difficult for shoppers to understand the existence and characteristics of every item for sale. To simplify the decision-making process, online retailers use ranking models to recommend products relevant to each individual user. Contextual user data, such as location, time, or local weather conditions, can serve as valuable features for ranking models, enabling personalized real-time recommendations. Little research has been published on the usefulness of weather-based features for ranking models in online clothing retailing, which makes additional research into this topic worthwhile. Using Swedish sales and customer data from Babyshop, an online retailer of children’s fashion, this study examined possible correlations between local weather data and sales. This was done by comparing differences in daily weather and differences in daily shares of sold items per clothing category for two cities: Stockholm and Göteborg. With Malmö as an additional city, historical observational weather data from one location each in the three cities Stockholm, Göteborg, and Malmö was then featurized and used along with the customers’ postal towns, sales features, and sales trend features to train and evaluate the ranking relevancy of a gradient boosted decision trees learning to rank LightGBM ranking model with weather features. The ranking relevancy was compared against a LightGBM baseline that omitted the weather features and a naive baseline: a popularity-based ranker. 
Several possible correlations were found between clothing categories (such as shorts, rainwear, shell jackets, and winter wear) and weather variables (such as feels-like temperature, solar energy, wind speed, precipitation, snow, and snow depth). The ranking relevancy was evaluated using the mean reciprocal rank and the mean average precision @ 10 on a small dataset consisting only of customer data from the postal towns Stockholm, Göteborg, and Malmö, and also on a larger dataset where customers in postal towns from larger geographical areas had their home locations approximated as Stockholm, Göteborg, or Malmö. The LightGBM rankers beat the naive baseline in three out of four configurations, and the ranker with weather features outperformed the LightGBM baseline by 1.1 to 2.2 percent across all configurations. The findings can potentially help online clothing retailers create more relevant product recommendations.
Statistical analysis regression analysis recommender systems ensemble learning electronic commerce LightGBM learning to rank feature selection weather-based features fashion Computer and Information Sciences
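Entry 6 reports ranking relevancy as mean reciprocal rank and mean average precision @ 10. The sketch below shows one common way to compute both for a set of ranked lists. It is an illustration, not the thesis code: the function names are made up for this example, and the normalisation of average precision by `min(len(relevant), k)` is only one of several conventions in use.

```python
def reciprocal_rank(ranked_items, relevant):
    """Return 1 / position of the first relevant item, or 0.0 if none is found."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision_at_k(ranked_items, relevant, k=10):
    """Average of precision values taken at each relevant hit within the top k.

    Normalised by min(len(relevant), k); other conventions exist (assumption).
    """
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    denominator = min(len(relevant), k)
    return precision_sum / denominator if denominator else 0.0


def mean_over_queries(metric, queries, **kwargs):
    """Average a per-query metric over (ranked_items, relevant_set) pairs."""
    return sum(metric(ranked, rel, **kwargs) for ranked, rel in queries) / len(queries)
```

Calling `mean_over_queries(reciprocal_rank, queries)` gives the mean reciprocal rank, and `mean_over_queries(average_precision_at_k, queries, k=10)` gives the mean average precision @ 10.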
7	Predicting House Prices on the Countryside using Boosted Decision Trees / Förutseende av huspriser på landsbygden genom boostade beslutsträd Revend, War January 2020 (has links) This thesis evaluates the feasibility of supervised learning models for predicting house prices in the countryside of South Sweden. It is essential for mortgage lenders to have accurate housing valuation algorithms, and the current model offered by Booli is not accurate enough when valuing residences in the countryside. Different types of boosted decision trees were implemented to address this issue, and their performance was compared to traditional machine learning methods. These supervised learning models were implemented in order to find the best model with regard to relevant evaluation metrics such as root-mean-squared error (RMSE) and mean absolute percentage error (MAPE). The implemented models were ridge regression, lasso regression, random forest, AdaBoost, gradient boosting, CatBoost, XGBoost, and LightGBM. All these models were benchmarked against Booli's current housing valuation algorithm, which is based on a k-NN model. The results indicated that the LightGBM model is the optimal one, as it had the best overall performance with respect to the chosen evaluation metrics. Compared to the benchmark, the LightGBM model performed better overall, with an RMSE of 0.330 against 0.358 for the Booli model, indicating that boosted decision trees have the potential to improve the predictive accuracy of residence prices in the countryside. Machine Learning Predicting House Prices Shrinkage Methods Random Forest Decision Tree AdaBoost Gradient Boosting LightGBM CatBoost XGBoost Probability Theory and Statistics
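Entry 7 compares models by root-mean-squared error (RMSE) and mean absolute percentage error (MAPE). For reference, the two metrics can be sketched with the standard textbook definitions as follows. The function names are chosen for this example, and the abstract does not say on what scale (for instance raw or log-transformed prices) the reported RMSE values were computed.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error: square root of the mean squared residual."""
    squared = [(actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero; this sketch does not guard
    against that case.
    """
    ratios = [abs((actual - predicted) / actual) for actual, predicted in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)
```

RMSE penalises large individual errors more heavily than MAPE, while MAPE is scale-free, which is one reason the two are often reported together for price data.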
8	Predicting profitability of new customers using gradient boosting tree models : Evaluating the predictive capabilities of the XGBoost, LightGBM and CatBoost algorithms Kinnander, Mathias January 2020 (has links) When providing credit online to customers in retail shops, the provider must perform risk assessments quickly and often on the basis of scarce historical data. This can be achieved by automating the process with Machine Learning algorithms. Gradient Boosting Tree algorithms have proven capable in a wide range of application scenarios. However, they have yet to be applied to predicting the profitability of new customers based solely on the customers' first purchases. This study aims to evaluate the predictive performance of the XGBoost, LightGBM, and CatBoost algorithms in this context. The Recall and Precision metrics were used as the basis for assessing the models' performance. The experiment implemented for this study shows that the models display similar capabilities while also being biased towards the majority class. Gradient tree boosting XGBoost LightGBM CatBoost prediction profitability online retail Information Systems, Social aspects
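Entry 8 assesses its models with Recall and Precision and notes a bias towards the majority class. The sketch below computes both metrics from label lists, which makes that kind of bias easy to see: a model that mostly predicts the majority class scores poorly on recall for the minority class. The function name and the choice of `1` as the positive label are assumptions for this example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label.

    Both values fall back to 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall
```

A classifier that always predicts the majority class gets zero recall on the minority class despite high raw accuracy, so these two metrics are more informative than accuracy on imbalanced data.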
9	Predicting Location-Dependent Structural Dynamics Using Machine Learning Zink, Markus January 2022 (has links) Machining chatter is an undesirable phenomenon of material removal processes and is hard to control or avoid. Its occurrence and extent essentially depend on the kinematics of the machine tool, which change with the position of the Tool Centre Point. Chatter has been researched widely, but rarely with respect to changing structural dynamics during manufacturing. This thesis applies intelligent methods to learn the underlying functions of the modal parameters (natural frequency, damping ratio, and mode shape) and defines the dynamic properties of a system for the first time to this extent. To do so, it comprises three steps: first, the elaboration of the necessary dynamic parameters; second, the acquisition of the data via a simulation; and third, the prediction of the modal parameters with two kinds of Machine Learning techniques: Gradient Boosting Machine and Multilayer Perceptron. In total, it investigates three types of kinematics: cross bed, gantry, and overhead gantry. It becomes apparent that Light Gradient Boosting Machine outperforms Multilayer Perceptron throughout all studies. It achieves a prediction error of at most 1.7 % for natural frequency and damping ratio for all kinematics. However, it cannot yet reliably predict the participation factor, which might be due to the complexity and size of the data. As expected, the error rises with noisy data and fewer measurement points, but to a tenable extent for both natural frequency and damping ratio. machine learning artificial intelligence gradient boosting LightGBM multilayer perceptron prediction chatter vibration structural dynamics modal parameters machine tool tool centre point work envelope Mechanical Engineering
10	Utilizing Primary Health Care Data for Early Detection of Colorectal Cancer: A Machine Learning Approach / Användning av primärvårdsdata för tidig upptäckt av kolorektalcancer: Ett maskininlärningsperspektiv Eivinsson, Tova January 2024 (has links) Colorectal cancer (CRC) is a health challenge worldwide, and early detection of the disease is crucial to improve patient prognosis. The first contact with care commonly occurs in primary care centers, where general practitioners often face the challenge of distinguishing CRC from other diseases with similar symptoms. In this master thesis, patient records from primary care were used to create, optimize, and evaluate a machine learning model that classifies patients with CRC for early detection of the disease. The data used in the project included parts of electronic health records (EHRs) from both public (SLSO) and privately run (Capio and Praktikertjänst) primary care centers in the Stockholm region. The available dataset was cleaned and pre-processed, and then tested on four separate models. After selecting and optimizing the most promising model, LightGBM, a detailed evaluation of the model was performed. To simulate realistic clinical conditions, data from the three months prior to diagnosis were excluded from two of the datasets. The results were then compared with a baseline machine learning model that utilized ICD codes extracted from EHRs in primary care for early detection of CRC. The results showed that the final model had generally good performance, with a maximum AUROC score of 85.8%, which indicates a very good ability to distinguish between the classes. The performance dropped when using the datasets with 3 months of data removed, but the ROC curves still showed a better ability than random classification to distinguish between the classes, with a maximum AUROC score of 60.8%. The results also showed that the model developed in this master thesis outperforms the baseline model, which was based on ICD codes. For future development and before a possible clinical implementation, a larger data set should be used for training and testing. Machine Learning Artificial Intelligence LightGBM Ensemble Learning Colorectal Cancer Primary Health Care Electronic Health Records Cancer Prediction Medical Engineering Other Computer and Information Science

Search results