151
Handling Imbalanced Data Classification With Variational Autoencoding And Random Under-Sampling Boosting. Ludvigsen, Jesper, January 2020.
In this thesis, a comparison of three pre-processing methods for imbalanced classification data is conducted. A Variational Autoencoder (VAE), Random Under-Sampling Boosting (RUSBoost) and a hybrid of the two are applied to three classification data sets with different degrees of class imbalance. A logistic regression (LR) model is fitted to each pre-processed data set and, based on its classification performance, the pre-processing methods are evaluated. All three methods show indications of different advantages when handling class imbalance. For each pre-processed data set, the LR model is better at correctly classifying minority-class observations than an LR model fitted to the original, imbalanced data. Evaluating overall classification performance, both VAE and RUSBoost improve the results, while the hybrid method performs worse on the moderately imbalanced data and best on the highly imbalanced data.
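A minimal sketch of the evaluation protocol described above is given below: fit an LR model to the original data and to a re-balanced version, then compare minority-class recall. Plain random under-sampling from the imbalanced-learn package stands in for the thesis's VAE and RUSBoost pipelines, and the synthetic data set is an assumption for illustration only.

```python
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with a roughly 95/5 class imbalance (illustrative stand-in for the thesis data).
X, y = make_classification(n_samples=5000, weights=[0.95], flip_y=0.02, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: LR fitted to the original imbalanced training data.
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Pre-processed: LR fitted after random under-sampling of the majority class.
X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X_tr, y_tr)
resampled = LogisticRegression(max_iter=1000).fit(X_rus, y_rus)

# Minority-class recall is the comparison the abstract highlights.
print("recall, original data:      ", recall_score(y_te, baseline.predict(X_te)))
print("recall, under-sampled data: ", recall_score(y_te, resampled.predict(X_te)))
```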
152
Forecasting anomalies in time series data from online production environments. Sseguya, Raymond, January 2020.
Anomaly detection on time series forecasts can be used by many industries, especially in forewarning systems that can predict anomalies before they happen. Infor (Sweden) AB is a software company that provides Enterprise Resource Planning cloud solutions. Infor is interested in predicting anomalies in its data, which is the motivation for this thesis work. The general idea is to first forecast the time series and then detect and classify anomalies on the forecasted values. In this thesis, the time series forecasting used to predict anomalous behaviour is done with two strategies, the recursive strategy and the direct strategy. The recursive strategy includes two methods: AutoRegressive Integrated Moving Average and Neural Network AutoRegression. The direct strategy is done with ForecastML-eXtreme Gradient Boosting. The three methods are then compared in terms of forecasting performance. The anomaly detection and classification are done by setting a decision rule based on a threshold. Since the true anomaly thresholds were not previously known, an arbitrary initial anomaly threshold is set by combining statistical methods for outlier detection with human judgement by the company commissioners. These statistical methods include Seasonal and Trend decomposition using Loess + InterQuartile Range, Twitter + InterQuartile Range and Twitter + GESD (Generalized Extreme Studentized Deviate). After defining what an anomaly threshold is in the usage context of Infor (Sweden) AB, a decision rule is set and used to classify anomalies in the time series forecasts. The comparison of the classifications produced by the three forecasting methods is inconclusive, and no recommendation is made concerning which model or algorithm Infor (Sweden) AB should use. The thesis concludes by recommending other methods that can be tried in future research.
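The recursive forecast-then-threshold idea can be sketched in a few lines. The ARIMA order, the synthetic series and the 1.5 x IQR rule below are assumptions for illustration; the thesis combines several statistical outlier methods with the commissioners' judgement rather than a single fixed rule.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Synthetic hourly metric with a weak trend (illustrative stand-in for the production data).
history = np.cumsum(rng.normal(0.05, 1.0, size=500))

# Recursive-strategy forecast with an ARIMA model (the (1, 1, 1) order is an assumption).
fit = ARIMA(history, order=(1, 1, 1)).fit()
forecast = fit.forecast(steps=24)

# Decision rule: an arbitrary initial threshold from the interquartile range of the history.
q1, q3 = np.percentile(history, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Classify each forecasted value as anomalous or not.
anomalies = (forecast < lower) | (forecast > upper)
print("forecast steps flagged as anomalous:", int(anomalies.sum()))
```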
153
Prediction of Credit Risk using Machine Learning Models. Isaac, Philip, January 2022.
This thesis investigates different machine learning (ML) models and their performance in order to find the best-performing model for predicting credit risk at a specific company. Since granting credit to corporate customers is part of this company's core business, managing credit risk is of high importance. The company today has only one credit risk measurement, obtained through an external company, and the goal is to find a model that outperforms this measurement. The study covers two ML models, Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost). This thesis shows that both methods perform better than the external risk measurement, with the LR method achieving the best overall performance. One of the most important analyses in this thesis was handling the dataset and finding the best-suited combination of features for the ML models.
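A sketch of this kind of benchmark, comparing logistic regression and XGBoost on a ranking metric, is shown below. The synthetic default data, the hyperparameters, and the use of ROC-AUC are assumptions for illustration; the thesis's actual features, target definition and external risk score are not public.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic, imbalanced default data as a stand-in for the company's customer data.
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.9], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "xgboost": XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.1,
                             eval_metric="logloss"),
}

# Compare the two models on ROC-AUC, a common ranking metric for credit risk models.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```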
154
Stability method for the dimensioning of stopes obtained through the Gradient Boosting Machine algorithm considering the incorporation of active stresses in underground mining. Camacho Cosio, Hernán, 23 May 2020.
Over the last four decades, Mathews' graphical stability method has been an indispensable tool for the dimensioning of stopes, characterized by its cost efficiency and its savings in time and effort. The contributions of several authors to optimizing its performance have produced a series of criteria that address an increasingly wide range of scenarios. However, with the diversification of mining across different geological contexts and the need to work at greater depths, the graphical stability method has been shown to neglect scenarios with the presence of water and different confinement regimes. For this reason, the present research seeks to incorporate such scenarios by means of the Gradient Boosting Machine (GBM) algorithm. To this end, scenarios with different levels of water pressure were simulated and the degree of confinement around the excavations was considered. The model was based on a binary classification criterion, the predicted classes being "stable" and "unstable", and obtained an AUC value of 0.88, demonstrating the excellent predictive capacity of the GBM model. The advantages over the traditional method were also demonstrated, since a component of rigour and generalization is added. Finally, the result is a stability method that incorporates active stresses and shows adequate predictive performance. / Research work
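A minimal sketch of the kind of binary stable/unstable classifier the abstract describes is given below, using scikit-learn's gradient boosting and an AUC evaluation. The feature set, the labelling rule and the synthetic data are assumptions for illustration; the thesis builds its model from simulated scenarios with varying water pressure and confinement.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical stope features: hydraulic radius, stability number, water pressure, confinement.
X = np.column_stack([
    rng.uniform(2, 15, n),    # hydraulic radius (m)
    rng.uniform(0.1, 60, n),  # stability number N
    rng.uniform(0, 1.5, n),   # water pressure (MPa)
    rng.uniform(0, 1, n),     # confinement index
])
# Illustrative labelling rule: 1 = stable, 0 = unstable (not the thesis's simulated data).
y = (X[:, 1] / X[:, 0] - 2.0 * X[:, 2] + rng.normal(0, 1, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3)
gbm.fit(X_tr, y_tr)

# AUC on held-out stopes, the metric the abstract reports (0.88 for the thesis model).
print("AUC:", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
```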
155
Road-traffic accident prediction model: Predicting the Number of Casualties. Andeta, Jemal Ahmed, January 2021.
Efficient and effective road-traffic prediction and management techniques are crucial in intelligent transportation systems. They can positively influence road development, safety enhancement, regulation formulation and route planning, helping to protect lives from road-traffic accidents in advance. This thesis considers road safety by predicting the number of casualties if an accident occurs, using multiple traffic-accident attributes. This can help individual drivers and traffic offices adjust and control the factors they contribute to an accident before one occurs. Three candidate algorithms with different regression-fit patterns are proposed and evaluated: an ensemble, a linear and a non-linear fitting pattern. The selected algorithms are gradient boosting machines (GBoost) on the ensemble side, linear support vector regression (LinearSVR) on the linear side and extreme learning machines (ELM) on the non-linear side. The RMSE and MAE performance metrics are used to evaluate the models. GBoost achieved better performance than the other two, with a low error rate and the narrowest 95% prediction interval. A SHAP (SHapley Additive exPlanations) interpretation technique is applied to interpret each model at the global level using SHAP's beeswarm plots. Finally, suggestions for future improvements concerning the dataset and hyperparameter tuning are presented.
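The sketch below shows one common way to obtain both the point errors (RMSE, MAE) and a 95% prediction interval with gradient boosting: train two extra quantile models for the interval bounds. The synthetic data and hyperparameters are assumptions for illustration; the thesis may construct its intervals differently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic accident features and casualty counts (illustrative only, not the thesis data).
X, y = make_regression(n_samples=3000, n_features=10, noise=10.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Point prediction plus lower and upper quantile models for a 95% prediction interval.
point = GradientBoostingRegressor(random_state=2).fit(X_tr, y_tr)
lower = GradientBoostingRegressor(loss="quantile", alpha=0.025, random_state=2).fit(X_tr, y_tr)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.975, random_state=2).fit(X_tr, y_tr)

pred = point.predict(X_te)
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("MAE: ", mean_absolute_error(y_te, pred))
print("mean 95% interval width:", np.mean(upper.predict(X_te) - lower.predict(X_te)))
```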
156
Variable selection in discrete survival models. Mabvuu, Coster, 27 February 2020.
MSc (Statistics) / Department of Statistics / Selection of variables is vital in high-dimensional statistical modelling as it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to a complicated data structure. Survival data might have unobserved heterogeneity, leading to biased estimates when not taken into account, and conventional variable selection methods have stability problems. A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that the Lasso performs better than gradient boosting, and frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage and did not select exactly the same variables; gradient boosting retained more variables in the model. Based on the Lasso, place of residence, highest educational level attained and age cohort are the major influential factors for age at first marriage in Zimbabwe. / NRF
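In a discrete survival model the hazard in each time interval is typically modelled with a logistic link on person-period data, so Lasso-type selection can be sketched as an L1-penalised logistic regression whose zero coefficients mark the dropped covariates. The synthetic data, the covariate names and the penalty strength below are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative person-period style data: the event indicator per discrete time interval.
# Only a few of the 15 candidate covariates are truly informative.
X, y = make_classification(n_samples=5000, n_features=15, n_informative=4,
                           n_redundant=0, weights=[0.9], random_state=3)
names = [f"x{i}" for i in range(X.shape[1])]  # hypothetical covariate names

# Lasso-type selection: L1-penalised logistic regression for the discrete hazard.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=2000)
lasso.fit(X, y)

# Covariates with non-zero coefficients are the ones the Lasso retains.
selected = [n for n, c in zip(names, lasso.coef_.ravel()) if abs(c) > 1e-8]
print("variables retained by the Lasso:", selected)
```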
157
Machine Learning with Conformal Prediction for Predictive Maintenance Tasks in Industry 4.0: Data-driven Approach. Liu, Shuzhou; Mulahuko, Mpova, January 2023.
This thesis is a cooperation with Knowit, Östrand & Hansen, and Orkla. It explores the application of Machine Learning and Deep Learning models with Conformal Prediction for a predictive maintenance situation at Orkla. Predictive maintenance is essential in numerous industrial manufacturing scenarios: it can help reduce machine downtime, improve equipment reliability, and save unnecessary costs. In this thesis, various Machine Learning and Deep Learning models, including Decision Tree, Random Forest, Support Vector Regression, Gradient Boosting, and Long Short-Term Memory (LSTM), are applied to a real-world predictive maintenance dataset. The Orkla dataset was originally planned to be used in this project; however, due to challenges encountered and time limitations, a NASA C-MAPSS dataset with a similar data structure was chosen to study how Machine Learning models can predict the remaining useful lifetime (RUL) in manufacturing. In addition, conformal prediction, a recently developed framework for measuring the prediction uncertainty of Machine Learning models, is integrated into the models for more reliable RUL prediction. The results show that both the Machine Learning and Deep Learning models with conformal prediction predict RUL close to the true RUL, with LSTM outperforming the Machine Learning models. The conformal prediction intervals also provide informative and reliable information about the uncertainty of the predictions, which can help factory personnel take necessary maintenance actions in advance. Overall, this thesis demonstrates the effectiveness of Machine Learning and Deep Learning models with Conformal Prediction for predictive maintenance. Based on the modelling results on the NASA dataset, some insights are also discussed on how to transfer these experiences to Orkla's data for RUL prediction in the future.
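Split conformal prediction, the simplest variant of the framework mentioned above, wraps any point regressor in intervals calibrated on held-out residuals. The sketch below uses a gradient boosting regressor and synthetic data as stand-ins; the underlying model, the data and the 90% level are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic sensor features and RUL targets (illustrative stand-in for C-MAPSS).
X, y = make_regression(n_samples=4000, n_features=14, noise=15.0, random_state=4)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=4)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=4)

model = GradientBoostingRegressor(random_state=4).fit(X_fit, y_fit)

# Split conformal prediction: the (1 - alpha) quantile of absolute calibration residuals
# gives a half-width that is added around every new point prediction.
alpha = 0.1
residuals = np.abs(y_cal - model.predict(X_cal))
n = len(residuals)
q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

pred = model.predict(X_te)
lower, upper = pred - q, pred + q
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"empirical coverage at 90% nominal: {coverage:.3f}, interval half-width: {q:.1f}")
```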
158
Model Risk Management and Ensemble Methods in Credit Risk Modeling. Sexton, Sean, January 2022.
The number of statistical and mathematical credit risk models that financial institutions use and manage has steadily increased in recent years due to international and domestic regulatory pressures. This thesis examines the evolution of model risk management and provides guidance on how to effectively build and manage different bagging and boosting machine learning techniques for estimating expected credit losses. It examines the pros and cons of these machine learning models and benchmarks them against more conventional models used in practice. It also examines methods for improving their interpretability in order to gain comfort and acceptance from auditors and regulators. To the best of the author's knowledge, there are no academic publications which review, compare, and provide effective model risk management guidance on these machine learning techniques for the purpose of estimating expected credit losses. This thesis is intended for academics, practitioners, auditors, and regulators working in the model risk management and expected credit loss forecasting space. / Dissertation / Doctor of Philosophy (PhD)
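As context for how such models feed an expected credit loss figure, the sketch below estimates a probability of default (PD) with one bagging and one boosting model and combines it with loss given default (LGD) and exposure at default (EAD) in the standard ECL = PD x LGD x EAD decomposition. The synthetic portfolio and the fixed LGD and EAD values are assumptions for illustration, not the thesis's methodology.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic obligor data; the LGD and EAD values below are illustrative assumptions only.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)
lgd, ead = 0.45, 100_000.0  # assumed loss given default and exposure at default

models = {
    "bagging (random forest)": RandomForestClassifier(n_estimators=300, random_state=5),
    "boosting (gradient boosting)": GradientBoostingClassifier(random_state=5),
}

# Expected credit loss per obligor is PD x LGD x EAD; aggregate it over the test portfolio.
for name, model in models.items():
    pd_hat = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    ecl = np.sum(pd_hat * lgd * ead)
    print(f"{name}: portfolio ECL = {ecl:,.0f}")
```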
159
A Cloud-Based Intelligent and Energy Efficient Malware Detection Framework. A Framework for Cloud-Based, Energy Efficient, and Reliable Malware Detection in Real-Time Based on Training SVM, Decision Tree, and Boosting using Specified Heuristics Anomalies of Portable Executable Files. Mirza, Qublai K.A., January 2017.
The continuing financial and other related losses due to cyber-attacks prove the substantial growth of malware and their lethal proliferation techniques. Every successful malware attack highlights the weaknesses in the defence mechanisms responsible for securing the targeted computer or network. Recent cyber-attacks reveal the presence of sophistication and intelligence in malware behaviour, with the ability to conceal their code and operate within the system autonomously. Conventional detection mechanisms not only fall short in their malware detection capabilities, they also consume a large amount of resources while scanning for malicious entities in the system. Many recent reports have highlighted this issue along with the challenges faced by alternative solutions and studies conducted in the same area. There is an unprecedented need for a resilient and autonomous solution that takes a proactive approach against modern malware with stealth behaviour. This thesis proposes a multi-aspect solution comprising an intelligent malware detection framework and an energy-efficient hosting model. The malware detection framework is a combination of conventional and novel malware detection techniques. The proposed framework incorporates comprehensive feature heuristics of files generated by a bespoke static feature extraction tool. These comprehensive heuristics are used to train the machine learning algorithms Support Vector Machine, Decision Tree, and Boosting to differentiate between clean and malicious files. The two techniques, feature heuristics and machine learning, are combined to form a two-factor detection mechanism. This thesis also presents a cloud-based, energy-efficient and scalable hosting model, which combines multiple infrastructure components of Amazon Web Services to host the malware detection framework. The hosting model follows a client-server architecture, where the client is a lightweight service running on the host machine and the server is based in the cloud. The proposed framework and the hosting model were evaluated individually and in combination through specifically designed experiments using separate repositories of clean and malicious files. The experiments were designed to evaluate the malware detection capabilities and energy efficiency while operating within a system. The proposed malware detection framework and the hosting model showed significant improvement in malware detection while consuming very low CPU resources during operation.
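One way to realise the combination of heuristics and machine learning described above is to train the three classifiers on the same static features and let them vote. The sketch below does this with scikit-learn; the feature heuristics, the labelling rule and the synthetic samples are hypothetical stand-ins, since the thesis's bespoke extraction tool and malware repositories are not public.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n = 3000
# Hypothetical static PE-file heuristics: section entropy, import count, packer flag, size ratio.
X = np.column_stack([
    rng.uniform(0, 8, n),      # mean section entropy
    rng.integers(0, 400, n),   # number of imported functions
    rng.integers(0, 2, n),     # packer detected (0/1)
    rng.uniform(0, 1, n),      # .text to file-size ratio
])
# Illustrative labelling rule (1 = malicious), not derived from a real malware corpus.
y = ((X[:, 0] > 6.5) | ((X[:, 2] == 1) & (X[:, 1] < 20))).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)

# Three detectors trained on the same heuristics, combined by majority vote.
ensemble = VotingClassifier(estimators=[
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("tree", DecisionTreeClassifier(max_depth=6)),
    ("boost", AdaBoostClassifier(n_estimators=200)),
], voting="hard")
ensemble.fit(X_tr, y_tr)
print("hold-out accuracy:", ensemble.score(X_te, y_te))
```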
160
Predicting Location-Dependent Structural Dynamics Using Machine Learning. Zink, Markus, January 2022.
Machining chatter is an undesirable phenomenon of material removal processes and is hard to control or avoid. Its occurrence and extent essentially depend on the kinematics of the machine tool, which change with the position of the Tool Centre Point. Chatter has been researched widely, but rarely with respect to structural dynamics that change during manufacturing. This thesis applies intelligent methods to learn the underlying functions of the modal parameters (natural frequency, damping ratio, and mode shape) and is the first to define the dynamic properties of a system at this extent. It embraces three steps: first, the elaboration of the necessary dynamic parameters; second, the acquisition of the data via a simulation; and third, the prediction of the modal parameters with two kinds of Machine Learning techniques: Gradient Boosting Machine and Multilayer Perceptron. In total, three types of kinematics are investigated: cross bed, gantry, and overhead gantry. It becomes apparent that Light Gradient Boosting Machine outperforms the Multilayer Perceptron throughout all studies. It achieves a prediction error of at most 1.7% for natural frequency and damping ratio for all kinematics. However, it cannot yet reliably predict the participation factor, which might originate in the complexity of the data and the data size. As expected, the error rises with noisy data and fewer measurement points, but to a tolerable extent for both natural frequency and damping ratio.
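A minimal sketch of the winning setup, a LightGBM regressor mapping the Tool Centre Point position to a modal parameter and scored by percentage error, is given below. The synthetic workspace, the frequency function and the hyperparameters are assumptions for illustration; the thesis trains on simulated modal data for the three kinematics.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 4000
# Hypothetical inputs: Tool Centre Point position (x, y, z) within the machine workspace (m).
X = rng.uniform([0, 0, 0], [2.0, 1.0, 0.8], size=(n, 3))
# Illustrative position-dependent natural frequency (Hz); not the thesis's simulated data.
y = 120 + 15 * np.sin(2 * X[:, 0]) - 10 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 1.0, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
model = LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)

# Percentage error on unseen positions, comparable to the ~1.7 % figure quoted above.
mape = mean_absolute_percentage_error(y_te, model.predict(X_te))
print(f"mean absolute percentage error: {100 * mape:.2f} %")
```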