  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Extracting Rules from Trained Machine Learning Models with Applications in Bioinformatics / 機械学習モデルからの知識抽出と生命情報学への応用

Liu, Pengyu 24 May 2021 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. 甲第23397号 / Report No. 情博第766号 / 新制||情||131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examining committee: Prof. Tatsuya Akutsu (chair), Prof. Akihiro Yamamoto, Prof. Hisashi Kashima / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
22

Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods

Al-Mter, Yusur January 2020 (has links)
Heart rate variability (HRV) is the time variation between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and parasympathetic nervous systems. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis focuses on ECG recordings during nocturnal rest to estimate the influence of HRV in predicting the age decade of healthy individuals. Time-domain, frequency-domain, and non-linear methods are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy in capturing the actual class was relatively low (below 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, random forest achieved the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled, used as a test set, and the performance was compared to other related research outputs.
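The time-domain HRV features mentioned above can be computed directly from a series of RR intervals. A minimal stdlib-only sketch of two standard features, SDNN and RMSSD (the thesis's exact feature set is not reproduced here; the RR values below are illustrative):

```python
import math
import statistics

def sdnn(rr_ms):
    """Standard deviation of RR intervals (SDNN), a time-domain HRV feature
    reflecting overall variability."""
    return statistics.stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive differences (RMSSD), reflecting
    short-term, parasympathetically driven variability."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

rr = [800, 810, 790, 820, 805]  # RR intervals in milliseconds
print(round(sdnn(rr), 2), round(rmssd(rr), 2))  # → 11.18 20.16
```

Frequency-domain and non-linear features (e.g. LF/HF power, sample entropy) require more machinery but start from the same RR series.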
23

Predicting Risk of Delays in Postal Deliveries with Neural Networks and Gradient Boosting Machines / Predicering av risk för förseningar av leveranser med neurala nätverk och gradient boosting machines

Söderholm, Matilda January 2020 (has links)
This thesis conducts a study on a data set from the Swedish and Danish postal service Postnord, comparing an artificial neural network (ANN) and a gradient boosting machine (GBM) for predicting delays in package deliveries. The models are evaluated based on the F1-score for the important class, which represents the data points that are delayed and need to be identified. The GBM is already implemented and tuned using grid search by Postnord; the ANN is tuned using sequential model-based optimization with the tree-structured Parzen estimator. Furthermore, it is trained using dynamic resampling to handle the imbalanced data set. Even with several measures implemented to handle the class imbalance, the ANN performs poorly when tested on unseen data, unlike the GBM. The GBM has high precision (84%) and decent recall (24%), which produces an F1-score of 0.38. The ANN has high recall (62%) but extremely low precision (5%), which gives an F1-score of 0.08, indicating that it is biased toward predicting samples as delayed when they are on time. The GBM handles class imbalance naturally, unlike the ANN, and even with measures taken to improve the ANN and its handling of class imbalance, the GBM performs better.
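The F1-scores quoted above are the harmonic mean of precision and recall; plugging in the rounded percentages reproduces the reported figures to within rounding. A quick check:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; punishes an imbalance between
    the two far more than an arithmetic mean would."""
    return 2 * precision * recall / (precision + recall)

# GBM: high precision, modest recall
print(round(f1_score(0.84, 0.24), 2))  # → 0.37 (reported as 0.38 from unrounded inputs)
# ANN: high recall, but very low precision drags the score down
print(round(f1_score(0.62, 0.05), 2))  # → 0.09 (reported as 0.08 from unrounded inputs)
```

This is why the ANN's high recall does not save it: the harmonic mean is dominated by the smaller of the two quantities.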
24

Investigating the Impact of Air Pollution, Meteorology, and Human Mobility on Excess Deaths during COVID-19 in Quito : A Correlation, Regression, Machine Learning, and Granger Causality Analysis

Tariq, Waleed, Naqvi, Sehrish January 2023 (has links)
Air pollution and meteorological conditions impact COVID-19 mortality rates. This research studied Quito, Ecuador, using Granger causality tests and regression models to investigate the relationship between pollutants, meteorological variables, human mobility, and excess deaths. The results suggest that mobility, as measured by the Google Mobility Index and the Facebook Isolation Index, along with nitrogen dioxide and sulphur dioxide, significantly impacts excess deaths, while carbon monoxide and relative humidity show mixed results. Measures to reduce carbon monoxide emissions and increase humidity levels may mitigate the impact of air pollution on COVID-19 mortality rates. Further research is needed to investigate the impact of pollutants on COVID-19 transmission in other locations. Healthcare decision-makers must monitor and mitigate the impact of pollutants, promote healthy air-quality policies, and encourage physical activity in safe environments. They must also consider meteorological conditions and implement measures such as increased ventilation and air conditioning to reduce exposure. Additionally, they must consider human mobility and reduce it to slow the spread of the disease. Decision-makers must monitor and track excess deaths during the pandemic to understand the impact of pollutants, meteorological conditions, and human mobility on human health. Public education is critical to raising awareness of air quality and its impact on health. Encouraging individuals to reduce their exposure to pollutants and adverse meteorological conditions can play a critical role in mitigating the impact of air pollution on respiratory health during the pandemic.
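A Granger causality test of the kind used above asks whether lagged values of one series improve a forecast of another beyond the latter's own lags. A numpy-only sketch of the underlying F-test (the study itself would typically use a library implementation such as statsmodels; the simulated series below are illustrative, not the Quito data):

```python
import numpy as np

def granger_f_test(y, x, lag=2):
    """F statistic for whether lagged x improves prediction of y
    beyond y's own lags.

    Restricted model:   y_t ~ const + y_{t-1..t-lag}
    Unrestricted model: y_t ~ const + y_{t-1..t-lag} + x_{t-1..t-lag}
    Large values suggest x Granger-causes y."""
    n = len(y)
    rows = n - lag
    Y = y[lag:]
    # Lag-k columns, aligned so row i holds values at time lag+i-k
    Ylags = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    Xlags = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
    ones = np.ones((rows, 1))
    R = np.hstack([ones, Ylags])            # restricted design
    U = np.hstack([ones, Ylags, Xlags])     # unrestricted design
    rss_r = float(np.sum((Y - R @ np.linalg.lstsq(R, Y, rcond=None)[0]) ** 2))
    rss_u = float(np.sum((Y - U @ np.linalg.lstsq(U, Y, rcond=None)[0]) ** 2))
    df_den = rows - U.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / df_den)

# Simulated example: y is driven by x at lag 1, so the F statistic
# should be large for x -> y and near 1 for the reverse direction.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
print(granger_f_test(y, x), granger_f_test(x, y))
```

In practice the F statistic is compared against an F(lag, df_den) distribution to obtain a p-value, and the test is repeated across several lags.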
25

A Framework for Defining, Measuring, and Predicting Service Procurement Savings

Berggren, Oliver, Matti, Zina January 2021 (has links)
Recent technical advances have paved the way for transformations such as Industry 4.0, Supply Chain 4.0, and new ways for organizations to utilize services to meet the needs of people. In the midst of this shift, focus has been put on service procurement to meet the demand for everything from cloud computing and information technology to software solutions that support operations or add value to the end customer. Procurement is an integral part of organizations and typically accounts for a substantial part of their costs. Analyzing savings is one of the primary ways of measuring cost reduction and performance.  This paper examines how savings can be defined and measured in a unifying way, and determines whether machine learning can be used to predict service purchase costs. Semi-structured interviews were utilized to find definitions and measurements. Three decision-tree ensemble machine learning models, XGBoost, LightGBM, and CatBoost, were evaluated to study cost prediction.  The results indicate that cost reduction and cost avoidance should be seen as a financial and a performance measure, respectively. Spend and capital binding can be controlled by a budget reallocation system and could be improved further with machine-learning cost prediction.  The best-performing model was XGBoost, with a MAPE of 14.17% compared to the base model's MAPE of 40.24%. This suggests that budget setting and negotiation can be aided by more accurately predicting cost through machine learning, which in turn can have a positive impact on an organization's resource allocation and profitability. / Nya teknologiska framsteg har gett upphov till transformationer som Industri 4.0, Supply Chain 4.0 och nya sätt för organisationer att använda tjänster för att möta människors behov. Från denna förändring har fokus hamnat på tjänsteupphandling för att möta efterfrågan på allt från molntjänster och informationsteknologi till mjukvarulösningar som stödjer operationer eller skapar värde för slutkunder.
Upphandling är en väsentlig del av organisationer och utgör oftast en stor del av deras kostnader. Att mäta besparingar är ett av de primära sätten att driva kostnadsreducering och prestanda. Detta arbete utforskar hur besparingar kan definieras och mätas på ett förenande sätt och undersöker om maskininlärning kan användas för att predicera tjänsteinköpskostnader. Semistrukturerade intervjuer hölls för att hitta definitioner och mått. Tre maskininlärningsmodeller, XGBoost, LightGBM och CatBoost, utvärderades för att studera kostnadsprediktion.   XGBoost presterade bäst med MAPE 14,17 %, jämfört med basmodellens MAPE på 40,24 %. Detta tyder på att budgetsättning och förhandling kan stödjas av maskininlärning genom att mer precist predicera kostnader, vilket i sin tur kan ha en positiv påverkan på en organisations resursallokering och lönsamhet.
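MAPE, the metric used above to compare XGBoost against the base model, is the mean of the absolute errors taken relative to the actual values. A minimal sketch (the thesis's data and base model are not reproduced; the cost figures below are made up):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# Illustrative purchase costs: predictions off by 25% on each item
print(mape([128, 256], [160, 320]))  # → 25.0
```

Because each error is scaled by the actual cost, MAPE lets purchases of very different sizes be compared on a single percentage scale, which suits a budget-setting context.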
26

Técnicas de machine learning aplicadas na recuperação de crédito do mercado brasileiro

Forti, Melissa 08 August 2018 (has links)
A necessidade de conhecer o cliente sempre foi um diferencial para o mercado e nestes últimos anos vivenciamos um crescimento exponencial de informações e técnicas que promovem a avaliação para todas as fases do ciclo de crédito, desde a prospecção até a recuperação de dívidas. Nesse contexto, as empresas estão investindo cada vez mais em métodos de Machine Learning para que possam extrair o máximo de informações e assim terem processos mais assertivos e rentáveis. No entanto, essas técnicas possuem ainda alguma desconfiança no ambiente financeiro. Diante desse contexto, o objetivo desse trabalho foi aplicar as técnicas de Machine Learning: Random Forest, Support Vector Machine e Gradient Boosting para um banco de dados real de cobrança, a fim de identificar os clientes mais propensos a quitar suas dívidas (Collection Score) e comparar a acurácia e interpretação desses modelos com a metodologia tradicional de Regressão Logística. A principal contribuição desse trabalho está relacionada com a comparação das técnicas em um cenário de recuperação de crédito considerando as principais características, vantagens e desvantagens.
/ The need to know the customer has always been a differential in the market, and in recent years we have experienced exponential growth in the information and techniques that support this evaluation across all phases of the credit cycle, from prospecting to debt recovery. In this context, companies are increasingly investing in Machine Learning methods so that they can extract the maximum information and thus run more assertive and profitable processes. However, these models are still met with considerable distrust in the financial environment. Given this need and uncertainty, the objective of this work was to apply the Machine Learning techniques Random Forest, Support Vector Machine, and Gradient Boosting to a real collection database in order to identify the clients most likely to settle their debts (Collection Score) and to compare the accuracy and interpretability of these models with the classical logistic regression methodology. The main contribution of this work is the comparison of these techniques, and of their suitability for this application, considering their main characteristics, pros, and cons.
27

Strategies for Combining Tree-Based Ensemble Models

Zhang, Yi 01 January 2017 (has links)
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal is to develop a strategy for combining ensemble classifiers that yields higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods (random forest, extremely randomized trees, and the eXtreme gradient boosting model) were used to generate a set of base models. Outputs from classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public-domain data sets that have been extensively used for benchmarking classification models. The research established that applying random forest as the final ensemble method, integrating the selected base models and the factor scores from multiple correspondence analysis, was the best ensemble approach.
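The combination step described above can be illustrated with the simplest possible combiner, majority voting over aligned base-model predictions. Note this is only the plumbing: the dissertation's best-performing approach instead trains a random forest on the base-model outputs plus MCA factor scores, which this stdlib-only sketch does not attempt.

```python
from collections import Counter

def majority_vote(base_predictions):
    """Combine per-sample class predictions from several base models.

    base_predictions: list of lists, one inner list per base model,
    aligned by sample index. The most common label per sample wins."""
    combined = []
    for sample_preds in zip(*base_predictions):
        combined.append(Counter(sample_preds).most_common(1)[0][0])
    return combined

# Three base models (e.g. random forest, extra trees, XGBoost) on four samples
rf_preds = [0, 1, 1, 0]
ert_preds = [0, 1, 0, 0]
xgb_preds = [1, 1, 1, 0]
print(majority_vote([rf_preds, ert_preds, xgb_preds]))  # → [0, 1, 1, 0]
```

A trained meta-learner (stacking) replaces the fixed vote with a learned function of the base outputs, which is what allows it to exploit base models that are accurate in different regions of the feature space.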
28

Forecasting anomalies in time series data from online production environments

Sseguya, Raymond January 2020 (has links)
Anomaly detection on time series forecasts can be used by many industries, especially in forewarning systems that predict anomalies before they happen. Infor (Sweden) AB is a software company that provides Enterprise Resource Planning cloud solutions. Infor is interested in predicting anomalies in their data, which motivates this thesis work. The general idea is first to forecast the time series and then to detect and classify anomalies on the forecasted values. In this thesis, the time series forecasting to predict anomalous behaviour is done using two strategies, namely the recursive strategy and the direct strategy. The recursive strategy includes two methods: AutoRegressive Integrated Moving Average and Neural Network AutoRegression. The direct strategy is done with ForecastML-eXtreme Gradient Boosting. The three methods are then compared in terms of forecasting performance. The anomaly detection and classification is done by setting a decision rule based on a threshold. Since the true anomaly thresholds were not previously known, an arbitrary initial anomaly threshold is set using a combination of statistical methods for outlier detection, followed by human judgement from the company commissioners. These statistical methods include Seasonal and Trend decomposition using Loess + InterQuartile Range, Twitter + InterQuartile Range, and Twitter + GESD (Generalized Extreme Studentized Deviate). After defining what an anomaly threshold is in the usage context of Infor (Sweden) AB, a decision rule is set and used to classify anomalies in time series forecasts. The results from comparing the classifications of the forecasts from the three forecasting methods were inconclusive, and no recommendation is made concerning which model or algorithm Infor (Sweden) AB should use.
However, the thesis concludes by recommending other methods that can be tried in future research.
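The interquartile-range component of the thresholding methods above flags points outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. A stdlib-only sketch of that decision rule, using Tukey's hinges for the quartiles (other quartile conventions shift the bounds slightly; the series below is made up):

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) anomaly thresholds via the IQR rule.
    Quartiles are taken as medians of the lower/upper halves (Tukey's hinges)."""
    s = sorted(values)
    mid = len(s) // 2
    q1 = statistics.median(s[:mid])
    q3 = statistics.median(s[-mid:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def classify(values, bounds):
    """Decision rule: a point outside the bounds is classified anomalous."""
    lo, hi = bounds
    return [v < lo or v > hi for v in values]

series = [10, 12, 11, 13, 12, 11, 14, 100]
bounds = iqr_bounds(series)
print(bounds, classify(series, bounds))
```

In the thesis's setup this statistical rule only seeds the initial threshold, which is then adjusted by the company commissioners' judgement before being applied to forecasted values.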
29

Prediction of Credit Risk using Machine Learning Models

Isaac, Philip January 2022 (has links)
This thesis aims to investigate different machine learning (ML) models and their performance in order to find the best-performing model for predicting credit risk at a specific company. Since granting credit to corporate customers is part of this company's core business, managing credit risk is of high importance. The company currently has only one credit risk measure, obtained through an external company, and the goal is to find a model that outperforms this measure.     The study covers two ML models, Logistic Regression (LR) and eXtreme Gradient Boosting. This thesis shows that both methods perform better than the external risk measure, with the LR method achieving the best overall performance. One of the most important analyses in this thesis was preparing the dataset and finding the best-suited combination of features for the ML models.
30

Método de estabilidad para el dimensionamiento de tajeos obtenido mediante el algoritmo Gradient Boosting Machine considerando la incorporación de los esfuerzos activos en minería subterránea / Stability method for the dimensioning of stopes obtained through the gradient boosting machine algorithm considering the incorporation of active stresses in underground mining

Camacho Cosio, Hernán 23 May 2020 (has links)
En las últimas cuatro décadas, el método gráfico de estabilidad de Mathews ha constituido el abanico de herramientas indispensables para el dimensionamiento de tajeos, caracterizándose por su eficiencia en costos y su ahorro de tiempo y esfuerzo. Asimismo, el aporte de diversos autores por optimizar su rendimiento ha permitido desplegar una serie de criterios que han permitido abordar cada vez más escenarios. No obstante, con la diversificación de la minería en diferentes contextos geológicos y la necesidad de trabajar a mayores profundidades, se ha mostrado que el método gráfico de estabilidad ha desestimado escenarios con presencia de agua y distintos regímenes de confinamiento. Es por este motivo que la presente investigación busca incorporar dichos escenarios por medio del algoritmo Gradient Boosting Machine. Para dicho fin, se simularon escenarios con diversos niveles de presión de agua y se consideró el grado de confinamiento alrededor de las excavaciones. El modelo generado se basó en el criterio de la clasificación binaria, siendo las clases predichas "estable" e "inestable"; con lo que se obtuvo un valor AUC de 0.88, lo que demostró una excelente capacidad predictiva del modelo GBM. Asimismo, se demostraron las ventajas frente al método tradicional, puesto que se añade una componente de rigurosidad y de generalización. Finalmente, se evidencia el logro de un método de estabilidad que incorpora los esfuerzos activos y que ostenta un adecuado rendimiento predictivo. / In the last four decades, Mathews' graphical stability method has constituted an indispensable set of tools for the dimensioning of stopes, characterized by its cost efficiency and its savings in time and effort. Likewise, the contributions of several authors to optimizing its performance have made it possible to deploy a series of criteria that address more and more scenarios.
However, with the diversification of mining into different geological contexts and the need to work at greater depths, it has become clear that the graphical stability method neglects scenarios with the presence of water and different confinement regimes. For this reason, the present research sought to incorporate such scenarios by means of the Gradient Boosting Machine algorithm. For this purpose, scenarios with different levels of water pressure were simulated, and the degree of confinement around the excavations was considered. The generated model was based on a binary classification criterion, the predicted classes being "stable" and "unstable"; an AUC value of 0.88 was obtained, demonstrating the excellent predictive capacity of the GBM model. Likewise, the advantages over the traditional method were demonstrated, since components of rigor and generalization are added. Finally, the result is a stability method that incorporates active stresses and offers adequate predictive performance. / Research work
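The AUC of 0.88 reported above has a direct probabilistic reading: the chance that a randomly chosen "unstable" case receives a higher predicted score than a randomly chosen "stable" one. A pure-Python sketch of that rank-based (Mann-Whitney) computation; the scores below are made up for illustration, not the study's outputs:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney formulation:
    the fraction of (positive, negative) score pairs ranked correctly,
    counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical GBM scores for unstable (positive) and stable (negative) stopes
print(auc([0.9, 0.8, 0.4], [0.3, 0.5]))  # 5 of 6 pairs correct ≈ 0.833
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why 0.88 is read as strong discriminative ability.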
