  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Machine learning for detecting financial crime from transactional behaviour

Englund, Markus January 2023 (has links)
Banks and other financial institutions are to a certain extent obligated to ensure that their services are not utilized for any type of financial crime. This thesis investigates the possibility of analyzing bank customers' transactional behaviour with machine learning to detect if they are involved in financial crime. The purpose of this is to see if a new approach to processing and analyzing transaction data could make financial crime detection more accurate and efficient. Transactions of a customer over a time period are processed to form multivariate time series. These time series are then used as input to different machine learning models for time series classification. The best method involves a transform called Random Convolutional Kernel Transform that extracts features from the time series. These features are then used as input to a logistic regression model that generates probabilities of the different class labels. This method achieves a ROC AUC-score of 0.856 when classifying customers as being involved in financial crime or not. The results indicate that the time series models detect patterns in transaction data that connect customers to financial crime which previously investigated methods have not been able to find.
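The feature-extraction step described above can be sketched in a few lines. This is a simplified, univariate illustration of the idea behind a random convolutional kernel transform, not the thesis's implementation: the function names `make_kernels` and `transform`, the uniform dilation sampling, and the toy series are all assumptions made for illustration, and the series is assumed to have at least 11 points. Each random kernel yields two features (the maximum response and the proportion of positive responses), which could then be passed to a logistic regression classifier.

```python
import random

def make_kernels(n_kernels, series_len, seed=0):
    """Draw random kernels as (weights, bias, dilation) triples."""
    rng = random.Random(seed)
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Largest dilation that still lets the kernel fit inside the series.
        max_dilation = max(1, (series_len - 1) // (length - 1))
        dilation = rng.randint(1, max_dilation)
        kernels.append((weights, bias, dilation))
    return kernels

def transform(series, kernels):
    """Return two features per kernel: max response and share of positive responses."""
    features = []
    for weights, bias, dilation in kernels:
        span = (len(weights) - 1) * dilation
        responses = [
            bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
            for start in range(len(series) - span)
        ]
        features.append(max(responses))
        features.append(sum(r > 0 for r in responses) / len(responses))
    return features

# Toy series standing in for one customer's aggregated transactions (made-up values).
toy_series = [float(i % 7) for i in range(30)]
kernels = make_kernels(5, len(toy_series))
features = transform(toy_series, kernels)
```

Because the kernels are random and never trained, only the downstream classifier needs fitting, which is what makes this family of transforms comparatively cheap.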
12

Dynamic Control, Modeling and Sizing of Hybrid Power Plants : Investigating the optimum usage of energy storage for Fortum’s hydropower / Dynamisk reglering, modellering och dimensionering av hybridkraftverk : Utredning av optimal användning av energilagring för Fortums vattenkraft

Lindgren, Klas January 2023 (has links)
The rapidly evolving Nordic Power System demands enhanced flexibility and robustness in electricity production. The traditional role of hydropower plants in regulating the grid frequency has been challenged by new criteria for dynamic stability, which some units struggle to meet due to their relatively poor dynamic performance. This study addresses this challenge by investigating the potential of integrating optimal energy storage systems with hydropower plants. This study aimed to develop a tool that could streamline the process of converting a traditional hydropower plant into a hybrid unit using an optimal energy storage system. The problem is complex and requires an innovative approach that combines electrical engineering expertise with cutting-edge machine-learning algorithms. A comprehensive hydropower plant model, including governor control and mechanical and hydraulic subsystems, was developed and integrated with an energy storage system model to form a hybrid unit. This model was validated using real power plant data. Three distinct XGBoost Regressor models were trained using data samples generated from the optimized hybrid unit. These models aim to predict power and energy requirements for an optimal energy storage solution, including an estimation of wear and tear reduction. The XGBoost Power Regressor achieved a prediction accuracy of 92 % and the XGBoost Energy Regressor demonstrated a 95 % accuracy. The XGBoost Movement Regressor, indicating wear and tear, boasted an accuracy greater than 99 %. The integration of energy storage systems can significantly mitigate wear and tear on a hydropower plant, with reductions of up to 85 % or more. The results indicate that integrating energy storage systems with hydropower units can substantially enhance the dynamic performance, reduce wear and tear and enable the plants to meet the demanding requirements of providing frequency regulation services in the Nordic Power System. 
The findings of this study culminate in a robust and user-friendly tool capable of accurately estimating optimal energy storage requirements for any hydropower plant tasked with meeting frequency regulation service demands.
13

Modelización integrada con aprendizaje automático para evaluar la contaminación por nutrientes en las masas de agua actual y bajo el efecto del cambio climático. Aplicación a la Demarcación Hidrográfica del Júcar

Dorado Guerra, Diana Yaritza 26 February 2024 (has links)
Thesis by compendium / [EN] Water pollution poses a critical environmental challenge globally and in the European Union (EU), particularly in the Mediterranean region of Spain. Population growth, increasing demand for food and fuels, coupled with climate change, intensify nutrient pollution in water bodies. This pollution threatens water quality, aquatic ecosystems, and human health. The complexity of nutrient transport pathways makes monitoring and mitigation challenging. Comprehensive models that link processes and cause-and-effect relationships are required to effectively control pollution. In Mediterranean regions such as the Júcar River Basin District (RBD), the interaction between surface and groundwater is crucial, but traditional models have limitations. This thesis addresses these challenges by characterising the contribution of nutrients to surface waters in the Júcar RBD, evaluating pollution reduction measures considering long-term climate change, and applying supervised learning techniques to predict nitrate concentrations. The coupling of hydrological and water quality models, along with machine learning, provides a deep and valuable understanding of the factors behind nutrient pollution and establishes a solid foundation for decision-making and sustainable water management in the Júcar RBD and similar regions. This thesis is structured as a compendium of three articles that encompass these challenges.
The first article delves into the complex interaction between surface and groundwater in the Júcar RBD basins, focusing on nitrate pollution dynamics. The results reveal a direct linear correlation between nitrate concentrations in rivers and aquifers along the main axes of the Júcar and Turia rivers, highlighting the fundamental role of groundwater contributions to river nitrate levels. Additionally, the study identifies downstream regions with intensified agricultural and urban activities as nitrate pollution hotspots. This research not only identifies pollution sources but also offers a means to predict nitrate concentrations and assess the effectiveness of pollution prevention measures. The second article addresses the vulnerability of surface water quality to climate change and long-term diffuse and point source pollution reduction scenarios in the Júcar RBD basins. In a region where nutrient concentrations are of particular concern, the study investigates how changing climatic conditions, including rising temperatures and altered precipitation patterns, affect nitrate, ammonium, phosphorus, and biochemical oxygen demand (BOD5) levels. The results indicate that under climate change scenarios, significantly more water bodies are expected to be in poor condition for ammonium, phosphorus, and BOD5, and to a lesser extent, nitrate. Specifically, average concentrations of ammonium and phosphorus could double during low-flow months. To maintain current water quality, substantial reductions of at least 25% in diffuse nitrate pollution and 50% in point source loads of ammonium, phosphorus, and BOD5 are required. This research underscores the importance of water quality management strategies. The third article introduces an innovative approach to simulate nitrate concentrations in surface water bodies using machine learning models.
Leveraging feature selection methods and artificial intelligence algorithms, including random forest (RF) and eXtreme Gradient Boosting (XGBoost), the study achieved high precision in predicting nitrate concentrations. These models analysed 19 input variables spanning ecological, hydrological, and environmental factors, along with nitrate concentration data from surface water quality gauging stations. In particular, the research highlighted the dominant role of location, explaining 87% of nitrate variability in relation to nitrogen and phosphorus concentration. This research showcased the potential of machine learning in water quality prediction and risk assessment. / We appreciate the help provided by the Júcar River Basin District Authority (CHJ), who gathered field data. The first author’s research was partially funded by a PhD scholarship from the food research stream of the programme “Colombia Científica—Pasaporte a la Ciencia”, granted by the Colombian Institute for Educational Technical Studies Abroad (Instituto Colombiano de Crédito Educativo y Estudios Técnicos en el Exterior, ICETEX). The authors thank the Spanish Research Agency (AEI) for the financial support to RESPHIRA project (PID2019-106322RB- 100)/AEI/10.13039/501100011033. The contributors gratefully acknowledge funding for open access charge: CRUE-Universitat Politècnica de València / Dorado Guerra, DY. (2024). Modelización integrada con aprendizaje automático para evaluar la contaminación por nutrientes en las masas de agua actual y bajo el efecto del cambio climático. Aplicación a la Demarcación Hidrográfica del Júcar [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202898 / Compendio
14

Utilizing Hybrid Ensemble Prediction Model In Order to Predict Energy Demand in Sweden : A Machine-Learning Approach / En maskininlärningsmetod som använder hybridensembleprediktionsmodell för att förutsäga energiefterfrågan i Sverige

Su, Binxin January 2022 (has links)
Conventional machine learning (ML) models and algorithms are advancing at a fast pace. Most of this development is due to the implementation of hybrid and ensemble techniques, which are powerful tools to complement and improve the efficiency of the algorithms. At the same time, the development of and demand for renewable energy sources are rapidly increasing, driven by political and environmental issues in which failure to act fast enough could lead to an existential crisis. With the transition from non-renewable to renewable energy sources, new challenges arise due to their intermittent and variable nature. Accurate forecasting techniques play a crucial role in addressing these challenges. In this thesis, I present a hybrid ensemble machine learning model based on stacking, utilizing a Gradient Boosted Tree as a meta-learner to predict the energy demand for the energy area SE3 in Sweden. The Hybrid model is based on three composite models: XGBoost, CatBoost and Random Forest (RF), utilizing only features extracted from the time series data. For training and testing the proposed Hybrid model, hourly demand load data was gathered from Svenska Kraftnät, measuring energy consumption for the energy area SE3 from 2016 to 2021. The forecasting results of the models are measured using a regression score (R-squared, which measures Explained Variance) and Accuracy (measured in terms of Mean Absolute Percentage Error). The result shows that in an experimental setting, the Hybrid model reaches an R-squared score of 0.9785 and an accuracy of 97.85%. When utilized for day-ahead prediction on unseen data outside the scope of the training dataset, the Hybrid model reaches an R-squared score of 0.9764 and an Accuracy of 93.43%. This thesis concludes that the proposed methodology can be utilized to accurately predict the variance in the energy demand and can serve as a framework for decision makers to accurately predict the energy demand in Sweden.
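The two evaluation measures named in the abstract above can be illustrated with a short worked example. This is a sketch under stated assumptions: the abstract only says that Accuracy is "measured in terms of" MAPE, so defining it as 100% minus MAPE is an assumption here, and the load values are invented toy numbers, not data from the thesis.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def accuracy_from_mape(actual, predicted):
    """Assumed definition: accuracy (%) = 100 - MAPE (%)."""
    return 100.0 - mape(actual, predicted)

# Toy hourly load values (made up for illustration only).
actual_load = [100.0, 200.0, 300.0]
predicted_load = [110.0, 190.0, 300.0]

r2 = r_squared(actual_load, predicted_load)
acc = accuracy_from_mape(actual_load, predicted_load)
```

For these toy values the residual sum of squares is 200 and the total sum of squares is 20000, giving an R-squared of 0.99, while the percentage errors of 10%, 5% and 0% average to a MAPE of 5% and hence an accuracy of 95% under the assumed definition.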
15

Bayesian Networks for Modelling the Respiratory System and Predicting Hospitalizations

Lopo Martinez, Victor January 2023 (has links)
Bayesian networks can be used to model the respiratory system. Their structure indicates how risk factors, symptoms, and diseases are related, and the Conditional Probability Tables enable predictions about a patient’s need for hospitalization. Numerous structure learning algorithms exist for discerning the structure of a Bayesian network, but none can guarantee to find the perfect structure. Employing multiple algorithms can discover relationships between variables that might otherwise remain hidden when relying on a single algorithm. The Maximum Likelihood Estimator is the predominant algorithm for learning the Conditional Probability Tables. However, it faces challenges due to the data fragmentation problem, which can compromise its predictions. Failing to hospitalize patients who require specialized medical care could lead to severe consequences. Therefore, in this thesis, the use of an XGBoost model for learning is proposed as a novel and better method since it does not suffer from data fragmentation. A Bayesian network is constructed combining several structure learning algorithms, and the predictive performance of the Maximum Likelihood Estimator and XGBoost are compared. XGBoost achieved a maximum accuracy of 86.0% compared to the Maximum Likelihood Estimator, which attained an accuracy of 81.5% in predicting future patient hospitalization. In this way, the predictive performance of Bayesian networks has been enhanced.
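The data fragmentation problem mentioned in the abstract above can be made concrete with a minimal sketch of maximum likelihood estimation for a single Conditional Probability Table. The function `fit_cpt`, the variable names and the four patient records are hypothetical and chosen purely for illustration; they are not taken from the thesis.

```python
from collections import Counter, defaultdict

def fit_cpt(records, child, parents):
    """Maximum likelihood CPT: P(child | parents) estimated from relative frequencies."""
    counts = defaultdict(Counter)
    for row in records:
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1
    return {
        key: {value: n / sum(c.values()) for value, n in c.items()}
        for key, c in counts.items()
    }

# Hypothetical toy records (not real patient data).
records = [
    {"smoker": True, "cough": True, "hospitalized": True},
    {"smoker": True, "cough": True, "hospitalized": False},
    {"smoker": False, "cough": True, "hospitalized": False},
    {"smoker": False, "cough": False, "hospitalized": False},
]

cpt = fit_cpt(records, "hospitalized", ["smoker", "cough"])
# The parent configuration (smoker=True, cough=False) never occurs in the data,
# so the estimator has no row for it at all.
missing = (True, False) not in cpt
```

Each additional parent variable multiplies the number of parent configurations, so the records available per configuration shrink quickly; some rows rest on one or two observations and others, as here, on none.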
16

[en] ANALYSIS OF THE CONTRIBUTION OF CHARACTERISTICS ASSOCIATED WITH THE EVOLUTION OF DEATHS FROM COVID19 IN BRAZILIAN STATES USING SHAPLEY VALUES / [pt] ANÁLISE DA CONTRIBUIÇÃO DAS CARACTERÍSTICAS ASSOCIADAS À EVOLUÇÃO DOS ÓBITOS POR COVID-19 NOS ESTADOS BRASILEIROS UTILIZANDO OS VALORES DE SHAPLEY

PAULO HENRIQUE COUTO SIMOES 27 September 2022 (has links)
[en] This work proposes a method to rank the contribution of different strategies to contain the evolution of the COVID-19 pandemic in different states of Brazil, in the pre- and post-vaccination periods. The proposed method included the automatic learning of regression models using the XGBoost machine learning algorithm, and applied Shapley's cooperative game theory to quantify the contribution of the analyzed characteristics to the target variable. To interpret the model globally, SHapley Additive exPlanations (SHAP), an algorithm based on Shapley's theory, was used. The evaluation results indicate that the method quantifies the contribution of each variable robustly, and reveal that the percentages of first and second dose vaccination coverage, in addition to the closing of schools, were the measures that contributed most to the evolution of the number of cases and deaths due to COVID-19. The weighting of variables can help those responsible for public policy to minimize the socioeconomic effects in their regions, since Brazil is a country with extreme social inequality.
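The Shapley value that underlies this kind of attribution can be computed exactly for a small cooperative game. The sketch below is a generic illustration, not the method's actual code: the function `shapley_values`, the feature names `x1` to `x3` and the payoff function `toy_value` are assumptions invented for the example. SHAP applies the same principle to model predictions, using approximations where full enumeration is too expensive.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_value(coalition):
    """Hypothetical payoff: x1 adds 10, x2 adds 4, both together add 6 more, x3 adds nothing."""
    total = 0.0
    if "x1" in coalition:
        total += 10.0
    if "x2" in coalition:
        total += 4.0
    if "x1" in coalition and "x2" in coalition:
        total += 6.0
    return total

phi = shapley_values(["x1", "x2", "x3"], toy_value)
```

In this toy game the interaction bonus of 6 is split evenly between `x1` and `x2`, so their Shapley values are 13 and 7, `x3` receives 0, and the three values sum to the payoff of the full coalition (20).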
17

Modelo híbrido de avaliação de risco de crédito para corporações brasileiras com base em algoritmos de aprendizado de máquina

Gregório, Rafael Leite 09 July 2018 (has links)
Credit risk assessment plays an important role for financial institutions because it is associated with possible losses that can have a large impact on their balance sheets. Although there is considerable research on applications of machine learning and finance models, no study yet integrates the available knowledge about credit risk assessment. This work specifies a machine learning model of the probability of default for publicly traded companies in the Bovespa Index (corporations) and, from the model's estimates, derives a risk assessment metric based on letter ratings. Methodologies from the literature were combined to estimate models that include fundamental (balance sheet) and corporate governance components, macroeconomic variables, and variables produced by applying the proprietary KMV credit risk assessment model. The XGBoost and LinearSVM algorithms were tested; they have very different characteristics but are both potentially useful for the problem. Parameter grids were searched to identify the most representative variables and to specify the best-performing model. The selected model was XGBoost, and its performance was very similar to the results obtained for the North American stock market in analogous research. The estimated credit ratings appear more sensitive to the economic and financial situation of the companies than those issued by traditional rating agencies.
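The abstract mentions variables derived from the proprietary KMV model without giving its formulas. As a rough illustration only, the sketch below computes the textbook Merton distance to default, on which KMV-style models are commonly said to build. The function names and the example inputs are assumptions, and the mapping from distance to default probability uses a normal distribution rather than any proprietary empirical mapping.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def distance_to_default(asset_value: float, debt: float, drift: float,
                        volatility: float, horizon: float = 1.0) -> float:
    """Textbook Merton distance to default (not the proprietary KMV variant).

    Measures how many asset-volatility units separate the expected log
    asset value at the horizon from the default point (the debt level).
    """
    if asset_value <= 0 or debt <= 0 or volatility <= 0 or horizon <= 0:
        raise ValueError("asset_value, debt, volatility and horizon must be positive")
    numerator = math.log(asset_value / debt) + (drift - 0.5 * volatility ** 2) * horizon
    return numerator / (volatility * math.sqrt(horizon))


def default_probability(asset_value: float, debt: float, drift: float,
                        volatility: float, horizon: float = 1.0) -> float:
    """Probability that assets fall below debt at the horizon (Merton model)."""
    return normal_cdf(-distance_to_default(asset_value, debt, drift, volatility, horizon))


if __name__ == "__main__":
    # Hypothetical firm: assets 150, debt 100, 5% drift, 20% asset volatility.
    print(default_probability(150.0, 100.0, 0.05, 0.20))
```

A quantity like this could serve as one input column next to balance sheet, governance, and macroeconomic variables; how the thesis actually constructs its KMV-based variables is not stated in the abstract.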
18

Early Stratification of Gestational Diabetes Mellitus (GDM) by building and evaluating machine learning models

Sharma, Vibhor January 2020 (has links)
Gestational diabetes mellitus (GDM), a condition involving abnormal levels of glucose in the blood plasma, has seen a rapid surge among gestating mothers across different regions and ethnicities around the world. The current method of screening and diagnosing GDM is restricted to the Oral Glucose Tolerance Test (OGTT). With the advent of machine learning algorithms, healthcare has seen a surge of machine learning methods for disease diagnosis, which are increasingly being employed in clinical settings. Yet in the area of GDM, these algorithms have not been widely used to generate multi-parametric diagnostic models that aid clinicians in diagnosing the condition. In the literature, there is an evident scarcity of applications of machine learning algorithms to GDM diagnosis, which has been limited to the proposed use of some very simple algorithms such as logistic regression. We therefore attempt to address this research gap by employing a wide array of machine learning algorithms, known to be effective for binary classification, to classify GDM early among gestating mothers. This can aid clinicians in the early diagnosis of GDM and offers a chance to mitigate the adverse outcomes related to GDM among gestating mothers and their progeny. We set up an empirical study of the performance of different machine learning algorithms used specifically for the task of GDM classification. These algorithms were trained on a set of predictor variables chosen by experts. We then compared the results with the existing machine learning methods in the literature for GDM classification, based on a set of performance metrics. Our model could not outperform the previously proposed machine learning models for GDM classification. We attribute this to our chosen set of predictor variables and to the under-reporting of various performance metrics, such as precision, in the existing literature, which leads to a lack of informed comparison.
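The abstract points to under-reported metrics such as precision as an obstacle to comparing binary classifiers. The sketch below shows one way the usual metrics could be computed from true and predicted labels; the function name and the toy labels are assumptions for illustration and are not taken from the thesis.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels.

    Label 1 is treated as the positive class (for example, a GDM case).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, purely illustrative.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                         [1, 1, 0, 1, 0, 0, 0, 0]))
```

Reporting all four values together makes it harder for a high accuracy on an imbalanced dataset to hide poor detection of the positive class.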
19

SYSTEMATICALLY LEARNING OF INTERNAL RIBOSOME ENTRY SITE AND PREDICTION BY MACHINE LEARNING

Junhui Wang (5930375) 15 May 2019 (has links)
Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the more widely used 5’ cap-dependent translation initiation mechanism. IRES play an important role in conditions where 5’ cap-dependent translation initiation has been blocked or repressed. They have been found to play important roles in viral infection, cellular apoptosis, and the response to other external stimuli. It has been suggested that about 10% of mRNAs, both viral and cellular, can utilize IRES. However, due to the limitations of the IRES bicistronic assay, which is a gold standard for identifying IRES, relatively few IRES have been definitively described and functionally validated compared to the potential overall population. Viral and cellular IRES may be mechanistically different, but this is difficult to analyze because the mechanistic differences are still not clearly defined. Identifying additional IRES is an important step towards better understanding IRES mechanisms. Development of a new bioinformatics tool that can accurately predict IRES from sequence would be a significant step forward in identifying IRES-based regulation and in elucidating IRES mechanisms. This dissertation systematically studies the features that can distinguish IRES from non-IRES sequences. Sequence features such as k-mer words, and structural features such as the predicted minimum free energy (MFE) of folding, Q_MFE, and sequence/structure triplets, are evaluated as possible discriminative features. These potential features are incorporated into an IRES classifier based on XGBoost, a machine learning model, to classify novel sequences as belonging to the IRES or non-IRES group. The XGBoost model performs better than previous predictors, with higher accuracy and lower computational time. The number of features in the model has been greatly reduced, compared to previous predictors, by adding global k-mer and structural features. The trained XGBoost model has been implemented as the first high-throughput bioinformatics tool for IRES prediction, IRESpy. This website provides a public tool for all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
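The k-mer word features mentioned in the abstract can be pictured as a fixed-length vector of word frequencies over a sequence. The sketch below is a minimal illustration under assumptions: the function name, the RNA alphabet, the conversion of T to U, and the frequency normalization are illustrative choices, not the dissertation's documented feature definition.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA-style "T" is mapped to "U"


def kmer_frequencies(sequence: str, k: int) -> dict:
    """Return the relative frequency of every possible k-mer in a sequence.

    Windows containing characters outside ALPHABET (for example "N")
    are ignored. The result always has len(ALPHABET) ** k entries, so
    sequences of different lengths map to vectors of the same size.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper().replace("T", "U")

    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[word] for word in vocabulary)
    return {word: (counts[word] / total if total else 0.0)
            for word in vocabulary}


if __name__ == "__main__":
    features = kmer_frequencies("ACGUAC", 2)
    print(features["AC"])  # "AC" fills 2 of the 5 windows
```

Vectors like this, concatenated with structural descriptors, are the kind of tabular input a gradient-boosted tree classifier can consume directly.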
20

Predicting the Movement Direction of OMXS30 Stock Index Using XGBoost and Sentiment Analysis

Elena, Podasca January 2021 (has links)
Background. Stock market prediction is an active yet challenging research area. Both academia and practitioners have put considerable effort into producing accurate stock market prediction models in an attempt to maximize investment objectives. Tree-based ensemble machine learning methods such as XGBoost have proven successful in practice. At the same time, there is a growing trend to incorporate multiple data sources, such as historical prices and text, in prediction models in order to achieve superior forecasting performance. However, most applications and research have so far focused on the American or Asian stock markets, while the Swedish stock market has not been studied extensively from the perspective of hybrid models using both price- and text-derived features. Objectives. The purpose of this thesis is to investigate whether augmenting a numerical dataset based on historical prices with sentiment features extracted from financial news improves classification performance when predicting the daily price trend of the Swedish stock market index, OMXS30. Methods. A dataset of 3,517 samples from 2006 to 2020 was collected from two sources, historical prices and financial news. XGBoost was used as the classifier, and four different metrics were employed to compare model performance on three complementary datasets: a dataset containing only the sentiment feature, a dataset with only price-derived features, and a dataset of price-derived features augmented with the sentiment feature extracted from financial news. Results. XGBoost performs well in classifying the daily trend of OMXS30 given historical price features, achieving an accuracy of 73% on the test set. A small improvement across all metrics is recorded on the test set when the numerical dataset is augmented with sentiment features extracted from financial news. Conclusions. XGBoost is a powerful ensemble method for stock market prediction, as reflected in a satisfactory classification performance on the daily movement direction of OMXS30. However, augmenting the numerical input set with sentiment features extracted from text did not have a strong impact on classification performance in this case, as the improvements across all employed metrics were small.
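To make the classification setup concrete, the sketch below shows one possible way to turn a series of daily closing prices into feature rows and up/down labels. The label definition (1 when the next daily return is positive), the use of lagged returns as price-derived features, and all names are assumptions for illustration; the abstract does not specify how the thesis builds its features or labels.

```python
def daily_returns(closes):
    """Simple day-over-day returns from a list of closing prices."""
    return [(nxt - cur) / cur for cur, nxt in zip(closes, closes[1:])]


def lagged_dataset(closes, n_lags=3):
    """Pair the previous n_lags returns with the direction of the next one.

    Each row is (features, label). The features only use returns that
    were already known before the day being predicted, which avoids
    leaking future information into the inputs.
    """
    if n_lags <= 0:
        raise ValueError("n_lags must be a positive integer")
    returns = daily_returns(closes)
    rows = []
    for t in range(n_lags, len(returns)):
        features = returns[t - n_lags:t]      # returns before day t
        label = 1 if returns[t] > 0 else 0    # 1 = index moved up
        rows.append((features, label))
    return rows


if __name__ == "__main__":
    # Hypothetical closing prices, not real OMXS30 data.
    closes = [100.0, 101.0, 100.0, 102.0, 103.0, 101.0]
    for features, label in lagged_dataset(closes):
        print(features, label)
```

A sentiment score per day could be appended to each feature row to mimic the augmented dataset, and a chronological train/test split (rather than a random one) keeps the evaluation closer to real forecasting.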
