Global ETD Search

41	Exploring ensemble learning techniques to optimize the reverse engineering of gene regulatory networks / Explorando técnicas de ensemble learning para otimizar a engenharia reversa de redes regulatórias genéticas Recamonde-Mendoza, Mariana January 2014 This thesis addresses the reverse engineering of gene regulatory networks from post-genomic data, a major challenge in bioinformatics research. Gene regulatory networks are intricate biological circuits that govern the expression levels (activity) of genes, and thereby play an important role in controlling many cellular processes, including cell differentiation, the cell cycle, and metabolism. Unveiling the structure of these networks is crucial to gaining a systems-level understanding of organisms' development and behavior, and may eventually shed light on the mechanisms of diseases caused by the deregulation of these cellular processes. Because of the increasing availability of high-throughput experimental data and the large dimension and complexity of biological systems, computational methods have been essential tools in enabling this investigation. Nonetheless, their performance is still considerably degraded by the computational and biological challenges the problem poses. In particular, the noise and sparsity inherent in biological data turn network inference into a difficult combinatorial optimization problem, for which current methods fall short in the accuracy and robustness of their predictions. This thesis investigates ensemble learning techniques as a means to overcome these limitations and improve the inference process by exploiting the diversity among multiple inferred models. To this end, we develop computational methods both to generate diverse network predictions and to combine multiple predictions into a single ensemble solution, and we apply this approach to a number of scenarios with different sources of diversity in order to understand its potential in this specific context. We show that the proposed solutions are competitive with traditional algorithms in the field and improve our capacity to accurately reconstruct gene regulatory networks. Results obtained for the inference of transcriptional and post-transcriptional regulatory networks, two adjacent and complementary layers of the overall gene regulatory network, demonstrate the efficiency and robustness of our approach and encourage the consolidation of ensemble learning as a promising methodology for deciphering the structure of gene regulatory networks.
Bioinformatics Machine learning Artificial intelligence Reverse engineering Gene regulatory networks Ensemble learning
42	A Mixture-of-Experts Approach for Gene Regulatory Network Inference Shao, Borong January 2014 Context. Gene regulatory network (GRN) inference is an important and challenging problem in bioinformatics. A variety of machine learning algorithms have been applied to increase GRN inference accuracy, and ensemble learning methods have been shown to yield higher inference accuracy than individual algorithms. Objectives. We propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. The proposed method quantitatively measures the accuracy of individual GRN inference algorithms at the network motif level. Based on how accurately each algorithm predicts different types of network motifs, weights are assigned to the individual algorithms to account for their strengths and weaknesses. In this way, we can improve the accuracy of the ensemble prediction. Methods. The research methodology is a controlled experiment. The independent variable is the method, which has eight groups: five individual algorithms, the generic average ranking method used in the DREAM5 challenge, and the proposed ensemble method in two variants, one using four types of network motifs and one using five. The dependent variable is GRN inference accuracy, measured by the area under the precision-recall curve (AUPR). The experiment has a training phase and a testing phase. In the training phase, we analyze the accuracy of the five individual algorithms at the network motif level to decide their weights. In the testing phase, the weights are used to combine the predictions of the five individual algorithms into ensemble predictions. We compare the accuracy of the eight method groups on an Escherichia coli microarray dataset using AUPR. Results. In the training phase, we obtain the AUPR values of the five individual algorithms for predicting each type of network motif. In the testing phase, we collect the AUPR values of the eight methods for predicting the GRN from the Escherichia coli microarray dataset. Each method group has a sample size of ten (ten AUPR values). Conclusions. Statistical tests on the experimental results show that the proposed method yields significantly higher accuracy than the generic average ranking method. In addition, a new type of network motif is found in GRNs, and including it significantly increases the accuracy of the proposed method. / Genes are DNA sequences that control the biological traits and biochemical processes that comprise life. They interact with each other to achieve the precise regulation of life activities. Biologists aim to understand the regulatory network among genes with the help of high-throughput technologies such as microarrays and RNA-seq. These technologies produce large amounts of gene expression data that contain useful information, so effective data mining is necessary to extract that information and advance biological research. Gene regulatory network (GRN) inference infers gene interactions from gene expression data, such as microarray datasets. The inference results can guide further experiments to discover or validate gene interactions. A variety of machine learning (data mining) methods have been proposed to solve this problem. In recent years, experiments have shown that ensemble learning methods achieve higher accuracy than individual learning methods, because they can take advantage of the strengths of different individual methods and are robust to different network structures. In this thesis, we propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. By quantitatively measuring the accuracy of individual methods at the network motif level, the proposed method is able to take advantage of the complementarity among the individual methods. The proposed method yields significantly higher accuracy than the generic average ranking method, which is the most accurate of the 35 GRN inference methods in the DREAM5 challenge.
GRN inference Ensemble learning Mixture-of-Experts network motif analysis Computer Sciences Datavetenskap (datalogi) Information Systems
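The contrast drawn in the abstract above, between plain average ranking and a weighted combination of algorithm outputs scored by AUPR, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the edge names, scores, and weights are invented, the weights are fixed per algorithm rather than derived from motif-level accuracy in a training phase, and AUPR is approximated by average precision.

```python
def ranks(scores):
    """Map each edge to its rank (1 = most confident) under one algorithm."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def combine(predictions, weights=None):
    """Weighted mean of per-algorithm ranks; equal weights give average ranking.

    Assumes every algorithm scores the same set of edges.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    per_algorithm = [ranks(p) for p in predictions]
    edges = per_algorithm[0].keys()
    return {e: sum(w * r[e] for w, r in zip(weights, per_algorithm)) / total
            for e in edges}


def ranked_edges(combined):
    """Edges ordered from best (lowest combined rank) to worst."""
    return sorted(combined, key=combined.get)


def aupr(ordered_edges, true_edges):
    """Average precision, a common step-wise estimate of the area under the PR curve."""
    hits, area = 0, 0.0
    for k, edge in enumerate(ordered_edges, start=1):
        if edge in true_edges:
            hits += 1
            area += hits / k
    return area / len(true_edges)


# Hypothetical confidence scores from three inference algorithms.
algo_a = {"tf1->g1": 0.9, "tf1->g2": 0.7, "tf2->g1": 0.4, "tf2->g2": 0.1}
algo_b = {"tf2->g1": 0.9, "tf1->g1": 0.6, "tf2->g2": 0.5, "tf1->g2": 0.2}
algo_c = {"tf2->g1": 0.8, "tf2->g2": 0.6, "tf1->g1": 0.3, "tf1->g2": 0.1}
true_edges = {"tf1->g1", "tf1->g2"}  # hypothetical gold standard

average_order = ranked_edges(combine([algo_a, algo_b, algo_c]))
weighted_order = ranked_edges(combine([algo_a, algo_b, algo_c], weights=[3, 1, 1]))

average_aupr = aupr(average_order, true_edges)
weighted_aupr = aupr(weighted_order, true_edges)
```

In this toy setup the first algorithm happens to be the most reliable, so giving it more weight raises the score. The weights are the part a training phase would have to estimate.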
44	A Novel Ensemble Method using Signed and Unsigned Graph Convolutional Networks for Predicting Mechanisms of Action of Small Molecules from Gene Expression Data Karim, Rashid Saadman 24 May 2022 No description available.
Bioinformatics Graph Convolutional Neural Network Drug Mechanism of Action Prediction Ensemble Learning Unsigned and Signed Networks Deep Learning on Gene Expression Data
45	Stronger Together? An Ensemble of CNNs for Deepfakes Detection / Starkare Tillsammans? En Ensemble av CNNs för att Identifiera Deepfakes Gardner, Angelica January 2020 Deepfakes technology is a face-swap technique that lets anyone replace faces in a video with highly realistic results. Despite its legitimate uses, the technique can have a significant impact on society when used maliciously, for instance through the spread of fake news or cyberbullying. This makes deepfakes detection a problem of utmost importance. In this paper, I tackle the problem by identifying deepfake forgeries in video sequences. Inspired by the state of the art, I study ensembles of different machine learning solutions built on convolutional neural networks (CNNs) and use these models to compare ensemble and single-model performance. Existing work in the field suggests that modern deepfake videos pose escalating challenges that make detection increasingly difficult. I evaluate that claim by testing the detection performance of four single CNN models and six stacked ensembles on three modern deepfakes datasets. I compare several ensemble approaches for combining single models and examine how their predictions should be incorporated into the ensemble output. The results show that the best approach for deepfakes detection is to create an ensemble, although the choice of ensemble approach plays a crucial role in detection performance. The final proposed solution is an ensemble of all available single models that uses soft (weighted) voting to combine its base learners' predictions. This solution significantly improved deepfakes detection performance and substantially outperformed all single models.
deepfakes deepfakes detection supervised learning binary classification convolutional neural networks ensemble learning stacking Computer Sciences Datavetenskap (datalogi)
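Soft (weighted) voting, as named in the abstract above, averages the base models' predicted probabilities instead of counting their hard labels. The sketch below is illustrative only: the four probabilities, the weights, and the 0.5 threshold are assumptions, not values from the thesis.

```python
def soft_vote(probabilities, weights):
    """Weighted mean of the per-model probabilities that a video is fake."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


def classify_soft(probabilities, weights, threshold=0.5):
    """Label the video from the weighted mean probability."""
    return "fake" if soft_vote(probabilities, weights) >= threshold else "real"


def classify_hard(probabilities, threshold=0.5):
    """Majority vote over each model's thresholded label, for comparison."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return "fake" if votes * 2 > len(probabilities) else "real"


# Hypothetical outputs of four single CNN models for one video.
model_probs = [0.9, 0.4, 0.45, 0.3]
# Hypothetical weights, e.g. favouring the model with the best validation score.
model_weights = [2, 1, 1, 1]

ensemble_probability = soft_vote(model_probs, model_weights)
soft_label = classify_soft(model_probs, model_weights)
hard_label = classify_hard(model_probs)
```

Here one confident model outweighs three weakly dissenting ones under soft voting, while a hard majority vote would discard that confidence. This is one reason the combination rule matters.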
46	MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels in Stacking Ensemble Learning Ploshchik, Ilya January 2023 Stacking, also known as stacked generalization, is a method of ensemble learning in which multiple base models are trained on the same dataset and their predictions are used as input for one or more metamodels in an extra layer. This technique can improve performance compared to single-layer ensembles, but it often requires a time-consuming trial-and-error process. The previously developed Visual Analytics system StackGenVis was therefore designed to help users select the most effective and diverse set of models and measure their predictive performance. However, StackGenVis was developed with only one metamodel: Logistic Regression. The focus of this Bachelor's thesis is to examine how alternative metamodels affect the performance of stacked ensembles, using a visualization tool called MetaStackVis. Our interactive tool facilitates visual examination of individual metamodels and pairs of metamodels based on their predictive probabilities (or confidence), various supported validation metrics, and their accuracy in predicting specific problematic data instances. The efficiency and effectiveness of MetaStackVis are demonstrated with an example based on a real healthcare dataset. The tool has also been evaluated through semi-structured interview sessions with Machine Learning and Visual Analytics experts. In addition to this thesis, we have written a short research paper explaining the design and implementation of MetaStackVis. This thesis extends the paper with additional findings and in-depth analysis, and can therefore be considered a supplementary source of information for readers interested in exploring the subject further.
Visualization interaction metamodels validation metrics predicted probabilities stacking stacked generalization ensemble learning machine learning Computer Sciences Datavetenskap (datalogi)
47	An investigation into applications of canonical polyadic decomposition & ensemble learning in forecasting thermal data streams in direct laser deposition processes Storey, Jonathan 08 December 2023 Additive manufacturing (AM) is a process of creating objects from 3D model data by adding layers of material. AM technologies present several advantages over traditional manufacturing technologies, such as producing less material waste and being capable of producing parts with greater geometric complexity. However, deficiencies in the printing process due to high process uncertainty can affect the microstructural properties of a fabricated part, leading to defects. In metal AM, previous studies have linked defects in parts with melt pool temperature fluctuations, with the size of the melt pool and the scan pattern being key factors associated with part defects. Thus, being able to adjust certain process parameters during a part's fabrication, and knowing when to adjust them, is critical to producing reliable parts. To know when to adjust these parameters effectively, it is necessary to have models that can both identify when a defect has occurred and forecast the behavior of the process to identify whether a defect will occur. This study focuses on the development of accurate forecasting models of the melt pool temperature distribution. Researchers at Mississippi State University have collected in-situ pyrometer data from a direct laser deposition process, capturing the temperature distribution of the melt pool. The high dimensionality and noise of the data pose unique challenges in developing accurate forecasting models. To overcome these challenges, a tensor decomposition modeling framework is developed that can actively learn and adapt to new data. Evaluation on two datasets demonstrates the framework's ability to generate accurate forecasts and adjust to new data.
additive manufacturing thermal history forecasting tensor decomposition ensemble learning Computational Engineering Data Science
48	Sublimation temperature prediction of OLED materials : using machine learning Norinder, Niklas January 2023 Organic light-emitting diodes (OLEDs) have been regarded as the future of display technology for some time. Looking back, display technology has moved from cathode-ray tube displays (CRTs) to liquid crystal displays (LCDs). Whereas CRT displays were clunky and had quite high power consumption, LCDs were thinner, lighter, and consumed less energy. This technological shift has made it possible to create smaller and more portable screens, aiding the development of personal electronics. Currently, however, the LCD's place at the top of the display hierarchy is being challenged by OLED displays, which provide higher pixel density and higher overall performance. OLED displays consist of thin layers of organic semiconductors and are instrumental in the development of folding displays, small displays for virtual reality and augmented reality applications, and energy-efficient displays. In the creation of OLED displays, the organic semiconducting material is vaporized and adhered to a thin film through vapor deposition techniques. One way of aiding the creation of organic electroluminescent (OEL) materials and OLEDs is in silico analysis of sublimation temperatures through machine learning. This master's thesis works in that space, aiming to create a deeper understanding of OEL materials through sublimation temperature prediction using ensemble learning (light gradient-boosting machine) and deep learning (convolutional neural network) methods. Through analysis of experimental OEL data, it is found that the sublimation temperatures of OLED materials can be predicted with machine learning regression using molecular descriptors, with an R² score of ~0.86, a Mean Absolute Error of ~13°C, a Mean Absolute Percentage Error of ~3.1%, and a Normalized Mean Absolute Error of ~0.56.
Semiconductors OLED machine learning vapor deposition sublimation molecular property prediction regression ensemble learning deep learning Computer Sciences Datavetenskap (datalogi)
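Three of the regression metrics quoted in the abstract above (R², Mean Absolute Error, and Mean Absolute Percentage Error) have standard definitions, sketched below on invented temperatures. The values are placeholders, not data from the thesis. The Normalized Mean Absolute Error is left out because the abstract does not say which normalization is used.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the units of the target (here degrees C)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error relative to the true value, as a percentage."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted sublimation temperatures in degrees C.
measured = [300.0, 350.0, 400.0, 450.0]
predicted = [310.0, 340.0, 420.0, 440.0]

mae = mean_absolute_error(measured, predicted)
mape = mean_absolute_percentage_error(measured, predicted)
r2 = r2_score(measured, predicted)
```

Note that a percentage error depends on the temperature scale: the same predictions expressed in kelvin would give a smaller percentage than in degrees Celsius.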
49	Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models Henriksson, Erik, Werlinder, Kristopher January 2021 The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in predicting housing prices, using two data sets. The comparison considers training time, inference time, and three evaluation metrics: R², RMSE, and MAPE. The data sets are described in detail, together with background on the regressor models used. The method involves substantial cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation to obtain good performance estimates. The finding of this research project is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve results similar to the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior when used on larger data sets.
Random Forest XGBoost predicting housing prices feature engineering ensemble learning boosting data cleansing 5-fold cross-validation Computer Sciences Datavetenskap (datalogi)
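The 5-fold cross-validation mentioned in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions: the prices are invented, the folds are contiguous rather than shuffled, and a mean-price baseline stands in for the XGBoost and Random Forest regressors, which are outside the scope of this illustration.

```python
import math


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate(prices, k=5):
    """Mean held-out RMSE of a mean-price baseline across k folds."""
    scores = []
    for fold in k_fold_indices(len(prices), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(prices) if i not in held_out]
        test = [prices[i] for i in fold]
        prediction = sum(train) / len(train)  # "fit" on the training folds only
        scores.append(rmse(test, [prediction] * len(test)))
    return sum(scores) / len(scores)


# Hypothetical sale prices (in thousands) for ten houses.
prices = [210.0, 340.0, 275.0, 390.0, 305.0, 260.0, 410.0, 230.0, 355.0, 295.0]
baseline_rmse = cross_validate(prices)
```

Each sample is held out exactly once, so the averaged score estimates performance on unseen data. In practice the rows would usually be shuffled before splitting.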
50	OPTIMIZING DECISION TREE ENSEMBLES FOR GENE-GENE INTERACTION DETECTION Assareh, Amin 27 November 2012 No description available.
Bioinformatics Computer Science GWAS Epistasis Interaction Detection Variable Selection Decision Trees Ensemble Learning AdaBoost LogitBoost Bagging Random Forest
