  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm

Gupta, Suraj 26 June 2018
Many antibiotic resistance genes (ARGs) conferring resistance to a broad range of antibiotics have been detected in aquatic environments such as untreated and treated wastewater, rivers, and surface water. ARG proliferation in the aquatic environment could depend on factors such as geospatial variation, the type of aquatic body, and the type of wastewater (untreated or treated) discharged into it. Likewise, the strong interconnectivity of aquatic systems may accelerate the spread of ARGs through them. Hence a comparative and holistic study of different aquatic environments is required to properly comprehend the problem of antibiotic resistance. Many studies approach this issue using molecular techniques such as metagenomic sequencing and metagenomic data analysis. Such analyses compare the broad spectrum of ARGs in water and wastewater samples, but the comparisons are limited to similarity/dissimilarity analyses, in which the discriminatory ARGs (the ARGs driving the similarity/dissimilarity measures) may not be identified. Consequently, what drives the dissimilarities among the samples would not be identified, and the reason for antibiotic resistance proliferation may not be clearly understood. In this study, a methodology using the Extremely Randomized Trees (ET) algorithm was formulated and demonstrated to capture such ARG variations and identify discriminatory ARGs among environmentally derived metagenomes. Data were grouped by geographic location (to understand the spread of ARGs globally), untreated vs. treated wastewater (to assess the effectiveness of WWTPs in removing ARGs), and aquatic habitat (to understand the impact and spread within aquatic habitats).
Certain ARGs were specific to wastewater samples from particular locations, suggesting that site-specific factors help shape ARG profiles. Comparing untreated and treated wastewater samples from different WWTPs revealed that biological treatments have a definite impact on the ARG profile. While several ARGs were removed by treatment, some showed an increase in relative abundance irrespective of location and treatment-plant-specific variables. On comparing different aquatic environments, the algorithm identified ARGs that were specific to certain environments, including ARGs specific to hospital discharges. The proposed method was efficient in identifying the discriminatory ARGs that could classify the samples according to their groups. It was also effective in capturing low-level variations that are generally overshadowed in the analysis by highly abundant genes. The results of this study suggest that the proposed method is effective for comprehensive analyses and can provide valuable information to better understand antibiotic resistance. / MS
42

Ensembles of Semantic Spaces : On Combining Models of Distributional Semantics with Applications in Healthcare

Henriksson, Aron January 2015
Distributional semantics allows models of linguistic meaning to be derived from observations of language use in large amounts of text. By modeling the meaning of words in semantic (vector) space on the basis of co-occurrence information, distributional semantics permits a quantitative interpretation of (relative) word meaning in an unsupervised setting, i.e., human annotations are not required. The ability to obtain inexpensive word representations in this manner helps to alleviate the bottleneck of fully supervised approaches to natural language processing, especially since models of distributional semantics are data-driven and hence agnostic to both language and domain. All that is required to obtain distributed word representations is a sizeable corpus; however, the composition of the semantic space is not only affected by the underlying data but also by certain model hyperparameters. While these can be optimized for a specific downstream task, there are currently limitations to the extent the many aspects of semantics can be captured in a single model. This dissertation investigates the possibility of capturing multiple aspects of lexical semantics by adopting the ensemble methodology within a distributional semantic framework to create ensembles of semantic spaces. To that end, various strategies for creating the constituent semantic spaces, as well as for combining them, are explored in a number of studies. The notion of semantic space ensembles is generalizable across languages and domains; however, the use of unsupervised methods is particularly valuable in low-resource settings, in particular when annotated corpora are scarce, as in the domain of Swedish healthcare. The semantic space ensembles are here empirically evaluated for tasks that have promising applications in healthcare. 
It is shown that semantic space ensembles – created by exploiting various corpora and data types, as well as by adjusting model hyperparameters such as the size of the context window and the strategy for handling word order within the context window – are able to outperform the use of any single constituent model on a range of tasks. The semantic space ensembles are used both directly for k-nearest neighbors retrieval and for semi-supervised machine learning. Applying semantic space ensembles to important medical problems facilitates the secondary use of healthcare data, which, despite its abundance and transformative potential, is grossly underutilized. / At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 4 and 5: Unpublished conference papers. / High-Performance Data Mining for Drug Effect Detection
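
As a rough illustration of using several semantic spaces together for k-nearest neighbors retrieval, the sketch below sums cosine similarities across spaces and ranks candidate words by the total. Summation is only one possible combination strategy and is an assumption here, not necessarily one the dissertation uses; the function names, the two toy spaces and their 2-D vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ensemble_neighbors(spaces, query, k=2):
    """Rank words by cosine similarity to `query`, summed over all semantic spaces."""
    # Only words present in every space can be scored by the whole ensemble.
    vocab = set.intersection(*(set(s) for s in spaces)) - {query}
    totals = {w: sum(cosine(s[query], s[w]) for s in spaces) for w in vocab}
    # Highest total similarity first; ties are broken alphabetically.
    return sorted(totals, key=lambda w: (-totals[w], w))[:k]

# Hypothetical toy spaces, e.g. built with different context-window sizes.
space_a = {"fever": (1, 0), "pyrexia": (1, 1), "cough": (1, 0.2), "fracture": (0, 1)}
space_b = {"fever": (0, 1), "pyrexia": (0.1, 1), "cough": (1, 1), "fracture": (1, 0)}

single = ensemble_neighbors([space_a], "fever", k=1)
combined = ensemble_neighbors([space_a, space_b], "fever", k=1)
print(single, combined)
```

In this toy setup the single space and the two-space ensemble disagree on the nearest neighbor of the query, which is the kind of effect an ensemble is meant to exploit.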
43

Statistique pour l’anticipation des niveaux de sécurité secondaire des générations de véhicules / Statistics for anticipating the levels of secondary safety for generations of vehicles

Ouni, Zaïd 19 July 2016
Road safety is a global, European and French priority. Because light vehicles (or simply “vehicles”) are obviously one of the main actors in road activity, improving road safety necessarily requires analyzing their accident characteristics. New vehicles are developed in engineering departments and validated in laboratories, but it is real-life accidents that ultimately characterize them in terms of secondary safety, i.e., that show what level of safety they offer their occupants in an accident. This is why car makers want to rank generations of vehicles according to their real-life levels of secondary safety. We address this problem by exploiting the French national data set of injury accidents called BAAC (Bulletin d’Analyse d’Accident Corporel de la Circulation). In addition, fleet data are used to associate a generational class (GC) with each vehicle. We develop two methods for ranking GCs in terms of secondary safety. The first yields contextual rankings, i.e., rankings of GCs in specified accident contexts. The second yields global rankings, i.e., rankings of GCs determined relative to a distribution of accident contexts. For the contextual ranking, we proceed by “scoring”: we look for a score function that associates a real number with any combination of GC and accident context; the smaller this number, the safer the GC in the given context. The optimal score function is estimated by “ensemble learning”, in the form of an optimal convex combination of score functions produced by a library of scoring-based ranking algorithms. An oracle inequality illustrates the performance of the resulting meta-algorithm. The global ranking is also based on “scoring”: we look for a score function that associates a real number with any GC; the smaller this number, the safer the GC is judged to be overall. Causal arguments are used to adapt the above meta-algorithm by averaging out the accident context. The results of the two ranking procedures are in line with the experts’ expectations.
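
The idea of a convex combination of score functions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's estimator: the weights are picked by a coarse grid search over the simplex, the loss is a simple pairwise ranking error, and the items, severities and the two base score functions are hypothetical toy data.

```python
from itertools import product

def convex_combination(score_fns, weights):
    """Return x -> sum_k w_k * s_k(x), requiring w_k >= 0 and sum_k w_k = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return lambda x: sum(w * s(x) for w, s in zip(weights, score_fns))

def pairwise_ranking_error(score_fn, items, severities):
    """Fraction of pairs where the less severe item does not get the smaller score."""
    bad = total = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if severities[i] == severities[j]:
                continue  # tied outcomes carry no ordering information
            total += 1
            lo, hi = (i, j) if severities[i] < severities[j] else (j, i)
            if score_fn(items[lo]) >= score_fn(items[hi]):
                bad += 1
    return bad / total if total else 0.0

def best_weights(score_fns, items, severities, steps=10):
    """Grid-search the weight simplex (assumed selection rule) for the lowest error."""
    best = None
    for combo in product(range(steps + 1), repeat=len(score_fns)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / steps for c in combo]
        err = pairwise_ranking_error(
            convex_combination(score_fns, weights), items, severities)
        if best is None or err < best[0]:
            best = (err, weights)
    return best[1], best[0]

# Hypothetical toy data: each item is a pair of features, severity is the outcome.
items = [(1, 4), (3, 1), (2, 4), (5, 2)]
severities = [5, 4, 6, 7]
s1 = lambda x: x[0]  # base score function using the first feature only
s2 = lambda x: x[1]  # base score function using the second feature only

weights, err = best_weights([s1, s2], items, severities)
print(weights, err)
```

On this toy data neither base score function ranks every pair correctly, while a convex combination of the two does.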
44

Exploring ensemble learning techniques to optimize the reverse engineering of gene regulatory networks / Explorando técnicas de ensemble learning para otimizar a engenharia reversa de redes regulatórias genéticas

Recamonde-Mendoza, Mariana January 2014
In this thesis we are concerned with the reverse engineering of gene regulatory networks from post-genomic data, a major challenge in Bioinformatics research. Gene regulatory networks are intricate biological circuits responsible for governing the expression levels (activity) of genes, thereby playing an important role in the control of many cellular processes, including cell differentiation, cell cycle and metabolism. Unveiling the structure of these networks is crucial to gain a systems-level understanding of organisms' development and behavior, and eventually to shed light on the mechanisms of diseases caused by the deregulation of these cellular processes. Due to the increasing availability of high-throughput experimental data and the large dimension and complexity of biological systems, computational methods have been essential tools in enabling this investigation. Nonetheless, their performance is still considerably degraded by important computational and biological challenges posed by the scenario. In particular, the noise and sparsity inherent in biological data turn network inference into a challenging combinatorial optimization problem, for which current methods fall short in the accuracy and robustness of their predictions. This thesis investigates the use of ensemble learning techniques as a means to overcome current limitations and enhance the inference process by exploiting the diversity among multiple inferred models. To this end, we develop computational methods both to generate diverse network predictions and to combine multiple predictions into an ensemble solution, and apply this approach to a number of scenarios with different sources of diversity in order to understand its potential in this specific context. We show that the proposed solutions are competitive with traditional algorithms in the field and improve our capacity to accurately reconstruct gene regulatory networks. Results obtained for the inference of transcriptional and post-transcriptional regulatory networks, two adjacent and complementary layers of the overall gene regulatory network, demonstrate the efficiency and robustness of our approach, encouraging the consolidation of ensemble learning as a promising methodology to decipher the structure of gene regulatory networks.
46

A Mixture-of-Experts Approach for Gene Regulatory Network Inference

Shao, Borong January 2014
Context. Gene regulatory network (GRN) inference is an important and challenging problem in bioinformatics. A variety of machine learning algorithms have been applied to increase GRN inference accuracy. Ensemble learning methods have been shown to yield higher inference accuracy than individual algorithms. Objectives. We propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. The proposed method can quantitatively measure the accuracy of individual GRN inference algorithms at the level of network motifs. Based on the accuracy of the individual algorithms at predicting different types of network motifs, weights are assigned to the individual algorithms so as to account for their strengths and weaknesses. In this way, we can improve the accuracy of the ensemble prediction. Methods. The research methodology is a controlled experiment. The independent variable is the method, with eight groups: five individual algorithms, the generic average ranking method used in the DREAM5 challenge, and the proposed ensemble method with four types and with five types of network motifs. The dependent variable is GRN inference accuracy, measured by the area under the precision-recall curve (AUPR). The experiment has training and testing phases. In the training phase, we analyze the accuracy of the five individual algorithms at the network motif level to decide their weights. In the testing phase, the weights are used to combine predictions from the five individual algorithms to generate ensemble predictions. We compare the accuracy of the eight method groups on an Escherichia coli microarray dataset using AUPR. Results. In the training phase, we obtain the AUPR values of the five individual algorithms at predicting each type of network motif. In the testing phase, we collect the AUPR values of the eight methods on predicting the GRN of the Escherichia coli microarray dataset.
Each method group has a sample size of ten (ten AUPR values). Conclusions. Statistical tests on the experimental results show that the proposed method yields significantly higher accuracy than the generic average ranking method. In addition, a new type of network motif is found in GRNs, the inclusion of which significantly increases the accuracy of the proposed method. / Genes are DNA molecules that control the biological traits and biochemical processes that comprise life. They interact with each other to realize the precise regulation of life activities. Biologists aim to understand the regulatory network among genes with the help of high-throughput technologies such as microarrays and RNA-seq. These technologies produce large amounts of gene expression data that contain useful information. Effective data mining is therefore necessary to extract this information and advance biological research. Gene regulatory network (GRN) inference aims to infer gene interactions from gene expression data, such as microarray datasets. The inference results can be used to guide further experiments to discover or validate gene interactions. A variety of machine learning (data mining) methods have been proposed to solve this problem. In recent years, experiments have shown that ensemble learning methods achieve higher accuracy than individual learning methods, because ensemble methods can take advantage of the strengths of different individual methods and are robust to different network structures. In this thesis, we propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. By quantitatively measuring the accuracy of individual methods at the network motif level, the proposed method is able to take advantage of the complementarity among the individual methods.
The proposed method yields a significantly higher accuracy than the generic average ranking method, which is the most accurate method out of 35 GRN inference methods in the DREAM5 challenge.
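
To make the contrast between plain average ranking and a weighted ensemble concrete, the sketch below combines edge rankings from several inference algorithms by their weighted mean rank; with uniform weights it reduces to average ranking. It is a simplification under stated assumptions: one weight per algorithm rather than weights derived per network motif type, and the edge names, scores and weights are hypothetical.

```python
def ranks(scores):
    """Map each candidate edge to its rank under one algorithm (1 = most confident)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: r for r, edge in enumerate(ordered, start=1)}

def weighted_rank_ensemble(all_scores, weights=None):
    """Order edges by weighted mean rank; uniform weights give plain average ranking."""
    if weights is None:
        weights = [1.0] * len(all_scores)
    total = sum(weights)
    per_algo = [ranks(s) for s in all_scores]
    edges = per_algo[0].keys()  # assumes every algorithm scores the same edges
    combined = {
        e: sum(w * r[e] for w, r in zip(weights, per_algo)) / total for e in edges
    }
    # Smallest mean rank first; ties are broken by edge name.
    return sorted(edges, key=lambda e: (combined[e], e))

# Hypothetical confidence scores from three inference algorithms.
algo_a = {"A->B": 0.9, "A->C": 0.2, "B->C": 0.6}
algo_b = {"A->B": 0.4, "A->C": 0.8, "B->C": 0.5}
algo_c = {"A->B": 0.7, "A->C": 0.1, "B->C": 0.3}

average_order = weighted_rank_ensemble([algo_a, algo_b, algo_c])
weighted_order = weighted_rank_ensemble([algo_a, algo_b, algo_c], weights=[1, 4, 1])
print(average_order, weighted_order)
```

Raising the weight of the second algorithm reverses the final ordering here, which shows how much the choice of weights can matter relative to uniform averaging.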
48

A Novel Ensemble Method using Signed and Unsigned Graph Convolutional Networks for Predicting Mechanisms of Action of Small Molecules from Gene Expression Data

Karim, Rashid Saadman 24 May 2022
No description available.
49

Stronger Together? An Ensemble of CNNs for Deepfakes Detection / Starkare Tillsammans? En Ensemble av CNNs för att Identifiera Deepfakes

Gardner, Angelica January 2020
Deepfakes technology is a face swap technique that enables anyone to replace faces in a video with highly realistic results. Despite its usefulness, if used maliciously, this technique can have a significant impact on society, for instance through the spreading of fake news or cyberbullying. This makes deepfakes detection a problem of utmost importance. In this paper, I tackle the problem by identifying deepfake forgeries in video sequences. Inspired by the state of the art, I study the ensembling of different machine learning solutions built on convolutional neural networks (CNNs) and use these models to compare ensemble and single-model performance. Existing work in the field of deepfakes detection suggests that modern deepfake videos pose escalating challenges for detection methods. I evaluate that claim by testing the detection performance of four single CNN models as well as six stacked ensembles on three modern deepfakes datasets. I compare various approaches to combining single models and the ways in which their predictions can be incorporated into the ensemble output. The results show that the best approach for deepfakes detection is to create an ensemble, although the choice of ensemble approach plays a crucial role in detection performance. The final proposed solution is an ensemble of all available single models that uses soft (weighted) voting to combine its base learners’ predictions. This proposed solution significantly improved deepfakes detection performance and substantially outperformed all single models.
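
Soft (weighted) voting can be sketched as a weighted average of the base learners' predicted probabilities. The snippet below is a minimal illustration, not the thesis's implementation: the four probabilities, the weights and the 0.5 decision threshold are hypothetical assumptions.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Weighted average of per-model P(fake); flag as fake if it reaches `threshold`."""
    if weights is None:
        weights = [1.0] * len(probabilities)  # equal weights: plain soft voting
    avg = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return avg, avg >= threshold

# Hypothetical P(fake) outputs of four single CNN models for one video clip.
cnn_probs = [0.9, 0.4, 0.3, 0.45]

avg_equal, fake_equal = soft_vote(cnn_probs)
avg_weighted, fake_weighted = soft_vote(cnn_probs, weights=[0.1, 0.3, 0.3, 0.3])
print(avg_equal, fake_equal, avg_weighted, fake_weighted)
```

With equal weights the single very confident model tips the decision, whereas down-weighting it flips the outcome, so the weighting scheme is itself a design choice that has to be validated.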
50

MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels in Stacking Ensemble Learning

Ploshchik, Ilya January 2023
Stacking, also known as stacked generalization, is a method of ensemble learning where multiple base models are trained on the same dataset and their predictions are used as input for one or more metamodels in an extra layer. This technique can lead to improved performance compared to single-layer ensembles, but often requires a time-consuming trial-and-error process. Therefore, the previously developed Visual Analytics system, StackGenVis, was designed to help users select the set of the most effective and diverse models and measure their predictive performance. However, StackGenVis was developed with only one metamodel: Logistic Regression. The focus of this Bachelor's thesis is to examine how alternative metamodels affect the performance of stacked ensembles through the use of a visualization tool called MetaStackVis. Our interactive tool facilitates visual examination of individual metamodels and pairs of metamodels based on their predictive probabilities (or confidence), various supported validation metrics, and their accuracy in predicting specific problematic data instances. The efficiency and effectiveness of MetaStackVis are demonstrated with an example based on a real healthcare dataset. The tool has also been evaluated through semi-structured interview sessions with Machine Learning and Visual Analytics experts. In addition to this thesis, we have written a short research paper explaining the design and implementation of MetaStackVis. This thesis complements the paper with additional findings and in-depth analysis, and can be considered a supplementary source of information for readers who want to dive deeper into the subject.
