Global ETD Search

11	Machine learning for epigenetics: algorithms for next generation sequencing data Mayo, Thomas Richard January 2018 The advent of Next Generation Sequencing (NGS), a little over a decade ago, has led to a vast and rapid increase in the generation of genomic data. The drastically reduced cost has in turn enabled powerful modifications that can be used to investigate not just genetic but also epigenetic phenomena. Epigenetics refers to the study of mechanisms affecting gene expression other than the genetic code itself and thus, at the transcription level, incorporates DNA methylation, transcription factor binding and histone modifications, amongst others. This thesis outlines and tackles two major challenges in the computational analysis of such data using techniques from machine learning. Firstly, I address the problem of testing for differential methylation between groups of bisulfite sequencing data sets. DNA methylation plays an important role in genomic imprinting, X-chromosome inactivation and the repression of repetitive elements, and is implicated in numerous diseases, such as cancer. Bisulfite sequencing provides single-nucleotide-resolution methylation data at the whole-genome scale, but a sensitive analysis of such data is difficult. I propose a solution that uses a powerful kernel-based machine learning technique, the Maximum Mean Discrepancy, to leverage well-characterised spatial correlations in DNA methylation, and I adapt the method for this particular use. I use this tailored method to analyse a novel data set from a study of ageing in three different tissues in the mouse. This study motivates further modifications to the method and highlights the utility of the underlying measure as an exploratory tool for methylation analysis. Secondly, I address the problem of predictive and explanatory modelling of chromatin immunoprecipitation sequencing (ChIP-Seq) data.
ChIP-Seq is typically used to assay the binding of a protein of interest, such as a transcription factor or histone, to DNA, and is one of the most widely used sequencing assays. While peak callers are a powerful tool for identifying binding sites in sparse, clean ChIP-Seq profiles, broader signals defy analysis in this framework. Instead, generative models that explain the data in terms of the underlying sequence can help uncover the mechanisms that predict binding or its absence. I explore current problems with ChIP-Seq analysis, such as zero-inflation and the use of the control experiment, known as the input. I then devise a method for representing k-mers that enables the use of longer DNA sub-sequences within a flexible model development framework, such as generalised linear models, without heavy programming requirements. Finally, I use these insights to develop an appropriate Bayesian generative model that predicts ChIP-Seq count data in terms of the underlying DNA sequence, incorporating DNA methylation information where available, and fit the model with the Expectation-Maximization algorithm. The model is tested on simulated data and on real data for the histone mark H3K27me3. This thesis therefore straddles the fields of bioinformatics and machine learning. Bioinformatics is both plagued and blessed by the plethora of different techniques available for gathering data and their continual innovations. Each technique presents a unique challenge, and hence out-of-the-box machine learning techniques have had little success in solving biological problems. While I have focused on NGS data, the methods developed in this thesis are likely to be applicable to future technologies, such as Third Generation Sequencing methods, and the lessons learned in their adaptation will be informative for the next wave of computational challenges.
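The Maximum Mean Discrepancy compares two groups through average kernel similarities: MMD² = E[k(x,x')] + E[k(y,y')] − 2E[k(x,y)]. The sketch below is the generic biased estimator with a permutation test, not the thesis's tailored method. It assumes that each sample is a vector of methylation levels (0 to 1) at the same neighbouring CpG sites, so that a Gaussian kernel on the whole vector reflects their spatial correlation; the bandwidth of 0.5 and the function names are illustrative choices.

```python
import math
import random
from typing import List, Sequence

# Assumption: one sample = methylation levels in [0, 1] at the same CpG sites of a region.
Profile = Sequence[float]


def rbf_kernel(a: Profile, b: Profile, bandwidth: float) -> float:
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Profile], ys: Sequence[Profile], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x from xs and y from ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(xs: Sequence[Profile], ys: Sequence[Profile], bandwidth: float = 0.5) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two groups."""
    return (mean_kernel(xs, xs, bandwidth) + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def permutation_p_value(xs: Sequence[Profile], ys: Sequence[Profile],
                        n_perm: int = 200, bandwidth: float = 0.5, seed: int = 0) -> float:
    """Share of random relabellings whose MMD^2 is at least the observed value."""
    observed = mmd2(xs, ys, bandwidth)
    pooled: List[Profile] = list(xs) + list(ys)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):], bandwidth) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)
```

Usage: pass the regional profiles of, say, young and old samples as two lists of equal-length vectors; a small p-value suggests the groups differ in that region. The biased estimator is used because it is always non-negative and simple to read; the unbiased U-statistic drops the diagonal terms.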
12	USING MACHINE LEARNING TECHNIQUES TO IMPROVE STATIC CODE ANALYSIS TOOLS USEFULNESS Enas Ahmad Alikhashashneh (7013450) 16 October 2019 This dissertation proposes an approach that uses Machine Learning (ML) techniques to reduce, as much as possible, the cost of manually inspecting the large number of false positive warnings reported by Static Code Analysis (SCA) tools. The proposed approach neither assumes particular SCA tools nor depends on the programming language used to write the target source code or application. To reduce the number of false positive warnings, we first evaluated a number of SCA tools in terms of software engineering metrics using a synthetic source code base, the Juliet test suite. From this evaluation, we concluded that the SCA tools report many false positive warnings that require manual inspection. We then generated a number of datasets from source code that caused the SCA tools to produce true positive, false positive, or false negative warnings. These datasets were used to train four ML classifiers to classify the warnings collected from the synthetic source code. The experimental results showed that the classifier built using the Random Forests (RF) technique outperformed the other classifiers. Finally, using this classifier and an instance-based transfer learning technique, we ranked a number of warnings aggregated from various open-source software projects. The experimental results show that the proposed approach to reducing the cost of manually inspecting false positive warnings outperformed the random ranking algorithm and was highly correlated with the ranked list generated by the optimal ranking algorithm. Software Engineering
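The ranking step can be pictured as sorting warnings by a classifier's estimated probability of being a true positive, so reviewers see likely real defects first. The sketch below shows only that step plus one simple inspection-cost measure: how many warnings a reviewer reads before seeing every true positive. The scores are hypothetical stand-ins for the Random Forest output, and this cost measure is an assumption for illustration, not the dissertation's evaluation (which reports correlation with the optimal ranking). The optimal ranking needs exactly as many inspections as there are true positives, and a uniformly random order needs T(N+1)/(T+1) on average for N warnings with T true positives.

```python
from typing import List, NamedTuple


class SCAWarning(NamedTuple):
    warning_id: str
    score: float            # hypothetical classifier estimate that this is a true positive
    is_true_positive: bool  # ground truth, available only when evaluating a ranking


def rank_warnings(warnings: List[SCAWarning]) -> List[SCAWarning]:
    """Order warnings so that the most likely true positives are inspected first."""
    return sorted(warnings, key=lambda w: w.score, reverse=True)


def inspections_to_find_all(ranked: List[SCAWarning]) -> int:
    """Warnings inspected, in the given order, until every true positive has been seen."""
    remaining = sum(w.is_true_positive for w in ranked)
    inspected = 0
    for w in ranked:
        if remaining == 0:
            break
        inspected += 1
        remaining -= w.is_true_positive
    return inspected


def expected_random_inspections(n_warnings: int, n_true: int) -> float:
    """Expected position of the last true positive in a uniformly random order."""
    if n_true == 0:
        return 0.0
    return n_true * (n_warnings + 1) / (n_true + 1)
```

A ranking is better the closer `inspections_to_find_all` gets to the number of true positives and the further it falls below `expected_random_inspections`.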
13	Constructivist Pedagogical Approaches in Higher Education: A Qualitative Case Study of Students and their Learning Experiences in a Collaborative Learning Space Njai, Samuel 10 September 2021 No description available. Education Adult Education Educational Technology Instructional Design Learning theory collaborative learning flexible learning environments learner-centered approach 21st century skills active learning technique learning resources diversity and inclusion.
14	Classificadores e aprendizado em processamento de imagens e visão computacional / Classifiers and machine learning techniques for image processing and computer vision Rocha, Anderson de Rezende, 1980- 03 March 2009 Advisor: Siome Klein Goldenstein / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto da Computação / Made available in DSpace on 2018-08-12T17:37:15Z (GMT). No. of bitstreams: 1 Rocha_AndersondeRezende_D.pdf: 10303487 bytes, checksum: 243dccfe5255c828ce7ead27c27eb1cd (MD5) Previous issue date: 2009 / Summary: In this doctoral work, we propose the use of classifiers and machine learning techniques to extract relevant information from a data set (e.g., images) in order to solve some problems in Image Processing and Computer Vision. The problems of interest to us are: categorization of images into two or more classes, detection of hidden messages, distinguishing between digitally tampered images and natural images, authentication, and multi-class classification, among others. First, we present a comparative and critical review of the state of the art in image forensics and the detection of hidden messages in images. Our goal is to show the potential of existing techniques and, more importantly, to point out their limitations. With this study, we show that many of the problems in this area come down to two common issues: the selection of features and the learning techniques to be used. In this study, we also discuss legal issues associated with image forensics, such as the use of digital photographs by criminals. Next, we introduce an image forensics technique tested in the context of hidden-message detection and the general classification of images into categories such as indoors, outdoors, computer-generated, and works of art.
In studying this multi-class classification problem, some questions arise: how can a multi-class problem be solved so as to combine, for example, image classification features based on color, texture, shape, and silhouette, without worrying too much about how to normalize the resulting combined feature vector? How can several different classifiers be used, each one specialized in and best configured for a particular set of features or of classes in confusion? To this end, we present a technique for fusing classifiers and features in the multi-class scenario through the combination of binary classifiers. We validate our approach in a real application for the automatic classification of fruits and vegetables. Finally, we face another interesting problem: how can the use of powerful binary classifiers in the multi-class context be made more efficient and effective? We therefore introduce a technique for combining binary classifiers (called base classifiers) to solve problems in the general multi-class classification context. / Abstract: In this work, we propose the use of classifiers and machine learning techniques to extract useful information from data sets (e.g., images) to solve important problems in Image Processing and Computer Vision. We are particularly interested in: two-class and multi-class image categorization, hidden-message detection, discrimination between natural and forged images, authentication, and multi-class classification. To start with, we present a comparative survey of the state of the art in digital image forensics as well as hidden-message detection. Our objective is to show the importance of the existing solutions and discuss their limitations. In this study, we show that most of these techniques strive to solve two common problems in Machine Learning: the feature selection and the classification techniques to be used.
Furthermore, we discuss the legal and ethical aspects of image forensics, such as the use of digital images by criminals. We introduce a technique for image forensics analysis in the context of hidden-message detection and image classification into categories such as indoors, outdoors, computer-generated, and artworks. This multi-class classification problem raised some important questions: how can a multi-class problem be solved so as to combine, for instance, several different features such as color, texture, shape, and silhouette without worrying about the pre-processing and normalization of the combined feature vector? How can we take advantage of different classifiers, each one custom-tailored to a specific set of classes in confusion? To cope with most of these problems, we present a feature and classifier fusion technique based on combinations of binary classifiers. We validate our solution with a real application for automatic produce classification. Finally, we address another interesting problem: how can powerful binary classifiers be combined more effectively in the multi-class scenario, and how can their efficiency be boosted? In this context, we present a solution that boosts the efficiency and effectiveness of multi-class-from-binary techniques. / Doctorate / Computer Engineering / Doctor of Computer Science Machine learning - Technique Forensic image analysis Steganalysis Feature fusion Classifier fusion Multi-class classification Image categorization
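Building a multi-class classifier out of binary base classifiers can be illustrated with an error-correcting output code (ECOC), a standard technique shown here as an example rather than as the thesis's own method. Each class receives a binary codeword, each codeword column defines one binary classifier, and a sample goes to the class whose codeword is nearest in Hamming distance to the classifiers' outputs. The 4-class exhaustive codebook below is a textbook choice; the trained base classifiers are omitted and represented only by their 0/1 outputs.

```python
from typing import List, Sequence

# Exhaustive ECOC for 4 classes: 2**(4 - 1) - 1 = 7 binary problems (columns).
# Row c is the codeword of class c; column j defines base classifier j, trained to
# separate the classes with bit 1 in that column from the classes with bit 0.
CODEBOOK: List[List[int]] = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_codeword_distance(codebook: Sequence[Sequence[int]]) -> int:
    """Smallest Hamming distance between any two class codewords."""
    return min(hamming(codebook[i], codebook[j])
               for i in range(len(codebook)) for j in range(i + 1, len(codebook)))


def decode(binary_outputs: Sequence[int],
           codebook: Sequence[Sequence[int]] = CODEBOOK) -> int:
    """Predict the class whose codeword is nearest to the base classifiers' outputs."""
    if len(binary_outputs) != len(codebook[0]):
        raise ValueError("expected one output per base classifier")
    return min(range(len(codebook)), key=lambda c: hamming(binary_outputs, codebook[c]))
```

Because every pair of codewords differs in 4 positions, any single wrong base classifier still decodes to the correct class, which is the appeal of combining many binary decisions rather than relying on one.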
15	Computational protein design: assessment and applications Li, Zhixiu January 2015 Indiana University-Purdue University Indianapolis (IUPUI) / Computational protein design aims to design amino acid sequences that fold into a target structure and perform a desired function. Many computational design methods have been developed, and they have been applied successfully during the past two decades. However, the success rate of protein design remains too low for it to be a useful tool for biochemists who are not experts in computational biology. In this dissertation, we first developed novel computational assessment techniques to assess several state-of-the-art computational techniques. We found that significant progress was made in several important measures by two new scoring functions, from RosettaDesign and from OSCAR-design, respectively. We also developed the first machine-learning technique, called SPIN, that predicts a sequence profile compatible with a given structure using a novel nonlocal energy-based feature. The accuracy of the predicted sequences is comparable to that of RosettaDesign in terms of sequence identity to wild-type sequences. In the last two chapters, on applications, we designed self-inhibitory peptides of Escherichia coli methionine aminopeptidase (EcMetAP) and designed barstar sequences de novo. Several peptides were confirmed to inhibit EcMetAP, with 50% inhibitory concentrations in the micromolar range. Meanwhile, the assessment of the designed barstar sequences indicates the improvement of OSCAR-design over RosettaDesign. Computational protein design Energy function Machine learning Self-inhibitory peptide Sequence profile Inhibitor Protein engineering Protein engineering -- Methods Proteins -- Conformation Protein folding Computational biology Computational biology -- Methods Machine learning -- Technique
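The comparison with RosettaDesign is made by sequence identity to the wild type. A minimal sketch of that measure, assuming (hypothetically) that a predicted sequence profile is a list of 20 probabilities per position in the order given by `AMINO_ACIDS`, and that designed and wild-type sequences share the same backbone so they can be compared position by position without gaps. This is not SPIN's code or output format.

```python
from typing import List, Sequence

# Assumed ordering of the 20 standard residues (one-letter codes) in each profile row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_to_sequence(profile: Sequence[Sequence[float]]) -> str:
    """Pick the most probable residue at each position of a per-position profile."""
    residues: List[str] = []
    for position, probs in enumerate(profile):
        if len(probs) != len(AMINO_ACIDS):
            raise ValueError(f"position {position}: expected {len(AMINO_ACIDS)} values")
        best = max(range(len(AMINO_ACIDS)), key=lambda i: probs[i])
        residues.append(AMINO_ACIDS[best])
    return "".join(residues)


def sequence_identity(designed: str, wild_type: str) -> float:
    """Fraction of positions at which two equal-length sequences share a residue."""
    if len(designed) != len(wild_type) or not designed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(designed, wild_type))
    return matches / len(designed)
```

For example, a design `ACDE` scored against wild type `ACDF` has identity 0.75.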
16	Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries Teng, Sin Yong January 2020 As new technologies for energy-intensive industries continue to develop, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation are forcing these traditional plants to cease operation and shut down. Process improvement and modernization projects are essential for maintaining the operating performance of these plants. Current approaches to process improvement are mainly process integration, process optimization and process intensification. In general, these fields rely on mathematical optimization, the practitioner's experience and operational heuristics. These approaches serve as the basis for process improvement; however, their performance can be improved further with modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to improve energy-intensive industrial processes. This work takes an approach that addresses the problem by simulating industrial systems, and it contributes the following: (i) Application of machine learning techniques, including one-shot learning and neuro-evolution, for the data-driven modelling and optimization of individual units. (ii) Application of dimensionality reduction (e.g., principal component analysis, autoencoders) to the multi-objective optimization of multi-unit processes. (iii) Design of a new tool for analysing the problematic parts of a system in order to eliminate them (bottleneck tree analysis, BOTA). An extension of the tool was also proposed that allows multidimensional problems to be solved with a data-driven approach. (iv) Demonstration of the effectiveness of Monte Carlo simulation, neural networks and decision trees for decision-making when integrating a new process technology into existing processes.
(v) Comparison of the Hierarchical Temporal Memory (HTM) technique and dual optimization with several predictive tools for supporting real-time operations management. (vi) Implementation of an artificial neural network within an interface for the conventional process graph (P-graph). (vii) Highlighting the future of artificial intelligence and process engineering in biosystems through a commercially based multi-omics paradigm.
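Contribution (ii) applies dimensionality reduction, such as principal component analysis, before multi-objective optimization. As a minimal, dependency-free illustration (not the thesis's pipeline), the first principal component of a table of process measurements can be found by power iteration on the sample covariance matrix. The layout is an assumption: rows are hypothetical operating snapshots and columns are process variables. In practice a library routine such as `numpy.linalg.svd` would usually be preferred.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix of the columns (hypothetical process variables)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows: Sequence[Sequence[float]],
                              iterations: int = 500, seed: int = 0) -> List[float]:
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    cov = covariance(rows)
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start avoids orthogonal guesses
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("zero covariance or degenerate start vector")
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign; make the largest-magnitude entry positive.
    pivot = max(v, key=abs)
    return [x if pivot > 0 else -x for x in v]


def explained_variance_ratio(rows: Sequence[Sequence[float]],
                             component: Sequence[float]) -> float:
    """Share of total variance captured by projecting the data onto `component`."""
    cov = covariance(rows)
    d = len(cov)
    along = sum(component[i] * cov[i][j] * component[j]
                for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    return along / total
```

Power iteration yields only the leading component; further components would need deflation or a full decomposition. An explained-variance ratio close to 1 means the variables move together and the optimization could work in a reduced space.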
