About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes.

Tangirala, Karthik January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / As genomes are sequenced, a major challenge is their annotation -- the identification of genes and regulatory elements, their locations and their functions. For years, it was believed that one gene corresponds to one protein, but the discovery of alternative splicing provided a mechanism for generating different gene transcripts (isoforms) from the same genomic sequence. In recent years, it has become obvious that a large fraction of genes undergoes alternative splicing. Thus, understanding alternative splicing is a problem of great interest to biologists. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to produce accurate classifiers. While large amounts of genomic data are produced by the new sequencing technologies, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm makes use of two views of the data to iteratively learn two classifiers that can inform each other, at each step, with their best predictions on the unlabeled data. We consider three sets of features for constructing views for the problem of predicting alternatively spliced exons: lengths of the exon of interest and its flanking introns, exonic splicing enhancers (a.k.a. ESE motifs) and intronic regulatory sequences (a.k.a. IRS motifs). Naive Bayes and Support Vector Machine (SVM) algorithms are used as base classifiers in our study. Experimental results show that using the unlabeled data can result in better classifiers compared to those obtained from the small amount of labeled data alone. In addition to semi-supervised approaches, we also study the usefulness of graph-based transductive learning approaches for predicting alternatively spliced exons. Similar to the semi-supervised learning algorithms, transductive learning algorithms can make use of unlabeled data, together with labeled data, to produce labels for the unlabeled data. However, a classification model that could be used to classify new unlabeled data is not learned in this case. Experimental results show that graph-based transductive approaches can make effective use of the unlabeled data.
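As an illustration of the co-training loop described above, the sketch below (not the thesis's exact features, classifiers, or thresholds; the view matrices, round counts, and toy data are illustrative assumptions) uses scikit-learn's GaussianNB and an SVM as the two view-specific base classifiers and lets each one label the unlabeled examples it is most confident about.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def co_train(X_view1, X_view2, y, labeled_idx, unlabeled_idx, rounds=5, per_round=10):
    # One classifier per view; SVC needs probability=True to expose confidence scores.
    clf1, clf2 = GaussianNB(), SVC(probability=True)
    labeled, unlabeled = list(labeled_idx), list(unlabeled_idx)
    y = np.array(y).copy()  # in a real setting, y is only known on labeled_idx
    for _ in range(rounds):
        if not unlabeled:
            break
        clf1.fit(X_view1[labeled], y[labeled])
        clf2.fit(X_view2[labeled], y[labeled])
        # Each view labels the unlabeled examples it is most confident about and
        # moves them into the shared labeled pool, informing the other view.
        for clf, X in ((clf1, X_view1), (clf2, X_view2)):
            if not unlabeled:
                break
            proba = clf.predict_proba(X[unlabeled])
            best = np.argsort(proba.max(axis=1))[-per_round:]
            for pos in best:
                idx = unlabeled[pos]
                y[idx] = clf.classes_[np.argmax(proba[pos])]
                labeled.append(idx)
            unlabeled = [u for k, u in enumerate(unlabeled) if k not in set(best)]
    return clf1, clf2

# Toy data: 200 examples with two 5-dimensional views, only 20 of them labeled.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
y = (X1[:, 0] + X2[:, 0] > 0).astype(int)
co_train(X1, X2, y, labeled_idx=range(20), unlabeled_idx=range(20, 200))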
32

Ensembles na classificação relacional / Ensembles in relational classification

Llerena, Nils Ever Murrugarra 08 September 2011 (has links)
In many fields, besides information about the objects or entities that compose them, there is also information about the relationships between those objects. Co-authorship networks and Web pages are examples of such fields. It is therefore natural to look for classification techniques that take this relational information into account. Among these techniques are the so-called graph-based classifiers, which classify examples by taking into account the relationships between them. This work presents the development of methods to improve the performance of graph-based classifiers by using ensemble strategies. An ensemble classifier considers a set of classifiers whose individual predictions are combined in some way; the combined classifier usually performs better than its individual members. Three techniques were developed: the first for data originally in propositional format and transformed into a graph-based relational format, and the second and third for data originally in graph format. The first technique, inspired by the boosting algorithm, originated the Adaptive Graph-Based K-Nearest Neighbor (A-KNN) algorithm. The second, inspired by the bagging algorithm, led to three approaches of Graph-Based Bagging (BG). Finally, the third, inspired by the Cross-Validated Committees algorithm, led to Graph-Based Cross-Validated Committees (CVCG). The experiments were performed on 38 data sets, 22 in propositional format and 16 in relational format. Evaluation used 10-fold stratified cross-validation, and statistical differences between classifiers were assessed with the method proposed by Demsar (2006). The three techniques improved or maintained the performance of the base classifiers. In conclusion, ensembles applied to graph-based classifiers achieve good performance.
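The thesis's Graph-Based Bagging operates directly on graph-based classifiers; purely as an illustration of the ensemble idea, the sketch below shows generic bagging with majority voting, where an ordinary scikit-learn KNN stands in for the graph-based base classifier and class labels are assumed to be non-negative integers.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bagging_predict(X_train, y_train, X_test, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
        clf = KNeighborsClassifier(n_neighbors=3).fit(X_train[idx], y_train[idx])
        votes.append(clf.predict(X_test))
    votes = np.stack(votes)
    # Majority vote across the ensemble, one column per test example.
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)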
33

Interactive 3D segmentation repair with image-foresting transform, supervoxels and seed robustness / Reparação interativa de segmentações 3D com transformada imagem-floresta, supervoxels, robustez de sementes

Tavares, Anderson Carlos Moreira 02 June 2017 (has links)
Image segmentation consists of partitioning an image into relevant regions, for example to isolate the pixels belonging to objects of interest, and is an important step in computer vision, medical image processing, and other applications. Automatic segmentation often produces results with imperfections. The user can correct them by editing manually or interactively, or can simply discard the segmentation and try to generate another result automatically with a different method. Interactive methods combine benefits of manual and automatic ones, reducing user effort while exploiting the user's high-level knowledge. In seed-based methods, to continue or repair a prior segmentation (presegmentation) without forcing the user to start from scratch, it is necessary to solve the Reverse Interactive Segmentation Problem (RISP): automatically estimating the seeds that would generate it. To achieve this goal, we first divide the segmented object into its composing cores. Inside a core, two seeds placed separately always produce the same result, making one of them redundant; hence only one seed per core is required. Cores whose segmentations are contained in the results of other cores are redundant and can also be discarded, further reducing the seed set, a process called Redundancy Analysis. A minimal set of seeds for the presegmentation is thus generated, and the interactive repair problem can be solved by adding new seeds or removing seeds. Within the framework of the Image-Foresting Transform (IFT), new methods such as the Oriented Image-Foresting Transform (OIFT) and Oriented Relative Fuzzy Connectedness (ORFC) were developed; however, no algorithms were known for computing the cores of these methods. This work develops such algorithms, with proofs of correctness. The cores also give an indication of how robust each method is to the positioning of the seeds. Therefore, a hybrid method that combines GraphCut with the ORFC cores, as well as a Robustness Coefficient (RC), have been developed. We also present another solution to repair segmentations, based on IFT-SLIC, originally used to generate supervoxels. Experimental results analyze, compare and demonstrate the potential of these solutions.
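For context, here is a minimal sketch of seed-based segmentation with the Image-Foresting Transform using the common f_max path cost (pixels are conquered by the seed offering the cheapest path, where a path's cost is its worst arc). The OIFT, ORFC, and core-computation algorithms developed in the thesis are not reproduced; the 4-neighborhood and the intensity-difference arc weight are assumptions of this sketch.

import heapq
import numpy as np

def ift_segment(image, seeds):
    # image: 2D float array; seeds: dict {(row, col): label}.
    h, w = image.shape
    cost = np.full((h, w), np.inf)
    label = np.zeros((h, w), dtype=int)
    heap = []
    for (r, c), lab in seeds.items():
        cost[r, c] = 0.0
        label[r, c] = lab
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        c0, r, c = heapq.heappop(heap)
        if c0 > cost[r, c]:
            continue  # stale queue entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # f_max path cost: the worst arc along the path so far.
                new_cost = max(c0, abs(float(image[nr, nc]) - float(image[r, c])))
                if new_cost < cost[nr, nc]:
                    cost[nr, nc] = new_cost
                    label[nr, nc] = label[r, c]
                    heapq.heappush(heap, (new_cost, nr, nc))
    return label

# Toy usage: a dark half and a bright half, one seed placed in each.
img = np.concatenate([np.zeros((8, 4)), np.ones((8, 4))], axis=1)
print(ift_segment(img, {(4, 0): 1, (4, 7): 2}))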
34

A Machine Learning Approach to Artificial Floorplan Generation

Goodman, Genghis 01 January 2019 (has links)
The process of designing a floorplan is highly iterative and requires extensive human labor. Currently, there are a number of computer programs that aid humans in floorplan design. These programs, however, are limited by their inability to fully automate the creative process. Such automation would allow a professional to quickly generate many possible floorplan solutions, greatly expediting the process. However, automating this creative process is very difficult because of the many implicit and explicit rules a model must learn in order to create viable floorplans. In this paper, we propose a method of floorplan generation using two machine learning models: a sequential model that generates rooms within the floorplan, and a graph-based model that finds adjacencies between generated rooms. Each of these models can be altered such that it is capable of producing a floorplan independently; however, we find that the combination of these models outperforms each of its components, as well as a statistic-based approach.
35

Graph-based Latent Embedding, Annotation and Representation Learning in Neural Networks for Semi-supervised and Unsupervised Settings

Kilinc, Ismail Ozsel 30 November 2017 (has links)
Machine learning has been immensely successful in supervised learning, with outstanding examples in major industrial applications such as voice and image recognition. Following these developments, the most recent research has now begun to focus primarily on algorithms which can exploit very large sets of unlabeled examples to reduce the amount of manually labeled data required for existing models to perform well. In this dissertation, we propose graph-based latent embedding/annotation/representation learning techniques in neural networks tailored for semi-supervised and unsupervised learning problems. Specifically, we propose a novel regularization technique called Graph-based Activity Regularization (GAR) and a novel output layer modification called Auto-clustering Output Layer (ACOL), which can be used separately or collaboratively to develop scalable and efficient learning frameworks for semi-supervised and unsupervised settings. First, using the GAR technique alone, we develop a framework providing an effective and scalable graph-based solution for semi-supervised settings in which there is a large number of observations but only a small subset with ground-truth labels. The proposed approach is natural for the classification framework on neural networks, as it requires no additional task such as computing a reconstruction error (as in autoencoder-based methods) or implementing a zero-sum game mechanism (as in adversarial-training-based methods). We demonstrate that GAR effectively and accurately propagates the available labels to unlabeled examples. Our results show performance comparable to state-of-the-art generative approaches for this setting, using an easier-to-train framework. Second, we explore a different type of semi-supervised setting where a coarse level of labeling is available for all the observations, but the model has to learn a fine, deeper level of latent annotations for each one. Problems in this setting are likely to be encountered in many domains such as text categorization, protein function prediction and image classification, as well as in exploratory scientific studies such as medical and genomics research. We consider this setting as simultaneously performed supervised classification (per the available coarse labels) and unsupervised clustering (within each of the coarse labels), and propose a novel framework combining GAR with ACOL, which enables the network to perform concurrent classification and clustering. We demonstrate how the coarse label supervision impacts performance and how the classification task actually helps propagate useful clustering information between sub-classes. Comparative tests on the most popular image datasets rigorously demonstrate the effectiveness and competitiveness of the proposed approach. The third and final setup builds on the prior framework to unlock fully unsupervised learning, where we propose to substitute real, yet unavailable, parent-class information with pseudo class labels. In this novel unsupervised clustering approach, the network can exploit hidden information indirectly introduced through a pseudo classification objective. We train an ACOL network through this pseudo supervision together with an unsupervised objective based on GAR, and ultimately obtain a k-means-friendly latent representation. Furthermore, we demonstrate how the chosen transformation type impacts performance and helps propagate the latent information that is useful in revealing unknown clusters.
Our results show state-of-the-art performance for unsupervised clustering tasks on MNIST, SVHN and USPS datasets with the highest accuracies reported to date in the literature.
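GAR itself is not described here in enough detail to reproduce; purely as background on the general setting, the sketch below shows classic graph-based label propagation (in the spirit of Zhu and Ghahramani): label mass is diffused over a graph while the known labels are clamped. The affinity matrix W and the one-hot label matrix y are assumed inputs, not part of the dissertation's method.

import numpy as np

def propagate_labels(W, y, labeled_mask, n_iter=100):
    # W: (n, n) symmetric affinity matrix; y: one-hot (n, k); labeled_mask: bool (n,).
    D_inv = 1.0 / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    P = W * D_inv                          # row-normalized transition matrix
    F = y.astype(float)
    for _ in range(n_iter):
        F = P @ F                          # diffuse label mass along graph edges
        F[labeled_mask] = y[labeled_mask]  # clamp the known labels
    return F.argmax(axis=1)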
36

A Hybrid Video Recommendation System Based on a Graph-Based Algorithm

Ozturk, Gizem 01 September 2010 (has links) (PDF)
This thesis proposes the design, development and evaluation of a hybrid video recommendation system. The proposed system is based on a graph algorithm called Adsorption, a collaborative filtering algorithm in which relations between users are used to make recommendations. Adsorption is used to generate the base recommendation list. In order to overcome the problems that occur in a purely collaborative system, content-based filtering is injected. Content-based filtering uses the idea of suggesting similar items that match user preferences. To apply it, the base recommendation list is first updated by removing weak recommendations; then item similarities for the remaining list are calculated and new items are inserted to form the final recommendations. Thus, the collaborative recommendations are strengthened by taking item similarities into account. The developed hybrid system therefore combines collaborative and content-based approaches to produce more effective suggestions.
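As a rough sketch of the pipeline described above, the snippet below prunes weak recommendations from a collaborative base list and then inserts content-similar items. The Adsorption algorithm itself is not implemented; the base_list, item_features, threshold, and toy data are illustrative placeholders, not the thesis's actual parameters.

import numpy as np

def hybrid_recommend(base_list, item_features, score_threshold=0.5, top_k=10):
    # base_list: [(item_id, score)] from the collaborative step (e.g. Adsorption);
    # item_features: {item_id: np.ndarray} used for content-based similarity.
    strong = [(i, s) for i, s in base_list if s >= score_threshold]  # drop weak recommendations
    strong_ids = {i for i, _ in strong}
    recs = dict(strong)
    for item_id, score in strong:
        v = item_features[item_id]
        for other, u in item_features.items():
            if other in strong_ids:
                continue
            sim = float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-12))
            # New items enter the list weighted by similarity to a strong recommendation.
            recs[other] = max(recs.get(other, 0.0), score * sim)
    return sorted(recs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage with made-up item ids and 3-dimensional content features.
feats = {i: np.eye(3)[i % 3] for i in range(6)}
print(hybrid_recommend([(0, 0.9), (1, 0.4), (2, 0.7)], feats))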
37

Transforming Mission Space Models To Executable Simulation Models

Ozhan, Gurkan 01 September 2011 (has links) (PDF)
This thesis presents a two-step automatic transformation of Field Artillery Mission Space Conceptual Models (ACMs), first into High Level Architecture (HLA) Federation Architecture Models (FAMs) and then into executable distributed simulation code. The approach followed in the course of this thesis adheres to the Model-Driven Engineering (MDE) philosophy. Both ACMs and FAMs are formally defined conforming to their metamodels, ACMM and FAMM, respectively. ACMM comprises a behavioral component, based on Live Sequence Charts (LSCs), and a data component based on UML class diagrams. Using ACMM, the Adjustment Followed by Fire For Effect (AdjFFE) mission, which serves as the source model for the model transformation case study, is constructed. The ACM-to-FAM transformation, which is defined over metamodel-level graph patterns, is carried out with the Graph Rewriting and Transformation (GReAT) tool. Code generation from a FAM is accomplished by employing a model interpreter that produces Java/AspectJ code. The resulting code can then be executed on an HLA Run-Time Infrastructure (RTI). Bringing a fully fledged transformation approach to conceptual modeling is a distinguishing feature of this thesis. The thesis also aims to bring the chart notations to the attention of the mission space modeling community for describing military tasks, particularly their communication aspect. With the experience gained, a set of guidelines for a domain-independent transformer from any metamodel-based conceptual model to a FAM is offered.
38

Historical handwriting representation model dedicated to word spotting application / Modèle de représentation des écritures pour la recherche de mots par similarité dans les documents manuscrits du patrimoine

Wang, Peng 18 November 2014 (has links)
As more and more documents, especially historical handwritten documents, are digitized for long-term preservation, the demand for efficient information retrieval techniques over such document images is increasing. The objective of this research is to establish an effective representation model for handwriting, especially historical manuscripts, intended to support navigation in historical document collections. Specifically, we developed our handwriting representation model with the word spotting application in mind. As a specific pattern recognition task, handwritten word spotting faces many challenges, such as high intra-writer and inter-writer variability. It is now widely accepted that OCR techniques perform poorly on offline handwritten documents, especially historical ones; therefore, characterization and comparison methods dedicated to handwritten word spotting are strongly required. In this work, we explore several techniques that allow retrieval in single-style handwritten document images from a query image. The proposed representation model captures two facets of handwriting: morphology and topology. Based on the skeleton of the handwriting, graphs are constructed with the structural points as vertices and the strokes as edges. By assigning the Shape Context descriptor as the label of each vertex, contextual information about the handwriting is also integrated. Moreover, we develop a coarse-to-fine system for large-scale handwritten word spotting using our representation model. In the coarse selection, graph embedding is adopted for simple and fast computation. For the selected regions of interest, a specific similarity measure based on graph edit distance is designed for the fine selection. To account for the importance of the order of handwriting, dynamic time warping assignment with block merging is added. Experimental results on benchmark handwriting datasets demonstrate the power of the proposed representation model and the efficiency of the developed word spotting approach. The main contribution of this work is the proposed graph-based representation model, which provides a comprehensive description of handwriting, especially historical script. Our structure-based model captures the essential characteristics of handwriting without redundancy, while remaining robust to intra-writer variation and to specific noise. Additional experiments also show the potential of the proposed representation model in other symbol recognition applications, such as the classification of handwritten musical and architectural symbols.
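The fine-selection step relies on order-aware matching; for context, here is a standard dynamic time warping sketch. The block-merging extension and the graph edit distance measure developed in the thesis are not reproduced; the per-frame feature vectors (e.g., per-column descriptors of a word image) are assumed to be NumPy arrays.

import numpy as np

def dtw_distance(a, b, dist=lambda x, y: float(np.linalg.norm(x - y))):
    # a, b: sequences of feature vectors; returns the minimal cumulative alignment cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]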
39

A Lab System for Secret Sharing / Utveckling av laborationssystem för secret sharing

Olsson, Fredrik January 2004 (has links)
Finnegan Lab System is a graphical computer program for learning how secret sharing works. Because the system focuses on the algorithms and the data streams, the user does not have to consider machine-specific low-level details. It is highly modularised and is not restricted to secret sharing, but can easily be extended with new functions, such as building blocks for Feistel networks or signal processing. This thesis describes what secret sharing is, the development of a new lab system designed for secret sharing, and how it can be used.
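The abstract does not specify which schemes Finnegan implements; as one concrete example of the kind of algorithm such a lab system could demonstrate, here is a minimal Shamir secret-sharing sketch over a prime field (the modulus, share counts, and toy secret are illustrative assumptions; requires Python 3.8+ for the modular inverse via pow).

import random

P = 2**127 - 1  # a Mersenne prime used as the field modulus

def split(secret, n, k):
    # Split `secret` into n shares; any k of them suffice to reconstruct it.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over the prime field.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(123456789, n=5, k=3)
assert reconstruct(shares[:3]) == 123456789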
