81 |
A Combinatorial Algorithm for Minimizing the Maximum Laplacian Eigenvalue of Weighted Bipartite Graphs
Helmberg, Christoph, Rocha, Israel, Schwerdtfeger, Uwe, 13 November 2015 (has links) (PDF)
We give a strongly polynomial time combinatorial algorithm to minimise the largest eigenvalue of the weighted Laplacian of a bipartite graph. This is accomplished by solving the dual graph embedding problem which arises from a semidefinite programming formulation. In particular, the problem for trees can be solved in time cubic in the number of vertices.
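The quantity being minimized can be computed for any fixed weighting. Below is a minimal, stdlib-only sketch (our illustration, not the combinatorial algorithm of the thesis) that estimates the largest eigenvalue of the weighted Laplacian L = D - W of a small bipartite graph by power iteration:

```python
# Largest eigenvalue of the weighted Laplacian L = D - W by power iteration.
# Illustrative sketch only; the thesis gives a combinatorial algorithm,
# this merely evaluates the spectral quantity it optimizes.

def laplacian(n, weighted_edges):
    """Dense Laplacian of a weighted graph on vertices 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def largest_eigenvalue(L, iters=500):
    """Power iteration; L is symmetric positive semidefinite."""
    n = len(L)
    x = [1.0] + [0.0] * (n - 1)   # any start vector with a component
    lam = 0.0                     # along the top eigenvector works
    for _ in range(iters):
        y = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]
        # Rayleigh quotient x^T L x (x has unit length)
        lam = sum(x[i] * sum(L[i][j] * x[j] for j in range(n)) for i in range(n))
    return lam

# Star K_{1,3} (bipartite, unit weights): Laplacian spectrum {0, 1, 1, 4}.
star = [(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0)]
print(largest_eigenvalue(laplacian(4, star)))   # ≈ 4.0
```

For the star with unit weights the largest Laplacian eigenvalue is n + 1 = 4, which the iteration recovers quickly since the spectral gap is large.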
|
82 |
Codes, graphs and designs from maximal subgroups of alternating groups
Mumba, Nephtale Bvalamanja, January 2018 (has links)
Philosophiae Doctor - PhD (Mathematics) / The main theme of this thesis is the construction of linear codes from adjacency matrices or sub-matrices of adjacency matrices of regular graphs. We first examine the binary codes from the row span of biadjacency matrices and their transposes for some classes of bipartite graphs. In this case we consider a sub-matrix of an adjacency matrix of a graph as the generator of the code. We then shift our attention to uniform subset graphs by exploring the automorphism groups of graph covers and some classes of uniform subset graphs. In the sequel, we explore equal codes from adjacency matrices of non-isomorphic uniform subset graphs and finally consider codes generated by an adjacency matrix formed by adding adjacency matrices of two classes of uniform subset graphs.
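To make the construction concrete, here is a small hedged sketch (a toy of ours, not an example from the thesis) that takes the biadjacency matrix of the 6-cycle, treats its rows as generators over GF(2), and enumerates the resulting binary code:

```python
from itertools import product

# Binary code from the row span (over GF(2)) of a biadjacency matrix.
# Toy illustration; the thesis applies this machinery to graphs arising
# from maximal subgroups of alternating groups.

def gf2_rank(rows):
    """Rank over GF(2) of a list of 0/1 row vectors (Gaussian elimination)."""
    rows = [list(r) for r in rows]
    rank, n = 0, len(rows[0])
    for col in range(n):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def code_parameters(gen):
    """[n, k, d] of the code spanned by the rows of gen (small cases only)."""
    n, k = len(gen[0]), gf2_rank(gen)
    weights = set()
    for coeffs in product([0, 1], repeat=len(gen)):
        word = [0] * n
        for c, row in zip(coeffs, gen):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        if any(word):
            weights.add(sum(word))
    return n, k, min(weights)

# Biadjacency matrix of the 6-cycle C6
C6 = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(code_parameters(C6))   # (3, 2, 2): the even-weight code of length 3
```

The third row of C6 is the GF(2) sum of the first two, so the code has dimension 2 and every nonzero codeword has weight 2.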
|
83 |
Propagação em grafos bipartidos para extração de tópicos em fluxo de documentos textuais / Propagation in bipartite graphs for topic extraction in stream of textual data
Thiago de Paulo Faleiros, 08 June 2016 (has links)
Handling large amounts of data is a requirement for modern text mining algorithms. In some applications, documents are published constantly, which demands a high long-term storage cost. It is therefore necessary to devise easily adaptable methods that treat documents as a stream and analyze the data in a single pass, without requiring expensive storage. Another requirement is that such an approach be able to exploit heuristics in order to improve the quality of the results. Several models for the automatic extraction of latent information from a collection of documents have been proposed in the literature, among which probabilistic topic models are prominent. Probabilistic topic models achieve good practical results and have been extended into many variants that incorporate different types of information. However, properly describing these models, deriving them, and then obtaining appropriate inference algorithms are difficult tasks, requiring a rigorous mathematical treatment of the operations performed in the latent-dimension discovery process. Thus, to develop a simple and efficient method for the problem of latent-dimension discovery, a proper representation of the data is required. The hypothesis of this thesis is that, by using a bipartite graph representation of textual data, one can address the discovery of latent patterns in the relationships between objects, for example between documents and words, in a simple and intuitive way.
To validate this hypothesis, we developed a framework based on a label propagation algorithm over the bipartite graph representation. The framework, called PBG (Propagation in Bipartite Graph), was initially applied in the unsupervised setting, for a static collection of documents. A semi-supervised version was then proposed, which needs only a small number of labeled documents for the transductive classification task. Finally, it was applied in the dynamic setting, where a stream of textual documents is considered. Comparative analyses were performed, and the results indicate that PBG is a viable and competitive alternative for tasks in both the unsupervised and semi-supervised settings.
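The core idea of propagating label information back and forth across a document-word bipartite graph can be sketched in a few lines (illustrative only; the actual PBG update rules in the thesis differ):

```python
# Toy label propagation on a bipartite document-word graph.
# Edges connect documents to the words they contain; seed labels on a few
# documents propagate to words and back to unlabeled documents.
# The corpus and seeds below are invented for illustration.

docs = {"d0": ["w0", "w1"], "d1": ["w0", "w1"],   # topic-0 vocabulary
        "d2": ["w2", "w3"], "d3": ["w2", "w3"]}   # topic-1 vocabulary
seeds = {"d0": 0, "d2": 1}                        # two labeled documents
n_labels = 2

doc_score = {d: [0.0] * n_labels for d in docs}
for d, lab in seeds.items():
    doc_score[d][lab] = 1.0

words = {w for ws in docs.values() for w in ws}
word_docs = {w: [d for d in docs if w in docs[d]] for w in words}

for _ in range(10):
    # words average the scores of their documents
    word_score = {w: [sum(doc_score[d][k] for d in word_docs[w]) / len(word_docs[w])
                      for k in range(n_labels)] for w in words}
    # documents average the scores of their words; seeds stay clamped
    for d in docs:
        if d in seeds:
            continue
        doc_score[d] = [sum(word_score[w][k] for w in docs[d]) / len(docs[d])
                        for k in range(n_labels)]

labels = {d: max(range(n_labels), key=lambda k: doc_score[d][k]) for d in docs}
print(labels)   # d1 follows d0 (label 0), d3 follows d2 (label 1)
```

Because the two vocabularies do not overlap, label mass never crosses between the two components and the unlabeled documents inherit the label of their seed.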
|
84 |
Monophonic convexity in classes of graphs / Convexidade Monofônica em Classes de Grafos
Eurinardo Rodrigues Costa, 06 February 2015 (links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico / In this work, we study some parameters of monophonic convexity in certain classes of graphs and present our results on this subject. We prove that deciding whether the $m$-interval number is at most 2 and deciding whether the $m$-percolation time is at most 1 are NP-complete problems, even on bipartite graphs. We also prove that the $m$-convexity number is as hard to approximate as the maximum clique problem, which is $O(n^{1-\varepsilon})$-inapproximable in polynomial time, unless P = NP, for every $\varepsilon > 0$. Finally, we obtain polynomial-time algorithms to compute the $m$-convexity number on hereditary graph classes for which computing the clique number is polynomial-time solvable (e.g. perfect graphs and planar graphs).
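For intuition, the $m$-interval of a vertex pair collects every vertex lying on an induced (chordless) path between them. A brute-force sketch for tiny graphs (our own illustration, not an algorithm from the work):

```python
# Monophonic interval of a vertex pair by brute force: enumerate simple
# paths and keep the chordless (induced) ones. Exponential; toy sizes only.

def induced_paths(adj, u, v):
    """All chordless u-v paths in a graph given as an adjacency dict."""
    out = []
    def extend(path):
        last = path[-1]
        if last == v:
            out.append(list(path))
            return
        for nxt in adj[last]:
            # chordless: nxt may touch no earlier path vertex except `last`
            if nxt not in path and all(p not in adj[nxt] for p in path[:-1]):
                path.append(nxt)
                extend(path)
                path.pop()
    extend([u])
    return out

def m_interval(adj, u, v):
    return sorted({x for p in induced_paths(adj, u, v) for x in p})

# 4-cycle 0-1-2-3-0: both 0-1-2 and 0-3-2 are chordless,
# so the m-interval of {0, 2} is the whole vertex set.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(m_interval(C4, 0, 2))   # [0, 1, 2, 3]
```

The NP-hardness results above concern exactly these objects at scale; the sketch is only meant to make the definitions tangible.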
|
85 |
De grafos a emparelhamentos: uma possibilidade viável de encantar-se com a matemática
Ferreira, Verônica Craveiro de Santana, 10 April 2014 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / This thesis aims to show that graph theory, especially matchings, can be studied gradually in high school, and that implementing this theory in the classroom can foster students' interest in mathematics. In this way, the work seeks to demystify the idea that mathematics ends with the high school curriculum, bringing students closer to theories recently developed in academia. Graph theory is considered an efficient tool for solving problems in various areas. There are numerous situations that can be modeled by graphs, which makes it possible to develop a range of skills, and this is why the theory is so appealing to anyone who comes into contact with it. For the development of this thesis, we began our study by addressing basic concepts of graph theory useful for understanding this work, then presented some problems that can be worked on in high school, and finalized with a specific topic of this theory, matchings, which has many applications that can be contextualized and modeled as practical problems of everyday life.
|
86 |
Inspection automatisée d'assemblages mécaniques aéronautiques par vision artificielle : une approche exploitant le modèle CAO / Automated inspection of mechanical parts by computer vision: an approach based on the CAD model
Viana do Espírito Santo, Ilísio, 12 December 2016 (has links)
The work presented in this manuscript deals with the automated inspection of aeronautical mechanical assemblies using computer vision. The goal is to decide whether a mechanical assembly has been assembled correctly, i.e. whether it is compliant with the specifications. This work was conducted within two industrial projects: on the one hand the CAAMVis project, in which the inspection sensor consists of a dual stereoscopic head (stereovision) carried by a robot, and on the other hand the Lynx© project, in which the inspection sensor is a single Pan/Tilt/Zoom camera (monocular vision). These two projects share the common objective of exploiting as much as possible the CAD model of the assembly (which provides the desired reference state) in the inspection task, which is based on the analysis of the 2D images provided by the sensor. The proposed method consists in comparing a 2D image acquired by the sensor (referred to as the "real image") with a synthetic 2D image generated from the CAD model. The real and synthetic images are segmented and then decomposed into a set of 2D primitives. These primitives are then matched by exploiting concepts from graph theory, namely the use of a bipartite graph to guarantee the uniqueness constraint required in such a matching process. The matching result makes it possible to decide whether the assembly has been assembled correctly or not. The proposed approach was validated on both simulation data and real data acquired within the above-mentioned projects.
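The uniqueness constraint in the primitive-matching step is exactly what a matching in a bipartite graph enforces. A generic augmenting-path sketch (our illustration, independent of the actual CAAMVis/Lynx© implementation; the compatibility graph is made up):

```python
# Maximum bipartite matching via augmenting paths (Kuhn's algorithm).
# Left vertices: primitives extracted from the real image; right vertices:
# primitives from the synthetic image; edges: compatible pairs.

def max_bipartite_matching(adj, n_right):
    match_right = [None] * n_right            # right vertex -> left vertex

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] is None or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_right

# real primitives 0, 1, 2 vs synthetic primitives 0, 1
compat = {0: [0, 1], 1: [0], 2: [1]}
size, match = max_bipartite_matching(compat, 2)
print(size)   # 2 — e.g. real 1 ↔ synth 0 and real 0 ↔ synth 1
```

Each synthetic primitive ends up matched to at most one real primitive, which is the uniqueness guarantee referred to above.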
|
87 |
Generalizations Of The Popular Matching Problem
Nasre, Meghana, 08 1900 (has links) (PDF)
Matching problems arise in several real-world scenarios like assigning posts to applicants, houses to trainees and room-mates to one another. In this thesis we consider the bipartite matching problem where one side of the bipartition specifies preferences over the other side. That is, we
are given a bipartite graph G = (A ∪ P,E) where A denotes the set of applicants, P denotes the set of posts, and the preferences of applicants are specified by ranks on the edges. Several notions of optimality like Pareto-optimality, rank-maximality, and popularity have been studied in the literature; we focus on the notion of popularity. A matching M is more popular than another matching M′ if the number of applicants that prefer M to M′ exceeds the number of applicants that prefer M′ to M. A matching M is said to be popular if there exists no matching that is more popular than M. Popular matchings have the desirable property that no applicant majority can force a migration to another matching. However, popular matchings do not provide a complete answer since there exist simple instances that do not admit any
popular matching. Abraham et al. (SICOMP 2007) characterized instances that admit a
popular matching and also gave efficient algorithms to find one when it exists. We present several generalizations of the popular matchings problem in this thesis.
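The head-to-head vote between two matchings can be sketched directly (toy instance of ours, not from the thesis):

```python
# Pairwise popularity: count applicants preferring M1 to M2 and vice versa.
# prefs[a] lists a's posts from best to worst; being unmatched is worst.

def votes(prefs, m1, m2):
    """Return (#applicants preferring m1, #applicants preferring m2)."""
    for_m1 = for_m2 = 0
    for a, plist in prefs.items():
        rank = lambda m: plist.index(m[a]) if a in m else len(plist)
        r1, r2 = rank(m1), rank(m2)
        for_m1 += r1 < r2
        for_m2 += r2 < r1
    return for_m1, for_m2

# Classic instance with no popular matching: three applicants, two posts,
# identical preference lists p1 > p2.
prefs = {"a1": ["p1", "p2"], "a2": ["p1", "p2"], "a3": ["p1", "p2"]}
M1 = {"a1": "p1", "a2": "p2"}
M2 = {"a2": "p1", "a3": "p2"}
print(votes(prefs, M1, M2))   # (1, 2): M2 is more popular than M1
```

Repeating the comparison around the cycle of such matchings shows that each one is beaten by another, which is why this instance admits no popular matching.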
The majority of our work deals with instances that do not admit any popular matching. We
propose three different solution concepts for such instances. A reasonable solution when an instance does not admit a popular matching is to output a matching that is least unpopular amongst the set of unpopular matchings. McCutchen (LATIN 2008) introduced and studied measures of unpopularity, namely the unpopularity factor and unpopularity margin. He proved that computing either a least unpopularity factor matching or a least unpopularity margin matching is NP-hard. We build upon this work and design an O(km√n) time algorithm which produces matchings with bounded unpopularity provided a certain subgraph of G admits an A-complete matching (a matching that matches all the applicants). Here n and m denote the number of vertices and the number of edges in G respectively, and k, which is bounded by |A|,
is the number of iterations taken by our algorithm to terminate. We also show that if a certain subgraph of G admits an A-complete matching, then we have computed a matching with the least unpopularity factor.
Another feasible solution for instances without any popular matching is to output a mixed matching that is popular. A mixed matching is simply a probability distribution over the set of matchings. A mixed matching Q is popular if no mixed matching is more popular than Q. We
seek to answer the existence and computation of popular mixed matchings in a given instance G. We begin with a linear programming formulation to compute a mixed matching with the least unpopularity margin. We show that although the linear program has exponentially many constraints, we have a polynomial time separation oracle and hence a least unpopularity margin mixed matching can be computed in polynomial time. By casting the popular mixed matchings problem as a two player zero-sum game, it is possible to prove that every instance of the popular matchings problem admits a popular mixed matching. Therefore, the matching returned by our linear program is indeed a popular mixed matching.
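On toy instances the least-unpopularity-margin matching can be found by exhaustive search, which also demonstrates that a classic three-applicant, two-post instance admits no popular matching (margin 0 is never achieved). A hedged stdlib sketch, not the polynomial-time method of the thesis:

```python
from itertools import product

# Brute-force search for a matching with the least unpopularity margin.
# margin(M) = max over matchings M' of (votes for M') - (votes for M).
# A matching is popular iff its margin is 0. Toy sizes only.

prefs = {"a1": ["p1", "p2"], "a2": ["p1", "p2"], "a3": ["p1", "p2"]}
apps = sorted(prefs)

def all_matchings():
    for choice in product(*[prefs[a] + [None] for a in apps]):
        posts = [p for p in choice if p is not None]
        if len(posts) == len(set(posts)):               # each post used once
            yield {a: p for a, p in zip(apps, choice) if p is not None}

def delta(m_new, m_old):
    """(#applicants preferring m_new) - (#applicants preferring m_old)."""
    d = 0
    for a, plist in prefs.items():
        rank = lambda m: plist.index(m[a]) if a in m else len(plist)
        d += (rank(m_new) < rank(m_old)) - (rank(m_old) < rank(m_new))
    return d

matchings = list(all_matchings())
margin = lambda m: max(delta(m2, m) for m2 in matchings)
best = min(margin(m) for m in matchings)
print(best)   # 1 — no matching has margin 0, so no popular matching exists
```

The exhaustive maximum over counter-matchings is exactly the unpopularity margin; the thesis' LP with a separation oracle replaces this exponential enumeration.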
Finally, we propose augmentation of the input graph for instances that do not admit any popular matching. Assume that we are dealing with a set of items B (say, DVDs/books) instead of posts and that it is possible to make duplicates of these items. Our goal is to duplicate appropriate items so that the augmented graph admits a popular matching. However, since allowing arbitrarily many copies of items is not feasible in practice, we impose restrictions in two forms: (i) associating costs with items, and (ii) bounding the number of copies. In the first case, we assume that we pay a price of cost(b) for every extra copy of b that we make; the first
copy is assumed to be given to us for free. The total cost of the augmented instance is the sum of the costs of all the extra copies that we make. Our goal is to compute a minimum cost augmented instance which admits a popular matching. In the second case, along with the input graph G = (A ∪ B,E), we are given a vector ⟨c1, c2, . . . , c|B|⟩ denoting upper bounds on the number of copies of every item. We seek to answer whether there exists a vector ⟨x1, x2, . . . , x|B|⟩ such that having xi copies of item bi, where 1 ≤ xi ≤ ci, enables the augmented graph to admit a popular matching. We prove that several problems under both these models turn out to be NP-hard; in fact they remain NP-hard even under severe restrictions on the preference lists.
Our final results deal with instances that admit popular matchings. When the input instance admits a popular matching, there may be several popular matchings – in fact there may be several maximum cardinality popular matchings. Hence one may not be content with returning any maximum cardinality popular matching and instead ask for an optimal popular matching. Assuming that the notion of optimality is specified as a part of the problem, we present an
O(m + n1²) time algorithm for computing an optimal popular matching in G. Here m denotes
the number of edges in G and n1 denotes the number of applicants. We also consider the
problem of computing a minimum cost popular matching where with every post p, a price
cost(p) and a capacity cap(p) are associated. A post with capacity cap(p) can be matched with up to cap(p) many applicants. We present an O(mn1) time algorithm to compute a minimum cost popular matching in such instances.
We believe that the work provides interesting insights into the popular matchings problem
and its variants.
|
88 |
Learning with Complex Performance Measures: Theory, Algorithms and Applications
Narasimhan, Harikrishna, January 2016 (has links) (PDF)
We consider supervised learning problems, where one is given objects with labels, and the goal is to learn a model that can make accurate predictions on new objects. These problems abound in applications, ranging from medical diagnosis to information retrieval to computer vision. Examples include binary or multiclass classification, where the goal is to learn a model that can classify objects into two or more categories (e.g. categorizing emails into spam or non-spam); bipartite ranking, where the goal is to learn a model that can rank relevant objects above the irrelevant ones (e.g. ranking documents by relevance to a query); class probability estimation (CPE), where the goal is to predict the probability of an object belonging to different categories (e.g. probability of an internet ad being clicked by a user). In each case, the accuracy of a model is evaluated in terms of a specified `performance measure'.
While there has been much work on designing and analyzing algorithms for different supervised learning tasks, we have complete understanding only for settings where the performance measure of interest is the standard 0-1 or a loss-based classification measure. These performance measures have a simple additive structure, and can be expressed as an expectation of errors on individual examples. However, in many real-world applications, the performance measure used to evaluate a model is often more complex, and does not decompose into a sum or expectation of point-wise errors. These include the binary or multiclass G-mean used in class-imbalanced classification problems; the F1-measure and its multiclass variants popular in text retrieval; and the (partial) area under the ROC curve (AUC) and precision@ employed in ranking applications. How does one design efficient learning algorithms for such complex performance measures, and can these algorithms be shown to be statistically consistent, i.e. shown to converge in the limit of infinite data to the optimal model for the given measure? How does one develop efficient learning algorithms for complex measures in online/streaming settings where the training examples need to be processed one at a time? These are questions that we seek to address in this thesis. Firstly, we consider the bipartite ranking problem with the AUC and partial AUC performance measures. We start by understanding how bipartite ranking with AUC is related to the
standard 0-1 binary classification and CPE tasks. It is known that a good binary CPE model can be used to obtain both a good binary classification model and a good bipartite ranking model (formally, in terms of regret transfer bounds), and that a binary classification model does not necessarily yield a CPE model. However, not much is known about other directions. We show that in a weaker sense (where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution), a good bipartite ranking model for AUC can indeed be used to construct a good binary classification model, and also a good binary CPE model. Next, motivated by the increasing number of applications (e.g. biometrics, medical diagnosis, etc.), where performance is measured, not in terms of the full AUC, but in terms of the partial AUC between two false positive rates (FPRs), we design batch algorithms for optimizing partial AUC in any given FPR range. Our algorithms optimize structural support vector machine based surrogates, which, unlike for the full AUC, do not admit a straightforward decomposition into simpler terms. We develop polynomial time cutting plane solvers for solving the optimization, and provide experiments to demonstrate the efficacy of our methods. We also present an application of our approach to predicting chemotherapy outcomes for cancer patients, with the aim of improving treatment decisions.
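A common discrete estimate of the partial AUC over an FPR range [0, k/n] averages pairwise comparisons against only the k highest-scoring negatives. A small sketch of that estimate (our illustration, not the structural-SVM method of the thesis; ties are ignored):

```python
# Discrete partial AUC over the FPR range [0, k/n]: compare each positive
# against only the k top-scoring ("hardest") negatives.

def partial_auc_top(pos_scores, neg_scores, k):
    hardest = sorted(neg_scores, reverse=True)[:k]
    pairs = sum(p > s for p in pos_scores for s in hardest)
    return pairs / (len(pos_scores) * k)

pos = [0.9, 0.3]
neg = [0.7, 0.2]
print(partial_auc_top(pos, neg, k=1))   # 0.5: only 0.9 beats the hardest negative
print(partial_auc_top(pos, neg, k=2))   # 0.75: the full AUC in this case
```

Restricting the comparison to the hardest negatives is what makes the measure non-decomposable: whether a pair counts depends on the ranks of all the other negatives.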
Secondly, we develop algorithms for optimizing (surrogates for) complex performance measures in the presence of streaming data. A well-known method for solving this problem for standard point-wise surrogates such as the hinge surrogate, is the stochastic gradient descent (SGD) method, which performs point-wise updates using unbiased gradient estimates. However, this method cannot be applied to complex objectives, as here one can no longer obtain unbiased gradient estimates from a single point. We develop a general stochastic method for optimizing complex measures that avoids point-wise updates, and instead performs gradient updates on mini-batches of incoming points. The method is shown to provably converge for any performance measure that satisfies a uniform convergence requirement, such as the partial AUC, precision@ and F1-measure, and in experiments, is often several orders of magnitude faster than the state-of-the-art batch methods, while achieving similar or better accuracies. Moreover, for specific complex binary classification measures, which are concave functions of the true positive rate (TPR) and true negative rate (TNR), we are able to develop stochastic (primal-dual) methods that can indeed be implemented with point-wise updates, by using an adaptive linearization scheme. These methods admit convergence rates that match the rate of the SGD method, and are again several times faster than the state-of-the-art methods.
Finally, we look at the design of consistent algorithms for complex binary and multiclass measures. For binary measures, we consider the practically popular plug-in algorithm that constructs a classifier by applying an empirical threshold to a suitable class probability estimate,
and provide a general methodology for proving consistency of these methods. We apply this technique to show consistency for the F1-measure, and under a continuity assumption on the distribution, for any performance measure that is monotonic in the TPR and TNR. For the case of multiclass measures, a simple plug-in method is no longer tractable, as in the place of a single threshold parameter, one needs to tune at least as many parameters as the number of classes. Using an optimization viewpoint, we provide a framework for designing learning algorithms for multiclass measures that are general functions of the confusion matrix, and as an instantiation, provide an efficient and provably consistent algorithm based on the bisection method for multiclass measures that are ratio-of-linear functions of the confusion matrix (e.g. micro F1). The algorithm outperforms the state-of-the-art SVMPerf method in terms of both accuracy and running time.
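The binary plug-in scheme can be illustrated in a few lines: take class-probability estimates (given here), then pick the threshold that maximizes the target measure on a sample. This is a sketch over invented toy data, not the thesis' tuning procedure:

```python
# Plug-in classifier for F1: threshold class-probability estimates at the
# empirically best cut-off instead of the default 0.5.

def f1(labels, preds):
    tp = sum(l and p for l, p in zip(labels, preds))
    fp = sum((not l) and p for l, p in zip(labels, preds))
    fn = sum(l and (not p) for l, p in zip(labels, preds))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(labels, probs):
    """Sweep candidate thresholds at the observed probability values."""
    cands = sorted(set(probs))
    scored = [(f1(labels, [p >= t for p in probs]), t) for t in cands]
    return max(scored)            # (best F1, threshold achieving it)

labels = [1, 1, 1, 0, 0, 0]
probs  = [0.9, 0.8, 0.45, 0.4, 0.2, 0.1]
score, t = best_threshold(labels, probs)
print(score, t)   # threshold 0.45 separates the classes: F1 = 1.0
```

Consistency analysis of exactly this kind of empirically tuned threshold is what the methodology above addresses.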
Overall, in this thesis, we have looked at various aspects of complex performance measures used in supervised learning problems, leading to several new algorithms that are often significantly better than the state-of-the-art, to improved theoretical understanding of the performance measures studied, and to novel real-life applications of the algorithms developed.
|
89 |
Rotulação de símbolos matemáticos manuscritos via casamento de expressões / Labeling of Handwritten Mathematical Symbols via Expression Matching
Willian Yukio Honda, 23 January 2013 (has links)
The problem of recognizing handwritten mathematical expressions includes three important subproblems: symbol segmentation, symbol recognition, and structural analysis of expressions. In order to evaluate recognition methods and techniques, they should be tested on sample sets representative of the application domain. One concern that has been raised repeatedly in recent years is the near non-existence of public, representative datasets of mathematical expressions, which makes the development and comparison of distinct approaches difficult. In general, recognition results reported in the literature are restricted to small datasets, not publicly available, and often consisting of data aimed at evaluating only some specific aspects of recognition. In the case of online expressions, to train and test symbol recognizers, samples are in general obtained by asking users to write a series of symbols individually and repeatedly. Such a task is monotonous and tiring. An alternative approach for obtaining symbol samples would be to ask users to transcribe previously defined model expressions. By doing so, writing would be more natural and less monotonous, and several symbol samples could be obtained from a single transcription. To avoid the task of manually labeling the symbols of the transcribed expressions, this work proposes a method for handwritten expression matching, in which the symbols of a transcribed expression are assigned to the corresponding (previously identified) symbols of the model expression. The proposed method is based on a formulation that reduces the matching problem to a linear assignment problem, where costs are defined in terms of symbol features and expression structure. Experimental results using the proposed method show that a mean correct-assignment rate above 99% is achieved.
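The linear assignment problem at the heart of the method can be solved exactly for tiny cost matrices by enumeration. A toy sketch with made-up costs (real systems would use the Hungarian algorithm):

```python
from itertools import permutations

# Minimum-cost assignment of transcribed symbols (rows) to model symbols
# (columns) by brute force. The costs are arbitrary illustrative numbers,
# standing in for feature- and structure-based dissimilarities.

def min_cost_assignment(cost):
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return best, sum(cost[i][best[i]] for i in range(n))

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = min_cost_assignment(cost)
print(assignment, total)   # (1, 0, 2) with total cost 5
```

Note that the cheapest per-row choice (row 1, column 1 at cost 0) is not part of the optimum; the assignment constraint couples the rows, which is exactly why a dedicated algorithm is needed at scale.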
|
90 |
Metody pro komparativní analýzu metagenomických dat / Methods for Comparative Analysis of Metagenomic Data
Sedlář, Karel, January 2018 (has links)
Modern research in environmental microbiology uses genomic data, above all DNA sequencing, to describe microbial communities. The field that studies all the genetic material present in an environmental sample is called metagenomics. This doctoral thesis addresses metagenomics from the perspective of bioinformatics, which is indispensable for the computational processing of the data. The theoretical part of the thesis describes the two basic approaches of metagenomics, including their main principles and weaknesses. The first approach, based on targeted sequencing, is a well-developed area with a wide range of bioinformatics techniques. Nevertheless, the methods for comparing samples from several environments can be substantially improved. The approach introduced in this thesis uses a unique transformation of the data into a bipartite graph, in which one partition is formed by taxa and the other by samples, or by different environments. Such a graph fully reflects both the qualitative and the quantitative composition of the analyzed microbial network. It allows a massive reduction of the data for simple visualizations without negative effects on automatic community detection, which can reveal clusters of similar samples and their typical microbes. The second approach uses whole-metagenome sequencing. This strategy is newer, and the corresponding bioinformatics tools are less mature. The main challenge remains the fast classification of sequences, referred to in metagenomics as binning. The method introduced in this thesis builds on genomic signal processing. This unique methodology was designed on the basis of a detailed analysis of the redundancy of the genetic information stored in genomic signals. It transforms character sequences into several variants of phase signals. Moreover, it allows direct processing of nanopore sequencing data in the form of native current signals.
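The taxa-sample transformation can be sketched as follows: build a bipartite graph from an abundance table, prune low-abundance edges as a data-reduction step, and let groups of similar samples emerge as connected components. This is a simplified stand-in, with invented counts, for the community detection used in the thesis:

```python
from collections import defaultdict

# Bipartite taxa-sample graph from an abundance table; edges below a
# threshold are pruned, then sample groups emerge as connected components.
# The table is invented for illustration.

abundance = {  # sample -> {taxon: count}
    "s1": {"tA": 40, "tB": 35, "tC": 1},
    "s2": {"tA": 50, "tB": 20},
    "s3": {"tC": 60, "tD": 30},
    "s4": {"tD": 45, "tC": 2},
}

def sample_groups(table, min_count):
    graph = defaultdict(set)
    for sample, taxa in table.items():
        for taxon, count in taxa.items():
            if count >= min_count:            # data reduction step
                graph[sample].add(taxon)
                graph[taxon].add(sample)
    seen, groups = set(), []
    for sample in table:                      # DFS over the bipartite graph
        if sample in seen:
            continue
        stack, comp = [sample], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(graph[node])
        groups.append(sorted(n for n in comp if n in table))
    return groups

print(sample_groups(abundance, min_count=10))   # [['s1', 's2'], ['s3', 's4']]
```

Pruning the two trace occurrences of tC disconnects the two environments, so the components recover the expected grouping; real community detection works on the weighted graph rather than a hard threshold.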
|