1

Fast Algorithms for Large-Scale Phylogenetic Reconstruction

Truszkowski, Jakub (January 2013)
One of the most fundamental computational problems in biology is inferring the evolutionary histories of groups of species from sequence data. Such evolutionary histories, known as phylogenies, are usually represented as binary trees in which leaves represent extant species and internal nodes represent their shared ancestors. As the amount of sequence data available to biologists increases, very fast phylogenetic reconstruction algorithms are becoming necessary. Large sequence alignments can now contain hundreds of thousands of sequences, making traditional methods such as Neighbor Joining computationally prohibitive. To address this problem, we have developed three novel fast phylogenetic algorithms. The first, QTree, is a quartet-based heuristic that runs in O(n log n) time. It is based on a theoretical algorithm that reconstructs the correct tree with high probability, assuming every quartet is inferred correctly with constant probability. The core of the algorithm is a balanced search tree structure that enables us to locate an edge in the tree in O(log n) time. QTree is several times faster than all current methods, while its accuracy approaches that of Neighbor Joining. The second algorithm, LSHTree, is the first sub-quadratic time algorithm with theoretical performance guarantees under a Markov model of sequence evolution. It runs in O(n^{1+γ(g)} log^2 n) time, where g is an upper bound on the mutation rate along any branch in the phylogeny, γ is an increasing function, and γ(g) < 1 for all g. For phylogenies with very short branches, the running time is close to linear. In experiments, our prototype implementation was more accurate than current fast algorithms while being comparably fast. In the final part of this thesis, we apply the algorithmic framework behind LSHTree to the problem of placing large numbers of short sequence reads onto a fixed phylogenetic tree. Our initial results in this area are promising, but many challenges remain to be resolved.
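To make the placement idea concrete, here is a minimal Python sketch of quartet-guided leaf insertion: each new taxon is routed through the current tree by repeated oracle queries until an attachment edge is found, and that edge is subdivided. The step oracle below is a placeholder (a random walk, purely for demonstration); in QTree it would be answered by quartets inferred from sequence data, and a balanced search structure bounds each placement at O(log n) queries. This is an illustration of the general approach under those assumptions, not the thesis's implementation.

import random
from collections import defaultdict

class PhyloTree:
    """Unrooted tree; leaves are strings, internal nodes are ints."""
    def __init__(self, a, b):
        self.adj = defaultdict(list)
        self._next_internal = 0
        self._link(a, b)                      # start from a single edge

    def _link(self, u, v):
        self.adj[u].append(v)
        self.adj[v].append(u)

    def _split(self, u, v, leaf):
        """Subdivide edge (u, v) with a new node and hang `leaf` on it."""
        w = self._next_internal
        self._next_internal += 1
        self.adj[u].remove(v)
        self.adj[v].remove(u)
        self._link(u, w)
        self._link(v, w)
        self._link(leaf, w)

    def place(self, taxon, step_oracle):
        """Walk edge to edge until the oracle says 'attach here'."""
        u = next(n for n in self.adj if len(self.adj[n]) == 1)  # any leaf
        v = self.adj[u][0]
        while True:
            nxt = step_oracle(taxon, u, v)    # neighbour of v, or None
            if nxt is None or len(self.adj[v]) == 1:
                self._split(u, v, taxon)
                return
            u, v = v, nxt

def random_step(tree, rnd=random.Random(0)):
    # Stand-in for a quartet oracle: pick a direction (or stop) at random.
    def oracle(taxon, u, v):
        onward = [w for w in tree.adj[v] if w != u]
        return rnd.choice(onward + [None])
    return oracle

t = PhyloTree("A", "B")
for taxon in ["C", "D", "E", "F"]:
    t.place(taxon, random_step(t))
print({node: sorted(map(str, nbrs)) for node, nbrs in t.adj.items()})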
2

MiR-Drug Relationships: Mining and discovering bi-domain dense subclusters using greedy randomized algorithm

Shahdeo, Sandhya (20 April 2011)
No description available.
3

Interaction Testing, Fault Location, and Anonymous Attribute-Based Authorization

January 2019
This dissertation studies three classes of combinatorial arrays with practical applications in testing, measurement, and security. Covering arrays are widely studied in software and hardware testing to indicate the presence of faulty interactions. Locating arrays extend covering arrays to identify the interactions causing a fault by requiring additional conditions on how interactions are covered in rows. This dissertation introduces a new class, anonymizing arrays, which guarantee a degree of anonymity by bounding the probability that a particular row is identified by the interaction presented. Similarities among these arrays lead to common algorithmic techniques for their construction, which this dissertation explores. Differences arising from their application domains lead to the unique features of each class, requiring the techniques to be tailored to the specifics of each problem. One contribution of this work is a conditional expectation algorithm to build covering arrays via an intermediate combinatorial object. Conditional expectation efficiently finds intermediate-sized arrays that are particularly useful as ingredients for additional recursive algorithms. A cut-and-paste method creates large arrays from small ingredients. Performing transformations on the copies makes further improvements by reducing redundancy in the composed arrays, leading to fewer rows. This work contains the first algorithm for constructing locating arrays for general values of $d$ and $t$. A randomized computational search framework verifies whether a candidate array is $(\bar{d},t)$-locating by partitioning the search space, and performs random resampling when a candidate fails. Algorithmic parameters determine which columns to resample and when to add additional rows to the candidate array. Additionally, the performance of these parameters is analyzed to provide guidance on tuning them to prioritize speed, accuracy, or a combination of both. This work proposes anonymizing arrays as a class related to covering arrays with a higher coverage requirement and constraints. The algorithms for covering and locating arrays are tailored to anonymizing array construction. An additional property, homogeneity, is introduced to meet the needs of attribute-based authorization. Two metrics, local and global homogeneity, are designed to compare anonymizing arrays with the same parameters. Finally, a post-optimization approach reduces the homogeneity of an anonymizing array.
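As a point of reference for the coverage property these constructions target, the short Python sketch below checks, by brute force, whether an array is a covering array of strength t: every t-column projection must contain all v^t symbol combinations. It is only a definition checker under the standard formulation, not the dissertation's conditional expectation or cut-and-paste construction.

from itertools import combinations

def is_covering_array(rows, t, v):
    """Check strength-t coverage of `rows` (equal-length tuples over
    symbols 0..v-1): every t-column projection must contain all v^t
    symbol tuples in some row."""
    k = len(rows[0])
    for cols in combinations(range(k), t):
        seen = {tuple(r[c] for c in cols) for r in rows}
        if len(seen) < v ** t:        # some t-way interaction is missing
            return False
    return True

# Example: a classic CA(4; 2, 3, 2): 4 rows cover all pairs over
# 3 binary columns.
ca = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_covering_array(ca, t=2, v=2))   # True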
4

Study of FPGA Implementation of Entropy Norm Computation for IP Data Streams

Nagalakshmi, Subramanya (18 April 2008)
Recent literature has reported the use of entropy measurements for anomaly detection in IP data streams. Space-efficient randomized algorithms for estimating the entropy of data streams are available in the literature, but no hardware implementation of these algorithms exists. The main challenges for a software implementation on IP data streams are storing large volumes of data and the high speed at which the data must be analyzed. In this thesis, a recent randomized algorithm from the literature is analyzed for hardware implementation. Software/hardware simulations indicate that a large portion of the algorithm can be implemented on a low-cost Xilinx Virtex-II Pro FPGA, with trade-offs for real-time operation. The thesis reports on the feasibility of this algorithm's FPGA implementation and the corresponding trade-offs and limitations.
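For reference, the quantity such streaming algorithms approximate can be computed exactly (though not space-efficiently) with a full frequency table. The toy below is an assumption-laden sketch: the addresses and traffic mixes are invented purely to show why empirical entropy drops under a flooding anomaly, which is the signal entropy-based detectors exploit.

import math
from collections import Counter

def empirical_entropy(stream):
    """H = sum_i (m_i/m) * log2(m/m_i) over item frequencies m_i."""
    counts = Counter(stream)
    m = sum(counts.values())
    return sum((c / m) * math.log2(m / c) for c in counts.values())

# Toy IP-like streams (hypothetical addresses): entropy collapses when
# one source floods the link.
normal = ["10.0.0.%d" % (i % 50) for i in range(1000)]
attack = ["10.0.0.1"] * 950 + ["10.0.0.%d" % i for i in range(50)]
print(round(empirical_entropy(normal), 3))  # high: traffic spread out
print(round(empirical_entropy(attack), 3))  # low: traffic concentrated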
5

Computational Methods For Functional Motif Identification and Approximate Dimension Reduction in Genomic Data

Georgiev, Stoyan (January 2011)
Uncovering the DNA regulatory logic in complex organisms has been one of the important goals of modern biology in the post-genomic era. The sequencing of multiple genomes, in combination with the advent of DNA microarrays and, more recently, of massively parallel high-throughput sequencing technologies, has made possible a global perspective on inferring the regulatory rules governing the context-specific interpretation of the genetic code, complementing the more focused classical experimental approaches. Extracting useful information and managing the complexity resulting from the sheer volume and high dimensionality of the data produced by these genomic assays has emerged as a major challenge, which we address in this work by developing computational methods and tools specifically designed for the study of gene regulatory processes in this new global genomic context.

First, we focus on the genome-wide discovery of physical interactions between regulatory sequence regions and their cognate proteins at both the DNA and RNA level. We present a motif analysis framework that leverages genome-wide evidence for sequence-specific interactions between trans-acting factors and their preferred cis-acting regulatory regions. The utility of the proposed framework is demonstrated on DNA and RNA cross-linking high-throughput data.

A second goal of this thesis is the development of scalable approaches to dimension reduction based on spectral decomposition, and their application to the study of population structure in massive high-dimensional genetic data sets. We have developed computational tools and performed theoretical and empirical analyses of their statistical properties, with particular emphasis on the analysis of individual genetic variation measured by Single Nucleotide Polymorphism (SNP) microarrays.
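As a rough illustration of spectral dimension reduction at this scale, the sketch below applies the standard randomized SVD recipe (in the style of Halko, Martinsson, and Tropp) to a synthetic genotype matrix. It is not the thesis's tool; the two-population toy data and all parameter choices are assumptions made for demonstration.

import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    rng = np.random.default_rng(seed)
    # Sketch the column space of A with a random Gaussian test matrix.
    Y = A @ rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(Y)
    B = Q.T @ A                            # small projected problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

# Toy genotype matrix: two "populations" with different allele
# frequencies, rows = individuals, columns = SNPs coded 0/1/2.
rng = np.random.default_rng(1)
pop1 = rng.binomial(2, 0.2, size=(100, 500))
pop2 = rng.binomial(2, 0.7, size=(100, 500))
G = np.vstack([pop1, pop2]).astype(float)
G -= G.mean(axis=0)                        # center SNPs before PCA
U, s, Vt = randomized_svd(G, rank=2)
# The first component should separate the two populations:
print(U[:5, 0].round(2), U[-5:, 0].round(2))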
6

Design and Analysis of Algorithms on Random Intersection Graphs

Ραπτόπουλος, Χριστόφορος (16 May 2007)
In this Master's thesis we define and analyse two new models of random intersection graphs. A random intersection graph is produced by assigning to each vertex a random subset of a (finite) universe $M$ of $m$ elements and by drawing an edge between two vertices if and only if their corresponding subsets have some elements in common. By specifying the distribution of the subsets assigned to each vertex, we obtain various models of random intersection graphs. In the generalized random intersection graphs model, each element $i$ of $M$ is chosen independently with probability $p_i$. The uniform random intersection graphs model is the special case of the generalized model in which every element of $M$ is selected with the same probability $p$. As we will see, for some range of values of the parameters $m$ and $p$, the uniform model is equivalent in some sense to the model $G_{n, \hat{p}}$, i.e. the random graphs model in which each edge appears independently with probability $\hat{p}$. Finally, in the regular random intersection graphs model, the subset of $M$ assigned to each vertex has a fixed number of elements. Because of the dependence implied in the appearance of edges, these models are considered more realistic than classic random graphs in many applications, particularly in algorithmic questions about networks.

This thesis begins by presenting several important results from the literature concerning these models. We then study, for the first time, the existence of independent sets of size $k$ in the generalized random intersection graphs model, giving exact formulae for the mean and variance of their number. Additionally, we propose and analyse three randomized algorithms that run in small polynomial time (with respect to the number of vertices and the number of elements of $M$) for finding large independent sets of vertices.
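To make the uniform model concrete, here is a minimal generator: each of n vertices keeps each of m labels independently with probability p, and two vertices are adjacent if and only if their label sets intersect. The parameters and the naive pairwise test are illustrative choices, not code from the thesis.

import random
from itertools import combinations

def uniform_rig(n, m, p, seed=0):
    """Sample a uniform random intersection graph G(n, m, p)."""
    rnd = random.Random(seed)
    # Each vertex independently keeps each label with probability p.
    labels = [frozenset(i for i in range(m) if rnd.random() < p)
              for _ in range(n)]
    # Edge iff the two label sets share at least one element.
    edges = {(u, v) for u, v in combinations(range(n), 2)
             if labels[u] & labels[v]}
    return labels, edges

labels, edges = uniform_rig(n=20, m=30, p=0.1)
print(len(edges), "edges")
# Each edge appears with marginal probability 1 - (1 - p^2)^m, which is
# the \hat{p} in the comparison with $G_{n, \hat{p}}$ above; the edge
# events themselves are dependent through the shared labels.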
