Global ETD Search

71	Game theoretic and machine learning techniques for balancing games Long, Jeffrey Richard 29 August 2006 (has links) Game balance is the problem of determining the fairness of actions or sets of actions in competitive, multiplayer games. This problem primarily arises in the context of designing board and video games. Traditionally, balance has been achieved through large amounts of play-testing and trial-and-error on the part of the designers. In this thesis, it is our intent to lay down the beginnings of a framework for a formal and analytical solution to this problem, combining techniques from game theory and machine learning. We first develop a set of game-theoretic definitions for different forms of balance, and then introduce the concept of a strategic abstraction. We show how machine classification techniques can be used to identify high-level player strategy in games, using the two principal methods of sequence alignment and Naive Bayes classification. Bioinformatics sequence alignment, when combined with a 3-nearest neighbor classification approach, can, with only 3 exemplars of each strategy, correctly identify the strategy used in 55\% of cases using all data, and 77\% of cases on data that experts indicated actually had a strategic class. Naive Bayes classification achieves similar results, with 65\% accuracy on all data and 75\% accuracy on data rated to have an actual class. We then show how these game theoretic and machine learning techniques can be combined to automatically build matrices that can be used to analyze game balance properties. games game balance machine learning sequence alignment game theory naive bayes
72	Scrambling analysis of ciliates Liu, Jing 10 September 2009 (has links) Ciliates are a class of organisms which undergo a genetic process called gene descrambling after mating. In order to better understand the problem, a literature review of past works has been presented in this thesis. This includes a brief summary of both the relevant biology and bioinformatics literature. Then, a formal definition of scrambling systems is developed which attempts to model the problem of sequence alignment between scrambled and descrambled genes. With this system, sequences can be classified into relevant functional segments. It also provides a framework whereby we can compare various ciliate sequence alignment algorithms. After that, a new method of predicting the various functional segments is studied. This method shows better coverage, and usually a better labelling score with certain parameters. Then we discuss several recent hypotheses as to how ciliates naturally descramble genes. An algorithm suite is developed to test these hypotheses. With the tests, we are able to computationally check which factors are potentially the most important. According to the current results with 247 pointer sequences of 13 micronuclear genes, examining repeats which are the same distance together with either the sequence or the size, as the real pointers, is almost always enough information to guide descrambling. Indeed, the real pointer sequence is the unique repeat 92.7% and 94.3% of the time within the 247 pointers, from the left and right respectively, using only the pointer distance and the pointer sequence information. theoretical computer science ciliate scrambling system scrambling analysis sequence alignment bioinformatics
73	Speeding Up the Convergence of Online Heuristic Search and Scaling Up Offline Heuristic Search Furcy, David Andre 25 November 2004 (has links) The most popular methods for solving the shortest-path problem in Artificial Intelligence are heuristic search algorithms. The main contributions of this research are new heuristic search algorithms that are either faster or scale up to larger problems than existing algorithms. Our contributions apply to both online and offline tasks. For online tasks, existing real-time heuristic search algorithms learn better informed heuristic values and in some cases eventually converge to a shortest path by repeatedly executing the action leading to a successor state with a minimum cost-to-goal estimate. In contrast, we claim that real-time heuristic search converges faster to a shortest path when it always selects an action leading to a state with a minimum f-value, where the f-value of a state is an estimate of the cost of a shortest path from start to goal via the state, just like in the offline A* search algorithm. We support this claim by implementing this new non-trivial action-selection rule in FALCONS and by showing empirically that FALCONS significantly reduces the number of actions to convergence of a state-of-the-art real-time search algorithm. For offline tasks, we improve on two existing ways of scaling up best-first search to larger problems. First, it is known that the WA* algorithm (a greedy variant of A) solves larger problems when it is either diversified (i.e., when it performs expansions in parallel) or committed (i.e., when it chooses the state to expand next among a fixed-size subset of the set of generated but unexpanded states). We claim that WA solves even larger problems when it is enhanced with both diversity and commitment. We support this claim with our MSC-KWA* algorithm. Second, it is known that breadth-first search solves larger problems when it prunes unpromising states, resulting in the beam search algorithm. We claim that beam search quickly solves even larger problems when it is enhanced with backtracking based on limited discrepancy search. We support this claim with our BULB algorithm. We show that both MSC-KWA* and BULB scale up to larger problems than several state-of-the-art offline search algorithms in three standard benchmark domains. Finally, we present an anytime variant of BULB and apply it to the multiple sequence alignment problem in biology. Anytime search algorithms Beam search Real-time search Heuristic search Multiple sequence alignment
74	PSSMs : not just roadkill on the information superhighway / Ng, Pauline Crystal. January 2002 (has links) Thesis (Ph. D.)--University of Washington, 2002. / Vita. Includes bibliographical references (leaves 93-101).
75	Structural and functional studies on heat shock protein Hsp40-Hdj1 and Golgi ER trafficking protein Get3 Hu, Junbin. January 2009 (has links) (PDF) Thesis (Ph.D.)--University of Alabama at Birmingham, 2009. / Title from PDF title page (viewed on Feb. 2, 2010). Includes bibliographical references.
76	Improving secondary structure prediction with covariation analysis and structure-based alignment system of RNA sequences Shang, Lei, active 2013 10 February 2014 (has links) RNA molecules form complex higher-order structures which are essential to perform their biological activities. The accurate prediction of an RNA secondary structure and other higher-order structural constraints will significantly enhance the understanding of RNA molecules and help interpret their functions. Covariation analysis is the predominant computational method to accurately predict the base pairs in the secondary structure of RNAs. I developed a novel and powerful covariation method, Phylogenetic Events Count (PEC) method, to determine the positional covariation. The application of the PEC method onto a bacterial 16S rRNA sequence alignment proves that it is more sensitive and accurate than other mutual information based method in the identification of base-pairs and other structural constraints of the RNA structure. The analysis also discoveries a new type of structural constraint – neighbor effect, between sets of nucleotides that are in proximity in the three dimensional RNA structure with weaker but significant covariation with one another. Utilizing these covariation methods, a proposed secondary structure model of an entire HIV-1 genome RNA is evaluated. The results reveal that vast majority of the predicted base pairs in the proposed HIV-1 secondary structure model do not have covariation, thus lack the support from comparative analysis. Generating the most accurate multiple sequence alignment is fundamental and essential of performing high-quality comparative analysis. The rapid determination of nucleic acid sequences dramatically increases the number of available sequences. Thus developing the accurate and rapid alignment program for these RNA sequences has become a vital and challenging task to decipher the maximum amount of information from the data. A template-based RNA sequence alignment system, CRWAlign-2, is developed to accurately align new sequences to an existing reference sequence alignment based on primary and secondary structural similarity. A comparison of CRWAlign-2 with eight alternative widely-used alignment programs reveals that CRWAlign-2 outperforms other programs in aligning new sequences with higher accuracy. In addition to aligning sequences accurately, CRWAlign-2 also creates secondary structure models for each sequence to be aligned, which provides very useful information for the comparative analysis of RNA sequences and structures. The CRWAlign-2 program also provides opportunities for multiple areas including the identification of chimeric 16S rRNA sequences generated in microbiome sequencing projects. / text PEC Comparative analysis Covariation analysis Secondary structure prediction Sequence alignment Structure-based alignment CRWAlign2 HIV
77	Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction Mir arabbaygi, Siavash 18 September 2015 (has links) The amount of biological sequence data is increasing rapidly, a promising development that would transform biology if we can develop methods that can analyze large-scale data efficiently and accurately. A fundamental question in evolutionary biology is building the tree of life: a reconstruction of relationships between organisms in evolutionary time. Reconstructing phylogenetic trees from molecular data is an optimization problem that involves many steps. In this dissertation, we argue that to answer long-standing phylogenetic questions with large-scale data, several challenges need to be addressed in various steps of the pipeline. One challenges is aligning large number of sequences so that evolutionarily related positions in all sequences are put in the same column. Constructing alignments is necessary for phylogenetic reconstruction, but also for many other types of evolutionary analyses. In response to this challenge, we introduce PASTA, a scalable and accurate algorithm that can align datasets with up to a million sequences. A second challenge is related to the interesting fact that various parts of the genome can have different evolutionary histories. Reconstructing a species tree from genome-scale data needs to account for these differences. A main approach for species tree reconstruction is to first reconstruct a set of ``gene trees'' from different parts of the genome, and to then summarize these gene trees into a single species tree. We argue that this approach can suffer from two challenges: reconstruction of individual gene trees from limited data can be plagued by estimation error, which translates to errors in the species tree, and also, methods that summarize gene trees are not scalable or accurate enough under some conditions. To address the first challenge, we introduce statistical binning, a method that re-estimates gene trees by grouping them into bins. We show that binning improves gene tree accuracy, and consequently the species tree accuracy. To address the second challenge, we introduce ASTRAL, a new summary method that can run on a thousand genes and a thousand species in a day and has outstanding accuracy. We show that the development of these methods has enabled biological analyses that were otherwise not possible. Phylogenetics Multi-species coalescent model Multiple sequence alignment Statistical binning Species tree estimation
78	Fast and accurate estimation of large-scale phylogenetic alignments and trees Liu, Kevin Jensen 06 July 2011 (has links) Phylogenetics is the study of evolutionary relationships. Phylogenetic trees and alignments play important roles in a wide range of biological research, including reconstruction of the Tree of Life - the evolutionary history of all organisms on Earth - and the development of vaccines and antibiotics. Today's phylogenetic studies seek to reconstruct trees and alignments on a greater number and variety of organisms than ever before, primarily due to exponential growth in affordable sequencing and computing power. The importance of phylogenetic trees and alignments motivates the need for methods to reconstruct them accurately and efficiently on large-scale datasets. Traditionally, phylogenetic studies proceed in two phases: first, an alignment is produced from biomolecular sequences with differing lengths, and, second, a tree is produced using the alignment. My dissertation presents the first empirical performance study of leading two-phase methods on datasets with up to hundreds of thousands of sequences. Relatively accurate alignments and trees were obtained using methods with high computational requirements on datasets with a few hundred sequences, but as datasets grew past 1000 sequences and up to tens of thousands of sequences, the set of methods capable of analyzing a dataset diminished and only the methods with the lowest computational requirements and lowest accuracy remained. Alternatively, methods have been developed to simultaneously estimate phylogenetic alignments and trees. Methods optimizing the treelength optimization problem - the most widely-used approach for simultaneous estimation - have not been shown to return more accurate trees and alignments than two-phase approaches. I demonstrate that treelength optimization under a particular class of optimization criteria represents a promising means for inferring accurate trees and alignments. The other methods for simultaneous estimation are not known to support analyses of datasets with a few hundred sequences due to their high computational requirements. The main contribution of my dissertation is SATe, the first fast and accurate method for simultaneous estimation of alignments and trees on datasets with up to several thousand nucleotide sequences. SATe improves upon the alignment and topological accuracy of all existing methods, especially on the most difficult-to-align datasets, while retaining reasonable computational requirements. / text Computational phylogenetics Multiple sequence alignment Phylogeny Treelength optimization problem Simultaneous estimation Biological datasets
79	Algorithms for Sequence Similarity Measures MOHAMAD, Mustafa Amid 17 November 2010 (has links) Given two sets of points $A$ and $B$ ($\|A\| = m$, $\|B\| = n$), we seek to find a minimum-weight many-to-many matching which seeks to match each point in $A$ to at least one point in $B$ and vice versa. Each matched pair (an edge) has a weight. The goal is to find the matching that minimizes the total weight. We study two kinds of problems depending on the edge weight used. The first edge weight is the Euclidean distance, $d_1$. The second is edge weight is the square of the Euclidean distance, $d_2$. There already exists an $O(k\log k)$ algorithm for $d_1$, where $k=m+n$. We provide an $O(mn)$ algorithm for the $d_2$ problem. We also solve the problem of finding the minimum-weight matching when the sets $A$ and $B$ are allowed to be translated on the real line. We present an $O(mnk \log k)$ algorithm for the $d_1$ problem and an $O(3^{mn})$ algorithm for the $d_2$. Furthermore, we also deal with the special case where $A$ and $B$ lie on a circle of a specific circumference. We present an $O(k^2 \log k)$ algorithm and $O(kmn)$ algorithm for solving the minimum-weight matching for the $d_1$, and $d_2$ weights respectively. Much like the problem on the real line, we extend this problem to allow the sets $A$ and $B$ to be rotated on the circle. We try to find the minimum-weight many-to-many matching when rotations are allowed. For $d_1$ we present an $O(k^2mn \log k)$ algorithm and a $O(3^{mn})$ algorithm for $d_2$. / Thesis (Master, Computing) -- Queen's University, 2010-11-08 20:48:18.968 many-to-many matching algorithm matching minimum cost circle line translation rotation sequence alignment
80	MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data 2014 November 1900 (has links) The idea of using a graphics processing unit (GPU) for more than simply graphic output purposes has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of bioinformatics and life sciences compute-intensive tasks has been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. In this thesis an existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. Workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (i.e. Ion Torrent). Results show that the software reaches up to 82 GCUPS (Giga Cell Updates Per Second) on a single-GPU graphic card running on a commodity desktop hardware. As a result it is the fastest GPU-based implemen- tation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Despite being designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also presents great potential for improving heterogeneous computing with CUDA-enabled GPUs in general and is expected to make contributions to other research problems that require sensitive pairwise alignment to be applied to a large number of reads. Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware and further, these results are especially encouraging since GPU performance grows faster than multi-core CPUs. Bioinformatics Sequence Alignment Smith-Waterman Algorithm GPU Computing CUDA Sequence Assembly Metagenomics Next-Generation-Sequencing

Search results