About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
681

Generalized pattern matching applied to genetic analysis / 通用性模式匹配在基因序列分析中的應用 (Application of generalized pattern matching to gene sequence analysis) / CUHK electronic theses & dissertations collection / Digital dissertation consortium

January 2011 (has links)
The approximate pattern matching problem is: given a reference sequence T, a pattern (query) Q, and a maximum allowed error e, find all substrings of the reference whose edit distance to the pattern is at most e. Though it is a well-studied problem in computer science, it has seen a resurgence in bioinformatics in recent years, largely due to the emergence of next-generation high-throughput sequencing technologies. This thesis contributes a novel generalized pattern matching framework and applies it to solve pattern matching problems in general and alternative splicing (AS) detection in particular. AS detection requires mapping a large number of next-generation sequencing short reads to a reference human genome, which is the first and an important step in preparing the sequenced data for further biological analysis. The four parts of my research are as follows. / In the first part of my research work, we propose a novel deterministic pattern matching algorithm which applies Agrep, a well-known bit-parallel matching algorithm, to a truncated suffix array. Due to the linear cost of Agrep, the cost of our approach is linear in the number of characters processed in the truncated suffix array. We analyze the matching cost theoretically and obtain empirical costs from experiments. We carry out experiments using both synthetic and real DNA sequence data (queries) and search them in chromosome X of a reference human genome. The experimental results show that our approach achieves a speed-up of several orders of magnitude over the standard Agrep algorithm. / In the fourth part, we focus on seeding strategies for alternative splicing detection. We review the history of seeding-and-extending (SAE), and assess both theoretically and empirically the seeding strategies adopted in existing splicing detection tools, including Bowtie's heuristic seeding and ABMapper's exact seeding, against the novel complementary quad-seeding strategy we propose and the corresponding novel splice detection tool, CS4splice, which can handle inexact seeding (with errors) and all three types of errors: mismatch (substitution), insertion, and deletion. We carry out experiments using short reads (queries) of length 105 bp comprising several data sets with various levels of errors, and align them back to a reference human genome (hg18). On average, CS4splice can align 88.44% (recall rate) of 427,786 short reads perfectly back to the reference, while the other existing tools achieve much lower recall rates: SpliceMap 48.72%, MapSplice 58.41%, and ABMapper 51.39%. The accuracies of CS4splice are also the highest or very close to the highest in all the experiments carried out. Because of the complementary quad-seeding it uses, CS4splice takes more computational resources, about twice (or more) those of the other alternative splicing detection tools, which we consider practical and worthwhile. / In the second part, we define a novel generalized pattern (query) and a framework of generalized pattern matching, for which we propose a heuristic matching algorithm. Simply speaking, a generalized pattern is Q1 G1 Q2 ... Q(c-1) G(c-1) Qc, which consists of several substrings Qi and gaps Gi occurring between consecutive substrings. The prototypes of the generalized pattern come from several real biological problems that can all be modeled as generalized pattern matching problems.
Based on the well-known seeding-and-extending heuristic, we propose a dual-seeding strategy with which we solve the matching problem effectively and efficiently. We also develop a specialized matching tool called Gpattern-match. We carry out experiments using 10,000 generalized patterns and search them in a reference human genome (hg18). Over 98.74% of them can be recovered from the reference. It takes 1-2 seconds on average to recover a pattern, and peak memory usage is slightly more than 1 GB. / In the third part, a natural extension of the second part, we model a real biological problem, alternative splicing detection, as a generalized pattern matching problem and solve it using a proposed bi-directional seeding-and-extending algorithm. Unlike other tools, which depend on third-party tools, our mapping tool, ABMapper, is not only stand-alone but also performs unbiased alignments. We carry out experiments using 427,786 real next-generation sequencing short reads (queries) and align them back to a reference human genome (hg18). ABMapper achieves 98.92% accuracy and 98.17% recall rate, and is much better than the other state-of-the-art tools: SpliceMap achieves 94.28% accuracy and 78.13% recall rate, while TopHat achieves 88.99% accuracy and 76.33% recall rate. When the seed length is set to 12 in ABMapper, the whole searching and alignment process takes about 20 minutes, and peak memory usage is slightly more than 2 GB. / Ni, Bing. / Adviser: Kwong-Sak Leung. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 151-161). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
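For readers unfamiliar with bit-parallel approximate matching, the sketch below illustrates the Wu-Manber-style recurrence that Agrep is built on: one bit vector per error level, updated with shifts and masks as each reference character is scanned. This is a minimal illustrative version, not the thesis's truncated-suffix-array algorithm; the function name and example sequences are invented for the demonstration.

```python
def approx_search(text, pattern, k):
    """Report end positions in `text` of substrings within edit distance <= k
    of `pattern`, using a Wu-Manber-style bit-parallel scan (the idea behind
    Agrep).  Assumes len(pattern) is modest so bit masks stay small."""
    m = len(pattern)
    all_ones = (1 << m) - 1
    B = {}                                   # B[c]: bit j set iff pattern[j] == c
    for j, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << j)

    # R[d]: bit j set iff pattern[:j+1] matches a suffix of the scanned text
    # with at most d errors.  Initially only prefix deletions are possible.
    R = [(1 << d) - 1 for d in range(k + 1)]
    hits = []
    for i, ch in enumerate(text):
        mask = B.get(ch, 0)
        prev = R[:]                                   # states before this character
        R[0] = ((prev[0] << 1) | 1) & mask            # exact row: matches only
        for d in range(1, k + 1):
            R[d] = ((((prev[d] << 1) | 1) & mask)     # match
                    | (prev[d - 1] << 1)              # substitution
                    | prev[d - 1]                     # insertion in the text
                    | (R[d - 1] << 1)                 # deletion from the text
                    | 1) & all_ones
        if R[k] & (1 << (m - 1)):
            hits.append(i)                            # pattern found ending at i
    return hits

print(approx_search("GATTACAGGTTACA", "GTTACA", 1))   # -> [6, 12, 13] (end positions)
```

The thesis's contribution is to run this kind of scan over a truncated suffix array rather than over the raw reference, which is what yields the reported speed-up.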
682

Predictive Models for Ebola using Machine Learning Algorithms

Unknown Date (has links)
Identifying and tracking individuals affected by this virus in densely populated areas is a unique and urgent challenge in the public health sector. Currently, mapping the spread of the Ebola virus is done manually; however, with the help of social contact networks we can build dynamic graphs and predictive diffusion models of the Ebola virus based on its impact on either a specific person or a specific community. With the help of this model, we can make more precise forward predictions of disease propagation and identify possibly infected individuals, which will help perform trace-back analysis to locate the possible source of infection for a social group. This model visualizes and identifies the families and tightly connected social groups who have had contact with an Ebola patient, and is a proactive approach to reducing the risk of Ebola spread within a community or geographic location. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2017. / FAU Electronic Theses and Dissertations Collection
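For illustration, a diffusion model of the kind described can be sketched as a simple probabilistic cascade over a contact graph. The code below is a hypothetical, simplified stand-in (the network, transmission probability, and names are invented), not the model from the thesis.

```python
import random

def simulate_spread(contacts, seed_cases, p_transmit=0.3, steps=5, rng=None):
    """Forward-simulate infection over a contact network: each newly infected
    person infects each susceptible contact with probability p_transmit.
    Returns the set of people predicted to be reached by the cascade."""
    rng = rng or random.Random(42)
    infected = set(seed_cases)
    frontier = set(seed_cases)
    for _ in range(steps):
        new_cases = set()
        for person in frontier:
            for contact in contacts.get(person, ()):
                if contact not in infected and rng.random() < p_transmit:
                    new_cases.add(contact)
        infected |= new_cases
        frontier = new_cases
    return infected

# Hypothetical contact network: household and community links.
contacts = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
    "D": ["B", "C"], "E": ["C"],
}
print(simulate_spread(contacts, seed_cases={"A"}))   # predicted cases from seed A
```

Trace-back analysis can then be approached by running such forward simulations from each candidate source and ranking candidates by how well the simulated cases match the observed ones.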
683

High Level Preprocessor of a VHDL-based Design System

Palanisamy, Karthikeyan 27 October 1994 (has links)
This thesis presents the work done on a design automation system in which high-level synthesis is integrated with logic synthesis. DIADES, a design automation system developed at PSU, starts the synthesis process from a language called ADL. The major part of this thesis deals with transforming the ADL-based DIADES system into a VHDL-based DIADES system. In this thesis I have upgraded and modified the existing DIADES system so that it becomes a preprocessor to a comprehensive VHDL-based design system from Mentor Graphics. The high-level synthesis in the DIADES system includes two stages: data path synthesis and control unit synthesis. The conversion of data path synthesis is done in this thesis. In the DIADES system a digital system is described on the behavioral level in terms of variables and operations using the language ADL. The digital system described in ADL is compiled to a format called the GRAPH language, in which the behavior of a digital system is represented by a specific sequence of program statements. The descriptions in the GRAPH language are compiled to a format called the STRUCT language, in which the system is described in terms of lists of nodes and arrows. The main task of this thesis is to convert the descriptions in the GRAPH language and the descriptions in the STRUCT language to the VHDL format. All the generated VHDL code is compatible with the Mentor Graphics VHDL format, and all of it can be compiled, simulated, and synthesized by the Mentor Graphics tools.
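To give a flavor of the kind of translation involved, the sketch below turns a toy list of register-transfer statements (standing in for the STRUCT node/arrow form) into a small synchronous VHDL process. It is purely illustrative: the input format, names, and generated style are hypothetical and do not reflect the actual DIADES or Mentor Graphics formats.

```python
def emit_vhdl_datapath(entity, signals, transfers):
    """Toy code generator: emit a clocked VHDL process from a flat list of
    (destination, expression) register transfers.  Hypothetical illustration,
    not the DIADES output format."""
    lines = [f"entity {entity} is", "  port (clk : in bit);", f"end {entity};", "",
             f"architecture rtl of {entity} is"]
    lines += [f"  signal {name} : integer := 0;" for name in signals]
    lines += ["begin", "  process (clk)", "  begin",
              "    if clk'event and clk = '1' then"]
    lines += [f"      {dst} <= {expr};" for dst, expr in transfers]
    lines += ["    end if;", "  end process;", "end rtl;"]
    return "\n".join(lines)

print(emit_vhdl_datapath("accumulator", ["acc", "x"], [("acc", "acc + x")]))
```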
684

Quasi-Static Deflection Compensation Control of Flexible Manipulator

Feng, Jingbin 06 May 1993 (has links)
The growing need in industrial applications for high-performance robots has led to designs of lightweight robot arms. However, a lightweight robot arm introduces accuracy and vibration problems. The classical robot design and control method based on the rigid-body assumption is no longer satisfactory for lightweight manipulators. The effects of flexibility of lightweight manipulators have been an active research area in recent years. A new approach to correct the quasi-static position and orientation error of the end-effector of a manipulator with flexible links is studied in this project. In this approach, strain gages are used to monitor in real time the elastic reactions of the flexible links due to the weight of the manipulator and the payload; the errors are then compensated online by a control algorithm. Although this approach is designed to work for general loading conditions, only the bending deflection in a plane is investigated in detail. It is found that a minimum of two strain gages per link are needed to monitor the deflection of a robot arm subjected to bending. A mathematical model relating the deflections and strains is developed using Castigliano's theorem of least work. The parameters of the governing equations are obtained using an identification method; with this method, the geometric details of the robot arms and the carried load need not be known. The deflections monitored by strain gages are fed back to the kinematic model of the manipulator to find the position and orientation of the end-effector. A control algorithm is developed to compensate for the deflections. The inverse kinematics that includes deflections as variables is solved in closed form. If the deflections at the target position were known, this inverse kinematics would generate the exact joint command for the flexible manipulator. Because the deflections of the robot arms at the target position are unknown ahead of time, the current deflections at each sampling time are used to predict the deflections at the target position, and the joint command is modified until the required accuracy is obtained. An experiment is set up to verify the mathematical model relating the strains to the deflections; the results show good agreement with the model. The compensation control algorithm is first simulated in a computer program, and the simulation shows good convergence. An experimental manipulator with two flexible links is built to prove this approach. The experimental results show that this compensation control improves the position accuracy of the flexible manipulator significantly. In brief, the advantages of this approach are: the deflections can be monitored without measuring the payload directly and without detailed knowledge of the link geometry; the manipulator calibrates itself with minimum human intervention; the compensation control algorithm can be easily integrated with an existing uncompensated rigid-body algorithm; and it is inexpensive and practical to implement on manipulators installed in workplaces.
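The compensation loop described above can be sketched as follows: identify a linear strain-to-deflection map from paired measurements, then repeatedly re-solve a deflection-aware inverse kinematics using the currently predicted deflection until the end-effector error is within tolerance. This is a schematic sketch only; `forward_kin`, `inv_kin`, and `read_strain` are hypothetical placeholders for the manipulator-specific kinematics and sensing, not routines from the thesis.

```python
import numpy as np

def identify_strain_model(strain_samples, deflection_samples):
    """Least-squares fit of a linear map  deflection = C @ strain,
    mirroring the identification step: geometry and payload need not be
    known, only paired strain/deflection measurements."""
    S = np.asarray(strain_samples)          # rows are samples
    D = np.asarray(deflection_samples)
    coef, *_ = np.linalg.lstsq(S, D, rcond=None)   # solves S @ coef = D
    return coef.T                                  # so that deflection = C @ strain

def compensate(target, forward_kin, read_strain, inv_kin, C, tol=1e-3, max_iter=20):
    """Iterative quasi-static compensation loop (a sketch, not the thesis
    algorithm verbatim): use the currently measured deflection as the
    prediction of the deflection at the target and re-solve the inverse
    kinematics until the end-effector error is small."""
    q = inv_kin(target, deflection=np.zeros(C.shape[0]))   # rigid-body first guess
    for _ in range(max_iter):
        defl = C @ read_strain(q)                # deflection estimated from strain gages
        pose = forward_kin(q, defl)              # where the end-effector actually is
        if np.linalg.norm(pose - target) < tol:
            break
        q = inv_kin(target, deflection=defl)     # re-command using predicted deflection
    return q
```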
685

A probabilistic framework and algorithms for modeling and analyzing multi-instance data

Behmardi, Behrouz 28 November 2012 (has links)
Multi-instance data, in which each object (e.g., a document) is a collection of instances (e.g., words), are widespread in machine learning, signal processing, computer vision, bioinformatics, music, and the social sciences. Existing probabilistic models, e.g., latent Dirichlet allocation (LDA), probabilistic latent semantic indexing (pLSI), and discrete component analysis (DCA), have been developed for modeling and analyzing multi-instance data. Such models introduce a generative process for multi-instance data which includes a low-dimensional latent structure. While such models offer great freedom in capturing the natural structure in the data, their inference may present challenges. For example, sensitivity to the choice of hyper-parameters in such models requires careful tuning (e.g., through cross-validation), which results in large computational complexity. Inference for fully Bayesian models, which contain no hyper-parameters, often involves slowly converging sampling methods. In this work, we develop approaches for addressing such challenges and further enhancing the utility of such models. This dissertation demonstrates a unified convex framework for probabilistic modeling of multi-instance data. The three main aspects of the proposed framework are as follows. First, joint regularization is incorporated into multiple density estimation to simultaneously learn the structure of the distribution space and infer each distribution. Second, a novel confidence-constraints framework is used to facilitate a tuning-free approach to control the amount of regularization required for the joint multiple density estimation, with theoretical guarantees on correct structure recovery. Third, we formulate the problem using a convex framework and propose efficient optimization algorithms to solve it. This work addresses the unique challenges associated with both discrete and continuous domains. In the discrete domain we propose confidence-constrained rank minimization (CRM) to recover the exact number of topics in topic models, with theoretical guarantees on recovery probability and the mean squared error of the estimation. We provide a computationally efficient optimization algorithm for the problem to further the applicability of the proposed framework to large real-world datasets. In the continuous domain, we propose to use the maximum entropy (MaxEnt) framework for multi-instance datasets. In this approach, bags of instances are represented as distributions using the principle of MaxEnt. We learn basis functions which span the space of distributions for jointly regularized density estimation; the basis functions are analogous to topics in a topic model. We validate the efficiency of the proposed framework in the discrete and continuous domains by an extensive set of experiments on synthetic datasets as well as on real-world image and text datasets, and compare the results with state-of-the-art algorithms. / Graduation date: 2013
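To make the MaxEnt idea mentioned above concrete, the sketch below shows its simplest discrete form: representing a bag of instances as the maximum-entropy distribution whose feature expectations match chosen target moments, obtained by minimizing the convex dual of the entropy objective. The feature matrix and targets are toy values; the thesis's jointly regularized, basis-function formulation is considerably richer.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_distribution(features, target_moments):
    """Maximum-entropy distribution over a finite instance space subject to
    feature-expectation constraints E_p[f_i] = target_moments[i].  The
    solution is exponential-family, p(x) proportional to exp(lambda . f(x));
    lambda is fit by minimizing the convex dual: log-partition minus
    lambda . targets."""
    F = np.asarray(features, dtype=float)       # shape: (num_instances, num_features)
    mu = np.asarray(target_moments, dtype=float)

    def dual(lam):
        scores = F @ lam
        log_z = np.logaddexp.reduce(scores)     # log partition function
        return log_z - lam @ mu

    res = minimize(dual, np.zeros(F.shape[1]), method="BFGS")
    scores = F @ res.x
    return np.exp(scores - np.logaddexp.reduce(scores))   # normalized probabilities

# Toy bag: 4 instances described by 2 binary features; match target moments (0.7, 0.3).
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
print(maxent_distribution(F, [0.7, 0.3]))   # distribution whose feature means are (0.7, 0.3)
```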
686

Analyzing hybrid architectures for massively parallel graph analysis

Ediger, David 08 April 2013 (has links)
The quantity of rich, semi-structured data generated by sensor networks, scientific simulation, business activity, and the Internet grows daily. The objective of this research is to investigate architectural requirements for emerging applications in massive graph analysis. Using emerging hybrid systems, we will map applications to architectures and close the loop between software and hardware design in this application space. Parallel algorithms and specialized machine architectures are necessary to handle the immense size and rate of change of today's graph data. To highlight the impact of this work, we describe a number of relevant application areas ranging from biology to business and cybersecurity. With several proposed architectures for massively parallel graph analysis, we investigate the interplay of hardware, algorithm, data, and programming model through real-world experiments and simulations. We demonstrate techniques for obtaining parallel scaling on multithreaded systems using graph algorithms that are orders of magnitude faster and larger than the state of the art. The outcome of this work is a proposed hybrid architecture for massive-scale analytics that leverages key aspects of data-parallel and highly multithreaded systems. In simulations, the hybrid systems incorporating a mix of multithreaded, shared memory systems and solid state disks performed up to twice as fast as either homogeneous system alone on graphs with as many as 18 trillion edges.
687

Implementation of adaptive digital FIR and reprogrammable mixed-signal filters using distributed arithmetic

Huang, Walter 12 November 2009 (has links)
When computational resources are limited, especially multipliers, distributed arithmetic (DA) is used in lieu of the typical multiplier-based filtering structures. However, DA is not well suited for adaptive applications: the bottleneck is updating the memory table. Several attempts have been made to accelerate updating the memory, but at the expense of additional memory usage and convergence speed. To develop an adaptive DA filter with an uncompromised convergence rate, the memory table must be fully updated. In this research, an efficient method for fully updating a DA memory table is proposed. The proposed update method is based on exploiting the temporal locality of the stored data and subexpression sharing. It reduces the computational workload and requires no additional memory resources. DA using the proposed update method is called conjugate distributed arithmetic. Filters can also be constructed from analog components. Often, for lower-precision computations, analog circuits use less power and less chip area than their digital counterparts. However, digital components are often used because of their ease of reprogrammability. Achieving such reprogrammability in analog is possible, but at the expense of additional chip area. A reprogrammable mixed-signal DA finite impulse response (FIR) filter is proposed to address the issues with reprogrammable analog FIR filters: constructing compact reprogrammable filtering structures, non-symmetric and imprecise filter coefficients, inconsistent sampling of the input data, and input sample data corruption. These issues are successfully addressed using distributed arithmetic, digital registers, and epots. Also, a mixed-signal DA second-order section (SOS), which is used as the building block for higher-order infinite impulse response filters, is proposed. The issues with an analog SOS filter are similar to those of an analog FIR filter: the lack of a compact reprogrammable filtering structure, imprecise filter coefficients, inconsistent sampling of the data, and corruption of the data samples. These issues are successfully addressed using distributed arithmetic and digital registers.
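For readers unfamiliar with distributed arithmetic, the sketch below shows the core trick: precompute a lookup table of coefficient partial sums, then form the filter's inner product by scanning the input samples one bit plane at a time and accumulating shifted table lookups, with no multiplier. It is a simplified unsigned-integer illustration (two's-complement handling and the thesis's update method are omitted), and the coefficient and sample values are arbitrary.

```python
import numpy as np

def build_da_table(coeffs):
    """Distributed-arithmetic lookup table: entry j holds the sum of the
    coefficients selected by the bits of j (one bit per filter tap)."""
    n = len(coeffs)
    table = np.zeros(1 << n)
    for j in range(1 << n):
        table[j] = sum(c for k, c in enumerate(coeffs) if (j >> k) & 1)
    return table

def da_dot_product(table, samples, n_bits):
    """Multiplier-free inner product: process the input samples one bit
    plane at a time and accumulate shifted table lookups.  Samples are
    assumed to be unsigned n_bits-bit integers."""
    acc = 0.0
    for b in range(n_bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k       # bit b of sample k -> address bit k
        acc += table[addr] * (1 << b)
    return acc

coeffs = [0.5, -0.25, 0.125, 1.0]
samples = [3, 7, 2, 5]                        # 4-bit unsigned inputs
table = build_da_table(coeffs)
print(da_dot_product(table, samples, n_bits=4))      # 5.0, bit-serial result
print(sum(c * x for c, x in zip(coeffs, samples)))   # 5.0, matches the direct dot product
```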
688

Statistical modeling of the human sleep process via physiological recordings

Fairley, Jacqueline Antoinette 09 January 2009 (has links)
The main objective of this work was the development of a computer-based Expert Sleep Analysis Methodology (ESAM) to aid sleep care physicians in the diagnosis of pre-Parkinson's disease symptoms using polysomnogram data. ESAM is significant because it streamlines the analysis of the human sleep cycles and aids the physician in the identification, treatment, and prediction of sleep disorders. In this work four aspects of computer-based human sleep analysis were investigated: polysomnogram interpretation, pre-processing, sleep event classification, and abnormal sleep detection. A review of previous developments in these four areas is provided along with their relationship to the establishment of ESAM. Polysomnogram interpretation focuses on the ambiguities found in human polysomnogram analysis when using the rule-based 1968 sleep staging manual edited by Rechtschaffen and Kales (R&K); ESAM is presented as an alternative to the R&K approach to human polysomnogram interpretation. The second area, pre-processing, addresses artifact processing techniques for human polysomnograms. Sleep event classification, the third area, discusses feature selection, classification, and human sleep modeling approaches. Lastly, abnormal sleep detection focuses on polysomnogram characteristics common to patients suffering from Parkinson's disease. The technical approach in this work utilized polysomnograms of control subjects and pre-Parkinsonian disease patients obtained from the Emory Clinic Sleep Disorders Center (ECSDC) as inputs to ESAM. The engineering tools employed during the development of ESAM included the Generalized Singular Value Decomposition (GSVD) algorithm, sequential forward and backward feature selection algorithms, the Particle Swarm Optimization algorithm, k-Nearest Neighbor classification, and Gaussian Observation Hidden Markov Modeling (GOHMM). In this study polysomnogram data were preprocessed for artifact removal and compensation using band-pass filtering and the GSVD algorithm. Optimal features for characterizing polysomnogram data of control subjects and pre-Parkinsonian disease patients were obtained using the sequential forward and backward feature selection algorithms, Particle Swarm Optimization, and k-Nearest Neighbor classification. ESAM output included GOHMMs constructed for both control subjects and pre-Parkinsonian disease patients. Furthermore, performance evaluation techniques were implemented to draw conclusions about how well the constructed GOHMMs reflect the underlying nature of the human sleep cycle.
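As a small illustration of the feature-selection stage mentioned above, the sketch below wraps a leave-one-out k-nearest-neighbor classifier inside a greedy sequential forward search. It is a generic textbook version with invented data shapes, not the ESAM implementation; labels are assumed to be small non-negative integers.

```python
import numpy as np

def knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier (Euclidean)."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                              # exclude the sample itself
        nearest = np.argsort(d)[:k]
        votes = np.bincount(y[nearest])            # labels must be non-negative ints
        correct += (votes.argmax() == y[i])
    return correct / len(X)

def sequential_forward_selection(X, y, n_features):
    """Greedy forward search: repeatedly add the feature that most improves
    the leave-one-out k-NN accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda f: knn_accuracy(X[:, selected + [f]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with synthetic feature vectors and binary labels.
rng = np.random.default_rng(0)
X, y = rng.random((40, 6)), rng.integers(0, 2, size=40)
print(sequential_forward_selection(X, y, n_features=3))
```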
689

Worst-case robot navigation in deterministic environments

Mudgal, Apurva 02 December 2009 (has links)
We design and analyze algorithms for the following two robot navigation problems: 1. TARGET SEARCH. Given a robot located at a point s in the plane, how will the robot navigate to a goal t in the presence of unknown obstacles? 2. LOCALIZATION. A robot is "lost" in an environment with a map of its surroundings. How will it find its true location while traveling the minimum distance? Since efficient algorithms for these two problems would make a robot completely autonomous, they have held the interest of both the robotics and computer science communities. Previous work has focused mainly on designing competitive algorithms, where the robot's performance is compared to that of an omniscient adversary. For example, a competitive algorithm for target search will compare the distance traveled by the robot with the shortest path from s to t. We analyze these problems from the worst-case perspective, which, in our view, is a more appropriate measure. Our results are: 1. For target search, we analyze an algorithm called Dynamic A* (D*). The robot continuously moves to the goal on the shortest path, which it recomputes on the discovery of obstacles. A variant of this algorithm has been employed in Mars Rover prototypes. We show that D* takes O(n log n) time on planar graphs and also show a comparable bound on arbitrary graphs. Thus, our results show that D* combines the optimistic possibility of reaching the goal very soon with competing with depth-first search within a logarithmic factor. 2. For the localization problem, worst-case analysis compares the performance of the robot with the optimal decision tree over the set of possible locations. No approximation algorithm had been known. We give a polylogarithmic approximation algorithm and also show a near-tight lower bound for the grid graphs commonly used in practice. The key idea is to plan travel on a "majority-rule map", which eliminates uncertainty and permits a link to the half-Group Steiner problem. We also extend the problem to polygonal maps by discretizing the domain using novel geometric techniques.
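The target-search strategy analyzed above, continuously replanning the shortest path as obstacles are discovered, can be sketched on a grid as below. This is a simplified replanning loop that reruns plain Dijkstra from scratch at each step (real D*/Dynamic A* repairs the previous search instead of recomputing); the grid and obstacle-on-contact sensing model are illustrative assumptions.

```python
import heapq

def shortest_path(grid, start, goal, known_blocked):
    """Dijkstra on a 4-connected grid, avoiding cells known to be blocked."""
    rows, cols = len(grid), len(grid[0])
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols and v not in known_blocked:
                if d + 1 < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + 1, u
                    heapq.heappush(heap, (d + 1, v))
    if goal not in prev and goal != start:
        return None
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]

def navigate(grid, start, goal):
    """Optimistic replanning loop: assume unknown cells are free, follow the
    current shortest path, and replan whenever the next cell turns out to be
    an obstacle (discovered on contact)."""
    pos, known_blocked, trace = start, set(), [start]
    while pos != goal:
        path = shortest_path(grid, pos, goal, known_blocked)
        if path is None:
            return None                      # goal unreachable given what is known
        nxt = path[1]
        if grid[nxt[0]][nxt[1]] == 1:        # obstacle discovered; remember and replan
            known_blocked.add(nxt)
            continue
        pos = nxt
        trace.append(pos)
    return trace

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(navigate(grid, (0, 0), (2, 0)))        # detours around the wall in row 1
```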
690

Algorithms for large graphs

Das Sarma, Atish 01 July 2010 (has links)
No description available.
