1 |
Nonribosomal Peptide Identification with Tandem Mass Spectrometry by Searching Structural Database / Yang, Lian / 19 April 2012
Nonribosomal peptides (NRPs) are highlighted in pharmacological studies, as novel NRPs are often promising candidates for new drug development. To discover novel NRPs from microbial fermentations effectively, a crucial step is to identify known NRPs at an early stage and exclude them from further investigation. This so-called dereplication step ensures that scarce resources are spent only on novel NRPs in follow-up experiments. Tandem mass spectrometry is routinely used for NRP dereplication. However, few bioinformatics tools have been developed to identify NRP compounds computationally from mass spectra, and manual identification is currently the roadblock limiting the throughput of novel NRP discovery.
In this thesis, we review the nature of nonribosomal peptides and investigate the challenges in solving the identification problem computationally. We then propose the iSNAP software as an automated, high-throughput solution for tandem mass spectrometry based NRP identification. The algorithm evolves the traditional database search approach for identifying linear (sequential) peptides into one competent at handling complicated NRP structures. It is designed to identify mixtures of NRP compounds from LC-MS/MS of complex extracts, and to find structural analogs that differ from an identified known NRP compound by one monomer. Combined with an in-house NRP structural database of 1107 compounds, iSNAP is shown to be an effective tool for mass spectrometry based NRP identification.
The software is available as a web service at http://monod.uwaterloo.ca/isnap for the research community.
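As a rough illustration of the database-search idea described above, the sketch below scores an observed spectrum against the contiguous-fragment masses of a cyclic peptide candidate. The monomer masses, the candidate, and the scoring scheme are simplified assumptions for illustration (real scoring must handle ionization, water loss, and branched topologies); this is not iSNAP's actual algorithm.

```python
# Illustrative monomer residue masses in daltons; real NRP alphabets are far
# larger, and these residues/values are assumptions for the sketch.
MONOMER_MASS = {"Val": 99.068, "Orn": 114.079, "Leu": 113.084}

def fragment_masses(cyclic_monomers):
    """All contiguous-fragment masses of a cyclic peptide candidate
    (the full ring is excluded; it corresponds to the precursor mass)."""
    n = len(cyclic_monomers)
    masses = set()
    for start in range(n):
        total = 0.0
        for length in range(1, n):
            total += MONOMER_MASS[cyclic_monomers[(start + length - 1) % n]]
            masses.add(round(total, 3))
    return masses

def score_spectrum(peaks, candidate, tol=0.02):
    """Count observed peaks explained by some candidate fragment mass."""
    frags = fragment_masses(candidate)
    return sum(any(abs(peak - m) <= tol for m in frags) for peak in peaks)
```

A candidate whose fragment masses explain more of the observed peaks scores higher; ranking candidates by this score is the essence of a spectral database search.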
|
3 |
IMPROVING REMOTE HOMOLOGY DETECTION USING A SEQUENCE PROPERTY APPROACH / Cooper, Gina Marie / 29 September 2009
No description available.
|
4 |
Novel data analysis methods and algorithms for identification of peptides and proteins by use of tandem mass spectrometry / Xu, Hua / 30 August 2007
No description available.
|
5 |
An enhanced GPU architecture for not-so-regular parallelism with special implications for database search / Narasiman, Veynu Tupil / 27 June 2014
Graphics Processing Units (GPUs) have become a popular platform for executing general purpose (i.e., non-graphics) applications. To run efficiently on a GPU, applications must be parallelized into many threads, each of which performs the same task but operates on different data (i.e., data parallelism). Previous work has shown that some applications experience significant speedup when executed on a GPU instead of a CPU. The applications that benefit most tend to have certain characteristics such as high computational intensity, regular control-flow and memory access patterns, and little to no communication among threads. However, not all parallel applications have these characteristics. Applications with a more balanced compute to memory ratio, divergent control flow, irregular memory accesses, and/or frequent communication (i.e., not-so-regular applications) will not take full advantage of the GPU's resources, resulting in performance far short of what could be delivered. The goal of this dissertation is to enhance the GPU architecture to better handle not-so-regular parallelism. This is accomplished in two parts. First, I analyze a diverse set of data parallel applications that suffer from divergent control-flow and/or significant stall time due to memory. I propose two microarchitectural enhancements to the GPU called the Large Warp Microarchitecture and Two-Level Warp Scheduling to address these problems respectively. When combined, these mechanisms increase performance by 19% on average. Second, I examine one of the most important and fundamental applications in computing: database search. Database search is an excellent example of an application that is rich in parallelism, but rife with not-so-regular characteristics. 
I propose enhancements to the GPU architecture including new instructions that improve intra-warp thread communication and decision making, and also a row-buffer locality hint bit to better handle the irregular memory access patterns of index-based tree search. These proposals improve performance by 21% for full table scans, and 39% for index-based search. The result of this dissertation is an enhanced GPU architecture that better handles not-so-regular parallelism. This increases the scope of applications that run efficiently on the GPU, making it a more viable platform not only for current parallel workloads such as databases, but also for future and emerging parallel applications.
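To see why divergent control flow hurts SIMD-style execution, here is a toy Python model (my own illustration, not the dissertation's Large Warp Microarchitecture or Two-Level Warp Scheduling mechanisms): a warp issues one pass when all its threads take the same branch direction, and two serialized passes when they diverge.

```python
def simulate_warp_cycles(data, warp_size=4):
    """Toy lockstep model: threads in a warp execute together.
    A branch that splits a warp forces both paths to be issued
    serially, with inactive threads masked off."""
    cycles = 0
    for start in range(0, len(data), warp_size):
        warp = data[start:start + warp_size]
        taken = [x % 2 == 0 for x in warp]   # hypothetical branch condition
        # 1 pass if the warp agrees, 2 passes if it diverges.
        cycles += len(set(taken))
    return cycles
```

With regular data every warp agrees and each warp costs one pass; with interleaved data every warp diverges and the cost doubles, which is the effect the proposed mechanisms aim to mitigate.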
|
6 |
Combinatorial structures for anonymous database search / Stokes, Klara / 18 October 2011
This thesis treats a protocol for anonymous database search (or, if one prefers, a protocol for user-private information retrieval) based on the use of combinatorial configurations. The protocol is called P2P UPIR. It is proved that the (v,k,1)-balanced incomplete block designs (BIBDs), and in particular the finite projective planes, are optimal configurations for this protocol. The notion of n-anonymity is applied to the configurations for the P2P UPIR protocol, and the transversal designs are proved to be n-anonymous configurations for P2P UPIR with respect to the neighborhoods of the points of the configuration. It is proved that to the configurable tuples one can associate a numerical semigroup. This theorem implies results on the existence of combinatorial configurations. The proofs are constructive and can be used as algorithms for finding combinatorial configurations. It is also proved that to the triangle-free configurable tuples one can associate a numerical semigroup, which implies results on the existence of triangle-free combinatorial configurations.
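For intuition, a combinatorial (v_r, b_k)-configuration has v points each on r lines and b lines each containing k points, with any two points sharing at most one line. Two textbook necessary conditions can be checked mechanically; this is a counting sketch, not the thesis's numerical-semigroup construction.

```python
def satisfies_necessary_conditions(v, r, b, k):
    """Necessary conditions for a (v_r, b_k)-configuration to exist:
    - double counting point-line incidences gives v*r == b*k;
    - the r lines through a point cover r*(k-1) distinct other points,
      so v >= r*(k-1) + 1."""
    return v * r == b * k and v >= r * (k - 1) + 1
```

The Fano plane (v = b = 7, r = k = 3), the smallest finite projective plane, satisfies both conditions with equality in the second, consistent with projective planes being optimal for the protocol.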
|
7 |
Hamming DHT e HCube : arquiteturas distribuídas para busca por similaridade / Hamming DHT and HCube : distributed architectures for similarity search / Villaça, Rodolfo da Silva, 1974- / 23 August 2018
Advisor: Maurício Ferreira Magalhães / Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Previous issue date: 2013 / Abstract: The amount of data available on the Internet now exceeds the Zettabyte (ZB) scale, a scenario known in the literature as Big Data. Although traditional database solutions are efficient at finding and retrieving a specific, exact content, they are inefficient in the Big Data scenario, since they were not designed for it and since most of this data is unstructured and scattered across the vastness of the Internet. New database infrastructures are therefore needed to support the non-exact retrieval of similar data, that is, similarity search: queries for groups of data that share some resemblance. In this context, this thesis proposes to explore the Hamming similarity between object identifiers generated with the Random Hyperplane Hashing function. This property of the identifiers is the basis for distributed data storage infrastructures that efficiently support similarity search. The thesis presents the Hamming DHT, a P2P solution based on overlay networks, and the HCube, a server-based solution for data centers. Evaluations of both solutions show that they reduce the distance between similar content in distributed environments, which improves recall in similarity search scenarios. / Doctorate / Computer Engineering / Doctor of Electrical Engineering
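The Random Hyperplane Hashing idea can be sketched in a few lines: each random hyperplane contributes one signature bit, and the Hamming distance between signatures approximates the angular distance between the original vectors. This is a generic SimHash-style sketch, not the exact identifier scheme used by the Hamming DHT or HCube.

```python
import random

def random_hyperplanes(dim, n_bits, seed=42):
    """Draw n_bits random hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def rhh_signature(vector, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(1 if sum(p * x for p, x in zip(plane, vector)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    """Number of differing bits between two equal-length signatures."""
    return sum(x != y for x, y in zip(a, b))
```

Vectors pointing in nearly the same direction collide on most hyperplanes and end up a short Hamming distance apart, which is exactly the locality that a similarity-aware storage layout can exploit.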
|
8 |
Online nástroj pro průběžné sledování a zpracovávání výsledků z vyhledávacích strojů / Online Tool for Continuous Monitoring and Processing of Search Engine Results / Sedlář, Petr / January 2007
The aim of this work was to design and implement an application that can easily, quickly, and effectively determine the position of given World Wide Web pages in search engines, based on their keywords. The application is accessible through a web interface and enables comparison of position reports from the archive, comparison of graph outputs, and sending of regular result reports in PDF format. Along with the position, the number of references to each keyword in the search engine is stored. It is also possible to display position changes relative to previous reports. Because the outputs can be stored as PDF documents, the entire history can easily be kept on a personal computer in case the online system is unavailable.
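The core position check such a tool performs can be sketched as a scan over an ordered result list. This is a minimal illustration with a made-up result list; the actual application also handles fetching results, archiving, graphs, and PDF export.

```python
def find_position(results, target_domain):
    """Return the 1-based rank of the first result URL containing
    target_domain, or None if the domain is absent from the list."""
    for rank, url in enumerate(results, start=1):
        if target_domain in url:
            return rank
    return None
```

Storing the returned rank alongside a timestamp for each keyword is enough to reconstruct the position history and change reports the tool provides.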
|
9 |
Implementace algoritmu pro hledání podobností DNA řetězců v FPGA / Approximate String Matching Algorithm Implementation in FPGA / Pařenica, Martin / January 2007
This paper describes algorithms for aligning nucleotide sequences. It first covers pairwise alignment algorithms based on database search or dynamic programming, then dynamic programming for multiple sequences and an algorithm for building phylogenetic trees. The first part closes with a description of FPGA technology. The second, more practical part describes the implementation of the chosen algorithm and includes examples of several multiple alignments.
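A minimal example of the pairwise dynamic-programming alignment the paper describes is the Needleman-Wunsch global alignment score. The scoring parameters below are illustrative assumptions, and the sketch returns only the score, not the alignment itself; an FPGA implementation would compute the same recurrence along anti-diagonals in parallel.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b by dynamic programming."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap              # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                     # match / substitution
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    return score[n][m]
```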
|
10 |
Towards decision-making to choose among different component origins / Badampudi, Deepika / January 2016
Context: The amount of software in solutions provided in various domains is continuously growing. These solutions are a mix of hardware and software, often referred to as software-intensive systems. Companies seek to improve the software development process to avoid delays or cost overruns. Objective: The overall goal of this thesis is to improve the software development process to provide timely, high quality and cost efficient solutions. The objective is to select the origin of the components (in-house, outsourced, components off-the-shelf (COTS), or open source software (OSS)) that facilitates this improvement. A system can be built of components from one origin or from a combination of two or more (or even all) origins. Selecting a proper origin for a component is important to get the most out of it and to optimize the development. Method: It is necessary to investigate the component origins in order to decide among them. We conducted a case study to explore the existing challenges in software development. The next step was to identify the factors that influence the choice among different component origins, through a systematic literature review using a snowballing (SB) strategy and a database (DB) search. Furthermore, a Bayesian synthesis process is proposed to integrate the evidence from literature into practice. Results: The results of this thesis indicate that the context of software-intensive systems, such as domain regulations, hinders software development improvement. In addition to in-house development, alternative component origins (outsourcing, COTS, and OSS) are being used. Several factors, such as time, cost and license implications, influence the selection of component origins. Solutions have been proposed to support the decision-making; however, they consider only a subset of the factors identified in the literature.
Conclusions: Each component origin has some advantages and disadvantages. Depending on the scenario, one component origin is more suitable than the others. It is important to investigate the different scenarios and suitability of the component origins, which is recognized as future work of this thesis. In addition, the future work is aimed at providing models to support the decision-making process.
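The idea of Bayesian synthesis, integrating observed evidence into a prior belief, can be illustrated with a simple Beta-Binomial update. This is my own toy example (including the hypothetical "success rate" interpretation), not the synthesis process proposed in the thesis.

```python
def beta_update(successes, failures, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) belief about a success rate
    (e.g., a hypothetical 'projects where an OSS component worked out'
    measure) with observed counts; returns the posterior parameters
    and the posterior mean."""
    a = prior_a + successes
    b = prior_b + failures
    return a, b, a / (a + b)
```

Repeating such updates per component origin, with evidence drawn from the literature, would yield comparable posterior estimates to feed into a decision among origins.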
|