Spelling suggestions: "subject:"suffix tre"" "subject:"zeffix tre""
11 |
Spectrum access in cognitive radio networks based on prediction and estimationDevanarayana, Chamara January 2016 (has links)
In the literature, Cognitive radio (CR) as well as full-duplex (FD) communication technologies are proposed to increase the spectrum efficiency. The main contribution of this thesis is to introduce prediction and estimation techniques with low control overhead, and use the predicted or estimated information in resource allocation in CR networks, both in the overlay networks and the underlay networks. Prediction and estimation are important in increasing the data rate and keeping the interference at a low level.
In the overlay scheme, I modeled the primary user (PU) traffic characteristics of the channels using the Probabilistic Suffix Tree (PST) algorithm. Then using this PST algorithm, I introduced a frequency hopping based control channel and derived its theoretical properties. Then I proposed two methods for selecting a channel set for transmission, which took into account both the PU channel usage statistics and, secondary user (SU) channel usage statistics as perceived by an SU of interest. The first scheme selected channels having the highest probability of successful transmission, while the second calculated a net reward using a marked Markov chain. Then using simulations, I showed that our scheme caused acceptable interference to the PUs and has better throughput performance, compared to a scheme selecting channels randomly.
Then I proposed two joint channel assignment and power allocation schemes for a bi-directional FD underlay CR network with network assistance. The first scheme used the information on the number of total SU pairs present in the network. In the second scheme, I used least squares based estimation and Kalman filtering to estimate the interference at the monitoring stations using the local interference. It reduced the control overhead of keeping track of active SUs. In both of these schemes each SU pair decided on the channels to be used in the half-duplex mode and the full-duplex mode using local information. This joint optimization was done running channel assignment and power allocation algorithms alternatively. In the power allocation problem, I used a technique called monotonic optimization. After simulating both of these schemes I showed that the scheme based on estimation performs satisfactorily given that it has less control overhead. / October 2016
|
12 |
Computational Representation Of Protein Sequences For Homology Detection And ClassificationOgul, Hasan 01 January 2006 (has links) (PDF)
Machine learning techniques have been widely used for classification problems in computational biology. They require that the input must be a collection of fixedlength feature vectors. Since proteins are of varying lengths, there is a need for a
means of representing protein sequences by a fixed-number of features. This thesis
introduces three novel methods for this purpose: n-peptide compositions with
reduced alphabets, pairwise similarity scores by maximal unique matches, and
pairwise similarity scores by probabilistic suffix trees.
New sequence representations described in the thesis are applied on three
challenging problems of computational biology: remote homology detection,
subcellular localization prediction, and solvent accessibility prediction, with some
problem-specific modifications. Rigorous experiments are conducted on common
benchmarking datasets, and a comparative analysis is performed between the new
methods and the existing ones for each problem.
On remote homology detection tests, all three methods achieve competitive
accuracies with the state-of-the-art methods, while being much more efficient. A
combination of new representations are used to devise a hybrid system, called
PredLOC, for predicting subcellular localization of proteins and it is tested on two
distinct eukaryotic datasets. To the best of author&rsquo / s knowledge, the accuracy
achieved by PredLOC is the highest one ever reported on those datasets. The
maximal unique match method is resulted with only a slight improvement in
solvent accessibility predictions.
|
13 |
Celogenomové zarovnání pomocí suffixových stromů / Whole genome alignment using suffix treesKlouba, Lukáš January 2017 (has links)
The aim of this thesis is to create an algorithm that allows the alignment of the genome of two organisms by means of suffix structures and to implement it into the programming language environment R. The thesis deals with the description of the construction of the suffix structures and the methods of whole genome alignment. The result of the thesis is a functional algorithm for whole genome alignment by means of suffix structures implemented in the software environment R and its comparison with similar programs for the whole genome alignment.
|
14 |
Filtros para a busca e extração de padrões aproximados em cadeias biológicas / Filter Algorithms for Approximate Patterns Matching and Extraction from Biological StringsSoares Neto, Domingos 10 September 2008 (has links)
Esta dissertação de mestrado aborda formulações computacionais e algoritmos para a busca e extração de padrões em cadeias biológicas. Em particular, o presente texto concentra-se nos dois problemas a seguir, considerando-os sob as distâncias de Hamming e Levenshtein: a) como determinar os locais nos quais um dado padrão ocorre de modo aproximado em uma cadeia fornecida; b) como extrair padrões que ocorram de modo aproximado em um número significativo de cadeias de um conjunto fornecido. O primeiro problema, para o qual já existem diversos algoritmos polinomiais, tem recebido muita atenção desde a década de 60, e ganhou novos ares com o advento da biologia computacional, nos idos dos anos 80, e com a popularização da Internet e seus mecanismos de busca: ambos os fenômenos trouxeram novos obstáculos a serem superados, em razão do grande volume de dados e das bastante justas restrições de tempo inerentes a essas aplicações. O segundo problema, de surgimento um pouco mais recente, é intrinsicamente desafiador, em razão de sua complexidade computacional, do tamanho das entradas tratadas nas aplicações mais comuns e de sua dificuldade de aproximação. Também é de chamar a atenção o seu grande potencial de aplicação. Neste trabalho são apresentadas formulações adequadas dos problemas abordados, assim como algoritmos e estruturas de dados essenciais ao seu estudo. Em especial, estudamos a extremamente versátil árvore dos sufixos, assim como uma de suas generalizações e sua estrutura irmã: o vetor dos sufixos. Grande parte do texto é dedicada aos filtros baseados em q-gramas para a busca aproximada de padrões e algumas de suas mais recentes variações. Estão cobertos os algoritmos bit-paralelos de Myers e Baeza-Yates-Gonnet para a busca de padrões; os algoritmos de Sagot para a extração de padrões; os algoritmos de filtragem de Ukkonen, Jokinen-Ukkonen, Burkhardt-Kärkkäinen, entre outros. / This thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.
|
15 |
Filtros para a busca e extração de padrões aproximados em cadeias biológicas / Filter Algorithms for Approximate Patterns Matching and Extraction from Biological StringsDomingos Soares Neto 10 September 2008 (has links)
Esta dissertação de mestrado aborda formulações computacionais e algoritmos para a busca e extração de padrões em cadeias biológicas. Em particular, o presente texto concentra-se nos dois problemas a seguir, considerando-os sob as distâncias de Hamming e Levenshtein: a) como determinar os locais nos quais um dado padrão ocorre de modo aproximado em uma cadeia fornecida; b) como extrair padrões que ocorram de modo aproximado em um número significativo de cadeias de um conjunto fornecido. O primeiro problema, para o qual já existem diversos algoritmos polinomiais, tem recebido muita atenção desde a década de 60, e ganhou novos ares com o advento da biologia computacional, nos idos dos anos 80, e com a popularização da Internet e seus mecanismos de busca: ambos os fenômenos trouxeram novos obstáculos a serem superados, em razão do grande volume de dados e das bastante justas restrições de tempo inerentes a essas aplicações. O segundo problema, de surgimento um pouco mais recente, é intrinsicamente desafiador, em razão de sua complexidade computacional, do tamanho das entradas tratadas nas aplicações mais comuns e de sua dificuldade de aproximação. Também é de chamar a atenção o seu grande potencial de aplicação. Neste trabalho são apresentadas formulações adequadas dos problemas abordados, assim como algoritmos e estruturas de dados essenciais ao seu estudo. Em especial, estudamos a extremamente versátil árvore dos sufixos, assim como uma de suas generalizações e sua estrutura irmã: o vetor dos sufixos. Grande parte do texto é dedicada aos filtros baseados em q-gramas para a busca aproximada de padrões e algumas de suas mais recentes variações. Estão cobertos os algoritmos bit-paralelos de Myers e Baeza-Yates-Gonnet para a busca de padrões; os algoritmos de Sagot para a extração de padrões; os algoritmos de filtragem de Ukkonen, Jokinen-Ukkonen, Burkhardt-Kärkkäinen, entre outros. / This thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.
|
16 |
Vyhledávání přibližných palindromů v DNA sekvencích / Approximate Palindrome Detection in DNA SequencesDivila, Jaroslav January 2012 (has links)
This work deals with conception and implemetation of tools for finding approximate palindromes in DNA sequences. The work focuses on the description of DNA structure, and on the function of palindromes in DNA sequences, and on the description of methods for finding approximate palindromes. Main part of thesis is focused on conclusion and description of implementation approximate palidromes finding tool.
|
17 |
Reprezentace textu a její vliv na kategorizaci / Representation of Text and Its Influence on CategorizationŠabatka, Ondřej January 2010 (has links)
The thesis deals with machine processing of textual data. In the theoretical part, issues related to natural language processing are described and different ways of pre-processing and representation of text are also introduced. The thesis also focuses on the usage of N-grams as features for document representation and describes some algorithms used for their extraction. The next part includes an outline of classification methods used. In the practical part, an application for pre-processing and creation of different textual data representations is suggested and implemented. Within the experiments made, the influence of these representations on accuracy of classification algorithms is analysed.
|
18 |
Algorithmes pour la fouille de données et la bio-informatique / Algorithms for data mining and bio-informaticsMondal, Kartick Chandra 12 July 2013 (has links)
L'extraction de règles d'association et de bi-clusters sont deux techniques de fouille de données complémentaires majeures, notamment pour l'intégration de connaissances. Ces techniques sont utilisées dans de nombreux domaines, mais aucune approche permettant de les unifier n'a été proposée. Hors, réaliser ces extractions indépendamment pose les problèmes des ressources nécessaires (mémoire, temps d'exécution et accès aux données) et de l'unification des résultats. Nous proposons une approche originale pour extraire différentes catégories de modèles de connaissances tout en utilisant un minimum de ressources. Cette approche est basée sur la théorie des ensembles fermés et utilise une nouvelle structure de données pour extraire des représentations conceptuelles minimales de règles d'association, bi-clusters et règles de classification. Ces modèles étendent les règles d'association et de classification et les bi-clusters classiques, les listes d'objets supportant chaque modèle et les relations hiérarchiques entre modèles étant également extraits. Cette approche a été appliquée pour l'analyse de données d'interaction protéomiques entre le virus VIH-1 et l'homme. L'analyse de ces interactions entre espèces est un défi majeur récent en bio-informatique. Plusieurs bases de données intégrant des informations hétérogènes sur les interactions et des connaissances biologiques sur les protéines ont été construites. Les résultats expérimentaux montrent que l'approche proposée peut traiter efficacement ces bases de données et que les modèles conceptuels extraits peuvent aider à la compréhension et à l'analyse de la nature des relations entre les protéines interagissant. / Knowledge pattern extraction is one of the major topics in the data mining and background knowledge integration domains. Out of several data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics. These tasks gained much importance in many domains in recent years. However, no approach was proposed to perform them in one process. This poses the problems of resources required (memory, execution times and data accesses) to perform independent extractions and of the unification of the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimum resources. This approach is based on the frequent closed patterns theoretical framework and uses a novel suffix-tree based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association and classification rules, and bi-clusters as data objects supporting each pattern and hierarchical relationships between patterns are also extracted. This approach was applied to the analysis of HIV-1 and human protein-protein interaction data. Analyzing such inter-species protein interactions is a recent major challenge in computational biology. Databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can efficiently process these databases and that extracted conceptual patterns can help the understanding and analysis of the nature of relationships between interacting proteins.
|
19 |
Towards expressive melodic accompaniment using parametric modeling of continuous musical elements in a multi-attribute prediction suffix trie frameworkMallikarjuna, Trishul 22 November 2010 (has links)
Elements of continuous variation such as tremolo, vibrato and portamento enable dimensions of their own in expressive melodic music in styles such as in Indian Classical Music. There is published work on parametrically modeling some of these elements individually, and to apply the modeled parameters to automatically generated musical notes in the context of machine musicianship, using simple rule-based mappings. There have also been many systems developed for generative musical accompaniment using probabilistic models of discrete musical elements such as MIDI notes and durations, many of them inspired by computational research in linguistics. There however doesn't seem to have been a combined approach of parametrically modeling expressive elements in a probabilistic framework. This documents presents a real-time computational framework that uses a multi-attribute trie / n-gram structure to model parameters like frequency, depth and/or lag of the expressive variations such as vibrato and portamento, along with conventionally modeled elements such as musical notes, their durations and metric positions in melodic audio input. This work proposes storing the parameters of expressive elements as metadata in the individual nodes of the traditional trie structure, along with the distribution of their probabilities of occurrence. During automatic generation of music, the expressive parameters as learned in the above training phase are applied to the associated re-synthesized musical notes. The model is aimed at being used to provide automatic melodic accompaniment in a performance scenario. The parametric modeling of the continuous expressive elements in this form is hypothesized to be able to capture deeper temporal relationships among musical elements and thereby is expected to bring about a more expressive and more musical outcome in such a performance than what has been possible using other works of machine musicianship using only static mappings or randomized choice. A system was developed on Max/MSP software platform with this framework, which takes in a pitched audio input such as human singing voice, and produces a pitch track which may be applied to synthesized sound of a continuous timbre. The system was trained and tested with several vocal recordings of North Indian Classical Music, and a subjective evaluation of the resulting audio was made using an anonymous online survey. The results of the survey show the output tracks generated from the system to be as musical and expressive, if not more, than the case where the pitch track generated from the original audio was directly rendered as output, and also show the output with expressive elements to be perceivably more expressive than the version of the output without expressive parameters. The results further suggest that more experimentation may be required to conclude the efficacy of the framework employed in relation to using randomly selected parameter values for the expressive elements. This thesis presents the scope, context, implementation details and results of the work, suggesting future improvements.
|
Page generated in 0.0467 seconds