Global ETD Search

21	An Extension of The Berry-Ravindran Algorithm for protein and DNA data Riekkola, Jesper January 2022 String matching algorithms search a text for occurrences of a given pattern. Many of them achieve their performance by analysing the pattern in advance and storing that information, which is then used throughout the search phase to decide which parts of the text can be skipped. One such algorithm is Berry-Ravindran, which, after each attempt, looks at the two text characters immediately following the current window and checks whether that pair occurs in the pattern. This thesis compares Berry-Ravindran with new variants of itself that examine three and four characters instead of two, as well as with the Boyer-Moore algorithm. Examining more characters increases how much of the text can be skipped, reducing the number of attempts, but it increases the pre-processing time exponentially. Fewer attempts therefore do not necessarily mean a faster run time. The factor with the largest impact on pre-processing time is the size of the alphabet the text uses. This is investigated by testing the algorithms with patterns from 4 to 100 characters long on two data sets: protein data, with an alphabet size of 27, and DNA data, with an alphabet size of 4. Berry-Ravindran text matching string matching exact-matching string algorithms pattern matching Engineering and Technology
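The skip rule described above can be illustrated with a short sketch. This is a minimal Python version, assuming a small fixed alphabet such as `ACGT`; the names `lookahead_shift`, `build_table` and `br_search` are hypothetical, and treating the three- and four-character variants as the same rule with a larger lookahead `k` is our assumption, not necessarily the thesis's exact formulation. For `k = 2` the rule gives the usual Berry-Ravindran shifts: 1 if the first lookahead character equals the last pattern character, `m - i` for the rightmost pattern pair `x[i]x[i+1]` equal to the lookahead pair, `m + 1` if the second lookahead character equals the first pattern character, and `m + 2` otherwise.

```python
from itertools import product


def lookahead_shift(pattern: str, gram: str) -> int:
    """Smallest shift s >= 1 such that the pattern, moved s places to the
    right, agrees with every lookahead character it would overlap."""
    m, k = len(pattern), len(gram)
    for s in range(1, m + k + 1):
        # Lookahead char gram[t] sits at text index j + m + t; after a shift
        # of s it lines up with pattern index m + t - s (if that is in range).
        if all(pattern[m + t - s] == gram[t]
               for t in range(k) if 0 <= m + t - s < m):
            return s
    return m + k + 1  # not reached: s = m + k overlaps no lookahead char


def build_table(pattern: str, alphabet: str, k: int = 2) -> dict[str, int]:
    """Precompute the shift for every k-gram: len(alphabet) ** k entries."""
    return {"".join(g): lookahead_shift(pattern, "".join(g))
            for g in product(alphabet, repeat=k)}


def br_search(text: str, pattern: str, table: dict[str, int], k: int = 2) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    hits, j = [], 0
    while j <= n - m:
        if text[j:j + m] == pattern:
            hits.append(j)
        gram = text[j + m:j + m + k]  # shorter than k near the end of text
        shift = table.get(gram)
        if shift is None:  # partial gram, or a symbol outside the alphabet
            shift = lookahead_shift(pattern, gram)
        j += shift
    return hits


if __name__ == "__main__":
    table = build_table("TTAC", "ACGT", k=2)
    print(len(table))                                  # 16 (4 ** 2 entries)
    print(br_search("GATTACAGATTACA", "TTAC", table))  # [2, 9]
```

The precomputed table has one entry per k-gram, so its size is |Σ|^k: with `k = 4` that is 4^4 = 256 entries for DNA but 27^4 = 531,441 for a 27-symbol protein alphabet, which is why the preprocessing cost grows so quickly with the lookahead length.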
22	Local Alignment of Gradient Features for Face Photo and Face Sketch Recognition Alex, Ann Theja January 2012 No description available. Computer Engineering Computer Science Electrical Engineering Face recognition Face sketch recognition String Matching Smith Waterman Algorithm Edge features Biometrics
23	Intrusion Detection and High-Speed Packet Classification Using Memristor Crossbars Bontupalli, Venkataramesh January 2015 No description available. Computer Engineering Electrical Engineering Intrusion Detection Memristor Crossbars High Speed Packet Classification Low Power Network Security SNORT String Matching Regular Expression Matching
24	Filtros para a busca e extração de padrões aproximados em cadeias biológicas / Filter Algorithms for Approximate Patterns Matching and Extraction from Biological Strings Soares Neto, Domingos 10 September 2008 This thesis deals with computational formulations and algorithms for searching for and extracting patterns in biological strings. In particular, it focuses on the following two problems, both considered under the Hamming and Levenshtein distances: 1. how to find the positions where a given pattern approximately occurs in a given string; 2. how to extract patterns that approximately occur in a significant number of strings from a given set. The first problem, for which many polynomial-time algorithms exist, has received a lot of attention since the 1960s and entered a new era with the advent of computational biology in the 1980s and with the spread of the Internet and its search engines: both developments brought new challenges because of the large volumes of data involved and the tight time constraints of these applications. The second problem, which emerged more recently, is intrinsically challenging because of its computational complexity, its hardness of approximation, and the size of the inputs in the most common applications. It is also notable for its considerable potential for application. This work presents suitable formulations of both problems, together with the algorithms and data structures essential to their study. In particular, it covers the remarkably versatile suffix tree, one of its generalisations, and its sister structure, the suffix array. A large part of the text is devoted to q-gram-based filters for approximate pattern search and some of their most recent variants. Also covered are the bit-parallel algorithms of Myers and Baeza-Yates-Gonnet for pattern search, Sagot's algorithms for pattern extraction, and the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.
approximate string matching bit-parallel algorithms filter algorithms motifs pattern extraction q-grams suffix array suffix tree
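The bit-parallel idea behind the Baeza-Yates-Gonnet family of algorithms can be sketched as a Shift-And scan. This minimal Python version, with the hypothetical name `shift_and_mismatches`, is an illustration only: it keeps one bit-vector per allowed error level so that it tolerates up to `k` substitutions (Hamming distance), but it does not handle insertions or deletions (Levenshtein distance), which Myers' bit-vector algorithm addresses. Python integers are unbounded; word-level implementations are fastest when the pattern fits in a single machine word.

```python
def shift_and_mismatches(text: str, pattern: str, k: int) -> list[int]:
    """End positions where pattern occurs in text with at most k mismatches
    (Hamming distance), using one bit-vector per error level."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1          # keep only the m meaningful bits
    accept = 1 << (m - 1)        # bit meaning "whole pattern matched"
    masks: dict[str, int] = {}   # B[c]: bit i set iff pattern[i] == c
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[e] bit i: pattern[0..i] matches the text ending here with <= e mismatches
    R = [0] * (k + 1)
    ends = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev = R[:]              # states before reading ch
        R[0] = ((prev[0] << 1) | 1) & b & full
        for e in range(1, k + 1):
            # extend with a matching char at level e, or spend one
            # mismatch on ch coming from level e - 1
            R[e] = ((((prev[e] << 1) | 1) & b) | ((prev[e - 1] << 1) | 1)) & full
        if R[k] & accept:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    print(shift_and_mismatches("ACGTACGT", "AGG", 1))  # [2, 6]
```

Each text character costs O(k) word operations regardless of the pattern's content, which is what makes the bit-parallel approach attractive for short patterns.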
25	Paralelização em CUDA do algoritmo Aho-Corasick utilizando as hierarquias de memórias da GPU e nova compactação da Tabela de Transcrição de Estados / CUDA Parallelization of the Aho-Corasick Algorithm Using the GPU Memory Hierarchies and a New Compression of the State Transition Table Silva Júnior, José Bonifácio da 21 June 2017 An Intrusion Detection System (IDS) needs to compare the contents of every packet arriving at the network interface against a set of signatures that indicate possible attacks, a task that consumes a great deal of CPU processing time. To alleviate this problem, researchers have tried to parallelize the IDS comparison engine by moving its execution from the CPU to the GPU. This dissertation parallelizes the Brute Force and Aho-Corasick (AC) string matching algorithms and proposes a new compression of the State Transition Table of the Aho-Corasick algorithm, making it possible to place the table in shared memory and accelerate string comparison. Both algorithms were parallelized on the NVIDIA CUDA platform and executed using the different GPU memories to allow a comparative performance analysis of those memories. Initially, the AC algorithm proved faster than Brute Force and was therefore chosen for optimization. With its compressed table running in parallel in shared memory, the AC algorithm achieved a 15% performance gain over the other GPU memories and was 48 times faster than its serial version when tested with real network packets. With synthetic (less random) data, the gain reached 73% and the parallel algorithm was 56 times faster than its serial version. These results indicate that compression in shared memory is a suitable way to accelerate IDSs that require fast pattern search.
Computer science High-performance computing Computer architecture Information security GPUs CUDA String matching algorithms Aho-Corasick IDS GPU memory hierarchy Compaction techniques
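In the textbook formulation, the Aho-Corasick state transition table is a dense array with one row per automaton state and one column per alphabet symbol, so its size grows with both the number of signatures and the alphabet. This Python sketch (hypothetical names `build_ac_table` and `ac_search`; a four-letter alphabet chosen only for illustration) builds that dense table by folding the failure links into the transitions. It does not reproduce the dissertation's compression scheme or its CUDA kernels; it only shows the structure that such a scheme has to shrink to fit in the GPU's comparatively small shared memory.

```python
from collections import deque


def build_ac_table(patterns: list[str], alphabet: str):
    """Build a dense Aho-Corasick transition table: table[state][symbol]."""
    sym = {c: i for i, c in enumerate(alphabet)}
    sigma = len(alphabet)
    table = [[-1] * sigma]          # state 0 is the root of the trie
    out: list[set[str]] = [set()]   # patterns recognised on entering a state
    for p in patterns:              # 1) insert every (non-empty) pattern
        s = 0
        for ch in p:
            c = sym[ch]
            if table[s][c] == -1:
                table[s][c] = len(table)
                table.append([-1] * sigma)
                out.append(set())
            s = table[s][c]
        out[s].add(p)
    fail = [0] * len(table)
    queue = deque()
    for c in range(sigma):          # 2) root: missing edges loop back to root
        t = table[0][c]
        if t == -1:
            table[0][c] = 0
        else:
            queue.append(t)         # depth-1 states fail to the root
    while queue:                    # 3) BFS sets failure links, fills the rest
        s = queue.popleft()
        for c in range(sigma):
            t = table[s][c]
            if t == -1:             # no trie edge: reuse the fail state's move
                table[s][c] = table[fail[s]][c]
            else:
                fail[t] = table[fail[s]][c]
                out[t] |= out[fail[t]]
                queue.append(t)
    return table, out, sym


def ac_search(text: str, table, out, sym) -> list[tuple[int, str]]:
    """Return sorted (start, pattern) pairs for every occurrence in text."""
    s, hits = 0, []
    for pos, ch in enumerate(text):
        c = sym.get(ch)
        s = 0 if c is None else table[s][c]  # unknown symbol resets to root
        for p in out[s]:
            hits.append((pos - len(p) + 1, p))
    return sorted(hits)


if __name__ == "__main__":
    tbl, out, sym = build_ac_table(["ACG", "CG", "GT"], "ACGT")
    print(ac_search("ACGTCG", tbl, out, sym))
    # [(0, 'ACG'), (1, 'CG'), (2, 'GT'), (4, 'CG')]
```

Each input character costs exactly one table lookup. A GPU port would typically flatten `table` into one contiguous integer array and let each thread scan its own slice of the input, with slices overlapping by the longest signature length minus one; this is an assumption about common practice, not a description of the dissertation's design.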
27	Přibližné vyhledávání řetězců v předzpracovaných dokumentech / Approximate String Matching in Preprocessed Documents Toth, Róbert January 2014 This thesis deals with the problem of approximate string matching, also called string matching allowing errors. It targets offline algorithms, which allow very fast pattern matching thanks to an index created during an initial text-preprocessing phase. We first define the problem and demonstrate a variety of its applications, followed by a short survey of different approaches to it. Several existing algorithms based on suffix trees are explained in detail, and a new hybrid algorithm is proposed. The algorithms are implemented in the C programming language and thoroughly compared in a series of experiments focusing on the newly presented algorithm.
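The offline idea (build an index once, then answer many queries without rescanning the text) can be shown with a suffix array, the sister structure of the suffix tree. This minimal Python sketch uses hypothetical names and a naive construction; approximate matching is limited to one substitution and is done by neighbourhood generation, that is, querying the index for the pattern and for each single-substitution variant. It is a simple, well-known index-based technique chosen for illustration, not the suffix-tree or hybrid algorithm studied in the thesis.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive O(n^2 log n) construction: sort suffix start positions."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text: str, sa: list[int], pattern: str) -> list[int]:
    """Binary-search the block of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                    # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                    # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def find_one_mismatch(text: str, sa: list[int], pattern: str, alphabet: str) -> list[int]:
    """Start positions with Hamming distance <= 1, by querying the index
    once for the pattern and once for every single-substitution variant."""
    hits = set(find_exact(text, sa, pattern))
    for i, orig in enumerate(pattern):
        for ch in alphabet:
            if ch != orig:
                variant = pattern[:i] + ch + pattern[i + 1:]
                hits.update(find_exact(text, sa, variant))
    return sorted(hits)


if __name__ == "__main__":
    text = "ACGTACGTTACGA"
    sa = build_suffix_array(text)          # preprocessing, done once
    print(find_exact(text, sa, "ACG"))     # [0, 4, 9]
```

For one substitution the neighbourhood has m(|Σ| - 1) + 1 strings, so this approach becomes expensive quickly as the number of allowed errors grows; that cost is what more elaborate index-based algorithms try to avoid.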
28	Přibližná shoda znakových řetězců a její aplikace na ztotožňování metadat vědeckých publikací / Approximate equality of character strings and its application to record linkage in metadata of scientific publications Dobiášovský, Jan January 2020 The thesis explores the application of approximate string matching to record linkage of scientific publications. It provides an introduction to record matching along with five commonly used string distance metrics (Levenshtein, Jaro, Jaro-Winkler, cosine distance and the Jaccard coefficient). These metrics are applied to publication metadata from V3S, the current research information system of the Czech Technical University in Prague. Based on the findings, optimal thresholds with respect to the F1, F2 and F3 measures are determined for each metric.
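Of the five metrics listed, Jaro and Jaro-Winkler are the least self-explanatory. The following Python sketch follows the common textbook definitions, assuming a prefix scaling factor `p = 0.1` and a maximum prefix length of 4; some implementations apply the prefix boost only when the Jaro score exceeds a threshold such as 0.7. It is an illustration, not the thesis's implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)  # max distance between matching chars
    used1, used2 = [False] * n1, [False] * n2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    out_of_order, j = 0, 0
    for i in range(n1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

In record linkage, a pair of records is linked when the similarity of the compared fields reaches a chosen threshold, and the thesis tunes such a threshold per metric. The F-measures used for that tuning are F_β = (1 + β²)PR / (β²P + R), where P is precision and R is recall, so F2 and F3 weight recall more heavily than F1 does.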
29	Detektering av fusk vid användning av AI : En studie av detektionsmetoder / Detection of cheating when using AI : A study of detection methods Ennajib, Karim, Liang, Tommy January 2023 This report analyzes and tests various methods for distinguishing human-generated solutions to tasks and texts from those generated by artificial intelligence. The use of artificial intelligence has recently increased significantly, especially among students. The purpose of this study is to determine whether it is currently possible to detect when a college student in electrical engineering uses AI to cheat. Solutions to tasks and texts generated by ChatGPT are tested using a general methodology and external AI-based tools. The study covers mathematics, programming and written text. The results suggest that AI-assisted cheating cannot be detected in mathematics and programming, whereas in written text it can be detected to some extent.
Artificial intelligence ChatGPT GPTZero AI text classifier artificial general intelligence artificial narrow intelligence artificial superintelligence plagiarism checker string matching stylometry Computer Systems
30	Hardwarová akcelerace algoritmu pro hledání podobnosti dvou DNA řetězců / Hardware Acceleration of Algorithms for Approximate String Matching Nosek, Ondřej January 2007 Methods for approximate string matching of the various sequences used in bioinformatics are a crucial part of development in this field. These tasks have very high time complexity, so the aim is to create a hardware platform that accelerates these computations. The goal of this work is to design a generalized FPGA-based architecture that can work with various types of sequences. The designed acceleration card will mainly use dynamic programming algorithms such as Needleman-Wunsch and Smith-Waterman.
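The dynamic programming algorithms named in the abstract fill a score matrix cell by cell. This Python sketch computes the Smith-Waterman local-alignment score with a linear gap penalty; the function name and scoring values are illustrative assumptions. It fills the matrix one anti-diagonal at a time to make visible the data independence that hardware accelerators, for example systolic arrays on FPGAs, commonly exploit; it does not describe the architecture designed in the thesis.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score with a linear gap penalty.

    Cells are filled one anti-diagonal (i + j = d) at a time: every cell on a
    diagonal depends only on the two previous diagonals, so all cells of one
    diagonal could be computed in parallel.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best = 0
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("ACGT", "ACGT"))   # 8
    print(smith_waterman("ACCGT", "ACGT"))  # 7 (one gap, four matches)
```

Needleman-Wunsch differs in three places: there is no 0 floor in the maximum, row 0 and column 0 are initialised with cumulative gap penalties, and the result is read from `H[n][m]` instead of the matrix maximum.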
