Global ETD Search

11	Towards Secure Multipath TCP Communication Afzal, Zeeshan January 2017 (has links) The evolution in networking coupled with an increasing demand to improve user experience has led to different proposals to extend the standard TCP. Multipath TCP (MPTCP) is one such extension that has the potential to overcome few inherent limitations in the standard TCP. While MPTCP's design and deployment progresses, most of the focus has been on its compatibility. The security aspect is confined to making sure that the MPTCP protocol itself offers the same security level as the standard TCP. The topic of this thesis is to investigate the unexpected security implications raised by using MPTCP in the traditional networking environment. The Internet of today has security middle-boxes that perform traffic analysis to detect intrusions and attacks. Such middle-boxes make use of different assumptions about the traffic, e.g., traffic from a single connection always arrives along the same path. This along with many other assumptions may not be true anymore with the advent of MPTCP as traffic can be fragmented and sent over multiple paths simultaneously. We investigate how practical it is to evade a security middle-box by fragmenting and sending traffic across multiple paths using MPTCP. Realistic attack traffic is used to evaluate such attacks against Snort IDS to show that these attacks are feasible. We then go on to propose possible solutions to detect such attacks and implement them in an MPTCP proxy. The proxy aims to extend the MPTCP performance advantages to servers that only support standard TCP, while ensuring that intrusions can be detected as before. Finally, we investigate the potential MPTCP scenario where security middle-boxes only have access to some of the traffic. We propose and implement an algorithm to perform intrusion detection in such situations and achieve a nearly 90% detection accuracy. Another contribution of this work is a tool, that converts IDS rules into equivalent attack traffic to automate the evaluation of a middle-box. / Multipath TCP (MPTCP) is an extension to standard TCP that is close to being standardized. The design of the protocol is progressing, but most of the focus has so far been on its compatibility. The security aspect is confined to making sure that the MPTCP protocol itself offers the same security level as standard TCP. The topic of this thesis is to investigate the unexpected security implications raised by using MPTCP in a traditional networking environment. Today, the security middleboxes make use of different assumptions that may not be true anymore with the advent of MPTCP.We investigate how practical it is to evade a security middlebox by fragmenting and sending traffic across multiple paths using MPTCP. Realistic attack traffic generated from a tool that is also presented in this thesis is used to show that these attacks are feasible. We then go on to propose possible solutions to detect such attacks and implement them in an MPTCP proxy. The proxy aims to extend secure MPTCP performance advantages. We also investigate the MPTCP scenario where security middleboxes can only observe some of the traffic. We propose and implement an algorithm to perform intrusion detection in such situations and achieve a high detection accuracy. / HITS network security MPTCP TCP IDS snort edit-distance Computer Sciences Datavetenskap (datalogi)
12	Nové Odhady pro Kombinatorických Problémů a Kvazi-Grayových Kódů / New Bounds for Combinatorial Problems and Quasi-Gray Codes Das, Debarati January 2019 (has links) This thesis consists of two parts. In part I, a group of combinatorial problems pertaining to strings, boolean matrices and graphs is studied. For given two strings x and y, their edit distance is the minimum number of character insertions, deletions and substitutions required to convert x into y. In this thesis we provide an algorithm that computes a constant approximation of edit distance in truly sub-quadratic time. Based on the provided ideas, we construct a separate sub- quadratic time algorithm that can find an occurrence of a pattern P in a given text T while allowing a few edit errors. Afterwards we study the boolean matrix multiplication (BMM) problem where given two boolean matrices, the aim is to find their product over boolean semi-ring. For this problem, we present two combinatorial models and show in these models BMM requires Ω(n3 /2O( √ log n) ) and Ω(n7/3 /2O( √ log n) ) work respectively. Furthermore, we also give a construction of a sparse sub-graph that preserves the distance between a designated source and any other vertex as long as the total weight increment of all the edges is bounded by some constant. In part II, we study the efficient construction of quasi-Gray codes. We give a construction of space optimal quasi-Gray codes over odd sized alphabets with read complexity 4...
13	Methods for Analyzing Tree-Structured Data and their Applications to Computational Biology / 木構造データの解析手法とその計算生物学への応用 Mori, Tomoya 24 September 2015 (has links) 京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第19336号 / 情博第588号 / 新制\|\|情\|\|103(附属図書館) / 32338 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授阿久津達也, 教授山本章博, 教授岡部寿男 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM tree-structured data tree edit distance tree inclusion dynamic programming computational biology 007
14	Weighting Edit Distance to Improve Spelling Correction in Music Entity Search / Viktat ändringsavstånd för förbättrad stavningskorrigering vid sökning i en musikdatabas Samuelsson, Axel January 2017 (has links) This master’s thesis project undertook investigation of whether the extant Damerau- Levenshtein edit distance measurement between two strings could be made more useful for detecting and adjusting misspellings in a search query. The idea was to use the knowledge that many users type their queries using the QWERTY keyboard layout, and weighting the edit distance in a manner that makes it cheaper to correct misspellings caused by confusion of nearer keys. Two different weighting approaches were tested, one with a linear spread from 2/9 to 2 depending on the keyboard distance, and the other had neighbors preferred over non-neighbors (either with half the cost or no cost at all). They were tested against an unweighted baseline as well as inverted versions of themselves (nearer keys more expensive to replace) against a dataset of 1,162,145 searches. No significant improvement in the retrieval of search results were observed when compared to the baseline. However, each of the weightings performed better than its corresponding inversion on a p < 0.05 significance level. This means that while the weighted edit distance did not outperform the baseline, the data still clearly points toward a correlation between the physical position of keys on the keyboard, and what spelling mistakes are made. / Detta examensarbete åtog sig att undersöka om det etablerade Damerau-Levenshtein-avståndet som mäter avståndet kan anpassas för att bättre hitta och korrigera stavningsfel i sökfrågor. Tanken var att använda det faktum att många användare skriver sina sökfrågor på ett tangentbord med QWERTY-layout, och att vikta ändrings- avståndet så att det blir billigare att korrigera stavfel orsakade av hopblandning av två knappar som är närmare varandra. Två olika viktningar testades, en hade vikterna utspridda linjärt mellan 2/9 och 2, och den andra föredrog grannar över icke-grannar (antingen halva kostnaden eller ingen alls). De testades mot ett oviktat referensavstånd samt inversen av sig själva (så att närmare knappar blev dyrare att byta ut) mot ett dataset bestående av 1 162 145 sökningar. Ingen signifikant förbättring uppmättes gentemot referensen. Däremot presterade var och en av viktningarna bättre än sin inverterade motpart på konfidensnivå p < 0,05. Det innebär att trots att de viktade distansavstånden inte presterade bättre än referensen så pekar datan tydligt mot en korrelation mellan den fysiska positioneringen av knapparna på tangentbordet och vilka stavningsmisstag som begås. Spelling correction edit distance search music spotify trie Damerau Levenshtein Computer Sciences Datavetenskap (datalogi)
15	Repairing strings and trees Riveros Jaeger, Cristian January 2013 (has links) What do you do if a computational object fails a specification? An obvious approach is to repair it, namely, to modify the object minimally to get something that satisfies the constraints. In this thesis we study foundational problems of repairing regular specifications over strings and trees. Given two regular specifications R and T we aim to understand how difficult it is to transform an object satisfying R into an object satisfying T. The setting is motivated by considering R to be a restriction -- a constraint that the input object is guaranteed to satisfy -- while T is a target -- a constraint that we want to enforce. We first study which pairs of restriction and target specifications can be repaired with a ``small'' numbers of changes. We formalize this as the bounded repair problem -- to determine whether one can repair each object satisfying R into T with a uniform number of edits. We provide effective characterizations of the bounded repair problem for regular specifications over strings and trees. These characterizations are based on a good understanding of the cyclic behaviour of finite automata. By exploiting these characterizations, we give optimal algorithms to decide whether two specifications are bounded repairable or not. We also consider the impact of limitations on the editing process -- what happens when we require the repair to be done sequentially over serialized objects. We study the bounded repair problem over strings and trees restricted to this streaming setting and show that this variant can be characterized in terms of finite games. Furthermore, we use this characterization to decide whether one can repair a pair of specifications in a streaming fashion with bounded cost and how to obtain a streaming repair strategy in this case. The previous notion asks for a uniform bound on the number of edits, but having this property is a strong requirement. To overcome this limitation, we study how to calculate the maximum number of edits per character needed to repair any object in R into T. We formalize this as the asymptotic cost -- the limit of the number of edits divided by the length of the input in the worst case. Our contribution is an algorithm to compute the asymptotic cost for any pair of regular specifications over strings. We also consider the streaming variant of this cost and we show how to compute it by reducing this problem to mean-payoff games. 005.14
16	Distância de edição para estruturas de dados Silva Junior, Paulo Matias da January 2018 (has links) Orientador: Prof. Dr. Rodrigo de Alencar Hausen / Coorientador: Prof. Dr. Jerônimo Cordoni Pellegrini / Dissertação (mestrado) - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, Santo André, 2018. / O problema de distância de edição geral de árvores consiste na comparação de duas Árvores enraizadas e rotuladas a partir de operações de edição tais como a deleção e a inserção de nós, buscando obter o menor custo necessário para uma sequência de operações que transforme uma árvore em outra. Neste trabalho provamos que encontrar a maior subfloresta comum pela deleção de nós dentre duas árvores dadas, chamada de LCS-floresta, é um caso particular de distância de edição. Para o problema de encontrar a subárvore comum máxima entre duas árvores, existe uma demonstração feita por Valiente[Val02] de que esse problema é um caso particular de distância de edição considerando uma condição que preserva fortemente a ancestralidade entre os pares de nós das árvores comparadas. Realizamos uma demonstração alternativa para esse problema que toma por condição a existência de caminhos entre os pares de nós. Também estabelecemos uma hierarquia que relaciona as distâncias obtidas como solução desses três problemas, mostrando que a distância que se obtém como solução do problema de edição mais geral é limite inferior para a distância encontrada como solução do LCS-floresta, e esta última é limite inferior para a distância obtida com a subárvore comum máxima. Na segunda parte do trabalho, descrevemos as estruturas de dados como árvores enraizadas e rotuladas, assim pudemos aplicar o conceito de distância de edição e, com isso, analisar os custos para comparar uma estrutura de dados consigo mesma após uma sequência de operações. Para tal, modelamos os custos das operações nas árvores das respectivas estruturas considerando informações como o número de nós da árvore e o nível do nó que passou pela operação. Nos modelos de pilha, lista ligada e árvore de busca binária as distâncias de edição foram relacionadas às complexidades de tempo de se operar nessas estruturas. Adaptamos também os custos operacionais para tries e árvores B. Realizamos experimentos para calcular as distâncias de edição de uma estrutura de dados consigo mesma após uma sequência aleatória de operações com o intuito de verificar como essas medidas de distância atuavam sobre cada estrutura. Observamos nesses testes que o tamanho da sequência influencia na distância final. Também verificamos que os custos operacionais que consideram o nível do nó operado obtinham distâncias menores se comparadas com aquelas obtidas pelo custo de tamanho da estrutura. / The general tree edit distance problem consists in the comparison between two rooted labelled trees using operations which change one tree into another. The tree edit distance is defined as the minimum cost sequence of edit operations needed to transform two trees. The edit operations studied are inserting, deleting and replacing nodes. In this work, we prove that find the largest common subforest between trees restricted to node deletion, called LCS-forest, is a particular case of tree edit distance. Valiente [Val02] proved that find the maximum common subtree is a particular case of tree edit distance considering a ancestrality preserving condition, while we present an alternative proof using paths between pair of nodes. These three problems of distance are shown related in a hierarchy, where the general tree edit distance is a lower bound of the distance value obtained from LCS-forest solution. The latter is a lower bound of the distance obtained from maximum common subtree solution. In the second part of this work, we describe data structures as rooted labelled trees. Then it is possible to compare a data structure with itself after a sequence of operations applying the tree edit distance. For this, the model of operational cost of a tree considers information like number of nodes in the tree and level of operated node. The data structures modeled as trees were stack, linked list and binary search tree. The models associate the edit distance with the time complexities of these data structures operations. The operational costs of tries and B-trees also were adaptated for the edit distances. Some experiments to compute the distances are presented. They compare each data structure with itself after random sequences of operations. The results show how each proposed measure operate on the respective structure. The sequence size was an influence factor on distance values. For the operational costs, the cost defined as the level of operated nodes obtain smaller distances compared to the case of cost defined as the structure size. DISTÂNCIA DE EDIÇÃO ESTRUTURAS DE DADOS ÁRVORES EDIT DISTANCE DATA STRUCTURES TREES
17	Approximate string matching distance for image classification / Distance d’édition entre chaines d’histogrammes pour la classification d’images Nguyen, Hong-Thinh 29 August 2014 (has links) L'augmentation exponentielle du nombre d'images nécessite des moyens efficaces pour les classer en fonction de leur contenu visuel. Le sac de mot visuel (Bag-Of-visual-Words, BOW), en raison de sa simplicité et de sa robustesse, devient l'approche la plus populaire. Malheureusement, cette approche ne prend pas en compte de l'information spatiale, ce qui joue un rôle important dans les catégories de modélisation d'image. Récemment, Lazebnik ont introduit la représentation pyramidale spatiale (Spatial Pyramid Representation, SPR) qui a incorporé avec succès l'information spatiale dans le modèle BOW. Néanmoins, ce système de correspondance rigide empêche la SPR de gérer les variations et les transformations d'image. L'objectif principal de cette thèse est d'étudier un modèle de chaîne de correspondance plus souple qui prend l'avantage d'histogrammes de BOW locaux et se rapproche de la correspondance de la chaîne. Notre première contribution est basée sur une représentation en chaîne et une nouvelle distance d'édition (String Matching Distance, SMD) bien adapté pour les chaînes de l'histogramme qui peut calculer efficacement par programmation dynamique. Un noyau d'édition correspondant comprenant à la fois d'une pondération et d'un système pyramidal est également dérivée. La seconde contribution est une version étendue de SMD qui remplace les opérations d'insertion et de suppression par les opérations de fusion entre les symboles successifs, ce qui apporte de la souplesse labours et correspond aux images. Toutes les distances proposées sont évaluées sur plusieurs jeux de données tâche de classification et sont comparés avec plusieurs approches concurrentes / The exponential increasing of the number of images requires efficient ways to classify them based on their visual content. The most successful and popular approach is the Bag of visual Word (BoW) representation due to its simplicity and robustness. Unfortunately, this approach fails to capture the spatial image layout, which plays an important roles in modeling image categories. Recently, Lazebnik et al (2006) introduced the Spatial Pyramid Representation (SPR) which successfully incorporated spatial information into the BoW model. The idea of their approach is to split the image into a pyramidal grid and to represent each grid cell as a BoW. Assuming that images belonging to the same class have similar spatial distributions, it is possible to use a pairwise matching as similarity measurement. However, this rigid matching scheme prevents SPR to cope with image variations and transformations. The main objective of this dissertation is to study a more flexible string matching model. Keeping the idea of local BoW histograms, we introduce a new class of edit distance to compare strings of local histograms. Our first contribution is a string based image representation model and a new edit distance (called SMD for String Matching Distance) well suited for strings composed of symbols which are local BoWs. The new distance benefits from an efficient Dynamic Programming algorithm. A corresponding edit kernel including both a weighting and a pyramidal scheme is also derived. The performance is evaluated on classification tasks and compared to the standard method and several related methods. The new method outperforms other methods thanks to its ability to detect and ignore identical successive regions inside images. Our second contribution is to propose an extended version of SMD replacing insertion and deletion operations by merging operations between successive symbols. In this approach, the number of sub regions ie. the grid divisions may vary according to the visual content. We describe two algorithms to compute this merge-based distance. The first one is a greedy version which is efficient but can produce a non optimal edit script. The other one is an optimal version but it requires a 4th degree polynomial complexity. All the proposed distances are evaluated on several datasets and are shown to outperform comparable existing methods. Distance d'édition Sac de mots visuels Correspondance spatiale Classification d'images Edit distance Bag of visual words Spatial matching Image classification
18	Spell checkers and correctors : a unified treatment Liang, Hsuan Lorraine 25 June 2009 (has links) The aim of this dissertation is to provide a unified treatment of various spell checkers and correctors. Firstly, the spell checking and correcting problems are formally described in mathematics in order to provide a better understanding of these tasks. An approach that is similar to the way in which denotational semantics used to describe programming languages is adopted. Secondly, the various attributes of existing spell checking and correcting techniques are discussed. Extensive studies on selected spell checking/correcting algorithms and packages are then performed. Lastly, an empirical investigation of various spell checking/correcting packages is presented. It provides a comparison and suggests a classification of these packages in terms of their functionalities, implementation strategies, and performance. The investigation was conducted on packages for spell checking and correcting in English as well as in Northern Sotho and Chinese. The classification provides a unified presentation of the strengths and weaknesses of the techniques studied in the research. The findings provide a better understanding of these techniques in order to assist in improving some existing spell checking/correcting applications and future spell checking/correcting package designs and implementations. / Dissertation (MSc)--University of Pretoria, 2009. / Computer Science / unrestricted Performance N-gram Fsa Formal concept analysis Edit distance Dictionary lookup Classification Spell checking Spell correcting UCTD
19	Mining for Frequent Community Structures using Approximate Graph Matching Kolli, Lakshmi Priya 15 July 2021 (has links) No description available. Computer Science Frequent subgraph mining Approximate Graph Matching Random Walks Markov Clustering Graph Edit Distance KL Divergence
20	Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction Pettersson, Eva January 2016 (has links) Historical text constitutes a rich source of information for historians and other researchers in humanities. Many texts are however not available in an electronic format, and even if they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component in the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial, considering the lack of linguistically annotated historical data combined with the high variability in historical text, making it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where there is little (or no) annotated historical training data available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 out of the 100 top-ranked instances are true positives in the best setting. NLP for historical text spelling normalisation digital humanities information extraction SMT Levenshtein edit distance language technology computational linguistics

Search results