  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Authorship Attribution with Function Word N-Grams

Johnson, Russell Clark 01 January 2013 (has links)
Prior research has considered the sequential order of function words, after the contextual words of the text have been removed, as a stylistic indicator of authorship. This research describes an effort to enhance authorship-attribution accuracy from this same information source with alternate classifiers, alternate n-gram construction methods, and a genetically tuned configuration. The approach is original in that it is the first time probabilistic versions of Burrows's Delta have been used: instead of feeding z-scores directly to a classifier, the z-scores were converted to probabilistic equivalents (since z-scores cannot be added, subtracted, or divided without the possibility of distorting their probabilistic meaning), and this adaptation enhanced accuracy. Multiple versions of Burrows's Delta were evaluated, including a hybrid of the probabilistic Burrows's Delta and the version proposed by Smith & Aldridge (2011); in this case accuracy was enhanced when individual frequent words were evaluated as indicators of style. Other novel aspects include alternate n-gram construction methods; a reconciliation process that allows texts of various lengths from different authors to be compared; and a genetic-algorithm selection process that determines which function (or frequent) words (see Smith & Rickards, 2008; see also Shaker, Corne, & Everson, 2007) may be used in the construction of function-word n-grams.
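The z-score-to-probability conversion the abstract describes can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function-word list and the corpus means and standard deviations are invented for the example.

```python
import math

# Hypothetical corpus statistics: (mean, std dev) of each function word's
# relative frequency across a reference corpus. Values are illustrative.
CORPUS_STATS = {"the": (0.060, 0.010), "of": (0.030, 0.008), "and": (0.028, 0.007)}

def z_scores(tokens):
    """Z-score of each function word's relative frequency in one text."""
    n = len(tokens)
    scores = {}
    for word, (mu, sigma) in CORPUS_STATS.items():
        freq = tokens.count(word) / n
        scores[word] = (freq - mu) / sigma
    return scores

def to_probability(z):
    """Map a z-score to its standard-normal CDF value - a probabilistic
    equivalent that can be combined without distorting its meaning."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

The CDF mapping is monotone, so it preserves the ranking of z-scores while yielding values in [0, 1] that behave like probabilities when combined.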
12

Translating sensor measurements into texts for localization and mapping with mobile robots / Traduzindo leituras de sensores em textos para localização e mapeamento de robôs móveis

Maffei, Renan de Queiroz January 2017 (has links)
Localização e Mapeamento Simultâneos (SLAM), fundamental para robôs dotados de verdadeira autonomia, é um dos problemas mais difíceis da Robótica e consiste em estimar a posição de um robô que se move em um ambiente desconhecido enquanto, incrementalmente, constrói-se o mapa de tal ambiente. Provavelmente o requisito mais importante para localização e mapeamento adequados seja um preciso reconhecimento de local, isto é, determinar se um robô estava no mesmo lugar em diferentes ocasiões apenas analisando as observações feitas pelo robô em cada ocasião. A maioria das abordagens da literatura é boa quando se utilizam sensores altamente expressivos, como câmeras, ou quando o robô está situado em ambientes com pouca ambiguidade. No entanto, este não é o caso, por exemplo, quando o robô, equipado apenas com sensores de alcance, está em ambientes internos estruturados altamente ambíguos. Uma boa estratégia deve ser capaz de lidar com tais ambientes, lidar com ruídos e erros nas observações e, especialmente, ser capaz de modelar o ambiente e estimar o estado do robô de forma eficiente. Nossa proposta consiste em traduzir sequências de medições de laser em uma representação de texto eficiente e compacta, para então lidar com o problema de reconhecimento de local usando técnicas de processamento linguístico. Nós traduzimos as medições dos sensores em valores simples computados através de um novo modelo de observação baseado em estimativas de densidade de kernel chamado de Densidade de Espaço Livre (FSD). Estes valores são quantizados, permitindo a divisão do ambiente em regiões contíguas de densidade homogênea, como corredores e cantos. Regiões são representadas de forma compacta por simples palavras descrevendo o valor de densidade espacial, o tamanho e a variação da orientação daquela região. No final, as cadeias de palavras compõem um texto, no qual se buscam casamentos de n-gramas (isto é, sequências de palavras). Nossa técnica também é aplicada com sucesso em alguns cenários de operação de longo prazo, onde devemos lidar com objetos semi-estáticos (i.e., que se movem ocasionalmente, como portas e mobílias). Todas as abordagens foram avaliadas em cenários simulados e reais, obtendo-se bons resultados. / Simultaneous Localization and Mapping (SLAM), fundamental for building robots with true autonomy, is one of the most difficult problems in Robotics. It consists of estimating the position of a robot that is moving in an unknown environment while incrementally building the map of that environment. Arguably the most crucial requirement for proper localization and mapping is precise place recognition, that is, determining whether the robot is at the same place on different occasions just by looking at the observations it takes. Most approaches in the literature perform well when using highly expressive sensors such as cameras, or when the robot is situated in environments with little ambiguity. However, this is not the case, for instance, for robots equipped only with range-finder sensors in highly ambiguous structured indoor environments. A good SLAM strategy must be able to handle these scenarios, deal with noise and observation errors, and, especially, model the environment and estimate the robot state efficiently. Our proposal in this work is to translate sequences of raw laser measurements into an efficient and compact text representation and to address the place recognition problem using linguistic processing techniques. First, we translate raw sensor measurements into simple observation values computed through a novel observation model based on kernel-density estimation, called Free-Space Density (FSD). These values are quantized into significant classes, allowing the division of the environment into contiguous regions of homogeneous spatial density, such as corridors and corners. Regions are represented compactly by simple words composed of three syllables: the value of spatial density, the size, and the variation of orientation of that region. In the end, the chains of words associated with all observations made by the robot compose a text, in which we search for matches of n-grams (i.e., sequences of words), a popular technique from shallow linguistic processing. The technique is also successfully applied in some long-term operation scenarios, where we must deal with semi-static objects (i.e., objects that can move occasionally, such as doors and furniture). All approaches were evaluated in simulated and real scenarios, obtaining good results.
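The region-word and n-gram-matching idea in this abstract can be sketched as follows. This is an assumption-laden illustration, not the thesis's code: the class names (density, size, orientation "syllables") and the separator are invented, and real place recognition would involve much more than counting shared n-grams.

```python
def region_word(density_class, size_class, orientation_class):
    """Compose a three-'syllable' word describing one region, in the
    spirit of the FSD text representation (class labels are illustrative)."""
    return f"{density_class}-{size_class}-{orientation_class}"

def ngrams(words, n):
    """All contiguous n-word sequences of a region-word text."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_matches(text_a, text_b, n=3):
    """Count distinct n-grams shared by two region-word sequences; a high
    count suggests the robot revisited the same place."""
    return len(set(ngrams(text_a, n)) & set(ngrams(text_b, n)))
```

Because each word summarizes a whole region rather than a single scan, the resulting "texts" stay short, which is what makes n-gram matching over them cheap.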
13

Information fusion in writer verification from electronic text: a study of n-grams / Σύνθεση πληροφορίας στην πιστοποίηση γραφέα με ηλεκτρονικό κείμενο: μελέτη των n-grams

Αναστοπούλου, Ελένη 31 March 2015 (has links)
Η εργασία αυτή περιλαμβάνει στην συνέχεια τέσσερα κεφάλαια τα οποία έχουν ως εξής. Στο κεφάλαιο 2 περιγράφονται τα σύγχρονα εργαλεία ανάλυσης κειμένου αλλά και οι βάσεις δεδομένων (Corpus) που είναι διαθέσιμα. Περιγράφεται επίσης η βάση δεδομένων που χρησιμοποιούμε για να εξάγουμε πειραματικά αποτελέσματα. Στο κεφάλαιο 3, γίνεται εισαγωγή στα n-grams και αναπτύσσονται τα βασικότερα μέτρα ομοιότητας τα οποία είναι απαραίτητα για τον διαχωρισμό του ύφους γραφής από ηλεκτρονικό κείμενο. Στο κεφάλαιο 4 παρουσιάζονται τα πειραματικά αποτελέσματα που έχουν ληφθεί με τα n-grams. Σε αυτά περιλαμβάνονται και τα δίκτυα νευρωνίων. Τέλος στο κεφάλαιο 5 δίνονται τα συμπεράσματα και προτάσεις για περαιτέρω έρευνα στον τομέα αυτόν. / This work then comprises four chapters, as follows. Chapter 2 describes the modern text-analysis tools and the corpora (databases) that are available; it also describes the database we use to obtain experimental results. Chapter 3 introduces n-grams and develops the main similarity measures needed to distinguish writing style in electronic text. Chapter 4 presents the experimental results obtained with n-grams, including those involving neural networks. Finally, Chapter 5 gives conclusions and suggestions for further research in this area.
14

Paieškos metodų analizė ir realizacija išskirstytos maišos lentelėmis grindžiamose P2P sistemose / Analysis and implementation of search methods in P2P systems based on distributed hash tables

Balčiūnas, Jonas 11 August 2008 (has links)
DHT sistemų privalumas yra jų didelis plečiamumas ir nepriklausomumas, tačiau esami paieškos sprendimai reikalauja išorinių mechanizmų ir taip mažina DHT privalumus. Šio darbo tikslas – padidinti paieškos DHT sistemose galimybes, sukuriant vidinį paieškos mechanizmą Chord algoritmo pagrindu veikiančiai DHT sistemai ir ištiriant jo efektyvumą. Šiame darbe pristatomi galimi vidinės paieškos mechanizmai DHT sistemose pagrįsti n-gramomis ir užtvindymo pranešimais mechanizmu. Tyrimas parodė, kad n-gramos labiau tinkamos sistemoms, kurių dydis yra santykinai mažas, tuo tarpu užtvindymo mechanizmas priimtinesnis sistemose, kuriose įgyvendintas duomenų replikavimas. / The key idea of DHT systems is a hash table distributed over independent nodes. DHTs are decentralized, scalable, and fault tolerant, and they give high hit guarantees for data lookup. However, they do not support the arbitrary querying that flooding schemes do: users must know the exact key of the resource they are looking up in the system. The most common solution is an external search engine, as with FTP or HTTP. This work presents a research experiment on possible methods for arbitrary querying in DHTs based on n-gram and broadcasting techniques. The experiment was carried out using an experimental P2P system built for this purpose on top of the Chord algorithm. The results showed that the most expensive process (in terms of messages generated) in the n-gram method is publishing keys to the network. The analysis of both methods showed that n-grams are more practical on relatively small networks, while broadcasting is more effective on networks with data replication.
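The n-gram indexing scheme this abstract describes can be sketched with an in-memory table standing in for the Chord DHT. This is a toy illustration under stated assumptions: the class and method names are invented, and a real deployment would hash each n-gram key onto Chord nodes, which is exactly why publishing is the expensive step the experiment measured.

```python
def char_ngrams(s, n=3):
    """Character n-grams of a resource name, used as DHT keys."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

class NgramIndex:
    """Toy stand-in for a Chord-style DHT: each n-gram of a resource
    name is published as a separate key, so substring queries can be
    resolved without flooding the network."""

    def __init__(self):
        self.table = {}  # n-gram key -> set of resource names

    def publish(self, name):
        # One store operation per n-gram: the message-generation cost
        # grows with the length of the published name.
        for g in char_ngrams(name):
            self.table.setdefault(g, set()).add(name)

    def search(self, query):
        # A resource matches only if it was published under every
        # n-gram of the query.
        hits = [self.table.get(g, set()) for g in char_ngrams(query)]
        return set.intersection(*hits) if hits else set()
```

The intersection step is what turns exact-key lookup into substring search; the trade-off is the multiplied publish traffic noted in the abstract.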
17

DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models

Zhou, Hanqing January 2018 (has links)
Nowadays, knowledge bases are used more and more in Semantic Web tasks, such as knowledge acquisition (Hellmann et al., 2013), disambiguation (Garcia et al., 2009) and named entity corpus construction (Hahm et al., 2014), to name a few. DBpedia is playing a central role on the linked open data cloud; therefore, the quality of this knowledge base is becoming a central point of focus. However, there are some issues with the quality of DBpedia. In particular, DBpedia suffers from three major types of problems: a) invalid types for entities, b) missing types for entities, and c) invalid entities in the resources’ description. In order to enhance the quality of DBpedia, it is important to detect these invalid types and resources, as well as complete missing types. The three main goals of this thesis are: a) invalid entity type detection in order to solve the problem of invalid DBpedia types for entities, b) automatic detection of the types of entities in order to solve the problem of missing DBpedia types for entities, and c) invalid entity detection in order to solve the problem of invalid entities in the resource description of a DBpedia entity. We compare several methods for the detection of invalid types, automatic typing of entities, and invalid entities detection in the resource descriptions. In particular, we compare different classification and clustering algorithms based on various sets of features: entity embedding features (Skip-gram and CBOW models) and traditional n-gram features. We present evaluation results for 358 DBpedia classes extracted from the DBpedia ontology. The main contribution of this work consists of the development of automatic invalid type detection, automatic entity typing, and automatic invalid entity detection methods using clustering and classification. Our results show that entity embedding models usually perform better than n-gram models, especially the Skip-gram embedding model.
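The "traditional n-gram features" this abstract compares against entity embeddings can be sketched as character n-gram profiles with a cosine similarity, the kind of feature vector a classifier or clustering algorithm would consume. This is a generic illustration, not the thesis's feature pipeline; the padding markers and the choice of n = 3 are assumptions.

```python
import math

def char_ngram_profile(label, n=3):
    """Character n-gram counts of an entity label, with boundary
    markers so prefixes and suffixes are distinguishable."""
    padded = f"^{label.lower()}$"
    prof = {}
    for i in range(len(padded) - n + 1):
        g = padded[i:i + n]
        prof[g] = prof.get(g, 0) + 1
    return prof

def cosine(p, q):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(v * q.get(k, 0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```

Profiles like these capture surface form only, which is one intuition for why the abstract finds embedding features (which capture context) usually performing better.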
18

Algoritmus pro detekci pozitívního a negatívního textu / The algorithm for the detection of positive and negative text

Musil, David January 2016 (has links)
As information and communication technology develops swiftly, the amount of information produced by various sources grows as well. Sorting this data and extracting knowledge from it requires significant effort that cannot easily be supplied by humans, so machine processing takes its place. Detecting emotion in text data is an interesting and rapidly expanding area of research with wide application. The purpose of this thesis is to create a system for detecting positive and negative emotion in text, and to evaluate its performance. The system was written in the Java programming language and supports training on large amounts of data (known as Big Data) using the Spark library. The thesis describes the structure of the database used as the source of input data and how its text is handled. The classifier model was built with Support Vector Machines and optimized using the n-gram method.
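The n-gram feature extraction such a polarity classifier rests on can be sketched as follows. This is a minimal illustration, not the thesis's Java/Spark system: the feature weights stand in for a trained SVM decision function and are invented for the example.

```python
def word_ngrams(tokens, n_max=2):
    """Unigram and bigram features of a tokenized sentence - the kind
    of feature set fed to an SVM for polarity classification."""
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

# Illustrative (not learned) weights standing in for an SVM's
# decision function over n-gram features.
WEIGHTS = {"good": 1.0, "not good": -1.5, "bad": -1.0}

def polarity(tokens):
    """Sum feature weights and threshold at zero, like a linear SVM."""
    score = sum(WEIGHTS.get(f, 0.0) for f in word_ngrams(tokens))
    return "positive" if score >= 0 else "negative"
```

The bigram "not good" is what lets the model handle simple negation that unigrams alone would misclassify, which is one reason n-gram optimization helps here.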
19

A Latent Dirichlet Allocation/N-gram Composite Language Model

Kulhanek, Raymond Daniel 08 November 2013 (has links)
No description available.
20

Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT / Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT

Mašek, Jan January 2015 (has links)
We studied the treebanks included in HamleDT and partially unified their label sets. Afterwards, we used a method based on variation n-grams to automatically detect errors in morphological and dependency annotation. Then we used the output of a part-of-speech tagger / dependency parser trained on each treebank to correct the detected errors. The performance of both the detection and the correction of errors on both annotation levels was manually evaluated on randomly selected samples of suspected errors from several treebanks.
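The variation n-gram idea behind the detection step can be sketched as follows for the part-of-speech level. This is a simplified illustration under stated assumptions, not the thesis's implementation: it flags every word that receives different tags inside an identical n-word context, without the heuristics a real system would add to separate true ambiguity from annotation error.

```python
from collections import defaultdict

def variation_nuclei(tagged_sents, n=3):
    """Find 'variation nuclei': words tagged differently inside an
    identical n-word context, a signal of possible annotation errors.
    tagged_sents is a list of sentences, each a list of (word, tag)."""
    contexts = defaultdict(set)  # (n-gram of words, nucleus offset) -> tags seen
    for sent in tagged_sents:
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        for i in range(len(sent) - n + 1):
            for j in range(n):  # treat each word of the n-gram as nucleus
                key = (tuple(words[i:i + n]), j)
                contexts[key].add(tags[i + j])
    # Only contexts whose nucleus carries more than one tag are suspect.
    return {k: v for k, v in contexts.items() if len(v) > 1}
```

Longer contexts (larger n) make a tag difference more likely to be a genuine error rather than legitimate ambiguity, at the cost of fewer matches.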
