Global ETD Search

21	Long-term vehicle movement prediction using Machine Learning methods / Långsiktig fordonsrörelseförutsägelse med maskininlärningsmetoder Yus, Diego January 2018 The problem of location or movement prediction can be described as the task of predicting the future location of an item using the past locations of that item. It is a problem of increasing interest with the arrival of location-based services and autonomous vehicles. Although short-term prediction is more commonly studied, especially in the case of vehicles, long-term prediction can be useful in many applications such as scheduling, resource management or traffic prediction. In this master's thesis project, I present a feature representation of movement that can be used for learning long-term movement patterns and for long-term movement prediction in both space and time. The representation relies on periodicity in the data and is based on weighted n-grams of windowed trajectories. The algorithm is evaluated on movement data from heavy transport vehicles to assess its ability to retrieve, from a search index, vehicles that are highly likely to move along a route matching a desired transport mission. Experimental results show that the algorithm achieves a consistently low prediction distance error rate across different transport lengths in a limited geographical area under business operation conditions. The results also indicate that the total population of vehicles in the index is a critical factor in the algorithm's performance and therefore in its real-world applicability. / Lokaliserings- eller rörelseprognosering kan beskrivas som uppgiften att förutsäga ett objekts framtida placering med hjälp av de tidigare platserna för objektet. Intresset för problemet ökar i och med införandet av platsbaserade tjänster och autonoma fordon. 
Även om det är vanligare att studera kortsiktiga förutsägelser, särskilt när det gäller fordon, kan långsiktiga förutsägelser vara användbara i många applikationer som schemaläggning, resurshantering eller trafikprognoser. I detta masterprojekt presenterar jag en feature-representation av rörelse som kan användas för att lära in långsiktiga rörelsemönster och för långsiktig rörelseprediktion både i rymden och tiden. Representationen bygger på periodicitet i data och är baserad på att dela upp banan i fönster och sedan beräkna viktade n-grams av banorna från de olika fönstren. Algoritmen utvärderas på transportdata för tunga transportfordon för att bedöma dess förmåga att från ett sökindex hämta fordon som med stor sannolikhet kommer att röra sig längs en rutt som matchar ett önskat transportuppdrag. Experimentella resultat visar att algoritmen kan uppnå ett konsekvent lågt fel i relativt predikterat avstånd över olika transportlängder i ett begränsat geografiskt område under verkliga förhållanden. Resultaten indikerar även att den totala populationen av fordon i indexet är en kritisk faktor för algoritmens prestanda och därmed även för dess applicerbarhet för verklig användning. long-term movement prediction machine learning transport vehicle feature representation n-gram geohash periodicity Computer Sciences Datavetenskap (datalogi)
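The abstract above describes a representation built from weighted n-grams of windowed trajectories. The following is only a minimal sketch of that general idea, not the thesis's actual algorithm: the cell identifiers (geohash-like strings), the fixed window length, and the geometric `decay` weighting that favours recent windows are all assumptions made for illustration.

```python
from collections import Counter

def windowed_ngrams(cells, window, n):
    """Split a sequence of cell IDs into consecutive windows and count the n-grams in each."""
    per_window = []
    for start in range(0, len(cells), window):
        chunk = cells[start:start + window]
        grams = Counter(tuple(chunk[i:i + n]) for i in range(len(chunk) - n + 1))
        per_window.append(grams)
    return per_window

def weighted_profile(window_counts, decay=0.5):
    """Merge per-window counts; the newest window has weight 1, older ones decay geometrically.

    The decay scheme is a hypothetical choice, not taken from the thesis.
    """
    profile = Counter()
    latest = len(window_counts) - 1
    for idx, counts in enumerate(window_counts):
        weight = decay ** (latest - idx)
        for gram, count in counts.items():
            profile[gram] += weight * count
    return profile

# Hypothetical trajectory: the same three cells visited in two consecutive periods.
cells = ["u6sc", "u6sd", "u6se", "u6sc", "u6sd", "u6se"]
windows = windowed_ngrams(cells, window=3, n=2)
profile = weighted_profile(windows)
```

A profile like this could serve as the indexed feature vector for a vehicle; a periodic route shows up as the same n-grams recurring across windows.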
22	Alternative Approaches to Correction of Malapropisms in AIML Based Conversational Agents Brock, Walter A. 26 November 2014 The use of Conversational Agents (CAs) utilizing Artificial Intelligence Markup Language (AIML) has been studied in a number of disciplines. Previous research has shown a great deal of promise. It has also documented significant limitations in the abilities of these CAs. Many of these limitations are related specifically to the method employed by AIML to resolve ambiguities in the meaning and context of words. While methods exist to detect and correct common errors in spelling and grammar of sentences and queries submitted by a user, one class of input error that is particularly difficult to detect and correct is the malapropism. In this research a malapropism is defined as a "verbal blunder in which one word is replaced by another similar in sound but different in meaning" ("malapropism," 2013). This research explored the use of alternative methods of correcting malapropisms in sentences input to AIML CAs using measures of Semantic Distance and tri-gram probabilities. Results of these alternate methods were compared against AIML CAs using only the Symbolic Reductions built into AIML. This research found that the use of the two methodologies studied here did indeed lead to a small but measurable improvement in the performance of the CA in terms of the appropriateness of its responses as classified by human judges. However, it was also noted that in a large number of cases, the CA simply ignored the existence of a malapropism altogether in formulating its responses. In most of these cases, the interpretation and response to the user's input was of such a general nature that one might question the overall efficacy of the AIML engine. The answer to this question is a matter for further study. 
Information science Conversational Agent disambiguation malapropism Natural Language Processing n-gram probability semantic distance Artificial Intelligence and Robotics Computer Sciences Programming Languages and Compilers
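To make the tri-gram probability idea in the entry above concrete, here is a hedged sketch of how a suspect word might be rescored against similar-sounding alternatives. The toy corpus, the candidate list, and the add-one smoothing are assumptions for illustration; the dissertation's own procedure and data are not described in the abstract.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word contexts over whitespace-tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for sentence in sentences:
        toks = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(len(toks) - 2):
            tri[tuple(toks[i:i + 3])] += 1
            ctx[tuple(toks[i:i + 2])] += 1
    return tri, ctx

def trigram_prob(tri, ctx, w1, w2, w3, vocab_size):
    """P(w3 | w1, w2) with add-one smoothing (an assumed, deliberately simple choice)."""
    return (tri[(w1, w2, w3)] + 1) / (ctx[(w1, w2)] + vocab_size)

def best_candidate(tri, ctx, left_context, candidates, vocab_size):
    """Pick the candidate word that is most probable after the two preceding words."""
    w1, w2 = left_context
    return max(candidates, key=lambda c: trigram_prob(tri, ctx, w1, w2, c, vocab_size))

# Hypothetical toy corpus and a classic sound-alike pair.
corpus = ["the cat danced a flamenco", "she danced a flamenco", "he saw a flamingo"]
vocab = {w for s in corpus for w in s.lower().split()}
tri, ctx = train_trigrams(corpus)
best = best_candidate(tri, ctx, ("danced", "a"), ["flamingo", "flamenco"], len(vocab))
```

In this sketch the input "she danced a flamingo" would have its last word replaced by whichever candidate scores highest in the context "danced a".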
23	Contribuições para a construção de taxonomias de tópicos em domínios restritos utilizando aprendizado estatístico / Contributions to topic taxonomy construction in a specific domain using statistical learning Moura, Maria Fernanda 26 October 2009 A mineração de textos vem de encontro à realidade atual de se compreender e utilizar grandes massas de dados textuais. Uma forma de auxiliar a compreensão dessas coleções de textos é construir taxonomias de tópicos a partir delas. As taxonomias de tópicos devem organizar esses documentos, preferencialmente em hierarquias, identificando os grupos obtidos por meio de descritores. Construir manual, automática ou semi-automaticamente taxonomias de tópicos de qualidade é uma tarefa nada trivial. Assim, o objetivo deste trabalho é construir taxonomias de tópicos em domínios de conhecimento restrito, por meio de mineração de textos, a fim de auxiliar o especialista no domínio a compreender e organizar os textos. O domínio de conhecimento é restrito para que se possa trabalhar apenas com métodos de aprendizado estatístico não supervisionado sobre representações bag of words dos textos. Essas representações independem do contexto das palavras nos textos e, conseqüentemente, nos domínios. Assim, ao se restringir o domínio espera-se diminuir erros de interpretação dos resultados. A metodologia proposta para a construção de taxonomias de tópicos é uma instanciação do processo de mineração de textos. A cada etapa do processo propõem-se soluções adaptadas às necessidades específicas de construção de taxonomias de tópicos, dentre as quais algumas contribuições inovadoras ao estado da arte. Particularmente, este trabalho contribui em três frentes no estado da arte: seleção de atributos n-gramas em tarefas de mineração de textos, dois modelos para rotulação de agrupamento hierárquico de documentos e modelo de validação do processo de rotulação de agrupamento hierárquico de documentos. 
Além dessas contribuições, ocorrem outras em adaptações e metodologias de escolha de processos de seleção de atributos, forma de geração de atributos, visualização das taxonomias e redução das taxonomias obtidas. Finalmente, a metodologia desenvolvida foi aplicada a problemas reais, tendo obtido bons resultados. / Text mining provides powerful techniques to help meet the current need to understand and organize huge amounts of textual documents. One way to do this is to build topic taxonomies from these documents. Topic taxonomies can be used to organize the documents, preferably in hierarchies, and to identify groups of related documents and their descriptors. Constructing high-quality topic taxonomies, whether manually, automatically or semi-automatically, is not a trivial task. This work aims to use text mining techniques to build topic taxonomies for well-defined knowledge domains, helping the domain expert to understand and organize document collections. Because the knowledge domains are well defined, only unsupervised statistical methods are used, with a bag-of-words representation of the textual documents. These representations are independent of the context of the words in the documents as well as in the domain. Thus, if the domain is well defined, fewer errors in interpreting the results are expected. The proposed methodology for topic taxonomy construction is an instantiation of the text mining process. At each step of the process, some solutions are proposed and adapted to the specific needs of topic taxonomy construction. Among these solutions there are some innovative contributions to the state of the art. Particularly, this work contributes to the state of the art in three different ways: the selection of n-gram attributes in text mining tasks, two models for hierarchical document cluster labeling, and a validation model for the hierarchical document cluster labeling process. 
Additional contributions include adaptations of, and methodologies for choosing, attribute selection processes, as well as attribute representation, taxonomy visualization and reduction of the resulting taxonomies. Finally, the proposed methodology was also validated by successfully applying it to real problems. Hierarchical document cluster labeling Mineração de textos n-gram attribute selection Seleção de atributos n-gramas Taxonomia de tópicos Text mining Topic taxonomy
24	Large-scale semi-supervised learning for natural language processing Bergsma, Shane A 11 1900 Natural Language Processing (NLP) develops computational approaches to processing language data. Supervised machine learning has become the dominant methodology of modern NLP. The performance of a supervised NLP system crucially depends on the amount of data available for training. In the standard supervised framework, if a sequence of words was not encountered in the training set, the system can only guess at its label at test time. The cost of producing labeled training examples is a bottleneck for current NLP technology. On the other hand, a vast quantity of unlabeled data is freely available. This dissertation proposes effective, efficient, versatile methodologies for 1) extracting useful information from very large (potentially web-scale) volumes of unlabeled data and 2) combining such information with standard supervised machine learning for NLP. We demonstrate novel ways to exploit unlabeled data, we scale these approaches to make use of all the text on the web, and we show improvements on a variety of challenging NLP tasks. This combination of learning from both labeled and unlabeled data is often referred to as semi-supervised learning. Although lacking manually-provided labels, the statistics of unlabeled patterns can often distinguish the correct label for an ambiguous test instance. In the first part of this dissertation, we propose to use the counts of unlabeled patterns as features in supervised classifiers, with these classifiers trained on varying amounts of labeled data. We propose a general approach for integrating information from multiple, overlapping sequences of context for lexical disambiguation problems. We also show how standard machine learning algorithms can be modified to incorporate a particular kind of prior knowledge: knowledge of effective weightings for count-based features. 
We also evaluate performance within and across domains for two generation and two analysis tasks, assessing the impact of combining web-scale counts with conventional features. In the second part of this dissertation, rather than using the aggregate statistics as features, we propose to use them to generate labeled training examples. By automatically labeling a large number of examples, we can train powerful discriminative models, leveraging fine-grained features of input words. natural language processing semi-supervised learning NLP web-scale N-gram selectional preference string similarity non-referential pronoun pleonastic pronoun non-anaphoric pronoun computational linguistics
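The abstract above mentions using counts of unlabeled patterns, drawn from multiple overlapping context sequences, as classifier features. The sketch below illustrates one plausible reading under stated assumptions: the tiny `counts` table stands in for web-scale n-gram counts, the function names are hypothetical, and an unweighted sum of log-counts replaces the trained supervised classifier that the dissertation actually describes.

```python
import math

def context_ngrams(tokens, pos, candidate, max_n=3):
    """List every n-gram (2 <= n <= max_n) that covers position `pos` once `candidate` is filled in."""
    filled = tokens[:pos] + [candidate] + tokens[pos + 1:]
    grams = []
    for n in range(2, max_n + 1):
        for start in range(pos - n + 1, pos + 1):
            if start >= 0 and start + n <= len(filled):
                grams.append(tuple(filled[start:start + n]))
    return grams

def count_features(tokens, pos, candidate, counts, max_n=3):
    """One log-count feature per overlapping context n-gram (unseen n-grams map to 0.0)."""
    return [math.log(counts.get(g, 0) + 1)
            for g in context_ngrams(tokens, pos, candidate, max_n)]

# Hypothetical counts standing in for a web-scale n-gram collection.
counts = {
    ("depends", "on"): 900, ("on", "the"): 5000, ("depends", "on", "the"): 300,
    ("depends", "in"): 2, ("in", "the"): 6000,
}
tokens = ["it", "depends", "?", "the", "weather"]
scores = {c: sum(count_features(tokens, 2, c, counts)) for c in ["on", "in"]}
best = max(scores, key=scores.get)
```

In a supervised setting, the feature vectors from `count_features` would be passed to a classifier that learns a weight per context pattern instead of being summed directly.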
25	The predictability problem Ong, James Kwan Yau January 2007 Wir versuchen herauszufinden, ob das subjektive Maß der Cloze-Vorhersagbarkeit mit der Kombination objektiver Maße (semantische und n-gram-Maße) geschätzt werden kann, die auf den statistischen Eigenschaften von Textkorpora beruhen. Die semantischen Maße werden entweder durch Abfragen von Internet-Suchmaschinen oder durch die Anwendung der Latent Semantic Analysis gebildet, während die n-gram-Wortmaße allein auf den Ergebnissen von Internet-Suchmaschinen basieren. Weiterhin untersuchen wir die Rolle der Cloze-Vorhersagbarkeit in SWIFT, einem Modell der Blickkontrolle, und wägen ab, ob andere Parameter den der Vorhersagbarkeit ersetzen können. Unsere Ergebnisse legen nahe, dass ein computationales Modell, welches Vorhersagbarkeitswerte berechnet, nicht nur Maße beachten muss, die die Relatiertheit eines Wortes zum Kontext darstellen; das Vorhandensein eines Maßes bezüglich der Nicht-Relatiertheit ist von ebenso großer Bedeutung. Obwohl hier jedoch nur Relatiertheits-Maße zur Verfügung stehen, sollte SWIFT ebensogute Ergebnisse liefern, wenn wir Cloze-Vorhersagbarkeit mit unseren Maßen ersetzen. / We try to determine whether it is possible to approximate the subjective Cloze predictability measure with two types of objective measures, semantic and word n-gram measures, based on the statistical properties of text corpora. The semantic measures are constructed either by querying Internet search engines or by applying Latent Semantic Analysis, while the word n-gram measures solely depend on the results of Internet search engines. We also analyse the role of Cloze predictability in the SWIFT eye movement model, and evaluate whether other parameters might be able to take the place of predictability. 
Our results suggest that a computational model that generates predictability values needs more than measures of how related a word is to its context; measures that assert unrelatedness are just as important. Although we only have similarity measures, we nevertheless predict that SWIFT should perform just as well when we replace Cloze predictability with our measures. Cloze-Vorhersagbarkeit Blickbewegungen Latente-Semantische-Analyse Wort-n-Gramme-Wahrscheinlichkeit Ähnlichkeit-Masse Cloze predictability eye movements Latent Semantic Analysis word n-gram probability similarity measures Mathematics
26	Turkish Large Vocabulary Continuous Speech Recognition By Using Limited Audio Corpus Susman, Derya 01 March 2012 Speech recognition for the Turkish language is a challenging problem from several perspectives. Most of the challenges are related to the morphological structure of the language. Since Turkish is an agglutinative language, it is possible to generate many words from a single stem by using suffixes. This characteristic of the language increases the number of out-of-vocabulary (OOV) words, which degrade the performance of a speech recognizer dramatically. Turkish also allows words to be ordered relatively freely, which makes it difficult to generate robust language models. In this thesis, the existing models and approaches which address the problem of Turkish LVCSR (Large Vocabulary Continuous Speech Recognition) are explored. Different recognition units (words, morphs, stems and endings) are used in generating the n-gram language models. 3-gram and 4-gram language models are generated with respect to the recognition unit. Since speech recognition relies on machine learning, the performance of the recognizer depends on the sufficiency of the audio data used in acoustic model training. However, it is difficult to obtain rich audio corpora for the Turkish language. In this thesis, existing approaches are used to solve the problem of Turkish LVCSR by using a limited audio corpus. We also propose several data selection approaches in order to improve the robustness of the acoustic model. QA Computer Software 76.75-76.765
27	N-gram modeling of tabla sequences using Variable-Length Hidden Markov Models for improvisation and composition Sastry, Avinash 20 September 2011 This work presents a novel approach for the design of a predictive model of music that can be used to analyze and generate musical material that is highly context dependent. The system is based on an approach known as n-gram modeling, often used in language processing and speech recognition algorithms, implemented initially upon a framework of Variable-Length Markov Models (VLMMs) and then extended to Variable-Length Hidden Markov Models (VLHMMs). The system brings together principles such as escape probabilities and smoothing schemes, and uses multiple representations of the data stream to construct a multiple-viewpoints system that can draw complex relationships between the different input n-grams and use this information to provide a stronger prediction scheme. It is implemented as a MAX/MSP external in C++ and is intended to be a predictive framework that can be used to create generative music systems and educational and compositional tools for music. A formal quantitative evaluation scheme based on the entropy of the predictions is used to evaluate the model in sequence prediction tasks on a database of tabla compositions. The results show good model performance for both the VLMM and the VLHMM while highlighting the high computational cost of higher-order VLHMMs. Musical sequence modeling Markov models Hidden markov models N-gram modeling Tabla North Indian classical music Machine learning Music model Markov processes Improvisation (Music) Music
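As a rough illustration of variable-length n-gram prediction as described in the entry above, the sketch below backs off from the longest matching context to shorter ones and reports the entropy of the resulting distribution. It is a simplification under assumptions: plain longest-match back-off replaces the escape-probability and smoothing schemes named in the abstract, there is no hidden-state (VLHMM) component, and the stroke sequence is invented.

```python
import math
from collections import Counter, defaultdict

def train(seq, max_order):
    """Count which symbol follows each context of length 0..max_order."""
    model = defaultdict(Counter)
    for i in range(len(seq)):
        for k in range(max_order + 1):
            if i - k >= 0:
                model[tuple(seq[i - k:i])][seq[i]] += 1
    return model

def predict(model, history, max_order):
    """Return the next-symbol distribution from the longest context seen in training."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = tuple(history[len(history) - k:])
        if ctx in model:
            total = sum(model[ctx].values())
            return {sym: c / total for sym, c in model[ctx].items()}
    return {}

def entropy(dist):
    """Shannon entropy in bits of a predictive distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical stroke sequence, not taken from the thesis's database.
seq = ["dha", "ge", "na", "dha", "ge", "ti", "dha", "ge", "na"]
model = train(seq, max_order=2)
dist = predict(model, ["dha", "ge"], max_order=2)
fallback = predict(model, ["ti", "ti"], max_order=2)
```

Here `fallback` shows the back-off step: the two-symbol context was never seen, so the prediction comes from the one-symbol context instead. Lower entropy of `dist` indicates a more confident prediction.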
28	Large-scale semi-supervised learning for natural language processing Bergsma, Shane A Unknown Date No description available. natural language processing semi-supervised learning NLP web-scale N-gram selectional preference string similarity non-referential pronoun pleonastic pronoun non-anaphoric pronoun computational linguistics
29	Automatizovaná analýza sentimentu / Automated Sentiment Analysis Zeman, Matěj January 2014 The goal of my master's thesis is to describe automated sentiment analysis, its methods and its cross-domain problems, and to test an existing model. I applied this model to data from the Czech-Slovak film database website CSFD.cz, the Czech e-shop MALL.cz and Databazeknih.cz, one of the biggest Czech websites about books, to contribute to solving the cross-domain issue by using n-grams and the analytic software RapidMiner.
30	Contribuições para a construção de taxonomias de tópicos em domínios restritos utilizando aprendizado estatístico / Contributions to topic taxonomy construction in a specific domain using statistical learning Maria Fernanda Moura 26 October 2009 A mineração de textos vem de encontro à realidade atual de se compreender e utilizar grandes massas de dados textuais. Uma forma de auxiliar a compreensão dessas coleções de textos é construir taxonomias de tópicos a partir delas. As taxonomias de tópicos devem organizar esses documentos, preferencialmente em hierarquias, identificando os grupos obtidos por meio de descritores. Construir manual, automática ou semi-automaticamente taxonomias de tópicos de qualidade é uma tarefa nada trivial. Assim, o objetivo deste trabalho é construir taxonomias de tópicos em domínios de conhecimento restrito, por meio de mineração de textos, a fim de auxiliar o especialista no domínio a compreender e organizar os textos. O domínio de conhecimento é restrito para que se possa trabalhar apenas com métodos de aprendizado estatístico não supervisionado sobre representações bag of words dos textos. Essas representações independem do contexto das palavras nos textos e, conseqüentemente, nos domínios. Assim, ao se restringir o domínio espera-se diminuir erros de interpretação dos resultados. A metodologia proposta para a construção de taxonomias de tópicos é uma instanciação do processo de mineração de textos. A cada etapa do processo propõem-se soluções adaptadas às necessidades específicas de construção de taxonomias de tópicos, dentre as quais algumas contribuições inovadoras ao estado da arte. Particularmente, este trabalho contribui em três frentes no estado da arte: seleção de atributos n-gramas em tarefas de mineração de textos, dois modelos para rotulação de agrupamento hierárquico de documentos e modelo de validação do processo de rotulação de agrupamento hierárquico de documentos. 
Além dessas contribuições, ocorrem outras em adaptações e metodologias de escolha de processos de seleção de atributos, forma de geração de atributos, visualização das taxonomias e redução das taxonomias obtidas. Finalmente, a metodologia desenvolvida foi aplicada a problemas reais, tendo obtido bons resultados. / Text mining provides powerful techniques to help meet the current need to understand and organize huge amounts of textual documents. One way to do this is to build topic taxonomies from these documents. Topic taxonomies can be used to organize the documents, preferably in hierarchies, and to identify groups of related documents and their descriptors. Constructing high-quality topic taxonomies, whether manually, automatically or semi-automatically, is not a trivial task. This work aims to use text mining techniques to build topic taxonomies for well-defined knowledge domains, helping the domain expert to understand and organize document collections. Because the knowledge domains are well defined, only unsupervised statistical methods are used, with a bag-of-words representation of the textual documents. These representations are independent of the context of the words in the documents as well as in the domain. Thus, if the domain is well defined, fewer errors in interpreting the results are expected. The proposed methodology for topic taxonomy construction is an instantiation of the text mining process. At each step of the process, some solutions are proposed and adapted to the specific needs of topic taxonomy construction. Among these solutions there are some innovative contributions to the state of the art. Particularly, this work contributes to the state of the art in three different ways: the selection of n-gram attributes in text mining tasks, two models for hierarchical document cluster labeling, and a validation model for the hierarchical document cluster labeling process. 
Additional contributions include adaptations of, and methodologies for choosing, attribute selection processes, as well as attribute representation, taxonomy visualization and reduction of the resulting taxonomies. Finally, the proposed methodology was also validated by successfully applying it to real problems. Mineração de textos Seleção de atributos n-gramas Taxonomia de tópicos Hierarchical document cluster labeling n-gram attribute selection Text mining Topic taxonomy
