1

Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act N-gram

Takeda, Kazuya, Kitaoka, Norihide, Hara, Sunao 26 September 2010 (has links)
No description available.
2

Error detection and correction in annotated corpora

Dickinson, Markus 24 August 2005 (has links)
No description available.
3

應用情感分析於媒體新聞傾向之研究-以中央社為例 / Applying sentiment analysis to the tendency of media news: a case study of central news agency

吳信維, Wu, Xin-Wei Unknown Date (has links)
The purpose of this research is to combine association rules with a new-word discovery algorithm to expand the segmentation lexicon and thereby improve word-segmentation accuracy, and to label topic models and sentiment orientation for KMT- and DPP-related news collected from the Central News Agency using an unsupervised sentiment analysis method. Supervised learning methods are then used to build classification models and verify the results. The research uses an n-gram with a-priori algorithm to expand the lexicon: a total of 32,007 candidate words were discovered, of which 28,838 carried real meaning, a success rate of 88%. Two clustering methods were compared for building the topic model: TFIDF-Kmeans and LDA. With TFIDF-Kmeans, the number of documents far exceeds the number of topic words, so the TF-IDF matrix is too sparse and clustering performs poorly. Under LDA, because the model allows many documents to share many topics, topic classification accuracy exceeds 80%. The research therefore suggests that LDA performs better for clustering topic words when analyzing multi-topic texts.
By combining different time intervals of data, this research presents the sentiment tendencies of Central News Agency news in the three months before and after Taiwan's last five presidential elections, and examines how sentiment within each topic category changes over the same periods. The sentiment index of the texts generally peaks on polling day, and the last three presidential elections show that the sentiment of the relevant parties' news flattens after the election. The positive and negative sentiment statistics of the news texts and the overall tendency analysis show that, regardless of which party is in power, Central News Agency coverage of both the KMT and the DPP is positive and stable, without a marked bias toward either party.
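The TFIDF-Kmeans vs. LDA comparison in the abstract above can be sketched as follows. This is a minimal illustration with scikit-learn on an invented toy corpus; the corpus, vocabulary, and topic count are assumptions for demonstration, not the thesis data.

```python
# Fit LDA on raw term counts and inspect the per-document topic mixture.
# Unlike a hard K-means assignment, each LDA document gets a distribution
# over topics, which is why it suits multi-topic news texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "election campaign party vote president",
    "election vote polling ballot president",
    "economy trade growth market policy",
    "market policy trade economy budget",
]

counts = CountVectorizer().fit_transform(docs)  # LDA wants counts, not TF-IDF
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

print(doc_topics.shape)  # one topic distribution per document: (4, 2)
```

Each row of `doc_topics` sums to 1, so a document can be split across topics rather than forced into a single cluster.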
4

Subword Spotting and Its Applications

Davis, Brian Lafayette 01 May 2018 (has links)
We propose subword spotting, a generalization of word spotting in which the search targets groups of characters within words. We present a method for performing subword spotting based on state-of-the-art word spotting techniques and evaluate its performance at three granularities (unigrams, bigrams, and trigrams) on two datasets. We demonstrate three applications of subword spotting, though others may exist. The first is helping human transcribers identify unrecognized characters by locating them in other words. The second is searching for suffixes directly in word images (suffix spotting). The third is computer-assisted transcription (semi-automated transcription). We investigate several variations of computer-assisted transcription using subword spotting, but none achieve transcription speeds above manual transcription. We investigate the causes.
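The subword spotting described above operates on word images; the underlying idea can be illustrated on plain transcribed text: locate a character n-gram ("subword") inside other words. The word list below is an invented example, not data from the thesis.

```python
# Toy illustration of subword spotting on transcriptions: find every word
# that contains a given character n-gram and where it occurs.
def spot_subword(subword, words):
    """Return (word, offset) pairs for words containing the subword."""
    hits = []
    for w in words:
        idx = w.find(subword)
        if idx != -1:
            hits.append((w, idx))
    return hits

words = ["nation", "information", "transcription", "subword"]
print(spot_subword("tion", words))
# → [('nation', 2), ('information', 7), ('transcription', 9)]
```

In the thesis setting the matching is done against word images rather than strings, which is what makes the problem hard; this sketch only shows the query structure.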
6

An Unsupervised Approach to Detecting and Correcting Errors in Text

Islam, Md Aminul 01 June 2011 (has links)
In practice, most approaches to text error detection and correction rely on a conventional, domain-dependent background dictionary that represents a fixed, static collection of correct words in a given language; as a result, satisfactory correction can be achieved only if the dictionary covers most tokens of the underlying correct text. Moreover, most text-correction approaches handle only one, or at best a very few, types of errors. The purpose of this thesis is to propose an unsupervised approach to detecting and correcting text errors that can compete with supervised approaches, and to answer the following questions: Can an unsupervised approach efficiently detect and correct a text containing multiple errors of both a syntactic and a semantic nature? What is the magnitude of error coverage, in terms of the number of errors that can be corrected? We conclude that (1) an unsupervised approach can efficiently detect and correct a text containing multiple errors of both a syntactic and a semantic nature; error types include real-word spelling errors, typographical errors, lexical-choice errors, unwanted words, missing words, prepositional errors, article errors, punctuation errors, and many grammatical errors (e.g., errors in agreement and verb formation); and (2) the magnitude of error coverage, in terms of the number of errors that can be corrected, is almost double the number of correct words in the text. Although this is not the upper limit, it is what is practically feasible. We use engineering approaches to answer the first question and theoretical approaches to answer and support the second. We show that finding the inherent properties of a correct text using a corpus in the form of an n-gram data set is more appropriate and practical than other approaches to detecting and correcting errors.
Instead of using rule-based approaches and dictionaries, we argue that a corpus can effectively be used to infer the properties of these types of errors, and to detect and correct them. We test the robustness of the proposed approach separately for some individual error types, and then for all types of errors. The approach is language-independent and can be applied to other languages, as long as n-grams are available. The results of this thesis thus suggest that unsupervised approaches, which are often dismissed in favor of supervised ones for many Natural Language Processing (NLP) tasks, may present an interesting array of NLP-related problem-solving strengths.
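The corpus n-gram idea described above can be sketched in miniature: score each word by how often its surrounding trigram appears in a reference n-gram table, and flag words whose context is rare or unattested. The tiny trigram table and threshold below are invented for illustration; the thesis uses a large n-gram data set, not a hand-built table.

```python
# Minimal sketch of n-gram-based real-word error detection. A trigram
# that almost never occurs in the reference corpus suggests that its
# middle word is a likely error (e.g. "peace" for "piece").
TRIGRAM_COUNTS = {
    ("a", "piece", "of"): 900,
    ("piece", "of", "cake"): 850,
    ("a", "peace", "of"): 2,
}

def flag_real_word_errors(tokens, threshold=10):
    """Flag the middle word of every trigram rarer than `threshold`."""
    flagged = []
    for i in range(len(tokens) - 2):
        tri = tuple(tokens[i:i + 3])
        if TRIGRAM_COUNTS.get(tri, 0) < threshold:
            flagged.append((i + 1, tokens[i + 1]))
    return flagged

print(flag_real_word_errors(["a", "piece", "of", "cake"]))  # → []
# Both trigrams touching the error are unattested, so the words around
# "peace" are flagged as well — a real system would localize further.
print(flag_real_word_errors(["a", "peace", "of", "cake"]))
```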
8

Detection of task-incomplete dialogs based on utterance-and-behavior tag N-gram for spoken dialog systems

Takeda, Kazuya, Kitaoka, Norihide, Hara, Sunao 27 August 2011 (has links)
No description available.
10

Implementace suggesteru pro vyhledávač OpenGrok / Suggester implementation for the OpenGrok search engine

Hornáček, Adam January 2018 (has links)
Suggester functionality is an important feature of modern search engines. The aim of this thesis is to implement it for the OpenGrok project. The OpenGrok search engine is based on Apache Lucene and supports its query syntax. The presented suggester implementation supports this query syntax and provides suggestions not only for prefixes but also for wildcards, regular expressions, and phrases. The implementation also takes query grouping into account: if one query is already specified and the user is typing another, the first query restricts the suggestions for the second. The promotion of specific suggestions is based on the underlying Lucene index data structure and on users' previous searches.
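The prefix-suggestion-with-promotion behavior described in the abstract above can be sketched as follows. This is a toy model, not the Lucene-based implementation: the term/weight table is invented, and a real suggester reads terms and popularity from the index.

```python
# Toy prefix suggester: binary-search a sorted term list for the prefix
# range, then promote hits by weight (e.g. past search frequency).
import bisect

TERMS = sorted([("open", 5), ("opengrok", 12), ("opera", 3), ("query", 7)])

def suggest(prefix, limit=3):
    """Return up to `limit` terms starting with `prefix`, heaviest first."""
    keys = [t for t, _ in TERMS]
    lo = bisect.bisect_left(keys, prefix)  # first term >= prefix
    hits = []
    for term, weight in TERMS[lo:]:
        if not term.startswith(prefix):
            break  # sorted order: past the prefix range
        hits.append((term, weight))
    hits.sort(key=lambda x: -x[1])  # promotion by weight
    return [t for t, _ in hits[:limit]]

print(suggest("ope"))  # → ['opengrok', 'open', 'opera']
```

Lucene's suggest module uses FST-based structures rather than a sorted list, but the shape of the problem (prefix range lookup, then weighted ranking) is the same.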
