41

Implementation of an Acoustic Echo Canceller Using Matlab

Raghavendran, Srinivasaprasath 15 October 2003 (has links)
The rapid growth of technology in recent decades has changed the whole dimension of communications. Today people are more interested in hands-free communication. In such a situation, the use of a regular loudspeaker and a high-gain microphone, in place of a telephone receiver, might seem more appropriate. This would allow more than one person to participate in a conversation at the same time, such as in a teleconference environment. Another advantage is that it would allow the person to have both hands free and to move freely in the room. However, the presence of a large acoustic coupling between the loudspeaker and microphone would produce a loud echo that would make conversation difficult. Furthermore, the acoustic system could become unstable, which would cause a loud howling noise. The solution to these problems is the elimination of the echo with an echo suppression or echo cancellation algorithm. The echo suppressor offers a simple but effective method to counter the echo problem; however, it has a major disadvantage in that it supports only half-duplex communication, which permits only one speaker to talk at a time. This drawback led to the invention of echo cancellers. An important aspect of echo cancellers is that full-duplex communication can be maintained, which allows both speakers to talk at the same time. The objective of this research was to produce an improved echo cancellation algorithm capable of providing convincing results. The three basic components of an echo canceller are an adaptive filter, a doubletalk detector and a nonlinear processor. The adaptive filter creates a replica of the echo and subtracts it from the combination of the actual echo and the near-end signal. The doubletalk detector senses doubletalk, which occurs when both ends are talking, and stops adaptation of the filter in order to avoid divergence. Finally, the nonlinear processor removes the residual echo from the error signal. Usually, a certain amount of speech is clipped in the final stage of nonlinear processing. In order to avoid clipping, a noise gate was used as the nonlinear processor in this research: a threshold value is set and all signals below the threshold are removed, ensuring that only residual echoes are removed in the final stage. To date, real-time implementations of echo cancellation algorithms have relied on VLSI and DSP processors. Given the rapid advances in personal computers in recent years, this research implemented the acoustic echo canceller natively on a PC using MATLAB.
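The abstract describes an adaptive filter that estimates the echo path and a noise gate that suppresses residual echo. The thesis implements this in MATLAB; the sketch below is a minimal Python/NumPy illustration of the same idea, using a normalized LMS (NLMS) adaptive filter followed by a threshold noise gate. The doubletalk detector is omitted for brevity, and all names and parameter values (filter length, step size, gate threshold) are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filt_len=128, mu=0.5, eps=1e-6,
                        gate_threshold=1e-3):
    """Sketch of an acoustic echo canceller: NLMS adaptive filter + noise gate.

    far_end: signal played through the loudspeaker (reference).
    mic:     microphone signal = echo of far_end + near-end speech.
    Returns the error signal after echo subtraction and noise gating.
    """
    w = np.zeros(filt_len)               # adaptive filter taps (echo-path estimate)
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        # most recent far-end samples, newest first, zero-padded at the start
        x = far_end[max(0, n - filt_len + 1):n + 1][::-1]
        x = np.pad(x, (0, filt_len - len(x)))
        echo_hat = w @ x                  # replica of the echo
        e = mic[n] - echo_hat             # error = near-end speech + residual echo
        w += mu * e * x / (x @ x + eps)   # NLMS update (assumes no doubletalk)
        # noise gate as the nonlinear processor: zero out low-level residual
        out[n] = e if abs(e) > gate_threshold else 0.0
    return out

# usage sketch with synthetic signals
rng = np.random.default_rng(0)
far = rng.standard_normal(8000)
room = rng.standard_normal(64) * 0.1      # hypothetical room impulse response
mic = np.convolve(far, room)[:8000]       # pure echo, no near-end talker
cleaned = nlms_echo_canceller(far, mic)
```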
42

GeneTUC: Natural Language Understanding in Medical Text

Sætre, Rune January 2006 (has links)
Natural Language Understanding (NLU) is a 50-year-old research field, but its application to molecular biology literature (BioNLU) is less than 10 years old. After the complete human genome sequence was published by the Human Genome Project and Celera in 2001, there has been an explosion of research, shifting the NLU focus from domains like news articles to molecular biology and medical literature. BioNLU is needed because almost 2000 new articles are published and indexed every day, and biologists need to know about existing knowledge regarding their own research. So far, BioNLU results are not as good as in other NLU domains, so more research is needed to solve the challenges of creating useful NLU applications for biologists.

The work in this PhD thesis is a “proof of concept”. It is the first to show that an existing Question Answering (QA) system can be successfully applied in the hard BioNLU domain, once the essential challenge of unknown entities is solved. The core contribution is a system that automatically discovers and classifies unknown entities and the relations between them. The World Wide Web (through Google) is used as the main resource, and the performance is almost as good as other named entity extraction systems, but the advantage of this approach is that it is much simpler and requires less manual labor than any of the comparable systems.

The first paper in this collection gives an overview of the field of NLU and shows how the Information Extraction (IE) problem can be formulated with Local Grammars. The second paper uses Machine Learning to automatically recognize protein names based on features from the GSearch Engine. In the third paper, GSearch is substituted with Google, and the task is to extract all unknown names belonging to one of 273 biomedical entity classes, such as genes, proteins and processes. After getting promising results with Google, the fourth paper shows that this approach can also be used to retrieve interactions or relationships between the named entities. The fifth paper describes an online implementation of the system, and shows that the method scales well to a larger set of entities.

The final paper concludes the “proof of concept” research, and shows that the performance of the original GeneTUC NLU system has increased from handling 10% of the sentences in a large collection of abstracts in 2001 to 50% in 2006. This is still not good enough to create a commercial system, but it is believed that another 40% performance gain can be achieved by importing more verb templates into GeneTUC, just as nouns were imported during this work. Work on this has already begun, in the form of a local Master's thesis.
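The core idea in this abstract is classifying unknown entity names using the Web (through Google) as a resource. The sketch below shows one plausible form of hit-count-based classification. The `web_hit_count` function is a hypothetical stand-in for a search-engine query, and the class patterns and scoring are illustrative assumptions, not GeneTUC's actual method.

```python
# Hypothetical hit-count-based classification of an unknown biomedical name.
# `web_hit_count` is a stand-in for a search-engine API and is assumed here,
# not a real library call.

CLASS_PATTERNS = {
    "protein": ['"{name} protein"', '"the {name} protein"'],
    "gene":    ['"{name} gene"', '"expression of {name}"'],
    "process": ['"{name} pathway"', '"process of {name}"'],
}

def web_hit_count(query: str) -> int:
    """Placeholder: return the number of web pages matching `query`."""
    raise NotImplementedError("plug in a real search API here")

def classify_unknown_entity(name: str) -> str:
    """Score each candidate class by the summed hit counts of its query patterns."""
    scores = {}
    for cls, patterns in CLASS_PATTERNS.items():
        scores[cls] = sum(web_hit_count(p.format(name=name)) for p in patterns)
    return max(scores, key=scores.get)
```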
43

Corpus-Based Techniques for Word Sense Disambiguation

Levow, Gina-Anne 27 May 1998 (has links)
The need for robust and easily extensible systems for word sense disambiguation coupled with successes in training systems for a variety of tasks using large on-line corpora has led to extensive research into corpus-based statistical approaches to this problem. Promising results have been achieved by vector space representations of context, clustering combined with a semantic knowledge base, and decision lists based on collocational relations. We evaluate these techniques with respect to three important criteria: how their definition of context affects their ability to incorporate different types of disambiguating information, how they define similarity among senses, and how easily they can generalize to new senses. The strengths and weaknesses of these systems provide guidance for future systems which must capture and model a variety of disambiguating information, both syntactic and semantic.
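The abstract surveys corpus-based WSD approaches, including vector-space representations of context. As a concrete illustration, the sketch below shows a minimal vector-space disambiguator that assigns the sense whose training-context centroid is closest (by cosine similarity) to the ambiguous word's context. It is a generic textbook-style example, not any of the specific systems the thesis evaluates, and the vocabulary and training examples are toy assumptions.

```python
import numpy as np
from collections import Counter

def bow_vector(context_words, vocab):
    """Bag-of-words vector over a fixed vocabulary."""
    counts = Counter(w.lower() for w in context_words)
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(u @ v) / denom

def disambiguate(context_words, sense_examples, vocab):
    """sense_examples: {sense: [training context word lists]}.
    Returns the sense whose centroid context vector is most similar."""
    target = bow_vector(context_words, vocab)
    best_sense, best_sim = None, -1.0
    for sense, examples in sense_examples.items():
        centroid = np.mean([bow_vector(ex, vocab) for ex in examples], axis=0)
        sim = cosine(target, centroid)
        if sim > best_sim:
            best_sense, best_sim = sense, sim
    return best_sense

# toy usage: disambiguating "bank"
vocab = ["river", "water", "loan", "money", "fish", "deposit"]
senses = {
    "bank/finance": [["loan", "money", "deposit"], ["money", "deposit"]],
    "bank/river":   [["river", "water", "fish"], ["river", "fish"]],
}
print(disambiguate(["water", "river"], senses, vocab))  # -> "bank/river"
```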
45

A Graph Approach to Measuring Text Distance

Tsang, Vivian 26 February 2009 (has links)
Text comparison is a key step in many natural language processing (NLP) applications in which texts can be classified on the basis of their semantic distance (how similar or different the texts are). For example, comparing the local context of an ambiguous word with that of a known word can help identify the sense of the ambiguous word. Typically, a distributional measure is used to capture the implicit semantic distance between two pieces of text. In this thesis, we introduce an alternative method of measuring the semantic distance between texts as a combination of distributional information and relational/ontological knowledge. In this work, we propose a novel distance measure within a network-flow formalism that combines these two distinct components in such a way that they are not treated as separate and orthogonal pieces of information. First, we represent each text as a collection of frequency-weighted concepts within a relational thesaurus. Then, we make use of a network-flow method which provides an efficient way of measuring the semantic distance between two texts by taking advantage of the inherently graphical structure in an ontology. We evaluate our method in a variety of NLP tasks. In our task-based evaluation, we find that our method performs well on two of three tasks. We introduce a novel measure which is intended to capture how well our network-flow method performs on a dataset (represented as a collection of frequency-weighted concepts). In our analysis, we find that an integrated approach, rather than a purely distributional or graphical analysis, is more effective in explaining the performance inconsistency. Finally, we address a complexity issue that arises from the overhead required to incorporate more sophisticated concept-to-concept distances into the network-flow framework. We propose a graph transformation method which generates a pared-down network that requires less time to process. The new method achieves a significant speed improvement, and does not seriously hamper performance as a result of the transformation, as indicated in our analysis.
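The abstract's central idea is measuring the distance between two texts, each represented as frequency-weighted concepts, by routing flow through an ontology graph. The sketch below renders that idea with networkx's minimum-cost-flow solver; the ontology, weights, and integer scaling are illustrative assumptions, and it is a simplified stand-in for the thesis's formulation, not its actual measure.

```python
# A minimal sketch of text distance as a minimum-cost flow over an ontology
# graph. Assumes every text concept appears as a node in the ontology.
import networkx as nx
from collections import Counter

def graph_distance(text_a_concepts, text_b_concepts, ontology_edges, total=100):
    """text_*_concepts: Counter of concept frequencies for each text.
    ontology_edges: iterable of (concept, concept) relations with unit cost."""
    def scaled(counts):
        s = sum(counts.values())
        m = {c: int(total * f // s) for c, f in counts.items()}
        # distribute the rounding remainder so masses sum exactly to `total`
        rem = total - sum(m.values())
        for c, _ in counts.most_common(rem):
            m[c] += 1
        return m

    mass_a, mass_b = scaled(text_a_concepts), scaled(text_b_concepts)

    G = nx.DiGraph()
    for u, v in ontology_edges:                  # ontology relations, both directions
        G.add_edge(u, v, weight=1)
        G.add_edge(v, u, weight=1)
    for node in G.nodes:                         # supply from text A, demand from text B
        G.nodes[node]["demand"] = mass_b.get(node, 0) - mass_a.get(node, 0)

    return nx.min_cost_flow_cost(G) / total      # average distance per unit of mass

# toy ontology and texts
edges = [("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal"), ("bird", "animal")]
a = Counter({"dog": 3, "cat": 1})
b = Counter({"cat": 2, "bird": 2})
print(graph_distance(a, b, edges))  # 2.0 on this toy example
```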
47

Knowledge integration in machine reading

Kim, Doo Soon 04 November 2011 (has links)
Machine reading is the artificial-intelligence task of automatically reading a corpus of texts and, from the contents, building a knowledge base that supports automated reasoning and question answering. Success at this task could fundamentally solve the knowledge acquisition bottleneck – the widely recognized problem that knowledge-based AI systems are difficult and expensive to build because of the difficulty of acquiring knowledge from authoritative sources and building useful knowledge bases. One challenge inherent in machine reading is knowledge integration – the task of correctly and coherently combining knowledge snippets extracted from texts. This dissertation shows that knowledge integration can be automated and that it can significantly improve the performance of machine reading. We specifically focus on two contributions of knowledge integration. The first contribution is for improving the coherence of learned knowledge bases to better support automated reasoning and question answering. Knowledge integration achieves this benefit by aligning knowledge snippets that contain overlapping content. The alignment is difficult because the snippets can use significantly different surface forms. In one common type of variation, two snippets might contain overlapping content that is expressed at different levels of granularity or detail. Our matcher can “see past” this difference to align knowledge snippets drawn from a single document, from multiple documents, or from a document and a background knowledge base. The second contribution is for improving text interpretation. Our approach is to delay ambiguity resolution to enable a machine-reading system to maintain multiple candidate interpretations. This is useful because typically, as the system reads through texts, evidence accumulates to help the knowledge integration system resolve ambiguities correctly. To avoid a combinatorial explosion in the number of candidate interpretations, we propose the packed representation to compactly encode all the candidates. Also, we present an algorithm that prunes interpretations from the packed representation as evidence accumulates. We evaluate our work by building and testing two prototype machine reading systems and measuring the quality of the knowledge bases they construct. The evaluation shows that our knowledge integration algorithms improve the cohesiveness of the knowledge bases, indicating their improved ability to support automated reasoning and question answering. The evaluation also shows that our approach to postponing ambiguity resolution improves the system’s accuracy at text interpretation.
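The second contribution described above is delayed ambiguity resolution via a packed representation of candidate interpretations, pruned as evidence accumulates. The sketch below is a toy rendering of that idea; the data structure, scoring, and margin-based pruning rule are illustrative assumptions, not the dissertation's actual packed representation or pruning algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class PackedInterpretations:
    """Toy packed representation: each ambiguous span keeps all candidate
    interpretations with a running evidence score, pruned lazily."""
    candidates: dict = field(default_factory=dict)   # span -> {interpretation: score}

    def add(self, span, interpretations):
        # start all candidates with a neutral score instead of choosing one now
        self.candidates[span] = {interp: 0.0 for interp in interpretations}

    def add_evidence(self, span, interpretation, weight):
        if interpretation in self.candidates.get(span, {}):
            self.candidates[span][interpretation] += weight

    def prune(self, margin=1.0):
        """Drop candidates that fall `margin` or more below the current best."""
        for span, scores in self.candidates.items():
            best = max(scores.values())
            self.candidates[span] = {i: s for i, s in scores.items()
                                     if best - s < margin}

# usage sketch: "bank" stays ambiguous until later sentences add evidence
packed = PackedInterpretations()
packed.add("bank", ["FinancialInstitution", "RiverBank"])
packed.add_evidence("bank", "FinancialInstitution", 0.8)   # "...deposit money..."
packed.add_evidence("bank", "FinancialInstitution", 0.5)   # "...loan officer..."
packed.prune(margin=1.0)
print(packed.candidates["bank"])   # only FinancialInstitution survives
```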
48

Structured classification for multilingual natural language processing

Blunsom, Philip Unknown Date (has links) (PDF)
This thesis investigates the application of structured sequence classification models to multilingual natural language processing (NLP). Many tasks tackled by NLP can be framed as classification, where we seek to assign a label to a particular piece of text, be it a word, sentence or document. Yet often the labels which we’d like to assign exhibit complex internal structure, such as labelling a sentence with its parse tree, and there may be an exponential number of them to choose from. Structured classification seeks to exploit the structure of the labels in order to allow both generalisation across labels which differ by only a small amount, and tractable searches over all possible labels. In this thesis we focus on the application of conditional random field (CRF) models (Lafferty et al., 2001). These models assign an undirected graphical structure to the labels of the classification task and leverage dynamic programming algorithms to efficiently identify the optimal label for a given input. We develop a range of models for two multilingual NLP applications: word alignment for statistical machine translation (SMT), and multilingual supertagging for highly lexicalised grammars.
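The abstract notes that CRF models rely on dynamic programming to identify the optimal label sequence efficiently. As a concrete illustration, the following sketch implements Viterbi decoding for a linear-chain model given emission and transition score matrices; it is a generic textbook decoder with assumed inputs, not the thesis's word-alignment or supertagging models.

```python
import numpy as np

def viterbi_decode(emission, transition):
    """Find the highest-scoring label sequence for a linear-chain model.

    emission:   (T, L) array, score of label l at position t.
    transition: (L, L) array, score of moving from label i to label j.
    Returns the best label sequence as a list of label indices.
    """
    T, L = emission.shape
    score = np.zeros((T, L))
    backptr = np.zeros((T, L), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        # cand[i, j] = score of reaching label j at t via label i at t-1
        cand = score[t - 1][:, None] + transition + emission[t][None, :]
        backptr[t] = np.argmax(cand, axis=0)
        score[t] = np.max(cand, axis=0)
    # follow back-pointers from the best final label
    best = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# toy usage: 4 positions, 3 labels, random scores
rng = np.random.default_rng(1)
print(viterbi_decode(rng.standard_normal((4, 3)), rng.standard_normal((3, 3))))
```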
49

Efficient techniques for streaming cross document coreference resolution

Shrimpton, Luke William January 2017 (has links)
Large text streams are commonplace; news organisations are constantly producing stories and people are constantly writing social media posts. These streams should be analysed in real-time so useful information can be extracted and acted upon instantly. When natural disasters occur people want to be informed, when companies announce new products financial institutions want to know, and when celebrities do things their legions of fans want to feel involved. In all these examples people care about getting information in real-time (low latency). These streams are massively varied, and people’s interests are typically classified by the entities they are interested in. Organising a stream by the entity being referred to would help people extract the information useful to them. This is a difficult task: fans of ‘Captain America’ films will not want to be incorrectly told that ‘Chris Evans’ (the main actor) was appointed to host ‘Top Gear’ when it was a different ‘Chris Evans’. People who use local idiosyncrasies, such as referring to their home county (‘Cornwall’) as ‘Kernow’ (the Cornish for ‘Cornwall’ that has entered the local lexicon), should not be forced to change their language when finding out information about their home. This thesis addresses a core problem for real-time entity-specific NLP: streaming cross document coreference resolution (CDC), how to automatically identify all the entities mentioned in a stream in real-time. This thesis addresses two significant problems for streaming CDC: there is no representative dataset, and existing systems consume more resources over time. A new technique to create datasets is introduced and applied to social media (Twitter) to create a large (6M mentions) and challenging new CDC dataset that contains a much more varied range of entities than typical newswire streams. Existing systems are not able to keep up with large data streams; this problem is addressed with a streaming CDC system that stores a constant-sized set of mentions. New techniques to maintain the sample are introduced, significantly out-performing existing ones while maintaining 95% of the performance of a non-streaming system and using only 20% of the memory.
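The abstract describes a streaming CDC system that keeps a constant-sized store of mentions. The sketch below is a simplified illustration of that style of system: each incoming mention is linked to the most similar stored mention above a threshold, and the oldest mention is evicted once the store is full. The similarity function, threshold, and first-in-first-out eviction are illustrative assumptions, not the thesis's actual sampling techniques.

```python
from collections import deque

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

class StreamingCDC:
    """Toy streaming coreference resolver with a bounded mention store."""

    def __init__(self, max_mentions=10_000, threshold=0.5):
        self.store = deque(maxlen=max_mentions)   # oldest mention evicted first
        self.threshold = threshold
        self.next_entity_id = 0

    def resolve(self, mention_text: str, context_words: set) -> int:
        """Return an entity id for the incoming mention."""
        features = {mention_text.lower()} | context_words
        best_id, best_sim = None, 0.0
        for stored_features, entity_id in self.store:
            sim = jaccard(features, stored_features)
            if sim > best_sim:
                best_id, best_sim = entity_id, sim
        if best_id is None or best_sim < self.threshold:
            best_id = self.next_entity_id          # start a new entity
            self.next_entity_id += 1
        self.store.append((features, best_id))     # constant-sized memory
        return best_id

# usage sketch: two different people named "Chris Evans"
cdc = StreamingCDC(max_mentions=1000, threshold=0.3)
print(cdc.resolve("Chris Evans", {"captain", "america", "actor"}))   # entity 0
print(cdc.resolve("Chris Evans", {"top", "gear", "radio"}))          # entity 1
print(cdc.resolve("Chris Evans", {"actor", "marvel", "america"}))    # entity 0 again
```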
50

Aprimorando o corretor gramatical CoGrOO / Refining the CoGrOO Grammar Checker

William Daniel Colen de Moura Silva 06 March 2013 (has links)
CoGrOO is an open source Brazilian Portuguese grammar checker currently used by thousands of users of a popular open source office suite. It is capable of identifying Brazilian Portuguese mistakes such as pronoun placement, noun agreement, subject-verb agreement, usage of the accent stress marker (crase), and other common errors of Brazilian Portuguese writing. To accomplish this, it performs a hybrid analysis: initially it annotates the text using statistical Natural Language Processing (NLP) techniques, and then a rule-based check is performed to identify possible grammar errors. The goal of this work is to reduce omissions and false alarms while improving true positives, without adding new error-detection rules. The last rigorous evaluation of the grammar checker was done in 2006 and since then there has been no detailed study on how it has been performing, even though the system's code has evolved substantially. This work will also contribute a detailed evaluation of the low-level statistical NLP annotators, and the results will be compared to the state of the art. Since the low-level NLP modules are available as open source software, improvements to their performance will make them robust, free and ready-to-use alternatives for other systems.
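The abstract describes CoGrOO's hybrid pipeline: statistical annotation of the text followed by a rule-based error check. The sketch below illustrates that architecture with a single toy agreement rule over pre-annotated tokens; the token attributes and the rule are illustrative assumptions, not CoGrOO's actual annotators or rule set.

```python
from dataclasses import dataclass

@dataclass
class Token:
    """A token as it might look after the statistical annotation stage."""
    text: str
    pos: str       # e.g. "DET", "NOUN"
    gender: str    # "m" or "f"
    number: str    # "s" or "p"

def check_noun_agreement(tokens):
    """Toy rule-based pass: flag determiner-noun pairs whose gender or
    number annotations disagree (a stand-in for one detection rule)."""
    errors = []
    for det, noun in zip(tokens, tokens[1:]):
        if det.pos == "DET" and noun.pos == "NOUN":
            if det.gender != noun.gender or det.number != noun.number:
                errors.append(f"agreement error: '{det.text} {noun.text}'")
    return errors

# "as menino" (feminine plural article + masculine singular noun) is flagged
sentence = [Token("as", "DET", "f", "p"), Token("menino", "NOUN", "m", "s")]
print(check_noun_agreement(sentence))
```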
