11

What's the point? : a (computational) theory of punctuation

Jones, Bernard January 1996 (has links)
Although punctuation is clearly an important part of the written language, many natural language processing systems developed to date simply ignore punctuation in input text, or do not place it in output text. The reason for this is the lack of any clear, implementable theory of punctuation function suitable for transfer to the computational domain. The work described in this thesis aims to build on previous linguistic work on the function of punctuation, particularly that by Nunberg (1990), with experimental and theoretical investigations into the potential usefulness of including punctuation in natural language analyses, the variety of punctuation marks present in text, and the syntactic and semantic functions of those marks. Results from these investigations are combined into a taxonomy of punctuation marks and synthesised into a theory describing principles and rule schemata whereby punctuation functionality can be added to natural language processing systems. The thesis begins with some introductory chapters, discussing the nature of punctuation, its history, and previous approaches to theoretical description. Subsequent chapters describe the experimental and theoretical investigations into the potential uses of punctuation in computational systems, the variety of punctuation marks used, and the syntactic and semantic functions that punctuation marks fulfil. Further chapters then construct a taxonomy of punctuation marks and describe the theory synthesised from the results of the investigations. The concluding chapters sum up the research and discuss its possible extension to languages other than English.
12

Combining Text Structure and Meaning to Support Text Mining

McDonald, Daniel Merrill January 2006 (has links)
Text mining methods strive to make unstructured text more useful for decision making. As part of the mining process, language is processed prior to analysis. Processing techniques have often focused primarily on either text structure or text meaning in preparing documents for analysis. As approaches have evolved over the years, increases in the use of lexical semantic parsing usually have come at the expense of full syntactic parsing. This work explores the benefits of combining structure and meaning, or syntax and lexical semantics, to support the text mining process. Chapter two presents the Arizona Summarizer, which includes several processing approaches to automatic text summarization. Each approach has varying usage of structural and lexical semantic information. The usefulness of the different summaries is evaluated in the finding stage of the text mining process. The summary produced using structural and lexical semantic information outperforms all others in the browse task. Chapter three presents the Arizona Relation Parser, a system for extracting relations from medical texts. The system is a grammar-based system that combines syntax and lexical semantic information in one grammar for relation extraction. The relation parser attempts to capitalize on the high precision performance of semantic systems and the good coverage of the syntax-based systems. The parser performs in line with the top reported systems in the literature. Chapter four presents the Arizona Entity Finder, a system for extracting named entities from text. The system greatly expands on the combination grammar approach from the relation parser. Each tag is given a semantic and syntactic component and placed in a tag hierarchy. Over 10,000 tags exist in the hierarchy. The system is tested on multiple domains and is required to extract seven additional types of entities in the second corpus. The entity finder achieves a 90 percent F-measure on the MUC-7 data and an 87 percent F-measure on the Yahoo data where additional entity types were extracted. Together, these three chapters demonstrate that combining text structure and meaning in algorithms to process language has the potential to improve the text mining process. A lexical semantic grammar is effective at recognizing domain-specific entities and language constructs. Syntax information, on the other hand, allows a grammar to generalize its rules when possible. Balancing performance and coverage in light of the world's growing body of unstructured text is important.
13

Financial information extraction using pre-defined and user-definable templates in the Lolita system

Constantino, Marco January 1997 (has links)
Financial operators today have access to an extremely large amount of data, both quantitative and qualitative, real-time or historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analysis, such as historical price analysis or technical analysis of price behaviour. In contrast, little progress has been made in the processing of qualitative data, which mainly consists of financial news articles from financial newspapers or on-line news providers. As a result, financial market players are overloaded with qualitative information which is potentially extremely useful but, due to the lack of time, is often ignored. The goal of this work is to reduce the qualitative data overload of the financial operators. The research involves the identification of the information in the source financial articles which is relevant for the financial operators' investment decision-making process, and the implementation of the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in the source articles. The project also involves the design and implementation in LOLITA of a user-definable template interface for allowing the users to easily design new templates using sentences in natural language. This allows user-defined information extraction from source texts. This differs from most existing information extraction systems, which require the developers to code the templates directly in the system. The results of the research have shown that the system performed well in the extraction of financial templates from source articles, which would allow the financial operator to reduce his qualitative data overload. The results have also shown that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off has been identified between the ease of use of the user-definable template interface and the loss of performance compared to hand-coded templates.
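The core idea of the abstract above, extracting pre-defined templates from financial text, can be sketched in a few lines. This is a toy illustration under assumed details, not the LOLITA implementation: the "takeover" template type, the pattern, and the slot names are all hypothetical.

```python
import re

# Hypothetical pre-defined template: a takeover event with two slots.
# Real template-based extractors use far richer linguistic analysis
# than a surface regular expression.
TAKEOVER = re.compile(
    r"(?P<predator>[A-Z]\w+) (?:acquires|takes over) (?P<target>[A-Z]\w+)"
)

def fill_templates(article):
    """Scan each sentence and fill a takeover template when the pattern matches."""
    templates = []
    for sentence in article.split("."):
        m = TAKEOVER.search(sentence)
        if m:
            templates.append({"type": "takeover",
                              "predator": m.group("predator"),
                              "target": m.group("target")})
    return templates

print(fill_templates("Acme acquires Widgetco. Shares rose."))
# → [{'type': 'takeover', 'predator': 'Acme', 'target': 'Widgetco'}]
```

A user-definable interface, as described in the thesis, would let the user supply the pattern as a natural-language example sentence rather than writing it by hand.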
14

A logical approach to schema-based inference

Wobcke, W. R. January 1988 (has links)
No description available.
15

Robust processing for constraint-based grammar formalisms

Fouvry, Frederik January 2003 (has links)
No description available.
16

Ontology learning from Swedish text

Bothma, Bothma January 2015 (has links)
Ontology learning from text generally consists roughly of NLP, knowledge extraction and ontology construction. While NLP and information extraction for Swedish is approaching that of English, these methods have not been assembled into the full ontology learning pipeline. This means that there is currently very little automated support for using knowledge from Swedish literature in semantically-enabled systems. This thesis demonstrates the feasibility of using some existing ontology learning (OL) methods for Swedish text and elicits proposals for further work toward building and studying open-domain ontology learning systems for Swedish and perhaps multiple languages. This is done by building a prototype ontology learning system based on the state-of-the-art architecture of such systems, using the Korp NLP framework for Swedish text and the GATE system for corpus and annotation management, and embedding it as a self-contained plugin in the Protege ontology engineering framework. The prototype is evaluated similarly to other OL systems. As expected, it is found that while sufficient for demonstrating feasibility, the ontology produced in the evaluation is not usable in practice, since many more methods and fewer cascading errors are necessary to richly and accurately model the domain. In addition to simply implementing more methods to extract more ontology elements, a framework for programmatically defining knowledge extraction and ontology construction methods and their dependencies is recommended to enable more effective research and application of ontology learning.
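The three-stage pipeline named in the abstract (NLP, knowledge extraction, ontology construction) can be sketched as chained placeholder functions. Every function body here is an illustrative stand-in, assuming a trivial "X is a Y" pattern; none of it reflects the Korp, GATE, or Protege APIs.

```python
def nlp_stage(text):
    # Placeholder tokenisation; a real system would call an NLP
    # framework (e.g. Korp for Swedish) for rich annotations.
    return [s.split() for s in text.split(".") if s.strip()]

def extract_knowledge(sentences):
    # Placeholder extraction: collect (hyponym, hypernym) pairs
    # from naive "X is a Y" patterns.
    pairs = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == "is" and i + 2 < len(tokens) and tokens[i + 1] == "a":
                pairs.append((tokens[i - 1], tokens[i + 2]))
    return pairs

def build_ontology(pairs):
    # Placeholder construction: map each concept to its parent concepts.
    ontology = {}
    for child, parent in pairs:
        ontology.setdefault(child, set()).add(parent)
    return ontology

text = "A dog is a mammal. A mammal is a animal."
print(build_ontology(extract_knowledge(nlp_stage(text))))
# → {'dog': {'mammal'}, 'mammal': {'animal'}}
```

The thesis's recommendation of a framework for declaring such methods and their dependencies amounts to making each stage a pluggable component rather than a fixed function.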
17

A corpus-based study of anaphora in dialogues in English and Portuguese

Rocha, Marco Antonio Esteves da January 1998 (has links)
No description available.
18

An engineering approach to knowledge acquisition by the interactive analysis of dictionary definitions

Poria, Sanjay January 1998 (has links)
It has long been recognised that everyday dictionaries are a potential source of lexical and world knowledge of the type required by many Natural Language Processing (NLP) systems. This research presents a semi-automated approach to the extraction of rich semantic relationships from dictionary definitions. The definitions are taken from the recently published "Cambridge International Dictionary of English" (CIDE). The thesis illustrates how many of the innovative features of CIDE can be exploited during the knowledge acquisition process. The approach introduced in this thesis uses the LOLITA NLP system to extract and represent semantic relationships, along with a human operator to resolve the different forms of ambiguity which exist within dictionary definitions. Such a strategy combines the strengths of both participants in the acquisition process: automated procedures provide consistency in the construction of complex and inter-related semantic relationships, while the human participant can use his or her knowledge to determine the correct interpretation of a definition. This semi-automated strategy eliminates the weakness of many existing approaches because it guarantees feasibility and correctness: feasibility is ensured by exploiting LOLITA's existing NLP capabilities so that humans with minimal linguistic training can resolve the ambiguities within dictionary definitions; and correctness is ensured because incorrectly interpreted definitions can be manually eliminated. The feasibility and correctness of the solution is supported by the results of an evaluation which is presented in detail in the thesis.
19

Interpretation of anaphoric expressions in the Lolita system

Urbanowicz, Agnieszka Joanna January 1998 (has links)
This thesis addresses the issue of anaphora resolution in the large-scale natural language system, LOLITA. The work described here involved a thorough analysis of the system's initial performance, the collection of evidence for and the design of the new anaphora resolution algorithm, and subsequent implementation and evaluation of the system. Anaphoric expressions are elements of a discourse whose resolution depends on other elements of the preceding discourse. The processes involved in anaphora resolution have long been the subject of research in a variety of fields. The changes carried out to LOLITA first involved substantial improvements to the core, lower-level modules which form the basis of the system. A major change specific to the interpretation of anaphoric expressions was then introduced. A system of filters, in which potential candidates for resolution are filtered according to a set of heuristics, has been changed to a system of penalties, where candidates accumulate points throughout the application of the heuristics. At the end of the process, the candidate with the smallest penalty is chosen as a referent. New heuristics, motivated by evidence drawn from research in linguistics, psycholinguistics and AI, have been added to the system. The system was evaluated using a procedure similar to that defined by MUC6 (DARPA 1995). Blind and open tests were used. The first evaluation was carried out after the general improvements to the lower-level modules; the second after the introduction of the new anaphora algorithm. It was found that the general improvements led to a considerable rise in scores in both the blind and the open test sets. As a result of the anaphora-specific improvements, on the other hand, the rise in scores on the open set was larger than the rise on the blind set. In the open set the category of pronouns showed the most marked improvement. It was concluded that it is the work carried out on the basic, lower-level modules of a large-scale system which leads to the biggest gains. It was also concluded that considerable extra advantage can be gained by using the new weights-based algorithm together with the generally improved system.
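The penalty-based scheme described above, where candidates accumulate penalty points from each heuristic and the lowest-penalty candidate wins, can be sketched as follows. The heuristics, penalty values, and candidate representation here are illustrative assumptions, not LOLITA's actual ones.

```python
def gender_agreement(candidate, pronoun):
    # Heavily penalise candidates whose gender clashes with the pronoun.
    return 0 if candidate["gender"] == pronoun["gender"] else 10

def recency(candidate, pronoun):
    # Penalise candidates mentioned further back in the discourse.
    return pronoun["position"] - candidate["position"]

HEURISTICS = [gender_agreement, recency]

def resolve(pronoun, candidates):
    """Return the candidate with the smallest accumulated penalty."""
    def total_penalty(candidate):
        return sum(h(candidate, pronoun) for h in HEURISTICS)
    return min(candidates, key=total_penalty)

candidates = [
    {"name": "Mary", "gender": "f", "position": 1},
    {"name": "John", "gender": "m", "position": 3},
]
pronoun = {"text": "she", "gender": "f", "position": 5}
print(resolve(pronoun, candidates)["name"])  # → Mary
```

Unlike a filter system, no heuristic can eliminate a candidate outright, so a candidate that fails one test can still win if it scores well on the others.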
20

Nuggeteer: Automatic Nugget-Based Evaluation Using Descriptions and Judgements

Marton, Gregory 09 January 2006 (has links)
TREC Definition and Relationship questions are evaluated on the basis of information nuggets that may be contained in system responses. Human evaluators provide informal descriptions of each nugget, and judgements (assignments of nuggets to responses) for each response submitted by participants. The best present automatic evaluation for these kinds of questions is Pourpre. Pourpre uses a stemmed unigram similarity of responses with nugget descriptions, yielding an aggregate result that is difficult to interpret, but is useful for relative comparison. Nuggeteer, by contrast, uses both the human descriptions and the human judgements, and makes binary decisions about each response, so that the end result is as interpretable as the official score. I explore n-gram length, use of judgements, stemming, and term weighting, and provide a new algorithm quantitatively comparable to, and qualitatively better than, the state of the art.
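The binary decision at the heart of this abstract, "does this response contain this nugget?", can be sketched as a word-overlap test against the nugget description. This is an illustrative simplification, not Nuggeteer's actual algorithm (which also uses judgements, stemming, n-grams, and term weighting); the threshold value and example strings are assumptions.

```python
def contains_nugget(response, description, threshold=0.5):
    """Judge that the response contains the nugget when enough of the
    description's words appear in the response."""
    resp_words = set(response.lower().split())
    desc_words = set(description.lower().split())
    overlap = len(resp_words & desc_words) / len(desc_words)
    return overlap >= threshold

description = "discovered in 1996 at Roswell"
print(contains_nugget("The craft was discovered near Roswell in 1996",
                      description))  # → True
```

Because each decision is binary per (nugget, response) pair, precision and recall can be computed exactly as in the official manual evaluation, which is what makes the resulting score directly interpretable.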
