1. Computer assisted grammar construction. Shih, Hsue-Hueh (January 1995).
No description available.
2. An investigation into statistically-based lexical ambiguity resolution. Sutton, Stephen (January 1992).
No description available.
3. Syntactic pre-processing in single-word prediction for disabled people. Wood, Matthew Edward John (January 1996).
No description available.
4. A computational model of task oriented discourse. Elliot, Mark James (January 1995).
No description available.
5. Temporal information in newswire articles: an annotation scheme and corpus study. Setzer, Andrea (January 2002).
Many natural language processing applications, such as information extraction, question answering, and topic detection and tracking, would benefit significantly from the ability to accurately position reported events in time, either relative to other events or absolutely with respect to calendrical time. However, relatively little work has been done to date on the automatic extraction of temporal information from text. Before we can automatically position reported events in time, we must understand the mechanisms language uses to do this. This understanding can be promoted through the development of an annotation scheme, which allows us to identify the textual expressions conveying events, times and temporal relations in a corpus of 'real' text. This thesis describes a fine-grained annotation scheme with which we can capture all events, times and temporal relations reported in a text. To aid the application of the scheme to text, a graphical annotation tool has been developed. This tool not only allows easy markup of sophisticated temporal annotations but also contains an interactive, inference-based component that supports the gathering of temporal relations. The annotation scheme and the tool have been evaluated through the construction of a trial corpus during a pilot study, in which a group of annotators was given a description of the annotation scheme and asked to apply it to the trial corpus. The pilot study showed that the annotation scheme was difficult to apply, but that it is feasible given improvements to the definition of the scheme and to the tool. Analysis of the resulting trial corpus also provides preliminary results on the relative extent to which different linguistic mechanisms, explicit and implicit, are used to convey temporal relational information in text.
6. Combining Text Structure and Meaning to Support Text Mining. McDonald, Daniel Merrill (January 2006).
Text mining methods strive to make unstructured text more useful for decision making. As part of the mining process, language is processed prior to analysis. Processing techniques have often focused primarily on either text structure or text meaning in preparing documents for analysis. As approaches have evolved over the years, increased use of lexical semantic parsing has usually come at the expense of full syntactic parsing. This work explores the benefits of combining structure and meaning, that is, syntax and lexical semantics, to support the text mining process. Chapter two presents the Arizona Summarizer, which includes several processing approaches to automatic text summarization, each making different use of structural and lexical semantic information. The usefulness of the different summaries is evaluated in the finding stage of the text mining process; the summary produced using both structural and lexical semantic information outperforms all others in the browse task. Chapter three presents the Arizona Relation Parser, a grammar-based system for extracting relations from medical texts that combines syntactic and lexical semantic information in a single grammar. The relation parser attempts to capitalize on the high precision of semantic systems and the good coverage of syntax-based systems, and it performs in line with the top systems reported in the literature. Chapter four presents the Arizona Entity Finder, a system for extracting named entities from text. The system greatly expands on the combination-grammar approach of the relation parser: each tag is given a semantic and a syntactic component and placed in a tag hierarchy containing over 10,000 tags. The system is tested on multiple domains and is required to extract seven additional entity types in the second corpus.
The entity finder achieves a 90 percent F-measure on the MUC-7 data and an 87 percent F-measure on the Yahoo data, where additional entity types were extracted. Together, these three chapters demonstrate that combining text structure and meaning in language-processing algorithms has the potential to improve the text mining process. A lexical semantic grammar is effective at recognizing domain-specific entities and language constructs, while syntactic information allows a grammar to generalize its rules where possible. Balancing performance and coverage is important given the world's growing body of unstructured text.
7. Financial information extraction using pre-defined and user-definable templates in the Lolita system. Constantino, Marco (January 1997).
Financial operators today have access to an extremely large amount of data, both quantitative and qualitative, real-time and historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analyses such as historical price analysis or technical analysis of price behaviour. By contrast, little progress has been made in processing qualitative data, which consist mainly of financial news articles from financial newspapers or online news providers. As a result, financial market players are overloaded with qualitative information that is potentially extremely useful but, for lack of time, is often ignored. The goal of this work is to reduce financial operators' qualitative data overload. The research involves identifying the information in source financial articles that is relevant to the operators' investment decision-making process, and implementing the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in them. The project also involves the design and implementation in LOLITA of a user-definable template interface that lets users easily design new templates using natural-language sentences. This allows user-defined information extraction from source texts, unlike most existing information extraction systems, which require developers to code templates directly into the system. The results show that the system performed well in extracting financial templates from source articles, which would allow financial operators to reduce their qualitative data overload.
The results also show that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off was identified between the ease of use of the user-definable template interface and a loss of performance compared with hand-coded templates.
8. Robust processing for constraint-based grammar formalisms. Fouvry, Frederik (January 2003).
No description available.
9. Ontology learning from Swedish text. Bothma, Bothma (January 2015).
Ontology learning (OL) from text generally consists, roughly, of natural language processing (NLP), knowledge extraction and ontology construction. While NLP and information extraction for Swedish are approaching the level available for English, these methods have not been assembled into a full ontology learning pipeline. This means that there is currently very little automated support for using knowledge from Swedish literature in semantically enabled systems. This thesis demonstrates the feasibility of using some existing OL methods for Swedish text and elicits proposals for further work toward building and studying open-domain ontology learning systems for Swedish and perhaps multiple languages. This is done by building a prototype ontology learning system based on the state-of-the-art architecture of such systems, using the Korp NLP framework for Swedish text and the GATE system for corpus and annotation management, and embedding it as a self-contained plugin in the Protégé ontology engineering framework. The prototype is evaluated in the same way as other OL systems. As expected, while the ontology produced in the evaluation is sufficient to demonstrate feasibility, it is not usable in practice, since many more methods and fewer cascading errors are needed to model the domain richly and accurately. Beyond simply implementing more methods to extract more ontology elements, the thesis recommends a framework for programmatically defining knowledge extraction and ontology construction methods and their dependencies, to enable more effective research on and application of ontology learning.
10. A corpus-based study of anaphora in dialogues in English and Portuguese. Rocha, Marco Antonio Esteves da (January 1998).
No description available.