1 |
Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-Occurrence Analysis, and Parallel Computing / Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun Dorbin; Schatz, Bruce R. January 1998
Artificial Intelligence Lab, Department of MIS, University of Arizona / In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in concept recall, but in concept precision the three thesauri were comparable. Our analysis also revealed that the terms suggested by the three thesauri were complementary and could be used to significantly increase "variety" in search terms and thereby reduce search uncertainty.
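The co-occurrence analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes each abstract has already been reduced to a list of index terms, and it uses a simple asymmetric weight (the share of documents containing term a that also contain term b). That weight, the function names, and the toy documents are assumptions made for illustration; they are not the weighting or the data used in the article.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(docs):
    """Return {(a, b): weight}, where weight is the fraction of
    documents containing term a that also contain term b."""
    doc_freq = Counter()   # number of documents each term appears in
    pair_freq = Counter()  # number of documents each ordered pair shares
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_freq[(a, b)] += 1
            pair_freq[(b, a)] += 1
    return {(a, b): n / doc_freq[a] for (a, b), n in pair_freq.items()}


def suggest(term, weights, k=3):
    """Return up to k (related term, weight) pairs, strongest first."""
    related = [(b, w) for (a, b), w in weights.items() if a == term]
    return sorted(related, key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical index terms for three abstracts.
docs = [
    ["parallel computing", "indexing", "thesaurus"],
    ["indexing", "thesaurus"],
    ["parallel computing", "supercomputer"],
]
weights = cooccurrence_weights(docs)
suggestions = suggest("indexing", weights)
print(suggestions)
```

Because the weight is asymmetric, a rare term can point strongly to a common one without the reverse being true, which is one way a system-generated thesaurus can propose broader search terms.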
|
2 |
Information extraction in the practical applications system and techniques / Xiao, Luo. January 1900
Erlangen, Nürnberg, University, dissertation, 2003. / Files in PDF format. Computer file, remote access.
|
3 |
Information extraction in the practical applications system and techniques / Xiao, Luo. January 1900
Erlangen, Nürnberg, University, dissertation, 2003. / Files in PDF format.
|
4 |
Grading knowledge: extracting degree information from texts / Staab, Steffen. January 2000
Also submitted as: Freiburg (Breisgau), University, dissertation, 1999. / License required.
|
5 |
Joint Biomedical Event Extraction and Entity Linking via Iterative Collaborative Training / Li, Xiaochu. 05 1900
Biomedical entity linking and event extraction are two crucial tasks that support text understanding and retrieval in the biomedical domain. The two tasks intrinsically benefit each other: entity linking disambiguates biomedical concepts by referring to external knowledge bases, and that domain knowledge provides additional clues for understanding and extracting biological processes, while event extraction identifies the key trigger and the entities involved in each biological process, capturing structural context that helps disambiguate the biomedical entities. However, previous research typically solves these two tasks separately or in a pipeline, leading to error propagation. Moreover, solving the two tasks together is even more challenging because no existing dataset contains annotations for both. To address these challenges, we propose joint biomedical entity linking and event extraction that regards the event structures and the entity references in knowledge bases as latent variables and updates the two task-specific models in an iterative training framework: (1) predicting the missing variables for each partially annotated dataset based on the current two task-specific models, and (2) updating the parameters of each model on the corresponding pseudo-completed dataset. Experimental results on two benchmark datasets, Genia 2011 for event extraction and BC4GO for entity linking, show that our joint framework significantly improves the model for each individual task and outperforms strong baselines for both tasks. We will make the code and model checkpoints publicly available once the paper is accepted. / M.S. / Biomedical entity linking and event extraction are essential tasks in understanding and retrieving information from biomedical texts.
These tasks mutually benefit each other, as entity linking helps disambiguate biomedical concepts by leveraging external knowledge bases, while domain knowledge provides valuable insights for understanding and extracting biological processes. Event extraction, on the other hand, identifies triggers and entities involved in describing biological processes, capturing their contextual relationships for improved entity disambiguation. However, existing approaches often address these tasks separately or in a sequential manner, leading to error propagation. Furthermore, the joint solution becomes even more challenging due to the lack of datasets with annotations for both tasks.
To overcome these challenges, we propose a novel approach for jointly performing biomedical entity linking and event extraction. Our method treats the event structures and entity references in knowledge bases as latent variables and employs an iterative training framework. This framework involves predicting missing variables in partially annotated datasets based on the current task-specific models and updating the model parameters using the completed datasets. Experimental results on benchmark datasets, namely Genia 2011 for event extraction and BC4GO for entity linking, demonstrate the effectiveness of our joint framework. It significantly improves the performance of each individual task and outperforms strong baselines for both tasks.
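The two-step loop described in this abstract (predict the missing annotations, then retrain on the pseudo-completed data) can be sketched roughly as follows. Everything here is a stand-in chosen for illustration: `VoteModel` is a toy feature-voting classifier rather than the thesis's neural models, the `features` helper, the `KB:` identifiers, and the four example sentences are hypothetical, and the warm-start step is an assumption about how such a loop might be initialized.

```python
from collections import Counter, defaultdict


class VoteModel:
    """Toy classifier: each feature votes for the labels it was seen with."""

    def __init__(self):
        self.votes = defaultdict(Counter)
        self.prior = Counter()

    def fit(self, examples):
        # Retrain from scratch on (features, label) pairs.
        self.votes.clear()
        self.prior.clear()
        for feats, label in examples:
            self.prior[label] += 1
            for f in feats:
                self.votes[f][label] += 1

    def predict(self, feats):
        tally = Counter()
        for f in feats:
            tally.update(self.votes.get(f, {}))
        if not tally:  # no known feature: fall back to label frequencies
            tally = self.prior
        # sorted() makes tie-breaking deterministic (alphabetical).
        return max(sorted(tally), key=lambda lab: tally[lab])


def features(text, other_label):
    """Word features plus, when available, the other task's label."""
    feats = ["w:" + tok for tok in text.lower().split()]
    if other_label is not None:
        feats.append("other:" + other_label)
    return feats


def train_jointly(event_data, entity_data, rounds=3):
    event_model, entity_model = VoteModel(), VoteModel()
    # Assumed warm start: each model sees only its own gold labels.
    event_model.fit([(features(x["text"], None), x["event"]) for x in event_data])
    entity_model.fit([(features(x["text"], None), x["entity"]) for x in entity_data])
    for _ in range(rounds):
        # Step 1: fill in the annotation each dataset is missing.
        ev_done = [dict(x, entity=entity_model.predict(features(x["text"], x["event"])))
                   for x in event_data]
        en_done = [dict(x, event=event_model.predict(features(x["text"], x["entity"])))
                   for x in entity_data]
        # Step 2: update each model on its pseudo-completed dataset.
        event_model.fit([(features(x["text"], x["entity"]), x["event"]) for x in ev_done])
        entity_model.fit([(features(x["text"], x["event"]), x["entity"]) for x in en_done])
    return event_model, entity_model


# Hypothetical partially annotated datasets.
event_data = [
    {"text": "p53 phosphorylation increased", "event": "Phosphorylation", "entity": None},
    {"text": "IL-2 expression decreased", "event": "Gene_expression", "entity": None},
]
entity_data = [
    {"text": "p53 binds DNA", "event": None, "entity": "KB:p53"},
    {"text": "IL-2 activates T cells", "event": None, "entity": "KB:IL2"},
]
event_model, entity_model = train_jointly(event_data, entity_data)
event_pred = event_model.predict(features("p53 phosphorylation observed", None))
entity_pred = entity_model.predict(features("IL-2 secretion", None))
print(event_pred, entity_pred)
```

The point of the sketch is only the control flow: neither dataset carries both annotation types, so each model supplies the other's missing labels before every update.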
|
6 |
Interpretable Models for Information Extraction / Valenzuela Escárcega, Marco Antonio. January 2016
There is an abundance of information being generated constantly, most of it encoded as unstructured text. The information expressed this way, although publicly available, is not directly usable by computer systems because it is not organized according to a data model that could inform us how different data nuggets relate to each other. Information extraction provides a way of scanning unstructured text and extracting structured knowledge suitable for querying and manipulation. Most information extraction research focuses on machine learning approaches that can be considered black boxes when deployed in information extraction systems. We propose a declarative language designed for the information extraction task. It allows the use of syntactic patterns alongside token-based surface patterns that incorporate shallow linguistic features. It captures complex constructs such as nested structures, and complex regular expressions over syntactic patterns for event arguments. We implement a novel information extraction runtime system designed for the compilation and execution of the proposed language. The runtime system has novel features for better declarative support, while preserving practicality. It supports features required for handling natural language, like the preservation of ambiguity and the efficient use of contextual information. It has a modular architecture that allows it to be extended with new functionality, which, together with the language design, provides a powerful framework for the development and research of new ideas for declarative information extraction. We use our language and runtime system to build a biomedical information extraction system. This system is capable of recognizing biological entities (e.g., genes, proteins, protein families, simple chemicals), events over entities (e.g., biochemical reactions), and nested events that take other events as arguments (e.g., catalysis). 
Additionally, it captures complex natural language phenomena like coreference and hedging. Finally, we propose a rule learning procedure to extract rules from statistical systems trained for information extraction. Rule learning combines the advantages of machine learning with the interpretability of our models. This enables us to train information extraction systems on annotated data and then have human experts extend and modify them, which accelerates the deployment of new systems that remain open to expert revision.
|
7 |
Intelligent internet searching agent based on hybrid simulated annealing / Yang, Christopher C.; Yen, Jerome; Chen, Hsinchun. January 2000
Artificial Intelligence Lab, Department of MIS, University of Arizona / World Wide Web (WWW) based Internet services have become a major channel for information delivery. For the same reason, information overload has also become a serious problem for the users of such services. It has been estimated that the amount of information stored on the Internet doubled every 18 months. The number of homepages may have grown even faster; some estimated that it doubled every 6 months. Therefore, a scalable approach to support Internet searching is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. In this paper, we discuss a modified version of the simulated annealing algorithm used to develop an intelligent personal spider agent, which is based on automatic textual analysis of Internet documents and hybrid simulated annealing.
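For readers unfamiliar with simulated annealing, the sketch below shows the plain algorithm applied to a toy page-selection problem. It is not the hybrid variant the paper describes: the relevance scores, the `anneal` function, and its parameters (`temp`, `cooling`, `steps`) are all hypothetical and chosen only to make the acceptance rule and cooling schedule concrete.

```python
import math
import random


def anneal(scores, k, temp=1.0, cooling=0.95, steps=500, seed=0):
    """Pick k pages with a high total score using plain simulated annealing."""
    rng = random.Random(seed)
    pages = list(scores)

    def total(selection):
        return sum(scores[p] for p in selection)

    current = rng.sample(pages, k)
    best = list(current)
    for _ in range(steps):
        outside = [p for p in pages if p not in current]
        if not outside:
            break
        # Neighbor: swap one selected page for one unselected page.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        delta = total(candidate) - total(current)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if total(current) > total(best):
                best = list(current)
        temp = max(temp * cooling, 1e-9)  # geometric cooling schedule
    return sorted(best)


# Hypothetical relevance scores for five candidate pages.
scores = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.2, "e": 0.7}
selected = anneal(scores, k=2)
print(selected)
```

Early on, the high temperature lets the search accept worse selections and escape poor starting points; as the temperature drops, the search behaves more and more like greedy hill climbing.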
|
8 |
Framework of Ontology-based Blogroll Recommendation System / Chiu, Chien-Pei. 27 July 2005
Weblogs have been growing quickly and transforming the World Wide Web into a dynamic environment in which Web pages are frequently updated. Although Google's search engine copes successfully with traditional web pages, it cannot effectively handle the dynamic blogspace.
This research proposes an ontology-based semantic annotation framework that works at the concept level to recommend blogrolls adaptively. Keyword matching is replaced by semantic annotation technology from the IE (Information Extraction) domain to implement a recommendation system. The objective of the recommendation system is to produce a recommended blogroll for the target weblog based on the weblog's concept affinities. The data come from the java.blogs community, and the recommendation system is evaluated by Java programmers.
The recommended blogrolls are evaluated by relevance: subjects score the degree of relevance between a target blogger and the recommended blogroll. The reliability of the relevance scores among subjects is also tested. The results show that the recommended blogrolls obtain a middle level of relevance as measured by subjects. The relevance evaluation of the recommended blogroll is independent of the concept density of the target blogger. The recommendation system is also reliable. Moreover, this study sheds light on directions for improving automated blogroll recommendation.
|
9 |
Filling Preposition-based Templates To Capture Information from Medical Abstracts / Leroy, Gondy; Chen, Hsinchun. January 2002
Artificial Intelligence Lab, Department of MIS, University of Arizona / Due to the recent explosion of information in the biomedical field, it is hard for a single researcher to review the complex network involving genes, proteins, and interactions. We are currently building GeneScene, a toolkit that will assist researchers in reviewing existing literature, and report on the first phase in our development effort: extracting the relevant information from medical abstracts. We are developing a medical parser that extracts information, fills basic prepositional-based templates, and combines the templates to capture the underlying sentence logic. We tested our parser on 50 unseen abstracts and found that it extracted 246 templates with a precision of 70%. In comparison with many other techniques, more information was extracted without sacrificing precision. Future improvement in precision will be achieved by correcting three categories of errors.
|
10 |
Multilingual Input System for the Web - an Open Multimedia Approach of Keyboard and Handwriting Recognition for Chinese and Japanese / Ramsey, Marshall C.; Ong, Thian-Huat; Chen, Hsinchun. January 1998
Artificial Intelligence Lab, Department of MIS, University of Arizona / The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input media and languages of interest.
|