51.
From Information Retrieval to Knowledge Management: Enabling Technologies and Best Practices / Chen, Hsinchun. 11 1900 (has links)
Artificial Intelligence Lab, Department of MIS, University of Arizona / In this era of the Internet and distributed multimedia computing, new and emerging classes of information technologies have swept into the lives of office workers and everyday people. As technologies and applications become more overwhelming, pressing, and diverse, several well-known information technology problems have become even more urgent. Information overload, a result of the ease of information creation and rendering via the Internet and WWW, has become more evident in people's lives. Significant variations in database formats and structures, the richness of information media (text, audio, and video), and an abundance of multilingual information content have also created various information interoperability problems: structural interoperability, media interoperability, and multilingual interoperability.
52.
A Machine Learning Approach to Inductive Query by Examples: An Experiment Using Relevance Feedback, ID3, Genetic Algorithms, and Simulated Annealing / Chen, Hsinchun; Shankaranarayanan, Ganesan; She, Linlin; Iyer, Anand. 06 1900 (has links)
Artificial Intelligence Lab, Department of MIS, University of Arizona / Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers
in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other newer inductive learning techniques, including symbolic learning, genetic algorithms, and simulated annealing. These newer
techniques, which are grounded in diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information systems. In this article, we first provide an overview of these newer techniques and their use in information retrieval research. In order to familiarize readers with the techniques, we present three promising methods: the symbolic ID3 algorithm, evolution-based genetic algorithms, and simulated annealing. We discuss their knowledge representations and algorithms in the unique context of information retrieval. An experiment using an 8,000-record COMPEN database was performed to examine the performance of these inductive query-by-example techniques in comparison with the conventional relevance feedback method. The machine learning techniques were shown to help identify new documents that are similar to documents initially suggested by users, as well as documents that contain concepts similar to each other. Genetic algorithms, in particular, were found to outperform relevance feedback in both document recall and precision. We believe these inductive machine learning techniques hold promise for analyzing users' preferred documents (or records), identifying users' underlying information needs, and suggesting search alternatives for database management systems and Internet applications.
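The abstract does not include the authors' actual implementation, but the core idea of a genetic algorithm for inductive query by examples can be sketched as follows: a candidate query is a set of terms (the chromosome), and fitness is its overlap with the user's preferred documents. This is a minimal illustrative sketch; the population size, mutation rate, and Jaccard fitness are assumptions for the example, not the paper's settings.

```python
import random

def fitness(query, relevant_docs):
    """Mean Jaccard similarity between a candidate query (a term set)
    and the term sets of user-preferred documents."""
    scores = []
    for doc in relevant_docs:
        inter = len(query & doc)
        union = len(query | doc) or 1
        scores.append(inter / union)
    return sum(scores) / len(scores)

def evolve(seed_queries, relevant_docs, generations=50, pop_size=20):
    """Evolve term-set queries toward the user's preferred documents."""
    vocab = sorted(set().union(*relevant_docs))
    pop = [set(q) for q in seed_queries]
    while len(pop) < pop_size:
        pop.append(set(random.sample(vocab, k=min(3, len(vocab)))))
    for _ in range(generations):
        # elitism: keep the fittest half, breed the rest
        pop.sort(key=lambda q: fitness(q, relevant_docs), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = set()
            for term in a | b:            # uniform crossover on terms
                if random.random() < 0.5:
                    child.add(term)
            if random.random() < 0.1:     # mutation: toggle a random term
                child.symmetric_difference_update({random.choice(vocab)})
            children.append(child or {random.choice(vocab)})
        pop = survivors + children
    return max(pop, key=lambda q: fitness(q, relevant_docs))
```

Because the fittest individual always survives, the evolved query is never worse than the user's initial example, mirroring how the article uses machine learning to suggest alternative searches.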
53.
Information navigation on the web by clustering and summarizing query results / Roussinov, Dmitri G.; Chen, Hsinchun. January 2001 (has links)
Artificial Intelligence Lab, Department of MIS, University of Arizona / We report our experience with a novel approach to interactive information seeking that is grounded in the idea of summarizing query results through automated document clustering. We went through a complete system development and evaluation cycle: designing the algorithms and interface for our prototype, implementing them, and testing them with human users. Our prototype acted as an intermediate layer between the user and a commercial Internet search engine (AltaVista), thus allowing searches over a significant portion of the World Wide Web. In our final evaluation, we processed data from 36 users and concluded that our prototype improved search performance over using the same search engine directly. We also analyzed the effects of various demographic and task-related parameters.
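The general idea of summarizing query results by clustering (not Roussinov and Chen's actual algorithm or interface) can be sketched in a few lines: extract key terms from each result snippet and greedily group snippets whose term profiles overlap. The stop-word list and similarity threshold below are arbitrary assumptions for illustration.

```python
from collections import Counter

STOP = {"the", "a", "of", "and", "in", "for", "to", "on"}

def key_terms(snippet, k=3):
    """Top-k non-stopword terms of a result snippet."""
    words = [w.lower().strip(".,") for w in snippet.split()]
    counts = Counter(w for w in words if w not in STOP and len(w) > 2)
    return {w for w, _ in counts.most_common(k)}

def cluster_results(snippets, threshold=0.2):
    """Greedy single-pass clustering: assign each result to the first
    cluster whose term profile overlaps enough, else start a new one."""
    clusters = []  # each: {"terms": set, "members": [snippet, ...]}
    for s in snippets:
        terms = key_terms(s)
        for c in clusters:
            overlap = len(terms & c["terms"]) / (len(terms | c["terms"]) or 1)
            if overlap >= threshold:
                c["members"].append(s)
                c["terms"] |= terms       # grow the cluster's term profile
                break
        else:
            clusters.append({"terms": terms, "members": [s]})
    # label each cluster with a few characteristic terms, as a summary
    return [(sorted(c["terms"])[:3], c["members"]) for c in clusters]
```

The cluster labels play the role of the summaries a user browses instead of a flat ranked list.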
54.
Lernen und Interpretieren strukturierter Dokumente: ein qualitativer Ansatz (Learning and Interpreting Structured Documents: A Qualitative Approach) / Walischewski, Hanno. Unknown Date (has links) (PDF)
Universität Freiburg (Breisgau), Diss., 1999.
55.
Indexing collections of XML documents with arbitrary links / Sayed, Awny Abd el-Hady Ahmed. Unknown Date (has links) (PDF)
Essen, University Duisburg-Essen, Informatik und Wirtschaftsinformatik, Diss., 2005--Duisburg.
56.
Indexing collections of XML documents with arbitrary links / Sayed, Awny Abd el-Hady Ahmed. January 2005 (has links) (PDF)
Duisburg, Essen, Univ. Duisburg-Essen, Informatik und Wirtschaftsinformatik, Diss., 2005
57.
Context-specific Consistencies in Information Extraction: Rule-based and Probabilistic Approaches / Kontextspezifische Konsistenzen in der Informationsextraktion: Regelbasierte und Probabilistische Ansätze / Klügl, Peter. January 2015 (has links) (PDF)
Large amounts of communication, documentation, knowledge, and information are stored in textual documents. Most often, texts such as webpages, books, tweets, or reports are available only in an unstructured representation, since they are created and interpreted by humans. In order to take advantage of this huge amount of concealed information and include it in analytic processes, it needs to be transformed into a structured representation. Information extraction addresses exactly this task: it tries to identify well-defined entities and relations in unstructured data, especially in textual documents.
Interesting entities are often consistently structured within a certain context, especially in semi-structured texts. However, their actual composition varies and may be inconsistent across different contexts. Information extraction models fall short of their potential and return inferior results if they do not consider these consistencies during processing. This work presents a selection of practical and novel approaches for exploiting these context-specific consistencies in information extraction tasks. The approaches are not limited to a single technique, but are based on handcrafted rules as well as probabilistic models.
A new rule-based system called UIMA Ruta has been developed in order to provide optimal conditions for rule engineers. This system consists of a compact rule language with a high expressiveness and strong development support. Both elements facilitate rapid development of information extraction applications and improve the general engineering experience, which reduces the necessary efforts and costs when specifying rules.
The advantages and applicability of UIMA Ruta for exploiting context-specific consistencies are illustrated in three case studies. They utilize different engineering approaches for including the consistencies in the information extraction task. Either the recall is increased by finding additional entities with similar composition, or the precision is improved by filtering inconsistent entities. Furthermore, another case study highlights how transformation-based approaches are able to correct preliminary entities using the knowledge about the occurring consistencies.
The approaches of this work based on machine learning rely on Conditional Random Fields, popular probabilistic graphical models for sequence labeling. They take advantage of a consistency model, which is automatically induced during processing the document. The approach based on stacked graphical models utilizes the learnt descriptions as feature functions that have a static meaning for the model, but change their actual function for each document. The other two models extend the graph structure with additional factors dependent on the learnt model of consistency. They include feature functions for consistent and inconsistent entities as well as for additional positions that fulfill the consistencies.
The presented approaches are evaluated in three real-world domains: segmentation of scientific references, template extraction in curricula vitae, and identification and categorization of sections in clinical discharge letters. They achieve remarkable results and provide an error reduction of up to 30% compared to commonly applied techniques. / This work deals with rule-based and probabilistic approaches to information extraction that exploit context-specific consistencies and thereby improve extraction accuracy.
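Klügl's actual systems (UIMA Ruta rules and extended Conditional Random Fields) are far richer, but the precision-oriented variant described above, filtering out entities that violate the consistency dominating a document, can be sketched briefly: induce the composition most of a document's candidate entities share, and discard the rest. The token-shape signature used here is an illustrative assumption, not the thesis's feature set.

```python
from collections import Counter

def signature(entity_tokens):
    """Coarse composition of an entity: the capitalization/digit shape
    of its tokens, e.g. ["John", "Smith"] -> "CC"."""
    def shape(tok):
        if tok.isdigit():
            return "D"
        if tok[:1].isupper():
            return "C"
        return "l"
    return "".join(shape(t) for t in entity_tokens)

def filter_consistent(candidates):
    """Keep only candidates whose composition matches the signature that
    dominates this document: a precision-oriented consistency filter."""
    sigs = Counter(signature(c) for c in candidates)
    dominant, _ = sigs.most_common(1)[0]
    return [c for c in candidates if signature(c) == dominant]
```

The same induced consistency model could instead be used in the recall direction, by searching the document for additional spans that match the dominant signature.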
58.
FROntIER: A Framework for Extracting and Organizing Biographical Facts in Historical Documents / Park, Joseph. 01 January 2015 (has links) (PDF)
The tasks of entity recognition through ontological commitment, fact extraction and organization with respect to a target schema, and entity deduplication have all been examined in recent years, and systems exist that can perform each individual task. A framework combining all these tasks, however, is still needed to accomplish the goal of automatically extracting and organizing biographical facts about persons found in historical documents into disambiguated entity records. We introduce FROntIER (Fact Recognizer for Ontologies with Inference and Entity Resolution) as the framework to recognize and extract facts using an ontology and organize facts of interest through inferring implicit facts using inference rules, a target ontology, and entity resolution. We give two case studies of FROntIER's performance over a few select pages from The Ely Ancestry [BEV02] and Index to The Register of Marriages and Baptisms in the Parish of Kilbarchan, 1649-1772 [Gra12].
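FROntIER itself works over an ontology with entity resolution, but the step of inferring implicit facts with inference rules can be illustrated, in spirit only, by a naive forward-chaining loop. The relation names and the sibling rule below are invented for the example; they are not FROntIER's actual target ontology.

```python
def infer(facts, rules):
    """Naive forward chaining: apply every rule until no new fact appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new_fact in list(rule(facts)):
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

def sibling_rule(facts):
    """Illustrative rule: two children of the same parent are siblings."""
    parents = [(p, c) for (rel, p, c) in facts if rel == "parentOf"]
    for p1, c1 in parents:
        for p2, c2 in parents:
            if p1 == p2 and c1 != c2:
                yield ("siblingOf", c1, c2)

# Hypothetical facts extracted from a genealogical page
facts = {("parentOf", "Richard Ely", "Mary Ely"),
         ("parentOf", "Richard Ely", "William Ely")}
derived = infer(facts, [sibling_rule])
```

In a full pipeline, the derived facts would then pass through entity resolution so that records mentioning the same person merge into one disambiguated entity.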
59.
Tang dynasty clothing folds information extraction based on single images / Zhu, Y.L.; Liu, Y.Q.; Wan, Tao Ruan; Wu, T. January 2014 (has links)
60.
Autonomous Consolidation of Heterogeneous Record-Structured HTML Data in Chameleon / Chouvarine, Philippe. 07 May 2005 (has links)
While progress has been made in querying digital information contained in XML and HTML documents, success in retrieving information from the so-called "hidden Web" (data behind Web forms) has been modest. There has been a nascent trend of developing autonomous tools for extracting information from the hidden Web. Automatic tools for ontology generation, wrapper generation, Web-form querying, response gathering, etc., have been reported in recent research. This thesis presents a system called Chameleon for automatic querying of, and response gathering from, the hidden Web. The approach to response gathering is based on automatic table-structure identification, since most information repositories of the hidden Web are structured databases, and so the information returned in response to a query will exhibit regularities. Information extraction from the identified record structures is performed using domain knowledge corresponding to the domain specified in a query. So-called "domain plug-ins" make the dynamically generated wrappers domain-specific rather than document-specific, as is conventional.
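Chameleon's table-structure identification is of course more sophisticated, but the regularity it exploits, namely that response records share the same markup structure, can be sketched with Python's standard html.parser: the tag path that repeats most often on a result page is taken to mark the records. This is a minimal sketch under that single assumption, not Chameleon's algorithm.

```python
from collections import Counter
from html.parser import HTMLParser

class RecordFinder(HTMLParser):
    """Collects the tag path of every text node; the most repeated
    path is taken as the record structure of the result page."""
    def __init__(self):
        super().__init__()
        self.path = []
        self.texts = []  # (tag-path, text) pairs

    def handle_starttag(self, tag, attrs):
        self.path.append(tag)

    def handle_endtag(self, tag):
        if tag in self.path:
            # pop back to the matching open tag (tolerates sloppy HTML)
            while self.path and self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(("/".join(self.path), text))

def extract_records(html):
    """Return the texts found at the page's most repeated tag path."""
    finder = RecordFinder()
    finder.feed(html)
    paths = Counter(path for path, _ in finder.texts)
    record_path, _ = paths.most_common(1)[0]
    return [text for path, text in finder.texts if path == record_path]
```

A domain plug-in, in this simplified picture, would then map the extracted record fields onto domain concepts instead of hard-coding a wrapper per site.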