21 |
Semantic Research for Digital Libraries
Chen, Hsinchun, 10 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / As applications become more pervasive, pressing, and diverse, several well-known information retrieval (IR) problems have become even more urgent. Information overload, a result of the ease of information creation and transmission via the Internet and WWW, has become more troublesome (e.g., even stockbrokers and elementary school students, heavily exposed to various WWW search engines, are versed in such IR terminology as recall and precision). Significant variations in database formats and structures, the richness of information media (text, audio, and video), and an abundance of multilingual information content have also created severe information interoperability problems: structural interoperability, media interoperability, and multilingual interoperability.
|
22 |
Cognitive Process as a Basis for Intelligent Retrieval Systems Design
Chen, Hsinchun, Dhar, Vasant, January 1991
Artificial Intelligence Lab, Department of MIS, University of Arizona / Two studies were conducted to investigate the cognitive processes involved in online document-based information retrieval. These studies led to the development of five computational models of online document retrieval. These models were then incorporated into the design of an "intelligent" document-based retrieval system. After describing this system, we discuss the broader implications of our research for the design of information retrieval systems.
|
23 |
Validating a Geographic Image Retrieval System
Zhu, Bin, Chen, Hsinchun, January 2000
Artificial Intelligence Lab, Department of MIS, University of Arizona / This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment that validates the performance of this image retrieval system against that of human subjects, in an effort to address the scarcity of research evaluating the performance of an algorithm against that of human beings. The results of the experiment indicate that the system could perform as well as human subjects on the tasks of similarity analysis and image categorization. We also found that, under some circumstances, the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms.
|
24 |
Interactive Term Suggestion for Users of Digital Libraries: Using Subject Thesauri and Co-occurrence Lists for Information Retrieval
Schatz, Bruce R., Johnson, Eric H., Cochrane, Pauline A., Chen, Hsinchun, January 1996
Artificial Intelligence Lab, Department of MIS, University of Arizona / The basic problem in information retrieval is that large-scale searches can only match terms specified by the user to terms appearing in documents in the digital library collection. Intermediate sources that support term suggestion can thus enhance retrieval by providing alternative search terms for the user. Term suggestion increases recall, while interaction enables the user to try to avoid decreasing precision. We are building a prototype user interface that will become the Web interface for the University of Illinois Digital Library Initiative (DLI) testbed. It supports the principle of multiple views, where different kinds of term suggestors can be used to complement search and each other. This paper discusses its operation with two complementary term suggestors, subject thesauri and co-occurrence lists, and compares their utility. Thesauri are generated by human indexers and place selected terms in a subject hierarchy. Co-occurrence lists are generated by computer and list all terms in order of how often they occur together. This paper concludes with a discussion of how multiple views can help provide good-quality search for the Net. This is a paper about the design of a retrieval system prototype that allows users to simultaneously combine terms offered by different suggestion techniques, not about comparing the merits of each in a systematic and controlled way. It offers no experimental results.
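The abstract describes co-occurrence lists only as computer-generated lists of terms ordered by how often they occur together. A minimal Python sketch of that idea follows, assuming each document has already been reduced to a list of index terms and that co-occurrence is counted at the document level. The function names `build_cooccurrence` and `suggest` are hypothetical and do not come from the DLI prototype.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for every pair of distinct terms, how many documents contain both."""
    counts = defaultdict(Counter)
    for doc in docs:
        terms = sorted(set(doc))  # each term counted at most once per document
        for a, b in combinations(terms, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def suggest(counts, term, k=5):
    """Return up to k terms that co-occur with `term`, most frequent first."""
    neighbours = counts.get(term, Counter())  # .get avoids adding unknown terms
    # Ties are broken alphabetically so the ordering is deterministic.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


docs = [
    ["digital", "library", "retrieval"],
    ["digital", "library", "thesaurus"],
    ["retrieval", "thesaurus", "library"],
]
co = build_cooccurrence(docs)
print(suggest(co, "library"))  # ['digital', 'retrieval', 'thesaurus']
```

Counting each pair once per document keeps long documents from dominating the list; a real suggestor would likely also normalize or weight the counts, which this sketch does not attempt.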
|
25 |
Machine Learning for Information Retrieval: Neural Networks, Symbolic Learning, and Genetic Algorithms
Chen, Hsinchun, 04 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to “intelligent” information retrieval and indexing. More recently, information science researchers have turned to other, newer artificial-intelligence-based inductive learning techniques, including neural networks, symbolic learning, and genetic algorithms. These newer techniques, which are grounded in diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information storage and retrieval systems. In this article, we first provide an overview of these newer techniques and their use in information science research. To familiarize readers with these techniques, we present three popular methods: the connectionist Hopfield network, the symbolic ID3/ID5R, and evolution-based genetic algorithms. We discuss their knowledge representations and algorithms in the context of information retrieval. Sample implementation and testing results from our own research are also provided for each technique. We believe these techniques are promising in their ability to analyze user queries, identify users’ information needs, and suggest alternatives for search. With proper user-system interactions, these methods can greatly complement the prevailing full-text, keyword-based, probabilistic, and knowledge-based techniques.
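As a rough illustration of how a Hopfield-style network can support concept exploration in retrieval, the Python sketch below spreads activation from a user's query terms through a weighted network of related terms until the activations stop changing, then ranks the remaining terms by activation. The sigmoid f(net) = 1 / (1 + exp((θ − net) / θ0)), the threshold values, clamping the query terms at 1.0, and the convergence test are assumptions chosen for illustration, not the article's published parameters; `hopfield_explore` is a hypothetical name.

```python
import math


def hopfield_explore(weights, seeds, theta=0.5, theta0=0.1, eps=1e-4, max_iter=100):
    """Spread activation from seed concepts through a weighted concept network.

    weights: dict concept -> dict neighbour -> association weight t_ij (edge i -> j)
    seeds:   the user's query concepts; clamped at activation 1.0 (an assumption)
    theta, theta0: threshold and steepness of the sigmoid (illustrative values)
    """
    nodes = set(weights) | {n for nbrs in weights.values() for n in nbrs}
    act = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    for _ in range(max_iter):
        new = {}
        for j in nodes:
            if j in seeds:
                new[j] = 1.0  # keep the user's own terms fully active
                continue
            # Net input to j: sum over i of t_ij * mu_i(t) from the previous step.
            net = sum(w.get(j, 0.0) * act[i] for i, w in weights.items())
            new[j] = 1.0 / (1.0 + math.exp((theta - net) / theta0))
        delta = sum(abs(new[n] - act[n]) for n in nodes)
        act = new
        if delta <= eps:  # stop once the total activation change is negligible
            break
    others = (n for n in nodes if n not in seeds)
    return sorted(others, key=lambda n: (-act[n], n))  # most activated first


weights = {
    "neural network": {"hopfield net": 0.9, "genetic algorithm": 0.3},
    "information retrieval": {"thesaurus": 0.8, "hopfield net": 0.4},
    "hopfield net": {"parallel relaxation": 0.7},
}
ranking = hopfield_explore(weights, {"neural network", "information retrieval"})
print(ranking)  # ['hopfield net', 'thesaurus', 'parallel relaxation', 'genetic algorithm']
```

In this toy network, `parallel relaxation` is linked to neither query term and only becomes strongly active on the second pass, once `hopfield net` has been activated; that multi-step spread is what distinguishes this kind of exploration from a single thesaurus lookup.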
|
26 |
A Knowledge-Based Approach to the Design of Document-Based Retrieval Systems
Chen, Hsinchun, Dhar, Vasant, January 1990
Artificial Intelligence Lab, Department of MIS, University of Arizona / This article presents a knowledge-based approach to the design of document-based retrieval systems. We conducted two empirical studies investigating users' behavior with an online catalog. The studies revealed a range of knowledge elements that are necessary for performing a successful search. We proposed a semantic-network-based representation to capture these knowledge elements. The findings we derived from our empirical studies were used to construct a knowledge-based retrieval system. We performed a laboratory experiment to evaluate the search performance of our system. The experiment showed that our system outperformed a conventional retrieval system in recall and user satisfaction. The implications of our study for the design of document-based retrieval systems are also discussed in this article.
|
27 |
Apprentissage interactif de règles d'extraction d'information textuelle / Interactive learning of textual information extraction rules
Bannour, Sondes, 16 June 2015
Information Extraction is a discipline that emerged from Natural Language Processing to provide fine-grained analyses of text written in natural language and to improve the search for specific information. Information extraction techniques have evolved enormously over the last two decades. The first information extraction systems were based on manually written rules. Because writing rules by hand had become tedious, algorithms for learning rules automatically were developed. These algorithms, however, require writing a detailed annotation guide and then manually annotating a large number of training examples. To minimize the human effort required by both families of rule-development approaches, this thesis proposes a hybrid approach that combines the two in a single interactive system that proceeds over several iterations. This system, which we named IRIES, allows the user to work jointly on the information extraction rules and on the training examples. To implement the proposed approach, we proposed: a linguistic annotation pipeline for the text, together with an expressive rule language, so that written or inferred rules are understandable and generic; a strategy of learning on a reduced corpus, so as not to discriminate against positive examples that have not yet been annotated at a given iteration; a concordancer for writing prospective rules; and an active learning module (IAL4Sets) for the intelligent selection of examples. These proposals were implemented and evaluated on two corpora: the BioNLP-ST 2013 corpus and the SyntSem corpus.
A study of different combinations of the linguistic features used in rule expressions showed the impact of these features on rule performance. Learning on a reduced corpus yielded a considerable gain in training time without degrading performance. Finally, the proposed active learning module (IAL4Sets) improved on the performance of the baseline active learning of the WHISK algorithm by introducing a notion of distributional distance or similarity, which makes it possible to propose to the user examples that are semantically close to the positive examples already covered. / Not provided
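The thesis abstract says IAL4Sets improves WHISK's baseline active learning by proposing examples that are distributionally close to the positive examples already covered. The Python sketch below shows only that selection step, under simple assumptions: each example is represented by a sparse distributional vector (a dict of feature weights), closeness is cosine similarity, and a candidate's score is its similarity to the nearest covered positive. It is not the IAL4Sets implementation, and `select_for_annotation` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_for_annotation(candidates, positives, k=2):
    """Pick the k unlabelled candidates closest to any already-covered positive.

    candidates: dict example id -> distributional vector (dict feature -> weight)
    positives:  list of vectors for positive examples the current rules cover
    """
    scored = []
    for cid, vec in candidates.items():
        # Score = similarity to the nearest covered positive example.
        best = max((cosine(vec, p) for p in positives), default=0.0)
        scored.append((best, cid))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, then by id
    return [cid for _, cid in scored[:k]]


positives = [{"gene": 1.0, "expression": 1.0}]
candidates = {
    "s1": {"gene": 1.0, "regulation": 1.0},
    "s2": {"weather": 1.0, "forecast": 1.0},
    "s3": {"gene": 1.0, "expression": 1.0, "protein": 1.0},
}
picked = select_for_annotation(candidates, positives)
print(picked)  # ['s3', 's1']
```

Ranking by the nearest positive, rather than by the average over all positives, is a design choice made here so that a candidate resembling even one known positive is surfaced to the user.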
|
28 |
Representing Information Collections for Visual Cognition
Koh, Eunyee, 15 May 2009
The importance of digital information collections is growing. Collections are typically represented as text only, in a linear list format, which turns out to be a weak representation for cognition. We learned this from empirical research in cognitive psychology and by conducting a study to understand current practices, and the resulting breakdowns, in human experiences of building and utilizing collections. Because of limited human attention and memory, participants had trouble finding specific elements in their collections, resulting in low levels of collection utilization. To address these issues, this research develops new collection representations for visual cognition. First, we present the image+text surrogate, a concise representation of a document, or portion thereof, that is easy to understand and think about. An information extraction algorithm is developed to automatically transform a document into a small set of image+text surrogates. After refinement, the average accuracy of the algorithm was 90%. Then, we introduce the composition space to represent collections, which helps people connect elements visually in a spatial format. To ensure that diverse information from multiple sources is presented evenly in the composition space, we developed a new control structure, the ResultDistributor. A user study demonstrated that participants were able to browse more diverse information using the ResultDistributor-enhanced composition space. Participants also found it easier and more entertaining to browse information in this representation. This research can be applied to representing information resources in contexts such as search engines or digital libraries. The better representation will enhance the cognitive efficacy and enjoyment of people’s everyday tasks of information searching, browsing, collecting, and discovering.
|
29 |
Internet Categorization and Search: A Self-Organizing Approach
Chen, Hsinchun, Schuffels, Chris, Orwig, Richard E., January 1996
Artificial Intelligence Lab, Department of MIS, University of Arizona / The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative, concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded in automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and to improve Internet keyword searching and/or browsing.
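A Kohonen self-organizing map places document vectors onto a small grid so that similar documents land on the same or neighbouring nodes. The Python sketch below trains a single 2-D map with plain lists, assuming documents have already been turned into fixed-length keyword vectors. The grid size, the linear learning-rate and radius schedules, the Gaussian neighbourhood, and the function names are illustrative assumptions; the multilayered, hierarchical arrangement described in the abstract is not reproduced.

```python
import math
import random


def best_unit(grid, x):
    """Grid position whose weight vector is closest (Euclidean distance) to x."""
    return min(grid, key=lambda node: math.dist(grid[node], x))


def train_som(docs, rows=2, cols=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a small 2-D Kohonen map on equal-length document vectors.

    Returns a dict mapping each grid position (row, col) to its weight vector.
    Grid size and schedules are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(docs[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink linearly.
        frac = 1.0 - epoch / epochs
        lr, radius = lr0 * frac, max(radius0 * frac, 1e-3)
        for x in docs:
            bmu = best_unit(grid, x)
            for node, w in grid.items():
                d = math.dist(node, bmu)  # distance on the map grid, not in input space
                h = math.exp(-(d * d) / (2 * radius * radius))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull the node toward the document
    return grid


# Hypothetical keyword vectors: two "AI" homepages and two "arts" homepages.
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.9, 1.0],
]
grid = train_som(docs)
for i, d in enumerate(docs):
    print(i, best_unit(grid, d))
```

Each document is assigned to the node returned by `best_unit`, and documents sharing a node form one category. Which node a given group lands on depends on the random initial weights.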
|
30 |
GANNET: A machine learning approach to document retrieval
Chen, Hsinchun, Kim, Jinwoo, 12 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Information science researchers have recently turned to new artificial intelligence-based inductive learning techniques, including neural networks, symbolic learning, and genetic algorithms. An overview of the new techniques and their usage in information science research is provided. The algorithms adopted for GANNET, a hybrid system based on genetic algorithms and neural nets, are presented. GANNET used genetic algorithms to perform concept (keyword) optimization for user-selected documents during information retrieval. It then used the optimized concepts to perform concept exploration in a large network of related concepts through the Hopfield net parallel relaxation procedure. In an experiment on a test collection of about 3,000 articles from DIALOG with an automatically created thesaurus, using Jaccard's score as the performance measure, GANNET improved Jaccard's scores by about 50% and helped identify the underlying concepts that best describe the user-selected documents.
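The abstract names the ingredients of GANNET's first stage: a genetic algorithm searching over keyword sets, judged by Jaccard's score against the user-selected documents. The Python sketch below assumes that fitness is the average Jaccard's score |A ∩ B| / |A ∪ B| between a candidate keyword set and each selected document's keywords, and uses tournament selection, one-point crossover, bit-flip mutation, and elitism. These operators, all parameter values, and the name `optimize_keywords` are assumptions for illustration, not GANNET's published configuration.

```python
import random


def jaccard(a, b):
    """Jaccard's score |A & B| / |A | B| between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fitness(keywords, docs):
    """Assumed fitness: average Jaccard's score against every selected document."""
    return sum(jaccard(keywords, d) for d in docs) / len(docs)


def optimize_keywords(docs, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Evolve a keyword subset that describes the user-selected documents."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))  # candidate keywords: every keyword in the selection

    def decode(bits):
        return {k for k, b in zip(vocab, bits) if b}

    def score(bits):
        return fitness(decode(bits), docs)

    # Initial population: each document's own keywords, topped up with random subsets.
    pop = [[int(k in d) for k in vocab] for d in docs]
    while len(pop) < pop_size:
        pop.append([rng.randint(0, 1) for _ in vocab])

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: the two best chromosomes survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection (size 3) picks each parent.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # One-point crossover, then bit-flip mutation.
            cut = rng.randrange(1, len(vocab)) if len(vocab) > 1 else 0
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]
            nxt.append(child)
        pop = nxt

    best = max(pop, key=score)
    return decode(best), score(best)


docs = [
    {"neural", "network", "retrieval"},
    {"genetic", "algorithm", "retrieval"},
    {"neural", "retrieval", "thesaurus"},
]
best_keywords, best_score = optimize_keywords(docs)
print(sorted(best_keywords), round(best_score, 3))
```

Because the two best chromosomes are copied unchanged into every generation, the best fitness never decreases, so the result is at least as good as the best document keyword set seeded into the initial population.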
|