11 |
Inductive Query by Examples (IQBE): A Machine Learning Approach / Chen, Hsinchun; She, Linlin / January 1994
Artificial Intelligence Lab, Department of MIS, University of Arizona / This paper presents an incremental, inductive learning approach to query-by-examples for information retrieval (IR) and database management systems (DBMS). After briefly reviewing conventional information retrieval techniques and the prevailing database query paradigms, we introduce the ID5R algorithm, previously developed by Utgoff, for "intelligent" and system-supported query processing. We describe in detail how we adapted the ID5R algorithm for IR/DBMS applications and we present two examples, one for IR applications and the other for DBMS applications, to demonstrate the feasibility of the approach. Using a larger test collection of about 1000 document records from the COMPEN CD-ROM computing literature database and using recall as a performance measure, our experiment showed that the incremental ID5R performed significantly better than a batch inductive learning algorithm (called ID3) which we developed earlier. Both algorithms, however, were robust and efficient in helping users develop abstract queries from examples. We believe this research has shed light on the feasibility and the novel characteristics of a new query paradigm, namely, inductive query-by-examples (IQBE). Directions of our current research are summarized at the end of the paper.
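As a rough illustration of how an ID3-style learner could pick the keyword that best separates relevant from irrelevant example documents, the sketch below computes the standard entropy-based information gain for each keyword. The toy documents, labels, and function names are hypothetical, and the sketch does not show the incremental tree restructuring that distinguishes ID5R or the authors' IR/DBMS adaptation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, keyword):
    """Entropy reduction obtained by splitting documents on keyword presence."""
    with_kw = [lab for doc, lab in zip(docs, labels) if keyword in doc]
    without_kw = [lab for doc, lab in zip(docs, labels) if keyword not in doc]
    total = len(labels)
    remainder = (len(with_kw) / total) * entropy(with_kw) \
        + (len(without_kw) / total) * entropy(without_kw)
    return entropy(labels) - remainder


def best_keyword(docs, labels):
    """Keyword with the highest information gain (candidate root of the tree)."""
    keywords = sorted(set().union(*docs))
    return max(keywords, key=lambda kw: information_gain(docs, labels, kw))


# Hypothetical examples: each document is a set of index terms,
# and the user has marked it relevant (True) or not (False).
docs = [
    {"retrieval", "learning"},
    {"retrieval", "database"},
    {"database", "hardware"},
    {"network", "learning"},
]
labels = [True, True, False, False]

print(best_keyword(docs, labels))
```

In a full decision-tree learner the same selection would be repeated on each subset of documents until every leaf is pure; the resulting tree can then be read as a query over keywords.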
|
12 |
Updateable PAT-Tree Approach to Chinese Key Phrase Extraction using Mutual Information: A Linguistic Foundation for Knowledge Management / Ong, Thian-Huat; Chen, Hsinchun / January 1999
Artificial Intelligence Lab, Department of MIS, University of Arizona / There has been renewed research interest in using the statistical approach to extraction
of key phrases from Chinese documents because existing approaches do not allow online
frequency updates after phrases have been extracted. This consequently results in
inaccurate, partial extraction. In this paper, we present an updateable PAT-tree
approach. In our experiment, we compared our approach with that of Lee-Feng Chien;
the comparison showed an improvement in recall from 0.19 to 0.43 and in precision from 0.52
to 0.70. This paper also reviews the requirements for a data structure that facilitates
implementation of any statistical approach to key-phrase extraction, including the PAT-tree,
PAT-array and suffix array with semi-infinite strings.
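To make the role of such data structures concrete, here is a minimal sketch of a suffix array used to count how often a candidate phrase occurs in a text, which is the kind of frequency lookup statistical extraction relies on. The function names and sample text are hypothetical. This is a static structure that must be rebuilt when the text changes, so it does not show the online frequency updates that the paper's updateable PAT-tree is designed for.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, phrase):
    """Count occurrences of phrase by binary search over the sorted suffixes."""
    m = len(phrase)

    def prefix(rank):
        # First m characters of the suffix at the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


# Hypothetical sample text; no word segmentation is needed because
# every character position starts a suffix.
sample_text = "信息检索与信息抽取"
sample_sa = build_suffix_array(sample_text)
print(count_occurrences(sample_text, sample_sa, "信息"))
```

Sorting full suffixes as done here is quadratic in the worst case and is meant only to keep the sketch short; production implementations use specialized construction algorithms.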
|
13 |
Multilingual input system for the Web - an open multimedia approach of keyboard and handwritten recognition for Chinese and Japanese / Ramsey, Marshall C.; Ong, Thian-Huat; Chen, Hsinchun / January 1998
Artificial Intelligence Lab, Department of MIS, University of Arizona / The basic building block of a multilingual information
retrieval system is the input system. Chinese and
Japanese characters pose great challenges for the
conventional 101-key alphabet-based keyboard, because
they are radical-based and number in the thousands. This
paper reviews the development of various approaches and
then presents a framework and working demonstrations of
Chinese and Japanese input methods implemented in
Java, which allow open deployment over the web to any
platform. The demo includes both popular keyboard input
methods and neural network handwriting recognition
using a mouse or pen. This framework is able to
accommodate future extension to other input media
and languages of interest.
|
14 |
Concept-based searching and browsing: a geoscience experiment / Hauck, Roslin V.; Sewell, Robin R.; Ng, Tobun Dorbin; Chen, Hsinchun / January 2001
Artificial Intelligence Lab, Department of MIS, University of Arizona / In the recent literature, we have seen the expansion of information retrieval techniques to include a variety of different collections of information. Collections can have certain characteristics that can lead to different results for the various classification techniques. In addition, the ways and reasons that users explore each collection can affect the success of the information retrieval technique. The focus of this research was to extend the application of our statistical and neural network techniques to the domain of geological science information retrieval. For this study, a test bed of 22,636 geoscience abstracts was obtained through the NSF/DARPA/NASA-funded Alexandria Digital Library Initiative project at the University of California at Santa Barbara. This collection was analyzed using algorithms previously developed by our research group: a concept space algorithm for searching and a Kohonen self-organizing map (SOM) algorithm for browsing. Included in this paper are discussions of our techniques, user evaluations and lessons learned.
|
15 |
DGPort: A Web Portal for Digital Government / Yin, C.Q.; Nickels, L.D.; Chen, C.Z.; Ng, Gavin; Chen, Hsinchun / January 2003
Artificial Intelligence Lab, Department of MIS, University of Arizona / This paper provides a summary of the initial development of a Web portal for the digital government
domain. Information retrieval techniques commonly used to find information on the Internet are
discussed along with the problems associated with these techniques that led to the development of the
Digital Government Web portal (DGPort). We also discuss the advantages that DGPort could have for
researchers in the digital government domain as well as the value-added features that this portal provides.
Future evaluation plans for the portal are also described.
|
16 |
An Issues Identifier for Online Financial Databases / Yen, J.; Chen, Hsinchun; Ma, P.; Bui, T. / January 1995
Artificial Intelligence Lab, Department of MIS, University of Arizona / A major problem that decision makers face in an information-rich society is how to absorb, filter and make effective use of available data. The problem caused by information overflow could lead to a loss of competitiveness. This paper presents a knowledge-based approach to building an issues identifier to help investors overcome information overflow problems when dealing with very large on-line financial databases. The proposed software system is able to extract critical issues from the on-line financial databases. The system was developed based on a number of techniques: automatic indexing, concept space generation, and neural network classification. In this paper, we describe how these techniques are used to extract subject descriptors, their semantic relationships, and the texts (documents or paragraphs) related to each descriptor. The proposed system has been tested with the annual reports from thirteen of the largest international banks.
|
17 |
Semantic Retrieval for the NCSA Mosaic / Chen, Hsinchun; Schatz, Bruce R. / January 1994
Artificial Intelligence Lab, Department of MIS, University of Arizona / In this paper we report an automatic and scalable concept space approach to enhancing the deep searching capability of the NCSA Mosaic. The research, which is based on the findings from a previous NSF National Collaboratory project and which will be expanded in a new Illinois NSF/ARPA/NASA Digital Library project, centers around semantic retrieval and user customization. Semantic retrieval supports a higher level of abstraction in user search, which can overcome the vocabulary problem for information retrieval. Rather than searching for words within the object space, the search is for terms within a concept space (graph of terms occurring within objects linked to each other by the frequency with which they occur together). Co-occurrence graphs seem to provide good suggestive power in specialized domains, such as biology. By providing a more understandable, system-generated, semantics-rich concept space as an abstraction of the enormously complex object space plus algorithms and interface to assist in object/concept spaces traversal, we believe we can greatly alleviate both information overload and the vocabulary problem of internet services. These techniques will also be used to provide a form of customized retrieval and automatic information routing. Results from past research, the specific algorithms and techniques, and the research plan for enhancing the NCSA Mosaic's search capability in the NSF/ARPA/NASA Digital Library project will be discussed.
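The idea of a concept space as a graph of terms linked by how often they occur together can be sketched as follows. This is only an illustration under simplifying assumptions: it uses raw, symmetric document co-occurrence counts as link weights, whereas the abstract does not specify the weighting function, and the sample documents and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_space(documents):
    """Map each term to {neighbor: number of documents containing both terms}."""
    graph = defaultdict(lambda: defaultdict(int))
    for terms in documents:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def suggest_terms(graph, term, limit=3):
    """Return the terms most strongly linked to the given term."""
    neighbors = graph.get(term, {})
    # Highest co-occurrence first; ties broken alphabetically for stable output.
    ranked = sorted(neighbors.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical indexed documents from a biology collection.
documents = [
    ["gene", "protein", "worm"],
    ["gene", "protein"],
    ["gene", "sequence"],
    ["worm", "neuron"],
]
concept_space = build_concept_space(documents)
print(suggest_terms(concept_space, "gene"))
```

A searcher who enters one term can then be shown its strongest neighbors as alternative query terms, which is one way such a graph could help with the vocabulary problem.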
|
18 |
A Graph Model for E-Commerce Recommender Systems / Huang, Zan; Chung, Wingyan; Chen, Hsinchun / January 2004
Artificial Intelligence Lab, Department of MIS, University of Arizona / Information overload on the Web has created enormous
challenges to customers selecting products for online
purchases and to online businesses attempting to identify
customers' preferences efficiently. Various recommender
systems employing different data representations
and recommendation methods are currently used
to address these challenges. In this research, we developed
a graph model that provides a generic data representation
and can support different recommendation
methods. To demonstrate its usefulness and flexibility,
we developed three recommendation methods: direct
retrieval, association mining, and high-degree association
retrieval. We used a data set from an online bookstore
as our research test-bed. Evaluation results
showed that combining product content information and
historical customer transaction information achieved
more accurate predictions and relevant recommendations
than using only collaborative information. However,
comparisons among different methods showed
that high-degree association retrieval did not perform
significantly better than the association mining method
or the direct retrieval method in our test-bed.
|
19 |
A Path to Concept-based Information Access: From National Collaboratories to Digital Libraries / Houston, Andrea L.; Chen, Hsinchun / January 2000
Artificial Intelligence Lab, Department of MIS, University of Arizona / This research aims to provide a semantic, concept-based retrieval option that could supplement existing information retrieval options. Our proposed approach is based on textual analysis of a large corpus of domain-specific documents in order to generate a large set of subject vocabularies. By adopting cluster analysis techniques to analyze the co-occurrence probabilities of the subject vocabularies, a similarity matrix of vocabularies can be built to represent the important concepts and their weighted “relevance” relationships in the subject domain. To create a network of concepts, which we refer to as the “concept space” for the subject domain, we propose to develop general AI-based graph traversal algorithms and graph matching algorithms to automatically translate a searcher’s preferred vocabularies into a set of the most semantically relevant terms in the database’s underlying subject domain. By providing a more understandable, system-generated, semantics-rich concept space plus algorithms to assist in concept/information spaces traversal, we believe we can greatly alleviate both information overload and the vocabulary problem. In this chapter, we first review our concept space approach and the associated algorithms in Section 2. In Section 3, we describe our experience in using such an approach. In Section 4, we summarize our research findings and our plan for building a semantics-rich Interspace for the Illinois Digital Library project.
|
20 |
Knowledge-Based Document Retrieval: Framework and Design / Chen, Hsinchun / 06 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / This article presents research on the design of knowledge-based document retrieval systems. We adopted a semantic network structure to represent subject knowledge and classification scheme knowledge and modeled experts' search strategies and user modeling capability as procedural knowledge. These functionalities were incorporated into a prototype knowledge-based retrieval system, Metacat. Our system, the design of which was based on the blackboard architecture, was able to create a user profile, identify task requirements, suggest heuristics-based search strategies, perform semantic-based search assistance, and assist online query refinement.
|