Tennis, Joseph T.
This is a revised version of the paper available at: http://www.cais-acsi.ca/proceedings/2007/tennis_2007.pdf / This paper outlines a model of conceptual change in indexing languages. Findings from this modeling effort point to three ways in which meaning and relationships are established and then change in an indexing language. These ways (structural, terminological, and textual) point to ways indexing language metadata can aid in managing conceptual change in indexing languages. Résumé : Cette communication esquisse un modèle du changement conceptuel des langages d’indexation. Les résultats de cette tentative de modélisation convergent vers une triple dimension. Les relations sont établies, puis modifiées dans un langage d’indexation. Ces dimensions, structurelle, terminologique et textuelle, indiquent de quelle manière les langages de métadonnées peuvent contribuer à la gestion du changement conceptuel des langages d’indexation.
Everything Old is New Again: Perspectivism and Polyhierarchy in Julius O. Kaiser's Theory of Systematic Indexing / Dousa, Thomas, January 2007
In the early years of the 20th century, Julius Otto Kaiser (1868–1927), a special librarian and indexer of technical literature, developed a method of knowledge organization (KO) known as systematic indexing. Certain elements of the method - its stipulation that all indexing terms be divided into the fundamental categories "concretes", "countries", and "processes", which are then to be synthesized into indexing "statements" formulated according to strict rules of citation order - have long been recognized as precursors to key principles of the theory of faceted classification. However, other, less well-known elements of the method may prove no less interesting to practitioners of KO. In particular, two aspects of systematic indexing seem to prefigure current trends in KO: (1) a perspectivist outlook that rejects universal classifications in favor of information organization systems customized to reflect local needs and (2) the incorporation of index terms extracted from source documents into a polyhierarchical taxonomical structure. Kaiser's perspectivism anticipates postmodern theories of KO, while his principled use of polyhierarchy to organize terms derived from the language of source documents provides a potentially fruitful model that can inform current discussions about harvesting natural-language terms, such as tags, and incorporating them into a flexibly structured controlled vocabulary.
Jayakanth, Francis, Shivaram, B.S., Venkatalakshmi, K., Singh, Sukhdev
CDS/ISIS is an advanced non-numerical information storage and retrieval software package developed by UNESCO since 1985 to satisfy the need expressed by many institutions, especially in developing countries, to streamline their information processing activities by using modern (and relatively inexpensive) technologies. CDS/ISIS is available for MS-DOS, Windows and Unix operating system platforms. The formatting language of CDS/ISIS is one of its several strengths: it is used not only for formatting records for display but also for creating customized indexes. CDS/ISIS by itself does not support publishing its databases on the Internet or on CD-ROMs. However, a number of open source tools are now available that enable publishing CDS/ISIS databases on the Internet and on CD-ROMs. In this paper, we discuss ways of integrating CDS/ISIS databases with GSDL, an open source digital library (DL) software package.
A preliminary investigation of image indexing: The influence of domain knowledge, indexer experience and image characteristics / Beaudoin, Joan, January 2008
This study concerns image indexing and the impact of indexer experience levels and subject expertise on interindexer consistency and term selection. While the inherent complexities of applying terms to images are broadly acknowledged, few studies have addressed interindexer consistency for visual materials. Two studies that investigate this topic are those of Markey (1984) and Wells-Angerer (2005). Markey's investigation looked at the indexing terms applied by thirty-nine individuals to one hundred images of medieval works in three different categories (objects, expressional, events). Markey reported a low percentage of agreement, with an average of seven percent for exact term matches and thirteen percent for conceptual matches in indexing terms. In a study assessing the influence of indexer subject knowledge on image retrieval rates of online museum collections, Wells-Angerer (2005) investigated the terms applied to ten works of art by thirty participants falling into three categories of image indexers (expert, knowledgeable, novice). Wells-Angerer found that the terms applied by indexers with the highest level of knowledge about the objects in the collections (scholars, curators and collection staff) had retrieval success rates of approximately sixteen percent. Retrieval rates for indexers who had less subject knowledge were considerably lower, at approximately five percent (Wells-Angerer, 2005). The results of this investigation indicate that indexer experience and subject expertise ought to be considered in discussions of interindexer consistency. Markey's study has been used on several occasions to support the hypothesis that image indexing produces low returns for the effort involved in the work. This is remarkable, as Markey (1984) states that '[t]he use of inexperienced indexers and non-subject specialists in this study may have diminished interindexer consistency scores.'
The limited number of studies investigating the practices of image indexers and the conflicting results of these two studies indicate that additional research is warranted in the area of image indexing. Thus, the present study was undertaken to explore some of the issues that influence image indexing. Using a Web-based questionnaire, 140 participants provided demographic data and indexing terms for eight images. Images of cultural works formed the focus of the study. Several documentary-style photographs were included, however, to assess the influence of an image's subject accessibility and mode of representation on the terms chosen by the study's participants. For data analysis purposes, the participants were divided into several groups according to their subject expertise (two or fewer courses, or eleven or more courses, with an art/cultural focus) or their professional experience (self-identification as an image indexer). The data collected from the participants have been analyzed using qualitative methods and descriptive statistics. Further analysis of the data using quantitative methods is in process. Subject expertise and indexing experience were found to have an impact on the terms applied to images. The number of terms applied and the co-occurrence of terms were related to the level of indexing experience and subject expertise of participants. On the most basic level of analysis, the experienced image indexers provided on average the highest number of terms per image, with the subject experts supplying a slightly reduced number and the subject novice participants the fewest. Co-occurrence of applied terms among participant groups also followed this pattern. In addition, the images themselves were found to have an influence on the number of terms applied and the interindexer consistency achieved by the indexers of these images.
Legible images, those with easily accessible subjects and realistic representation, scored well in terms of interindexer consistency but were found to receive fewer term applications from the image indexers and the subject experts. This finding suggests that while interindexer consistency might be highest among skilled indexers and those with solid domain knowledge, a broader range of terms was sometimes applied to images with readily accessible subjects by those individuals who lacked training or subject expertise. Other interesting findings of the study concern the kinds of terms applied by the three groups. The subject novices applied a greater number of generic terms to the images, while the indexers and subject experts provided a higher number of terms that identified specific features of the image. Finally, while the number of emotive or interpretive terms applied to the images was found to be very low across all three groups, the subject novices applied these terms more often than the other participant groups. The results of this study provide a preliminary account of the influence of subject knowledge and indexing experience on image indexing.
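The abstract above does not say how interindexer consistency was computed. As a minimal sketch, assuming a simple pairwise measure (terms shared by two indexers divided by all distinct terms either of them used, i.e. a Jaccard overlap on exact term matches), consistency for one image could be calculated as follows. The function names and the sample terms are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pairwise_consistency(terms_a, terms_b):
    """Exact-match overlap between two indexers' term lists.

    Assumption: consistency = shared terms / all distinct terms used by
    either indexer. Terms are compared case-insensitively.
    """
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_consistency(term_lists):
    """Average pairwise consistency over every pair of indexers."""
    pairs = list(combinations(term_lists, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_consistency(a, b) for a, b in pairs) / len(pairs)


# Hypothetical terms supplied by three indexers for a single image.
indexers = [
    ["cathedral", "gothic", "stone"],
    ["Cathedral", "church", "gothic"],
    ["building", "cathedral"],
]

print(pairwise_consistency(indexers[0], indexers[1]))
print(round(mean_consistency(indexers), 3))
```

A measure of this kind only captures exact term matches; the conceptual matches mentioned above would need an additional mapping of synonyms to a shared concept before comparison.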
Folksonomies vs. Bag-of-Words: The Evaluation & Comparison of Different Types of Document Representations / Gruzd, Anatoliy A, January 2006
This poster (2-page summary) was presented at The 17th Annual SIG/CR Classification Research Workshop, a part of the 2006 Annual Meeting of the American Society for Information Science and Technology (ASIST), November 4, 2006, Austin, Texas. Among the factors that influence the effectiveness of retrieval systems, the most influential is the quality of document representation (docrep) (Lancaster, 1998). Most Internet search engines rely on docreps automatically extracted from web pages (commonly called Bag-of-Words). Unfortunately, this automatic approach often introduces noise (items unrelated to the page's core topic) to docreps. One way to reduce noise is to utilize user-created docreps which are less susceptible to it. Until recently, it was impractical to rely on user-created docreps on Internet-size collections. This all changed when online bookmarking web-services such as citeulike.org and del.icio.us started to appear. These bookmarking web-services made it easier for the vast Internet communities to collaborate and produce community-generated descriptors (known as folksonomies). Due to their multi-representational nature (from various community members), folksonomies provide retrieval systems with docreps that tend to be more user-oriented. With this observation in mind, I am investigating whether folksonomies-based retrieval systems would yield more relevant results than conventional systems.
Mao, Rui, 1975-
29 August 2008
Keywords in the mist: automated keyword extraction for very large documents and back of the book indexing / Csomai, Andras. Mihalcea, Rada F., January 2008
Thesis (Ph. D.)--University of North Texas, May, 2008. / Title from title page display. Includes bibliographical references.
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
An Automatic Indexing and Neural Network Approach to Concept Retrieval and Classification of Multilingual (Chinese-English) Documents / Lin, Chung-hsin, Chen, Hsinchun 02 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduced a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors was then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to generate additional semantically relevant terms for search. For concept classification and clustering, a variant of a Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups to represent (summarize) the subject matter of the database. The concept space approach to information classification and retrieval has been adopted by the authors in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This research reports our experiment on multilingual databases. Our system was initially developed in the MS-DOS environment, running the ETEN Chinese operating system. For performance reasons, it was then tested on a UNIX-based system. Due to the unique ideographic nature of the Chinese language, a Chinese term-phrase indexing paradigm considering the ideographic characteristics of Chinese was developed as a multilingual information classification model. By applying the neural network based concept classification technique, the model presents a novel way of organizing unstructured multilingual information.
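The co-occurrence step described above can be illustrated with a minimal sketch. It assumes that each record has already been reduced to a list of descriptors and uses a simplified asymmetric weight (records containing both terms divided by records containing the source term). This weight is a stand-in chosen for illustration, not the authors' actual formula, and the function names and sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Build term-to-term association weights from descriptor lists.

    Assumption: weight(a -> b) = records containing both a and b
    divided by records containing a, so weights are asymmetric.
    """
    doc_freq = Counter()   # number of records each descriptor appears in
    cooc = Counter()       # number of records each descriptor pair shares
    for descriptors in docs:
        unique = sorted(set(descriptors))
        doc_freq.update(unique)
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1

    space = defaultdict(dict)
    for (a, b), n in cooc.items():
        space[a][b] = n / doc_freq[a]
        space[b][a] = n / doc_freq[b]
    return space


def related_terms(space, term, k=3):
    """Return up to k descriptors most strongly associated with term."""
    neighbours = space.get(term, {})
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(neighbours, key=lambda t: (-neighbours[t], t))[:k]


# Hypothetical descriptor lists, one per bibliographic record.
docs = [
    ["neural network", "indexing", "retrieval"],
    ["indexing", "retrieval", "thesaurus"],
    ["neural network", "clustering"],
    ["retrieval", "thesaurus"],
]

space = build_concept_space(docs)
print(related_terms(space, "indexing", k=2))
```

In a search setting, the terms returned by `related_terms` would be offered as candidate expansions for the user's query term, much as a thesaurus suggests related terms.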
12 April 2006
Back-of-the-book indexes are usually only printed in non-fiction books. This research investigated the opinions of literature faculty and students on including indexes in fiction books. Publishers may claim that an index for a fiction book is not worth the cost. However, no empirical studies have been conducted which try to assess demand. In order to begin to fill this gap in the literature, a survey was distributed to literature faculty and students at the University of North Carolina at Chapel Hill in order to assess their opinions towards the usefulness and value of fiction book indexes. The results suggest that there is a demand for indexes in fiction but some concerns may need to be addressed first. The results of this study may serve as a starting point for gauging market interest in buying fictional works printed with indexes that could potentially lead to a new field in indexing.