About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Inference networks for document retrieval

Turtle, Howard Robert 01 January 1991 (has links)
Information retrieval is concerned with selecting documents from a collection that will be of interest to a user with a stated information need or query. Research aimed at improving the performance of retrieval systems, that is, selecting those documents most likely to match the user's information need, remains an area of considerable theoretical and practical importance. This dissertation describes a new formal retrieval model that uses probabilistic inference networks to represent documents and information needs. Retrieval is viewed as an evidential reasoning process in which multiple sources of evidence about document and query content are combined to estimate the probability that a given document matches a query. This model generalizes several current retrieval models and provides a framework within which disparate information retrieval research results can be integrated. To test the effectiveness of the inference network model, a retrieval system based on the model was implemented. Two test collections were built and used to compare retrieval performance with that of conventional retrieval models. The inference network model gives substantial improvements in retrieval performance, with computational costs that are comparable to those of conventional retrieval models and feasible for large collections.
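As a rough illustration of how such a network combines evidence, the sketch below implements the closed-form belief-combination operators commonly associated with this family of models (and with the INQUERY system later built on it); the belief values are invented.

```python
from math import prod

def belief_and(beliefs):
    # Conjunctive combination: product of the parents' beliefs.
    return prod(beliefs)

def belief_or(beliefs):
    # Disjunctive combination: complement of the product of complements.
    result = 1.0
    for b in beliefs:
        result *= 1.0 - b
    return 1.0 - result

def belief_wsum(beliefs, weights):
    # Weighted-sum combination: weighted average of the parents' beliefs.
    return sum(w * b for w, b in zip(weights, beliefs)) / sum(weights)

# Beliefs that a document supports each of three query concepts,
# as might be estimated from term evidence (values are made up).
beliefs = [0.8, 0.55, 0.4]
print(belief_and(beliefs))              # strict match: 0.176
print(belief_or(beliefs))               # loose match: 0.946
print(belief_wsum(beliefs, [2, 1, 1]))  # query-weighted: 0.6375
```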
22

Sentence level information patterns for novelty detection

Li, Xiaoyan 01 January 2006 (has links)
The detection of new information in a document stream is an important component of many potential applications. In this thesis, a new novelty detection approach based on the identification of sentence-level information patterns is proposed. Given a user's information need, some information patterns in sentences, such as combinations of query words, sentence lengths, named entities and phrases, and other sentence patterns, may contain more important and relevant information than single words. The work of the thesis comprises three parts. First, we redefine "what is novelty detection" in the light of the proposed information patterns. Examples of several different types of information patterns are given, corresponding to different types of users' information needs. Second, we analyze why the proposed information pattern concept has a significant impact on novelty detection. A thorough analysis of sentence-level information patterns is carried out on data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence-level opinion patterns. Finally, we present how we perform novelty detection based on information patterns, focusing on the identification of previously unseen query-related patterns in sentences. A unified pattern-based approach to novelty detection is presented for both specific NE topics and more general topics. Experiments on novelty detection were carried out on data from the TREC 2002, 2003 and 2004 novelty tracks. Experimental results show that the proposed approach significantly improves the performance of novelty detection for both specific and general topics, and therefore the overall performance for all topics, in terms of precision at top ranks. Future research directions are suggested.
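A toy sketch of the core idea, pattern-level rather than word-level novelty: a sentence is flagged as novel when enough of its query-related patterns have not been seen earlier in the stream. Here co-occurring query-term pairs stand in for the thesis's richer patterns (named entities, sentence lengths, opinion patterns); the query, sentences, and threshold are invented.

```python
def extract_patterns(sentence, query_terms):
    # Toy extractor: unordered pairs of query terms co-occurring in the
    # sentence stand in for richer sentence-level patterns.
    words = {w.strip(".,").lower() for w in sentence.split()}
    hits = sorted(words & query_terms)
    return {(a, b) for i, a in enumerate(hits) for b in hits[i + 1:]}

def novelty_filter(sentences, query_terms, threshold=0.5):
    seen, novel = set(), []
    for s in sentences:
        patterns = extract_patterns(s, query_terms)
        if patterns:
            if len(patterns - seen) / len(patterns) >= threshold:
                novel.append(s)
            seen |= patterns
    return novel

query = {"merger", "airline", "regulators", "antitrust"}
stream = [
    "Regulators reviewed the airline merger on Monday.",
    "The airline merger was reviewed by regulators.",  # nothing new
    "Antitrust regulators blocked the merger.",        # new pattern pairs
]
print(novelty_filter(stream, query))  # keeps the first and third sentences
```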
23

The smoothed Dirichlet distribution: Understanding cross-entropy ranking in information retrieval

Nallapati, Ramesh 01 January 2006 (has links)
Unigram language modeling is a successful probabilistic framework for Information Retrieval (IR) that uses the multinomial distribution to model documents and queries. An important feature of this approach is the use of the empirically successful cross-entropy function between the query model and document models as the document ranking function. However, this function does not follow directly from the underlying models, and as such no justification has been available for its usage to date. Another related and interesting observation is that the naïve Bayes model for text classification uses the same multinomial distribution to model documents but, in contrast, employs as its scoring function the document log-likelihood, which follows directly from the model. Curiously, the document log-likelihood closely corresponds to cross-entropy, but to an asymmetric counterpart of the function used in language modeling. It has been empirically demonstrated that the version of cross-entropy used in IR is a better performer than the document log-likelihood, but this interesting phenomenon remains largely unexplained. One of the main objectives of this work is to develop a theoretical understanding of the reasons for the success of the version of the cross-entropy function used for ranking in IR. We also aim to construct a likelihood-based generative model that directly corresponds to this cross-entropy function. Such a model, if successful, would allow us to view IR essentially as a machine learning problem. A secondary objective is to bridge the gap between the generative approaches used in IR and text classification through a unified model. In this work we show that the cross-entropy ranking function corresponds to the log-likelihood of documents w.r.t. the approximate Smoothed Dirichlet (SD) distribution, a novel variant of the Dirichlet distribution. We also empirically demonstrate that this new distribution captures term occurrence patterns in documents much better than the multinomial, thus offering a reason for the superior performance of the cross-entropy ranking function compared to the multinomial document-likelihood. Our experiments in text classification show that a classifier based on the Smoothed Dirichlet performs significantly better than the multinomial-based naïve Bayes model and on par with Support Vector Machines (SVMs), confirming our reasoning. In addition, this classifier is as quick to train as naïve Bayes and several times faster than SVMs owing to its closed-form maximum likelihood solution, making it ideal for many practical IR applications. We also construct a well-motivated generative classifier for IR based on the SD distribution that uses the EM algorithm to learn from pseudo-feedback, and show that its performance is equivalent to the Relevance Model (RM), a state-of-the-art model for IR in the language modeling framework that uses the same cross-entropy as its ranking function. In addition, the SD-based classifier provides more flexibility than RM in modeling documents owing to a consistent generative framework. We demonstrate that this flexibility translates into superior performance compared to RM on the task of topic tracking, an online classification task.
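For reference, the two scoring functions being contrasted, in standard language-modeling notation (θ_Q is the query model, θ_D the smoothed document model, c(w, D) the count of term w in document D):

```latex
% Cross-entropy ranking used in the language-modeling approach to IR:
\mathrm{score}(D;Q) = -\,\mathrm{CE}(\theta_Q \,\|\, \theta_D)
                    = \sum_{w} P(w \mid \theta_Q)\,\log P(w \mid \theta_D)

% Multinomial document log-likelihood used by naive Bayes (the asymmetric
% counterpart; the multinomial coefficient is constant w.r.t. the model):
\log P(D \mid \theta) = \sum_{w} c(w, D)\,\log P(w \mid \theta) + \mathrm{const}
```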
24

RA: A memory organization to model the evolution of scientific knowledge

Swaminathan, Kishore S 01 January 1990 (has links)
This dissertation addresses the dichotomy between semantic and episodic knowledge by focusing on the evolution of scientific knowledge. Even timeless scientific knowledge about the nature of the world accrues only through discrete episodes, with each scientist building upon the work of his/her predecessors. Hence, a memory organization to model the knowledge of a scientific field should reflect not only the knowledge pertaining to the field, but also the knowledge pertaining to the evolution of the field. A computer program called RA is described: RA proposes a memory organization for scientific knowledge in terms of a representational idea called Research Schemas. Research Schemas view research papers, not as isolated pieces of text, but as related episodes that contribute to the growth of a scientific discipline. This memory organization is validated by showing that it supports a number of different capabilities: it enables RA to suggest new research directions, acquire new research schemas, retrieve papers that have similar research strategies, and generate both chronological and analogical summaries of research papers. A combination of these capabilities constitutes a framework for 'Computer-Aided Research.' The RA system also includes a learning technique to acquire new research schemas. While similarity-based techniques use multiple examples (and some form of encoded bias) and explanation-based techniques use a domain theory as the basis for generalization, there is no apparent basis for RA's generalization. An analysis of RA's learning strategy shows that the category structure of RA's world provides a basis for its generalization: RA generalizes instantiations into categories that are both associative and discriminative. Interestingly, this turns out to be precisely the property that characterizes basic-level categories that have been studied by psychologists. This dissertation explores the implications of this result for learning and knowledge representation.
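A minimal sketch of the flavor of such a memory organization, with papers stored as schema instances rather than isolated texts; the slot names and sample entries are invented for illustration and are not RA's actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchSchema:
    # A paper as an episode: a strategy applied to a problem, building on
    # predecessor papers (slots invented for illustration).
    name: str
    problem: str
    strategy: str                 # e.g. "transfer-a-method"
    builds_on: list = field(default_factory=list)

papers = [
    ResearchSchema("P1", "parsing", "transfer-a-method"),
    ResearchSchema("P2", "parsing", "relax-an-assumption", builds_on=["P1"]),
    ResearchSchema("P3", "retrieval", "transfer-a-method", builds_on=["P1"]),
]

# One capability the dissertation attributes to RA: retrieving papers
# that share a research strategy with a given paper.
def similar_strategy(target, corpus):
    return [p.name for p in corpus
            if p is not target and p.strategy == target.strategy]

print(similar_strategy(papers[0], papers))  # ['P3']
```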
25

Information system capabilities and emergent competitive strategies: An investigation of the strategic fit of supply chain management information systems

McLaren, Tim 06 1900 (has links)
This study develops a model for analyzing fit between a firm's competitive strategies and the capabilities of its Supply Chain Management Information Systems (SCM IS). Concepts such as configurational theory, the resource-based view of the firm, and emergent strategies and capabilities, all of which are underutilized in current IS literature, ground the study theoretically. A positivist case study of five manufacturers is used to explore the constructs and identify appropriate measures for operationalizing the model. The developed model enables IS planners to quickly analyze their firm's competitive strategy patterns and determine the ideal level of support required for each SCM IS capability. Firms can improve the effectiveness of their IS and reduce the risk and cost of misfits by implementing information systems that fit their emergent competitive strategies. The developed model is a significant improvement over traditional models that advocate aligning information systems with a firm's intended strategies or their current functional requirements, both of which change more frequently than a firm's emergent competitive strategy patterns. The case study investigations yielded several important findings. First, Miles and Snow's (1978) competitive strategy typology proved useful for classifying a firm's emergent competitive strategy patterns and reducing the complexity of analysis. However, the qualitative evidence more strongly supported the use of Conant et al.'s (1990) multi-dimensional questionnaire measure of competitive strategy type rather than Miles and Snow's (1978) paragraph measure. Second, existing conceptualizations of IS capabilities were not well suited to analyzing SCM IS specifically. The findings support the conceptualization of SCM IS capabilities as the level of support provided for: operational efficiency, operational flexibility, planning, internal analysis, and external analysis. Finally, the empirical results strongly supported modeling the strategic fit of a firm's SCM IS as the amount by which the perceived level of support provided for each SCM IS capability falls short of the theoretically ideal level, rather than the more common approach of modeling strategic fit as the absolute deviation between perceived and theoretically ideal levels.
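The contrast in that last finding is easy to make concrete. In the sketch below, the five capability names come from the abstract, while the 1-7 support scores are invented.

```python
# Capability -> (theoretically ideal support, perceived support).
scores = {
    "operational efficiency":  (6, 4),
    "operational flexibility": (5, 6),
    "planning":                (7, 5),
    "internal analysis":       (4, 4),
    "external analysis":       (3, 5),
}

# Fit as shortfall: only under-support counts (the model the study supports).
shortfall = sum(max(0, ideal - perceived)
                for ideal, perceived in scores.values())

# Fit as absolute deviation: over-support is penalized too (the common approach).
deviation = sum(abs(ideal - perceived)
                for ideal, perceived in scores.values())

print(shortfall)  # 4: efficiency (+2) and planning (+2)
print(deviation)  # 7: also penalizes flexibility (1) and external analysis (2)
```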
26

SQL pattern design, development & evaluation of its efficacy

Al-Shuaily, Huda January 2013 (has links)
Databases provide the foundation of most software systems, which means that system developers will inevitably need to write code to query them. The de facto language for querying is SQL and this, consequently, is the language primarily taught by higher education institutions. There is some evidence that learners find it hard to master SQL. These issues and concerns were confirmed by reviewing the literature and establishing the scope and context. The literature review allowed extraction of the common issues impacting SQL acquisition, and these issues were confirmed and justified by the empirical evidence reported here. A model of SQL learning was derived. This framework comprises an SQL learning taxonomy and a model of SQL problem solving, and incorporates cross-cutting factors. The framework is used as a map for the proposed instructional design, which employed pattern concepts and the related research to structure SQL knowledge as SQL patterns. Also presented are details on how SQL patterns could be organized and presented; a strong theoretical background (checklist, component-level design) was employed to organize, present and facilitate the SQL pattern collection. The evaluation of the SQL patterns yielded new insights, such as novice problem-solving strategies and the types of errors students made in attempting to solve SQL problems. SQL patterns, as proposed as a result of this research, yielded a statistically significant improvement in novice performance in writing SQL queries. A longitudinal field study with a large number of learners in a flexible environment should be conducted to confirm the findings of this research.
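To give a flavor of what structuring SQL knowledge as patterns might look like, here is a hypothetical pattern entry expressed as a small Python record; the field names and example are invented for illustration and are not taken from the thesis's collection.

```python
# A hypothetical SQL pattern entry: a named problem-solution pair plus the
# forces and consequences a learner needs to weigh (all content invented).
top_n_per_group = {
    "name": "Top-N per group",
    "problem": "Find the highest-paid employee in each department.",
    "forces": "A bare GROUP BY cannot return non-aggregated columns.",
    "solution": (
        "SELECT e.* FROM employees e "
        "WHERE e.salary = (SELECT MAX(e2.salary) FROM employees e2 "
        "                  WHERE e2.dept_id = e.dept_id);"
    ),
    "consequences": "Correlated subquery; may be slow without an index "
                    "on (dept_id, salary).",
}

print(top_n_per_group["solution"])
```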
27

The role of citation in interdisciplinary discourse: an investigation into citation practices in the journal 'Global Environmental Change'

Aljabr, Fahad Saleh January 2018 (has links)
This thesis proposes an innovative model for citation analysis and applies it to 1186 citations derived from twenty papers from one interdisciplinary journal: Global Environmental Change. The main aim of this thesis is to build, not to quantify, a model which facilitates understanding of how citations act, and are acted upon, in citing texts. The model builds on, extends and modifies certain aspects of some existing models on citation form, stance and function. This thesis argues that stance and function are different but related concepts in the analysis of citation. They operate in different directions and, when combined, can reflect the role of citation in the citing text. In order to achieve a fine-grained understanding of the role of citation, citations are analysed within and beyond the level of the statements in which they occur. To achieve this, a new level is proposed for the analysis of citation function: the ‘citation block’. In this thesis, it is argued that citations operate in different directions within and beyond the proposition-level. The current thesis aligns and compares analyses at the clause- and block-levels for every citation. This alignment results in the identification of conventional and unconventional patterns of citing. The model is applied to four sub-corpora of texts from two time periods and representing the more ‘science-like’ and ‘social science-like’ papers in the journal. The text-based analysis demonstrates the complexity of citation practices in interdisciplinary discourse. Overall it is suggested that in this journal the ‘social science’ papers over time have become more similar to the ‘science’ papers. The results also show variation in citation practices between the individual selected papers in each sub-corpus. This variation is attributed to the interdisciplinary nature of GEC. The proposed model has the potential to be used to investigate variation in citation practices beyond interdisciplinary discourse, within and between disciplines or genres.
28

Intermediary XML schemas

Gartner, R. January 2018 (has links)
The methodology of intermediary XML schemas is introduced and its application to complex metadata environments is explored. Intermediary schemas are designed to mediate to other ‘referent’ schemas: instances conforming to these are not generally intended for dissemination but must usually be realized by XSLT transformations for delivery. In some cases, these schemas may also generate instances conforming to themselves. Three subsidiary methods of this methodology are introduced. The first is application-specific schemas that act as intermediaries to established schemas which are problematic by virtue of their over-complexity or flexibility. The second employs the METS packaging standard as a template for navigating instances of a complex schema by defining an abstract map of its instances. The third employs the METS structural map to define templates or conceptual models from which instances of metadata for complex applications may be realized by XSLT transformations. The first method is placed in the context of earlier approaches to semantic interoperability such as crosswalks, switching across, derivation and application profiles. The second is discussed in the context of such methods for mapping complex objects as OAI-ORE and the Fedora Content Model Architecture. The third is examined in relation to earlier approaches to templating within XML architectures. The relevance of these methods to contemporary research is discussed in three areas: digital ecosystems, archival description and Linked Open Data in digital asset management and preservation. Their relevance to future research is discussed in the form of suggested enhancements to each, a possible synthesis of the second and third to overcome possible problems of interoperability presented by the first, and their potential role in future developments in digital preservation. This methodology offers an original approach to resolving issues of interoperability and the management of complex metadata environments; it significantly extends earlier techniques and does so entirely within XML architectures.
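The mediation step described above, realizing a delivery instance from an intermediary instance via an XSLT transformation, can be sketched as follows; this assumes the lxml library, and the element names, the simplified Dublin Core target, and the stylesheet are invented for illustration.

```python
# A minimal sketch: an intermediary-schema instance is realized as a
# (much simplified) Dublin Core instance via XSLT, using lxml.
from lxml import etree

intermediary = etree.XML(
    "<record><title>Maps of Empire</title><year>1921</year></record>"
)

stylesheet = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/record">
    <metadata>
      <dc:title><xsl:value-of select="title"/></dc:title>
      <dc:date><xsl:value-of select="year"/></dc:date>
    </metadata>
  </xsl:template>
</xsl:stylesheet>
""")

realize = etree.XSLT(stylesheet)       # compile the transformation
referent_instance = realize(intermediary)
print(str(referent_instance))          # serialized delivery instance
```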
29

Virtue ethics and the narrative identity of American librarianship 1876 to present

Burgess, John Timothy Freedom 12 November 2013 (has links)
The purpose of this study is to propose a means of reconciling the competing ideas of library and information science's identity, thereby strengthening professional autonomy. I make the case that developing a system of virtue ethics for librarianship would be an effective way to promote that reconciliation. The first step in developing virtue ethics is uncovering librarianship's function. Standard approaches to virtue ethics rely on classical Greek ideas about the nature of being to determine function. Since classical ideas of being may no longer be persuasive, I introduce another approach to uncover librarianship's function that still meets all of the criteria needed to establish a foundation for a system of virtue ethics. This approach is hermeneutical phenomenology, the philosophical discipline of interpreting the meaning given to historical events. Hans-Georg Gadamer's hermeneutic circle technique and Paul Ricoeur's theory of narrative intelligence are used to engage in a dialogue with three crises in the history of American librarianship. These pivotal events are the fiction question, librarian nationalism during World War I, and the dispute between supporters of the "Library Bill of Rights" and social responsibility. From these crises, three recurring themes become apparent: the tendency to reconcile idealism and pragmatism, the intent to do good for individuals and society, and the role of professional insecurity in precipitating the conflicts. Through emplotment of these themes, an identity narrative for librarianship emerges. My finding is that librarianship's function is the promotion of stability-happiness. This is the dual-process of supporting dominant socio-cultural institutions as a means of protecting librarianship's ability to offer the knowledge, cultural records, and avenues for information literacy that can improve lives and facilitate individuals' pursuit of happiness. In the conclusion, the ethical implications of having stability-happiness as the profession's function are considered. It includes a discussion of how librarianship's narrative identity could be applied to develop an ethical character for the profession and how such a character, combined with knowledge of function, might address persistent problems of race and gender disparity in library and information science.
30

The structure and evolution of the academic discipline of law in the United States: Generation and validation of course-subject co-occurrence (CSCO) maps

Hook, Peter A. 08 October 2014 (has links)
This dissertation proposes, exemplifies, and validates the usage of course-subject co-occurrence (CSCO) data to generate topic maps of an academic discipline. CSCO is defined as course-subjects taught in the same academic year by the same teacher. This work is premised on the assumption that, in the aggregate and for reasons of efficiency, faculty members teach course-subjects that are topically similar to one another. To exemplify and validate CSCO, more than 112,000 CSCO events were extracted from the annual directories of the American Association of Law Schools covering nearly eighty years of law school teaching in the United States. The CSCO events are used to extract and visualize the structure and evolution of law for the years 1931-32, 1972-73, and 2010-11, roughly forty-year intervals. Different normalization, ordination (layout), and clustering algorithms are compared, and the best algorithm of each type is used to generate the final map. Validation studies demonstrate that CSCO produces topic maps that are consistent with expert opinion and four other indicators of the topical similarity of law school course-subjects. The resulting maps of the educational domain of law are useful as a reference system for additional thematic overlay of information about law school education in the United States. This research is the first to use CSCO to produce visualizations of a domain. It is the first to use an expanded, multi-part gold standard to evaluate the validity of domain maps and the intermediate steps in their creation. Last but not least, this research contributes a metric analysis and visualizations of the evolution of law school course-subjects over nearly eighty years.
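A minimal sketch of the CSCO counting step as defined above; the (teacher, year, course-subject) records are invented.

```python
from collections import Counter
from itertools import combinations

# (teacher, academic year, course-subject) teaching records (invented).
records = [
    ("smith", "2010-11", "Contracts"),
    ("smith", "2010-11", "Commercial Law"),
    ("smith", "2010-11", "Secured Transactions"),
    ("jones", "2010-11", "Contracts"),
    ("jones", "2010-11", "Torts"),
]

# Group course-subjects by (teacher, year); each unordered pair of subjects
# taught by the same teacher in the same year is one CSCO event.
by_teacher_year = {}
for teacher, year, subject in records:
    by_teacher_year.setdefault((teacher, year), set()).add(subject)

csco = Counter()
for subjects in by_teacher_year.values():
    csco.update(combinations(sorted(subjects), 2))

print(csco.most_common())
# The aggregated counts would then be normalized, laid out (ordination),
# and clustered to produce the topic map.
```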
