191

Applying the 5S Framework To Integrating Digital Libraries

Shen, Rao 27 April 2006 (has links)
We formalize the digital library (DL) integration problem and propose an overall approach based on the 5S (Streams, Structures, Spaces, Scenarios, and Societies) framework. We then apply that framework to integrate domain-specific (archaeological) DLs, illustrating our solutions for key problems in DL integration. An integrated archaeological DL, ETANA-DL, is used as a case study to justify and evaluate our DL integration approach. We develop a minimum metamodel for archaeological DLs within the 5S theory. We implement the 5SSuite toolkit to cover the process of union DL generation, including requirements gathering, conceptual modeling, rapid prototyping, and code generation. 5SSuite consists of 5SGraph, 5SGen, and SchemaMapper, the last of which plays an especially important role during integration. SchemaMapper, a visual mapping tool, maps the schemas of diverse DLs into a global schema for a union DL and generates a wrapper for each individual DL. Each wrapper transforms the metadata catalog of its DL into one conforming to the global schema. The converted catalogs are stored in the union catalog, so the union DL has a global metadata format and a union catalog. We also propose a formal approach, based on 5S, to exploring services for integrated DLs, which provides a systematic and functional method to design and implement such services. Finally, we propose a DL success model to assess integrated DLs from the perspective of end users, integrating 5S theory with diverse research on information-systems success and adoption models and on information-seeking behavior models. / Ph. D.
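To make the wrapper idea concrete, here is a minimal Python sketch of a per-DL field mapping that converts local metadata records into a shared global schema. The field names, mapping tables, and sample records are hypothetical illustrations, not the actual ETANA-DL schema or the SchemaMapper tool itself.

```python
# Sketch of the wrapper idea: a per-DL field mapping converts local
# metadata records into a shared global schema. All field names and
# mapping tables here are hypothetical, not the real ETANA-DL schema.

GLOBAL_FIELDS = {"title", "creator", "date", "site", "object_type"}

# One mapping table per member DL: local field name -> global field name.
DL_MAPPINGS = {
    "dig_site_a": {"artifact_name": "title", "excavator": "creator",
                   "found_on": "date", "locus": "site", "kind": "object_type"},
    "dig_site_b": {"label": "title", "recorded_by": "creator",
                   "record_date": "date", "area": "site", "category": "object_type"},
}

def make_wrapper(dl_name):
    """Return a function that converts one local record to the global schema."""
    mapping = DL_MAPPINGS[dl_name]
    def wrap(local_record):
        global_record = {field: None for field in GLOBAL_FIELDS}
        for local_field, value in local_record.items():
            if local_field in mapping:
                global_record[mapping[local_field]] = value
        return global_record
    return wrap

# Build the union catalog by wrapping every member DL's catalog.
catalogs = {
    "dig_site_a": [{"artifact_name": "Oil lamp", "excavator": "Smith",
                    "found_on": "1999-07-12", "locus": "L42", "kind": "ceramic"}],
    "dig_site_b": [{"label": "Bronze pin", "recorded_by": "Jones",
                    "record_date": "2001-05-03", "area": "B7", "category": "metal"}],
}
union_catalog = []
for dl_name, catalog in catalogs.items():
    wrap = make_wrapper(dl_name)
    union_catalog.extend(wrap(record) for record in catalog)

print(union_catalog)
```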
192

A Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections

Chen, Yuxin 28 March 2007 (has links)
The Web, containing a large amount of useful information and resources, is expanding rapidly. Collecting domain-specific documents from the Web is one of the most important ways to build digital libraries for the scientific community. Focused crawlers can selectively retrieve Web documents relevant to a specific domain to build collections for domain-specific search engines or digital libraries. Traditional focused crawlers, which normally adopt the simple Vector Space Model and local Web search algorithms, typically find relevant Web pages only with low precision. Recall is also often low, since they explore a limited sub-graph of the Web surrounding the starting URL set and ignore relevant pages outside this sub-graph. In this work, we investigated how to apply an inductive machine learning algorithm and a meta-search technique to the traditional focused crawling process to overcome the above-mentioned problems and improve performance. We proposed a novel hybrid focused crawling framework based on Genetic Programming (GP) and meta-search. We showed that our hybrid framework can be applied to traditional focused crawlers to find more relevant Web documents, more accurately, for use in digital libraries and domain-specific search engines. The framework is validated through experiments performed on test documents from the Open Directory Project. Our studies show that improvement over the traditional focused crawler can be achieved when genetic programming and meta-search methods are introduced into the focused crawling process. / Ph. D.
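A hedged sketch of the hybrid idea follows: a best-first focused crawl whose link scores come from a topic-similarity function. In the thesis the scoring function is evolved with Genetic Programming and the frontier is widened via meta-search; here a fixed stand-in scorer and a hand-made "meta-search" seed operate over a toy in-memory web graph.

```python
import heapq

# Toy in-memory "web": url -> (page text, outlinks). Purely illustrative.
TOY_WEB = {
    "seed": ("digital library archive", ["a", "b"]),
    "a": ("digital library metadata harvesting", ["c"]),
    "b": ("sports news today", ["d"]),
    "c": ("library collections digital preservation", []),
    "d": ("weather forecast", []),
    "meta1": ("digital archive search engine", ["c"]),
}

TOPIC = {"digital", "library", "archive"}

def score(text):
    """Stand-in for the GP-evolved scorer: fraction of topic terms present."""
    words = set(text.split())
    return len(words & TOPIC) / len(TOPIC)

def crawl(seeds, limit=10, threshold=0.3):
    """Best-first crawl: always expand the highest-scoring frontier page."""
    frontier = [(-score(TOY_WEB[u][0]), u) for u in seeds]
    heapq.heapify(frontier)
    seen, relevant = set(seeds), []
    while frontier and len(relevant) < limit:
        neg_score, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        if -neg_score >= threshold:
            relevant.append(url)
        for nxt in links:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(TOY_WEB[nxt][0]), nxt))
    return relevant

# Meta-search widens the start set beyond the sub-graph around one seed.
print(crawl(["seed", "meta1"]))
```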
193

A Large Collection Learning Optimizer Framework

Chakravarty, Saurabh 30 June 2017 (has links)
Content is generated on the web at an increasing rate. The type of content varies from text on a traditional webpage to text on social media portals (e.g., social network sites and microblogs). One such example of social media is the microblogging site Twitter. Twitter is known for its high level of activity during live events, natural disasters, and events of global importance. Challenges with Twitter data include the 140-character limit on text length. Because of this limitation, the vocabulary on Twitter includes short abbreviations of sentences, emojis, hashtags, and other non-standard usage. Consequently, traditional text classification techniques are not very effective on tweets. Fortunately, text processing techniques like cleaning, lemmatizing, and removal of stop words and special characters give us clean text, which can be further processed to derive richer word semantic and syntactic relationships using state-of-the-art feature learning techniques like Word2Vec. Machine learning techniques using word features that capture semantic and context relationships can improve classification accuracy. Improving text classification results on Twitter data would pave the way to categorizing tweets relative to human-defined real-world events. This would allow diverse stakeholder communities to interactively collect, organize, browse, visualize, analyze, summarize, and explore content and sources related to crises, disasters, human rights, inequality, population growth, resiliency, shootings, sustainability, violence, etc. Having events classified into different categories would help us study causality and correlations among real-world events. To check the efficacy of our classifier, we compare our experimental results with those of an Association Rules (AR) classifier. This classifier composes its rules around the most discriminating words in the training data. The hierarchy of rules, along with the ability to tune a support threshold, makes it an effective classifier for scenarios involving short text. Traditionally, developing classification systems for these purposes requires a great degree of human intervention. Constantly monitoring new events and curating training and validation sets is tedious and time intensive, and significant human capital is required for such annotation efforts. Substantial effort is also required to tune the classifier for best performance. Developing and tuning classifiers manually would not be a viable option if we are to monitor events and trends in real time. We want to build a framework that requires very little human intervention to build and choose the best among the available classification techniques in our system. Another challenge with classification systems is their performance on unseen data. When classifying tweets, we continually face situations where a given event is closely tied to certain keywords. If a classifier built for a particular event overfits to what is a biased sample with limited generality, its accuracy may drop when faced with new tweets containing different keywords. We propose building a system that uses very little training data in the initial iteration and is augmented with automatically labelled training data from a collection that stores all the incoming tweets.
A system trained on incoming tweets that are labelled using techniques based on rich word-vector representations should perform better than a system trained on only the initial set of tweets. We also propose to use deep learning techniques like Convolutional Neural Networks (CNNs), which can capture combinations of words using an n-gram feature representation; such a feature representation accounts for instances where words occur together. We divide our case studies into two phases: preliminary and final. The preliminary case studies focus on selecting the best feature representation and classification methodology between the AR and the Word2Vec-based Logistic Regression classification techniques. The final case studies focus on developing the augmented semi-supervised training methodology and the framework for a large collection learning optimizer that generates a highly performant classifier. In our preliminary case studies, we achieve an F1 score of 0.96 based on Word2Vec and Logistic Regression; the AR classifier achieves an F1 score of 0.90 on the same data. In our final case studies, we show improvements in F1 score from 0.58 to 0.94 in certain cases with our augmented training methodology. Overall, we see improvement from the augmented training methodology on all datasets. / Master of Science
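A minimal sketch of the Word2Vec-plus-Logistic-Regression pipeline described above, using gensim and scikit-learn: each tweet is represented as the mean of its word vectors, then classified. The tweets, labels, and hyperparameters below are toy illustrations, not the thesis data or its tuned settings.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Toy labelled tweets: 1 = disaster-related, 0 = not. Invented examples.
tweets = [
    "flood waters rising downtown evacuation underway",
    "earthquake shakes city buildings damaged",
    "great game last night what a win",
    "new movie trailer looks amazing",
]
labels = [1, 1, 0, 0]

# Train word vectors on the (toy) corpus; parameters are illustrative.
tokenized = [t.split() for t in tweets]
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=3,
               min_count=1, workers=1, seed=42)

def tweet_vector(tokens):
    """Represent a tweet as the mean of its in-vocabulary word vectors."""
    vecs = [w2v.wv[tok] for tok in tokens if tok in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.stack([tweet_vector(t) for t in tokenized])
clf = LogisticRegression().fit(X, labels)

# In the augmented-training loop, confidently labelled incoming tweets
# would be appended to X/labels and the classifier refit.
print(clf.predict(X))
```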
194

A Framework for Hadoop Based Digital Libraries of Tweets

Bock, Matthew 17 July 2017 (has links)
The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) projects. Researchers across varying disciplines have an interest in leveraging DLRL's collections of tweets for their own analyses. However, due to the steep learning curve involved with the required tools (Spark, Scala, HBase, etc.), simply converting the Twitter data into a workable format can be a cumbersome task in itself. This prompted the effort to build a framework that will help in developing code to analyze the Twitter data, run on arbitrary tweet collections, and enable developers to leverage projects designed with this general use in mind. The intent of this thesis work is to create an extensible framework of tools and data structures to represent Twitter data at a higher level and eliminate the need to work with raw text, so as to make the development of new analytics tools faster, easier, and more efficient. To represent this data, several data structures were designed to operate on top of the Hadoop and Spark libraries of tools. The first set of data structures is an abstract representation of a tweet at a basic level, as well as several concrete implementations which represent varying levels of detail to correspond with common sources of tweet data. The second major data structure is a collection structure designed to represent collections of tweet data structures and provide ways to filter, clean, and process the collections. All of these data structures went through an iterative design process based on the needs of the developers. The effectiveness of this effort was demonstrated in four distinct case studies. In the first case study, the framework was used to build a new tool that selects Twitter data from DLRL's archive of tweets, cleans those tweets, and performs sentiment analysis within the topics of a collection's topic model. The second case study applies the provided tools for the purpose of sociolinguistic studies. The third case study explores large datasets to accumulate all possible analyses on the datasets. The fourth case study builds metadata by expanding the shortened URLs contained in the tweets and storing them as metadata about the collections. The framework proved to be useful and cut development time for all four of the case studies. / Master of Science
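A hedged Python/PySpark sketch of the kind of abstraction the thesis describes (the framework itself is built on Scala and Spark): a tweet data structure plus a collection wrapper over a Spark RDD offering filter and clean operations, so analysts need not touch raw text. Field and method names are illustrative, not DLRL's actual API.

```python
import re
from dataclasses import dataclass
from pyspark import SparkContext

@dataclass(frozen=True)
class Tweet:
    """Basic-level tweet abstraction; real variants carry more fields."""
    tweet_id: str
    user: str
    text: str

class TweetCollection:
    """Wraps an RDD[Tweet] and provides filter/clean operations."""
    def __init__(self, rdd):
        self.rdd = rdd

    def matching(self, keyword):
        kw = keyword.lower()
        return TweetCollection(self.rdd.filter(lambda t: kw in t.text.lower()))

    def cleaned(self):
        """Strip URLs and @mentions from tweet text."""
        def clean(t):
            text = re.sub(r"https?://\S+|@\w+", "", t.text).strip()
            return Tweet(t.tweet_id, t.user, text)
        return TweetCollection(self.rdd.map(clean))

sc = SparkContext("local[1]", "tweet-demo")
raw = [Tweet("1", "alice", "Flooding downtown http://t.co/x @citynews"),
       Tweet("2", "bob", "Nice weather today")]
coll = TweetCollection(sc.parallelize(raw))
print(coll.matching("flood").cleaned().rdd.collect())
sc.stop()
```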
195

An Architecture for Collaborative Math and Science Digital Libraries

Krowne, Aaron Phillip 25 September 2003 (has links)
In this thesis I present Noosphere, a system for the collaborative production of digital libraries. Further, I describe the special features of Noosphere which allow it to support mathematical and scientific content, and how it applies an encyclopedic organizational style. I also describe how Noosphere frees the digital library maintainer from a heavy administrative burden by implementing the design pattern of zero content administration. Finally, I discuss evidence showing that Noosphere works and is sustainable, in both the a priori and empirical senses. / Master of Science
196

Reengineering PhysNet in the uPortal framework

Zhou, Ye 11 July 2003 (has links)
A digital library (DL) is an electronic information storage system focused on meeting the information-seeking needs of its constituents. Because modern DLs track the latest technological progress across many fields, interoperability among DLs is often hard to achieve. With the advent of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and Open Digital Libraries (ODL), lightweight protocols show a promising future in promoting DL interoperability. Furthermore, a DL can be envisaged as a network of independent components working collaboratively through simple standardized protocols. Prior work with ODL shows the feasibility of building componentized DLs with techniques that are a precursor to web services designs. In our study, we demonstrate the feasibility of applying web services to DL design. DL services are modeled as a set of web services offering information dissemination through the Simple Object Access Protocol (SOAP). Additionally, a flexible DL user-interface assembly framework is offered in order to build DLs with customizations and personalizations. Our hypothesis is proven and demonstrated in the PhysNet reengineering project. / Master of Science
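As a concrete taste of the lightweight protocols mentioned above, here is a minimal OAI-PMH ListRecords harvest using only the Python standard library. The endpoint URL is a placeholder; any OAI-PMH 2.0 repository base URL can be substituted.

```python
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url, metadata_prefix="oai_dc"):
    """Yield record titles from one ListRecords response (no resumption)."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    with urllib.request.urlopen(f"{base_url}?{query}") as resp:
        root = ET.fromstring(resp.read())
    for record in root.iter(f"{OAI}record"):
        title = record.find(f".//{DC}title")
        if title is not None:
            yield title.text

# Placeholder endpoint — substitute a real repository's OAI-PMH base URL.
for t in harvest_titles("https://example.org/oai"):
    print(t)
```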
197

Smoothing the information seeking path: Removing representational obstacles in the middle-school digital library.

Abbas, June M. 05 1900 (has links)
Middle school students' interaction within a digital library is explored. Issues of interface features used, obstacles encountered, search strategies and techniques used, and representational obstacles are examined. A mechanism for evaluating users' descriptors is tested, and the effect on retrieval of augmenting the system's resource descriptions with these descriptors is explored. Transaction log analysis (TLA) was used, with external corroborating achievement data provided by teachers. Analysis was conducted using quantitative and qualitative methods. Coding schemes were developed for the failure analysis; the analysis of search strategies and techniques; the extent-of-match analysis between terms in students' questions and their search terms; and the extent-of-match analysis between search terms and the controlled vocabulary. There are five chapters with twelve supporting appendixes. Chapter One presents an introduction to the problem and reviews the pilot study. Chapter Two presents the literature review and the theoretical basis for the study. Chapter Three describes the research questions, hypotheses, and methods. Chapter Four presents findings. Chapter Five presents a summary of the findings and their support of the hypotheses; unanticipated findings, limitations, speculations, and areas of further research are indicated. Findings indicate that middle school users interact with the system in various sequences of patterns. User groups' interactions and scaffold use are influenced by the teacher's objectives for using the ADL. Users preferred single-word searches over Boolean, phrase, or natural language searches. Users tended to repeat the same exact search instead of using the advanced scaffolds. A high percentage of users attempted at least one search that included spelling or typographical errors, punctuation, or sequentially repeated searches. Search terms matched the DQs in some instantiation in 54% of all searches. The terms used by the system to represent the resources do not adequately represent the user groups' information needs; however, using student-generated keywords to augment resource descriptions can have a positive effect on retrieval.
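A small sketch of the extent-of-match analysis described above: simple set overlap between a student's question terms, the search terms they typed, and a controlled vocabulary. The stop list, terms, and vocabulary are invented for illustration.

```python
# Toy stop list; a real analysis would use a standard one.
STOP = {"the", "a", "of", "in", "do", "how", "what", "is", "are"}

def terms(text):
    """Tokenize and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP}

def extent_of_match(source, target):
    """Fraction of source terms that also appear in the target term set."""
    src = terms(source) if isinstance(source, str) else source
    tgt = terms(target) if isinstance(target, str) else target
    return len(src & tgt) / len(src) if src else 0.0

question = "How do volcanoes erupt"
search = "volcanoes erupt"
controlled_vocab = {"volcanoes", "eruptions", "geology"}

print(extent_of_match(question, search))                  # question vs. search terms
print(extent_of_match(terms(search), controlled_vocab))   # search vs. vocabulary
```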
198

Full-Text Aggregation: An Examination of Metadata Accuracy and the Implications for Resource Sharing

Cummings, Joel January 2003 (has links)
The author conducted a study comparing two lists of full-text content available in Academic Search Full-Text Elite. EBSCO provided the lists to the University College of the Fraser Valley. The study was conducted to check the accuracy of the claims of full-text content, because staff and library users at the University College of the Fraser Valley depend on this database as part of the libraries' journal collection. Interlibrary loan staff routinely used a printed list of Academic Search Full-Text Elite to check whether a journal was available at UCFV in electronic form; an accurate supplemental list of the libraries' electronic journals was therefore essential for cost-conscious interlibrary loan staff. The study found inaccuracies in the coverage of 57 percent of the journals sampled.
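A minimal sketch of the list-comparison method behind this kind of study: checking a vendor's claimed full-text coverage against observed holdings, journal by journal. The ISSNs and coverage ranges below are invented.

```python
# Invented sample data: ISSN -> claimed/observed full-text coverage range.
claimed = {"1234-5678": "1995-2003", "2345-6789": "1998-2003",
           "3456-7890": "1990-2003"}
observed = {"1234-5678": "1995-2003", "2345-6789": "2000-2003",
            "3456-7890": "1990-2001"}

# A journal's claim is inaccurate if observed coverage differs or is absent.
mismatches = {issn for issn, cov in claimed.items() if observed.get(issn) != cov}
print(f"{len(mismatches)}/{len(claimed)} sampled journals "
      f"({100 * len(mismatches) / len(claimed):.0f}%) with inaccurate coverage")
```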
199

Preparation, Characterization, and Electrochemical Activation of New Metallo-Cyclodextrin Complexes

Deunf, Elise 24 November 2010 (has links) (PDF)
The association between a transition metal and cyclodextrins is of great interest for developing new catalysts, enzyme mimics, sensors, or molecular wires. The interaction between a metal complex and a cyclodextrin opens new opportunities in coordination chemistry, where new reactivities and selectivities are expected. In this context, several entirely original cobalt(II)-cyclodextrin and copper(II)-cyclodextrin compounds were synthesized and characterized electrochemically. The reactivity of the electrogenerated low-valent species was also studied toward alkyl and aromatic halide derivatives. It was shown that the stability/reactivity balance of the transient species could be tuned according to the nature of the ligand grafted onto the cyclodextrin. UV-visible spectrophotometric and spectroscopic studies were combined with this work in order to study the solution structure of one of these metallo-cyclodextrin complexes. This original work opens new opportunities for exploiting biomimetic supramolecular effects in catalytic processes to make them more efficient and selective, in organic solvents as well as in aqueous media (green chemistry).
200

Modulayer-Berea Park Learner Resource Centre

Strydom, Cornus. January 2003 (has links)
Thesis (M. Arch.)--University of Pretoria, 2003. / Title from opening screen (viewed June 14, 2004). Includes bibliographical references.
