Global ETD Search

1	Graph patterns : structure, query answering and applications in schema mappings and formal language theory Reutter, Juan L. January 2013 Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. Queries need to be posed against such data, but techniques for querying patterns are generally lacking, and even simple properties of graph patterns, such as the languages needed to specify them, are not well understood. In this dissertation we present several contributions in the study of graph patterns. We analyze how to query them and how to use them as queries. We also analyze some of their applications in two different contexts: schema mapping specification and data exchange for graph databases, and formal language theory. We first identify key features of patterns, such as node and label variables and edges specified by regular expressions, and define a classification of patterns based on them. Next we study how to answer standard graph queries over graph patterns, and give precise characterizations of both data and combined complexity for each class of patterns. Where complexity is high, we further analyze the features that lead to intractability, as well as lower-complexity restrictions that guarantee tractability. We then turn to the study of schema mappings for graph databases. As for relational and XML databases, our mapping languages are based on patterns. They subsume all previously considered mapping languages for graph databases, and are capable of expressing many data exchange scenarios in the graph database context. We study the problems of materializing solutions and query answering for data exchange under these mappings, analyze their complexity, and identify relevant classes of mappings and queries for which these problems can be solved efficiently. We also introduce a new model of automata that is based on graph patterns, and define two modes of acceptance for them. We show that this model has applications not only in graph databases but in several other contexts. We study the basic properties of such automata, and the key computational tasks associated with them. 005.7 Computer Science ; Graph data
2	Reducing human effort in web data extraction Guo, Jinsong January 2017 The human effort required for large-scale web data extraction significantly affects both the flexibility and the economic cost of extraction. Our work aims to reduce the human effort required by web data extraction tasks in three specific scenarios. (I) Data demand is unclear, and the user has to guide the wrapper induction by annotations. To minimize the human effort in the annotation process, wrappers should be robust, i.e., resilient to changes in the webpage, so as to avoid wrapper regeneration, which requires re-annotation. Existing approaches primarily aim at generating accurate wrappers and rarely generate robust ones. We prove that the XPATH wrapper induction problem is NP-hard, and propose an approximate solution that estimates a set of top-k robust wrappers in polynomial time. Our method also meets one additional requirement: the induction process should be noise resistant, i.e., tolerate slightly erroneous examples. (II) Data demand is clear, and user guidance should be avoided, i.e., the wrapper generation should be fully unsupervised. Existing unsupervised methods that rely purely on the repeated patterns of HTML structures or visual information are far from practical. Partially supervised methods, such as the state-of-the-art system DIADEM, can work well for tasks involving only a small number of domains. However, the human effort in the annotator preparation process becomes a heavier burden as the number of domains increases. We propose RED (short for 'redundancy'), an automatic approach that exploits content redundancy between the result page and its corresponding detail pages. RED requires no annotation (and thus no human effort), and its wrapper accuracy is significantly higher than that of previous unsupervised methods. (III) Data quality is unknown, and the user's related decisions are blind. Without knowing the error types and the number of errors of each type in the extracted data, the extraction effort could be wasted on useless websites and, even worse, the human effort could be wasted on unnecessary or wrongly targeted data cleaning. Despite the importance of error estimation, no methods have addressed it sufficiently. We focus on two types of common errors in web data, namely duplicates and violations of integrity constraints. We propose a series of error estimation approaches by adapting, extending, and synthesizing some recent innovations in diverse areas such as active learning, classifier calibration, F-measure estimation, and interactive training. Computer science ; Web data extraction
3	Data preparation for biomedical knowledge domain visualization : a probabilistic record linkage and information fusion approach to citation data / Synnestvedt, Marie B. Lin, Xia. January 2007 Thesis (Ph.D.)--Drexel University, 2007. / Includes abstract and vita. Includes bibliographical references (leaves 98-102).
4	Lock-free linked lists and skip lists / Fomitchev, Mikhail. January 2003 Thesis (M.Sc.)--York University, 2003. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 224-226). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL:http://gateway.proquest.com/openurl?url%5Fver=Z39.88-2004&res%5Fdat=xri:pqdiss&rft%5Fval%5Ffmt=info:ofi/fmt:kev:mtx:dissertation&rft%5Fdat=xri:pqdiss:MQ99307
5	Multidimensional data encryption with virtual optics / Yu, Lingfeng. January 2003 Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references. Also available in electronic version. Access restricted to campus users.
6	Metadata-aware query processing over data streams Ding, Luping. January 2008 Dissertation (Ph.D.)--Worcester Polytechnic Institute. / Keywords: metadata, constraint, data stream, continuous query, optimization. Includes bibliographical references (leaves 275-283).
7	Defining data as an art material Freeman, Julie January 2018 Digital technology, and specifically digital data, forms the backbone of nearly all our communications including machine to machine, human to machine, and, increasingly, human to human. It is unsurprising that one of the most prevalent materials of our time is used by artists to create work. This thesis defines data as an art material. It investigates the variety of manifestations of data when used in art, through the review of existing artwork and the development of new artworks and visualisations that use a dataset collected for this research. Through the lens of conceptualising data as an art material, a definition and manifesto of data art is put forward (Chapter 2). In addition, a taxonomy for describing data as an art material is proposed and its usage explored by applying it to a number of data art descriptions and by analysing a database of data artworks tagged with relevant terms (Chapter 3). Temporal, biological, and real-time, terms from the taxonomy, are particularly relevant to the way in which digital technology mediates our connection to nature. To explore these forms of data within artwork, a collaboration with Dr Chris Faulkes, Reader in Evolutionary Ecology, facilitated the design and implementation of an electronic system to collect data from a colony of animals. Chapter 4 describes the tracking system which resulted in a real-time stream of biological temporal data. Translations of this data are explored in more detail through the practical application of various computational techniques including scientific analysis (Chapter 5), animation, sonification, data visualisation (Chapter 6) and soft robotic objects (Chapter 7). The thesis demonstrates that an inanimate object, animated through the translation of data, can have a body language through which to effectively convey characteristics of living things (Chapter 8). Finally, public engagement events are presented in Chapter 9, with reflections, contributions and future work concluded in Chapter 10.
8	Representing information about files / Mogul, Jeffrey C. January 1900 Thesis (Ph. D.)--Stanford University, 1986. / Cover title. "March 1986." Includes bibliographical references.
9	AMUSED : a multi-user software environment diagnostic / Foltman, Mary Ann. January 1989 Thesis (M.S.)--Rochester Institute of Technology, 1989. / Bibliography; leaves 71-73.
10	Distributed object-oriented C (DOC) : a strongly distributed object-oriented language for message passing concurrent architecture / Lui, Pak-hang. January 1900 Thesis (Ph. D.)--University of Hong Kong, 1992.