Global ETD Search

1	Data Modelling of Electricity Data in Sweden : Pre-study of the Envolve Project Do, Yen Thi Kim January 2011 Electricity has always had a great impact on our daily life. It plays an important role in every aspect of the society, economy, and technology of every nation. Sweden, among other Nordic countries, has always strived to improve its energy landscape. Currently, nuclear power and hydroelectricity are the main methods of energy generation in the country. Together with exploring new ways of generating energy without dependency on nuclear power, Sweden also expresses an interest in encouraging households and companies to use energy efficiently in order to reduce energy consumption and its associated costs. The scope of this thesis is to review and evaluate various state-of-the-art data analysis tools and algorithms to generate a meaningful consumer behaviour model based on the electricity usage data collected from households in several areas of Sweden. Understanding the demand characteristics for electricity would give electricity suppliers more power in shaping their marketing strategies as well as in setting appropriate electricity pricing. Data Modelling Electricity data
2	A study of the administrative provisions governing personal data protection in the Hong Kong Government / Ng, Chi-kwan, Miranda. January 1987 Thesis (M. Soc. Sc.)--University of Hong Kong, 1987. Data protection. Data protection
3	A strategy for reducing I/O and improving query processing time in an Oracle data warehouse environment Titus, Chris. January 2009 Thesis (M.S.C.I.T.)--Regis University, Denver, Colo., 2009. / Title from PDF title page (viewed on May 28, 2009). Includes bibliographical references. Data processing. Data warehousing.
4	A feature-based approach to visualizing and mining simulation data Jiang, Ming, January 2005 Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xvi, p. 116; also includes graphics. Includes bibliographical references (p. 108-116). Available online via OhioLINK's ETD Center Data mining. Data processing.
5	A study of the administrative provisions governing personal data protection in the Hong Kong Government Ng, Chi-kwan, Miranda. January 1987 Thesis (M.Soc.Sc.)--University of Hong Kong, 1987. / Also available in print. Data protection. Data protection
6	Prioritized data synchronization with applications Jin, Jiaxi January 2013 We are interested in the problem of synchronizing data on two distinct devices with different priorities using minimum communication. A variety of distributed systems require communication-efficient and prioritized synchronization, for example, where the bandwidth is limited or certain information is more time sensitive than others. Our particular approach, P-CPI, involving the interactive synchronization of prioritized data, is efficient both in communication and computation. This protocol sports some desirable features, including (i) communication and computational complexity primarily tied to the number of differences between the hosts rather than the amount of the data overall and (ii) a memoryless fast restart after interruption. We provide a novel analysis of this protocol, with a proven high-probability performance bound and fast restart in logarithmic time. We also provide an empirical model for predicting the probability of complete synchronization as a function of time and symmetric differences. We then consider two applications of our core algorithm. The first is a string reconciliation protocol, for which we propose a novel algorithm with online time complexity that is linear in the size of the string. Our experimental results show that our string reconciliation protocol can potentially outperform existing synchronization tools such as rsync in some cases. We also look into the benefit brought by our algorithm to delay-tolerant networks (DTNs). We propose an optimized DTN routing protocol with P-CPI implemented as middleware. As a proof of concept, we demonstrate improved delivery rate, reduced metadata and reduced average delay. Synchronizing data Prioritized data
7	Data analytics on Yelp data set Tata, Maitreyi January 1900 Master of Science / Department of Computing and Information Sciences / William H. Hsu / In this report, I describe a query-driven system which helps in deciding which restaurant to invest in or which area of a given place is suitable for opening a new restaurant. Analysis is performed on already existing businesses in every state. This is based on factors such as the average star rating, the total number of reviews associated with a specific restaurant, the price range of the restaurant, and so on. The results give an idea of the successful restaurants in a city, which helps you decide where to invest and what to keep in mind when starting a new business. The main scope of the project is to concentrate on analytics and data visualization. Data analytics Data visualization
8	Data Sharing and Exchange: Semantics and Query Answering Awada, Rana January 2015 Exchanging and integrating data that belong to worlds of different vocabularies are two prominent problems in the database literature. While data coordination deals with managing and integrating data between autonomous yet related sources with possibly distinct vocabularies, data exchange is defined as the problem of extracting data from a source and materializing it in an independent target to conform to the target schema. These two problems, however, have never been studied in a unified setting which allows both the exchange of the data as well as the coordination of different vocabularies between different sources. Our thesis shows that such a unified setting exhibits data integration capabilities that are beyond the ones provided by data exchange and data coordination separately. In this thesis, we propose a new setting, called DSE (for Data Sharing and Exchange), which allows the exchange of data between independent source and target applications that possess independent schemas, as well as independent yet related domains of constants. To facilitate this type of exchange, we extend the source-to-target dependencies used in the ordinary data exchange setting, which allow the association between the source and the target at the schema level, with the mapping table construct introduced in the classical data coordination setting, which defines the association between the source and the target at the instance level. A mapping table construct defines, for each source element, the set of associated (or corresponding) elements in the domain of the target. The semantics of this association relationship between source and target elements change with different requirements of different applications.
Ordinary DE settings can represent DSE settings; however, we show that there exist DSE settings with particular semantics of related values in mapping tables where DE is not the best exchange solution to adopt. The thesis introduces two DSE settings with such a property. We call the first DSE with unique identity semantics. The semantics of a mapping table in this DSE setting specifies that each source element should be uniquely mapped to at least one target element that is associated with it in the mapping table. In this setting, classical DE is one method to perform a data exchange; however, it is not the best method to adopt, since it cannot represent exchange applications that, like DC applications, need to compute both portions and complete sets of certain answers for conjunctive queries. In addition, we show that adopting known DE universal solutions as semantics for such DSE settings is not the best in terms of efficiency when computing certain answers for conjunctive queries. The second DSE setting that the thesis introduces with the same property is called DSE with equality semantics. This setting captures an interesting meaning of related data in a mapping table. These semantics require that each source element in a mapping table is related to a target element only if both elements are equivalent (i.e., they have the same meaning). We show in our thesis that this DSE setting differs from ordinary DE settings in the sense that additional information could be entailed under such an interpretation of related data. Also, this added information needs to be added to both the source instance and the mapping table in order to generate target instances that correctly reflect both in a DSE scenario. In other words, we can say that in such a DSE setting, a source instance and a mapping table can be incomplete with respect to the semantics of the mapping table.
We formally define the two aforementioned semantics of a DSE setting and we distinguish between two types of solutions for this setting, namely universal DSE solutions, which contain the complete set of exchanged information, and universal DSE KB-solutions, which store a portion of the exchanged information with implicit information in the form of a set of rules over the target. DSE KB-solutions allow applications to compute on demand both a portion and the complete set of certain answers for conjunctive queries. In addition, we define the semantics of conjunctive query answering, we distinguish between sound and complete certain answers for conjunctive queries, and we define the algorithms to compute these efficiently. Finally, we provide experimental results which compare the run times to generate DSE solutions versus DSE KB-solutions, and compare the performance of computing sound and complete certain answers for conjunctive queries using both types of solutions. Data Exchange Data Integration
9	Extending dependencies for improving data quality Ma, Shuai January 2011 This doctoral thesis presents the results of my work on extending dependencies for improving data quality, both in a centralized environment with a single database and in a data exchange and integration environment with multiple databases. The first part of the thesis proposes five classes of data dependencies, referred to as CINDs, eCFDs, CFDcs, CFDps and CINDps, to capture data inconsistencies commonly found in practice in a centralized environment. For each class of these dependencies, we investigate two central problems: the satisfiability problem and the implication problem. The satisfiability problem is to determine, given a set Σ of dependencies defined on a database schema R, whether or not there exists a nonempty database D of R that satisfies Σ. The implication problem is to determine whether or not a set Σ of dependencies defined on a database schema R entails another dependency φ on R. That is, each database D of R that satisfies Σ must satisfy φ as well. These are important for the validation and optimization of data-cleaning processes. We establish complexity results of the satisfiability problem and the implication problem for all these five classes of dependencies, both in the absence of finite-domain attributes and in the general setting with finite-domain attributes. Moreover, SQL-based techniques are developed to detect data inconsistencies for each class of the proposed dependencies, which can be easily implemented on top of current database management systems. The second part of the thesis studies three important topics for data cleaning in a data exchange and integration environment with multiple databases. One is the dependency propagation problem, which is to determine, given a view defined on data sources and a set of dependencies on the sources, whether another dependency is guaranteed to hold on the view.
We investigate dependency propagation for views defined in various fragments of relational algebra, with conditional functional dependencies (CFDs) [FGJK08] as view dependencies, and for source dependencies given as either CFDs or traditional functional dependencies (FDs). We establish lower and upper bounds, all matching, ranging from PTIME to undecidable. These not only provide the first results for CFD propagation, but also extend the classical work of FD propagation by giving new complexity bounds in the presence of a setting with finite domains. We finally provide the first algorithm for computing a minimal cover of all CFDs propagated via SPC views. The algorithm has the same complexity as one of the most efficient algorithms for computing a cover of FDs propagated via a projection view, despite the increased expressive power of CFDs and SPC views. Another one is matching records from unreliable data sources. A class of matching dependencies (MDs) is introduced for specifying the semantics of unreliable data. As opposed to static constraints for schema design such as FDs, MDs are developed for record matching, and are defined in terms of similarity metrics and a dynamic semantics. We identify a special case of MDs, referred to as relative candidate keys (RCKs), to determine what attributes to compare and how to compare them when matching records across possibly different relations. We also propose a mechanism for inferring MDs with a sound and complete system, a departure from traditional implication analysis, such that when we cannot match records by comparing attributes that contain errors, we may still find matches by using other, more reliable attributes. We finally provide a quadratic time algorithm for inferring MDs, and an effective algorithm for deducing quality RCKs from a given set of MDs.
The last one is finding certain fixes for data monitoring [CGGM03, SMO07], which is to find and correct errors in a tuple when it is created, either entered manually or generated by some process. That is, we want to ensure that a tuple t is clean before it is used, to prevent errors introduced by adding t. As noted by [SMO07], it is far less costly to correct a tuple at the point of entry than to fix it afterward. Data repairing based on integrity constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. We also provide an algorithm to identify minimal certain regions, such that a certain fix is warranted by editing rules and master data as long as one of the regions is correct. 005.3
10	A More Decentralized Vision for Linked Data Polleres, Axel, Kamdar, Maulik R., Fernandez Garcia, Javier David, Tudorache, Tania, Musen, Mark A. 25 June 2018 In this deliberately provocative position paper, we claim that ten years into Linked Data there are still (too?) many unresolved challenges towards arriving at a truly machine-readable and decentralized Web of data. We take a deeper look at the biomedical domain - currently, one of the most promising "adopters" of Linked Data - if we believe the ever-present "LOD cloud" diagram. Herein, we try to highlight and exemplify key technical and non-technical challenges to the success of LOD, and we outline potential solution strategies. We hope that this paper will serve as a discussion basis for a fresh start towards more actionable, truly decentralized Linked Data, and as a call to the community to join forces. / Series: Working Papers on Information Systems, Information Business and Operations
