  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

Design Guidelines for Reducing Redundancy in Relational and XML Data

Kolahi, Solmaz 31 July 2008 (has links)
In this dissertation, we propose new design guidelines to reduce the amount of redundancy that databases carry. We use techniques from information theory to define a measure that evaluates a database design based on the worst possible redundancy carried in the instances. We then continue by revisiting the design problem of relational data with functional dependencies, and measure the lowest price, in terms of redundancy, that has to be paid to guarantee a dependency-preserving normalization for all schemas. We provide a formal justification for the Third Normal Form (3NF) by showing that we can achieve this lowest price by doing a good 3NF normalization. We then study the design problem for XML documents that are views of relational data. We show that we can design a redundancy-free XML representation for some relational schemas while preserving all data dependencies. We present an algorithm for converting a relational schema to such an XML design. We finally study the design problem for XML documents that are stored in relational databases. We look for XML design criteria that ensure a relational storage with low redundancy. First, we characterize XML designs that have a redundancy-free relational storage. Then we propose a restrictive condition for XML functional dependencies that guarantees a low redundancy for data values in the relational storage.
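As a hedged illustration of the kind of redundancy measured here, the following Python sketch (schema, dependency, and data are invented for this example, not taken from the dissertation) counts the values that a functional dependency forces an instance to repeat; a 3NF decomposition of the toy schema would eliminate those repetitions.

```python
# Toy instance of a relation Course(course, instructor, room) with the
# functional dependency course -> instructor (all names and data invented).
# Every extra row for a course necessarily repeats the instructor value;
# this repetition is the redundancy a 3NF decomposition into
# Teaches(course, instructor) and Offering(course, room) would remove.
rows = [
    {"course": "CS348", "instructor": "Lee",   "room": "DC1350"},
    {"course": "CS348", "instructor": "Lee",   "room": "MC4040"},
    {"course": "CS448", "instructor": "Singh", "room": "DC1350"},
]

def redundant_cells(rows, lhs, rhs):
    """Count rhs values already determined by an earlier row that agrees on
    lhs, assuming the functional dependency lhs -> rhs holds in the instance."""
    seen = {}
    count = 0
    for r in rows:
        if r[lhs] in seen:
            count += 1        # value of rhs is forced, hence redundant
        else:
            seen[r[lhs]] = r[rhs]
    return count

print(redundant_cells(rows, "course", "instructor"))  # 1 redundant cell
```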
132

Design, development, and deployment of a locus specific mutation database : the PAHdb example

Nowacki, Piotr Marek. January 1998 (has links)
Genetics is concerned with inheritance, genomics with the study of genomes. Bioinformatics provides the tools to study the interface between the two. If a particular locus in the human genome could have 100 discrete alleles, then the genome (comprising an estimated 80,000 genes) could harbor 8 million different alleles. To record information about each of these alleles in a meaningful and systematic fashion is a task for the Mutation Database domain of bioinformatics. The HUGO Mutation Database Initiative is an international effort to capture, record and distribute information about variation in genomes. This initiative comprises a growing number of Locus-Specific Mutation databases, and a few large Federated Genomic databases [Cotton et al., 1998].

Here I present work on a well-recognized prototypical Locus-Specific database: PAHdb. PAHdb is a relatively large curated relational database.

This graduate project has had two major aims: to improve PAHdb, by careful analysis of version 1.0 and revision of its design, resulting in PAHdb version 2.0; and to document the redesign process and share the experience through guidelines for the content and structure of mutation databases in general. (Abstract shortened by UMI.)
133

Business Policy Modeling and Enforcement in Relational Database Systems

Ataullah, Ahmed January 2014 (has links)
Database systems maintain integrity of the stored information by ensuring that modifications to the database comply with constraints designed by the administrators. As the number of users and applications sharing a common database increases, so does the complexity of the set of constraints that originate from higher level business processes. The lack of a systematic mechanism for integrating and reasoning about a diverse set of evolving and potentially interfering policies manifested as database level constraints makes corporate policy management within relational systems a chaotic process. In this thesis we present a systematic method of mapping a broad set of process-centric business policies onto database level constraints. We exploit the observation that the state of a database represents the union of all the states of every ongoing business process and thus establish a bijective relationship between progression in individual business processes and changes in the database state space. We propose graphical notations that are equivalent to integrity constraints specified in linear temporal logic of the past. Furthermore we demonstrate how this notation can accommodate a wide array of workflow patterns, can allow for multiple policy makers to implement their own process-centric constraints independently using their own logical policy models, and can model check these constraints within the database system to detect potential conflicting constraints across several different business processes. A major contribution of this thesis is that it bridges several different areas of research including database systems, temporal logics, model checking, and business workflow/policy management to propose an accessible method of integrating, enforcing, and reasoning about the consequences of process-centric constraints embedded in database systems. As a result, the task of ensuring that a database continuously complies with evolving business rules governed by hundreds of processes, which is traditionally handled by an army of database programmers regularly updating triggers and batch procedures, is made easier, more manageable, and more predictable.
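As a hedged sketch of the general idea (not the thesis's own graphical or temporal-logic notation), the following Python/sqlite3 example compiles one invented process-centric rule, that an order may not be shipped before a payment is recorded, into a database-level constraint enforced by a trigger.

```python
import sqlite3

# Invented order/payment schema. The rule "an order may only move to state
# 'shipped' once a payment for it exists" is enforced as a database-level
# constraint with a trigger, so every application sharing the database
# complies with it automatically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders   (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE payments (order_id INTEGER NOT NULL REFERENCES orders(id));

CREATE TRIGGER ship_requires_payment
BEFORE UPDATE OF state ON orders
WHEN NEW.state = 'shipped'
 AND NOT EXISTS (SELECT 1 FROM payments WHERE order_id = NEW.id)
BEGIN
  SELECT RAISE(ABORT, 'order cannot ship before payment');
END;
""")

conn.execute("INSERT INTO orders VALUES (1, 'created')")
try:
    conn.execute("UPDATE orders SET state = 'shipped' WHERE id = 1")
except sqlite3.IntegrityError as err:
    print(err)                                  # rule violation rejected
conn.execute("INSERT INTO payments VALUES (1)")
conn.execute("UPDATE orders SET state = 'shipped' WHERE id = 1")  # allowed now
```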
134

Security of genetic databases

Giggins, Helen January 2009 (has links)
Research Doctorate - Doctor of Philosophy (PhD)

The rapid pace of growth in the field of human genetics has left researchers with many new challenges in the area of security and privacy. To encourage participation and foster trust towards research, it is important to ensure that genetic databases are adequately protected. This task is a particularly challenging one for statistical agencies due to the high prevalence of categorical data contained within statistical genetic databases. The absence of natural ordering makes the application of traditional Statistical Disclosure Control (SDC) methods less straightforward, which is why we have proposed a new noise addition technique for categorical values. The main contributions of the thesis are as follows. We provide a comprehensive analysis of the trust relationships that occur between the different stakeholders in a genetic data warehouse system. We also provide a quantifiable model of trust that allows the database manager to granulate the level of protection based on the amount of trust that exists between the stakeholders. To the best of our knowledge, this is the first time that trust has been applied in the SDC context. We propose a privacy protection framework for genetic databases which is designed to deal with the fact that genetic data warehouses typically contain a high proportion of categorical data. The framework includes the use of a clustering technique which allows for the easier application of traditional noise addition techniques for categorical values. Another important contribution of this thesis is a new similarity measure for categorical values, which aims to capture not only the direct similarity between values, but also some sense of transitive similarity. This novel measure also has possible applications in providing a way of ordering categorical values, so that more traditional SDC methods can be more easily applied to them. Our analysis of experimental results also points to a numerical attribute phenomenon, whereby we typically have high similarity between numerical values that are close together, and where the similarity decreases as the absolute value of the difference between numerical values increases. However, some numerical attributes appear to not behave in a strictly 'numerical' way. That is, values which are close together numerically do not always appear very similar. We also provide a novel noise addition technique for categorical values, which employs our similarity measure to partition the values in the data set. Our method, VICUS, then perturbs the original microdata file so that each value is more likely to be changed to another value in the same partition than one from a different partition. The technique helps to ensure that the perturbed microdata file retains data quality while also preserving the privacy of individual records.
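The following Python sketch illustrates partition-based perturbation of categorical values in the spirit of the approach described above; the partition, probabilities, and data are invented for this example and do not reproduce the VICUS method itself.

```python
import random

# Categorical values are first grouped into partitions of mutually similar
# values (the thesis derives such groups from a similarity measure and
# clustering; here the partition is simply invented). A value is then more
# likely to be swapped for another value in its own partition than for one
# from a different partition, trading a little accuracy for privacy.
partitions = {
    "blood_group": [["A", "AB"], ["B", "O"]],
}

def perturb(attribute, value, p_same=0.8, rng=random):
    groups = partitions[attribute]
    own = next(g for g in groups if value in g)
    if rng.random() < p_same:
        candidates = [v for v in own if v != value] or [value]
    else:
        candidates = [v for g in groups if g is not own for v in g]
    return rng.choice(candidates)

random.seed(0)
record = {"blood_group": "A"}
print({attr: perturb(attr, val) for attr, val in record.items()})
```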
135

TensorDB and Tensor-Relational Model (TRM) for Efficient Tensor-Relational Operations

January 2014 (has links)
Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains including scientific data management, sensor data management, and social network data analysis. The relational model, on the other hand, enables semantic manipulation of data using relational operators, such as projection, selection, Cartesian product, and set operators. For many multidimensional data applications, tensor operations as well as relational operations need to be supported throughout the data life cycle. In this thesis, we introduce a tensor-based relational data model (TRM), which enables both tensor-based data analysis and relational manipulations of multidimensional data, and define tensor-relational operations on this model. We then introduce a tensor-relational data management system, called TensorDB. TensorDB is based on TRM, which brings together relational algebraic operations (for data manipulation and integration) and tensor algebraic operations (for data analysis). We develop optimization strategies for tensor-relational operations in both in-memory and in-database TensorDB. The goal of TRM and TensorDB is to serve as a single environment that supports the entire life cycle of data; that is, data can be manipulated, integrated, processed, and analyzed.

Doctoral Dissertation, Computer Science, 2014
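TensorDB's actual interface is not shown in this listing, so the following numpy sketch, on invented data, only illustrates the idea of combining a relational selection with a tensor-algebraic step in one pipeline.

```python
import numpy as np

# Invented (user, item, day, rating) tuples: the relational face of the data.
tuples = [
    (0, 0, 0, 5.0), (0, 1, 1, 3.0), (1, 0, 1, 4.0),
    (1, 1, 2, 2.0), (2, 0, 2, 1.0),
]

# Relational step: a selection keeping only the first two days.
selected = [t for t in tuples if t[2] < 2]

# Tensor step: materialize the selection as a users x items x days tensor
# and take an SVD of its mode-1 unfolding, a building block of Tucker/CP
# style decompositions.
tensor = np.zeros((3, 2, 3))
for user, item, day, rating in selected:
    tensor[user, item, day] = rating

mode1 = tensor.reshape(tensor.shape[0], -1)      # 3 x 6 unfolding
U, s, Vt = np.linalg.svd(mode1, full_matrices=False)
print(np.round(s, 3))                            # singular values of the view
```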
136

An infrastructure for secure distributed object-oriented databases

Dreyer, Lucas Cornelius Johannes 10 September 2012 (has links)
M.Sc.

In a society that is becoming increasingly reliant on information, it is necessary for information to be stored efficiently and safely. Database technology is used to store large chunks of information efficiently, while database security is concerned with storing information securely. More complex computer applications (CAD/CAM, multimedia and Groupware) led to the development of object-oriented programming, with object-oriented databases following shortly after. Object-oriented databases store the data of object-oriented systems efficiently and permanently. They provide a rich set of semantic structures that allows them to be used in applications where other database models are simply inadequate. In federations consisting of several interconnected databases, security plays a vital role in the proper management of information. This work describes a Secure Distributed Object Environment (SDOE) infrastructure. It is designed to be implementation-oriented, on which strict theoretic prototypes such as SPOP (Self-protecting Object Prototype) can be built. SPOP is a prototype of a secure object-oriented database and is based on the SPO database model of Olivier. To describe federated database architectures (used by SDOE and SPOP), it is necessary to understand the architecture of federated database systems. Reference architectures for federated database systems are discussed first and a comparison is drawn between two prominent reference architectures. We propose a generalised reference architecture based on these two architectures, created in order to make the use of object-oriented programming in a distributed environment as problem-free as possible. A marshal buffer structure is discussed thirdly. The latter structure is used to contain procedure parameters during an RPC (Remote Procedure Call). Fourthly, the communications infrastructure necessary to support higher-level services is discussed. The infrastructure is implemented in Linux (a UNIX variant), and this approach has provided several interesting challenges. The fifth discussion deals with the requirements for a name service. A name service is necessary if objects are to be used transparently (without reference to their current locations in the federation).
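The SDOE marshal buffer layout is not specified in this abstract; the following Python sketch shows a generic way of packing and unpacking procedure parameters for an RPC, with an invented wire format used purely for illustration.

```python
import struct

def marshal(call_id: int, args: list[bytes]) -> bytes:
    """Pack a call identifier and its arguments into one byte buffer:
    a header carrying the call id and argument count, then each argument
    prefixed by its length (an invented layout, not SDOE's actual format)."""
    buf = struct.pack("!II", call_id, len(args))
    for arg in args:
        buf += struct.pack("!I", len(arg)) + arg
    return buf

def unmarshal(buf: bytes) -> tuple[int, list[bytes]]:
    """Reverse marshal(): recover the call id and the argument list."""
    call_id, count = struct.unpack_from("!II", buf, 0)
    offset, args = 8, []
    for _ in range(count):
        (length,) = struct.unpack_from("!I", buf, offset)
        offset += 4
        args.append(buf[offset:offset + length])
        offset += length
    return call_id, args

packed = marshal(42, [b"object-17", b"read"])
print(unmarshal(packed))          # (42, [b'object-17', b'read'])
```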
137

Data-Driven Database Education: A Quantitative Study of SQL Learning in an Introductory Database Course

Von Dollen, Andrew C 01 July 2019 (has links)
The Structured Query Language (SQL) is widely used and challenging to master. Within the context of lab exercises in an introductory database course, this thesis analyzes the student learning process and seeks to answer the question: "Which SQL concepts, or concept combinations, trouble students the most?" We provide comprehensive taxonomies of SQL concepts and errors, identify common areas of student misunderstanding, and investigate the student problem-solving process. We present an interactive web application used by students to complete SQL lab exercises. In addition, we analyze data collected by this application and we offer suggestions for improvement to database lab activities.
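The thesis's full concept and error taxonomies are not reproduced here; the toy Python sketch below only illustrates the idea of tagging a student query with the SQL concepts it exercises, using an invented keyword-based fragment of such a taxonomy.

```python
import re

# An invented keyword-based fragment of a concept taxonomy; the thesis's
# real taxonomy is far richer and is built from parsed queries, not regexes.
CONCEPTS = {
    "join":        r"\bjoin\b",
    "aggregation": r"\b(count|sum|avg|min|max)\s*\(",
    "grouping":    r"\bgroup\s+by\b",
    "subquery":    r"\(\s*select\b",
}

def concepts_used(sql: str) -> set:
    """Return the names of the concepts a query appears to exercise."""
    sql = sql.lower()
    return {name for name, pattern in CONCEPTS.items() if re.search(pattern, sql)}

query = """
SELECT d.name, COUNT(*)
FROM employee e JOIN department d ON e.dept_id = d.id
GROUP BY d.name
"""
print(sorted(concepts_used(query)))   # ['aggregation', 'grouping', 'join']
```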
138

Design, development, and deployment of a locus specific mutation database : the PAHdb example

Nowacki, Piotr Marek. January 1998 (has links)
No description available.
139

Matching Based Diversity

Modi, Amit 28 July 2011 (has links)
No description available.
140

Community detection in complex networks

Bidoni, Zeynab Bahrami 01 July 2015 (has links)
This research study has produced advances in the understanding of communities within a complex network. A community in this context is defined as a subgraph with a higher internal density and a lower crossing density with respect to other subgraphs. In this study, a novel and efficient distance-based ranking algorithm called the Correlation Density Rank (CDR) has been proposed and is utilized for a broad range of applications, such as deriving the community structure and the evolution graph of the organizational structure from a dynamic social network, extracting common members between overlapped communities, performance-based comparison between different service providers in a wireless network, and finding optimal reliability-oriented assignment tasks to processors in heterogeneous distributed computing systems. The experiments, conducted on both synthetic and real datasets, demonstrate the feasibility and applicability of the framework.
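As a hedged illustration of the density criterion in this definition of a community (the CDR algorithm itself is not reproduced), the following Python sketch computes internal and crossing densities for an invented toy graph.

```python
from itertools import combinations

# Invented undirected toy graph, stored as a set of edges.
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (4, 6)}
nodes = {n for edge in edges for n in edge}

def has_edge(u, v):
    return (u, v) in edges or (v, u) in edges

def internal_density(community):
    """Fraction of node pairs inside the community that are connected."""
    pairs = list(combinations(sorted(community), 2))
    return sum(has_edge(u, v) for u, v in pairs) / len(pairs)

def crossing_density(community):
    """Fraction of community/outside node pairs that are connected."""
    outside = nodes - community
    pairs = [(u, v) for u in community for v in outside]
    return sum(has_edge(u, v) for u, v in pairs) / len(pairs)

candidate = {1, 2, 3}
print(internal_density(candidate), round(crossing_density(candidate), 3))
# 1.0 internal vs 0.111 crossing: a community by the definition above
```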
