401 |
Performance Modelling of Database Designs using a Queueing Networks Approach. An investigation in the performance modelling and evaluation of detailed database designs using queueing network models. Osman, Rasha Izzeldin Mohammed. January 2010 (has links)
Databases form a common component of many software systems, including mission-critical transaction processing systems and multi-tier Internet applications. There is a large body of research on the performance of database management system components, while studies of overall database system performance have been limited. Moreover, performance models specifically targeted at the database design have not been extensively studied.
This thesis addresses this concern by proposing a performance evaluation method for database designs based on queueing network models. The method is targeted at designs of large databases in which I/O is the dominant cost factor. The database design queueing network performance model is well suited to providing what-if comparisons of database designs before database system implementation.
A formal specification that captures the essential database design features while keeping the performance model sufficiently simple is presented. Furthermore, the simplicity of the modelling algorithms permits a direct mapping between database design entities and queueing network models. This allows for a more applicable performance model that provides relevant feedback to database designers and can be straightforwardly integrated into early database design development phases. The accuracy of the modelling technique is validated by modelling an open source implementation of the TPC-C benchmark.
The contribution of this thesis is considered significant in that the majority of performance evaluation models for database systems target capacity planning or overall system properties, with limited work on detailed database transaction processing and behaviour. In addition, this work improves on previous methodologies in that the transaction is modelled at a finer granularity, and the database design queueing network model provides for the explicit representation of active database rules and referential integrity constraints. / Iqra Foundation
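As a rough illustration of the kind of closed queueing network evaluation such a model rests on, the sketch below implements exact Mean Value Analysis for a two-station (CPU and disk) network in Python. The station names and per-transaction service demands are illustrative assumptions, not figures taken from the thesis.

```python
# Exact Mean Value Analysis (MVA) for a closed queueing network with one CPU
# station and one disk (I/O) station -- an illustrative sketch only.

def mva(service_demands, max_clients, think_time=0.0):
    """Return (n, throughput, response_time) for 1..max_clients concurrent jobs."""
    queue_len = {k: 0.0 for k in service_demands}   # Q_k(0) = 0
    results = []
    for n in range(1, max_clients + 1):
        # Residence time at each station: R_k(n) = D_k * (1 + Q_k(n-1))
        residence = {k: d * (1.0 + queue_len[k]) for k, d in service_demands.items()}
        resp_time = sum(residence.values())
        throughput = n / (think_time + resp_time)
        # Queue lengths for the next iteration: Q_k(n) = X(n) * R_k(n)
        queue_len = {k: throughput * residence[k] for k in service_demands}
        results.append((n, throughput, resp_time))
    return results

# Hypothetical service demands (seconds); the disk dominates, matching the
# assumption that I/O is the main cost factor for large databases.
demands = {"cpu": 0.005, "disk": 0.030}
for n, x, r in mva(demands, max_clients=20):
    print(f"{n:2d} clients: throughput={x:6.2f} tps, response time={r * 1000:6.1f} ms")
```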
|
403 |
A generic construction process modelling method. Karhu, Vesa. January 2001 (has links)
A variety of modelling methods has been used to model construction processes and projects, either during normal project planning or for process re-engineering efforts or research. One common method, which is widely used by construction industry practitioners, is scheduling. In addition to schedules, some companies have used a simple box-and-arrow method, which graphically resembles schedules, for analysing their working processes. More formal methods such as IDEF0 have been used in re-engineering projects and by researchers. All these methods are limited in scope and cannot be used to model all the aspects of the processes that practitioners are interested in.
A new generic construction process modelling method, GEPM, was developed to overcome the deficiencies of the current methods. GEPM uses object-oriented principles, and has borrowed features, such as activity, task, and temporal dependency, from methods like IDEF0 and scheduling. GEPM is flexible in the sense that the conceptual model can be changed to achieve additional special features. This capability is also supported by the database implementation, which enables users to interact with the developed process models through views that represent partial models. The views support the IDEF0, scheduling, and simple flow methods. There are, though, rules for how to convert between the partial models through views.
The evaluation of GEPM showed that more modelling features, i.e. modelling power, are obtained in comparison with the earlier methods. One of the essential features of GEPM is the distinction between activities and tasks. Activities define how an action will be carried out, generally using predetermined inputs to achieve a predetermined output, whereas tasks are activities with additionally specified starting and finishing times, duration and location. Moreover, a task has a type-attribute that refers to an activity where its overall template is defined.
Before the actual evaluation, case material from a real project was preliminarily tested with GEPM along with the prototype application. It turned out that some additions were needed to the conceptual model of GEPM and to the prototype application. GEPM can be used for process improvement, process management, and for enhancing communication in a construction process. One usage scenario for GEPM is to define quality systems and reference models, using the activity sub-model and storing the results in the GEPM database. A project-specific model can be derived from the reference model using conversion rules, and it eventually turns into a project-specific schedule with tasks.
Keywords: process, modelling, generic, method, model, database, view
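The activity/task distinction described above could be rendered roughly as follows. This is a minimal object-oriented sketch with hypothetical class and attribute names, not GEPM's actual conceptual model.

```python
# A minimal sketch of the activity/task distinction: an Activity is a template
# (inputs -> output), a Task is an activity instance with times and a location.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Activity:
    """Defines how an action is carried out: predetermined inputs to an output."""
    name: str
    inputs: List[str] = field(default_factory=list)
    output: Optional[str] = None


@dataclass
class Task:
    """An activity with specified start/finish times, duration and location."""
    activity_type: Activity          # the task's type-attribute refers to an activity
    start: date
    finish: date
    location: str

    @property
    def duration_days(self) -> int:
        return (self.finish - self.start).days


# Example: a reference-model activity turned into a project-specific task.
pour_foundation = Activity("pour foundation", inputs=["concrete", "formwork"],
                           output="foundation")
task = Task(pour_foundation, date(2001, 5, 2), date(2001, 5, 9), location="site A")
print(task.activity_type.name, task.duration_days, "days at", task.location)
```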
|
404 |
Databases for antibody-based proteomics. Björling, Erik. January 2008 (has links)
Humans are believed to have ~20,500 protein-coding genes, and much effort has over the last years been put into the characterization and localization of the encoded proteins in order to understand their functions. One such effort is the Human Proteome Resource (HPR) project, started in Sweden in 2003 with the aim to generate specific antibodies to each human protein and to use those antibodies to analyze the human proteome by screening human tissues and cells. The work reported in this thesis deals with structuring of data from antibody-based proteomics assays, with focus on the importance of aggregating and presenting data in a way that is easy to apprehend. The goals were to model and build databases for collecting, searching and analyzing data coming out of the large-scale HPR project and to make all collected data publicly available.
A public website, the Human Protein Atlas, was developed giving all end-users in the scientific community access to the HPR database with protein expression data. In 2008, the Human Protein Atlas was released in its 4th version containing more than 6000 antibodies, covering more than 25% of the human proteins. All the collected protein expression data is searchable on the public website. End-users can query for proteins that show high expression in one tissue and no expression in another and possibly find tissue-specific biomarkers. Queries can also be constructed to find proteins with different expression levels in normal vs. cancer tissues. The proteins found by such a query could identify potential biomarkers for cancer that could be used as diagnostic markers and maybe even be involved in cancer therapy in the future.
Validation of antibodies is important in order to get reliable results from different assays. It has been noted that some antibodies are reliable in certain assays but not in others, and therefore another publicly available database, the Antibodypedia, has been created where any antibody producer can submit their binders together with the validation data, in order for end users to purchase the best antibody for their protein target and their intended assay.
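The kind of tissue-specificity query described above might look roughly like the sketch below. The schema, table and column names are assumptions for illustration, not the actual HPR/Human Protein Atlas database, and SQLite stands in for the production system.

```python
# A toy tissue-specificity query: proteins highly expressed in one tissue but
# not detected in another -- candidate tissue-specific biomarkers.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expression (
    protein TEXT,
    tissue  TEXT,
    level   TEXT   -- e.g. 'high', 'medium', 'low', 'none'
);
INSERT INTO expression VALUES
    ('P1', 'pancreas', 'high'), ('P1', 'liver', 'none'),
    ('P2', 'pancreas', 'high'), ('P2', 'liver', 'medium');
""")

rows = conn.execute("""
    SELECT a.protein
    FROM expression a JOIN expression b ON a.protein = b.protein
    WHERE a.tissue = 'pancreas' AND a.level = 'high'
      AND b.tissue = 'liver'    AND b.level = 'none'
""").fetchall()
print(rows)   # [('P1',)] in this toy data set
```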
|
405 |
Aggregation and Privacy in Multi-Relational Databases. Jafer, Yasser. 11 April 2012 (has links)
Most existing data mining approaches perform data mining tasks on a single data table. However, increasingly, data repositories such as financial data and medical records, amongst others, are stored in relational databases. The inability to apply traditional data mining techniques directly to such relational databases thus poses a serious challenge. To address this issue, a number of researchers convert a relational database into one or more flat files and then apply traditional data mining algorithms. This process of transforming a relational database into one or more flat files usually involves aggregation. Aggregation functions such as maximum, minimum, average, standard deviation, count and sum are commonly used in such a flattening process.
Our research aims to address the following question: Is there a link between aggregation and possible privacy violations during relational database mining? In this research we investigate how, and if, applying aggregation functions will affect the privacy of a relational database during supervised learning, or classification, where the target concept is known. To this end, we introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. The PBIRD methodology combines multi-view learning with feature selection to discover potentially dangerous sets of features hidden within a database. Our approach creates a number of views, which consist of subsets of the data, with and without aggregation. Then, by identifying and investigating the set of selected features in each view, potential privacy breaches are detected. In this way, our PBIRD algorithm is able to discover those features that are correlated with the classification target and that may lead to the revealing of sensitive information in the database.
Our experimental results show that aggregation functions do, indeed, change the correlation between attributes and the classification target. We show that with aggregation, we obtain a set of features which can be accurately linked to the classification target and used to predict (with high accuracy) the confidential information. On the other hand, the results show that, without aggregation, we obtain a different set of potentially harmful features. By identifying the complete set of potentially dangerous attributes, the PBIRD methodology provides a solution whereby database designers/owners can be warned and can subsequently perform the necessary adjustments to protect the privacy of the relational database.
In our research, we also perform a comparative study to investigate the impact of aggregation on classification accuracy and on the time required to build the models. Our results suggest that when a database consists only of categorical data, aggregation should be used with particular caution, because aggregation decreases the overall accuracy of the resulting models. When the database contains mixed attributes, the results show that the accuracies with and without aggregation are comparable; however, even in such scenarios, schemas without aggregation tend to perform slightly better. With regard to the impact of aggregation on model building time, the results show that, in general, the models constructed with aggregation require shorter building times. However, when the database is small and consists of nominal attributes with high cardinality, aggregation slows model building.
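The flattening-by-aggregation step described above can be illustrated with a small sketch. The relation and column names are hypothetical, and pandas stands in for whatever propositionalization tooling is actually used.

```python
# Flattening a one-to-many relation (one patient, many visits) into a single
# row per patient using the aggregation functions mentioned in the abstract.
import pandas as pd

visits = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "cost":       [120.0, 80.0, 200.0, 50.0, 75.0],
})

flat = visits.groupby("patient_id")["cost"].agg(
    ["max", "min", "mean", "std", "count", "sum"]
).reset_index()
print(flat)

# The flattened table can then be joined to the target table and fed to a
# conventional single-table classifier; PBIRD asks whether such derived
# features correlate with, and hence leak, sensitive information.
```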
|
406 |
Storing Protein Structure in Spatial Database. Yeung, Tony. 12 May 2005 (has links)
In recent years, the field of bioinformatics has exploded at an unprecedented scale. The amount of data generated from different genome projects demands new and efficient ways of storing and retrieving information. The analysis and management of protein structure information has become one of the main focuses. It is well known that a protein's functions differ depending on its structure's position in 3-dimensional space. Because protein structures are exceedingly large, complex, and multi-dimensional, there is a need for a data model that can fulfill the requirements of storing protein structures in accordance with their spatial arrangement and topological relationships and, at the same time, provide tools to analyze the information stored. With the emergence of spatial databases, first used in the field of Geographical Information Systems, the data model for protein structure can be based on the geographic model, as the two share several strikingly similar traits. The geometry of proteins can be modeled using the spatial types provided in a spatial database. In a similar way, the geometry queries used for geographical analysis can also be used to provide information for analysis of the structure of proteins. This thesis explores the mechanics of extracting structural information for a protein from a flat file (PDB), storing that information in a spatial data model based on the geographic model, and performing analysis using the geometric operators provided by the spatial database. The database used is Oracle 9i; most features are provided by the Oracle Spatial package. Queries using the aforementioned ideas are demonstrated.
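The first step mentioned above, extracting structural information from a PDB flat file, can be sketched as follows. The fixed-column offsets follow the standard PDB ATOM record layout; the file name is a placeholder, and the loading into spatial types is only suggested in a comment.

```python
# Extract atom coordinates from a PDB flat file (fixed-column ATOM records).
def read_atom_coordinates(path):
    """Yield (atom_name, residue_name, chain, x, y, z) for each ATOM/HETATM record."""
    with open(path) as pdb:
        for line in pdb:
            if line.startswith(("ATOM", "HETATM")):
                yield (
                    line[12:16].strip(),    # atom name
                    line[17:20].strip(),    # residue name
                    line[21],               # chain identifier
                    float(line[30:38]),     # x (Angstroms)
                    float(line[38:46]),     # y
                    float(line[46:54]),     # z
                )

# Each (x, y, z) triple could then be stored as a spatial point and queried
# with the database's geometric operators (distance, containment, ...).
for atom in read_atom_coordinates("1abc.pdb"):   # hypothetical file name
    print(atom)
```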
|
408 |
Inter-university Upper atmosphere Global Observation NETwork (IUGONET) project. Hashiguchi, N.O., Yatagai, Akiyo, Kaneda, Naoki, Umemura, Norio, UeNo, Satoru, Yagi, Manabu, Koyama, Yukinobu, Sato, Yuka, Shinbori, Atsuki, Tanaka, Yoshimasa, Abe, Shuji, Hori, Tomoaki. 25 June 2013 (has links)
International Living With a Star Workshop 2013, 2013/06/24-6/28, Irkutsk, Russia
|
409 |
Automatically Tuning Database Server Multiprogramming Level. Abouzour, Mohammed. January 2007 (has links)
Optimizing database systems to achieve the maximum attainable throughput of the underlying hardware is one of the many difficult tasks that face database administrators. With the increased use of database systems in many environments, this task has become even more difficult. One of the parameters that needs to be configured is the number of worker tasks that the database server uses (the multiprogramming level). This thesis focuses on how to automatically adjust the number of database server worker tasks to achieve maximum throughput under varying workload characteristics. The underlying intuition is that every workload has an optimal multiprogramming level that achieves the best throughput given the workload's characteristics.
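As a hedged illustration of this intuition (not the algorithm developed in the thesis), the following sketch hill-climbs towards a throughput-maximizing multiprogramming level. measure_throughput is a hypothetical hook that would run the workload at a given worker count and report transactions per second.

```python
# Simple hill-climbing over the multiprogramming level (MPL): probe the
# neighbouring MPL values and move towards higher measured throughput.
def tune_mpl(measure_throughput, mpl=8, step=2, min_mpl=1, max_mpl=256, rounds=20):
    best_mpl, best_tps = mpl, measure_throughput(mpl)
    for _ in range(rounds):
        candidates = [m for m in (best_mpl - step, best_mpl + step)
                      if min_mpl <= m <= max_mpl]
        improved = False
        for m in candidates:
            tps = measure_throughput(m)
            if tps > best_tps:
                best_mpl, best_tps, improved = m, tps, True
        if not improved:
            break     # local optimum for the current workload
    return best_mpl, best_tps

# Toy synthetic workload with an optimum around MPL = 24.
demo = lambda m: -((m - 24) ** 2) + 600
print(tune_mpl(demo))   # (24, 600)
```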
|
410 |
Deciding Second-order Logics using Database Evaluation Techniques. Unel, Gulay. January 2008 (has links)
We outline a novel technique that maps the satisfiability problems of second-order logics, in particular WSnS (weak monadic second-order logic with n successors), S1S (monadic second-order logic with one successor), and the μ-calculus, to the problem of query evaluation of Complex-value Datalog queries. In this dissertation, we propose techniques that use database evaluation and optimization techniques for automata-based decision procedures for the above logics. We show how the use of advanced implementation techniques for deductive databases and for logic programs, in particular the use of tabling, yields a considerable improvement in performance over more traditional approaches. We also explore various optimizations of the proposed technique, in particular variants of tabling and goal reordering. We then show that the decision problem for S1S can be mapped to the problem of query evaluation of Complex-value Datalog queries, and we explore optimizations that can be applied to various types of formulas. Last, we propose analogous techniques that allow us to approach the μ-calculus satisfiability problem in an incremental fashion and without the need for re-computation. In addition, we outline a top-down evaluation technique to drive our incremental procedure and propose heuristics that guide the problem partitioning to reduce the size of the problems that need to be solved.
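A toy sketch of why tabling helps: answers to subgoals are memoized, so each subgoal is evaluated only once. The relation and query below are illustrative and assume an acyclic graph; real tabled engines for (Complex-value) Datalog handle cycles via fixpoint computation.

```python
# Memoized ("tabled") reachability over a toy edge relation. Without the
# cache, shared subgoals would be re-evaluated once per incoming path.
from functools import lru_cache

edges = {("a", "b"), ("b", "c"), ("c", "d")}

@lru_cache(maxsize=None)            # the answer table: one entry per subgoal
def reachable(node):
    result = {node}
    for (src, dst) in edges:
        if src == node:
            result |= reachable(dst)
    return frozenset(result)

print(sorted(reachable("a")))       # ['a', 'b', 'c', 'd']
```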
|