401 |
Performance Modelling of Database Designs using a Queueing Networks Approach. An investigation in the performance modelling and evaluation of detailed database designs using queueing network models. Osman, Rasha Izzeldin Mohammed. January 2010 (has links)
Databases form a common component of many software systems, including mission-critical transaction processing systems and multi-tier Internet applications. There is a large body of research on the performance of database management system components, while studies of overall database system performance have been limited. Moreover, performance models specifically targeted at the database design have not been extensively studied.
This thesis addresses this concern by proposing a performance evaluation method for database designs based on queueing network models. The method is targeted at designs of large databases in which I/O is the dominant cost factor. The database design queueing network performance model is well suited to providing what-if comparisons of database designs before database system implementation.
A formal specification that captures the essential database design features while keeping the performance model sufficiently simple is presented. Furthermore, the simplicity of the modelling algorithms permits a direct mapping between database design entities and queueing network models. This allows for a more applicable performance model that provides relevant feedback to database designers and can be straightforwardly integrated into early database design development phases. The accuracy of the modelling technique is validated by modelling an open source implementation of the TPC-C benchmark.
The contribution of this thesis is considered significant in that the majority of performance evaluation models for database systems target capacity planning or overall system properties, with limited work on detailed database transaction processing and behaviour. In addition, this work improves on previous methodologies in that the transaction is modelled at a finer granularity, and the database design queueing network model provides for the explicit representation of active database rules and referential integrity constraints. / Iqra Foundation
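As a rough illustration of the kind of closed queueing network evaluation such a model rests on, the sketch below implements exact Mean Value Analysis for a two-station (CPU and disk) network in Python. The station names and per-transaction service demands are illustrative assumptions, not figures taken from the thesis.

```python
# Exact Mean Value Analysis (MVA) for a closed queueing network with one CPU
# station and one disk (I/O) station -- an illustrative sketch only.

def mva(service_demands, max_clients, think_time=0.0):
    """Return (n, throughput, response_time) for 1..max_clients concurrent jobs."""
    queue_len = {k: 0.0 for k in service_demands}   # Q_k(0) = 0
    results = []
    for n in range(1, max_clients + 1):
        # Residence time at each station: R_k(n) = D_k * (1 + Q_k(n-1))
        residence = {k: d * (1.0 + queue_len[k]) for k, d in service_demands.items()}
        resp_time = sum(residence.values())
        throughput = n / (think_time + resp_time)
        # Queue lengths for the next iteration: Q_k(n) = X(n) * R_k(n)
        queue_len = {k: throughput * residence[k] for k in service_demands}
        results.append((n, throughput, resp_time))
    return results

# Hypothetical service demands (seconds); the disk dominates, matching the
# assumption that I/O is the main cost factor for large databases.
demands = {"cpu": 0.005, "disk": 0.030}
for n, x, r in mva(demands, max_clients=20):
    print(f"{n:2d} clients: throughput={x:6.2f} tps, response time={r * 1000:6.1f} ms")
```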
|
403 |
A generic construction process modelling method. Karhu, Vesa. January 2001 (has links)
A variety of modelling methods has been used to model construction processes and projects, either during normal project planning or for process re-engineering efforts or research. One common method, which is widely used by construction industry practitioners, is scheduling. In addition to schedules, some companies have used a simple box-and-arrow method, which graphically resembles schedules, for analysing their working processes. More formal methods such as IDEF0 have been used in re-engineering projects and by researchers. All these methods are limited in scope and cannot be used to model all the aspects of the processes that practitioners are interested in.
A new generic construction process modelling method, GEPM, was developed to overcome the deficiencies of the current methods. GEPM uses object-oriented principles, and has borrowed features, such as activity, task, and temporal dependency, from methods like IDEF0 and scheduling. GEPM is flexible in the sense that the conceptual model can be changed to achieve additional special features. This capability is also supported by the database implementation, which enables users to interact with the developed process models through views that represent partial models. The views support the IDEF0, scheduling, and simple flow methods. There are, though, rules for how to convert between the partial models through views.
The evaluation of GEPM showed that more modelling features, i.e. modelling power, are obtained in comparison with the earlier methods. One of the essential features of GEPM is the distinction between activities and tasks. Activities define how an action will be carried out, generally using predetermined inputs to achieve a predetermined output, whereas tasks are activities with additionally specified starting and finishing times, duration and location. Moreover, a task has a type-attribute that refers to an activity where its overall template is defined.
Before the actual evaluation, case material from a real project was preliminarily tested with GEPM along with the prototype application. It turned out that some additions were needed to the conceptual model of GEPM and to the prototype application. GEPM can be used for process improvement, process management, and for enhancing communication in a construction process. One usage scenario for GEPM is to define quality systems and reference models, using the activity sub-model and storing the results in the GEPM database. A project-specific model can be derived from the reference model using conversion rules, and it eventually turns into a project-specific schedule with tasks.
Keywords: process, modelling, generic, method, model, database, view
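The activity/task distinction described above could be rendered roughly as follows. This is a minimal object-oriented sketch with hypothetical class and attribute names, not GEPM's actual conceptual model.

```python
# A minimal sketch of the activity/task distinction: an Activity is a template
# (inputs -> output), a Task is an activity instance with times and a location.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Activity:
    """Defines how an action is carried out: predetermined inputs to an output."""
    name: str
    inputs: List[str] = field(default_factory=list)
    output: Optional[str] = None


@dataclass
class Task:
    """An activity with specified start/finish times, duration and location."""
    activity_type: Activity          # the task's type-attribute refers to an activity
    start: date
    finish: date
    location: str

    @property
    def duration_days(self) -> int:
        return (self.finish - self.start).days


# Example: a reference-model activity turned into a project-specific task.
pour_foundation = Activity("pour foundation", inputs=["concrete", "formwork"],
                           output="foundation")
task = Task(pour_foundation, date(2001, 5, 2), date(2001, 5, 9), location="site A")
print(task.activity_type.name, task.duration_days, "days at", task.location)
```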
|
404 |
Databases for antibody-based proteomics. Björling, Erik. January 2008 (has links)
Humans are believed to have ~20,500 protein-coding genes, and much effort has over the last years been put into the characterization and localization of the encoded proteins in order to understand their functions. One such effort is the Human Proteome Resource (HPR) project, started in Sweden in 2003 with the aim to generate specific antibodies to each human protein and to use those antibodies to analyze the human proteome by screening human tissues and cells. The work reported in this thesis deals with structuring of data from antibody-based proteomics assays, with focus on the importance of aggregating and presenting data in a way that is easy to apprehend. The goals were to model and build databases for collecting, searching and analyzing data coming out of the large-scale HPR project and to make all collected data publicly available.
A public website, the Human Protein Atlas, was developed giving all end-users in the scientific community access to the HPR database with protein expression data. In 2008, the Human Protein Atlas was released in its 4th version containing more than 6000 antibodies, covering more than 25% of the human proteins. All the collected protein expression data is searchable on the public website. End-users can query for proteins that show high expression in one tissue and no expression in another and possibly find tissue-specific biomarkers. Queries can also be constructed to find proteins with different expression levels in normal vs. cancer tissues. The proteins found by such a query could identify potential biomarkers for cancer that could be used as diagnostic markers and maybe even be involved in cancer therapy in the future.
Validation of antibodies is important in order to get reliable results from different assays. It has been noted that some antibodies are reliable in certain assays but not in others, and therefore another publicly available database, the Antibodypedia, has been created where any antibody producer can submit their binders together with the validation data, in order for end users to purchase the best antibody for their protein target and their intended assay.
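The kind of tissue-specificity query described above might look roughly like the sketch below. The schema, table and column names are assumptions for illustration, not the actual HPR/Human Protein Atlas database, and SQLite stands in for the production system.

```python
# A toy tissue-specificity query: proteins highly expressed in one tissue but
# not detected in another -- candidate tissue-specific biomarkers.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expression (
    protein TEXT,
    tissue  TEXT,
    level   TEXT   -- e.g. 'high', 'medium', 'low', 'none'
);
INSERT INTO expression VALUES
    ('P1', 'pancreas', 'high'), ('P1', 'liver', 'none'),
    ('P2', 'pancreas', 'high'), ('P2', 'liver', 'medium');
""")

rows = conn.execute("""
    SELECT a.protein
    FROM expression a JOIN expression b ON a.protein = b.protein
    WHERE a.tissue = 'pancreas' AND a.level = 'high'
      AND b.tissue = 'liver'    AND b.level = 'none'
""").fetchall()
print(rows)   # [('P1',)] in this toy data set
```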
|
405 |
Aggregation and Privacy in Multi-Relational Databases. Jafer, Yasser. 11 April 2012 (has links)
Most existing data mining approaches perform data mining tasks on a single data table. However, increasingly, data repositories such as financial data and medical records, amongst others, are stored in relational databases. The inability to apply traditional data mining techniques directly to such relational databases thus poses a serious challenge. To address this issue, a number of researchers convert a relational database into one or more flat files and then apply traditional data mining algorithms. This process of transforming a relational database into one or more flat files usually involves aggregation. Aggregation functions such as maximum, minimum, average, standard deviation, count and sum are commonly used in such a flattening process.
Our research aims to address the following question: Is there a link between aggregation and possible privacy violations during relational database mining? In this research we investigate how, and if, applying aggregation functions will affect the privacy of a relational database during supervised learning, or classification, where the target concept is known. To this end, we introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. The PBIRD methodology combines multi-view learning with feature selection to discover potentially dangerous sets of features hidden within a database. Our approach creates a number of views, which consist of subsets of the data, with and without aggregation. Then, by identifying and investigating the set of selected features in each view, potential privacy breaches are detected. In this way, our PBIRD algorithm is able to discover those features that are correlated with the classification target and that may lead to the revealing of sensitive information in the database.
Our experimental results show that aggregation functions do, indeed, change the correlation between attributes and the classification target. We show that with aggregation, we obtain a set of features which can be accurately linked to the classification target and used to predict (with high accuracy) the confidential information. On the other hand, the results show that, without aggregation, we obtain a different set of potentially harmful features. By identifying the complete set of potentially dangerous attributes, the PBIRD methodology provides a solution whereby database designers/owners can be warned and can subsequently perform the necessary adjustments to protect the privacy of the relational database.
In our research, we also perform a comparative study to investigate the impact of aggregation on classification accuracy and on the time required to build the models. Our results suggest that when a database consists only of categorical data, aggregation should be used with particular caution, because aggregation decreases the overall accuracy of the resulting models. When the database contains mixed attributes, the results show that the accuracies with and without aggregation are comparable; however, even in such scenarios, schemas without aggregation tend to perform slightly better. With regard to the impact of aggregation on model building time, the results show that, in general, the models constructed with aggregation require shorter building times. However, when the database is small and consists of nominal attributes with high cardinality, aggregation slows model building.
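The flattening-by-aggregation step described above can be illustrated with a small sketch. The relation and column names are hypothetical, and pandas stands in for whatever propositionalization tooling is actually used.

```python
# Flattening a one-to-many relation (one patient, many visits) into a single
# row per patient using the aggregation functions mentioned in the abstract.
import pandas as pd

visits = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "cost":       [120.0, 80.0, 200.0, 50.0, 75.0],
})

flat = visits.groupby("patient_id")["cost"].agg(
    ["max", "min", "mean", "std", "count", "sum"]
).reset_index()
print(flat)

# The flattened table can then be joined to the target table and fed to a
# conventional single-table classifier; PBIRD asks whether such derived
# features correlate with, and hence leak, sensitive information.
```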
|
406 |
Storing Protein Structure in Spatial Database. Yeung, Tony. 12 May 2005 (has links)
In recent years, the field of bioinformatics has exploded at an unprecedented scale. The amount of data generated from different genome projects demands new and efficient ways of storing and retrieving information. The analysis and management of protein structure information has become one of the main focuses. It is well known that a protein's functions differ depending on its structure's position in 3-dimensional space. Because protein structures are exceedingly large, complex, and multi-dimensional, there is a need for a data model that can fulfill the requirements of storing protein structures in accordance with their spatial arrangement and topological relationships and, at the same time, provide tools to analyze the information stored. With the emergence of spatial databases, first used in the field of Geographical Information Systems, the data model for protein structure can be based on the geographic model, as the two share several strikingly similar traits. The geometry of proteins can be modeled using the spatial types provided in a spatial database. In a similar way, the geometry queries used for geographical analysis can also be used to provide information for analysis of the structure of proteins. This thesis explores the mechanics of extracting structural information for a protein from a flat file (PDB), storing that information in a spatial data model based on the geographic model, and performing analysis using the geometric operators provided by the spatial database. The database used is Oracle 9i; most features are provided by the Oracle Spatial package. Queries using the aforementioned ideas are demonstrated.
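The first step mentioned above, extracting structural information from a PDB flat file, can be sketched as follows. The fixed-column offsets follow the standard PDB ATOM record layout; the file name is a placeholder, and the loading into spatial types is only suggested in a comment.

```python
# Extract atom coordinates from a PDB flat file (fixed-column ATOM records).
def read_atom_coordinates(path):
    """Yield (atom_name, residue_name, chain, x, y, z) for each ATOM/HETATM record."""
    with open(path) as pdb:
        for line in pdb:
            if line.startswith(("ATOM", "HETATM")):
                yield (
                    line[12:16].strip(),    # atom name
                    line[17:20].strip(),    # residue name
                    line[21],               # chain identifier
                    float(line[30:38]),     # x (Angstroms)
                    float(line[38:46]),     # y
                    float(line[46:54]),     # z
                )

# Each (x, y, z) triple could then be stored as a spatial point and queried
# with the database's geometric operators (distance, containment, ...).
for atom in read_atom_coordinates("1abc.pdb"):   # hypothetical file name
    print(atom)
```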
|
408 |
Inter-university Upper atmosphere Global Observation NETwork (IUGONET) project. Hashiguchi, N.O., Yatagai, Akiyo, Kaneda, Naoki, Umemura, Norio, UeNo, Satoru, Yagi, Manabu, Koyama, Yukinobu, Sato, Yuka, Shinbori, Atsuki, Tanaka, Yoshimasa, Abe, Shuji, Hori, Tomoaki. 25 June 2013 (has links)
International Living With a Star Workshop 2013, 2013/06/24-6/28, Irkutsk, Russia
|
409 |
Automatically Tuning Database Server Multiprogramming Level. Abouzour, Mohammed. January 2007 (has links)
Optimizing database systems to achieve the maximum attainable throughput of the underlying hardware is one of the many difficult tasks that face database administrators. With the increased use of database systems in many environments, this task has become even more difficult. One of the parameters that needs to be configured is the number of worker tasks that the database server uses (the multiprogramming level). This thesis focuses on how to automatically adjust the number of database server worker tasks to achieve maximum throughput under varying workload characteristics. The underlying intuition is that every workload has an optimal multiprogramming level that achieves the best throughput given the workload's characteristics.
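As a hedged illustration of this intuition (not the algorithm developed in the thesis), the following sketch hill-climbs towards a throughput-maximizing multiprogramming level. measure_throughput is a hypothetical hook that would run the workload at a given worker count and report transactions per second.

```python
# Simple hill-climbing over the multiprogramming level (MPL): probe the
# neighbouring MPL values and move towards higher measured throughput.
def tune_mpl(measure_throughput, mpl=8, step=2, min_mpl=1, max_mpl=256, rounds=20):
    best_mpl, best_tps = mpl, measure_throughput(mpl)
    for _ in range(rounds):
        candidates = [m for m in (best_mpl - step, best_mpl + step)
                      if min_mpl <= m <= max_mpl]
        improved = False
        for m in candidates:
            tps = measure_throughput(m)
            if tps > best_tps:
                best_mpl, best_tps, improved = m, tps, True
        if not improved:
            break     # local optimum for the current workload
    return best_mpl, best_tps

# Toy synthetic workload with an optimum around MPL = 24.
demo = lambda m: -((m - 24) ** 2) + 600
print(tune_mpl(demo))   # (24, 600)
```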
|
410 |
Deciding Second-order Logics using Database Evaluation Techniques. Unel, Gulay. January 2008 (has links)
We outline a novel technique that maps the satisfiability problems of second-order logics, in particular WSnS (weak monadic second-order logic with n successors), S1S (monadic second-order logic with one successor), and the μ-calculus, to the problem of query evaluation of Complex-value Datalog queries. In this dissertation, we propose techniques that use database evaluation and optimization techniques for automata-based decision procedures for the above logics. We show how the use of advanced implementation techniques for deductive databases and for logic programs, in particular the use of tabling, yields a considerable improvement in performance over more traditional approaches. We also explore various optimizations of the proposed technique, in particular variants of tabling and goal reordering. We then show that the decision problem for S1S can be mapped to the problem of query evaluation of Complex-value Datalog queries, and we explore optimizations that can be applied to various types of formulas. Last, we propose analogous techniques that allow us to approach the μ-calculus satisfiability problem in an incremental fashion and without the need for re-computation. In addition, we outline a top-down evaluation technique to drive our incremental procedure and propose heuristics that guide the problem partitioning to reduce the size of the problems that need to be solved.
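A toy sketch of why tabling helps: answers to subgoals are memoized, so each subgoal is evaluated only once. The relation and query below are illustrative and assume an acyclic graph; real tabled engines for (Complex-value) Datalog handle cycles via fixpoint computation.

```python
# Memoized ("tabled") reachability over a toy edge relation. Without the
# cache, shared subgoals would be re-evaluated once per incoming path.
from functools import lru_cache

edges = {("a", "b"), ("b", "c"), ("c", "d")}

@lru_cache(maxsize=None)            # the answer table: one entry per subgoal
def reachable(node):
    result = {node}
    for (src, dst) in edges:
        if src == node:
            result |= reachable(dst)
    return frozenset(result)

print(sorted(reachable("a")))       # ['a', 'b', 'c', 'd']
```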
|