1 |
Fuzzy Querying in XML Databases
Ustunkaya, Ekin 01 January 2005
Real-world information that contains subjective opinions and judgments has created a need to represent complex and imprecise data in databases. In addition, transferring information between databases whose storage methods are incompatible has been an important research topic. Extensible Markup Language (XML) has the potential to meet both challenges, since it can represent complex and imprecise data.
In this thesis, an XML-based fuzzy data representation and querying system is designed and implemented. The system enables fuzzy querying of XML documents through XQuery, a language for querying XML documents. Complex and imprecise data are represented using XML combined with a fuzzy representation. In addition to fuzzy querying, the system supports restructuring XML Schemas by merging elements of the XML documents. With this feature, one can generate a new XML Schema and, from the existing documents, new XML documents that conform to it. The XML data used in the system are retrieved from the Internet by Web Services, which can make use of XML's capabilities to transfer data, and the XML documents are stored in a native XML database management system.
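As a rough illustration of what a fuzzy query over XML involves, the sketch below evaluates a fuzzy predicate ("age is young") against a small XML document. It is written in Python rather than XQuery, and the document layout, the `young` membership function and the 0.4 threshold are all assumptions made for this example; none of them come from the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
DOC = """
<people>
  <person><name>Ann</name><age>22</age></person>
  <person><name>Bob</name><age>33</age></person>
  <person><name>Cem</name><age>50</age></person>
</people>
"""

def young(age):
    """Assumed membership function: 1 up to 25, falling linearly to 0 at 40."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15

def fuzzy_select(xml_text, membership, threshold):
    """Return (name, degree) pairs whose membership degree meets the threshold."""
    root = ET.fromstring(xml_text)
    hits = []
    for person in root.iter("person"):
        degree = membership(float(person.findtext("age")))
        if degree >= threshold:
            hits.append((person.findtext("name"), degree))
    # Highest degree first, so the best-matching answers are listed first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

result = fuzzy_select(DOC, young, 0.4)
```

Unlike a crisp predicate such as `age < 30`, each answer carries a degree of membership, and the threshold decides how loosely "young" is interpreted.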
|
2 |
An XML-based Database of Molecular Pathways / En XML-baserad databas för molekylära reaktioner
Hall, David January 2005
Research on protein-protein interactions produces vast quantities of data, and a large number of databases hold data from this research. Many of these databases offer the data for download on the web in a number of different formats, many of them XML-based.
With the arrival of these XML-based formats, and especially standardized formats such as PSI-MI, SBML and BioPAX, there is a need to search data represented in XML. We wanted to investigate the capabilities of XML query tools for searching this data. Because the datasets are large, we concentrated on native XML database systems, which in addition to searching XML data also offer storage and indexing specially suited to XML documents.
A number of queries, both simple and advanced, were tested on data exported from the databases IntAct and Reactome using the XQuery language. The simpler queries included listing information on a specified protein and counting the number of reactions.
One central issue with protein-protein interactions is finding pathways, i.e. series of interconnected chemical reactions between proteins. This problem involves graph searches, and since we suspected that the complex queries it requires would be slow, we also developed a C++ program using a graph toolkit.
The simpler queries ran relatively fast. Pathway searches in the native XML databases took a long time even for short searches, while the C++ program achieved much faster pathway searches.
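The pathway search described here can be viewed as a path-finding problem on a graph whose nodes are proteins and whose edges are reactions. The thesis used a C++ graph toolkit; the sketch below instead uses a plain breadth-first search in Python on a made-up interaction list, only to show the shape of the computation. The protein names and the undirected treatment of interactions are assumptions.

```python
from collections import deque

# Made-up interactions (placeholder names, not IntAct or Reactome data).
INTERACTIONS = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]

def build_graph(pairs):
    """Build an undirected adjacency map from interaction pairs."""
    graph = {}
    for left, right in pairs:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph

def find_pathway(graph, start, goal):
    """Breadth-first search; returns one shortest pathway, or None if none exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorting the neighbours only makes the result deterministic.
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

pathway = find_pathway(build_graph(INTERACTIONS), "A", "D")
```

Expressing the same traversal as a recursive query over XML documents forces the engine to re-resolve references at every step, which is one plausible reason such searches are slow compared with an in-memory graph.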
|
3 |
Global Semantic Integrity Constraint Checking for a System of Databases
Madiraju, Praveen 09 August 2005
In today's emerging information systems, it is natural to have data distributed across multiple sites. We define a System of Databases (SyDb) as a collection of autonomous and heterogeneous databases. R-SyDb (System of Relational Databases) is a restricted form of SyDb that refers to a collection of independent relational databases. Similarly, X-SyDb (System of XML Databases) refers to a collection of XML databases. Global integrity constraints ensure the integrity and consistency of data spanning multiple databases. In this dissertation, we present (i) Constraint Checker, a general framework for a mobile-agent-based approach to checking global constraints on R-SyDb, and (ii) XConstraint Checker, a general framework for checking global XML constraints on X-SyDb. Furthermore, we formalize multiple efficient algorithms for various semantic integrity constraints involving both arithmetic and aggregate predicates. The algorithms take as input an update statement and a list of all global semantic integrity constraints with arithmetic or aggregate predicates, and output sub-constraints to be executed on remote sites. The algorithms are efficient because (i) the constraint check is carried out at compile time, i.e. before the update statement is executed, which saves time and resources by avoiding rollbacks, and (ii) the implementation exploits parallelism. We have also implemented prototypes of the systems and algorithms for both R-SyDb and X-SyDb, and we present performance evaluations of the system.
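To make the idea of checking a global aggregate constraint before an update concrete, here is a minimal sketch. It assumes a single hypothetical constraint, SUM(salary) over all sites must not exceed a limit, and models each site as an in-memory list; the site names, data and function names are invented for illustration and do not reproduce the dissertation's algorithms or its mobile-agent framework.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site data: each site holds some local salary values.
SITES = {
    "site1": [1000, 1500],
    "site2": [2000],
    "site3": [500, 700],
}

def local_sum(site):
    """Sub-constraint evaluated at one site: the local part of the aggregate."""
    return sum(SITES[site])

def update_allowed(new_value, limit):
    """Check the assumed global constraint SUM(salary) <= limit before inserting."""
    with ThreadPoolExecutor() as pool:
        # The sub-constraints are independent, so they can be evaluated in parallel.
        partial_sums = list(pool.map(local_sum, SITES))
    return sum(partial_sums) + new_value <= limit

def insert_salary(target_site, new_value, limit):
    """Apply the insert only if the check passes, so no rollback is needed."""
    if not update_allowed(new_value, limit):
        return False
    SITES[target_site].append(new_value)
    return True
```

With the sample data the current total is 5700, so `insert_salary("site2", 300, 6000)` succeeds and a further insert of any positive value is rejected without touching the data.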
|
4 |
Automatic Physical Design for XML Databases
Elghandour, Iman January 2010
Database systems employ physical structures such as indexes and materialized views to improve query performance, potentially by orders of magnitude. It is therefore important for a database administrator to choose the appropriate configuration of these physical structures (i.e., the appropriate physical design) for a given database. Deciding on the physical design of a database is not an easy task, and a considerable amount of research exists on automatic physical design tools for relational databases. Recently, XML database systems are increasingly being used for managing highly structured XML data, and support for XML data is being added to commercial relational database systems. This raises the important question of how to choose the appropriate physical design (i.e., the appropriate set of physical structures) for an XML database. Relational automatic physical design tools are not adequate, so new research is needed in this area.
In this thesis, we address the problem of automatic physical design for XML databases, which is the process of automatically selecting the best set of physical structures for a given database and a given query workload representing the client application's usage patterns of this data. We focus on recommending two types of physical structures: XML indexes and relational materialized views of XML data. For each of these structures, we study the recommendation process and present a design advisor that automatically recommends a configuration of physical structures given an XML database and a workload of XML queries. The recommendation process is divided into four main phases: (1) enumerating candidate physical structures, (2) generalizing candidate structures in order to generate more candidates that are useful to queries that are not seen in the given workload but similar to the workload queries, (3) estimating the benefit of various candidate structures, and (4) selecting the best set of candidate structures for the given database and workload. We present a design advisor for recommending XML indexes, one for recommending materialized views, and an integrated design advisor that recommends both indexes and materialized views. A key characteristic of our advisors is that they are tightly coupled with the query optimizer of the database system, and rely on the optimizer for enumerating and evaluating physical designs whenever possible. This characteristic makes our techniques suitable for any database system that complies with a set of minimum requirements listed within the thesis. We have implemented the index, materialized view, and integrated advisors in a prototype version of IBM DB2 V9, which supports both relational and XML data, and we experimentally demonstrate the effectiveness of their recommendations using this implementation.
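The final phase, selecting a set of candidates, can be pictured with a simple greedy heuristic. The sketch below is an assumption-laden illustration rather than the thesis's method: the index patterns, benefit figures and sizes are made up, the storage budget is a hypothetical constraint, and a real advisor would obtain benefit estimates from the query optimizer instead of a hard-coded table.

```python
# Hypothetical candidates: (index pattern, estimated benefit, size in MB).
CANDIDATES = [
    ("/site/people/person/name", 120.0, 40),
    ("/site/regions//item/price", 90.0, 20),
    ("/site//item/*", 150.0, 100),
    ("/site/open_auctions/open_auction/bidder", 30.0, 30),
]

def select_indexes(candidates, budget_mb):
    """Greedy selection: best benefit per MB first, skipping what does not fit."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, used = [], 0
    for pattern, benefit, size in ranked:
        if used + size <= budget_mb:
            chosen.append(pattern)
            used += size
    return chosen, used

chosen, used = select_indexes(CANDIDATES, 100)
```

Note that the generalized pattern `/site//item/*` has the largest absolute benefit but is skipped here because it would consume the whole budget. The heuristic also treats benefits as independent, which ignores interactions between structures.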
|
6 |
RepliX: Um mecanismo para a replicação de dados XML / RepliX: a mechanism for XML data replication
Sousa, Flávio Rubens de Carvalho January 2007
SOUSA, Flávio Rubens de Carvalho. RepliX: Um mecanismo para a replicação de dados XML. 2007. 77 f. Dissertation (Master's), Universidade Federal do Ceará, Centro de Ciências, Departamento de Computação, Fortaleza-CE, 2007.
XML has become a widely used standard for data representation and exchange in applications. The growing usage of XML creates a need for efficient storage and retrieval systems for XML data. Native XML DBs (NXDBs) are being developed to target this demand. NXDBs implement many characteristics that are common to traditional DBs, such as storage, indexing, query processing, transactions and replication. Most existing solutions solve the replication issue through traditional techniques. However, the flexibility of XML data imposes new challenges, so new replication techniques ought to be developed. To improve the performance and availability of NXDBs, this thesis proposes RepliX, a mechanism for XML data replication that takes into account the main characteristics of this data type, making it possible to reduce the response time in query processing and to improve the fault tolerance of such systems. Although there are several replication protocols, using the group communication abstraction for communication and fault detection has proven to be a good solution, since this abstraction provides efficient message exchange techniques and reliability guarantees. RepliX uses this strategy, organizing the sites into an update group and a read-only group in a way that allows load balancing among the sites and makes the system less susceptible to faults, since there is no single point of failure in each group. To evaluate RepliX, a new replication layer was implemented on top of an existing NXDB to introduce the characteristics of the proposed mechanism. Several experiments using this layer were conducted, and their results confirm the mechanism's efficiency across the different aspects of a replicated database, considerably improving its performance as well as its availability.
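The split into an update group and a read-only group can be sketched as a small request router. This is a toy model under stated assumptions: the class name, the site names, the round-robin policy for reads and the rule that every update-group member applies a write are all invented for illustration, and the group communication layer that RepliX relies on is not modeled at all.

```python
from itertools import cycle

class ReplicaRouter:
    """Toy router: writes go to the update group, reads rotate over the read group."""

    def __init__(self, update_sites, read_sites):
        self.update_sites = list(update_sites)
        self.read_sites = list(read_sites)
        self._next_reader = cycle(self.read_sites)

    def route(self, operation):
        """Return the list of sites that should handle the operation."""
        if operation == "update":
            # Assumption: every member of the update group applies the change.
            return list(self.update_sites)
        # Round-robin over the read-only group spreads the query load.
        return [next(self._next_reader)]

    def fail(self, site):
        """Drop a failed site; the remaining members of its group keep serving."""
        if site in self.update_sites:
            self.update_sites.remove(site)
        if site in self.read_sites:
            self.read_sites.remove(site)
            # Restart the rotation over the surviving readers.
            self._next_reader = cycle(self.read_sites)

router = ReplicaRouter(["u1", "u2"], ["r1", "r2", "r3"])
```

Because each group has more than one member, removing a single site with `fail` leaves both reads and updates routable, which mirrors the "no single point of failure in each group" property the abstract describes.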
|
7 |
RepliX: Um mecanismo para a replicação de dados XML / RepliX: a mechanism for XML data replication
Sousa, Flávio Rubens de Carvalho January 2007
SOUSA, Flávio Rubens de Carvalho. RepliX: Um mecanismo para a replicação de dados XML. 2007. 91 f. Dissertation (Master's in Computer Science), Universidade Federal do Ceará, Fortaleza-CE, 2007.
XML has become a widely used standard for data representation and exchange in applications. The growing usage of XML creates a need for efficient storage and retrieval systems for XML data. Native XML DBs (NXDBs) are being developed to target this demand. NXDBs implement many characteristics that are common to traditional DBs, such as storage, indexing, query processing, transactions and replication. Most existing solutions solve the replication issue through traditional techniques. However, the flexibility of XML data imposes new challenges, so new replication techniques ought to be developed. To improve the performance and availability of NXDBs, this thesis proposes RepliX, a mechanism for XML data replication that takes into account the main characteristics of this data type, making it possible to reduce the response time in query processing and to improve the fault tolerance of such systems. Although there are several replication protocols, using the group communication abstraction for communication and fault detection has proven to be a good solution, since this abstraction provides efficient message exchange techniques and reliability guarantees. RepliX uses this strategy, organizing the sites into an update group and a read-only group in a way that allows load balancing among the sites and makes the system less susceptible to faults, since there is no single point of failure in each group. To evaluate RepliX, a new replication layer was implemented on top of an existing NXDB to introduce the characteristics of the proposed mechanism. Several experiments using this layer were conducted, and their results confirm the mechanism's efficiency across the different aspects of a replicated database, considerably improving its performance as well as its availability.
|
9 |
RepliX: Um mecanismo para a replicação de dados XML / RepliX: a mechanism for XML data replication
Flávio Rubens de Carvalho Sousa 09 March 2007
Conselho Nacional de Desenvolvimento Científico e Tecnológico
XML has become a widely used standard for data representation and exchange in applications. The growing usage of XML creates a need for efficient storage and retrieval systems for XML data. Native XML DBs (NXDBs) are being developed to target this demand. NXDBs implement many characteristics that are common to traditional DBs, such as storage, indexing, query processing, transactions and replication.
Most existing solutions solve the replication issue through traditional techniques. However, the flexibility of XML data imposes new challenges, so new replication techniques ought to be developed. To improve the performance and availability of NXDBs, this thesis proposes RepliX, a mechanism for XML data replication that takes into account the main characteristics of this data type, making it possible to reduce the response time in query processing and to improve the fault tolerance of such systems.
Although there are several replication protocols, using the group communication abstraction for communication and fault detection has proven to be a good solution, since this abstraction provides efficient message exchange techniques and reliability guarantees. RepliX uses this strategy, organizing the sites into an update group and a read-only group in a way that allows load balancing among the sites and makes the system less susceptible to faults, since there is no single point of failure in each group.
To evaluate RepliX, a new replication layer was implemented on top of an existing NXDB to introduce the characteristics of the proposed mechanism. Several experiments using this layer were conducted, and their results confirm the mechanism's efficiency across the different aspects of a replicated database, considerably improving its performance as well as its availability.
|
10 |
Gerenciamento de anotações de biosseqüências utilizando associações entre ontologias e esquemas XML
Teixeira, Marcus Vinícius Carneiro 26 May 2008
Universidade Federal de São Carlos
Bioinformatics aims to provide computational tools for the development of genome research. Among those tools are annotation systems and Database Management Systems (DBMSs) that, associated with ontologies, allow the formalization of both the domain concepts and the data schemas. The data produced by genome research are often textual, lack a regular structure, and also require schema evolution. Because of these characteristics, semi-structured DBMSs offer great potential for handling such data. This work therefore presents an architecture for biosequence annotation based on XML databases. Within this architecture, special attention was given to database design and to the manual annotation task performed by researchers. The architecture provides an interface that uses an ontology-driven model for modeling and generating XML schemas, and a manual annotation interface prototype that uses molecular biology domain ontologies, such as the Gene Ontology and the Sequence Ontology. These interfaces were tested by users experienced in Bioinformatics and Databases, who answered questionnaires to evaluate them. The answers gave good assessments on criteria such as utility and speeding up database design. The proposed architecture aims to extend and improve Bio-TIM, an annotation system developed by the Database Group of the Computer Science Department of the Federal University of São Carlos (UFSCar).
|