  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Design of a proposed scientific data management architecture for the Web Intelligence Centre research center

Maldonado Bravo, Felipe Alejandro January 2016 (has links)
Ingeniero Civil Industrial / This thesis project designs a proposed research data management architecture, focused on open access to data, for the Web Intelligence Centre (WIC), a research center of the Faculty of Physical and Mathematical Sciences (FCFM) at the Universidad de Chile. The goal is to prevent the data generated at the center from becoming unavailable for future work, and thus avoid having to run experiments a second time. The first pillar of the architecture is a set of scientific data management policies. Their design followed a sequence of steps offered by the Data Curation Centre (UK). The first step was an assessment of the WIC's current state of data management; next came a review and comparison of policy-content recommendations from a number of organizations that have developed the subject in recent years, from which the minimum required contents of the policy document were extracted. The stakeholders responsible for approving the proposal were then identified, a draft version of the document was produced, and their feedback was incorporated into the design of the final version. The second pillar is the technological infrastructure (IT) that will support data management. Requirements were gathered in order to benchmark existing repositories and software specialized in building scientific repositories; from this phase, the DSpace platform was selected as the best solution for designing and building the center's proposed data repository. The third and final pillar concerns the support needed to carry out data management at the center, for which guides, documents, and document templates were designed.
In addition, the main processes derived from the designed policies were diagrammed. A validation of the resulting architecture was then carried out with respect to the technical terms and requirements needed to perform the intended data management. It concludes that the proposal will enable scientific data management at the Web Intelligence Centre and bring benefits which, although not directly quantifiable against the costs, will be felt in both the short and long term, in the center's internal operations and in its future research, helping the center meet its objectives. Proposals for improving the architecture, derived from the validation, were also presented. Finally, a change management proposal was made so that the architecture can be implemented correctly and ultimately generate the intended value.
72

A framework for adoption of data warehouse in a South African government department.

Kademeteme, Edzai. January 2015 (has links)
M. Tech. Business Information Systems / Data underpins the smooth operation of an organization's day-to-day business processes and its strategic decision-making. With the increasing automation of an organization's activities, large volumes of data are generated faster than they can be consumed and digested. The Department of Rural Development and Land Reform faces exactly this growth in data volumes. The department could benefit from data warehousing, in which data is stored within a single repository. However, no framework exists to inform the adoption of data warehousing by a South African government department. This research study therefore set out to design a framework for the adoption of data warehousing by a South African government department.
73

Texas Principals’ Data Use: Its Relationship to Leadership Style and Student Achievement

Bostic, Robert E. 05 1900 (has links)
This study applies an empirical research method to determine whether Texas public school principals' leadership styles, coupled with their use of real-time data in a data warehouse, influenced their leadership ability as measured by student achievement. In today's data-rich environments, which require campuses and districts to make data-driven decisions, principals must organize and categorize data to help their school boards, campuses, and citizenry make informed decisions. Most school principals in Texas have access to data in multiple forms, including national and state resources and a multitude of other data reports. A random sample of principals was selected to take the Multifactor Leadership Questionnaire (MLQ5X) and the Principals Data Use Survey. The MLQ5X measured principals' leadership styles as transformational, transactional, or passive-avoidant. The Principals Data Use Survey measured how principals use data to inform campus decisions on student achievement, shape the campus vision, and design professional development. Data obtained from the surveys were correlated to determine the relationship among principals' data warehouse use, their leadership styles, and student achievement as measured by the Texas Assessment of Knowledge and Skills. The results yielded significant relationships between student achievement, principals' leadership styles, and principals' data use with a data warehouse. Achievement scores were highly correlated across the participating campuses, with limited differences between campuses with and without data warehouses.
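The study's core statistical step, correlating survey-based leadership scores with achievement scores, can be illustrated with a small Pearson correlation sketch. The scores below are invented for illustration and are not the study's data.

```python
# Pearson correlation between two score lists, in plain Python.
# The data here is hypothetical, standing in for MLQ5X leadership-style
# means and campus achievement scores.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

transformational = [3.1, 3.8, 2.5, 3.9, 2.9]   # hypothetical MLQ5X means
achievement      = [72,  85,  64,  88,  70]    # hypothetical campus scores
print(round(pearson_r(transformational, achievement), 3))   # 0.996
```

A coefficient this close to 1 would indicate a strong positive association between the two measures, which is the kind of relationship the study tests for.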
74

Factors influencing customer profitability: an empirical examination in noncontractual settings

Hanuska, Norbert January 2014 (has links)
Understanding how to manage relationships with customers has become an important topic for both academics and practitioners in recent years. The effectiveness of a business can be greatly improved by identifying the drivers of the most profitable customers and using them to target the right customers. In this study we identify exchange characteristics, such as the amount spent per purchase, the duration of the customer's relationship with the firm, and the extent of cross-buying, together with demographic characteristics such as age and gender, as important drivers of the most profitable customers. The results have important implications for academics seeking to understand what drives the most profitable customers in noncontractual settings, as well as for practitioners designing more effective marketing strategies. Moreover, the results of discovering knowledge about customers with different data mining techniques also help researchers assess the feasibility of these methods.
75

Efficient Incremental View Maintenance for Data Warehousing

Chen, Songting 20 December 2005 (has links)
Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support applications. Since most OLAP queries are complex and often run over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue in utilizing materialized views is maintaining view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation considers broader classes of views than previous work. First, we study views with complex aggregate functions such as variance and regression, statistical functions of great practical importance. We propose a workarea function model and design a generic framework that tackles both incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2; an extensive performance study shows significant gains from our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators, which are widely used in OLAP applications and for querying sparse datasets. We demonstrate that efficiently maintaining views with PIVOT and UNPIVOT requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting and propagation rules for these operators, and design a novel view maintenance framework that applies them to obtain an efficient maintenance plan. Extensive performance evaluations confirm the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomy and dynamicity, concurrent updates may occur during view maintenance. We propose a generic concurrency control framework that resolves such maintenance anomalies.
This solution extends previous work in that it resolves the anomalies under both source data and schema changes, thus achieving full source autonomy. We have implemented this technique in a data warehouse prototype developed at WPI. An extensive performance study shows that our techniques add little overhead to existing concurrent data update processing while enabling this new functionality.
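The idea behind incrementally maintaining views with non-distributive aggregates such as variance can be sketched generically: keep a small workarea of auxiliary aggregates (count, sum, sum of squares) so that a source change updates the view without rescanning the base data. This is an illustrative sketch of the general technique, not the dissertation's actual workarea function model or its DB2 implementation; all names are invented.

```python
# A variance "view" maintained incrementally via auxiliary aggregates.
# An insert or delete at the source touches only the workarea (n, s, ss);
# the variance is derived on demand, never recomputed from the base table.

class VarianceView:
    def __init__(self):
        self.n = 0          # COUNT(x)
        self.s = 0.0        # SUM(x)
        self.ss = 0.0       # SUM(x*x)

    def insert(self, x):    # propagate a source insertion
        self.n += 1
        self.s += x
        self.ss += x * x

    def delete(self, x):    # propagate a source deletion
        self.n -= 1
        self.s -= x
        self.ss -= x * x

    def variance(self):     # population variance: E[x^2] - E[x]^2
        if self.n == 0:
            return None
        mean = self.s / self.n
        return self.ss / self.n - mean * mean

view = VarianceView()
for x in [2.0, 4.0, 6.0]:
    view.insert(x)
print(view.variance())      # 8/3 ≈ 2.667 for [2, 4, 6]
view.delete(6.0)
print(view.variance())      # 1.0 for [2, 4]
```

Distributive aggregates like SUM need only themselves in the workarea; the point of the example is that variance becomes maintainable once the right auxiliary aggregates are kept alongside it.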
76

AGILE BUSINESS INTELLIGENCE DEVELOPMENT CORE PRACTICES

Devarapalli, Surendra January 2013 (has links)
Today we are in an age of information. Systems that effectively use the vast amounts of data available all over the world and provide meaningful insight (i.e. BI systems) to the people who need it are of critical importance. Developing such systems has always been a challenge because development is outpaced by change. Agile methodologies were devised to cope with constant change during system development, so practitioners and researchers are showing keen interest in applying agile strategies to BI projects. This research aims to find out how well agile strategies suit the development of BI projects. Because BI is organization-centric, the research considers a case study in a very large organization; by assessing the empirical results collected from interviews, the author attempts to generalize the findings. The results give insight into the best practices to consider when adopting agile strategies, as well as the practical problems that may be encountered along the way. The findings have implications for both business and technical managers who want to consider agile strategies for BI/DW development projects. / Program: Masterutbildning i Informatik
77

Design Space Exploration of Accelerators for Warehouse Scale Computing

Lottarini, Andrea January 2019 (has links)
With Moore's law grinding to a halt, accelerators are one of the ways new silicon can improve performance, and they are already a key component in modern datacenters. Accelerators are integrated circuits that implement parts of an application with the objective of higher energy efficiency than execution on a standard general-purpose CPU. Many accelerator designs can target any particular workload, generally spanning a wide range of performance and costs such as area or power. Exploring these design choices, called Design Space Exploration (DSE), is a crucial step in finding the most efficient accelerator design: the one that produces the largest reduction in total cost of ownership. This work aims to improve the design space exploration phase for accelerators and to avoid pitfalls in the process. This dissertation supports the thesis that early design choices, including the level of specialization, are critical for accelerator development and therefore require benchmarks reflective of production workloads. We present three studies that support this thesis. First, we show how to benchmark datacenter applications by creating a benchmark for large video sharing infrastructures. Then, we present two studies focused on accelerators for analytical query processing: the first analyzes the impact of Network on Chip specialization, while the second analyzes the impact of the level of specialization. The first part of this dissertation introduces vbench: a video transcoding benchmark tailored to the growing video-as-a-service market. Video transcoding is not accurately represented in current computer architecture benchmarks such as SPEC or PARSEC. Despite posing a big computational burden for cloud video providers such as YouTube and Facebook, it is not included in cloud benchmarks such as CloudSuite.
Using vbench, we found that the microarchitectural profile of video transcoding is highly dependent on the input video, that SIMD extensions provide limited benefits, and that commercial hardware transcoders impose tradeoffs that are not ideal for cloud video providers. Our benchmark should spur architectural innovations for this critical workload. This work shows how to benchmark a real-world warehouse-scale application and the possible pitfalls of a mischaracterization. When considering accelerators for the different, but no less important, application of analytical query processing, design space exploration plays a critical role. We analyzed the Q100, a class of accelerators for this application domain, using TPC-H as the reference benchmark. We found that not only must the hardware computational blocks be tailored to the requirements of the application, but the Network on Chip (NoC) can be specialized as well. We developed an algorithm that produces more effective Q100 designs by tailoring the NoC to the communication requirements of the system; it yields designs that are Pareto optimal with respect to standard NoC topologies. This shows that NoC specialization is highly effective for accelerators and should be an integral part of design space exploration for large accelerator designs. The third part of this dissertation analyzes the impact of the level of specialization, e.g. an ASIC versus a Coarse Grain Reconfigurable Architecture (CGRA) implementation, on accelerator performance. We developed a CGRA architecture capable of executing SQL query plans and compared it against the Q100, an ASIC that targets the same class of workloads. Despite being less specialized, this programmable architecture shows performance comparable to the Q100 given the same area and power budget. Resource usage explains this counterintuitive result: a well-programmed, homogeneous array of resources can harness silicon for the workload at hand more effectively. This suggests that a balanced accelerator research portfolio must include alternative programmable architectures, together with their software stacks.
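The Pareto-optimality criterion that recurs throughout this kind of design space exploration can be made concrete with a small sketch. The design points below (performance, area, power) are invented for illustration and are not Q100 measurements.

```python
# Pareto-front filter for design space exploration: keep a design only if
# no other design is at least as good in every metric and strictly better
# in at least one. Here higher performance is better; lower area and power
# are better. The design points are made up.

def dominates(a, b):
    """True if design a dominates design b. Each design is a tuple
    (performance, area, power)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(designs):
    return [d for d in designs
            if not any(dominates(e, d) for e in designs)]

designs = [
    (10, 4.0, 2.0),   # fast but big and hot
    (6,  2.0, 1.0),   # balanced
    (5,  2.5, 1.5),   # dominated by the balanced design
    (3,  1.0, 0.5),   # small and efficient
]
print(pareto_front(designs))   # [(10, 4.0, 2.0), (6, 2.0, 1.0), (3, 1.0, 0.5)]
```

A DSE algorithm like the one described for the Q100 NoC would generate many candidate designs and report only those that survive such a dominance filter.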
78

NoSQL: an analysis of data modeling and consistency in the Big Data era

Rodrigues, Wagner Braz 19 October 2017 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The new storage models known as NoSQL (Not Only SQL) arose to address current data challenges, defined by the properties of volume, velocity, and variety (the 3 V's) established in the Big Data concept. These models build on distributed computing and horizontal scalability, which make it possible to process the large amounts of data the 3 V's imply. This thesis uses the relational model as its theoretical framework, presenting its strengths and shortcomings. The relational model made it possible to persist data structures in non-volatile secondary memory; its modeling discipline establishes rules for building a sound data model, using concepts from formal logic and a representation accessible to human interpretation. The properties defined by the ACID transactional model (Atomicity, Consistency, Isolation, Durability), implemented in relational DBMSs, guarantee that transacted data is persisted consistently. However, the relational model distanced the transient structures held in primary memory and used at run time by software applications from those persisted in secondary memory, an effect known as impedance mismatch. The models in the various NoSQL categories bring back transient structures previously confined to primary memory, at the cost of the strong structuring the relational model provides. Distributed computing allows transactions and storage to be spread across several computers, known as nodes, grouped in clusters; this increases availability and reduces the likelihood of system failure, but it introduces data inconsistency, in line with the properties defined by the CAP theorem (FOX; BREWER, 1999). The study was carried out as a bibliographic review: it first analyzes the needs that led to the creation of the relational model, then establishes the state of the art of the theories and techniques surrounding NoSQL and distributed data processing, along with the different categories NoSQL comprises. One representative tool from each NoSQL category was selected and analyzed to understand its structures, metadata, and operations. Besides establishing the state of the art on NoSQL, the thesis shows how current hardware advances make the reconciliation of transient and persistent data structures possible, and discusses options for handling the consistency effects outlined by the CAP theorem.
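The availability/consistency tension the abstract attributes to the CAP theorem can be illustrated with a toy replication sketch (not from the thesis): when a partition keeps an update from reaching every replica, a read served by a lagging node is still available but returns stale data.

```python
# Toy asynchronous replication: a write reaches only the nodes on the
# writer's side of a network partition. Reads stay available everywhere,
# but a lagging replica answers with the old value. All names are invented.

class Replica:
    def __init__(self):
        self.value = None

class Cluster:
    def __init__(self, n):
        self.nodes = [Replica() for _ in range(n)]

    def write(self, value, reachable):
        # During a partition, only the reachable nodes see the update.
        for i in reachable:
            self.nodes[i].value = value

    def read(self, node):
        # Always answers (available), even if the replica is behind.
        return self.nodes[node].value

cluster = Cluster(3)
cluster.write("v1", reachable=[0, 1, 2])   # healthy: all replicas agree
cluster.write("v2", reachable=[0, 1])      # partition: node 2 misses the write
print(cluster.read(0))   # v2 (fresh)
print(cluster.read(2))   # v1 (stale, but still served)
```

Refusing to answer at node 2 until the partition heals would restore consistency at the cost of availability, which is exactly the trade-off the theorem formalizes.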
79

Automating the multidimensional design of data warehouses

Romero Moral, Oscar 09 February 2010 (has links)
Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user necessities. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end user to discover unknown additional analysis capabilities. Currently, several methods for supporting the data warehouse modeling task have been proposed. However, they suffer from significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider the data sources to contain alternative interesting evidence for analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources.
As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, automating the design task is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the chosen method, as well as the need to analyze the data sources, a tedious and time-consuming task that can be unfeasible when working with large databases. Current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook process automation, since they tend to work with requirements at a high level of abstraction. The same situation repeats itself in the data-driven and requirement-driven stages of current hybrid approaches, which suffer from the same drawbacks as pure data-driven or requirement-driven approaches. In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, they start from opposite initial assumptions, but both treat the end-user requirements and the data sources as first-class citizens. 1. MDBE follows a classical approach, in which the end-user requirements are well known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to the requirements; consequently, it can handle semantically poorer data sources. In other words, given high-quality end-user requirements, we can guide the process from the knowledge they contain and overcome semantically poor data sources. 2. AMDO, as its counterpart, assumes a scenario in which the available data sources are semantically rich.
Thus, the proposed approach is guided by a thorough analysis of the data sources, which is then adapted to shape the output according to the end-user requirements. In this context, high-quality data sources compensate for the lack of expressive end-user requirements. Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs available in each scenario, which approach to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well known and in one where they are not evident or cannot be easily elicited (e.g., when users are unaware of the analysis capabilities of their own sources). Interestingly, the need for requirements up front is softened by the availability of semantically rich data sources; lacking those, requirements gain relevance for extracting the multidimensional knowledge from the sources. We thus provide two approaches whose combination is exhaustive with regard to the scenarios discussed in the literature.
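One heuristic commonly used on the requirement-driven side of multidimensional design, in the hybrid spirit described here, can be sketched as follows: attributes that are aggregated in an analytical query become candidate measures, while grouped attributes become candidate dimensions. This is a generic illustration, not MDBE's or AMDO's actual algorithm; the query and all names are invented.

```python
# Classify attributes of an analytical SQL query into candidate measures
# (aggregated columns) and candidate dimensions (GROUP BY columns).
# A naive regex sketch; real tools parse the query properly.

import re

def classify(query):
    measures = set(re.findall(r"(?:SUM|AVG|COUNT|MIN|MAX)\((\w+)\)",
                              query, re.I))
    group = re.search(r"GROUP BY\s+(.+)", query, re.I)
    dimensions = set()
    if group:
        dimensions = {col.strip() for col in group.group(1).split(",")}
    return measures, dimensions

query = ("SELECT region, year, SUM(amount), AVG(discount) "
         "FROM sales GROUP BY region, year")
measures, dimensions = classify(query)
print(sorted(measures))     # ['amount', 'discount']
print(sorted(dimensions))   # ['region', 'year']
```

A data-driven counterpart would instead propose measures and dimensions from the source schema itself (numeric columns, functional dependencies), which is where the two families of methods diverge.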
80

Wrapping XML-Sources to Support Update Awareness

Thuresson, Marcus January 2000 (has links)
Data warehousing is a generally accepted method of providing corporate decision support. Today, the majority of information in these warehouses originates from sources within a company, although changes often occur from the outside. Companies need to look outside their enterprises for valuable information, increasing their knowledge of customers, suppliers, competitors etc.

The largest and most frequently accessed information source today is the Web, which holds more and more useful business information. Today, the Web primarily relies on HTML, making mechanical extraction of information a difficult task. In the near future, XML is expected to replace HTML as the language of the Web, bringing more structure and content focus.

One problem when considering XML-sources in a data warehouse context is their lack of update awareness capabilities, which restricts eligible data warehouse maintenance policies. In this work, we wrap XML-sources in order to provide update awareness capabilities.

We have implemented a wrapper prototype that provides update awareness capabilities for autonomous XML-sources, especially change awareness, change activeness, and delta awareness. The prototype wrapper complies with recommendations and working drafts proposed by the W3C, thereby being compliant with most off-the-shelf XML tools. In particular, change information produced by the wrapper is based on methods defined by the DOM, implying that any DOM-compliant software, including most off-the-shelf XML processing tools, can be used to incorporate identified changes in a source into an older version of it.

For the delta awareness capability we have investigated the possibility of using change detection algorithms proposed for semi-structured data. We have identified similarities and differences between XML and semi-structured data, which affect delta awareness for XML-sources. As a result of this effort, we propose an algorithm for change detection in XML-sources. We also propose matching criteria for XML-documents, to which the documents have to conform to be subject to change awareness extension.
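A minimal sketch of the delta-awareness idea, comparing two versions of an XML document and classifying elements as inserted, deleted, or updated, might look as follows. It matches elements naively by tag path, whereas the thesis's point is precisely that real XML change detection needs proper matching criteria; all names and documents here are illustrative.

```python
# Naive XML delta detection: snapshot each version as a map from element
# path to text content, then diff the two maps. First occurrence of a
# path wins, so repeated sibling tags are not distinguished (a real
# matcher would need the matching criteria the thesis proposes).

import xml.etree.ElementTree as ET

def snapshot(xml_text):
    root = ET.fromstring(xml_text)
    items = {}
    def walk(el, path):
        p = f"{path}/{el.tag}"
        items.setdefault(p, (el.text or "").strip())
        for child in el:
            walk(child, p)
    walk(root, "")
    return items

def delta(old_xml, new_xml):
    old, new = snapshot(old_xml), snapshot(new_xml)
    return {
        "inserted": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(p for p in old.keys() & new.keys()
                          if old[p] != new[p]),
    }

v1 = "<catalog><item><price>10</price></item></catalog>"
v2 = "<catalog><item><price>12</price><stock>3</stock></item></catalog>"
print(delta(v1, v2))
# {'inserted': ['/catalog/item/stock'], 'deleted': [],
#  'updated': ['/catalog/item/price']}
```

A wrapper exposing such a delta to the warehouse is what turns a passive XML source into one that supports incremental maintenance policies.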
