31 |
Adaptive website recommendations with AWESOME. Thor, Andreas; Golovin, Nick; Rahm, Erhard. 16 October 2018
Recommendations are crucial for the success of large websites. While there are many ways to determine recommendations, the relative quality of these recommenders depends on many factors and is largely unknown. We present the architecture and implementation of AWESOME (Adaptive website recommendations), a data warehouse-based recommendation system. It allows the coordinated use of a large number of recommenders to automatically generate website recommendations. Recommendations are dynamically selected by efficient rule-based approaches utilizing continuously measured user feedback on presented recommendations. AWESOME supports a completely automatic generation and optimization of selection rules to minimize website administration overhead and quickly adapt to changing situations. We propose a classification of recommenders and use AWESOME to comparatively evaluate the relative quality of several recommenders for a sample website. Furthermore, we propose and evaluate several rule-based schemes for dynamically selecting the most promising recommendations. In particular, we investigate two-step selection approaches that first determine the most promising recommenders and then apply their recommendations for the current situation. We also evaluate one-step schemes that try to directly determine the most promising recommendations.
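As a rough illustration of the two-step selection scheme described in this abstract, the sketch below picks the most promising recommender per context from measured acceptance feedback, then serves that recommender's suggestions. All names, the feedback model, and the acceptance metric are illustrative assumptions, not taken from AWESOME itself.

```python
from collections import defaultdict

class TwoStepSelector:
    """Sketch of a two-step selection scheme: step 1 selects the most
    promising recommender for the current context based on measured user
    feedback; step 2 applies that recommender's recommendations."""

    def __init__(self, recommenders):
        # recommenders: name -> function(context) -> ranked list of items
        self.recommenders = recommenders
        self.shown = defaultdict(int)     # (context, name) -> presentations
        self.accepted = defaultdict(int)  # (context, name) -> acceptances

    def acceptance(self, context, name):
        shown = self.shown[(context, name)]
        return self.accepted[(context, name)] / shown if shown else 0.0

    def feedback(self, context, name, clicked):
        # continuously measured feedback on presented recommendations
        self.shown[(context, name)] += 1
        if clicked:
            self.accepted[(context, name)] += 1

    def recommend(self, context):
        # step 1: pick the recommender; step 2: apply it
        best = max(self.recommenders, key=lambda n: self.acceptance(context, n))
        return best, self.recommenders[best](context)
```

A one-step scheme would instead rank the union of all candidate recommendations directly by per-item feedback, skipping the recommender-selection step.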
|
32 |
Querying the Web of Data guaranteeing valid answers with respect to given criteria. Nguyen, Thanh Binh. 03 December 2018
The term Linked Open Data (LOD) was first introduced by Tim Berners-Lee in 2006. Since then, LOD has evolved impressively, with thousands of datasets now available on the Web of Data, which has raised a number of challenges for the research community in retrieving and processing linked data. In this thesis, we focus on the problem of the quality of data retrieved from various sources of the LOD, and we propose a context-driven querying system that guarantees the quality of answers with respect to a quality context defined by the user. We define a fragment of constraints and propose two approaches, a naive one and a rewriting one, which filter valid answers dynamically at query time instead of validating them at the data-source level. The naive approach performs the validation process by generating and evaluating sub-queries for each candidate answer w.r.t. each constraint. The rewriting approach, in contrast, uses constraints as rewriting rules to reformulate the query into a set of auxiliary queries such that the answers to the rewritten queries are not only answers to the original query but also valid w.r.t. all integrated constraints. A proof of the correctness and completeness of our rewriting system is presented after formalizing the notion of a valid answer w.r.t. a context. Both approaches have been evaluated and have shown the feasibility of our system. This is our main contribution: we extend the set of well-known query-rewriting systems (Chase, Chase & Backchase, PerfectRef, XRewrite, etc.) with a new, effective solution for the new purpose of filtering query results based on constraints in a user context. Moreover, we generalize the triggering condition of constraints compared with previous work by using the notion of one-way MGU.
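The naive approach described in this abstract can be pictured on a toy triple store: for every candidate answer, one sub-query is generated and evaluated per constraint, and only candidates passing all constraints survive. The data, predicates, and constraint shape below are invented for illustration, not taken from the thesis.

```python
def query(data, pred, obj=None):
    """Evaluate one triple pattern (?s, pred, obj) over a set of
    (subject, predicate, object) triples, returning matching subjects."""
    return {s for (s, p, o) in data if p == pred and (obj is None or o == obj)}

def contextual_query(data, pred, constraints):
    """Miniature of the naive validation process: for each candidate
    answer, evaluate one sub-query per constraint and keep only the
    candidates that satisfy all of them."""
    valid = set()
    for s in query(data, pred):
        if all(s in query(data, c_pred, c_obj) for (c_pred, c_obj) in constraints):
            valid.add(s)
    return valid
```

The rewriting approach would instead fold the constraints into the query itself, so a single reformulated query returns only valid answers.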
|
33 |
Automated Extraction of Data from Insurance Websites. Hodzic, Amar. January 2022
Websites have become a critical source of information for many organizations in today's digital era. However, extracting and organizing semi-structured data from web pages across multiple websites poses challenges, especially when a high level of automation is desired while maintaining generality. A natural progression in the quest for automation is to extend web data extraction methods from handling a single website to handling multiple ones, usually within the same domain. Although such websites share the same domain, the structure of their data can vary greatly. A key question becomes how general such a system can be while encompassing a large number of websites and maintaining adequate accuracy. This thesis examined the efficiency of automated web data extraction on multiple Swedish insurance company websites. Previous work showed that good results can be achieved on a known English data set containing web pages from a number of domains. The state-of-the-art model MarkupLM was chosen and trained with supervised learning, starting from two pre-trained models, a Swedish and an English one, on a labeled training set of car insurance web data; the resulting models were then evaluated zero-shot on websites not present in the training data. The results show that such a model can achieve good accuracy at domain scale with Swedish as the content language, despite a relatively small data set, by leveraging pre-trained models.
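Markup-aware extraction models of the kind discussed here consume text nodes paired with their position in the DOM rather than raw strings. The toy preprocessor below, built only on the standard library, pairs each text node with an XPath-like address, the sort of input a node tagger could then label (the tag names and the insurance example are assumptions for illustration, not the thesis's pipeline).

```python
from html.parser import HTMLParser

class XPathExtractor(HTMLParser):
    """Toy sketch of the preprocessing behind markup-aware extraction:
    collect (xpath, text) pairs so a downstream tagger can classify
    DOM nodes (e.g. as PRICE or COVERAGE) instead of plain strings."""
    VOID = {"br", "img", "meta", "link", "input", "hr"}

    def __init__(self):
        super().__init__()
        self.path = []            # open elements as (tag, sibling index)
        self.child_counts = [{}]  # per-open-element counters of child tags
        self.nodes = []           # collected (xpath, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        self.path.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        if self.path and self.path[-1][0] == tag:
            self.path.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            xpath = "/" + "/".join(f"{t}[{i}]" for t, i in self.path)
            self.nodes.append((xpath, text))
```

Because the XPath carries the structural context, two visually different insurance pages can still be mapped into the same node-labeling problem.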
|
34 |
Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach. Blázquez Soriano, María Desamparados. 15 January 2020
In the Digital Era, the increasing use of the Internet and digital devices is completely transforming the way of interacting in the economic and social framework. Myriad individuals, companies and public organizations use the Internet for their daily activities, generating a stream of fresh data ("Big Data") principally accessible through the World Wide Web (WWW), which has become the largest repository of information in the world. These digital footprints can be tracked and, if properly processed and analyzed, could help to monitor in real time a wide range of economic variables.
In this context, the main goal of this PhD thesis is to generate economic indicators, based on web data, which are able to provide regular, short-term predictions ("nowcasting") about some business activities that are basic for the growth and development of an economy. Concretely, three web-based economic indicators have been designed and evaluated: first, an indicator of firms' export orientation, which is based on a model that predicts if a firm is an exporter; second, an indicator of firms' engagement in e-commerce, which is based on a model that predicts if a firm offers e-commerce facilities in its website; and third, an indicator of firms' survival, which is based on two models that indicate the probability of survival of a firm and its hazard rate. To build these indicators, a variety of data from corporate websites have been retrieved manually and automatically, and subsequently have been processed and analyzed with Big Data analysis techniques.
Results show that the selected web data are highly related to the economic variables under study, and the web-based indicators designed in this thesis are capturing to a great extent their real values, thus being valid for their use by the academia, firms and policy-makers. Additionally, the digital and online nature of web-based indicators makes it possible to provide timely, inexpensive predictions about the economy. This way, they are advantageous with respect to traditional indicators.
This PhD thesis has contributed to generating knowledge about the viability of producing economic indicators with data coming from corporate websites. The indicators that have been designed are expected to contribute to the modernization of official statistics and to help policy-makers and business managers make earlier, more informed decisions. / Blázquez Soriano, MD. (2019). Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/116836
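The indicators described above rest on binary classifiers (is this firm an exporter? does it sell online?) fed with features mined from corporate websites. The plain-Python logistic regression below is only a stand-in for such a model; the two features used (site has an English version, site mentions international shipping) are invented for illustration and are not the thesis's actual feature set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Train a tiny logistic-regression model with SGD, standing in for
    the web-based classifiers (e.g. exporter vs. non-exporter)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi                         # gradient of the log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Classify a firm's website feature vector as exporter (True) or not."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5
```

The appeal of the approach is that, once trained, such a model can be re-run over freshly crawled websites at any time, which is what makes regular nowcasting cheap.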
|
35 |
Stream-based statistical machine translation. Levenberg, Abby D. January 2011
We investigate a new approach to SMT system training within the streaming model of computation. We develop and test incrementally retrainable models which, given an incoming stream of new data, can efficiently incorporate the stream data online. A naive approach would use an unbounded amount of space; instead, our online SMT system can incorporate information from unbounded incoming streams while maintaining constant space and time. Crucially, we are able to match (or even exceed) the translation performance of comparable systems which are batch retrained and use unbounded space. Our approach is particularly suited to situations where there are arbitrarily large amounts of new training material that we wish to incorporate efficiently and in small space. The novel contributions of this thesis are: 1. An online, randomised language model that can model unbounded input streams in constant space and time. 2. An incrementally retrainable translation model for both phrase-based and grammar-based systems. The model presented is efficient enough to incorporate novel parallel text at the single-sentence level. 3. Strategies for updating our stream-based language model and translation model which demonstrate how such components can be successfully used in a streaming translation setting, both within a single streaming environment and in the novel situation of having to translate multiple streams. 4. A demonstration that recent data from the stream is beneficial to translation performance. Our stream-based SMT system is efficient for tackling massive volumes of new training data and offers up new ways of thinking about translating web data and dealing with other natural language streams.
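A classic way to keep counts over an unbounded stream in constant space, in the spirit of the randomised language model in contribution 1, is a count-min sketch: fixed-size counter tables whose estimates may over-count but never under-count. This is a generic sketch of that idea, not the thesis's actual data structure.

```python
import hashlib

class CountMinSketch:
    """Constant-space approximate counter: the kind of randomised
    structure that lets a stream-based LM accumulate n-gram counts from
    an unbounded stream in fixed memory. Estimates never under-count."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # one independent hash per row, derived via a per-row salt
        h = hashlib.blake2b(item.encode(), salt=bytes([row] * 8)).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def add(self, ngram, count=1):
        for row in range(self.depth):
            self.tables[row][self._hash(ngram, row)] += count

    def count(self, ngram):
        # minimum across rows limits the damage of hash collisions
        return min(self.tables[row][self._hash(ngram, row)]
                   for row in range(self.depth))
```

Memory use is fixed at width × depth counters no matter how much stream data flows through, which is precisely the constant-space property the abstract emphasises.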
|
36 |
Scalable view-based techniques for web data: algorithms and systems. Katsifodimos, Asterios. 03 July 2013
XML was recommended by the W3C in 1998 as a markup language for device- and system-independent representation of information, and is nowadays used as a data model for storing and querying large volumes of data in database systems. In spite of significant research and systems development, processing very large amounts of XML data still raises many performance problems. Materialized views have long been used in databases to speed up queries: they can be seen as precomputed query results that can be re-used to evaluate (part of) another query, and they have been a topic of intensive research, in particular in the context of relational data warehousing. This thesis investigates the applicability of materialized view techniques to optimizing the performance of Web data management tools, in particular in distributed settings, considering XML data and queries. We make three contributions. We first consider the problem of choosing the best views to materialize within a given space budget in order to improve the performance of a query workload. Our work is the first to address the view selection problem for a rich subset of XQuery. The challenges we face stem from the expressive power and features of both the query and view languages, and from the size of the search space of candidate views to materialize. While the general problem has prohibitive complexity, we propose and study a heuristic algorithm and demonstrate its superior performance compared to the state of the art. Second, we consider the management of large XML corpora in peer-to-peer networks based on distributed hash tables (DHTs). We consider a platform leveraging distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. This thesis has contributed important scalability-oriented optimizations, as well as a comprehensive set of experiments deployed in a country-wide WAN. These experiments exceed similar competitor systems by orders of magnitude in terms of data volumes and data dissemination throughput, and are thus the most advanced in understanding the performance behavior of DHT-based XML content management in real settings. Finally, we present a novel approach for scalable content-based publish/subscribe (pub/sub, in short) in the presence of constraints on the available computational resources of data publishers. We achieve scalability by off-loading subscriptions from the publisher and leveraging view-based query rewriting to feed these subscriptions from the data accumulated in others. Our main contribution is a novel algorithm for organizing subscriptions in a multi-level dissemination network in order to serve large numbers of subscriptions, respect capacity constraints, and minimize latency. The efficiency and effectiveness of our algorithm are confirmed through extensive experiments and a large deployment in a WAN.
|
37 |
Exploring web data on mobile devices: an approach based on information visualization. Duarte, Felipe Simões Lage Gomes. 12 February 2015
With the development of computers and the increasing popularity of the Internet, our society has entered the information age. This era is marked by the way we produce and deal with information. Every day, thousands of gigabytes are stored, but their value is reduced if the data cannot be transformed into knowledge. Concomitantly, computing is moving towards the miniaturization and affordability of mobile devices, which are changing users' behavior, turning them from passive readers into content generators. In this context, in this master's thesis we propose and develop a data visualization technique, called NMap, and a web data visualization tool for mobile devices, called SPCloud. NMap uses all available visual space to transmit information while preserving the distance-similarity metaphor between elements. Comparative evaluations against state-of-the-art techniques showed that NMap produces better neighborhood preservation with a significant improvement in processing time. These results place NMap as a major space-filling technique, establishing a new state of the art. SPCloud, a tool which uses NMap to present news available on the web, was developed taking into account the inherent characteristics of mobile devices. Informal user tests revealed that SPCloud performs well at summarizing large amounts of news in a small visual space.
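The flavour of a space-filling, distance-similarity-preserving layout can be sketched with a recursive bisection: items carry 2D positions from a similarity projection, the rectangle is split alternately along its longer side at the weight-proportional median, and nearby points end up in adjacent cells. This is only a bare-bones illustration in the spirit of NMap; the published technique includes scaling corrections omitted here.

```python
def nmap_layout(items, rect):
    """Recursive-bisection layout sketch. items: (name, x, y, weight)
    with (x, y) from a similarity projection; rect: (x0, y0, x1, y1).
    Returns name -> cell rectangle, covering `rect` without overlap."""
    x0, y0, x1, y1 = rect
    if len(items) == 1:
        return {items[0][0]: rect}
    horizontal = (x1 - x0) >= (y1 - y0)     # cut across the longer side
    axis = 1 if horizontal else 2            # sort by x or by y
    items = sorted(items, key=lambda it: it[axis])
    half = len(items) // 2
    left, right = items[:half], items[half:]
    wl = sum(it[3] for it in left)
    wr = sum(it[3] for it in right)
    frac = wl / (wl + wr)                    # cell area follows weight
    if horizontal:
        xm = x0 + frac * (x1 - x0)
        out = nmap_layout(left, (x0, y0, xm, y1))
        out.update(nmap_layout(right, (xm, y0, x1, y1)))
    else:
        ym = y0 + frac * (y1 - y0)
        out = nmap_layout(left, (x0, y0, x1, ym))
        out.update(nmap_layout(right, (x0, ym, x1, y1)))
    return out
```

Because every split consumes the full rectangle, no visual space is wasted, which is the defining property of space-filling techniques.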
|
38 |
View-based techniques for the efficient management of web data. Karanasos, Konstantinos. 29 June 2012
Data is being published in digital formats at very high rates nowadays. A large share of this data has complex structure, typically organized as trees (Web documents such as HTML and XML being the most representative) or graphs (in particular, graph-structured Semantic Web databases, expressed in RDF). There is great interest in exploiting such complex data, whether in an Open Data access model or within companies owning it, and efficiently doing so for large data volumes remains challenging. Materialized views have long been used to obtain significant performance improvements when processing queries. The principle is that a view stores pre-computed results that can be used to evaluate (possibly part of) a query. Adapting materialized view techniques to the Web data setting we consider is particularly challenging due to the structural and semantic complexity of the data. This thesis tackles two problems in the broad context of materialized view-based management of Web data. First, we focus on the problem of view selection for RDF query workloads. We present a novel algorithm, which, based on a query workload, proposes the most appropriate views to be materialized in the database, in order to minimize the combined cost of query evaluation, view maintenance and view storage. Although RDF query workloads typically feature many joins, hampering the view selection process, our algorithm scales to hundreds of queries, a number unattained by existing approaches. Furthermore, we propose new techniques to account for the implicit data that can be derived by the RDF Schemas and which further complicate the view selection process. The second contribution of our work concerns query rewriting based on materialized XML views. We start by identifying an expressive dialect of XQuery, corresponding to tree patterns with value joins, and study some important properties for these queries, such as containment and minimization. 
Based on these notions, we consider the problem of finding minimal equivalent rewritings of a query expressed in this dialect, using materialized views expressed in the same dialect, and provide a sound and complete algorithm for that purpose. Our work extends the state of the art by allowing each pattern node to return a set of attributes, supporting value joins in the patterns, and considering rewritings which combine many views. Finally, we show how our view-based query rewriting algorithm can be applied in a distributed setting, in order to efficiently disseminate corpora of XML documents carrying RDF annotations.
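Containment between tree patterns, a property this abstract singles out, is classically decided via a homomorphism test. The sketch below covers only child edges and label wildcards; the value joins, attribute returns, and descendant edges of the thesis's dialect are deliberately omitted, and the painting example is invented.

```python
def homomorphism(p1, p2):
    """Map pattern p1 into pattern p2, preserving labels and child edges.
    Patterns are (label, [children]); '*' is a label wildcard.
    If p1 maps into p2, every tree matching p2 also matches p1,
    i.e. p2 is contained in p1 (for this simple child-only dialect)."""
    label1, kids1 = p1
    label2, kids2 = p2
    if label1 != "*" and label1 != label2:
        return False
    # every child of p1 must embed into some child of p2
    return all(any(homomorphism(c1, c2) for c2 in kids2) for c1 in kids1)
```

In a view-based rewriting setting, such a test tells us when a materialized view is general enough that its stored results cover everything a query could ask for.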
|
39 |
Pattern-based data and application integration in service-oriented architectures. Kongdenfha, Woralak. Computer Science & Engineering, Faculty of Engineering, UNSW. January 2009
The success of Web services comes from the benefits they bring in reducing the cost and time needed to develop data and applications by reusing them, and in simplifying their integration through standardization. However, standardization in Web services does not remove the need for adapters, due to possible heterogeneity among service interface and protocol definitions. Moreover, current service APIs are targeted at professional programmers and are not accessible to a wider class of users who lack programming expertise but would nevertheless like to build their own integrated applications. In this dissertation, we propose methods and tools to support both service developers and non-expert users in their data and application integration tasks. To support service developers, we propose a framework that enables rapid development of Web service adapters. In particular, we investigate the problem of service adaptation at the business interface and protocol layers. Our study shows that many differences between business interfaces and protocols are recurring. We introduce mismatch patterns to capture these recurring differences and provide solutions to resolve them. We present the notion of adaptation aspects, based on the aspect-oriented programming paradigm, to enable rapid development and deployment of service adapters. We also present a comparative study of standalone versus aspect-oriented adapter development. The study shows that the aspect-oriented approach is preferable in many cases, especially when adapters need to access the internal state of services. The proposed approach is implemented in a prototype tool, which is used in a case study to illustrate how it simplifies adapter development. To support users without programming expertise, we propose a spreadsheet-based Web mashup development framework, which enables users to develop mashups in the popular spreadsheet environment. 
First, we provide a mechanism that makes structured data first-class values of spreadsheet cells. Second, we propose a new component model that can be used to develop fairly sophisticated mashups, involving joining data sources and keeping spreadsheet data up to date. Third, to simplify mashup development, we provide a collection of spreadsheet-based mashup patterns that capture common Web data access and spreadsheet presentation functionalities. Users can reuse and customize these patterns to build spreadsheet-based Web mashups instead of developing them from scratch. Fourth, we enable users to manipulate structured data presented on spreadsheets in a drag-and-drop fashion. Finally, we have developed and tested a prototype tool to demonstrate the utility of the proposed framework.
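As a loose illustration of the adaptation-aspect idea (the dissertation targets Web services and uses aspect-oriented weaving; this standalone Python sketch only mimics the pattern), an adapter can be woven around a service operation to resolve a recurring interface mismatch, such as a parameter-name difference between what clients send and what the provider expects. The decorator, service name, and mapping below are all hypothetical:

```python
import functools

def adapt_params(mapping):
    """Adaptation aspect (illustrative): rename keyword arguments so a
    client written against one interface can call a service operation
    that expects different parameter names."""
    def aspect(service_op):
        @functools.wraps(service_op)
        def adapter(**kwargs):
            # Resolve the parameter-name mismatch before delegating.
            renamed = {mapping.get(k, k): v for k, v in kwargs.items()}
            return service_op(**renamed)
        return adapter
    return aspect

# Hypothetical mismatch: the provider expects 'customer_id',
# while existing clients send 'clientId'.
@adapt_params({"clientId": "customer_id"})
def get_orders(customer_id):
    return f"orders for {customer_id}"

print(get_orders(clientId="42"))  # orders for 42
```

The adapter is kept separate from the service logic, which echoes the abstract's point that aspect-oriented adapters are attractive when adaptation concerns should not be tangled into the service implementation itself.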
|