Global ETD Search

1	Information Aggregation using the Cameleon# Web Wrapper Firat, Aykut, Madnick, Stuart, Yahaya, Nor Adnan, Kuan, Choo Wai, Bressan, Stéphane 29 July 2005 (has links) Cameleon# is a web data extraction and management tool that provides information aggregation with advanced capabilities that are useful for developing value-added applications and services for electronic business and electronic commerce. To illustrate its features, we use an airfare aggregation example that collects data from eight online sites, including Travelocity, Orbitz, and Expedia. This paper covers the integration of Cameleon# with commercial database management systems, such as MS SQL Server, and XML query languages, such as XQuery. Cameleon# web data extraction web data management
2	Distributed data management with access control : social Networks and Data of the Web Galland, Alban 28 September 2011 (has links) (PDF) The amount of information on the Web is growing very rapidly. Users as well as companies bring data to the network and are willing to share it with others. They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Managing all this information is rapidly moving beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. The WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines, in a formal setting, deductive rules as in Datalog with negation (to specify intensional data) and active rules as in Datalog:: (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions (in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem such as distribution, concurrency, and expressivity in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base, and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation for overcoming the problems of Web data management, in particular with respect to access control.
[INFO:INFO_OH] Computer Science/Other Distribution Access Control Social Network Web Data Management Distributed Datalog
3	Distributed data management with a declarative rule-based language webdamlog / Gestion des données distribuées avec le langage de règles Webdamlog Antoine, Emilien 05 December 2013 (has links) Our goal is to enable a Web user to organize the management of their distributed data in place, that is, without forcing them to centralize their data with a single host. Our system therefore differs from Facebook and other centralized systems, and offers an alternative that lets users launch their own peers on their own machines, managing their personal data locally and possibly collaborating with external Web services. In my thesis, I present Webdamlog, a language derived from datalog for managing distributed data and knowledge. The language extends datalog in several ways, chiefly with a new feature, delegation, which allows peers to exchange not only facts (the data) but also rules (the knowledge). I then conducted a user study to demonstrate the use of the language. Next, I describe the Webdamlog evaluation engine, which extends a distributed datalog evaluation engine named Bud by adding support for delegation and other innovations, such as the possibility of having variables for the names of peers and relations. I discuss new optimization techniques, notably ones based on the provenance of facts and rules. I present experiments showing that the cost of supporting Webdamlog's new features remains reasonable even for large volumes of data. Finally, I present the implementation of a Webdamlog peer that provides the environment for the engine, in particular adapters that allow Webdamlog peers to exchange data with other peers on the Internet. To illustrate the use of these peers, I implemented a photo-sharing application for a social network in Webdamlog.
/ Our goal is to enable a Web user to easily specify distributed data management tasks in place, i.e. without centralizing the data to a single provider. Our system is therefore not a replacement for Facebook, or any centralized system, but an alternative that allows users to launch their own peers on their machines processing their own local personal data, and possibly collaborating with Web services. We introduce Webdamlog, a datalog-style language for managing distributed data and knowledge. The language extends datalog in a number of ways, notably with a novel feature, namely delegation, allowing peers to exchange not only facts but also rules. We present a user study that demonstrates the usability of the language. We describe a Webdamlog engine that extends a distributed datalog engine, namely Bud, with the support of delegation and of a number of other novelties of Webdamlog such as the possibility to have variables denoting peers or relations. We mention novel optimization techniques, notably one based on the provenance of facts and rules. We exhibit experiments that demonstrate that the rich features of Webdamlog can be supported at reasonable cost and that the engine scales to large volumes of data. Finally, we discuss the implementation of a Webdamlog peer system that provides an environment for the engine. In particular, a peer supports wrappers to exchange Webdamlog data with non-Webdamlog peers. We illustrate these peers by presenting a picture management application that we used for demonstration purposes. Distribution Datalog Knowledge Base Peer to Peer Web Data Management
4	Distributed data management with a declarative rule-based language webdamlog Antoine, Emilien 05 December 2013 (has links) (PDF) Our goal is to enable a Web user to easily specify distributed data management tasks in place, i.e. without centralizing the data to a single provider. Our system is therefore not a replacement for Facebook, or any centralized system, but an alternative that allows users to launch their own peers on their machines processing their own local personal data, and possibly collaborating with Web services. We introduce Webdamlog, a datalog-style language for managing distributed data and knowledge. The language extends datalog in a number of ways, notably with a novel feature, namely delegation, allowing peers to exchange not only facts but also rules. We present a user study that demonstrates the usability of the language. We describe a Webdamlog engine that extends a distributed datalog engine, namely Bud, with the support of delegation and of a number of other novelties of Webdamlog such as the possibility to have variables denoting peers or relations. We mention novel optimization techniques, notably one based on the provenance of facts and rules. We exhibit experiments that demonstrate that the rich features of Webdamlog can be supported at reasonable cost and that the engine scales to large volumes of data. Finally, we discuss the implementation of a Webdamlog peer system that provides an environment for the engine. In particular, a peer supports wrappers to exchange Webdamlog data with non-Webdamlog peers. We illustrate these peers by presenting a picture management application that we used for demonstration purposes. [INFO:INFO_OH] Computer Science/Other Distribution Datalog Knowledge Base Peer to Peer Web Data Management
5	Distributed data management with access control : social Networks and Data of the Web / Gestion de Données Distribuées avec Contrôle d’Accès : réseaux sociaux et données du Web Galland, Alban 28 September 2011 (has links) The mass of information available on the Web is growing rapidly, fed by the influx of data from users and companies. This data, which they wish to share in a controlled way over the network and which is spread across many different machines and systems, quickly becomes impossible to manage directly by human means. We introduce WebdamExchange, a new model of distributed knowledge bases, which includes statements about data, access control, and distribution. These statements can be exchanged with other peers, replicated, queried, and updated, while keeping track of their origin. The knowledge base also makes it possible to guide its own management automatically. WebdamExchange is based on WebdamLog, a new rule language for distributed data management, which formally combines the deductive rules of Datalog with negation and the active rules of Datalog::. WebdamLog emphasizes dynamicity and interactions, which are characteristic of Web 2.0. This model provides both an expressive language for specifying complex distributed systems and a formal framework for studying fundamental properties of distribution. We also present an implementation of our knowledge base. We believe that these contributions form a solid foundation for overcoming the problems of Web data management, in particular with regard to access control. / The amount of information on the Web is growing very rapidly. Users as well as companies bring data to the network and are willing to share it with others.
They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Managing all this information is rapidly moving beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. The WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines, in a formal setting, deductive rules as in Datalog with negation (to specify intensional data) and active rules as in Datalog:: (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions (in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem such as distribution, concurrency, and expressivity in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base, and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation for overcoming the problems of Web data management, in particular with respect to access control. Distribution Access Control Social Network Web Data Management Distributed Datalog
6	Extração de informação não-supervisionada por segmentação de texto Vilarinho, Eli Cortez Custódio 14 December 2012 (has links) Submitted by Lúcia Brandão (lucia.elaine@live.com) on 2015-07-27T19:15:09Z No. of bitstreams: 1 Tese - Eli Cortez Custódio Vilarinho.pdf: 11041462 bytes, checksum: 19414e6ce9e997483dc1adee4e5eb413 (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2015-07-28T19:02:25Z (GMT) / Made available in DSpace on 2015-07-28T19:08:39Z (GMT) / Previous issue date: 2012-12-14 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In this work we propose, implement and evaluate a new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS). Our approach relies on information available in pre-existing data to learn how to associate segments in the input string with attributes of a given domain, relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to learn structure-based features directly from test data, with no previous human-driven training, a feature unique to our approach. Based on our approach, we have produced a number of results that address the IETS problem in an unsupervised fashion. In particular, we have developed, implemented and evaluated distinct IETS methods, namely ONDUX, JUDIE and iForm.
ONDUX (On Demand Unsupervised Information Extraction) is an unsupervised probabilistic approach for IETS that relies on content-based features to bootstrap the learning of structure-based features. Structure-based features are exploited to disambiguate the extraction of certain attributes through a reinforcement step, which relies on the sequencing and positioning of attribute values learned directly, on demand, from the input texts. JUDIE (Joint Unsupervised Structure Discovery and Information Extraction) aims at automatically extracting several semi-structured data records that appear as continuous text with no explicit delimiters between them. In comparison with other IETS methods, including ONDUX, JUDIE faces a considerably harder task, that is, extracting information while simultaneously uncovering the underlying structure of the implicit records containing it. In spite of that, it achieves results comparable to the state-of-the-art methods. iForm applies our approach to the task of Web form filling. It aims at extracting segments from a data-rich text given as input and associating these segments with fields from a target Web form. The extraction process relies on content-based features learned from data that was previously submitted to the Web form. All of these methods were evaluated on different experimental datasets, which we use to perform a large set of experiments in order to validate our approach and methods. These experiments indicate that our proposed approach yields high-quality results when compared to state-of-the-art approaches and that it is able to properly support IETS methods in a number of real applications. / In this work, we propose, implement, and evaluate a new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS).
Our approach relies on information available in pre-existing data to learn how to associate segments of the input string with attributes of a given domain, relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to learn structure-based features directly from test data, with no prior human-driven training, a feature unique to our approach. Based on our approach, we have produced a number of results that address the IETS problem in an unsupervised fashion. In particular, we have developed, implemented, and evaluated distinct IETS methods, namely ONDUX, JUDIE, and iForm. ONDUX (On Demand Unsupervised Information Extraction) is an unsupervised probabilistic approach for IETS that relies on content-based features to bootstrap the learning of structure-based features. Structure-based features are exploited to disambiguate the extraction of certain attributes through a reinforcement step, which relies on the sequencing and positioning of attribute values learned directly, on demand, from the input texts. JUDIE (Joint Unsupervised Structure Discovery and Information Extraction) aims at automatically extracting several semi-structured data records that appear as continuous text with no explicit delimiters between them. In comparison with other IETS methods, including ONDUX, JUDIE faces a considerably harder task, that is, extracting information while at the same time uncovering the underlying structure of the implicit records that contain it. In spite of that, it achieves results comparable to the state-of-the-art methods. iForm applies our approach to the task of Web form filling. It aims at extracting segments from a data-rich text given as input and associating these segments with fields of a target Web form.
The extraction process relies on content-based features learned from data that was previously submitted to the Web form. All of these methods were evaluated on different experimental datasets, which we use to perform a large set of experiments in order to validate our approach and methods. These experiments indicate that our proposed approach yields high-quality results when compared with state-of-the-art approaches and that it is able to properly support IETS methods in a number of real applications. Information extraction Database Web data management

