Global ETD Search

11	Distributed data management with access control : social Networks and Data of the Web Galland, Alban 28 September 2011 The amount of information on the Web is growing very rapidly. Users as well as companies bring data to the network and are willing to share it with others. They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Management of all this information is rapidly moving beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. The WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines, in a formal setting, deductive rules as in Datalog with negation (to specify intensional data) and active rules as in Datalog¬¬ (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions (in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem, such as distribution, concurrency, and expressivity in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base, and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation to overcome the problems of Web data management, in particular with respect to access control.
[INFO:INFO_OH] Computer Science/Other Distribution Access Control Social Network Web Data Management Distributed Datalog
12	Interaktivní procházení webu a extrakce dat / Interactive web crawling and data extraction Fejfar, Petr January 2018 Title: Interactive crawling and data extraction Author: Bc. Petr Fejfar Author's e-mail address: pfejfar@gmail.com Department: Department of Distributed and Dependable Systems Supervisor: Mgr. Pavel Ježek, Ph.D., Department of Distributed and Dependable Systems Abstract: The subject of this thesis is Web crawling and data extraction from Rich Internet Applications (RIAs). The thesis starts with an analysis of modern Web pages along with techniques used for crawling and data extraction. Based on this analysis, we designed a tool which crawls RIAs according to instructions defined by the user via a graphical interface. In contrast with other currently popular tools for RIAs, our solution is targeted at users with no programming experience, including business and analyst users. The designed solution itself is implemented in the form of an RIA, using the WebDriver protocol to automate multiple browsers according to user-defined instructions. Our tool allows the user to inspect browser sessions by displaying pages that are being crawled simultaneously. This feature enables the user to troubleshoot the crawlers. The outcome of this thesis is a fully designed and implemented tool enabling business users to extract data from RIAs. This opens new opportunities for this type of user to collect data from Web pages for use...
13	Descoberta de ruído em páginas da web oculta através de uma abordagem de aprendizagem supervisionada / A supervised learning approach for noise discovery in web pages found in the hidden web Lutz, João Adolfo Froede January 2013 One of the problems of data extraction from web pages is the identification of noise in pages. This task aims at identifying non-informative elements in pages, such as headers, menus, or advertisements. The presence of noise may seriously hinder the performance of search engines and web mining tasks. In this work we tackle the problem of discovering noise in web pages found in the hidden web, i.e., the part of the web that is only accessible by filling in web forms. In hidden web processing, data extraction is usually preceded by a form-filling step, in which the query forms that give access to the hidden web pages are automatically or semi-automatically filled. During form filling, relevant data about the queried domain are collected, such as field labels and field values. Our proposal combines this type of data with syntactic information about the elements that compose the page. We show empirically that this combination achieves better results than an approach that is based solely on syntactic information. Keywords: Recuperacao : Informacao Web : Desenvolvimento Hidden web Information retrieval Web data extraction Web noise removal
16	A comparison of HTML-aware tools for Web Data extraction Boronat, Xavier Azagra 20 October 2017 Nowadays we live in a world where information is present everywhere in our daily life. In recent years the amount of information that we receive has grown, and the media through which it is distributed have changed: from conventional newspapers and the radio to mobile phones, digital television, and the Web. In this document we refer to the information that we can find on the Web, a very large source of data which is still developing. info:eu-repo/classification/ddc/000 ddc:000
17	iFuice - Information Fusion utilizing Instance Correspondences and Peer Mappings Rahm, Erhard, Thor, Andreas, Aumüller, David, Do, Hong-Hai, Golovin, Nick, Kirsten, Toralf 04 February 2019 We present a new approach to information fusion of web data sources. It is based on peer-to-peer mappings between sources and utilizes correspondences between their instances. Such correspondences are already available between many sources, e.g. in the form of web links, and help combine the information about specific objects and support high-quality data fusion. Sources and mappings relate to a domain model to support semantically focused information fusion. The iFuice architecture incorporates a mapping mediator offering both interactive and script-driven, workflow-like access to the sources and their mappings. The script programmer can use powerful generic operators to execute and manipulate mappings and their results. The paper motivates the new approach and outlines the architecture and its main components, in particular the domain model, the source and mapping model, and the script operators and their usage. info:eu-repo/classification/ddc/004 ddc:004
18	Analytics-as-a-Service in a Multi-Cloud Environment through Semantically-enabled Hierarchical Data Processing Jayaraman, P.P., Perera, C., Georgakopoulos, D., Dustdar, S., Thakker, Dhaval, Ranjan, R. 16 August 2016 A large number of cloud middleware platforms and tools are deployed to support a variety of Internet of Things (IoT) data analytics tasks. It is common practice that such cloud platforms are used only by their owners to achieve their primary and predefined objectives, with raw and processed data consumed only by them. However, allowing third parties to access processed data to achieve their own objectives significantly increases integration and cooperation, and can also lead to innovative use of the data. Multi-cloud, privacy-aware environments facilitate such data access, allowing different parties to share processed data and so collectively reduce computation resource consumption. However, there are interoperability issues in such environments, which involve heterogeneous data and analytics-as-a-service providers. There is a lack of both architectural blueprints that can support such diverse, multi-cloud environments and corresponding empirical studies that show the feasibility of such architectures. In this paper, we outline an innovative hierarchical data processing architecture that utilises semantics at all levels of the IoT stack in multi-cloud environments. We demonstrate the feasibility of such an architecture by building a system based on it, using OpenIoT as middleware and Google Cloud and Microsoft Azure as cloud environments. The evaluation shows that the system is scalable and has no significant limitations or overheads.
19	An Empirical Study of Novel Approaches to Dimensionality Reduction and Applications Nsang, Augustine S. 23 September 2011 No description available. Computer Science dimensionality reduction random projections clustering classification queries web data
20	SEEDEEP: A System for Exploring and Querying Deep Web Data Sources Wang, Fan 27 September 2010 No description available. Computer Science Deep Web Data Integration Query Planning Query Optimization Data Management Web Data

Search results