21

Information Diffusion on Twitter

Zhou, Li 03 June 2015 (has links)
No description available.
22

Automated Discovery, Binding, and Integration Of GIS Web Services

Shulman, Lev 18 May 2007 (has links)
The last decade has demonstrated steady growth and utilization of Web Service technology. While Web Services have become significant in a number of IT domains such as eCommerce, digital libraries, data feeds, and geographical information systems, common portals or registries of Web Services require manual publishing for indexing. Manually compiled registries of Web Services have proven useful but often fail to include a considerable number of Web Services published and available on the Web. We propose a system capable of finding, binding, and integrating Web Services into an index in an automated manner. By using a combination of guided search and web crawling techniques, the system finds a large number of Web Service providers that are further bound and aggregated into a single portal available for public use. Results show that this approach is successful in discovering a considerable number of Web Services in the GIS (Geographical Information Systems) domain, and demonstrate improvements over existing methods of Web Service discovery.
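The abstract gives no implementation details, but a minimal sketch of the kind of crawl it describes might look like the following Python fragment, which follows hyperlinks from seed pages and keeps any link that looks like a WSDL or ASMX service description. The function names, URL patterns, and regex-based link extraction are illustrative assumptions, not the thesis's actual code.

```python
import re
from urllib.parse import urljoin

import requests

# Heuristic for links that look like service descriptions (assumption, not the thesis's rule)
WSDL_HINT = re.compile(r'\.(asmx|wsdl)(\?|$)|\?wsdl', re.IGNORECASE)

def find_wsdl_endpoints(seed_urls, max_pages=100):
    """Crawl from a set of seed pages and collect links that look like WSDL endpoints."""
    frontier, seen, endpoints = list(seed_urls), set(), set()
    while frontier and len(seen) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        for link in re.findall(r'href="([^"]+)"', html):
            absolute = urljoin(url, link)
            if WSDL_HINT.search(absolute):
                endpoints.add(absolute)   # candidate service description
            elif absolute not in seen:
                frontier.append(absolute)  # keep crawling
    return endpoints
```

In a full system along these lines, the collected endpoints would then be fetched, their WSDL parsed, and the advertised operations registered in the portal's index.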
23

Analysis of Turkey's visibility on the global Internet

Oralalp, Sertac 01 May 2010 (has links) (PDF)
In this study, Turkey's Internet visibility will be analyzed based on data collected from multiple different resources (such as Google, Yahoo, Altavista, Bing and AOL). The analysis will involve inspection of DNS queries, Web crawling and other similar techniques. Our goal is to investigate the global Internet, find webs that share a common pattern representing the Internet visibility of Turkey, compare their characteristics with those of other webs around the world, and discover their similarities and differences.
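The study's measurement pipeline is not spelled out in the abstract; the sketch below only illustrates the DNS-inspection part using Python's standard library and a couple of hypothetical .tr hostnames. The actual analysis also drew on results from the listed search engines and on web crawling.

```python
import socket

def resolvable(hostnames):
    """Check which hostnames currently resolve in DNS."""
    status = {}
    for host in hostnames:
        try:
            status[host] = socket.gethostbyname(host)  # resolved IP address
        except socket.gaierror:
            status[host] = None                        # no DNS answer
    return status

# Illustrative hostnames only; not the study's sample
print(resolvable(["metu.edu.tr", "example.com.tr"]))
```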
24

A Scalable P2P RIA Crawling System with Fault Tolerance

Ben Hafaiedh, Khaled January 2016 (has links)
Rich Internet Applications (RIAs) have been widely used in the web over the last decade as they were found to be responsive and user-friendly compared to traditional web applications. RIAs use client-side scripting such as JavaScript, which allows for asynchronous updates on the server side using AJAX (Asynchronous JavaScript and XML). Due to the large size of RIAs and therefore the long time required for crawling, distributed RIA crawling has been introduced with the aim of decreasing the crawling time. However, current RIA crawling systems are not scalable, i.e., they are limited to a relatively low number of crawlers. Furthermore, they do not allow for fault tolerance in case a failure occurs in one of their components. In this research, we address the scalability and resilience problems when crawling RIAs in a distributed environment and explore the possibilities of designing an efficient RIA crawling system that is scalable and fault-tolerant. Our approach is to partition the search space among several storage devices (distributed databases) over a peer-to-peer (P2P) network, where each database is responsible for storing only a portion of the RIA graph. This makes the distributed data structure invulnerable to a single point of failure. However, accessing the distributed data required by crawlers makes the crawling task challenging when the number of crawlers becomes high. We show by simulation results and analytical reasoning that our system is scalable and fault-tolerant. Furthermore, simulation results show that crawling with the P2P system is significantly faster than crawling with either the non-distributed system or the distributed system that uses a single database.
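As a rough illustration of the partitioning idea (not the thesis's actual protocol), the following Python sketch hashes each discovered application state to one of several peers, so every peer stores only its share of the RIA graph; the in-memory dictionaries stand in for the distributed databases.

```python
import hashlib

class PartitionedGraphStore:
    """Toy partitioning of a crawl graph across several storage peers.

    Each discovered application state is assigned to a peer by hashing its
    identifier, so no single node holds the whole graph.
    """

    def __init__(self, num_peers):
        self.peers = [dict() for _ in range(num_peers)]  # stand-ins for remote databases

    def _owner(self, state_id):
        digest = hashlib.sha1(state_id.encode()).hexdigest()
        return int(digest, 16) % len(self.peers)

    def record_transition(self, src_state, event, dst_state):
        """Store an edge of the RIA graph on the peer that owns the source state."""
        peer = self.peers[self._owner(src_state)]
        peer.setdefault(src_state, set()).add((event, dst_state))

    def unexplored(self, state_id, known_events):
        """Return events of a state not yet recorded anywhere (still to be crawled)."""
        peer = self.peers[self._owner(state_id)]
        seen = {event for event, _ in peer.get(state_id, set())}
        return [e for e in known_events if e not in seen]
```

Because ownership is determined by the hash alone, any crawler can locate the peer responsible for a state without consulting a central coordinator, which is what removes the single point of failure.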
25

Extrakce informací z webových stránek / Information Extraction from Web Pages

Bukovčák, Jakub January 2019 (has links)
This master thesis focuses on current technologies used for downloading web pages and extracting structured information from them. The paper describes available tools that make this process possible and easier. Another part of the document provides an overview of technologies that can be used for creating web pages, as well as information about the development of information systems with a web user interface based on the Java Enterprise Edition (Java EE) platform. The main part of the thesis describes the design and implementation of an application used to specify and manage extraction tasks. The last part describes testing the application on real web pages and evaluating the achieved results.
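To make the notion of an extraction task concrete, here is a hedged Python sketch in which a task is simply a URL plus named CSS selectors. It assumes the requests and BeautifulSoup libraries and a placeholder URL; it is not the application described in the thesis.

```python
import requests
from bs4 import BeautifulSoup

# An extraction task: a page URL plus named CSS selectors for the fields to pull out.
task = {
    "url": "https://example.com/article",  # placeholder URL
    "fields": {
        "title": "h1",
        "author": ".byline .author",
        "paragraphs": "article p",
    },
}

def run_extraction_task(task):
    """Download the page and return the selected fields as plain text."""
    soup = BeautifulSoup(requests.get(task["url"], timeout=10).text, "html.parser")
    result = {}
    for name, selector in task["fields"].items():
        matches = [el.get_text(strip=True) for el in soup.select(selector)]
        result[name] = matches[0] if len(matches) == 1 else matches
    return result

print(run_extraction_task(task))
```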
26

Lokman: A Medical Ontology Based Topical Web Crawler

Kayisoglu, Altug 01 September 2005 (has links) (PDF)
Use of ontology is one approach to overcoming the "search-on-the-net" problem. An ontology-based web information retrieval system requires a topical web crawler to construct a high-quality document collection. This thesis focuses on implementing a topical web crawler with a medical domain ontology in order to find out the advantages of ontological information in web crawling. The crawler is implemented with the Best-First search algorithm, and its design is optimized for the UMLS ontology. The crawler is tested with the Harvest Rate and Target Recall metrics and compared to a non-ontology-based Best-First crawler. Test results showed that using the ontology in the crawler's URL selection algorithm improved crawler performance by 76%.
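A simplified Python sketch of an ontology-guided Best-First crawler is shown below; the tiny term set stands in for UMLS concepts, and pages and links are scored by naive term counting rather than the thesis's actual relevance function.

```python
import heapq
import re

import requests

# Tiny stand-in for UMLS concepts (assumption, for illustration only)
ONTOLOGY_TERMS = {"diabetes", "insulin", "glucose", "pancreas"}

def relevance(text):
    """Score a page by how many ontology terms it mentions."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & ONTOLOGY_TERMS)

def best_first_crawl(seeds, max_pages=50):
    frontier = [(0, url) for url in seeds]  # (negative score, url) min-heap
    heapq.heapify(frontier)
    visited, harvested = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        score = relevance(html)
        if score > 0:
            harvested.append((url, score))
        for link in re.findall(r'href="(https?://[^"]+)"', html):
            if link not in visited:
                # anchor context would normally guide the estimate; here the parent score is reused
                heapq.heappush(frontier, (-score, link))
    return harvested
```

The priority queue keeps the most promising URLs at the front, which is what distinguishes Best-First crawling from plain breadth-first crawling.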
27

[en] ALUMNI TOOL: INFORMATION RECOVERY OF PERSONAL DATA ON THE WEB IN AUTHENTICATED SOCIAL NETWORKS / [pt] ALUMNI TOOL: RECUPERAÇÃO DE DADOS PESSOAIS NA WEB EM REDES SOCIAIS AUTENTICADAS

LUIS GUSTAVO ALMEIDA 02 August 2018 (has links)
[en] The use of search bots to collect information for a given context has grown substantially in recent years. For example, search bots may be used to capture data from professional social networks. In particular, such networks make it possible to study the professional trajectories of the alumni of a given university and to answer questions such as: how long does a former student of PUC-Rio take to reach a management position? However, a common problem in this scenario is the inability to collect information due to authentication systems, which prevent a search robot from accessing certain pages and content. This dissertation presents a solution for capturing data that circumvents the authentication problem and automates the data collection process. The proposed solution collects data from user profiles of a professional social network for storage in a database and later analysis. The dissertation also contemplates the possibility of adding several other data sources, with emphasis on a data warehouse structure.
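Purely as an illustration of authenticated collection (the dissertation's own mechanism and the target network are not specified here), the Python sketch below reuses a session cookie to fetch profile pages and stores the raw HTML in SQLite; the cookie name, URLs, and schema are all assumptions.

```python
import sqlite3

import requests

def collect_profiles(profile_urls, session_cookie):
    """Fetch profile pages with an authenticated session and store the raw HTML."""
    session = requests.Session()
    session.cookies.set("session_id", session_cookie)  # cookie name is illustrative

    db = sqlite3.connect("alumni.db")
    db.execute("CREATE TABLE IF NOT EXISTS profiles (url TEXT PRIMARY KEY, html TEXT)")
    for url in profile_urls:
        resp = session.get(url, timeout=10)
        if resp.ok:
            db.execute("INSERT OR REPLACE INTO profiles VALUES (?, ?)", (url, resp.text))
    db.commit()
    db.close()
```

Storing the raw pages first and parsing them later keeps the collection step simple and allows re-extraction when the analysis requirements change.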
