11 |
Model-based Crawling - An Approach to Design Efficient Crawling Strategies for Rich Internet Applications
Dincturk, Mustafa Emre 02 August 2013
Rich Internet Applications (RIAs) are a new generation of web applications that break away from the concepts on which traditional web applications are based. RIAs are more interactive and responsive than traditional web applications because they allow client-side scripting (such as JavaScript) and asynchronous communication with the server (using AJAX). Although these are improvements in terms of user-friendliness, they have a big impact on our ability to automatically explore (crawl) these applications. Traditional crawling algorithms are not sufficient for crawling RIAs. We should be able to crawl RIAs in order to search their content and build their models for various purposes such as reverse-engineering, detecting security vulnerabilities, assessing usability, and applying model-based testing techniques. One important problem is designing efficient crawling strategies for RIAs. It seems possible to design crawling strategies that are more efficient than the standard ones, Breadth-First and Depth-First. In this thesis, we explore the possibilities of designing efficient crawling strategies. We use a general approach that we call Model-based Crawling and present two crawling strategies that are designed using this approach. We show by experimental results that model-based crawling strategies are more efficient than the standard strategies.
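The two standard strategies named in the abstract can be illustrated on a toy model. The sketch below is not the thesis's method: it assumes, for illustration only, that an RIA can be represented as a dictionary mapping each client-side state to the events available in it and the state each event leads to. The `APP` graph, its state and event names, and the `crawl` function are all hypothetical. A real crawler would also have to pay for event executions and resets to reach each state, which this sketch ignores.

```python
from collections import deque

# Hypothetical toy application: state -> {event: resulting state}.
APP = {
    "home": {"click_menu": "menu", "click_login": "login"},
    "menu": {"click_item": "item", "click_home": "home"},
    "login": {"submit": "home"},
    "item": {},
}

def crawl(app, start, strategy="bfs"):
    """Return the states in the order the crawler visits them."""
    frontier = deque([start])
    seen = {start}
    visited = []
    while frontier:
        # Breadth-First takes the oldest discovered state,
        # Depth-First takes the most recently discovered one.
        state = frontier.popleft() if strategy == "bfs" else frontier.pop()
        visited.append(state)
        for target in app[state].values():
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited

print(crawl(APP, "home", "bfs"))  # ['home', 'menu', 'login', 'item']
print(crawl(APP, "home", "dfs"))  # ['home', 'login', 'menu', 'item']
```

Both strategies find the same four states here; they differ only in visiting order, which is what determines the cost once event executions and resets are counted.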
|
12 |
Analysis Of Turkey
Oralalp, Sertac 01 May 2010
In this study, Turkey's Internet visibility will be analyzed based on data to be collected from multiple resources (such as Google, Yahoo, Altavista, Bing and AOL). The analysis will involve inspection of DNS queries, Web crawling and other similar techniques. Our goal is to investigate the global Internet, find webs that share a common pattern representing the Internet visibility of Turkey, compare their characteristics with those of other webs in the world, and discover their similarities and differences.
|
13 |
Interaktivní procházení webu a extrakce dat / Interactive web crawling and data extraction
Fejfar, Petr January 2018
Title: Interactive crawling and data extraction
Author: Bc. Petr Fejfar
Author's e-mail address: pfejfar@gmail.com
Department: Department of Distributed and Dependable Systems
Supervisor: Mgr. Pavel Ježek, Ph.D., Department of Distributed and Dependable Systems
Abstract: The subject of this thesis is Web crawling and data extraction from Rich Internet Applications (RIAs). The thesis starts with an analysis of modern Web pages along with the techniques used for crawling and data extraction. Based on this analysis, we designed a tool which crawls RIAs according to instructions defined by the user via a graphical interface. In contrast with other currently popular tools for RIAs, our solution is targeted at users with no programming experience, including business and analyst users. The designed solution is itself implemented as an RIA, using the WebDriver protocol to automate multiple browsers according to user-defined instructions. Our tool allows the user to inspect browser sessions by displaying pages that are being crawled simultaneously. This feature enables the user to troubleshoot the crawlers. The outcome of this thesis is a fully designed and implemented tool enabling business users to extract data from RIAs. This opens new opportunities for this type of user to collect data from Web pages for use...
|
14 |
Internet das coisas: controvérsias nas notícias e redes temáticas (Internet of Things: controversies in the news and thematic networks)
Singer, Talyta Louise January 2014
CNPQ / This research is an exploratory study of the Internet of Things that identifies and describes the tensions arising from the production, capture, processing, and/or transmission of information by interconnected objects, as observable from public digital traces. We concentrate on identifying sensitive issues: topics that raise discussion, mobilize different kinds of actors (developers, politicians, users, industry, laws, communication protocols), and can be understood from more than one point of view. The theoretical framework is Actor-Network Theory, whose principle of generalized symmetry between humans and non-humans grants both the possibility of agency. The methodology is the cartography of controversies, a set of techniques for exploring and visualizing conflicts and public debates. The empirical research consists of a mapping of the thematic network of the Internet of Things, built by web crawling sites in Portuguese and English, and a content analysis of news published on the subject on high-visibility news sites. The research identified controversies in six themes: technological dependence, free software, standardization, legislation, privacy, and security. The dissertation is divided into three chapters: the first gives a current overview of the Internet of Things, discusses concepts, and presents a timeline; the second presents key concepts of Actor-Network Theory and the stages of the cartography of controversies; the last describes the data collection and processing steps and the main results. The final remarks present a summary table of the controversies found and an argument tree that identifies the main points of disagreement and the main actors taking part in the discussions around connected objects.
|
15 |
Breaking Hash-Tag Detection Algorithm for Social Media (Twitter)
January 2015
abstract: In trading, volume is a measure of how much of a stock has been exchanged in a given period of time. Since every stock is distinct and has a different number of shares, volume is compared with the stock's own historical volume to spot changes. It is also used to confirm price trends and breakouts and to spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter).
The ubiquity of social media, especially Twitter, in financial markets has been highly visible in the past couple of years. With its growing use by news channels, financial experts and pundits, the global economy does seem to hinge on 140 characters. By analyzing the number of tweets hashtagged to a stock, a strong relation can be established between the number of people talking about it and the trading volume of the stock.
In my work, I make this relation explicit and find a state of breakout when the volume goes beyond a characterized support or resistance level. / Dissertation/Thesis / Masters Thesis Computer Science 2015
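The breakout idea in the last sentence can be sketched in a few lines. The abstract does not say how the support and resistance levels are characterized, so the sketch below assumes, purely for illustration, that resistance and support are the maximum and minimum tweet counts over a preceding window. The function name `find_breakouts`, the `window` parameter, and the sample counts are all hypothetical.

```python
def find_breakouts(counts, window=3):
    """Flag periods whose tweet count leaves the recent support/resistance band.

    Assumption (not from the thesis): resistance is the maximum and support
    the minimum of the previous `window` counts.
    """
    breakouts = []
    for i in range(window, len(counts)):
        recent = counts[i - window:i]
        resistance, support = max(recent), min(recent)
        if counts[i] > resistance:
            breakouts.append((i, "above_resistance"))
        elif counts[i] < support:
            breakouts.append((i, "below_support"))
    return breakouts

# Hypothetical hourly counts of tweets hashtagged to one stock.
hourly_counts = [10, 12, 11, 30, 28, 5]
print(find_breakouts(hourly_counts))
# [(3, 'above_resistance'), (5, 'below_support')]
```

A rolling maximum and minimum is only one possible definition of the band; any other level estimate could be substituted without changing the comparison step.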
|
16 |
Skrapa försäljningssidor på nätet : Ett ramverk för webskrapningsrobotar (Scraping sales sites on the web: A framework for web scraping robots)
Karlsson, Emil, Edberg, Mikael January 2016
The Internet today offers a large range of sales websites where new listings arrive constantly. We see a need for a tool that monitors these websites around the clock to see how much is being sold and what is being sold. Writing a program that monitors websites is time-consuming, so we have created a framework that makes it easier to build web scrapers focused on list-based sales websites. Several web scraping frameworks exist, but very few focus exclusively on this type of website.
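To make "list-based" concrete, here is a minimal sketch of the core step such a scraper performs: pulling every listing out of a page that presents its ads as a list. It is not the framework described in the thesis. It uses only Python's standard `html.parser` module, and the markup (`<li class="listing">`), the class name `ListingParser`, and the helper `scrape_listings` are assumptions made for illustration.

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collect the text of every <li class="listing"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._in_listing = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "listing") in attrs:
            self._in_listing = True
            self.listings.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_listing = False

    def handle_data(self, data):
        if self._in_listing:
            self.listings[-1] += data.strip()

def scrape_listings(html):
    """Return the text of all listing items found in the given HTML string."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings

page = (
    '<ul><li class="listing">Bike, 500 kr</li>'
    '<li class="ad">Sponsored</li>'
    '<li class="listing">Sofa, 200 kr</li></ul>'
)
print(scrape_listings(page))  # ['Bike, 500 kr', 'Sofa, 200 kr']
```

Running this on a schedule and comparing successive results would show which listings appeared or disappeared, which is the kind of monitoring the abstract describes.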
|
17 |
Model-based Crawling - An Approach to Design Efficient Crawling Strategies for Rich Internet Applications
Dincturk, Mustafa Emre January 2013
Rich Internet Applications (RIAs) are a new generation of web applications that break away from the concepts on which traditional web applications are based. RIAs are more interactive and responsive than traditional web applications because they allow client-side scripting (such as JavaScript) and asynchronous communication with the server (using AJAX). Although these are improvements in terms of user-friendliness, they have a big impact on our ability to automatically explore (crawl) these applications. Traditional crawling algorithms are not sufficient for crawling RIAs. We should be able to crawl RIAs in order to search their content and build their models for various purposes such as reverse-engineering, detecting security vulnerabilities, assessing usability, and applying model-based testing techniques. One important problem is designing efficient crawling strategies for RIAs. It seems possible to design crawling strategies that are more efficient than the standard ones, Breadth-First and Depth-First. In this thesis, we explore the possibilities of designing efficient crawling strategies. We use a general approach that we call Model-based Crawling and present two crawling strategies that are designed using this approach. We show by experimental results that model-based crawling strategies are more efficient than the standard strategies.
|
18 |
Information Diffusion on Twitter
Zhou, Li 03 June 2015
No description available.
|
19 |
Automated Discovery, Binding, and Integration Of GIS Web Services
Shulman, Lev 18 May 2007
The last decade has demonstrated steady growth and utilization of Web Service technology. While Web Services have become significant in a number of IT domains such as eCommerce, digital libraries, data feeds, and geographical information systems, common portals or registries of Web Services require manual publishing for indexing. Manually compiled registries of Web Services have proven useful but often fail to include a considerable amount of Web Services published and available on the Web. We propose a system capable of finding, binding, and integrating Web Services into an index in an automated manner. By using a combination of guided search and web crawling techniques, the system finds a large number of Web Service providers that are further bound and aggregated into a single portal available for public use. Results show that this approach is successful in discovering a considerable number of Web Services in the GIS (Geographical Information Systems) domain, and demonstrate improvements over existing methods of Web Service Discovery.
|
20 |
Analysis of Turkey's visibility on global Internet
Oralalp, Sertac 01 May 2010
In this study, Turkey's Internet visibility will be analyzed based on data to be collected from multiple resources (such as Google, Yahoo, Altavista, Bing and AOL). The analysis will involve inspection of DNS queries, Web crawling and other similar techniques. Our goal is to investigate the global Internet, find webs that share a common pattern representing the Internet visibility of Turkey, compare their characteristics with those of other webs in the world, and discover their similarities and differences.
|