  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Seleção de valores para preenchimento de formulários web / Selection of values for form filling

Moraes, Tiago Guimarães January 2013 (has links)
Traditional search engines crawl Web pages by following HTML links, but most of the Web is not reached by these techniques. The portion of the Web that is not reached is called the hidden Web. An enormous amount of structured data, of higher quality than that found on the traditional Web, is available behind search interfaces: the forms that are the entry points to the hidden Web. This part of the Web is hard for search engines to access, because filling the forms correctly is a major challenge: they were built for human interaction and show great variability and diversity of languages and domains. The central problem is to select the right values for the form fields so that a small number of submissions covers most of the database behind the form. Several works have proposed methods for searching the hidden Web, but most of them have serious limitations for automatic, horizontal application across the whole Web. The main limitations are the dependence on prior information about the form's domain, the lack of support for every type of field a form may contain, and the correct selection of a subgroup from the set of all possible ways of filling a form. This work presents a generic architecture for automatic form filling. Its main contribution is the selection of values for form submission through the ITP (Instance Template Pruning) method. Many forms have an infeasible number of filling possibilities when all field values are combined; ITP drastically reduces this number, pruning candidate queries as submissions are made and knowledge about the form is acquired. The experiments performed indicate that the proposed method outperforms the baseline, which represents the state of the art, and that ITP can be combined with other methods to search the hidden Web effectively: combining ITP with the baseline also produced good results.
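A rough, hypothetical sketch of the pruning idea (not the thesis's actual ITP algorithm): candidate values are probed one field at a time, unproductive values are discarded, and only the surviving values are combined, so far fewer submissions are issued than the full cross product would require. The candidate values and the `submit_form` helper are placeholders.

```python
from itertools import product

def submit_form(assignment):
    """Placeholder: POST the field->value assignment to the form's action
    URL and return the set of result-record identifiers on the response."""
    raise NotImplementedError

def crawl_with_pruning(candidate_values):
    """Probe each field value on its own, discard values that retrieve
    nothing, then enumerate only the cross product of surviving values."""
    surviving = {}
    seen = set()
    for field, values in candidate_values.items():
        surviving[field] = []
        for value in values:
            results = submit_form({field: value})
            if results:                       # keep only productive values
                surviving[field].append(value)
                seen |= results
    fields = list(surviving)
    for combo in product(*(surviving[f] for f in fields)):
        seen |= submit_form(dict(zip(fields, combo)))
    return seen

# Hypothetical candidate values, e.g. harvested from <select> elements.
candidate_values = {"make": ["ford", "toyota"], "year": ["2012", "2013"]}
```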
13

Improving the Chatbot Experience : With a Content-based Recommender System

Gardner, Angelica January 2019 (has links)
Chatbots are computer programs capable of leading a conversation with a human user. When a chatbot is unable to match a user's utterance to any predefined answer, it falls back to a fallback intent: a generic response that does not contribute to the conversation in any meaningful way. This report investigates whether a content-based recommender system could support a chatbot agent in these fallback situations. Content-based recommender systems use content to filter, prioritize and deliver relevant information to users; their purpose is to search through a large amount of content and predict recommendations based on user requirements. The recommender system developed in this project consists of four components: a web spider, a Bag-of-words model, a graph database, and a GraphQL API. The aim was to capture web page articles and rank them with a numeric score to determine which articles make the best recommendations for given subjects. The chatbot agent could then use these recommended articles to offer the user real help instead of a generic response. The evaluation found that the recommender system in principle fulfilled all requirements, but that its recommendations could be significantly improved if a more advanced scoring algorithm were implemented. The scoring algorithm used in this project is based on word count, which, among other things, does not take the context of the dialogue between the user and the agent into account.
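A minimal sketch of the word-count scoring described above, with hypothetical article texts and subject keywords; the real system feeds articles collected by the web spider into a graph database and exposes the ranking through a GraphQL API, which this sketch omits.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Lower-case the text and count word occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def score_article(article_text, subject_keywords):
    """Sum the occurrence counts of the subject's keywords in the article."""
    counts = bag_of_words(article_text)
    return sum(counts[word] for word in subject_keywords)

def recommend(articles, subject_keywords, top_n=3):
    """Rank crawled articles so the chatbot can answer with the best match
    instead of a generic fallback response."""
    return sorted(articles,
                  key=lambda a: score_article(a["text"], subject_keywords),
                  reverse=True)[:top_n]

# Hypothetical articles collected by the web spider.
articles = [
    {"url": "https://example.com/a", "text": "A chatbot maps an utterance to an intent."},
    {"url": "https://example.com/b", "text": "Recommender systems rank content for users."},
]
print(recommend(articles, {"chatbot", "intent"}))
```

As the report notes, scoring by raw word counts ignores the dialogue context, which is why a more advanced ranking algorithm is suggested as an improvement.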
14

Model-based Crawling - An Approach to Design Efficient Crawling Strategies for Rich Internet Applications

Dincturk, Mustafa Emre 02 August 2013 (has links)
Rich Internet Applications (RIAs) are a new generation of web applications that break away from the concepts on which traditional web applications are based. RIAs are more interactive and responsive than traditional web applications since RIAs allow client-side scripting (such as JavaScript) and asynchronous communication with the server (using AJAX). Although these are improvements in terms of user-friendliness, there is a big impact on our ability to automatically explore (crawl) these applications. Traditional crawling algorithms are not sufficient for crawling RIAs. We should be able to crawl RIAs in order to be able to search their content and build their models for various purposes such as reverse-engineering, detecting security vulnerabilities, assessing usability, and applying model-based testing techniques. One important problem is designing efficient crawling strategies for RIAs. It seems possible to design crawling strategies more efficient than the standard crawling strategies, the Breadth-First and the Depth-First. In this thesis, we explore the possibilities of designing efficient crawling strategies. We use a general approach that we called Model-based Crawling and present two crawling strategies that are designed using this approach. We show by experimental results that model-based crawling strategies are more efficient than the standard strategies.
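For reference, a small sketch of the standard Breadth-First exploration of a RIA's state graph that the thesis compares against; `enabled_events` and `execute_event` are hypothetical stand-ins for a browser-automation layer.

```python
from collections import deque

def enabled_events(state):
    """Placeholder: return the client-side events (e.g. clicks) executable
    in the given DOM state."""
    raise NotImplementedError

def execute_event(state, event):
    """Placeholder: drive the browser to `state`, fire `event`, and return
    the identifier of the resulting DOM state."""
    raise NotImplementedError

def crawl_bfs(initial_state):
    """Baseline Breadth-First crawl; model-based strategies aim to discover
    the same states with fewer event executions and browser resets."""
    model = {}                        # state -> {event: next_state}
    queue = deque([initial_state])
    discovered = {initial_state}
    while queue:
        state = queue.popleft()
        model[state] = {}
        for event in enabled_events(state):
            next_state = execute_event(state, event)
            model[state][event] = next_state
            if next_state not in discovered:
                discovered.add(next_state)
                queue.append(next_state)
    return model
```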
15

Analysis Of Turkey

Oralalp, Sertac 01 May 2010 (has links) (PDF)
In this study, Turkey's Internet visibility will be analyzed based on data to be collected from multiple different resources (such as Google, Yahoo, Altavista, Bing and AOL). Analysis work will involve inspection of DNS queries, Web crawling and other similar techniques. Our goal is to investigate the global Internet, find webs that share a common pattern representing the Internet visibility of Turkey, compare their characteristics with those of other webs around the world, and discover their similarities and differences.
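As a loose illustration of the DNS-inspection side of such an analysis (a sketch only; the domain list is hypothetical and the study also draws on search-engine data and Web crawling):

```python
import socket

# Hypothetical sample of domains used as a crude resolvability probe.
domains = ["metu.edu.tr", "example.com.tr", "example.org"]

for domain in domains:
    try:
        address = socket.gethostbyname(domain)
        print(f"{domain} resolves to {address}")
    except socket.gaierror:
        print(f"{domain} did not resolve")
```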
16

Interaktivní procházení webu a extrakce dat / Interactive web crawling and data extraction

Fejfar, Petr January 2018 (has links)
Title: Interactive crawling and data extraction Author: Bc. Petr Fejfar Author's e-mail address: pfejfar@gmail.com Department: Department of Distributed and Dependable Systems Supervisor: Mgr. Pavel Ježek, Ph.D., Department of Distributed and Dependable Systems Abstract: The subject of this thesis is Web crawling and data extraction from Rich Internet Applications (RIAs). The thesis starts with an analysis of modern Web pages along with the techniques used for crawling and data extraction. Based on this analysis, we designed a tool which crawls RIAs according to instructions defined by the user via a graphical interface. In contrast with other currently popular tools for RIAs, our solution is targeted at users with no programming experience, including business and analyst users. The solution is itself implemented as a RIA, using the WebDriver protocol to automate multiple browsers according to user-defined instructions. Our tool allows the user to inspect browser sessions by displaying the pages that are being crawled simultaneously, which enables the user to troubleshoot the crawlers. The outcome of this thesis is a fully designed and implemented tool enabling business users to extract data from RIAs. This opens new opportunities for this type of user to collect data from Web pages for use...
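A tiny sketch of the kind of WebDriver-based automation the tool generates from the user's graphical instructions, written here with the Selenium Python bindings; the URL and CSS selector are placeholders, not anything taken from the thesis.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Drive a browser over the WebDriver protocol and pull text out of a page.
driver = webdriver.Firefox()
try:
    driver.get("https://example.com/listing")          # placeholder URL
    rows = driver.find_elements(By.CSS_SELECTOR, "table.results tr")
    for row in rows:
        print(row.text)
finally:
    driver.quit()
```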
17

Internet das coisas: controvérsias nas notícias e redes temáticas / Internet of things: controversies in the news and thematic networks

Singer, Talyta Louise January 2014 (has links)
This research is an exploratory study of the internet of things that identifies and describes the tensions arising from the production, capture, processing and/or transmission of information by interconnected objects, as observable from public digital traces. We focus on identifying sensitive issues: topics that raise debate, mobilize different kinds of actors (developers, politicians, users, industry, laws, protocols) and can be understood from more than one point of view. The theoretical framework is Actor-Network Theory, whose principle of generalized symmetry between humans and non-humans grants both the capacity for agency. The method is the cartography of controversies, a set of techniques for exploring and visualizing public conflicts. The empirical work maps the thematic network of the internet of things through web crawling of sites in Portuguese and English and content analysis of news published on high-visibility news sites. The research identified controversies around six themes: technological dependence, free software, standardization, legislation, privacy and security. The dissertation is divided into three chapters: the first builds a current landscape of the internet of things, discusses concepts and presents a timeline; the second presents key concepts of Actor-Network Theory and the stages of the cartography of controversies; the last details the data collection and processing steps and the main results. The final remarks present a summary table of the controversies found and an argument tree identifying the main points of disagreement and the main actors taking part in the discussions around connected objects.
18

Breaking Hash-Tag Detection Algorithm for Social Media (Twitter)

January 2015 (has links)
abstract: In trading, volume is a measure of how much of a stock has been exchanged in a given period of time. Since every stock is distinctive and has a different number of shares outstanding, volume can be compared with a stock's own historical volume to spot changes. It is likewise used to confirm price trends, breakouts, and to spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter). The presence of social media, especially Twitter, in financial markets has been strongly felt in the past couple of years. With the growth of its usage by news channels, financial experts and pundits, the global economy does seem to hinge on 140 characters. By analyzing the number of tweets hash-tagged with a stock, a strong relation can be established between the number of people talking about it and the trading volume of the stock. In my work, I make this relation explicit and identify a breakout when the volume goes beyond a characterized support or resistance level. / Dissertation/Thesis / Masters Thesis Computer Science 2015
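A toy sketch of the breakout idea, assuming hypothetical per-day counts of tweets hash-tagged with a stock symbol; the thesis works with data collected from Twitter and with proper support/resistance levels rather than this crude rolling maximum.

```python
from datetime import date

# Hypothetical per-day counts of tweets hash-tagged with a stock symbol.
daily_counts = {
    date(2015, 3, 2): 120,
    date(2015, 3, 3): 135,
    date(2015, 3, 4): 128,
    date(2015, 3, 5): 410,   # spike in chatter
}

def detect_breakouts(daily_counts, window=3, factor=2.0):
    """Flag days whose tweet volume exceeds `factor` times the maximum of
    the previous `window` days -- a crude stand-in for a resistance level."""
    days = sorted(daily_counts)
    breakouts = []
    for i, day in enumerate(days):
        if i < window:
            continue
        resistance = max(daily_counts[d] for d in days[i - window:i])
        if daily_counts[day] > factor * resistance:
            breakouts.append(day)
    return breakouts

print(detect_breakouts(daily_counts))   # -> [datetime.date(2015, 3, 5)]
```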
19

Skrapa försäljningssidor på nätet : Ett ramverk för webskrapningsrobotar / Scraping sales pages on the web: A framework for web scraping bots

Karlsson, Emil, Edberg, Mikael January 2016 (has links)
Today the internet offers a large number of sales websites where new advertisements are posted all the time. We see a need for a tool that monitors these websites around the clock to see how much is being sold and what is being sold. Creating a program that monitors websites is time-consuming, so we have built a framework that simplifies the creation of web scrapers focused on list-based sales websites. There are several frameworks for web scraping, but very few that focus solely on this type of website.
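A minimal sketch of the kind of list-page scraper such a framework would wrap, using requests and BeautifulSoup; the URL and CSS selectors are placeholders, and the framework itself would add the round-the-clock monitoring and scheduling on top.

```python
import requests
from bs4 import BeautifulSoup

def scrape_listing_page(url):
    """Fetch one listing page and return (title, price) pairs for its ads."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    ads = []
    for item in soup.select("div.ad-item"):            # placeholder selector
        title = item.select_one("h2.ad-title")
        price = item.select_one("span.ad-price")
        if title and price:
            ads.append((title.get_text(strip=True), price.get_text(strip=True)))
    return ads

if __name__ == "__main__":
    for title, price in scrape_listing_page("https://example.com/ads?page=1"):
        print(title, price)
```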
