• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 13
  • 6
  • 1
  • 1
  • 1
  • Tagged with
  • 23
  • 23
  • 7
  • 7
  • 5
  • 5
  • 5
  • 4
  • 4
  • 4
  • 3
  • 3
  • 3
  • 3
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Extrakce informací z webových stránek / Information Extraction from Web Pages

Bukovčák, Jakub January 2019 (has links)
This master thesis is focused on current technologies that are used for downloading web pages and extraction of structured information from them. The paper describes available tools to make this process possible and easier. Another part of this document provides the overview of technologies that can be used for creating web pages. Also, there is an information about development of information systems with web user interface based on Java Enterprise Edition (Java EE) platform. The main part of this master thesis describes design and implementation of application used to specify and manage extraction tasks. The last part of this project describes application testing on real web pages and evaluation of achieved results.
22

Lokman: A Medical Ontology Based Topical Web Crawler

Kayisoglu, Altug 01 September 2005 (has links) (PDF)
Use of ontology is an approach to overcome the &ldquo / search-on-the-net&rdquo / problem. An ontology based web information retrieval system requires a topical web crawler to construct a high quality document collection. This thesis focuses on implementing a topical web crawler with medical domain ontology in order to find out the advantages of ontological information in web crawling. Crawler is implemented with Best-First search algorithm. Design of the crawler is optimized to UMLS ontology. Crawler is tested with Harvest Rate and Target Recall Metrics and compared to a non-ontology based Best-First Crawler. Performed test results proved that ontology use in crawler URL selection algorithm improved the crawler performance by 76%.
23

[en] ALUMNI TOOL: INFORMATION RECOVERY OF PERSONAL DATA ON THE WEB IN AUTHENTICATED SOCIAL NETWORKS / [pt] ALUMNI TOOL: RECUPERAÇÃO DE DADOS PESSOAIS NA WEB EM REDES SOCIAIS AUTENTICADAS

LUIS GUSTAVO ALMEIDA 02 August 2018 (has links)
[pt] O uso de robôs de busca para coletar informações para um determinado contexto sempre foi um problema desafiante e tem crescido substancialmente nos últimos anos. Por exemplo, robôs de busca podem ser utilizados para capturar dados de redes sociais profissionais. Em particular, tais redes permitem estudar as trajetórias profissionais dos egressos de uma universidade, e responder diversas perguntas, como por exemplo: Quanto tempo um ex-aluno da PUC-Rio leva para chegar a um cargo de relevância? No entanto, um problema de natureza comum a este cenário é a impossibilidade de coletar informações devido a sistemas de autenticação, impedindo um robô de busca de acessar determinadas páginas e conteúdos. Esta dissertação aborda uma solução para capturar dados, que contorna o problema de autenticação e automatiza o processo de coleta de dados. A solução proposta coleta dados de perfis de usuários de uma rede social profissional para armazenamento em banco de dados e posterior análise. A dissertação contempla ainda a possibilidade de adicionar diversas outras fontes de dados dando ênfase a uma estrutura de armazém de dados. / [en] The use of search bots to collect information for a given context has grown substantially in recent years. For example, search bots may be used to capture data from professional social networks. In particular, such social networks facilitate studying the professional trajectory of the alumni of a given university, and answer several questions such as: How long does a former student of PUC-Rio take to arrive at a management position? However, a common problem in this scenario is the inability to collect information due to authentication systems, preventing a search robot from accessing certain pages and content. This dissertation addresses a solution to capture data, which circumvents the authentication problem and automates the data collection process. The proposed solution collects data from user profiles for later database storage and analysis. The dissertation also contemplates the possibility of adding several other sources of data giving emphasis to a data warehouse structure.

Page generated in 0.0373 seconds