61

Migratory Cues For Encephalitogenic Effector T Cells Within The CNS During The Different Phases Of EAE

Schläger, Christian 30 April 2013 (has links)
No description available.
62

Analysis of Turkey's visibility on the global Internet

Oralalp, Sertac 01 May 2010 (has links) (PDF)
In this study, Turkey's Internet visibility will be analyzed based on data collected from multiple resources (such as Google, Yahoo, AltaVista, Bing and AOL). The analysis will involve inspection of DNS queries, Web crawling and other similar techniques. Our goal is to investigate the global Internet, find webs that share a common pattern representing Turkey's Internet visibility, and compare their characteristics with other webs around the world to discover their similarities and differences.
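The DNS-inspection step mentioned above can be illustrated with a small, self-contained sketch: resolve a handful of hostnames and record which ones answer. The hostname list, the timeout, and the idea of using bare A-record lookups as a visibility signal are illustrative assumptions, not the thesis's actual methodology.

```python
import socket

# Made-up sample of hostnames to probe; any list of candidate hosts would do.
SAMPLE_HOSTS = [
    "www.metu.edu.tr",
    "www.tubitak.gov.tr",
    "www.example.com",
]

def resolve_all(hosts, timeout=5.0):
    """Try an A-record lookup for each host and record the answer (or None)."""
    socket.setdefaulttimeout(timeout)
    visible = {}
    for host in hosts:
        try:
            visible[host] = socket.gethostbyname(host)
        except OSError:
            visible[host] = None  # did not resolve within the timeout
    return visible

if __name__ == "__main__":
    for host, addr in resolve_all(SAMPLE_HOSTS).items():
        print(f"{host}: {'resolves to ' + addr if addr else 'no answer'}")
```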
63

Towards completely automatized HTML form discovery on the web

Moraes, Maurício Coutinho January 2013 (has links)
The forms discovered by our proposal can be directly used as training data by some form classifiers. Our experimental validation used thousands of real Web forms, divided into six domains, including a representative subset of the publicly available DeepPeep form base (DEEPPEEP, 2010; DEEPPEEP REPOSITORY, 2011). Our results show that it is feasible to mitigate the demanding manual work required by two cutting-edge form classifiers (i.e., GFC and DSFC (BARBOSA; FREIRE, 2007a)), at the cost of a relatively small loss in effectiveness.
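As a rough illustration of what automatic form harvesting looks like in practice, the sketch below pulls the form elements out of a page and computes a few shallow features. It assumes the third-party requests and BeautifulSoup libraries, and the chosen features are illustrative guesses rather than the GFC/DSFC feature sets evaluated in the thesis.

```python
# pip install requests beautifulsoup4  (assumed third-party libraries)
import requests
from bs4 import BeautifulSoup

def candidate_forms(url, timeout=10):
    """Fetch a page and return its <form> elements with a few shallow features
    (action, method, field counts) of the kind a form classifier might consume."""
    html = requests.get(url, timeout=timeout).text
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for form in soup.find_all("form"):
        fields = form.find_all(["input", "select", "textarea"])
        text_fields = [f for f in fields
                       if f.name == "input"
                       and f.get("type", "text").lower() in ("text", "search")]
        results.append({
            "action": form.get("action", ""),
            "method": form.get("method", "get").lower(),
            "num_fields": len(fields),
            "num_text_fields": len(text_fields),
        })
    return results

# Example usage (hypothetical URL):
# print(candidate_forms("http://example.com/search"))
```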
66

Contribution à la veille stratégique : DOWSER, un système de découverte de sources Web d’intérêt opérationnel / Business Intelligence contribution: DOWSER, Discovering of Web Sources Evaluating Relevance

Noël, Romain 17 October 2014 (has links)
The constant growth of the volume of information available on the Web has made the discovery of new sources of interest on a given topic increasingly difficult. Experts in Intelligence Analysis (EIA) face this problem when they search for pages on specific and sensitive topics: because of their lack of popularity, or because they are poorly indexed due to their sensitive content, such pages are hard to find with traditional search engines. Our work, which falls within the context of Open Source Intelligence (OSINT), aims to assist the intelligence expert in the task of discovering new sources. We describe a Web source discovery system called DOWSER, built around the modelling of the operational information need and a focused exploration of the Web. Its goal is to provide users with new sources of information related to their needs without considering the popularity of a page, unlike classic Information Retrieval tools; the expected result is a balance between relevance and originality, in the sense that the desired pages are not necessarily popular. DOWSER is based on a user profile that guides its exploration of the Web so that only related Web documents are collected and indexed.
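A generic sketch of the profile-guided filtering such a discovery system relies on: score each fetched page against a textual user profile and only expand the links of sufficiently similar pages. The bag-of-words cosine measure, the 0.2 threshold, and the example texts are assumptions made for illustration, not DOWSER's actual profile model.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercased alphabetic tokens with term frequencies.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def should_expand(page_text, profile_text, threshold=0.2):
    """Decide whether a fetched page is close enough to the analyst's
    information need for its out-links to be queued for crawling."""
    return cosine(bag_of_words(page_text), bag_of_words(profile_text)) >= threshold

profile = "maritime traffic surveillance port arrivals cargo vessels tracking"
page = "daily bulletin of cargo vessel arrivals and port traffic statistics"
print(should_expand(page, profile))  # True for this toy example
```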
67

Extrakce informací z webových stránek / Information Extraction from Web Pages

Bukovčák, Jakub January 2019 (has links)
This master's thesis focuses on current technologies used for downloading web pages and extracting structured information from them. It describes the tools available to make this process possible and easier. Another part of the document provides an overview of technologies that can be used to create web pages, along with information about the development of information systems with a web user interface based on the Java Enterprise Edition (Java EE) platform. The main part of the thesis describes the design and implementation of an application used to specify and manage extraction tasks. The last part describes testing of the application on real web pages and an evaluation of the achieved results.
68

Crawling Records on the Inter-Planetary Name System / En genomsökning av register i det interplanetära namnsystemet

Gard, Axel January 2023 (has links)
This thesis studies the characteristics of data hosted on the InterPlanetary Name System (IPNS), which is part of the InterPlanetary File System (IPFS). From these records, information such as file names, locations, and sizes was investigated. Data was collected on the number of peers hosting each record, thereby determining how decentralized the records are on the network, and on how often content on the network changes. In addition to evaluating records, a search engine was prototyped to show how the data could be integrated into a system. A large part of the network was crawled and the rate of change was found to be high. Most of the peers were found to host HTML files, and most content identifiers found were hosted by more than one peer. This means that a search engine needs to support text file formats and revisit peers regularly to stay up to date with the records.
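A minimal sketch of inspecting IPNS-published content, assuming access to a public IPFS HTTP gateway and its standard /ipns/<name> path; the thesis's crawler talks to the peer-to-peer network directly, so the gateway, the example name, and the 64 KiB sampling limit here are all assumptions made for illustration.

```python
from urllib.request import urlopen, Request

GATEWAY = "https://ipfs.io"  # any public gateway exposing /ipns/<name> would do

def peek_ipns(name, timeout=20):
    """Fetch the root object behind an IPNS name through the gateway and report
    its media type and sampled size, the kind of record-level metadata a crawl
    might collect."""
    req = Request(f"{GATEWAY}/ipns/{name}", headers={"User-Agent": "ipns-peek/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        body = resp.read(65536)  # only sample the first 64 KiB
        return {
            "name": name,
            "content_type": resp.headers.get("Content-Type", "unknown"),
            "sampled_bytes": len(body),
            "looks_like_html": b"<html" in body.lower(),
        }

if __name__ == "__main__":
    # "ipfs.io" is a DNSLink name; a k51... key would be a plain IPNS record.
    print(peek_ipns("ipfs.io"))
```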
69

Discovering and Tracking Interesting Web Services

Rocco, Daniel J. (Daniel John) 01 December 2004 (has links)
The World Wide Web has become the standard mechanism for information distribution and scientific collaboration on the Internet. This dissertation research explores a suite of techniques for discovering relevant dynamic sources in a specific domain of interest and for managing Web data effectively. We first explore techniques for discovery and automatic classification of dynamic Web sources. Our approach utilizes a service class model of the dynamic Web that allows the characteristics of interesting services to be specified using a service class description. To promote effective Web data management, the Page Digest Web document encoding eliminates tag redundancy and places structure, content, tags, and attributes into separate containers, each of which can be referenced in isolation or in conjunction with the other elements of the document. The Page Digest Sentinel system leverages our unique encoding to provide efficient and scalable change monitoring for arbitrary Web documents through document compartmentalization and semantic change request grouping. Finally, we present XPack, an XML document compression system that uses a containerized view of an XML document to provide both good compression and efficient querying over compressed documents. XPack's queryable XML compression format is general-purpose, does not rely on domain knowledge or particular document structural characteristics for compression, and achieves better query performance than standard query processors using text-based XML. Our research expands the capabilities of existing dynamic Web techniques, providing superior service discovery and classification services, efficient change monitoring of Web information, and compartmentalized document handling. DynaBot is the first system to combine a service class view of the Web with a modular crawling architecture to provide automated service discovery and classification. The Page Digest Web document encoding represents Web documents efficiently by separating the individual characteristics of the document. The Page Digest Sentinel change monitoring system utilizes the Page Digest document encoding for scalable change monitoring through efficient change algorithms and intelligent request grouping. Finally, XPack is the first XML compression system that delivers compression rates similar to existing techniques while supporting better query performance than standard query processors using text-based XML.
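The container idea behind the Page Digest encoding can be sketched as follows; the concrete layout used here (parallel lists for tag names, attributes, and text nodes, plus an order-preserving structure stream) is an assumed simplification for illustration, not the actual Page Digest format.

```python
from html.parser import HTMLParser

class PageDigestSketch(HTMLParser):
    """Split a document into separate containers that can be referenced in
    isolation: tag names, attribute lists, text content, and a structure
    stream recording document order."""
    def __init__(self):
        super().__init__()
        self.tags = []        # container of tag names
        self.attributes = []  # container of attribute lists, parallel to tags
        self.content = []     # container of text nodes
        self.structure = []   # order-preserving skeleton of the document

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        self.attributes.append(attrs)
        self.structure.append(("open", len(self.tags) - 1))

    def handle_endtag(self, tag):
        self.structure.append(("close", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.content.append(text)
            self.structure.append(("text", len(self.content) - 1))

parser = PageDigestSketch()
parser.feed("<html><body><h1 id='t'>Title</h1><p>Hello world</p></body></html>")
print(parser.tags)     # ['html', 'body', 'h1', 'p']
print(parser.content)  # ['Title', 'Hello world']
```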
70

Removing DUST using multiple alignment of sequences

Rodrigues, Kaio Wagner Lima 21 September 2016 (has links)
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas / A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate contents. These duplicate URLs, generically known as DUST (Different URLs with Similar Text), adversely impact search engines since crawling, storing and using such data imply waste of resources, the building of low quality rankings and poor user experiences. To deal with this problem, several studies have been proposed to detect and remove duplicate documents without fetching their contents. To accomplish this, the proposed methods learn normalization rules to transform all duplicate URLs into the same canonical form. This information can be used by crawlers to avoid fetching DUST. A challenging aspect of this strategy is to efficiently derive the minimum set of rules that achieve larger reductions with the smallest false positive rate. As most methods are based on pairwise analysis, the quality of the rules is affected by the criterion used to select the examples and the availability of representative examples in the training sets. To avoid processing large numbers of URLs, they employ techniques such as random sampling or searching for DUST only within sites, preventing the generation of rules involving multiple DNS names. As a consequence of these issues, current methods are very susceptible to noise and, in many cases, derive rules that are very specific. In this thesis, we present a new approach to derive quality rules that takes advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, before the generation of the rules, can lead to the deployment of very effective rules. Experimental results demonstrate that our approach achieved larger reductions in the number of duplicate URLs than our best baseline in two different web collections, in spite of being much faster.
We also present a distributed version of our method, using the MapReduce framework, and demonstrate its scalability by evaluating it on a set of 7.37 million URLs.
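A toy sketch of the underlying idea of aligning duplicate URLs token by token and keeping only the tokens shared by the whole cluster; difflib's pairwise matcher is used here as a stand-in for the thesis's multiple sequence alignment, and the tokenizer, wildcard notation, and example cluster are illustrative assumptions.

```python
import difflib
import re

def tokenize(url):
    # Split a URL into coarse tokens, keeping the separators themselves.
    return [t for t in re.split(r"([/?&=.])", url) if t]

def align_pattern(urls):
    """Keep only the tokens of the first URL that are matched in every other
    URL of the cluster; everything else collapses to a '*' wildcard, a crude
    stand-in for the components a learned normalization rule would drop."""
    base = tokenize(urls[0])
    keep = set(range(len(base)))
    for other in urls[1:]:
        matcher = difflib.SequenceMatcher(a=base, b=tokenize(other))
        matched = set()
        for block in matcher.get_matching_blocks():
            matched.update(range(block.a, block.a + block.size))
        keep &= matched
    pattern = "".join(tok if i in keep else "*" for i, tok in enumerate(base))
    return re.sub(r"\*+", "*", pattern)

# Hypothetical DUST cluster: three URLs assumed to serve the same page.
cluster = [
    "http://example.com/story?id=42&sessionid=abc",
    "http://example.com/story?id=42&sessionid=xyz",
    "http://example.com/story?id=42",
]
print(align_pattern(cluster))  # http://example.com/story?id=42*
```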
