Deriving Semantic Objects from the Structured Web (Inférer des Objets Sémantiques du Web Structuré)

This thesis focuses on the extraction and analysis of Web data objects from three points of view: temporal, structural, and semantic. We first survey strategies and best practices for deriving temporal aspects of Web pages, with a more in-depth study of Web feeds for this purpose. Next, for Web pages dynamically generated by content management systems, we present two keyword-based techniques for article extraction. Keywords, either acquired automatically through a TF-IDF analysis or extracted from Web feeds, guide object identification, either at the level of a single Web page (the SIGFEED algorithm) or across different pages sharing the same template (the FOREST algorithm). Finally, in the context of the deep Web, we present a generic framework that aims to discover the semantic model of a Web object (here, a data record) in two steps: first, FOREST extracts the objects; second, the implicit rdf:type similarities between the object attributes and the entity of the Web interface are represented as relationships that, together with the instances extracted from the objects, form a labeled graph. This graph is then aligned to a generic ontology such as YAGO to discover the graph's unknown types and relations.

[INFO:INFO_WB] Computer Science/Web


Identifier	oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00922459
Date	29 October 2012
Creators	Oita, Marilena
Publisher	Telecom ParisTech
Source Sets	CCSD theses-EN-ligne, France
Language	English
Detected Language	English
Type	PhD thesis
