Structured data extraction: separating content from noise on news websites

<p>This thesis addresses the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning software package. We study the relevance of similarity in the training data and the effect of data size on extraction performance.</p>

Identifier: oai:union.ndltd.org:UPSALLA/oai:DiVA.org:ntnu-9898
Date: January 2009
Creators: Arizaleta, Mikel
Publisher: Norwegian University of Science and Technology, Department of Computer and Information Science, Institutt for datateknikk og informasjonsvitenskap
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, text
