Global ETD Search

Structured data extraction: separating content from noise on news websites

This thesis addresses the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning software package. We study the relevance of similarity in the training data and the effect of data size on extraction performance.
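As a rough illustration of the memory-based learning idea mentioned in the abstract, the sketch below classifies page blocks as content or noise by comparing them with stored examples. It does not use TiMBL and is not the thesis's method: it is a plain k-nearest-neighbour classifier with a simple overlap distance, and the feature set (tag name, link density, text length) and the example data are assumptions made up for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions where two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, instance, k=1):
    """Label an instance by majority vote among its k nearest stored examples."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features per page block: (tag, link density, text length).
# These examples are invented for illustration, not taken from the thesis.
memory = [
    (("p", "low", "long"), "content"),
    (("p", "low", "medium"), "content"),
    (("div", "high", "short"), "noise"),
    (("li", "high", "short"), "noise"),
]

print(classify(memory, ("p", "low", "short")))
```

Because every training example is kept in memory and consulted at classification time, both the size of the stored set and its similarity to the pages being classified directly shape the result, which is the relationship the thesis investigates.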

http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9898
Local ntnudaim:4769

ntnudaim

SIF2 Computer Science

Intelligent Systems

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-9898
Date	January 2009
Creators	Arizaleta, Mikel
Publisher	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
