<p>This thesis addresses the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning software package. We study the relevance of similarity in the training data and the effect of data size on extraction performance.</p>
Identifier | oai:union.ndltd.org:UPSALLA/oai:DiVA.org:ntnu-9898 |
Date | January 2009 |
Creators | Arizaleta, Mikel |
Publisher | Norwegian University of Science and Technology, Department of Computer and Information Science, Institutt for datateknikk og informasjonsvitenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, text |