Measuring the homogeneity and similarity of language corpora

Corpus-based methods are now dominant in Natural Language Processing (NLP). Creating large corpora is no longer difficult, and the technology for analyzing them is becoming faster, more robust and more accurate. However, when an NLP application performs well on one corpus, it is unclear whether that level of performance would be maintained on others. To make progress on this question, we need methods for comparing corpora. This thesis investigates comparison methods based on the notions of corpus homogeneity and similarity.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:421298
Date: January 2005
Creators: Cavaglia, Gabriela Maria Chiara
Publisher: University of Brighton
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: https://research.brighton.ac.uk/en/studentTheses/8b46265d-65c5-477e-9296-412fbb053ed0
