Global ETD Search

Measuring the homogeneity and similarity of language corpora

Corpus-based methods are now dominant in Natural Language Processing (NLP). Creating large corpora is no longer difficult, and the technology for analyzing them is becoming faster, more robust, and more accurate. However, when an NLP application performs well on one corpus, it is unclear whether that level of performance would be maintained on others. To make progress on this question, we need methods for comparing corpora. This thesis investigates comparison methods based on the notions of corpus homogeneity and similarity.

https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.421298

006.35

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:421298
Date	January 2005
Creators	Cavaglia, Gabriela Maria Chiara
Publisher	University of Brighton
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	https://research.brighton.ac.uk/en/studentTheses/8b46265d-65c5-477e-9296-412fbb053ed0
