What did they cover? : a cluster analysis of news stories published in the Botswana Daily News, January – December 2004
Mogotsi, Isaac Carter 12 1900
Thesis (MPhil (Information Science))--University of Stellenbosch, 2005. / ENGLISH ABSTRACT: In this study, a cluster analysis of news stories published in the Botswana Daily
News during the period January - December 2004 was undertaken. The study
was exploratory in nature and sought to find out what topics were predominant
during the study period. The approach we adopted can be divided into three
phases, namely data collection, document pre-processing, and cluster analysis.
The data used in the study was downloaded from the Botswana Daily News
website using a simple program developed specifically for that purpose. Document
pre-processing was concerned with transforming the raw documents
into a format that could be directly operated upon by the various clustering
algorithms. The documents themselves were represented using the vector
space model, with the tf.idf term weighting scheme. We experimented with
three clustering approaches, namely, direct k-way clustering, k-way clustering
through repeated bisections, and agglomerative clustering. Agglomerative
clustering performed poorly, and we thus discarded its results. Direct k-way
clustering and k-way clustering through repeated bisections produced similar
results, though the former performed better in terms of external isolation and
internal cohesion of the clusters produced. Consequently, we only retained the
results from direct k-way clustering, and subsequently performed a quarterly
analysis of our corpus using only the direct k-way clustering algorithm. Analysis
of the complete corpus identified a number of topics that were prevalent
over the study period. Interestingly, a quarterly analysis of the corpus revealed
other topics whose prevalence appears to have been limited to certain parts of
the year.
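
As an illustration of the document representation described in the abstract, the following is a minimal sketch of tf.idf weighting in the vector space model, with cosine similarity as the comparison measure. It is not the thesis's implementation: the weighting formula (raw term frequency times log(N/df)), the use of cosine similarity, the function names `tfidf_vectors` and `cosine`, and the three toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document.

    Assumed weighting: tf(t, d) * log(N / df(t)). The abstract does not
    state which tf.idf variant was used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus standing in for pre-processed news stories.
docs = [
    "parliament debates budget".split(),
    "parliament passes budget bill".split(),
    "zebras win football match".split(),
]
vecs = tfidf_vectors(docs)
```

In this toy corpus the two parliament stories share weighted terms and so score a positive similarity, while the football story shares none with them and scores zero. A clustering algorithm such as direct k-way clustering would operate on vectors of this kind.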