
Exploration of relationships from texts using self-organizing maps

This thesis explored and visualized relationships among documents using self-organizing maps (SOM), a type of artificial neural network that projects high-dimensional data onto low-dimensional views. The source data are the full Extensible Markup Language (XML) texts of A Standard Corpus of Present Day Edited American English. The first step transforms these XML files into a term-document matrix through stop-word removal, stemming, tf-idf (term frequency–inverse document frequency) weighting and global filtering; each row of the matrix represents a document as an n-dimensional vector. Second, these vectors are clustered and visualized by a SOM made up of neurons, each of which corresponds to a set of documents that share a certain number of terms. Third, a network is constructed from the SOM: its vertex set consists of the neurons and the documents, and its line set consists of the links between neurons and documents. Finally, this network is exported to Pajek for analysis and final visualization.
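The pipeline described above can be sketched end to end on a toy corpus. This is a minimal illustration under stated assumptions, not the thesis's own code: the stop-word list, the naive suffix stripper (standing in for a real stemmer such as Porter's), the tf-idf variant (raw term frequency times log(N/df)), the document-frequency threshold used for global filtering, the 2×2 grid with a Gaussian neighborhood and linearly decaying rates, and the rule that links each document to its best-matching neuron are all assumptions. Every function name is hypothetical. The output uses Pajek's `*Vertices` / `*Edges` network format.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the thesis does not say which list it used.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}


def crude_stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tfidf_matrix(docs, min_df=2):
    """Return (vocabulary, rows); each row is one document as a tf-idf vector."""
    tokens = [
        [crude_stem(w) for w in re.findall(r"[a-z]+", d.lower()) if w not in STOP_WORDS]
        for d in docs
    ]
    df = Counter(t for toks in tokens for t in set(toks))
    # Global filtering (assumed rule): keep terms found in at least min_df documents.
    vocab = sorted(t for t, c in df.items() if c >= min_df)
    n = len(docs)
    rows = []
    for toks in tokens:
        tf = Counter(toks)
        # Weight = raw term frequency * log(N / document frequency).
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)),
    )


def train_som(data, grid_rows=2, grid_cols=2, epochs=50, lr0=0.5, seed=0):
    """Online SOM training on a rectangular grid with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(grid_rows) for c in range(grid_cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    sigma0 = max(grid_rows, grid_cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)  # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.01  # neighborhood radius shrinks
        for x in rng.sample(data, len(data)):  # visit documents in random order
            br, bc = grid[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(grid):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma**2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return weights


def to_pajek(weights, data, labels):
    """Pajek .net text: neurons and documents as vertices, doc-to-BMU links as edges."""
    k = len(weights)
    lines = [f"*Vertices {k + len(data)}"]
    lines += [f'{i + 1} "neuron_{i}"' for i in range(k)]
    lines += [f'{k + j + 1} "{label}"' for j, label in enumerate(labels)]
    lines.append("*Edges")
    for j, x in enumerate(data):
        lines.append(f"{best_matching_unit(weights, x) + 1} {k + j + 1}")
    return "\n".join(lines)


docs = [
    "Neural networks learn maps of data",
    "Self-organizing maps cluster data",
    "The senate passed the budget bill",
    "Congress debated the budget",
]
vocab, matrix = tfidf_matrix(docs)
weights = train_som(matrix)
net = to_pajek(weights, matrix, [f"doc{j + 1}" for j in range(len(docs))])
print(net)  # write this text to a .net file to open it in Pajek
```

With this toy input only the terms `budget`, `data` and `map` survive global filtering, so documents with identical term profiles map to the same neuron. A real corpus would use a proper stemmer, a larger grid and a tuned filtering threshold.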
Date: January 2007
Creators: Lu, Weiping
Publisher: Högskolan i Gävle, Institutionen för teknik och byggd miljö
Source Sets: DiVA Archive at Uppsala University
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
