
Improving scalability and accuracy of text mining in grid environment

Advances in technologies such as mass storage devices and high-speed internet have led to an enormous increase in the volume of documents available in electronic form. These documents represent information in a complex and rich manner that cannot be analysed using conventional statistical data mining methods. Consequently, text mining has emerged as a new technology for discovering knowledge from textual data and managing textual information. Processing and analysing textual information can yield valuable information, yet these tasks also require an enormous amount of computational resources because of the sheer size of the available data. Therefore, it is important to enhance existing methodologies to achieve better scalability, efficiency and accuracy.

Emerging Grid technology shows promising results in addressing scalability: the work of a text clustering algorithm is split into a number of jobs, each executed separately and simultaneously on a different computing resource. This substantially decreases processing time while maintaining a similar level of clustering quality.

To improve the quality of the text clustering results, a new document encoding method is introduced that takes into account the semantic similarities between words. In this way, documents with similar content are more likely to be grouped together.

One of the ultimate goals of text mining is to help us gain insight into a problem and to assist decision making together with other sources of information. Hence, we tested the effectiveness of incorporating text mining in the context of stock market prediction. This was achieved by integrating the outcomes of text mining with those of data mining, which produced a more accurate forecast than either method alone.
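The job-splitting idea can be illustrated with a minimal sketch. It assumes k-means as the clustering algorithm and documents already encoded as numeric vectors; the abstract names neither, and the function names (`partial_job`, `kmeans_step`) are hypothetical. Each chunk of documents is handled by an independent job that returns per-cluster sums and counts, and a coordinator merges those partial results to update the centroids. A thread pool from Python's `concurrent.futures` stands in for separate Grid resources.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two document vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(doc: Vector, centroids: List[Vector]) -> int:
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(doc, centroids[k]))


def partial_job(chunk: List[Vector], centroids: List[Vector]) -> Tuple[List[Vector], List[int]]:
    """Hypothetical independent job: assign its documents, return per-cluster sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for doc in chunk:
        k = nearest(doc, centroids)
        counts[k] += 1
        for i, x in enumerate(doc):
            sums[k][i] += x
    return sums, counts


def split(docs: List[Vector], n_jobs: int) -> List[List[Vector]]:
    """Split the collection into at most n_jobs contiguous chunks."""
    size = max(1, -(-len(docs) // n_jobs))  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def kmeans_step(docs: List[Vector], centroids: List[Vector], n_jobs: int = 4) -> List[Vector]:
    """One k-means iteration whose assignment work runs as parallel jobs."""
    chunks = split(docs, n_jobs)
    # The thread pool is a stand-in for dispatching jobs to separate machines.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(partial_job, chunks, [centroids] * len(chunks)))
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    # Coordinator step: merge the partial sums and counts from every job.
    for sums, counts in results:
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for i in range(dim):
                total_sums[k][i] += sums[k][i]
    # An empty cluster keeps its previous centroid.
    return [
        [s / total_counts[k] for s in total_sums[k]] if total_counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
print(kmeans_step(docs, [[0.0, 0.0], [10.0, 10.0]], n_jobs=3))
```

In this sketch, sums and counts merge without loss, so running an iteration as several jobs gives the same centroids as running it as one job (up to floating-point rounding). This mirrors the abstract's point that splitting the work need not reduce clustering quality.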

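The following is a minimal sketch of one way to make a document encoding sensitive to word semantics. It is not necessarily the encoding developed in the thesis. It assumes a precomputed word-similarity matrix `S`, whose values below are invented for illustration, and applies the enrichment d' = S d used in semantic-smoothing approaches. Two documents that use different but related words have a cosine similarity of zero under a plain bag-of-words encoding, but a high similarity after enrichment.

```python
import math
from typing import List

Vector = List[float]


def bow(tokens: List[str], vocab: List[str]) -> Vector:
    """Plain bag-of-words term-frequency vector over a fixed vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1.0
    return vec


def enrich(vec: Vector, sim: List[Vector]) -> Vector:
    """Compute d' = S d: each word also lends weight to semantically similar words."""
    n = len(vec)
    return [sum(sim[i][j] * vec[j] for j in range(n)) for i in range(n)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical vocabulary and word-similarity matrix (symmetric, 1.0 on the diagonal).
VOCAB = ["car", "automobile", "fruit"]
SIM = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

d1 = bow(["car"], VOCAB)
d2 = bow(["automobile"], VOCAB)
print(cosine(d1, d2))                             # plain encoding: no shared words
print(cosine(enrich(d1, SIM), enrich(d2, SIM)))  # semantic encoding: close to 1
```

Here the enriched vectors are [1.0, 0.8, 0.0] and [0.8, 1.0, 0.0], so their cosine similarity is 1.6 / 1.64. A clustering algorithm working on such vectors would be more likely to place the two documents in the same group.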
Identifier: oai:union.ndltd.org:ADTP/270030
Date: January 2009
Creators: Zhai, Yuzheng
Source Sets: Australasian Digital Theses Program
Language: English
Detected Language: English
Rights: Restricted Access: Abstract and Citation Only
