Traditional data mining methods were limited by the availability of computing resources such as network bandwidth, storage space, and processing power. These algorithms worked around the problem by examining only a small cross-section of the available data. However, because a major portion of the data was left out, the predictions were generally inaccurate and missed significant features present in the data. Today, with resources growing at almost the same pace as data, it is possible to rethink mining algorithms so that they run on distributed resources and, essentially, on distributed data. Distributed data mining therefore holds great promise. Using grid technologies, data mining can be extended to areas that were previously not examined because of the sheer volume of data they generate, such as climate modeling and web usage. An important characteristic of data today is that it is highly decentralized and mostly redundant. Data mining algorithms that make efficient use of distributed data therefore need to be developed. Although it is possible to bring all the data together and run traditional algorithms, doing so carries a high overhead, both in the bandwidth needed for transmission and in the preprocessing needed to handle every format in which the data arrives. By processing the data locally, the preprocessing stage can be made lighter, and traditional data mining techniques can work on the data efficiently. The focus of this project is to adapt an existing data mining technique, fuzzy c-means clustering, to work on distributed data in a simulated grid environment, and to compare the performance of this approach with that of the traditional approach.
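The abstract does not say exactly how fuzzy c-means is split across grid nodes. One common decomposition, assumed here and not necessarily the one used in the thesis, has each site compute per-cluster weighted sums over its own points and send only those sums to a coordinator, which adds them and forms the new centers. Because the centroid update is a ratio of sums, this reproduces the centralized update on the pooled data. The fuzzifier `m = 2`, the function names `memberships`, `local_sums`, `combine`, and `distributed_fcm`, and the toy data are all hypothetical illustrations.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def memberships(x: Point, centers: Sequence[Point], m: float = 2.0) -> List[float]:
    """Standard FCM membership of point x in each cluster (fuzzifier m, assumed 2)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5 for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def local_sums(site: Sequence[Point], centers: Sequence[Point], m: float = 2.0):
    """Hypothetical per-site step: only these k*(d+1) numbers leave the site."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]  # sum_i u_ij^m * x_i
    den = [0.0] * len(centers)            # sum_i u_ij^m
    for x in site:
        for j, uj in enumerate(memberships(x, centers, m)):
            w = uj ** m
            den[j] += w
            for d in range(dim):
                num[j][d] += w * x[d]
    return num, den


def combine(partials, centers: Sequence[Point]) -> List[Point]:
    """Hypothetical coordinator step: add partial sums and form new centers."""
    k, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(k)]
    den = [0.0] * k
    for pn, pd in partials:
        for j in range(k):
            den[j] += pd[j]
            for d in range(dim):
                num[j][d] += pn[j][d]
    # Keep the old center if a cluster received no weight at all.
    return [tuple(v / den[j] for v in num[j]) if den[j] > 0 else tuple(centers[j])
            for j in range(k)]


def distributed_fcm(sites, centers, m: float = 2.0, iters: int = 50, tol: float = 1e-6):
    """Iterate: every site reports partial sums, the coordinator updates centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        partials = [local_sums(s, centers, m) for s in sites]  # one message per site
        new = combine(partials, centers)
        shift = max(max(abs(a - b) for a, b in zip(c, n)) for c, n in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers


# Toy example: two "grid sites", each holding part of two well-separated groups.
sites = [
    [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)],
    [(0.1, 0.2), (4.9, 5.0), (5.2, 4.8)],
]
print(distributed_fcm(sites, [(0.0, 0.0), (1.0, 1.0)]))
```

In this sketch each site transmits k(d + 1) numbers per iteration (here 2 × 3 = 6) instead of its raw points, which is the bandwidth saving the abstract argues for. Running the same function with all points in a single site gives the centralized result, which is a convenient way to check the decomposition.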
Identifier | oai:union.ndltd.org:LSU/oai:etd.lsu.edu:etd-01242005-144012 |
Date | 25 January 2005 |
Creators | Nayar, Arun B |
Contributors | Warren Liao, Jianhua Chen, Gabrielle Allen |
Publisher | LSU |
Source Sets | Louisiana State University |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lsu.edu/docs/available/etd-01242005-144012/ |
Rights | unrestricted, I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. |