A tremendous volume of data is generated in today’s world. As organizations around the globe recognize their data as a valuable asset for gaining a competitive edge in a fast-paced, dynamic business world, more and more attention is being paid to data quality. Advances in data mining, predictive modeling, text mining, web mining, business intelligence, health care analytics, and related fields all depend on clean, accurate data. It comes as no surprise that dirty data cannot be mined effectively. This research is an exploratory study of data sets from different domains: it addresses the data quality issues specific to each domain, identifies the challenges faced, and arrives at techniques or methodologies for measuring and improving data quality. The primary focus of the research is the Search and Rescue (SAR) dataset: identifying the key data quality issues therein and developing an algorithm for improving the data quality. SAR missions, which are routinely conducted all over the world, show a trend of increasing mission costs. Retrospective studies of historic SAR data not only allow for a detailed analysis and understanding of SAR incidents and patterns, but also form the basis for generating probability maps, analytical data models, etc., which allow for an efficient use and distribution of valuable SAR resources. One challenge with regard to the SAR dataset is that the collection process is not perfect. Often, the Last Known Position (LKP) is not known or cannot be determined. The goal is to fully or partially geocode the LKP for as many data points as possible, identify those data points where the LKP cannot be geocoded at all, and highlight the underlying data quality issues.
The SAR Algorithm developed in this research makes use of partial or incomplete information, cleans and validates the data, and extracts address information from relevant fields to successfully geocode the data. The algorithm improves the geocoding accuracy and has been validated by a set of approaches.
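The abstract does not give the details of the SAR Algorithm, so the following is only a minimal illustrative sketch of the general idea it describes: clean free-text fields, pull out address fragments, and label each record as fully, partially, or not geocodable. The field names (`lkp`, `location`, `comments`), the list of placeholder values, and the regular expressions are all assumptions made for illustration and are not taken from the thesis. No actual geocoding service is called.

```python
import re
from typing import Optional

# Hypothetical field names; the real SAR dataset schema is not given in the abstract.
CANDIDATE_FIELDS = ("lkp", "location", "comments")

# Assumed placeholder values that should be treated as missing data.
MISSING = {"", "unknown", "unk", "n/a", "na", "none"}

# Deliberately simple, illustrative patterns: a US-style street address and a 5-digit ZIP code.
STREET_RE = re.compile(
    r"\b\d+\s+[A-Za-z0-9 .]+?\s(?:St|Street|Rd|Road|Ave|Avenue|Hwy|Highway|Trail)\b",
    re.IGNORECASE,
)
ZIP_RE = re.compile(r"\b\d{5}\b")


def clean(value: Optional[str]) -> str:
    """Collapse whitespace and map placeholder values to an empty string."""
    if value is None:
        return ""
    text = re.sub(r"\s+", " ", value).strip()
    return "" if text.lower() in MISSING else text


def extract_address(record: dict) -> dict:
    """Scan the candidate fields and report which address parts were found."""
    street = None
    zip_code = None
    for field in CANDIDATE_FIELDS:
        text = clean(record.get(field))
        if not text:
            continue
        if street is None:
            match = STREET_RE.search(text)
            if match:
                street = match.group(0)
        if zip_code is None:
            match = ZIP_RE.search(text)
            if match:
                zip_code = match.group(0)
    if street and zip_code:
        status = "full"
    elif street or zip_code:
        status = "partial"
    else:
        status = "none"
    return {"street": street, "zip": zip_code, "status": status}
```

In a sketch like this, records labeled `full` or `partial` would be passed on to a geocoder, while records labeled `none` would be set aside as examples of the underlying data quality problems.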
Identifier | oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:etd-2811 |
Date | 01 January 2011 |
Creators | Wakchaure, Abhijit |
Publisher | STARS |
Source Sets | University of Central Florida |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Electronic Theses and Dissertations |