Given the overall increase in the availability of computational resources, and the importance of forecasting the future, it should come as no surprise that prediction is considered to be one of the most compelling and challenging problems for both academia and industry in the world of data analytics. But how is prediction done, what factors make it easier or harder to do, how accurate can we expect the results to be, and can we harness the available computational resources in meaningful ways? With efforts ranging from those designed to save lives in the moments before a near field tsunami to others attempting to predict the performance of Major League Baseball players, future generations need to have realistic expectations about prediction methods and analytics. This thesis takes a broad look at the problem, including motivation, methodology, accuracy, and infrastructure. In particular, a careful study involving experiments in regression, the prediction of continuous, numerical values, and classification, the assignment of a class to each sample, is provided. The results and conclusions of these experiments cover only the included data sets and the applied algorithms as implemented by the Python library. The evaluation includes accuracy and running time of different algorithms across several data sets to establish tradeoffs between the approaches, and determine the impact of variations in the size of the data sets involved. As scalability is a key characteristic required to meet the needs of future prediction problems, a discussion of some of the challenges associated with parallelization is included. / Graduate / 0984 / erickson@uvic.ca
Identifer | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/5662 |
Date | 03 September 2014 |
Creators | Erickson, Joshua N. |
Contributors | Coady, Yvonne |
Source Sets | University of Victoria |
Language | English, English |
Detected Language | English |
Type | Thesis |
Rights | Available to the World Wide Web, http://creativecommons.org/publicdomain/zero/1.0/ |
Page generated in 0.0023 seconds