Global ETD Search

721	Algorithmically Guided Information Visualization : Explorative Approaches for High Dimensional, Mixed and Categorical Data / Algoritmiskt vägledd informationsvisualisering för högdimensionell och kategorisk data Johansson Fernstad, Sara January 2011 (has links) Facilitated by the technological advances of the last decades, increasing amounts of complex data are being collected within fields such as biology, chemistry and social sciences. The major challenge today is not to gather data, but to extract useful information and gain insights from it. Information visualization provides methods for visual analysis of complex data but, as the amounts of gathered data increase, the challenges of visual analysis become more complex. This thesis presents work utilizing algorithmically extracted patterns as guidance during interactive data exploration processes, employing information visualization techniques. It provides efficient analysis by taking advantage of fast pattern identification techniques as well as making use of the domain expertise of the analyst. In particular, the presented research is concerned with the issues of analysing categorical data, where the values are names without any inherent order or distance; mixed data, including a combination of categorical and numerical data; and high dimensional data, including hundreds or even thousands of variables. The contributions of the thesis include a quantification method, assigning numerical values to categorical data, which utilizes an automated method to define category similarities based on underlying data structures, and integrates relationships within numerical variables into the quantification when dealing with mixed data sets. The quantification is incorporated in an interactive analysis pipeline where it provides suggestions for numerical representations, which may interactively be adjusted by the analyst. 
The interactive quantification enables exploration using commonly available visualization methods for numerical data. Within the context of categorical data analysis, this thesis also contributes the first user study evaluating the performance of what are currently the two main visualization approaches for categorical data analysis. Furthermore, this thesis contributes two dimensionality reduction approaches, which aim at preserving structure while reducing dimensionality, and provide flexible and user-controlled dimensionality reduction. Through algorithmic quality metric analysis, where each metric represents a structure of interest, potentially interesting variables are extracted from the high dimensional data. The automatically identified structures are visually displayed, using various visualization methods, and act as guidance in the selection of interesting variable subsets for further analysis. The visual representations furthermore provide overview of structures within the high dimensional data set and may, through this, aid in focusing subsequent analysis, as well as enabling interactive exploration of the full high dimensional data set and selected variable subsets. The thesis also contributes the application of algorithmically guided approaches for high dimensional data exploration in the rapidly growing field of microbiology, through the design and development of a quality-guided interactive system in collaboration with microbiologists. Information visualization data mining high dimensional data categorical data mixed data
722	Accommodating temporal semantics in data mining and knowledge discovery / Rainsford, Chris P. January 1999 (has links) Thesis (PhD) -- University of South Australia, 1999 Data mining Temporal databases
723	Secure location services: Vulnerability analysis and provision of security in location systems Pozzobon, O. Unknown Date (has links) No description available. 280505 Data Security
724	On semiparametric regression and data mining Ormerod, John T, Mathematics & Statistics, Faculty of Science, UNSW January 2008 (has links) Semiparametric regression is playing an increasingly large role in the analysis of datasets exhibiting various complications (Ruppert, Wand & Carroll, 2003). In particular semiparametric regression plays a prominent role in the area of data mining where such complications are numerous (Hastie, Tibshirani & Friedman, 2001). In this thesis we develop fast, interpretable methods addressing many of the difficulties associated with data mining applications including: model selection, missing value analysis, outliers and heteroscedastic noise. We focus on function estimation using penalised splines via mixed model methodology (Wahba 1990; Speed 1991; Ruppert et al. 2003). In dealing with the difficulties associated with data mining applications many of the models we consider deviate from typical normality assumptions. These models lead to likelihoods involving analytically intractable integrals. Thus, in keeping with the aim of speed, we seek analytic approximations to such integrals which are typically faster than numeric alternatives. These analytic approximations not only include popular penalised quasi-likelihood (PQL) approximations (Breslow & Clayton, 1993) but variational approximations. Originating in physics, variational approximations are a relatively new class of approximations (to statistics) which are simple, fast, flexible and effective. They have recently been applied to statistical problems in machine learning where they are rapidly gaining popularity (Jordan, Ghahramani, Jaakkola & Saul, 1999; Corduneanu & Bishop, 2001; Ueda & Ghahramani, 2002; Bishop & Winn, 2003; Winn & Bishop 2005). We develop variational approximations to: generalized linear mixed models (GLMMs); Bayesian GLMMs; simple missing values models; and for outlier and heteroscedastic noise models, which are, to the best of our knowledge, new. 
These methods are quite effective and extremely fast, with fitting taking minutes if not seconds on a typical 2008 computer. We also make a contribution to variational methods themselves. Variational approximations often underestimate the variance of posterior densities in Bayesian models (Humphreys & Titterington, 2000; Consonni & Marin, 2004; Wang & Titterington, 2005). We develop grid-based variational posterior approximations. These approximations combine a sequence of variational posterior approximations, can be extremely accurate and are reasonably fast. Data mining Regression analysis
725	Design and evaluation of database access paths Keen, Christopher David January 1978 (has links) 196 leaves : ill., diagrs., graphs, tables ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Thesis (Ph.D.)--University of Adelaide, Dept. of Computing Science, 1979 Data base management system.
726	Computer aided optimisation of combinational logic / Christopher William Nettle Nettle, Christopher William January 1979 (has links) Typescript (photocopy) / vii, 190 leaves ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Thesis (Ph.D.) Dept. of Electrical and Electronic Engineering, University of Adelaide, 1979 Logic design Data processing.
727	The structure of the background errors in a global wave model. Greenslade, Diana J. M. January 2004 (has links) Title page, table of contents and abstract only. The complete thesis in print form is available from the University of Adelaide Library. / One of the main limitations to current wave data assimilation systems is the lack of an accurate representation of the structure of the background errors. For example, the current operational wave data assimilation system at the Australian Bureau of Meteorology (BoM) prescribes globally uniform background error correlations of Gaussian shape with a length scale of 300 km and the error variance of both the background and observation errors is defined to be 0.25 m². This thesis describes an investigation into the determination of the background errors in a global wave model. There are two methods that are commonly used to determine background errors: the observational method and the 'NMC method'. The observational method is the main tool used in this thesis, although the 'NMC method' is considered also. The observational method considers correlations of the differences between observations and the background, in this case, the modelled Significant Wave Height (SWH) field. The observations used are satellite altimeter estimates of SWH. Before applying the method, the effect of the irregular satellite sampling pattern is examined. This is achieved by constructing a set of anomaly correlations from modelled wave fields. The modelled wave fields are then sampled at the locations of the altimeter observations and the anomaly correlations are recalculated from the simulated altimeter data. The results are compared to the original anomaly correlations. It is found that in general, the altimeter sampling pattern underpredicts the spatial scale of the anomaly correlation. Observations of SWH from the ERS-2 altimeter are used in this thesis. 
To ensure that the observations used are of the highest quality possible, a validation of the European Remote Sensing Satellite 2 (ERS-2) SWH observations is performed. The altimeter data are compared to waverider buoy observations over a time period of approximately 4.5 years. With a set of 2823 co-located SWH estimates, it is found that in general, the altimeter overestimates low SWH and underestimates high SWH. A two-branched linear correction to the altimeter data is found, which reduces the overall rms error in SWH to approximately 0.2 m. Results from the previous sections are then used to calculate the background error correlations. Specifically, correlations of the differences between modelled SWH and the bias-corrected ERS-2 data are calculated. The irregular sampling pattern of the altimeter is accounted for by adjusting the correlation length scales according to latitude and the calculated length scale. The results show that the length scale of the background errors varies significantly over the globe, with the largest scales at low latitudes and shortest scales at high latitudes. Very little seasonal or year-to-year variability is detected. Conversely, the magnitude of the background error variance is found to have considerable seasonal and year-to-year variability. By separating the altimeter ground tracks into ascending and descending tracks, it is possible to examine, to a limited extent, whether any anisotropy exists in the background errors. Some of the areas on the globe that exhibit the most anisotropy are the Great Australian Bight and the North Atlantic Ocean. The background error correlations are also briefly examined via the 'NMC method', i.e., by considering differences between SWH forecasts of different ranges valid at the same time. It is found that the global distribution of the length scale of the error correlation is similar to that found using the observational method. 
It is also shown that the size of the correlation length scale increases as the forecast period increases. The new background error structure that has been developed is incorporated into a data assimilation system and evaluated over two month-long time periods. Compared to the current operational system at the BoM, it is found that this new structure improves the skill of the wave model by approximately 10%, with considerable geographical variability in the amount of improvement. / http://proxy.library.adelaide.edu.au/login?url= http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1113813 / Thesis (Ph.D.) -- University of Adelaide, School of Mathematical Sciences, 2004 data; background errors; wave
