
Development and Application of Statistical and Machine Learning Techniques in Probabilistic Astronomical Catalogue-Matching Problems

Advances in the development of detector and computer technology have led to a rapid increase in the availability of large datasets to the astronomical community. This has created opportunities to do science that would otherwise be difficult or impossible. At the same time, astronomers have acknowledged that this influx of data creates new challenges in the development of tools and practice to facilitate usage of this technology by the international community. A worldwide effort known as the Virtual Observatory has developed to this end, involving collaborations between astronomers, computer scientists and statisticians.
Different telescopes survey the sky in different wavelengths, producing catalogues of objects containing observations of both positional and non-positional properties. Because multiple catalogues exist, a common situation is that there are two catalogues containing observations of the same piece of sky (e.g. one sparse catalogue with relatively few objects per unit area, and one dense catalogue with many more objects per unit area). Identifying matches, i.e. different observations of the same object in different catalogues, is an important step in building a multi-wavelength understanding of the universe. Positional properties of objects can be used in some cases to perform catalogue matching; however, in other cases position alone is insufficient to determine matching objects.
This thesis applies machine learning and statistical methods to explore the usefulness of non-positional properties in identifying the matching objects common to two different catalogues. A machine learning classification system is shown to be able to identify these objects in a particular problem domain, and non-positional inputs are shown to be very beneficial in identifying matches for that problem. The result is that supervised learning is shown to be a viable method for difficult catalogue matching problems.
The use of probabilistic outputs is developed as an enhancement in order to give a means of identifying the uncertainty in the matches. Something that distinguishes this problem from standard pattern classification problems is that one class, the matches, belongs to a high dimensional distribution, whereas the non-matches belong to a lower dimensional distribution. This assumption is developed in a probabilistic framework. The result of this is a class of probability models useful for catalogue matching and a number of tests for the suitability of the computed probabilities. The tests were applied to a problem and showed a good classification rate, good results under scoring rules and good calibration. Visual inspection of the output also suggested that the algorithm was behaving in a sensible way. While reasonable results are obtained, it is acknowledged that the question "is the probability a good probability?" is philosophically awkward.
One goal of analysing astronomical catalogues, matched or unmatched, is to make accurate inferential statements on the basis of the available data. A silent assumption is often made that the first step in analysing unmatched catalogues is to find the best match between them, then to plot this best-match data assuming it to be correct. This thesis shows that this assumption is false: inferential statements based on the best-match data can potentially be quite misleading. To address this problem, a new framework for catalogue matching, based on Bayesian statistics, is developed. In this Bayesian framework it is unnecessary for the method to commit to a single matched dataset; rather, the ensemble of all possible matches can be used. This method compares favourably to other methods based upon choosing the most likely match.
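As an illustration of the kind of tests mentioned above for computed match probabilities (scoring rules and calibration), the following is a minimal sketch in Python. It is not taken from the thesis: the function names, the choice of the Brier and logarithmic scores, the equal-width binning and the toy data are all assumptions made for illustration.

```python
import math

def brier_score(probs, labels):
    """Mean squared difference between predicted match probability
    and the true outcome (1 = match, 0 = non-match). Lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def log_score(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true outcomes. Lower is better.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(probs)

def calibration_table(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins and return, for
    each non-empty bin, (mean predicted probability, observed match
    fraction, number of candidate pairs). For well-calibrated
    probabilities the first two entries are close to each other."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac = sum(y for _, y in members) / len(members)
            table.append((mean_p, frac, len(members)))
    return table

# Hypothetical toy data: predicted match probabilities for six candidate
# pairs and whether each pair is truly a match.
toy_probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 0, 0, 1]

if __name__ == "__main__":
    print("Brier score:", brier_score(toy_probs, toy_labels))
    print("Log score:", log_score(toy_probs, toy_labels))
    for row in calibration_table(toy_probs, toy_labels):
        print(row)
```

Both scores reward probabilities that are confident when right and cautious when wrong, while the calibration table checks whether, say, pairs assigned a probability near 0.8 really are matches about 80% of the time.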
The result of this is the outline of a method for analysing astronomical datasets not by a scatter plot obtained from the perfectly known pre-matched list of data, but rather using predictive distributions which need not be based on a perfect list and indeed might be based upon unmatched or partly matched catalogues.
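The contrast between committing to the best match and using the ensemble of possible matches can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not the thesis's framework: it supposes one object in the sparse catalogue with two hypothetical candidate counterparts in the dense catalogue, each with an invented match probability and an invented value of some non-positional property.

```python
def best_match_value(candidates):
    """Return the property value of the single most probable candidate.
    `candidates` is a list of (match_probability, property_value) pairs."""
    return max(candidates, key=lambda c: c[0])[1]

def ensemble_mean(candidates):
    """Return the expected property value, weighting every candidate by
    its match probability (probabilities are renormalised to sum to 1)."""
    total_prob = sum(p for p, _ in candidates)
    return sum(p * v for p, v in candidates) / total_prob

# Hypothetical candidates: (match probability, property value).
candidates = [(0.55, 1.0), (0.45, 5.0)]

if __name__ == "__main__":
    # The best match ignores the 45% chance that the true value is 5.0.
    print("Best-match value:", best_match_value(candidates))
    print("Ensemble mean:", ensemble_mean(candidates))
```

When the leading candidate is only marginally more probable than the alternative, the best-match value discards nearly half of the probability mass, which is one simple way that inference from a single best-match dataset can mislead.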

Identifier: oai:union.ndltd.org:ADTP/254160
Creators: David Rohde
Source Sets: Australasian Digital Theses Program
Detected Language: English
