When we develop and use species distribution models to predict species' current or potential distributions, we are faced with the trade-offs between model generality, precision, and realism. It is important to know how to improve and validate model generality while maintaining good model precision and realism. However, it is difficult for ecologists to evaluate species distribution models using field-sampled data alone because the true species response function to environmental or ecological factors is unknown. Species distribution models should be able to approximate the true characteristics and distributions of species if ecologists want to use them as reliable tools. Simulated data provide the advantage of being able to know the true species-environment relationships and control the causal factors of interest to obtain insights into the effects of these factors on model performance. I used a case study on Bythotrephes longimanus distributions from several hundred Ontario lakes and a simulation study to explore the effects on model performance caused by several factors: the choice of predictor variables, the model evaluation methods, the quantity and quality of the data used for developing models, and the strengths and weaknesses of different species distribution models. Linear discriminant analysis, multiple logistic regression, random forests, and artificial neural networks were compared in both studies. Results based on field data sampled from lakes indicated that the predictive performance of the four models was more variable when developed on abiotic (physical and chemical) conditions alone, whereas the generality of these models improved when including biotic (relevant species) information. When using simulated data, although the overall performance of random forests and artificial neural networks was better than linear discriminant analysis and multiple logistic regression, linear discriminant analysis and multiple logistic regression had relatively good and stable model sensitivity at different sample size and detection limit levels, which may be useful for predicting species presences when data are limited. Random forests performed consistently well at different sample size levels, but was more sensitive to high detection limit. The performance of artificial neural networks was affected by both sample size and detection limit, and it was more sensitive to small sample size.
Identifer | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/43754 |
Date | 14 January 2014 |
Creators | Wang, Lifei |
Contributors | Jackson, Donald Andrew |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Thesis |
Page generated in 0.0028 seconds