51

On the development and application of indirect site indexes based on edaphoclimatic variables for commercial forestry in South Africa

Esler, William Kevin 03 1900 (has links)
Thesis (MScFor)--Stellenbosch University, 2012. / ENGLISH ABSTRACT: Site Index is used extensively in modern commercial forestry, both as an indicator of current and future site potential and as a means of site comparison. The concept is deeply embedded in current forest planning processes, and without it empirical growth and yield modelling would not function in its present form. Most commercial forestry companies in South Africa currently spend hundreds of thousands of Rand annually collecting growing stock data via inventory, but spend little or no money on the default compartment data (specifically Site Index) which is used to estimate over 90% of the product volumes in their long-term plans. A need exists for reliable methods to determine Site Index for sites which have not been physically measured (the so-called "default", or indirect, Site Index). Most previous attempts to model Site Index have used multiple linear regression as the model; alternative methods are explored in this thesis: regression tree analysis, random forest analysis, hybrid or model trees, multiple linear regression, and multiple linear regression using regression trees to identify the variables. Regression tree analysis proves to be ideally suited to this type of data, and a generic model with only three site variables was able to capture 49.44% of the variation in Site Index. Further localisation of the model could prove to be commercially useful. One of the key assumptions associated with Site Index, that it is unaffected by initial planting density, was tested using linear mixed effects modelling. The results show that initial stocking may well play a role in some species (notably E. dunnii and E. nitens), and that further work may be warranted. It was also shown that early measurement of dominant height results in poor estimates of Site Index, which will have a direct impact on inventory policies and on the data to be included in Site Index modelling studies. This thesis is divided into six chapters: Chapter 1 describes the concept of Site Index and its origins, as well as how the concept is used within current forest planning processes. Chapter 2 contains an analysis of the influence of initial planting density on the estimate of Site Index. Chapter 3 explores whether the age at which dominant height is measured has any effect on the quality of Site Index estimates. Chapter 4 examines various modelling methodologies and compares the resultant models. Chapter 5 contains conclusions and recommendations for further study, and finally Chapter 6 discusses how any new Site Index model will affect the current planning protocol.
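Fitting the kind of regression tree the abstract describes is straightforward with standard tools. Below is a minimal sketch using scikit-learn; the file name and the three edaphoclimatic predictors (MAP, MAT, soil_depth) are hypothetical stand-ins, not the study's actual variables.

```python
# Sketch: predict Site Index from site variables with a regression tree.
# The CSV and column names are illustrative assumptions only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

df = pd.read_csv("site_records.csv")          # hypothetical plot-level data set
X = df[["MAP", "MAT", "soil_depth"]]          # three site variables, as in the generic model
y = df["site_index"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=50)  # shallow for interpretability
tree.fit(X_train, y_train)

print(f"R^2 on held-out plots: {tree.score(X_test, y_test):.3f}")
print(export_text(tree, feature_names=list(X.columns)))  # explicit split rules
```

The printed split rules are what makes the tree usable as an "indirect" Site Index lookup for unmeasured compartments.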
52

Supervised Learning of Piecewise Linear Models

Manwani, Naresh January 2012 (has links) (PDF)
Supervised learning of piecewise linear models is a well-studied problem in the machine learning community. The key idea in piecewise linear modeling is to properly partition the input space and learn a linear model for every partition. Decision trees and regression trees are classic examples of piecewise linear models for classification and regression problems. The existing approaches for learning decision/regression trees can be broadly classified into two classes: fixed-structure approaches and greedy approaches. In fixed-structure approaches, the tree structure is fixed beforehand by fixing the number of non-leaf nodes, the height of the tree and the paths from the root node to every leaf node. Mixture of experts and hierarchical mixture of experts are examples of fixed-structure approaches for learning piecewise linear models. Parameters of the models are found using, e.g., maximum likelihood estimation, for which the expectation maximization (EM) algorithm can be used. Fixed-structure piecewise linear models can also be learnt using risk minimization under an appropriate loss function. Learning an optimal decision tree using a fixed-structure approach is a hard problem; constructing an optimal binary decision tree is known to be NP-complete. On the other hand, greedy approaches do not assume any parametric form or any fixed structure for the decision tree classifier. Most greedy approaches learn tree-structured piecewise linear models in a top-down fashion, built by binary or multi-way recursive partitioning of the input space. The main issue in top-down decision tree induction is to choose an appropriate objective function to rate the split rules; the objective function should be easy to optimize. Top-down decision trees are easy to implement and understand, but there are no optimality guarantees due to their greedy nature. Regression trees are built in a similar way to decision trees; in regression trees, every leaf node is associated with a linear regression function. All piecewise linear modeling techniques deal with two main tasks: partitioning the input space and learning a linear model for every partition. However, partitioning the input space and learning linear models for different partitions are not independent problems. Simultaneous optimal estimation of the partitions and the linear models for every partition is a combinatorial problem and hence computationally hard. However, piecewise linear models provide better insight into the classification or regression problem by giving an explicit representation of the structure in the data. The information captured by piecewise linear models can be summarized in terms of simple rules, so that they can be used to analyze the properties of the domain from which the data originates. These properties make piecewise linear models, like decision trees and regression trees, extremely useful in many data mining applications and place them among the top data mining algorithms. In this thesis, we address the problem of supervised learning of piecewise linear models for classification and regression. We propose novel algorithms for learning piecewise linear classifiers and regression functions. We also address the problem of noise-tolerant learning of classifiers in the presence of label noise. We propose a novel algorithm for learning polyhedral classifiers, which are the simplest form of piecewise linear classifiers.
Polyhedral classifiers are useful when the points of the positive class fall inside a convex region and all the negative class points are distributed outside it; the region of the positive class can then be well approximated by a simple polyhedral set. The key challenge in optimally learning a fixed-structure polyhedral classifier is to identify the subproblems, where each subproblem is a linear classification problem. This is a hard problem, and identifying polyhedral separability is known to be NP-complete. The goal of any polyhedral learning algorithm is to efficiently handle the underlying combinatorial problem while achieving good classification accuracy. Existing methods for learning a fixed-structure polyhedral classifier are based on solving non-convex constrained optimization problems. These approaches do not efficiently handle the combinatorial aspect of the problem and are computationally expensive. We propose a method of model-based estimation of the posterior class probability to learn polyhedral classifiers. We solve an unconstrained optimization problem using a simple two-step algorithm (similar to the EM algorithm) to find the model parameters. To the best of our knowledge, this is the first attempt to formulate an unconstrained optimization problem for learning polyhedral classifiers. We then modify our algorithm to also find the number of required hyperplanes automatically. We show experimentally that our approach is better than existing polyhedral learning algorithms in terms of training time, performance and complexity. Most often, class conditional densities are multimodal. In such cases, each class region may be represented as a union of polyhedral regions, and hence a single polyhedral classifier is not sufficient; a generic decision tree is required. Learning an optimal fixed-structure decision tree is a computationally hard problem. On the other hand, top-down decision trees have no optimality guarantees due to their greedy nature. However, top-down decision tree approaches are widely used as they are versatile and easy to implement. Most existing top-down decision tree algorithms (CART, OC1, C4.5, etc.) use impurity measures to assess the goodness of hyperplanes at each node of the tree. These measures do not properly capture the geometric structure in the data. We propose a novel decision tree algorithm that, at each node, selects hyperplanes based on an objective function which takes the geometric structure of the class regions into consideration. The resulting optimization problem turns out to be a generalized eigenvalue problem and hence is efficiently solved. We show through empirical studies that our approach leads to smaller trees and better performance than other top-down decision tree approaches, and we provide some theoretical justification for the proposed method of learning decision trees. Piecewise linear regression is similar to the corresponding classification problem. For example, in regression trees, each leaf node is associated with a linear regression model; the problem is once again that of (simultaneous) estimation of optimal partitions and a linear model for each partition. Regression trees, the hinging hyperplanes method and mixture of experts are some of the approaches to learning continuous piecewise linear regression models, and many of these algorithms are computationally intensive.
We present a method of learning a piecewise linear regression model which is computationally simple and capable of learning discontinuous functions as well. The method is based on the idea of K-plane regression, which identifies a set of linear models given the training data. K-plane regression is a simple algorithm motivated by the philosophy of k-means clustering (a minimal sketch of this basic scheme follows the abstract). However, this simple algorithm has several problems: it does not give a model function with which the target value can be predicted for an arbitrary input, and it is very sensitive to noise. We propose a modified K-plane regression algorithm which can learn continuous as well as discontinuous functions. The proposed algorithm retains the spirit of the k-means algorithm and improves the objective function after every iteration. The proposed method learns a proper piecewise linear model that can be used for prediction, and is also more robust to additive noise than K-plane regression. When learning classifiers, one normally assumes that the class labels in the training data are noise-free. However, in many applications, such as spam filtering and text classification, the training data can be mislabeled due to subjective errors. In such cases, standard learning algorithms (SVM, AdaBoost, decision trees, etc.) overfit the noisy points, leading to poor test accuracy. Analyzing the vulnerability of classifiers to label noise has therefore attracted growing interest from the machine learning community. Existing noise-tolerant learning approaches first try to identify the noisy points and then learn a classifier on the remaining points. In this thesis, we address the issue of developing learning algorithms which are inherently noise tolerant. An algorithm is inherently noise tolerant if the classifier it learns from noisy samples has the same performance on test data as the one learnt from noise-free samples. Algorithms with such robustness (under suitable assumptions on the noise) are attractive for learning with noisy samples. Here, we consider non-uniform label noise, a generic noise model in which the probability of an example's class label being incorrect is a function of the example's feature vector (we assume this probability is less than 0.5 for all feature vectors). This can account for most cases of noisy data sets. There is no provably optimal algorithm for learning noise-tolerant classifiers in the presence of non-uniform label noise. We propose a novel characterization of the noise tolerance of an algorithm, and we analyze the noise tolerance properties of the risk minimization framework, as risk minimization is a common strategy for classifier learning. We show that risk minimization under the 0-1 loss has the best noise tolerance properties; none of the standard convex loss functions has such properties. Empirical risk minimization under the 0-1 loss is a hard problem, as the 0-1 loss function is not differentiable. We propose a gradient-free stochastic optimization technique to minimize the risk under the 0-1 loss for noise-tolerant learning of linear classifiers. We show (under some conditions) that the algorithm converges asymptotically to the global minimum of the risk under the 0-1 loss, and we illustrate the noise tolerance of the algorithm through simulation experiments.
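For reference, here is a minimal sketch of the basic K-plane regression scheme the abstract builds on: alternately assign each sample to the plane with the smallest residual, then refit each plane by least squares. This is the baseline algorithm only, not the modified, noise-robust variant the thesis proposes.

```python
# Basic K-plane regression: k-means-style alternation between point-to-plane
# assignment and per-partition least-squares fits. Baseline version only.
import numpy as np

def k_plane_regression(X, y, k=3, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append bias column
    W = rng.normal(size=(k, Xb.shape[1]))          # k plane parameter vectors
    for _ in range(n_iter):
        residuals = (Xb @ W.T - y[:, None]) ** 2   # squared residual to every plane
        labels = residuals.argmin(axis=1)          # assign each point to its best plane
        for j in range(k):
            mask = labels == j
            if mask.sum() >= Xb.shape[1]:          # refit plane j on its own points
                W[j], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
    return W, labels

# Toy usage: a discontinuous piecewise linear target.
X = np.linspace(-3, 3, 300)[:, None]
y = np.where(X[:, 0] < 0, 2 * X[:, 0] + 1, -X[:, 0] + 4)
W, labels = k_plane_regression(X, y, k=2)
print(np.round(W, 2))   # recovered slopes/intercepts, up to label permutation
```

Note the flaw the abstract points out: to predict at a new input one must still decide which plane applies, which this basic scheme leaves unspecified; that is precisely what the modified algorithm addresses.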
53

Intensive poultry production and highly pathogenic avian influenza H5N1 in Thailand: statistical and process-based models / Production intensive de volailles et influenza aviaire hautement pathogène H5N1 en Thaïlande: approches statistiques et mécanistiques

Van Boeckel, Thomas 26 September 2013 (has links)
The highly pathogenic avian influenza (HPAI) H5N1 virus that emerged in China in 1996 is a threat to human health because of its endemic circulation in domestic poultry and its zoonotic potential. The severity of HPAI H5N1 infection varies among bird species: some Anatidae are healthy, asymptomatic carriers of the virus, whereas in chicken farms HPAI is highly contagious and characterised by mortality rates above 90%. In humans, the impact of HPAI H5N1 has so far remained moderate (630 human cases, including 375 deaths; World Health Organization, June 2013) owing to the low transmissibility of the virus from poultry to humans and from human to human. However, given the high case-fatality rate (>50%), a change in transmission patterns could lead to a far greater impact.

Since its emergence, HPAI H5N1 has had a significant economic impact in many South-East Asian countries. Thailand, one of the world's leading exporters of poultry meat, was severely affected by successive epidemic waves between 2003 and 2005. These episodes affected the incomes of small and medium producers, and also caused substantial economic losses in the intensive poultry production sector because of the embargo imposed by the main export markets.

The objective of this work is to study quantitatively the association between intensive poultry production and the spatio-temporal distribution of HPAI H5N1 in Thailand. Two approaches were developed: statistical models aimed at identifying the determinants of HPAI H5N1 risk, and mechanistic (process-based) models aimed at simulating epidemic trajectories based on knowledge of HPAI H5N1 transmission mechanisms, the structure of the poultry production sector and the intervention measures put in place.

Using environmental and anthropogenic factors, we show that: (i) the distribution of domestic ducks in Asia can be predicted using non-linear regression models; (ii) poultry production can be disaggregated into extensive and intensive production on the basis of the number of birds per farmer; and (iii) using Boosted Regression Trees (BRT), the main determinants of the distribution of HPAI H5N1 risk are intensively raised ducks, the number of rice-cropping cycles and the proportion of water in the landscape. Finally, we illustrate the potential of mechanistic models to evaluate the effectiveness of the intervention measures implemented, to test alternative intervention scenarios and to identify optimal prevention and intervention strategies against future epidemics. / Doctorat en Sciences agronomiques et ingénierie biologique / info:eu-repo/semantics/nonPublished
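Boosted regression trees of the kind used in point (iii) can be fitted with standard libraries. A minimal sketch with scikit-learn follows; the predictor names mirror the abstract's determinants, but the data set is hypothetical, and sklearn's impurity-based feature importances stand in for the BRT relative-influence measure.

```python
# Sketch: BRT-style risk model with gradient-boosted trees (hypothetical data).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

df = pd.read_csv("subdistrict_records.csv")    # hypothetical surveillance table
predictors = ["intensive_ducks", "rice_crop_cycles", "water_fraction",
              "human_population", "chicken_density"]
X, y = df[predictors], df["hpai_outbreak"]     # 1 = HPAI H5N1 outbreak recorded

brt = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.01,
                                 max_depth=3, subsample=0.5)  # slow learning, shallow trees
brt.fit(X, y)

# Rank predictors, analogous to a BRT relative-influence table.
for name, imp in sorted(zip(predictors, brt.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:20s} {imp:.3f}")
```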
54

Pénalités minimales pour la sélection de modèle / Minimal penalties for model selection

Sorba, Olivier 09 February 2017 (has links)
L. Birgé and P. Massart proved that the minimum penalty phenomenon occurs in Gaussian model selection when the model family arises from complete variable selection among independent variables. We extend some of their results to discrete Gaussian signal segmentation when the model family corresponds to a sufficiently rich family of partitions of the signal's support, as is the case for regression trees. We show that the same phenomenon occurs in the context of density estimation. The richness of the model family can be related to a certain form of isotropy; in this respect the minimum penalty phenomenon is intrinsic. To corroborate this point of view, we show that the minimum penalty phenomenon occurs when the models are chosen randomly under an isotropic law.
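For orientation, the minimum penalty phenomenon referred to above is usually stated via the slope heuristic. The following generic formulation is drawn from the Birgé–Massart literature rather than from the thesis itself.

```latex
% Slope heuristic / minimal penalty, generic form.
% Penalties are calibrated as pen(m) = \kappa * pen_shape(m); there is a
% critical \kappa_{min} below which selection fails (the selected model's
% dimension blows up), and the optimal calibration is close to twice the
% minimal one:
\[
\hat m(\kappa) \in \arg\min_{m \in \mathcal{M}}
  \Big\{ \gamma_n(\hat s_m) + \kappa \, \mathrm{pen}_{\mathrm{shape}}(m) \Big\},
\qquad
\kappa_{\mathrm{opt}} \approx 2\,\kappa_{\mathrm{min}},
\]
% where \gamma_n is the empirical contrast and \hat s_m the minimum-contrast
% estimator within model m.
```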
55

Assessment of Machine Learning Applied to X-Ray Fluorescence Core Scan Data from the Zinkgruvan Zn-Pb-Ag Deposit, Bergslagen, Sweden

Simán, Frans Filip January 2020 (has links)
Lithological core logging is a subjective and time-consuming endeavour which could possibly be automated; the question is whether, and to what extent, this automation would affect the resulting core logs. This study presents a case from the Zinkgruvan Zn-Pb-Ag mine, Bergslagen, Sweden, in which Classification and Regression Trees and K-means clustering on the Self Organising Map were applied to X-Ray Fluorescence lithogeochemistry data derived from automated core scan technology. These two methods are assessed through comparison to manual core logging. It is found that the X-Ray Fluorescence data are not sufficiently accurate or precise for the purpose of automated full lithological classification, since not all elements are successfully quantified. Furthermore, not all lithologies can be distinguished by lithogeochemistry alone, further hindering the success of automated lithological classification. This study concludes that: 1) K-means on the Self Organising Map is the most successful approach, although this may be influenced by the method of domain validation; 2) the choice of ground truth for learning is important both for supervised learning and for the assessment of machine learning accuracy; and 3) geology, data resolution and choice of elements are important parameters for machine learning. Both the supervised method of Classification and Regression Trees and the unsupervised method of K-means clustering applied to Self Organising Maps show potential to assist core logging procedures.
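A minimal sketch of the winning approach, K-means applied to the codebook vectors of a Self Organising Map, is shown below using the third-party minisom package and scikit-learn. The element columns, file name and hyperparameters are illustrative assumptions, not the study's settings.

```python
# Sketch: cluster XRF core-scan geochemistry via K-means on a SOM's codebook.
import numpy as np
import pandas as pd
from minisom import MiniSom
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("xrf_core_scan.csv")                  # hypothetical scan table
elements = ["Si", "Al", "Fe", "Mn", "Ca", "K", "Zn", "Pb"]
X = StandardScaler().fit_transform(df[elements])

som = MiniSom(15, 15, X.shape[1], sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(X, 10000)                             # fit the 15x15 map

codebook = som.get_weights().reshape(-1, X.shape[1])   # one vector per SOM node
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(codebook)

# Each sample inherits the cluster of its best-matching SOM node.
node_idx = np.array([np.ravel_multi_index(som.winner(x), (15, 15)) for x in X])
df["pseudo_lithology"] = km.labels_[node_idx]
print(df["pseudo_lithology"].value_counts())
```

Clustering the SOM nodes rather than the raw samples smooths the geochemistry first, which is one reason this two-stage scheme can outperform K-means applied directly to the scan data.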
56

Using Gradient Boosting to Identify Pricing Errors in GLM-Based Tariffs for Non-life Insurance / Identifiering av felprissättningar i GLM-baserade skadeförsäkringstariffer genom Gradient boosting

Greberg, Felix, Rylander, Andreas January 2022 (has links)
Most non-life insurers and many creditors use regressions, more specifically Generalized Linear Models (GLMs), to price their liabilities. One limitation of GLMs is that interactions between predictors are handled manually, which makes finding interactions a tedious and time-consuming task. This increases the cost of rate-making and, more importantly, actuaries can miss important interactions, resulting in sub-optimal customer prices. Several papers have shown that gradient tree boosting can outperform GLMs in insurance pricing since it handles interactions automatically. Insurers and creditors are, however, reluctant to use so-called "black-box" solutions for both regulatory and technical reasons. Tree-based methods have been used to identify pricing errors in regressions, albeit only as ad-hoc solutions. The authors instead propose a systematic approach to automatically identify and evaluate interactions between predictors before adding them to a traditional GLM. The model can be used in three different ways: firstly, it can create a table of statistically significant candidate interactions to add to a GLM; secondly, it can automatically and iteratively add new interactions to an old GLM until no more statistically significant interactions can be found; lastly, it can automatically create a new GLM without an existing pricing model. All approaches are tested on two motor insurance data sets from a Nordic P&C insurer, and the results show that all methods outperform the original GLMs. Although the two iterative modes perform better than the first, insurers are recommended to mainly use the first mode, since this results in a reasonable trade-off between automating processes and leveraging actuaries' professional judgment.
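The general idea — use gradient-boosted trees to flag candidate interactions, then test them inside a conventional GLM — can be sketched as follows. The data set, column names and Poisson claim-count specification are hypothetical, and depth-2 trees fitted to GLM residuals are used here as a simple interaction screen rather than the authors' exact procedure.

```python
# Sketch: screen for pairwise interactions missed by a GLM, then test them.
from itertools import combinations
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("motor_policies.csv")                 # hypothetical policy table
features = ["driver_age", "vehicle_age", "power", "bonus_class"]
X = sm.add_constant(df[features])
glm = sm.GLM(df["n_claims"], X, family=sm.families.Poisson(),
             exposure=df["exposure"]).fit()

resid = df["n_claims"] - glm.predict(X, exposure=df["exposure"])

# Depth-2 boosted trees can express at most two-way interactions, so residual
# signal found on a feature pair hints at an interaction the GLM missed.
scores = {}
for a, b in combinations(features, 2):
    gbm = GradientBoostingRegressor(max_depth=2, n_estimators=200,
                                    learning_rate=0.05, random_state=0)
    gbm.fit(df[[a, b]], resid)
    scores[(a, b)] = gbm.score(df[[a, b]], resid)      # in-sample R^2 on residuals

# Refit the GLM with the top candidate and keep it only if the fit improves.
a, b = max(scores, key=scores.get)
X2 = X.assign(interaction=df[a] * df[b])
glm2 = sm.GLM(df["n_claims"], X2, family=sm.families.Poisson(),
              exposure=df["exposure"]).fit()
print(f"candidate {a}x{b}: AIC {glm.aic:.1f} -> {glm2.aic:.1f}")
```

Run iteratively, this mirrors the paper's second mode; stopping after the candidate table corresponds to the recommended first mode.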
57

Use of Adaptive Mobile Applications to Improve Mindfulness

Boshoff, Wiehan 08 June 2018 (has links)
No description available.
58

Development and Evaluation of an Integrated Approach to Study In-Bus Exposure Using Data Mining and Artificial Intelligence Methods

Kadiyala, Akhil 24 September 2012 (has links)
No description available.
59

Alien invaders and reptile traders : risk assessment and modelling of trends, vectors and traits influencing introduction and establishment of alien reptiles and amphibians

Van Wilgen, Nicola Jane 12 1900 (has links)
Thesis (PhD)--Stellenbosch University, 2010. / ENGLISH ABSTRACT: Biological invasions are a growing threat to biodiversity, trade and agriculture in South Africa. Though alien reptiles and amphibians (herpetofauna) are not currently a major issue, escalating problems worldwide and increased trade in South Africa suggest a possible increase in future problems. In this thesis I explore practical measures for risk assessment implementable under national legislation. I began by documenting record-keeping and legislative differences between provinces in South Africa. This revealed some serious deficiencies, complicating attempts to compile accurate inventories and discern import trends. International trade data, however, revealed an exponential increase in the number of imports to South Africa over the last 30 years. Characterising the abundance of species in this trade is important, as species introduced in large numbers pose a higher establishment risk. In South Africa, I found a tendency for venomous and expensive species to be traded in low numbers, whereas species that are easy to breed and handle, or that are colourful or patterned, are traded in higher numbers. Unlike South Africa, California and Florida have had a large number of well-documented herpetofaunal introductions. These introductions were used to verify the role of several key predictors of species establishment. I first evaluated the role of each variable separately: I examined different approaches to bioclimatic modelling, the predictive power of different sources of distribution data, and methods of assigning a climate-match score. I also present the first test of Darwin's naturalization hypothesis for land vertebrates, using two new phylogenies inferred for native and introduced reptiles in California and Florida. I then used boosted regression trees (BRT) to infer the relative contribution of each factor to species establishment success. Results from the BRTs were incorporated into a user-friendly spreadsheet model for use by assessors inexperienced in complex modelling techniques. Introduction effort was found to be the strongest contributor to establishment success. Furthermore, species with short juvenile periods were more likely to establish than species that started breeding later, as were species with more distant relatives in regional biotas. Average climate match and life form were also important. Of the herpetofaunal groups, frogs and lizards were most likely to establish, while snakes and turtles established at much lower rates, though analysis of all recorded herpetofaunal introductions shows slightly different patterns. Predictions made by the BRT model for independent data were relatively poor, though this is unlikely to be unique to this study and can be partially explained by missing data. Though numerous uncertainties remain in this field, many can be lessened by applying case-by-case rules rather than generalising across all herpetofaunal groups. The purpose of import and the potential trade volume of a species will influence the threat it poses. Considering this in conjunction with a species' environmental tolerances and the previous success of species with similar life histories should provide a reasonable and defensible estimate of establishment risk. Finally, a brief summary of the potential impacts of introduced alien herpetofauna is provided in the thesis.
60

Informed statistical modelling of habitat suitability for rare and threatened species

O'Leary, Rebecca A. January 2008 (has links)
In this thesis a number of statistical methods have been developed and applied to habitat suitability modelling for rare and threatened species. Data available on these species are typically limited, so developing models from these data can be problematic and may produce prediction biases. To address these problems, this thesis has three aims. The first aim is to develop and implement frequentist and Bayesian statistical modelling approaches for these types of data. The second aim is to develop and implement expert elicitation methods. The third aim is to apply these novel approaches to Australian rare and threatened species case studies, with the intention of habitat suitability modelling. The first aim is fulfilled by investigating two innovative approaches to habitat suitability modelling, together with a sensitivity analysis of the second approach to its priors. The first approach is a new multilevel framework developed to model the species distribution at multiple scales and identify excess zeros (absences outside the species range); a statistical modelling approach has not previously been applied to the identification of excess zeros. The second approach is an extension and application of Bayesian classification trees to modelling the habitat suitability of a threatened species, the first 'real' application of this approach in ecology. Lastly, a sensitivity analysis of the priors in Bayesian classification trees is carried out for a real case study; such a sensitivity analysis has not previously been examined. To address the second aim, expert elicitation methods are developed, extended and compared in this thesis. In particular, one elicitation approach is extended from previous research, three elicitation methods are compared, and one new elicitation approach is proposed. These approaches are illustrated for habitat suitability modelling of a rare species, with opinions elicited from one or two experts. The first approach utilises a simple questionnaire, in which expert opinion is elicited on whether increasing values of a covariate increase, decrease or have no substantive impact on a response. This approach is extended to express the information as a mixture of three normally distributed prior distributions, which are then combined with available presence/absence data in a logistic regression. This is one of the first elicitation approaches within the habitat suitability modelling literature that is appropriate for experts with limited statistical knowledge and can be used to elicit information from single or multiple experts. Three relatively new approaches to eliciting expert knowledge in a form suitable for Bayesian logistic regression are compared, one of which is the questionnaire approach; the comparison includes a summary of the advantages and disadvantages of the three methods, the results from the elicitations, and a comparison of the prior and posterior distributions. An expert elicitation approach is also developed for classification trees, in which the size and structure of the tree is elicited; numerous elicitation approaches have been proposed for logistic regression, but none had previously been suggested for classification trees.
The last aim of this thesis is addressed in all chapters, since the statistical approaches proposed and extended here have been applied to real case studies. Two case studies are examined. The first is the rare native Australian thistle (Stemmacantha australis), for which the dataset contains a large number of absences distributed over the majority of Queensland and a small number of presence sites confined to South-East Queensland; this case study motivated the multilevel modelling framework. The second case study is the threatened Australian brush-tailed rock-wallaby (Petrogale penicillata), to which the application and sensitivity analysis of Bayesian classification trees and all expert elicitation approaches investigated in this thesis are applied. This work has several implications for the conservation and management of rare and threatened species. The novel statistical approaches addressing the first aim extend currently existing methods, or propose new ones, for the identification of current and potential habitat; we demonstrate that better model predictions can be achieved with each method than with standard techniques. The elicitation approaches addressing the second aim ensure that expert knowledge in various forms can be harnessed for habitat modelling, a particular benefit for rare and threatened species, which typically have limited data. Throughout, innovations in statistical methodology are both motivated and illustrated via habitat modelling for two rare and threatened species: the native thistle Stemmacantha australis and the brush-tailed rock-wallaby Petrogale penicillata.
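To make the questionnaire-to-prior idea concrete, here is a minimal sketch of a Bayesian logistic regression whose slope carries a three-component normal mixture prior (components for "decreases", "no substantive effect", "increases"), fitted with a plain Metropolis sampler. The mixture weights, component means and data are illustrative assumptions, not the thesis's elicited values.

```python
# Sketch: logistic regression with an elicited 3-component normal mixture prior
# on the covariate effect, sampled with random-walk Metropolis.
import numpy as np

rng = np.random.default_rng(0)

# Toy presence/absence data (hypothetical).
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x - 0.5))))

# Elicited prior on the slope: expert leaned towards "increases".
weights = np.array([0.1, 0.2, 0.7])          # decreases / no effect / increases
means   = np.array([-1.0, 0.0, 1.0])
sd      = 0.5

def log_prior_slope(b):
    comps = weights * np.exp(-0.5 * ((b - means) / sd) ** 2)
    return np.log(comps.sum())               # normalising constant omitted

def log_post(a, b):
    eta = a + b * x
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik + log_prior_slope(b) - 0.5 * a**2 / 10.0  # vague N(0,10) intercept

# Random-walk Metropolis over (intercept, slope).
theta, samples = np.zeros(2), []
lp = log_post(*theta)
for _ in range(20000):
    prop = theta + rng.normal(scale=0.1, size=2)
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
        theta, lp = prop, lp_prop
    samples.append(theta.copy())

post = np.array(samples[5000:])               # discard burn-in
print("posterior mean (intercept, slope):", post.mean(axis=0).round(2))
```

With sparse presence data, the weight the expert places on each mixture component visibly shifts the posterior slope, which is the point of combining elicited opinion with the likelihood.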
