11 |
Species Distribution Modeling: Implications of Modeling Approaches, Biotic Effects, Sample Size, and Detection Limit
Wang, Lifei 14 January 2014
When we develop and use species distribution models to predict species' current or potential distributions, we are faced with the trade-offs between model generality, precision, and realism. It is important to know how to improve and validate model generality while maintaining good model precision and realism. However, it is difficult for ecologists to evaluate species distribution models using field-sampled data alone because the true species response function to environmental or ecological factors is unknown. Species distribution models should be able to approximate the true characteristics and distributions of species if ecologists want to use them as reliable tools. Simulated data provide the advantage of being able to know the true species-environment relationships and control the causal factors of interest to obtain insights into the effects of these factors on model performance. I used a case study on Bythotrephes longimanus distributions from several hundred Ontario lakes and a simulation study to explore the effects on model performance caused by several factors: the choice of predictor variables, the model evaluation methods, the quantity and quality of the data used for developing models, and the strengths and weaknesses of different species distribution models. Linear discriminant analysis, multiple logistic regression, random forests, and artificial neural networks were compared in both studies. Results based on field data sampled from lakes indicated that the predictive performance of the four models was more variable when developed on abiotic (physical and chemical) conditions alone, whereas the generality of these models improved when including biotic (relevant species) information. 
With simulated data, random forests and artificial neural networks performed better overall than linear discriminant analysis and multiple logistic regression. However, linear discriminant analysis and multiple logistic regression had relatively good and stable model sensitivity across sample size and detection limit levels, which may be useful for predicting species presences when data are limited. Random forests performed consistently well across sample size levels but were more sensitive to high detection limits. The performance of artificial neural networks was affected by both sample size and detection limit, and it was more sensitive to small sample sizes.
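The abstract does not describe how the simulated data were generated, so the following is only a minimal sketch of how a detection limit can degrade presence records. The logistic response, the exponential abundance distribution, and the names `simulate_survey` and `recorded_fraction` are all assumptions made for illustration, not the author's design.

```python
import math
import random

def simulate_survey(n_lakes, detection_limit, seed=0):
    """Return (occupied, detected) counts for one simulated survey."""
    rng = random.Random(seed)
    occupied = detected = 0
    for _ in range(n_lakes):
        env = rng.uniform(-3.0, 3.0)                 # hypothetical environmental gradient
        p_occ = 1.0 / (1.0 + math.exp(-2.0 * env))   # assumed "true" logistic response
        if rng.random() < p_occ:
            occupied += 1
            abundance = rng.expovariate(1.0 / 10.0)  # assumed mean abundance of 10
            # A lake counts as a recorded presence only if abundance
            # reaches the detection limit; otherwise it is a false absence.
            if abundance >= detection_limit:
                detected += 1
    return occupied, detected

def recorded_fraction(n_lakes, detection_limit, seed=0):
    """Fraction of truly occupied lakes that end up recorded as presences."""
    occupied, detected = simulate_survey(n_lakes, detection_limit, seed)
    return detected / occupied if occupied else float("nan")

for limit in (0.0, 5.0, 20.0):
    print(limit, round(recorded_fraction(500, limit), 2))
```

Because the true occupancy is known in a simulation like this, the share of presences lost to the detection limit can be measured directly, which field data alone do not allow.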
|
13 |
Digital Twin-based Intrusion Detection for Industrial Control Systems
Varghese, Seba January 2021
Digital twins for industrial control systems have gained significant interest over recent years. This attention is mainly because of the advanced capabilities offered by digital twins in the areas of simulation, optimization, and predictive maintenance. Some recent studies discuss the possibility of using digital twins for intrusion detection in industrial control systems. To this end, this thesis proposes a security framework for industrial control systems that includes a digital twin for security monitoring and a machine learning-based intrusion detection system for real-time intrusion detection. The digital twin solution used in this study is a standalone, open-source simulation of an industrial filling plant. This solution was chosen after a thorough evaluation of the implementation aspects of the existing knowledge-driven open-source digital twin solutions for industrial control systems. The cybersecurity analysis approach uses this digital twin to model and execute different realistic process-aware attack scenarios and to generate a training dataset reflecting the process measurements under normal operations and attack scenarios. A total of 23 attack scenarios are modelled and executed in the digital twin. These scenarios belong to four different attack types, namely command injection, network DoS, calculated measurement injection, and naive measurement injection. The proposed framework also includes a machine learning-based intrusion detection system, which is designed in two stages. The first stage involves an offline evaluation of the performance of eight different supervised machine learning algorithms on the labelled dataset. In the second stage, a stacked ensemble classifier that combines the best performing supervised algorithms on different training dataset labels is built as the final machine learning model.
This stacked ensemble model is trained offline using the labelled dataset and then used for classifying the incoming data samples from the digital twin during the live operation of the system. The results show that the designed intrusion detection system is capable of detecting and classifying intrusions in near real-time (0.1 seconds). The practicality and benefits of the proposed digital twin-based security framework are also discussed in this work.
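The abstract does not name the eight algorithms or the meta-learner, so the following is only a minimal sketch of the general stacking idea: base learners are fit first, and a second-stage model is trained on their outputs. The decision stumps, the lookup-table meta-learner, the toy measurements, and the names `fit_stump` and `fit_stack` are all hypothetical stand-ins, not the thesis's implementation.

```python
def fit_stump(xs, ys):
    """Base learner: pick the threshold and polarity with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            errs = sum((1 if sign * (x - t) >= 0 else 0) != y for x, y in zip(xs, ys))
            if best is None or errs < best[0]:
                best = (errs, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

def fit_stack(rows, labels):
    """Train a two-stage stacked classifier (0 = normal, 1 = attack)."""
    half = len(rows) // 2
    # Stage 1: one stump per feature, fit on the first half of the data.
    bases = [fit_stump([r[j] for r in rows[:half]], labels[:half])
             for j in range(len(rows[0]))]
    # Stage 2: the meta-learner sees only the base outputs on the held-out
    # half and stores the majority label for each output pattern.
    votes = {}
    for r, y in zip(rows[half:], labels[half:]):
        key = tuple(b(r[j]) for j, b in enumerate(bases))
        votes.setdefault(key, [0, 0])[y] += 1
    table = {k: int(v[1] >= v[0]) for k, v in votes.items()}

    def predict(r):
        key = tuple(b(r[j]) for j, b in enumerate(bases))
        # Unseen pattern: flag an attack if any base learner flags one.
        return table.get(key, max(key))
    return predict

# Toy labelled dataset: (tank level, flow) pairs; both features are made up.
rows, labels = [], []
for i in range(40):
    y = i % 2
    level = (0.75 if y else 0.35) + 0.005 * (i // 2)
    flow = 0.40 + 0.01 * (i % 5)
    rows.append((level, flow))
    labels.append(y)

predict = fit_stack(rows, labels)
print(predict((0.9, 0.42)), predict((0.3, 0.42)))
```

Training the second stage on held-out data rather than on the data the base learners were fit to is a common design choice in stacking, because it keeps the meta-learner from simply trusting overfit base predictions.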
|