11

Disease Mapping with log Gaussian Cox Processes

Li, Ye 16 August 2013 (has links)
One of the main classes of spatial epidemiological studies is disease mapping, where the main aim is to describe the overall disease distribution on a map, for example, to highlight areas of elevated or lowered mortality or morbidity risk, or to identify important social or environmental risk factors while adjusting for the spatial distribution of the disease. This thesis focuses on, and proposes solutions to, two of the most common obstacles in disease mapping applications: census boundaries that change over a long study period, and data aggregation imposed to protect patients' confidentiality. In disease mapping, when the target disease has low prevalence, the study usually covers a long time period to accumulate sufficient cases. During this period, however, numerous irregular changes may occur in the census regions on which population is reported, which complicates inference. A new model was developed for the case when the exact locations of the cases are available, consisting of a continuous random spatial surface and fixed effects for time and the ages of individuals. The process is modelled on a fine grid, approximating the underlying continuous risk surface with a Gaussian Markov random field, and Bayesian inference is performed using integrated nested Laplace approximations. The model was applied to clinical data on the location of residence at the time of diagnosis of new Lupus cases in Toronto, Canada, for the 40 years to 2007, with the aim of finding areas of abnormally high risk. Predicted risk surfaces and posterior exceedance probabilities are produced for Lupus and, for comparison, Psoriatic Arthritis data from the same clinic. Simulation studies are also carried out to better understand the performance of the proposed model and to compare it with existing methods. When the exact locations of the cases are not known, inference is complicated by the uncertainty in case locations due to data aggregation over census regions for confidentiality. Conventional modelling relies on census boundaries that are unrelated to the biological process being modelled, and may produce stronger spatial dependence in less populated regions, which then dominate the map. A second new model was developed, consisting of a continuous random spatial surface with aggregated responses and fixed covariate effects at the census-region level. The continuous spatial surface was approximated by a Markov random field, which greatly reduces the computational complexity. The process was modelled on a lattice of fine grid cells and Bayesian inference was performed using Markov chain Monte Carlo with data augmentation. Simulation studies were carried out to assess the performance of the proposed model and to compare it with the conventional Besag-York-Mollié model as well as with a model assuming exact locations are known. Receiver operating characteristic curves and mean integrated squared errors were used as measures of performance. For the application, surveillance data on the locations of residence at the time of diagnosis of syphilis cases in North Carolina, for the 9 years to 2007, are modelled with the aim of finding areas of abnormally high risk. Predicted risk surfaces and posterior exceedance probabilities are again produced, identifying Lumberton as a "syphilis hotspot".
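A minimal sketch of the log-Gaussian Cox process machinery the thesis builds on, assuming an illustrative grid, covariance function, and parameter values; the thesis itself uses a GMRF approximation and INLA rather than the dense covariance sampled below:

```python
import numpy as np

rng = np.random.default_rng(42)

# Fine grid over the study region (illustrative 20 x 20 lattice).
n = 20
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
coords = np.column_stack([xs.ravel(), ys.ravel()])

# Latent Gaussian random field with an exponential covariance. A GMRF
# approximation, as in the thesis, would replace this dense covariance
# for tractability on realistic grid sizes.
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
sigma2, rho = 0.5, 0.2                      # illustrative variance / range
cov = sigma2 * np.exp(-dists / rho)
z = np.linalg.cholesky(cov + 1e-8 * np.eye(n * n)) @ rng.standard_normal(n * n)

# Cox process: cell-level intensity is a population offset times
# exp(fixed effect + spatial surface); counts are conditionally Poisson.
population = rng.uniform(100, 1000, size=n * n)   # persons per cell
beta0 = np.log(1e-3)                              # baseline log risk
cases = rng.poisson(population * np.exp(beta0 + z))

# The relative-risk surface one would map after fitting such a model.
relative_risk = np.exp(z).reshape(n, n)
print("total simulated cases:", cases.sum())
```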
13

Development of a Monte Carlo Method for Autonomous Learning of Gaussian Mixture Distributions / Entwicklung eines Monte-Carlo-Verfahrens zum selbständigen Lernen von Gauß-Mischverteilungen

Lauer, Martin. Unknown Date (has links) (PDF)
Dissertation, Universität Osnabrück, 2004.
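The record gives no algorithmic details, so the following is only a generic point of reference: a minimal Gibbs sampler for a one-dimensional Gaussian mixture with fixed weights and known variances, a standard Monte Carlo approach to mixture learning rather than Lauer's specific method. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from two Gaussians.
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 100)])
K, sigma2, tau2 = 2, 1.0, 25.0       # known component variance, prior variance
mu = rng.normal(0, 1, K)             # initial component means
weights = np.full(K, 1.0 / K)        # fixed, uniform mixture weights

for it in range(500):
    # 1. Sample component assignments given the current means.
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2 + np.log(weights)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=pi) for pi in p])

    # 2. Sample each mean from its Gaussian full conditional.
    for k in range(K):
        xk = x[z == k]
        prec = len(xk) / sigma2 + 1.0 / tau2
        mu[k] = rng.normal((xk.sum() / sigma2) / prec, np.sqrt(1.0 / prec))

print("posterior draw of means:", np.sort(mu))
```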
14

Human pose augmentation for facilitating Violence Detection in videos: a combination of the deep learning methods DensePose and VioNet

Calzavara, Ivan January 2020 (has links)
In recent years, deep learning, a critical technology in computer vision, has achieved remarkable milestones in many fields, such as image classification and object detection. In particular, it has also been introduced to address the problem of violence detection, which is a big challenge considering the complexity of establishing an exact definition for the phenomenon of violence. Thanks to the ever-increasing development of new surveillance technologies, we nowadays have access to an enormous database of videos that can be analyzed to find abnormal behavior. However, with such a huge amount of data, it is unrealistic to examine all of it manually. Deep learning techniques, instead, can automatically study, learn and perform classification operations. In the context of violence detection, by extracting visually harmful patterns, it is possible to design descriptors representing features that can identify them. In this research we tackle the task of generating new augmented datasets in order to simplify the identification step performed by a deep learning violence detection technique. The novelty of this work is to introduce the DensePose model to enrich the images in a dataset by highlighting (i.e., by identifying and segmenting) all the human beings present in them. With this approach we gained knowledge of how this algorithm performs on videos with a violent context and how the violence detection network benefits from this procedure. Performance has been evaluated in terms of segmentation accuracy and the efficiency of the violence detection network, as well as from the computational point of view. Results show that the context of the scene is the major indicator leading the DensePose model to correctly segment human beings, and that violent scenes do not seem to be the most suitable field for the application of this model, since the frequent overlap of bodies (a distinctive aspect of violence) acts as a disadvantage for segmentation. For this reason, the violence detection network does not exploit its full potential. Finally, we found that such augmented datasets can speed up training by reducing the time needed for the weight-update phase, making this procedure a helpful add-on for implementations in different contexts where the identification of human beings plays a major role.
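DensePose itself ships with detectron2 and its exact API is not shown in the abstract; the sketch below substitutes torchvision's COCO-pretrained Mask R-CNN as a stand-in to illustrate the general augmentation idea of highlighting segmented people in a frame. The model choice, score threshold, and red-channel tinting are assumptions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Stand-in for DensePose: an off-the-shelf instance segmentation model.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def highlight_people(frame, score_thresh=0.7, alpha=0.5):
    """Return the frame with detected persons tinted red (PIL image in, tensor out)."""
    img = to_tensor(frame)                       # (3, H, W) in [0, 1]
    with torch.no_grad():
        out = model([img])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thresh)  # COCO class 1 = person
    combined = torch.zeros_like(img[0])
    for m in out["masks"][keep, 0]:              # soft masks, shape (H, W)
        combined = torch.maximum(combined, (m > 0.5).float())
    tinted = img.clone()
    tinted[0] = torch.clamp(img[0] + alpha * combined, 0, 1)      # boost red channel
    return tinted
```

Applied frame by frame, this yields an augmented copy of a video dataset in which human figures are visually emphasized before being fed to the violence detection network.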
15

A Study on Generative Adversarial Networks Exacerbating Social Data Bias

January 2020 (has links)
Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer an as-yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that this effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases which exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks using GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN's generator "imagines" faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat's explosively popular GAN-enabled "My Twin" selfie lens, which consistently lightens the skin tone for women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications. / Dissertation/Thesis / Masters Thesis Computer Science 2020
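The thesis's datasets and DCGAN are not reproduced here; this is only a minimal sketch of the kind of check it motivates, comparing a sensitive-attribute histogram between real and generated samples via KL divergence, assuming attribute labels are available (e.g., from a pretrained classifier). The numbers are invented for illustration.

```python
import numpy as np

def attribute_divergence(real_attrs, fake_attrs, n_classes):
    """KL divergence between attribute distributions of real and generated data.

    real_attrs / fake_attrs are integer attribute labels (e.g. from a
    pretrained gender or skin-tone classifier run on both image sets).
    """
    eps = 1e-8
    p = np.bincount(real_attrs, minlength=n_classes) + eps
    q = np.bincount(fake_attrs, minlength=n_classes) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Illustrative: the generated set is even more skewed than the training set.
real = np.array([0] * 80 + [1] * 20)   # 80/20 split in the real data
fake = np.array([0] * 95 + [1] * 5)    # 95/5 split in GAN samples
print(attribute_divergence(real, fake, n_classes=2))
```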
16

Machine Learning on Acoustic Signals Applied to High-Speed Bridge Deck Defect Detection

Chou, Yao 06 December 2019 (has links)
Machine learning techniques are being applied to many data-intensive problems because they can accurately provide classification of complex data using appropriate training. Often, the performance of machine learning can exceed the performance of traditional techniques because machine learning can take advantage of higher dimensionality than traditional algorithms. In this work, acoustic data sets taken using a rapid scanning technique on concrete bridge decks provided an opportunity to both apply machine learning algorithms to improve detection performance and also to investigate the ways that training of neural networks can be aided by data augmentation approaches. Early detection and repair can enhance safety and performance as well as reduce long-term maintenance costs of concrete bridges. In order to inspect for non-visible internal cracking (called delaminations) of concrete bridges, a rapid inspection method is needed. A six-channel acoustic impact-echo sounding apparatus is used to generate large acoustic data sets on concrete bridge decks at high speeds. A machine learning data processing architecture is described to accurately detect and map delaminations based on the acoustic responses. The machine learning approach achieves accurate results at speeds between 25 and 45 km/h across a bridge deck and successfully demonstrates the use of neural networks to analyze this type of acoustic data. In order to obtain excellent performance, model training generally requires large data sets. However, in many potentially interesting cases, such as bridge deck defect detection, acquiring enough data for training can be difficult. Data augmentation can be used to increase the effective size of the training data set. Acoustic signal data augmentation is demonstrated in conjunction with a machine learning model for acoustic defect detection on bridge decks. Four different augmentation methods are applied to data using two different augmentation strategies. This work demonstrates that a "goldilocks" data augmentation approach can be used to increase machine learning performance when only a limited data set is available. The major technical contributions of this work include application of machine learning to acoustic data sets relevant to bridge deck inspection, solving an important problem in the field of nondestructive evaluation, and a more generalized approach to data augmentation of limited acoustic data sets to expand the classes of acoustic problems that machine learning can successfully address.
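The abstract does not name the four augmentation methods used in the thesis; the following sketch only shows common waveform-level augmentations of the general kind described (noise injection, time shift, amplitude scaling), with parameter ranges chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_noise(sig, snr_db=20.0):
    """Inject white noise at a target signal-to-noise ratio."""
    noise = rng.standard_normal(sig.shape)
    scale = np.sqrt(np.mean(sig ** 2) / (10 ** (snr_db / 10) * np.mean(noise ** 2)))
    return sig + scale * noise

def time_shift(sig, max_frac=0.1):
    """Circularly shift the waveform by up to max_frac of its length."""
    limit = int(max_frac * len(sig))
    return np.roll(sig, rng.integers(-limit, limit + 1))

def amplitude_scale(sig, lo=0.8, hi=1.2):
    """Randomly rescale amplitude, mimicking varying impact strength."""
    return sig * rng.uniform(lo, hi)

def augment(sig):
    return amplitude_scale(time_shift(add_noise(sig)))

# One augmented copy per original waveform doubles the training set.
waveforms = rng.standard_normal((100, 2048))      # placeholder acoustic data
augmented = np.stack([augment(w) for w in waveforms])
```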
17

Simulating Artificial Recombination for a Deep Convolutional Autoencoder

Levin, Fredrik January 2021 (has links)
Population structure is an important field of study because of its role in finding the underlying genetics of various diseases. This is why this thesis examines a newly presented deep convolutional autoencoder that has shown promising results when compared to the state-of-the-art method for quantifying genetic similarities within population structure. The main focus was to introduce data augmentation in the form of artificial diploid recombination to this autoencoder, in an attempt to increase the performance and robustness of the network structure. The training data for the network consist of arrays containing information about the single-nucleotide polymorphisms present in an individual. Each instance of augmented data was simulated by randomising cuts based on the distance between the polymorphisms, and then creating a new array by alternating between the arrays of two randomised original data instances. Several networks were then trained using this data augmentation. The performance of the trained networks was compared to that of networks trained on only the original data, using several metrics. Both groups of networks performed similarly on most metrics. The main difference was that networks trained on only the original data had low genotype concordance on simulated data. This indicates an underlying risk in using the original networks, which can be overcome by introducing the artificial recombination.
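Following the procedure in the abstract (cuts randomised in proportion to the distance between markers, then alternation between two parent arrays), a minimal numpy sketch; the genotype encoding, marker count, and cut rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def recombine(parent_a, parent_b, positions, rate=1e-8):
    """Simulate an artificial diploid recombination of two SNP arrays.

    Crossover points are drawn with probability proportional to the
    distance between adjacent markers; the child alternates between
    the two parents at each crossover (rate is an illustrative guess).
    """
    gaps = np.diff(positions)
    crossovers = rng.random(len(gaps)) < gaps * rate     # per-gap cut decision
    source = np.zeros(len(parent_a), dtype=int)
    source[1:] = np.cumsum(crossovers) % 2               # 0 = parent_a, 1 = parent_b
    return np.where(source == 0, parent_a, parent_b)

# Illustrative genotype arrays (0/1/2 allele counts) and marker positions.
positions = np.sort(rng.integers(0, 250_000_000, size=10_000))
a = rng.integers(0, 3, size=10_000)
b = rng.integers(0, 3, size=10_000)
child = recombine(a, b, positions)
```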
18

A Data Augmentation Methodology for Class-imbalanced Image Processing in Prognostic and Health Management

Yang, Shaojie January 2020 (has links)
No description available.
19

Cycling Safety Data Augmentation in the Urban Environment

Costa, Miguel, Roque, Carlos, Marques, Manuel, Moura, Filipe 02 January 2023 (has links)
Cities plan to revitalize sustainable transportation, with a key emphasis on cycling. However, cities need to provide safe environments for cyclists through better infrastructure design, education programs, or other interventions to increase cycling numbers, as safety concerns greatly discourage people from cycling. Thus, cities' strategies aim to protect and improve the safety of those who cycle. Here, cycling research contributes to understanding cycling and what factors related to the individual, the bicycle, and the surrounding environment influence the risk cyclists face. Objective cycling safety goals are to i) decrease the outcome severity of accidents involving cyclists and ii) decrease the overall number of accidents. It is often based on accident records or police reports, yet most incidents are often not reported. Nevertheless, accident statistics are vital because they allow factors such as demographic data and the built environment to be analyzed to understand cyclists' risk of being involved or injured in an accident. There is a worldwide need for more data about cycling accidents, their context, and the built environment's influence; hence, complete datasets are required. We make use of CYCLANDS - a collection of 30 datasets comprising 1.58M cycling accident records - to explore how other data and analysis can complement accident records. Thus, a subset of CYCLANDS was augmented to analyze circulation spaces around accident locations. We hope this takes a step in that direction, fostering the mix of authoritative and volunteered data and providing a more complete data set.
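The paper's pipeline is not reproduced here; this is only a rough sketch of the kind of spatial enrichment described, counting infrastructure features within a fixed radius of each accident location, with haversine distances and invented coordinates standing in for CYCLANDS records and a map extract.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between coordinate arrays."""
    r = 6_371_000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def enrich(accidents, infra, radius_m=50.0):
    """For each accident record, count nearby infrastructure features."""
    counts = []
    for lat, lon in accidents:
        d = haversine(lat, lon, infra[:, 0], infra[:, 1])
        counts.append(int((d < radius_m).sum()))
    return np.array(counts)

# Invented coordinates (Lisbon area) standing in for accident records
# and an extract of cycle-lane segments from volunteered map data.
accidents = np.array([[38.7223, -9.1393], [38.7169, -9.1399]])
infra = np.column_stack([38.72 + 0.01 * np.random.rand(500),
                         -9.14 + 0.01 * np.random.rand(500)])
print(enrich(accidents, infra))
```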
20

Synthesis of Pediatric Brain Tumor Image With Mass Effect / Syntes av pediatrisk hjärntumörbild med masseffekt

Zhou, Yu January 2022 (has links)
During the last few years, deep learning-based techniques have made much progress in the medical image processing field, in tasks such as segmentation and registration. The main characteristic of these methods is the large amount of medical images they demand for model training. However, the acquisition of such data is often difficult due to high expense and ethical issues. As a consequence, the lack of data may lead to poor performance and overfitting. To tackle this problem, we propose in this thesis a data augmentation algorithm that inpaints tumors onto healthy pediatric brain MRI images to simulate pathological images. Since the growth of a tumor may cause deformation and edema of the surrounding tissues, the so-called 'mass effect', a probabilistic U-Net is applied to mimic this deformation field. Then, instead of directly adding the tumor to the image, a GAN-based method is applied to transfer the mask to the image and make it more plausible, both visually and anatomically. Meanwhile, annotations of the different brain tissues are obtained by applying the deformation field to the original labels. Finally, the synthesized images, together with the real dataset, are used to train the tumor segmentation task, and the results indicate a statistical improvement in accuracy.
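The probabilistic U-Net and GAN stages are not reproduced here; this is a minimal sketch of the warping step the abstract describes, applying one deformation field to both the image (linear interpolation) and its tissue labels (nearest-neighbour, so labels stay categorical). The radial 'mass effect' field below is an invented placeholder.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(image, labels, deformation):
    """Apply a dense 2-D deformation field to an image and its label map.

    deformation has shape (2, H, W): per-pixel displacement in y and x,
    e.g. a mass-effect field predicted by a probabilistic U-Net.
    """
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + deformation[0], xx + deformation[1]])
    warped_img = map_coordinates(image, coords, order=1, mode="nearest")
    warped_lbl = map_coordinates(labels, coords, order=0, mode="nearest")
    return warped_img, warped_lbl

# Invented example: a smooth radial push outward from a seed point.
h = w = 128
image = np.random.rand(h, w).astype(np.float32)   # stand-in MRI slice
labels = (image > 0.5).astype(np.int32)           # stand-in tissue labels
yy, xx = np.meshgrid(np.arange(h) - 64, np.arange(w) - 64, indexing="ij")
r = np.hypot(yy, xx) + 1e-6
push = 5.0 * np.exp(-r / 20.0)                    # displacement magnitude
deformation = np.stack([push * yy / r, push * xx / r])
img2, lbl2 = warp(image, labels, deformation)
```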
