Global ETD Search

71	Using Neural Networks to identify Individual Animals from Photographs Kabuga, Emmanuel 04 May 2020 (has links) Effective management requires knowledge of the sizes of animal populations. This can be accomplished in various ways, but a very popular way is mark-recapture studies. Mark-recapture studies need a way of telling if a captured animal has been previously seen. For traditional mark-recapture, this is achieved by applying a tag to the animal. For non-invasive mark-recapture methods which exploit photographs, there is no tag on the animal’s body. As a result, these methods require animals to be individually identifiable. They assess if an animal has been caught before by examining photographs for animals which have individual-specific marks (Cross et al., 2014; Gomez et al., 2016; Beijbom et al., 2016; Körschens, Barz, and Denzler, 2018). This study develops a model which can reliably match photographs of the same individual based on individual-specific marks. The model consists of two main parts: an object detection model, and a classifier which takes two photos as input and outputs a predicted probability that the pair is from the same individual (a match). The object detection model is a convolutional neural network (CNN) and the matching classifier is a special kind of CNN called a siamese network. The siamese network uses a pair of CNNs that share weights to summarise the images, followed by some dense layers which combine the summaries into measures of similarity which can be used to predict a match. The model is tested on two case studies, humpback whales (HBWs) and western leopard toads (WLTs). The HBW dataset consists of images originally collected by various institutions across the globe and uploaded to the Happywhale platform, which encourages scientists to identify individual mammals. HBWs can be identified by their fins and special markings. A large amount of data is available for this problem.
The WLT dataset consists of images collected by citizen scientists in South Africa. They were either uploaded to iSpot, a citizen science project which collects images, or sent to the WLT project, a conservation project staffed by volunteers. WLTs can be identified by their unique spots. Little data is available for this problem. One part of this dataset consists of labelled individuals and another part is unlabelled. The model was able to give good results for both HBWs and WLTs. In 95% of the cases the model managed to correctly identify if a pair of images is from the same HBW individual or not. It accurately identified if a pair of images is drawn from the same WLT individual or not in 87% of the cases. This study also assessed the effectiveness of the semi-supervised approach on the WLT unlabelled dataset. In this study, the semi-supervised approach has been partially successful. The model was able to identify new individuals and matches which were not identified before, but they were relatively few in number. Without an exhaustive check of the data, it is not clear whether this is due to the failure of the semi-supervised approach, or because there are not many matches in the data. After adding the newly identified and labelled individuals to the WLT labelled dataset, the model slightly improved its performance and correctly identified 89% of WLT pairs. A number of computer-aided photo-matching algorithms have been proposed (Matthé et al., 2017). This study also assessed the performance of Wild-ID (Bolger et al., 2012), one of the commonly used photo-matching algorithms, on both HBW and WLT datasets. The model developed in this thesis achieved very competitive results compared with Wild-ID. Model accuracies for the proposed siamese network were much higher than those returned by Wild-ID on the HBW dataset, and roughly the same on the WLT dataset. Statistical Sciences
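The siamese structure described in entry 71 (two branches sharing weights, whose summaries are combined into a match probability) can be illustrated with a toy sketch. This is not the thesis model: the single linear layer with ReLU stands in for the CNN branches, the absolute difference stands in for the dense similarity layers, and all weights and input vectors are hypothetical values chosen for illustration.

```python
import math

def embed(image, weights):
    """Shared 'summary' branch: one linear layer with ReLU (a stand-in for a CNN)."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in weights]

def match_probability(image_a, image_b, weights, head_weights, head_bias):
    """Predicted probability that two inputs show the same individual."""
    summary_a = embed(image_a, weights)  # both inputs pass through the
    summary_b = embed(image_b, weights)  # same weights (the siamese property)
    # Combine the two summaries into per-feature distances.
    diff = [abs(a - b) for a, b in zip(summary_a, summary_b)]
    logit = head_bias + sum(w * d for w, d in zip(head_weights, diff))
    return 1.0 / (1.0 + math.exp(-logit))  # logistic output

# Hypothetical weights and flattened "images" (assumptions, not thesis values).
weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
head_weights = [-2.0, -2.0]  # larger distance lowers the match probability
head_bias = 3.0
photo_a = [0.2, 0.4, 0.1]
photo_b = [0.9, 0.1, 0.3]

p_same = match_probability(photo_a, photo_a, weights, head_weights, head_bias)
p_diff = match_probability(photo_a, photo_b, weights, head_weights, head_bias)
```

Because both branches use the same weights and the summaries are compared through an absolute difference, the prediction does not depend on the order in which the two photographs are supplied.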
72	Reinforcement learning for telescope optimisation Blows, Curtly 27 February 2020 (has links) Reinforcement learning is a relatively new and unexplored branch of machine learning with a wide variety of applications. This study investigates reinforcement learning and provides an overview of its application to a variety of different problems. We then explore the possible use of reinforcement learning for telescope target selection and scheduling in astronomy with the hope of effectively mimicking the choices made by professional astronomers. This is relevant as next-generation astronomy surveys will require near real-time decision making in response to high-speed transient discoveries. We experiment with and apply some of the leading approaches in reinforcement learning to simplified models of the target selection problem. We find that the methods used in this study show promise but do not generalise well. Hence, while there are indications that reinforcement learning algorithms could work, more sophisticated algorithms and simulations are needed. Statistical Sciences
73	Nonlinear mixed effects modeling of gametocyte carriage in patients with uncomplicated malaria Distiller, G B January 2007 (has links) Includes bibliographical references (leaves 96-102) Statistical Sciences
74	The optimal asset allocation for South African real return investors Van Zyl, Barry 25 February 2020 (has links) This research aims to establish the optimal asset allocations for targeting specific real returns over short, medium and long-term investment horizons. The joint returns are modelled with data-centric methods that are empirical and non-parametric in nature, and are able to capture the dependencies of returns over time. The asset classes that are considered are South African (SA) equities, SA bonds, SA cash, SA property, global equities, global bonds, global cash, and global property. The returns of each asset class are modelled, each class with its own empirical distribution based on monthly returns from 1972 to 2017. The monthly returns are grouped in a block of rolling periods of varying block lengths in order to attempt to capture dependencies across time. These blocks of data are resampled in order to simulate the distributions of returns of portfolios with their own unique empirical distribution. The optimal portfolios are derived using a genetic algorithm, showcasing how these extremely versatile optimisation tools can be used in combination with resampling methods to find the optimal portfolio for virtually any criterion. A comparison is also made to the traditional mean-variance optimal portfolios, yielding an estimate of the bias in mean-variance optimisation’s (MVO) optimal weights. It is investigated how these optimal portfolios are influenced by the choice of risk criterion and investment horizon. The effect of the most important and consequential nuisance parameter in this research’s model, the block length, is discussed. The relationships established between the characteristics of optimal portfolios and investment horizon and risk criterion and the comparisons with classic MVO should be of interest to investors and investment professionals alike. 
Economic and market regimes are “identified” on the basis of economic and market data; consequently, the resampling probabilities will be unequal. The optimal weights conditional on regimes are derived. Both static and changing regimes are considered. Lastly, an out-of-sample backtest of the performance of the optimal portfolios conditional on the regime across time at six-month intervals is conducted from 1983 to 2017. It shows that out of the three block lengths tested for a single investment horizon of 36 months, a block length of 24 months yielded the best overall risk-adjusted performance, on average. Conditioning for regimes is shown to generally outperform the unconditional approach. The improvements are marginal and further research is recommended to investigate the performance for longer investment horizons and other values of the two tuning parameters, block length and tactical pressure. The higher level aim of this work is to present a broad sense of how data-driven nonparametric methods can be used in conjunction with metaheuristic procedures. The objective of combining these techniques is to find optimal portfolios under very general conditions and with very few assumptions regarding the underlying distributions. Statistical Sciences
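The resampling of rolling blocks of monthly returns described in entry 74 resembles a moving block bootstrap, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis procedure: blocks are drawn with equal probability (the regime-conditional variant would use unequal probabilities), a single return series is used instead of joint asset-class returns, and the return values are made up.

```python
import random

def moving_block_bootstrap(returns, block_length, horizon, rng):
    """Simulate `horizon` months by resampling blocks of consecutive returns."""
    n_blocks = len(returns) - block_length + 1  # number of overlapping blocks
    if n_blocks < 1:
        raise ValueError("block_length exceeds the length of the series")
    path = []
    while len(path) < horizon:
        start = rng.randrange(n_blocks)  # equal-probability block choice
        path.extend(returns[start:start + block_length])
    return path[:horizon]  # trim the final block to the horizon

def cumulative_return(path):
    """Compound a sequence of simple monthly returns."""
    total = 1.0
    for r in path:
        total *= 1.0 + r
    return total - 1.0

# Hypothetical monthly returns for illustration only (not data from the thesis).
monthly_returns = [0.01, -0.02, 0.015, 0.03, -0.01, 0.005,
                   0.02, -0.005, 0.012, 0.008, -0.015, 0.025]
rng = random.Random(0)
simulated = [
    cumulative_return(moving_block_bootstrap(monthly_returns, 3, 6, rng))
    for _ in range(1000)
]
```

Keeping consecutive months together within each block is what lets the simulated paths retain short-range dependence; the block length controls how much of that dependence is preserved.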
75	Object Detection and Size Determination of Pineapple Fruit at a Juicing Factory Harris, Jessica 26 January 2022 (has links) The aim of this thesis is to develop a method for determining pineapple fruit size from images. This was achieved by first detecting pineapples in each image using Mask Region-based Convolutional Neural Network (Mask R-CNN) and then extracting the pixel diameter and length measurements, and the projected areas, from the detected mask outputs. Various Mask R-CNNs were considered for the task of pineapple detection. The best-performing detector made use of MS COCO starting weights, a ResNet50 CNN backbone, and horizontal flipping data augmentation during the training process. This model (Model 4: COCO Fliplr Res50) achieved an average precision of 91.4% on the validation set and an average precision of 90.1% on the test set, and was used to predict masks for an unseen dataset containing images of pre-measured pineapples. The distributions of measurements extracted from the detected masks were compared to those of the manual measurements using two-sample Z-tests and Kolmogorov–Smirnov (KS) tests. There was sufficient similarity between the distributions, and it was therefore established that the reported method is appropriate for pineapple size determination in this context. All the data and code is available in a GitHub repository for reproducible research. Statistical Sciences
76	Detection and Isolation of Prey Capture Events in Animal-Borne Images Chirwa, Temweka S 26 January 2022 (has links) Understanding the foraging habits and prey availability for a species is crucial. Prey availability is crucial to a species' survival and sustainability of the food pyramid. Identifying the type of prey consumed also allows ecologists to determine the energy received, while the duration and extent of foraging bouts provide information about the energy expended. With recent advancements in technology, data collection has become more accessible, and animal-borne video cameras are an increasingly popular mechanism for collecting information about foraging and other behaviour. Video recorders collect large volumes of data but create a bottleneck as data processing is still predominantly done manually. This process is time-consuming and costly, even with the assistance of crowdsourced tasks. Advancements in deep learning, and its applications to computer vision, provide opportunities to apply these tools to ecological problems, such as the processing of data from animal-borne video recorders. Speeding up the annotation process allows more time to be spent focused on the ecological research questions. This dissertation aims to develop detection and isolation models that will assist in the processing of visual data, namely images from animal-borne videos. The first model used for detection will perform an image classification determining whether prey is present or not. Images found to have prey present will then be presented to the second model for isolation that identifies exactly where within the image the prey is and labels the type of prey. The models were trained on video data of little penguins (Eudyptula minor), whose main prey in this investigation are small fish, predominantly anchovies, and jellyfish.
The image classification model based on the ResNet architecture achieved 85% accuracy with precision and recall values of 0.85 and 0.85 respectively on its test set. The object detection model based on the You Only Look Once (YOLO) framework achieved a mean average precision of 60% on its test set. However, the models did not perform well enough on unseen full length videos to be used without human supervision or to serve as alternatives to manual labelling. Rather, the models can be used to guide researchers to areas that may contain prey events. Statistical Sciences
77	Monitoring and mapping the critically endangered Clanwilliam cedar using aerial imagery and deep learning Hadebe, Blessings 26 January 2022 (has links) The critically endangered Clanwilliam cedar, Widdringtonia wallichii, is an iconic tree species endemic to the Cederberg mountains in the Fynbos Biome. Consistent declines in its populations have been noted across its range primarily due to the impact of fire and climate change. Mapping the occurrences of this species over its range is key to the monitoring of surviving individuals and is important for the management of biodiversity in the region. Recent efforts have focused on the use of freely available Google Earth™ imagery to manually map the species across its global native distribution. This study advances this work by proposing an approach for automating the process of tree detection using deep learning. The approach involves using sets of high-resolution red, green, blue (RGB) imagery to train artificial neural networks for the task of tree-crown detection. Additional models are trained on colour-infrared imagery, since live vegetation has a red tone on the near-infrared (NIR) spectrum. Preliminary results show that using an intersection-over-union threshold of 0.5 yields an average tree-crown recall of 0.67 with a precision of 0.53, and that the addition of the NIR spectral band does not result in improved performance. The viability of using this approach to regularly update maps of the Clanwilliam cedar and monitor its population trends in the Cederberg is assessed. Statistical Sciences
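The recall and precision reported in entry 77 depend on an intersection-over-union (IoU) threshold of 0.5. The sketch below shows how such figures could be computed for axis-aligned bounding boxes. The greedy one-to-one matching rule, the box format (x_min, y_min, x_max, y_max), and the example coordinates are assumptions for illustration; the abstract does not state how detections were matched to annotated crowns.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def precision_recall(predicted, truth, threshold=0.5):
    """Greedily match each prediction to its best unmatched ground-truth box."""
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)  # each true crown can be matched once
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical crowns and detections (not data from the thesis).
true_crowns = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
detections = [(1, 1, 11, 11), (21, 19, 31, 29), (60, 60, 70, 70), (0, 0, 4, 4)]
precision, recall = precision_recall(detections, true_crowns)
```

In this made-up example two of the four detections overlap a true crown with IoU of at least 0.5, so precision is 2/4 and recall is 2/3.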
78	Modelling gametocytes in the presence of interval-censoring Takawira, Michaela Faro C 29 June 2022 (has links) Malaria is a parasitic disease that has afflicted many over the years, with Plasmodium falciparum malaria accounting for many deaths. Gametocytes are the sexual form of the parasite responsible for transmitting and spreading malaria from the human host to the mosquito vector. Most studies are designed to measure asexual parasites and hence the measurement intervals are not optimal for measuring gametocytes. The data analysed was obtained from a series of clinical trials conducted between 2002 and 2004 in Mpumalanga, South Africa and Mozambique, as part of the South-East African Combination Antimalarial Therapy (SEACAT) evaluation of the phased introduction of combination antimalarial drug under the Lubombo Spatial Development Initiative. Patients were observed on days 0, 3, 7, 14, 21, 28 and 42, where blood samples were collected and analysed, providing gametocyte densities and other information. Owing to the study design, observing gametocyte profiles was complicated by censoring. Censoring occurs when an event has not been observed, and hence the event time is only partially known. Censoring occurs for various reasons, which include the following. In some instances, a patient may enter a study with gametocytes already circulating in their system. The time at which gametocytes emerged is therefore unknown. This type of censoring is known as left censoring. Another example of censoring occurs when a patient is lost to follow-up or leaves the study before the end of the study period due to treatment failure, leaving the gametocyte profile of the patient incomplete and resulting in right censoring. Lastly, besides left and right censoring, the gametocyte data has been affected by interval-censoring.
Patients are observed and monitored on specific days; thus, the actual moment of gametocyte emergence or clearance is only known to have occurred between two observation days. The gametocyte data is thus also characterized by interval-censoring. Given these interesting characteristics of the gametocyte data, this research aims to directly model gametocytes while taking into account censoring. Researchers often opt to assume the times to events of interest are at the moment of observation (right endpoint of an interval) or midpoint of the interval in which the event is assumed to have occurred. In this research, we also investigate and discuss the impact of ignoring the interval-censored mechanism present in the data and how parameter estimates based on these ad hoc approaches might differ from interval-censored results. Following the above discussion, the research will then apply interval-censored techniques to the data. Several survival analysis models were applied to the gametocyte data. These models included the Cox Proportional Hazards (PH) model, parametric PH model and accelerated failure time models, which were used to illustrate how results may differ based on whether interval-censoring was taken into account or not. From the analysis, it was observed that the midpoint imputation was a better proxy for interval-censoring compared to the right-imputation method and that ignoring interval censoring had more of an impact on events that occurred during the wider intervals towards the latter part of the observation period. From the clinical perspective, it was found that younger patients who had high levels of baseline asexual parasitemia, quintuple mutations and used sulfadoxine-pyrimethamine were estimated to experience the most prolonged duration of gametocytemia. It was also found that age, treatment and baseline asexual parasitemia influenced whether a patient would develop gametocytes. Statistical Sciences
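The censoring types and the two ad hoc imputation rules discussed in entry 78 can be made concrete with a short sketch. The visit schedule comes from the abstract; the function names, the per-visit detection flags, and the patient records are hypothetical, and the code illustrates only the imputation step, not the survival models fitted in the thesis.

```python
OBSERVATION_DAYS = [0, 3, 7, 14, 21, 28, 42]  # visit schedule from the abstract

def emergence_interval(detected, days=OBSERVATION_DAYS):
    """Bracket the gametocyte emergence time from per-visit detection flags.

    Returns (left, right); None marks an unknown bound.
    """
    if not detected:
        raise ValueError("at least one visit is required")
    for i, positive in enumerate(detected):
        if positive:
            if i == 0:
                return (None, days[0])      # left-censored: present at enrolment
            return (days[i - 1], days[i])   # interval-censored
    return (days[len(detected) - 1], None)  # right-censored: never observed

def impute(interval, method):
    """Replace an interval-censored time by a single ad hoc value."""
    left, right = interval
    if left is None or right is None:
        raise ValueError("only interval-censored times can be imputed here")
    if method == "midpoint":
        return (left + right) / 2
    if method == "right":
        return float(right)
    raise ValueError("method must be 'midpoint' or 'right'")

# Hypothetical patients: first detection on day 14 and on day 42.
early = emergence_interval([False, False, False, True])
late = emergence_interval([False] * 6 + [True])
early_gap = impute(early, "right") - impute(early, "midpoint")
late_gap = impute(late, "right") - impute(late, "midpoint")
```

For the interval between days 7 and 14 the two rules differ by 3.5 days, while for the interval between days 28 and 42 they differ by 7 days, which illustrates why the choice of imputation matters more in the wider, later intervals.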
79	Market state discovery Singo, Unarine 21 April 2023 (has links) (PDF) We explore the concept of financial market state discovery by assessing the robustness of two unsupervised machine learning algorithms: Inverse Covariance Clustering (ICC) and Agglomerative Super Paramagnetic Clustering (ASPC). The assessment is carried out by: simulating market datasets varying in complexity; implementing ICC and ASPC to estimate the underlying states (using only simulated log-returns as inputs); and measuring the algorithms' ability to recover the underlying states, using the Adjusted Rand Index (ARI) as a performance metric. Experiments revealed that ASPC is a more robust and better performing algorithm than ICC. ICC is able to produce competitive results in 2-state markets; however, ICC's primary disadvantage is its inability to maintain strong performance in 3, 4 and 5-state markets. For example, ASPC produced ARI values that were up to 800% higher than those of ICC in 5-state markets. Furthermore, ASPC does not rely on the art of selecting good hyper-parameters, such as the number of states, a priori. ICC's utility as a market state discovery algorithm is limited. Statistical Sciences
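Entry 79 scores recovered market states with the Adjusted Rand Index. A minimal pure-Python sketch of the standard ARI formula follows; the state labels are hypothetical, and returning 1.0 in the degenerate case (for example, both labellings placing everything in one cluster) is a convention assumed here, not something stated in the abstract.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, predicted_labels):
    """Chance-corrected agreement between two labellings of the same items."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("labelings must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("at least two items are required")
    # Pair counts within contingency-table cells, rows and columns.
    cells = Counter(zip(true_labels, predicted_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case (assumed convention)
    return (sum_cells - expected) / (maximum - expected)

# Hypothetical true and estimated market states for eight time points.
true_states = [0, 0, 0, 1, 1, 1, 2, 2]
relabelled = [2, 2, 2, 0, 0, 0, 1, 1]  # same partition, different names
score = adjusted_rand_index(true_states, relabelled)
```

The index is invariant to how clusters are named, so a clustering that recovers the true partition under different labels still scores 1; values near 0 indicate agreement no better than chance.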
80	Automated detection and classification of red roman in unconstrained underwater environments using Mask R-CNN Conrady, Christopher 09 February 2022 (has links) The availability of relatively cheap, high-resolution digital cameras has led to an exponential increase in the capture of natural environments and their inhabitants. Video-based surveys are particularly useful in the underwater domain where observation by humans can be expensive, dangerous, inaccessible, or destructive to the natural environment. Moreover, video-based surveys offer an unedited record of biodiversity at a given point in time – one that is not reliant on human recall or susceptible to observer bias. In addition, secondary data that is useful in scientific study (date, time, location, etc.) are by default stored in almost all digital formats as metadata. When analysed effectively, this growing body of digital data offers the opportunity for robust and independently reproducible scientific study of marine biodiversity (and how this might change over time, for example). However, the manual review of image and video data by humans is slow, expensive, and not scalable. A large majority of marine data has never gone through analysis by human experts. This necessitates computer-based (or automated) methods of analysis that can be deployed at a fraction of the time and cost, at a comparable accuracy. Mask R-CNN, a deep learning object recognition framework, has outperformed all previous state-of-the-art results on competitive benchmarking tasks. Despite this success, Mask R-CNN and other state-of-the-art object recognition techniques have not been widely applied in the underwater domain, and not at all within the context of South Africa.
To address this gap in the literature, this thesis contributes (i) a novel image dataset of red roman (Chrysoblephus laticeps), a fish species endemic to Southern Africa, and (ii) a Mask R-CNN framework for the automated localisation, classification, counting, and tracking of red roman in unconstrained underwater environments. The model, trained on an 80:10:10 split, accurately detected and classified red roman on the training dataset (mAP50 = 80.29%), validation dataset (mAP50 = 80.35%), as well as on previously unseen footage (test dataset) (mAP50 = 81.45%). The fact that the model performs equally well on unseen footage suggests that it is capable of generalising to new streams of data not used in this research – this is critical for the utility of any statistical model outside of “laboratory conditions”. This research serves as a proof-of-concept that machine learning based methods of video analysis of marine data can replace or at least supplement human analysis. Statistical Sciences

Search results