11 |
Performance evaluation of deep learning object detectors for weed detection and real time deployment in cotton fields. Rahman, Abdur. 13 August 2024.
Effective weed control is crucial, especially for herbicide-resistant species. Machine vision technology, through weed detection and localization, can facilitate precise, species-specific treatments. Despite the challenges posed by unstructured field conditions and weed variability, deep learning (DL) algorithms show promise. This study evaluated thirteen DL-based weed detection models, including YOLOv5, RetinaNet, EfficientDet, Fast RCNN, and Faster RCNN, using pre-trained object detectors. RetinaNet (R101-FPN) achieved the highest accuracy with a mean average precision (mAP@0.50) of 79.98%, though it had longer inference times. YOLOv5n, with the fastest inference (17 ms on Google Colab) and only 1.8 million parameters, achieved a comparable 76.58% mAP@0.50, making it suitable for real-time use on resource-limited devices. A prototype using YOLOv5 was tested on two datasets, showing good real-time accuracy on In-season data and comparable results on Cross-season data, despite some accuracy challenges due to dataset distribution shifts.
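To make the deployment path concrete, the sketch below loads a YOLOv5 detector through PyTorch Hub and runs single-image inference; the weights file, image path, and confidence threshold are placeholders rather than artifacts of this study.

```python
# Minimal sketch: YOLOv5 inference on a field image.
# Assumes the ultralytics/yolov5 repo is reachable via torch.hub and that
# custom weights (e.g. trained on a weed dataset) exist locally.
import torch

# "weeds_yolov5n.pt" is a placeholder for study-specific weights.
model = torch.hub.load("ultralytics/yolov5", "custom", path="weeds_yolov5n.pt")
model.conf = 0.25  # confidence threshold for reported detections

results = model("cotton_row.jpg")      # placeholder path to a field image
detections = results.pandas().xyxy[0]  # one row per detected box
print(detections[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```

On an edge device, the same call would run per video frame, which is where YOLOv5n's 17 ms inference time matters.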
|
12 |
Data Augmentation GUI Tool for Machine Learning Models. Sharma, Sweta. 30 October 2023.
The industrial production of semiconductor assemblies is subject to high quality requirements, so several tests of component quality are needed. In the long run, manual quality assurance (QA) is often associated with higher costs. Using machine learning, some of these tests may be carried out automatically. Deep neural networks (NN) have proven very effective in a diverse range of computer vision applications. Convolutional neural networks (CNN), a subset of NN, are a particularly effective tool for image classification. Deep NNs have the disadvantage of requiring a significant quantity of training data to reach excellent performance; when the dataset is too small, a phenomenon known as overfitting can occur. In certain contexts, such as semiconductor production, massive amounts of data cannot be supplied, especially given the relatively low number of rejected components in this field. To prevent overfitting, a variety of image augmentation methods may be used to artificially create additional training images, although many of them are inapplicable in certain fields. For this thesis, Infineon Technologies AG provided images of a semiconductor component generated by an ultrasonic microscope. The dataset contains a sufficient number of good components and a minority of rejected components, where good components are those that passed quality control and rejected components are those that contain a defect and did not pass quality control.
The success of such a project, the efficiency with which it is carried out, and its quality depend on a number of factors; selecting the appropriate tools is one of the most important, because it saves significant time and resources while producing the best results. We demonstrate a data augmentation graphical user interface (GUI) tool; data augmentation is widely used in the domain of image processing. Using this tool, the dataset size is increased while maintaining the accuracy-time trade-off and improving the robustness of deep learning models. The purpose of this work is to develop a user-friendly tool that incorporates traditional, advanced, and smart data augmentation, image processing, and machine learning (ML) approaches. The techniques used include zooming, rotation, flipping, cropping, GANs, fusion, histogram matching, autoencoders, image restoration, and compression. The thesis focuses on implementing and designing a MATLAB GUI for data augmentation and ML models. It was carried out for Infineon Technologies AG to address a challenge that all semiconductor manufacturers experience. The key objective is not only to create an easy-to-use GUI, but also to ensure that users do not need advanced technical experience to operate it. The GUI runs as a standalone application and can be deployed anywhere for data augmentation and classification. The aim is to streamline the workflow and make it easy to complete quality assurance tasks even for those who are not familiar with data augmentation, machine learning, or MATLAB. In addition, the research investigates the benefits of data augmentation and image processing, and whether they can improve the accuracy of AI models.
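For readers unfamiliar with the classical transformations listed above, the following sketch applies a rotation, a horizontal flip, and a central zoom with Pillow; the thesis itself implements these (and the advanced methods) inside a MATLAB GUI, so this is only an illustrative stand-in with placeholder file names.

```python
# Minimal sketch of three classical augmentations (rotation, flip, zoom/crop).
from PIL import Image, ImageOps

def classical_augmentations(path):
    img = Image.open(path)
    w, h = img.size
    return {
        "rot15": img.rotate(15, expand=True),        # rotation by 15 degrees
        "hflip": ImageOps.mirror(img),               # horizontal flip
        # central zoom: keep the middle 80% and resize back to the original size
        "zoom": img.crop((int(0.1 * w), int(0.1 * h),
                          int(0.9 * w), int(0.9 * h))).resize((w, h)),
    }

# Example usage with a placeholder scan of a component:
# for name, im in classical_augmentations("component_scan.png").items():
#     im.save(f"component_scan_{name}.png")
```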
|
13 |
A statistical framework for estimating output-specific efficiencies. Gstach, Dieter. January 2003.
This paper presents a statistical framework for estimating output-specific efficiencies in the 2-output case based on a DEA frontier estimate. The key to the approach is the concept of a target output-mix. Because target output-mixes of firms are usually unobserved, they are modelled as missing data. Using this concept, the relevant data generating process can be formulated. The resulting likelihood function is analytically intractable, so a data-augmented Bayesian approach is proposed for estimation and adapted to the present purpose. Some implementation issues are discussed, leading to an empirical Bayes setup with data-informed priors. A proof of scale invariance is provided. (author's abstract) / Series: Department of Economics Working Paper Series
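For orientation, the sketch below computes standard output-oriented, constant-returns DEA efficiency scores by linear programming; this is only the textbook frontier estimate that the paper builds on, not its output-specific extension, and the input/output data are invented for illustration.

```python
# Minimal sketch: output-oriented CRS DEA scores via linear programming.
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0], [3.0], [4.0], [5.0]])                       # one input per firm
Y = np.array([[1.0, 2.0], [2.0, 2.5], [3.0, 2.0], [3.0, 4.0]])   # two outputs per firm

def output_expansion(o, X, Y):
    """Largest factor phi by which firm o's outputs can be scaled within the frontier."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate(([-1.0], np.zeros(n)))       # maximize phi
    A_out = np.hstack((Y[o].reshape(-1, 1), -Y.T))  # phi * y_o - Y'lambda <= 0
    A_in = np.hstack((np.zeros((m, 1)), X.T))       # X'lambda <= x_o
    A_ub = np.vstack((A_out, A_in))
    b_ub = np.concatenate((np.zeros(s), X[o]))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + n))
    return res.x[0]                                  # phi >= 1; 1/phi is the efficiency

for o in range(len(X)):
    phi = output_expansion(o, X, Y)
    print(f"firm {o}: phi = {phi:.3f}, output efficiency = {1 / phi:.3f}")
```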
|
14 |
Data Augmentation and Dynamic Linear Models. Frühwirth-Schnatter, Sylvia. January 1992.
We define a subclass of dynamic linear models with unknown hyperparameters called d-inverse-gamma models. We then approximate the marginal p.d.f.s of the hyperparameter and the state vector by the data augmentation algorithm of Tanner/Wong. We prove that the regularity conditions for convergence hold. A sampling based scheme for practical implementation is discussed. Finally, we illustrate how to obtain an iterative importance sampling estimate of the model likelihood. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
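The two-block form of the data augmentation algorithm is easiest to see on a small textbook example; the sketch below applies it to the classic genetic-linkage data rather than to the d-inverse-gamma dynamic linear models treated in the paper.

```python
# Minimal sketch of the data augmentation (imputation/posterior) iteration on the
# genetic-linkage example: multinomial counts with cell probabilities
# (1/2 + t/4, (1-t)/4, (1-t)/4, t/4); the first cell is split into a latent part.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([125, 18, 20, 34])   # observed counts
theta = 0.5                       # starting value
draws = []

for _ in range(5000):
    # Imputation step: latent count z from the first cell, with success
    # probability (theta/4) / (1/2 + theta/4).
    p = (theta / 4) / (0.5 + theta / 4)
    z = rng.binomial(x[0], p)
    # Posterior step: with a flat prior, theta | z ~ Beta(z + x[3] + 1, x[1] + x[2] + 1).
    theta = rng.beta(z + x[3] + 1, x[1] + x[2] + 1)
    draws.append(theta)

print("posterior mean of theta:", round(float(np.mean(draws[1000:])), 3))
```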
|
15 |
Statistical methods for species richness estimation using count data from multiple sampling units. Argyle, Angus Gordon. 23 April 2012.
The planet is experiencing a dramatic loss of species. The majority of species are unknown to science, and it is usually infeasible to conduct a census of a region to acquire a complete inventory of all life forms. Therefore, it is important to estimate and conduct statistical inference on the total number of species in a region based on samples obtained from field observations. Such estimates may suggest the number of species new to science and at potential risk of extinction.
In this thesis, we develop novel methodology to conduct statistical inference, based on abundance-based data collected from multiple sampling locations, on the number of species within a taxonomic group residing in a region. The primary contribution of this work is the formulation of novel statistical methodology for analysis in this setting, where abundances of species are recorded at multiple sampling units across a region. This particular area has received relatively little attention in the literature.
In the first chapter, the problem of estimating the number of species is formulated in a broad context, one that occurs in several seemingly unrelated fields of study. Estimators are commonly developed from statistical sampling models. Depending on the organisms or objects under study, different sampling techniques are used, and consequently, a variety of statistical models have been developed for this problem. A review of existing estimation methods, categorized by the associated sampling model, is presented in the second chapter.
The third chapter develops a new negative binomial mixture model. The negative binomial model is employed to account for the common tendency of individuals of a particular species to occur in clusters. An exponential mixing distribution permits inference on the number of species that exist in the region but were absent from the sampling units. Adopting a classical approach to statistical inference, we develop the maximum likelihood estimator and a corresponding profile log-likelihood interval estimate of species richness. In addition, a Gaussian confidence interval based on large-sample theory is presented.
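As a simplified illustration of likelihood-based richness estimation (using a zero-truncated negative binomial rather than the exponential-mixed model developed in this chapter, and with invented counts), one can fit the truncated distribution to the pooled per-species abundances and scale the number of observed species by the estimated detection probability:

```python
# Sketch: zero-truncated negative binomial fit and a plug-in richness estimate.
import numpy as np
from scipy import optimize, stats

counts = np.array([1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 7, 9, 12, 20, 31])  # species seen at least once

def neg_loglik(params):
    r, p = params
    if r <= 0 or not 0 < p < 1:
        return np.inf
    logpmf = stats.nbinom.logpmf(counts, r, p)
    log_trunc = np.log1p(-stats.nbinom.pmf(0, r, p))   # condition on count > 0
    return -(logpmf - log_trunc).sum()

res = optimize.minimize(neg_loglik, x0=[1.0, 0.5], method="Nelder-Mead")
r_hat, p_hat = res.x
p_zero = stats.nbinom.pmf(0, r_hat, p_hat)             # estimated P(a species goes unseen)
richness = len(counts) / (1.0 - p_zero)
print(f"observed species: {len(counts)}, estimated richness: {richness:.1f}")
```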
The fourth chapter further extends the hierarchical model developed in Chapter 3 into a Bayesian framework. The motivation for the Bayesian paradigm is explained, and a hierarchical model based on random effects and discrete latent variables is presented. Computing the posterior distribution in this case is not straightforward. A data augmentation technique that indirectly places priors on species richness is employed to fit the model using a Metropolis-Hastings algorithm.
The fifth chapter examines the performance of our new methodology. Simulation studies are used to examine the mean-squared error of our proposed estimators. Comparisons to several commonly-used non-parametric estimators are made. Several conclusions emerge, and settings where our approaches can yield superior performance are clarified.
In the sixth chapter, we present a case study. The methodology is applied to a real data set of oribatid mites (a taxonomic order of micro-arthropods) collected from multiple sites in a tropical rainforest in Panama. We adjust our statistical sampling models to account for the varying masses of material sampled from the sites. The resulting estimates of species richness for the oribatid mites are useful, and contribute to a wider investigation, currently underway, examining the species richness of all arthropods in the rainforest.
Our approaches are the only existing methods that can make full use of the abundance-based data from multiple sampling units located in a single region. The seventh and final chapter concludes the thesis with a discussion of key considerations related to implementation and modeling assumptions, and describes potential avenues for further investigation. / Graduate
|
16 |
Disease Mapping with log Gaussian Cox Processes. Li, Ye. 16 August 2013.
One of the main classes of spatial epidemiological studies is disease mapping, where the main aim is to describe the overall disease distribution on a map, for example, to highlight areas of elevated or lowered mortality or morbidity risk, or to identify important social or environmental risk factors while adjusting for the spatial distribution of the disease. This thesis focuses on, and proposes solutions to, the two most common obstacles in disease mapping applications: census boundaries that change over a long study period, and data aggregation imposed to protect patients' confidentiality.
In disease mapping, when target diseases have low prevalence, the study usually covers a long time period to accumulate sufficient cases.
However, during this period, numerous irregular changes in the census regions on which population is reported may occur, which complicates inferences.
A new model was developed for the case when the exact location of the cases is available, consisting of a continuous random spatial surface and fixed effects for time and ages of individuals.
The process is modelled on a fine grid, approximating the underlying continuous risk surface with a Gaussian Markov random field, and Bayesian inference is performed using integrated nested Laplace approximations. The model was applied to clinical data on the location of residence at the time of diagnosis of new Lupus cases in Toronto, Canada, for the 40 years to 2007, with the aim of finding areas of abnormally high risk. Predicted risk surfaces and posterior exceedance probabilities are produced for Lupus and, for comparison, Psoriatic Arthritis data from the same clinic.
Simulation studies are also carried out to better understand the performance of the proposed model as well as to compare with existing methods.
When the exact locations of the cases are not known, inference is complicated by the uncertainty of case locations due to data aggregation on census regions for confidentiality.
Conventional modelling relies on census boundaries that are unrelated to the biological process being modelled, and may result in stronger spatial dependence in less populated regions, which then dominate the map. A new model was developed consisting of a continuous random spatial surface with aggregated responses and fixed covariate effects at the census region level.
The continuous spatial surface was approximated by a Markov random field, which greatly reduces the computational complexity.
The process was modelled on a lattice of fine grid cells and Bayesian inference was performed using Markov Chain Monte Carlo with data augmentation.
Simulation studies were carried out to assess the performance of the proposed model and to compare it with the conventional Besag-York-Mollié model as well as a model assuming exact locations are known. Receiver operating characteristic curves and mean integrated squared errors were used as measures of performance. For the application, surveillance data on the locations of residence at the time of diagnosis of syphilis cases in North Carolina for the 9 years to 2007 are modelled with the aim of finding areas of abnormally high risk. Predicted risk surfaces and posterior exceedance probabilities are also produced, identifying Lumberton as a "syphilis hotspot".
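The exceedance probabilities mentioned above are a direct summary of posterior draws of the risk surface; a minimal sketch, with simulated draws standing in for MCMC output, is:

```python
# Sketch: posterior exceedance probabilities for a gridded risk surface.
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_cells = 2000, 400                    # e.g. posterior draws on a 20 x 20 grid
log_risk = rng.normal(0.0, 0.3, size=(n_draws, n_cells))
log_risk[:, :40] += 0.5                         # pretend the first 40 cells are elevated
risk = np.exp(log_risk)

threshold = 1.5                                 # 50% above the regional average
exceed_prob = (risk > threshold).mean(axis=0)   # P(risk in cell > threshold | data)
hotspots = np.flatnonzero(exceed_prob > 0.8)    # cells flagged with high confidence
print(f"{hotspots.size} of {n_cells} cells exceed the threshold with probability > 0.8")
```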
|
18 |
Development of a Monte Carlo method for autonomous learning of Gaussian mixture distributions (Entwicklung eines Monte-Carlo-Verfahrens zum selbständigen Lernen von Gauß-Mischverteilungen). Lauer, Martin. Unknown date.
Universität Osnabrück, Diss., 2004.
|
19 |
Human pose augmentation for facilitating Violence Detection in videos: a combination of the deep learning methods DensePose and VioNet. Calzavara, Ivan. January 2020.
In recent years, deep learning, a critical technology in computer vision, has achieved remarkable milestones in many fields, such as image classification and object detection. In particular, it has also been applied to violence detection, a big challenge given how difficult it is to establish an exact definition of violence. Thanks to the ever-increasing development of surveillance technology, we now have access to enormous databases of video that can be analyzed for abnormal behavior. With such huge amounts of data, however, it is unrealistic to examine everything manually. Deep learning techniques, by contrast, can automatically learn from the data and perform classification. In the context of violence detection, extracting visually harmful patterns makes it possible to design descriptors that represent the features identifying them. In this research we tackle the task of generating new augmented datasets in order to simplify the identification step performed by a deep learning violence detection technique. The novelty of this work is the use of the DensePose model to enrich the images in a dataset by highlighting (i.e. identifying and segmenting) all the human beings present in them. With this approach we learned how the algorithm performs on videos with violent content and how the violence detection network benefits from this procedure. Performance was evaluated in terms of segmentation accuracy, the effectiveness of the violence detection network, and computational cost. The results show that scene context is the main factor that leads DensePose to segment human beings correctly, and that violent scenes are not well suited to the model, since the frequent overlap of bodies (a distinctive aspect of violence) works against the segmentation. For this reason, the violence detection network does not exploit its full potential. Finally, we found that such augmented datasets can speed up training by reducing the time needed for the weight-update phase, making this procedure a helpful add-on for implementations in other contexts where the identification of human beings plays a major role.
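A rough sketch of the augmentation step is shown below: each frame is overlaid with a person-segmentation mask before being passed to the violence detection network. The `person_mask` function is a hypothetical placeholder standing in for a DensePose-style model, and the file names are invented.

```python
# Sketch: build an augmented video in which detected body pixels are highlighted.
import cv2
import numpy as np

def person_mask(frame: np.ndarray) -> np.ndarray:
    """Placeholder for a DensePose-style model: boolean mask of person pixels."""
    return np.zeros(frame.shape[:2], dtype=bool)

def highlight_people(video_in: str, video_out: str, color=(0, 0, 255)) -> None:
    cap = cv2.VideoCapture(video_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(video_out, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        overlay = frame.copy()
        overlay[person_mask(frame)] = color            # paint segmented body pixels
        out.write(cv2.addWeighted(frame, 0.5, overlay, 0.5, 0))
    cap.release()
    out.release()

# highlight_people("clip.mp4", "clip_augmented.mp4")   # placeholder paths
```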
|
20 |
A Study on Generative Adversarial Networks Exacerbating Social Data Bias. January 2020.
Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer an as-yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that this effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases which exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks using GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN's generator "imagines" faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat's explosively popular GAN-enabled "My Twin" selfie lens, which consistently lightens the skin tone for women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications. / Dissertation/Thesis / Masters Thesis Computer Science 2020
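One simple way to quantify the divergence described above is to compare category frequencies of a sensitive attribute between real and generated samples; the sketch below uses the Kullback-Leibler divergence with invented frequencies.

```python
# Sketch: KL divergence between real and generated attribute distributions.
import numpy as np
from scipy.stats import entropy

# Proportions over (white male, white female, non-white male, non-white female).
real      = np.array([0.55, 0.15, 0.20, 0.10])   # skewed training set
generated = np.array([0.75, 0.08, 0.13, 0.04])   # mode-collapsed generator output

kl = entropy(real, qk=generated)   # KL(real || generated), in nats
print(f"KL(real || generated) = {kl:.3f}")
```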
|