1

Genomic Data Augmentation with Variational Autoencoder

Thyrum, Emily 12 1900 (has links)
In order to treat cancer effectively, medical practitioners must predict pathological stages accurately, and machine learning methods can be employed to make such predictions. However, biomedical datasets, including genomic datasets, often have disproportionately more samples from people of European ancestry than from other ethnic or racial groups, which can cause machine learning methods to perform better on the European samples than on those from under-represented groups. Data augmentation can be employed as a potential solution to artificially increase the number of samples from under-represented racial groups, and can in turn improve pathological stage predictions for future patients from such groups. Genomic data augmentation has been explored previously, for example using a generative adversarial network, but to the best of our knowledge the use of the variational autoencoder for genomic data augmentation remains largely unexplored. Here we utilize a geometry-based variational autoencoder that models the latent space as a Riemannian manifold, so that samples can be generated without the use of a prior distribution, and show that the variational autoencoder can indeed be used to reliably augment genomic data. Using TCGA prostate cancer genotype data, we show that our VAE-generated data can improve pathological stage predictions on a test set of European samples. Because only the European samples were labeled with pathological stage, we could not validate the generated African samples in the same way, but we nevertheless attempt to show that such samples may be realistic. / Computer and Information Science
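The generation step the abstract describes rests on the standard VAE sampling machinery: encode a sample to a latent mean and variance, draw a latent vector via the reparameterization trick, and decode it back to data space. The sketch below illustrates only that generic step, with a hypothetical toy decoder and made-up dimensions; it is not the geometry-based, Riemannian-latent-space VAE the thesis actually uses.

```python
import math
import random

random.seed(0)

LATENT_DIM = 4  # toy latent size (assumption; not taken from the thesis)
DATA_DIM = 8    # toy genotype-vector length (assumption)

# Hypothetical encoder output for one minority-group sample:
# a mean and log-variance per latent dimension.
mu = [0.5, -0.2, 0.1, 0.0]
log_var = [-1.0, -1.2, -0.8, -1.5]

# Toy linear "decoder" weights, a stand-in for a trained network.
W = [[random.gauss(0.0, 0.3) for _ in range(LATENT_DIM)] for _ in range(DATA_DIM)]

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def decode(z):
    """Map a latent draw back to data space (sigmoid of a linear map)."""
    return [1.0 / (1.0 + math.exp(-sum(w * zj for w, zj in zip(row, z))))
            for row in W]

# Generate a few synthetic samples in the neighbourhood of the encoded sample.
synthetic = [decode(sample_latent(mu, log_var)) for _ in range(3)]
for s in synthetic:
    print([round(v, 3) for v in s])
```

Each decoded vector is a new synthetic data point; in the thesis the analogous draws come from the learned Riemannian latent geometry rather than a Gaussian around one encoding.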
2

Outcome-Driven Clustering of Microarray Data

Hsu, Jessie 17 September 2012 (has links)
The rapid technological development of high-throughput genomics has given rise to complex high-dimensional microarray datasets. One strategy for reducing the dimensionality of microarray experiments is to carry out a cluster analysis to find groups of genes with similar expression patterns. Though cluster analysis has been studied extensively, the clinical context in which the analysis is performed is usually considered separately if at all. However, allowing clinical outcomes to inform the clustering of microarray data has the potential to identify gene clusters that are more useful for describing the clinical course of disease. The aim of this dissertation is to utilize outcome information to drive the clustering of gene expression data. In Chapter 1, we propose a joint clustering model that assumes a relationship between gene clusters and a continuous patient outcome. Gene expression is modeled using cluster specific random effects such that genes in the same cluster are correlated. A linear combination of these random effects is then used to describe the continuous clinical outcome. We implement a Markov chain Monte Carlo algorithm to iteratively sample the unknown parameters and determine the cluster pattern. Chapter 2 extends this model to binary and failure time outcomes. Our strategy is to augment the data with a latent continuous representation of the outcome and specify that the risk of the event depends on the latent variable. Once the latent variable is sampled, we relate it to gene expression via cluster specific random effects and apply the methods developed in Chapter 1. The setting of clustering longitudinal microarrays using binary and survival outcomes is considered in Chapter 3. We propose a model that incorporates a random intercept and slope to describe the gene expression time trajectory. 
As before, a continuous latent variable that is linearly related to the random effects is introduced into the model and a Markov chain Monte Carlo algorithm is used for sampling. These methods are applied to microarray data from trauma patients in the Inflammation and Host Response to Injury research project. The resulting partitions are visualized using heat maps that depict the frequency with which genes cluster together.
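The latent-variable augmentation described for binary outcomes in Chapter 2 follows the same pattern as the classic probit data-augmentation sampler: draw each latent continuous variable from a normal truncated to the side implied by its binary outcome, then draw the regression parameter from its conjugate posterior. A minimal intercept-only sketch, far simpler than the gene-cluster model of the dissertation:

```python
import math
import random

random.seed(1)

# Binary outcomes: intercept-only probit model,
# y_i = 1  iff  z_i > 0,  with latent z_i ~ N(beta, 1).
y = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
n = len(y)

def truncated_normal(mean, lower=None, upper=None):
    """Rejection draw from N(mean, 1) truncated to one side (toy-scale only)."""
    while True:
        z = random.gauss(mean, 1.0)
        if lower is not None and z > lower:
            return z
        if upper is not None and z < upper:
            return z

beta = 0.0  # current draw of the intercept
draws = []
for _ in range(2000):  # Gibbs sampler
    # Augmentation step: draw each latent z_i consistent with its binary y_i.
    z = [truncated_normal(beta, lower=0.0) if yi == 1
         else truncated_normal(beta, upper=0.0) for yi in y]
    # Parameter step: with a flat prior, beta | z ~ N(mean(z), 1/n).
    zbar = sum(z) / n
    beta = random.gauss(zbar, 1.0 / math.sqrt(n))
    draws.append(beta)

post_mean = sum(draws[500:]) / len(draws[500:])
print(round(post_mean, 2))
```

The dissertation's version replaces the intercept with a linear combination of cluster-specific random effects, but the augment-then-sample alternation is the same.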
3

Heavy-Tailed Innovations in the R Package stochvol

Kastner, Gregor January 2015 (has links) (PDF)
We document how sampling from a conditional Student's t distribution is implemented in stochvol. Moreover, a simple example using EUR/CHF exchange rates illustrates how to use the augmented sampler. We conclude with results and implications. (author's abstract)
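Sampling conditional Student's t innovations is typically implemented through the scale-mixture-of-normals representation: a t_nu draw equals a normal draw whose variance is an inverse-gamma mixing variable. The sketch below shows that generic representation (it is not stochvol's actual sampler code):

```python
import random

random.seed(2)

nu = 5.0  # degrees of freedom (> 2 so the variance exists)

def student_t_draw(nu):
    """t_nu via the scale mixture: lam ~ InvGamma(nu/2, nu/2), eps ~ N(0, lam).

    If G ~ Gamma(nu/2, scale=1), then (nu/2)/G ~ InvGamma(nu/2, nu/2).
    """
    lam = (nu / 2.0) / random.gammavariate(nu / 2.0, 1.0)
    return lam ** 0.5 * random.gauss(0.0, 1.0)

draws = [student_t_draw(nu) for _ in range(100000)]
var = sum(d * d for d in draws) / len(draws)
print(round(var, 2))  # should be near nu / (nu - 2) = 5/3
```

Conditioning on the mixing variables lam reduces the model to a Gaussian one, which is what makes the augmented sampler convenient inside an MCMC scheme.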
4

Design Space Exploration of MobileNet for Suitable Hardware Deployment

DEBJYOTI SINHA (8764737) 28 April 2020 (has links)
Designing self-regulating machines that can see and comprehend the real-world objects around them is a central aim of AI. Recently, there have been marked advances in deep learning toward state-of-the-art DNNs for various computer vision applications. Deploying these DNNs on resource-constrained microcontroller units is challenging because they are often quite memory intensive. Design space exploration (DSE) is a technique that makes a CNN/DNN more memory efficient and more flexible to deploy on resource-constrained hardware. MobileNet is a small DNN architecture designed for embedded and mobile vision, but researchers still face many challenges in deploying it on resource-limited real-time processors.
This thesis proposes three new DNN architectures developed using the design space exploration technique. The state-of-the-art MobileNet baseline architecture serves as their foundation; they are enhanced versions of it. DSE techniques such as data augmentation, architecture tuning, and architecture modification were applied to improve the baseline architecture. First, the Thin MobileNet architecture is proposed, which uses more intricate block modules than the baseline MobileNet; it is a compact, efficient, and flexible architecture with good model accuracy. To obtain still more compact models, the KilobyteNet and Ultra-thin MobileNet architectures are proposed, introducing techniques such as channel-depth alteration and hyperparameter tuning along with some of the techniques used for the Thin MobileNet. All models are trained and validated from scratch on the CIFAR-10 dataset, and the experimental (training and testing) results can be visualized with the live accuracy and log-loss graphs provided by the Liveloss package. The Ultra-thin MobileNet model offers the best balance of accuracy and model size of the three, and it is therefore deployed on the NXP i.MX RT1060 embedded hardware unit for an image classification application.
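The memory savings that make MobileNet deployable at all come from replacing standard convolutions with depthwise separable ones; the parameter arithmetic is easy to check directly (toy layer sizes chosen for illustration):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then 1 x 1 pointwise to c_out."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)        # 3*3*64*128 = 73728
sep = depthwise_separable_params(k, c_in, c_out)  # 3*3*64 + 64*128 = 8768
print(std, sep, round(std / sep, 1))
```

For this layer the separable form needs roughly an eighth of the weights, which is the kind of headroom the thesis then spends on further thinning (channel-depth alteration) for microcontroller deployment.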
5

COLOR HALFTONING AND ACOUSTIC ANOMALY DETECTION FOR PRINTING SYSTEMS

Chin-ning Chen (9128687) 12 October 2021 (has links)
In the first chapter, we give an overview of printing systems and the focus of this dissertation.
In the second chapter, we present a tone-dependent fast error diffusion algorithm for color images, in which the quantizer is based on a simulated linearized printer space and the filter weight function depends on the ratio of the luminance of the current pixel to the maximum luminance value. The pixels are processed in a serpentine scan rather than the classic raster scan. We compare the results of our algorithm to those obtained with the fixed Floyd-Steinberg weights and raster-scan processing. In the third chapter, we first design a defect generator to synthesize abnormal printer sounds, and then develop or explore three features for sound-based anomaly detection. In the fourth chapter, we explore six classifiers as our anomaly detection models, and explore or develop six augmentation methods to see whether an augmented dataset can improve model performance. In the fifth chapter, we describe the data arrangement and the evaluation methods, and then present evaluation results for different inputs, features, and classifiers.
In the last chapter, we summarize the contributions of this dissertation.
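The fixed-weight Floyd-Steinberg baseline with a serpentine scan, which the dissertation's tone-dependent algorithm is compared against, can be sketched in a few lines (this is the generic textbook version, not the proposed tone-dependent quantizer):

```python
def halftone_serpentine(img):
    """Floyd-Steinberg error diffusion with a serpentine scan.

    img: 2D list of grayscale values in [0, 255]; returns a 0/255 bitmap.
    Classic fixed weights (7/16, 3/16, 5/16, 1/16), mirrored on
    right-to-left rows via the scan direction `step`.
    """
    h, w = len(img), len(img[0])
    buf = [[float(v) for v in row] for row in img]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        rng = range(w) if y % 2 == 0 else range(w - 1, -1, -1)
        step = 1 if y % 2 == 0 else -1  # scan direction flips each row
        for x in rng:
            old = buf[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            # Diffuse the quantization error onto unprocessed neighbours.
            for dx, dy, wgt in ((step, 0, 7 / 16), (-step, 1, 3 / 16),
                                (0, 1, 5 / 16), (step, 1, 1 / 16)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    buf[ny][nx] += err * wgt
    return out

gray = [[128] * 8 for _ in range(8)]  # flat mid-gray test patch
ht = halftone_serpentine(gray)
print(sum(v for row in ht for v in row) / (64 * 255))  # fraction of white dots
```

On a flat mid-gray patch, roughly half the output pixels turn white, preserving average tone; the dissertation's variant additionally makes the weights depend on local luminance.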
7

Data Augmentation GUI Tool for Machine Learning Models

Sharma, Sweta 30 October 2023 (has links)
The industrial production of semiconductor assemblies is subject to high quality requirements, so several tests of component quality are needed. In the long run, manual quality assurance (QA) is often associated with higher costs; with a machine learning based technique, some of these tests can be carried out automatically. Deep neural networks (NNs) have proven very effective across a diverse range of computer vision applications. Convolutional neural networks (CNNs), a subset of NNs, are an especially effective tool for image classification. Deep NNs have the disadvantage of requiring a large quantity of training data to reach excellent performance; when the dataset is too small, a phenomenon known as overfitting can occur. Massive amounts of data cannot be supplied in certain contexts, such as semiconductor production, especially given the relatively small number of rejected components in this field. To prevent overfitting, a variety of image augmentation methods can be used to artificially create training images, although many of these methods are not applicable in every domain. For this thesis, Infineon Technologies AG provided images of a semiconductor component acquired with an ultrasonic microscope. The dataset contains a sufficient number of good components, defined as components that passed quality control, and a minority of rejected components, which contain a defect and did not pass quality control. The success, efficiency, and quality of such a project depend on a number of factors, and selecting the appropriate tools is among the most important, since it enables significant savings of time and resources while producing the best results.
We demonstrate a data augmentation graphical user interface (GUI) tool of the kind widely used in image processing. With this approach, the dataset size is increased while maintaining the accuracy-time trade-off and improving the robustness of deep learning models. The purpose of this work is to develop a user-friendly tool that incorporates traditional, advanced, and smart data augmentation, image processing, and machine learning (ML) approaches; the techniques used include zooming, rotation, flipping, cropping, GANs, fusion, histogram matching, autoencoders, image restoration, and compression. The focus is on designing and implementing a MATLAB GUI for data augmentation and ML models. The thesis was carried out for Infineon Technologies AG to address a challenge that all semiconductor industries experience. The key objective is not only to create an easy-to-use GUI, but also to ensure that its users need no advanced technical experience to operate it. The GUI can run as a standalone application and can be deployed anywhere for data augmentation and classification, streamlining the workflow and making the quality assurance job easy even for those unfamiliar with data augmentation, machine learning, or MATLAB. In addition, the work investigates the benefits of data augmentation and image processing, and how these might improve the accuracy of AI models.
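The classical augmentations listed (zooming, rotation, flipping, cropping) are simple array transforms. Since the thesis tool is a MATLAB GUI, the following is only a generic pure-Python equivalent of three of those operations, not the tool's code:

```python
def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]

def crop(img, top, left, h, w):
    """Crop an h x w window; a real pipeline would resize it back up."""
    return [row[left:left + w] for row in img[top:top + h]]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(hflip(img))             # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(rotate90(img))          # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(crop(img, 0, 1, 2, 2))  # [[2, 3], [5, 6]]
```

Each transform yields a label-preserving variant of the original image, which is what lets a small defect dataset be expanded without collecting new scans.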
8

A statistical framework for estimating output-specific efficiencies

Gstach, Dieter January 2003 (has links) (PDF)
This paper presents a statistical framework for estimating output-specific efficiencies for the 2-output case based upon a DEA frontier estimate. The key to the approach is the concept of a target output-mix. Being usually unobserved, the target output-mixes of firms are modelled as missing data. Using this concept, the relevant data-generating process can be formulated. The resulting likelihood function is analytically intractable, so a data-augmented Bayesian approach is proposed for estimation and adapted to the present purpose. Some implementation issues are discussed, leading to an empirical Bayes setup with data-informed priors. A proof of scale invariance is provided. (author's abstract) / Series: Department of Economics Working Paper Series
9

Data Augmentation and Dynamic Linear Models

Frühwirth-Schnatter, Sylvia January 1992 (has links) (PDF)
We define a subclass of dynamic linear models with unknown hyperparameters called d-inverse-gamma models. We then approximate the marginal p.d.f.s of the hyperparameter and the state vector by the data augmentation algorithm of Tanner/Wong. We prove that the regularity conditions for convergence hold. A sampling based scheme for practical implementation is discussed. Finally, we illustrate how to obtain an iterative importance sampling estimate of the model likelihood. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
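The Tanner/Wong data augmentation algorithm alternates an imputation step (draw the latent data given the current parameters) with a posterior step (draw the parameters given the completed data). A toy illustration on right-censored normal data, much simpler than the d-inverse-gamma dynamic linear models of the paper, with hypothetical numbers:

```python
import math
import random

random.seed(3)

# Right-censored N(mu, 1) data: some values are only known to exceed c.
c = 1.0
observed = [0.2, -0.5, 0.8, 0.4, -0.1]  # fully observed values (made up)
n_censored = 4                           # values known only to exceed c
n = len(observed) + n_censored

def draw_above(mu, c):
    """Rejection draw from N(mu, 1) truncated to (c, inf) -- toy-scale only."""
    while True:
        z = random.gauss(mu, 1.0)
        if z > c:
            return z

mu = 0.0
draws = []
for _ in range(3000):
    # Imputation step: fill in the censored values given the current mu.
    imputed = [draw_above(mu, c) for _ in range(n_censored)]
    full = observed + imputed
    # Posterior step: with a flat prior, mu | completed data ~ N(mean, 1/n).
    mu = random.gauss(sum(full) / n, 1.0 / math.sqrt(n))
    draws.append(mu)

post_mean = sum(draws[1000:]) / len(draws[1000:])
print(round(post_mean, 2))
```

The chain's posterior mean for mu sits above the naive average of the uncensored values, because the augmentation step restores the information carried by the censored observations.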
10

Statistical methods for species richness estimation using count data from multiple sampling units

Argyle, Angus Gordon 23 April 2012 (has links)
The planet is experiencing a dramatic loss of species. The majority of species are unknown to science, and it is usually infeasible to conduct a census of a region to acquire a complete inventory of all life forms. Therefore, it is important to estimate and conduct statistical inference on the total number of species in a region based on samples obtained from field observations. Such estimates may suggest the number of species new to science and at potential risk of extinction. In this thesis, we develop novel methodology to conduct statistical inference, based on abundance-based data collected from multiple sampling locations, on the number of species within a taxonomic group residing in a region. The primary contribution of this work is the formulation of novel statistical methodology for analysis in this setting, where abundances of species are recorded at multiple sampling units across a region. This particular area has received relatively little attention in the literature. In the first chapter, the problem of estimating the number of species is formulated in a broad context, one that occurs in several seemingly unrelated fields of study. Estimators are commonly developed from statistical sampling models. Depending on the organisms or objects under study, different sampling techniques are used, and consequently, a variety of statistical models have been developed for this problem. A review of existing estimation methods, categorized by the associated sampling model, is presented in the second chapter. The third chapter develops a new negative binomial mixture model. The negative binomial model is employed to account for the common tendency of individuals of a particular species to occur in clusters. An exponential mixing distribution permits inference on the number of species that exist in the region, but were in fact absent from the sampling units. 
Adopting a classical approach for statistical inference, we develop the maximum likelihood estimator and a corresponding profile log-likelihood interval estimate of species richness. In addition, a Gaussian-based confidence interval based on large-sample theory is presented. The fourth chapter further extends the hierarchical model developed in Chapter 3 into a Bayesian framework. The motivation for the Bayesian paradigm is explained, and a hierarchical model based on random effects and discrete latent variables is presented. Computing the posterior distribution in this case is not straightforward. A data augmentation technique that indirectly places priors on species richness is employed to compute the model using a Metropolis-Hastings algorithm. The fifth chapter examines the performance of our new methodology. Simulation studies are used to examine the mean-squared error of our proposed estimators, and comparisons to several commonly used non-parametric estimators are made. Several conclusions emerge, and settings where our approaches can yield superior performance are clarified. In the sixth chapter, we present a case study. The methodology is applied to a real data set of oribatid mites (a taxonomic order of micro-arthropods) collected from multiple sites in a tropical rainforest in Panama. We adjust our statistical sampling models to account for the varying masses of material sampled from the sites. The resulting estimates of species richness for the oribatid mites are useful and contribute to a wider investigation, currently underway, examining the species richness of all arthropods in the rainforest. Our approaches are the only existing methods that can make full use of abundance-based data from multiple sampling units located in a single region. The seventh and final chapter concludes the thesis with a discussion of key considerations related to implementation and modeling assumptions, and describes potential avenues for further investigation. / Graduate
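Among the commonly used non-parametric comparators for abundance data is the Chao1 estimator, which corrects observed richness upward using singleton and doubleton counts. A minimal sketch with hypothetical counts pooled over sampling units (a benchmark of the kind compared against, not the thesis's negative binomial mixture method):

```python
from collections import Counter

def chao1(counts):
    """Bias-corrected Chao1 estimate of species richness.

    counts: total abundance per observed species, pooled over sampling units.
    Estimate = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)),
    where f1/f2 are the numbers of singleton/doubleton species.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # species seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))

# Hypothetical per-unit abundance records, pooled by summing over units.
units = [
    {"sp1": 3, "sp2": 1, "sp3": 0, "sp4": 1},
    {"sp1": 2, "sp2": 0, "sp3": 1, "sp5": 1},
]
pooled = Counter()
for u in units:
    pooled.update(u)

est = chao1(list(pooled.values()))
print(round(est, 2))  # observed richness 5, estimate 11.0
```

Note that pooling discards the per-unit structure; the point of the thesis's models is precisely to exploit that structure rather than collapse it.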
