231

Data Mining Algorithms for Decentralized Fault Detection and Diagnostic in Industrial Systems

Grbovic, Mihajlo January 2012 (has links)
Timely Fault Detection and Diagnosis in complex manufacturing systems is critical to ensure safe and effective operation of plant equipment. A process fault is defined as a deviation from normal process behavior, which is itself defined within the limits of safe production. The quantifiable objectives of Fault Detection include achieving low detection delay time, low false positive rate, and high detection rate. Once a fault has been detected, pinpointing the type of fault is needed for purposes of fault mitigation and returning to normal process operation. This is known as Fault Diagnosis. Data-driven Fault Detection and Diagnosis methods emerged as an attractive alternative to traditional mathematical model-based methods, especially for complex systems, due to the difficulty in describing the underlying process. A distinct feature of data-driven methods is that no a priori information about the process is necessary. Instead, it is assumed that historical data, containing process features measured in regular time intervals (e.g., power plant sensor measurements), are available for development of a fault detection/diagnosis model through generalization of the data. The goal of my research was to address the shortcomings of the existing data-driven methods and contribute to solving open problems, such as: 1) decentralized fault detection and diagnosis; 2) fault detection in the cold start setting; 3) optimizing the detection delay and dealing with noisy data annotations; and 4) developing models that can adapt to concept changes in power plant dynamics. For small-scale sensor networks, it is reasonable to assume that all measurements are available at a central location (sink) where fault predictions are made. This is known as a centralized fault detection approach. For large-scale networks, a decentralized approach is often used, where the network is decomposed into potentially overlapping blocks and each block provides local decisions that are fused at the sink. The appealing properties of the decentralized approach include fault tolerance, scalability, and reusability. When one or more blocks go offline due to maintenance of their sensors, the predictions can still be made using the remaining blocks. In addition, when the physical facility is reconfigured, either by changing its components or sensors, it can be easier to modify the part of the decentralized system impacted by the changes than to overhaul the whole centralized system. The scalability comes from reduced costs of system setup, update, communication, and decision making. The main challenges in decentralized monitoring include process decomposition and decision fusion. We proposed a decentralized model where the sensors are partitioned into small, potentially overlapping, blocks based on the Sparse Principal Component Analysis (PCA) algorithm, which preserves strong correlations among sensors, followed by training local models at each block, and fusion of decisions based on the proposed Maximum Entropy algorithm. Moreover, we introduced a novel framework for adding constraints to the Sparse PCA problem. The constraints limit the set of possible solutions by imposing additional goals to be reached through optimization along with the existing Sparse PCA goals. The experimental results on benchmark fault detection data show that Sparse PCA can utilize prior knowledge, which is not directly available in the data, in order to produce desirable network partitions, with a pre-defined limit on communication cost and/or robustness. / Computer and Information Science
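A minimal sketch of the block-and-fuse idea described above, assuming scikit-learn's SparsePCA on synthetic sensor data; the blocking rule (nonzero sparse loadings) and the score-averaging fusion are illustrative stand-ins, not the thesis's constrained Sparse PCA or Maximum Entropy fusion:

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))          # 500 time steps, 20 sensors (synthetic stand-in)

# Partition sensors into potentially overlapping blocks via the sparse loadings.
spca = SparsePCA(n_components=4, alpha=1.0, random_state=0).fit(X)
blocks = [np.flatnonzero(np.abs(c) > 1e-6) for c in spca.components_]
blocks = [b for b in blocks if b.size > 0]

# One local detector per block; decisions fused by simple averaging
# (a placeholder for the Maximum Entropy fusion proposed in the thesis).
detectors = [EllipticEnvelope(contamination=0.01).fit(X[:, b]) for b in blocks]

def fused_decision(x_new):
    """Average the per-block +1/-1 decisions for a single new sample."""
    votes = [det.predict(x_new[None, b])[0] for det, b in zip(detectors, blocks)]
    return float(np.mean(votes))        # near -1 => fault, near +1 => normal

print("sensor blocks:", blocks)
print("fused decision for first sample:", fused_decision(X[0]))
```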
232

Transverse Position Reconstruction in a Liquid Argon Time Projection Chamber using Principal Component Analysis and Multi-Dimensional Fitting

Watson, Andrew William January 2017 (has links)
One of the most enduring questions in modern physics is the dark matter problem. Measurements of galactic rotation curves taken in the middle of the twentieth century suggest that there are large spherical halos of unseen matter permeating and surrounding most galaxies, stretching far beyond their visible extents. Although some of this mass discrepancy can be attributed to sources like primordial black holes or Massive Astrophysical Compact Halo Objects (MACHOs), these theories can only explain a small percentage of this "missing matter". One approach which could account for the entirety of this missing mass is the theory of Weakly Interacting Massive Particles, or "WIMPs". As their name suggests, WIMPs interact only through the weak nuclear force and gravity and are quite massive (100 GeV/c² to 1 TeV/c²). These particles have very small cross sections (≈ 10⁻³⁹ cm²) with nucleons and therefore interact only very rarely with "normal" baryonic matter. To directly detect a dark matter particle, one needs to overcome this small cross-section barrier. In many experiments, this is achieved by utilizing detectors filled with liquid noble elements, which have excellent particle identification capabilities and are very low-background, allowing potential WIMP signals to be more easily distinguished from detector noise. These experiments also often apply uniform electric fields across their liquid volumes, turning the apparatus into Time Projection Chambers or "TPCs". TPCs can accurately determine the location of an interaction in the liquid volume (often simply called an "event") along the direction of the electric field. In DarkSide-50 ("DS-50" for short), the electric field is aligned antiparallel to the z-axis of the detector, and so the depth of an event can be determined to a considerable degree of accuracy by measuring the time between the first and second scintillation signals ("S1" and "S2"), which are generated at the interaction point itself and in a small gas pocket above the liquid region, respectively. One of the lingering challenges in this experiment, however, is the determination of an event’s position along the other two spatial dimensions, that is, its transverse or "xy" position. Some liquid noble element TPCs have achieved remarkably accurate event position reconstructions, typically using the relative amounts of S2 light collected by Photo-Multiplier Tubes ("PMTs") as the input data to their reconstruction algorithms. This approach has been particularly challenging in DarkSide-50, partly due to unexpected asymmetries in the detector, and partly due to the design of the detector itself. A variety of xy-Reconstruction methods ("xy methods" for short) have come and gone in DS-50, with only a few of them providing useful results. The xy method described in this dissertation is a two-step Principal Component Analysis / Multi-Dimensional Fit (PCAMDF) reconstruction. In a nutshell, this method develops a functional mapping from the 19-dimensional space of the signal received by the PMTs at the "top" (or the "anode" end) of the DarkSide-50 TPC to each of the transverse coordinates, x and y. PCAMDF is a low-level "machine learning" algorithm, and as such, needs to be "trained" with a sample of representative events; in this case, these are provided by the DarkSide geant4-based Monte Carlo, g4ds. In this work, a thorough description of the PCAMDF xy-Reconstruction method is provided along with an analysis of its performance on MC events and data.
The method is applied to several classes of data events, including coincident decays, external gamma rays from calibration sources, and both atmospheric argon "AAr" and underground argon "UAr". Discrepancies between the MC and data are explored, and fiducial volume cuts are calculated. Finally, a novel method is proposed for finding the accuracy of the PCAMDF reconstruction on data by using the asymmetry of the S2 light collected on the anode and cathode PMT arrays as a function of xy. / Physics
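A rough sketch of a two-step "reduce, then fit" reconstruction in the spirit of PCAMDF: PCA compresses the 19 top-PMT signals, and a polynomial least-squares fit maps the leading components to the transverse coordinates. The simulated light patterns below are placeholders for g4ds Monte Carlo events, and the fit details are illustrative rather than the dissertation's exact functional form:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
xy_true = rng.uniform(-17.0, 17.0, size=(5000, 2))       # true transverse positions (cm)
pmt_pos = rng.uniform(-17.0, 17.0, size=(19, 2))         # nominal top-PMT positions
# Fake S2 light pattern: more light in PMTs close to the event, plus noise.
d2 = ((xy_true[:, None, :] - pmt_pos[None, :, :]) ** 2).sum(-1)
signals = 1.0 / (1.0 + d2 / 50.0) + 0.01 * rng.normal(size=(5000, 19))

model = make_pipeline(
    PCA(n_components=6),                 # step 1: principal components of the light pattern
    PolynomialFeatures(degree=3),        # step 2: multi-dimensional polynomial fit
    LinearRegression(),
)
model.fit(signals, xy_true)              # "training" on Monte-Carlo-like events
resid = model.predict(signals) - xy_true
print("RMS residual per axis (cm):", resid.std(axis=0))
```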
233

Multivariate Analysis Applied to Discrete Part Manufacturing

Wallace, Darryl 09 1900 (has links)
The overall focus of this thesis is the implementation of a process monitoring system in a real manufacturing environment that utilizes multivariate analysis techniques to assess the state of the process. The process in question was the medium-high volume manufacturing of discrete aluminum parts using relatively simple machining processes involving the use of two tools. This work can be broken down into three main sections.

The first section involved the modeling of temperatures and thermal expansion measurements for real-time thermal error compensation. Thermal expansion of the Z-axis was measured indirectly through measurement of the two quality parameters related to this axis with a custom gage that was designed for this part. A compensation strategy is proposed which is able to hold the variation of the parts to ±0.02 mm, where the tolerance is ±0.05 mm.

The second section involved the modeling of the process data from the parts, which included vibration, current, and temperature signals from the machine. The modeling of the process data using Principal Component Analysis (PCA), while unsuccessful in detecting minor simulated process faults, was successful in detecting a mis-loaded part during regular production. Simple control charts using Hotelling's T^2 statistic and the Squared Prediction Error are illustrated. The modeling of quality data from the process data of good parts using Projection to Latent Structures by means of Partial Least Squares (PLS) did not provide very accurate fits to the data; however, all of the predictions are within the tolerance specifications.

The final section discusses the implementation of a process monitoring system in both manual and automatic production environments. A method for the integration and storage of process data with the Mitutoyo software MCOSMOS and MeasurLink® is described. All of the code to perform multivariate analysis and process monitoring was written using Matlab. / Thesis / Master of Applied Science (MASc)
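A minimal sketch of the PCA-based control-chart statistics mentioned above (Hotelling's T^2 and the Squared Prediction Error), assuming standardized process signals and simple empirical control limits rather than the usual F- and chi-squared approximations:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_train = rng.normal(size=(300, 12))          # in-control reference data (12 signals)

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=4).fit(scaler.transform(X_train))

def t2_spe(x):
    """Return (T^2, SPE) for one new observation vector."""
    z = scaler.transform(x[None, :])
    t = pca.transform(z)                       # scores in the retained subspace
    t2 = float((t ** 2 / pca.explained_variance_).sum())
    z_hat = pca.inverse_transform(t)           # projection back to signal space
    spe = float(((z - z_hat) ** 2).sum())
    return t2, spe

# Empirical 99% control limits from the reference data.
stats = np.array([t2_spe(x) for x in X_train])
t2_lim, spe_lim = np.percentile(stats, 99, axis=0)
x_new = X_train[0] + 3.0                       # crude simulated fault (level shift)
print("new point:", t2_spe(x_new), "limits:", (t2_lim, spe_lim))
```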
234

Statistical Methods for Data Integration and Disease Classification

Islam, Mohammad 11 1900 (has links)
Classifying individuals into binary disease categories can be challenging due to complex relationships across different exposures of interest. In this thesis, we investigate three different approaches for disease classification using multiple biomarkers. First, we consider combining information from literature reviews and the INTERHEART data set to identify the threshold of ApoB, ApoA1 and the ratio of these two biomarkers for classifying individuals at risk of developing myocardial infarction. We develop a Bayesian estimation procedure for this purpose that utilizes the conditional probability distribution of these biomarkers. This method is flexible compared to the standard logistic regression approach and allows us to identify a precise threshold of these biomarkers. Second, we consider the problem of disease classification using two dependent biomarkers. An independently identified threshold for this purpose usually leads to a conflicting classification for some individuals. We develop and describe a method of determining the joint threshold of two dependent biomarkers for disease classification, based on the joint probability distribution function constructed through copulas. This method allows researchers to uniquely classify individuals at risk of developing the disease. Third, we consider the problem of classifying an outcome using gene and miRNA expression data sets. Linear principal component analysis (PCA) is a widely used approach to reduce the dimension of such data sets and subsequently use the reduced data for classification, but many authors suggest using kernel PCA for this purpose. Using real and simulated data sets, we compare these two approaches and assess the performance of the resulting components for genetic data integration and outcome classification. We conclude that reducing dimensions using linear PCA followed by a logistic regression model for classification seems to be acceptable for this purpose. We also observe that integrating information from multiple data sets using either of these approaches leads to better performance in outcome classification. / Thesis / Doctor of Philosophy (PhD)
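A small sketch of the third comparison described above, assuming scikit-learn: reduce dimension with linear PCA or kernel PCA, then classify the outcome with logistic regression; the simulated matrix is a stand-in for combined gene/miRNA expression data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# High-dimensional, low-sample-size data, as in expression studies (synthetic).
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=3)

linear_pipe = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
kernel_pipe = make_pipeline(KernelPCA(n_components=10, kernel="rbf", gamma=1e-3),
                            LogisticRegression(max_iter=1000))

for name, pipe in [("linear PCA", linear_pipe), ("kernel PCA", kernel_pipe)]:
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name:>10}: mean CV accuracy = {acc:.3f}")
```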
235

An Investigation of Unidimensional Testing Procedures under Latent Trait Theory using Principal Component Analysis

McGill, Michael T. 11 December 2009 (has links)
There are several generally accepted rules for detecting unidimensionality, but none are well tested. This simulation study investigated well-known methods, including but not limited to the Kaiser (k>1) Criterion, Percentage of Measure Validity (greater than 50%, 40%, or 20%), Ratio of Eigenvalues, and Kelley method, and compared these methods to each other and to a new method proposed by the author (McGill method) for assessing unidimensionality. After applying principal component analysis (PCA) to the residuals of a Latent Trait Test Theory (LTTT) model, this study was able to address three purposes: determining the Type I error rates associated with various criterion values for assessing unidimensionality; determining the Type II error rates and statistical power associated with various rules of thumb when assessing dimensionality; and, finally, determining whether more suitable criterion values could be established for the methods of the study by accounting for various characteristics of the measurement context. For those methods based on criterion values, new modified values are proposed. For those methods without criterion values for dimensionality decisions, criterion values are modeled and presented. The methods compared in this study were investigated using PCA on residuals from the Rasch model. The sample size, test length, ability distribution variability, and item distribution variability were varied, and the resulting Type I and Type II error rates of each method were examined. The results imply that certain conditions can cause improper diagnoses as to the dimensionality of instruments. Adjusted methods are suggested to induce a more stable condition relative to the Type I and Type II error rates. The nearly ubiquitous Kaiser method was found to be biased towards signaling multidimensionality whether it exists or not. The modified version of the Kaiser method and the McGill method, proposed by the author, were shown to be among the best at detecting unidimensionality when it was present. In short, methods that take into account changes in variables such as sample size, test length, item variability, and person variability are better than methods that use a single, static criterion value in decision making with respect to dimensionality. / Ph. D.
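A toy sketch of two of the eigenvalue-based checks discussed above, applied to a matrix standing in for standardized Rasch residuals (a real analysis would first fit the Rasch model and compute its residuals); the criterion values used here are the classic ones, not the adjusted values proposed in the dissertation:

```python
import numpy as np

rng = np.random.default_rng(4)
residuals = rng.normal(size=(1000, 30))              # persons x items, placeholder residuals

corr = np.corrcoef(residuals, rowvar=False)          # item-by-item correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]    # principal-component eigenvalues

kaiser_flags = int((eigvals > 1.0).sum())            # Kaiser: components with eigenvalue > 1
ratio = eigvals[0] / eigvals[1]                      # ratio of first to second eigenvalue

print(f"eigenvalues > 1: {kaiser_flags}")
print(f"first/second eigenvalue ratio: {ratio:.2f}")
# For truly unidimensional data the residuals should look like noise: several
# eigenvalues hover just above 1 (so the raw Kaiser rule over-flags), while the
# first/second ratio stays small.
```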
236

Segmentation of the market for labeled ornamental plants by environmental preferences: A latent class analysis

D'Alessio, Nicole Marie 09 July 2015 (has links)
Labeling is a product differentiation mechanism that has increased in prevalence across many markets. This study investigated the potential for a labeling program applied to ornamental plant sales, given key ongoing issues affecting ornamental plant producers: irrigation water use and plant disease. Our research investigated how to better understand the market for plants certified as disease free and/or produced using water conservation techniques by segmenting the market according to consumers' environmental preferences. Latent class analysis was conducted using choice modeling survey results and respondent scores on the New Environmental Paradigm scale. The results show that, when accounting for environmental preferences, consumers can be grouped into two market segments. Relative to each other, these segments can be characterized as price sensitive and attribute sensitive. Our research also investigated the market segments' preferences for multiple certifying authorities. The results strongly suggest that consumers in either segment do not have a preference for any particular certifying authority. / Master of Science
237

Modified Kernel Principal Component Analysis and Autoencoder Approaches to Unsupervised Anomaly Detection

Merrill, Nicholas Swede 01 June 2020 (has links)
Unsupervised anomaly detection is the task of identifying examples that differ from the normal or expected pattern without the use of labeled training data. Our research addresses shortcomings in two existing anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes novel solutions to improve both of their performances in the unsupervised setting. Anomaly detection has several useful applications, such as intrusion detection, fault monitoring, and vision processing. More specifically, anomaly detection can be used in autonomous driving to identify obscured signage or to monitor intersections. Kernel techniques are desirable because of their ability to model highly non-linear patterns, but they are limited in the unsupervised setting due to their sensitivity to parameter choices and the absence of a validation step. Additionally, conventional KPCA suffers from a quadratic time and memory complexity in the construction of the Gram matrix and a cubic time complexity in its eigendecomposition. The problem of tuning the Gaussian kernel parameter, σ, is solved using mini-batch stochastic gradient descent (SGD) optimization of a loss function that maximizes the dispersion of the kernel matrix entries. Secondly, the computational time is greatly reduced, while still maintaining high accuracy, by using an ensemble of small, "skeleton" models and combining their scores. The performance of traditional machine learning approaches to anomaly detection plateaus as the volume and complexity of data increases. Deep anomaly detection (DAD) involves the application of multilayer artificial neural networks to identify anomalous examples. AEs are fundamental to most DAD approaches. Conventional AEs rely on the assumption that a trained network will learn to reconstruct normal examples better than anomalous ones. In practice, however, given sufficient capacity and training time, an AE will generalize to reconstruct even very rare examples. Three methods are introduced to more reliably train AEs for unsupervised anomaly detection: Cumulative Error Scoring (CES) leverages the entire history of training errors to minimize the importance of early stopping, and Percentile Loss (PL) training aims to prevent anomalous examples from contributing to parameter updates. Lastly, early stopping via knee detection aims to limit the risk of overtraining. Ultimately, the two proposed methods of this research, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets. / Master of Science / Anomaly detection is the task of identifying examples that differ from the normal or expected pattern. The challenge of unsupervised anomaly detection is distinguishing normal and anomalous data without the use of labeled examples to demonstrate their differences. This thesis addresses shortcomings in two anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes new solutions to apply them in the unsupervised setting. Ultimately, the two modified methods, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets.
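A rough sketch of the kernel-width tuning idea described above: choose the Gaussian σ that maximizes the dispersion (variance) of the kernel-matrix entries, estimated on mini-batches. The finite-difference gradient below is an illustrative substitute for the analytic derivative used with SGD in the thesis:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 10))                        # unlabeled data, placeholder

def entry_variance(batch, sigma):
    """Variance of the off-diagonal Gaussian kernel entries on one mini-batch."""
    d2 = ((batch[:, None, :] - batch[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    off_diag = K[~np.eye(len(batch), dtype=bool)]
    return off_diag.var()

sigma, lr, eps = 1.0, 0.5, 1e-3
for step in range(200):                                # mini-batch ascent on sigma
    batch = X[rng.choice(len(X), size=64, replace=False)]
    grad = (entry_variance(batch, sigma + eps) -
            entry_variance(batch, sigma - eps)) / (2 * eps)
    sigma = max(1e-3, sigma + lr * grad)               # maximize dispersion
print(f"selected sigma ~ {sigma:.3f}")
```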
238

Effects of Manufacturing Deviations on Core Compressor Blade Performance

De Losier, Clayton Ray 20 April 2009 (has links)
There has been recent incentive for understanding the possible deleterious effects that manufacturing deviations can have on compressor blade performance. This is of particular importance in today's age, as compressor designs are pushing operating limits by employing fewer stages with higher loadings and are designed to operate at ever higher altitudes. Deviations in these advanced, as well as legacy, designs could negatively affect the performance and operation of a core compressor; thus, a numerical investigation to quantify manufacturing deviations and their effects is undertaken. Data from three radial sections of every compressor blade in a single row of a production compressor are used as the basis for this investigation. Deviations from the compressor blade design intent to the as-manufactured blades are quantified with a statistical method known as principal component analysis (PCA). MISES, an Euler solver coupled with integral boundary-layer calculations, is used to analyze the effects that the aforementioned deviations have on compressor blade performance when the inlet flow conditions produce a Mach number of approximately 0.7 and a Reynolds number of approximately 6.5 × 10⁵. It was found that the majority of manufacturing deviations were within a range of plus or minus 4 percent of the design intent, and deviations at the leading edge had a critical effect on performance. Of particular interest is the fact that deviations at the leading edge not only degraded performance but significantly changed the boundary-layer behavior from that of the design case. / Master of Science
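A small sketch of quantifying manufacturing deviations with PCA, assuming each as-manufactured section is stored as the deviation of its measured coordinates from the design intent; the synthetic profiles below are placeholders, and the leading principal components play the role of dominant deviation modes:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n_blades, n_points = 120, 80
theta = np.linspace(0.0, 1.0, n_points)
design = np.sin(np.pi * theta)                          # stand-in design-intent profile

# Each blade = design intent + a smooth systematic mode + random scatter.
mode = 0.02 * np.cos(2 * np.pi * theta)
measured = (design
            + rng.normal(scale=0.4, size=(n_blades, 1)) * mode
            + 0.005 * rng.normal(size=(n_blades, n_points)))

deviations = measured - design                          # deviation from design intent
pca = PCA(n_components=3).fit(deviations)
print("variance explained by first 3 modes:", pca.explained_variance_ratio_)
print("peak amplitude of leading deviation mode:", np.abs(pca.components_[0]).max())
```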
239

A machine learning approach for ethnic classification: the British Pakistani face

Khalid Jilani, Shelina, Ugail, Hassan, Bukar, Ali M., Logan, Andrew J., Munshi, Tasnim January 2017 (has links)
Ethnicity is one of the most salient clues to face identity. Analysis of ethnicity-specific facial data is a challenging problem and is predominantly carried out using computer-based algorithms. Current published literature focusses on the use of frontal face images. We addressed the challenge of binary (British Pakistani or other ethnicity) ethnicity classification using profile facial images. The proposed framework is based on the extraction of geometric features using 10 anthropometric facial landmarks, within a purpose-built, novel database of 135 multi-ethnic and multi-racial subjects and a total of 675 face images. Image dimensionality was reduced using Principal Component Analysis and Partial Least Squares Regression. Classification was performed using a linear Support Vector Machine. The results of this framework are promising, with 71.11% ethnic classification accuracy using the PCA algorithm + SVM classifier, and 76.03% using the PLS algorithm + SVM classifier.
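A bare-bones sketch of the first pipeline reported above (PCA reduction followed by a linear SVM), assuming scikit-learn; the random feature matrix and labels below are placeholders for the anthropometric landmark measurements, and the PLS-based variant is omitted:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(675, 45))            # 675 face images, 45 geometric features (placeholder)
y = rng.integers(0, 2, size=675)          # 1 = British Pakistani, 0 = other (placeholder labels)

clf = make_pipeline(PCA(n_components=10), LinearSVC(C=1.0, max_iter=10000))
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```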
240

Unsupervised Learning for Efficient Underwriting

Dalla Torre, Elena January 2024 (has links)
In the field of actuarial science, statistical methods have been extensively studied to estimate the risk of insurance. These methods are good at estimating the risk of typical insurance policies, as historical data is available. However, their performance can be poor on unique insurance policies, which require the manual assessment of an underwriter. A classification of insurance policies on a unique/typical scale would help insurance companies allocate manual resources more efficiently and validate the goodness of fit of the pricing models on unique objects. The aim of this thesis is to use outlier detection methods to identify unique non-life insurance policies. The many categorical nominal variables present in insurance policy data sets represent a challenge when applying outlier detection methods. Therefore, we also explore different ways to derive informative numerical representations of categorical nominal variables. First, as a baseline, we use the principal component analysis of mixed data to find a numerical representation of categorical nominal variables and the principal component analysis to identify unique insurances. Then, we see whether better performance can be achieved using autoencoders, which can capture complex non-linearities. In particular, we learn a numerical representation of categorical nominal variables using the encoder layer of an autoencoder, and we use a different autoencoder to identify unique insurances. Since we are in an unsupervised setting, the two methods are compared by performing a simulation study and using the NLS-KDD data set. The analysis shows that autoencoders are better at identifying unique objects than principal component analysis. We conclude that the ability of autoencoders to model complex non-linearities between the variables allows this class of methods to achieve superior performance.
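A compact sketch of the reconstruction-error idea described above: one-hot encode the nominal variables, train a network to reproduce its own input, and flag the policies it reconstructs worst as candidate unique objects. scikit-learn's MLPRegressor is used here as a stand-in for a full autoencoder framework, and the categorical data are synthetic:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)
cats = rng.choice(["A", "B", "C", "D"], size=(1000, 5))      # nominal policy features (synthetic)
X = OneHotEncoder().fit_transform(cats).toarray()            # numerical representation

# A narrow hidden layer forces a compressed representation of the policies.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)                                                 # train to reconstruct the input

recon_error = ((ae.predict(X) - X) ** 2).mean(axis=1)        # per-policy outlier score
unique_idx = np.argsort(recon_error)[-10:]                   # ten most "unique" policies
print(unique_idx)
print(recon_error[unique_idx])
```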
