141 |
Identification of Suspicious Semiconductor Devices Using Independent Component Analysis with Dimensionality ReductionBartholomäus, Jenny, Wunderlich, Sven, Sasvári, Zoltán 22 August 2019 (has links)
In the semiconductor industry the reliability of devices is of paramount importance. Therefore, after removing the defective ones, one wants to detect irregularities in measurement data because corresponding devices have a higher risk of failure early in the product lifetime. The paper presents a method to improve the detection of such suspicious devices where the screening is made on transformed measurement data. Thereby, e.g., dependencies between tests can be taken into account. Additionally, a new dimensionality reduction is performed within the transformation, so that the reduced and transformed data comprises only the informative content from the raw data. This simplifies the complexity of the subsequent screening steps. The new approach will be applied to semiconductor measurement data and it will be shown, by means of examples, how the screening can be improved.
|
142 |
High dimensional data clustering; A comparative study on gene expressions : Experiment on clustering algorithms on RNA-sequence from tumors with evaluation on internal validationHenriksson, William January 2019 (has links)
In cancer research, class discovery is the first process for investigating a new dataset for which hidden groups there are by similar attributes. However datasets from gene expressions, RNA microarray or RNA-sequence, are high-dimensional. Which makes it hard to perform clusteranalysis and to get clusters that are well separated. Well separated clusters are wanted because that tells that objects are most likely not placed in wrong clusters. This report investigate in an experiment whether using K-Means and hierarchical are suitable for clustering gene expressions in RNA-sequence data from various tumors. Dimensionality reduction methods are also applied to see whether that helps create well-separated clusters. The results tell that well separated clusters are only achieved by using PCA as dimensionality reduction and K-Means on correlation. The main contribution of this paper is determining that using K-Means or hierarchical clustering on the full natural dimensionality of RNA-sequence data returns unwanted silhouette average width, under 0,4.
|
143 |
Principal Component Modelling of Fuel Consumption ofSeagoing Vessels and Optimising Fuel Consumption as a Mixed-Integer ProblemIvan, Jean-Paul January 2020 (has links)
The fuel consumption of a seagoing vessel is, through a combination of Box-Cox transforms and principal component analysis, reduced to a univariatefunction of the primary principle component with mean model error −3.2%and error standard deviation 10.3%. In the process, a Latin-hypercube-inspired space partitioning sampling technique is developed and successfully used to produce a representative sampleused in determining the regression coefficients. Finally, a formal optimisation problem for minimising the fuel use is described. The problem is derived from a parametrised expression for the fuel consumption, and has only 3, or 2 if simplified, free variables at each timestep. Some information has been redacted in order to comply with NDA restrictions. Most redactions are either names (of vessels or otherwise), units, andin some cases (especially on figures) quantities. / <p>Presentation was performed remotely using Zoom.</p>
|
144 |
A General Model for Continuous Noninvasive Pulmonary Artery Pressure EstimationSmith, Robert Anthony 15 December 2011 (has links) (PDF)
Elevated pulmonary artery pressure (PAP) is a significant healthcare risk. Continuous monitoring for patients with elevated PAP is crucial for effective treatment, yet the most accurate method is invasive and expensive, and cannot be performed repeatedly. Noninvasive methods exist but are inaccurate, expensive, and cannot be used for continuous monitoring. We present a machine learning model based on heart sounds that estimates pulmonary artery pressure with enough accuracy to exclude an invasive diagnostic operation, allowing for consistent monitoring of heart condition in suspect patients without the cost and risk of invasive monitoring. We conduct a greedy search through 38 possible features using a 109-patient cross-validation to find the most predictive features. Our best general model has a standard estimate of error (SEE) of 8.28 mmHg, which outperforms the previous best performance in the literature on a general set of unseen patient data.
|
145 |
Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks / Ökad talarinvarians i obevakad talinlärning genom partitionering av probabilistiska modeller med hjälp av linjära siamesiska nätverkFahlström Myrman, Arvid January 2017 (has links)
Unsupervised learning of speech is concerned with automatically finding patterns such as words or speech sounds, without supervision in the form of orthographical transcriptions or a priori knowledge of the language. However, a fundamental problem is that unsupervised speech learning methods tend to discover highly speaker-specific and context-dependent representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes discovered by the model. We do this by training a sparse siamese model to find a linear transformation of input posteriorgrams, extracted from the unsupervised model, to lower-dimensional posteriorgrams. The siamese model makes use of same-category and different-category speech fragment pairs obtained through unsupervised term discovery. After training, the model is converted into an exact partitioning of the posteriorgrams. We evaluate the model on the minimal-pair ABX task in the context of the Zero Resource Speech Challenge. We are able to demonstrate that our method significantly reduces the dimensionality of standard Gaussian mixture model posteriorgrams, while also making them more speaker invariant. This suggests that the model may be viable as a general post-processing step to improve probabilistic acoustic features obtained by unsupervised learning. / Obevakad inlärning av tal innebär att automatiskt hitta mönster i tal, t ex ord eller talljud, utan bevakning i form av ortografiska transkriptioner eller tidigare kunskap om språket. Ett grundläggande problem är dock att obevakad talinlärning tenderar att hitta väldigt talar- och kontextspecifika representationer av tal. Vi föreslår en metod för att förbättra kvaliteten av posteriorgram genererade med en obevakad modell, genom att partitionera de latenta klasserna funna av modellen. Vi gör detta genom att träna en gles siamesisk modell för att hitta en linjär transformering av de givna posteriorgrammen, extraherade från den obevakade modellen, till lågdimensionella posteriorgram. Den siamesiska modellen använder sig av talfragmentpar funna med obevakad ordupptäckning, där varje par består av fragment som antingen tillhör samma eller olika klasser. Den färdigtränade modellen görs sedan om till en exakt partitionering av posteriorgrammen. Vi följer Zero Resource Speech Challenge, och evaluerar modellen med hjälp av minimala ordpar-ABX-uppgiften. Vi demonstrerar att vår metod avsevärt minskar posteriorgrammens dimensionalitet, samtidigt som posteriorgrammen blir mer talarinvarianta. Detta antyder att modellen kan vara användbar som ett generellt extra steg för att förbättra probabilistiska akustiska särdrag från obevakade modeller.
|
146 |
Feature Extraction and FeatureSelection for Object-based LandCover Classification : Optimisation of Support Vector Machines in aCloud Computing EnvironmentStromann, Oliver January 2018 (has links)
Mapping the Earth’s surface and its rapid changes with remotely sensed data is a crucial tool to un-derstand the impact of an increasingly urban world population on the environment. However, the impressive amount of freely available Copernicus data is only marginally exploited in common clas-sifications. One of the reasons is that measuring the properties of training samples, the so-called ‘fea-tures’, is costly and tedious. Furthermore, handling large feature sets is not easy in most image clas-sification software. This often leads to the manual choice of few, allegedly promising features. In this Master’s thesis degree project, I use the computational power of Google Earth Engine and Google Cloud Platform to generate an oversized feature set in which I explore feature importance and analyse the influence of dimensionality reduction methods. I use Support Vector Machines (SVMs) for object-based classification of satellite images - a commonly used method. A large feature set is evaluated to find the most relevant features to discriminate the classes and thereby contribute most to high clas-sification accuracy. In doing so, one can bypass the sensitive knowledge-based but sometimes arbi-trary selection of input features.Two kinds of dimensionality reduction methods are investigated. The feature extraction methods, Linear Discriminant Analysis (LDA) and Independent Component Analysis (ICA), which transform the original feature space into a projected space of lower dimensionality. And the filter-based feature selection methods, chi-squared test, mutual information and Fisher-criterion, which rank and filter the features according to a chosen statistic. I compare these methods against the default SVM in terms of classification accuracy and computational performance. The classification accuracy is measured in overall accuracy, prediction stability, inter-rater agreement and the sensitivity to training set sizes. The computational performance is measured in the decrease in training and prediction times and the compression factor of the input data. I conclude on the best performing classifier with the most effec-tive feature set based on this analysis.In a case study of mapping urban land cover in Stockholm, Sweden, based on multitemporal stacks of Sentinel-1 and Sentinel-2 imagery, I demonstrate the integration of Google Earth Engine and Google Cloud Platform for an optimised supervised land cover classification. I use dimensionality reduction methods provided in the open source scikit-learn library and show how they can improve classification accuracy and reduce the data load. At the same time, this project gives an indication of how the exploitation of big earth observation data can be approached in a cloud computing environ-ment.The preliminary results highlighted the effectiveness and necessity of dimensionality reduction methods but also strengthened the need for inter-comparable object-based land cover classification benchmarks to fully assess the quality of the derived products. To facilitate this need and encourage further research, I plan to publish the datasets (i.e. imagery, training and test data) and provide access to the developed Google Earth Engine and Python scripts as Free and Open Source Software (FOSS). / Kartläggning av jordens yta och dess snabba förändringar med fjärranalyserad data är ett viktigt verktyg för att förstå effekterna av en alltmer urban världsbefolkning har på miljön. Den imponerande mängden jordobservationsdata som är fritt och öppet tillgänglig idag utnyttjas dock endast marginellt i klassifikationer. Att hantera ett set av många variabler är inte lätt i standardprogram för bildklassificering. Detta leder ofta till manuellt val av få, antagligen lovande variabler. I det här arbetet använde jag Google Earth Engines och Google Cloud Platforms beräkningsstyrkan för att skapa ett överdimensionerat set av variabler i vilket jag undersöker variablernas betydelse och analyserar påverkan av dimensionsreducering. Jag använde stödvektormaskiner (SVM) för objektbaserad klassificering av segmenterade satellitbilder – en vanlig metod inom fjärranalys. Ett stort antal variabler utvärderas för att hitta de viktigaste och mest relevanta för att diskriminera klasserna och vilka därigenom mest bidrar till klassifikationens exakthet. Genom detta slipper man det känsliga kunskapsbaserade men ibland godtyckliga urvalet av variabler.Två typer av dimensionsreduceringsmetoder tillämpades. Å ena sidan är det extraktionsmetoder, Linjär diskriminantanalys (LDA) och oberoende komponentanalys (ICA), som omvandlar de ursprungliga variablers rum till ett projicerat rum med färre dimensioner. Å andra sidan är det filterbaserade selektionsmetoder, chi-två-test, ömsesidig information och Fisher-kriterium, som rangordnar och filtrerar variablerna enligt deras förmåga att diskriminera klasserna. Jag utvärderade dessa metoder mot standard SVM när det gäller exakthet och beräkningsmässiga prestanda.I en fallstudie av en marktäckeskarta över Stockholm, baserat på Sentinel-1 och Sentinel-2-bilder, demonstrerade jag integrationen av Google Earth Engine och Google Cloud Platform för en optimerad övervakad marktäckesklassifikation. Jag använde dimensionsreduceringsmetoder som tillhandahålls i open source scikit-learn-biblioteket och visade hur de kan förbättra klassificeringsexaktheten och minska databelastningen. Samtidigt gav detta projekt en indikation på hur utnyttjandet av stora jordobservationsdata kan nås i en molntjänstmiljö.Resultaten visar att dimensionsreducering är effektiv och nödvändig. Men resultaten stärker också behovet av ett jämförbart riktmärke för objektbaserad klassificering av marktäcket för att fullständigt och självständigt bedöma kvaliteten på de härledda produkterna. Som ett första steg för att möta detta behov och för att uppmuntra till ytterligare forskning publicerade jag dataseten och ger tillgång till källkoderna i Google Earth Engine och Python-skript som jag utvecklade i denna avhandling.
|
147 |
Feature Extraction using Dimensionality Reduction Techniques: Capturing the Human PerspectiveColeman, Ashley B. January 2015 (has links)
No description available.
|
148 |
Extracting key features for analysis and recognition in computer visionGao, Hui 13 March 2006 (has links)
No description available.
|
149 |
REGION-BASED GEOMETRIC ACTIVE CONTOUR FOR CLASSIFICATION USING HYPERSPECTRAL REMOTE SENSING IMAGESYan, Lin 20 October 2011 (has links)
No description available.
|
150 |
ON THE CONVERGENCE AND APPLICATIONS OF MEAN SHIFT TYPE ALGORITHMSAliyari Ghassabeh, Youness 01 October 2013 (has links)
Mean shift (MS) and subspace constrained mean shift (SCMS) algorithms are non-parametric, iterative methods to find a representation of a high dimensional data set on a principal curve or surface embedded in a high dimensional space. The representation of high dimensional data on a principal curve or surface, the class of mean shift type algorithms and their properties, and applications of these algorithms are the main focus of this dissertation. Although MS and SCMS algorithms have been used in many applications, a rigorous study of their convergence is still missing. This dissertation aims to fill some of the gaps between theory and practice by investigating some convergence properties of these algorithms. In particular, we propose a sufficient condition for a kernel density estimate with a Gaussian kernel to have isolated stationary points to guarantee the convergence of the MS algorithm. We also show that the SCMS algorithm inherits some of the important convergence properties of the MS algorithm. In particular, the monotonicity and convergence of the density estimate values along the sequence of output values of the algorithm are shown. We also show that the distance between consecutive points of the output sequence converges to zero, as does the projection of the gradient vector onto the subspace spanned by the D-d eigenvectors corresponding to the D-d largest eigenvalues of the local inverse covariance matrix. Furthermore, three new variations of the SCMS algorithm are proposed and the running times and performance of the resulting algorithms are compared with original SCMS algorithm. We also propose an adaptive version of the SCMS algorithm to consider the effect of new incoming samples without running the algorithm on the whole data set. As well, we develop some new potential applications of the MS and SCMS algorithm. These applications involve finding straight lines in digital images; pre-processing data before applying locally linear embedding (LLE) and ISOMAP for dimensionality reduction; noisy source vector quantization where the clean data need to be estimated before the quanization step; improving the performance of kernel regression in certain situations; and skeletonization of digitally stored handwritten characters. / Thesis (Ph.D, Mathematics & Statistics) -- Queen's University, 2013-09-30 18:01:12.959
|
Page generated in 0.0263 seconds