141

VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS

Junhan Zhao (11024559) 25 June 2021 (has links)
Enabling human understanding of high-dimensional (HD) data is critical for scientific research but highly challenging. For large datasets, probabilistic non-linear dimensionality reduction (DR) models such as UMAP and t-SNE lead the field in reducing high dimensionality. However, because of the trade-off between global and local structure preservation and the randomness used to initialize the computation, applying non-linear models with different parameter settings to data of unknown high-dimensional structure may return different 2D visual forms. Critical neighborhood relationships may be falsely imposed, introducing uncertainty, so-called distortion, into the low-dimensional embedding visualizations. In this work, a survey is first conducted to review state-of-the-art layout enrichment work for interpreting dimensionality reduction methods and results. Responding to the lack of visual interpretation techniques for probabilistic DR methods, we propose a visualization technique called ManiGraph, which lets users explore multi-view 2D embeddings via mesoscopic structure graphs. A dynamic mesoscopic structure first subsets the HD data with a hexagonal grid in the visual space of a non-linear embedding (e.g., UMAP). It then measures regionally adapted trustworthiness/continuity and visualizes the restored missing connections and the highlighted false connections between subsets, from high-dimensional space to low-dimensional space, in a node-link manner. The visualization helps users understand and interpret distortion arising at both the visualization and model stages. We further demonstrate use cases on intuitive 3D toy datasets, Fashion-MNIST, and single-cell RNA sequencing data with domain experts in unsupervised scenarios. This work will potentially benefit the data science community, from toolkit users to DR algorithm developers.
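As a rough illustration of the distortion measures involved, the sketch below computes a UMAP embedding and its trustworthiness/continuity using the umap-learn and scikit-learn libraries as stand-ins; it is not the ManiGraph tool, and the hexagonal-grid, per-region analysis described above is omitted. Continuity is obtained here by swapping the roles of the two spaces in the trustworthiness computation. Dataset and parameters are placeholders.

```python
# Minimal sketch: quantify embedding distortion with trustworthiness/continuity.
# Not the ManiGraph implementation; umap-learn and scikit-learn are used as stand-ins.
import umap                                   # pip install umap-learn
from sklearn.datasets import load_digits
from sklearn.manifold import trustworthiness

X = load_digits().data                        # stand-in high-dimensional dataset

# Non-linear 2D embedding; results vary with n_neighbors/min_dist and the random seed,
# which is exactly the source of distortion the thesis studies.
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(X)

# Trustworthiness penalises "false" neighbours that appear only in the 2D embedding.
trust = trustworthiness(X, embedding, n_neighbors=10)

# Continuity penalises HD neighbours that go "missing" in the embedding;
# it equals trustworthiness with the two spaces swapped.
cont = trustworthiness(embedding, X, n_neighbors=10)

print(f"trustworthiness={trust:.3f}  continuity={cont:.3f}")
```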
142

Data analysis for Systematic Literature Reviews

Chao, Roger January 2021 (has links)
Systematic Literature Reviews (SLRs) are a powerful research tool for identifying and selecting literature to answer a given question. However, an approach for extracting the analytical information inherent in an SLR's multi-dimensional dataset has been lacking, and previous SLR tools do not provide such analytical insight. This thesis therefore aims to provide an approach, combining various algorithms and data treatment techniques, that gives users analytical insight into their data beyond what is evident from the bare execution of a Systematic Literature Review. To this end, a literature review was conducted to find the most relevant techniques for extracting information from multi-dimensional datasets, and the approach was tested in a web application on a survey regarding Self-Adaptive Systems (SAS). As a result, we identify the most suitable techniques to incorporate into the approach provided by this thesis.
143

Identification of Suspicious Semiconductor Devices Using Independent Component Analysis with Dimensionality Reduction

Bartholomäus, Jenny, Wunderlich, Sven, Sasvári, Zoltán 22 August 2019 (has links)
In the semiconductor industry, the reliability of devices is of paramount importance. Therefore, after removing the defective ones, one wants to detect irregularities in the measurement data, because the corresponding devices have a higher risk of failing early in the product lifetime. The paper presents a method that improves the detection of such suspicious devices by performing the screening on transformed measurement data. In this way, dependencies between tests, for example, can be taken into account. Additionally, a new dimensionality reduction is performed within the transformation, so that the reduced and transformed data retains only the informative content of the raw data. This reduces the complexity of the subsequent screening steps. The new approach is applied to semiconductor measurement data, and it is shown by means of examples how the screening can be improved.
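As an illustration only (not the authors' exact transformation), the sketch below reduces correlated test measurements with FastICA and flags devices whose independent components deviate strongly from the bulk; the data, component count and threshold are assumptions.

```python
# Sketch: ICA with dimensionality reduction, then screening for outlying devices.
# Illustrative only; thresholds and component count are assumptions, not the paper's.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))     # stand-in: 500 devices x 40 correlated test measurements

# Reduce to a handful of independent components that carry the informative content.
Z = FastICA(n_components=5, random_state=0).fit_transform(StandardScaler().fit_transform(X))

# Robust per-component deviation: devices far from the bulk in any component are "suspicious".
med = np.median(Z, axis=0)
mad = np.median(np.abs(Z - med), axis=0) + 1e-12
score = np.max(np.abs(Z - med) / mad, axis=1)
suspicious = np.where(score > 6.0)[0]          # threshold chosen for illustration
print(f"{len(suspicious)} suspicious devices flagged")
```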
144

High dimensional data clustering; A comparative study on gene expressions : Experiment on clustering algorithms on RNA-sequence from tumors with evaluation on internal validation

Henriksson, William January 2019 (has links)
In cancer research, class discovery is the first step when investigating a new dataset to find which hidden groups exist by similarity of attributes. However, datasets from gene expression measurements, whether RNA microarray or RNA-sequence, are high-dimensional, which makes it hard to perform cluster analysis and obtain well-separated clusters. Well-separated clusters are desirable because they indicate that objects are unlikely to be placed in the wrong clusters. This report investigates, in an experiment, whether K-Means and hierarchical clustering are suitable for clustering gene expressions in RNA-sequence data from various tumors. Dimensionality reduction methods are also applied to see whether they help create well-separated clusters. The results show that well-separated clusters are only achieved when using PCA as dimensionality reduction and K-Means on correlation. The main contribution of this paper is the finding that using K-Means or hierarchical clustering on the full natural dimensionality of RNA-sequence data yields an undesirably low average silhouette width, below 0.4.
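For orientation, a minimal sketch of the kind of configuration reported as best above (PCA followed by K-Means, evaluated by average silhouette width) is given below; the data, the number of components and clusters, and the Euclidean distance are placeholders rather than the thesis' actual setup, which uses correlation for K-Means.

```python
# Sketch: PCA + K-Means with silhouette evaluation on placeholder data.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in for gene expression data (samples x genes); not RNA-sequence data from tumors.
X, _ = make_blobs(n_samples=300, n_features=2000, centers=4, random_state=0)

X_reduced = PCA(n_components=10, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_reduced)

# Average silhouette width; the thesis treats values below 0.4 as poorly separated.
print(f"silhouette = {silhouette_score(X_reduced, labels):.3f}")
```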
145

Principal Component Modelling of Fuel Consumption of Seagoing Vessels and Optimising Fuel Consumption as a Mixed-Integer Problem

Ivan, Jean-Paul January 2020 (has links)
The fuel consumption of a seagoing vessel is, through a combination of Box-Cox transforms and principal component analysis, reduced to a univariate function of the primary principal component, with a mean model error of −3.2% and an error standard deviation of 10.3%. In the process, a Latin-hypercube-inspired space-partitioning sampling technique is developed and successfully used to produce a representative sample for determining the regression coefficients. Finally, a formal optimisation problem for minimising the fuel use is described. The problem is derived from a parametrised expression for the fuel consumption and has only 3, or 2 if simplified, free variables at each timestep. Some information has been redacted in order to comply with NDA restrictions. Most redactions are names (of vessels or otherwise), units, and in some cases (especially in figures) quantities. The presentation was performed remotely using Zoom.
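A rough sketch of the modelling pipeline described above, on synthetic stand-in data: each predictor is Box-Cox transformed, PCA extracts the primary principal component, and fuel use is regressed on it. The vessel variables, units and error figures of the thesis are not reproduced here.

```python
# Sketch: Box-Cox transform each predictor, run PCA, regress fuel use on the first PC.
# Synthetic placeholder data; not the (redacted) vessel data of the thesis.
import numpy as np
from scipy.stats import boxcox
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
speed = rng.uniform(8, 20, 400)                                 # toy predictors
draft = rng.uniform(6, 12, 400)
fuel = 0.02 * speed**3 + 1.5 * draft + rng.normal(0, 2, 400)    # toy consumption model

# Box-Cox requires strictly positive inputs; each predictor gets its own lambda.
features = np.column_stack([boxcox(speed)[0], boxcox(draft)[0]])

pc1 = PCA(n_components=1).fit_transform(features)               # primary principal component
model = LinearRegression().fit(pc1, fuel)

rel_err = (model.predict(pc1) - fuel) / fuel
print(f"mean error {rel_err.mean():+.1%}, std {rel_err.std():.1%}")
```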
146

A General Model for Continuous Noninvasive Pulmonary Artery Pressure Estimation

Smith, Robert Anthony 15 December 2011 (has links) (PDF)
Elevated pulmonary artery pressure (PAP) is a significant healthcare risk. Continuous monitoring for patients with elevated PAP is crucial for effective treatment, yet the most accurate method is invasive and expensive, and cannot be performed repeatedly. Noninvasive methods exist but are inaccurate, expensive, and cannot be used for continuous monitoring. We present a machine learning model based on heart sounds that estimates pulmonary artery pressure with enough accuracy to exclude an invasive diagnostic operation, allowing for consistent monitoring of heart condition in suspect patients without the cost and risk of invasive monitoring. We conduct a greedy search through 38 possible features using a 109-patient cross-validation to find the most predictive features. Our best general model has a standard estimate of error (SEE) of 8.28 mmHg, which outperforms the previous best performance in the literature on a general set of unseen patient data.
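The greedy feature search can be sketched as forward selection scored by a cross-validated standard error of estimate (approximated below by the cross-validated RMSE); the data, the regression model and the stopping rule are placeholders, not the thesis' 38 heart-sound features or exact learner.

```python
# Sketch: greedy forward feature selection scored by cross-validated error.
# Synthetic placeholder data standing in for 109 patients x 38 candidate features.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=109, n_features=38, n_informative=6, noise=8.0, random_state=0)

def see(features):
    """Cross-validated error (RMSE) using only the given feature columns."""
    pred = cross_val_predict(LinearRegression(), X[:, features], y, cv=10)
    return np.sqrt(np.mean((pred - y) ** 2))

selected, remaining, best = [], list(range(X.shape[1])), np.inf
while remaining:
    errs = {f: see(selected + [f]) for f in remaining}
    f_best, e_best = min(errs.items(), key=lambda kv: kv[1])
    if e_best >= best:                 # stop when adding a feature no longer helps
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best = e_best

print(f"selected features: {selected}, error = {best:.2f}")
```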
147

Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks

Fahlström Myrman, Arvid January 2017 (has links)
Unsupervised learning of speech is concerned with automatically finding patterns such as words or speech sounds, without supervision in the form of orthographical transcriptions or a priori knowledge of the language. However, a fundamental problem is that unsupervised speech learning methods tend to discover highly speaker-specific and context-dependent representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes discovered by the model. We do this by training a sparse siamese model to find a linear transformation of input posteriorgrams, extracted from the unsupervised model, to lower-dimensional posteriorgrams. The siamese model makes use of same-category and different-category speech fragment pairs obtained through unsupervised term discovery. After training, the model is converted into an exact partitioning of the posteriorgrams. We evaluate the model on the minimal-pair ABX task in the context of the Zero Resource Speech Challenge. We are able to demonstrate that our method significantly reduces the dimensionality of standard Gaussian mixture model posteriorgrams, while also making them more speaker invariant. This suggests that the model may be viable as a general post-processing step to improve probabilistic acoustic features obtained by unsupervised learning.
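A toy sketch of the core idea, assuming PyTorch: a single linear layer maps posteriorgrams to a lower dimension and is trained on same/different fragment pairs with a cosine-based loss. The thesis' sparsity constraint, exact loss and final conversion to an exact partition are omitted, and the data below are random placeholders.

```python
# Toy sketch (not the thesis implementation): a linear siamese map from GMM posteriorgrams
# to lower-dimensional posteriorgrams, trained on same/different fragment pairs.
import torch
import torch.nn.functional as F

D_in, D_out = 64, 16                               # posteriorgram size -> reduced size
linear = torch.nn.Linear(D_in, D_out, bias=False)
opt = torch.optim.Adam(linear.parameters(), lr=1e-2)

def embed(p):
    # Softmax keeps the output interpretable as a (lower-dimensional) posteriorgram.
    return F.softmax(linear(p), dim=-1)

for step in range(200):
    a = F.softmax(torch.randn(32, D_in), dim=-1)   # placeholder fragment posteriorgrams
    b = F.softmax(torch.randn(32, D_in), dim=-1)
    same = torch.randint(0, 2, (32,)).float()      # 1 = same-word pair, 0 = different
    cos = F.cosine_similarity(embed(a), embed(b), dim=-1)
    # Pull same-word pairs together, push different-word pairs apart.
    loss = (same * (1 - cos) + (1 - same) * cos.clamp(min=0)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```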
148

Feature Extraction and Feature Selection for Object-based Land Cover Classification : Optimisation of Support Vector Machines in a Cloud Computing Environment

Stromann, Oliver January 2018 (has links)
Mapping the Earth's surface and its rapid changes with remotely sensed data is a crucial tool to understand the impact of an increasingly urban world population on the environment. However, the impressive amount of freely available Copernicus data is only marginally exploited in common classifications. One of the reasons is that measuring the properties of training samples, the so-called 'features', is costly and tedious. Furthermore, handling large feature sets is not easy in most image classification software. This often leads to the manual choice of a few, allegedly promising features. In this Master's thesis degree project, I use the computational power of Google Earth Engine and Google Cloud Platform to generate an oversized feature set in which I explore feature importance and analyse the influence of dimensionality reduction methods. I use Support Vector Machines (SVMs) for object-based classification of satellite images, a commonly used method. A large feature set is evaluated to find the most relevant features that discriminate the classes and thereby contribute most to high classification accuracy. In doing so, one can bypass the sensitive knowledge-based but sometimes arbitrary selection of input features. Two kinds of dimensionality reduction methods are investigated: the feature extraction methods Linear Discriminant Analysis (LDA) and Independent Component Analysis (ICA), which transform the original feature space into a projected space of lower dimensionality, and the filter-based feature selection methods chi-squared test, mutual information and Fisher criterion, which rank and filter the features according to a chosen statistic. I compare these methods against the default SVM in terms of classification accuracy and computational performance. The classification accuracy is measured in overall accuracy, prediction stability, inter-rater agreement and the sensitivity to training set sizes. The computational performance is measured in the decrease in training and prediction times and the compression factor of the input data. I conclude on the best-performing classifier with the most effective feature set based on this analysis. In a case study of mapping urban land cover in Stockholm, Sweden, based on multitemporal stacks of Sentinel-1 and Sentinel-2 imagery, I demonstrate the integration of Google Earth Engine and Google Cloud Platform for an optimised supervised land cover classification. I use dimensionality reduction methods provided in the open source scikit-learn library and show how they can improve classification accuracy and reduce the data load. At the same time, this project gives an indication of how the exploitation of big earth observation data can be approached in a cloud computing environment. The preliminary results highlighted the effectiveness and necessity of dimensionality reduction methods but also strengthened the need for inter-comparable object-based land cover classification benchmarks to fully assess the quality of the derived products. To facilitate this need and encourage further research, I plan to publish the datasets (i.e. imagery, training and test data) and provide access to the developed Google Earth Engine and Python scripts as Free and Open Source Software (FOSS).
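A minimal sketch of the kind of comparison described above, using scikit-learn (which the abstract mentions) on placeholder data: an SVM on the full feature set versus SVMs preceded by a filter-based selection (mutual information) or by LDA feature extraction. Feature counts, class numbers and parameters are illustrative assumptions, not the thesis' setup.

```python
# Sketch: compare an SVM on the full feature set against SVMs preceded by
# dimensionality reduction, using scikit-learn. Placeholder data, not satellite imagery.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for an oversized object-based feature set with several land cover classes.
X, y = make_classification(n_samples=600, n_features=200, n_informative=15,
                           n_classes=5, random_state=0)

candidates = {
    "SVM (all features)": make_pipeline(StandardScaler(), SVC()),
    "mutual info -> SVM": make_pipeline(StandardScaler(),
                                        SelectKBest(mutual_info_classif, k=20), SVC()),
    "LDA -> SVM":         make_pipeline(StandardScaler(),
                                        LinearDiscriminantAnalysis(n_components=4), SVC()),
}
for name, clf in candidates.items():
    print(f"{name}: accuracy = {cross_val_score(clf, X, y, cv=5).mean():.3f}")
```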
149

Feature Extraction using Dimensionality Reduction Techniques: Capturing the Human Perspective

Coleman, Ashley B. January 2015 (has links)
No description available.
150

Extracting key features for analysis and recognition in computer vision

Gao, Hui 13 March 2006 (has links)
No description available.
