Global ETD Search

221	Optimizing Deep Neural Networks for Classification of Short Texts Pettersson, Fredrik January 2019 This master's thesis investigates how a state-of-the-art (SOTA) deep neural network (NN) model can be created for a specific natural language processing (NLP) dataset, the effects of using different dimensionality reduction techniques on common pre-trained word embeddings, and how well this model generalizes to a secondary dataset. The research is motivated by two factors. One is that a machine learning (ML) text classification (TC) model is typically built around a specific dataset and often requires a lot of manual intervention. It is therefore hard to know exactly which procedures to implement for a specific dataset and how the result will be affected. The other is that, if the dimensionality of pre-trained embedding vectors can be lowered without losing accuracy, thereby saving execution time, the time saved can be spent on other techniques to achieve even higher accuracy. A handful of deep neural network architectures are used, namely a convolutional neural network (CNN), a long short-term memory neural network (LSTM) and a bidirectional LSTM (Bi-LSTM). These architectures are combined with four different word embeddings: GoogleNews-vectors-negative300, glove.840B.300d, paragram_300_sl999 and wiki-news-300d-1M. Three main experiments are conducted in this thesis. In the first experiment, a top-performing TC model is created for a recent NLP competition held at Kaggle.com. Each implemented procedure is benchmarked on how it affects the accuracy and execution time of the model. In the second experiment, principal component analysis (PCA) and random projection (RP) are applied to the pre-trained word embeddings used in the top-performing model to investigate how the accuracy and execution time are affected when creating lower-dimensional embedding vectors. 
In the third experiment, the same model is benchmarked on a separate dataset (Sentiment140) to investigate how well it generalizes to other data and how each implemented procedure affects the accuracy compared with the original dataset. The first experiment results in a bidirectional LSTM model and a combination of three embeddings, glove, paragram and wiki-news, concatenated together. The model gives predictions with an F1 score of 71%, which is good enough to reach 9th place out of 1,401 participating teams in the competition. In the second experiment, PCA improves the execution time by 13% while lowering the dimensionality of the embeddings by 66% and losing only half a percent of F1 accuracy. RP gave a constant accuracy of 66-67% regardless of the projected dimensions, compared to over 70% when using PCA. In the third experiment, the model gained around 12% accuracy from the initial to the final benchmarks, compared to 19% on the competition dataset. The best-achieved accuracy on the Sentiment140 dataset is 86% and thus higher than the 71% achieved on the Quora dataset. Machine learning Text classification state-of-the-art SOTA Deep neural network Dimensionality reduction Word embeddings Computer and Information Sciences
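The PCA step described in entry 221 can be illustrated with a small sketch. This is not the thesis's implementation: the function name `pca_reduce`, the power-iteration approach and all parameters are assumptions chosen so the example needs only the Python standard library. For real 300-dimensional embeddings a numerical library would be the practical choice.

```python
import math
import random


def pca_reduce(vectors, k, iters=100, seed=0):
    """Project row vectors onto (at most) their top-k principal components.

    Hypothetical helper: power iteration with deflation on the covariance
    matrix. Returns fewer than k columns if the data has less variance.
    """
    rng = random.Random(seed)
    n, d = len(vectors), len(vectors[0])
    # Center the data so components describe variance, not the mean.
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    components = []
    for _ in range(k):
        w = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w_next = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w_next))
            if norm < 1e-12:
                w = None  # no variance left in the deflated matrix
                break
            w = [x / norm for x in w_next]
        if w is None:
            break
        # Rayleigh quotient gives the eigenvalue for the unit vector w.
        eigenvalue = sum(w[a] * cov[a][b] * w[b]
                         for a in range(d) for b in range(d))
        # Deflate: remove the found direction before the next search.
        for a in range(d):
            for b in range(d):
                cov[a][b] -= eigenvalue * w[a] * w[b]
        components.append(w)

    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]
```

Calling `pca_reduce(embedding_rows, k)` with `k` set to about a third of the original width would correspond to the 66% reduction reported in the abstract. How the thesis chose the target dimensions is not stated here.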
222	Mera sličnosti između modela Gausovih smeša zasnovana na transformaciji prostora parametara (A similarity measure between Gaussian mixture models based on parameter space transformation) Krstanović Lidija 25 September 2017 This thesis studies the possibility that the parameters of the Gaussian components of a particular Gaussian mixture model (GMM) lie approximately on a lower-dimensional surface embedded in the cone of positive definite matrices. To exploit this, we introduce a novel, much more efficient similarity measure between GMMs, obtained by an LPP-type projection of the component parameters from the high-dimensional original parameter space to a space of much lower dimension. Finding the distance between two GMMs in the original space is thus reduced to finding the distance between two sets of lower-dimensional Euclidean vectors, weighted by the corresponding weights. The proposed measure is suitable for applications that use high-dimensional feature spaces and/or a large overall number of Gaussian components. We confirm our results on synthetic as well as real experimental data.
223	VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS Junhan Zhao (11024559) 25 June 2021 Enabling human understanding of high-dimensional (HD) data is critical for scientific research but highly challenging. For large datasets, probabilistic-based non-linear dimensionality reduction (DR) models such as UMAP and t-SNE lead in performance at reducing high dimensionality. However, given the trade-off between global and local structure preservation and the random initialization used in the computation, applying non-linear models with different parameter settings to data of unknown high-dimensional structure may return different 2D visual forms. Many critical neighborhood relationships may be falsely imposed, and uncertainty, so-called distortion, may be introduced into the low-dimensional embedding visualizations. In this work, a survey has been conducted of state-of-the-art layout enrichment works for interpreting dimensionality reduction methods and results. Responding to the lack of visual interpretation techniques for probabilistic-based DR methods, we propose a visualization technique called ManiGraph, which helps users explore multi-view 2D embeddings via mesoscopic structure graphs. A dynamic mesoscopic structure first subsets the HD data by a hexagonal grid in the visual space of a non-linear embedding (e.g., UMAP). Then, it measures the regionally adapted trustworthiness/continuity and visualizes, in a node-link manner, the restored missing connections and the highlighted false connections between subsets from the high-dimensional space to the low-dimensional space. The visualization helps users understand and interpret the distortion from both the visualization and model stages. We further demonstrate use cases tested on intuitive 3D toy datasets, fashion-MNIST, and single-cell RNA sequencing with domain experts in unsupervised scenarios. 
This work will potentially benefit the data science community, from toolkit users to DR algorithm developers. Computer graphics Human-computer interaction data visualization methods dimensionality reduction analysis Uncertainty Unsupervised Learning Computer-Human Interaction Computer Graphics
224	Data analysis for Systematic Literature Reviews Chao, Roger January 2021 Systematic Literature Reviews (SLR) are a powerful research tool to identify and select literature to answer a certain question. However, an approach to extract the analytical data inherent in the multi-dimensional datasets of Systematic Literature Reviews was lacking. Previous Systematic Literature Review tools do not provide such analytical insight. Therefore, this thesis aims to provide a useful approach, comprising various algorithms and data treatment techniques, that gives users analytical insight into their data that is not evident from the bare execution of a Systematic Literature Review. To this end, a literature review has been conducted to find the most relevant techniques for extracting data from multi-dimensional datasets, and the approach has been tested, using a web application, on a survey regarding Self-Adaptive Systems (SAS). As a result, we identify the most suitable techniques to incorporate into the approach this thesis provides. data analysis systematic literature reviews clustering dimensionality reduction self-adaptive systems multi-dimensional data Computer Engineering
225	Identification of Suspicious Semiconductor Devices Using Independent Component Analysis with Dimensionality Reduction Bartholomäus, Jenny, Wunderlich, Sven, Sasvári, Zoltán 22 August 2019 In the semiconductor industry the reliability of devices is of paramount importance. Therefore, after removing the defective ones, one wants to detect irregularities in measurement data, because the corresponding devices have a higher risk of failure early in the product lifetime. The paper presents a method to improve the detection of such suspicious devices in which the screening is performed on transformed measurement data, so that, for example, dependencies between tests can be taken into account. Additionally, a new dimensionality reduction is performed within the transformation, so that the reduced and transformed data comprises only the informative content of the raw data. This reduces the complexity of the subsequent screening steps. The new approach will be applied to semiconductor measurement data and it will be shown, by means of examples, how the screening can be improved. info:eu-repo/classification/ddc/621.3 ddc:621.3
226	Perpetrator Workplace Aggression: Development of a Perpetrator Aggression Scale (PAS) Islam, Md Rashedul 04 May 2022 No description available. Occupational Psychology Organizational Behavior Psychology perpetrator workplace aggression uni-dimensionality general factor content-related validity construct-related validity
227	High dimensional data clustering; A comparative study on gene expressions : Experiment on clustering algorithms on RNA-sequence from tumors with evaluation on internal validation Henriksson, William January 2019 In cancer research, class discovery is the first step in investigating a new dataset: finding hidden groups of objects that share similar attributes. However, gene expression datasets, whether from RNA microarrays or RNA sequencing, are high-dimensional, which makes it hard to perform cluster analysis and to obtain well-separated clusters. Well-separated clusters are desirable because they indicate that objects have most likely not been placed in the wrong clusters. This report investigates experimentally whether K-Means and hierarchical clustering are suitable for clustering gene expressions in RNA-sequence data from various tumors. Dimensionality reduction methods are also applied to see whether they help create well-separated clusters. The results show that well-separated clusters are achieved only by using PCA for dimensionality reduction and K-Means on correlation. The main contribution of this paper is determining that K-Means or hierarchical clustering on the full natural dimensionality of RNA-sequence data yields an undesirably low average silhouette width, below 0.4. Cluster analysis cluster validation RNA-sequence tumors high-dimensional data dimensionality reduction Information Systems
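The internal validation measure named in entry 227, the average silhouette width, can be computed as in the following sketch. The function names and the default Euclidean distance are assumptions for illustration. The thesis also mentions clustering on correlation, so the `dist` argument is left pluggable.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette_width(points, labels, dist=euclidean):
    """Mean silhouette value over all points (ranges from -1 to 1)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Values near 1 indicate compact, well-separated clusters. The abstract treats an average below 0.4 as unsatisfactory.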
228	Principal Component Modelling of Fuel Consumption of Seagoing Vessels and Optimising Fuel Consumption as a Mixed-Integer Problem Ivan, Jean-Paul January 2020 The fuel consumption of a seagoing vessel is, through a combination of Box-Cox transforms and principal component analysis, reduced to a univariate function of the primary principal component with mean model error −3.2% and error standard deviation 10.3%. In the process, a Latin-hypercube-inspired space partitioning sampling technique is developed and successfully used to produce a representative sample used in determining the regression coefficients. Finally, a formal optimisation problem for minimising the fuel use is described. The problem is derived from a parametrised expression for the fuel consumption, and has only 3, or 2 if simplified, free variables at each timestep. Some information has been redacted in order to comply with NDA restrictions. Most redactions are names (of vessels or otherwise), units, and in some cases (especially in figures) quantities. / Presentation was performed remotely using Zoom. fuel consumption modelling dimensionality reduction PCA principal component analysis Box-Cox transform recursive space partitioning parametrisation model reduction Mathematics
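For readers unfamiliar with the Box-Cox transform mentioned in entry 228, a minimal sketch of the standard one-parameter form follows. The function name `box_cox` is chosen here for illustration, and the abstract does not say which variant or which values of the parameter the thesis used.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a positive value y.

    Returns log(y) when lam is 0, otherwise (y**lam - 1) / lam.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

The transform is typically applied per variable to make skewed data closer to normal before a linear method such as PCA is run.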
229	Woven Forms : creating three-dimensional objects transformed from flat woven textile Burkhardt, Leonie Annett January 2022 Technological developments in digital Jacquard weaving, as well as material research, have a strong influence on today's possibilities of textile production. These advancements make it possible to shift the perspective from textile as a flat surface to textile as a three-dimensional form and to push two-dimensional weaving into the third dimension. Using recent technologies, namely multi-layer weaving techniques and embedded heat-reactive shrinking material, the research of Woven Forms aims to explore the forming method of construction through weaving to create abstract forms transformed from flat, and to investigate their textile-form properties of shape, texture, color, and scale. The developed method of Embedded Form Weaving is set within experimental design research and structures a systematic approach to generating three-dimensional forms activated from flat surfaces. The outcome, in the form of abstract, self-supporting textile-forms, showcases the multitude of form expressions and the variety of formal variables within two construction-form-thinking families. This research contributes to the field of 3D weaving, demonstrates the potential for further research and application possibilities in other disciplines and fields, and evaluates the potential of seeing the weaving loom as a forming tool. While the fundamental base is the interlacement of warp and weft, technology, material science, and textile engineering shift the perception of woven textiles: from a rectangular piece of cloth to the opportunity to construct textile-forms. Textile Design Jacquard weaving three-dimensional woven textile form shrinking material texture color three-dimensionality Design
230	A General Model for Continuous Noninvasive Pulmonary Artery Pressure Estimation Smith, Robert Anthony 15 December 2011 Elevated pulmonary artery pressure (PAP) is a significant healthcare risk. Continuous monitoring for patients with elevated PAP is crucial for effective treatment, yet the most accurate method is invasive and expensive, and cannot be performed repeatedly. Noninvasive methods exist but are inaccurate, expensive, and cannot be used for continuous monitoring. We present a machine learning model based on heart sounds that estimates pulmonary artery pressure with enough accuracy to exclude an invasive diagnostic operation, allowing for consistent monitoring of heart condition in suspect patients without the cost and risk of invasive monitoring. We conduct a greedy search through 38 possible features using a 109-patient cross-validation to find the most predictive features. Our best general model has a standard estimate of error (SEE) of 8.28 mmHg, which outperforms the previous best performance in the literature on a general set of unseen patient data. Feature Selection PAP Medical Diagnostics SVM Parameter Selection Neural Networks Neural Networks Topology Dimensionality Reduction Computer Sciences
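The greedy feature search mentioned in entry 230 could take the form of forward selection, sketched below. This is an assumption: the abstract does not say whether the search was forward, backward or another greedy variant. The names `greedy_forward_selection` and `score` are hypothetical, with `score` standing in for a cross-validated error estimate supplied by the caller.

```python
def greedy_forward_selection(candidates, score, max_features=None):
    """Add one feature at a time, keeping the one that lowers `score` most.

    `score(subset)` is a caller-supplied function returning an error
    (lower is better) for a list of feature names, for example a
    cross-validated estimate. Stops when no candidate improves the error.
    """
    selected = []
    best_error = float("inf")
    remaining = list(candidates)
    limit = max_features if max_features is not None else len(remaining)
    while remaining and len(selected) < limit:
        # Evaluate every remaining feature added to the current subset.
        trial_errors = [(score(selected + [f]), f) for f in remaining]
        error, feature = min(trial_errors)
        if error >= best_error:
            break  # no single addition helps any further
        selected.append(feature)
        remaining.remove(feature)
        best_error = error
    return selected, best_error
```

With 38 candidate features, each round evaluates at most 38 subsets, which is what makes a greedy search tractable compared with trying all subsets.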
