71

A comparative study of feature selection methodologies in a readability assessment framework for children’s literature

Singh, Ritu 30 August 2012 (has links)
No description available.
72

Extracting Feature Vectors From Event-Related fMRI Data to Enable Machine Learning Analysis

Soldate, Jeffrey S. 05 October 2022 (has links)
Linear models are the dominant means of extracting summaries of events in fMRI for feature-vector-based machine learning. While they are both useful and robust, they are limited by the assumptions made in modeling. In this work, we examine a number of feature extraction techniques adjacent to linear models that account for or allow wider variation. Primarily, we construct mixed effects models able to account for variation between stimuli of the same class and perform empirical tests on the resulting feature extraction–classifier system. We extend this analysis to spatial-temporal models as well as summary models. We find that mixed effects models increase classifier performance at the cost of increased uncertainty in prediction estimates. In addition, these models identify similar regions of interest in separating classes. While they currently require knowledge hidden during testing, we present these results as an optimum to be reached in future work. / Doctor of Philosophy / Machine learning is a popular tool for extracting useful information from functional MR images. One approach is classification using feature vectors derived from observations. In this work, we examine new strategies for extracting feature vectors from time-varying data and explore the effect these feature vectors have on the results of machine learning analysis. In a set of simulations and real data, we compare a range of standard methods for feature extraction to new methods developed for this work. We find the most effective approach for successful classification is feature extraction through the use of mixed effects models. We also find that these models preserve the selection of feature sets that are maximally important to classification. We then explore the range of considerations required to use any of the methods examined in this work for a range of cases. We hope this provides solid ground for both future expansion of feature extraction methods and helpful advice for future users of these methods.
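The mixed-effects idea above can be illustrated in a few lines. A minimal sketch, assuming a statsmodels-based workflow with invented column names ("bold", "cls", "stim") that are not from the thesis: a random intercept per stimulus absorbs between-stimulus variation, and the fitted class coefficient becomes one entry of the feature vector.

```python
# Hedged sketch: per-voxel mixed model with a random effect per stimulus;
# the estimated class coefficient serves as one feature-vector entry.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_stim, n_rep = 20, 8
df = pd.DataFrame({
    "stim": np.repeat(np.arange(n_stim), n_rep),
    "cls": np.repeat(rng.integers(0, 2, n_stim), n_rep),
})
# Simulated BOLD summary: class effect + per-stimulus random intercept + noise
stim_effect = rng.normal(0, 0.5, n_stim)
df["bold"] = 0.8 * df["cls"] + stim_effect[df["stim"]] + rng.normal(0, 1, len(df))

# The random intercept per stimulus captures between-stimulus variation
# that a plain linear model would fold into the class estimate.
model = smf.mixedlm("bold ~ cls", df, groups=df["stim"]).fit()
feature = model.params["cls"]  # one entry of the feature vector (one voxel/ROI)
print(f"estimated class effect: {feature:.3f}")
```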
73

Evaluating Biological Data Using Rank Correlation Methods

Slotta, Douglas J. 24 May 2005 (has links)
Analyses based upon rank correlation methods, such as Spearman's Rho and Kendall's Tau, can provide quick insights into large biological data sets. Comparing expression levels between different technologies and models is problematic due to the different units of measure; here again, rank correlation provides an effective means of comparison between the two techniques. Massively Parallel Signature Sequencing (MPSS) transcript abundance levels are compared to microarray signal intensities for Arabidopsis thaliana. Rank correlations can be applied to subsets as well as the entire set. Results of subset comparisons can be used to improve the capabilities of predictive models, such as Predicted Highly Expressed (PHX); this is done for Escherichia coli. Methods are given to combine predictive models based upon feedback from experimental data. The problem of feature selection in supervised learning situations is also considered, where all features are drawn from a common domain and are best interpreted via ordinal comparisons with other features, rather than as numerical values. This is done for synthetic data as well as for microarray experiments examining the life cycle of Drosophila melanogaster and human leukemia cells. Two novel methods are presented based upon Rho and Tau, and their efficacy is tested with synthetic and real-world data. The method based upon Spearman's Rho is shown to be more effective. / Ph. D.
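For readers unfamiliar with the rank statistics named above, a small self-contained illustration using SciPy; the expression vectors are synthetic stand-ins for MPSS counts and microarray intensities, not data from the thesis.

```python
# Rank correlation is unit-free, so the differing scales of the two
# platforms do not matter, unlike Pearson correlation on raw values.
import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(1)
mpss = rng.lognormal(mean=2.0, sigma=1.0, size=500)          # transcript counts
microarray = 50 * mpss ** 0.7 * rng.lognormal(0, 0.4, 500)   # monotone map + noise

rho, p_rho = spearmanr(mpss, microarray)
tau, p_tau = kendalltau(mpss, microarray)
print(f"Spearman rho = {rho:.3f} (p={p_rho:.2g})")
print(f"Kendall tau  = {tau:.3f} (p={p_tau:.2g})")
```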
74

Task Oriented Privacy-preserving (TOP) Technologies Using Automatic Feature Selection

Jafer, Yasser January 2016 (has links)
A large amount of digital information collected and stored in datasets creates vast opportunities for knowledge discovery and data mining. These datasets, however, may contain sensitive information about individuals and, therefore, it is imperative to ensure that their privacy is protected. Most research in the area of privacy-preserving data publishing does not make any assumptions about an intended analysis task applied to the dataset. In many domains, however, such as healthcare and finance, it is possible to identify the analysis task beforehand. Incorporating such knowledge of the ultimate analysis task may improve the quality of the anonymized data while protecting the privacy of individuals. Furthermore, the existing research that does consider the ultimate analysis task (e.g., classification) is not suitable for high-dimensional data. We show that automatic feature selection (a well-known dimensionality reduction technique) can be utilized to consider both privacy and utility simultaneously. In doing so, we show that feature selection can enhance existing privacy-preserving techniques addressing k-anonymity and differential privacy, protecting privacy while reducing the amount of modification applied to the dataset and hence, in most cases, achieving higher utility. We consider incorporating the concept of privacy-by-design within the feature selection process. We propose techniques that turn filter-based and wrapper-based feature selection into privacy-aware processes. To this end, we build a layer of privacy on top of the regular feature selection process and obtain a privacy-preserving feature selection that is guided not only by accuracy but also by the amount of protected private information. In addition to considering privacy after feature selection, we introduce a framework for a privacy-aware feature selection evaluation measure. That is, we incorporate privacy during feature selection and obtain a list of candidate privacy-aware attribute subsets that consider (and satisfy) both efficacy and privacy requirements simultaneously. Finally, we propose a multi-dimensional, privacy-aware evaluation function which incorporates efficacy, privacy, and dimensionality weights and enables the data holder to obtain the best attribute subset according to its preferences.
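A hedged sketch of what such a privacy-aware filter score might look like: predictive relevance minus a weighted privacy penalty. The penalty term, the weight alpha, and the synthetic sensitive attribute are illustrative assumptions, not the thesis's exact evaluation function.

```python
# Illustrative privacy-aware filter: relevance (MI with the class) minus an
# assumed leakage penalty (MI with a synthetic sensitive attribute).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
rng = np.random.default_rng(0)
sensitive = (X[:, 3] + rng.normal(0, 0.5, 400) > 0).astype(int)  # synthetic

relevance = mutual_info_classif(X, y, random_state=0)        # utility term
leakage = mutual_info_classif(X, sensitive, random_state=0)  # privacy term
alpha = 0.5                                                  # assumed weight

score = relevance - alpha * leakage
top_k = np.argsort(score)[::-1][:5]
print("selected features:", sorted(top_k.tolist()))
```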
75

Regularizing Vision-Transformers Using Gumbel-Softmax Distributions on Echocardiography Data

Nilsson, Alfred January 2023 (has links)
This thesis introduces a novel approach to model regularization in Vision Transformers (ViTs), a category of deep learning models. It employs stochastic embedded feature selection within the context of echocardiography video analysis, specifically focusing on the EchoNet-Dynamic dataset. The proposed method, termed Gumbel Vision-Transformer (G-ViT), combines ViTs and Concrete Autoencoders (CAE) to enhance the generalization of models predicting left ventricular ejection fraction (LVEF). The model comprises a ViT frame encoder for spatial representation and a transformer sequence model for temporal aspects, forming a Video ViT (V-ViT) architecture that, when used without feature selection, serves as a baseline for LVEF prediction performance. The key contribution lies in the incorporation of stochastic image patch selection in video frames during training. The CAE method is adapted for this purpose, achieving approximately discrete patch selections by sampling from the Gumbel-Softmax distribution, a relaxation of the categorical distribution. The experiments conducted on EchoNet-Dynamic demonstrate a consistent and notable regularization effect. The G-ViT model, trained with learned feature selection, achieves a test R² of 0.66, outperforming random masking baselines and the full-input V-ViT counterpart (R² of 0.63), and showcases improved generalization across multiple evaluation metrics. The G-ViT is compared against recent related work applying ViTs to EchoNet-Dynamic, notably outperforming the Swin-transformer-based UltraSwin, which achieved an R² of 0.59. Moreover, the thesis explores model explainability by visualizing selected patches, providing insights into how the G-ViT utilizes regions known to be crucial for human LVEF prediction. The proposed approach thus extends beyond regularization, offering a unique explainability tool for ViTs. Efficiency aspects are also considered, revealing that the G-ViT model, trained with a reduced number of input tokens, yields comparable or superior results while significantly reducing GPU memory and floating-point operations. This efficiency improvement holds potential for energy reduction during training.
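The Gumbel-Softmax patch selection can be sketched as follows, assuming a PyTorch implementation; this is not the thesis code, and the shapes and temperature are illustrative.

```python
# Concrete/Gumbel-Softmax patch selection: a learnable logit matrix picks k of
# the N input patches, and the relaxation keeps the selection differentiable.
import torch
import torch.nn.functional as F

n_patches, k, dim = 196, 49, 768          # e.g. 14x14 ViT patches, keep 25%
logits = torch.nn.Parameter(torch.zeros(k, n_patches))  # one categorical per kept slot
patches = torch.randn(8, n_patches, dim)  # a batch of patch embeddings

# During training: soft, differentiable selection; tau is annealed toward 0 so
# samples approach one-hot (discrete) selections, as in Concrete Autoencoders.
tau = 0.5
weights = F.gumbel_softmax(logits, tau=tau, hard=False)   # (k, n_patches)
selected = torch.einsum("kn,bnd->bkd", weights, patches)  # (8, k, dim)

# At test time: take the arg-max patches, i.e. a hard discrete selection.
hard_idx = logits.argmax(dim=-1)
selected_eval = patches[:, hard_idx, :]
print(selected.shape, selected_eval.shape)
```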
76

Pattern Recognition in Cybersecurity

Vashaee, Ali January 2014 (has links)
The tremendous growth of accessible online images (Web images) provokes the need to perform accurate image ranking for applications like cyber-security. Feature extraction is an important step in image ranking procedures due to its direct impact on final categorization and ranking performance. The goal of this study is to analyse state-of-the-art feature spaces in order to evaluate their efficiency in the object recognition context and image ranking framework for cyber-security applications. Experiments show that HOG and GIST feature descriptors exhibit high ranking performance; these features, however, are not rotation and scale invariant. In order to obtain more reliable image ranking systems based on these feature spaces, we propose two methods. In the first method (PrMI) we focus on improving the invariance property of the ranking system while maintaining the ranking performance. In this method, a rotation invariant feature descriptor (RIHOG) is derived from HOG and used in a top-down searching technique to cover the scale variation of the objects in the images. The proposed method (PrMI) not only provides robustness against geometrical transformations of objects but also provides high ranking performance close to that of HOG. It is also computationally efficient, with complexity around O(n). In the second proposed method (PrMII) we focus on the ranking performance while maintaining the invariance property of the ranking system. Objects are localized in a scale invariant fashion in a Region Covariance feature space, then described using HOG and GIST features. Finally, to better evaluate the proposed method, we compare it with existing research in the similar domain (CBIR) on Caltech-256. The proposed methods provide the highest ranking performance in comparison with the methods implemented in this study and some of the CBIR methods evaluated on the Caltech-256 dataset in previous works.
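Extracting the HOG descriptor discussed above takes only a few lines with scikit-image; this is a generic illustration, not the study's implementation.

```python
# Plain HOG is neither rotation nor scale invariant, which is the limitation
# the thesis's RIHOG variant and top-down search are designed to address.
from skimage import data
from skimage.feature import hog

image = data.camera()          # any grayscale test image
descriptor = hog(
    image,
    orientations=9,            # gradient orientation bins
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
)
print(descriptor.shape)        # one flat feature vector per image
```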
77

Supervised Learning Techniques: A comparison of the Random Forest and the Support Vector Machine

Arnroth, Lukas, Fiddler, Dennis Jonni January 2016 (has links)
This thesis examines the performance of the support vector machine and the random forest models in the context of binary classification. The two techniques are compared and the better-performing one is used to construct a final parsimonious model. The data set consists of 33 observations and 89 biomarkers as features, with no known dependent variable. The dependent variable is generated through k-means clustering, with a predefined final solution of two clusters. The training of the algorithms is performed using five-fold cross-validation repeated twenty times. The outcome of the training process reveals that the best performing versions of the models are a linear support vector machine and a random forest with six randomly selected features at each split. The final comparison of these optimally tuned algorithms on the test set shows that the random forest outperforms the linear-kernel support vector machine: the former classifies all observations in the test set correctly, whilst the latter classifies all but one correctly. Hence, a parsimonious random forest model using the top five features is constructed, which performs equally well on the test set compared to the original random forest model using all features.
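A condensed sketch of the evaluation protocol described above, with synthetic data standing in for the 33-observation, 89-biomarker set; the hyperparameters shown are the optimally tuned values reported in the abstract.

```python
# k-means generates the labels, then a linear SVM and a random forest are
# compared under five-fold cross-validation repeated twenty times.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(33, 89))
y = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
svm = SVC(kernel="linear")
rf = RandomForestClassifier(max_features=6, random_state=0)  # 6 features/split

for name, model in [("linear SVM", svm), ("random forest", rf)]:
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```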
78

Predicting reliability in multidisciplinary engineering systems under uncertainty

Hwang, Sungkun 27 May 2016 (has links)
The proposed study develops a framework that can accurately capture and model input and output variables for multidisciplinary systems to mitigate the computational cost when uncertainties are involved. The dimension of the random input variables is reduced depending on the degree of correlation calculated by relative entropy. Feature extraction methods, namely Principal Component Analysis (PCA) and the Auto-Encoder (AE) algorithm, are developed for when the input variables are highly correlated. The Independent Features Test (IndFeaT) is implemented as the feature selection method when the correlation is low, to select a critical subset of model features. Moreover, Artificial Neural Networks (ANN), including the Probabilistic Neural Network (PNN), are integrated into the framework to correctly capture the complex response behavior of the multidisciplinary system at low computational cost. The efficacy of the proposed method is demonstrated with electro-mechanical engineering examples, including a solder joint and a stretchable patch antenna.
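The framework's branching logic can be sketched under stated assumptions: a plain correlation threshold stands in below for the thesis's relative-entropy criterion, and only the PCA branch is implemented; IndFeaT and the AE branch are omitted.

```python
# If inputs are strongly correlated, reduce dimension with PCA; otherwise
# fall through to feature selection. Threshold 0.3 is illustrative only.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = base @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

mean_abs_corr = np.abs(np.corrcoef(X, rowvar=False))[np.triu_indices(10, k=1)].mean()
if mean_abs_corr > 0.3:                           # stand-in for relative entropy
    Z = PCA(n_components=0.95).fit_transform(X)   # keep 95% of the variance
    print("PCA reduced dims:", Z.shape[1])
else:
    print("low correlation: apply feature selection (e.g., IndFeaT) instead")
```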
79

Rule-Based Approaches for Large Biological Datasets Analysis: A Suite of Tools and Methods

Kruczyk, Marcin January 2013 (has links)
This thesis is about new and improved computational methods to analyze complex biological data produced by advanced biotechnologies. Such data is not only very large but is also characterized by very high numbers of features. Addressing these needs, we developed a set of methods and tools suitable for analyzing large sets of data, including next-generation sequencing data, and built transparent models that may be interpreted by researchers not necessarily expert in computing. We focused on brain-related diseases. The first aim of the thesis was to employ the meta-server approach to finding peaks in ChIP-seq data. Taking existing peak finders, we created an algorithm that produces consensus results better than any single peak finder. The second aim was to use supervised machine learning to identify features that are significant in the predictive diagnosis of Alzheimer's disease in patients with mild cognitive impairment. This experience led to the development of a better feature selection method for rough sets, a machine learning method. The third aim was to deepen the understanding of the role that the STAT3 transcription factor plays in gliomas. Interestingly, we found that STAT3, in addition to being an activator, is also a repressor in certain rat and human glioma models. This was achieved by analyzing STAT3 binding sites in combination with epigenetic marks. STAT3 regulation was determined using expression data of untreated cells and cells after JAK2/STAT3 inhibition. The four papers constituting the thesis are preceded by an exposition of the biological, biotechnological and computational background that provides the foundations for the papers. The overall results of this thesis testify to the mutually beneficial relationship between Bioinformatics and modern Life Sciences and Computer Science.
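The meta-server consensus idea from the first aim can be illustrated with a toy per-base voting scheme; real consensus peak calling is more involved, and the peak calls below are invented.

```python
# Combine peak calls from several peak finders by per-base voting and keep
# regions supported by a majority. A toy sketch of the principle only.
import numpy as np

genome_len = 1000
calls = [  # (start, end) peaks from three hypothetical peak finders
    [(100, 200), (400, 450), (700, 800)],
    [(110, 210), (405, 460)],
    [(95, 190), (720, 790)],
]

votes = np.zeros(genome_len, dtype=int)
for finder in calls:
    for start, end in finder:
        votes[start:end] += 1

consensus = votes >= 2  # supported by a majority of the three finders
# Extract consensus intervals from the boolean mask
edges = np.flatnonzero(np.diff(consensus.astype(int)))
starts, ends = edges[::2] + 1, edges[1::2] + 1
print(list(zip(starts.tolist(), ends.tolist())))
```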
80

A Proposed Frequency-Based Feature Selection Method for Cancer Classification

Pan, Yi 01 April 2017 (has links)
Feature selection is becoming an essential procedure in the data preprocessing step. The choice of feature selection method can affect the efficiency and accuracy of classification models and, therefore, whether a classification model can deliver reliable performance. In this study, we compared an original feature selection method and a proposed frequency-based feature selection method with four classification models and three filter-based ranking techniques on a cancer dataset. The proposed method was implemented in WEKA, an open-source software package. Performance is evaluated by two measures: Recall and the Receiver Operating Characteristic (ROC). Finally, we found that the frequency-based feature selection method performed better than the original ranking method.
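A hedged reconstruction of the frequency-based idea: rank features with several filter methods and keep those appearing most often in the top-k lists. The three rankers below are common filter choices, not necessarily the three used in the study.

```python
# Count how often each feature lands in a filter method's top-k list, then
# keep features selected by a majority of the rankers.
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
X_pos = X - X.min(axis=0)  # chi2 requires non-negative features

k = 10
rankers = [
    chi2(X_pos, y)[0],
    f_classif(X, y)[0],
    mutual_info_classif(X, y, random_state=0),
]
counts = Counter()
for scores in rankers:
    counts.update(np.argsort(scores)[::-1][:k].tolist())

# Features appearing in at least two of the three top-k lists
selected = [f for f, c in counts.most_common() if c >= 2]
print("frequency-selected features:", sorted(selected))
```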
