71

Machine learning with the cancer genome atlas head and neck squamous cell carcinoma dataset: improving usability by addressing inconsistency, sparsity, and high-dimensionality

Rendleman, Michael 01 May 2019 (has links)
In recent years, more data have become available for historical oncology case analysis. A large dataset describing over 500 patient cases of Head and Neck Squamous Cell Carcinoma is a potential goldmine for finding ways to improve oncological decision support. Unfortunately, the best approaches for drawing useful inferences are unknown. With so much information, from DNA and RNA sequencing to clinical records, we must use computational learning to find associations and biomarkers. The available data are sparse and inconsistent, and some data types are very large. We processed clinical records with an expert oncologist and used complex modeling methods to substitute (impute) data for cases missing treatment information. We used machine learning algorithms to test whether the imputed data are useful for predicting patient survival. We saw no difference in the ability to predict patient survival with the imputed data, though the imputed treatment variables were more important to the survival models. To deal with the large number of features in RNA expression data, we used two approaches: using all the data on high-performance computers, and transforming the data into a smaller set of features (sparse principal components, or SPCs). We compared the performance of survival models with both datasets and saw no differences. However, the SPC models trained more quickly while also allowing us to pinpoint the biological processes each SPC is involved in, informing future biomarker discovery. We also examined ten processed molecular features for survival prediction ability and found some predictive power, though not enough to be clinically useful.
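As a rough illustration of the kind of pipeline this abstract describes, the sketch below imputes missing clinical values, compresses expression features into sparse principal components, and evaluates a survival classifier. All data and variable names are synthetic stand-ins, and the specific estimators (scikit-learn's IterativeImputer, SparsePCA, and a random forest) are assumptions for illustration, not the thesis's actual tooling.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import SparsePCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-ins for the clinical table (with missing treatment
# fields), the RNA expression matrix, and a binary survival label.
rng = np.random.default_rng(0)
clinical = rng.normal(size=(500, 20))
clinical[rng.random(clinical.shape) < 0.2] = np.nan   # simulate missingness
expression = rng.normal(size=(500, 500))
survived = rng.integers(0, 2, size=500)

# Model-based imputation of the missing clinical/treatment values.
clinical_imputed = IterativeImputer(random_state=0).fit_transform(clinical)

# Compress expression features into a small set of sparse principal components.
spcs = SparsePCA(n_components=10, random_state=0).fit_transform(expression)

# Fit and evaluate a survival classifier on the combined feature set.
X = np.hstack([clinical_imputed, spcs])
scores = cross_val_score(RandomForestClassifier(random_state=0), X, survived, cv=5)
print("mean CV accuracy:", scores.mean())
```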
72

MULTIFACTOR DIMENSIONALITY REDUCTION WITH P RISK SCORES PER PERSON

Li, Ye 01 January 2018 (has links)
After reviewing Multifactor Dimensionality Reduction (MDR) and its extensions, an approach to obtain P (larger than 1) risk scores is proposed to predict the continuous outcome for each subject. We study the mean square error (MSE) of dimensionality-reduced models fitted with sets of 2 risk scores and investigate the MSE for several special cases of the covariance matrix. A methodology is proposed to select a best set of P risk scores when P is specified a priori. Simulation studies based on true models of different dimensions (larger than 3) demonstrate that the selected set of P (larger than 1) risk scores outperforms the single aggregated risk score generated in AQMDR and illustrate that our methodology can determine a best set of P risk scores effectively. With different assumptions on the dimension of the true model, we considered the preferable set of risk scores from the best set of two risk scores and the best set of three risk scores. Further, we present a methodology to assess a set of P risk scores when P is not given a priori. Expressions for the asymptotic estimated mean square error of prediction (MSPE) are derived for a 1-dimensional model and a 2-dimensional model. In the last main chapter, we apply the methodology of selecting a best set of risk scores, where P has been specified a priori, to Alzheimer’s Disease data and obtain a set of two risk scores and a set of three risk scores for each subject to predict measurements on biomarkers that are crucially involved in Alzheimer’s Disease.
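The MDR family of methods can be hard to picture from prose alone. The sketch below is a much-simplified, QMDR-style illustration on synthetic data: each two-locus genotype cell is labeled high or low risk by comparing its mean outcome with the overall mean, giving one risk score per subject per SNP pair, and the best P pairs are kept. The coding, selection rule, and in-sample evaluation here are illustrative assumptions, not the AQMDR procedure or the selection methodology developed in the thesis.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, snps = 400, 10
G = rng.integers(0, 3, size=(n, snps))                       # genotypes coded 0/1/2
y = G[:, 0] * G[:, 1] + 0.5 * G[:, 2] + rng.normal(size=n)   # continuous outcome

def risk_score(pair, G, y):
    """QMDR-style score: +1 if the two-locus genotype cell's mean outcome
    exceeds the overall mean, -1 otherwise (one score per subject)."""
    overall = y.mean()
    cells = G[:, pair[0]] * 3 + G[:, pair[1]]                # 9 possible cells
    score = np.empty(len(y))
    for c in np.unique(cells):
        mask = cells == c
        score[mask] = 1.0 if y[mask].mean() > overall else -1.0
    return score

# Rank all SNP pairs by how well their risk score tracks y, keep the best P.
P = 2
ranked = sorted(combinations(range(snps), 2),
                key=lambda p: -abs(np.corrcoef(risk_score(p, G, y), y)[0, 1]))
scores = np.column_stack([risk_score(p, G, y) for p in ranked[:P]])

# MSE of a linear model fitted on the P per-person risk scores.
X = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("MSE with", P, "risk scores:", np.mean((y - X @ beta) ** 2))
```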
73

Bringing interpretability and visualization with artificial neural networks

Gritsenko, Andrey 01 August 2017 (has links)
Extreme Learning Machine (ELM) is a training algorithm for Single-Layer Feed-forward Neural Networks (SLFN). What distinguishes ELM from other training algorithms in theory is the existence of an explicitly given solution, which follows from the immutability of the randomly initialized weights. In practice, ELMs achieve performance similar to that of other state-of-the-art training techniques while taking much less time to train a model. Experiments show that training an ELM can be up to five orders of magnitude faster than the standard Error Back-propagation algorithm. ELM is a relatively recent technique that has proved its efficiency in classic regression and classification tasks, including multi-class cases. In this thesis, extensions of ELMs to problems that are atypical for Artificial Neural Networks (ANNs) are presented. The first extension, described in the third chapter, allows ELMs to produce probabilistic outputs for multi-class classification problems. The standard way of solving this type of problem is based on a 'majority vote' over the classifier's raw outputs. This approach can raise issues if the penalty for misclassification differs between classes; in that case, having probability outputs would be more useful. In the scope of this extension, two methods are proposed, along with an alternative way of interpreting probabilistic outputs. The ELM method also proves useful for non-linear dimensionality reduction and visualization, based on repeated re-training and re-evaluation of the model. The fourth chapter introduces adaptations of ELM-based visualization for classification and regression tasks. A set of experiments has been conducted to show that these adaptations provide better visualization results, which can then be used to perform classification or regression on previously unseen samples. Shape registration of 3D models with non-isometric distortion is an open problem in 3D Computer Graphics and Computational Geometry. The fifth chapter discusses a novel approach for solving this problem by introducing a similarity metric for spectral descriptors. Practically, this approach has been implemented in two methods. The first utilizes a Siamese Neural Network to embed the original spectral descriptors into a lower-dimensional metric space in which Euclidean distance provides a good measure of similarity. The second method uses Extreme Learning Machines to learn a similarity metric directly on the original spectral descriptors. Across a set of experiments, the consistency of the proposed approach for solving the deformable registration problem has been demonstrated.
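A minimal ELM sketch may help make the "explicit solution" concrete: the hidden-layer weights are drawn randomly and never trained, and the output weights come from a single pseudoinverse. The softmax at the end is just one illustrative way to read the raw outputs probabilistically, not necessarily either of the methods proposed in the thesis.

```python
import numpy as np

class ELM:
    """Minimal Extreme Learning Machine: hidden-layer weights are random and
    fixed, and the output weights have an explicit least-squares solution."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, T):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # never trained
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ T         # closed-form output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy multi-class use with one-hot targets; a softmax over the raw outputs is
# one illustrative (not the thesis's) way to read them probabilistically.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[labels]
model = ELM(n_hidden=50).fit(X, T)
raw = model.predict(X)
proba = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
print("train accuracy:", (raw.argmax(axis=1) == labels).mean())
print("sample probabilities:", proba[0].round(3))
```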
74

Multidimensional item response theory observed score equating methods for mixed-format tests

Peterson, Jaime Leigh 01 July 2014 (has links)
The purpose of this study was to build upon the existing MIRT equating literature by introducing a full multidimensional item response theory (MIRT) observed score equating method for mixed-format exams, because no such method currently exists. At this time, the MIRT equating literature is limited to full MIRT observed score equating methods for multiple-choice-only exams and Bifactor observed score equating methods for mixed-format exams. Given the high frequency with which mixed-format exams are used and the accumulating evidence that some tests are not purely unidimensional, it was important to present a full MIRT equating method for mixed-format tests. The performance of the full MIRT observed score method was compared with the traditional equipercentile method, the unidimensional IRT (UIRT) observed score method, and the Bifactor observed score method. With the Bifactor methods, group-specific factors were defined according to item format or content subdomain. With the full MIRT methods, two- and four-dimensional models were included, and correlations between latent abilities were either freely estimated or set to zero. All equating procedures were carried out using three end-of-course exams: Chemistry, Spanish Language, and English Language and Composition. For each subject, two separate datasets were created using pseudo-groups in order to have two separate equating criteria. The specific equating criteria that served as baselines for comparisons with all other methods were the theoretical Identity and the traditional equipercentile procedures. Several important conclusions were drawn. In general, the multidimensional methods performed better for datasets that evidenced more multidimensionality, whereas unidimensional methods worked better for unidimensional datasets. In addition, the scale on which scores are reported influenced the comparative conclusions made among the studied methods. For performance classifications, which are most important to examinees, there typically were not large discrepancies among the UIRT, Bifactor, and full MIRT methods. However, this study was limited by its sole reliance on real data, which was not very multidimensional and for which the true equating relationship was not known. Therefore, plans for improvements, including the addition of a simulation study to introduce a variety of dimensional data structures, are also discussed.
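For readers unfamiliar with the equating baseline mentioned above, the sketch below shows a bare-bones traditional equipercentile equating of two forms: each form-X score is mapped to the form-Y score with the same percentile rank. It uses synthetic scores, omits presmoothing, and contains none of the UIRT, Bifactor, or full MIRT machinery compared in the study.

```python
import numpy as np

def equipercentile_equate(x_scores, y_scores, grid):
    """Map each score point on form X to the form-Y score with the same
    percentile rank (simple interpolation version, no smoothing)."""
    x_pr = np.searchsorted(np.sort(x_scores), grid, side="right") / len(x_scores)
    y_sorted = np.sort(y_scores)
    y_pr = np.arange(1, len(y_sorted) + 1) / len(y_sorted)
    return np.interp(x_pr, y_pr, y_sorted)

# Hypothetical total scores from two pseudo-groups taking forms X and Y.
rng = np.random.default_rng(2)
form_x = rng.normal(30, 6, size=2000).round()
form_y = rng.normal(32, 5, size=2000).round()
grid = np.arange(0, 61)
equated = equipercentile_equate(form_x, form_y, grid)
print(dict(zip(grid[28:33], equated[28:33].round(2))))
```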
75

Dimensionality Reduction Using Factor Analysis

Khosla, Nitin, n/a January 2006 (has links)
In many pattern recognition applications, a large number of features are extracted in order to ensure accurate classification of unknown classes. One way to solve the problems of high dimensionality is to first reduce the dimensionality of the data to a manageable size, keeping as much of the original information as possible, and then to feed the reduced-dimensional data into a pattern recognition system. In this situation, the dimensionality reduction process becomes the pre-processing stage of the pattern recognition system. In addition, probability density estimation is simpler when fewer variables are involved. Dimensionality reduction is useful in speech recognition, data compression, visualization and exploratory data analysis. Some of the techniques that can be used for dimensionality reduction are Factor Analysis (FA), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). Factor Analysis can be considered as an extension of Principal Component Analysis. The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observations. The expectation step evaluates the expected complete-data log-likelihood conditioned upon the observations, and the maximization step then provides a new estimate of the parameters. This research work compares Factor Analysis (based on the Expectation-Maximization algorithm), Principal Component Analysis and Linear Discriminant Analysis for dimensionality reduction, and investigates Local Factor Analysis (EM-based) and Local Principal Component Analysis using Vector Quantization.
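A small comparison in the spirit of this work, assuming scikit-learn's FactorAnalysis (a maximum-likelihood estimator used here as a stand-in for the thesis's own EM-based implementation) and PCA: the same data are reduced with each technique and a simple classifier is scored on both reduced representations. The digits dataset and logistic regression are illustrative choices, not those used in the research.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Reduce the same data with Factor Analysis and with PCA, then compare how
# well a simple classifier does on each reduced representation.
X, y = load_digits(return_X_y=True)
for name, reducer in [("FA", FactorAnalysis(n_components=10, random_state=0)),
                      ("PCA", PCA(n_components=10, random_state=0))]:
    Z = reducer.fit_transform(X)
    acc = cross_val_score(LogisticRegression(max_iter=2000), Z, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```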
76

A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis

Moon, Sangwoo 01 August 2010 (has links)
Due to the increasing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a viable process for extracting essential information from data, so that high-dimensional data can be represented in a much more condensed, lower-dimensional form that both improves classification accuracy and reduces computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. The stand-alone method utilizes a single criterion, from either a supervised or an unsupervised perspective. On the other hand, the hybrid method integrates both criteria. Compared with a variety of stand-alone dimensionality reduction methods, the hybrid approach is promising as it takes advantage of both the supervised criterion, for better classification accuracy, and the unsupervised criterion, for better data representation, simultaneously. However, several issues challenge the efficiency of the hybrid approach, including (1) the difficulty in finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of the performance on noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection through optimization of both structural risk (the supervised criterion) from the Support Vector Machine (SVM) and data independence (the unsupervised criterion) from Independent Component Analysis (ICA). The projection from SVM directly contributes to classification performance from a supervised perspective, whereas maximizing independence among features via ICA constructs a projection that indirectly improves classification accuracy through better intrinsic data representation from an unsupervised perspective. For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates part of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid dimensionality reduction method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In the proposed approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation. Experimental results show that the proposed approaches give higher classification performance, with better robustness, in relatively lower dimensions than conventional methods on high-dimensional datasets.
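The orthogonality idea can be pictured with a crude sketch: take the hyperplane normal from a linear SVM as the supervised direction, take ICA unmixing directions as the unsupervised ones, and make the latter orthogonal to the former before projecting. This is only an illustration of how the two criteria can share one projection matrix; the dissertation's actual optimization, redundancy removal step, and kernel extension are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.decomposition import FastICA

# Toy data; the SVM hyperplane normal supplies the supervised direction.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
w = LinearSVC(dual=False).fit(X, y).coef_.ravel()
w /= np.linalg.norm(w)

# ICA unmixing directions supply the unsupervised part of the projection.
ica_dirs = FastICA(n_components=5, random_state=0).fit(X).components_

basis = [w]
for d in ica_dirs:
    # Gram-Schmidt: remove components along directions already in the basis.
    for b in basis:
        d = d - (d @ b) * b
    norm = np.linalg.norm(d)
    if norm > 1e-8:
        basis.append(d / norm)

P = np.array(basis)          # combined (supervised + unsupervised) projection
X_reduced = X @ P.T
print("reduced shape:", X_reduced.shape)
```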
77

Terminating species and Lewis acid-base preference in oxohalides – a new route to low-dimensional compounds

Becker, Richard January 2007 (has links)
This thesis is based upon the synthesis and structure determination of new transition-metal oxohalide compounds that include p-element cations with a stereochemically active lone pair. A synthesis concept has been developed that uses several different structural features to increase the likelihood of obtaining a low-dimensional arrangement of transition metal cations. A total of 17 new compounds have been synthesised and their structures determined via single-crystal X-ray diffraction. The halides and the stereochemically active lone pairs typically act as terminating species, segregating into regions of non-bonding volume that may take the form of 2D layers, 1D channels or Euclidean spheres. The transition metals used for this work are copper, cobalt and iron. The Hard-Soft Acid-Base principle has been utilized to match strong Lewis acids to strong Lewis bases and weak acids to weak bases. All compounds show tendencies towards low dimensionality; they all have sheets of transition metal cations arranged into layers, where the layers are most often connected via weak dispersion forces.
78

Changeable and Privacy Preserving Face Recognition

Wang, Yongjin 23 February 2011 (has links)
Traditional methods of identity recognition are based on knowledge of a password or a PIN, or on possession factors such as tokens and ID cards. Such strategies usually afford a low level of security and cannot meet the requirements of applications with high security demands. Biometrics refers to the technology of recognizing or validating the identity of an individual based on his/her physiological and/or behavioral characteristics. It is superior to conventional methods in both security and convenience, since biometric traits cannot be lost, forgotten, or stolen as easily and are relatively difficult to circumvent. However, although biometrics-based solutions provide various advantages, the technology has some inherent concerns. In the first place, biometrics cannot be easily changed or reissued if compromised, due to the limited number of biometric traits that humans possess. Secondly, since biometric data reflect the user's physiological or behavioral characteristics, privacy issues arise if the stored biometric templates are obtained by an adversary. To that end, changeability and privacy protection of biometric templates are two important issues that need to be addressed for widespread deployment of biometric technology. This dissertation systematically investigates random transformation based methods for addressing the challenging problems of changeability and privacy protection in biometrics-enabled recognition systems. A random projection based approach is first introduced. We present a detailed mathematical analysis of the similarity and privacy preserving properties of random projection, and introduce a vector translation technique to achieve strong changeability. To further enhance privacy protection as well as to improve recognition accuracy, a sorted index number (SIN) approach is proposed such that only the index numbers of the sorted feature vectors are stored as templates. The SIN framework is then evaluated in conjunction with random additive transforms, random multiplicative transforms, and random projection for producing reissuable and privacy preserving biometric templates. The feasibility of the introduced solutions is well supported by detailed theoretical analyses, and extensive experimentation on a face-based biometric recognition problem demonstrates the effectiveness of the proposed methods.
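A toy sketch of the changeability idea, assuming a fixed-length face feature vector: a user-specific random projection followed by storing only the sorted index numbers. The key handling, dimensions, and rank-correlation matcher here are illustrative assumptions, not the transforms or matching scheme evaluated in the dissertation.

```python
import numpy as np

d, k = 128, 64                      # original and projected feature lengths

def enroll(feature, user_key):
    """Changeable template: project with a user-specific random matrix,
    then store only the sorted index numbers (SIN) of the result."""
    R = np.random.default_rng(user_key).normal(size=(k, d)) / np.sqrt(k)
    return np.argsort(R @ feature)

def match(feature, template, user_key):
    R = np.random.default_rng(user_key).normal(size=(k, d)) / np.sqrt(k)
    probe = np.argsort(R @ feature)
    # Rank-based similarity between the probe and the stored index template.
    return np.corrcoef(probe, template)[0, 1]

rng = np.random.default_rng(0)
genuine, impostor = rng.normal(size=d), rng.normal(size=d)
template = enroll(genuine, user_key=42)
print("genuine :", round(match(genuine + 0.05 * rng.normal(size=d), template, 42), 3))
print("impostor:", round(match(impostor, template, 42), 3))
# Reissuing a compromised template only requires a new key, not new biometrics.
new_template = enroll(genuine, user_key=43)
```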
80

FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach

Hui, Shirley January 2005 (has links)
A frequently studied research topic in Structural Biology is the problem of determining the degree of similarity between two protein structures. The most common solution is to perform a three-dimensional structural alignment on the two structures. Rigid structural alignment algorithms have been developed in the past to accomplish this, but they treat the protein molecules as immutable structures. Since protein structures can bend and flex, rigid algorithms do not yield accurate results, and as a result flexible structural alignment algorithms have been developed. The problem with these algorithms is that the protein structures are represented using thousands of atomic coordinate variables, which results in a great computational burden due to the large number of degrees of freedom required to account for the flexibility. Past research in dimensionality reduction has shown that a linear technique called Principal Component Analysis (PCA) is well suited to reducing high-dimensional data. This thesis introduces a new flexible structural alignment algorithm called FlexSADRA, which uses PCA to perform flexible structural alignments. Test results show that FlexSADRA determines better alignments than rigid structural alignment algorithms. Unlike existing rigid and flexible algorithms, FlexSADRA addresses the problem in a significantly lower-dimensional problem space and assesses not only the structural fit but also the structural feasibility of the final alignment.
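To make the dimensionality-reduction step concrete, the sketch below applies PCA to a synthetic ensemble of conformations of one flexible structure, replacing thousands of coordinate variables with a few flexibility modes. It illustrates only the PCA step on made-up data; FlexSADRA's alignment search and scoring are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for a flexible structure: an ensemble of conformations of the
# same N-atom molecule (real input would be Cartesian coordinates from PDB files).
rng = np.random.default_rng(0)
n_conf, n_atoms = 50, 200
base = rng.normal(size=(n_atoms, 3))
bend = rng.normal(size=(n_atoms, 3))                  # one "bending" mode
ensemble = np.array([(base + np.sin(t) * bend).ravel()
                     for t in np.linspace(0, np.pi, n_conf)])

# PCA replaces ~3N coordinate degrees of freedom with a handful of flexibility
# modes; a flexible alignment can then search over these few coefficients
# instead of thousands of atomic coordinates.
pca = PCA(n_components=5).fit(ensemble)
coeffs = pca.transform(ensemble)
print("original dims:", ensemble.shape[1], "-> reduced dims:", coeffs.shape[1])
print("variance explained:", pca.explained_variance_ratio_.round(3))
```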
