51 |
A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis
Moon, Sangwoo 01 August 2010
Demand for high-dimensional data analysis is growing in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection. Dimensionality reduction extracts the essential information from such data so that it can be represented in a more condensed form with much lower dimensionality, which both improves classification accuracy and reduces computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. A stand-alone method uses a single criterion, from either a supervised or an unsupervised perspective, whereas a hybrid method integrates both criteria. Compared with the variety of stand-alone methods, the hybrid approach is promising because it simultaneously takes advantage of the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach: (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of performance on noisy data, and (3) the capability for nonlinear data representation.
This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both the structural risk (supervised criterion) from the Support Vector Machine (SVM) and data independence (unsupervised criterion) from Independent Component Analysis (ICA). The projection from SVM contributes directly to classification performance from a supervised perspective, whereas the projection constructed by maximizing independence among features with ICA improves classification accuracy indirectly, through better intrinsic data representation from an unsupervised perspective. For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates some of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In this approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation.
Experimental results show that the proposed approaches give higher classification performance with better robustness in relatively lower dimensions than conventional methods for high-dimensional datasets.
|
52 |
Changeable and Privacy Preserving Face Recognition
Wang, Yongjin 23 February 2011
Traditional methods of identity recognition are based on knowledge of a password or a PIN, or on possession factors such as tokens and ID cards. Such strategies usually afford a low level of security and cannot meet the requirements of applications with high security demands. Biometrics refers to the technology of recognizing or validating the identity of an individual based on his or her physiological and/or behavioral characteristics. It is superior to conventional methods in both security and convenience, since biometric traits cannot be lost, forgotten, or stolen as easily, and they are relatively difficult to circumvent. However, although biometrics-based solutions provide various advantages, the technology raises some inherent concerns. First, biometrics cannot easily be changed or reissued if compromised, because humans possess a limited number of biometric traits. Second, since biometric data reflect the user's physiological or behavioral characteristics, privacy issues arise if the stored biometric templates are obtained by an adversary. Changeability and privacy protection of biometric templates are therefore two important issues that need to be addressed for widespread deployment of biometric technology.
This dissertation systematically investigates random-transformation-based methods for addressing the challenging problems of changeability and privacy protection in biometrics-enabled recognition systems. A random-projection-based approach is introduced first. We present a detailed mathematical analysis of the similarity-preserving and privacy-preserving properties of random projection, and introduce a vector translation technique to achieve strong changeability. To further enhance privacy protection and improve recognition accuracy, a sorted index number (SIN) approach is proposed in which only the index numbers of the sorted feature vectors are stored as templates. The SIN framework is then evaluated in conjunction with random additive transform, random multiplicative transform, and random projection, for producing reissuable and privacy-preserving biometric templates. The feasibility of the introduced solutions is supported by detailed theoretical analyses. Extensive experimentation on a face-based biometric recognition problem demonstrates the effectiveness of the proposed methods.
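The idea of combining a user-specific random projection with sorted index numbers can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the dissertation: the function names, the Gaussian projection matrix, the descending sort order, and the use of a per-user seed as the reissuable key are all hypothetical, and the vector translation step is omitted.

```python
import random

def random_projection_matrix(k, d, seed):
    """Hypothetical k x d Gaussian matrix derived from a user-specific seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Multiply the projection matrix by the feature vector x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]

def sorted_index_numbers(v):
    """Return the indices of v ordered by descending value (assumed ordering)."""
    return sorted(range(len(v)), key=lambda i: v[i], reverse=True)

def make_template(x, user_seed, k):
    """Store only the index order of the projected features, not their values."""
    matrix = random_projection_matrix(k, len(x), user_seed)
    return sorted_index_numbers(project(matrix, x))
```

Under these assumptions, issuing a new seed produces a new projection and hence a new template from the same face features, while the stored permutation does not reveal the feature values themselves.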
|
54 |
FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach
Hui, Shirley January 2005
A frequently studied problem in structural biology is determining the degree of similarity between two protein structures. The most common solution is to perform a three-dimensional structural alignment of the two structures. Rigid structural alignment algorithms have been developed to accomplish this, but they treat protein molecules as immutable structures. Since protein structures can bend and flex, rigid algorithms do not yield accurate results, and flexible structural alignment algorithms have been developed as a result. The problem with these algorithms is that protein structures are represented using thousands of atomic coordinate variables, which creates a great computational burden because of the large number of degrees of freedom required to account for flexibility. Past research on dimensionality reduction has shown that a linear technique called Principal Component Analysis (PCA) is well suited to reducing high dimensionality. This thesis introduces a new flexible structural alignment algorithm called FlexSADRA, which uses PCA to perform flexible structural alignments. Test results show that FlexSADRA determines better alignments than rigid structural alignment algorithms. Unlike existing rigid and flexible algorithms, FlexSADRA addresses the problem in a problem space of significantly lower dimensionality and assesses not only the structural fit but also the structural feasibility of the final alignment.
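To make the role of PCA concrete, the following minimal sketch finds the dominant principal axis of a set of coordinate vectors by power iteration and expresses each vector as a single coordinate along that axis. It is a generic PCA illustration, not the FlexSADRA algorithm; the function names and the toy data are assumptions.

```python
def principal_axis(rows, iters=100):
    """Return (mean, unit vector) of the dominant principal axis of rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Power iteration: repeatedly apply cov and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def to_reduced(row, mean, axis):
    """Project one vector onto the axis, giving a single coordinate."""
    return sum((x - m) * a for x, m, a in zip(row, mean, axis))

def from_reduced(coord, mean, axis):
    """Map a reduced coordinate back to the original space."""
    return [m + coord * a for m, a in zip(mean, axis)]
```

In a flexible-alignment setting, each row would be a flattened set of atomic coordinates for one conformation, and several leading axes would normally be kept rather than one; that extension is left out of this sketch.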
|
56 |
The gene-gene interactions on IgE production from prenatal stage to 6 years of age
Chang, Jen-Chieh 22 August 2012
The prevalence of childhood asthma in Taiwan has increased 9 times, from 1.3% to 10-14%, in the past 4 decades. Studies worldwide have demonstrated that many genes on different chromosomes are implicated in childhood asthma in different ethnic populations. A growing body of evidence suggests that allergic sensitization can occur in the perinatal stage and correlates with the development of childhood asthma. Epidemiological studies, however, indicate that the prevalence of childhood asthma is much higher in developed countries than in developing countries, and higher in metropolitan areas than in rural areas. This suggests that certain genes can interact with environmental factors in developed countries to promote the development of childhood atopic disorders. Interest is now growing in which gene-gene interactions are truly pathogenic for childhood atopic disorders under the influence of age, gender and environmental factors.
In a large perinatal cohort study of 1,211 pregnant women and their offspring from the obstetrics and pediatrics departments of Kaohsiung Chang Gung Memorial Hospital, we analyzed 159 allergy candidate genes with 384 single nucleotide polymorphisms. We showed that the 14 genes, spread across the 22 autosomes and the X chromosome, that confer risk of or protection from cord blood immunoglobulin E (CBIgE) elevation differ from the genes associated with IgE elevation in children at 1.5, 3 and 6 years of age (12, 15 and 12 genes, respectively). CX3CL1, IL13, PDGFRA and FGF1 polymorphisms were associated with elevated IgE at earlier ages (newborn, 1.5 and 3 years); HLA-DPA1, HLA-DQA1, CCR5 and IL5RA polymorphisms were associated with IgE production at 6 years of age. In a further analysis by multifactor dimensionality reduction (MDR), a method developed from a data reduction strategy, we found that interactions among innate immunity, adaptive immunity, and response and remodeling genes affect IgE production beginning in the prenatal stage. For example, the gene-gene interaction among IL13 (rs1800925), CYFIP2 (rs767007) and PDE2A (rs755933) was significantly associated with IgE production at 3 years of age. This suggests that different genotypes of these genes interact with one another in IgE production and contribute to the development of allergic diseases. Since the concentration of IgE is an important indicator of atopic disorders and allergic sensitization, we believe that clarifying the natural course of the genomic profiles of IgE elevation may make early predictors and preventive regimens for allergic sensitization or atopic disorders possible.
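The core data reduction step of MDR can be sketched as follows: every multilocus genotype combination is pooled into a high-risk or low-risk group according to its ratio of cases to controls, which collapses several genotype dimensions into one binary attribute. This is a simplified illustration of the general technique, not the analysis pipeline used in the study; the function names, the threshold and the toy data are assumptions, and cross-validation is omitted.

```python
from collections import Counter

def mdr_high_risk_cells(genotypes, is_case, threshold=1.0):
    """Return the genotype combinations whose case/control ratio exceeds threshold."""
    cases, controls = Counter(), Counter()
    for combo, case in zip(genotypes, is_case):
        (cases if case else controls)[tuple(combo)] += 1
    cells = set(cases) | set(controls)
    # A cell with no controls counts as high risk whenever it holds any case.
    return {cell for cell in cells if cases[cell] > threshold * controls[cell]}

def mdr_reduce(genotypes, high_risk):
    """Collapse each multilocus genotype to 1 (high risk) or 0 (low risk)."""
    return [1 if tuple(combo) in high_risk else 0 for combo in genotypes]

# Toy data: two SNPs coded 0/1/2; a "case" stands for elevated IgE (hypothetical).
genotypes = [(0, 1), (0, 1), (0, 1), (1, 1), (1, 1), (2, 0)]
is_case = [True, True, False, False, False, True]
high_risk = mdr_high_risk_cells(genotypes, is_case)
reduced = mdr_reduce(genotypes, high_risk)
```

In practice the threshold is usually tied to the overall case-to-control ratio in the sample, and candidate SNP combinations are compared by cross-validated prediction accuracy.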
|
57 |
Tag-based Music Recommendation Systems Using Semantic Relations And Multi-domain Information
Tatli, Ipek 01 September 2011
With the evolution of Web 2.0, most social-networking sites let their members participate in content generation. Users can label items with tags on these websites. A tag can be anything, but in practice it is a short description of the item. Because tags represent the reason why a user likes an item, rather than how much the user likes it, they are better identifiers of user profiles than ratings, which are usually numerical values assigned to items by users. This study therefore concentrates on tag-based contextual representations of music tracks.
Items are generally represented by vector space models in content-based recommendation systems. In tag-based recommendation systems, users and items are defined in terms of weighted vectors of social tags. When there is a large number of tags, calculating the items to recommend becomes hard, because working with huge vectors is time-consuming. The main objective of this thesis is to represent individual tracks (songs) in lower-dimensional spaces. An approach is described for creating music recommendations based on user-supplied tags that are augmented with a hierarchical structure extracted for top-level genres from DBpedia. In this structure, each genre is represented by its stylistic origins, typical instruments, derivative forms, subgenres and fusion genres. In addition to very large vector space models, an insufficient number of user tags is another problem in the recommendation field. The proposed method is evaluated with different user profiling methods for cases in which the number of user tags is insufficient. User profiles are extended with multi-domain information, with the goal of making more successful and realistic predictions.
|
58 |
On the sampling design of high-dimensional signal in distributed detection through dimensionality reduction
Tai, Chih-hao 13 August 2008
This work considers the sampling design for detection problems. First, we study the effect of signal shape on sampling design for the Gaussian detection problem. We then investigate the sampling design for distributed detection problems and compare the performance with the single-sensor context. We also propose a sampling design scheme for cluster-based wireless sensor networks. The cluster head employs a linear combination fusion to reduce the dimension of the sampled observation. Mathematical verification and simulation results show that the performance loss caused by the dimensionality reduction is exceedingly small compared with the benchmark scheme, which is the sampling scheme without dimensionality reduction. In particular, there is no performance loss when identical sampling points are employed at all sensor nodes.
|
59 |
Exploitation of complex network topology for link prediction in biological interactomes
Alanis Lobato, Gregorio 06 1900
The network representation of the interactions between proteins and genes allows for a holistic perspective of the complex machinery underlying the living cell. However, the large number of interacting entities within the cell makes network construction a daunting and arduous task, prone to errors and missing information.
Fortunately, the structure of biological networks is not different from that of other complex systems, such as social networks, the world-wide web or power grids, for which growth models have been proposed to better understand their structure and function. This means that we can design tools based on these models in order to exploit the topology of biological interactomes with the aim to construct more complete and reliable maps of the cell.
In this work, we propose three novel and powerful approaches for the prediction of interactions in biological networks and conclude that it is possible to mine the topology of these complex system representations and produce reliable and biologically meaningful information that enriches the datasets to which we have access today.
|
60 |
Multilinear Subspace Learning for Face and Gait Recognition
Lu, Haiping 19 January 2009
Face and gait recognition problems are challenging due to largely varying appearances, highly complex pattern distributions, and insufficient training samples. This dissertation focuses on multilinear subspace learning for face and gait recognition, where low-dimensional representations are learned directly from tensorial face or gait objects.
This research introduces a unifying multilinear subspace learning framework for systematic treatment of the multilinear subspace learning problem. Three multilinear projections are categorized according to the input-output space mapping as: vector-to-vector projection, tensor-to-tensor projection, and tensor-to-vector projection. Techniques for subspace learning from tensorial data are then proposed and analyzed. Multilinear principal component analysis (MPCA) seeks a tensor-to-tensor projection that maximizes the variation captured in the projected space, and it is further combined with linear discriminant analysis and boosting for better recognition performance. Uncorrelated MPCA (UMPCA) solves for a tensor-to-vector projection that maximizes the captured variation in the projected space while enforcing the zero-correlation constraint. Uncorrelated multilinear discriminant analysis (UMLDA) aims to produce uncorrelated features through a tensor-to-vector projection that maximizes a ratio of the between-class scatter over the within-class scatter defined in the projected space. Regularization and aggregation are incorporated in the UMLDA solution for enhanced performance.
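The difference between a tensor-to-tensor and a tensor-to-vector projection can be sketched for the simplest case, a second-order tensor (a matrix such as a grey-level face image). The sketch shows only how given projection matrices or vectors are applied; it does not learn them as MPCA, UMPCA or UMLDA would, and the function names and toy values are assumptions.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def tensor_to_tensor(x, u1, u2):
    """Project an I1 x I2 matrix to P1 x P2 with one matrix per mode: U1^T X U2."""
    return matmul(matmul(transpose(u1), x), u2)

def tensor_to_vector(x, vector_pairs):
    """Each pair (u1, u2) of mode vectors yields one scalar feature u1^T X u2."""
    return [
        sum(u1[i] * x[i][j] * u2[j] for i in range(len(u1)) for j in range(len(u2)))
        for u1, u2 in vector_pairs
    ]

x = [[1, 2], [3, 4], [5, 6]]          # a 3 x 2 second-order tensor
u1 = [[1, 0], [0, 1], [0, 0]]         # mode-1 projection: keep the first two rows
u2 = [[1], [1]]                       # mode-2 projection: sum the two columns
small = tensor_to_tensor(x, u1, u2)   # a 2 x 1 tensor
feats = tensor_to_vector(x, [([1, 0, 0], [0, 1])])
```

Because each mode is projected separately, the input never has to be flattened into one long vector, which keeps the number of parameters to estimate small relative to a vector-to-vector projection.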
Experimental studies and comparative evaluations are presented and analyzed on the PIE and FERET face databases, and the USF gait database. The results indicate that the MPCA-based solution has achieved the best overall performance in various learning scenarios, the UMLDA-based solution has produced the most stable and competitive results with the same parameter setting, and the UMPCA algorithm is effective in unsupervised learning in low-dimensional subspace. Besides advancing the state-of-the-art of multilinear subspace learning for face and gait recognition, this dissertation also has potential impact in both the development of new multilinear subspace learning algorithms and other applications involving tensor objects.
|