Global ETD Search

181	A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis Moon, Sangwoo 01 August 2010 Due to the increasing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a viable way to extract essential information from data, so that high-dimensional data can be represented in a more condensed form with much lower dimensionality, both improving classification accuracy and reducing computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. The stand-alone method utilizes a single criterion from either a supervised or an unsupervised perspective. The hybrid method, on the other hand, integrates both criteria. Compared with a variety of stand-alone dimensionality reduction methods, the hybrid approach is promising because it simultaneously takes advantage of the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach, including (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of the performance on noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both structural risk (the supervised criterion) from the Support Vector Machine (SVM) and data independence (the unsupervised criterion) from Independent Component Analysis (ICA). 
The projection from SVM contributes directly to classification performance from a supervised perspective, whereas maximum independence among features obtained by ICA constructs a projection that improves classification accuracy indirectly, through better intrinsic data representation, from an unsupervised perspective. For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates some of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid dimensionality reduction method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In the proposed approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation. Experimental results show that, for high-dimensional datasets, the proposed approaches give higher classification performance with better robustness in relatively lower dimensions than conventional methods. Support vector machine independent component analysis Hybrid dimensionality reduction Constrained optimization
182	Screening Web Breaks in a Pressroom by Soft Computing Ahmad, Alzghoul January 2008 Web breaks are considered one of the most significant runnability problems in a pressroom. This work concerns the analysis of the relation between various parameters (variables) characterizing the paper, the printing press, and the printing process and the occurrence of web breaks. A large number of variables, 61 in total, obtained off-line as well as measured online during the printing process are used in the investigation. Each paper reel is characterized by a vector x of 61 components. Two main approaches are explored. The first one treats the problem as a data classification task into "break" and "non break" classes. The procedures of classifier training, the selection of relevant input variables and the selection of hyper-parameters of the classifier are aggregated into one process based on genetic search. The second approach combines procedures of genetic search based variable selection and data mapping into a low dimensional space. The genetic search process results in a variable set providing the best mapping according to some quality function. The empirical study was performed using data collected at a pressroom in Sweden. The total number of data points available for the experiments was 309. Amongst those, only 37 data points represent web break cases. The results of the investigations have shown that the linear relations between the independent variables and the web break frequency are not strong. Three important groups of variables were identified, namely Lab data (variables characterizing paper properties and measured off-line in a paper mill lab), Ink registry (variables characterizing operator actions aimed to adjust ink registry) and Web tension. 
We found that the most important variables are: Ink registry Y LS MD (adjustments of yellow ink registry in machine direction on the lower paper side), Air permeability (characterizes paper porosity), Paper grammage, Elongation MD, and four variables characterizing web tension: Moment mean, Min sliding Mean, Web tension variance, and Web tension mean. The proposed methods were helpful in finding the variables influencing the occurrence of web breaks and can also be used for solving other industrial problems. Printing press Support vector machine Web break Genetic search Data mining
183	Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins Durek, Pawel, Schudoma, Christian, Weckwerth, Wolfram, Selbig, Joachim, Walther, Dirk January 2009 Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. Both the experimental identification and the computational prediction of phosphorylation sites (P-sites) have proved to be challenging problems. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites. Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the protein data bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphoserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernible, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence and structural information. 
When compared to sequence-only prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information. Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de. Support vector machines Microarray data Docking interactions Signal-transduction Sequence alignment Life sciences
184	Applying Discriminant Functions with One-Class SVMs for Multi-Class Classification Lee, Zhi-Ying 09 August 2007 AdaBoost.M1 has been successfully applied to improve the accuracy of a learning algorithm for multi-class classification problems. However, it assumes that the accuracy of each base classifier is better than 1/2, and this may be hard to achieve in practice for a multi-class problem. A new algorithm called AdaBoost.MK, which only requires base classifiers to be better than random guessing (1/k), is thus designed. Early SVM-based multi-class classification algorithms work by splitting the original problem into a set of two-class sub-problems. These algorithms are very demanding in time and space. To achieve low time and space complexity, we develop a base classifier that integrates one-class SVMs with discriminant functions. In this study, a hybrid method that integrates AdaBoost.MK and one-class SVMs with improved discriminant functions as the base classifiers is proposed to solve a multi-class classification problem. Experimental results on data sets from UCI and Statlog show that the proposed approach outperforms many popular multi-class algorithms, including support vector clustering and AdaBoost.M1 with one-class SVMs as the base classifiers. AdaBoost.M1 multi-class classification One-class SVM Discriminant function Support vector clustering
185	Predicting mutation score using source code and test suite metrics Jalbert, Kevin 01 September 2012 Mutation testing has traditionally been used to evaluate the effectiveness of test suites and provide confidence in the testing process. Mutation testing involves the creation of many versions of a program, each with a single syntactic fault. A test suite is evaluated against these program versions (i.e., mutants) in order to determine the percentage of mutants the test suite is able to identify (i.e., the mutation score). A major drawback of mutation testing is that even a small program may yield thousands of mutants, which can make the process cost prohibitive. To improve the performance and reduce the cost of mutation testing, we proposed a machine learning approach to predict mutation score based on a combination of source code and test suite metrics. We conducted an empirical evaluation of our approach to evaluate its effectiveness using eight open source software systems. / UOIT Machine learning Mutation testing Software metrics Support vector machine Test suite effectiveness
186	Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure Altun, Gulsah 22 April 2008 Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction. Machine learning approaches and new algorithms have been proposed to solve this problem. Among the machine learning approaches, Support Vector Machines (SVM) have attracted a lot of attention due to their high prediction accuracy. Since protein data consists of sequence and structural information, another widely used approach for modeling this structured data is to use graphs. In computer science, graph theory has been widely studied; however, it has only recently been applied to bioinformatics. In this work, we introduced new algorithms based on statistical methods, graph theory concepts and machine learning for the protein structure prediction problem. A new statistical method based on z-scores has been introduced for seed selection in proteins. A new method based on finding common cliques in protein data for feature selection is also introduced, which reduces noise in the data. We also introduced new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than current traditional binary classifiers. protein structure prediction feature selection support vector machines graph theory machine learning algorithm Computer Sciences
187	Protein Secondary Structure Prediction Using Support Vector Machines, Neural Networks and Genetic Algorithms Reyaz-Ahmed, Anjum B 03 May 2007 Bioinformatics techniques for protein secondary structure prediction mostly depend on the information available in the amino acid sequence. Support vector machines (SVM) have shown strong generalization ability in a number of application areas, including protein structure prediction. In this study, a new sliding window scheme with multiple windows is introduced to form the protein data for training and testing SVM. An orthogonal encoding scheme coupled with the BLOSUM62 matrix is used to make the prediction. First, the predictions of binary classifiers using multiple windows are compared with those of the single window scheme; the results show that a single window does not perform well in all cases. Two new classifiers are introduced for effective tertiary classification. These new classifiers use neural networks and genetic algorithms to optimize the accuracy of the tertiary classifier. The accuracy of the new architectures is determined and compared with other studies. The tertiary architecture is better than most available techniques. tertiary classifier Binary classifier BLOSUM62 encoding scheme orthogonal profile support vector machine (SVM) Computer Sciences
188	Prediction Of Protein Subcellular Localization Using Global Protein Sequence Feature Bozkurt, Burcin 01 August 2003 The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non-coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches have been developed which integrate multiple types of information, including structure, function and genetic properties of proteins. Knowledge of the structure of a protein is essential for describing and understanding its function. In addition, subcellular localization of a protein can be used to provide some amount of characterization of a protein. In this study, a method for the prediction of protein subcellular localization based on primary sequence data is described. Primary sequence data for a protein is based on its amino acid sequence. The frequency value for each amino acid is computed in one given position. Assigned frequencies are used in a new encoding scheme that conserves biological information based on the point accepted mutations (PAM) substitution matrix. This method can be used to predict the nuclear and cytosolic sequences, the mitochondrial targeting peptides (mTP) and the signal peptides (SP). For clustering purposes, in addition to well-known traditional techniques, principal component analysis (PCA) and self-organizing maps (SOM) are used. For classification purposes, support vector machines (SVM), a method of statistical learning theory recently introduced to bioinformatics, is used. The aim of combining feature extraction, clustering and classification methods is to design an accurate system that predicts the subcellular localization of proteins presented to the system. 
Our scheme combines several methods in a cascading (serial) architecture. In the cascading architecture, the output of one method serves as the input of the next.
189	Impacts of midpoint FACTS controllers on the coordination between generator phase backup protection and generator capability limits Elsamahy, Mohamed Salah Kamel 15 July 2011 The thesis reports the results of comprehensive studies carried out to explore the impact of midpoint FACTS Controllers (STATCOM and SVC) on the generator distance phase backup protection in order to identify important issues that protection engineers need to consider when designing and setting a generator protection system. In addition, practical, feasible and simple solutions to mitigate the adverse impact of midpoint FACTS Controllers on the generator distance phase backup protection are explored. The results of these studies show that midpoint FACTS Controllers have an adverse effect on the generator distance phase backup protection. This adverse effect, which can be in the form of underreach, overreach or a time delay, varies according to the fault type, fault location and generator loading. Moreover, it has been found that the adverse effect of the midpoint FACTS Controllers extends to affect the coordination between the generator distance phase backup protection and the generator steady-state overexcited capability limit. The Support Vector Machines classification technique is proposed as a replacement for the existing generator distance phase backup protection relay in order to alleviate potential problems. It has been demonstrated that this technique is a very promising solution, as it is fast, reliable and has a high performance efficiency. This will result in enhancing the coordination between the generator phase backup protection and the generator steady-state overexcited capability limit in the presence of midpoint FACTS Controllers. The thesis also presents the results of investigations carried out to explore the impact of the generator distance phase backup protection relay on the generator overexcitation thermal capability. 
The results of these investigations reveal that with the relay settings according to the current standards, the generator is over-protected and the generator distance phase backup protection relay restricts the generator overexcitation thermal capability during system disturbances. This restriction does not allow the supply of the maximum reactive power of the generating unit during such events. The restriction on the generator overexcitation thermal capability caused by the generator distance phase backup protection relay highlights the necessity to revise the relay settings. The proposed solution in this thesis is to reduce the generator distance phase backup protection relay reach in order to provide secure performance during system disturbances. generator capability limits midpoint FACTS controllers generator overexcitation limiters support vector machines generator phase backup protection
190	Classification of Genotype and Age of Eyes Using RPE Cell Size and Shape Yu, Jie 18 December 2012 Retinal pigment epithelium (RPE) is a principal site of pathogenesis in age-related macular degeneration (AMD). AMD is a main source of vision loss, and even blindness, in the elderly, and there is currently no effective treatment. Our aim is to describe the relationship between the morphology of RPE cells and the age and genotype of the eyes. We use principal component analysis (PCA) or functional principal component analysis (FPCA), support vector machine (SVM), and random forest (RF) methods to analyze the morphological data of RPE cells in mouse eyes to classify their age and genotype. Our analyses show that, amongst all morphometric measures of RPE cells, cell shape measurements (eccentricity and solidity) are good for classification. However, a combination of cell shape and size (perimeter) provides the best classification. Principal component analysis Functional principal component analysis Support vector machine Random forest Retinal pigment epithelium

Search results