Global ETD Search

11	Accuracy Improvement for RNA Secondary Structure Prediction with SVM Chang, Chia-Hung 30 July 2008 Ribonucleic acid (RNA) sometimes forms a complex structure called a pseudoknot. Prediction of RNA secondary structures has drawn much attention from both biologists and computer scientists. Consequently, many useful tools have been developed for RNA secondary structure prediction, with or without pseudoknots. These tools have their individual strengths and weaknesses. We therefore propose a hybrid feature extraction method that integrates two prediction tools, pknotsRG and NUPACK, with a support vector machine (SVM). We first extract useful features from the target RNA sequence and then use SVM classification to decide which prediction tool is preferable for it. Our test data set contains 723 RNA sequences, of which 202 pseudoknotted RNA sequences are obtained from PseudoBase and 521 nested RNA sequences are obtained from RNA SSTRAND. Experimental results show that our method improves not only the overall accuracy but also the sensitivity and the selectivity on the target sequences. Our method serves as a preprocessing step for analyzing RNA sequences before the RNA secondary structure prediction tools are applied. Our main contribution is the ability to combine existing methods and make the prediction tools more accurate. RNA secondary structure support vector machine machine learning classification
12	Characterizing The Distinguishability Of Microbial Genomes Perry, Scott 21 April 2010 The field of metagenomics has shown great promise in the ability to recover microbial DNA from communities whose members resist traditional cultivation techniques, although in most instances the recovered material comprises short anonymous genomic fragments rather than complete genome sequences. In order to effectively assess the microbial diversity and ecology represented in such samples, accurate methods for DNA classification capable of assigning metagenomic fragments to their most likely taxonomic unit are required. Existing DNA classification methods have shown high levels of accuracy when classifying sequences derived from low-complexity communities; however, genome distinguishability generally deteriorates for complex communities or those containing closely related organisms. The goal of this thesis was to identify factors both intrinsic and external to the genome that may lead to the improvement of existing DNA classification methods and to probe the fundamental limitations of composition-based genome distinguishability. To assess the suite of factors affecting the distinguishability of genomes, support vector machine classifiers were trained to discriminate between pairs of microbial genomes using the relative frequencies of oligonucleotide patterns calculated from orthologous genes or short genomic fragments, and the resulting classification accuracy scores were used as the measure of genomic distinguishability. Models were generated in order to relate distinguishability to several measures of genomic and taxonomic similarity, and interesting outlier genome pairs were identified by large residuals to the fitted models. Examination of the outlier pairs identified numerous factors that influence genome distinguishability, including genome reduction, extreme G+C composition, lateral gene transfer, and habitat-induced genome convergence. 
Fragments containing multiple protein-coding and non-coding sequences showed an increased tendency for misclassification, except in cases where the genomes were very closely related. Analysis of the biological function annotations associated with each fragment demonstrated that certain functional role categories showed increased or decreased tendency for misclassification. The use of pre-processing steps including DNA recoding, unsupervised clustering, 'symmetrization' of oligonucleotide frequencies, and correction for G+C content did not improve distinguishability. Existing composition-based DNA classifiers will benefit from the results reported in this thesis. Sequence-segmentation approaches will improve genome distinguishability by decreasing fragment heterogeneity, while factors such as habitat, lifestyle, extreme G+C composition, genome reduction, and biological role annotations may be used to express confidence in the classification of individual fragments. Although genome distinguishability tends to be proportional to genomic and taxonomic relatedness, these trends can be violated for closely related genome pairs that have undergone rapid compositional divergence, or unrelated genome pairs that have converged in composition due to similar habitats or unusual selective pressures. Additionally, there are fundamental limits to the resolution of composition-based classifiers when applied to genomic fragments typical of current metagenomic studies. genome signature genome composition metagenomics support vector machine
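The feature described in the abstract above, the relative frequency of oligonucleotide patterns in a genomic fragment, can be illustrated with a short sketch. This is only one plausible way to compute such a vector, not the thesis's actual pipeline: the function name `oligo_frequencies`, the default word length `k=4`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def oligo_frequencies(sequence: str, k: int = 4) -> dict:
    """Relative frequency of every DNA k-mer in `sequence` (overlapping windows).

    Hypothetical helper: name, default k, and ambiguity handling are assumptions.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    # Count each overlapping window that contains only unambiguous bases.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    # Emit all 4**k patterns in a fixed order so that vectors from
    # different fragments are directly comparable.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


if __name__ == "__main__":
    freqs = oligo_frequencies("ACGTAC", k=2)
    print(freqs["AC"], len(freqs))
```

A vector like this, computed per fragment, is the kind of input a pairwise classifier could be trained on; the classifier itself is omitted here.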
13	Predicting homologous signaling pathways using machine learning Bostan, Babak Unknown Date No description available. signaling pathway machine learning support vector machine prediction
14	Predicting homologous signaling pathways using machine learning Bostan, Babak 11 1900 Understanding biochemical reactions inside the cells of individual organisms is a key factor for improving our biological knowledge. Signaling pathways provide a road map for a wide range of these chemical reactions, which convert one signal or stimulus into another. In general, each signaling pathway in a cell involves many different proteins, each with one or more specific roles that help to amplify a relatively small stimulus into an effective response. Since proteins are essential components of a cell's activities, it is important to understand how they work and, in particular, to determine which of a species' proteins participate in each role. Experimentally determining this mapping of proteins to roles is difficult and time-consuming. Fortunately, many individual pathways have been annotated for some species, and the pathways of other species can often be inferred using protein homology and protein properties. signaling pathway machine learning support vector machine prediction
15	Machine learning and brain imaging in psychosis Zarogianni, Eleni January 2016 In recent years, early detection and intervention in schizophrenia have become a major objective in psychiatry. Early intervention strategies are intended to identify and treat psychosis before the diagnostic criteria for the disorder are fulfilled. To this end, reliable early diagnostic biomarkers are needed in order to identify a high-risk state for psychosis and also to predict transition to frank psychosis in those high-risk individuals destined to develop the disorder. Recently, machine learning methods have been successfully applied to the diagnostic classification of schizophrenia and to predicting transition to psychosis at an individual level, based on magnetic resonance imaging (MRI) data and neurocognitive variables. This work investigates the application of machine learning methods to the early identification of schizophrenia in subjects at high risk of developing the disorder. The dataset used in this work comprises data from the Edinburgh High Risk Study (EHRS), which examined individuals at a heightened risk of developing schizophrenia for familial reasons, and the FePsy (Früherkennung von Psychosen) study, which was conducted in Basel and involves subjects in a clinical high-risk state for psychosis. The overriding aim of this thesis was to use machine learning, and specifically the Support Vector Machine (SVM), to identify predictors of transition to psychosis in high-risk individuals using baseline structural MRI data. Three specific aims follow from this main one. (i) Firstly, our aim was to examine the feasibility of distinguishing at baseline those individuals who later developed schizophrenia from those who did not, yet had psychotic symptoms, using SVM and baseline data from the EHRS study. 
(ii) Secondly, we intended to examine whether our classification approach could generalize to clinical high-risk cohorts, using neuroanatomical data from the FePsy study. (iii) In a more exploratory context, we also examined the diagnostic performance of our classifier by pooling the two datasets together. With regard to the first aim, our findings suggest that the early prediction of schizophrenia is feasible using an MRI-based linear SVM classifier operating at the single-subject level. Additionally, we have shown that combining baseline neuroanatomical data with measures of neurocognitive functioning and schizotypal cognition can improve predictive performance. The application of our pattern classification approach to baseline structural MRI data from the FePsy study closely replicated our previous findings. Our classification method identified spatially distributed networks that discriminate at baseline between subjects who later developed schizophrenia and other related psychoses and those who did not. Finally, a preliminary classification analysis using pooled datasets from the EHRS and the FePsy study supports the existence of a neuroanatomical pattern that differentiates high-risk subjects who develop psychosis from those who do not, across research sites and despite any between-site differences. Taken together, our findings suggest that machine learning is capable of distinguishing between cohorts of high-risk subjects who later convert to psychosis and those who do not, based on patterns of structural abnormalities that are present before disease onset. Our findings have some clinical implications in that machine learning-based approaches could advise or complement clinical decision-making in early intervention strategies in schizophrenia and related psychoses. 
Future work will, however, be required to tackle issues of reproducibility of early diagnostic biomarkers across research sites, where different assessment criteria, imaging equipment, and protocols are used. In addition, future projects may also examine the diagnostic and prognostic value of multimodal neuroimaging data, possibly combined with other clinical, neurocognitive, and genetic information. 616.89
16	Concrete Strength Prediction Modeling based on Support Vector Machine (SVM) Dhakal, Santosh 01 December 2015 The strength of concrete is the major parameter in the design of structures and is represented by the 28-day compressive strength. Many earlier studies have shown that the compressive strength of concrete depends not only on the water/cement (w/c) ratio but also on the proportions of the other constituent materials. The use of recently developed new-generation admixtures to produce high-performance concrete has made strength prediction complex and highly nonlinear, which challenges research engineers and data scientists. An accurate early prediction model for concrete strength gives the mix designer a tentative idea of how to proportion the mix ingredients, reducing the number of trial mixes and ultimately saving the associated cost and time. In this study, we propose an SVM regression tool to build a model for predicting concrete strength. The support vector machine (SVM) is a supervised machine learning technique based on statistical learning theory, developed by Vapnik in 1995. An SVM employs a kernel function to map the data into a high-dimensional feature space, where linear modeling is performed to overcome the complexity of highly nonlinear datasets. A dataset containing 425 observations of high-performance concrete mix designs with nine attribute variables, taken from the University of California, Irvine repository, is considered for this study. 395 observations were used to train the model, and 30 samples were held out as a test set by random subsampling. Five-fold cross-validation was used to select the SVM parameters. The metaparameter values ε = 0.001, C = 29.47 and γ = 10 were selected for creating the model. 
The model performance measures, namely the correlation coefficient (R), the root mean square error (RMSE), and the residual plots, suggest that the proposed SVM model is competent to predict the strength of concrete. The performance measures of the proposed SVM model were compared with those of an RVM model. Concrete Strength Modeling Prediction measures Support Vector Machine
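To make the reported metaparameters concrete, the sketch below shows the quantities that ε and γ usually control in SVM regression, together with the RMSE measure named in the abstract. It is an illustration under stated assumptions, not the thesis's model: the abstract does not say which kernel was used, so the radial basis function (RBF) form is assumed here because γ is normally its width parameter, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=10.0):
    """Assumed RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.001):
    """SVR loss: zero inside the epsilon tube, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rmse(y_true, y_pred):
    """Root mean square error over paired observations."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))      # identical points
    print(epsilon_insensitive_loss(2.0, 1.0, 0.5))  # error outside the tube
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

The third reported value, C, weights the summed ε-insensitive losses against the regularization term in the training objective; it is not needed to evaluate the functions above.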
17	A New Machine Learning Based Approach to NASA's Propulsion Engine Diagnostic Benchmark Problem January 2015 abstract: The gas turbine engine for aircraft propulsion is one of the most physically complex and safety-critical systems in the world. Diagnosing its failures is challenging because of the complexity of the modeled system, the difficulty of practical testing, and the infeasibility of creating homogeneous diagnostic performance evaluation criteria for the diverse engine makes. NASA has designed and published a standard benchmark problem for propulsion engine gas path diagnostics that enables comparisons among different engine diagnostic approaches. Some traditional model-based approaches and novel purely data-driven approaches, such as machine learning, have been applied to this problem. This study focuses on a different machine learning approach to the diagnostic problem. Some of the most common machine learning techniques, such as the support vector machine, the multi-layer perceptron, and the self-organizing map, are used to help gain insight into the different engine failure modes from the perspective of big data. They are integrated to achieve good performance based on a good understanding of the complex dataset. The study presents a new hierarchical machine learning structure to enhance classification accuracy in NASA's engine diagnostic benchmark problem. The designed hierarchical structure produces an average diagnostic accuracy of 73.6%, which outperforms the most recently published comparable studies. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2015 Electrical engineering gas turbine engine machine learning support vector machine
18	Análisis de datos y búsqueda de patrones en aplicaciones médicas (Data analysis and pattern search in medical applications) García Ubilla, Arnol David January 2015 Mathematical Civil Engineer / Suicide in Chile has become one of the most pressing public health problems, all the more so given that the vast majority of people who die by suicide have a psychiatric diagnosis and consulted a specialist in the months before their death. This motivates the creation of indicators and alerts to detect, effectively and in time, when a person enters a zone of suicide risk. This work addresses the problem by defining a zone or spectrum of suicide risk and building mathematical and statistical models to detect patients in that zone. To this end, it uses a database of 707 mental health patients from three different health centers in the Metropolitan Region. The database comprises 343 variables, including sociodemographic information on each patient as well as their responses to seven clinical instruments commonly used in mental health (DEQ, STAXI, OQ, RFL, APGAR, PBI Mother, and PBI Father). The database is first cleaned by removing fields and/or records with a large percentage of null values, while missing values are imputed using traditional techniques and, in some cases, expert judgment, applying a subscale-value imputation method for the different clinical instruments. The attributes are then reduced using statistical and machine learning tools. With this information, five models are built using different supervised learning techniques and tools from data mining and machine learning. 
The models are built and calibrated in the R statistical language, and their results are compared using four different measures: accuracy, sensitivity, specificity, and their representation in ROC space. The classifier finally proposed is a support vector machine model that discriminates when a patient is in a zone of suicide risk. The model was trained with an RBF kernel and uses only 22 predictor variables, giving an accuracy of approximately 78%, computed by k-times n-fold cross-validation with k=100 and n=10. Mental health Suicide Risk factors Data mining Support vector machine
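The three scalar measures used to compare the models in the entry above (accuracy, sensitivity, and specificity) follow directly from the counts of a binary confusion matrix. The sketch below is a generic illustration in Python, not the thesis's R code. The function name `binary_metrics` and the convention that label 1 marks the at-risk class are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    Assumption: label 1 is the positive (at-risk) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of truly positive cases that the classifier flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of truly negative cases that the classifier clears.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Under repeated cross-validation, a function like this would be applied to each held-out fold and the results averaged.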
19	Stochastic functional descent for learning Support Vector Machines He, Kun 22 January 2016 (has links) We present a novel method for learning Support Vector Machines (SVMs) in the online setting. Our method is generally applicable in that it handles the online learning of the binary, multiclass, and structural SVMs in a unified view. The SVM learning problem consists of optimizing a convex objective function that is composed of two parts: the hinge loss and quadratic regularization. To date, the predominant family of approaches for online SVM learning has been gradient-based methods, such as Stochastic Gradient Descent (SGD). Unfortunately, we note that there are two drawbacks in such approaches: first, gradient-based methods are based on a local linear approximation to the function being optimized, but since the hinge loss is piecewise-linear and nonsmooth, this approximation can be ill-behaved. Second, existing online SVM learning approaches share the same problem formulation with batch SVM learning methods, and they all need to tune a fixed global regularization parameter by cross validation. On the one hand, global regularization is ineffective in handling local irregularities encountered in the online setting; on the other hand, even though the learning problem for a particular global regularization parameter value may be efficiently solved, repeatedly solving for a wide range of values can be costly. We intend to tackle these two problems with our approach. To address the first problem, we propose to perform implicit online update steps to optimize the hinge loss, as opposed to explicit (or gradient-based) updates that utilize subgradients to perform local linearization. Regarding the second problem, we propose to enforce local regularization that is applied to individual classifier update steps, rather than having a fixed global regularization term. 
Our theoretical analysis suggests that our classifier update steps progressively optimize the structured hinge loss, with the rate controlled by a sequence of regularization parameters; setting these parameters is analogous to setting the stepsizes in gradient-based methods. In addition, we give sufficient conditions for the algorithm's convergence. Experimentally, our online algorithm can match optimal classification performances given by other state-of-the-art online SVM learning methods, as well as batch learning methods, after only one or two passes over the training data. More importantly, our algorithm can attain these results without doing cross validation, while all other methods must perform time-consuming cross validation to determine the optimal choice of the global regularization parameter. Computer science Structured prediction Support Vector Machine Online learning
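For orientation, the sketch below shows the gradient-based baseline that the abstract above contrasts itself with: a single explicit subgradient step on the binary SVM objective, i.e. quadratic regularization plus hinge loss. It is not the thesis's implicit-update method. The function names, the fixed regularization weight `lam`, and the step size `step` are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_objective(w, data, lam):
    """(lam / 2) * ||w||^2 plus the mean hinge loss over (x, y) pairs, y in {-1, +1}."""
    reg = 0.5 * lam * dot(w, w)
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in data) / len(data)
    return reg + hinge


def sgd_step(w, x, y, lam, step):
    """One explicit subgradient step on a single example (standard SGD baseline)."""
    if y * dot(w, x) < 1.0:
        # Margin violated: the hinge term contributes the subgradient -y * x.
        grad = [lam * wi - y * xi for wi, xi in zip(w, x)]
    else:
        # Margin satisfied: only the regularizer contributes.
        grad = [lam * wi for wi in w]
    return [wi - step * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    w = sgd_step([0.0, 0.0], [1.0, 2.0], 1, lam=0.1, step=0.5)
    print(w, svm_objective(w, [([1.0, 2.0], 1)], lam=0.1))
```

The kink in `max(0.0, ...)` is the nonsmooth point the abstract refers to: at a margin of exactly 1 the subgradient is not unique, so the local linear approximation behind this step can be poor.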
20	SV-Means: A Fast One-Class Support Vector Machine-Based Level Set Estimator Pavy, Anne M. January 2017 (has links) No description available. Electrical Engineering open set classification one-class support vector machine
