61 |
Accuracy Improvement for RNA Secondary Structure Prediction with SVM. Chang, Chia-Hung, 30 July 2008
Ribonucleic acid (RNA) sometimes folds into complex structures called pseudoknots. Prediction of RNA secondary structures has drawn much attention from both biologists and computer scientists, and many useful tools have been developed for RNA secondary structure prediction, with or without pseudoknots. Each of these tools has its own strengths and weaknesses. We therefore propose a hybrid feature extraction method that integrates two prediction tools, pknotsRG and NUPACK, with a support vector machine (SVM). We first extract useful features from the target RNA sequence and then use SVM classification to decide which prediction tool to prefer for that sequence. Our test data set contains 723 RNA sequences: 202 pseudoknotted RNA sequences obtained from PseudoBase and 521 nested RNA sequences obtained from RNA STRAND. Experimental results show that our method improves not only the overall accuracy but also the sensitivity and selectivity of the predictions for the target sequences. Our method serves as a preprocessing step in analyzing RNA sequences before the RNA secondary structure prediction tools are applied. Our main contribution is the ability to combine existing methods so that the prediction tools become more accurate.
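Sensitivity and selectivity for secondary structure prediction are commonly computed over base pairs: sensitivity is the fraction of reference pairs that were predicted, and selectivity (positive predictive value) is the fraction of predicted pairs that appear in the reference. The sketch below assumes that definition with exact pair matching and structures in dot-bracket notation, using extra bracket types for pseudoknotted stems; the thesis's precise scoring convention is not stated in the abstract, and the function names are hypothetical.

```python
# Sketch: base-pair sensitivity and selectivity for RNA secondary structures.
# Assumption: structures are given in dot-bracket notation, with extra bracket
# types ([], {}, <>) marking pseudoknotted pairs that cross the () pairs.

OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stacks = {open_: [] for open_ in OPENERS}  # one stack per bracket type
    pairs = set()
    for j, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.add((stack.pop(), j))
        # any other character ('.', etc.) is treated as an unpaired base
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def sensitivity_selectivity(predicted, reference):
    """Sensitivity = TP / reference pairs; selectivity = TP / predicted pairs."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)  # exact matches only; no base-pair slippage allowed
    sensitivity = tp / len(ref) if ref else 0.0
    selectivity = tp / len(pred) if pred else 0.0
    return sensitivity, selectivity


# A nested prediction that misses the pseudoknotted [] stem of the reference.
print(sensitivity_selectivity("((......))....", "((..[[..))..]]"))  # → (0.5, 1.0)
```

A nested-only predictor can score perfect selectivity while losing sensitivity on pseudoknotted stems, which is one reason per-sequence tool choice can help.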
|
62 |
MaltParser -- An Architecture for Inductive Labeled Dependency Parsing. Hall, Johan, January 2006
<p>This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text. The architecture achieves a strict modularization of parsing algorithm, feature model and learning method, so that these parameters can be varied independently. It is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVM).</p><p>The implementation is validated in three sets of experiments using data from three languages (Chinese, English and Swedish). First, we check whether the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms a baseline and satisfies the basic constraints of well-formedness. Furthermore, the experiments show that it is possible to vary parsing algorithm, feature model and learning method independently. Second, we focus on the special properties of the SVM interface. Learning and parsing time can be reduced without sacrificing accuracy by dividing the training data into smaller sets according to the part-of-speech of the next token in the current parser configuration. Third, the last set of experiments presents a broad empirical study that compares SVM to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVM outperforms MBL in parsing accuracy for more complex and lexicalized feature models. There are also indications that SVM, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English.</p> / <p>This licentiate thesis presents a software architecture for data-driven dependency parsing, that is, for automatically producing a syntactic analysis in the form of dependency graphs for sentences in natural-language text. The architecture is built on the idea that it should be possible to vary the parsing algorithm, feature model and learning method independently of one another. As the basis for this architecture we have used the theoretical framework for inductive dependency parsing presented by Nivre (2006). The architecture has been realized in the software MaltParser, in which complex feature models can be defined in a special description language. In this thesis we place particular emphasis on describing how we have integrated the learning method support vector machines (SVM).</p><p>MaltParser is validated with three series of experiments using data from three languages (Chinese, English and Swedish). The first series checks whether the implementation realizes the underlying architecture. The experiments show that MaltParser outperforms a trivial dependency-parsing method (a baseline) and that the basic requirements for well-formed dependency graphs are met. The experiments also show that it is possible to vary the parsing algorithm, feature model and learning method independently of one another. The second series focuses on the special properties of the SVM interface. The experiments show that learning and parsing time can be reduced without losing parsing accuracy by splitting the training data according to the part-of-speech tag of the next word in the current parser configuration. The third and final series presents an empirical study comparing SVM with memory-based learning (MBL). The study uses five feature models, and all combinations of language, learning method and feature model have undergone extensive parameter optimization. The experiments show that SVM outperforms MBL in parsing accuracy for more complex and lexicalized feature models. There are also some indications that SVM, with a splitting strategy, can parse text faster than MBL. For Swedish we report the highest parsing accuracy to date, and for Chinese and English the results are close to the best reported.</p>
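The splitting strategy described above, one classifier per part-of-speech of the next input token, can be sketched as a dispatch layer in front of per-split models. This is a minimal sketch, not MaltParser's code or API: it assumes parser configurations are already encoded as lists of `name=value` feature strings, uses hypothetical class names, and swaps in a multiclass perceptron for the SVM so the example stays dependency-free. The fallback model for unseen split values is also an assumption.

```python
from collections import defaultdict

# Sketch of the data-splitting idea: one classifier per part-of-speech of the
# next input token. Assumptions (not MaltParser's actual code or API): parser
# configurations are already encoded as lists of "name=value" feature strings,
# and a multiclass perceptron stands in for the SVM to stay dependency-free.


class Perceptron:
    """Multiclass perceptron over sparse binary string features."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = defaultdict(float)  # (feature, transition) -> weight
        self.labels = []

    def _score(self, feats, label):
        return sum(self.weights.get((f, label), 0.0) for f in feats)

    def predict(self, feats):
        # labels are kept sorted, so ties resolve deterministically
        return max(self.labels, key=lambda y: self._score(feats, y))

    def fit(self, data):
        self.labels = sorted({label for _, label in data})
        for _ in range(self.epochs):
            for feats, gold in data:
                guess = self.predict(feats)
                if guess != gold:  # standard perceptron update on mistakes
                    for f in feats:
                        self.weights[(f, gold)] += 1.0
                        self.weights[(f, guess)] -= 1.0
        return self


class SplitClassifier:
    """Route each instance to a model trained only on its split."""

    def __init__(self, split_prefix="next_pos="):
        self.split_prefix = split_prefix
        self.models = {}
        self.fallback = None

    def _key(self, feats):
        return next((f for f in feats if f.startswith(self.split_prefix)), None)

    def fit(self, data):
        groups = defaultdict(list)
        for feats, label in data:
            groups[self._key(feats)].append((feats, label))
        self.models = {key: Perceptron().fit(group) for key, group in groups.items()}
        # hypothetical fallback for split values never seen in training
        self.fallback = Perceptron().fit(data)
        return self

    def predict(self, feats):
        return self.models.get(self._key(feats), self.fallback).predict(feats)


training = [
    (["next_pos=NN", "stack_pos=DT"], "LEFT-ARC"),
    (["next_pos=NN", "stack_pos=VB"], "SHIFT"),
    (["next_pos=VB", "stack_pos=NN"], "LEFT-ARC"),
    (["next_pos=VB", "stack_pos=ROOT"], "SHIFT"),
]
clf = SplitClassifier().fit(training)
print(sorted(clf.models))  # → ['next_pos=NN', 'next_pos=VB']
print(clf.predict(["next_pos=NN", "stack_pos=DT"]))  # → LEFT-ARC
```

Each split model sees only a fraction of the training data, which is where the training-time savings come from; the cost is that rare split values get little data, hence the fallback.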
|
63 |
Computational Prediction of Transposon Insertion Sites. Ayat, Maryam, 04 April 2013
Transposons are DNA segments that can move, or transpose, to new positions within the genome of an organism. Biologists need to predict the preferred insertion sites of transposons to devise strategies in functional genomics and gene therapy studies. The deformability of the local DNA structure at integration sites, called Vstep, has been found to play a significant role in target-site selection. We considered the Vstep profiles of insertion sites and developed predictors based on Artificial Neural Networks (ANN) and Support Vector Machines (SVM). We trained our ANN and SVM predictors on Sleeping Beauty transposon data and used them to identify preferred individual insertion sites (each 12 bp long) and preferred regions (each 100 bp long). Five-fold cross-validation showed that (1) both ANN and SVM predictors are more successful at recognizing preferred regions than preferred individual sites; (2) both ANN and SVM predictors perform excellently at finding the most preferred regions (more than 90% sensitivity and specificity); and (3) the SVM predictor outperforms the ANN predictor at recognizing both preferred individual sites and regions. The SVM achieves 83% sensitivity and 72% specificity in identifying preferred individual insertion sites, and 85% sensitivity and 90% specificity in recognizing preferred insertion regions.
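The evaluation above combines five-fold cross-validation with sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). The sketch below shows that protocol under stated assumptions: labels are 1 for preferred and 0 for non-preferred, out-of-fold predictions are pooled before scoring (the abstract does not say whether results were pooled or averaged per fold), and `fit_threshold` is a hypothetical one-feature stand-in for the ANN and SVM predictors, not the thesis's Vstep-based models.

```python
import random

# Sketch: five-fold cross-validation with pooled sensitivity/specificity.
# Assumptions: labels are 1 (preferred site/region) and 0 (not preferred),
# and `fit_threshold` is a hypothetical one-feature stand-in for the ANN/SVM
# predictors; the thesis's actual Vstep features and models are not shown here.


def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and cut it into k folds whose sizes differ by at most 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def cross_validate(X, y, fit, k=5, seed=0):
    """Predict each fold with a model trained on the other k - 1 folds."""
    preds = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            preds[i] = model(X[i])
    return sensitivity_specificity(y, preds)


def fit_threshold(X_train, y_train):
    """Hypothetical classifier: cut halfway between the two class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


# Toy, well-separated scores: 10 negatives near 0, 10 positives near 100.
X = list(range(10)) + list(range(100, 110))
y = [0] * 10 + [1] * 10
print(cross_validate(X, y, fit_threshold))  # → (1.0, 1.0)
```

Reporting both metrics matters here because a predictor that labels everything "preferred" would reach 100% sensitivity with 0% specificity.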
|
64 |
Characterizing The Distinguishability Of Microbial Genomes. Perry, Scott, 21 April 2010
The field of metagenomics has shown great promise in the ability to recover microbial DNA from communities whose members resist traditional cultivation techniques, although in most instances the recovered material comprises short anonymous genomic fragments rather than complete genome sequences. To effectively assess the microbial diversity and ecology represented in such samples, accurate DNA classification methods capable of assigning metagenomic fragments to their most likely taxonomic unit are required. Existing DNA classification methods have shown high levels of accuracy when classifying sequences derived from low-complexity communities; however, genome distinguishability generally deteriorates for complex communities or those containing closely related organisms. The goal of this thesis was to identify factors, both intrinsic and external to the genome, that may lead to the improvement of existing DNA classification methods, and to probe the fundamental limitations of composition-based genome distinguishability.
To assess the suite of factors affecting the distinguishability of genomes, support vector machine classifiers were trained to discriminate between pairs of microbial genomes using the relative frequencies of oligonucleotide patterns calculated from orthologous genes or short genomic fragments, and the resulting classification accuracy scores were used as the measure of genomic distinguishability. Models were fitted to relate distinguishability to several measures of genomic and taxonomic similarity, and interesting outlier genome pairs were identified by their large residuals from the fitted models. Examination of the outlier pairs identified numerous factors that influence genome distinguishability, including genome reduction, extreme G+C composition, lateral gene transfer, and habitat-induced genome convergence. Fragments containing multiple protein-coding and non-coding sequences showed an increased tendency for misclassification, except in cases where the genomes were very closely related. Analysis of the biological function annotations associated with each fragment demonstrated that certain functional role categories showed an increased or decreased tendency for misclassification. Pre-processing steps including DNA recoding, unsupervised clustering, 'symmetrization' of oligonucleotide frequencies, and correction for G+C content did not improve distinguishability.
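The composition features described above, relative oligonucleotide frequencies, can be sketched as a k-mer profile, with an optional 'symmetrization' step that pools each k-mer with its reverse complement (equivalent to counting both DNA strands). This is a minimal sketch: the value of k, the skipping of ambiguous bases such as `N`, and the function names are assumptions rather than the thesis's documented settings, and the resulting vectors would then be the inputs to the pairwise SVM classifiers.

```python
from itertools import product

# Sketch: relative oligonucleotide (k-mer) frequencies as a composition
# feature vector. The choice of k and the handling of ambiguous bases are
# assumptions; the thesis's exact settings are not stated here.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_profile(seq, k=2, symmetrize=False):
    """Return relative k-mer frequencies in lexicographic order (4**k values)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other ambiguity codes
            counts[word] += 1
    if symmetrize:
        # Pool each k-mer with its reverse complement, i.e. count both strands.
        counts = {w: counts[w] + counts[reverse_complement(w)] for w in kmers}
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


print(kmer_profile("AAAA", k=1))                   # → [1.0, 0.0, 0.0, 0.0]
print(kmer_profile("AAAA", k=1, symmetrize=True))  # → [0.5, 0.0, 0.0, 0.5]
```

Symmetrization makes a fragment's profile independent of which strand was sequenced, at the cost of merging each k-mer with its complement; the thesis reports that this step did not improve distinguishability.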
Existing composition-based DNA classifiers will benefit from the results reported in this thesis. Sequence-segmentation approaches will improve genome distinguishability by decreasing fragment heterogeneity, while factors such as habitat, lifestyle, extreme G+C composition, genome reduction, and biological role annotations may be used to express confidence in the classification of individual fragments. Although genome distinguishability tends to be proportional to genomic and taxonomic relatedness, these trends can be violated for closely related genome pairs that have undergone rapid compositional divergence, or unrelated genome pairs that have converged in composition due to similar habitats or unusual selective pressures. Additionally, there are fundamental limits to the resolution of composition-based classifiers when applied to genomic fragments typical of current metagenomic studies.
|
66 |
End-to-End Single-rate Multicast Congestion Detection Using Support Vector Machines. Liu, Xiaoming, January 2008
<p align="left">IP multicast is an efficient mechanism for simultaneously transmitting bulk data to multiple receivers. Many applications can benefit from multicast, such as audio and video conferencing, multi-player games, multimedia broadcasting, distance education, and data replication. For technical or policy reasons, IP multicast has still not been deployed in today's Internet. Congestion is one of the most important issues impeding the development and deployment of IP multicast and multicast applications.</p>
|
68 |
Predicting homologous signaling pathways using machine learning. Bostan, Babak, 11 1900
Understanding the biochemical reactions inside the cells of individual organisms is key to improving our biological knowledge. Signaling pathways provide a road map for a wide range of these chemical reactions, which convert one signal or stimulus into another. In general, each signaling pathway in a cell involves many different proteins, each with one or more specific roles that help to amplify a relatively small stimulus into an effective response. Since proteins are essential components of a cell's activities, it is important to understand how they work and, in particular, to determine which of a species' proteins participate in each role. Experimentally determining this mapping of proteins to roles is difficult and time-consuming. Fortunately, many individual pathways have been annotated for some species, and the pathways of other species can often be inferred using protein homology and other protein properties.
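The homology-based inference mentioned above can be illustrated with its simplest form: for each role in an annotated pathway, pick the target-species protein most similar to the reference protein that fills that role. This is a best-hit baseline, not the thesis's machine-learning method; the function name, protein IDs, scores and the `min_score` cutoff below are all hypothetical.

```python
# Sketch: transfer pathway roles from an annotated species to a target species
# by best homology hit. This is a simple baseline, not the thesis's learned
# model; the protein IDs, scores and the min_score cutoff are hypothetical.


def transfer_roles(reference_roles, similarity, min_score=50.0):
    """Map each role to the most similar target protein, or None if no hit passes.

    reference_roles: role name -> protein ID in the annotated species
    similarity: (reference protein ID, target protein ID) -> homology score
    """
    assignments = {}
    for role, ref_protein in reference_roles.items():
        hits = [
            (score, target)
            for (ref, target), score in similarity.items()
            if ref == ref_protein and score >= min_score
        ]
        # max() picks the highest score; ties fall to the larger target ID
        assignments[role] = max(hits)[1] if hits else None
    return assignments


roles = {"receptor": "refA", "kinase": "refB", "adaptor": "refC"}
scores = {
    ("refA", "t1"): 120.0,
    ("refA", "t2"): 80.0,
    ("refB", "t3"): 40.0,  # below the cutoff, so "kinase" stays unassigned
    ("refC", "t2"): 95.0,
}
print(transfer_roles(roles, scores))
# → {'receptor': 't1', 'kinase': None, 'adaptor': 't2'}
```

A fixed cutoff treats every role alike; a learned approach could instead score each candidate (role, protein) pair from homology scores together with other protein properties, which is the kind of information the abstract says can be used.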
|
69 |
Land cover classification using linear support vector machines. Shakeel, Mohammad Danish, January 2008
Thesis (M.S.)--Youngstown State University, 2008. / Includes bibliographical references (leaves 31-35). Also available via the World Wide Web in PDF format.
|
70 |
Dynamic task scheduling onto heterogeneous machines using Support Vector Machine. Park, Yongwon; Baskiyar, Sanjeev, January 2008
Thesis (M.S.)--Auburn University, 2008. / Abstract. Includes bibliographical references (p. 26-29).
|