311 |
Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction
Zhong, Wei 02 August 2006
Protein tertiary structure plays a very important role in determining a protein's possible functional sites and chemical interactions with other related proteins. Experimental methods for determining protein structure are time-consuming and expensive. As a result, the gap between known protein sequences and known structures has widened substantially with the advent of high-throughput sequencing techniques. These limitations of experimental methods motivate the development of computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are then used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within a cluster influences its structural similarity. The analysis shows that sequence clusters with tight sequence variation have high structural similarity, whereas clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and high-quality clusters give reliable predictions. To improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm, which may not capture a nonlinear sequence-to-structure relationship effectively. As a result, we consider using a Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVMs are not well suited to huge datasets containing millions of samples. Therefore, we propose CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction improves noticeably when CSVMs are applied.
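The cluster-then-classify idea described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the toy 2-D data, the plain Lloyd k-means, the linear SVM trained by sub-gradient descent, and every function name below are assumptions made for illustration only. The data are built so that no single linear boundary fits them, while one linear model per cluster does.

```python
def nearest(p, centers):
    """Index of the centre closest to p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))

def kmeans(points, centers, iters=10):
    """Plain Lloyd iterations starting from the given centres."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else old
                   for g, old in zip(groups, centers)]
    return centers

def features(x, centre):
    """Coordinates relative to the centre, plus a constant bias feature."""
    return [a - c for a, c in zip(x, centre)] + [1.0]

def train_linear_svm(samples, centre, epochs=50, eta=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss.
    The bias is folded into the weights, so it is regularised as well."""
    w = [0.0] * (len(centre) + 1)
    for _ in range(epochs):
        for x, label in samples:
            z = features(x, centre)
            margin = label * sum(wi * zi for wi, zi in zip(w, z))
            w = [(1 - eta * lam) * wi for wi in w]      # regularisation shrink
            if margin < 1:                              # hinge loss is active
                w = [wi + eta * label * zi for wi, zi in zip(w, z)]
    return w

def svm_predict(w, centre, x):
    score = sum(wi * zi for wi, zi in zip(w, features(x, centre)))
    return 1 if score >= 0 else -1

# Toy data (made up): the label depends on y with opposite sign in each region.
ys = (-3, -2, -1, 1, 2, 3)
data = [((x, y), 1 if y > 0 else -1) for x in (-1, 0, 1) for y in ys]
data += [((x, y), -1 if y > 0 else 1) for x in (9, 10, 11) for y in ys]
points = [x for x, _ in data]

# Step 1: partition the samples into granules with k-means.
centers = kmeans(points, [points[0], points[-1]])

# Step 2: train one SVM per granule, using only that granule's samples.
models = [train_linear_svm([(x, l) for x, l in data if nearest(x, centers) == i], c)
          for i, c in enumerate(centers)]

def csvm_predict(x):
    """Route the sample to its nearest cluster, then use that cluster's SVM."""
    i = nearest(x, centers)
    return svm_predict(models[i], centers[i], x)

csvm_acc = sum(csvm_predict(x) == l for x, l in data) / len(data)

# Baseline: a single linear SVM trained on all samples at once.
global_centre = tuple(sum(col) / len(points) for col in zip(*points))
global_w = train_linear_svm(data, global_centre)
global_acc = sum(svm_predict(global_w, global_centre, x) == l
                 for x, l in data) / len(data)
```

Each local model only ever sees its own cluster's samples, which is also what keeps training tractable when the full dataset is very large.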
|
312 |
Discovering Protein Sequence-Structure Motifs and Two Applications to Structural Prediction
Tang, Thomas Cheuk Kai January 2004
This thesis investigates the correlations between short protein peptide sequences and local tertiary structures. In particular, it introduces a novel algorithm for partitioning short protein segments into clusters of local sequence-structure motifs, and demonstrates that these motif clusters contain useful structural information via two applications to structural prediction. The first application utilizes motif clusters to predict local protein tertiary structures. A novel dynamic programming algorithm that performs comparably with some of the best existing algorithms is described. The second application exploits the capability of motif clusters in recognizing regular secondary structures to improve the performance of secondary structure prediction based on Support Vector Machines. Empirical results show significant improvement in overall prediction accuracy with no performance degradation in any specific aspect being measured. The encouraging results obtained illustrate the great potential of using local sequence-structure motifs to tackle protein structure predictions and possibly other important problems in computational biology.
|
313 |
Vision-based place categorization
Bormann, Richard Klaus Eduard 18 November 2010
In this thesis we investigate visual place categorization by combining successful global image descriptors with a visual attention method in order to automatically detect objects that are meaningful for places. The idea is to incorporate information about typical objects into place categorization without the need for tedious labelling of important objects. Instead, the attention mechanism is intended to find the objects a human observer would focus on first, so that the algorithm can use their discriminative power to infer the place category. Besides this object-based place categorization approach, we employ the Gist and Centrist descriptors as holistic image descriptors.
To harness the power of all these descriptors we employ SVM-DAS (discriminative accumulation scheme) for cue integration and, furthermore, smooth the output trajectory with a delayed Hidden Markov Model. For classifying the various descriptors we present and evaluate several classification methods. Among them are a joint probability modelling approach with two approximations, a modified KNN classifier, AdaBoost and SVM. The latter two classifiers are extended to multi-class use with a probabilistic computation scheme that treats the individual classifiers as peers rather than as a hierarchical sequence.
We check and tweak the different descriptors and classifiers in extensive tests mainly with a dataset of six homes. After these experiments we extend the basic algorithm with further filtering and tracking methods and evaluate their influence on the performance. Finally, we also test our algorithm within a university environment and on a real robot within a home environment.
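The temporal smoothing step mentioned in this abstract can be sketched as follows. This is a plain forward filter over a "sticky" hidden Markov model, not the delayed variant used in the thesis; the self-transition probability `stay`, the use of classifier scores as observation likelihoods, and the example frames are all assumptions for illustration. At least two classes are assumed.

```python
def forward_smooth(frame_probs, stay=0.9):
    """Forward-filter per-frame class probabilities with a sticky HMM.

    frame_probs: list of per-frame probability lists, one entry per class.
    Returns the most likely class index after each frame.
    """
    n = len(frame_probs[0])
    switch = (1.0 - stay) / (n - 1)      # probability of moving to another class
    belief = [1.0 / n] * n               # uniform prior over place categories
    labels = []
    for obs in frame_probs:
        # Prediction step: the robot most likely stays in the same place.
        pred = [sum(belief[j] * (stay if i == j else switch) for j in range(n))
                for i in range(n)]
        # Update step: weight the prediction by this frame's classifier score.
        post = [p * o for p, o in zip(pred, obs)]
        total = sum(post)
        belief = [p / total for p in post]
        labels.append(max(range(n), key=belief.__getitem__))
    return labels

# Made-up classifier outputs for two categories; frame 3 is a spurious flip.
frames = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
raw = [max(range(2), key=f.__getitem__) for f in frames]
smoothed = forward_smooth(frames)
```

In this example the single-frame flip in the raw labels is suppressed because the evidence for a change of place is weaker than the prior that the place stays the same.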
|
314 |
System for Identifying Plankton from the SIPPER Instrument Platform
Kramer, Kurt A. 29 October 2010
Plankton imaging systems such as SIPPER produce a large quantity of data in the form of plankton images from a variety of classes. A system known as PICES was developed to quickly extract, classify and manage the millions of images produced from a single one-week research cruise. A new fast technique for parameter tuning and feature selection for Support Vector Machines using Wrappers was created. This technique allows for faster feature selection, while at the same time maintaining and sometimes improving classification accuracy. It also gives the user greater flexibility in the management of class contents in existing training libraries.
Support vector machines are binary classifiers; multi-class classifiers are built from them either by creating a classifier for each pair of classes or by training one classifier per class with a one-versus-all strategy. Feature selection normally searches for a single set of features to be used by all of the binary classifiers. This ignores the fact that features that are good discriminators for two particular classes might not do well for other class combinations. As a result, the feature selection process may leave these features out of the common set used by all support vector machines. It is shown through experimentation that by selecting features for each binary class combination, overall classification accuracy can be improved and the time required for training a multi-class support vector machine can be reduced. Another benefit of this approach is that significantly less time is required for feature selection when additional classes are added to the training data. This is because the features selected for the existing class combinations remain valid, so feature selection only needs to be run for the newly added combinations.
This work resulted in a system called PICES, a user-friendly GUI-based system that aids in the classification and management of over 55 million plankton images split among 180 classes. PICES embodies an improved means of performing wrapper-based feature selection that creates classifiers that train faster and are just as accurate, and sometimes more accurate, while reducing feature selection time.
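The saving described above when a class is added can be made concrete with a small sketch. It only enumerates the pairwise (one-versus-one) class combinations and is not taken from PICES; the class names are hypothetical.

```python
from itertools import combinations

def class_pairs(classes):
    """All unordered class pairs, each needing its own binary classifier."""
    return set(combinations(sorted(classes), 2))

existing = ["copepod", "detritus", "diatom", "larvacean"]   # hypothetical classes
pairs_before = class_pairs(existing)
pairs_after = class_pairs(existing + ["protist"])

# Only pairs involving the new class lack a previously selected feature set.
new_pairs = pairs_after - pairs_before
```

With k existing classes there are k(k-1)/2 pairs, and adding one class creates only k new pairs, so per-pair feature selection has to be rerun for those k combinations rather than for the whole classifier.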
|
315 |
Non-Destructive VIS/NIR Reflectance Spectrometry for Red Wine Grape Analysis
Fadock, Michael 04 August 2011
A novel non-destructive method of grape berry analysis is presented that uses reflected light to predict berry composition. Reflectance spectra were collected using a diode array spectrometer (350 to 850 nm) over the 2009 and 2010 growing seasons. Partial least squares regression (PLS) and support vector machine regression (SVMR) generated calibrations between reflected light and composition for five berry components: total soluble solids (°Brix), titratable acidity (TA), pH, total phenols, and anthocyanins. Standard methods of analysis for the components were employed and characterized for error. Decomposition of the reflectance data was performed by principal component analysis (PCA) and independent component analysis (ICA). Regression models were constructed using 10x10-fold cross-validated PLS and SVM models subject to smoothing, differentiation, and normalization pretreatments. All generated models were validated on the alternate season using two model selection strategies: minimum root mean squared error of prediction (RMSEP) and the "oneSE" heuristic.
PCA/ICA decomposition demonstrated consistent features in the long VIS wavelengths and NIR region, and these features are consistent across seasons. 2009 was generally more variable, possibly due to cold weather effects. RMSEP and R2 statistics indicate that °Brix, pH, and TA are well predicted by PLS models for 2009 and 2010; SVM was marginally better. The R2 values of the PLS °Brix, pH, and TA models were 0.84, 0.58, 0.56 for 2009 and 0.89, 0.81, 0.58 for 2010. 2010 °Brix models were suitable for rough screening. Optimal pretreatments were SG smoothing and relative normalization. Anthocyanins were well predicted in 2009 (R2 0.65) but not in 2010 (R2 0.15). Phenols were not well predicted in either year (R2 0.15-0.25). Validation demonstrated that °Brix, pH, and TA models from 2009 transferred to 2010 with fair results (R2 0.70, 0.72, 0.31). Models built on 2010 reflectance data could not predict 2009 data. It is hypothesized that weather events present in 2009 and not in 2010 allowed for a forward calibration transfer and prevented the reverse calibration transfer. Heuristic selection was superior to minimum RMSEP for transfer, indicating some overfitting in the minimum RMSEP models. The results demonstrate a reflectance-composition relationship in the VIS-NIR region for °Brix, pH, and TA that requires additional study and development of further calibrations.
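The evaluation statistics and the two model selection strategies named in this abstract can be sketched as below. The abstract does not define the "oneSE" heuristic, so this assumes it is the common one-standard-error rule (pick the simplest model whose cross-validated error is within one standard error of the minimum). R2 is computed here as 1 - SS_res/SS_tot, which may differ from the definition used in the thesis, and all numbers are made up.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pick_min_error(candidates):
    """Strategy 1: the complexity with the lowest cross-validated error."""
    return min(candidates, key=lambda c: c[1])[0]

def pick_one_se(candidates):
    """Strategy 2 (assumed one-standard-error rule): the simplest model whose
    error is within one standard error of the best model's error."""
    best = min(candidates, key=lambda c: c[1])
    limit = best[1] + best[2]
    return min(c[0] for c in candidates if c[1] <= limit)

# Made-up reference and predicted °Brix values.
brix_true = [20.0, 22.0, 24.0, 26.0]
brix_pred = [20.5, 21.5, 24.5, 25.5]
error = rmsep(brix_true, brix_pred)
fit = r_squared(brix_true, brix_pred)

# Made-up cross-validation summary: (n_components, mean RMSE, standard error).
cv_results = [(1, 0.50, 0.05), (2, 0.39, 0.04), (3, 0.37, 0.04), (4, 0.36, 0.04)]
by_min = pick_min_error(cv_results)
by_one_se = pick_one_se(cv_results)
```

The one-standard-error choice trades a slightly higher cross-validated error for a simpler model, which is consistent with the abstract's observation that the minimum-RMSEP models showed some overfitting when transferred between seasons.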
|
316 |
Leveraging supplementary transcriptions and transliterations via re-ranking
Bhargava, Aditya Unknown Date
No description available.
|
317 |
Bajeso metodo taikymas kredito rizikos valdyme / Bayesian method for a credit risk management
Būzius, Gediminas 09 July 2011
Application of the Bayesian method in credit risk management: a study of various existing risk management methods was carried out and is presented in the analytical part, and some of the more widely used machine learning and mathematical models are described. A model is proposed for the experiment, an empirical study is carried out and its results are presented, along with conclusions and future prospects. / Bayesian Method for Credit Risk Management. This paper presents a method that combines a popular machine learning technique for classification, genetic search as a feature selection method for selecting relevant attributes, and the Altman Z-Score discriminant technique for credit risk evaluation. Bayesian classifiers (Naïve Bayes, Bayesian networks) were explored and trained. The method was applied to different sectors in services and industry. Its performance was evaluated using weighted mean accuracy and weighted mean error. The theoretical part analyses and describes several methods, and the paper closes with conclusions and suggestions.
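For readers unfamiliar with the Altman Z-Score mentioned above, the sketch below computes the original 1968 formulation for publicly traded manufacturing firms, with its conventional cut-offs. The abstract does not say which variant of the score the thesis uses, so the choice of the original formula, the function names, and the balance-sheet figures are assumptions for illustration.

```python
def altman_z(working_capital, retained_earnings, ebit, market_equity,
             sales, total_assets, total_liabilities):
    """Original Altman Z-Score (1968, public manufacturing firms)."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def risk_zone(z):
    """Conventional interpretation of the original score."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"

# Made-up company figures (same currency unit throughout).
z = altman_z(working_capital=20, retained_earnings=30, ebit=10,
             market_equity=50, sales=150,
             total_assets=100, total_liabilities=50)
zone = risk_zone(z)
```

A score or zone of this kind could serve either as a class label or as an input feature for the Bayesian classifiers described in the abstract; which role it plays in the thesis is not stated.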
|
318 |
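Classification de spectres et recherche de biomarqueurs en spectroscopie par résonance magnétique nucléaire du proton dans les tumeurs prostatiques / Spectrum classification and biomarker discovery in proton nuclear magnetic resonance spectroscopy of prostate tumours
Parfait, Sébastien 06 December 2010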
Prostate cancer is the most common cancer in men over 50. Current screening methods either lack sensitivity or specificity, or are unpleasant for the patient. Magnetic resonance spectroscopy allows metabolism to be studied in vivo. High-field scanners (≥3T) now make it possible to analyse the prostate without an endorectal coil. The aim of this thesis is to build an automatic screening system for this cancer by developing an automatic classification method for data obtained by magnetic resonance spectroscopy. Magnetic resonance spectroscopy is a complex phenomenon that is highly sensitive to acquisition conditions. We therefore studied how to improve the acquisition of this signal. However, even with a very high-quality acquisition, the magnetic resonance signal must undergo some processing before it can be analysed automatically by a classification method. The next part of the work therefore consisted in identifying the processing steps to apply in order to optimise the spectra for classification. We then searched for the optimal classification method for this problem. This sequence of steps (signal acquisition, spectrum processing, then classification of the resulting data) allows us to detect the presence of prostate tumours with an overall error rate below 12%. Next, we searched for new biomarkers in the spectra. These biomarkers could be a specific metabolite or a frequency range corresponding to several metabolites. We did not find any attributes more significant than choline or citrate, although a few frequency bands appear to help improve the error rates. Finally, we broadened our investigation by attempting to apply these techniques in rats. Acquisition constraints prevented us from obtaining enough spectra in the pre-clinical setting. We were nevertheless able to confirm the feasibility of MRS in rodents and its relevance in the brain. The technique must, however, be improved before it can be validated for prostate cancer in rats.
|
319 |
Méthodes statistiques pour la prédiction de température dans les composants hyperfréquences / Statistical methods for temperature prediction in microwave components
Mallet, Grégory 25 October 2010
This thesis addresses the application of statistical learning methods to predicting the temperature of an electronic component in a radar. We study a simplified version of real systems: a single component mounted on a reduced cooling system. The first chapter is devoted to thermal modelling. After presenting the main modes of heat transfer, the analytical and numerical models derived from them are studied. Building on this knowledge, the second chapter selects the measurement methods best suited to the specifications and constraints of the chosen application. Once the databases have been established, the third chapter uses statistical learning techniques to build a dynamic model. After a brief review of the ins and outs of statistical modelling, four families of methods are presented: linear models, neural networks, dynamic Bayesian networks and support vector machines (SVM). Finally, the fourth chapter presents an original modelling method. After detailing the implementation of state-space identification methods, we show how to incorporate theoretical prior knowledge, namely a stability constraint, while learning this type of model.
|
320 |
Automatické rozpoznávání stavu elektroměru z fotografie / Automatic recognition of the electrometer status from picture
HANZLÍK, Ondřej January 2015
This thesis deals with recognising the reading of an electricity meter (electrometer) from a captured image, specifically a photograph of the meter taken with a mobile phone camera. The region containing the meter's dial is detected first, and the individual digits are then detected within it. The digits are recognised by a neural network. To extract further information from the image, image segmentation techniques are used to check the reading. The outputs of the segmentation are classified with classification tools, in particular a support vector machine (SVM) and neural networks. Image segmentation is implemented with the OpenCV library, which is also used to implement the support vector machine. The application runs on the Android platform. Part of the thesis covers a desktop application used for testing the neural network. The thesis also describes how the data gathered during recognition, which are needed for working with the neural network, are stored. Another part deals with running a website intended to let others participate in further development of the system. A public repository with the source code created during implementation is available.
|