On the Structure Differences of Short Fragments and Amino Acids in Proteins with and without Disulfide Bonds
Dayalan, Saravanan, saravanan.dayalan@rmit.edu.au, January 2008
Of the 20 standard amino acids, cysteine is the only one with a reactive sulphur atom, which enables two cysteines to form a strong covalent bond known as a disulfide bond. Although almost all proteins contain cysteines, not all of them have disulfide bonds. Disulfide bonds provide structural stability to proteins and hence are an important constraint in determining the structure of a protein. As a result, disulfide bonds are used to study various protein properties, one of them being protein folding. Protein structure prediction is the problem of predicting the three-dimensional structure of a protein from its one-dimensional amino acid sequence. Ab initio methods attempt to solve this problem from first principles, using only basic physico-chemical properties of proteins. These methods use structure libraries of short amino acid fragments in the process of predicting the structure of a protein. The protein structures from which these libraries are created are not classified in any way apart from being non-redundant. In this thesis, we investigate the structural dissimilarities of short amino acid fragments when they occur in proteins with disulfide bonds and in proteins without disulfide bonds. Because the libraries mix fragments from both kinds of protein, any significant structural difference between amino acids and short fragments in proteins with and without disulfide bonds would go unnoticed. Our investigation of these structural dissimilarities is done in four phases.
In phase one, by statistically analysing the phi and psi backbone dihedral angle distributions, we show that these fragments have significantly different structures in terms of dihedral angles when occurring in proteins with and without disulfide bonds. In phase two, using directional statistics, we investigate how structurally different the 20 amino acids and the short fragments are when occurring in proteins with and without disulfide bonds. In phase three, we investigate the differences in secondary structure preference of the 20 amino acids in proteins with and without disulfide bonds. In phase four, we show that there are significant differences within the same secondary structure region of amino acids when they occur in proteins with and without disulfide bonds. Finally, we present the design and implementation details of a publicly available dihedral angle and secondary structure database of short amino acid fragments (DASSD). Thus, in this thesis we show previously unknown significant structure differences, in terms of backbone dihedral angles and secondary structures, in amino acids and short fragments when they occur in proteins with and without disulfide bonds.
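To illustrate why phi and psi angles call for directional rather than ordinary statistics, here is a minimal sketch. It is not the thesis's analysis: the function names `circular_mean_deg` and `mean_resultant_length` are hypothetical, and angles are assumed to be given in degrees. Averaging 170 and -170 arithmetically gives 0, although both angles lie close to 180; the circular mean avoids this wrap-around problem.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of a sample of angles (degrees), in [-180, 180]."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 of the summed unit vectors gives the mean direction.
    return math.degrees(math.atan2(s, c))

def mean_resultant_length(angles_deg):
    """Concentration measure in [0, 1]; 1 means all angles coincide."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return math.hypot(s, c)

# Two hypothetical phi angles on either side of the +/-180 degree cut.
sample = [170.0, -170.0]
arithmetic_mean = sum(sample) / len(sample)   # 0.0, which is misleading
circular_mean = circular_mean_deg(sample)     # +180 or -180 (same direction)
```

Comparing such summaries between the two protein groups would be a first step only; a formal comparison needs a two-sample test for circular data.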
Statistical Methods of Detection of Current Flow Structures in Stretches of Water / Méthodes statistiques de détection des structures de courant dans les étendues d'eau
Novikov, Dmitri, 14 December 2011
This work addresses the problem of detecting specific directional structures in current flow fields. Particular emphasis is placed on vortex detection, as scientists studying fluid dynamics consider this structure to be of particular importance. Chapter 1 presents the motivation behind the project and describes the environmental and then the mathematical context of the problem, highlighting the essential parts of the theory later used to propose the solution. Chapter 2 offers a statistical approach, based on a likelihood ratio, to the specific problem of vortex detection and demonstrates the effectiveness of the method on simulated and real data, also discussing its limitations. Chapter 3 expands on the ideas discussed in Chapters 1 and 2 to derive a generalized statistical test that remedies the flaws of the first approach and extends to the detection of any directional structure of interest. All tools necessary for analysing data with the two methods developed in this project are given in Appendices A and B.
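The idea of testing for a vortex can be sketched with a simpler, related statistic. The abstract does not give the form of the thesis's likelihood ratio, so the code below swaps in the classical V-test (a Rayleigh test with a specified mean direction) as an assumption. For a candidate centre, each observed current direction is compared with the tangential direction expected for counterclockwise rotation. The function name `vortex_v_statistic` and the input layout are hypothetical.

```python
import math

def vortex_v_statistic(points, velocities, center):
    """V-test statistic for counterclockwise rotation about `center`.

    points: iterable of (x, y) positions; velocities: matching (u, v) vectors.
    Under the null hypothesis of uniformly distributed directions the
    statistic is approximately standard normal for large samples; large
    positive values suggest a counterclockwise vortex.
    """
    cx, cy = center
    cos_sum = 0.0
    n = 0
    for (x, y), (u, v) in zip(points, velocities):
        dx, dy = x - cx, y - cy
        if (dx == 0.0 and dy == 0.0) or (u == 0.0 and v == 0.0):
            continue  # direction undefined at the centre or for zero flow
        expected = math.atan2(dx, -dy)   # angle of the tangent vector (-dy, dx)
        observed = math.atan2(v, u)
        cos_sum += math.cos(observed - expected)
        n += 1
    if n == 0:
        raise ValueError("no usable observations")
    # sqrt(2n) * mean cosine of the deviations
    return math.sqrt(2.0 / n) * cos_sum

# Eight hypothetical measurement points on a unit circle around the origin.
angles = [k * math.pi / 4.0 for k in range(8)]
pts = [(math.cos(t), math.sin(t)) for t in angles]
ideal_vortex = [(-y, x) for x, y in pts]      # purely tangential flow
uniform_flow = [(1.0, 0.0) for _ in pts]      # same direction everywhere
stat_vortex = vortex_v_statistic(pts, ideal_vortex, (0.0, 0.0))
stat_uniform = vortex_v_statistic(pts, uniform_flow, (0.0, 0.0))
```

In practice the centre is unknown, so a statistic of this kind would have to be scanned over candidate centres, with the multiple testing that implies.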
3D imaging and nonparametric function estimation methods for analysis of infant cranial shape and detection of twin zygosity
Vuollo, V. (Ville), 17 April 2018
The use of 3D imaging of craniofacial soft tissue has increased in medical science, and imaging technology has developed greatly in recent years. 3D models are quite accurate, and with imaging devices based on stereophotogrammetry, capturing the data is quick and easy for the subject. However, analyzing 3D models of the face or head can be challenging, and there is a growing need for efficient quantitative methods. In this thesis, new mathematical methods and tools for measuring craniofacial structures are developed.
The thesis is divided into three parts. In the first part, facial 3D data of Lithuanian twins are used for the determination of zygosity. Statistical pattern recognition methodology is used for classification and the results are compared with DNA testing.
In the second part of the thesis, the distribution of surface normal vector directions of a 3D infant head model is used to analyze skull deformation. The levels of flatness and asymmetry are quantified by functionals of the kernel density estimate of the normal vector directions. Using 3D models from infants at the age of three months and clinical ratings made by experts, this novel method is compared with some previously suggested approaches. The method is also applied to clinical longitudinal research in which 3D images from three different time points are analyzed to find the course of positional cranial deformation and associated risk factors.
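A kernel density estimate of unit normal directions can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a von Mises-Fisher kernel on the unit sphere in three dimensions, and the names `normalize`, `vmf_kde` and the concentration parameter `kappa` (which plays the role of an inverse bandwidth) are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return tuple(c / norm for c in v)

def vmf_kde(x, normals, kappa):
    """Density estimate at direction x from a sample of normal vectors.

    Uses the von Mises-Fisher kernel kappa / (4*pi*sinh(kappa)) * exp(kappa * x.u),
    written in a numerically stable form; kappa must be positive.
    """
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    if not normals:
        raise ValueError("need at least one normal vector")
    x = normalize(x)
    const = kappa / (2.0 * math.pi * (1.0 - math.exp(-2.0 * kappa)))
    total = 0.0
    for nvec in normals:
        u = normalize(nvec)
        dot = sum(a * b for a, b in zip(x, u))
        total += math.exp(kappa * (dot - 1.0))
    return const * total / len(normals)

# Hypothetical sample: most normals point roughly "up", as on a flat region.
sample_normals = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.0), (1.0, 0.0, 0.0)]
density_up = vmf_kde((0.0, 0.0, 1.0), sample_normals, kappa=20.0)
density_down = vmf_kde((0.0, 0.0, -1.0), sample_normals, kappa=20.0)
```

A pronounced peak in the estimated density means that many surface normals share one direction, which is the intuition behind using such an estimate to quantify flatness.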
The final part of the thesis introduces a novel statistical scale space method, SphereSiZer, for exploring the structures of a probability density function defined on the unit sphere. The tools developed in the second part are used for the implementation of SphereSiZer. In SphereSiZer, the scale-dependent features of the density are visualized by projecting the statistically significant gradients onto a planar contour plot of the density function. The method is tested by analyzing samples of surface unit normal vector data of an infant head as well as data simulated from spherical densities.
The results and examples of the study show that the proposed novel methods perform well. The methods can be extended and developed in further studies. Cranial and facial 3D models will offer many opportunities for the development of new and sophisticated analytical methods in the future.
Von Mises-Fisher based (co-)clustering for high-dimensional sparse data: application to text and collaborative filtering data / Modèles de mélange de von Mises-Fisher pour la classification simple et croisée de données éparses de grande dimension
Salah, Aghiles, 21 November 2016
Cluster analysis or clustering, which aims to group together similar objects, is undoubtedly a very powerful unsupervised learning technique. With the growing amount of available data, clustering is increasingly gaining in importance in various areas of data science for tasks such as automatic summarization, dimensionality reduction, visualization, outlier detection, speeding up search engines, and organizing huge data sets. Existing clustering approaches are, however, severely challenged by the high dimensionality and extreme sparsity of the data sets arising in some current areas of interest, such as Collaborative Filtering (CF) and text mining. Such data often consist of thousands of features, with more than 95% zero entries.
In addition to being high dimensional and sparse, the data sets encountered in the aforementioned domains are also directional in nature. In fact, several previous studies have empirically demonstrated that directional measures (which measure the distance between objects relative to the angle between them), such as the cosine similarity, are substantially superior to other measures, such as Euclidean distortions, for clustering text documents or assessing the similarities between users/items in CF. This suggests that in such contexts only the direction of a data vector (e.g., a text document) is relevant, not its magnitude. It is worth noting that the cosine similarity is exactly the scalar product between unit-length data vectors, i.e., L2-normalized vectors. Thus, from a probabilistic perspective, using the cosine similarity is equivalent to assuming that the data are directional data distributed on the surface of a unit hypersphere. Despite the substantial empirical evidence that certain high-dimensional sparse data sets, such as those encountered in the above domains, are better modeled as directional data, most existing models in text mining and CF are based on popular assumptions such as Gaussian, Multinomial or Bernoulli, which are inadequate for L2-normalized data. In this thesis, we focus on the two challenging tasks of text document clustering and item recommendation, which are still attracting a lot of attention in the domains of text mining and CF, respectively. In order to address the above limitations, we propose a suite of new models and algorithms which rely on the von Mises-Fisher (vMF) assumption that arises naturally for directional data lying on a unit hypersphere.
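The stated identity between the cosine similarity and the scalar product of L2-normalized vectors can be checked with a short sketch. The sparse vectors are represented here as dictionaries from term to count, which is an assumption made for illustration; the names `l2_normalize`, `dot` and `cosine_similarity` are hypothetical and are not taken from the thesis.

```python
import math

def dot(a, b):
    """Scalar product of two sparse vectors stored as {feature: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(value * b[key] for key, value in a.items() if key in b)

def l2_normalize(vec):
    """Project a sparse vector onto the unit hypersphere."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return {key: value / norm for key, value in vec.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero sparse vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical term-count vectors for three short documents.
doc_a = {"sphere": 3.0, "data": 4.0}
doc_b = {"sphere": 6.0, "data": 8.0}      # same direction as doc_a, twice as long
doc_c = {"text": 1.0, "data": 1.0}

same_direction = cosine_similarity(doc_a, doc_b)
via_unit_vectors = dot(l2_normalize(doc_a), l2_normalize(doc_c))
direct = cosine_similarity(doc_a, doc_c)
```

Here `doc_a` and `doc_b` differ only in length, so their similarity is 1. This is the sense in which only the direction of a document vector matters.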
