Global ETD Search

391	Développement d'un alphabet structural intégrant la flexibilité des structures protéiques / Development of a structural alphabet integrating the flexibility of protein structures Sekhi, Ikram 29 January 2018 (has links) L’objectif de cette thèse est de proposer un Alphabet Structural (AS) permettant une caractérisation fine et précise des structures tridimensionnelles (3D) des protéines, à l’aide des chaînes de Markov cachées (HMM) qui permettent de prendre en compte la logique issue de l’enchaînement des fragments structuraux en intégrant l’augmentation des conformations 3D des structures protéiques désormais disponibles dans la banque de données de la Protein Data Bank (PDB). Nous proposons dans cette thèse un nouvel alphabet, améliorant l’alphabet structural HMM-SA27,appelé SAFlex (Structural Alphabet Flexibility), dans le but de prendre en compte l’incertitude des données (données manquantes dans les fichiers PDB) et la redondance des structures protéiques. Le nouvel alphabet structural SAFlex obtenu propose donc un nouveau modèle d’encodage rigoureux et robuste. Cet encodage permet de prendre en compte l’incertitude des données en proposant trois options d’encodages : le Maximum a posteriori (MAP), la distribution marginale a posteriori (POST)et le nombre effectif de lettres à chaque position donnée (NEFF). SAFlex fournit également un encodage consensus à partir de différentes réplications (chaînes multiples, monomères et homomères) d’une même protéine. Il permet ainsi la détection de la variabilité structurale entre celles-ci. Les avancées méthodologiques ainsi que l’obtention de l’alphabet SAFlex constituent les contributions principales de ce travail de thèse. Nous présentons aussi le nouveau parser de la PDB (SAFlex-PDB) et nous démontrons que notre parser a un intérêt aussi bien sur le plan qualitatif (détection de diverses erreurs)que quantitatif (rapidité et parallélisation) en le comparant avec deux autres parsers très connus dans le domaine (Biopython et BioJava). Nous proposons également à la communauté scientifique un site web mettant en ligne ce nouvel alphabet structural SAFlex. Ce site web représente la contribution concrète de cette thèse alors que le parser SAFlex-PDB représente une contribution importante pour le fonctionnement du site web proposé. Cette caractérisation précise des conformations 3D et la prise en compte de la redondance des informations 3D disponibles, fournies par SAFlex, a en effet un impact très important pour la modélisation de la conformation et de la variabilité des structures 3D, des boucles protéiques et des régions d’interface avec différents partenaires, impliqués dans la fonction des protéines / The purpose of this PhD is to provide a Structural Alphabet (SA) for more accurate characterization of protein three-dimensional (3D) structures as well as integrating the increasing protein 3D structure information currently available in the Protein Data Bank (PDB). The SA also takes into consideration the logic behind the structural fragments sequence by using the hidden Markov Model (HMM). In this PhD, we describe a new structural alphabet, improving the existing HMM-SA27 structural alphabet, called SAFlex (Structural Alphabet Flexibility), in order to take into account the uncertainty of data (missing data in PDB files) and the redundancy of protein structures. The new SAFlex structural alphabet obtained therefore offers a new, rigorous and robust encoding model. This encoding takes into account the encoding uncertainty by providing three encoding options: the maximum a posteriori (MAP), the marginal posterior distribution (POST), and the effective number of letters at each given position (NEFF). SAFlex also provides and builds a consensus encoding from different replicates (multiple chains, monomers and several homomers) of a single protein. It thus allows the detection of structural variability between different chains. The methodological advances and the achievement of the SAFlex alphabet are the main contributions of this PhD. We also present the new PDB parser(SAFlex-PDB) and we demonstrate that our parser is therefore interesting both qualitative (detection of various errors) and quantitative terms (program optimization and parallelization) by comparing it with two other parsers well-known in the area of Bioinformatics (Biopython and BioJava). The SAFlex structural alphabet is being made available to the scientific community by providing a website. The SAFlex web server represents the concrete contribution of this PhD while the SAFlex-PDB parser represents an important contribution to the proper function of the proposed website. Here, we describe the functions and the interfaces of the SAFlex web server. The SAFlex can be used in various fashions for a protein tertiary structure of a given PDB format file; it can be used for encoding the 3D structure, identifying and predicting missing data. Hence, it is the only alphabet able to encode and predict the missing data in a 3D protein structure to date. Finally, these improvements; are promising to explore increasing protein redundancy data and obtain useful quantification of their flexibility Structure 3D d'une protéine Alphabet Structural (AS) Chaînes de Markov cachées (HMM) Parser PDB Encodage structural Maximum a Posteriori (MAP) Marginal posterior distribution (POST) Entropie d’encodage Protein three-dimensional structure Structural Alphabet (SA) Hidden Markov Model (HMM) PDB (Protein Data Bank) PDB File Format PDB file parser Structural protein encoding Maximum a Posteriori (MAP) Marginal posterior distribution (POST) Encoding entropy
392	A Multi-Factor Stock Market Model with Regime-Switches, Student's T Margins, and Copula Dependencies Berberovic, Adnan, Eriksson, Alexander January 2017 (has links) Investors constantly seek information that provides an edge over the market. One of the conventional methods is to find factors which can predict asset returns. In this study we improve the Fama and French Five-Factor model with Regime-Switches, student's t distributions and copula dependencies. We also add price momentum as a sixth factor and add a one-day lag to the factors. The Regime-Switches are obtained from a Hidden Markov Model with conditional Student's t distributions. For the return process we use factor data as input, Student's t distributed residuals, and Student's t copula dependencies. To fit the copulas, we develop a novel approach based on the Expectation-Maximisation algorithm. The results are promising as the quantiles for most of the portfolios show a good fit to the theoretical quantiles. Using a sophisticated Stochastic Programming model, we back-test the predictive power over a 26 year period out-of-sample. Furthermore we analyse the performance of different factors during different market regimes. finance statistics stock market stocks factor factors probability probability distribution students t distrbution students t copula markov chain hidden markov model regime switching stochastic programming optimisation optimization multi factor model arbitrage pricing theory return performance back test expectation maximisation expectation maximization multiple linear regression stochastic process primal-dual interior point qq-plot qq plot excess return market regimes bear market bull market market index index Probability Theory and Statistics Sannolikhetsteori och statistik
393	Překlad z češtiny do angličtiny / Czech-English Translation Petrželka, Jiří January 2010 (has links) Tato diplomová práce popisuje principy statistického strojového překladu a demonstruje, jak sestavit systém pro statistický strojový překlad Moses. V přípravné fázi jsou prozkoumány volně dostupné bilingvní česko-anglické korpusy. Empirická analýza časové náročnosti vícevláknových nástrojů pro zarovnání slov demonstruje, že MGIZA++ může dosáhnout až pětinásobného zrychlení, zatímco PGIZA++ až osminásobného zrychlení (v porovnání s GIZA++). Jsou otestovány tři způsoby morfologického pre-processingu českých trénovacích dat za použití jednoduchých nefaktorových modelů. Zatímco jednoduchá lemmatizace může snížit BLEU, sofistikovanější přístupy většinou BLEU zvyšují. Positivní efekty morfologického pre-processingu se vytrácejí s růstem velikosti korpusu. Vztah mezi dalšími charakteristikami korpusu (velikost, žánr, další data) a výsledným BLEU je empiricky měřen. Koncový systém je natrénován na korpusu CzEng 0.9 a vyhodnocen na testovacím vzorku z workshopu WMT 2010.

Page generated in 0.0352 seconds