Global ETD Search

1	Conservation of Intrinsic Disorder in Protein Domains and Families Chen, Jessica Walton 08 1900 Submitted to the faculty of the Bioinformatics Graduate Program in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University August 2005 / Protein regions which lack a fixed structure are called ‘disordered’. These intrinsically disordered regions are not only very common in many proteins, they are also crucial to the function of many proteins, especially proteins involved in signaling and regulation. The goal of this work was to identify the prevalence, characteristics, and functions of conserved disordered regions within protein domains and families. A database was created to store the amino acid sequences of nearly one million proteins and their domain matches from the InterPro database, a resource integrating eight different protein family and domain databases. Disorder prediction was performed on these protein sequences. Regions of sequence corresponding to domains were aligned using a multiple sequence alignment tool. From this initial information, regions of conserved predicted disorder were found within the domains. The methodology for this search consisted of finding runs of consecutive positions in the multiple sequence alignments in which 90% or more of the sequences were predicted to be disordered. This procedure was constrained to find only regions of conserved disorder prediction that were at least 20 amino acids in length. The results of this work were 3,653 regions of conserved disorder prediction, found within 2,898 distinct InterPro entries. Most regions of conserved predicted disorder detected were short, with fewer than 10% of those found exceeding 30 residues in length. Regions of conserved disorder prediction were found in protein domains from all available InterPro member databases, although with varying frequency. 
Regions of conserved disorder prediction were found in proteins from all kingdoms of life, including viruses. However, domains found in eukaryotes and viruses contained a higher proportion of long regions of conserved disorder than did domains found in bacteria and archaea. In both this work and previous work, eukaryotes had on the order of ten times more proteins containing long disordered regions than did archaea and bacteria. Sequence conservation in regions of conserved disorder varied, but was on average slightly lower than in regions of conserved order. Both this work and previous work indicate that in some cases, disordered regions evolve faster, in others they evolve slower, and in the rest they evolve at roughly the same rate. A variety of functions were found to be associated with domains containing conserved disorder. The most common were DNA/RNA binding, and protein binding. Many ribosomal protein families also were found to contain conserved disordered regions. Other functions identified included membrane translocation and amino acid storage for germination. Due to limitations of current knowledge as well as the methodology used for this work, it was not determined whether or not these functions were directly associated with the predicted disordered region. However, the functions associated with conserved disorder in this work are in agreement with the functions found in other studies to correlate to disordered regions. This work has shown that intrinsic disorder may be more common in bacterial and archaeal proteins than previously thought, but this disorder is likely to be used for different purposes than in eukaryotic proteins, as well as occurring in shorter stretches of protein. Regions of predicted disorder were found to be conserved within a large number of protein families and domains. 
Although many think of such conserved domains as being ordered, in fact a significant number of them contain regions of disorder that are likely to be crucial to their function. disorder protein domains
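The search procedure described in this abstract (runs of consecutive alignment columns in which at least 90% of sequences are predicted disordered, with a minimum run length of 20) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the input encoding (`D` for predicted disorder, `O` for predicted order, `-` for a gap), the function names, and the choice to compute the fraction over non-gap residues only are all assumptions, since the abstract does not say how gaps were handled.

```python
from typing import List, Tuple


def disordered_fraction(alignment: List[str], col: int) -> float:
    """Fraction of non-gap residues in one column that are predicted disordered.

    Assumption: gaps ('-') are ignored; the abstract does not specify this.
    """
    states = [row[col] for row in alignment if row[col] != "-"]
    if not states:
        return 0.0
    return sum(s == "D" for s in states) / len(states)


def conserved_disorder_regions(
    alignment: List[str], min_fraction: float = 0.9, min_length: int = 20
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of conserved predicted disorder."""
    if not alignment:
        return []
    n_cols = len(alignment[0])
    regions: List[Tuple[int, int]] = []
    start = None
    # The extra iteration (col == n_cols) flushes a run that ends at the last column.
    for col in range(n_cols + 1):
        ok = col < n_cols and disordered_fraction(alignment, col) >= min_fraction
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_length:
                regions.append((start, col))
            start = None
    return regions
```

With ten aligned sequences of which nine are disordered over columns 5 to 29, the function reports the single range `(5, 30)`; shorter runs are discarded by the `min_length` filter.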
2	Protein Interactions from the Molecular to the Domain Level Björkholm, Patrik January 2014 The basic unit of life is the cell, from single-cell bacteria to the largest creatures on the planet. All cells have DNA, which contains the blueprint for proteins. This information is transported in the form of messenger RNA from the genome to ribosomes, where proteins are produced. Proteins are the main functional constituents of the cell; they usually have one or several functions and are the main actors in almost all essential biological processes. Proteins are what make the cell alive. Proteins are found as solitary units or as part of large complexes. Proteins can be found in all parts of the cell, the most common place being the cytoplasm, a central space in all cells. They are also commonly found integrated into or attached to various membranes. Membranes define the cell architecture. Proteins integrated into the membrane have a wide range of responsibilities: they are the gatekeepers of the cell, they secrete cellular waste products, and many of them are receptors and enzymes. The main focus of this thesis is the study of protein interactions, from the molecular level up to the protein domain level. In paper I, recurring local protein structures are used to try to predict which sections of a protein interact with another part, using only sequence information. In papers II and III we use a randomization approach on a membrane protein motif that we know interacts with a sphingomyelin lipid to find other candidate proteins that interact with sphingolipids. These are then experimentally verified as sphingolipid-binding. In the last paper, paper IV, we look at how protein domain interaction networks overlap and can be evaluated. / At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 3: Manuscript. Protein interactions protein domains membrane proteins
3	Recherche de domaines protéiques divergents à l'aide de modèles de Markov cachés : application à Plasmodium falciparum / Protein Domain Detection with Hidden Markov Models : application to Plasmodium falciparum Terrapon, Nicolas 03 December 2010 Hidden Markov Models (HMMs), for example those from the Pfam database, are popular tools for protein domain annotation. However, they are not well suited to studying highly divergent proteins. This is notably the case with Plasmodium falciparum (the main causal agent of human malaria), where Pfam HMMs identify few distinct domain families and cover less than 50% of its proteins. This thesis aims to provide new methods to enhance domain detection in divergent proteins. The first axis of this work is an approach to domain identification based on domain co-occurrence. Several studies have shown that the majority of domains appear in proteins with a small set of other favourite domains. Our method exploits this tendency to detect domains that escape the classical procedure because of their divergence. Detected domains come with a false discovery rate (FDR) estimate computed with a shuffling procedure. In P. falciparum proteins, this approach allows us to identify, with an FDR below 20%, 585 new domains, including 159 families previously unseen in this organism, which accounts for 16% of the known domains. The second axis of my research involves the development of statistical and evolutionary methods of HMM correction to improve the annotation of divergent organisms. Two kinds of approaches are proposed. 
On the one hand, the sequences previously identified in the target organism and its close relatives are integrated into the training alignments of the HMMs. An obvious limitation of this solution is that only new occurrences of families already known in the taxon can be discovered. On the other hand, we circumvent this limitation by adjusting the parameters of all HMMs to account for the evolution of the training sequences. To this end, classical techniques from bioinformatics and statistical learning were used. The resulting alternative libraries offer a complementary set of predictions totalling 663 new domains, including 504 previously unseen families, an improvement of 18% on top of the previous results. Protein Domains Hidden Markov Models Malaria
4	Méthodes pour l'identification de domaines protéiques divergents / Functional annotation of divergent genomes : application to Leishmania parasite Ghouila, Amel 16 December 2013 Determining the domain composition of a protein provides strong clues for predicting its function. One of the most widely used domain schemes is the Pfam database, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (profile HMM). When analyzing a new sequence, each Pfam HMM is used to compute a score measuring the similarity between the sequence and the domain. However, applied to divergent proteins, this strategy may miss several domains. This is the case for all eukaryotic pathogens, where no Pfam domains are detected in half or even more of their proteins. The main objective of this thesis is to develop methods to improve the sensitivity of Pfam domain detection in divergent proteins. We first adapted the recently proposed CODD method to the whole set of pathogens in EuPathDB. A public database named EupathDomains (http://www.atgc-montpellier.fr/EuPathDomains/) gathers known and new domains detected by CODD, along with the associated confidence measurements and the GO annotations. We then proposed other methods to further improve domain detection in these organisms. We proposed the ''CODD_exclusive'' algorithm, which integrates domain exclusion information to prune false positive domains that conflict with other domains of the protein. We also suggested the use of association rules to determine the correlations between domains and used this information in the certification process. In the last part of this thesis, we focused on the use of profile/profile methods to predict protein domains in a whole genome. Combined with the co-occurrence information, this approach achieved high sensitivity and accuracy in predicting domains. 
Bioinformatics Functional annotation Protein domains Leishmania Plasmodium Pathogens
5	Identificação de domínios em proteínas com redes complexas / Protein domain identification with complex networks. Mostaço-Guidolin, Luiz Carlos Büttner 20 January 2011 The use of complex networks for the representation of various natural and artificial systems in the most diverse fields of human knowledge has proven to be a powerful approach for reducing the complexity involved in studying such systems. In many cases, this complexity emerges from the number of components of the system and from the intricate relationships between them. A reduction in this complexity is made possible by identifying and grouping the components of the system that have similar characteristics. To this end, we developed in this thesis methods for community identification in complex networks. Such methods are based on the notion that communities arise when groups of vertices are more densely connected with vertices of their own group than with vertices belonging to other groups in the network. Moreover, the modularity function has been used as an objective function, and as a score for evaluating and comparing the results obtained in this thesis with results previously reported in the complex networks literature. Having established a method for community detection, we used the framework of complex networks for the determination of structural protein domains. To do so, we created contact networks of the amino acids of protein chains, focusing on representing only the interactions between them that are most relevant from a topological point of view. We applied to these networks the community identification methods developed in this thesis, aiming to identify the structural domains of proteins. Finally, we developed a specific method for the identification of protein domains in protein chains with two sequential domains, thereby completing the objectives proposed in this thesis. complex networks extremal optimization modularity protein domains
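The abstract names modularity as the objective function but does not give its formula. The sketch below assumes the standard Newman-Girvan definition for an undirected, unweighted graph without self-loops, Q = Σ_c [ l_c / m − (d_c / 2m)² ], where m is the number of edges, l_c the number of edges inside community c, and d_c the summed degree of its vertices. The function name and the toy graph are hypothetical, and the thesis may use a different variant.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community: Dict[Hashable, Hashable],
) -> float:
    """Newman-Girvan modularity of a vertex partition (assumed definition).

    `edges` lists each undirected edge once; `community` maps vertex -> label.
    """
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the vertices in community c
    for u, v in edge_list:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy contact network: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
```

On this toy graph the two-triangle partition scores 5/14, while placing every vertex in one community scores 0, which is why a community detection method that maximizes modularity would prefer the split.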
6	High-throughput evaluation of protein folding conditions and expression constructs for structural genomics Scheich, Christoph January 2004 For recombinant production of proteins for structural and functional analyses, the E. coli expression system is the most widely used due to high yields and straightforward processing. However, the expression of eukaryotic proteins in E. coli in particular is often problematic, e.g. when the protein is not folded correctly and is deposited in insoluble inclusion bodies. In some cases it is preferable to analyse deletion constructs of a protein or an individual protein domain instead of the full-length protein. This requires the generation of a set of expression constructs that need to be characterised. In this work, methods to optimise and evaluate in vitro folding of inclusion body proteins, as well as a high-throughput characterisation of expression constructs, were developed. Transferring inclusion body proteins to their native state involves two steps: (a) solubilisation with a chaotropic reagent or a strong ionic detergent and (b) folding of the protein by removal of the chaotrope accompanied by transfer into an appropriate buffer. The yield of natively folded protein is often substantially reduced due to aggregation or misfolding; it may, however, be improved by adding certain additives to the folding buffer. These additives need to be identified empirically. In this thesis a screening procedure for folding conditions was developed. 
To reduce the number of possible combinations of screening additives, empirical observations documented in the literature as well as well known properties of certain screening additives were considered. To decrease the amount of protein and work invested, the screen was miniaturised and automated using a pipetting robot. Twenty rapid dilution conditions for the denatured protein are tested and two conditions for folding of proteins using the detergent/cyclodextrin protein folding system of Rozema et al. (1996). 100 µg protein is used per condition. In addition, eight conditions can be tested for folding of His-tagged proteins (approx. 200 µg) immobilised on metal chelate resins. The screen was successfully applied to fold a human protein, the p22 subunit of dynactin that is expressed in inclusion bodies in E. coli. For p22 dynactin – as is the case for many proteins – there was no biological assay available to assess the success of the folding screen. Protein solubility can not be used as a stringent criterion because beside natively folded protein, soluble misfolded species and microaggregates may occur. This work evaluates methods to detect small amounts of natively folded protein after automated folding screening. Before folding screening with p22 dynactin, two model enzymes, bovine carbonic anhydrase II (CAB) and pig heart mitochondrial malate dehydrogenase, were used for evaluation. Recovered activity after refolding was correlated to different biophysical methods. 8-anilino-1-naphtalenesulfonic acid binding-experiments gave no useful information when refolding CAB, due to low sensitivity and because misfolded protein could not be readily distinguished from native protein. Tryptophan fluorescence spectra of refolded CAB were used to assess the success of refolding. The shift of the intensity maximum to a shorter wavelength, compared to the denaturant unfolded protein, as well as the fluorescence intensity correlated to recovered enzymatic activity. 
For both model enzymes, analytical hydrophobic interaction chromatography (HIC) was useful to identify refolded samples that contain active enzyme. Compactly folded, active enzyme eluted in a distinct peak in a decreasing ammonium sulfate gradient. The detection limit of analytical HIC was approx. 5 µg. In case of CAB, tryptophan fluorescence spectroscopy and analytical HIC showed that both methods in combination can be useful to rule out false positives or false negatives obtained with one method. These two methods were also useful to identify conditions for folding of p22 dynactin. However, tryptophan fluorescence spectroscopy can lead to false positives because in some cases spectra of soluble microaggregates are not well distinguishable from spectra of natively folded protein. In summary, a fast and reliable screening procedure was developed to make inclusion body proteins accessible to structural or functional analyses. In a separate project, 88 different E. coli expression constructs for 17 human protein domains that had been identified by sequence analysis were analysed using high-throughput purification and folding analysis in order to obtain candidates suitable for structural analysis. After 96 deep-well microplate expression and automated protein purification, solubly expressed protein domains were directly analysed using 1D ¹H-NMR spectroscopy. It was found that isolated methyl group signals below 0.5 ppm are particularly sensitive and reliable probes for folded protein. In addition – similar to the evaluation of a folding screen – analytical HIC proved to be an efficient tool for identifying constructs that yield compactly folded protein. Both methods, 1D ¹H-NMR spectroscopy and analytical HIC, provided complementary results. Six constructs, representing two domains, could be quickly identified as targets that are well suitable for structural analysis. 
The structure of one of these domains was solved recently by co-workers; the other structure was published by another group during this project. Life sciences
7	Engineering and characterization of disulfide bond isomerases in Escherichia coli Arredondo, Silvia A. 18 January 2011 Disulfide bond formation is an essential process for the folding and biological activity of most extracellular proteins; however, it may become the limiting step when the production of these proteins is attempted in heterologous hosts such as Escherichia coli. The rearrangement of incorrect disulfide bonds between cysteines that do not normally interact in the native structure of a protein is carried out by disulfide isomerase enzymes. The disulfide isomerase present in the bacterial secretory compartment (the periplasmic space) is the homodimeric enzyme DsbC. The objective of this dissertation was to understand the key features of how DsbC catalyzes disulfide bond isomerization. Chimeric disulfide isomerases composed of protein domains that share a similar function with, or are homologous to, domains of DsbC were constructed in an effort to understand the effect of domain orientation in the dimeric protein and the need for a substrate binding region in disulfide isomerases. We successfully created a series of fusion enzymes, FkpA-DsbAs, which catalyze in vivo disulfide isomerization with comparable efficiency to DsbC. These enzymes comprise the peptide binding region of the periplasmic chaperone FkpA, which is functionally and structurally similar to the binding domain of DsbC but shares no amino acid homology with it, fused to the bacterial oxidase DsbA. In addition, these chimeric enzymes were shown to assist in the initial formation of disulfide bonds, a function that is normally exhibited only by DsbA. Directed evolution of the FkpA-DsbA proteins conferred improved resistance to CuCl₂, a phenotype dependent on disulfide bond isomerization, and highlighted the importance of an optimal catalytic site. 
The bacterial disulfide isomerase DsbC is a homodimeric V-shaped enzyme that consists of a dimerization domain, two α-helical linkers and two opposing catalytic domains. The functional significance of the existence of two catalytic domains in DsbC is not yet well understood. The fact that identical subunits naturally dimerize to generate DsbC has so far limited the study of the individual catalytic sites in the homodimer. In chapter 3 we discuss the engineering, in vivo function, and biochemical characterization of DsbC variants covalently linked via (Gly3Ser) flexible linkers. We have either inactivated one of the catalytic sites (CGYC), or entirely removed one of the catalytic domains while maintaining the putative binding area intact. Our results support the hypotheses that dual catalytic domains in DsbC are not necessary for disulfide bond isomerization, but are important in terms of increasing the effective concentration of catalytic equivalents, and that the availability of a substrate binding region is a determining feature in isomerization. Finally, we have carried out initial studies to map the residues and sequence motifs that are recognized in substrate proteins that interact with DsbC. Although the main putative binding region of DsbC has been localized within the limits of the hydrophobic cleft that emerges from the interaction of the N-terminal domains of this enzyme, and a few native substrates have already been identified, no information on the features of substrate proteins that are recognized by the enzyme has been reported. To address this problem, we have screened two different 15-amino-acid random peptide libraries for binding to DsbC. We have successfully isolated several peptides with high affinity for the enzyme. Possible consensus binding motifs were identified, and their significance in substrate recognition will be examined in future studies. 
/ text Disulfide bonds between cysteines Disulfide bond isomerization Catalyze Chimeric disulfide isomerases Protein domains
8	Identificação de domínios em proteínas com redes complexas / Protein domain identification with complex networks. Luiz Carlos Büttner Mostaço-Guidolin 20 January 2011 (has links) A utilização de redes complexas para a descrição de diversos sistemas naturais e artificiais, compreendidos nas mais diversas áreas do conhecimento humano, tem se mostrado uma abordagem poderosa para a redução da complexidade inerente a tais sistemas. Em muitos casos, tal complexidade resulta do número de componentes envolvidos e de suas intrincadas relações. Uma forma de reduzir a complexidade associada a tais sistemas consiste em identificar e agrupar componentes que possuam características similares. Sendo assim, desenvolvemos nesta tese métodos de identificação de comunidades em redes complexas. Tais métodos se baseiam na ideia de que comunidades surgem quando grupos de vértices possuem um número mais elevado de conexões entre os vértices do mesmo grupo do que com vértices externos a este grupo. Além disso, utilizamos a função modularidade como função objetivo e como forma de avaliação e comparação dos resultados obtidos nesta tese com resultados previamente reportados na literatura. Uma vez estabelecido um método de identificação de comunidades, utilizamos a abordagem de redes complexas para a determinação de domínios estruturais de proteínas. Para tal, criamos redes de contato entre os aminoácidos de uma proteína buscando representar apenas as ligações relevantes do ponto de vista topológico. Por meio destas representações, aplicamos os métodos de identificação de comunidades desenvolvidos nesta tese, no intuito de identificar domínios estruturais de cadeias proteicas. Por fim, desenvolvemos um método específico para a identificação de domínios em proteínas com dois domínios sequenciais, concluindo desta forma os objetivos propostos nesta tese. 
/ The use of complex networks to represent various natural and artificial systems in the most diverse fields of human knowledge has proven to be a powerful approach for reducing the complexity involved in studying such systems. In many cases, this complexity emerges from the number of components of the system and from the intricate relationships between them. This complexity can be reduced by identifying and grouping the components of the system that have similar characteristics. Accordingly, in this thesis we developed methods for community identification in complex networks. Such methods are based on the notion that communities arise when groups of vertices are more densely connected with vertices of their own group than with vertices belonging to other groups in the network. Moreover, the modularity function was used as an objective function and as a score for evaluating and comparing the results obtained in this thesis with results reported in the complex networks literature. Having established a method for community detection, we used the framework of complex networks to determine structural protein domains. To this end, we created contact networks of the amino acids of protein chains, representing only the interactions that are most relevant from a topological point of view. We applied to these networks the community identification methods developed in this thesis, aiming to identify the structural domains of proteins. Finally, we developed a specific method for identifying protein domains in protein chains with two sequential domains, thereby completing the objectives proposed in this thesis. domínios em proteínas modularidade. otimização extrema redes complexas complex networks extremal optimization modularity protein domains
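As an illustration of the modularity score mentioned in the abstract above, the following minimal sketch evaluates the standard Newman-Girvan modularity of a given partition of an undirected, unweighted network. It is not the thesis's method: the function name `modularity`, the toy edge list, and the two partitions are hypothetical, and the sketch only scores a partition. It does not build amino-acid contact networks or search for the best partition.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph
           (each edge listed once, no self-loops).
    community_of: dict mapping every vertex to a community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the vertices in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal_edges = defaultdict(int)  # L_c
    degree_sum = defaultdict(int)      # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1

    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy "contact network": two triangles joined by one edge,
# standing in for two compact groups of residues with a single contact
# between them.
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_GROUPS = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
ONE_GROUP = {v: "A" for v in range(6)}
```

On this toy network, splitting the vertices into the two triangles scores higher than placing all of them in a single community, which is the behaviour that makes modularity usable as an objective function for community detection.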
9	Nouvelles approches pour l'analyse et la prédiction de la structure tridimensionnelle des protéines / New strategies for protein structure analysis and prediction Ghouzam, Yassine 18 October 2016 (has links) Ce travail de thèse est une étude in silico des structures tridimensionnelles des protéines, qui a fait l’objet de 5 publications scientifiques. D’une manière plus précise, les travaux s’articulent autour de trois thématiques originales et complémentaires dans le domaine de la bioinformatique structurale : la caractérisation d’un nouvel échelon de description de la structure des protéines (les unités protéiques), intermédiaire entre les structures secondaires et les domaines. Le deuxième axe de cette thèse porte sur le développement d’une nouvelle méthode de prédiction des structures protéiques, appelée ORION. Cette méthode permet une détection accrue d’homologues lointains grâce à la prise en compte de l’information structurale sous forme d’un alphabet structural (les blocs protéiques). Une seconde version améliorée a été rendue accessible à la communauté scientifique par le biais d’une interface web : http://www.dsimb.inserm.fr/ORION/. Le dernier axe de cette thèse s’oriente autour du développement d’outils pour la prédiction de l’orientation et l’évaluation de la membrane dans les structures de protéines membranaires, effectué dans le cadre de plusieurs collaborations. Les outils développés (ANVIL et MAIDEN) ont été mis à la disposition de la communauté scientifique par le biais d’une interface web appelée OREMPRO et accessible à l’adresse suivante : http://www.dsimb.inserm.fr/OREMPRO/. / This thesis deals with three complementary themes in the field of structural bioinformatics. The first is the characterization of a new level of description of protein structure (Protein Units), intermediate between secondary structures and protein domains. The second part focuses on the development of a new method for predicting protein structures, called ORION. 
It boosts the detection of remote protein homologs by taking into account the structural information in the form of a structural alphabet (Protein Blocks). A second improved version was made available to the scientific community through a web interface: http://www.dsimb.inserm.fr/ORION/. The last part of this thesis describes the collaborative development of new tools for predicting and assessing the orientation of proteins in the membrane. The two methods developed (ANVIL and MAIDEN) were made available to the scientific community through a web interface called OREMPRO: http://www.dsimb.inserm.fr/OREMPRO. Modélisation protéique Domaines protéiques Unités protéiques Structural modeling Protein domains Protein Units
10	The evolution, modifications and interactions of proteins and RNAs Surappa-Narayanappa, Ananth Prakash January 2017 (has links) Proteins and RNAs are two of the most versatile macromolecules that carry out almost all functions within living organisms. In this thesis I have explored evolutionary and regulatory aspects of proteins and RNAs by studying their structures, modifications and interactions. In the first chapter of my thesis I investigate domain atrophy, a term I coined to describe large-scale deletions of core structural elements within protein domains. By looking into truncated domain boundaries across several domain families using Pfam, I was able to identify rare cases of domains that showed atrophy. Given that even point mutations can be deleterious, it is surprising that proteins can tolerate such large-scale deletions. Some of the structures of atrophied domains show novel protein-protein interaction interfaces that appear to compensate for the deletions and stabilise their folds. Protein-protein interactions are largely influenced by surface and charge complementarity, while RNA-RNA interactions are governed by base-pair complementarity; the two interaction types are inherently different, and these differences might be observed in their interaction networks. Based on this hypothesis I have explored the protein-protein, RNA-protein and RNA-RNA interaction networks of yeast in the second chapter. By analysing the three networks I found no major differences in their network properties, which indicates an underlying uniformity in their interactomes despite their individual differences. In the third chapter I focus on RNA-protein interactions by investigating post-translational modifications (PTMs) in RNA-binding proteins (RBPs). By comparing occurrences of PTMs, I observe that RBPs undergo significantly more PTMs than non-RBPs. 
I also found that within RBPs, PTMs are more frequently targeted at regions that directly interact with RNA than at regions that do not. Moreover, disorder and amino acid composition were not observed to significantly influence the differential PTMs observed between RBPs and non-RBPs. The results point to a direct regulatory role of PTMs in RNA-protein interactions of RBPs. In the last chapter, I explore regulatory RNA-RNA interactions. Using differential expression data of mRNAs and lncRNAs from mouse models of hereditary hemochromatosis, I investigated competing regulatory interactions between mRNA, lncRNA and miRNA. A mutual interaction network was created from the predicted miRNA interaction sites on mRNAs and lncRNAs to identify regulatory RNAs in the disease. I also observed interesting relations between the sense-antisense mRNA-lncRNA pairs that indicate mutual regulation of expression levels through an as yet unknown mechanism.
