  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Proteus : A new predictor for protean segments

Söderquist, Fredrik January 2015
The discovery of intrinsically disordered proteins has led to a paradigm shift in protein science. Many disordered proteins have regions that can transform from a disordered state to an ordered one. These regions are called protean segments. Many intrinsically disordered proteins are involved in diseases, including Alzheimer's disease, Parkinson's disease and Down's syndrome, which makes them prime targets for medical research. As protean segments are often the functional part of these proteins, it is of great importance to identify them. This report presents Proteus, a new predictor for protean segments. The predictor uses Random Forest (a decision tree ensemble classifier) and is trained on features derived from amino acid sequence and conservation data. Proteus compares favourably to state-of-the-art predictors and performs better than the competition on all four metrics: precision, recall, F1 and MCC. The report also examines the differences between protean and non-protean regions and how these differences vary between the two datasets used to train the predictor.
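For readers unfamiliar with the four metrics named in this abstract, the following is a minimal sketch of how precision, recall, F1 and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the thesis.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F1 and MCC from confusion-matrix counts.

    Hypothetical helper for illustration only; it is not part of Proteus.
    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Made-up counts: 8 true positives, 2 false positives,
# 2 false negatives, 8 true negatives.
scores = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(scores)
```

MCC is often reported alongside F1 for residue-level predictors because it also accounts for true negatives, which dominate when the positive class is rare.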
82

Degradation mechanisms of TTP/TIS11 proteins, major effectors of the AU-rich element-mediated mRNA decay in eukaryotes

Vo Ngoc, Long 25 September 2014
Regulation of gene expression occurs at several levels in a cell. While control of transcription is often viewed as the main source of regulation, it is now clear that post-transcriptional processes are essential to fine-tune protein availability. The presence of AU-rich elements (ARE) in the 3’ untranslated region (3’UTR) of many important mRNAs exemplifies one such process. AREs alter the translation or degradation status of the mRNA by recruiting ARE-binding proteins (ARE-BP). ARE-BPs of the TTP/TIS11 family bind to their cognate ARE-RNAs through their conserved tandem zinc-finger domain and induce rapid decay of their targets. This allows for proper regulation of cell proliferation, cell death and inflammation. In this regard, TTP/TIS11 proteins are major regulators of gene expression, and as such are subject to strict transcriptional and post-transcriptional control, as well as several layers of post-translational control.
In this work, we aimed to elucidate the degradation mechanisms affecting TTP/TIS11. Using Drosophila as a model, we found that dTIS11 protein turnover is rapid due to continuous degradation by the proteasome. However, proteasomal recognition did not require ubiquitination of dTIS11, as non-ubiquitinable mutants were efficiently degraded by the proteasome. In addition, dTIS11 was digested by the 20S proteasome, which lacks ubiquitin-recognition domains. Our results further indicate that intrinsically disordered regions (IDR) in dTIS11 may be responsible for proteasomal recognition. In fact, dTIS11 is predicted to be disordered and possesses the main characteristics of intrinsically disordered proteins (IDP). We also identified the N- and C-terminal domains of dTIS11 as functional signals for degradation, potentially owing to their lack of structure. This ubiquitination-independent, disorder-dependent degradation process is conserved throughout evolution, as the mammalian counterpart of dTIS11, TTP, undergoes the same degradation-by-default pathway. In addition, we established that phosphorylation prevents degradation of TTP/TIS11 by the proteasome.
Together, our results pinpoint a new essential characteristic of TTP/TIS11 that may redefine the identity of these proteins. In addition, we unraveled a novel and conserved mechanism of regulation of TTP/TIS11. This control is essential for cell physiology, as defects in this process can lead to defects in the inflammatory response, increased radiation-induced lung toxicity and tumorigenesis. / Doctorat en Sciences
83

Exploration par résonance magnétique de l'espace conformationnel et de la dynamique du facteur de transcription partiellement désordonné Engrailed-2 / The conformational space and dynamics of the partially disordered transcription factor engrailed-2 explored with magnetic resonance

Khan, Shahid Nawaz 12 March 2015
Les protéines intrinsèquement désordonnées (IDP), dépourvues d’une structure rigide et stable, constituent une classe de protéines diverses et fonctionnellement importantes. La résonance magnétique nucléaire (RMN) est une technique spectroscopique bien établie pour caractériser les propriétés conformationnelles et dynamiques des IDP avec une résolution atomique. L’espace conformationnel, en général large et varié, des IDP en fait une cible difficile pour la biologie structurale dont le but est de déterminer avec précision et exactitude les propriétés structurales, dynamiques et physico-chimiques qui sous-tendent la fonction des macromolécules biologiques. Ce manuscrit présente une étude biophysique détaillée de la région intrinsèquement désordonnée (IDR) du facteur de transcription Engrailed-2, avant tout par RMN. Après une présentation de cette homéoprotéine, nous décrivons les protocoles d’expression et de purification de cette protéine isotopiquement marquée. Nous introduisons ensuite une nouvelle approche pour la caractérisation des mouvements pico- et nanoseconde des protéines intrinsèquement désordonnées à partir de données de relaxation des spins nucléaires enregistrées à plusieurs champs magnétiques. Les effets de relaxation paramagnétique (PRE) ont été utilisés pour identifier des interactions transitoires entre la région désordonnée et l’homéodomaine d’Engrailed-2. L’interaction d’Engrailed-2 avec l’ADN a été étudiée en détail en utilisant l’anisotropie de fluorescence sur une série de constructions de la protéine, afin de mettre en lumière le rôle de la partie désordonnée dans l’interaction avec l’ADN. Nous avons également employé la résonance paramagnétique électronique pour tenter de détecter une interaction potentielle entre le noyau hydrophobe de l’hexapeptide dans la région désordonnée et l’homéodomaine.
Les couplages dipolaires résiduels (RDC) dans les paires 1H-15N, Cα-Hα et Cα-C′ ont également été mesurés sur des échantillons d’Engrailed en milieu anisotrope. Ces données seront essentielles pour reconstituer l’espace conformationnel d’Engrailed-2. L’ensemble des approches présentées a permis de constituer un socle solide de connaissances qui permettent de mieux comprendre les propriétés conformationnelles, dynamiques et fonctionnelles de l’IDR d’Engrailed-2. / Intrinsically disordered proteins (IDPs), which lack a stable rigid structure, constitute a large and functionally important class of proteins. Nuclear magnetic resonance (NMR) is a well-established technique for characterizing the structural and dynamical features of IDPs at atomic resolution. The broad conformational space of IDPs makes them challenging targets for structural biology, which aims to define the precise structural features, motions, and physical and chemical properties that underlie their biological functions. This thesis presents a biophysical investigation of the disordered region of the transcription factor Engrailed-2 (13.5 kDa), primarily by NMR. After describing the protocol for expression and purification of the isotopically labeled protein, we present a novel approach to characterize pico- to nanosecond motions in IDPs using nuclear spin relaxation data recorded at multiple fields. Paramagnetic relaxation enhancements (PREs) are used to identify transient long-range interactions between the disordered region and the folded homeodomain of Engrailed-2. Binding to DNA was studied by fluorescence anisotropy, which highlights the role of the disordered region in DNA binding. We used electron paramagnetic resonance (EPR) to probe the potential interaction between the hydrophobic cluster (hexapeptide) in the disordered region and the homeodomain.
The one-bond 1H-15N, Cα-Hα and Cα-C′ residual dipolar couplings (RDCs) measured for Engrailed-2 provide important constraints for refining the conformational space of Engrailed-2. Together, these approaches provide valuable insight into the structural, dynamical and functional properties of this IDP.
84

Une région intrinsèquement désordonnée dans OSBP contrôle la géométrie et la dynamique du site de contact membranaire / An intrinsically disordered region of OSBP controls membrane contact site geometry and dynamics

Jamecna, Denisa 12 December 2018
La protéine OSBP est un transporteur de lipides qui régule la distribution cellulaire du cholestérol. OSBP comprend un domaine PH, deux séquences « coiled coil », un motif FFAT (deux phénylalanines dans un environnement acide), et un domaine de liaison de lipides (ORD) à son extrémité C-terminale. Le domaine PH interagit avec le PI(4)P et la petite protéine G Arf1-GTP au niveau du Golgi, alors que le motif FFAT interagit avec la protéine VAP-A, résidente du réticulum endoplasmique (RE). En liant simultanément tous ces déterminants, OSBP stabilise des sites de contact membranaire entre RE et Golgi, permettant ainsi un contre-échange cholestérol / PI(4)P par l'ORD. OSBP contient également une longue séquence N-terminale d’environ 80 aa, intrinsèquement désordonnée, composée principalement de glycine, proline et d'alanine. Nous démontrons que la présence de ce N-terminus désordonné augmente le rayon de Stokes de OSBP tronquée du domaine ORD, et limite sa densité d’association sur la membrane portant le PI(4)P. La protéine dépourvue du N terminus favorise l'agrégation symétrique des liposomes PI(4)P (mimant la membrane du Golgi) par les deux domaines PH du dimère OSBP, alors que la présence de la séquence désordonnée empêche cette association symétrique. De même, nous observons que la distribution d’OSBP sur la membrane de vésicules unilamellaires géantes (GUV) varie selon la présence ou l'absence du N-terminus. En présence de la séquence désordonnée, la protéine est répartie de manière homogène sur toute la surface du GUV, alors que la protéine sans N-terminal a tendance à s'accumuler à l'interface entre deux GUV de type Golgi. Cette accumulation locale ralentit fortement la mobilité de la protéine à l’interface. Un effet similaire du N-terminal sur la dynamique des protéines est observé lorsque l’association de membranes de type ER et Golgi est assurée par des protéines monomériques (dépourvues du coiled coil) en présence de Vap-A.
Les résultats de nos expériences in vitro ont été confirmés en cellules vivantes, où la séquence intrinsèquement désordonnée contrôle le recrutement d’OSBP sur les membranes Golgiennes, sa mobilité et sa dynamique d’activité au cours des cycles de transfert de lipides. La plupart des protéines de la famille d’OSBP contiennent des séquences N-terminales de faible complexité, suggérant un mécanisme général de régulation. / Oxysterol binding protein (OSBP) is a lipid transfer protein that regulates cholesterol distribution in cell membranes. OSBP consists of a pleckstrin homology (PH) domain, two coiled-coils, a “two phenylalanines in acidic tract” (FFAT) motif and a C-terminal lipid-binding OSBP-Related Domain (ORD). The PH domain recognizes PI(4)P and the small G protein Arf1-GTP at the Golgi, whereas the FFAT motif interacts with the ER-resident protein VAP-A. By binding all these determinants simultaneously, OSBP creates membrane contact sites between ER and Golgi, allowing the counter-transport of cholesterol and PI(4)P by the ORD. OSBP also contains an intrinsically disordered N-terminal sequence of ~80 aa, composed mostly of glycine, proline and alanine. We demonstrate that the presence of the disordered N-terminus increases the Stokes radius of OSBP truncated proteins and limits their density and saturation level on PI(4)P-containing membranes. The N-terminus also prevents the two PH domains of the OSBP dimer from symmetrically tethering two PI(4)P-containing (Golgi-like) liposomes, whereas protein lacking the disordered sequence promotes symmetrical liposome aggregation. Similarly, we observe a difference in OSBP membrane distribution on tethered giant unilamellar vesicles (GUVs), depending on the presence or absence of the N-terminus. Protein with the disordered sequence is homogeneously distributed over the entire GUV surface, whereas protein without the N-terminus tends to accumulate at the interface between two PI(4)P-containing GUVs.
This protein accumulation leads to local overcrowding, which is reflected by slow in-plane diffusion. The effect of the N-terminus is also manifested in monomeric OSBP-derived proteins that tether ER-like and Golgi-like membranes in the presence of VAP-A. Findings from our in vitro experiments are confirmed in living cells, where the N-terminus controls the recruitment of OSBP to Golgi membranes, its motility and its on-and-off dynamics during lipid transfer cycles. Most OSBP-related proteins contain low-complexity N-terminal sequences, suggesting a general effect.
85

Étude théorique de peptides amyloidogènes : Ensemble conformationnel, oligomérisation et inhibition par des ligands peptidomimétiques / Theoretical Study of Amyloidogenic Peptide : Conformational Ensemble, Oligomerization and Inhibition by Peptidomimetic Ligands

Tran, Thi Thuy Linh 15 December 2016
De nombreuses protéines associées aux maladies neurodégénératives humaines sont intrinsèquement désordonnées. Ce sont des protéines qui sont dépourvues de structure tertiaire ou secondaire stable dans des conditions physiologiques. Plus précisément, les protéines intrinsèquement désordonnées (IDPs) subissent diverses changements conformationnels entre la pelote aléatoire, des conformations hélicoïdales et des structures en feuillet-β, ces deux dernières étant généralement impliquées dans la reconnaissance protéine-protéine. Parmi une vingtaine de peptides amyloïdogènes connus liés aux maladies dégénératives humaines, notre étude porte sur deux protéines désordonnées: le peptide Amyloïde-β (Aβ) associé à la maladie d'Alzheimer et l'Islet Amyloid Polypeptide (IAPP) impliqué dans le diabète de type II. Aβ possède deux alloformes courants de 40 et 42 résidus, tandis que IAPP est une hormone peptidique de 37 résidus. Les agrégats de Aβ sont toxiques pour les cellules du cerveau, tandis que la fibrillisation de IAPP affecte les cellules-β du pancréas. Le mécanisme d'agrégation de ces deux peptides reste encore mal connu, mais il a été proposé qu’en solution, ces peptides visitent différentes conformations, l'une d'entre elles étant riche en feuillets-β. Cela conduirait à l’oligomérisation de ces peptides, par le biais d’interactions feuillet-β / feuillet-β et, éventuellement, à la formation de fibrilles. Le but de notre étude est de mieux caractériser la dynamique conformationnelle de ces deux peptides, dans leur forme monomérique et oligomérique. Comprendre les premières étapes de leur agrégation est crucial pour le développement de nouvelles molécules thérapeutiques efficaces contre ces protéines amyloïdes. / Many proteins associated with human neurodegenerative diseases are intrinsically disordered. They are proteins which lack stable tertiary or secondary structure under physiological conditions. 
More specifically, intrinsically disordered proteins (IDPs) undergo various structural conversions between random coil, helical conformations and β-strand structures, the latter two being generally involved in protein-protein recognition. Among about twenty known amyloidogenic peptides related to human degenerative diseases, we focus our study on two disordered proteins: the Amyloid-β peptide (Aβ), associated with Alzheimer’s disease, and the Islet Amyloid Polypeptide (IAPP), involved in type II diabetes. Aβ has two common alloforms of 40 and 42 residues in length, whereas IAPP is a 37-residue peptide hormone. Aggregates of Aβ are toxic to brain cells, whereas IAPP fibrillization affects the pancreatic β-cells. The aggregation mechanism of these two peptides is not known in detail, but it has been proposed that, in solution, these peptides visit various conformations, one of them being rich in β-strands. This would lead to peptide oligomerization through β-strand / β-strand interactions and eventually to fibril formation. The aim of our study is to provide insights into the conformational dynamics of these two peptides in monomeric and oligomeric forms. Understanding the early steps of their aggregation is crucial for the development of new effective therapeutic molecules against these amyloid proteins.
86

Intrinsically disordered proteins in molecular recognition and structural proteomics

Oldfield, Christopher John 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Intrinsically disordered proteins (IDPs) are abundant in nature, being more prevalent in the proteomes of eukaryotes than in those of bacteria or archaea. As introduced in Chapter I, these proteins, or portions of these proteins, lack stable equilibrium structures and instead have dynamic conformations that vary over time and population. Despite the lack of preformed structure, IDPs carry out many and varied molecular functions and participate in vital biological pathways. In particular, IDPs play important roles in cellular signaling that is, in part, enabled by the ability of IDPs to mediate molecular recognition. In Chapter II, the role of intrinsic disorder in molecular recognition is examined through two examples: p53 and 14-3-3. The p53 protein uses intrinsically disordered regions at its N- and C-termini to interact with a large number of partners, often using the same residues. The 14-3-3 protein is a structured domain that uses the same binding site to recognize multiple intrinsically disordered partners. Examination of the structural details of these interactions highlights the importance of intrinsic disorder and induced fit in molecular recognition. More generally, many intrinsically disordered regions that mediate interactions share similar features that are identifiable from protein sequence. Chapter IV reviews several models of IDP-mediated protein-protein interactions that use completely different parameterizations. Each model has its relative strengths in identifying novel interaction regions, and all suggest that IDP-mediated interactions are common in nature. In addition to their biological importance, IDPs are also practically important in the structural study of proteins. The presence of intrinsically disordered regions can inhibit crystallization and solution NMR studies of otherwise well-structured proteins.
This problem is compounded in the context of high-throughput structure determination. In Chapter III, the effect of IDPs on structure determination by X-ray crystallography is examined. Examination of existing crystal structures from the PDB shows that protein crystals are intolerant of intrinsic disorder. A retrospective analysis of Protein Structure Initiative data indicates that prediction of intrinsic disorder may be useful in the prioritization and improvement of targets for structure determination.
87

Probing and Modeling Biomolecule-Nanoparticle Interactions by Solution Nuclear Magnetic Resonance Spectroscopy

Xie, Mouzhe 04 December 2018
No description available.
88

Étude de complexes non-covalents et de polymères organiques par couplage entre la spectrométrie de masse et la mobilité ionique / Structural study of non-covalent complexes and organic polymers by mass spectrometry coupled with ion mobility

Ballivian, Renaud 28 October 2010
L’étude de la structure de complexes non-covalents présente un intérêt fondamental dans la recherche en chimie des protéines. Le premier objectif est de caractériser les interactions physico-chimiques sur lesquelles repose l’adoption d’une structure tridimensionnelle fonctionnelle par un édifice multimoléculaire. Le second objectif est de mettre en évidence les changements structuraux induits par le phénomène de complexation, et leur influence sur la fonction du système. Le couplage entre la spectrométrie de masse et la mobilité ionique (IM/MS) est une technique d’étude structurale en phase gazeuse, dont le principe repose sur la séparation d’ions selon leur forme et leur rapport masse sur charge, et qui permet en outre de mesurer leurs sections efficaces de diffusion. Grâce à cette technique, nous avons réalisé l’étude structurale de trois complexes non-covalents : l’agrégation de molécules de tanin sur la protéine salivaire humaine IB5, la fixation du ligand Ac2KAA sur la vancomycine, et la complexation de cations métalliques sur des polymères poly-lactide. L’évolution des sections efficaces en fonction de la taille du système ou de l’état de complexation met en évidence la présence de transitions structurales. De plus, utilisé avec de la modélisation moléculaire ou de la spectroscopie laser, le couplage IM/MS s’avère pertinent pour caractériser les interactions responsables de la stabilisation de tels complexes. Ces travaux de thèse montrent que cette technique, au-delà du simple aspect analytique (séparation d’isomères), peut également être utilisée au sein d’études plus globales, mettant en jeu plusieurs techniques afin de résoudre la structure de systèmes complexes. / Knowing the structure of non-covalent complexes is essential to understanding many biological processes. The first step is the characterization of the interactions leading to the adoption of a functional three-dimensional structure by a multimeric assembly.
The second step is to highlight the structural modifications induced by complexation and their influence on the system’s function. Ion mobility/mass spectrometry (IM/MS) is a gas-phase method that separates ions according to their geometry and their mass-to-charge ratio. IM/MS also provides insight into their intrinsic properties by measuring their collision cross sections. Using this method, we studied the structure of three different non-covalent complexes: the aggregation of tannins on the human salivary protein IB-5, the binding of a small ligand (Ac2KAA) to vancomycin, and the complexation between metallic cations and polylactide polymers. The evolution of the collision cross sections as a function of the size of the system or the complexation state clearly shows structural transitions. Moreover, combined with molecular modeling or laser spectroscopy, the IM/MS technique proves to be a powerful tool for characterizing the relevant interactions in such systems. This work shows that IM/MS, beyond its powerful analytical capabilities, can also be used in global studies that involve several structural methods to resolve the structure of large multimeric assemblies.
89

Structure and dynamics of intrinsically disordered regions of MAPK signalling proteins / Structure et dynamique des régions intrinsèquement désordonnées des MAPK

Kragelj, Jaka 11 December 2014
Les voies de transduction du signal cellulaire permettent aux cellules de répondre aux signaux de l'environnement et de les traiter. Les voies de transduction de kinases MAP (MAPK) sont bien conservées dans toutes les cellules eucaryotes et sont impliquées dans la régulation de nombreux processus cellulaires importants. Les régions intrinsèquement désordonnées (RID), présentes dans de nombreuses MAPK, n'étaient pas encore structurellement caractérisées. Les RID de MAPK sont particulièrement importantes car elles contiennent des motifs de liaison qui contrôlent les interactions entre les protéines MAPK elles-mêmes et aussi entre les protéines MAPK et d'autres protéines contenant les mêmes motifs. La résonance magnétique nucléaire (RMN) en combinaison avec d'autres techniques biophysiques a été utilisée pour étudier les RID de kinase des voies de transduction du signal MAPK. La spectroscopie RMN est bien adaptée pour l'étude des protéines intrinsèquement désordonnées à l'échelle atomique. Les déplacements chimiques et couplages dipolaires résiduels peuvent être utilisés conjointement avec des méthodes de sélection d'ensemble pour étudier la structure résiduelle dans les RID. La relaxation de spin nucléaire nous renseigne sur les mouvements rapides. Des titrations par RMN et des techniques de spectroscopie d'échange peuvent être utilisées pour surveiller la cinétique d'interactions protéine-protéine. Cette étude contribuera à la compréhension du rôle des RID dans les voies de transduction du signal cellulaire. / Protein signal transduction pathways allow cells to respond to and process signals from the environment. A group of such pathways, called mitogen-activated protein kinase (MAPK) signal transduction pathways, is well conserved in all eukaryotic cells and is involved in regulating many important cell processes. Long intrinsically disordered regions (IDRs), present in many MAPKs, have remained structurally uncharacterised.
The IDRs of MAPKs are especially important as they contain docking-site motifs that control the interactions between MAPK proteins themselves and also between MAPKs and other interacting proteins containing the same motifs. Nuclear magnetic resonance (NMR) spectroscopy in combination with other biophysical techniques was used to study the IDRs of MAPKs. NMR spectroscopy is well suited to studying intrinsically disordered proteins (IDPs) at atomic-level resolution. NMR observables, such as chemical shifts and residual dipolar couplings, can be used together with ensemble selection methods to study residual structure in IDRs. Nuclear spin relaxation informs us about fast pico- to nanosecond motions. NMR titrations and exchange spectroscopy techniques can be used to monitor the kinetics of protein-protein interactions. Mechanistic insight into the function of IDRs and their motifs will contribute to understanding how signal transduction pathways work.
90

Seleção de características e predição intrinsecamente multivariada em identificação de redes de regulação gênica / Feature selection and intrinsically multivariate prediction in gene regulatory networks identification

Martins Junior, David Corrêa 01 December 2008
Seleção de características é um tópico muito importante em aplicações de reconhecimento de padrões, especialmente em bioinformática, cujos problemas são geralmente tratados sobre um conjunto de dados envolvendo muitas variáveis e poucas observações. Este trabalho analisa aspectos de seleção de características no problema de identificação de redes de regulação gênica a partir de sinais de expressão gênica. Particularmente, propusemos um modelo de redes gênicas probabilísticas (PGN) que devolve uma rede construída a partir da aplicação recorrente de algoritmos de seleção de características orientados por uma função critério baseada em entropia condicional. Tal critério embute a estimação do erro por penalização de amostras raramente observadas. Resultados desse modelo aplicado a dados sintéticos e a conjuntos de dados de microarray de Plasmodium falciparum, um agente causador da malária, demonstram a validade dessa técnica, tendo sido capaz não apenas de reproduzir conhecimentos já produzidos anteriormente, como também de produzir novos resultados. Outro aspecto investigado nesta tese é o fenômeno da predição intrinsecamente multivariada (IMP), ou seja, o fato de um conjunto de características ser um ótimo caracterizador dos objetos em questão, mas qualquer de seus subconjuntos propriamente contidos não conseguirem representá-los de forma satisfatória. Neste trabalho, as condições para o surgimento desse fenômeno foram obtidas de forma analítica para conjuntos de 2 e 3 características em relação a uma variável alvo. No contexto de redes de regulação gênica, foram obtidas evidências de que genes alvo de conjuntos IMP possuem um enorme potencial para exercerem funções vitais em sistemas biológicos. O fenômeno conhecido como canalização é particularmente importante nesse contexto. 
Em dados de microarray de melanoma, constatamos que o gene DUSP1, conhecido por exercer função canalizadora, foi aquele que obteve o maior número de conjuntos de genes IMP, sendo que todos eles possuem lógicas de predição canalizadoras. Além disso, simulações computacionais para construção de redes com 3 ou mais genes mostram que o tamanho do território de um gene alvo pode ter um impacto positivo em seu teor de IMP com relação a seus preditores. Esta pode ser uma evidência que confirma a hipótese de que genes alvo de conjuntos IMP possuem a tendência de controlar diversas vias metabólicas cruciais para a manutenção das funções vitais de um organismo. / Feature selection is a crucial topic in pattern recognition applications, especially in bioinformatics, where problems usually involve data with a large number of variables and a small number of observations. The present work addresses feature selection aspects in the problem of gene regulatory network identification from expression profiles. In particular, we proposed a probabilistic genetic network (PGN) model that recovers a network constructed from the recurrent application of feature selection algorithms guided by a criterion function based on conditional entropy. This criterion embeds error estimation by penalizing rarely observed patterns. Results from this model applied to synthetic data and to real data sets obtained from microarrays of Plasmodium falciparum, a malaria agent, demonstrate the validity of the technique. The method was able not only to reproduce previously established knowledge, but also to produce other potentially relevant results. The intrinsically multivariate prediction (IMP) phenomenon has also been investigated. This phenomenon occurs when a feature set is a good predictor of the objects under study, but none of its proper subsets can predict those objects satisfactorily.
In this work, the conditions under which this phenomenon arises were obtained analytically for sets of 2 and 3 features with respect to a target variable. In the context of gene regulatory networks, evidence was obtained that target genes of IMP sets have great potential to perform vital functions in biological systems. The phenomenon known as canalization is particularly important in this context. In melanoma microarray data, we verified that the DUSP1 gene, known to have a canalizing function, was the one that obtained the largest number of IMP gene sets. It was also verified that all these sets have canalizing predictive logics. Moreover, computational simulations for the generation of networks with 3 or more genes show that the territory size of a target gene can contribute positively to its IMP score with regard to its predictors. This may be evidence supporting the hypothesis that target genes of IMP sets tend to control several metabolic pathways essential to the maintenance of the vital functions of an organism.
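The IMP phenomenon described in this abstract can be illustrated with a small worked example. The sketch below estimates the conditional entropy of a target given a feature subset from observed samples and applies it to an XOR target, a textbook case where the pair of features predicts the target perfectly while each feature alone carries no information. The function name `conditional_entropy` and the toy data are assumptions made for illustration; the penalization of rarely observed patterns mentioned in the abstract is not implemented here.

```python
import math
from collections import Counter


def conditional_entropy(rows, feature_idx, target_idx):
    """Estimate H(Y | X) in bits from a list of discrete samples.

    rows        -- sequence of tuples, one per observation
    feature_idx -- column indices forming the feature set X
    target_idx  -- column index of the target variable Y
    Plain plug-in estimate; illustrative only, with no penalty term.
    """
    n = len(rows)
    joint = Counter((tuple(r[i] for i in feature_idx), r[target_idx])
                    for r in rows)
    marginal = Counter(tuple(r[i] for i in feature_idx) for r in rows)
    h = 0.0
    for (x, _y), count in joint.items():
        # p(x, y) * log2 p(y | x), summed over observed pairs
        h -= (count / n) * math.log2(count / marginal[x])
    return h


# Toy data: columns are (x1, x2, y) with y = x1 XOR x2.
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

h_x1 = conditional_entropy(xor_rows, [0], 2)       # single feature
h_x2 = conditional_entropy(xor_rows, [1], 2)       # single feature
h_pair = conditional_entropy(xor_rows, [0, 1], 2)  # both features
print(h_x1, h_x2, h_pair)
```

In this toy case each single feature leaves one full bit of uncertainty about the target, whereas the pair leaves none, which is the pattern the abstract calls intrinsically multivariate prediction.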
