1 |
Protein function prediction using a complex network synchronization method
Τσιούτσιου, Βάια 11 October 2013
Protein-protein interactions (PPIs) refer to the binding of two or more proteins to perform a biological function. In the last decade, novel high-throughput technologies for detecting these interactions have produced large-scale data sets for human and most model species. By representing these data as networks, with nodes representing proteins and edges representing the detected PPIs, useful information can be extracted regarding protein functional annotation and prediction, the design of suitable drugs, the identification of new targets in cancer, or the mechanisms that control (or regulate) the biological interactions responsible for the functioning, or malfunctioning, of a cell.
For my master's thesis, I applied to the human PPI network a dynamical overlap method based on inspecting how oscillators organize in a modular network by forming synchronization interfaces and overlapping communities. I studied the human protein interaction network (PIN) and its functional modules by considering an ensemble of phase oscillators with a topology of connections defined by the human PIN. In particular, I started with a single classification for each protein and used the dynamical overlap method to identify and predict protein functions in the PPI network. I then identified the proteins that were misclassified and those likely to be involved in more than one of the functional categories in the data (multifunctional proteins).
A proper inspection of the meso-scale interactions of the generated network of dynamical systems provided useful information on the micro- and macro-scale processes through which biological processes are organized in a cell. This shows that the method can not only identify misclassified proteins but also unveil proteins that have double functionality (two functions).
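As a rough illustration of the kind of dynamics this abstract describes, the sketch below integrates phase oscillators on a small toy graph made of two modules joined by a bridge node. The Kuramoto-type coupling, the toy graph, the coupling strength and the step size are all assumptions chosen for illustration; the abstract does not state the model or parameters used in the thesis, and the sketch does not reproduce the dynamical overlap measure itself.

```python
import math
import random


def kuramoto_step(theta, adjacency, coupling, dt):
    """One explicit Euler step for identical Kuramoto oscillators on a graph."""
    new_theta = []
    for i, neighbours in enumerate(adjacency):
        # Each node is pulled towards the phases of its network neighbours.
        drive = sum(math.sin(theta[j] - theta[i]) for j in neighbours)
        new_theta.append(theta[i] + dt * coupling * drive)
    return new_theta


def order_parameter(theta, nodes):
    """Kuramoto order parameter r in [0, 1] for a subset of nodes."""
    nodes = list(nodes)
    re = sum(math.cos(theta[i]) for i in nodes) / len(nodes)
    im = sum(math.sin(theta[i]) for i in nodes) / len(nodes)
    return math.hypot(re, im)


# Hypothetical toy network: two triangles (0, 1, 2) and (4, 5, 6)
# connected through the bridge node 3.
adjacency = [
    [1, 2], [0, 2], [0, 1, 3],
    [2, 4],
    [3, 5, 6], [4, 6], [4, 5],
]

rng = random.Random(0)
theta = [rng.uniform(-math.pi, math.pi) for _ in adjacency]
for _ in range(2000):
    theta = kuramoto_step(theta, adjacency, coupling=1.0, dt=0.01)

r_left = order_parameter(theta, [0, 1, 2])    # coherence inside module 1
r_right = order_parameter(theta, [4, 5, 6])   # coherence inside module 2
r_all = order_parameter(theta, range(len(adjacency)))
print(r_left, r_right, r_all)
```

Comparing the coherence inside each module with the coherence of the whole graph over time is one simple way to see modules synchronizing before the network as a whole does.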
|
2 |
Multi-resolution Visualization Of Large Scale Protein Networks Enriched With Gene Ontology Annotations
Yasar, Sevgi 01 September 2009
Genome-scale protein-protein interactions (PPIs) are interpreted, from the perspective of computer science, as networks or graphs with thousands of nodes. PPI networks represent various types of possible interactions among the proteins or genes of a genome. PPI data are vital in protein function prediction, since the functions of the cell are performed by groups of interacting proteins, and the main complexes of the cell are made of interacting proteins.
The recent growth in protein interaction prediction techniques has made a great amount of protein-protein interaction data available for genomes. As a consequence, a systematic visualization and analysis technique has become crucial.
To the best of our knowledge, no PPI visualization tool supports multi-resolution viewing of PPI networks. In this thesis, we implemented a new approach for PPI network visualization which supports multi-resolution viewing of compound graphs. We construct compound nodes and label them by using gene set enrichment methods based on Gene Ontology annotations.
This thesis further suggests new methods for PPI network visualization.
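To make the labelling step concrete, here is a minimal sketch of one common gene set enrichment approach: a one-sided hypergeometric test that picks the annotation term most over-represented in a compound node. The choice of test, the function names and the toy annotations (including the placeholder terms "GO:A" and "GO:B") are assumptions for illustration; the abstract does not say which enrichment method the thesis uses.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def best_label(cluster, annotations):
    """Return (term, p_value) for the most enriched term in a compound node."""
    population = len(annotations)
    terms = set().union(*(annotations[p] for p in cluster))
    best = None
    for term in sorted(terms):
        # Proteins carrying the term in the whole network and in the cluster.
        successes = sum(term in ts for ts in annotations.values())
        k = sum(term in annotations[p] for p in cluster)
        p_value = hypergeom_sf(k, population, successes, len(cluster))
        if best is None or p_value < best[1]:
            best = (term, p_value)
    return best


# Hypothetical annotations: protein -> set of annotation terms.
annotations = {
    "P1": {"GO:A"}, "P2": {"GO:A"}, "P3": {"GO:A", "GO:B"},
    "P4": {"GO:B"}, "P5": {"GO:B"}, "P6": {"GO:B"},
}
label, p_value = best_label({"P1", "P2", "P3"}, annotations)
print(label, p_value)
```

In practice the p-values would also need a multiple-testing correction, which is omitted here to keep the sketch short.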
|
3 |
Computational Selection and Prioritization of Disease Candidate Genes
Chen, Jing 28 August 2008
No description available.
|
4 |
De novo genome-scale prediction of protein-protein interaction networks using ontology-based background knowledge
Niu, Kexin 18 July 2022
Proteins and their functions play an essential role in a wide range of biological processes, so the study of PPIs is of considerable importance. PPI network data are of great scientific value; however, they are incomplete, and experimental identification is costly in both time and money. Available computational methods perform well on PPI prediction for model organisms but poorly for a novel organism. Because interaction data are incomplete, it is challenging to train a model for a novel organism. In addition, millions to billions of candidate interactions need to be verified, which is extremely compute-intensive.
We aim to improve the performance of predicting whether a pair of proteins will interact, with only the two sequences as input, and to efficiently predict a PPI network with the sequences of a proteome as input.
We hypothesize that information about the cellular locations where proteins are active and about proteins' 3D structures can significantly improve prediction performance.
To overcome the lack of experimental data, we use structures predicted by AlphaFold2 and cellular locations predicted by DeepGoPlus.
We believe that proteins belonging to disjoint biological components have very little chance of interacting. We manually chose several disjoint pairs and confirmed this with experimental PPI data.
We generate new non-interacting pairs from disjoint classes to update the D-SCRIPT dataset. As a result, the AUPR improved by 10% compared to the D-SCRIPT dataset. In addition, for de novo PPI network prediction we pre-filter the negatives instead of enumerating all potential PPIs. For E. coli, this lets us skip around a million negative interactions.
To combine structure and sequence information, we generate a graph for each protein. A graph convolutional network using Self-Attention Graph Pooling in a Siamese architecture is used to learn from these graphs for PPI prediction. In this way, we improve AUPR by around 20% compared to our baseline model, D-SCRIPT.
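The pre-filtering idea in this abstract can be sketched as follows: protein pairs whose predicted locations are all mutually disjoint are set aside as presumed negatives, and only the remaining pairs are passed to the more expensive predictor. The function name, the rule that every combination of locations must be disjoint, and the example locations and disjoint pair are assumptions for illustration; the abstract does not list the disjoint classes that were actually chosen.

```python
from itertools import combinations


def candidate_pairs(locations, disjoint):
    """Split all protein pairs into presumed negatives and pairs left to score.

    locations: protein -> set of predicted location terms
    disjoint:  set of frozensets, each holding two mutually exclusive terms
    """
    negatives, to_score = [], []
    for a, b in combinations(sorted(locations), 2):
        term_pairs = {
            frozenset((x, y)) for x in locations[a] for y in locations[b]
        }
        # Presumed negative only if no combination of locations could overlap.
        if term_pairs and term_pairs <= disjoint:
            negatives.append((a, b))
        else:
            to_score.append((a, b))
    return negatives, to_score


# Hypothetical predictions and a single hypothetical disjoint pair.
locations = {
    "A": {"nucleus"},
    "B": {"extracellular"},
    "C": {"nucleus", "cytosol"},
}
disjoint = {frozenset(("nucleus", "extracellular"))}

negatives, to_score = candidate_pairs(locations, disjoint)
print(negatives, to_score)
```

This toy version still enumerates every pair; at proteome scale one would group proteins by location first so that disjoint groups are skipped in bulk.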
|
5 |
Novel Monte Carlo Approaches to Identify Aberrant Pathways in Cancer
Gu, Jinghua 27 August 2013
Recent breakthroughs in high-throughput biotechnology have promoted the integration of multi-platform data to investigate signal transduction pathways within a cell. In order to model complicated dynamics and heterogeneity of biological pathways, sophisticated computational models are needed to address unique properties of both the biological hypothesis and the data. In this dissertation work, we have proposed and developed methods using Markov Chain Monte Carlo (MCMC) techniques to solve complex modeling problems in human cancer research by integrating multi-platform data. We focus on two research topics: 1) identification of transcriptional regulatory networks and 2) uncovering of aberrant intracellular signal transduction pathways.
We propose a robust method, called GibbsOS, to identify condition-specific gene regulatory patterns between transcription factors and their target genes. A Gibbs sampler is employed to sample target genes from the marginal function of the outlier sum of regression t-statistics. Numerical simulation has demonstrated a significant performance improvement of GibbsOS over existing methods against noise and false positive connections in binding data. We have applied GibbsOS to breast cancer cell line datasets and identified condition-specific regulatory rewiring in human breast cancer.
We also propose a novel method, namely Gibbs sampler to Infer Signal Transduction (GIST), to detect aberrant pathways that are highly associated with biological phenotypes or clinical information. By converting predefined potential functions into a Gibbs distribution, GIST estimates edge directions by learning the distribution of linear signaling pathway structures. Through the sampling process, the algorithm is able to infer signal transduction directions which are jointly determined by both gene expression and network topology. We demonstrate the advantage of the proposed algorithms on simulation data with respect to different settings of the noise level in gene expression and of false-positive connections in the protein-protein interaction (PPI) network.
Another major contribution of this dissertation is that we have improved on the traditional perspective for understanding aberrant signal transduction by further investigating the structural linkage of signaling pathways. We develop a method called Structural Organization to Uncover pathway Landscape (SOUL), which emphasizes modularized pathway structures in the reconstructed pathway landscape. GIST and SOUL provide a unique angle for computationally modeling alternative pathways and pathway crosstalk. The proposed methods can bring insight to drug discovery research by targeting nodal proteins that oversee multiple signaling pathways, rather than treating individual pathways separately. A complete pathway identification protocol, namely Infer Modularization of PAthway CrossTalk (IMPACT), is developed to bridge downstream regulatory networks with upstream signaling cascades. We have applied IMPACT to datasets from treated breast cancer patients to investigate how estrogen receptor (ER) signaling pathways are related to drug resistance. The pathway proteins identified from patient datasets are well supported by breast cancer cell line models. We hypothesize from the computational results that the HSP90AA1 protein is an important nodal protein that oversees multiple signaling pathways to drive drug resistance. Cell viability analysis has supported our hypothesis by showing a significant decrease in the viability of endocrine-resistant cells compared with non-resistant cells when 17-AAG (a drug that inhibits HSP90AA1) is applied.
We believe that this dissertation work not only offers novel computational tools towards understanding complicated biological problems, but more importantly, it provides a valuable paradigm where systems biology connects data with hypotheses using computational modeling. Initial success of using microarray datasets to study endocrine resistance in breast cancer has shed light on translating results from high throughput datasets to biological discoveries in complicated human disease studies. As the next generation biotechnology becomes more cost-effective, the power of the proposed methods to untangle complicated aberrant signaling rewiring and pathway crosstalk will be finally unleashed. / Ph. D.
|
6 |
Machine Learning and Systems Biology applied to the study of 22q11 Microdeletion Syndrome
Alves, Camila Cristina de Oliveira. January 2019
Advisor: Lucilene Arilho Ribeiro Bicudo / Abstract: The 22q11 Microdeletion Syndrome (22q11DS), caused by a deletion of approximately 3 Mb in the 22q11 region, has an average frequency of 1 in 4000 to 9800 live births and is considered the most frequent microdeletion syndrome and the second most common cause of developmental delay and severe congenital disease, after Down syndrome. Depending on the size and location of the deletion, different genes may be affected; the main gene considered responsible for the classic signs of the syndrome is TBX1. 22q11DS is characterized by a very broad phenotypic spectrum, with pleiotropic effects that result in the highly variable involvement of practically all organs and/or systems, and more than 180 clinical signs, both physical and behavioral, have already been described. In this work, we applied bioinformatics tools to detect clinical and systemic patterns of the 22q11 deletion, classifying syndromic cases as typical or atypical and studying the impact of the deletion on protein-protein interaction (PPI) networks. To evaluate clinical signs that could differentiate syndromic patients, a machine-learning-based methodology was used to classify the cases as typical or atypical according to their clinical signs, using the J48 algorithm (a decision tree algorithm). The selected decision trees were highly accurate. Clinical signs such as oral cleft, velopharyngeal insufficiency, speech and language development delay, specific learning disability, behavioral abnormality and growth delay were indicative for case classification... (Complete abstract: click electronic access below) / Master's
|
7 |
Study of the role of the CDC48 chaperone protein in plant immunity
Begue, Hervé 22 November 2018
The chaperone protein CDC48 (Cell division cycle 48) is a major regulator of protein quality control and is involved in various cellular processes in animals and yeast. In contrast, the role of CDC48 in plants is poorly understood. In the present work, we investigated the function of CDC48 in plant immunity using the cryptogein/tobacco biological model, cryptogein being produced by the oomycete Phytophthora cryptogea.
Three strategies were pursued. First, the dynamics of CDC48 accumulation, together with the intracellular events underlying the immune response, were analyzed in both wild-type and CDC48-overexpressing tobacco cells (CDC48-TAP line). Second, a list of CDC48 partners was established based on immunoprecipitation assays followed by mass spectrometry analysis. Among those partners, the cytosolic form of ascorbate peroxidase (cAPX), a central enzyme in the regulation of the redox status that is involved in detoxifying intracellular H2O2, was studied specifically. Finally, a computational analysis of the list of CDC48 partners and the subsequent generation of the protein-protein interaction (PPI) network of CDC48 in Arabidopsis thaliana were undertaken.
Our data indicated that activation of the immune response is accompanied by an induced accumulation of both CDC48 transcript and protein. In addition, an early and exacerbated cell death was observed in the CDC48-TAP line, suggesting a role for CDC48 in the hypersensitive response. The interaction between CDC48 and cAPX was confirmed by different approaches. Interestingly, the activity of cAPX and its accumulation dynamics were strongly affected in the CDC48-overexpressing line. Accordingly, a dysregulation of the redox status also occurred in this line. Finally, the computational analysis of the CDC48 PPI network highlighted new potential target proteins, including proteins involved in the metabolism of S-adenosylmethionine, a substrate molecule of trans-methylation reactions and a precursor of ethylene and nicotianamine. This analysis also confirmed the role of CDC48 in the ubiquitin/proteasome degradation system.
To summarize, this work provides new information about CDC48 in plant biology. It indicates that CDC48 is mobilized by plant cells undergoing an immune response and affects the redox status through the regulation of cAPX turnover. New research avenues emerged from our study, notably a putative role of CDC48 in the regulation of S-adenosylmethionine biosynthesis and in the establishment of the hypersensitive response, through processes which remain to be investigated.
|
8 |
Working Together: Using protein networks of bacterial species to compare essentiality, centrality, and conservation in Escherichia coli.
Wimble, Christopher 01 January 2015
Proteins in Escherichia coli were compared in terms of essentiality, centrality, and conservation. The hypotheses of this study are that, for proteins in Escherichia coli, (1) there is a positive, measurable correlation between protein conservation and essentiality, (2) there is a positive relationship between conservation and degree centrality, and (3) essentiality and centrality are also positively correlated. The third hypothesis was supported by a moderate correlation and the first by a weak correlation; the second hypothesis was not supported. When proteins that did not map to orthologous groups and proteins that had no interactions were removed, the relationship between essentiality and conservation strengthened to a strong one. This was due to the effect of proteins that did not map to orthologous groups, and suggests that protein orthology, as represented by clusters of orthologous groups, does not accurately depict protein conservation among the species studied.
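The third hypothesis can be illustrated with a small sketch that computes node degree from an interaction list and correlates it with a binary essentiality flag. The use of a Pearson coefficient (which equals the point-biserial correlation when one variable is binary), the function names and the toy data are assumptions for illustration; the abstract does not say which correlation measure the study used.

```python
import math


def degree_centrality(edges):
    """Count the number of interaction partners (degree) of each protein."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical interaction list and essentiality flags (1 = essential).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
essential = {"a": 1, "b": 1, "c": 0, "d": 0}

degree = degree_centrality(edges)
proteins = sorted(degree)
r = pearson([degree[p] for p in proteins], [essential[p] for p in proteins])
print(degree, r)
```

Proteins with no interactions never appear in the edge list, so they are left out of this toy correlation, which mirrors the filtering step the abstract describes.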
|
9 |
Machine Learning Methods For Using Network Based Information In MicroRNA Target Prediction
Sualp, Merter 01 February 2013
Computational microRNA (miRNA) target identification in animal genomes is a challenging problem due to the imperfect pairing of the miRNA with the target site. Techniques based on sequence alone are prone to producing many false positive interactions. Therefore, integrative techniques have been developed that use additional genomic and structural features and evolutionary conservation information to reduce the high false positive rate. We propose that the context of a putative miRNA target in a protein-protein interaction (PPI) network can be used as an additional filter in a computational miRNA target prediction algorithm. We compute several graph-theoretic measures on the human PPI network as indicators of network context. We assess the performance of individual and combined contextual measures in increasing the precision of a popular miRNA target prediction tool, TargetScan, using low-throughput and high-throughput datasets of experimentally verified human miRNA targets. We used classification algorithms for that assessment. Since only miRNA targets are available as training samples, this becomes a One Class Classification (OCC) problem. We devised a novel OCC method, DiVo, based on simple distance metrics and voting. Comparative analysis with state-of-the-art methods shows that DiVo attains better classification performance. Our final results indicate that topological properties of target gene products in PPI networks are valuable sources of information for filtering out false positive miRNA target genes. We show that, for the targets of a number of miRNAs, network context correlates better with being a target than the sequence-based score provided by the prediction tool.
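The abstract gives no details of DiVo beyond "simple distance metrics and voting", so the following is only a generic sketch of how a one-class classifier of that flavour could work, not the author's algorithm. Each distance metric learns an acceptance threshold from the positive training samples alone, and a new sample is accepted when a majority of metrics vote for it. The three metrics, the nearest-neighbour threshold rule and the toy feature vectors are all assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


METRICS = (euclidean, manhattan, chebyshev)


def fit(train):
    """Per metric, use the largest nearest-neighbour distance among positives."""
    thresholds = []
    for metric in METRICS:
        nearest = [
            min(metric(p, q) for j, q in enumerate(train) if j != i)
            for i, p in enumerate(train)
        ]
        thresholds.append(max(nearest))
    return thresholds


def predict(sample, train, thresholds):
    """Accept the sample if most metrics place it close to a known positive."""
    votes = sum(
        min(metric(sample, q) for q in train) <= limit
        for metric, limit in zip(METRICS, thresholds)
    )
    return votes * 2 > len(METRICS)


# Hypothetical network-context feature vectors of known positives.
train = [(0, 0), (1, 0), (0, 1), (1, 1)]
thresholds = fit(train)
inside = predict((0.5, 0.5), train, thresholds)
outside = predict((5, 5), train, thresholds)
print(thresholds, inside, outside)
```

Because only positive examples are needed to set the thresholds, a scheme like this fits the setting described above, where experimentally verified non-targets are not available.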
|