311 |
Analyses et prédictions bioinformatiques de réseaux d'interactions protéine-protéines contextualisés (Bioinformatic analyses and predictions of contextualized protein-protein interaction networks)
Souiai, Oussema 15 June 2011
My thesis work concerns the bioinformatic analysis and prediction of contextualized protein-protein interaction networks. In the first part of this work, we predicted tissue interactomes on the basis of the co-expression, in a given tissue, of the two interactors that make up each interaction. We then analyzed the functional and topological characteristics of the predicted interactomes. This analysis revealed a core of central interactions dedicated to housekeeping functions, centrally located specific interactions dedicated to regulatory processes, and peripherally located specific interactions dedicated to physiological functions. In the second part of this work, we focused on contextualizing a macrophage interactome by integrating metadata and genomic data (expression data, term annotations) describing the interactions. A comparison of transcriptome and interactome analyses of macrophages following infection by Mycobacterium tuberculosis showed the two approaches to be complementary. Whereas transcriptome analyses highlight the immune processes deployed by the host, interactome analysis brings out functions that are just as crucial for eradicating the pathogen, such as apoptosis and its regulation. / This work aims at contextualizing and studying contextualized protein interaction networks. The first topic of my investigations is the prediction and analysis of tissue interactomes. Combined functional and topological analyses were performed.
The combination of these features highlighted the existence of a centrally located functional core dedicated to housekeeping functions, central tissue-specific interactions involved in regulatory and developmental functions, and peripheral tissue-specific interactions involved in organ physiological functions. This gradient of functions recapitulates the organization of organs, from cells to organs. The second topic of my thesis is the contextualization of a macrophage interaction network. To infer the most likely macrophage interactome, we integrated the PPI dataset with other types of metadata, statistically evaluated them and proposed a macrophage-contextualized interactome. The set of selected interactions is enriched in experimentally verified interactions and immune-related biological processes. The functional analysis of such networks brings valuable information on the cellular and molecular mechanisms sustaining the infection.
|
312 |
A method for identification of putatively co-regulated genes
Andersson, Malin January 2002
The genomes of several organisms have been sequenced and the need for methods to analyse the data is growing. In this project a method is described that tries to identify co-regulated genes. The method identifies transcription factor binding sites, documented in TRANSFAC, in the non-coding regions of genes. The algorithm counts the number of common binding sites and the number of unique binding sites for each pair of genes and decides if the genes are co-regulated. The result of the method is compared with the correlation between the gene expression patterns of the genes. The method is tested on 21 gene pairs from the genome of Saccharomyces cerevisiae. The algorithm first identified binding sites from all organisms. The accuracy of the program was very low in this case. When the algorithm was modified to only identify binding sites found in plants the accuracy was much improved, from 52% to 76% correct predictions.
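The pairwise counting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the gene names, the binding-site sets, and the decision thresholds (`min_common`, `max_unique`) are hypothetical and are not taken from the thesis, which does not state its decision rule here.

```python
from itertools import combinations

# Hypothetical input: gene -> set of transcription factor binding sites
# found in its non-coding region (illustrative names, not thesis data).
sites_by_gene = {
    "geneA": {"GCN4", "MCM1", "RAP1"},
    "geneB": {"GCN4", "MCM1"},
    "geneC": {"ABF1"},
}


def site_overlap(sites_a, sites_b):
    """Return (sites shared by both genes, sites present in only one gene)."""
    common = len(sites_a & sites_b)
    unique = len(sites_a ^ sites_b)
    return common, unique


def predict_coregulated(sites_a, sites_b, min_common=2, max_unique=1):
    """Assumed decision rule: enough shared sites and few unshared ones."""
    common, unique = site_overlap(sites_a, sites_b)
    return common >= min_common and unique <= max_unique


# Evaluate every pair of genes once.
predictions = {
    (a, b): predict_coregulated(sites_by_gene[a], sites_by_gene[b])
    for a, b in combinations(sorted(sites_by_gene), 2)
}
```

The resulting `predictions` could then be compared against the expression correlation of each pair, as the abstract describes.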
|
313 |
Deriving Protein Networks by Combining Gene Expression and Protein Chip Analysis
Gunnarsson, Ida January 2002
It has recently been suggested that deriving reliable protein networks requires combining information from both the gene level and the protein level. In this thesis, gene expression and protein chip analysis were combined to construct protein networks. Proteins that have high affinity to the same substrates and are encoded by genes with highly correlated expression are here thought to constitute reliable protein networks. The protein networks derived are unfortunately not as reliable as was hoped. According to the tests performed, the method derived in this thesis performs only slightly better than chance. However, the poor results may be due to the data used, since mismatches and a shortage of data were evident.
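A minimal sketch of the combination idea, assuming that a link is drawn between two proteins when they share at least one high-affinity substrate and the expression of their genes is highly correlated. All names, values, and thresholds below are invented for illustration; the thesis does not specify them in this abstract.

```python
from itertools import combinations

# Hypothetical protein chip data: protein -> {substrate: measured affinity}.
affinity = {
    "P1": {"s1": 0.9, "s2": 0.2},
    "P2": {"s1": 0.8, "s2": 0.1},
    "P3": {"s1": 0.1, "s2": 0.95},
}

# Hypothetical, precomputed expression correlation of the encoding genes.
expression_corr = {("P1", "P2"): 0.85, ("P1", "P3"): 0.9, ("P2", "P3"): 0.1}


def high_affinity_substrates(profile, threshold):
    """Substrates whose affinity reaches the (assumed) threshold."""
    return {s for s, value in profile.items() if value >= threshold}


def derive_network(affinity, expression_corr, affinity_min=0.7, corr_min=0.8):
    """Link proteins that satisfy both the affinity and expression criteria."""
    edges = []
    for a, b in combinations(sorted(affinity), 2):
        shared = (high_affinity_substrates(affinity[a], affinity_min)
                  & high_affinity_substrates(affinity[b], affinity_min))
        if shared and expression_corr.get((a, b), 0.0) >= corr_min:
            edges.append((a, b))
    return edges


network = derive_network(affinity, expression_corr)
```

In this toy data, P1 and P3 have highly correlated genes but no shared high-affinity substrate, so requiring both criteria excludes that pair.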
|
314 |
Deriving Genetic Networks Using Text Mining
Olsson, Elin January 2002
An enormous amount of information is available on the Internet in unstructured form. The purpose of a text mining tool is to collect this information and present it in a more structured form. In this report, text mining is used to create an algorithm that searches abstracts available from PubMed and finds specific relationships between genes that can be used to create a network. The algorithm can also be used to find information about a specific gene. Using the algorithm, all but one of the connections in the network created by Mendoza et al. (1999) were verified. The remaining connection contained implicit information. The results suggest that the algorithm is better at extracting information about specific genes than at finding connections between genes. One advantage of the algorithm is that it can also find connections between genes and proteins and between genes and other chemical substances.
|
315 |
Evaluation of search models for Molecular Replacement using MolRep
Pasalic, Zlatana January 2002
The aim of this study is to use several homology models of different completeness and accuracy and to evaluate them as search models for Molecular Replacement (MR). Three structural groups are evaluated: the α-, β- and α/β-groups. From every group, one template structure and a couple of search models are selected. The search models are manipulated and evaluated in three ways: B-factor manipulation, side chain removal and homology modelling. This work shows that B-factor manipulation does not improve the search models. The work also shows that removing the side chains does not improve the search models. Finally, the work shows that homology modelling did not produce better search models.
|
316 |
Data Mining with Decision Trees in the Gene Logic Database : A Breast Cancer Study
Rahpeymai, Neda January 2002
Data mining approaches have been increasingly used in recent years to find patterns and regularities in large databases. In this study, the C4.5 decision tree approach was used to mine the Gene Logic database, which contains biological data. The decision tree approach was used to identify the most relevant genes and risk factors involved in breast cancer, in order to separate healthy patients from breast cancer patients in the data sets used. Four different tests were performed for this purpose. Cross validation was performed for each of the four tests to evaluate how well the decision tree approach classifies ‘new’ samples. In the first test, the expression of 108 breast-related genes, shown in appendix A, for 75 patients was used as input to the C4.5 algorithm. This test resulted in a decision tree containing only four genes, considered to be the most relevant for correctly classifying patients. Cross validation indicates an average accuracy of 89% in classifying ‘new’ samples. In the second test, risk factor data was used as input. The cross validation result shows an average accuracy of 87% in classifying ‘new’ samples. In the third test, gene expression data and risk factor data were combined into one input. The cross validation procedure for this approach again indicates an average accuracy of 87% in classifying ‘new’ samples. In the final test, the C4.5 algorithm was used to indicate possible signalling pathways involving the four genes identified by the decision tree based on gene expression data alone. In some cases, the C4.5 algorithm found trees suggesting pathways that are supported by the breast cancer literature. Since not all pathways involving the four putative breast cancer genes are known yet, the other suggested pathways should be analyzed further in order to increase their credibility.
In summary, this study demonstrates the application of decision tree approaches for the identification of genes and risk factors relevant for the classification of breast cancer patients.
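To illustrate how a decision tree learner of this kind might rank a gene, the sketch below computes the gain ratio, the split criterion used by C4.5, for a single continuous attribute. It is a simplified illustration, not a full C4.5 implementation, and the expression values and class labels are hypothetical rather than taken from the Gene Logic database.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a continuous attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info


# Hypothetical expression levels of one gene in six samples.
expression = [0.2, 0.4, 0.5, 1.8, 2.1, 2.5]
status = ["healthy", "healthy", "healthy", "cancer", "cancer", "cancer"]

# Try each observed value (except the largest) as a candidate threshold.
best_threshold = max(
    expression[:-1], key=lambda t: gain_ratio(expression, status, t)
)
```

A tree learner would repeat this evaluation for every gene and pick the attribute and threshold with the best score at each node.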
|
317 |
Development of database support for production of doubled haploids
Engerberg, Malin January 2002
In this project, relational and Lotus Notes database technologies are evaluated with regard to their suitability for providing computer-based support in plant breeding in general, and specifically in the production of doubled haploids. The two developed databases are compared against a set of requirements produced together with the DH-group, who are the main users of the databases. The results indicate that both Lotus Notes and the relational database are able to fulfil all needs documented in this project, although both systems have their limitations. An often expressed opinion is that it is difficult to combine biology and databases. The experience gained in this project, however, suggests that this need not be the case when the data is not as complicated as is often discussed. Observations made during this project indicate that data warehousing with integrated data mining and OLAP tools is surprisingly similar to how the DH-group at Svalöf Weibull works and could be a suitable solution for the production of doubled haploids.
|
318 |
Analysing subsets of gene expression data to find putatively co-regulated genes
Karjalainen, Merja January 2002
This project investigates whether analysing subsets of time series gene expression data can give additional information about putatively co-regulated genes, compared to using only the whole time series. The original gene expression data set was partitioned into subsets, and similarity was computed for both the whole time series and the subsets. Pearson correlation was used as the similarity measure between gene expression profiles. The results indicate that analysing co-expression in subsets of gene expression data derives true-positive connections, with respect to co-regulation, that are not detected by using only the whole time series data. Unfortunately, with the data set, similarity measure and partitioning used here, randomly generated connections contain the same number of true positives as the connections derived by the applied analysis. However, further analysis of subsets of gene expression data is worthwhile, given the multi-factorial nature of gene regulation. For example, other similarity measures, data sets and ways of partitioning the data set should be tried.
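The comparison between whole-series and subset similarity can be sketched as follows. The partitioning into contiguous, non-overlapping windows is an assumption made for illustration (the abstract does not say how the data set was partitioned), and the two expression profiles are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def windowed_correlations(x, y, window):
    """Pearson correlation within each contiguous, non-overlapping window."""
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(0, len(x) - window + 1, window)
    ]


# Hypothetical profiles: co-expressed in the first half of the time series,
# anti-correlated in the second half.
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0]

whole = pearson(gene_a, gene_b)
parts = windowed_correlations(gene_a, gene_b, window=4)
```

In this toy case the whole-series correlation is close to zero, while the first window shows a strong positive correlation that the whole-series measure would miss.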
|
319 |
3DPOPS : From carbohydrate sequence to 3D structure
Nordström, Rickard January 2002
In this project a web-based system called 3DPOPS has been designed, developed and implemented. The system creates initial 3D structures of oligosaccharides according to user input data and is intended to be integrated with an automated 3D prediction system for saccharides. The web interface uses a novel approach with a dynamically updated graphical representation of the input carbohydrate. The interface is embedded in a web page as a Java applet. The needs of both expert and novice users are met by informative messages, a familiar concept and a dynamically updated graphical user interface in which only valid input can be created. A set of test sequences was collected from the CarbBank database. An initial structure could be created for each sequence. All structures contained the information necessary to serve as starting points in a conformation search carried out by a 3D prediction system for carbohydrates.
|
320 |
Comparing NR Expression among Metabolic Syndrome Risk Factors
Jacobsson, Annelie January 2003
The metabolic syndrome is a cluster of metabolic risk factors such as diabetes type II, dyslipidemia, hypertension, obesity, microalbuminuria and insulin resistance, and its prevalence has increased greatly in recent years in many parts of the world. In this thesis, decision trees were applied to the BioExpress database, which includes both clinical data about donors and gene expression data, to investigate the ability of nuclear receptors to serve as markers for the metabolic syndrome. Decision trees were created, and the classification performance for each individual risk factor was then analysed. The rules generated from the risk factor trees were compared in order to search for similarities and dissimilarities. The rules were compared for pairs of risk factors, for groups of three and across all risk factors. These comparisons led to the discovery of a set of genes, of which the most interesting were the Peroxisome Proliferator Activated Receptor - Alpha, the Peroxisome Proliferator Activated Receptor - Gamma and the Glucocorticoid Receptor. These genes appear in pathways associated with the metabolic syndrome and in the recent scientific literature.
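One way to picture the rule comparison across pairs, triples and all risk factors is as a set intersection over the genes that appear in each risk factor's rules. The sketch below assumes that representation; the assignment of genes to risk factors is toy data invented for illustration and does not reproduce the thesis results.

```python
from itertools import combinations

# Hypothetical: genes appearing in the decision-tree rules of each risk factor.
genes_in_rules = {
    "obesity": {"PPARA", "PPARG", "NR3C1"},
    "hypertension": {"PPARG", "NR3C1"},
    "dyslipidemia": {"PPARA", "PPARG"},
}


def shared_genes(genes_in_rules, group_size):
    """Genes common to every risk factor in each group of the given size."""
    return {
        group: set.intersection(*(genes_in_rules[factor] for factor in group))
        for group in combinations(sorted(genes_in_rules), group_size)
    }


pairs = shared_genes(genes_in_rules, 2)
all_factors = shared_genes(genes_in_rules, len(genes_in_rules))
```

Genes that survive the intersection over all risk factors would be the strongest candidates for markers of the syndrome as a whole, under this simplified view.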
|