1 |
Identifying gene regulatory interactions using functional genomics data. Johansson, Annelie, January 2014
Previous studies used correlation across DNase I hypersensitive site sequencing (DNase-seq) experiments to predict interactions between enhancers and their target promoters. We investigate two correlation measures, Pearson's correlation and mutual information, using DNase-seq data for 100 cell types in regions on chromosome 1. To assess performance, we compared our correlation scores with Hi-C data from Jin et al. (2013). We show that performance is low when measured against the Hi-C data and that improved correlation metrics are needed. We also demonstrate that the use of Hi-C data as a gold standard is limited by its low resolution, and we suggest using another gold standard in further studies.
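As a rough illustration of the two measures named in this abstract, the following sketch scores an enhancer against two candidate promoters using Pearson's correlation and a histogram-based mutual information estimate. The signal vectors are simulated, and the function names, the bin count, and the noise levels are assumptions made for this example; the thesis's own preprocessing and estimator are not described here.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

def mutual_information(x, y, bins=5):
    """Plug-in mutual information estimate (in bits) from a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Simulated (hypothetical) DNase-seq signal across 100 cell types.
rng = np.random.default_rng(0)
enhancer = rng.gamma(2.0, 1.0, size=100)
promoter_linked = 0.8 * enhancer + rng.normal(0.0, 0.3, size=100)
promoter_unlinked = rng.gamma(2.0, 1.0, size=100)

r_linked = pearson(enhancer, promoter_linked)
r_unlinked = pearson(enhancer, promoter_unlinked)
mi_linked = mutual_information(enhancer, promoter_linked)
mi_unlinked = mutual_information(enhancer, promoter_unlinked)
```

In this toy setting the linked promoter scores higher under both measures. Mutual information can also pick up non-linear dependence that Pearson's correlation misses, at the cost of sensitivity to the chosen binning.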
|
2 |
Study of 3D genome organisation in budding yeast by heterogeneous polymer simulations. Fahmi, Zahra, January 2019
Investigating the arrangement of the packed DNA inside the nucleus has revealed the essential role of genome organisation in controlling genome function. Furthermore, genome architecture is highly dynamic, and significant chromatin re-organisation occurs in response to environmental changes. However, the mechanisms that drive the 3D organisation of the genome remain largely unknown. To understand the effect of the biophysical properties of chromatin on the dynamics and structure of chromosomes, I developed a 3D computational model of the nucleus of the yeast S. cerevisiae during interphase. In the model, each chromosome was a hetero-polymer informed by our bioinformatics analysis of the heterogeneous occupancy of chromatin-associated proteins across the genome. Two conditions were modelled, normal growth (25°C) and heat shock (37°C), with a concerted redistribution of proteins observed upon transition from one temperature to the other. Movement of chromatin segments was based on Langevin dynamics, and each segment was assigned a mobility according to its protein occupancy and the expression level of its corresponding genes. The model provides a significantly improved match with quantitative microscopy measurements of telomere positions, the distributions of 3D distances between pairs of different loci, and the mean squared displacement of a labelled locus. The quantified contacts between chromosomal segments were similar to the observed Hi-C data. At both 25°C and 37°C, segments that were highly occupied by proteins had a high number of interactions with each other, and highly transcribed genes had fewer contacts with other segments. In addition, as in the experimental observations, heat-shock genes were located closer to the nuclear periphery upon activation in the simulations. It was also shown that the determined distribution of proteins along the genome is crucial for achieving the correct genome organisation.
Hence, the heterogeneous binding of proteins, which results in differential mobility of chromatin segments, leads to 3D self-organisation.
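To make the idea of segment-specific mobility concrete, here is a minimal sketch of overdamped Langevin (Brownian) dynamics for a single bead-spring chain in which each bead has its own mobility. It is not the model from the thesis: the harmonic bonds, the absence of nuclear confinement, the parameter values, and the names `bond_forces` and `simulate` are all assumptions chosen for illustration.

```python
import numpy as np

def bond_forces(pos, k=10.0, r0=1.0):
    """Harmonic spring forces between consecutive beads of a linear chain."""
    d = pos[1:] - pos[:-1]                           # bond vectors, shape (n-1, 3)
    dist = np.linalg.norm(d, axis=1, keepdims=True)
    f = k * (dist - r0) * d / dist                   # force on bead i from bond (i, i+1)
    forces = np.zeros_like(pos)
    forces[:-1] += f
    forces[1:] -= f                                  # equal and opposite on bead i+1
    return forces

def simulate(n_beads=100, n_steps=500, dt=1e-3, kT=1.0, mobility=None, seed=0):
    """Euler-Maruyama integration; returns each bead's squared displacement."""
    rng = np.random.default_rng(seed)
    if mobility is None:
        mobility = np.ones(n_beads)
    mu = np.asarray(mobility, dtype=float)[:, None]  # per-bead mobility, shape (n, 1)
    pos = np.zeros((n_beads, 3))
    pos[:, 0] = np.arange(n_beads)                   # straight chain at rest length
    start = pos.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(pos.shape)
        # Drift scales with mobility; noise amplitude scales with sqrt(mobility).
        pos = pos + mu * bond_forces(pos) * dt + np.sqrt(2.0 * mu * kT * dt) * noise
    return np.sum((pos - start) ** 2, axis=1)

# Hypothetical mobilities: first half "protein-bound" (slow), second half fast.
n = 100
mobility = np.where(np.arange(n) < n // 2, 0.02, 1.0)
sq_disp = simulate(n_beads=n, mobility=mobility)
slow_mean = sq_disp[: n // 2].mean()
fast_mean = sq_disp[n // 2 :].mean()
```

In this sketch the low-mobility half of the chain moves far less over the same time window, which is the kind of differential mobility the abstract credits with driving self-organisation.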
|
3 |
Human genome segmentation into structural domains: from chromatin conformation data to nuclear functions. Boulos, Rasha, 21 October 2015
The replication program of about half of the mammalian genome is characterized by megabase-sized replication U/N-domains. These domains are bordered by master replication origins (MaOris) corresponding to ~200 kb regions of open chromatin favorable for early initiation of replication and transcription. Thanks to recent high-throughput chromosome conformation capture technologies (Hi-C), 3D co-localization frequency matrices between all pairs of genomic loci are now determined experimentally. U/N-domains appear to be related to the organization of the genome into structural units. In this thesis, we performed a combined analysis of Hi-C data from human cell lines and replication timing profiles to further explore structure/function relationships in the nucleus. This led us to describe novel large (>3 Mb) replication timing split-U domains, also bordered by MaOris; to demonstrate that the replication wave initiated at MaOris depends only on time during S phase; and to show that chromatin folding is compatible with a 3D equilibrium model in early-replicating euchromatin regions and a 2D equilibrium model in late-replicating heterochromatin regions associated with the nuclear lamina. Representing Hi-C co-localization matrices as structural networks and deploying graph-theoretical tools, we also demonstrated that MaOris are long-range interconnected hubs in the structural network, central to the 3D organization of the genome, and we developed a novel multi-scale methodology based on graph wavelets to objectively delineate structural units from Hi-C data. This work allows us to discuss the relationship between replication domains and structural units across different human cell lines.
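The step of representing a Hi-C co-localization matrix as a network and looking for long-range hubs can be sketched as follows. This is only a toy version under stated assumptions: the contact threshold, the minimum genomic separation, the use of plain node degree as the hub score, and the function name `long_range_degree` are illustrative choices, not the measures used in the thesis.

```python
import numpy as np

def long_range_degree(contacts, threshold, min_sep=2):
    """Count, per bin, the contacts at or above threshold with bins at least min_sep away."""
    c = np.asarray(contacts, dtype=float)
    n = c.shape[0]
    i, j = np.indices((n, n))
    adjacency = (c >= threshold) & (np.abs(i - j) >= min_sep)
    return adjacency.sum(axis=1)

# Toy 8-bin contact matrix: contact frequency decays with genomic distance ...
n_bins = 8
idx = np.arange(n_bins)
contacts = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))

# ... plus strong long-range contacts among three hypothetical hub bins.
hubs = [0, 4, 7]
for a in hubs:
    for b in hubs:
        if a != b:
            contacts[a, b] = 1.0

degree = long_range_degree(contacts, threshold=0.5, min_sep=2)
hub_bins = np.flatnonzero(degree == degree.max())
```

Here only the three planted bins keep long-range edges after thresholding, so they stand out as the highest-degree nodes. On real data the distance-dependent decay would normally be normalized out before thresholding.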
|
4 |
Hi-C data normalization. 魏孝全, Unknown Date
This study investigates normalization methods for the contact matrices produced by high-throughput chromosome conformation capture (Hi-C) experiments, which were developed to explore the three-dimensional structure of the genome. Systematic technical biases often appear in the contact matrix and can lead to incorrect conclusions, so normalization to remove these biases is essential prior to further inference. We focus on biases caused by genomic features. We propose the quadratic function (QF) normalization method, which modifies the local genome feature (LGF) normalization of Hu et al. (2012) by adopting a more general quadratic log model under a negative binomial assumption. The method is evaluated through simulation studies and on a Hi-C data set from a human lymphoblastoid cell line (GSE18199), where it is compared with two existing methods. In simulations, the QF method removes the bias effectively when the model assumption holds. In the real data, removing genomic-feature bias makes the relative distances between chromosomes more consistent across experimental replicates. However, the restriction enzyme used in the experiment affects the contact matrix, and none of these normalization methods effectively removes the bias it introduces.
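The general idea of regression-based normalization with a quadratic log model can be sketched as below. This is a deliberately simplified stand-in, not the QF method itself: it uses a Poisson likelihood in place of the negative binomial named in the abstract, a single hypothetical covariate `x` per locus pair, simulated counts, and a hand-written iteratively reweighted least squares (IRLS) fit. The function name `fit_poisson_quadratic` is likewise an assumption.

```python
import numpy as np

def fit_poisson_quadratic(x, y, n_iter=25):
    """Fit log E[y] = b0 + b1*x + b2*x^2 by IRLS for a Poisson log-linear model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta = np.array([np.log(y.mean()), 0.0, 0.0])   # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                     # working response
        XtW = X.T * mu                              # Poisson working weights equal mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta, np.exp(X @ beta)

# Simulated contact counts whose expectation depends quadratically on x (log scale).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=5000)               # hypothetical genomic-feature covariate
true_beta = np.array([2.0, 0.5, -0.8])
y = rng.poisson(np.exp(true_beta[0] + true_beta[1] * x + true_beta[2] * x ** 2))

beta_hat, fitted = fit_poisson_quadratic(x, y)
normalized = y / fitted                             # observed / expected contact ratio
```

Dividing each observed count by its fitted expectation removes the part of the signal explained by the covariate, so the normalized values average close to one regardless of `x`. A negative binomial fit would add a dispersion parameter to absorb the overdispersion typical of real count data.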
|