351 |
Microfluidic Technology for Low-Input Epigenomic Analysis
Zhu, Yan 25 May 2018
Epigenetic modifications, such as DNA methylation and histone modifications, play important roles in gene expression and regulation, and are highly involved in cellular processes such as stem cell pluripotency/differentiation and tumorigenesis. Chromatin immunoprecipitation (ChIP) is the technique of choice for examining in vivo DNA-protein interactions and has been a great tool for studying epigenetic mechanisms. However, conventional ChIP assays require millions of cells per test and are not practical for examining samples from lab animals and patients. Automated microfluidic chips offer the advantage of handling small sample sizes and facilitating rapid reactions. They also eliminate cumbersome manual handling.
In this report, I describe three projects that used microfluidic immunoprecipitation followed by next-generation sequencing to enable low-input, high-throughput epigenomic profiling. First, I examined RNA polymerase II transcriptional regulation with microfluidic chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) assays. Second, I probed the temporal dynamics of the DNA methylome during cancer development in a transgenic mouse model, using microfluidic methylated DNA immunoprecipitation followed by next-generation sequencing (MeDIP-seq) assays. Third, I explored negative enrichment of circulating tumor cells (CTCs) followed by microfluidic ChIP-seq for studying the temporal dynamics of a histone modification (H3K4me3) in patient-derived tumor xenografts in an immunodeficient mouse model during the course of cancer metastasis.
In the first study, I adapted microfluidic ChIP-seq devices to achieve ultrahigh sensitivity for studying Pol2 transcriptional regulation in scarce cell samples. I increased the assay sensitivity to an unprecedented level (~50 K cells for Pol2 ChIP-seq). Importantly, this is three orders of magnitude more sensitive than the prevailing Pol2 ChIP-seq assays. I showed that MNase digestion provided a better ChIP-seq signal than sonication. Among the MNase protocols, two-step fixation gave the best ChIP-seq quality, followed by one-step fixation and, lastly, no fixation.
In the second study, I addressed the fact that probing dynamic epigenomic changes during tumorigenesis in mice often requires profiling epigenomes from tiny quantities of tissue. Conventional epigenomic tests do not support such analysis because of the large amount of material they require. I developed an ultrasensitive microfluidics-based methylated DNA immunoprecipitation followed by next-generation sequencing (MeDIP-seq) technology for profiling methylomes using as little as 0.5 ng DNA (or ~100 cells), with a 1.5 h on-chip immunoprecipitation process. This technology enabled me to examine genome-wide DNA methylation in a C3(1)/SV40 T-antigen transgenic mouse model at different stages of mammary cancer development. Using these data, I identified differentially methylated regions and their associated genes in different periods of cancer development. Interestingly, the results showed that methylomic features are dynamic and change with tumor developmental stage.
In the last study, I developed negative enrichment of CTCs followed by ultrasensitive microfluidic ChIP-seq for profiling a histone modification (H3K4me3) in CTCs, addressing both the technical challenges of CTC isolation and the lack of tools for profiling genome-wide histone modifications in tiny cell samples. / Ph. D. / The human genome was sequenced and completed over a decade ago. The information provided by the genomic map inspired numerous studies on genetic variations and their roles in diseases. However, genomic information alone is not always sufficient to explain important biological processes. Gene activation and expression are not only associated with alterations in the DNA sequence, but are also affected by other changes to DNA and histones. Epigenetics refers to the molecular mechanisms that affect gene expression and phenotypes without involving changes in the DNA sequence.
For example, DNA can be methylated, the histone proteins around which DNA is wrapped can be methylated or acetylated, and transcription factors can bind to different parts of the DNA. All of these can affect gene expression without altering the DNA sequence. Epigenetic changes occur throughout all stages of cell development or in response to environmental cues. They change transcription patterns in a tissue/cell-specific fashion. For example, transcriptional silencing of tumor-suppressor genes by DNA methylation plays an important role in cancer development. Therefore, understanding epigenetic regulation will help to improve various aspects of biomedicine. For instance, personalized medicine can be tailored to a given patient's epigenetic profile to specifically control gene expression during treatment. However, the technology for profiling epigenetic modifications, i.e. chromatin immunoprecipitation (ChIP), suffers from serious limitations. The key limitation is the sensitivity of the assay. The conventional assay requires a large number of cells (>10⁶ cells per ChIP). This is feasible when using cell lines. However, this requirement becomes a major challenge when primary cells are used, because only very limited amounts of sample can be obtained from lab animals or patients. Population heterogeneity information may also be lost when a large cell number is used.
In this project, we developed an automated ultrasensitive microfluidic chromatin/DNA immunoprecipitation followed by next-generation sequencing (ChIP/MeDIP-seq) technology for profiling epigenetic modifications (e.g., histone modifications, transcriptional regulation, and DNA methylation). We extensively optimized the design parameters for every step of ChIP/MeDIP (e.g. sonication/crosslinking time, antibody concentration, washing conditions) to reach a sensitivity of 0.1 ng DNA (or ~50-100 cells) as starting material for IP, which is roughly 4-5 orders of magnitude higher than the prevailing protocol and 2-3 orders of magnitude higher than the state of the art (~50 ng). With such sensitivity, we were able to study temporal dynamics in the DNA methylome during the various stages of mammary cancer development in a transgenic mouse model. We were able to investigate transcriptional regulation of RNA polymerase II from scarce cell samples. We were also able to study a histone modification (H3K4me3) in circulating tumor cells during cancer metastasis.
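The sensitivity comparison quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the figures stated in the abstract (0.1 ng versus ~50 ng of DNA, and ~50-100 cells versus the >10⁶ cells of a conventional assay) and takes exactly 10⁶ cells as the conventional baseline, which is an assumption made here for the comparison. The function name is hypothetical.

```python
import math


def orders_of_magnitude(reference_input, improved_input):
    """Return log10 of the ratio between two input requirements."""
    return math.log10(reference_input / improved_input)


# Figures quoted in the abstract.
STATE_OF_ART_NG = 50.0        # state-of-the-art DNA input, in ng
MICROFLUIDIC_NG = 0.1         # microfluidic DNA input, in ng
MICROFLUIDIC_CELLS = (50, 100)

# Assumption: the ">10^6 cells per ChIP" bound is used as the baseline.
CONVENTIONAL_CELLS = 1e6

dna_gain = orders_of_magnitude(STATE_OF_ART_NG, MICROFLUIDIC_NG)
cell_gains = [orders_of_magnitude(CONVENTIONAL_CELLS, c) for c in MICROFLUIDIC_CELLS]

print(f"versus state of the art (DNA mass): {dna_gain:.2f} orders of magnitude")
print(f"versus conventional assay (cells): {min(cell_gains):.2f} to {max(cell_gains):.2f}")
```

Under these assumptions the DNA-mass ratio falls between 2 and 3 orders of magnitude and the cell-number ratio between 4 and 5, in line with the ranges stated in the abstract.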
|
352 |
Methods for Differential Analysis of Gene Expression and Metabolic Pathway Activity
Temate Tiagueu, Yvette Charly B, Temate Tiagueu, Yvette C. B. 09 May 2016
RNA-Seq is an increasingly popular approach to transcriptome profiling that uses the capabilities of next generation sequencing technologies and provides better measurement of the levels of transcripts and their isoforms. In this thesis, we apply an RNA-Seq protocol and transcriptome quantification to estimate gene expression and pathway activity levels. We present a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. We compared the first version of IsoDE against four existing methods (Fisher's exact test, GFOLD, edgeR and Cuffdiff) on RNA-Seq datasets generated using three different sequencing technologies, both with and without replicates. We also introduce the second version of IsoDE, which runs 10 times faster than the first implementation thanks to in-memory processing in the underlying gene expression frequency estimation tool, and which includes further optimization of the analysis.
The second part of this thesis presents a set of tools for differential analysis of metabolic pathways from RNA-Seq data. Metabolic pathways are series of chemical reactions occurring within a cell. We focus on two main problems in the differential analysis of metabolic pathways, namely differential analysis of their inferred activity level and of their estimated abundance. We validate our approaches through differential expression analysis at the transcript and gene levels and also through real-time quantitative PCR experiments. In Part Four, we present the different packages created or updated in the course of this study. We conclude with our plans for further improving IsoDE 2.0.
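The abstract describes IsoDE only as a bootstrapping-based test, so the following is a generic sketch of how bootstrap replicates can support a differential expression call, not the IsoDE algorithm itself. It assumes that expression has already been estimated on several bootstrap resamples per condition. The function names, the fold-change threshold and the support threshold are all hypothetical choices made for illustration.

```python
from itertools import product


def fold_change_support(est_a, est_b, min_fold=2.0, pseudo=1e-9):
    """Fraction of bootstrap pairs whose fold change reaches min_fold.

    est_a, est_b: expression estimates for one gene, one value per
    bootstrap resample, in conditions A and B. The pseudocount only
    guards against division by zero.
    """
    pairs = list(product(est_a, est_b))
    up = sum((b + pseudo) / (a + pseudo) >= min_fold for a, b in pairs)
    down = sum((a + pseudo) / (b + pseudo) >= min_fold for a, b in pairs)
    # Report the better-supported direction of change.
    return max(up, down) / len(pairs)


def is_differential(est_a, est_b, min_fold=2.0, min_support=0.95):
    """Call a gene differential if enough bootstrap pairs agree."""
    return fold_change_support(est_a, est_b, min_fold) >= min_support


# Toy bootstrap estimates (made-up numbers, not data from the thesis).
condition_a = [10.0, 11.0, 9.0, 10.5]
condition_b = [25.0, 27.0, 24.0, 26.0]
unchanged = [12.0, 9.0, 14.0, 10.0]

print(is_differential(condition_a, condition_b))
print(is_differential(condition_a, unchanged))
```

Pairing every bootstrap estimate from one condition with every estimate from the other lets such a test run without biological replicates, which is one reason resampling is attractive in that setting.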
|
353 |
Analysing sex determination in farmed fish using Next Generation DNA sequencing
Palaiokostas, Christos January 2013
The aim of this thesis was to analyse the genetics of sex determination in farmed fish with sexual dimorphism, using Next Generation Sequencing. Three species of farmed fish with sex-determining systems of varying complexity were studied. Both full-sibs and more distantly related specimens of Atlantic halibut (Hippoglossus hippoglossus), Nile tilapia (Oreochromis niloticus) and European sea bass (Dicentrarchus labrax) were used. Restriction-site Associated DNA sequencing (RAD-seq) and double digest Restriction-site Associated DNA sequencing (ddRAD-seq), two related techniques based on next generation sequencing, allowed the identification of thousands of Single Nucleotide Polymorphisms (SNPs; > 3,000) for each of these species. The first SNP-based genetic maps for these species were constructed during this study. The study provides the first evidence concerning the location of the sex-determining region of Atlantic halibut. For Nile tilapia, both novel sex-determining regions and fine mapping of the major sex-determining region are presented. For European sea bass, the study provides evidence for the absence of a major sex-determining gene, along with indications of putative sex-determining regions. The results broaden current knowledge of sex determination in three important farmed fish. They also have practical applications towards the production of mono-sex stocks of these species for the aquaculture industry.
|
354 |
Caractérisation du microDNome et sa modulation par le traitement anti-cancer [Characterization of the microDNome and its modulation by anti-cancer treatment]
Mehanna, Pamela 11 1900
Recently, a new class of extrachromosomal circular DNA (eccDNA) called microDNA was identified in mouse and human tissues. These microDNAs are 100 to 400 bp long, derive from unique non-repetitive genomic regions and show an enrichment in GC-rich and genic sequences. While it has been proposed that they could arise from RNA metabolism or replication defects, their production mechanisms and possible functionality remain unclear. Through the analysis of microDNAs extracted from a series of 10 human lymphoblastoid cell lines (LCLs), we confirmed the non-random distribution of microDNAs towards active regions of the genome. The identified microDNAs showed redundant loci of origin and a size periodicity of 190 bp that matched the caspase-dependent DNA fragmentation of apoptotic cells. Strikingly, apoptosis induced in these LCLs by chemotherapeutic drugs (methotrexate or L-asparaginase) modulated both the diversity and the size of microDNAs, further suggesting that some microDNAs could be circularized by-products of programmed cell death. Thus, while compatible with the original observation that microDNAs originate from a normal physiological process, these results imply an alternative or complementary source of production.
|
355 |
Multigene panel next generation sequencing in a patient with cherry red macular spot
Mütze, Ulrike, Bürger, Friederike, Hoffmann, Jessica, Tegetmeyer, Helmut, Heichel, Jens, Nickel, Petra, Lemke, Johannes R., Syrbe, Steffen, Beblo, Skadi 25 January 2017
Background: Lysosomal storage diseases (LSD) often manifest with cherry red macular spots. Diagnosis is based on clinical features and specific biochemical and enzymatic patterns. In uncertain cases, genetic testing with next generation sequencing can establish a diagnosis, especially in milder or atypical phenotypes. We report on the diagnostic work-up in a boy with sialidosis type I, presenting initially with marked cherry red macular spots but non-specific urinary oligosaccharide patterns and unusually mild excretion of bound sialic acid.
Methods: Biochemical, enzymatic and genetic tests were performed in the patient. The clinical and electrophysiological data were reviewed and a genotype-phenotype analysis was performed. In addition, a systematic literature review was carried out.
Case report and results: Cherry red macular spots were first noted at 6 years of age after routine screening for myopia. Physical examination, psychometric testing, laboratory investigations as well as cerebral MRI were unremarkable at 9 years of age. So far no clinical myoclonic seizures have occurred, but EEG displays generalized epileptic discharges and visual evoked potentials are prolonged bilaterally. Urine thin layer chromatography showed an oligosaccharide pattern compatible with different LSD including sialidosis, galactosialidosis, GM1 gangliosidosis or mucopolysaccharidosis type IV B. Urinary bound sialic acid excretion was mildly elevated in spontaneous and 24 h urine samples. In cultured fibroblasts, α-sialidase activity was markedly decreased to <1%; however, bound and free sialic acid were within normal range. Diagnosis was eventually established by multigene panel next generation sequencing of genes associated with LSD, identifying two novel, compound heterozygous variants in the NEU1 gene (c.699C>A, p.S233R in exon 4 and c.803A>G, p.Y268C in exon 5 of NEU1 transcript NM_000434.3), leading to amino acid changes predicted to impair protein function.
Discussion: Sialidosis should be suspected in patients with cherry red macular spots, even with non-significant urinary sialic acid excretion. Multigene panel next generation sequencing can establish a definite diagnosis, allowing for counseling of the patient and family.
|
356 |
Efficient analysis of complex, multimodal genomic data
Acharya, Chaitanya Ramanuj January 2016
Our primary goal is to better understand complex diseases using statistically disciplined approaches. As multi-modal data streams out of consortium projects like the Genotype-Tissue Expression (GTEx) project, which collects samples from various tissue sites in order to understand tissue-specific gene regulation, new approaches are needed that can efficiently model groups of data with minimal loss of power. For example, the GTEx project delivers RNA-Seq, microarray gene expression and genotype data (SNP arrays) from a vast number of tissues in a given individual subject. To analyze this type of multi-level (hierarchical) multi-modal data, we proposed a series of efficient-score-based tests, or score tests, and leveraged groups of tissues or gene isoforms in order to map genomic biomarkers. We model group-specific variability as a random effect within a mixed effects model framework. In one instance, we proposed a score-test-based approach to map expression quantitative trait loci (eQTL) across multiple tissues. To do so, we jointly model all the tissues and use all the available information to maximize the power of eQTL mapping, investigating an overall shift in gene expression combined with tissue-specific effects due to genetic variants. In the second instance, we showed the flexibility of our model framework by expanding it to include tissue-specific epigenetic data (DNA methylation) and mapping eQTL by leveraging both tissues and methylation. Finally, we showed that our methods are applicable to different data types, such as whole transcriptome expression data, which is designed to analyze genomic events such as alternative gene splicing.
To accomplish this, we proposed two different models that exploit gene expression data from all available isoforms within a gene to map biomarkers of interest (either genes or gene-sets) in paired early-stage breast tumor samples before and after treatment with external beam radiation. Our efficient-score-based approaches have distinct advantages. They have a computational edge over existing methods because they do not need parameter estimation under the alternative hypothesis. As a result, model parameters only have to be estimated once per genome, significantly decreasing computation time. Also, the efficient score is the locally most powerful test and is guaranteed theoretical optimality over all other approaches in a neighborhood of the null hypothesis. This theoretical performance is borne out in extensive simulation studies, which show that our approaches consistently outperform existing methods in both statistical power and computational speed. We applied our methods to publicly available datasets. It is important to note that all of our methods also accommodate the analysis of next-generation sequencing data. / Dissertation
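The computational advantage claimed for score tests (no fitting under the alternative hypothesis) can be illustrated with a much simpler model than the mixed effects framework of the thesis. The sketch below is an assumption-laden toy: it tests a single variant in an ordinary linear regression with only an intercept under the null, and it does not model tissues, isoforms or random effects. All function and variable names are hypothetical.

```python
import math


def fit_null(expression):
    """Fit the null model (intercept only) once; reuse it for every variant."""
    n = len(expression)
    mean = sum(expression) / n
    residuals = [v - mean for v in expression]
    sigma2 = sum(r * r for r in residuals) / n  # ML variance estimate under H0
    return residuals, sigma2


def score_test(genotype, residuals, sigma2):
    """Score statistic and chi-square (1 df) p-value for one variant."""
    n = len(genotype)
    g_mean = sum(genotype) / n
    centred = [g - g_mean for g in genotype]
    score = sum(g * r for g, r in zip(centred, residuals))
    information = sigma2 * sum(g * g for g in centred)
    if information == 0:
        raise ValueError("genotype or expression has no variation")
    stat = score * score / information
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy data (made-up numbers): one expression vector, two candidate variants.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
associated = [0, 0, 1, 1, 2, 2]
unassociated = [0, 1, 2, 2, 1, 0]

residuals, sigma2 = fit_null(expression)  # done once
for name, genotype in [("associated", associated), ("unassociated", unassociated)]:
    stat, p_value = score_test(genotype, residuals, sigma2)
    print(name, round(stat, 3), round(p_value, 4))
```

Because `fit_null` is called once and only `score_test` runs per variant, the per-variant cost is a pair of dot products. In this simplified setting, that is the same economy the abstract describes for its mixed-model tests.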
|
357 |
Genome-wide analysis of selection in mammals, insects and fungi
Ridout, Kate E. January 2012
Characterising and understanding factors that affect the rate of molecular evolution in proteins has played a major part in the development of evolutionary theory. The early analyses of amino acid substitutions stimulated the development of the neutral theory of molecular evolution, which later evolved into the nearly neutral theory. More recent work has led to a better understanding of the role selection plays at the molecular level, but there is still limited understanding of how higher levels of protein organisation affect the way natural selection acts. Investigating this question is the central aim of this thesis, which addresses it through the analysis of selective pressures in secondary protein structures in insects, mammals and fungi. The analyses for the first two groups were conducted using publicly available datasets. To conduct the analyses in fungi, genome sequence data from the fungal genus Microbotryum (sequenced in our laboratory) was assembled and annotated, resulting in the development of a number of bioinformatics tools which are described here. The fungal, insect and mammalian datasets were interrogated with regard to a number of structural features, such as protein secondary structure, position of a site relative to adaptively evolving sites, hydropathy and solvent accessibility. These features were correlated with the signals of positive and purifying selection detected using phylogenetic maximum likelihood and Bayesian approaches. I conclude that all of the factors examined can affect the rate of molecular evolution. In particular, disordered and hydrophilic regions of the protein are found to experience fewer physicochemical constraints and to contain a higher proportion of adaptively evolving sites. Positively selected residues are also found to be spatially 'clustered', and these trends persist across the three taxa.
Finally, I show that this variation in adaptive evolution is a result of both selective events and physicochemical constraint.
|
358 |
Identification des bases génétiques des myopathies à multi-minicores avec ou sans cardiomyopathie [Identification of the genetic bases of multi-minicore myopathies with or without cardiomyopathy]
Chauveau, Claire 09 1900
Thesis carried out under a cotutelle (joint supervision) agreement with Université Pierre et Marie Curie, Paris 6 (UPMC, Paris, France). / While the pathophysiological bases of many muscular diseases are now well known, congenital core myopathies (CMs) remain poorly understood. CMs are genetic diseases that generally present at birth with delayed motor development, muscle weakness, and sometimes fatal respiratory or cardiac complications. Mutations in RYR1, SEPN1, TTN, ACTA1, CFL2 and MEGF10 have been associated with various CMs, yet for about 50% of CM cases the responsible gene has not been identified.
The objective of my thesis was to clarify the pathophysiological mechanisms of new forms of CM through the identification of new genes or new mutations in known genes. This thesis had an international dimension, reflected in its joint supervision by UPMC (France) and UdeM (Québec).
I developed two complementary axes of research. First, I studied 21 informative families with a recessive CM with scoliosis and respiratory failure, for which I combined positional cloning and candidate gene studies, using various tools from genotyping to next generation sequencing (NGS). The second part of this work consisted of the analysis of 24 families with recessive CM affecting both cardiac and skeletal muscles. Their phenotype was similar to that previously observed in cases with deletions in the last 6 exons of the giant gene TTN. We therefore analysed this second cohort with a candidate gene strategy, combining direct Sanger sequencing of these exons with NGS of the rest of the gene.
During my PhD work I identified the molecular defect in 8 of the 45 families included (18%), which led to the identification and characterization of 3 novel medical entities, including two new CMs due to novel defects of TTN. These results served to identify new titin protein interactions and help define TTN defects as a major cause of both cardiac and skeletal muscle conditions. A third new form of CM is due to mutations of a poorly known transcriptional coactivator whose role in striated muscle physiology was unknown and which had never been associated with a human condition. Overall, these results unveiled a novel important protein and pathway in muscle pathophysiology, have direct health benefits (molecular diagnosis) and open the way for therapeutic investigations.
|
359 |
Predicting the Diffusion of Next Generation 9-1-1 in the Commonwealth of Virginia: An Application Using the Deployment of Wireless E9-1-1 Technologies
Spears-Dean, Dorothy 18 April 2011
This study examines the deployment of Wireless E9-1-1 Phase One and Wireless E9-1-1 Phase Two as a diffusion of innovation. The research method is a cross-sectional study employing secondary data in a discriminant function analysis. The study population is Virginia units of local government (95 counties and 39 cities) that had not deployed Wireless E9-1-1 Phase One or Wireless E9-1-1 Phase Two as of January 1, 2001. The period covered is 2001 to 2006. The purpose of the study is to assess the overall accuracy of the three principal theories of policy innovation adoption (diffusion, internal determinants, and unified theory, which are variations of the fundamental diffusion theory) in predicting the deployment of Wireless E9-1-1 by Virginia units of local government. This assessment was conducted by identifying Virginia-specific variables from models associated with these policy innovation theories to determine the best performing model for the deployment of Wireless E9-1-1 throughout the Commonwealth of Virginia. The Virginia-specific variables used in this study are: Wealth, Population, Fiscal Health, Dedicated Funding, Financial Dependency, Urbanization, Regionalism, and Proximity to Interstate. Dedicated Funding and Regionalism had the largest absolute correlations among the predictor variables for the deployment of Wireless E9-1-1 Phase One and Wireless E9-1-1 Phase Two, thus generating the best performing model. This information will provide the basis for developing a statewide comprehensive policy and plan for Next Generation 9-1-1 and will help answer the question of when and how governments get involved in designing and implementing a 9-1-1 emergency service network.
|
360 |
Typage de la classe génotypique du gène PRDM9 à partir de données de séquençage de Nouvelle Génération [Genotype-class typing of the PRDM9 gene from next-generation sequencing data]
Ang Houle, Marie-Armande 07 1900
The positions of recombination events cluster tightly together in recombination hotspots, which are determined in part by the rapidly evolving protein PRDM9 via its trimethyltransferase activity. The locations of hotspots are determined by the repetitive zinc finger (ZnF) array of PRDM9, which binds to DNA. Alleles of PRDM9 containing the k-ZnF have previously been associated with patients affected by childhood acute lymphoblastic leukaemia. PRDM9 alleles are notoriously difficult to type from next-generation sequencing (NGS) data because of the repetitive nature of the ZnF arrays. Here, we propose a method to characterize the alleles of PRDM9 from NGS samples by identifying the number of alleles containing a specific ZnF type. Our method is based on the correlation between two profiles of counts of the nucleotide sequences unique to each ZnF: one computed from the reads of the sample, and one computed from error-free simulated short reads representing an allele pair. We conduct a simulation analysis to examine the validity of the predictions obtained by our method with all pairs of known alleles. We confirm that the method can accurately genotype previously unobserved PRDM9 alleles. We also conducted a preliminary analysis to identify the PRDM9 k-ZnF genotype in a cohort of paediatric glioblastoma (pGBM), a childhood brain cancer characterized by recurrent mutations in the gene encoding histone H3, the target of the enzymatic activity of PRDM9. Although no association with k-ZnF-containing PRDM9 alleles was found in our pGBM cohort, this method opens the possibility of identifying associations between certain PRDM9 alleles and other types of early-onset childhood cancers, through a data-mining effort in public cancer databases.
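The profile-correlation idea in this abstract can be sketched on toy data. This is a simplified illustration and not the author's implementation. It assumes each allele is summarized by a vector of counts of ZnF-specific sequences, and that the expected profile of an allele pair is the sum of the two vectors (the thesis derives these profiles from simulated reads instead). The allele names, count values and function names are invented for the example.

```python
import math
from itertools import combinations_with_replacement


def pearson(x, y):
    """Pearson correlation of two equal-length count profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no information
    return sxy / math.sqrt(sxx * syy)


def best_allele_pair(sample_profile, allele_profiles):
    """Return the allele pair whose expected profile correlates best."""
    best_pair, best_r = None, -2.0
    for a, b in combinations_with_replacement(sorted(allele_profiles), 2):
        expected = [p + q for p, q in zip(allele_profiles[a], allele_profiles[b])]
        r = pearson(sample_profile, expected)
        if r > best_r:
            best_pair, best_r = (a, b), r
    return best_pair, best_r


def alleles_with_znf(pair, allele_profiles, znf_index):
    """Number of alleles in the pair (0, 1 or 2) that carry a given ZnF."""
    return sum(1 for allele in pair if allele_profiles[allele][znf_index] > 0)


# Hypothetical per-allele counts for four ZnF types; index 3 plays the k-ZnF.
TOY_ALLELES = {
    "allele_1": [2, 1, 3, 0],
    "allele_2": [1, 2, 2, 1],
    "allele_3": [3, 0, 1, 2],
}
# Toy sample: allele_1 + allele_2 at tenfold coverage.
TOY_SAMPLE = [30, 30, 50, 10]

pair, r = best_allele_pair(TOY_SAMPLE, TOY_ALLELES)
print(pair, round(r, 3), alleles_with_znf(pair, TOY_ALLELES, 3))
```

Correlation is insensitive to overall sequencing depth, so the sample profile does not need to be normalized before comparison. That is a convenient property of this kind of matching, though the abstract does not say whether it motivated the author's choice.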
|