61

Generation of a human gene index and its application to disease candidacy.

Christoffels, Alan January 2001
With easy access to technology to generate expressed sequence tags (ESTs), several groups have sequenced from thousands to several thousands of ESTs. These ESTs benefit from consolidation and organization to deliver significant biological value. A number of EST projects are underway to extract maximum value from fragmented EST resources by constructing gene indices, where all transcripts are partitioned into index classes such that transcripts are put into the same index class if they represent the same gene. A gene index should therefore ideally represent a non-redundant set of transcripts. Indeed, most gene indices aim to reconstruct the gene complement of a genome, and their technological developments are directed at achieving this goal. The South African National Bioinformatics Institute (SANBI), on the other hand, embarked on the development of the sequence alignment and consensus knowledgebase (STACK) database, which focused on the detection and visualisation of transcript variation in the context of developmental and pathological states, using all publicly available ESTs. Preliminary work on the STACK project partitioned the EST data into arbitrarily chosen tissue categories as a means of reducing the EST sequences to manageable sizes for subsequent processing. The tissue partitioning provided the template material for developing error-checking tools to analyse the information embedded in the error-laden EST sequences. However, tissue partitioning increases redundancy in the sequence data because one gene can be expressed in multiple tissues, with the result that multiple tissue-partitioned transcripts will correspond to the same gene. Therefore, the sequence data represented by each tissue category had to be merged in order to obtain a comprehensive view of expressed transcript variation across all available tissues.
The need to consolidate all EST information provided the impetus for developing a STACK human gene index, also referred to as a whole-body index. In this dissertation, I report on the development of a STACK human gene index represented by consensus transcripts where all constituent ESTs sample single or multiple tissues in order to provide the correct developmental and pathological context for investigating sequence variation. Furthermore, the availability of a human gene index is assessed as a disease-candidate gene discovery resource. A feasible approach to construction of a whole-body index required the ability to process error-prone EST data in excess of one million sequences (1,198,607 ESTs as of December 1998). In the absence of new clustering algorithms at that time, we successfully ported D2_CLUSTER, an EST clustering algorithm, to the high-performance shared multiprocessor machine, Origin2000. Improvements to the parallelised version of D2_CLUSTER included: (i) the ability to cluster sequences on as many as 126 processors (for example, 462,000 ESTs were clustered in 31 hours on 126 R10000 MHz processors, Origin2000); (ii) enhanced memory management that allowed for clustering of mRNA sequences as long as 83,000 base pairs; (iii) the ability to have the input sequence data accessible to all processors, allowing rapid access to the sequences; and (iv) a restart module that allowed a job to be restarted if it was interrupted. These enhancements to the parallelised version of D2_CLUSTER allowed for the processing of EST datasets in excess of 1 million sequences. A hierarchical approach was adopted in which 1,198,607 ESTs from GenBank release 110 (October 1998) were partitioned into "tissue bins", and each tissue bin was processed through a pipeline that included masking for contaminants, clustering, assembly, assembly analysis and consensus generation.
A total of 478,707 consensus transcripts were generated for all the tissue categories, and these sequences served as the input data for the generation of the whole-body index sequences. The clustering of all tissue-derived consensus transcripts was followed by the collapse of each consensus sequence to its individual ESTs prior to assembly and whole-body index consensus sequence generation. The hierarchical approach demonstrated a consolidation of the input EST data from 1,198,607 ESTs to 69,158 multi-sequence clusters and 162,439 singletons (or individual ESTs). Chromosomal locations were added to 25,793 whole-body index sequences through assignment of genetic markers such as radiation hybrid markers and Généthon markers. The whole-body index sequences were made available to the research community through a sequence-based search engine (http://ziggy.sanbi.ac.za/~alan/researchINDEX.html).
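The idea of partitioning transcripts into index classes, with multi-sequence clusters and singletons as the outcome, can be illustrated with a small sketch. This is not D2_CLUSTER and not the STACK pipeline: it assumes a much simpler, hypothetical linkage rule (two sequences are linked when they share a minimum number of k-mers) and takes the transitive closure of those links with a union-find structure. The function names and the `k` and `min_shared` values are illustrative assumptions.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(seqs, k=8, min_shared=3):
    """Partition sequences into classes by transitive closure.

    Two sequences are linked when they share at least `min_shared`
    k-mers; classes are the connected components of the link graph.
    Both thresholds are illustrative, not values from the dissertation.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent pointers to the class representative,
        # shortening the path as we go (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for a, b in combinations(range(len(seqs)), 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for i in range(len(seqs)):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


if __name__ == "__main__":
    # Toy reads: the first two overlap, the third is unrelated.
    reads = [
        "ACGTACGTAGCTAGCTAG",
        "GTACGTAGCTAGCTAGGA",
        "TTTTTTTTCCCCCCCCGG",
    ]
    classes = cluster_by_shared_kmers(reads)
    multi = [c for c in classes if len(c) > 1]
    singletons = [c for c in classes if len(c) == 1]
    print("multi-sequence clusters:", multi)
    print("singletons:", singletons)
```

The all-pairs comparison is quadratic in the number of sequences, which is one reason a dataset of over a million ESTs called for the parallelisation and tissue binning described in the abstract.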
62

Computational discovery of cis-regulatory modules in human genome by genome comparison

Mok, Kwai-lung. January 2008
Thesis (M. Phil.)--University of Hong Kong, 2008. / Includes bibliographical references (leaves 115-130). Also available in print.
63

Termo de consentimento livre e esclarecido (TCLE): fatores que interferem na adesão / Informed Consent (TCLE): compliance in accordance with interference factors

Miriam Karine de Souza 25 November 2005
Research involving human beings raises ethical concerns, since volunteers accept risks and inconveniences in order to contribute to the advancement of scientific knowledge and to benefit others. Patients show their willingness to participate in clinical research when they adhere to the informed consent form (Termo de Consentimento Livre e Esclarecido, TCLE): understanding it, signing it, and committing to comply with all the rules set out in the document, while aware that they may withdraw at any time. The TCLE addresses all aspects of the research process and is therefore important to study participants. Its information must be stated clearly and be easy to understand, highlighting risks, possible benefits and procedures, and it must guarantee voluntary participation and the right to withdraw at any point in the study. It is currently debated whether research subjects may fail to fully understand the text of the TCLE and their rights as participants, even after signing the TCLE and joining the study.
This study analyzes data from 793 patients who were invited to participate in different clinical research protocols, as follows: 380 patients invited to join the control group of the Clinical Cancer Genome project (Genoma Clínico do Câncer); 365 patients invited to join the Clinical Cancer Genome project for the digestive system because they had a tumor at one of the following sites: colorectal cancer, esophageal cancer, cancer of the cardia or gastric cancer; and 48 patients invited to join an international, multicenter, randomized, parallel-group, placebo-controlled, double-blind study, with undisclosed sponsor, to determine the effect of 156 weeks of treatment with MK-966 (an anti-inflammatory COX-2 inhibitor) on the recurrence of adenomatous polyps of the large bowel in patients with a history of colorectal adenoma resected by colonoscopy. Data were collected from the research records to assess research subjects' adherence to the protocol, correlating it with demographic factors (race, sex and age), social factors (place of birth, current residence and treating institution), the risk/benefit ratio involved, and level of schooling. The difficulty of the TCLE texts was assessed with the Flesch Reading Ease and Flesch-Kincaid readability indices. A questionnaire was given to the interviewers to evaluate, a posteriori, each research subject's attitude toward the TCLE at the moment of signing or declining it.
Adherence to the proposed protocols was not influenced by demographic and social factors; however, adherence was higher among patients from the public treatment institution (99.7%) than among those from the private institution (93.7%). Adherence was also higher among patients in lower-risk protocols (99.73%) than among those in higher-risk protocols (81.3%). Although level of schooling did not influence adherence, it was 8 years or less for 462 patients (58.26%), of whom 444 (96.1%) were from the public institution. The readability indices ranged from 9.9 to 12 on the Flesch-Kincaid test and from 33.1 to 51.3 on the Flesch Reading Ease test. These results classified all the evaluated texts as difficult to understand, requiring a higher level of schooling to be understood. The interviewers estimated, through the questionnaire, that 90% of patients at the public hospital preferred hearing an explanation of the TCLE to reading the text, whereas at the private institution the estimate was 40%. Only eleven research subjects did not adhere to the TCLE. Adherence was not influenced by demographic and social factors, but the risk inherent in the protocols did influence it. The evaluated texts were not written in easily understood language, requiring more than 9 years of schooling to be understood. This study suggests that, despite the high rate of adherence, new methods of administering the TCLE need to be evaluated so that less educated research subjects can properly understand the full content of the text.
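For readers unfamiliar with the two readability measures named in the abstract, the following minimal sketch computes them from raw counts. It uses the standard English-language coefficients; the abstract does not say whether the study adapted the formulas for Portuguese text, so that is left open here. The function names and the example counts are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate years of schooling needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short consent-form passage.
    counts = dict(words=100, sentences=5, syllables=150)
    print("Reading Ease:", round(flesch_reading_ease(**counts), 2))
    print("Grade level:", round(flesch_kincaid_grade(**counts), 2))
```

Both formulas depend only on average sentence length and average syllables per word, so long sentences and polysyllabic terminology, both common in consent forms, push a text toward the difficult end of each scale.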
64

Le programme spatio-temporel de réplication de l'ADN et son impact sur l'asymétrie de composition : d'une modélisation théorique à l'analyse de données génomiques et épigénétiques / Linking the DNA strand asymmetry to the spatio-temporal replication program : from theory to the analysis of genomic and epigenetic data

Baker, Antoine 08 December 2011
Two key cellular processes, transcription and replication, require the opening of the DNA double helix and act differently on the two DNA strands, generating different mutational patterns (mutational asymmetry) that may result, after long evolutionary time, in different nucleotide compositions on the two DNA strands (compositional asymmetry). Here, we propose to model the spatio-temporal program of DNA replication and its impact on DNA sequence evolution. The mutational and compositional asymmetries observed in the human genome are shown to decompose into transcription-associated and replication-associated components. The replication-associated asymmetry is proportional to the replication fork polarity, which is itself shown to be proportional to the derivative of the mean replication timing. The large-scale variation of the replication fork polarity delineates megabase-scale replication domains along human chromosomes, in which the replication timing is U-shaped. Such replication domains are also observed in the germline, where they are revealed by an N-shaped compositional asymmetry, which indicates the conservation of this replication program over several hundred million years. The borders of these replication domains are enriched in open chromatin markers and correspond to euchromatic regions permissive to transcription and replication initiation. The analysis of long-range chromatin interaction data suggests that these replication domains correspond to self-interacting chromatin structural units, at the heart of a highly parallelized organization of the replication program in the human genome.
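The stated link between fork polarity and the derivative of the mean replication timing can be made concrete with a toy calculation. The sketch below assumes an idealised parabolic U-shaped timing profile in arbitrary units and sets the proportionality constant to 1; the sign convention, the units and the helper names are all assumptions for illustration, not values or code from the thesis.

```python
def derivative(values, dx):
    """Finite-difference derivative: central inside, one-sided at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dx))
    return out


def toy_u_timing(n_bins):
    """Idealised U-shaped timing profile across one domain (arbitrary units)."""
    return [(i / (n_bins - 1) - 0.5) ** 2 for i in range(n_bins)]


if __name__ == "__main__":
    n_bins = 11
    dx = 1.0 / (n_bins - 1)      # domain length normalised to 1
    timing = toy_u_timing(n_bins)
    # Polarity taken as proportional to d(timing)/dx, constant set to 1.
    polarity = derivative(timing, dx)
    for x, p in zip(range(n_bins), polarity):
        print(f"bin {x:2d}  polarity {p:+.2f}")
```

Under these assumptions the polarity changes linearly across the domain, with opposite signs at the two borders and zero at the centre. Repeating that linear ramp over consecutive domains gives a sawtooth pattern, which is one way to picture how a U-shaped timing profile and an N-shaped asymmetry profile can describe the same domain.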
65

Etude quantitative des variations structurelles des chromosomes chez Saccharomyces cerevisiae / Quantitative study of structural variations of chromosomes in Saccharomyces cerevisiae

Gillet-Markowska, Alexandre 21 September 2015
The accumulation of chromosomal rearrangements, also called structural variations (SV), is a major contributor to the malignant transformation of cells and to the emergence of intratumoral heterogeneity. We have developed a bioinformatics tool that provides a fine-grained picture of the SV that occur in the human genome. With it, we have demonstrated the existence of SV present at low frequencies in different supposedly clonal cell populations, showing that the rates of SV formation could be greatly underestimated. In parallel, we have shown that the level of genome instability in individuals depends on genetic predisposition factors. To identify these factors, we have developed genetic assays to measure the rate of SV in yeast, which will allow us to identify the genes controlling chromosomal instability using large-scale linkage analysis. These regulators represent new candidate genes involved in the development of cancer in humans, as the genetic determinants involved in DNA metabolism are highly conserved between yeast and mammals.
66

Generation of a human gene index and its application to disease candidacy

Christoffels, Alan January 2001
Philosophiae Doctor - PhD / With easy access to technology to generate expressed sequence tags (ESTs), several groups have sequenced from thousands to several thousands of ESTs. These ESTs benefit from consolidation and organization to deliver significant biological value. A number of EST projects are underway to extract maximum value from fragmented EST resources by constructing gene indices, where all transcripts are partitioned into index classes such that transcripts are put into the same index class if they represent the same gene. Therefore a gene index should ideally represent a non-redundant set of transcripts. Indeed, most gene indices aim to reconstruct the gene complement of a genome and their technological developments are directed at achieving this goal. The South African National Bioinformatics Institute (SANBI), on the other hand, embarked on the development of the sequence alignment and consensus knowledgebase (STACK) database that focused on the detection and visualisation of transcript variation in the context of developmental and pathological states, using all publicly available ESTs. Preliminary work on the STACK project employed an approach of partitioning the EST data into arbitrarily chosen tissue categories as a means of reducing the EST sequences to manageable sizes for subsequent processing. The tissue partitioning provided the template material for developing error-checking tools to analyse the information embedded in the error-laden EST sequences. However, tissue partitioning increases redundancy in the sequence data because one gene can be expressed in multiple tissues, with the result that multiple tissue-partitioned transcripts will correspond to the same gene. Therefore, the sequence data represented by each tissue category had to be merged in order to obtain a comprehensive view of expressed transcript variation across all available tissues.
The need to consolidate all EST information provided the impetus for developing a STACK human gene index, also referred to as a whole-body index. In this dissertation, I report on the development of a STACK human gene index represented by consensus transcripts where all constituent ESTs sample single or multiple tissues in order to provide the correct developmental and pathological context for investigating sequence variation. Furthermore, the availability of a human gene index is assessed as a disease-candidate gene discovery resource. A feasible approach to construction of a whole-body index required the ability to process error-prone EST data in excess of one million sequences (1,198,607 ESTs as of December 1998). In the absence of new clustering algorithms at that time, we successfully ported D2_CLUSTER, an EST clustering algorithm, to the high-performance shared multiprocessor machine, Origin2000. Improvements to the parallelised version of D2_CLUSTER included: (i) the ability to cluster sequences on as many as 126 processors (for example, 462,000 ESTs were clustered in 31 hours on 126 R10000 MHz processors, Origin2000); (ii) enhanced memory management that allowed for clustering of mRNA sequences as long as 83,000 base pairs; (iii) the ability to have the input sequence data accessible to all processors, allowing rapid access to the sequences; and (iv) a restart module that allowed a job to be restarted if it was interrupted. These enhancements to the parallelised version of D2_CLUSTER allowed for the processing of EST datasets in excess of 1 million sequences. A hierarchical approach was adopted in which 1,198,607 ESTs from GenBank release 110 (October 1998) were partitioned into "tissue bins", and each tissue bin was processed through a pipeline that included masking for contaminants, clustering, assembly, assembly analysis and consensus generation.
A total of 478,707 consensus transcripts were generated for all the tissue categories, and these sequences served as the input data for the generation of the whole-body index sequences. The clustering of all tissue-derived consensus transcripts was followed by the collapse of each consensus sequence to its individual ESTs prior to assembly and whole-body index consensus sequence generation. The hierarchical approach demonstrated a consolidation of the input EST data from 1,198,607 ESTs to 69,158 multi-sequence clusters and 162,439 singletons (or individual ESTs). Chromosomal locations were added to 25,793 whole-body index sequences through assignment of genetic markers such as radiation hybrid markers and Généthon markers. The whole-body index sequences were made available to the research community through a sequence-based search engine (http://ziggy.sanbi.ac.za/~alan/researchINDEX.html). / South Africa
67

Integration of Functional Genomic Data in Genetic Analysis

Chen, Siying January 2021
Identifying disease risk genes is a central topic of human genetics. Cost-effective exome and whole-genome sequencing has enabled large-scale discovery of genetic variation. However, the statistical power to find new risk genes through rare genetic variation is fundamentally limited by sample size. As a result, we have an incomplete understanding of the genetic architecture and molecular etiology of most human conditions and diseases. In this thesis, I developed new computational methods that integrate functional genomics data sets, such as epigenomic profiles and single-cell transcriptomics, to improve power for identifying genetic risks and to gain more insight into the etiology of developmental disorders. The overall hypothesis is that disease risk genes contributing to developmental disorders are bottleneck genes under normal development and are subject to precise transcriptional regulation that maintains spatiotemporally specific expression during development. In this thesis I describe two major research projects. The first, Episcore, predicts haploinsufficient genes from a large set of integrated epigenomic profiles from multiple tissues and cell lines using supervised machine learning methods. The second, A-risk, predicts the plausibility of genes being risk genes for autism spectrum disorder based on single-cell RNA-seq data collected in human fetal midbrain and prefrontal cortex. Both methods were shown to improve gene discovery in the analysis of de novo mutations in developmental disorders. Overall, my thesis represents an effort to integrate functional genomics data by machine learning to facilitate both discovery and interpretation in genetic studies of human diseases. We believe that such integrative analysis can help us better understand genetic variants and disease etiology.
68

Genetic regulatory variant effects across tissues and individuals

Flynn, Elise Duboscq January 2021
Gene expression is regulated by local genetic sequence, and researchers have identified thousands of common genetic variants in the human population that associate with altered gene expression. These expression quantitative trait loci (eQTLs) often co-localize with genome-wide association study (GWAS) loci, suggesting that they may hold the key to understanding genetic effects on human phenotype and disease. eQTLs are enriched in cis-regulatory elements, suggesting that many affect gene expression via non-coding mechanisms. However, many of the discovered loci lie in noncoding regions of the genome that we understand poorly, and determining their mechanisms of action remains a challenge. To complicate matters further, genetic variants may have varied effects in different tissues or under different environmental conditions. The research presented here uses statistical methods to investigate genetic variants' mechanisms of action and context specificity. In Chapter 1, we introduce eQTLs and discuss challenges associated with their discovery and analysis. In Chapter 2, we investigate cross-tissue eQTL and gene expression patterns, including for GWAS genes. We find that eQTL effects show increasing, decreasing, and non-monotonic relationships with gene expression levels across tissues, and we observe higher eQTL effects and eGene expression for GWAS genes in disease-relevant tissues. In Chapter 3, we use the natural variation of transcription factor activity among tissues and between individuals to elucidate mechanisms of action of eQTL regulatory variants and understand context specificity of eQTL effects. We discover thousands of potential transcription factor mechanisms of eQTL effects, and we investigate the transcription factors' roles with orthogonal datasets and experimental approaches.
Finally, in Chapter 4, we focus on a locus implicated in coronary artery disease risk and unravel the likely causal variants and functional mechanisms of the locus’s effects on gene expression and disease. We confirm the locus’s colocalization with an eQTL for the LIPA gene, and using statistical, functional, and experimental approaches, we highlight two potential causal variants in partial linkage disequilibrium. Taken together, this work develops a framework for understanding eQTL context variability and highlights the complex genetic and environmental contributions to gene regulation. It provides a deeper understanding of gene regulation and of genetic and environmental contributions to complex traits and disease, enabling future research surrounding the context variability of genetic effects on gene expression and disease.
69

Characterization of an Evolutionarily Old Human Alphoid DNA

Carnahan, Susan L., Palamidis-Bourtsos, Eleni, Musich, Phillip R., Doering, Jeffrey L. 30 January 1993
A recently isolated human alphoid DNA (in plasmid pHH550) has been sequenced and found to have an exceptionally high degree of similarity to the human alphoid consensus sequence, while its component monomers are unusually heterogeneous in sequence. In contrast to other alphoid DNAs, this DNA is found in all primates tested. Thus this may be an evolutionarily old sequence similar to the one from which other human alphoid DNAs diverged. The pHH550 sequences are found on a number of human chromosomes, including 21 and 22. On chromosome 21 most members of this new sequence group are located distal to other alphoid DNAs.
70

Genetic Variant Effects on Transcription Factor Regulation

Li, Xiaoting January 2023
Assessing the functional impact of genetic variants across the human genome is essential for understanding the molecular mechanisms underlying complex traits and disease risk. Genetic variation that causes changes in gene expression can be analyzed through parallel genotyping and functional genomics assays across sets of individuals. In particular, regulatory variants may impact transcription factor regulation. In this thesis, to map variants that impact the expression of many genes simultaneously through a shared transcription factor (TF), we use an approach in which the protein-level regulatory activity of the TF is inferred from genome-wide expression data and then genetically mapped as a quantitative trait. In Chapter 2, we developed a generalized linear model (GLM) to estimate TF activity levels in an individual-specific manner, and used it to analyze RNA-seq profiles from the Genotype-Tissue Expression (GTEx) project. A key feature is that we fit a beta-binomial GLM at the level of pairs of neighboring genes in order to control for variation in local chromatin structure along the genome and other confounding effects. As a predictor in our model, we use differential gene expression signatures from TF perturbation experiments. After estimating genotype-specific activities for 55 TFs across 49 tissues, in Chapter 3, we performed genome-wide association analysis on the virtual TF activity trait. This revealed hundreds of TF activity quantitative trait loci, or aQTLs, highlighting the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omic approach. Lastly, in Chapter 4, we studied the direct impact of genetic variants on TF binding by predicting genetic effects on TF binding affinity. 
Specifically, we predicted binding affinity on allele-specific binding data using TF binding models derived with ProBound, recently developed by our laboratory, and constructed a likelihood model to assess performance across different binding models. The results indicate that ProBound is a promising tool for predicting genetic effects on in vivo TF binding.
