91 |
Comparison of Middle Eastern Bedouin genotypes with previously studied populations using polymorphic Alu insertions
Pitt, Alison Patricia January 2009
[Truncated abstract] Polymorphic Alu insertions (POALINs) are known to contribute to the variation and genetic diversity of the human genome. In this report, specific POALINs of the Major Histocompatibility Complex (MHC) were studied. Previous population studies of the MHC POALINs have focused on individuals of African, European and Asian descent. In this study, we expand the research to a new and previously uncharacterised population, the Bedouin of the Middle East. Specifically, we report the individual insertion frequencies of four POALINs within the MHC class I region in this population. POALINs are members of a young Alu subfamily that have only recently been inserted into the human genome. POALINs are either present or absent at particular sites. Individuals that share the inserted (or deleted) polymorphism inherited the insertion (or deletion) from a common ancestor, making Alu alleles identical by descent. In population genetics, populations can then be compared by the lengths of the PCR products obtained from a series of unrelated individuals, which may also detect polymorphisms in the presence or absence of the Alu repeats. As a direct result of their abundance and sequence identity, Alu elements promote genetic recombination events that are responsible for large-scale deletions, duplications and translocations. The deletions occur mostly in A-T rich regions and have been found to be unlikely to have been created independently of the insertions of the Alu elements (Callinan et al, 2005). The easy genotyping of the POALINs has proven very valuable, making them useful lineage markers for the study of human population genetics, pedigrees and forensics as well as genomic diversity and evolution. POALINs have been used in a range of applications, primarily focusing on anthropological analysis of human populations.
Because POALINs are easy to use and useful as markers in human evolution studies, combining them with other markers used in forensics could lead to improved identity testing in forensic science. More specifically, in combination with more traditional markers, race-specific genotypes and haplotypes could be used for profiling crime scene samples. ... This is supported by previously reported molecular data using various types of genetic markers. In a study using six separate Alu genes, Antunez-de-Mayolo et al were able to generate a phylogenetic tree in which the biogeographical groups followed a pattern. The biogeographical groups started with African populations that were found to relate closely to the hypothetical ancestral African population. The African populations were then followed, in order, by Southwest Asian populations and European populations, which include Middle Eastern groups (Antunez-de-Mayolo et al, 2002). This study shows the similarities and differences between the frequencies of the Middle Eastern Bedouin and those of the other compared populations. Though no clear results were determined, the information from the POALINs, along with information from other genetic markers, can lead to further research on the Bedouin population and to the improvement of the forensic population database in order to accurately test the individual ethnic background of samples to be analysed.
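The per-locus insertion frequencies mentioned in this abstract are conventionally estimated by allele counting from presence/absence genotypes. The following is a minimal sketch of that calculation, not the author's procedure: the function names, the locus labels and the genotype counts are all hypothetical and chosen only for illustration.

```python
def insertion_frequency(n_ins_ins, n_ins_abs, n_abs_abs):
    """Insertion-allele frequency from diploid genotype counts.

    n_ins_ins: individuals homozygous for the Alu insertion
    n_ins_abs: heterozygous individuals
    n_abs_abs: individuals homozygous for the absence of the insertion
    """
    n_individuals = n_ins_ins + n_ins_abs + n_abs_abs
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each homozygote carries two insertion alleles, each heterozygote one.
    return (2 * n_ins_ins + n_ins_abs) / (2 * n_individuals)


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2 * p * (1 - p)


# Hypothetical genotype counts (not data from the thesis).
counts = {"locus_1": (10, 40, 50), "locus_2": (0, 20, 80)}
freqs = {locus: insertion_frequency(*c) for locus, c in counts.items()}

for locus, p in freqs.items():
    print(locus, round(p, 3), round(expected_heterozygosity(p), 3))
```

With these made-up counts, locus_1 has an insertion frequency of 0.30 and locus_2 of 0.10; frequencies of this kind are what would be compared across populations.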
|
92 |
Localização de regiões potenciais para integração do kDNA de Trypanosoma cruzi no genoma humano / Localization of potential regions for integration of Trypanosoma cruzi kDNA in the human genome
Santana, Jhonne Pedro Pedott 23 March 2016
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / The idea of horizontal gene transfer was proposed even before the molecular structure of DNA was determined. As a model, we used the studies indicating that Chagas disease is produced by the introgression of Trypanosoma cruzi kDNA into the host genome through horizontal genetic inheritance. It has been shown experimentally that micro-homologies rich in adenine and cytosine mediate the integration of T. cruzi kDNA minicircles into the vertebrate genome. Since the sequencing of the human genome, the characterization of the genomes of different organisms has been one of the main driving forces of science, providing modern biomedical research with a quantity of biological data unprecedented in the history of science and opening a new window of analytical opportunities, including targeted searches within the mass of data published in biological databases. However, even though traditional DNA mapping algorithms are highly accurate, they operate at a much lower rate than that at which next-generation sequencers accumulate new data. This asymmetry between data generation and analysis capability requires the rapid evolution of mapping and read algorithms so that this large volume of information can be explored through targeted searches. With this in focus, we sought to structure an automated search of the human genome from which the most likely integration sites of exogenous DNA could be inferred. This work therefore proposes an efficient, fast and easy way to search for and locate multiple signatures of the signals that enable exogenous kDNA integration in the human genome, by creating a set of scripts for in silico analysis adapted to large sequence files. Three scripts were developed in the R language: one to permute the elements (nucleic acid or amino acid codes); one to search for, group and plot matches in the genome; and one to count total matches and matches per chromosomal window. All signatures composed of adenine and cytosine (CA motifs) were properly identified in the human genome, but no site more susceptible to T. cruzi kDNA integration was identified. From the data obtained, a genetic map was created listing all matches in each cytogenetic band, but it was not possible to identify which chromosome is more prone to mutations, since the larger the chromosome, the greater the number of matches.
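The three analysis steps named in this abstract (enumerating adenine/cytosine signatures, locating their matches in a sequence, and counting matches per window) can be illustrated with a short sketch. This is an assumption-laden illustration in Python, not the thesis's R scripts: the function names, the signature length of 3, the window size and the toy sequence are hypothetical, and the enumeration assumes that "permuting the elements" means listing every arrangement of A and C of a fixed length.

```python
from collections import Counter
from itertools import product


def ac_signatures(length):
    """All strings of the given length over the alphabet {A, C}."""
    return ["".join(p) for p in product("AC", repeat=length)]


def find_matches(sequence, signatures):
    """Sorted (position, signature) pairs; 0-based, overlaps allowed."""
    sequence = sequence.upper()
    hits = []
    for sig in signatures:
        start = sequence.find(sig)
        while start != -1:
            hits.append((start, sig))
            # Advance by one base so overlapping matches are also found.
            start = sequence.find(sig, start + 1)
    return sorted(hits)


def matches_per_window(hits, seq_length, window):
    """Number of match start positions falling in each fixed-size window."""
    counts = Counter(pos // window for pos, _ in hits)
    n_windows = (seq_length + window - 1) // window
    return [counts.get(i, 0) for i in range(n_windows)]


def matches_per_kb(hits, seq_length):
    """Match density per 1,000 bases, comparable across sequence lengths."""
    return 1000 * len(hits) / seq_length


# Toy sequence standing in for a chromosome (hypothetical data).
toy = "ACACGTTTCAACGG"
hits = find_matches(toy, ac_signatures(3))
print(hits)
print(matches_per_window(hits, len(toy), window=7))
print(round(matches_per_kb(hits, len(toy)), 1))
```

Raw match counts grow with sequence length, which is the limitation the abstract reports for whole chromosomes. A length-normalized density such as `matches_per_kb` is one possible way to compare sequences of different sizes; whether the thesis applied any such normalization is not stated.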
|
93 |
Proteção do genoma humano e socioambientalismo : aspectos bioéticos e jurídicos
Liedke, Mônica Souza 14 December 2009
The socio-environmental paradigm results from the understanding that isolated protection is not possible and that care must be joint. The human being, as part of biodiversity, is protected by the legislation specific to humans as well as by environmental legislation. The human genome is particular to each individual and distinguishes that individual from other members of the same species. The development of the Human Genome Project made it possible to access and use genetic information. Scientific progress must be regulated to prevent the misuse of genetic information and to ensure that the benefits of that progress are accessible to the whole population. Human beings must not suffer discrimination because of their genetic makeup. Access to the information contained in genes should occur solely to improve the health of individuals. Informed consent is essential for access to and use of genetic information. Bioinformatics makes it possible to discover the function of each specific gene. Pharmacogenomics, in turn, provides treatment and cure of diseases according to each individual's genetic makeup. Biorepositories and biobanks are important for preserving genetic material intended for research as well as for future use in the medical treatment of the donor. Genetic research must be conducted in a transparent and regulated manner in order to avoid the concentration of biopower. Access to the human genome could permit its manipulation for bioterrorist purposes, targeting the general population or a specific group. Some countries are already patenting genes, even though genes are considered discoveries rather than inventions. Not permitting gene patenting leaves Brazil at a disadvantage relative to the countries that do permit it, because in the future Brazil may have to pay royalties for the use of already patented genes in genetic research. All these situations demonstrate the importance of protecting the human genome so that present and future generations do not have their genetic makeup altered. The creation of national and, above all, international legislation is indispensable.
|
95 |
Geração de "Etiquetas de sequências expressas" dirigidas para porções codificadoras dos genes (Orestes): identificação de novos genes humanos expressos em câncer de mama / Generation of "expressed sequence tags" directed to coding portions of genes (ORESTES): identification of new human genes expressed in breast cancer
Ricardo Garcia Corrêa 16 February 2001
Expressed sequence tags (ESTs) are of fundamental importance for identifying genes within the human genome and for defining gene expression characteristics. In this work, we describe a new approach for generating cDNA libraries using essentially arbitrary primers to construct PCR-based mini-libraries from breast tumor mRNA. Clones from these libraries were sequenced to generate 6,029 ESTs. Using this approach, we observed a significant normalization of the different mRNA subpopulations and a preferential amplification of the central portions of the genes. Bioinformatic analysis of these sequences shows that 3,350 ESTs (56%) have significant similarity to known DNA and/or cDNA sequences (annotated sequences) from different organisms, and 1,509 ESTs (25%) show no similarity to any sequence in the databases searched. Among the annotated sequences, we identified some with high similarity to known genes from different organisms, indicating the discovery of homologous genes possibly involved in carcinogenic processes. For instance, we isolated and partially characterized (i) a new isoform of NABC1 (novel amplified sequence in breast carcinoma 1), which is downregulated in colorectal tumors; (ii) a novel member of the semaphorin family of axon guidance molecules that is downregulated in glioblastoma cell lines treated with all-trans-retinoic acid, an anti-tumor agent; and (iii) the human ortholog of the Notch 2 gene, apparently overexpressed in breast tumors with higher malignancy.
|
96 |
Plasticité du programme spatio-temporel de réplication au cours du développement et de la différenciation cellulaire / Plasticity of human replication program during differentiation in relation with change in gene expression and chromatin reorganization
Julienne, Hanna 11 December 2013
The sequencing of the human genome, twelve years ago, revealed the complexity of the mechanisms underlying nuclear processes such as transcription, replication and chromatin organization. Since then, to better understand these processes, an ever-growing body of data on the cell nucleus has been produced and made available online by numerous laboratories around the world. These datasets are at once extraordinarily rich and daunting to handle. In this thesis, we take advantage of these datasets to better understand the nuclear determinants of the spatio-temporal replication program. We analyze no fewer than a hundred ChIP-seq epigenetic profiles along human chromosomes in several cell lines to characterize the primary structure of chromatin. Using tools from multivariate statistics, we demonstrate that the immense potential complexity of these datasets can be reduced to four prevalent chromatin states in all somatic cell lines studied. This simple, robust and yet comprehensive classification is an excellent starting point for the study of replication. The four prevalent chromatin states are replicated at distinct moments of S phase (they have different replication timing) and have drastically different gene contents. Their spatial distribution along the genome is structured, particularly in domains where the replication timing profile is U-shaped, the signature of a gradient of replication fork polarity. These megabase-sized U-domains cover 50% of the human genome, and the four prevalent chromatin states succeed one another from their borders to their centers. The same statistical techniques applied to an embryonic stem cell (ESC) line also reduce the epigenetic complexity to four prevalent chromatin states, which are qualitatively different from those in somatic cells. The four-state classification thus proves very useful for comparing the epigenetics of a somatic cell line with that of an embryonic one. We further show that the specificities of the embryonic replication program are linked to the specificities of chromatin organization in this cell line. Importantly, our study reveals that the histone variant H2AZ plays a major role in pluripotency.
|
97 |
On Identifying Signatures of Positive Selection in Human Populations: A Dissertation
Crisci, Jessica L. 25 June 2013
As sequencing technology continues to produce better quality genomes at decreasing costs, there has been a recent surge in the variety of data that we are now able to analyze. This is particularly true with regard to our understanding of the human genome, where the last decade has seen data advances in primate epigenomics, ancient hominid genomics, and a proliferation of human polymorphism data from multiple populations. To utilize such data, however, it has become critical to develop increasingly sophisticated tools spanning both bioinformatics and statistical inference. In population genetics particularly, new statistical approaches for analyzing population data are constantly being developed, unfortunately often without proper model testing and evaluation of type-I and type-II error. Because the common Wright-Fisher assumptions underlying such models are generally violated in natural populations, this statistical testing is critical. Thus, my dissertation has two distinct but related themes: 1) evaluating methods of statistical inference in population genetics, and 2) utilizing these methods to analyze the evolutionary history of humans and our closest relatives. The resulting collection of work has provided not only important biological insights, including some of the first strong evidence of selection on human-specific epigenetic modifications (Shulha, Crisci, Reshetov, Tushir et al. 2012, PLoS Bio) and a characterization of human-specific genetic changes distinguishing modern humans from Neanderthals (Crisci et al. 2011, GBE), but also important insights into the performance of population genetic methodologies, which will motivate the future development of improved approaches for statistical inference (Crisci et al, in review).
|
98 |
Identification and characterization of small-molecule inhibitors of aldehyde dehydrogenase 1A1
Morgan, Cynthia A. 01 1900
Indiana University-Purdue University Indianapolis (IUPUI) / The human genome encodes 19 members of the aldehyde dehydrogenase (ALDH) superfamily, critical enzymes involved in the metabolism of aldehyde substrates. A major function of the ALDH1A subfamily is the oxidation of retinaldehyde to retinoic acid, a key regulator of numerous cell growth and differentiation pathways. ALDH1A1 has been identified as a biomarker for both normal stem cells and cancer stem cells. Small molecule probes are needed to better understand the role of this enzyme in both normal and disease states. However, there are no commercially available, small molecules that selectively inhibit ALDH1A1. Our goal is to identify and characterize small molecule inhibitors of ALDH1A1 as chemical tools and as potential therapeutics. To better understand the basis for selective inhibition of ALDH1A1, we characterized N,N-diethylaminobenzaldehyde (DEAB), which is a commonly used inhibitor of ALDH1A1 and purported to be selective. DEAB serves as the negative control for the Aldefluor assay widely utilized to identify stem cells. Rather than being a selective inhibitor for ALDH1A1, we found that DEAB is a slow substrate for multiple ALDH isoenzymes, and depending on the rate of turnover, DEAB behaves as either a traditional substrate or as an inhibitor. Due to its very slow turnover, DEAB is a potent inhibitor of ALDH1A1 with respect to propionaldehyde oxidation, but it is not a good candidate for the development of selective ALDH1A1 inhibitors because of its promiscuity. Next, to discover novel selective inhibitors, we used an in vitro, high-throughput screen of 64,000 compounds to identify 256 hits that either activate or inhibit ALDH1A1 activity. We have characterized two structural classes of compounds, CM026 and CM037, using enzyme kinetics and X-ray crystallographic structural data. Both classes contained potent and selective inhibitors for ALDH1A1. 
Structural studies of ALDH1A1 with CM026 showed that CM026 binds at the active site, and its selectivity is achieved by a single residue substitution. Importantly, CM037 selectively inhibits proliferation of ALDH+ ovarian cancer cells. The discovery of these two selective classes of ALDH1A1 inhibitors may be useful in delineating the role of ALDH1A1 in biological processes and may seed the development of new chemotherapeutic agents.
|
99 |
Understanding the Code of Life: Holistic Conceptual Modeling of the Genome
García Simón, Alberto 23 January 2023
Over the last few decades, advances in sequencing technology have produced significant amounts of genomic data, which has revolutionised our understanding of biology. However, the amount of data generated has far exceeded our ability to interpret it.
Deciphering the code of life is a grand challenge. Despite our progress, our understanding of it remains minimal, and we are just beginning to uncover its full potential, for instance, in areas such as precision medicine or pharmacogenomics.
The main objective of this thesis is to advance our understanding of life by proposing a holistic, model-based approach consisting of three artifacts: i) a conceptual schema of the genome, ii) a method for its application in the real world, and iii) the use of foundational ontologies to represent domain knowledge in a more precise and explicit way. The first two contributions have been validated by implementing genome information systems based on conceptual models. The third contribution has been validated by empirical experiments assessing whether using foundational ontologies leads to a better understanding of the genomic domain.
The artifacts generated offer significant benefits. First, more efficient data management processes were produced, leading to better knowledge extraction processes. Second, a better understanding and communication of the domain was achieved. / The fruitful discussions and the results derived from the projects INNEST2021/57, MICIN/AEI/10.13039/501100011033, PID2021-123824OB-I00, CIPROM/2021/023 and PDC2021-121243-I00 contributed greatly to the final quality of this thesis. / García Simón, A. (2022). Understanding the Code of Life: Holistic Conceptual Modeling of the Genome [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/191432
|
100 |
Computational approaches to understand mechanisms of human genetic disorders
Zhong, Guojie January 2024
Human genetics is one of the strongest risk factors for complex diseases. Understanding the effects of genetic variations not only serves as a fundamental approach to studying disease mechanisms but also offers unprecedented opportunities for improved clinical screening, disease diagnosis and therapeutic discoveries. Despite decades of extensive DNA sequencing and genetic research involving large cohorts, two major challenges remain. First, the majority of disease risk genes remain unidentified due to limited statistical power. Second, the functional effects of rare variants, especially missense variants, in disease risk genes are understudied. In this thesis, I describe new computational approaches to address these challenges using statistical genetics and machine learning methods that incorporate intuition about biological mechanisms. First, I worked on a statistical framework that can identify disease-related pathways from de novo coding variant data. I applied this framework to study the genetics of esophageal atresia / tracheoesophageal fistula (EA/TEF) and identified several potentially disease-causing pathways involved in endosome trafficking.
Next, I developed a new method for identifying disease risk genes by integrating genetic (rare de novo variants) and functional genomics data. Identifying risk genes using rare variants typically has low statistical power due to the rarity of genotype data. Using functional genomics data has the potential to address this challenge, as it can serve as an informative prior on disease risk. Therefore, I developed a statistical method called VBASS. VBASS is a semi-supervised algorithm that uses a neural network to encode biological priors, such as cell type-specific expression values, into a rigorous Bayesian statistical model to increase statistical power. On simulated data, VBASS demonstrated proper error rate control and better power than current state-of-the-art methods. We applied VBASS to congenital heart disease (CHD) and autism spectrum disorder (ASD), identifying several novel disease risk genes along with their associated cell types.
Finally, I focused on predicting the functional mechanisms of missense variants that cause diseases. Pathogenic missense variants may act through different modes of action (e.g., gain-of-function or loss-of-function) by affecting various aspects of protein function. These variants may result in distinct clinical conditions requiring different treatments, yet current computational tools cannot distinguish between them because their predictions rely heavily on evolutionary conservation data. The recent breakthrough in AI-powered protein structure prediction tools provides an opportunity to address this challenge because the functional mechanism of a variant is intrinsically embedded in its structural properties. Therefore, I developed a deep learning method called PreMode. PreMode is a pretrained SE(3)-equivariant graph neural network model designed to capture the effects of missense variants from their structural contexts and evolutionary information. I pretrained PreMode using labeled pathogenicity data to enable the model to learn a general representation of variant effects, followed by protein-specific transfer learning to predict mode-of-action effects. I applied PreMode to the mode-of-action predictions of 17 genes and demonstrated that PreMode achieved state-of-the-art performance compared to existing models. PreMode has various applications, including identifying novel gain/loss-of-function variants, improving the study design of deep mutational scans, and optimization in protein engineering.
|