About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Data mining methods for single nucleotide polymorphisms analysis in computational biology

Liu, Yang 01 January 2011
No description available.
42

Multi-scale analysis of chromosome and nuclear architecture

Olivares Chauvet, Pedro January 2013
Mammalian nuclear function depends on the complex interaction of genetic and epigenetic elements coordinated in space and time. Structure and function overlap to such a degree that they are usually considered inextricably linked. In this work I combine an experimental approach with a computational one to answer two main questions in the field of mammalian chromosome organization. In the first section of this thesis, I attempted to answer the question: to what extent does chromatin from different chromosome territories share the same space inside the nucleus? This question remains relatively open in the field of chromosome territories. It is well known and accepted that interphase chromosomes are spatially constrained inside the nucleus and that each occupies its own territory; however, the degree of spatial interaction between neighbouring chromosomes is still under debate. Using labelling methods that directly incorporate halogenated DNA precursors into newly replicated DNA without the need for immunodetection or in situ hybridization, we show that neighbouring chromosome territories colocalise at very low levels. We also found that the native structure of DNA foci is partly responsible for constraining the interaction of chromosome territories, as disrupting the innate architecture of DNA foci by treatment with trichostatin A (TSA) increased the colocalisation signal between adjacent chromosome territories. The second major question I attempted to answer concerned the correlation between nuclear function and the banding pattern observed in human mitotic chromosomes. Human mitotic chromosomes display characteristic patterns of light and dark bands when visualized under the light microscope using specific chemical dyes such as Giemsa.
Despite the long-standing use of the Giemsa banding pattern in human genetics for identifying chromosome abnormalities and mapping genes, little is known about the molecular mechanisms that generate it or about its biological relevance. The recent availability of many genetic and epigenetic features mapped to the human genome permits a high-resolution investigation of the molecular correlates of Giemsa banding. Here I investigate the relationship of more than 50 genomic and epigenomic features with light (R) and dark (G) bands. Using genome-wide data, my results confirm many classical observations, such as the low gene density and late replication time of the most darkly staining G bands. Surprisingly, I found that for virtually all features investigated, R bands show properties intermediate between those of the lightest and darkest G bands, suggesting that many R bands contain G-like sequences. To identify R bands that show properties of G bands, I employed an unsupervised learning approach to classify R bands by their genomic and epigenomic properties and show that the smallest R bands tend to have characteristics typical of G bands. I revisit the evidence supporting the boundaries of G and R bands in the current cytogenomic map and conclude that inaccurate placement of weakly supported band boundaries can explain the intermediate pattern of R bands. Finally, I propose an approach based on aggregating data from multiple genomic and epigenomic features to improve the positioning of band boundaries in the human cytogenomic map. My results suggest that contiguous domains showing a high degree of uniformity in the ratio of heterochromatin to euchromatin sub-domains define the Giemsa banding pattern in human chromosomes.
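The abstract does not name the unsupervised method used to classify R bands. As a stand-in only, the sketch below shows k-means (Lloyd's algorithm) on per-band feature vectors; the function name `kmeans`, the deterministic farthest-point initialisation, and the example feature values are assumptions made for illustration, not the thesis's method or data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical, already standardised features per R band:
# (gene density, replication timing).
bands = [(1.2, -1.0), (1.0, -0.8), (-0.9, 1.1), (-1.1, 0.9), (0.9, -1.2)]
labels, _ = kmeans(bands, 2)
print(labels)
```

Because genomic and epigenomic features are measured on very different scales, they would normally be standardised (for example, converted to z-scores) before clustering; otherwise the feature with the largest range dominates the squared distances.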
43

Disentangling mutation and selection in human genetic variation: promises and pitfalls

Agarwal, Ipsita January 2021
A subset of germline mutations that arise de novo each generation are deleterious and may cause severe genetic diseases. Predicting where in the genome, and how often, we expect to see deleterious mutations requires an understanding of both the distribution of mutation rates and the distribution of fitness effects in the genome. The two projects described in this thesis address these aspects in turn. The distribution of mutations in the genome is poorly understood because germline mutations occur very rarely. In Chapter 1 of this work, we investigated the sources of mutations by using the spectrum of low-frequency variants in 13,860 human X chromosomes and autosomes as a proxy for the spectrum of germline de novo mutations. By comparing the mutation spectrum across genomic compartments with unique biochemical and sex-specific properties, both within the autosomes and between the X chromosome and the autosomes, we ascribed specific mutation patterns to replication timing and recombination and identified differences in the types of mutations that accrue in males and females. Understanding mutational mechanisms provides a basis for modeling mutation rate variation in the genome, which is ultimately needed to infer the fitness effects of mutations. In Chapter 2, we used patterns of human genetic variation at methylated CpG sites, which are known to experience mutations at very high rates, to learn directly about the fitness effects of mutations at these sites. In the whole-exome sequences now available for 390,000 humans, 99% of putatively neutral, synonymous CpG sites have experienced a C>T mutation; at current sample sizes, not seeing a C>T mutation at such a site therefore indicates strong selection against that mutation. We leveraged the saturation of neutral C>T mutations and the similarity of mutation rates at methylated CpG sites across annotations to identify the subset of sites in a given functional annotation of interest that are likely to be under strong selection.
One implication of this work is that for the vast majority of sites in the genome, there will be little information about strong selection even in samples that are many times larger than at present; the distribution of fitness effects at highly mutable CpG sites may then serve as an anchor for what to expect for other types of sites. Through the two specific cases described, this work illustrates the potential of large contemporary repositories of human genetic variation to inform human genetics and evolution, as well as their limitations in the absence of suitable models of mutation, selection, and other aspects of the evolutionary process.
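The saturation argument can be made concrete with a simple two-class mixture, offered as an illustrative assumption and not necessarily the model used in the thesis. Suppose every site in an annotation is either strongly selected (a C>T mutation is never observed) or behaves like a neutral CpG site, of which the abstract reports that about 1% still lack a C>T at current sample sizes (call this f0). If a fraction f of the annotation's sites lack a C>T, the strongly selected fraction π satisfies f = π + (1 - π)·f0, so π = (f - f0)/(1 - f0). The function name and the example fraction below are hypothetical.

```python
def fraction_strongly_selected(frac_missing: float,
                               frac_missing_neutral: float = 0.01) -> float:
    """Estimate the share of sites under strong selection in an annotation.

    Hypothetical two-class mixture (an illustrative assumption):
      * strongly selected sites never show a C>T mutation;
      * all other sites lack a C>T with the neutral probability
        `frac_missing_neutral` (about 1% for synonymous CpG sites
        at the sample size quoted in the abstract).
    """
    if not 0.0 <= frac_missing <= 1.0:
        raise ValueError("frac_missing must lie in [0, 1]")
    if not 0.0 <= frac_missing_neutral < 1.0:
        raise ValueError("frac_missing_neutral must lie in [0, 1)")
    # Solve frac_missing = pi + (1 - pi) * frac_missing_neutral for pi.
    pi = (frac_missing - frac_missing_neutral) / (1.0 - frac_missing_neutral)
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(max(pi, 0.0), 1.0)


# Example: 20.8% of CpG sites in a hypothetical annotation lack a C>T.
print(round(fraction_strongly_selected(0.208), 3))  # 0.2
```

The estimate is only as good as the assumption that the annotation's CpG sites share the neutral mutation rate, which the abstract motivates by the similarity of mutation rates at methylated CpG sites across annotations.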
44

Bioinformatics and Pharmacogenomics in Drug Discovery and Development: a Socio-economic Perspective

Anyanwu, Chukwuma Eustace 26 July 2006
Submitted to the faculty of the Informatics Graduate Program in partial fulfillment of the requirements for the degree Master of Science in Bioinformatics in the School of Informatics, Indiana University, May 2006.
A plethora of genomic and proteomic information was uncovered by the U.S. Human Genome Project (HGP), mostly by means of bioinformatics tools and techniques. Despite the impact that bioinformatics and pharmacogenomics were projected to have on the drug discovery and development process, the challenges facing the pharmaceutical industry, such as the high cost and slow pace of drug development, appear to persist. Socio-economic barriers hinder the full integration of bioinformatics and pharmacogenomics into the drug discovery and development process, thereby limiting the desired and expected effects.
45

Beyond summary statistics: extracting etiological insights from genome-wide association cohorts

Yuan, Jie January 2021
Over the past 20 years, genome-wide association studies (GWAS) have identified thousands of genomic variants linked to genetic diseases. However, these associations often reveal little about the underlying genetic etiology, which for many phenotypes is thought to be highly heterogeneous. This work investigates statistical methods that move beyond conventional GWAS, both to improve estimation of associations and to extract additional etiological insights from known associations, with a focus on schizophrenia. The thesis addresses this aim through three primary topics. First, we describe DNA.Land, a web platform that crowdsources the collection of genomic data with user consent and active participation, thereby rapidly increasing the sample sizes, and hence the statistical power, required for GWAS. Second, we describe methods to characterize the latent genomic contributors to heterogeneity in GWAS phenotypes. We develop a Z-score test that detects heterogeneity using correlations between variants among affected individuals, and a contrastive tensor decomposition that explicitly characterizes subtype-specific SNP effects independently of confounding heterogeneity such as ancestry. Using these methods, we provide evidence of significant heterogeneity in GWAS cohorts for schizophrenia. Lastly, a major avenue of investigation beyond GWAS is identifying the genes through which associated SNPs mechanistically affect the presentation of phenotypes. We develop a method that improves estimation of expression quantitative trait loci (eQTLs) by joint inference over gene expression reference data and GWAS data, incorporating insights from the liability threshold model. These methods will advance ongoing efforts to explain the complex etiology of genetic diseases and improve the accuracy of disease prediction models based on these insights.
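For readers unfamiliar with the liability threshold model mentioned above: it assumes an unobserved, normally distributed liability, standardised to mean 0 and variance 1, with individuals affected when their liability exceeds a threshold T set by the population prevalence K, so that T = Φ⁻¹(1 - K). A standard consequence is that the mean liability of affected individuals is φ(T)/K. The sketch below computes both with Python's `statistics.NormalDist`; the function name and the example prevalence are illustrative, and it does not reproduce the thesis's eQTL method.

```python
from statistics import NormalDist
from typing import Tuple


def liability_threshold(prevalence: float) -> Tuple[float, float]:
    """Return (threshold, mean liability of cases) for a standard-normal liability."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Individuals above the threshold are affected, so the upper tail
    # beyond the threshold has probability equal to the prevalence.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Mean of a standard normal truncated below at the threshold: pdf(T) / K.
    mean_case_liability = std_normal.pdf(threshold) / prevalence
    return threshold, mean_case_liability


t, m = liability_threshold(0.01)  # hypothetical prevalence of 1%
print(f"threshold={t:.3f}, mean case liability={m:.3f}")
```

Rarer disorders push the threshold further into the tail, so cases are more strongly selected on liability; this is the kind of structure a method conditioning on case status can exploit.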
46

A STUDY OF THE EFFECT OF SINGLE NUCLEOTIDE POLYMORPHISMS IN HUMAN GENOME ON THE SECONDARY STRUCTURE OF PROTEINS

Aswathanarayanan, Subramanian 21 June 2002
No description available.
47

Common and rare genetic effects on the transcriptome and their contribution to human traits

Einson, Jonah January 2022
Bridging the gap between genetic variants and their functional relevance is a principal goal of human genetics. Despite centuries of research, interpreting the biological mechanisms that link variants to phenotypes remains a continual challenge. This goal applies to both rare and common variants, although the specific challenges vary with a variant's frequency and its effect on gene dosage or protein structure. Deciphering these variants' modes of action is crucial for a more holistic understanding of genome regulation. This dissertation advances the interpretation of rare and common variants across the annotation spectrum by using functional data derived from population-scale RNA-sequencing studies. Three main research questions are addressed: (1) How do rare variants affect gene expression, and can these subtle changes be robustly detected? (2) How do common variants that influence pre-mRNA splicing affect protein structure and human traits? (3) Can joint effects between common splice-regulatory variants and rare loss-of-function variants be detected through the lens of purifying selection? All three chapters build on knowledge acquired through large-scale transcriptomics and open-access data. Chapter 1 evaluates the utility of allele-specific expression for prioritizing variants with functional effects. Chapter 2 involves quantifying splicing with the common percent spliced in (PSI) metric and performing quantitative trait locus (QTL) mapping. Chapter 3 builds on the known phenomenon of modified penetrance, in which common regulatory variants reduce the pathogenicity of rare coding variants. Ultimately, these three studies will contribute to our knowledge of genome regulation, which will be crucial in a future of personalized medicine.
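PSI is the fraction of a gene's transcripts that include a given exon. For a cassette exon measured from RNA-seq split reads, one common convention, assumed here, averages the two inclusion junctions (into and out of the exon) so that inclusion and skipping are counted on the same scale. The function name and read counts are hypothetical, and the sketch ignores refinements such as junction-length normalisation that particular tools apply.

```python
from typing import Optional


def percent_spliced_in(inc_upstream: int, inc_downstream: int,
                       skipping: int) -> Optional[float]:
    """PSI for a cassette exon from junction read counts.

    inc_upstream / inc_downstream: reads spanning the junctions into and out of the exon.
    skipping: reads spanning the junction that joins the flanking exons directly.
    Returns None when there are no informative reads.
    """
    if min(inc_upstream, inc_downstream, skipping) < 0:
        raise ValueError("read counts must be non-negative")
    # An inclusion isoform can yield reads on two junctions, a skipping
    # isoform on only one, so average the inclusion junctions first.
    inclusion = (inc_upstream + inc_downstream) / 2.0
    total = inclusion + skipping
    if total == 0:
        return None
    return inclusion / total


print(percent_spliced_in(30, 34, 8))  # 0.8
```

For splicing QTL mapping, per-sample PSI values of this kind are typically regressed on genotype at nearby variants.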
48

Microarray Approaches to Experimental Genome Annotation

Bertone, Paul 03 1900
This work describes the development and application of genomic DNA tiling arrays: microarrays designed to represent all of the DNA comprising a chromosome or other genomic locus, regardless of the genes that may be annotated in the region of interest. Because tiling arrays are intended for the unbiased interrogation of genomic sequence, they enable the discovery of novel functional elements beyond those described by existing gene annotation. This is of particular importance in mapping the gene structures of higher eukaryotes, where combinatorial exon usage produces rare splice variants or isoforms expressed in low abundance that may otherwise elude detection. Issues related to the design of both oligonucleotide- and amplicon-based tiling arrays are discussed; the latter technology presents distinct challenges related to the selection of suitable amplification targets from genomic DNA. Given the widespread fragmentation of mammalian genomes by repetitive elements, obtaining maximal coverage of the non-repetitive sequence with a set of fragments amenable to high-throughput polymerase chain reaction (PCR) amplification represents a non-trivial optimization problem. To address this issue, several algorithms are described for the efficient computation of optimal tile paths for the design of amplicon tiling arrays. Using these methods, it is possible to recover an optimal tile path that maximizes the coverage of non-repetitive DNA while minimizing the number of repetitive elements included in the resulting sequence fragments. Tiling arrays were constructed and used for the chromosome- and genome-wide assessment of human transcriptional activity, via hybridization to complementary DNA derived from polyadenylated RNA expressed in normal complex tissues. The approach is first demonstrated with amplicon arrays representing all of the non-repetitive DNA of human chromosome 22, then extended to the entire genome using maskless photolithographic DNA synthesis technology. 
A large-scale tiling array survey revealed the presence of over 10,000 novel transcribed regions and verified the expression of nearly 13,000 predicted genes, providing the first global transcription map of the human genome. In addition to those likely to encode protein sequences on the basis of evolutionary sequence conservation, many of the novel transcripts constitute a previously uncharacterized population of non-coding RNAs implicated in myriad structural, catalytic and regulatory functions.
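The abstract does not detail its tile-path algorithms. One way to frame the problem, assumed here purely for illustration, is weighted interval scheduling: each candidate amplicon is an interval on the chromosome, scored as the non-repetitive bases it covers minus a penalty per repetitive base, and dynamic programming picks the non-overlapping set with the highest total score. The `Amplicon` fields, the `repeat_penalty` weight, and the scoring rule are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Amplicon(NamedTuple):
    start: int      # 0-based start (inclusive)
    end: int        # end (exclusive); assumes end > start
    unique_bp: int  # non-repetitive bases covered (hypothetical, precomputed)
    repeat_bp: int  # repetitive bases included (hypothetical, precomputed)


def best_tile_path(candidates: List[Amplicon],
                   repeat_penalty: float = 1.0) -> Tuple[float, List[Amplicon]]:
    """Weighted interval scheduling over candidate amplicons."""
    cands = sorted(candidates, key=lambda a: a.end)
    ends = [a.end for a in cands]
    n = len(cands)
    best = [0.0] * (n + 1)    # best[i]: optimal score using the first i candidates
    take = [False] * (n + 1)  # whether candidate i-1 is part of that optimum
    prev = [0] * (n + 1)      # prefix length to resume from when it is
    for i in range(1, n + 1):
        amp = cands[i - 1]
        # Number of earlier candidates that end at or before this one starts.
        j = bisect_right(ends, amp.start, 0, i - 1)
        score = amp.unique_bp - repeat_penalty * amp.repeat_bp
        if best[j] + score > best[i - 1]:
            best[i], take[i], prev[i] = best[j] + score, True, j
        else:
            best[i] = best[i - 1]
    # Backtrack to recover the chosen amplicons in coordinate order.
    path: List[Amplicon] = []
    i = n
    while i > 0:
        if take[i]:
            path.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[n], path[::-1]


cands = [Amplicon(0, 100, 90, 10), Amplicon(50, 150, 100, 0), Amplicon(100, 200, 80, 20)]
score, path = best_tile_path(cands)
print(score, [(a.start, a.end) for a in path])  # 140.0 [(0, 100), (100, 200)]
```

Sorting by end coordinate and binary-searching for the last compatible amplicon gives O(n log n) time; a real design would also enforce PCR constraints such as amplicon length and primer placement when generating the candidates.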
49

Computational discovery of cis-regulatory modules in human genome by genome comparison

Mok, Kwai-lung, 莫貴龍 January 2008
Biochemistry / Master of Philosophy
50

Generation of a human gene index and its application to disease candidacy.

Christoffels, Alan January 2001
With easy access to technology to generate expressed sequence tags (ESTs), several groups have sequenced from thousands to several thousands of ESTs. These ESTs benefit from consolidation and organization to deliver significant biological value. A number of EST projects are underway to extract maximum value from fragmented EST resources by constructing gene indices, in which all transcripts are partitioned into index classes such that transcripts representing the same gene are placed in the same index class. A gene index should therefore ideally represent a non-redundant set of transcripts. Indeed, most gene indices aim to reconstruct the gene complement of a genome, and their technological developments are directed at achieving this goal. The South African National Bioinformatics Institute (SANBI), on the other hand, embarked on the development of the sequence alignment and consensus knowledgebase (STACK) database, which focused on the detection and visualisation of transcript variation in the context of developmental and pathological states, using all publicly available ESTs. Preliminary work on the STACK project partitioned the EST data into arbitrarily chosen tissue categories as a means of reducing the EST sequences to manageable sizes for subsequent processing. The tissue partitioning provided the template material for developing error-checking tools to analyse the information embedded in the error-laden EST sequences. However, tissue partitioning increases redundancy in the sequence data because one gene can be expressed in multiple tissues, with the result that multiple tissue-partitioned transcripts will correspond to the same gene.
Therefore, the sequence data represented by each tissue category had to be merged to obtain a comprehensive view of expressed transcript variation across all available tissues.
The need to consolidate all EST information provided the impetus for developing a STACK human gene index, also referred to as a whole-body index. In this dissertation, I report on the development of a STACK human gene index represented by consensus transcripts whose constituent ESTs sample single or multiple tissues, in order to provide the correct developmental and pathological context for investigating sequence variation. Furthermore, the human gene index is assessed as a resource for disease-candidate gene discovery. A feasible approach to constructing a whole-body index required the ability to process error-prone EST data in excess of one million sequences (1,198,607 ESTs as of December 1998). In the absence, at that time, of new clustering algorithms, we successfully ported D2_CLUSTER, an EST clustering algorithm, to the Origin2000, a high-performance shared-memory multiprocessor machine. Improvements to the parallelised version of D2_CLUSTER included: (i) the ability to cluster sequences on as many as 126 processors (for example, 462,000 ESTs were clustered in 31 hours on 126 R10000 processors of the Origin2000); (ii) enhanced memory management that allowed mRNA sequences as long as 83,000 base pairs to be clustered; (iii) making the input sequence data accessible to all processors, allowing rapid access to the sequences; and (iv) a restart module that allowed an interrupted job to be restarted. These enhancements allowed EST datasets in excess of 1 million sequences to be processed. A hierarchical approach was adopted in which 1,198,607 ESTs from GenBank release 110 (October 1998) were partitioned into "tissue bins", and each tissue bin was processed through a pipeline that included masking for contaminants, clustering, assembly, assembly analysis and consensus generation.
A total of 478,707 consensus transcripts were generated across all the tissue categories, and these sequences served as the input data for generating the whole-body index sequences. The clustering of all tissue-derived consensus transcripts was followed by the collapse of each consensus sequence to its individual ESTs prior to assembly and whole-body index consensus sequence generation. The hierarchical approach consolidated the input EST data from 1,198,607 ESTs to 69,158 multi-sequence clusters and 162,439 singletons (individual ESTs). Chromosomal locations were added to 25,793 whole-body index sequences through the assignment of genetic markers such as radiation hybrid markers and Généthon markers. The whole-body index sequences were made available to the research community through a sequence-based search engine (http://ziggy.sanbi.ac.za/~alan/researchINDEX.html).
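Partitioning ESTs into index classes amounts to transitive (single-linkage) clustering: two ESTs land in the same class if a chain of sufficiently similar pairs connects them, and each resulting class is either a multi-sequence cluster or a singleton. The sketch below shows that idea with a union-find structure. As a simplifying assumption, similarity is a count of shared k-mers with hypothetical `k` and `min_shared` thresholds, which is not D2_CLUSTER's actual scoring; the sketch also ignores reverse complements and compares all pairs, which would not scale to a million sequences.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent = {x: x for x in items}

    def find(self, x: str) -> str:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (a crude stand-in similarity profile)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def index_classes(ests: Dict[str, str], k: int = 8, min_shared: int = 3) -> List[List[str]]:
    """Single-linkage clustering of ESTs on shared k-mer counts (hypothetical thresholds)."""
    ids = sorted(ests)
    profiles = {i: kmer_set(ests[i].upper(), k) for i in ids}
    uf = UnionFind(ids)
    for a, b in combinations(ids, 2):  # all pairs: acceptable for a toy example only
        if len(profiles[a] & profiles[b]) >= min_shared:
            uf.union(a, b)
    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


def count_clusters_and_singletons(classes: Iterable[List[str]]) -> Tuple[int, int]:
    """Split index classes into multi-sequence clusters and singletons."""
    multi = singletons = 0
    for members in classes:
        if len(members) > 1:
            multi += 1
        elif len(members) == 1:
            singletons += 1
    return multi, singletons


# "a" and "c" share no 4-mer, but both overlap "b", so all three end up together.
ests = {"a": "AAAACCCC", "b": "CCCCGGGG", "c": "GGGGTTTT", "d": "ACACACAC"}
classes = index_classes(ests, k=4, min_shared=1)
print(classes)                                 # [['a', 'b', 'c'], ['d']]
print(count_clusters_and_singletons(classes))  # (1, 1)
```

In these terms, the consolidation reported above corresponds to 69,158 + 162,439 = 231,597 index classes built from 1,198,607 ESTs.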
