Global ETD Search

381	A Global Mapping of Protein Complexes in S. cerevisiae Vlasblom, James 13 August 2013 Systematic identification of protein-protein interactions (PPIs) on a genome scale has become an important focus of biology, as the majority of cellular functions are mediated by these interactions. Several high-throughput experimental techniques have emerged as effective tools for querying the protein-protein interactome and can be broadly categorized into those that detect direct, physical protein-protein interactions and those that yield information on the composition of protein complexes. Tandem affinity purification followed by mass spectrometry (TAP/MS) is an example of the latter that identifies proteins that co-purify with a given tagged query (bait) protein. Though TAP/MS enables these co-complexed associations to be identified on a proteome scale, the amount of data generated by the systematic querying of thousands of proteins can be extremely large. Data from multiple purifications are combined to form a very large network of proteins linked by edges whenever the corresponding pairs might form an association. Only a fraction of these pairwise associations correspond to physical interactions, however, and further computational analysis is necessary to filter out non-specific associations. This thesis examines how differing computational procedures for the analysis of TAP/MS data can affect the final PPI network, and outlines a procedure to accurately identify protein complexes from data consolidated from multiple proteome-scale TAP/MS experiments in the budding yeast Saccharomyces cerevisiae. In collaboration with the Greenblatt and Emili laboratories at the University of Toronto, this methodology was extended to yeast membrane proteins to derive a comprehensive network of 13,343 PPIs and 720 protein complexes spanning both membrane and non-membrane proteins. Proteomics Bioinformatics 0487
382	Clustering time-course gene-expression array data Gershman, Jason Andrew January 2008 This thesis examines methods used to cluster time-course gene expression array data. In the past decade, various model-based methods have been published and advocated for clustering this type of data in place of classic non-parametric techniques like K-means and hierarchical clustering. On simulated data, where the variance between clusters is large, I show that the model-based MCLUST outperforms model-based SSClust and non-model-based K-means clustering. I also show that the number of genes or the number of clusters has no significant effect on the performance of these model-based clustering techniques. On two real data sets, where the variance between clusters is smaller, I show that model-based SSClust outperforms both MCLUST and K-means clustering. Since the "truth" is often not known for real data sets, I use the clustered data as "truth" and then perturb the data by adding pointwise noise to cluster this noisy data. Throughout my analysis of real and simulated expression data, I use the misclassification rate and the overall success rate as measures of success of the clustering algorithm. Overall, the model-based methods appear to cluster the data better than the non-model-based methods. Later, I examine the role of gene ontology (GO) and the use of gene ontology data to cluster gene expression data. I find that clustering expression data using a synthesis of gene expression and gene ontology not only provides clustering that has a biological meaning but also clusters the data well. I also introduce an algorithm for clustering expression profiles on both gene expression and gene ontology data when some of the genes are missing the ontology data. Unlike other methods, which ignore the missing data or lump it all into a miscellaneous cluster, I use classification and inferential techniques to cluster using all of the available data, and this method shows promising results. I also examine which ontology, among molecular function, biological process, and cellular component, is best for clustering expression data. This analysis shows that biological process is the preferred ontology for clustering expression data. Statistics Biology, Bioinformatics
383	A Bayesian hierarchical model for detecting associations between haplotypes and disease using unphased SNPs Fox, Garrett Reed January 2008 This thesis addresses using haplotypes to detect disease-predisposing chromosomal regions based on a Bayesian hierarchical model for case-control data. By utilizing the Stochastic Search Variable Selection (SSVS) procedure of George and McCulloch (1997), the number of parameters is not constrained by the sample size, as it is in frequentist methods. Haplotype information enters the model in the form of estimated haplotype frequencies, which are used as if they were the true population frequencies. A Bayesian hierarchical probit model was developed by estimating the distribution of haplotype pairs for an individual based on these estimated population frequencies and using SSVS to make decisions about model selection. To date, Bayesian models for haplotype-based case-control data assume either that the haplotypes are known, or that haplotypes can be clustered such that every haplotype within a cluster has the same effect on disease status. A simulation was performed analyzing the testing properties of this Bayesian model and comparing it to a popular frequentist method (Schaid, 2002). Both real genotype data from the Dallas Heart Study (DHS) and simulated data were used to study the operating characteristics of the new model. The Bayesian method is shown to have higher power than Schaid's frequentist method when there are a limited number of common haplotypes in a region, a situation that appears to be common (Gabriel, 2002). An approach based on the maximum of Chi-squared statistics at each marker locus performed surprisingly well against both haplotype methods in various cases. These simulations contribute to the ongoing debate on the efficacy of haplotype methods. The most surprising result was the ability of the genotype methods to outperform the haplotype methods in various instances where there were cis-acting interactions. The Bayesian haplotype method performed better in comparison when dealing with low penetrance in highly conserved blocks. Additionally, a set of simulations was based on a number of genes from the DHS data set with multiple haplotype block regions. This demonstrated the similarities of the haplotype methods and the added flexibility when analyzing posterior distributions. We also demonstrate that interactions between loci in separate blocks can be detected without having interaction terms in the regression model. Future work should focus on more efficient methods of detecting these and other complex interactions. Statistics Biology, Bioinformatics
384	The vervet regulator of G protein signaling 4 (RGS4) gene, a candidate gene for quantifiable behavioral dimensions associated with psychopathology : sequence, bioinformatic analysis, and association study of a novel polymorphism with social isolation Trakadis, John January 2004 Regulators of G-protein coupled signaling (RGS) accelerate GTP hydrolysis and consequently influence signal termination. The RGS-4 gene has recently been reported to be implicated in a wide range of neuropsychiatric disorders including schizophrenia, Alzheimer's disease and addictions. / In this study, the vervet RGS-4 gene was sequenced on a CEQ 8000 genetic analysis system (Beckman Coulter) and characterized using molecular and bioinformatic tools. The obtained vervet sequence overall showed 95.3% sequence identity with the human RGS4 gene. / Thereafter, SNPs in the region encompassing the proximal promoter, exon 1 and the first 450 bp of intron 1 were identified by direct sequencing of 8 unrelated individuals. One of the identified SNPs, +35 [A/G], was genotyped in 155 juvenile vervets previously phenotyped for personality traits, including social isolation. Although preliminary association analysis fails to attain statistical significance (p=0.074), the sample size is small. Additional genotyping of phenotypically defined individuals needs to be undertaken. Biology, Genetics. Biology, Bioinformatics.
385	Algorithms and statistics for the detection of binding sites in coding regions Chen, Hui, 1974- January 2006 This thesis deals with the problem of detecting binding sites in coding regions. A new comparative analysis method is developed by improving an existing method called COSMO. / The inter-species sequence conservation observed in coding regions may be the result of two types of selective pressure: the selective pressure on the protein encoded and, sometimes, the selective pressure on the binding sites. To predict that a region within a coding region is a binding site, one needs to make sure that the conservation observed in this region is not due to the selective pressure on the protein encoded. To achieve this, COSMO built a null model with only the selective pressure on the protein encoded and computed p-values for the observed conservation scores, conditional on the fixed set of amino acids observed at the leaves. / It is believed, however, that the selective pressure on the protein assumed in COSMO is overly strong. Consequently, some interesting regions may be left undetected. In this thesis, a new method, COSMO-2, is developed to relax this assumption. / The amino acids are first classified into a fixed number of overlapping functional classes by applying an expectation maximization algorithm on a protein database. Two probabilities for each gene position are then calculated: (i) the probability of observing a certain degree of conservation in the orthologous sequences generated under each class in the null model (i.e. the p-value of the observed conservation under each class); and (ii) the probability that the codon column associated with that gene position belongs to each class. The p-value of the observed conservation for each gene position is the sum of the products of the two probabilities for all classes. Regions with low p-values are identified as potential binding sites. / Five sets of orthologous genes are analyzed using COSMO-2. The results show that COSMO-2 can detect the interesting regions identified by COSMO and can detect more interesting regions than COSMO in some cases. Biology, Bioinformatics. Computer Science.
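The final combination step described in entry 385 (a position-level p-value formed as the sum, over classes, of the per-class p-value times the probability that the codon column belongs to that class) can be illustrated with a minimal sketch. This is not the COSMO-2 implementation: the function name `position_p_value`, its argument names, and the example numbers are hypothetical, and the per-class p-values and class probabilities are assumed to have been computed already.

```python
from typing import Sequence


def position_p_value(
    p_value_per_class: Sequence[float],
    class_probability: Sequence[float],
) -> float:
    """Sum over classes of (per-class p-value) * (class probability).

    Hypothetical helper: both inputs are assumed to be precomputed,
    one entry per functional class, in the same class order.
    """
    if len(p_value_per_class) != len(class_probability):
        raise ValueError("inputs must have one entry per class")
    for value in list(p_value_per_class) + list(class_probability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return sum(p * w for p, w in zip(p_value_per_class, class_probability))


# Illustrative numbers only (three classes), not data from the thesis.
example = position_p_value([0.01, 0.20, 0.50], [0.7, 0.2, 0.1])
print(round(example, 3))
```

With these made-up inputs the weighted sum is 0.007 + 0.040 + 0.050 = 0.097; a position would be flagged as a candidate binding site only if this combined value fell below whatever threshold the analysis uses.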
386	Purification and characterisation of Plasmodium falciparum hypoxanthine phosphoribosyltransferase. Murungi, Edwin Kimathi. January 2007 Malaria remains the most important parasitic disease worldwide. It is estimated that over 500 million infections and more than 2.7 million deaths arising from malaria occur each year. Most (90%) of the infections occur in Africa, with the most affected groups being children of less than five years of age and women. This dire situation is exacerbated by the emergence of drug-resistant strains of Plasmodium falciparum. The work reported in this thesis focuses on improving the purification of PfHPRT by investigating the characteristics of anion exchange DE-52 chromatography (the first stage of purification), developing an HPLC gel filtration method for examining the quaternary structure of the protein and possible end-stage purification, and initial crystallization trials. A homology model of the open, unliganded PfHPRT is constructed using the atomic structures of human, T. cruzi and S. typhimurium HPRT as templates. Plasmodium falciparum Bioinformatics Malaria.
387	Genetic analysis of differentiation of T-helper lymphocytes Wang, Qixin 28 November 2013 In the human immune system, T-helper cells are able to differentiate into two lymphocyte subsets: Th1 and Th2. The intracellular signaling pathways of differentiation form a dynamic regulation network by secreting distinctive types of cytokines, while differentiation is regulated by two major gene loci: T-bet and GATA-3. We developed a system dynamics model to simulate the differentiation and re-differentiation process of T-helper cells, based on gene expression levels of T-bet and GATA-3 during differentiation of these cells. We arrived at three ultimate states of the model and came to the conclusion that cell differentiation potential exists as long as the system dynamics is at an unstable equilibrium point; the T-helper cells will no longer have the potential of differentiation when the model reaches a stable equilibrium point. In addition, the time lag caused by expression of transcription factors can lead to oscillations in the secretion of cytokines during differentiation. Statistics Biology, Bioinformatics
389	Genetic analysis of 100 loci for coronary artery disease and associated phenotypes in a founder population Paré, Guillaume. January 2006 Coronary artery disease (CAD) is a major health concern for both developed and developing countries. With a heritability estimated at around 50%, there is a strong rationale to better define the genetic contribution to CAD. In order to do so, my thesis project consists of the genetic analysis of over 1400 individuals from the Saguenay Lac St-Jean region using 1536 single nucleotide polymorphisms in 103 candidate genes for CAD. Using these data, suggestive linkage for HDL cholesterol was found on chromosome 1 and several significant associations were observed with lipoprotein-related traits as well as adiponectin plasma concentration, including two novel associations. Biology, Genetics. Biology, Bioinformatics.
390	A data-intensive assessment of the species-abundance distribution Baldridge, Elita 13 May 2015 The hollow curve species abundance distribution describes the pattern of large numbers of rare species and a small number of common species in a community. The species abundance distribution is one of the most ubiquitous patterns in nature and many models have been proposed to explain the mechanisms that generate this pattern. While there have been numerous comparisons of species abundance distribution models, most of these comparisons only use a small subset of available models, focus on a single ecosystem or taxonomic group, and fail to use the most appropriate statistical methods. This makes it difficult to draw general conclusions about which, if any, models provide the best empirical fit to species abundance distributions. I compiled data from the literature to significantly expand the available data for underrepresented taxonomic groups, and combined this with other macroecological datasets to perform comprehensive model comparisons for the species abundance distribution. A multiple model comparison showed that most available models for the species abundance distribution fit the data equivalently well across a diverse array of ecosystems and taxonomic groups. In addition, a targeted comparison of the species abundance distribution predicted by a major ecological theory, the unified neutral theory of biodiversity (neutral theory), against a non-neutral model of species abundance, demonstrates that it is difficult to distinguish between these two classes of theory based on patterns in the species abundance distribution. In concert, these studies call into question the potential for using the species abundance distribution to infer the processes operating in ecological systems. Biology, Ecology Biology, Bioinformatics
