  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
951

Normalization of microRNA expression levels in Quantitative RT-PCR arrays

Deo, Ameya January 2010
Background: Real-time quantitative Reverse Transcriptase Polymerase Chain Reaction (qRT-PCR) has recently come into use for the characterization and expression analysis of miRNAs. Data from such experiments require effective analysis methods to yield reliable, high-quality results. For the prostate cancer miRNA qRT-PCR data used in this study, the standard housekeeping normalization method fails because the endogenous controls are not stable. Identifying appropriate normalization methods based on other, data-driven principles is therefore an important aspect of this study. Results: Several normalization methods available in the R packages Affy and qpcrNorm were tested on the raw data. These methods reduce technical variation and represent robust alternatives to the standard housekeeping normalization method. The performance of the normalization methods was evaluated statistically and compared against each other and against the standard housekeeping normalization method. The results suggest that the qpcrNorm quantile normalization method performs best of all the methods tested. Conclusions: The qpcrNorm quantile normalization method outperforms both the other normalization methods and the standard housekeeping normalization method, supporting the hypothesis of the study. The data-driven methods used in this study can be applied as standard procedures when endogenous controls are not stable.
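Quantile normalization, the method reported above as performing best, can be sketched in a few lines. This is a minimal, generic illustration in plain Python, assuming a complete matrix (no missing or undetermined Ct values) stored as one list per array; it is not the qpcrNorm implementation, which may handle ties and missing values differently. The function name `quantile_normalize` is hypothetical.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values that share
    its within-array rank, so every array ends up with the same distribution.
    Ties are broken by original position, which is a simplification.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must measure the same number of miRNAs")

    # Mean of the k-th smallest value across all arrays, for k = 0..n-1.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[k] for col in sorted_cols) for k in range(n)]

    normalized = []
    for col in columns:
        # Indices of this array's values, ordered from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For example, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` returns `[[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]`: both arrays now share the values 1.5, 3.5 and 5.5, while each keeps its own within-array ranking.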
952

Pattern Discovery in DNA Sequences

Yan, Rui 20 March 2014
A pattern is a relatively short sequence that represents a phenomenon in a set of sequences. Not all short sequences are patterns; only those that are statistically significant are referred to as patterns or motifs. Pattern discovery methods analyze sequences and attempt to identify and characterize meaningful patterns. This thesis extends the application of pattern discovery algorithms to a new problem domain: Single Nucleotide Polymorphism (SNP) classification. SNPs are single base-pair (bp) variations in the genome and are probably the most common form of genetic variation. On average, one in every thousand bps may be an SNP. The function of most SNPs, especially those not associated with protein sequence changes, remains unclear. However, genome-wide linkage analyses have associated many SNPs with disorders ranging from Crohn’s disease to cancer, and with quantitative traits such as height or hair color. As a result, many groups are working to predict the functional effects of individual SNPs. In contrast, very little research has examined the causes of SNPs: why do SNPs occur where they do? This thesis addresses this problem by using pattern discovery algorithms to study non-coding DNA sequences. The hypothesis is that short DNA patterns can be used to predict SNPs. For example, such patterns in the sequence surrounding a SNP site might block the DNA repair mechanism at that site, thus allowing the SNP to occur. To test the hypothesis, a model is developed to predict SNPs using pattern discovery methods. The results show that SNP prediction with pattern discovery methods is weak (50 2%), whereas machine learning classification algorithms can achieve prediction accuracy as high as 68%.
To determine whether the poor performance of pattern discovery is due to data characteristics (such as sequence length or pattern length) or to the specific biological problem (SNP prediction), a survey was conducted profiling eight representative pattern discovery methods at multiple parameter settings on 6,754 real biological datasets. This is the first systematic review of pattern discovery methods to assess prediction accuracy, CPU usage and memory consumption. The survey found that current pattern discovery methods do not consider positional information and do not handle short sequences (<150 bp), including SNP sequences, well. Therefore, this thesis proposes a new supervised pattern discovery and classification algorithm, Weighted-Position Pattern Discovery and Classification (WPPDC). WPPDC exploits positional information to identify positionally enriched motifs and selects motifs with high information content for further classification. A tree structure is applied to WPPDC (yielding T-WPPDC) to reduce algorithmic complexity. Compared with pattern discovery methods, T-WPPDC not only showed consistently superior prediction accuracy but also generated patterns with positional information. Machine learning classification methods (such as Random Forests) showed comparable prediction accuracy; however, unlike T-WPPDC, they are purely classifiers and cannot generate SNP-associated patterns.
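The idea of a positionally enriched motif can be illustrated with a simple scoring scheme. The sketch below is not WPPDC or T-WPPDC, whose details the abstract does not give; it only shows, assuming equal-length windows aligned on the candidate site, how counting k-mers per position (rather than anywhere in the sequence) lets a score carry positional information. The function name and the Laplace-smoothed log2 ratio are illustrative choices.

```python
import math
from collections import Counter


def positional_kmer_scores(pos_seqs, neg_seqs, k):
    """Score every (position, k-mer) pair by smoothed log2 enrichment.

    pos_seqs: equal-length windows centred on SNP sites (hypothetical input).
    neg_seqs: equal-length control windows of the same length.
    Returns {(position, kmer): log2(rate in pos_seqs / rate in neg_seqs)}.
    """
    length = len(pos_seqs[0])
    if any(len(s) != length for s in pos_seqs + neg_seqs):
        raise ValueError("all windows must have the same length")

    def counts(seqs):
        # Count each k-mer separately at every start position.
        c = Counter()
        for s in seqs:
            for p in range(length - k + 1):
                c[(p, s[p:p + k])] += 1
        return c

    cp, cn = counts(pos_seqs), counts(neg_seqs)
    n_pos, n_neg = len(pos_seqs), len(neg_seqs)
    scores = {}
    for key in set(cp) | set(cn):
        # Laplace-smoothed fraction of windows carrying this k-mer here.
        rate_pos = (cp[key] + 1) / (n_pos + 2)
        rate_neg = (cn[key] + 1) / (n_neg + 2)
        scores[key] = math.log2(rate_pos / rate_neg)
    return scores
```

Sorting the returned items by score gives candidate position-specific motifs; a classifier could then use the presence of the top-scoring (position, k-mer) pairs as features.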
954

Identification and characterization of sexually dimorphic genes in the developing mouse cortex and hippocampus

Armoskus, Christopher 08 April 2014
In both mice and humans, males and females differ in behavior and in their responses to neurological and psychological diseases linked to the cortex and hippocampus. Perinatal exposure of males to testosterone secreted by the testes alters neural structures and behaviors in ways that can persist throughout life; however, the molecular mechanisms by which sex steroids produce these lasting changes are still unclear. Given that regulation of gene expression is a primary mechanism by which sex steroids exert changes in an organism, I sought to identify genes expressed at different levels in the two sexes in the cortex and hippocampus, and to determine the effect of testosterone on the expression of these genes. Using gene expression microarrays and RT-qPCR, I identified genes that are differentially expressed between the sexes in the neonatal mouse cortex and hippocampus; however, whether perinatal testosterone regulates these differences remains unclear.
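Sex differences measured by RT-qPCR are commonly quantified with the comparative Ct (2^-ΔΔCt) method. The abstract does not say which quantification the thesis used, so the sketch below is a generic illustration only; it assumes near-100% amplification efficiency for both the target and the reference gene, and the Ct values in the example are made up.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene by the 2^-ddCt (Livak) method.

    Assumes roughly 100% amplification efficiency for target and reference,
    so one cycle of difference corresponds to a two-fold difference.
    """
    # Normalize each group's target Ct to its reference (housekeeping) gene.
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the normalized values between groups; a lower Ct means more template.
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)
```

With made-up values, `fold_change_ddct(24, 18, 26, 18)` gives 4: the target would be four times more abundant in the test group (say, males) than in the control group.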
955

Data analysis in proteomics: novel computational strategies for modeling and interpreting complex mass spectrometry data

Sniatynski, Matthew John 11 1900
Contemporary proteomics studies require computational approaches to deal with both the complexity and the volume of the data generated. The amalgamation of mass spectrometry (the analytical tool of choice in proteomics) with the computational and statistical sciences is still recent, and several avenues of exploratory data analysis and statistical methodology remain relatively unexplored. The current study focuses on three broad analytical domains and develops novel exploratory approaches and practical tools in each. Data transform approaches are explored first. These methods re-frame data, allowing the visualization and exploitation of features and trends that are not immediately evident. An exploratory approach based on the correlation transform is developed and used to identify mass-shift signals in mass spectra. This approach is used to identify and map post-translational modifications on individual peptides, and to identify SILAC modification-containing spectra in a full-scale proteomic analysis. Second, matrix decomposition and projection approaches are explored; these use an eigendecomposition to extract general trends from groups of related spectra. A data visualization approach capable of showing trends in large numbers of complex spectra is demonstrated using these techniques, and a data compression and feature extraction technique suitable for spectral modeling is developed. Finally, a general machine learning approach based on conditional random fields (CRFs) is developed. These models can handle arbitrary sequence modeling tasks, similar to hidden Markov models (HMMs), but are far more robust to interdependent observational features and do not require limiting independence assumptions to remain tractable.
The theory behind this approach is laid out, and a simple machine learning fragmentation model is built to test the hypothesis that reproducible sequence-specific intensity ratios are present within the distribution of fragment ions originating from a common peptide bond breakage. After training, the model performs very well at associating peptide sequences with fragment ion intensity information, lending strong support to the hypothesis.
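One way to picture the correlation transform used for mass-shift detection is the autocorrelation of a binned spectrum: if many peaks recur at a fixed mass offset (for example, light/heavy SILAC pairs or a modification shift), the product of the spectrum with a copy of itself shifted by that offset is large. The sketch below assumes singly charged peaks, unit-width bins and a made-up peak list; it illustrates the general idea rather than the thesis's implementation, and the helper names are hypothetical.

```python
def bin_spectrum(peaks, bin_width, max_mz):
    """Sum (m/z, intensity) peaks into fixed-width bins from 0 to max_mz."""
    bins = [0.0] * (int(max_mz / bin_width) + 1)
    for mz, intensity in peaks:
        if 0 <= mz <= max_mz:
            bins[int(mz / bin_width)] += intensity
    return bins


def autocorrelation(intensities, max_lag):
    """Return {lag: sum_i I[i] * I[i + lag]} for lag = 1..max_lag.

    A lag with a high score means many peaks recur at that mass offset.
    """
    n = len(intensities)
    scores = {}
    for lag in range(1, min(max_lag, n - 1) + 1):
        scores[lag] = sum(intensities[i] * intensities[i + lag]
                          for i in range(n - lag))
    return scores


# Two hypothetical peak pairs, each separated by the same 8-unit shift.
peaks = [(100.0, 10.0), (108.0, 10.0), (250.0, 5.0), (258.0, 5.0)]
scores = autocorrelation(bin_spectrum(peaks, 1.0, 300.0), 200)
best_lag = max(scores, key=scores.get)
```

Here the 8-unit offset shared by both pairs scores highest (125), ahead of the 150-unit offset between the pairs (100), so `best_lag` is 8.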
956

In Vitro Cell Culture Models to Study Cystic Fibrosis Respiratory Secretions

Peters-Hall, Jennifer Ruth 26 November 2013
Cystic fibrosis (CF) is the most common lethal autosomal recessive genetic disorder in the Caucasian population. CF is caused by mutations in the CF transmembrane conductance regulator (CFTR) and is characterized by a viscous airway surface liquid (ASL) that impairs mucociliary function and facilitates bacterial infection. The molecular mechanisms by which these symptoms result from CFTR malfunction are unclear. We hypothesized that the expression and secretion of innate immune proteins is altered in CF ASL.
We sought to use cell culture models in which the only source of secreted proteins was differentiated airway epithelium. Since CFTR localizes to the apical surface of airway submucosal glands (SMG) and ciliated epithelium, cell culture models recapitulating two parts of the respiratory tract epithelium were studied: 1) SMG acini and 2) mucociliary epithelium.
We developed a three-dimensional system in which CF (ΔF508/ΔF508) and non-CF human bronchial epithelial (HBE) cells differentiated on Matrigel into polarized glandular acini with mature lumens by two weeks, with no significant variability in size. Bronchial acini expressed and secreted the SMG proteins MUC5B and lysozyme at day 22, and exhibited vectorial secretions, which were collected along with acinar cell lysates. Proteome profiling demonstrated unique protein signatures for each cellular space. However, abundant contaminating proteins from Matrigel and growth media were identified. Therefore, the air-liquid interface (ALI) cell culture model of airway epithelium was chosen for quantitative proteomic comparison of CF and non-CF HBE apical secretions, because the protein-rich medium does not contact the apical surface.
CF and non-CF HBE cells were labeled by stable isotope labeling with amino acids in cell culture (SILAC) and differentiated at ALI. LC-MS/MS and bioinformatic analysis identified seventy-one proteins with altered levels in CF secretions (±1.5-fold change; p-value < 0.05). Validation with antibody-based biochemical assays demonstrated increased levels of MUC5AC, MUC5B, fibronectin and MMP9, and increased proteolysis/activation of complement C3, in CF secretions. Overall, the functions of the altered proteins in the CF secretome indicate an airway epithelium in a state of repair, with altered immunity in the absence of infection, suggesting that the downstream consequences of mutated CFTR in CF airways set the stage for chronic inflammation and infection.
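The selection criterion quoted above (at least a 1.5-fold change in either direction with p < 0.05) can be written as a small filter. This sketch assumes each protein already has a CF/non-CF abundance ratio (for instance a SILAC heavy/light ratio) and a p-value from some upstream test; the statistical test itself, the input format and the protein names are assumptions, not details from the abstract.

```python
import math


def flag_altered(proteins, min_fold=1.5, alpha=0.05):
    """Return {name: "up" | "down"} for proteins passing both thresholds.

    proteins: {name: (ratio, p_value)}, where ratio = CF / non-CF abundance
    (assumed input format).
    """
    # On the log2 scale, a 1.5-fold increase and a 1.5-fold decrease are
    # equally far from zero, so one threshold covers both directions.
    threshold = math.log2(min_fold)
    flagged = {}
    for name, (ratio, p_value) in proteins.items():
        log_fc = math.log2(ratio)
        if abs(log_fc) >= threshold and p_value < alpha:
            flagged[name] = "up" if log_fc > 0 else "down"
    return flagged


# Hypothetical ratios and p-values.
result = flag_altered({
    "protein_1": (2.0, 0.01),    # 2-fold higher in CF: flagged "up"
    "protein_2": (0.5, 0.01),    # 2-fold lower in CF: flagged "down"
    "protein_3": (1.2, 0.001),   # significant but below 1.5-fold
    "protein_4": (3.0, 0.2),     # large change but not significant
})
```

Only `protein_1` and `protein_2` pass; the other two each fail one of the two criteria.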
957

Acceleration of Coevolution Detection for Predicting Protein Interactions

Rodionov, Alexandr 25 August 2011
Protein function is the ultimate expression of the genetic code of every organism, and determining which proteins interact helps reveal their functions. MatrixMatchMaker (MMM) is a computational method for predicting protein-protein interactions by detecting co-evolution between pairs of proteins. Although MMM has several advanced features compared with other co-evolution-based methods, these come at a high computational cost, so the goal of this research is to improve the performance of MMM. We first redefine the computational problem posed by the method and then develop a new algorithm to solve it, achieving a total speedup of 570x over the existing MMM algorithm on a biologically meaningful data set. We also develop hardware that has not yet further improved the performance of MMM but could serve as a platform for future gains.
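Co-evolution methods in this family start from the observation that interacting proteins tend to have similar inter-species distance matrices. The sketch below computes only the basic mirrortree-style score, a Pearson correlation between the upper triangles of two distance matrices, assuming both list the same species in the same order. MMM's additional features, which the abstract does not detail, are not modeled, and the function name is hypothetical.

```python
import math
from itertools import combinations


def mirrortree_score(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two n x n
    species-by-species distance matrices (same species order in both)."""
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))
    xs = [dist_a[i][j] for i, j in pairs]
    ys = [dist_b[i][j] for i, j in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A score near 1 means the two protein families' trees have similar shapes, the signal such methods treat as evidence of co-evolution; the expensive part addressed by MMM-style methods is searching over many such comparisons, not computing a single score.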
958

Pre-mRNA Architecture and Sequence Element Regulation of Alternative Splicing

Mueller, William F. 30 April 2013
Human genes are split into regions that code for protein (exons) and regions that do not (introns). Upon transcription, these intervening introns must be removed if a usable mRNA molecule is to be translated. The process of intron removal and subsequent ligation of exons is called splicing and is carried out by a large complex called the spliceosome. This process is driven by sequence elements within the pre-mRNA itself and is the major contributor to the diversity of the human transcriptome. Because alternative splicing occurs in almost every multi-exon gene, the pathways regulating exon inclusion are widely studied.
The lengths of introns and exons, as well as the locations of splice sites in a pre-mRNA molecule, have been shown to have differing effects on the spliceosome's ability to recognize splice sites. Using in vitro splicing and complex formation assays in parallel with cell transfection experiments, we determined that the distance between two splice sites, across the intron or across the exon, is a strong predictor of splice site usage. Additionally, we found that two splice sites interact differently when placed at different distances apart. Our findings suggest a mechanism for the observed selection of specific intron/exon architectures.
Splice site recognition is also influenced by protein-binding sequence elements in the pre-mRNA that alter spliceosomal recruitment. Previously, these proteins and sequence elements had been rigidly classified as splice enhancing or inhibiting. We show that this rigid classification is incorrect: the location of these elements relative to the splice site determines whether they enhance or silence. That is, an enhancing element found upstream of a splice site imposes a silencing effect when relocated downstream of the splice site (and vice versa).
Spliceosomal proteins are conserved from yeast to humans. The sequence elements in pre-mRNAs, however, have evolved over time under pressure from multiple cellular processes, including splicing. To observe the effect of splicing on evolution, we took advantage of synonymous positions, which are under the least selective pressure from the genetic code. We mutated these positions and found that some mutations caused a large decrease in exon inclusion. Comparative alignment data showed that these specific nucleotide mutations were selected against across species to maintain exon inclusion. SNP analysis showed that this pattern of selection was broadly observable at synonymous positions throughout the human genome.
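Synonymous positions under the least constraint from the genetic code are usually taken to be four-fold degenerate sites: third codon positions where any of the four bases encodes the same amino acid. The abstract does not say exactly which synonymous positions were mutated, so the sketch below only illustrates how such sites can be enumerated, assuming the standard genetic code and an in-frame coding sequence that starts at its first base.

```python
# First two codon bases whose third position is four-fold degenerate in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(cds):
    """Return 0-based indices of four-fold degenerate third-codon positions.

    An incomplete trailing codon is ignored.
    """
    cds = cds.upper()
    sites = []
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 2] in FOURFOLD_PREFIXES:
            sites.append(start + 2)
    return sites
```

For example, `fourfold_sites("ATGCTGAAAGGC")` returns `[5, 11]`, the third bases of the Leu (CTG) and Gly (GGC) codons; the Met (ATG) and Lys (AAA) codons have no four-fold degenerate position.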
959

Kernel Based Relevance Vector Machine for Classification of Diseases

Tcheimegni, Elie 21 May 2013
Motivated by the improvements in disease and cancer characterization that an ability to predict syndrome occurrence would bring, this work takes a data-driven approach to developing cancer classification/prediction models using the Relevance Vector Machine (RVM), a probabilistic kernel-based learning machine.
Drawing on the work of Bertrand Luvision and Chao Dong, and on S. Karpagachelvi's classification of electrocardiogram signals, which show the superiority of the RVM approach over traditional classifiers, this research addresses the design of a program that pipes components together in graphical workflows, with the aim of improving the classification/regression accuracy of two model structures (support vector machines and kernel-based relevance vector machines) for better prediction of related diseases, and then compares both methods on clinical data.
The central question is whether applying the relevance vector machine to classify these data improves their coverage. We developed a hierarchical Bayesian model for binary and bivariate data classification using RBF and sigmoid kernels with different parameterizations and varied thresholds. The parameters of the kernel function are treated as model parameters. The results lead us to conclude that RVM is nearly equal to SVM in training efficiency and classification accuracy, but that RVM performs better in sparsity, generalization ability, and decision speed.
The use of RVM also raises some issues, although it uses fewer support vectors and trains much faster than SVM-light for non-linear kernels. Finally, we test these approaches on a corpus of publicly released phenotype data. Further research with data from more patients is needed to improve prediction accuracy. Appendices provide the SVM and RVM derivations in detail. One important area of focus is the development of models for predicting cancers.
Keywords: Support Vector Machines, Relevance Vector Machine, Rapidminer, Tanagra, accuracy values.
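In an RVM, each training sample contributes one kernel basis function, and a separate prior precision on each weight lets most of them be pruned, leaving the sparse set of "relevance vectors" behind the sparsity noted above. The sketch below builds only the kernel design matrix for the RBF and sigmoid kernels mentioned in the abstract, using the usual layout of a bias column plus one column per training sample; the parameter names `gamma`, `scale` and `offset` are illustrative, and the Bayesian weight-estimation step is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x, z, scale, offset):
    """Sigmoid kernel tanh(scale * <x, z> + offset).

    Not positive semi-definite for all parameter values.
    """
    return math.tanh(scale * sum(a * b for a, b in zip(x, z)) + offset)


def design_matrix(samples, kernel):
    """RVM-style design matrix: row n is [1, k(x_n, x_1), ..., k(x_n, x_N)],
    i.e. a bias term plus one basis function centred on each training sample."""
    return [[1.0] + [kernel(x, centre) for centre in samples] for x in samples]


# Example with two one-dimensional samples and an RBF kernel (gamma = 1).
phi = design_matrix([[0.0], [1.0]], lambda x, z: rbf_kernel(x, z, 1.0))
```

Because the RVM does not require the kernel to satisfy Mercer's condition, a kernel such as the sigmoid can be used in the design matrix without the positive-definiteness caveats that apply to an SVM.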
960

Analysis and Visualization of Local Phylogenetic Structure within Species

Wang, Jeremy R. 03 July 2013
While it is interesting to examine the evolutionary history and phylogenetic relationships between species, for example in a "tree of life", there is also a great deal to be learned from examining population structure and relationships within species. A careful description of phylogenetic relationships within species provides insights into the causes of phenotypic variation, including disease susceptibility. The better we understand the patterns of genotypic variation within species, the better these populations can be used as models to identify causative variants and possible therapies, for example through targeted genome-wide association studies (GWAS). My thesis describes a model of local phylogenetic structure, how it can be effectively derived under various circumstances, and useful applications and visualizations of this model to aid genetic studies.
I introduce a method for discovering phylogenetic structure among individuals of a population by partitioning the genome into a minimal set of intervals within which there is no evidence of recombination. I describe two extensions of this basic method: the first allows it to be applied to heterozygous as well as homozygous genotypes, and the second makes it more robust to errors in the source genotypes.
I demonstrate the predictive power of my local phylogeny model using a novel method for genome-wide genotype imputation. This imputation method achieves very high accuracy, on the order of the accuracy rate of the sequencing technology, by imputing genotypes in regions of shared inheritance based on my local phylogenies.
Comparative genomic analysis within species can be greatly aided by appropriate visualization and analysis tools. I developed a framework for web-based visualization and analysis of multiple individuals within a species, with my model of local phylogeny providing the underlying structure. I describe the utility of these tools and the applications for which they have found widespread use.
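A standard way to find intervals "within which there is no evidence of recombination" is the four-gamete test: for two biallelic sites, seeing all four allele combinations among haplotypes implies, under the infinite-sites assumption, at least one recombination between them. The sketch below assumes homozygous, error-free genotypes coded 0/1 and partitions sites greedily; because any sub-interval of a compatible interval is itself compatible, extending each interval as far as possible yields the fewest intervals. It illustrates the basic idea only and is not necessarily the exact criterion, or either extension, described in the thesis.

```python
def four_gamete_compatible(site_a, site_b):
    """True if two biallelic sites (one 0/1 allele per haplotype) show at most
    three of the four gametes 00, 01, 10, 11."""
    return len(set(zip(site_a, site_b))) < 4


def partition_intervals(sites):
    """Split ordered sites into the fewest intervals in which every pair of
    sites passes the four-gamete test.

    sites[j][h] is the allele (0/1) of haplotype h at site j.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    intervals = []
    start = 0
    for j in range(1, len(sites)):
        # Close the current interval if site j conflicts with any site in it.
        if not all(four_gamete_compatible(sites[i], sites[j])
                   for i in range(start, j)):
            intervals.append((start, j))
            start = j
    if sites:
        intervals.append((start, len(sites)))
    return intervals
```

Within each returned interval the haplotypes are consistent with a single tree, which is what makes a per-interval ("local") phylogeny well defined under these assumptions.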
