  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

LDA-based dimensionality reduction and domain adaptation with application to DNA sequence classification

Mungre, Surbhi January 1900
Master of Science / Department of Computing and Information Sciences / Doina Caragea / Several computational biology and bioinformatics problems involve DNA sequence classification using supervised machine learning algorithms. The performance of these algorithms depends largely on the availability of labeled data and on the approach used to represent DNA sequences as feature vectors. For many organisms, labeled DNA data is scarce, while unlabeled data is easily available. However, for a small number of well-studied model organisms, large amounts of labeled data are available. This calls for domain adaptation approaches, which can transfer knowledge from a source domain, for which labeled data is available, to a target domain, for which large amounts of unlabeled data are available. Intuitively, one approach to domain adaptation can be obtained by extracting and representing the features that the source domain and the target domain sequences share. Latent Dirichlet Allocation (LDA) is an unsupervised dimensionality reduction technique that has been successfully used to generate features for sequence data such as text. In this work, we explore the use of LDA for generating predictive DNA sequence features that can be used in both supervised and domain adaptation frameworks. More precisely, we propose two dimensionality reduction approaches for DNA sequences, LDA Words (LDAW) and LDA Distribution (LDAD). LDA is a generative probabilistic model used to model collections of discrete data such as document collections. For our problem, a sequence is considered to be a "document" and the k-mers obtained from a sequence are the "document words". We use LDA to model our sequence collection. Given the LDA model, each document can be represented as a distribution over topics (where a topic can be seen as a distribution over k-mers).
In the LDAW method, we use the top k-mers in each topic (i.e., the k-mers with the highest probability) as our features, while in the LDAD method, we use the topic distribution to represent a document as a feature vector. We study LDA-based dimensionality reduction approaches for both supervised DNA sequence classification and domain adaptation. We apply the proposed approaches to the splice site prediction problem, which is an important DNA sequence classification problem in the context of genome annotation. In the supervised learning framework, we study the effectiveness of the LDAW and LDAD methods by comparing them with a traditional dimensionality reduction technique based on the information gain criterion. In the domain adaptation framework, we study the effect of increasing the evolutionary distance between the source and target organisms, and the effect of using different weights when combining labeled data from the source domain with labeled data from the target domain. Experimental results show that LDA-based features can be successfully used to perform dimensionality reduction and domain adaptation for DNA sequence classification problems.
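The sequence-as-document idea in this abstract can be illustrated with a minimal Python sketch. It only shows how a DNA sequence becomes a bag of k-mers and how an LDAW-style feature set could be read off topic-word distributions. The function names and the toy `TOY_TOPICS` table are assumptions made for illustration. They are not taken from the thesis, and no LDA model is actually fitted here.

```python
from collections import Counter

def kmers(sequence, k):
    """Return all overlapping k-mers of a DNA sequence (the 'document words')."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def bag_of_kmers(sequence, k):
    """Count k-mers, giving the bag-of-words view that a topic model consumes."""
    return Counter(kmers(sequence, k))

def top_kmers_per_topic(topic_word, n_top):
    """LDAW-style selection: union of the n_top most probable k-mers per topic.

    topic_word maps topic id -> {k-mer: probability}. It stands in for a
    fitted LDA model, which this sketch does not train.
    """
    selected = set()
    for dist in topic_word.values():
        # Sort by descending probability, breaking ties alphabetically.
        ranked = sorted(dist, key=lambda w: (-dist[w], w))
        selected.update(ranked[:n_top])
    return sorted(selected)

# Hypothetical toy topic-word distributions (illustrative values only).
TOY_TOPICS = {
    0: {"GTA": 0.5, "AGG": 0.3, "TTT": 0.2},
    1: {"CAG": 0.6, "TTT": 0.3, "GTA": 0.1},
}

doc = bag_of_kmers("AGGTAAGT", 3)
features = top_kmers_per_topic(TOY_TOPICS, 2)
vector = [doc[f] for f in features]  # counts of the selected k-mers in the document
```

In the LDAD variant the feature vector would instead be the document's inferred topic proportions, which requires a trained model and is therefore left out of this sketch.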
2

IN VITRO AND IN VIVO CHARACTERIZATION OF A TRANS EXCISION-SPLICING RIBOZYME

Baum, Dana Ann 01 January 2005
Group I introns are catalytic RNAs with the ability to splice out of RNA transcripts, often without the aid of proteins. These self-splicing introns have been reengineered to create ribozymes with the ability to catalyze reactions. One such ribozyme, derived from a Pneumocystis carinii group I intron, has been engineered to sequence-specifically remove a targeted segment from within an RNA substrate, a process called the trans excision-splicing reaction. The two catalytic steps of the trans excision-splicing reaction occur at positions on the substrate known as the 5' and 3' splice sites. Strict sequence requirements at these sites could potentially limit the target choices for the trans excision-splicing ribozyme, so the sixteen possible base pair combinations at the 5' splice site and the four possible nucleotides at the 3' splice site were tested for reactivity. All base pair combinations at the 5' splice site allow the first reaction step (5' hydrolysis) to occur and several combinations allow the second step to occur, resulting in trans excision-splicing product formation. Moreover, we found that non-Watson-Crick base pairs are important for 5' splice site recognition and prevent product degradation via hydrolysis at other sequence positions. The sequence requirement at the 3' splice site is absolute, as guanosine alone produced complete product. To date, the experiments with the trans excision-splicing ribozyme have been conducted in vitro. The further development of this ribozyme as a biochemical tool and as a potential therapeutic agent requires in vivo reactivity. Thus, a prokaryotic system was designed and tested to assess the catalytic potential of the trans excision-splicing ribozyme. We show that the ribozyme successfully excised a single, targeted nucleotide from a mutated green fluorescent protein transcript in Escherichia coli.
On average, 12% correction was observed as measured by fluorescence and approximately 1.2% correction was confirmed through sequence analysis of isolated transcripts. We have used these studies to further characterize trans excision-splicing ribozymes in vitro and to pave the way for future development of this ribozyme reaction in vivo. These results increase our understanding of this ribozyme and advance this reaction as a biochemical tool with potential therapeutic applications.
3

Domain adaptation algorithms for biological sequence classification

Herndon, Nic January 1900
Doctor of Philosophy / Department of Computing and Information Sciences / Doina Caragea / The large volume of data generated in recent years has created opportunities for discoveries in various fields. In biology, next-generation sequencing technologies determine the exact order of nucleotides present within a DNA or RNA fragment faster and more cheaply than before. This large volume of data requires the use of automated tools to extract information and generate knowledge. Machine learning classification algorithms provide an automated means to annotate data but require some of these data to be manually labeled by human experts, a process that is costly and time consuming. An alternative to labeling data is to use existing labeled data from a related domain, the source domain, if any such data is available, to train a classifier for the domain of interest, the target domain. However, the classification accuracy usually decreases for the domain of interest as the distance between the source and target domains increases. Another alternative is to label some data, complement it with abundant unlabeled data from the same domain, and train a semi-supervised classifier, although the unlabeled data can mislead such a classifier. In this work another alternative is considered, domain adaptation, in which the goal is to train an accurate classifier for a domain with limited labeled data and abundant unlabeled data, the target domain, by leveraging labeled data from a related domain, the source domain. Several domain adaptation classifiers are proposed, derived from a supervised discriminative classifier (logistic regression) or a supervised generative classifier (naïve Bayes), and some of the factors that influence their accuracy are studied: features, data used from the source domain, how to incorporate the unlabeled data, and how to combine all available data.
The proposed approaches were evaluated on two biological problems -- protein localization and ab initio splice site prediction. The former is motivated by the fact that predicting where a protein is localized provides an indication for its function, whereas the latter is an essential step in gene prediction.
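One of the factors named above, how to combine source and target data, can be sketched with a small weighted naïve Bayes classifier in Python. This is only an illustration under stated assumptions: source examples are down-weighted by a single hypothetical `source_weight`, unlabeled data is ignored, and the feature names and toy examples are invented. The abstract does not describe the thesis's actual algorithms at this level of detail.

```python
import math
from collections import defaultdict

def train_weighted_nb(source, target, source_weight, alphabet):
    """Train multinomial naive Bayes on labeled source + target examples.

    Each example is (list_of_features, label). A source example counts
    source_weight times as much as a target example (weight 1.0).
    """
    class_mass = defaultdict(float)
    feat_mass = defaultdict(lambda: defaultdict(float))
    for examples, weight in ((source, source_weight), (target, 1.0)):
        for feats, label in examples:
            class_mass[label] += weight
            for f in feats:
                feat_mass[label][f] += weight
    total = sum(class_mass.values())
    model = {}
    for label, mass in class_mass.items():
        # Laplace smoothing: add one pseudo-count per feature in the alphabet.
        denom = sum(feat_mass[label].values()) + len(alphabet)
        log_lik = {f: math.log((feat_mass[label][f] + 1.0) / denom) for f in alphabet}
        model[label] = (math.log(mass / total), log_lik)
    return model

def predict(model, feats):
    """Return the label with the highest log-posterior score."""
    def score(label):
        prior, log_lik = model[label]
        return prior + sum(log_lik[f] for f in feats if f in log_lik)
    return max(sorted(model), key=score)

# Hypothetical toy data: dinucleotide features, label 1 = splice site.
ALPHABET = ["GT", "AG", "CC", "AT", "GC"]
SOURCE = [(["GT", "AG"], 1), (["CC", "AT"], 0)]
TARGET = [(["GT", "GC"], 1), (["AT", "AT"], 0)]

model = train_weighted_nb(SOURCE, TARGET, source_weight=0.5, alphabet=ALPHABET)
```

Setting `source_weight` near 0 ignores the source domain and setting it to 1 pools both domains equally, so intermediate values are one simple way to express how far the source organism is trusted.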
4

Vliv sekvencí intronů na efektivitu sestřihu v Saccharomyces cerevisiae. / The influence of intron sequences on splicing effectivity in Saccharomyces cerevisiae

Oplová, Michaela January 2015
Pre-mRNA splicing is a highly regulated cellular process. The tight cooperation of the spliceosome and other splicing factors that enable the interpretation of pre-mRNA cis-elements results in precise regulation of pre-mRNA splicing. Short conserved splicing sequences within introns represent an elementary and indispensable element for intron removal from the primary transcript, yet they are not sufficient signals for efficient splicing events. Additional pre-mRNA features affect complex splicing regulation. We took advantage of strains with a slightly disrupted spliceosome (prp45(1-169)) to study the effect of ACT1 and MAF1 intronic sequences on splicing efficiency. Here we show that the ACT1 intron region between the branch point (BP) and the 3' splice site (3'ss) maintains splicing efficiency in mutant cells. However, the specific element within this region was not determined. In addition, the results implicate an alternative BP in the modulation of splicing efficiency in the yeast Saccharomyces cerevisiae. Interestingly, this alternative BP is located in the ACT1 intron outside of the BP-3'ss region. Furthermore, splicing factors with potential influence on 3'ss selection were studied. A heterodimer composed of Slu7p and Prp18p participates in positioning the 3'ss at the active site of the spliceosome. Splicing analysis of substrates with two...
5

Understanding the Noise : Spliceosomal snRNA Profiling

Conze, Lei Liu January 2012
The concept of the gene has been constantly challenged by new discoveries in the life sciences. Recent challenging observations include the high frequency of alternative splicing events and the common transcription of non-protein-coding RNAs (ncRNAs) from the genome. The latter has long been considered noise in biological systems. Multiple lines of evidence from genomic studies indicate that alternative splicing and ncRNAs play important roles in expanding proteome diversity in eukaryotes. Here, the aim is to find the link between alternative splicing and ncRNAs by studying the expression profile of the spliceosomal snRNAs (U snRNAs). Spliceosomal snRNAs are essential for pre-mRNA splicing in eukaryotes. They participate in splice site selection, recruitment of protein factors and catalysis of the splicing reaction. Because of this, both the abundance and diversity of U snRNAs were expected to be large. In our study we analyzed the U snRNA population in primates in depth using a combination of bioinformatical, biochemical and high-throughput sequencing approaches. This transcriptome profiling has revealed that human, chimpanzee and rhesus have similar U snRNA populations, i.e. the vast majority of U snRNAs originate from a few well-defined gene loci, and the heterogeneity observed in U snRNA populations was largely due to the presence of SNPs at these loci. It seems that the gene loci that could potentially encode a significantly heterogeneous population of U snRNAs are mostly silent. Only a few minority transcripts were detected in our study, and among them three U1-like snRNAs might play a role in the regulation of alternative splicing by recognizing non-canonical splice sites. Mutations of U snRNAs have been shown to impact the splicing process. Therefore, our study provides a reference for studying the biological significance of SNPs in U snRNA genes and their association with diseases.
6

Computational Approaches to the Identification and Characterization of Non-Coding RNA Genes

Larsson, Pontus January 2009
Non-coding RNAs (ncRNAs) have emerged as highly diverse and powerful key players in the cell, the range of capabilities spanning from catalyzing essential processes in all living organisms, e.g. protein synthesis, to being highly specific regulators of gene expression. To fully understand the functional significance of ncRNAs, it is of critical importance to identify and characterize the repertoire of ncRNAs in the cell. Practically every genome-wide screen to identify ncRNAs has revealed large numbers of expressed ncRNAs and often identified species-specific ncRNA families of unknown function. Recent years' advancement in high-throughput sequencing techniques necessitates efficient and reliable methods for computational identification and annotation of genes. A major aim in the work underlying this thesis has been to develop and use computational tools for the identification and characterization of ncRNA genes. We used computational approaches in combination with experimental methods to study the ncRNA repertoire of the model organism Dictyostelium discoideum. We report ncRNA genes belonging to well-characterized gene families as well as previously unknown and potentially species-specific ncRNA families. The complicated task of de novo ncRNA gene prediction was successfully addressed by developing a method for nucleotide composition-based gene prediction using maximal-scoring partial sums and considering overlapping dinucleotides. We also report a substantial heterogeneity among human spliceosomal snRNAs. Northern blot analysis and cDNA cloning, as well as bioinformatical analysis of publicly available microarray data, revealed a large number of expressed snRNAs. In particular, U1 snRNA variants with several nucleotide substitutions that could potentially have dramatic effects on splice site recognition were identified. 
In conclusion, by combining computational approaches with experimental analysis, we have identified a rich and diverse ncRNA repertoire in the eukaryotes D. discoideum and Homo sapiens. The surprising diversity among the snRNAs in H. sapiens suggests a functional involvement in recognition of non-canonical introns and regulation of messenger RNA splicing.
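The composition-based gene prediction mentioned in this abstract (maximal-scoring partial sums over overlapping dinucleotides) can be sketched in Python as follows. This is a simplified illustration and not the thesis's method. It finds only the single best segment using Kadane's algorithm, and the dinucleotide scores in `TOY_SCORES` are hypothetical stand-ins for log-odds values that would normally be estimated from known ncRNA genes and background sequence.

```python
def dinucleotides(seq):
    """Overlapping dinucleotides: 'ACGT' -> ['AC', 'CG', 'GT']."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]

def max_scoring_segment(scores):
    """Kadane's algorithm over a list of per-position scores.

    Returns (best_sum, start, end) for the highest-scoring contiguous run,
    with end exclusive, or (0.0, 0, 0) when no run has a positive sum.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        if running <= 0.0:
            # A non-positive prefix can never help, so restart here.
            running, start = 0.0, i
        running += s
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end

# Hypothetical scores: reward G/C dinucleotides, penalize everything else.
TOY_SCORES = {"GC": 1.0, "CG": 1.0, "GG": 1.0, "CC": 1.0}

seq = "ATATGCGCGCATAT"
scores = [TOY_SCORES.get(d, -1.0) for d in dinucleotides(seq)]
best, start, end = max_scoring_segment(scores)
# Dinucleotide i covers bases i and i+1, so the segment spans seq[start:end + 1].
region = seq[start:end + 1]
```

A full predictor would report every maximal-scoring segment above a threshold rather than just the best one.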
7

MOLECULAR RECOGNITION PROPERTIES AND KINETIC CHARACTERIZATION OF TRANS EXCISION-SPLICING REACTION CATALYZED BY A GROUP I INTRON-DERIVED RIBOZYME

Sinha, Joy 01 January 2006
Group I introns belong to a class of large RNAs that catalyze their own excision from precursor RNA through a two-step process called the self-splicing reaction. These self-splicing introns have often been converted into ribozymes with the ability to site-specifically cleave RNA molecules. One such ribozyme, derived from a self-splicing Pneumocystis carinii group I intron, has subsequently been shown to sequence-specifically excise a segment from an exogenous RNA transcript through the trans excision-splicing reaction. The trans excision-splicing reaction requires that the substrate be cleaved at two positions called the 5' and 3' splice sites. The sequence requirements at these splice sites were studied. All sixteen possible base pair combinations at the 5' splice site and the four possible nucleotides at the 3' splice site were tested for reactivity. It was found that all base pair combinations at the 5' splice site allow the first reaction step and seven out of sixteen combinations allow the second step to occur. Moreover, it was also found that non-Watson-Crick base pairs are important for 5' splice site recognition and suppress cryptic splicing. In contrast to the 5' splice site, the 3' splice site absolutely requires a guanosine. The pathway of the trans excision-splicing reaction is poorly understood. Therefore, as an initial approach, a kinetic framework for the first step (5' cleavage) was established. The framework revealed that the substrate binds at a rate expected for RNA-RNA helix formation. The substrate dissociates with a rate constant (0.9 min⁻¹) similar to that for substrate cleavage (3.9 min⁻¹). Following cleavage, product dissociation is slower than cleavage, making this step rate limiting for multiple-turnover reactions. Furthermore, evidence suggests that the P10 helix forms after the 5' cleavage step and that a conformational change occurs between the two reaction steps of the trans excision-splicing reaction.
Combining the data presented herein with prior knowledge of RNA catalysis provides a much more detailed view of the second step of the trans excision-splicing reaction. These studies further characterize the trans excision-splicing reaction in vitro and provide insight into its reaction pathway. In addition, the results describe the limits of the trans excision-splicing reaction and suggest how key steps can be targeted for improvement using a rational ribozyme design approach.
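The two rate constants quoted in this abstract can be put in context with a short Python calculation. It assumes a simple branching scheme in which bound substrate either dissociates or is irreversibly cleaved. That scheme is an assumption made for illustration and may not match the full kinetic framework of the thesis. The same sketch also enumerates the sixteen base pair combinations tested at the 5' splice site.

```python
import math
from itertools import product

K_OFF = 0.9     # substrate dissociation rate constant, min^-1 (from the abstract)
K_CLEAVE = 3.9  # 5' cleavage rate constant, min^-1 (from the abstract)

def fraction_cleaved(k_cleave, k_off):
    """Probability that a bound substrate is cleaved rather than released."""
    return k_cleave / (k_cleave + k_off)

def bound_half_life(k_cleave, k_off):
    """Half-life (min) of the ribozyme-substrate complex with both exits open."""
    return math.log(2) / (k_cleave + k_off)

commitment = fraction_cleaved(K_CLEAVE, K_OFF)
half_life = bound_half_life(K_CLEAVE, K_OFF)

# Every ordered pairing of the four RNA nucleotides: 4 x 4 = 16 combinations.
base_pairs = ["".join(p) for p in product("ACGU", repeat=2)]
```

Under this assumed scheme roughly four out of five binding events would end in cleavage, which is what comparable dissociation and cleavage rate constants imply.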
8

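Caracterização molecular (PCR) e infecção de Metarhizium anisopliae var. acridum e Metarhizium anisopliae em Zaprionus indianus / Molecular characterization (PCR) and infection of Metarhizium anisopliae var. acridum and Metarhizium anisopliae in Zaprionus indianus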

LEÃO, Mariele Porto Carneiro January 2006
Conselho Nacional de Desenvolvimento Científico e Tecnológico / The strains Metarhizium anisopliae var. acridum and Metarhizium anisopliae var. anisopliae were analyzed for pathogenicity against Zaprionus indianus, using concentrations of 10^4, 10^5, 10^6, 10^7 and 10^8 conidia/mL and considering the percentage of adult emergence. With the methodology employed, both strains showed activity against Z. indianus. The molecular markers ITS (Internal Transcribed Spacer) of the rDNA, intron splice site primer and microsatellite (SSR, Simple Sequence Repeats) were used to assess the genetic diversity between the strains before and after passage through the fly. Cluster analysis using the UPGMA method, based on the genetic distances from the molecular markers, confirmed the genetic diversity recognized in the genus Metarhizium. The microsatellite (GTG)5 and the nuclear mRNA-group intron had the same sensitivity in detecting genetic variability among the Metarhizium strains. The amplification products of the ITS1-5.8S-ITS2 loci of the rDNA obtained with primers ITS4 and ITS5 were effective in demonstrating that the strains studied belong to the species Metarhizium anisopliae, despite the genetic diversity shown by the markers (GTG)5 and EI1. The amplification profiles of the microsatellite, intron and ITS regions after passage through Z. indianus confirmed that the re-isolated strains were the same as those used to infect
9

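Caracterização molecular de espécies de Metarhizium e patogenicidade sobre Diatraea saccharalis / Molecular characterization of Metarhizium species and pathogenicity against Diatraea saccharalis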

LIMA, Maria do Livramento Ferreira January 2005
Fifteen Metarhizium strains isolated from different regions and hosts were analyzed for genetic characteristics, and seven strains for pathogenicity against Diatraea saccharalis. The molecular markers ITS (Internal Transcribed Spacer) of the rDNA, intron splice site primer, RAPD and microsatellites (SSR, Simple Sequence Repeats) were used to assess the genetic diversity among the strains. Cluster analysis using the UPGMA method, based on the genetic distances from the four molecular markers, confirmed the genetic diversity recognized in the genus Metarhizium. The restriction enzymes HaeIII and MspI revealed genetic diversity among the strains when digesting the amplification products of the ITS1-5.8S-ITS2 locus of the rDNA obtained with primers ITS4 and ITS5, whereas the enzyme DraI showed no restriction sites. The nuclear mRNA-group introns discriminated the Metarhizium strains only when primer EI1 was used. The RAPD technique and the microsatellite regions were effective in demonstrating the diversity among the strains. However, the microsatellite (GACA)4 was more sensitive in detecting intra- and interspecific variability among the different Metarhizium strains. There was no correlation between groups and geographic regions. Strains 4415, 4400 and 4897 caused the highest percentage of mortality of Diatraea saccharalis larvae. There was also no correlation between the clusters generated by the molecular techniques and the percentage of mortality of D. saccharalis larvae
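The UPGMA cluster analysis referred to in this abstract can be sketched in a few lines of Python. The sketch assumes that marker results are scored as presence/absence band profiles and turned into Jaccard distances. That choice of distance, the strain labels S1 to S3 and the band sets are all hypothetical, since the abstract does not say how the genetic distances were computed.

```python
from itertools import combinations

def jaccard_distance(bands_a, bands_b):
    """1 - |shared bands| / |all bands| for two presence/absence profiles."""
    union = bands_a | bands_b
    return 1.0 - len(bands_a & bands_b) / len(union) if union else 0.0

def average_distance(a, b, dist):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where height is half the average distance between the merged clusters.
    """
    clusters = [(n,) for n in sorted(names)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], dist))
        height = average_distance(a, b, dist) / 2.0
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges

# Hypothetical band profiles for three illustrative strains.
PROFILES = {"S1": {1, 2, 3, 4}, "S2": {1, 2, 3, 5}, "S3": {6, 7, 8}}
DIST = {frozenset((x, y)): jaccard_distance(PROFILES[x], PROFILES[y])
        for x, y in combinations(PROFILES, 2)}

merges = upgma(list(PROFILES), DIST)
```

Here S1 and S2 share three of five bands and join first, and S3, which shares no bands with either, joins last at the maximum height.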
10

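Análise da diversidade genética através de marcadores moleculares e características citomorfológicas de Colletotrichum gloeosporioides / Analysis of genetic diversity using molecular markers and cytomorphological characteristics of Colletotrichum gloeosporioides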

SOUSA, Adna Cristina Barbosa de January 2004
Twenty strains of C. gloeosporioides were analyzed for genetic and cytomorphological characteristics. The molecular markers RAPD, microsatellites, intron splice site primer and the ITS region of ribosomal DNA were used to assess the genetic diversity among the strains. Cluster analysis by the UPGMA method confirmed the intraspecific genetic diversity recognized in C. gloeosporioides. The RAPD technique detected greater genetic similarity among the strains. The microsatellite regions investigated showed high genetic polymorphism, and the introns discriminated all strains with primer EI-1 alone and revealed greater genetic diversity than the other molecular markers used. The three restriction enzymes tested, HaeIII, DraI and MspI, revealed genetic diversity among the strains in the amplification products of the ITS1-5.8S-ITS2 loci of the rDNA obtained with primers ITS1 and ITS4. All markers employed were effective in demonstrating the high degree of genetic polymorphism, evidenced by the formation of highly diversified groups with no correlation among hosts. The macroscopic features showed variation in color, texture and sector segregation in the colonies, and the microscopic observations showed the formation of vegetative and reproductive structures characteristic of the species. The nuclear condition, investigated by the HCl-Giemsa technique, showed 100% uninucleate conidia
