Global ETD Search

201	Marker extractions in DNA sequences using sub-sequence segmentation tree. January 2005 (has links) Hung Wah Johnson. / Thesis submitted in: August 2004. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 116-121). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Problem Statement --- p.3 / Chapter 1.3 --- Outline of the thesis --- p.6 / Chapter 2 --- Background --- p.8 / Chapter 2.1 --- Biological Background --- p.8 / Chapter 2.2 --- Sequence Alignments --- p.9 / Chapter 2.2.1 --- Pairwise Sequences Alignment --- p.11 / Chapter 2.2.2 --- Multiple Sequences Alignment --- p.15 / Chapter 2.3 --- Neighbor Joining Tree --- p.16 / Chapter 2.4 --- Marker Extractions --- p.18 / Chapter 2.5 --- Neural Network --- p.19 / Chapter 2.6 --- Conclusion --- p.22 / Chapter 3 --- Related Work --- p.23 / Chapter 3.1 --- FASTA --- p.23 / Chapter 3.2 --- Suffix Tree --- p.25 / Chapter 4 --- Sub-Sequence Segmentation Tree --- p.28 / Chapter 4.1 --- Introduction --- p.28 / Chapter 4.2 --- Problem Statement --- p.29 / Chapter 4.3 --- Design --- p.33 / Chapter 4.4 --- Time and space complexity analysis --- p.38 / Chapter 4.4.1 --- Performance Evaluation --- p.40 / Chapter 4.5 --- Summary --- p.48 / Chapter 5 --- Applications: Global Sequences Alignment --- p.51 / Chapter 5.1 --- Introduction --- p.51 / Chapter 5.2 --- Problem Statement --- p.53 / Chapter 5.3 --- Pairwise Alignment --- p.53 / Chapter 5.3.1 --- Algorithm --- p.53 / Chapter 5.3.2 --- Time and Space Complexity Analysis --- p.64 / Chapter 5.4 --- Multiple Sequences Alignment --- p.67 / Chapter 5.4.1 --- The Clustalw Algorithm --- p.68 / Chapter 5.4.2 --- MSA Using SSST --- p.70 / Chapter 5.4.3 --- Time and Space Complexity Analysis --- p.70 / Chapter 5.5 --- Experiments --- p.71 / Chapter 5.5.1 --- Experiment Setting --- p.72 / Chapter 5.5.2 --- 
Experimental Results --- p.72 / Chapter 5.6 --- Summary --- p.80 / Chapter 6 --- Applications: Marker Extractions --- p.81 / Chapter 6.1 --- Introduction --- p.81 / Chapter 6.2 --- Problem Statement --- p.82 / Chapter 6.3 --- The Multiple Sequence Alignment Approach --- p.85 / Chapter 6.3.1 --- Design --- p.85 / Chapter 6.4 --- Reference Sequence Alignment Approach --- p.88 / Chapter 6.4.1 --- Design --- p.90 / Chapter 6.5 --- Time and Space Complexity Analysis --- p.95 / Chapter 6.6 --- Experiments --- p.95 / Chapter 6.7 --- Summary --- p.99 / Chapter 7 --- HBV Application Framework --- p.101 / Chapter 7.1 --- Motivations --- p.101 / Chapter 7.2 --- The Procedure Flow of the Application --- p.102 / Chapter 7.2.1 --- Markers Extractions --- p.103 / Chapter 7.2.2 --- Rules Training and Prediction --- p.103 / Chapter 7.3 --- Results --- p.105 / Chapter 7.3.1 --- Clustering --- p.106 / Chapter 7.3.2 --- Classification --- p.107 / Chapter 7.4 --- Summary --- p.110 / Chapter 8 --- Conclusions --- p.112 / Chapter 8.1 --- Contributions --- p.112 / Chapter 8.2 --- Future Works --- p.114 / Chapter 8.2.1 --- HMM Learning --- p.114 / Chapter 8.2.2 --- Splice Sites Learning --- p.114 / Chapter 8.2.3 --- Faster Algorithm for Multiple Sequences Alignment --- p.115 / Bibliography --- p.121 Nucleotide sequence--Methodology Hepatitis B virus Sequence Analysis, DNA--methods Sequence Alignment--methods Hepatitis B virus
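202	Bioinformatics analyses for next-generation sequencing of plasma DNA. January 2012 (has links) Abstract (translated from the Chinese): In 1997, Dennis et al. demonstrated that fetal DNA is present in the maternal circulation of pregnant women, opening the door to noninvasive prenatal diagnosis. Initial applications included sex determination and identification of the rhesus blood group system. With the emergence and development of second-generation sequencing, more mature analyses and applications of cell-free DNA in peripheral blood followed. For example, at twelve weeks of pregnancy, second-generation sequencing of maternal plasma DNA can predict whether fetal chromosome 21 is trisomic with an accuracy of 98%. The first part of this thesis describes how to use maternal plasma DNA to construct a genome-wide genetic map of the fetus. This research is highly challenging because, at 12 weeks of gestation, the fetal contribution to plasma DNA is small, mostly around 10%, and most fetal DNA in plasma is shorter than 200 bp. Current algorithms and programs are not suited to constructing the fetal genetic map from maternal plasma DNA. In this study, the possible fetal genetic maps are first constructed bioinformatically from the genotypes of the mother and the father, and the sequencing data from maternal plasma DNA are then aligned to these candidate maps. Where the mother is homozygous, determining the paternally inherited segments only requires qualitatively detecting whether the father's specific segments are present in maternal plasma. Where the mother is heterozygous, determining maternal inheritance requires quantitative analysis. I developed the relative haplotype dosage analysis scheme, which statistically assesses the relative dosage of the two maternal haplotypes in maternal plasma; the significantly over-represented haplotype is the one most likely inherited by the fetus. Relative haplotype dosage analysis improves the efficiency of using sequencing information, reduces fluctuation in the sequencing data, and is more stable and robust than single-locus analysis. / With the advent of target-enrichment sequencing, sequencing costs have fallen sharply. In the first part, the combination of maternal and paternal genotypes at polymorphic sites, together with sequencing information, is used to calculate the fetal DNA concentration in maternal plasma. The limitation of that method is that it requires the parental genotypes at polymorphic sites and cannot infer the fetal DNA concentration directly from sequencing information. In the second part of this thesis, I developed a binomial mixture model that predicts the fetal DNA concentration in maternal plasma directly. The fetal DNA concentration is optimally estimated when the likelihood of the mixture model is maximized. Because target-enrichment sequencing provides high-coverage data, it becomes possible to identify directly, with a probabilistic model, the informative sites where the mother is homozygous and the fetus is heterozygous. / Besides the analysis of maternal plasma DNA at the DNA level, epigenetic analysis should not be overlooked in advancing noninvasive prenatal diagnosis. In the third part of this thesis, I developed the Methyl-Pipe software, dedicated to genome-wide methylation analysis. Methylation sequencing data are more complex to analyze than general genome sequencing data. In bisulfite sequencing libraries, unmethylated cytosines are converted to uracil and ultimately appear as thymine in the PCR products, whereas methylated cytosines remain unchanged. Therefore, to align bisulfite-treated reads to the reference genome, all cytosines in the Watson and Crick strands of the reference genome are first converted to thymines, and cytosines in the reads are likewise converted to thymines. The converted reads are then aligned to the reference genome. Finally, genome-wide methylation levels and specific methylation patterns are inferred from the cytosine and thymine content of the aligned reads. Methyl-Pipe can identify genomic regions with significantly different methylation levels, so it can be used to identify potential fetal-specific methylation sites for noninvasive prenatal diagnosis. / The presence of fetal DNA in the cell-free plasma of pregnant women was first described in 1997. The initial clinical applications of this phenomenon focused on the detection of paternally inherited traits such as sex and rhesus D blood group status. The development of massively parallel sequencing technologies has allowed more sophisticated analyses on circulating cell-free DNA in maternal plasma. For example, through the determination of the proportional representation of chromosome 21 sequences in maternal plasma, noninvasive prenatal diagnosis of fetal Down syndrome can be achieved with an accuracy of >98%. 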
In the first part of my thesis, I have developed bioinformatics algorithms to perform genome-wide construction of the fetal genetic map from the massively parallel sequencing data of the maternal plasma DNA sample of a pregnant woman. The construction of the fetal genetic map from maternal plasma sequencing data is very challenging because fetal DNA constitutes only approximately 10% of the maternal plasma DNA. Moreover, as the fetal DNA in maternal plasma exists as short fragments of less than 200 bp, existing bioinformatics techniques for genome construction are not applicable for this purpose. For the construction of the genome-wide fetal genetic map, I have used the genomes of the father and the mother as scaffolds and calculated the fractional fetal DNA concentration. First, I looked at the paternal-specific sequences in maternal plasma to determine which portions of the father’s genome had been passed on to the fetus. For the determination of the maternal inheritance, I have developed the Relative Haplotype Dosage (RHDO) approach. This method is based on the principle that the portion of the maternal genome inherited by the fetus would be present in slightly higher concentration in the maternal plasma. The use of haplotype information can enhance the efficacy of using the sequencing data. Thus, the maternal inheritance can be determined with a much lower sequencing depth than by looking at individual loci in the genome. This algorithm makes it feasible to use genome-wide scanning to diagnose fetal genetic disorders prenatally in a noninvasive way. / With the emergence of targeted massively parallel sequencing, the sequencing cost per base is falling dramatically. 
Even though the first part of the thesis has already developed a method to estimate fractional fetal DNA concentration using parental genotype information, it still cannot be used to deduce the fractional fetal DNA concentration directly from sequencing data without prior knowledge of genotype information. In the second part of this thesis, I propose a statistical mixture-model-based method, FetalQuant, which uses maximum likelihood to estimate the fractional fetal DNA concentration directly from targeted massively parallel sequencing of maternal plasma DNA. This method improves on existing methods by obviating the need for genotype information without loss of accuracy. Furthermore, by using Bayes’ rule, this method can distinguish the informative SNPs where the mother is homozygous and the fetus is heterozygous, which has the potential to detect dominantly inherited disorders. / Besides the genetic analysis at the DNA level, epigenetic markers are also valuable for noninvasive diagnosis development. In the third part of this thesis, I have also developed a bioinformatics algorithm to efficiently analyze genome-wide DNA methylation status based on the massively parallel sequencing of bisulfite-converted DNA. DNA methylation is one of the most important mechanisms for regulating gene expression. The study of DNA methylation for different genes is important for the understanding of different physiological and pathological processes. Currently, the most popular method for analyzing DNA methylation status is bisulfite sequencing. The principle of this method is that unmethylated cytosine residues are chemically converted to uracil on bisulfite treatment whereas methylated cytosines remain unchanged. The converted uracil and unconverted cytosine can then be discriminated on sequencing. 
With the emergence of massively parallel sequencing platforms, it is possible to perform this bisulfite sequencing analysis on a genome-wide scale. However, the bioinformatics analysis of genome-wide bisulfite sequencing data is much more complicated than analyzing data from individual loci. Thus, I have developed Methyl-Pipe, a bioinformatics program for analyzing the genome-wide methylation status of DNA samples based on massively parallel sequencing. In the first step of this algorithm, an in-silico converted reference genome is produced by converting all the cytosine residues to thymine residues. Then, the sequenced reads of bisulfite-converted DNA are aligned to this modified reference sequence. Finally, post-processing of the alignments removes non-unique and low-quality mappings and characterizes the methylation pattern in a genome-wide manner. Using this new program, potential fetal-specific hypomethylated regions that can be used as blood biomarkers can be identified in a genome-wide manner. / Detailed summary in vernacular field only. / Jiang, Peiyong. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 100-105). / Abstracts also in Chinese. 
/ Chapter SECTION I : --- BACKGROUND --- p.1 / Chapter CHAPTER 1: --- Circulating nucleic acids and Next-generation sequencing --- p.2 / Chapter 1.1 --- Circulating nucleic acids --- p.2 / Chapter 1.2 --- Next-generation sequencing --- p.3 / Chapter 1.3 --- Bioinformatics analyses --- p.9 / Chapter 1.4 --- Applications of the NGS --- p.11 / Chapter 1.5 --- Aims of this thesis --- p.12 / Chapter SECTION II : --- Mathematically decoding fetal genome in maternal plasma --- p.14 / Chapter CHAPTER 2: --- Characterizing the maternal and fetal genome in plasma at single base resolution --- p.15 / Chapter 2.1 --- Introduction --- p.15 / Chapter 2.2 --- SNP categories and principle --- p.17 / Chapter 2.3 --- Clinical cases and SNP genotyping --- p.20 / Chapter 2.4 --- Sequencing depth and fractional fetal DNA concentration determination --- p.24 / Chapter 2.5 --- Filtering of genotyping errors for maternal genotypes --- p.26 / Chapter 2.6 --- Constructing fetal genetic map in maternal plasma --- p.27 / Chapter 2.7 --- Sequencing error estimation --- p.36 / Chapter 2.8 --- Paternal-inherited alleles --- p.38 / Chapter 2.9 --- Maternally-derived alleles by RHDO analysis --- p.39 / Chapter 2.10 --- Recombination breakpoint simulation and detection --- p.49 / Chapter 2.11 --- Prenatal diagnosis of β-thalassaemia --- p.51 / Chapter 2.12 --- Discussion --- p.53 / Chapter SECTION III : --- Statistical model for fractional fetal DNA concentration estimation --- p.56 / Chapter CHAPTER 3: --- FetalQuant: deducing the fractional fetal DNA concentration from massively parallel sequencing of maternal plasma DNA --- p.57 / Chapter 3.1 --- Introduction --- p.57 / Chapter 3.2 --- Methods --- p.60 / Chapter 3.2.1 --- Maternal-fetal genotype combinations --- p.60 / Chapter 3.2.2 --- Binomial mixture model and likelihood --- p.64 / Chapter 3.2.3 --- Fractional fetal DNA concentration fitting --- p.66 / Chapter 3.3 --- Results --- p.71 / Chapter 3.3.1 --- Datasets --- p.71 / Chapter 3.3.2 --- 
Evaluation of FetalQuant algorithm --- p.75 / Chapter 3.3.3 --- Simulation --- p.78 / Chapter 3.3.4 --- Sequencing depth and the number of SNPs required by FetalQuant --- p.81 / Chapter 3.5 --- Discussion --- p.85 / Chapter SECTION IV : --- NGS-based data analysis pipeline development --- p.88 / Chapter CHAPTER 4: --- Methyl-Pipe: Methyl-Seq bioinformatics analysis pipeline --- p.89 / Chapter 4.1 --- Introduction --- p.89 / Chapter 4.2 --- Methods --- p.89 / Chapter 4.2.1 --- Overview of Methyl-Pipe --- p.90 / Chapter 4.3 --- Results and discussion --- p.96 / Chapter SECTION V : --- CONCLUDING REMARKS --- p.97 / Chapter CHAPTER 5: --- Conclusion and future perspectives --- p.98 / Chapter 5.1 --- Conclusion --- p.98 / Chapter 5.2 --- Future perspectives --- p.99 / Reference --- p.100 DNA--Analysis--Data processing Nucleotide sequence--Data processing Bioinformatics Sequence Analysis, DNA--methods Computational Biology--methods
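Note on record 202: the Methyl-Pipe abstract describes converting cytosines to thymines in both the reference and the reads before alignment, then reading methylation from the C/T content of the aligned reads. The following is only a minimal sketch of that idea, not the thesis's implementation. The function names are hypothetical, only the Watson strand is handled, and a gap-free exact string search stands in for a real aligner.

```python
# Hypothetical helper names; a toy illustration of C-to-T in-silico conversion.
_C_TO_T = str.maketrans("Cc", "Tt")


def in_silico_convert(seq: str) -> str:
    """Replace every cytosine with thymine (applied to reference and reads)."""
    return seq.translate(_C_TO_T)


def align_converted(reference: str, read: str) -> int:
    """Return the first exact, gap-free match of the converted read on the
    converted reference, or -1 if none exists (stand-in for a real aligner)."""
    return in_silico_convert(reference).find(in_silico_convert(read))


def methylation_calls(reference: str, read: str, offset: int) -> list:
    """Compare the ORIGINAL read with the ORIGINAL reference at each reference
    cytosine: a retained C is called methylated, a T is called unmethylated."""
    calls = []
    for i, base in enumerate(read):
        pos = offset + i
        if reference[pos] == "C" and base in "CT":
            calls.append((pos, base == "C"))
    return calls


reference = "ACGTCCGA"
read = "ATGTCTGA"  # bisulfite read: C at 1 and 5 converted, C at 4 retained
offset = align_converted(reference, read)
calls = methylation_calls(reference, read, offset)
print(offset, calls)
```

Converting both sequences removes the C/T mismatches that bisulfite treatment introduces, so the read can be placed; the methylation state is then recovered from the unconverted read.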
203	Thermodynamics studies of DNA: development of the next nearest-neighbor (NNN) model. January 2001 (has links) Ip Lai Nang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 67-71). / Abstracts in English and Chinese. / ABSTRACT (ENGLISH) --- p.iii / ABSTRACT (CHINESE) --- p.iv / ACKNOWLEDGEMENTS --- p.v / TABLE OF CONTENTS --- p.vi / LIST OF TABLES --- p.viii / LIST OF FIGURES --- p.ix / LIST OF APPENDIX --- p.x / Chapter CHAPTER 1 --- INTRODUCTION --- p.1 / Chapter CHAPTER 2 --- BACKGROUND --- p.3 / Chapter 2.1 --- Structure of DNA --- p.3 / Chapter 2.2 --- Sequence dependent stability --- p.8 / Chapter 2.3 --- Thermodynamics of DNA --- p.9 / Chapter 2.4 --- Model for predicting thermodynamic parameters of DNA sequence --- p.15 / Chapter 2.4.1 --- The nearest-neighbor (NN) model / Chapter 2.4.1.1 --- Background --- p.15 / Chapter 2.4.1.2 --- Method for predicting thermodynamic parameters --- p.16 / Chapter 2.4.1.3 --- Limitation of the NN model --- p.19 / Chapter CHAPTER 3 --- EXPERIMENTAL METHOD --- p.20 / Chapter 3.1 --- Design of DNA sequences PAGE --- p.20 / Chapter 3.2 --- DNA synthesis and purification --- p.22 / Chapter 3.3 --- UV measurement --- p.23 / Chapter CHAPTER 4 --- THE NEXT NEAREST-NEIGHBOR (NNN) MODEL --- p.27 / Chapter 4.1 --- Method for extracting the NNN thermodynamic parameters --- p.30 / Chapter 4.2 --- Discussions --- p.34 / Chapter 4.2.1 --- Comparison of the NN model and the NNN model --- p.34 / Chapter 4.2.2 --- The NNN effect --- p.38 / Chapter 4.2.3 --- Sequence-specific local structure of DNA and the NNN effect / Chapter CHAPTER 5 --- SUMMARY AND FUTURE WORK --- p.49 / APPENDIX I-XVI --- p.51 / REFERENCE --- p.67 DNA--Structure DNA--Stability Nucleotide sequence Thermodynamics DNA--chemistry Base Sequence Thermodynamics
204	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas / Andrighetti, Tahila. January 2015 (has links) Advisor: José Luiz Rybarczyk Filho / Co-advisor: Ney Lemke / Committee: Manuela Leal da Silva / Committee: Laurita dos Santos / Abstract (translated from the Portuguese): Microbial communities play crucial roles in all of Earth's ecosystems, since they metabolize essential compounds. This characteristic makes them important research targets in several areas, such as medicine, the environment, food and biotechnology. However, only 1% of all known microorganism species can be cultivated in vitro, which hinders the study of their functions and of their taxonomic classification. With the emergence of new sequencing technologies, the entire genome of the microorganisms of a habitat can be extracted experimentally, but in small fragments (<1500 bp), making data processing a great challenge. The most widely used metagenomic analysis tools classify sequences by homology. However, computational time increases exponentially as fragment size decreases. This shows an evident need for alternative methods that can analyze metagenomic data quickly and accurately. This study proposes a new method for identifying bacterial sequences that analyzes such data. The genomes of 2164 bacterial strains were obtained from GenBank and divided into test and control groups. Each group was randomly fragmented into sequences of 64, 128, 256, 512, 1024, 2048 and 4096 base pairs. The sequence organization measures applied to the fragments were: GC content, dinucleotide abundance, and the entropies of diplets, triplets and tetraplets. The mean and standard deviation of the control-sequence values were calculated for each bacterial species, genus and family. 
Combinations of measures were made to classify the sequences into families, genera and species. The performance of the methodology was determined by measures of sensitivity, specificity, precision and harmonic mean for sets of... / Abstract: Microbial communities play a crucial role in all ecosystems on Earth since they metabolize essential compounds. Given this role, they are investigated in medicine, biotechnology, ecology and food sciences, among other fields. However, only 1% of all known microorganism species can be cultivated in vitro. Unravelling their functions and taxonomic classification demands the development of new approaches. With the advent of new sequencing strategies, the entire genome of the microorganisms in a given habitat can be experimentally extracted, but the fragments obtained are small (<1500 bp), and data processing remains a huge challenge. The most widely used metagenomic analysis tools classify sequences by homology. However, the computational time grows exponentially as the read length decreases. There is an evident need for alternative methods that can analyze metagenomic data quickly and accurately. This study proposes a new method for identifying bacterial sequences in metagenomic data. The genomes of 2164 bacterial strains were obtained from GenBank and distributed into test and control sets. Each group was randomly fragmented into sequences of 64, 128, 256, 512, 1024, 2048, and 4096 base pairs. The sequence organization measures applied to the reads were GC content, dinucleotide abundance, and diplet, triplet and tetraplet entropy. The average and standard deviation of the control-sequence values were calculated for each species, genus and family of bacteria. Combinations of genomic signatures and entropy were used to classify bacterial sequences into family, genus and species. 
The performance of the proposed methodology was determined by measuring sensitivity, specificity, accuracy and harmonic mean for the test set. The results indicated that GC content performed best among the signatures investigated. We also considered combinations of features, the combination considering GC ... / Master's Nucleotides. Bioinformatics. Entropy. Human genome. Microorganisms. Nucleotide sequencing. Nucleotide sequence.
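Note on record 204: the abstract names GC content and the entropy of diplets, triplets and tetraplets as sequence measures computed on each fragment. The sketch below illustrates two of them under stated assumptions: overlapping k-mers, Shannon entropy in bits, and hypothetical function names. The abstract does not specify the thesis's exact definitions, and dinucleotide abundance is left out because its normalization is not given.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in the fragment that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (bits) of the overlapping k-mer distribution.
    k = 2, 3, 4 would correspond to diplets, triplets and tetraplets
    (assumption: overlapping windows with a step of one base)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


fragment = "ATGCGCGCTTAAGGCCATAT"  # made-up fragment for illustration
features = [gc_content(fragment)] + [kmer_entropy(fragment, k) for k in (2, 3, 4)]
print(features)
```

A per-taxon mean and standard deviation of such feature values, computed on control fragments, would then give the reference ranges against which test fragments are compared.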
205	Transcrição cooperativa de genes ribossomais em Escherichia coli usando um modelo estocástico e dependente de sequência / Nakajima, Rafael Takahiro. January 2015 (has links) Advisor: Ney Lemke / Committee: Paulo Eduardo Martins Ribolla / Committee: Antônio Sérgio Kimus Braz / Abstract (translated from the Portuguese): Transcription is the process, catalyzed by an enzyme complex, RNA polymerase (RNAP), responsible for synthesizing messenger RNA from a DNA sequence. Various experimental studies have been carried out to investigate this process, using biochemical techniques, optical or magnetic tweezers, atomic force microscopy and single-molecule fluorescence. From biochemical studies, for example, it is known that several RNAPs can transcribe a sequence simultaneously. The number of different molecules depends on the needs of the cell, the number of free RNAPs, promoter regeneration and transcription factors. One of the most investigated systems for studying this process is the synthesis of ribosomal genes in Escherichia coli. Ribosomal genes are fundamental to the physiology of organisms and are abundantly expressed, and there is evidence that transcription is accelerated by collaborative behavior among RNAPs. In this work, we propose to simulate multiple transcription of the ribosomal genes in E. coli with the stochastic, sequence-dependent model developed in our laboratory. The chemical reactions were simulated using the Gillespie algorithm. This methodology offers a good trade-off between computational cost and biological realism and includes some parameters not used in previous theoretical studies. The model considers elongation with back- and forward-tracking, identifying pause sites and collisions between RNAPs, determining dwell times, and predicting the occurrence of abortive transcription or the acceleration of transcription due to the collaborative phenomenon among multiple RNAPs. The sequence of the rrnB ribosomal operon of E. 
coli was simulated varying the number of RNAPs (R), the interaction force between RNAPs (F) and the nucleoside triphosphate concentration ([NTP]). Our results proved promising when we used a... / Abstract: The process that produces messenger RNAs from DNA sequences is called transcription, and these reactions are catalyzed by the RNA polymerase enzyme. Many different experimental techniques have been applied to investigate this process, including biochemical techniques, optical and magnetic tweezers, atomic force microscopy and single-molecule fluorescence. Biochemical studies have shown that many RNAP molecules operate simultaneously on a single DNA strand. The number of different molecules depends on cellular demands, the concentration of free RNAPs, promoter strength and the presence of transcription factors. Escherichia coli ribosomal genes are a popular experimental model for investigating the transcription process. These genes are essential to cell physiology, and they are strongly expressed. There is evidence that some cellular mechanisms collaborate to accelerate their transcription. In this work we investigate collaborative RNAP transcription in E. coli ribosomal genes using a stochastic and sequence-dependent model proposed by our group. The chemical reactions were simulated using a model based on the Gillespie algorithm. This methodology is a good compromise between computational cost and biological realism and includes some ingredients that were missing in previous theoretical studies. The model considers back- and forward-tracking elongation, and it identifies pauses by determining the dwell time on specific sites. The model also predicts abortive transcription and transcription acceleration due to collaborative RNAP interaction. The E. 
coli rrnB ribosomal operon sequence was simulated by varying (i) the number of RNAPs (R) on the DNA strand, (ii) the interaction force between two colliding RNAPs (F) and (iii) the concentration of nucleoside triphosphate ([NTP]). Our results are promising for F = 15 pN, R = 50 and [NTP] ... / Master's Escherichia coli. Polymerase chain reaction. Stochastic analysis. Nucleotide sequencing. Genetic markers. Nucleotide sequence.
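Note on record 205: the abstract states that the reactions were simulated with the Gillespie algorithm and that the model tracks forward and backward steps and dwell times at each site. The sketch below shows Gillespie's direct method for a single RNAP on a short template, as an illustration only. The per-site rate values and the function name are hypothetical, and collisions, the interaction force F, multiple RNAPs and [NTP] dependence are not modelled.

```python
import random


def simulate_rnap(k_forward, k_backward, rng):
    """Gillespie direct method for one RNAP stepping along a template.

    k_forward[i] / k_backward[i] are the (hypothetical, sequence-dependent)
    rates of stepping forward / backtracking from site i. Returns the total
    transcription time and the accumulated dwell time at every site."""
    n_sites = len(k_forward)
    position, clock = 0, 0.0
    dwell = [0.0] * n_sites
    while position < n_sites:
        forward = k_forward[position]
        backward = k_backward[position] if position > 0 else 0.0
        total = forward + backward
        wait = rng.expovariate(total)      # waiting time ~ Exp(total rate)
        clock += wait
        dwell[position] += wait
        if rng.random() * total < forward:  # choose a reaction by its rate
            position += 1
        else:
            position -= 1
    return clock, dwell


n_sites = 30
k_forward = [20.0] * n_sites
k_backward = [2.0] * n_sites
k_forward[12] = 1.0  # hypothetical pause site with slow forward stepping
clock, dwell = simulate_rnap(k_forward, k_backward, random.Random(1))
print(round(clock, 3), max(range(n_sites), key=dwell.__getitem__))
```

Sites with a large accumulated dwell time are candidate pause sites; averaging over many runs with different seeds would be needed before drawing any conclusion from them.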
206	Analysis of nonsense-mediated decay targeted RNA (nt-RNA) in high-throughput sequencing data / CUHK electronic theses & dissertations collection January 2015 (has links) Nonsense-mediated mRNA decay (NMD) is an important protective mechanism that guards against erroneous transcripts, particularly mRNA transcripts containing premature termination codons (PTC). In classical teaching, such erroneous transcripts (called nonsense-mediated decay targeted RNA, nt-RNA, here) are considered incidental, non-specific side products of the cellular transcription machinery; they are rapidly cleared by NMD and thus exist in scant quantities inside a cell (i.e. at a very low steady-state abundance). As side products of stochastic transcriptional error, they are also commonly considered to carry no biological function. / By analysis of a large collection of RNA-seq data in TCGA (over 4000 samples, occupying over 50 TB of hard disk storage), it was found that nt-RNA were produced in large amounts for some genes; sometimes they were even more abundant than the normal transcripts of the corresponding genes. / Based on the hypothesis that some nt-RNA are specifically produced by a biological process (rather than arising by chance), the aims of this work are: 1) To quantify the expression of nt-RNA (survey of the spectrum); 2) To examine the relationship between nt-RNA and protein expression (biological roles); 3) To detect nt-RNAs that affect prognosis of cancer (biological roles); 4) To apply nt-RNA as diagnostic biomarkers for cancer (application); 5) To identify nt-RNAs that classify tumors of unknown primary (CUP, application). / Firstly, nt-RNA were defined from gene databases, and all PTC-containing transcripts were compared to their corresponding normal transcripts to locate specific signature tags (both short segments of sequence and splice junctions) for each nt-RNA. 
The presence and counts of these nt-RNA signature tags were then searched for in all RNA reads of the RNA-seq datasets. This search and counting produced the read counts of each nt-RNA signature tag, and all RNA reads containing such tags are targets for NMD. RNA-seq datasets used in this study included TCGA normal samples, TCGA tumor samples and cancer cell lines for 13 cancer types. / In the example of KIRC, it was found that most differentially expressed nt-RNA (tumor vs control) were related to differential expression of the corresponding normal transcripts. However, in 900 genes nt-RNA were produced independently of higher production of the normal transcripts. In KIRC, a collection of 12 genes in the proteasome ubiquitination pathway stood out among the highly produced nt-RNA. This finding is very interesting, as VHL-HIF1A is a key oncogenesis mechanism in KIRC and normal HIF1A degradation requires the proteasomal ubiquitination pathway. GO analysis was highly significant at p-value<4.11E-05. The nt-RNA-producing genes included PSMB4, PSMD14, PSMC6, PSMD13, PSMB1, VCP, ANAPC5, PSMA4, PSMD3, ANAPC7, OS9, GCLC. / Secondly, some nt-RNA retarded translation of the normal transcripts. Using proteome data, the relationship between the quantity of nt-RNA unique tags and the normal protein product was analyzed by ANOVA comparison of linear models. It was found that 422 nt-RNA unique tags influenced the expression of proteins, which suggested a potential biological action of these nt-RNA. PTEN also produced nt-RNA in KIRC, and tumor cells with higher PTEN nt-RNA had a lower PTEN protein level (p-value of ANOVA comparison of linear models: 0.017). Survival analysis showed that PTEN nt-RNA levels affected survival, which suggested that they can be used as a biomarker for prognosis. Furthermore, survival analysis was performed, using clinical data, for other nt-RNA unique tags that affected protein expression. 
/ Thirdly, the application of nt-RNA as diagnostic markers and as markers to define tumor origin in CUP was examined. nt-RNA were identified in different types of tumors. Here, only nt-RNA that were independent of the normal gene transcripts in terms of differential expression were used as biomarkers. By comparing tumor samples with normal samples, nt-RNAs that can serve as diagnostic markers were detected. Unsupervised clustering was performed for these nt-RNAs, and heat maps showed a high degree of separation of tumor and normal samples. For studying tumor origin in CUP, in both the cross-validation study in the training dataset (N=541) and the independent external validation sample set (N=2462), a highly discriminating set of nt-RNAs was defined for most cancers examined (400 nt-RNA seq. tags). Unsupervised clustering was performed for the 400 nt-RNA seq. tags, and heat maps showed their power to define tumor origin in CUP. The significance of the classifier formed by the 400 nt-RNA seq. tags was then measured by resampling the training set 100 times. Over the 100 resamplings, the correctly classified instance rate was 96.4895% ± 0.75% (mean ± standard deviation) for the training set and 91.0239% ± 1.032611% for the validation set. / In conclusion, this study showed that nt-RNA can have important biological function and can be used for various applications. It is a potential biomarker for diagnosis and prognosis of diseases. It can also be used to decide the site of origin of tumors, which indicates that nt-RNA will provide valuable information for potential application in the diagnosis of cancer and in determining the origin of cancer of unknown primary site (CUP). 
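[With diagram] / Abstract (translated from the Chinese): Nonsense-mediated mRNA decay (NMD) is an important protective mechanism that guards against erroneous transcripts, particularly those containing premature termination codons. In classical teaching, such erroneous transcripts (here called nonsense-mediated decay targeted RNA, nt-RNA) are regarded as incidental, non-specific by-products of cellular transcription; they are rapidly cleared by NMD, so their expression inside the cell is low (that is, their steady-state abundance is low). As by-products of random transcriptional error, they are usually regarded as having no biological function. / By analyzing a large body of RNA-seq data from TCGA (more than 4000 samples, more than 50 TB of storage), we found that the nt-RNA of some genes are highly expressed, in some cases exceeding the expression of the normal transcripts of the same gene. / Our hypothesis is that some nt-RNA are produced specifically by a biological process rather than by chance. Based on this hypothesis, the aims of this study are: (1) to quantify the expression of nt-RNA (survey of the expression spectrum); (2) to explore the relationship between nt-RNA and protein expression (biological function); (3) to find nt-RNA that affect cancer prognosis (biological function); (4) to use nt-RNA as biomarkers for cancer diagnosis (application); (5) to identify nt-RNA that can classify cancers of unknown primary (application). / First, nt-RNA were defined from gene databases and compared with their corresponding normal transcripts to find the tags specific to each nt-RNA (including sequence segments and splice junctions). These nt-RNA-specific tags were then searched for and counted in all reads of the RNA-seq data. This searching and counting yielded the read count of each nt-RNA-specific tag, and the reads containing these tags are targets of NMD. The RNA-seq data used in this study comprise TCGA normal and tumor samples for 13 cancer types, as well as cancer cell line samples. / In the kidney cancer example, most differentially expressed nt-RNA (tumor versus normal) were associated with differential expression of their corresponding normal transcripts. However, the nt-RNA produced by 900 genes were independent of high expression of the normal transcripts. We found that 12 genes related to the proteasome ubiquitination pathway highly express nt-RNA. This finding is interesting because VHL-HIF1A is an important oncogenic mechanism in KIRC, and normal HIF1A degradation requires the proteasome ubiquitination pathway. The proteasome ubiquitination pathway was significant in gene enrichment analysis (p-value<4.11E-05). The 12 genes are PSMB4, PSMD14, PSMC6, PSMD13, PSMB1, VCP, ANAPC5, PSMA4, PSMD3, ANAPC7, OS9 and GCLC. / Second, some nt-RNA can reduce translation of the normal transcripts. Using proteome data, we studied the relationship between nt-RNA-specific tags and normal protein products by ANOVA comparison of linear models. We found that 422 nt-RNA-specific tags affect protein expression, which indicates that nt-RNA have a potential biological role. PTEN also produces nt-RNA in KIRC, and samples with higher PTEN nt-RNA expression contained less PTEN protein (p-value of the ANOVA comparison of linear models = 0.017). Survival analysis showed that PTEN nt-RNA affect survival, which indicates that PTEN nt-RNA can serve as a prognostic biomarker for cancer. Survival analysis was also performed for other nt-RNA-specific tags that affect protein expression. / Finally, I examined two applications of nt-RNA: as diagnostic markers and as markers for defining the origin of cancers of unknown primary (CUP). Only nt-RNA that were independent of the normal transcripts in terms of differential expression were used as biomarkers. By comparing tumor and normal samples, the nt-RNA that can serve as diagnostic markers were examined. Unsupervised clustering and heat maps showed that these nt-RNA clearly separate tumor from normal samples. In studying the origin of CUP, cross-validation on the training set (N=541) and an independent external validation set (N=2462) defined a set of nt-RNA tags that can identify most cancer samples (400 nt-RNA-specific sequence tags). Unsupervised clustering and heat maps showed the ability of these nt-RNA to define the origin of CUP. The significance of the classifier formed by the 400 nt-RNA-specific sequence tags was then examined by randomly resampling the training set 100 times. Over the 100 resamplings, the mean and standard deviation of the correct classification rate were 96.4895% and 0.75% for the training set, and 91.0239% and 1.032611% for the validation set. / In conclusion, this study showed that nt-RNA have important biological functions and multiple applications. They are potential biomarkers for cancer diagnosis and prognosis. They can also be used to determine the primary site of a cancer, which means that nt-RNA will provide valuable information for the potential applications of cancer diagnosis and of determining the primary site of CUP. [With diagram] / Hu, Fuyan. / Thesis Ph.D. Chinese University of Hong Kong 2015. 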
/ Includes bibliographical references (leaves 173-211). / Abstracts also in Chinese. / Title from PDF title page (viewed on 12, October, 2016). / Detailed summary in vernacular field only. Genetic transcription Nucleotide sequence Nonsense Mediated mRNA Decay Base Sequence QH450.2 .H83 2015
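Note on record 206: the abstract describes locating signature tags that distinguish each nt-RNA from its normal transcript and then counting the reads that contain those tags. The sketch below illustrates that idea on toy sequences. It assumes fixed-length k-mer tags and exact substring matching, uses hypothetical function names, and ignores splice-junction tags and reverse-complement reads, so it is not the thesis's pipeline.

```python
def signature_tags(nt_rna: str, normal: str, k: int) -> list:
    """k-mers present in the PTC-containing transcript but absent from the
    normal transcript (assumption: fixed-length tags, order of appearance)."""
    normal_kmers = {normal[i:i + k] for i in range(len(normal) - k + 1)}
    tags = []
    for i in range(len(nt_rna) - k + 1):
        kmer = nt_rna[i:i + k]
        if kmer not in normal_kmers and kmer not in tags:
            tags.append(kmer)
    return tags


def count_tag_reads(reads, tags) -> dict:
    """Number of reads containing each tag as an exact substring."""
    counts = {tag: 0 for tag in tags}
    for read in reads:
        for tag in tags:
            if tag in read:
                counts[tag] += 1
    return counts


normal = "ATGGCCAAGTTC"   # made-up normal transcript
nt_rna = "ATGGCCTAGTTC"   # made-up variant carrying a premature TAG
tags = signature_tags(nt_rna, normal, k=4)
reads = ["GGCCTAGT", "GGCCAAGT", "CCTAGTTC"]
counts = count_tag_reads(reads, tags)
print(tags, counts)
```

Reads from the normal transcript contribute nothing to the tag counts, which is what lets the counts serve as an nt-RNA-specific abundance signal.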
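207	Development of bioinformatics platforms for methylome and transcriptome data analysis. January 2014 (has links) Abstract (translated from the Chinese): High-throughput massively parallel sequencing, also known as next-generation sequencing (NGS), has greatly accelerated biological and medical research. As sequencing throughput and complexity continue to rise, bioinformatics has become increasingly important for analyzing the large volumes of data and mining the information in them. During my doctoral studies (and in this thesis), I mainly worked on developing bioinformatics algorithms in two areas: DNA methylation data analysis and the identification of long intergenic noncoding RNA (lincRNA). NGS is now widely applied in both areas, and effective data processing methods are urgently needed to analyze the corresponding data. / DNA methylation is an important epigenetic modification, mainly serving to regulate gene expression. At present, whole-genome bisulfite sequencing (BS-seq) is one of the most accurate experimental methods for studying DNA methylation, and a major feature of the technique is its single-base resolution. To analyze the large amount of sequencing data produced by BS-seq, I took part in developing, and extensively optimized, the Methy-Pipe software. Methy-Pipe integrates sequencing read alignment and methylation level analysis, and is an all-in-one DNA methylation data analysis tool. In addition, building on Methy-Pipe, I developed a new algorithm for detecting differentially methylated regions (DMRs), which can be used for large-scale searches for DNA methylation markers. Methy-Pipe is widely used in our laboratory's DNA methylation research projects, including plasma-based noninvasive prenatal diagnosis (NIPD) and cancer detection. / Long intergenic noncoding RNA (lincRNA) is an important class of regulators that acts in many biological processes, such as post-transcriptional regulation, RNA splicing and cellular aging. lincRNA expression is highly tissue-specific, so a large proportion of lincRNAs have yet to be discovered. Recently, whole-transcriptome sequencing (RNA-seq) combined with de novo assembly has provided the most powerful approach for identifying new lincRNAs and building a complete transcriptome catalog. However, efficiently and accurately identifying genuine new lincRNAs from large amounts of RNA-seq data remains very challenging. To this end, I developed two bioinformatics tools: 1) iSeeRNA, for distinguishing lincRNAs from protein-coding RNAs (mRNAs); 2) sebnif, for in-depth data filtering to obtain a high-quality lincRNA list. Both tools have been used in several biological systems and have performed well. / Overall, I developed several bioinformatics methods that can help researchers make better use of NGS to uncover the biology behind large amounts of sequencing data, especially in DNA methylation and transcriptome research. / High-throughput massively parallel sequencing technologies, or Next-Generation Sequencing (NGS) technologies, have greatly accelerated biological and medical research. With the ever-growing throughput and complexity of the NGS technologies, bioinformatics methods and tools are urgently needed for analyzing the large amount of data and discovering the meaningful information behind it. In this thesis, I mainly worked on developing bioinformatics algorithms for two research fields: DNA methylation data analysis and large intergenic noncoding RNA discovery, where the NGS technologies are extensively employed and novel bioinformatics algorithms are much needed. / DNA methylation is an important epigenetic modification that controls the transcriptional regulation of genes. 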
Whole genome bisulfite sequencing (BS-seq) is one of the most precise methodologies for studying DNA methylation, allowing whole-methylome research at single-base resolution. To analyze the large amount of data generated by BS-seq experiments, I have co-developed and optimized Methy-Pipe, an integrated bioinformatics pipeline that performs both sequencing read alignment and methylation state decoding. Furthermore, I have developed a novel algorithm for Differentially Methylated Region (DMR) mining, which can be used for large-scale methylation marker discovery. Methy-Pipe has been routinely used in our laboratory for methylomic studies, including non-invasive prenatal diagnosis and early cancer detection in human plasma. / Large intergenic noncoding RNAs, or lincRNAs, are a very important novel family of gene regulators in many biological processes, such as post-transcriptional regulation, splicing and aging. Because lincRNA expression is highly tissue-specific, a large proportion of lincRNAs remains undiscovered. The development of Whole Transcriptome Shotgun Sequencing, also known as RNA-seq, combined with de novo or ab initio assembly, promises large-scale discovery of novel lincRNAs and hence a complete transcriptome catalog. However, efficiently and accurately identifying novel lincRNAs from large transcriptome data still remains a bioinformatics challenge. To fill this gap, I have developed two bioinformatics tools: I) iSeeRNA, for distinguishing lincRNAs from mRNAs, and II) sebnif, for comprehensive filtering towards high-quality lincRNA screening. Both tools have been used in various biological systems and have shown satisfactory performance. / In summary, I have developed several bioinformatics algorithms that help researchers take advantage of the strengths of NGS technologies (in methylome and transcriptome studies) and explore the biological nature behind the large amount of data. / Detailed summary in vernacular field only. 
/ Sun, Kun. / Thesis (Ph.D.) Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 118-126). / Abstracts also in Chinese. DNA--Methylation Nucleotide sequence Sequence alignment (Bioinformatics) Gene Expression Profiling Computational Biology
208	Sex, parasitic DNA and adaptation in experimental populations of Saccharomyces cerevisiae and Chlamydomonas reinhardtii Zeyl, Clifford. January 1996 (has links) No description available. Nucleotide sequence. Chlamydomonas reinhardtii. Saccharomyces cerevisiae. Molecular parasitology. Translocation (Genetics) Transposons.
209	The development of an efficient method of mitochondrial DNA analysis Tan, Angela Y. C. January 2003 (has links) Abstract not available Identification Forensic genetics -- Technique Mitochondrial DNA Polymerase chain reaction Gene amplification Nucleotide sequence -- Analysis
210	Expression and function of cucumoviral genomes Shi, Bu-Jun. January 1997 (has links) (PDF) Bibliography: leaves 104-130. The aim of this thesis is to characterise the subgenomic RNAs of cucumoviruses and the functions of the genes they encode. Strains of cucumber mosaic virus (CMV) are classified into two major subgroups (I and II) on the basis of nucleotide sequence homology. The V strain of tomato aspermy virus (V-TAV) and a subgroup I CMV strain (WAII) are chosen to determine whether the 2b genes encoded by these viruses are expressed 'in vivo'. For further investigation of the 2b gene function, cDNA clones of the three genomic RNAs of V-TAV are constructed. Using the infectious cDNA clones of V-TAV, a mutant virus containing only one of the two repeats is constructed. RNA viruses Genetics Genomes Genetic vectors Nucleotide sequence Plasmids Genetics Cucumoviruses Viral genetics

Search results