Global ETD Search

1	Chromosome 1 map, sequence and variation Gregory, Simon Gray January 2003 (has links) No description available. 611 Genomic sequence
2	A Hash Trie Filter Approach to Approximate String Match for Genomic Databases Hsu, Min-tze 28 June 2005 (has links) Genomic sequence databases, such as GenBank and EMBL, are widely used by molecular biologists for homology searching. Because each genomic sequence is long and genomic sequence databases keep growing, efficient search methods for fast queries are increasingly important. DNA sequences are composed of four kinds of nucleotides, so genomic sequences can be regarded as text strings. However, a genomic sequence has no concept of words, which makes searching for a sequence in a genomic database much more difficult. Approximate String Matching (ASM) with k errors is considered for genomic sequences, where the k errors are caused by insertion, deletion, and replacement operations. Filtration of the DNA sequence is a widely adopted technique to reduce the number of text areas (i.e., candidates) that need further verification. Most filter methods first split the database sequence into q-grams. A sequence of grams (subpatterns) that matches some part of the text is passed on as a candidate. Matching grams against parts of the text can be sped up by using an index structure for exact matching. Candidates are then examined by dynamic programming to obtain the final result. However, most previous ASM methods consider only the local order within each gram. Only the (k + s) h-samples filter considers the global order of the sequence of matched grams. Although the (k + s) h-samples filter keeps the global order of the sequence of grams, it still has some disadvantages. First, to be a candidate in the (k + s) h-samples filter, the number of ordered matched grams, s, is always fixed at 2, which results in low precision. Second, the (k + s) h-samples filter spends query time building the index for query patterns. 
In this thesis, we propose a new approximate string matching method, the hash trie filter, for efficient searching in genomic databases. We build a hash trie at pre-computation time for the genomic sequences stored in the database. Although the size q of each split gram is decided by the same formula used in the (k + s) h-samples filter, we propose a different way to find the ordered subpatterns in the text T. Moreover, we reduce the number of candidates by pruning unreasonable matched positions. Furthermore, unlike the (k + s) h-samples filter, which always uses s = 2 to decide whether s matched subpatterns form a candidate, our method decides s dynamically, which increases precision. The simulation results show that our hash trie filter outperforms the (k + s) h-samples filter in terms of response time, number of verified candidates, and precision for different query pattern lengths and different error levels. global order genomic sequence databases local order approximate string match filter methods
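The filter-then-verify scheme described in the abstract above can be illustrated with a minimal sketch. This is not the thesis's hash trie filter and not the (k + s) h-samples filter; it is the classic partition filter, which relies on the fact that k errors cannot touch all of k + 1 disjoint pattern pieces, so at least one piece must occur exactly. The function names, the dictionary-based q-gram index, and the choice q = floor(m / (k + 1)) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every q-gram of the text to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def best_match_errors(pattern, window):
    """Dynamic programming verification: the minimum edit distance between
    the pattern and any substring of the window (insertions, deletions,
    and replacements each cost 1)."""
    prev = [0] * (len(window) + 1)  # a match may start anywhere in the window
    for i, pc in enumerate(pattern, start=1):
        cur = [i] + [0] * len(window)
        for j, wc in enumerate(window, start=1):
            cur[j] = min(prev[j - 1] + (pc != wc),  # match or replacement
                         prev[j] + 1,               # pattern character unmatched
                         cur[j - 1] + 1)            # extra text character
        prev = cur
    return min(prev)


def approx_search(text, pattern, k):
    """Return the text windows (start, end) that contain the pattern
    with at most k errors."""
    m = len(pattern)
    q = m // (k + 1)  # illustrative gram size, not the thesis's formula
    if q == 0:
        raise ValueError("pattern too short for this error level")
    index = build_qgram_index(text, q)
    candidates = set()
    for piece in range(k + 1):
        start = piece * q
        # Filtration: an exact hit of one piece marks a candidate area.
        for pos in index.get(pattern[start:start + q], ()):
            lo = max(0, pos - start - k)
            hi = min(len(text), pos - start + m + k)
            candidates.add((lo, hi))
    # Verification: only candidate areas are checked by dynamic programming.
    return sorted((lo, hi) for lo, hi in candidates
                  if best_match_errors(pattern, text[lo:hi]) <= k)
```

For example, `approx_search("ACGTACGTTTGACCA", "TTGTCC", 1)` reports one window, because the text contains `TTGACC`, which differs from the pattern by a single replacement. In a database setting the index would be built once in advance rather than per query.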
3	Sequence assembly and annotation of the bovine major histocompatibility complex (BoLA) class IIb region, and in silico detection of sequence polymorphisms in BoLA IIb Childers, Christopher P. 25 April 2007 (has links) Cattle are vitally important to the American agricultural industry, generating over 24.6 billion pounds of beef (by carcass weight) and 79.5 billion dollars in 2005, and over 27 billion dollars in milk sales in 2004. As of July 2006, the U.S. beef and dairy industry comprised 104.5 million head of cattle, 32.4 million of which were processed in 2005. The health of the animals has always been an important concern for breeders, as healthy animals grow faster and are more likely to reach market weight. Animals that exhibit natural resistance to disease do not require chemicals to stimulate normal weight gain, and are less prone to disease-related wasting. The major histocompatibility complex (MHC) is a collection of genes, many of which function in antigen processing and presentation. The bovine MHC (BoLA) differs from typical mammalian MHCs in that the class II region was disrupted by a chromosomal inversion into two subregions, designated BoLA IIa and BoLA IIb. BoLA IIb was transposed to a position near the centromere on bovine chromosome 23, while BoLA IIa retains its position in BoLA. Comparative sequence analysis of BoLA IIb with the human MHC revealed the location of the region containing the proximal inversion breakpoint. Gene content, order and orientation of BoLA IIb are consistent with the single inversion hypothesis when compared to the corresponding region of the human class II MHC (HLA class II). BoLA IIb spans approximately 450 kb. The genomic sequence of BoLA IIb was used to detect sequence variation through comparison to other bovine sequences, including data from the bovine genome project, and two regions in the BAC scaffold used to develop the BoLA IIb sequence. 
Analysis of the bovine genome project sequence revealed a total of 10,408 mismatching bases, 30 out of 231 polymorphic microsatellites, and 15 sequences corresponding to the validated SNP panel generated by the bovine genome sequencing project. The two overlapping regions in the BoLA IIb BAC scaffold were found to have 888 polymorphisms, including a total of 6 out of 42 polymorphic microsatellites indicating that each BAC derived from a different chromosome. Genetics MHC BoLA IIb BoLA Genomic Sequence Major Histocompatibility Complex Cow
4	High performance reconfigurable architectures for biological sequence alignment Isa, Mohammad Nazrin January 2013 (has links) Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for finding homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality of life, for example by facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and supporting drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low-level programming model compared with off-the-shelf processors such as standard microprocessors and GPUs. Because of these limitations, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). 
This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely used sequence alignment algorithms: the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. This research has three novel aspects. Firstly, the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted into the hardware architectures. Here, when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bounded PE configuration time and the parallel PE configuration approach, irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) factors out the advantages conferred by the area and lithography technology of any FPGA, resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. 
In terms of speed-up, the designed cores were compared against their corresponding software and against previously reported FPGA implementations. Compared with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x over the SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. The implementation of gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. For the comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after factoring out the advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performance; the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation, offering a smaller area footprint and an economical ‘green’ solution compared to the other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort. 572.80285
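As a software point of reference for the first of the three algorithms named above, the following is a minimal sketch of Smith-Waterman local alignment scoring with an affine gap penalty (the standard three-recurrence formulation). It has no connection to the thesis's Verilog cores; the function name and the default scoring parameters are assumptions chosen for illustration. Each inner-loop iteration is one cell update, the unit counted by the GCUPS figures quoted in the abstract.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of a and b with affine gaps.

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    All scoring values are illustrative assumptions.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols  # F: best score ending with a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # E: best score ending with a horizontal gap
        for j in range(1, cols):
            # Either extend an existing gap or open a new one.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            f_cur[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + score, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Aligning two sequences of lengths m and n takes m × n cell updates, so a throughput in cell updates per second is that product divided by the run time.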
5	UTILIZING TRANSFER LEARNING AND MULTI-TASK LEARNING FOR EVALUATING THE PREDICTION OF CHROMATIN ACCESSIBILITY IN CANCER AND NEURON CELL LINES USING GENOMIC SEQUENCES Toluwanimi O Shorinwa (16626360) 02 October 2023 (has links) <p>The prediction of chromatin accessibility for cancer and neuron cell lines using genomic sequences is quite challenging. Advances in machine learning and deep learning techniques allow such challenges to be addressed. This thesis investigates the use of both transfer learning and multi-task learning techniques. In particular, this research demonstrates the potential of transfer learning and multi-task learning in improving the prediction accuracy for twenty-three cancer types in human and neuron cell lines. Three different network architectures are used: the Basset network, the DanQ network, and the DeepSEA network. In addition, two transfer learning techniques are used. In the first technique, data relevant to the desired prediction task is not used during the pre-training stage, while the second technique includes limited data about the desired prediction task in the pre-training phase. The preferred metric for evaluating the performance of the models was the AUPRC, due to the numerous negative samples. Our results demonstrate an average improvement of 4% for the DeepSEA network in predicting all twenty-three cancer cell line types when using the first technique, a decrease of 0.42% when using the second technique, and an increase of 0.40% when using multi-task learning. Also, it had an average improvement of 3.09% when using the first technique, 1.16% when using the second technique and 4.60% for multi-task learning when predicting chromatin accessibility for the 14 neuron cell line types. 
The DanQ network had an average improvement of 1.18% using the first transfer learning technique, an average decrease of 1.93% using the second transfer learning technique, and a decrease of 0.90% using the multi-task learning technique when predicting for the different cancer cell line types. When predicting for the different neuron cell line types, the DanQ network had an average improvement of 1.56% using the first technique, 3.21% using the second technique, and 5.35% using the multi-task learning technique. The Basset network showed an average improvement of 2.93% using the first transfer learning technique and average decreases of 0.02% and 0.63% when using the second technique and the multi-task learning technique respectively. Using the Basset network to predict chromatin accessibility in the different neuron types showed average increases of 2.47%, 3.80% and 5.50% for the first transfer learning technique, the second transfer learning technique and the multi-task learning technique respectively. The results show that the best technique for predicting the cancer cell lines is the first transfer learning model, as it showed an improvement for all three network types, while the best technique for predicting chromatin accessibility in the neuron cell lines is the multi-task learning technique, which showed the highest average improvement among all networks. The DeepSEA network showed the greatest improvement in performance among all techniques when predicting the different cancer cell line types. Also, it showed the greatest improvement when using the first transfer learning technique for predicting chromatin accessibility for neuron cell lines in the brain. The Basset network showed the greatest improvement for the multi-task learning technique and the second transfer learning technique when predicting the accessibility for neuron cell lines. 
</p> Genomics and transcriptomics Cancer diagnosis Deep learning Chromatin Accessibility Cancer Diagnosis Genomic Sequence Transfer Learning
6	On a class of distributed algorithms over networks and graphs Lee, Sang Hyun, 1977- 01 June 2011 (has links) Distributed iterative algorithms are of great importance, as they are known to provide low-complexity and approximate solutions to what are otherwise high-dimensional intractable optimization problems. The theory of message-passing based algorithms is fairly well developed in the coding, machine learning and statistical physics literatures. Even though several applications of message-passing algorithms have already been identified, this work aims at establishing that a plethora of other applications exist where it can be of great importance. In particular, the goal of this work is to develop and demonstrate applications of this class of algorithms in network communications and computational biology. In the domain of communications, message-passing based algorithms provide distributed ways of inferring the optimal solution without the aid of a central agent for various optimization problems that happen in the resource allocation of communication networks. Our main framework is Affinity Propagation (AP), originally developed for clustering problems. We reinterpret this framework to unify the development of distributed algorithms for discrete resource allocation problems. Also, we consider a network-coded communication network, where continuous rate allocation is studied. We formulate an optimization problem with a linear cost function, and then utilize a Belief Propagation (BP) approach to determine a decentralized rate allocation strategy. Next, we move to the domain of computational biology, where graphical representations and computational biology play a major role. First, we consider the motif finding problem with several DNA sequences. In effect, this is a sequence matching problem, which can be modeled using various graphical representations and also solved using low-complexity algorithms based on message-passing techniques. 
In addition, we address the application of message-passing algorithms for a DNA sequencing problem where the one dimensional structure of a single DNA sequence is identified. We reinterpret the problem as being equivalent to the decoding of a nonlinear code. Based on the iterative decoding framework, we develop an appropriate graphical model which enables us to derive a message-passing algorithm to improve the performance of the DNA sequencing problem. Although this work consists of disparate application domains of communications, networks and computational biology, graphical models and distributed message-passing algorithms form a common underlying theme. / text Distributed algorithms Graphical models Belief propagation Affinity propagation Resource allocation Genomic sequence analysis Distributed iterative algorithms Message-passing algorithms
7	Chromatin alterations imposed by the oncogenic transcription factor PML-RAR Morey Ramonell, Lluís 01 February 2008 (has links) En mamíferos, así como en plantas, mutaciones en AND helicasas/ATPasas del la família SNF2, no solo afectan a la estructura de la cromatina, sino que también afectan al patrón global de la metilación del ADN. Sugiriendo una relación funcional entre la estructura de la cromatina y la epigenética. El complejo NuRD, el cual posee una ATPasa de la familía SNF2, está relacionado con la represión de la transcripción y en el remodelamiento de la cromatina. Nuestro laboratorio demostró que la proteína leucémica PML-RARα reprime la transcripción de sus genes diana por el reclutamiento de DNMTs y el complejo PRC2. En esta tesis, demostramos una relación directa del complejo NuRD en la represión génica y en los cambios epigenéticos en la leucemia promielocítica aguda (APL). Mostramos que PML-RARα se une y recluta NuRD a sus genes diana, incluyendo el gen supresor de tumores RAR2, facilitando que el complejo de Polycomb se reclute y metile la lisina 27 de la histona H3. Tratamiento con Acido Retinóico (RA), el qual se utiliza en pacientes, reduce la ocupación de NuRD en células leucémicas. Eliminando NuRD no solo provoca que las histonas no se deacetilen y que la cromatina no se compacte, sino que también provoca que tanto la metilación del ADN y de las histonas no se produzca, así como la represión génica del gen RAR2, favoreciendo la diferenciación celular. Nuestros resultados caracterizan un nuevo papel del complejo NuRD en el establecimiento de los patrones epigenéticos en APL, demostrando una relación esencial entre la estructura de la cromatina y epigenética durante el desarrollo de la leucemia, pudiéndose aplicar a la terapia de esta enfermedad. 
/ In mammals, as in plants, mutations in SNF2-like DNA helicases/ATPases were shown to affect not only chromatin structure but also global methylation patterns, suggesting a potential functional link between chromatin structure and epigenetic marks. The SNF2-like containing NuRD complex is involved in gene transcriptional repression and chromatin remodeling. We have previously shown that the leukemogenic protein PML-RARα represses target genes through recruitment of DNMTs and Polycomb complex. In this thesis, we demonstrate a direct role of the NuRD complex in aberrant gene repression and transmission of epigenetic repressive marks in acute promyelocytic leukemia (APL). We show that PML-RARα binds and recruits NuRD to target genes, including the tumor-suppressor gene RAR2. In turn, the NuRD complex facilitates Polycomb binding and histone methylation at lysine 27. Retinoic acid treatment reduced the promoter occupancy of the NuRD complex. Knock-down of the NuRD complex in leukemic cells not only prevented histone deacetylation and chromatin compaction, but also impaired DNA and histone methylation as well as stable silencing, thus favoring cellular differentiation. These results unveil an important role for NuRD in the establishment of altered epigenetic marks in APL, demonstrating an essential link between chromatin structure and epigenetics in leukemogenesis that could be exploited for therapeutic intervention. DNaseI transcription repression chromatin immunoprecipitation bisulfite genomic sequence histone modifications DNA methylation RAR2 Suz12 PRC2 Mi-2 MTA2 MBD3 HDAC1/2 Retinoic Acid NuRD PML-RARα APL 575 616.4
8	Zpracování genomických signálů fraktály / Processing of fractal genomic signals Nedvěd, Jiří January 2012 (has links) This diploma project shows the possibilities of classifying genomic sequences using images produced by the CGR and FCGR methods. A classifier is computed from each image using BCM. The program and its classification options are then described. Finally, many sequences computed under different program settings are compared.
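The CGR and FCGR methods named in the abstract above can be sketched as follows, assuming they refer to the chaos game representation and its frequency variant. This is an illustrative sketch, not the program described in the thesis: the corner assignment for A, C, G and T, the starting point at the centre of the unit square, and the function names are all assumptions, and the BCM classification step is not shown.

```python
# Assumed corner assignment on the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game representation: each point lies halfway between the
    previous point and the corner of the current nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Frequency CGR: count the CGR points falling in each cell of a
    2^k x 2^k grid. The cell of a point is fixed by the last k bases,
    so the grid is a picture of the k-mer counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k - 1 points still depend on the starting position.
    for x, y in cgr_points(seq)[k - 1:]:
        grid[int(y * size)][int(x * size)] += 1
    return grid
```

For example, `fcgr("ACGT", 1)` places one point in each of the four quadrants. Rendering the grid as a grey-scale image gives the kind of picture on which an image-based classifier could operate.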
9	Genom- und Transkriptionsanalyse von <i>Bacillus licheniformis</i> DSM13 - einem Organismus mit großem industriellem Potential / Genomic and transcriptional analyses of <i>Bacillus licheniformis</i> DSM13 - an organism of high industrial relevance Veith, Birgit 25 January 2005 (has links) No description available. 570 Biowissenschaften, Biologie Mathematics and Computer Science Genomsequenz Glyoxylatzyklus Isocitrat-Lyase Malat-Synthase 2;3-Butanediol Acetat anaerobe Ribonukleotid-Reduktase Type I Restriktionssystem DNA Microarray Transkriptionsanalyse kontinuierliche Kultur genomic sequence glyoxylic acid shunt isocitrate-lyase malate-synthase 2;3-butanediol acetate anaerobic ribonucleotid-reductase type I restriction system DNA microarray transcriptional analysis continuous culture 42.13 42.30 42.49 WUE200 WUK000 WUZ300
