271

Imputing Genotypes Using Regularized Generalized Linear Regression Models

Griesman, Joshua 14 June 2012 (has links)
As genomic sequencing technologies continue to advance, researchers are furthering their understanding of the relationships between genetic variants and expressed traits (Hirschhorn and Daly, 2005). However, missing data can significantly limit the power of a genetic study. Here, the use of a regularized generalized linear model, denoted GLMNET, is proposed to impute missing genotypes. The method aims to address certain limitations of earlier regression approaches to genotype imputation, particularly multicollinearity among predictors. The performance of the GLMNET-based method is compared to that of the phase-based method fastPHASE. Two simulation settings were evaluated: a sparse-missing model and a small-panel expansion model. The sparse-missing model simulated a scenario where SNPs were missing at random across the genome. In the small-panel expansion model, a set of test individuals was genotyped at only a small subset of the SNPs of the large panel. Each imputation method was tested on two datasets: Canadian Holstein cattle data and human HapMap CEU data. Although the proposed method performed with high accuracy (>90% in all simulations), fastPHASE performed with higher accuracy (>94%). However, the new method, which was coded in R, imputed genotypes with better time efficiency than fastPHASE, and this could be further improved by rewriting it in a compiled language.
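The regression idea behind this kind of imputation can be sketched with scikit-learn's ElasticNet, which fits the same family of L1/L2-regularized objectives as GLMNET. Everything below (the data, the perfect-LD structure, the penalty settings) is an invented illustration, not the thesis's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Hypothetical data: genotypes coded as dosages 0/1/2 for 200 individuals
# at 6 SNPs. SNP 0 is placed in perfect LD with SNP 1 (an idealization).
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 6)).astype(float)
G[:, 0] = G[:, 1]

# Pretend SNP 0 is missing for the first 20 individuals.
train, test = G[20:], G[:20]
model = ElasticNet(alpha=0.05, l1_ratio=0.5)  # mixed L1/L2 penalty, as in GLMNET
model.fit(train[:, 1:], train[:, 0])

# Impute: predict the dosage, then round and clip to a valid genotype.
imputed = np.clip(np.round(model.predict(test[:, 1:])), 0, 2)
accuracy = float(np.mean(imputed == test[:, 0]))
```

The mixed penalty is what lets this approach cope with multicollinear predictors (SNPs in strong LD with each other), which is the limitation of ordinary regression the abstract highlights.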
272

Unraveling the Role of Cellular Factors in Viral Capsid Formation

Smith, Gregory Robert 01 March 2015 (has links)
Understanding the mechanisms of virus capsid assembly has been an important research objective over the past few decades. Determining critical points along the pathways by which virus capsids form could prove extremely beneficial in producing more stable DNA vectors or pinpointing targets for antiviral therapy. The inability of current experimental technology to address this objective has resulted in a need for alternative approaches. Theoretical and computational studies offer an unprecedented opportunity for detailed examination of capsid assembly. The Schwartz Lab has previously developed a discrete event stochastic simulator to model virus assembly based upon local rules detailing the geometry and interaction kinetics of individual capsid subunits. Applying numerical optimization methods to learn kinetic rate parameters that fit simulation output to in vitro static light scattering data has been a successful avenue to understand the details of virus assembly systems; however, information describing in vitro assembly processes does not necessarily translate to real virus assembly pathways in vivo. There are a number of important distinctions between experimental and realistic assembly environments that must be addressed to produce an accurate model. This thesis will describe work expanding upon previous parameter estimation algorithms for more complex data over three model icosahedral virus systems: human papillomavirus (HPV), hepatitis B virus (HBV) and cowpea chlorotic mottle virus (CCMV). Then it will consider two important modifications to assembly environment to more accurately reflect in vivo conditions: macromolecular crowding and the presence of nucleic acid about which viruses may assemble. 
The results of this work led to a number of surprising revelations about the variability of potential assembly rates and mechanisms, and provided insight into how assembly mechanisms are affected by changes in concentration, fluctuations in kinetic rates, and adjustments to the assembly environment.
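The general parameter-estimation idea (fitting kinetic rates so that model output matches light-scattering-style progress curves) can be sketched with a deliberately simplified first-order model. The model, rate constant, and "data" below are synthetic assumptions for illustration, not the Schwartz Lab simulator or its datasets:

```python
import numpy as np
from scipy.optimize import curve_fit

# Simplified stand-in for an assembly progress curve: fraction of subunits
# assembled under first-order kinetics with a single rate constant k.
def assembly_signal(t, k):
    return 1.0 - np.exp(-k * t)

# Synthetic "light scattering" data generated from a known rate plus noise.
t = np.linspace(0.0, 10.0, 50)
true_k = 0.8
rng = np.random.default_rng(1)
data = assembly_signal(t, true_k) + rng.normal(0.0, 0.02, t.size)

# Numerical optimization recovers the rate from the noisy curve.
(k_fit,), _ = curve_fit(assembly_signal, t, data, p0=[0.1])
```

The real problem replaces this one-parameter curve with a stochastic simulator and many coupled rate parameters, but the fitting loop (simulate, compare to the measured curve, adjust rates) has the same shape.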
273

A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA

Hu, Yin 01 January 2013 (has links)
The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. In order to address the current limitations in accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript-fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, the framework then performs precise and efficient differential analysis at automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond the scope of classical group-wise analysis, a clustering scheme is further described for mining prominent consistency among samples in transcription, breaking the restriction of presumed grouping. The performance of the framework has been demonstrated in a series of simulation studies and on real datasets, including The Cancer Genome Atlas (TCGA) breast cancer analysis. These successful applications suggest an unprecedented opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or the effects of disease.
274

NOVEL COMPUTATIONAL METHODS FOR TRANSCRIPT RECONSTRUCTION AND QUANTIFICATION USING RNA-SEQ DATA

Huang, Yan 01 January 2015 (has links)
The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population, helping reveal the characteristics of the cell under a particular condition such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, the sequence reads obtained from an RNA-seq experiment are only short fragments of the original transcripts, and how to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We have proposed two methods that directly address this challenge. First, we developed a novel method, MultiSplice, to accurately estimate the abundance of well-annotated transcripts. Then, driven by the desire to detect novel isoforms, a max-flow-min-cost algorithm named Astroid was designed to simultaneously discover the presence and quantities of all possible transcripts in the transcriptome. We further extended an ab initio pipeline of transcriptome analysis to large-scale datasets that may contain hundreds of samples. The effectiveness of the proposed methods has been supported by a series of simulation studies, and their application to real datasets suggests a promising opportunity for reconstructing the mRNA transcriptome, which is critical for revealing variations among cells (e.g. disease vs. normal).
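The max-flow-min-cost formulation can be illustrated on a toy splice graph with networkx. The graph, capacities, and costs below are invented for illustration; this is the general technique, not the Astroid algorithm itself:

```python
import networkx as nx

# Toy splice graph: two exon paths A->B->D and A->C->D. Read support is
# expressed as edge capacity; weakly supported junctions cost more per unit.
G = nx.DiGraph()
G.add_node("src", demand=-10)   # 10 units of transcript abundance to route
G.add_node("snk", demand=10)
edges = [
    ("src", "A", 10, 0),
    ("A", "B", 7, 1),   # well-supported junction: cheap
    ("A", "C", 5, 3),   # weakly supported junction: expensive
    ("B", "D", 7, 1),
    ("C", "D", 5, 3),
    ("D", "snk", 10, 0),
]
for u, v, cap, cost in edges:
    G.add_edge(u, v, capacity=cap, weight=cost)

# A minimum-cost flow routes abundance preferentially through the
# well-supported path, saturating it before using the expensive one.
flow = nx.min_cost_flow(G)
abundance_AB = flow["A"]["B"]   # isoform using exon B
abundance_AC = flow["A"]["C"]   # isoform using exon C
```

Decomposing the resulting flow into source-to-sink paths yields candidate isoforms together with their abundances, which is the intuition behind flow-based transcript reconstruction.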
275

SMALL RNA EXPRESSION DURING PROGRAMMED REARRANGEMENT OF A VERTEBRATE GENOME

Herdy, Joseph R, III 01 January 2014 (has links)
The sea lamprey (Petromyzon marinus) undergoes programmed genome rearrangements (PGRs) during embryogenesis that result in the deletion of ~0.5 Gb of germline DNA from the somatic lineage. The underlying mechanism of these rearrangements remains largely unknown. miRNAs (microRNAs) and piRNAs (PIWI-interacting RNAs) are two classes of small noncoding RNAs that play important roles in early vertebrate development, including differentiation of cell lineages, modulation of signaling pathways, and clearing of maternal transcripts. Here, I utilized next-generation sequencing to determine the temporal expression of miRNAs, piRNAs, and other small noncoding RNAs during the first five days of lamprey embryogenesis, a time series that spans the 24-32 cell stage to the formation of the neural crest. I obtained expression patterns for thousands of miRNA and piRNA species. These studies identified several thousand small RNAs that are expressed immediately before, during, and immediately after PGR. Significant sequence variation was observed at the 3' end of miRNAs, representing template-independent covalent modifications. Patterns observed in lamprey are consistent with expectations that the addition of adenosine and uracil residues plays a role in regulating miRNA stability during the maternal-zygotic transition. We also identified a conserved motif, present in sequences without any known annotation, that is expressed exclusively during PGR. This motif is similar to the binding motifs of known DNA-binding and nuclear-export factors, and our data could represent a novel class of small noncoding RNAs operating in lamprey.
276

High performance reconfigurable architectures for biological sequence alignment

Isa, Mohammad Nazrin January 2013 (has links)
Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment, a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality-of-life aspects such as facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases following the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads; however, they suffer from a relatively low-level programming model compared with off-the-shelf microprocessors and GPUs. Due to these limitations, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely-used sequence alignment algorithms: the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm.
The three novel aspects of this research are as follows. First, the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state-of-the-art. Second, an efficient scheduling strategy based on the double-buffering technique is adopted in the hardware architectures: when the alignment matrix computation task is overlapped with PE configuration in a folded systolic array, the overall throughput of the core is significantly increased, owing to the bounded PE configuration time and the parallel PE configuration approach, irrespective of the number of PEs in the systolic array. In addition, the use of only two configuration elements per PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised which facilitates effective comparison of design performance across different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) discounts the area and lithography-technology advantages of any FPGA, resulting in fairer comparisons. The cores were designed in Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up, the designed cores were compared against their corresponding software and previously reported FPGA implementations. Against equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x over the SSEARCH 35 software.
For the profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. The implementation of the gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. For comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than a 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after discounting the advantages of the Virtex-5 FPGA. Further analysis of cost and power performance showed that the core achieved 0.46 MCUPS per dollar and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high-performance computation, offering a smaller area footprint and an economical 'green' solution compared to other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
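One plausible reading of the proposed normalized performance indicator (speed-up per area per process technology) can be sketched as follows. The exact normalization and every number here are assumptions for illustration, not figures from the thesis:

```python
# All figures here are hypothetical. One way to read the indicator: divide raw
# speed-up by the area used (in slices) and by the lithography advantage
# relative to a reference process node.
def normalized_speedup(speedup, area_slices, process_nm, ref_nm=65):
    litho_advantage = ref_nm / process_nm  # >1 for newer, smaller nodes
    return speedup / area_slices / litho_advantage

# A larger core on 65 nm vs a smaller core on a newer 40 nm device.
core_a = normalized_speedup(speedup=200, area_slices=10_000, process_nm=65)
core_b = normalized_speedup(speedup=250, area_slices=8_000, process_nm=40)
# Despite its lower raw speed-up, core_a scores higher once area and
# process technology are factored out.
```

The point of such a metric is exactly this kind of reversal: a design that looks slower in absolute terms can be the better architecture once the silicon it consumes and the process node it rides on are discounted.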
277

Algorithmic Contributions to Computational Molecular Biology

Vialette, Stéphane 01 June 2010 (has links) (PDF)
We present here our research results since 2001 on the algorithmics of linear graphs, pattern matching (with and without topology) in graphs, and comparative genomics.
278

Designing Synthetic Gene Circuits for Homeostatic Regulation and Sensory Adaptation

Ang, Jordan 02 August 2013 (has links)
Living cells are exquisite systems. They are strongly regulated to perform in highly specific ways, but are at the same time wonderfully robust. This combination arises from the sophistication of their construction and operation: their internal variables are carefully controlled by complex networks of dynamic biochemical interactions, crafted and refined by billions of years of evolution. Using modern DNA engineering technology, scientists have begun to circumvent the long process of evolution by employing a rational design-based approach to construct novel gene networks inside living cells. Currently, these synthetic networks are relatively simple when compared to their natural counterparts, but future prospects are promising, and synthetic biologists would one day like to be able to control cells using genetic circuits much in the way that electronic devices are controlled using electrical circuits. The importance of precise dynamical behaviour in living organisms suggests that this endeavour would benefit greatly from the insights of control theory. However, the nature of biochemical networks can make the implementation of even basic control structures challenging. This thesis focusses specifically on the concept of integral control in this context. Integral control is a fundamental strategy in control theory that is central to regulation, sensory adaptation, and long-term robustness. Consequently, its implementation in a synthetic gene network is an attractive prospect. Here, the general challenges and important design considerations associated with engineering an in-cell synthetic integral controller are laid out. Specific implementations using transcriptional regulation are studied analytically and then in silico using models constructed with commonly available parts from the bacterium Escherichia coli.
Finally, using a controller based on post-translational signalling, an on-paper design is proposed for an integral-controlled biosynthesis network intended to allow a population of engineered Saccharomyces cerevisiae cells to actively regulate the extracellular concentration of a small molecule.
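The core property of integral control described above, namely that the output returns exactly to its setpoint despite parameter perturbations, can be sketched with a minimal two-variable model. The equations and rate constants below are generic illustrations, not the thesis's transcriptional or post-translational designs:

```python
# Generic integral-feedback sketch: a controller species z integrates the
# error between the output x and the setpoint r, and drives production of x.
#   dx/dt = k*z - gamma*x   (controller-driven production, first-order loss)
#   dz/dt = r - x           (the integrator accumulates the error)
# At steady state dz/dt = 0 forces x = r exactly, whatever gamma is.
def simulate(gamma, r=2.0, k=1.0, dt=0.01, steps=200_000):
    x, z = 0.0, 0.0
    for _ in range(steps):  # explicit Euler integration
        x, z = x + (k * z - gamma * x) * dt, z + (r - x) * dt
    return x

x_nominal = simulate(gamma=0.5)
x_perturbed = simulate(gamma=2.0)  # 4x change in loss rate; x still adapts to r
```

This robust perfect adaptation (both runs settle at x = r despite the fourfold change in the degradation rate) is what makes integral control attractive for homeostatic regulation, and also what is hard to realize with biochemical parts, since a true integrator must not leak.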
280

Structural studies of ribonucleoprotein complexes using molecular modeling

Devkota, Batsal 06 December 2007 (has links)
The current work reports on structural studies of ribonucleoprotein complexes, the Escherichia coli and Thermomyces lanuginosus ribosomes, and Pariacoto virus (PaV), using molecular modeling, that is, the integration and representation of the structural data of molecules as models. Integrating the high-resolution crystal structures available for the E. coli ribosome with cryo-EM density maps for the PRE- and POST-accommodation states of the translational cycle, I generated two all-atom models of the ribosome in two functional states of the cycle. A program for flexible fitting of crystal structures into low-resolution maps, YUP.scx, was used to generate the models. Based on these models, we hypothesize that the kinking of the tRNA plays a major role in cognate tRNA selection during accommodation. Secondly, we proposed all-atom models for the eukaryotic ribosomal RNA, as part of a collaboration between Joachim Frank, Andrej Sali, and our lab to generate an all-atom model of the eukaryotic ribosome based on a cryo-EM density map of T. lanuginosus available at 8.9 Å resolution. Homology modeling and ab initio RNA modeling were used to generate the rRNA components. Finally, we propose a first-order model for a T=3 icosahedral RNA virus, Pariacoto virus. We used the structure available from X-ray crystallography as the starting model and modeled all the unresolved RNA and protein residues; only 35% of the total RNA genome and 88% of the protein were resolved in the crystal structure. The generated models helped us determine the location of the missing N-terminal protein tails and were used to propose a new assembly pathway for small RNA viruses. We propose that the basic N-terminal tails make contact with the RNA genome, neutralize the negative charges of the RNA, and subsequently collapse the RNA/protein complex into a mature virus, a process reminiscent of DNA condensation by positively charged ions.
