1

A systemic perspective of the bioinformatics work domain

Ibrahim, Roliana January 2008 (has links)
The thesis entails an investigation of bioinformatics work performed by bench scientists through a variety of information-based experimental activities and constant interaction with a domain-rich information space, also known as the Bioinformatics Information Space (BIS). Ten research students from the Faculty of Pure Science at the University of Sheffield were initially interviewed for this purpose. In-depth interviews were then conducted with four students from the Molecular Microbiology Research Group in the same faculty. Those interviews resulted in the production of two bioinformatics work models, the first of which is the Abstraction Hierarchy (AH). This model represents the bioinformatics work domain in a situation-independent state. It could be regarded as a functional inventory map that provides information on the basic functional features of the work domain. Further enquiry into the work domain was conducted in a situation-dependent state through the application of Beer's Viable System Model (VSM). The second model, the Process Recursion Model (PRM), conceptualises bioinformatics work situations by means of a multiple and recursive structure. Viability diagnosis at each recursion level was performed by considering the design and planning of a variety of experimental procedures for the Functional Analysis of Gene Sequence process (FAoGS), as well as the implementation of those procedures within the context of information behaviour activities. The thesis suggests that the allocation of resources to support bioinformatics work was made based on the needs of individual work situations. The bench scientists were incapable of predicting future problematic work situations. Diagnosis of the bioinformatics work for FAoGS also exposed a lack of functioning cohesive and adaptive mechanisms. However, the PRM serves as a value-added tool and provides a novel way of representing the complexity of bioinformatics work as a multiple recursion model. This provides a high-level representation of information flow from one work situation to another. The model could be decomposed further to assist system analysts in integrating information-based activities, thus revealing further solutions for providing effective information delivery during bioinformatics work. This would ensure that bench scientists employ the right information for the right tasks at the right time in order to achieve the ultimate purpose of experimentation, which is to determine the putative function of target genes.
2

Digital signal processing techniques for gene prediction

Papaspiridis, Alexandros January 2012 (has links)
The purpose of this research is to apply existing Digital Signal Processing techniques to DNA sequences, with the objective of developing improved methods for gene prediction. Sections of DNA sequences are analyzed in the frequency domain and frequency components that distinguish intron regions are identified (2π/10.4). Novel detectors are created using digital filters and autocorrelation, capable of identifying the location of intron regions in a sequence. The resulting signal from these detectors is used as a dynamic threshold in existing gene detectors, improving accuracy by 12% and 25% respectively. Finally, DNA sequences are analyzed in terms of their amino acid composition, and new gene prediction algorithms are introduced.
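The frequency-domain analysis of DNA that this abstract builds on is commonly done via binary indicator (Voss) sequences followed by a DFT. Below is a minimal sketch of that general approach, not the thesis's intron detectors: the window length, the period-3 exon signature used in the demo, and the random toy sequence are all illustrative assumptions.

```python
import numpy as np

def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences (Voss mapping)."""
    dna = dna.upper()
    return {b: np.array([1.0 if c == b else 0.0 for c in dna]) for b in "ACGT"}

def spectral_power(dna, k_over_n):
    """Total power of the four indicator sequences at a given normalised
    frequency, e.g. 1/3 for the period-3 coding signature or 1/10.4 for
    helical-periodicity analyses."""
    seqs = indicator_sequences(dna)
    n = len(dna)
    k = int(round(k_over_n * n))
    return sum(abs(np.fft.fft(s)[k]) ** 2 for s in seqs.values())

def sliding_spectrum(dna, window=351, period=3):
    """Slide a window along the sequence and report power at the chosen
    periodicity; peaks suggest regions with that periodicity."""
    scores = []
    for i in range(0, len(dna) - window + 1):
        scores.append(spectral_power(dna[i:i + window], 1.0 / period))
    return np.array(scores)

# Toy usage on a random sequence (real analyses use genomic FASTA input).
rng = np.random.default_rng(0)
toy = "".join(rng.choice(list("ACGT"), size=1200))
print(sliding_spectrum(toy, window=351, period=3)[:5])
```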
3

Analysing directed network data

Sarajlic, Anida January 2015 (has links)
The topology of undirected biological networks, such as protein-protein interaction networks or genetic interaction networks, has been extensively explored in search of new biological knowledge. Graphlets, small connected non-isomorphic induced sub-graphs of an undirected network, have been particularly useful in computational network biology. Bearing in mind that a significant portion of biological networks, such as metabolic networks or transcriptional regulatory networks, are directed by nature, we define all directed graphlets and orbits of up to four nodes and implement an algorithm for counting directed graphlets and graphlet orbits. We generalise all existing graphlet-based measures to the directed case, defining: relative directed graphlet frequency distance, directed graphlet degree distribution similarity, directed graphlet degree vector similarity, and directed graphlet correlation distance. We apply the new topological measures to metabolic networks and show that the topology of directed biological networks is correlated with biological function. Finally, we look for topology-function relationships in metabolic networks that are conserved across different species.
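As a rough illustration of counting small directed substructures, the sketch below uses networkx's triad census, which counts the 16 isomorphism classes of 3-node directed subgraphs. It is a standard stand-in, not a reimplementation of the thesis's 2- to 4-node directed graphlet and orbit counter, and the toy random graph is an assumption.

```python
import networkx as nx

# Toy directed network; a real analysis would load e.g. a metabolic network.
G = nx.gnp_random_graph(50, 0.08, seed=1, directed=True)

# Triad census: counts of all 16 isomorphism classes of 3-node directed
# subgraphs, labelled by the standard MAN notation (e.g. '021D', '111U').
census = nx.triadic_census(G)
for triad, count in sorted(census.items()):
    print(triad, count)

# A simple graphlet-frequency-style signature: normalise the counts of the
# *connected* triads so two networks can be compared by distribution.
connected = {t: c for t, c in census.items() if t not in ("003", "012", "102")}
total = sum(connected.values()) or 1
signature = {t: c / total for t, c in connected.items()}
```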
4

An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome

Hindle, Matthew Morritt January 2012 (has links)
Given the ever-increasing quantity of sequence data, functional annotation of new gene sequences remains a significant challenge for bioinformatics. This is a particular problem for transcriptomics studies in crop plants, where large genomes and evolutionarily distant model organisms mean that identifying the function of a given gene used on a microarray is often a non-trivial task. Information pertinent to gene annotations is spread across technically and semantically heterogeneous biological databases. Combining and exploiting these data in a consistent way has the potential to improve our ability to assign functions to new or uncharacterised genes. Methods: The Ondex data integration framework was further developed to integrate databases pertinent to plant gene annotation and to provide data inference tools. The CoPSA annotation pipeline was created to provide automated annotation of novel plant genes using this knowledgebase. CoPSA was used to derive annotations for Affymetrix GeneChips available for plant species. A conjoint approach was used to align GeneChip sequences to orthologous proteins and identify protein domain regions. These proteins and domains were used together with multiple lines of evidence to predict functional annotations for sequences on the GeneChip. Quality was assessed with reference to other annotation pipelines. These improved gene annotations were used in the analysis of a time-series transcriptomics study of the differential responses of durum wheat varieties to water stress. Results and Conclusions: The integration of plant databases using Ondex showed that it was possible to increase the overall quantity and quality of information available, and thereby improve the resulting annotation. Direct data aggregation benefits were observed, as well as new information derived from inference across databases. The CoPSA pipeline was shown to improve coverage of the wheat microarray compared to the NetAffx and BLAST2GO pipelines. Leveraging these annotations during the analysis of data from a transcriptomics study of durum wheat water stress responses yielded new biological insights into water stress and highlighted potential candidate genes that could be used by breeders to improve drought response.
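The multi-evidence annotation step described above can be pictured with a toy sketch. Everything here, the data layout, the voting threshold, and the example identifiers, is an illustrative assumption, not CoPSA's actual logic.

```python
from collections import Counter

def transfer_annotations(evidence_hits, min_support=2):
    """Toy evidence-combination step: transfer a GO term to a probe sequence
    when at least `min_support` independent pieces of evidence (orthologous
    proteins or domain matches) agree on it. Hypothetical data layout."""
    votes = Counter(term for hit in evidence_hits for term in hit["go_terms"])
    return sorted(term for term, n in votes.items() if n >= min_support)

hits = [
    {"protein": "P1", "go_terms": ["GO:0009414", "GO:0006950"]},  # response to water deprivation / stress
    {"protein": "P2", "go_terms": ["GO:0009414"]},
    {"domain": "PF00001", "go_terms": ["GO:0006950"]},
]
print(transfer_annotations(hits))  # ['GO:0006950', 'GO:0009414']
```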
5

Integration strategies and data analysis methods for plant systems biology

Lysenko, Artem January 2012 (has links)
Understanding how function relates to multiple layers of interactions between biological entities is one of the key goals of bioinformatics research, in particular in areas such as systems biology. However, the realisation of this objective is hampered by the sheer volume and multi-level heterogeneity of potentially relevant information. This work addressed this issue by developing a set of integration pipelines and analysis methods as part of the Ondex data integration framework. The integration process incorporated both relevant data from a set of publicly available databases and information derived from predictive approaches, which were also implemented as part of this work. These methods were used to assemble integrated datasets that were of relevance to the study of the model plant species Arabidopsis thaliana and applicable to network-driven analysis. Particular attention was paid to the evaluation and comparison of the different sources of these data. Approaches were implemented for the identification and characterisation of functional modules in integrated networks and used to study and compare networks constructed from different types of data. The benefits of data integration were also demonstrated in three different bioinformatics research scenarios. The analysis of the constructed datasets also resulted in a better understanding of the functional role of genes identified in a study of a nitrogen uptake mutant and allowed candidate genes to be selected for further exploration.
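For a concrete picture of identifying functional modules in a network, here is a minimal sketch using a common stand-in, modularity-based community detection in networkx. The thesis's own module-detection approaches are not reproduced; the example graph is a placeholder for an integrated dataset.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy network; a real analysis would build this from the integrated
# dataset (protein interactions, co-expression links, etc.).
G = nx.les_miserables_graph()

# One standard way to identify candidate functional modules: partition the
# network into communities by greedy modularity maximisation.
modules = greedy_modularity_communities(G)
for i, module in enumerate(modules):
    print(f"module {i}: {len(module)} nodes")
```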
6

A novel method for integrative biological studies

Al Watban, Abdullatif Sulaiman January 2016 (has links)
DNA microarray technology has been extensively utilized in the biomedical field, becoming a standard in identifying gene expression signatures for disease diagnosis/prognosis and pharmaceutical practices. Although cancer research has benefited from this technology, challenges such as large-scale data size, few replicates and complex heterogeneous data types remain; thus the biomarkers identified by various studies have only a small proportion of overlap because of molecular heterogeneity. However, it is desirable in cancer research to have robust and consistent biomarkers for drug development as well as diagnosis/prognosis. Although cancer is a highly heterogeneous disease, some mechanism common to developing cancers is believed to exist; integrating datasets from multiple experiments increases the accuracy of predictions because increasing the sample size improves biomarker detection. Therefore, an integrative study is required for compiling multiple cancer data sets when searching for the common mechanism leading to cancers. Some critical challenges of integration analysis remain despite the many successful methods introduced. Few are able to work on data sets with different dimensionalities. More seriously, when the replicate number is small, most existing algorithms cannot deliver robust predictions through an integrative study. In fact, as modern high-throughput technology matures to provide increasingly precise data, and with well-designed experiments, variance across replicates is believed to be small enough for us to consider a mean pattern model. This model assumes that all the genes (or metabolites, proteins or DNA copies) are random samples of a hidden (mean pattern) model. The study implements this model using a hierarchical modelling structure. As the primary component of the system, a multi-scale Gaussian (MSG) model, designed to identify robust differentially-expressed genes to be integrated, was developed for predicting differentially expressed genes from microarray expression data with small replicate numbers. To assure the validity of the mean pattern hypothesis, a bimodality detection method, a revision of the Bimodality Index, was proposed.
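The Bimodality Index that the abstract revises is a published statistic (Wang et al., 2009). Below is a minimal sketch of the standard, unrevised index, computed from an equal-variance two-component Gaussian mixture; the thesis's revision may differ in its details, and the toy data are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bimodality_index(x):
    """Bimodality Index (Wang et al., 2009): fit a two-component,
    equal-variance Gaussian mixture and score the mode separation as
    BI = sqrt(p * (1 - p)) * |mu1 - mu2| / sigma."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    # covariance_type="tied" enforces a single shared variance in 1-D.
    gm = GaussianMixture(n_components=2, covariance_type="tied",
                         random_state=0).fit(x)
    p = gm.weights_[0]
    mu1, mu2 = gm.means_.ravel()
    sigma = np.sqrt(gm.covariances_.ravel()[0])
    return np.sqrt(p * (1 - p)) * abs(mu1 - mu2) / sigma

# Unimodal versus clearly bimodal toy expression values.
rng = np.random.default_rng(0)
print(bimodality_index(rng.normal(0, 1, 500)))            # low score
print(bimodality_index(np.r_[rng.normal(-2, 1, 250),
                             rng.normal(+2, 1, 250)]))    # high score
```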
7

Defining complex rule-based models in space and over time

Wilson-Kanamori, John Roger January 2015 (has links)
Computational biology seeks to understand complex spatio-temporal phenomena across multiple levels of structural and functional organisation. However, questions raised in this context are difficult to answer without modelling methodologies that are intuitive and approachable for non-expert users. Stochastic rule-based modelling languages such as Kappa have been the focus of recent attention in developing complex biological models that are nevertheless concise, comprehensible, and easily extensible. We look at further developing Kappa, in terms of how we might define complex models in both the spatial and the temporal axes. In defining complex models in space, we address the assumption that the reaction mixture of a Kappa model is homogeneous and well-mixed. We propose evolutions of the current iteration of Spatial Kappa to streamline the process of defining spatial structures for different modelling purposes. We also verify the existing implementation against established results in diffusion and narrow escape, thus laying the foundations for querying a wider range of spatial systems with greater confidence in the accuracy of the results. In defining complex models over time, we draw attention to how non-modelling specialists might define, verify, and analyse rules throughout a rigorous model development process. We propose structured visual methodologies for developing and maintaining knowledge base data structures, incorporating the information needed to construct a Kappa rule-based model. We further extend these methodologies to deal with biological systems defined by the activity of synthetic genetic parts, with the hope of providing tractable operations that allow multiple users to contribute to their development over time according to their area of expertise. Throughout the thesis we pursue the aim of bridging the divide between information sources such as literature and bioinformatics databases and the abstracting decisions inherent in a model. We consider methodologies for automating the construction of spatial models, providing traceable links from source to model element, and updating a model via an iterative and collaborative development process. By providing frameworks for modellers from multiple domains of expertise to work with the language, we reduce the entry barrier and open the field to further questions and new research.
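Kappa models carry stochastic (continuous-time Markov chain) semantics. The sketch below illustrates that simulation style with a minimal Gillespie algorithm for a single reversible binding rule, written in Python; it is an illustration of stochastic rule execution, not of Kappa or Spatial Kappa themselves, and the rule syntax in the comment, the rate constants and the counts are all illustrative.

```python
import random

def gillespie(a0, b0, ab0, k_on, k_off, t_end):
    """Minimal Gillespie simulation of one reversible rule, A + B <-> AB,
    i.e. the kind of binding rule a Kappa model might write as
    'A(x), B(y) <-> A(x!1), B(y!1)' (syntax illustrative)."""
    t, a, b, ab = 0.0, a0, b0, ab0
    trace = [(t, ab)]
    while t < t_end:
        r_bind, r_unbind = k_on * a * b, k_off * ab
        total = r_bind + r_unbind
        if total == 0:
            break
        t += random.expovariate(total)        # time to next event
        if random.random() < r_bind / total:  # choose which rule fires
            a, b, ab = a - 1, b - 1, ab + 1
        else:
            a, b, ab = a + 1, b + 1, ab - 1
        trace.append((t, ab))
    return trace

random.seed(1)
print(gillespie(100, 100, 0, k_on=0.001, k_off=0.1, t_end=10.0)[-1])
```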
8

Decisive noise : noisy intercellular signalling analysed and enforced through synthetic biology

Jackson, Victoria Jane January 2013 (has links)
Individual cells in a genetically identical population, exposed to the same environment, can show great variation in their protein expression levels. This is due to noise, which is inherent in many biological processes and arises in part from the low molecule numbers and probabilistic interactions that lead to stochasticity. Much of the work in the field of noise and its propagation in gene expression networks, whether experimental, modelling-based or theoretical, has been conducted on networks/systems that occur within a single cell. However, cells do not exist solely in isolation, and understanding how cells are able to coordinate their behaviour despite this noise is an interesting area of expansion for the field. In this study, a synthetic intercellular communication system was designed that allows the investigation of how noise is propagated in intercellular communication. The communication system consists of separate sender and receiver cells incorporating components of the Lux quorum sensing system of Vibrio fischeri. The sender cell was designed so that production of the signalling molecule, 3-oxohexanoyl homoserine lactone, can be controlled by the addition of isopropyl-β-D-thiogalactoside (IPTG) and monitored via a reporter gene. The receiver cell was designed with a dual reporter system to enable the response of the cell to the signalling molecule to be monitored and the intrinsic and extrinsic contributions to the total noise to be calculated. The sender and receiver cells were engineered in Escherichia coli. The functionality of the receiver cells was tested in the presence of known concentrations of the signalling molecule. The population response and the noise characteristics of the receiver cells in a homogeneous environment were determined from single cell measurements. The functionality of the sender cells was tested in the presence of a range of IPTG concentrations, and the induction of expression from the LacI-repressible promoter was monitored. Mathematical models of the system were developed, and stochastic simulations of the models were used to investigate unexplained behaviour seen in the characterisation of the cells. The full functionality of the intercellular communication system was then tested by growing the receiver cells in the collected media of the induced sender cells. The response of the receiver cells to the signalling molecule in the media was again characterised using single cell measurements of reporter expression levels. The analysis of mixed populations of the sender and receiver cells was hampered by the technical limitations of the instruments used for the single cell measurements: difficulties were encountered in the simultaneous and specific measurement of the three reporter genes. Two microscopy-based methods for overcoming this issue were proposed, and one of them was shown to have potential.
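The intrinsic/extrinsic decomposition behind dual-reporter designs like this one is standard (Elowitz et al., 2002). Below is a minimal sketch of that standard estimator applied to simulated single-cell data; the thesis's exact analysis may differ, and the toy data generation is an assumption.

```python
import numpy as np

def noise_decomposition(c, y):
    """Dual-reporter noise decomposition (Elowitz et al., 2002):
      intrinsic^2 = <(c - y)^2> / (2 <c><y>)
      extrinsic^2 = (<c y> - <c><y>) / (<c><y>)
      total^2     = intrinsic^2 + extrinsic^2
    where c and y are matched single-cell levels of the two reporters."""
    c, y = np.asarray(c, float), np.asarray(y, float)
    mc, my = c.mean(), y.mean()
    intrinsic2 = np.mean((c - y) ** 2) / (2 * mc * my)
    extrinsic2 = (np.mean(c * y) - mc * my) / (mc * my)
    return np.sqrt(intrinsic2), np.sqrt(extrinsic2), np.sqrt(intrinsic2 + extrinsic2)

# Toy single-cell data: a shared (extrinsic) fluctuation multiplied by
# reporter-specific (intrinsic) noise for each of the two reporters.
rng = np.random.default_rng(0)
shared = rng.lognormal(0.0, 0.3, 2000)
cfp = shared * rng.lognormal(0.0, 0.1, 2000)
yfp = shared * rng.lognormal(0.0, 0.1, 2000)
print(noise_decomposition(cfp, yfp))
```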
9

Metabolic pathway analysis via integer linear programming

Planes, Francisco J. January 2008 (has links)
The understanding of cellular metabolism has been an intriguing challenge in classical cellular biology for decades. Essentially, cellular metabolism can be viewed as a complex system of enzyme-catalysed biochemical reactions that produces the energy and material necessary for the maintenance of life. In modern biochemistry, it is well known that these reactions group into metabolic pathways so as to accomplish a particular function in the cell. The identification of these metabolic pathways is a key step towards fully understanding the metabolic capabilities of a given organism. Typically, metabolic pathways have been elucidated via experimentation on different organisms. However, experimental findings are generally limited and fail to provide a complete description of all pathways. For this reason it is important to have mathematical models that allow us to identify and analyse metabolic pathways in a computational fashion. This is precisely the main theme of this thesis. We first describe, review and discuss existing mathematical/computational approaches to metabolic pathways, namely stoichiometric and path finding approaches. Then, we present our initial mathematical model, named the Beasley-Planes (BP) model, which significantly improves on previous stoichiometric approaches. We also illustrate a successful application of the BP model to optimally disrupting metabolic pathways. The main drawback of the BP model is that it needs extra pathway knowledge as input. This is especially inappropriate if we wish to detect unknown metabolic pathways. Unlike the BP model and stoichiometric approaches, path finding approaches do not suffer from this issue. For this reason a novel path finding approach is built and examined in detail. This analysis serves as inspiration for the Improved Beasley-Planes (IBP) model, which incorporates elements of both stoichiometric and path finding approaches. Though somewhat less accurate than the BP model, the IBP model solves the issue of extra pathway knowledge. Our research clearly demonstrates that there is a significant chance of developing a mathematical optimisation model that underlies many/all metabolic pathways.
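To make the integer-linear-programming framing concrete, here is a hedged sketch of the general idea, binary reaction-use variables constrained by stoichiometric balance and minimised by an ILP solver, over a three-reaction toy network. It is not the BP or IBP model; all names and numbers are illustrative.

```python
import pulp

# Toy network (illustrative): reactions over metabolites A, B, C.
#   r1: A -> B,  r2: B -> C,  r3: A -> C
stoich = {  # metabolite -> {reaction: stoichiometric coefficient}
    "A": {"r1": -1, "r3": -1},
    "B": {"r1": +1, "r2": -1},
    "C": {"r2": +1, "r3": +1},
}
reactions = ["r1", "r2", "r3"]

prob = pulp.LpProblem("shortest_pathway", pulp.LpMinimize)
use = pulp.LpVariable.dicts("use", reactions, cat="Binary")

# Objective: a pathway with as few reactions as possible.
prob += pulp.lpSum(use[r] for r in reactions)

# Net production: consume one unit of source A, produce one unit of
# target C, and balance every intermediate metabolite.
demand = {"A": -1, "B": 0, "C": +1}
for m, coeffs in stoich.items():
    prob += pulp.lpSum(c * use[r] for r, c in coeffs.items()) == demand[m]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([r for r in reactions if use[r].value() == 1])  # -> ['r3']
```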
10

High performance reconfigurable architectures for biological sequence alignment

Isa, Mohammad Nazrin January 2013 (has links)
Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality of life, for example in facilitating early disease diagnosis, identifying the characteristics of newly discovered sequences, and drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential growth of biological databases following the human genome project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low-level programming model compared with off-the-shelf platforms such as standard microprocessors and GPUs. Due to the aforementioned limitations, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely-used sequence alignment algorithms: the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are, firstly, that the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted in the hardware architectures. Here, when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bounded PE configuration time and the parallel PE configuration approach, irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) factors out the area and lithography advantages of any FPGA, resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA.
The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 for the acceleration of the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm respectively. In terms of speed-up, the designed cores were compared against their corresponding software and the reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x compared to the SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. The implementation of gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. For comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than a 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after factoring out the advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performance; it was noted that the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high-performance computation, offering a smaller area footprint and an economical 'green' solution compared to other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
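For reference, the recurrence that the Smith-Waterman systolic arrays accelerate cell-by-cell is the classical affine-gap (Gotoh) recurrence. Below is a plain software sketch of it; the scoring parameters are illustrative defaults, not those used in the thesis.

```python
import numpy as np

def smith_waterman_affine(s, t, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Reference software Smith-Waterman with affine gap penalties:
      E[i,j] = max(E[i,j-1] + gap_extend, H[i,j-1] + gap_open)  # gap in s
      F[i,j] = max(F[i-1,j] + gap_extend, H[i-1,j] + gap_open)  # gap in t
      H[i,j] = max(0, H[i-1,j-1] + score(s[i], t[j]), E[i,j], F[i,j])
    Returns the best local alignment score."""
    n, m = len(s), len(t)
    NEG = -10**9  # acts as minus infinity
    H = np.zeros((n + 1, m + 1), dtype=int)
    E = np.full((n + 1, m + 1), NEG, dtype=int)
    F = np.full((n + 1, m + 1), NEG, dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i, j] = max(E[i, j - 1] + gap_extend, H[i, j - 1] + gap_open)
            F[i, j] = max(F[i - 1, j] + gap_extend, H[i - 1, j] + gap_open)
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i, j] = max(0, H[i - 1, j - 1] + sub, E[i, j], F[i, j])
    return int(H.max())

print(smith_waterman_affine("ACACACTA", "AGCACACA"))
```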
