This study explores the handling and analyzing of big data in the field of bioinformatics. The focus has been on improving the analysis of public domain data for Affymetrix GeneChips which are a widely used technology for measuring gene expression. Methods to determine the bias in gene expression due to G-stacks associated with runs of guanine in probes have been explored via the use of a grid and various types of cloud computing. An attempt has been made to find the best way of storing and analyzing big data used in bioinformatics. A grid and various types of cloud computing have been employed. The experience gained in using a grid and different clouds has been reported. In the case of Windows Azure, a public cloud has been employed in a new way to demonstrate the use of the R statistical language for research in bioinformatics. This work has studied the G-stack bias in a broad range of GeneChip data from public repositories. A wide scale survey has been carried out to determine the extent of the Gstack bias in four different chips across three different species. The study commenced with the human GeneChip HG U133A. A second human GeneChip HG U133 Plus2 was then examined, followed by a plant chip, Arabidopsis thaliana, and then a bacterium chip, Pseudomonas aeruginosa. Comparisons have also been made between the use of widely recognised algorithms RMA and PLIER for the normalization stage of extracting gene expression from GeneChip data.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:681817 |
Date | January 2016 |
Creators | Owen, Anne M. |
Publisher | University of Essex |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://repository.essex.ac.uk/16125/ |
Page generated in 0.0125 seconds