1. Comparative evaluation of microarray-based gene expression databases. Do, Hong-Hai; Kirsten, Toralf; Rahm, Erhard. 11 December 2018
Microarrays make it possible to monitor the expression of thousands of genes in parallel, thus generating huge amounts of data. Several databases have been developed for managing and analyzing this kind of data, but the state of the art in this field is still at an early stage. In this paper, we comprehensively analyze the requirements for microarray data management. We consider the various kinds of data involved as well as data preparation, integration and analysis needs. The identified requirements are then used to evaluate comparatively eight existing microarray databases described in the literature. In addition to providing an overview of the current state of the art, we identify problems that should be addressed in the future to obtain better solutions for managing and analyzing microarray data.
2. Detecting Locus-Locus Interactions Using Microarray Data. Gao, YanFei. 04 1900
In this report we explore how to detect locus-locus interactions using microarray data. Our analysis uses a dataset from a mouse experiment with the Affymetrix GeneChip MGU74Av2. In Chapter 1 we give the genetics background, an introduction to microarray methodology and the preprocessing of microarray data, and a review of the SAM (Significance Analysis of Microarrays) method for finding differentially expressed genes in microarray data. In Chapter 2 we describe our dataset and our objective of finding genes with locus-locus interactions but no main effects, and we show how to detect such interactions. In Chapter 3 we present a simulation study of detecting locus-locus interactions without main effects and propose a two-stage method for doing so. In Chapter 4 we apply the two-stage method to the microarray data, focusing on the second-stage analysis. In Chapter 5 we examine an alternative method that uses bootstrap resampling in place of permutations. Chapter 6 contains our conclusions and some suggestions for future research. Thesis, Master of Science (MS).
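The SAM statistic reviewed in Chapter 1 can be sketched roughly as follows. This is a minimal Python/NumPy sketch, assuming two-condition data stored as genes × samples arrays; the fudge factor `s0` defaults to the median of the per-gene standard errors (a simplification of SAM's own rule), and all function names are hypothetical. The `bootstrap` flag swaps label permutation for resampling with replacement, in the spirit of Chapter 5, but this is not the thesis's procedure, and the interaction model of the two-stage method is not reproduced.

```python
import numpy as np

def sam_d(x, y, s0=None):
    """SAM-style relative difference d = (mean_x - mean_y) / (s + s0) per gene.

    x, y: arrays of shape (n_genes, n_samples) for the two conditions.
    s0: fudge factor; defaults to the median of s (an assumed simplification).
    """
    n1, n2 = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)
    pooled_var = ((n1 - 1) * x.var(axis=1, ddof=1)
                  + (n2 - 1) * y.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    s = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))  # standard error of diff
    if s0 is None:
        s0 = np.median(s)
    return diff / (s + s0)

def resampled_null(x, y, n_draws=100, bootstrap=False, seed=0):
    """Null d-statistics from permuted (or bootstrap-resampled) sample labels."""
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n1, total = x.shape[1], data.shape[1]
    draws = []
    for _ in range(n_draws):
        if bootstrap:
            idx = rng.choice(total, size=total, replace=True)  # with replacement
        else:
            idx = rng.permutation(total)  # shuffle condition labels
        draws.append(sam_d(data[:, idx[:n1]], data[:, idx[n1:]]))
    return np.array(draws)  # shape (n_draws, n_genes)

def expected_order_stats(null):
    """SAM's expected d-values: the average of the sorted null statistics."""
    return np.sort(null, axis=1).mean(axis=0)
```

In SAM, genes whose sorted observed d-values depart from the expected order statistics by more than a chosen threshold are called significant; the threshold choice and the resulting false discovery estimate are left out of this sketch.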
3. A GA-Fuzzy-Based Voting Mechanism for Microarray Data Classification. Chen, Ming-cheng. 30 September 2008
Microarray technology plays an important role in clinical oncology: patients can be diagnosed with cancer on the basis of microarray data. Classification of microarray data nevertheless remains a wide-open issue. Existing methods such as SVM may perform well, but they take a long time to analyze microarray data. In this thesis, we propose a novel GA-fuzzy-based voting mechanism that finds the genes affecting the symptoms, so that patients can be diagnosed more accurately. The proposed algorithm blurs the boundary between classes in order to handle ambiguous regions. To simulate the gene selection mechanism, we propose an upper-bound α-cut and a lower-bound α-cut in the voting mechanism.
Two groups of datasets collected from the literature are used to test the performance of the proposed algorithm. In the first group, the proposed algorithm achieves better accuracy than the methods proposed by Pochet et al. on five datasets, but slightly worse accuracy on four datasets. In the second group, it achieves better accuracy than KerNN, proposed by Xiong and Chen, on seven datasets, but worse accuracy on four. Nevertheless, the experimental results show that the proposed algorithm performs best on multi-class data.
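An α-cut turns a fuzzy set into a crisp one by keeping the elements whose membership degree is at least α. The Python sketch below shows one hypothetical way upper- and lower-bound α-cuts could gate gene votes: a gene casts a full vote for a class when its membership reaches the upper bound and a half vote when it only reaches the lower bound. The vote weights, function names and data layout are assumptions for illustration; the thesis's exact voting rule and the genetic algorithm (GA) that would tune the α values are not reproduced here.

```python
from collections import Counter

def alpha_cut(memberships, alpha):
    """Crisp alpha-cut: indices whose membership degree is at least alpha."""
    return {i for i, mu in enumerate(memberships) if mu >= alpha}

def fuzzy_vote(gene_memberships, lower_alpha, upper_alpha):
    """Combine per-gene fuzzy class memberships into one predicted class.

    gene_memberships: one dict {class_label: membership in [0, 1]} per gene.
    Hypothetical rule: membership >= upper_alpha gives a full vote,
    membership in [lower_alpha, upper_alpha) gives a half vote, and a gene
    below lower_alpha for every class abstains (ambiguous region).
    Returns None if no gene votes.
    """
    votes = Counter()
    for memberships in gene_memberships:
        for label, mu in memberships.items():
            if mu >= upper_alpha:
                votes[label] += 1.0
            elif mu >= lower_alpha:
                votes[label] += 0.5
    if not votes:
        return None
    return max(votes, key=votes.get)
```

Letting weakly informative genes abstain is what allows the boundary between classes to stay blurred instead of forcing every gene into a hard decision.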
4. A new normalized EM algorithm for clustering gene expression data. Nguyen, Phuong Minh. Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW. January 2008
Microarray data clustering is a basic exploratory tool for finding groups of genes with similar expression patterns or for detecting relevant classes of molecular subtypes. Among the wide range of clustering approaches proposed and applied in the gene expression community, mixture model-based clustering has received much attention owing to its sound statistical framework and its flexibility in data modeling. However, clustering algorithms following the model-based framework suffer from two serious drawbacks. First, their performance depends critically on the starting values of their iterative clustering procedures. Second, they cannot work directly with very high-dimensional data sets in the sample clustering problem, where the dimension of the data reaches hundreds or thousands. The thesis focuses on these two challenges and makes the following contributions. First, it introduces the statistical model of our proposed normalized Expectation Maximization (EM) algorithm, followed by an analysis of its clustering performance on a number of real microarray data sets. The normalized EM is stable even with random initializations of its iterative EM procedure; this stability is demonstrated by comparing its performance with that of other related clustering algorithms. Furthermore, the normalized EM is the first mixture model-based clustering approach capable of working directly with very high-dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. This advantage is illustrated through a comparison with the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering).
In addition, for experimental microarray data sets in which the class labels of the data points are available, an interesting property relating the convergence speed of the normalized EM to the radius of the hypersphere in its statistical model is uncovered. Second, to support the performance comparison of different clusterings, a new internal index is derived using fundamental concepts from information theory. This index allows the comparison of clustering approaches in which the closeness between data points is evaluated by their cosine similarity. The method used to derive this internal index can be applied to design other indexes for comparing clustering approaches that employ a common similarity measure.
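Because the abstract ties the normalized EM to a hypersphere and to cosine similarity, the sketch below illustrates the general idea under loudly stated assumptions: each sample is rescaled onto a hypersphere of radius `radius` (on the unit sphere, cosine similarity reduces to a dot product), and a plain EM for an isotropic Gaussian mixture with one shared variance is run on the projected samples. This is not the thesis's statistical model; the farthest-point initialization and all names are illustrative choices.

```python
import numpy as np

def project_to_sphere(X, radius=1.0):
    """Rescale each row (sample) of X onto a hypersphere of the given radius."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return radius * X / np.where(norms == 0, 1.0, norms)

def cosine_similarity_matrix(X):
    """On the unit sphere, cosine similarity is just the dot product."""
    Z = project_to_sphere(X, 1.0)
    return Z @ Z.T

def sphere_em(X, k, radius=1.0, n_iter=50, seed=0):
    """EM for a k-component isotropic Gaussian mixture (shared variance)
    on sphere-projected samples. X: (n_samples, n_genes).

    Returns responsibilities (n, k), means (k, d), mixing weights (k,), variance.
    """
    Z = project_to_sphere(X, radius)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: random first mean, then repeatedly
    # the sample farthest from all means chosen so far.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    means = Z[idx].copy()
    weights = np.full(k, 1.0 / k)
    var = 1.0
    for _ in range(n_iter):
        # E-step: with a shared variance the Gaussian normalizing constant
        # is identical for all components and cancels.
        sq = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_r = np.log(weights)[None, :] - sq / (2.0 * var)
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing weights, component means, shared variance.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ Z) / nk[:, None]
        sq = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = max(float((resp * sq).sum()) / (n * d), 1e-12)
    return resp, means, weights, var
```

Since every projected sample has the same norm, clustering on the sphere groups samples by the direction of their expression profile rather than its overall magnitude, which is one reason cosine similarity is a natural closeness measure here.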
5. Sibios as a Framework for Biomarker Discovery Using Microarray Data. Choudhury, Bhavna. 26 July 2006
Submitted to the Faculty of the School of Informatics in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics, Indiana University, August 2006 / Decoding the human genome has generated large amounts of data that need to be analyzed and given a biological meaning. The field of life sciences is highly information-driven. The genomic data of interest are mainly gene expression data, obtained by measuring mRNA levels in an organism. High-throughput technology has made it possible to process large amounts of gene expression data efficiently. Research on microarray data has opened up the possibility of finding disease biomarkers, and biomarker discovery experiments have been greatly facilitated by the emergence of various analytical and visualization tools and annotation databases. These tools and databases are often termed 'bioinformatics services'.
The main purpose of this research was to develop SIBIOS (System for Integration of Bioinformatics Services) as a platform for carrying out microarray experiments aimed at biomarker discovery. Such experiments require an understanding of the procedures currently adopted by researchers to extract biologically significant genes.
In the course of this study, sample protocols were built for biomarker discovery. A case study on the BCR-ABL subtype of ALL was selected to validate the results. Different approaches to biomarker discovery were explored, considering both statistical and data mining techniques. Biological annotation of the results was also carried out. The final task was to incorporate the proposed sample protocols into SIBIOS by providing workflow capabilities, thereby enabling the system to support biomarker discovery workflows.
6. A Comparison of Unsupervised Methods for DNA Microarray Leukemia Data. Harness, Denise. 05 April 2018
Advancements in DNA microarray technology have created the need for sophisticated machine learning algorithms and feature selection methods. Probabilistic graphical models, in particular, have been used to identify whether microarrays or genes cluster together in groups of individuals with a similar diagnosis. These gene clusters are informative, but they can be misleading when every gene is used in the calculation. Feature reduction techniques are explored first; however, the size and nature of the data prevent traditional techniques from working efficiently. Our method uses the partial correlations between the features to create a precision matrix and to predict which associations between genes are most important for predicting a leukemia diagnosis. This technique reduces the number of genes to a fraction of the original. The partial correlations are then extended into a spectral clustering approach: a variety of Laplacian matrices are generated from the network of connections between features, each implying a graphical network model of gene interconnectivity. Various edge- and vertex-weighted Laplacians are considered and compared against each other in a probabilistic graphical modeling approach. The resulting multivariate Gaussian distributed clusters are then analyzed to determine which genes are activated in a patient with leukemia. Finally, the results are compared against other feature engineering approaches to assess their accuracy on the leukemia data set. Initial results show that partial correlation-based feature selection predicts a patient's leukemia diagnosis with almost the same accuracy as a machine learning algorithm applied to the full set of genes. More calculations of the precision matrix are needed to ensure that the set of most important genes is correct.
Additionally, more machine learning algorithms will be implemented on the full and reduced data sets to further validate the current prediction accuracy of the partial correlation method.
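The partial-correlation and Laplacian steps can be sketched in Python with NumPy as follows. Partial correlations are read off the precision matrix P as ρ_ij = -P_ij / sqrt(P_ii P_jj); a small ridge term (an assumption) keeps the covariance invertible, whereas data with far more genes than samples would call for a sparse estimator such as the graphical lasso. The selection threshold, the use of |ρ| as edge weights and all function names are illustrative, not the thesis's settings.

```python
import numpy as np

def partial_correlations(X, ridge=1e-3):
    """Partial correlation matrix from the ridge-regularized precision matrix.

    X: (n_samples, n_genes). rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    cov = np.cov(X, rowvar=False)
    P = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    d = np.sqrt(np.diag(P))
    rho = -P / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def select_genes(rho, threshold):
    """Indices of genes with at least one |partial correlation| above threshold."""
    off = np.abs(rho - np.eye(len(rho)))  # ignore the diagonal
    return np.where((off > threshold).any(axis=1))[0]

def laplacians(W):
    """Unnormalized (D - W) and symmetric-normalized Laplacians of weights W."""
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    safe = np.where(deg > 0, deg, 1.0)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    L_sym = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    return L, L_sym

def spectral_embedding(L, k):
    """Eigenvectors of the k smallest eigenvalues; row i embeds vertex i."""
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return vecs[:, :k]
```

Clustering the rows of the spectral embedding (for example with k-means) then groups genes that are strongly connected in the partial-correlation network.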
7. Comparative Microarray Data Mining. Mao, Shihong. 27 December 2007
No description available.
8. A Novel Ensemble Machine Learning for Robust Microarray Data Classification. Peng, Yonghong. January 2006
Microarray data analysis and classification have been convincingly shown to provide an effective methodology for diagnosing diseases and cancers. Although much research has been done in recent years on applying machine learning techniques to microarray data classification, conventional machine learning techniques have been shown to have intrinsic drawbacks in achieving accurate and robust classification. This paper presents a novel ensemble machine learning approach for robust microarray data classification. Unlike conventional ensemble learning techniques, the presented approach first generates a pool of candidate base classifiers based on gene sub-sampling and then, based on classifier clustering, selects a subset of appropriate base classifiers to construct the classification committee. Experimental results demonstrate that the classifiers constructed by the proposed method outperform not only classifiers generated by conventional machine learning but also those generated by two widely used conventional ensemble learning methods (bagging and boosting).
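A stripped-down version of a gene sub-sampling ensemble can be sketched in Python with NumPy: each base classifier (here a nearest-centroid rule, an assumption) is trained on a random subset of genes, and the committee predicts by majority vote. The paper's key step of selecting committee members through classifier clustering is deliberately omitted, so every pool member votes; names and parameter values are hypothetical.

```python
import numpy as np

class NearestCentroid:
    """Minimal base classifier: assign each sample to the closest class centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def build_pool(X, y, n_members=25, genes_per_member=50, seed=0):
    """Train base classifiers, each on a random subset of genes (columns)."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    pool = []
    for _ in range(n_members):
        genes = rng.choice(n_genes, size=min(genes_per_member, n_genes), replace=False)
        pool.append((genes, NearestCentroid().fit(X[:, genes], y)))
    return pool

def committee_predict(pool, X):
    """Majority vote over all pool members; ties go to the smallest label."""
    votes = np.array([clf.predict(X[:, genes]) for genes, clf in pool])
    out = []
    for column in votes.T:  # one column of member votes per sample
        labels, counts = np.unique(column, return_counts=True)
        out.append(labels[counts.argmax()])
    return np.array(out)
```

Sub-sampling genes rather than samples gives each member a different view of the high-dimensional feature space, which is what makes the pool diverse enough for a committee to help.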
9. Multiple testing using the posterior probability of half-space: application to gene expression data. Labbe, Aurelie. January 2005
We consider the problem of testing the equality of two sample means when the number of tests performed is large. In the context of gene expression data, our goal is to detect a set of genes differentially expressed under two treatments or two biological conditions. A null hypothesis of no difference in gene expression between the two conditions is constructed. Since such a hypothesis is tested for each gene, thousands of tests are performed simultaneously, and multiple testing issues arise. The aim of our research is to connect Bayesian analysis and frequentist theory in the context of multiple comparisons by deriving some properties shared by p-values and posterior probabilities. The ultimate goal of this work is to use the posterior probability of the one-sided alternative hypothesis (or, equivalently, the posterior probability of the half-space) in the same spirit as a p-value. We show, for instance, that such a Bayesian probability can be used as an input to some standard multiple testing procedures that control the false discovery rate (FDR).
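One textbook special case shows why a posterior probability of a half-space can behave like a p-value. Assume the observed mean difference for a gene is normal with a known standard error `se`, and place a flat prior on the true difference δ. The posterior of δ is then N(mean difference, se²), so the posterior probability of the half-space δ ≤ 0 (one minus the posterior probability of the one-sided alternative δ > 0) equals the one-sided p-value, and it can be fed into the Benjamini-Hochberg step-up procedure, which controls the false discovery rate (FDR). This Python sketch illustrates those assumptions only; it is not the thesis's derivation, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_null_halfspace(mean_diff, se):
    """P(delta <= 0 | data) under a flat prior and a normal likelihood with known se.

    The posterior of delta is N(mean_diff, se**2), so this equals the
    one-sided p-value for H0: delta <= 0 versus H1: delta > 0.
    """
    return normal_cdf(-mean_diff / se)

def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])
```

For example, five genes with values 0.001, 0.008, 0.035, 0.039 and 0.6 at q = 0.05 yield rejections of the first four: 0.035 fails its own threshold of 0.03, but the step-up rule still rejects it because 0.039 passes at rank four.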
10. Information Theoretic Approaches Towards Regulatory Network Inference. Chaitankar, Vijender. 12 December 2012
In spite of many past efforts, inference (or reverse engineering) of regulatory networks from microarray data remains an unsolved problem in systems biology. Such regulatory networks play a critical role in cellular function and organization and are of interest in the study of a variety of areas, such as disease research and ecotoxicology. This dissertation proposes information theoretic methods and algorithms for inferring regulatory networks from microarray data. Most of the proposed algorithms can be applied to both time-series and multifactorial microarray data sets. The work infers regulatory networks considering the following six factors: (i) computational efficiency for inferring genome-scale networks; (ii) incorporation of prior biological knowledge; (iii) choice of the optimal network that minimizes the joint network entropy; (iv) the impact of higher-order structures (specifically 3-node structures) on network inference; (v) the effects of the time sensitivity of regulatory interactions; and (vi) exploiting the benefits of existing and proposed metrics and algorithms for reverse engineering through the concept of a consensus of consensus networks. Specifically, this dissertation presents an approach for incorporating knock-out data sets; the method is flexible enough to be easily adapted to existing and new approaches. While most information theoretic approaches infer networks from pair-wise interactions, this dissertation discusses inference methods that score edges from complex structures. A new inference method for building consensus networks from networks inferred by multiple popular information theoretic approaches is also proposed. For time-series data sets, new information theoretic metrics were proposed that consider the time lags of regulatory interactions estimated from microarray data sets.
Finally, based on the scores predicted for each possible edge in the network, a probabilistic minimum description length-based approach was proposed to identify the optimal network (the one minimizing the joint network entropy). Comparative analyses on in-silico and/or real-time data sets have shown that the proposed algorithms achieve better inference accuracy and/or higher computational efficiency than other state-of-the-art schemes such as ARACNE, CLR and Relevance Networks. Most of the proposed methods are general and can easily be incorporated into new network inference methods and algorithms.
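The pair-wise information theoretic baseline that the dissertation compares against can be sketched in Python with only the standard library: expression profiles are discretized into equal-width bins, mutual information (MI) is estimated with the plug-in formula, and an ARACNE-style data processing inequality (DPI) removes the weakest edge of every fully connected gene triplet unless it lies within a tolerance of the next-weakest edge. Bin count, tolerance and function names are assumptions; none of the dissertation's proposed metrics (time lags, knock-out data, minimum description length) are implemented here.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(values, n_bins=3):
    """Equal-width binning of one gene's expression profile."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant profile: everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(a, b):
    """Plug-in MI estimate (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected triplet (ARACNE-style DPI).

    mi: dict frozenset({gene1, gene2}) -> MI score.
    """
    edges = {e for e, w in mi.items() if w > 0}
    genes = sorted({g for e in edges for g in e})
    removed = set()
    for g, h, k in combinations(genes, 3):
        tri = [frozenset((g, h)), frozenset((g, k)), frozenset((h, k))]
        if all(e in edges for e in tri):
            tri.sort(key=lambda e: mi[e])
            if mi[tri[0]] < (1.0 - tolerance) * mi[tri[1]]:
                removed.add(tri[0])
    return {e: mi[e] for e in edges - removed}

def infer_network(profiles, n_bins=3, tolerance=0.0):
    """profiles: dict gene -> expression values (same sample order for all genes)."""
    disc = {g: discretize(v, n_bins) for g, v in profiles.items()}
    mi = {frozenset((g, h)): mutual_information(disc[g], disc[h])
          for g, h in combinations(sorted(disc), 2)}
    return dpi_prune(mi, tolerance)
```

The DPI step is what distinguishes this from a plain Relevance Network: in a chain A → B → C the indirect A–C association carries less information than either direct edge, so it is pruned.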