Spelling suggestions: "subject:"1expression data"" "subject:"1expression mata""
11 |
HARP: a practical projected clustering algorithm for mining gene expression dataYip, Yuk-Lap, Kevin., 葉旭立. January 2003 (has links)
published_or_final_version / abstract / toc / Computer Science and Information Systems / Master / Master of Philosophy
|
12 |
Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence CriterionZarkoob, Hadi 21 May 2010 (has links)
DNA microarrays are capable of measuring expression levels of thousands of genes, even the whole genome, in a single experiment. Based on this, they have been widely used to extend the studies of cancerous tissues to a genomic level. One of the main goals in DNA microarray experiments is to identify a set of relevant genes such that the desired outputs of the experiment mostly depend on this set, to the exclusion of the rest of the
genes. This is motivated by the fact that the biological process in cell typically involves only a subset of genes, and not the whole genome. The task of selecting a subset of relevant genes is called feature (gene) selection. Herein, we propose a feature selection algorithm for gene expression data. It is based on the Hilbert-Schmidt independence criterion, and partly motivated
by Rank-One Downdate (R1D) and the Singular Value
Decomposition (SVD). The algorithm is computationally very fast and
scalable to large data sets, and can be applied to response variables of arbitrary type (categorical and continuous). Experimental
results of the proposed technique are presented
on some synthetic and well-known microarray data sets. Later, we discuss the capability of HSIC in providing a general framework which encapsulates many widely used techniques for dimensionality reduction, clustering and metric learning. We will use this framework to explain two metric learning algorithms, namely the Fisher discriminant analysis (FDA) and closed form metric learning (CFML). As a result of this framework, we are able to propose a new metric learning method. The proposed technique uses the concepts from normalized cut spectral clustering and is associated with an underlying convex optimization problem.
|
13 |
The development and application of informatics-based systems for the analysis of the human transcriptome.Kelso, Janet January 2003 (has links)
<p>Despite the fact that the sequence of the human genome is now complete it has become clear that the elucidation of the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena such as alternative splicing in the transcriptome. As a result, the identification of novel transcripts arising from the genome continues. Furthermore, as the volume of transcript data grows it is becoming increasingly difficult to integrate expression information which is from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile &ndash / the location and timing of transcript expression &ndash / provides evidence that can be used in understanding the role of the expressed transcript in the organ or tissue under study, or in developmental pathways or disease phenotype observed.<br />
<br />
In this dissertation I present novel computational approaches with direct biological applications to two distinct but increasingly important areas of research in gene expression research. The first addresses detection and characterisation of alternatively spliced transcripts. The second is the construction of an hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. In the final chapter the biological questions that can be approached, and the discoveries that can be made using these systems are illustrated with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome.</p>
|
14 |
A Neuro-Fuzzy Approach for Functional Genomics Data Interpretation and AnalysisNeagu, Daniel, Palade, V. January 2003 (has links)
No
|
15 |
Algorithms for the analysis of gene expression dataVenet, David 07 December 2004 (has links)
High-throughput gene expression data have been generated on a large scale by biologists.<p>The thesis describe a set of tools for the analysis of such data. It is specially gearded towards microarray data. / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
|
16 |
Meta-aprendizagem aplicada à classificação de dados de expressão gênica / Meta-learning applied to gene expression data classificationSouza, Bruno Feres de 26 October 2010 (has links)
Dentre as aplicações mais comuns envolvendo microarrays, pode-se destacar a classificação de amostras de tecido, essencial para a identificação correta da ocorrência de câncer. Essa classificação é realizada com a ajuda de algoritmos de Aprendizagem de Máquina. A escolha do algoritmo mais adequado para um dado problema não é trivial. Nesta tese de doutorado, estudou-se a utilização de meta-aprendizagem como uma solução viável. Os resultados experimentais atestaram o sucesso da aplicação utilizando um arcabouço padrão para caracterização dos dados e para a construção da recomendação. A partir de então, buscou-se realizar melhorias nesses dois aspectos. Inicialmente, foi proposto um novo conjunto de meta-atributos baseado em índices de validação de agrupamentos. Em seguida, estendeu-se o método de construção de rankings kNN para ponderar a influência dos vizinhos mais próximos. No contexto de meta-regressão, introduziu-se o uso de SVMs para estimar o desempenho de algoritmos de classificação. Árvores de decisão também foram empregadas para a construção da recomendação de algoritmos. Ante seu desempenho inferior, empregou-se um esquema de comitês de árvores, que melhorou sobremaneira a qualidade dos resultados / Among the most common applications involving microarray, one can highlight the classification of tissue samples, which is essential for the correct identification of the occurrence of cancer and its type. This classification takes place with the aid of machine learning algorithms. Choosing the best algorithm for a given problem is not trivial. In this thesis, we studied the use of meta-learning as a viable solution. The experimental results confirmed the success of the application using a standard framework for characterizing data and constructing the recommendation. Thereafter, some improvements were made in these two aspects. Initially, a new set of meta-attributes was proposed, which are based on cluster validation indices. Then the kNN method for ranking construction was extended to weight the influence of nearest neighbors. In the context of meta-regression, the use of SVMs was introduced to estimate the performance of ranking algorithms. Decision trees were also employed for recommending algorithms. Due to their low performance, a ensemble of trees was employed, which greatly improved the quality of results
|
17 |
Statistical Analysis of High-Dimensional Gene Expression DataJustin Zhu Unknown Date (has links)
The use of diagnostic rules based on microarray gene expression data has received wide attention in bioinformatics research. In order to form diagnostic rules, statistical techniques are needed to form classifiers with estimates for their associated error rates, and to correct for any selection biases in the estimates. There are also the associated problems of identifying the genes most useful in making these predictions. Traditional statistical techniques require the number of samples to be much larger than the number of features. Gene expression datasets usually have a small number of samples, but a large number of features. In this thesis, some new techniques are developed, and traditional techniques are used innovatively after appropriate modification to analyse gene expression data. Classification: We first consider classifying tissue samples based on the gene expression data. We employ an external cross-validation with recursive feature elimination to provide classification error rates for tissue samples with different numbers of genes. The techniques are implemented as an R package BCC (Bias-Corrected Classification), and are applied to a number of real-world datasets. The results demonstrate that the error rates vary with different numbers of genes. For each dataset, there is usually an optimal number of genes that returns the lowest cross-validation error rate. Detecting Differentially Expressed Genes: We then consider the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. The focus is on the use of mixture models to handle the multiplicity issue. The mixture model approach provides a framework for the estimation of the prior probability that a gene is not differentially expressed. It estimates various error rates, including the FDR (False Discovery Rate) and the FNR (False Negative Rate). We also develop a method for selecting biomarker genes for classification, based on their repeatability among the highly differentially expressed genes in cross-validation trials. The latter method incorporates both gene selection and classification. Selection Bias: When forming a prediction rule on the basis of a small number of classified tissue samples, some form of feature (gene) selection is usually adopted. This is a necessary step if the number of features is high. As the subset of genes used in the final form of the rule has not been randomly selected but rather chosen according to some criteria designed to reflect the predictive power of the rule, there will be a selection bias inherent in estimates of the error rates of the rule if care is not taken. Various situations are presented where selection biases arise in the formation of a prediction rule and where there is a consequent need for the correction of the biases. Three types of selection biases are analysed: selection bias from not using external cross-validation, selection bias of not working with the full set of genes, and the selection bias from optimizing the classification error rate over a number of subsets obtained according to a selection method. Here we mostly employ the support vector machine with recursive feature elimination. This thesis includes a description of cross-validation schemes that are able to correct for these selection biases. Furthermore, we examine the bias incurred when using the predicted rather than the true outcomes to define the class labels in forming and evaluating the performance of the discriminant rule. Case Study: We present a case study using the breast cancer datasets. In the study, we compare the 70 highly differentially expressed genes proposed by van 't Veer and colleagues, against the set of the genes selected using our repeatability method. The results demonstrate that there is more than one set of biomarker genes. We also examine the selection biases that may exist when analysing this dataset. The selection biases are demonstrated to be substantial.
|
18 |
Hierarchical Multi-Bottleneck Classification Method And Its Application to DNA Microarray Expression DataXiong, Xuejian, Wong, Weng Fai, Hsu, Wen Jing 01 1900 (has links)
The recent development of DNA microarray technology is creating a wealth of gene expression data. Typically these datasets have high dimensionality and a lot of varieties. Analysis of DNA microarray expression data is a fast growing research area that interfaces various disciplines such as biology, biochemistry, computer science and statistics. It is concluded that clustering and classification techniques can be successfully employed to group genes based on the similarity of their expression patterns. In this paper, a hierarchical multi-bottleneck classification method is proposed, and it is applied to classify a publicly available gene microarray expression data of budding yeast Saccharomyces cerevisiae. / Singapore-MIT Alliance (SMA)
|
19 |
ChlamyCyc : an integrative systems biology database and web-portal for Chlamydomonas reinhardtiiMay, Patrick, Christian, Jan-Ole, Kempa, Stefan, Walther, Dirk January 2009 (has links)
Background:
The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic
model organism for the study of photosynthesis and plant growth. In the era of modern highthroughput technologies there is an imperative need to integrate large-scale data sets from highthroughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism.
Results:
In the framework of the German Systems Biology initiative GoFORSYS, a pathway
database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification.
Conclusion:
ChlamyCyc provides a curated and integrated systems biology repository that will
enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.
|
20 |
Deriving Genetic Networks from Gene Expression Data and Prior KnowledgeLindlöf, Angelica January 2001 (has links)
In this work three different approaches for deriving genetic association networks were tested. The three approaches were Pearson correlation, an algorithm based on the Boolean network approach and prior knowledge. Pearson correlation and the algorithm based on the Boolean network approach derived associations from gene expression data. In the third approach, prior knowledge from a known genetic network of a related organism was used to derive associations for the target organism, by using homolog matching and mapping the known genetic network to the related organism. The results indicate that the Pearson correlation approach gave the best results, but the prior knowledge approach seems to be the one most worth pursuing
|
Page generated in 0.0972 seconds