Global ETD Search

22	Associações entre borboletas frugívoras em áreas de floresta com diferentes históricos de perturbação antrópica / Associations between fruit-feeding butterflies in forest areas with different historics of anthropic disturbances Guidelli, Rodrigo Vieira [UNESP] 28 February 2016 Pró-Reitoria de Extensão Universitária (PROEX UNESP) / Pró-Reitoria de Pós-Graduação (PROPG UNESP) / Elucidating the complex interaction networks in ecological systems is not an easy task (Proulx et al., 2005); extracting information efficiently requires powerful computational tools and an approach suited to the type of scenario under study. In 2009, Uehara-Prado et al. collected a large amount of data to assess the role of butterflies of the family Nymphalidae as bioindicators, but these data were not used in their entirety. This study focuses on experimentation with and modeling of ecological interactions, using the data obtained by Uehara-Prado et al. (2009) together with those not previously used, in order to extract as much biologically and ecologically relevant information as possible. To this end, three approaches were used: (1) biclustering (Cheng & Church, 2000; Madeira & Oliveira, 2004); (2) decision trees (Quinlan, 1986; Bell, 1999; De'ath & Fabricius, 2000; Olden et al., 2008); and (3) Bayesian networks (Korb & Nicholson, 2003; McCann et al., 2006; Chen & Pollino, 2012; Pearl, 2014).
The results were very promising, and all three tools met our expectations: biclustering identified all the correlation patterns in the scenarios presented, decision trees proved extremely effective at classifying the variables, and the Bayesian networks identified which variables influenced or were influenced by the others. With this work, we hope to encourage other researchers to revisit old databases with more modern computational tools, because their potential is extraordinary. Fruit-feeding butterflies Biclustering Decision trees Bayesian networks
23	Identification of gene expression changes in human cancer using bioinformatic approaches Griffith, Obi Lee 05 1900 The human genome contains tens of thousands of gene loci which code for an even greater number of protein and RNA products. The highly complex temporal and spatial expression of these genes makes possible all the biological processes of life. Altered gene expression by mutation or deregulation is fundamental for the development of many human diseases. The ultimate aim of this thesis was to identify gene expression changes relevant to cancer. The advent of genome-wide expression profiling techniques, such as microarrays, has provided powerful new tools to identify such changes and researchers are now faced with an explosion of gene expression data. Processing, comparing and integrating these data present major challenges. I approached these challenges by developing and assessing novel methods for cross-platform analysis of expression data, scalable subspace clustering, and curation of experimental gene regulation data from the published literature. I found that combining results from different expression platforms increases reliability of coexpression predictions. However, I also observed that global correlation between platforms was generally low, and few gene pairs reached reasonable thresholds for high-confidence coexpression. Therefore, I developed a novel subspace clustering algorithm, able to identify coexpressed genes in experimental subsets of very large gene expression datasets. Biological assessment against several metrics indicates that this algorithm performs well. I also developed a novel meta-analysis method to identify consistently reported genes from differential expression studies when raw data are unavailable. This method was applied to thyroid cancer, producing a ranked list of significantly over-represented genes.
Tissue microarray analysis of some of these candidates and others identified a number of promising biomarkers for diagnostic and prognostic classification of thyroid cancer. Finally, I present ORegAnno (www.oreganno.org), a resource for the community-driven curation of experimentally verified regulatory sequences. This resource has proven a great success with ~30,000 sequences entered from over 900 publications by ~50 contributing users. These data, methods and resources contribute to our overall understanding of gene regulation, gene expression, and the changes that occur in cancer. Such an understanding should help identify new cancer mechanisms, potential treatment targets, and have significant diagnostic and prognostic implications. / Medicine, Faculty of / Medical Genetics, Department of / Graduate Bioinformatics Gene expression Gene regulation SAGE Tissue microarray Thyroid cancer Subspace clustering Biclustering Ontology Biomarker
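The abstract above does not say how the meta-analysis method works. As a rough illustration only, the simplest way to find consistently reported genes when just the published gene lists are available is vote counting: rank each gene by the number of independent studies that report it. The sketch below is an assumption of that general idea, not the thesis's method; it performs no significance testing, and the study names and gene identifiers are hypothetical.

```python
from collections import Counter


def rank_by_vote_count(studies):
    """Rank genes by how many independent studies report them.

    `studies` maps a study name to the collection of genes that study
    reports as differentially expressed. Returns (gene, count) pairs
    sorted by descending count, with ties broken alphabetically.
    """
    votes = Counter()
    for genes in studies.values():
        votes.update(set(genes))  # each study votes at most once per gene
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical study names and gene lists, for illustration only.
example_studies = {
    "study_A": {"GENE1", "GENE2", "GENE3"},
    "study_B": {"GENE2", "GENE3"},
    "study_C": {"GENE3", "GENE4"},
}
ranking = rank_by_vote_count(example_studies)
for gene, count in ranking:
    print(gene, count)
```

A real analysis would also need to map gene identifiers between platforms and to ask how often a given overlap would arise by chance, which this sketch leaves out.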
24	Avaliação sistemática de técnicas de bi-agrupamento de dados / A systematic comparative evaluation of biclustering techniques Victor Alexandre Padilha 23 September 2016 Data clustering is a fundamental problem in unsupervised machine learning, whose objective is to find categories that describe a dataset according to the similarities between its objects. In its traditional formulation, we search for partitions, or hierarchies of partitions, containing clusters whose objects are similar to each other and dissimilar to objects from other clusters, according to a similarity or dissimilarity measure computed over all the data attributes. It is thus assumed that all clusters are characterized in the same feature space. However, despite decades of successful applications, there are situations where clusters are characterized only in a subset of the attributes, and that subset may differ from one cluster to another.
Unlike traditional data clustering algorithms, biclustering algorithms cluster the rows and columns of a data matrix simultaneously, producing biclusters formed by strongly related subsets of objects and subsets of attributes. These algorithms began to draw the scientific community's attention only after studies showed their importance for gene expression data analysis. To a lesser degree, biclustering techniques have also been used in other application domains, such as text mining and collaborative filtering in recommendation systems. The problem is that many biclustering algorithms have been proposed in recent years, based on different principles and assumptions about the data, and they can produce completely different outcomes on the same dataset. Comparative studies that contrast the behavior and performance of these algorithms are therefore important. This thesis presents a comparative study of 17 biclustering algorithms (representative of the main categories of algorithms in the literature), tested on synthetic and real data collections, with particular emphasis on gene expression data analysis. Several methodological aspects and experimental evaluation procedures were considered in order to overcome the limitations of previous comparative studies in the literature. Beyond the comparison itself, the comparative framework developed can be reused to compare other algorithms in the future. Biclustering Clustering Gene expression
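To make "strongly related subsets of objects and subsets of attributes" concrete, the sketch below computes the mean squared residue of Cheng & Church (2000), a classic coherence score for a candidate bicluster: for a submatrix with rows I and columns J, it averages (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. A score near zero means the rows follow the same additive pattern across the chosen columns. This is one scoring criterion among many, and the abstract does not say which of the 17 compared algorithms use it; the toy matrix is made up for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng & Church (2000) mean squared residue of a submatrix.

    `matrix` is a list of lists; `rows` and `cols` are the index lists
    that select the candidate bicluster.
    """
    n_rows, n_cols = len(rows), len(cols)
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy data: rows 0-2 share an additive pattern on columns 0-2 only.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 1.0, 8.0, 2.0],
]
coherent = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

The embedded 3-by-3 block scores essentially zero while the full matrix scores much higher, which is the kind of pattern a traditional clustering over all columns would miss.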
25	Intelligent Data Mining on Large-scale Heterogeneous Datasets and its Application in Computational Biology Wu, Chao 10 October 2014 No description available. Computer Science Machine Learning Clustering Data Integration Biclustering Bioinformatics Network-based Methodology
26	Data Mining Algorithms for Discovering Patterns in Text Collections Patchala, Jagadeesh 27 May 2016 No description available. Computer Science Authorship Analysis Biclustering 3-clusters Drug repurposing Text mining Data mining
27	Graph Coloring and Clustering Algorithms for Science and Engineering Applications Bozdag, Doruk January 2008 No description available. Bioinformatics Computer Science Electrical Engineering parallel graph coloring distributed memory biclustering co-clustering subspace clustering
28	SNAP Biclustering Chan, William Hannibal 22 January 2010 This thesis presents a new ant-optimized biclustering technique known as SNAP biclustering, which runs faster and produces results of superior quality to previous techniques. Biclustering techniques have been designed to compensate for the weaknesses of classical clustering algorithms by allowing cluster overlap, and allowing vectors to be grouped for a subset of their defined features. These techniques have performed well in many problem domains, particularly DNA microarray analysis and collaborative filtering. A motivation for this work has been the biclustering technique known as bicACO, which was the first to use ant colony optimization. As bicACO is time intensive, much emphasis was placed on decreasing SNAP's runtime. The superior speed and biclustering results of SNAP are due to its improved initialization and solution construction procedures. In experimental studies involving the Yeast Cell Cycle DNA microarray dataset and the MovieLens collaborative filtering dataset, SNAP has run at least 22 times faster than bicACO while generating superior results. Thus, SNAP is an effective choice of technique for microarray analysis and collaborative filtering applications. / Master of Science Single Nucleotide Polymorphisms Collaborative Filtering Microarray Analysis Ant Colony Optimization Biclustering
29	Agrupamento de dados baseado em predições de modelos de regressão: desenvolvimentos e aplicações em sistemas de recomendação / Data clustering based on prediction regression models: developments and applications in recommender systems Pereira, André Luiz Vizine 12 May 2016 Recommender Systems (RS) are powerful and popular tools for web portals such as e-commerce sites. To build their recommendations, RS draw on multiple data sources, which capture the characteristics of items, users and their transactions, and on prediction models. Given the large amount of data involved in the predictions made by RS, it is unlikely that all predictions can be well represented by a single global model. Another important aspect is the problem known as cold-start, which, despite recent advances in the RS area, is still a relevant issue that deserves further attention. The problem arises from the lack of prior information about new users and new items. This thesis presents a hybrid recommendation approach that addresses the (pure) cold-start problem, where no collaborative information (ratings) is available for new users. The approach is based on an existing algorithm named SCOAL (Simultaneous Co-Clustering and Learning). In its original version, based on multiple linear prediction models, the SCOAL algorithm has been shown to be efficient and versatile, and it can be used in a wide range of classification and/or regression problems. To make SCOAL more versatile through the use of nonlinear models, this thesis presents a variant of the algorithm whose prediction models are based on Extreme Learning Machines. Besides prediction accuracy, another important issue in the development of RS is system scalability.
To that end, a parallel version of SCOAL based on OpenMP was developed, aimed at minimizing the computational cost of learning the prediction models. Controlled experiments on widely used real-world datasets show that all the proposed developments make the SCOAL algorithm even more attractive for a variety of practical applications. Biclustering Cold-start Extreme learning machines Prediction models Recommender systems
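The claim that a single global model is unlikely to represent all predictions well can be illustrated with a deliberately reduced sketch. It assumes the user and item clusters are already given and replaces each per-co-cluster prediction model with a constant, the mean of the known ratings in that block. SCOAL itself, as described above, learns a prediction model per co-cluster and, as its name indicates, does the co-clustering and the learning together; none of that is reproduced here. All names and ratings below are hypothetical.

```python
def cocluster_mean_predictor(ratings, row_cluster, col_cluster):
    """Build a predictor that uses one constant model per co-cluster.

    `ratings` maps (user, item) to a numeric rating; `row_cluster` and
    `col_cluster` map users and items to cluster labels. Returns the
    prediction function and the global mean used as a fallback when a
    co-cluster holds no known ratings.
    """
    sums, counts = {}, {}
    for (user, item), value in ratings.items():
        key = (row_cluster[user], col_cluster[item])
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    global_mean = sum(ratings.values()) / len(ratings)

    def predict(user, item):
        key = (row_cluster[user], col_cluster[item])
        if key in counts:
            return sums[key] / counts[key]
        return global_mean

    return predict, global_mean


# Hypothetical toy data: two user groups with opposite tastes.
row_cluster = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
col_cluster = {"a": 0, "b": 0, "c": 1, "d": 1}
ratings = {
    ("u1", "a"): 5, ("u2", "a"): 5, ("u2", "b"): 4,
    ("u1", "c"): 1, ("u2", "d"): 2,
    ("u3", "a"): 1, ("u4", "b"): 2,
    ("u3", "c"): 5, ("u4", "d"): 4,
}
predict, global_mean = cocluster_mean_predictor(ratings, row_cluster, col_cluster)
local_estimate = predict("u1", "b")  # rating for ("u1", "b") is unknown
print(local_estimate, global_mean)
```

For the unknown pair, the block-level estimate follows the high ratings that similar users gave to similar items, while the global mean blends both taste groups together. This sketch does not handle the pure cold-start case, where a new user has no ratings from which a cluster could be assigned.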
