Global ETD Search

1	Two Novel Methods for Clustering Short Time-Course Gene Expression Profiles 2014 January 1900 Because genes with similar expression patterns are very likely to share the same biological function, cluster analysis has become an important tool for understanding and predicting gene functions from gene expression profiles. In many situations, each gene expression profile contains only a few data points. Directly applying traditional clustering algorithms to such short gene expression profiles does not yield satisfactory results, so clustering algorithms designed for short profiles are needed. In this thesis, two novel methods are developed for clustering short gene expression profiles. The first method, called the network-based clustering method, addresses the shortness of the profiles by generating a gene co-expression network using conditional mutual information (CMI), which measures the non-linear relationship between two genes while accounting for indirect gene relationships in the presence of other genes. The network-based clustering method consists of two steps. First, a gene co-expression network is constructed from short gene expression profiles using a path consistency algorithm (PCA) based on the CMI between genes. Then, gene functional modules are identified in terms of cluster cohesiveness. The method is evaluated on 10 large-scale Arabidopsis thaliana short time-course gene expression profile datasets in terms of gene ontology (GO) enrichment analysis, and compared with an existing method called Clustering with Overlapping Neighbourhood Expansion (ClusterONE). Gene functional modules identified by the network-based clustering method for the 10 datasets return target GO p-values as low as 10^-24, whereas the original ClusterONE yields insignificant results.
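As a rough illustration of the quantity the network-based method relies on, the sketch below estimates the CMI between two genes given a set of conditioning genes. The abstract does not say how the CMI is estimated; the closed form used here assumes jointly Gaussian expression values, and the function name `gaussian_cmi` is hypothetical.

```python
import numpy as np

def gaussian_cmi(x, y, z):
    """Estimate I(X;Y|Z) in nats, ASSUMING (X, Y, Z) are jointly Gaussian.

    x, y: 1-D arrays of samples; z: array of shape (n_samples,) or
    (n_samples, n_conditioning_genes).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)

    def logdet_cov(*blocks):
        # Log-determinant of the sample covariance of the stacked variables.
        data = np.hstack(blocks)
        cov = np.atleast_2d(np.cov(data, rowvar=False))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z); the Gaussian entropy
    # constants cancel, leaving only covariance log-determinants.
    return 0.5 * (logdet_cov(x, z) + logdet_cov(y, z)
                  - logdet_cov(z) - logdet_cov(x, y, z))

# Synthetic example: x and y_indirect are related only through z,
# while y_direct depends on x itself.
rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
x = z + 0.5 * rng.standard_normal(2000)
y_indirect = z + 0.5 * rng.standard_normal(2000)
y_direct = x + 0.5 * rng.standard_normal(2000)
print(gaussian_cmi(x, y_indirect, z), gaussian_cmi(x, y_direct, z))
```

In a path-consistency style procedure, an edge whose CMI falls below a chosen threshold would be treated as an indirect relationship and removed; the threshold itself is a tuning choice not given in the abstract.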
To cluster gene expression profiles more specifically, a second clustering method, the protein-protein interaction (PPI) integrated clustering method, is developed. It is designed for clustering short gene expression profiles by integrating gene expression profile patterns and curated PPI data. The method consists of the following three steps: (1) generate a number of predefined profile patterns according to the number of data points in the profiles and assign each gene to the predefined profile to which its expression profile is most similar; (2) integrate curated PPI data to refine the initial clustering result from (1); (3) combine similar clusters from (2) to gradually reduce the number of clusters using a hierarchical clustering method. The PPI-integrated clustering method is evaluated on 10 large-scale A. thaliana datasets using GO enrichment analysis, and by comparison with an existing method called Short Time-series Expression Miner (STEM). Target gene functional clusters identified by the PPI-integrated clustering method for the 10 datasets return GO p-values as low as 10^-62, whereas STEM returns GO p-values as low as 10^-38. In addition to the method development, the clusters obtained by the two proposed methods are further analyzed to identify cross-talk genes under five stress conditions in root and shoot tissues. A list of potential abiotic stress tolerance genes is found. Keywords: Cluster analysis; Gene expression profiles; Protein-protein interaction; Conditional mutual information; Short time-course; GO enrichment analysis.
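Step (1) of the PPI-integrated method can be sketched as follows. The abstract does not specify how the predefined patterns are built or how similarity is measured; this sketch assumes patterns that start at 0 and change by -1, 0, or +1 between consecutive time points, and uses Pearson correlation as the similarity. The names `model_profiles` and `assign_to_profiles` are hypothetical.

```python
import itertools
import numpy as np

def model_profiles(n_points, steps=(-1, 0, 1)):
    """Enumerate candidate patterns that start at 0 and change by one
    of `steps` between consecutive time points (an assumed scheme)."""
    patterns = []
    for deltas in itertools.product(steps, repeat=n_points - 1):
        patterns.append(np.concatenate(([0.0], np.cumsum(deltas))))
    return np.array(patterns)

def assign_to_profiles(expr, patterns):
    """For each gene (row of expr), return the index of the pattern
    with the highest Pearson correlation."""
    def centre_scale(m):
        m = m - m.mean(axis=1, keepdims=True)
        norm = np.linalg.norm(m, axis=1, keepdims=True)
        norm[norm == 0] = 1.0  # flat rows get zero correlation with everything
        return m / norm
    corr = centre_scale(np.asarray(expr, dtype=float)) @ centre_scale(patterns).T
    return corr.argmax(axis=1)

# Three toy genes measured at four time points.
patterns = model_profiles(4)
expr = np.array([[0.0, 1.0, 2.0, 3.0],
                 [0.0, -1.0, -2.0, -3.0],
                 [0.0, 1.0, 0.0, -1.0]])
print(patterns[assign_to_profiles(expr, patterns)])
```

The number of candidate patterns grows as 3^(n-1) with the number of time points n, which stays manageable only because the profiles are short.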
2	Investigation of Information-Theoretic Bounds on Generalization Error Qorbani, Reza; Pettersson, Kevin January 2022 Generalization error describes how well a supervised machine learning algorithm predicts the labels of input data on which it has not been trained. This project explores two methods for bounding generalization error, f-CMI and ISMI, both of which explicitly use mutual information. Our experiments are based on the experiments in the papers in which the methods were proposed. The experiments implement the mathematically derived bounds and validate their accuracy. Each method also calculates mutual information in a different way. The ISMI bound experiment used a multivariate normal distribution dataset, whereas a dataset consisting of cats and dogs was used for the f-CMI experiment. Our results show that both methods are capable of bounding the generalization error of a binary classification algorithm and provide bounds that closely follow the true generalization error. The results agree with the original experiments, indicating that the proposed methods also work for similar applications with different datasets. / Bachelor's degree project in electrical engineering 2022, KTH, Stockholm. Keywords: Generalization error; ISMI; Generalization bound; Electrical engineering and electronics.
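To make the quantity being bounded concrete, the sketch below estimates a generalization gap (test error minus training error) for a binary classifier on synthetic multivariate normal data. This is not the thesis's experiment: the nearest-centroid classifier, the dataset sizes, and all function names are assumptions chosen for illustration, and neither the f-CMI nor the ISMI bound is computed here.

```python
import numpy as np

def sample_two_gaussians(n, dim, rng, shift=1.0):
    """Draw n labelled points: class 0 ~ N(0, I), class 1 ~ N(shift * 1, I).
    Labels alternate so both classes are always present."""
    labels = np.arange(n) % 2
    points = rng.standard_normal((n, dim)) + shift * labels[:, None]
    return points, labels

def fit_centroids(points, labels):
    """Nearest-centroid 'training': one mean vector per class."""
    return np.stack([points[labels == c].mean(axis=0) for c in (0, 1)])

def error_rate(centroids, points, labels):
    """Fraction of points whose nearest centroid is the wrong class."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(dists.argmin(axis=1) != labels))

rng = np.random.default_rng(0)
x_train, y_train = sample_two_gaussians(20, 5, rng)
x_test, y_test = sample_two_gaussians(5000, 5, rng)

centroids = fit_centroids(x_train, y_train)
train_err = error_rate(centroids, x_train, y_train)
test_err = error_rate(centroids, x_test, y_test)
gap = test_err - train_err  # empirical stand-in for the generalization error
print(train_err, test_err, gap)
```

A large held-out test set is used only to approximate the population error; an information-theoretic bound would instead be compared against this gap as the training set size varies.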
3	Termodinâmica e informação em redes quânticas lineares / Thermodynamics and information in linear quantum lattices Malouf, William Tiago Batista 24 May 2019 When a quantum system is coupled to several heat baths at different temperatures, it eventually reaches a non-equilibrium steady state (NESS) featuring stationary internal heat currents. On the one hand, these currents cause decoherence and produce entropy in the system. On the other hand, their existence also induces correlations between different parts of the system. In this work, we explore this twofold aspect of NESSs. Using phase-space techniques, we calculate the Wigner entropy production in general linear networks of harmonic nodes. Working in the ubiquitous limit of weak internal coupling and weak dissipation, we obtain simple closed-form expressions for the entropic contribution of each individual quasi-probability current. Our analysis also shows that it is exclusively the (reversible) internal dynamics that maintains the stationary (irreversible) entropy production. From the informational point of view, we address how to quantify the amount of information that disconnected parts of a quantum chain share in a non-equilibrium steady state. As we show, this is more precisely captured by the conditional mutual information (CMI), a more general quantifier of tripartite correlations than the usual mutual information. As an application, we apply our framework to the paradigmatic problem of energy transfer through a chain of oscillators subject to self-consistent internal baths that can be used to tune the transport from ballistic to diffusive. We find that the entropy production scales with different power laws in the ballistic and diffusive regimes, allowing us to quantify the "entropic cost of diffusivity". We also compute the CMI for chains of arbitrary size and thus find the scaling rules connecting information sharing and diffusivity. Finally, we discuss how this new perspective on the characterization of non-equilibrium systems may be applied to understand the issue of local equilibration in non-equilibrium states.
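For reference, the standard definitions behind the comparison between mutual information and CMI can be written as below. Here S(·) denotes the entropy of the reduced state of the indicated subsystems of a chain split into parts A, B, and C; the abstract does not state which entropy functional the thesis uses, so the von Neumann entropy is assumed.

```latex
\begin{align}
I(A{:}C) &= S(A) + S(C) - S(AC), \\
I(A{:}C \mid B) &= S(AB) + S(BC) - S(B) - S(ABC).
\end{align}
```

With the von Neumann entropy, strong subadditivity guarantees that I(A:C|B) is non-negative, and the second expression reduces to the first when B is empty.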
