1 |
Network inference from sparse single-cell transcriptomics data: Exploring, exploiting, and evaluating the single-cell toolbox
Steinheuer, Lisa Maria 04 April 2022
Large-scale transcriptomics studies have revolutionised the fields of systems biology and medicine, enabling deeper mechanistic insights into biological pathways and molecular functions. However, conventional bulk RNA-sequencing analyses an averaged signal from many input cells, which are homogenised during the experimental procedure.
Hence, those insights represent only a coarse-grained picture, potentially missing information from rare or unidentified cell types. Offering an unprecedented level of resolution, single-cell transcriptomics may help to identify and characterise new cell types, unravel developmental trajectories, and facilitate the inference of cell type-specific networks. Despite these promises, one main limitation currently hampers many downstream tasks: single-cell RNA-sequencing data is characterised by a high degree of sparsity.
Because of this limitation, no network inference tool has so far been able to reliably disentangle the information hidden in single-cell data.
Single-cell correlation networks likely hold previously masked information and could yield new insights into cell type-specific networks. To harness the potential of single-cell transcriptomics data, this dissertation sought to evaluate the influence of data dropout on network inference and how this influence might be alleviated. However, two premises must be met to fulfil the promise of cell type-specific networks: (I) cell type annotation and (II) reliable network inference. Since any experimentally generated scRNA-seq data is associated with an unknown degree of dropout, a benchmarking framework was set up using a synthetic gold-standard data set, which was subsequently subjected to defined degrees of dropout. Aiming to desparsify the dropout-afflicted data, the influence of various imputation tools on the network structure was further evaluated. The results highlighted that, for moderate dropout levels, a deep count autoencoder (DCA) outperformed both the other tools and the unimputed data. To fulfil the premise of cell type annotation, the impact of data imputation on cell-cell correlations was investigated using a human retina organoid data set. The results highlighted that no imputation tool interfered with cell cluster annotation.
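The benchmarking idea described above (a synthetic gold data set, defined degrees of dropout, and a comparison of the resulting network structure) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the dissertation's actual framework: the toy simulator `simulate_gold`, the correlation threshold of 0.5, the dropout rates, and the use of Jaccard overlap of edge sets as the comparison measure. Imputation is left out entirely.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlation_edges(expr, threshold=0.5):
    """Edges of a correlation network; expr maps gene -> values across cells."""
    return {
        frozenset((g, h))
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= threshold
    }


def apply_dropout(expr, rate, rng):
    """Set each value to zero independently with probability `rate`."""
    return {
        g: [0.0 if rng.random() < rate else v for v in vals]
        for g, vals in expr.items()
    }


def jaccard(a, b):
    """Jaccard overlap of two edge sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def simulate_gold(n_cells, rng):
    """Hypothetical gold data: two modules of three co-expressed genes each."""
    expr = {}
    for module in range(2):
        driver = [rng.gauss(2, 1) for _ in range(n_cells)]
        for k in range(3):
            expr[f"m{module}_g{k}"] = [max(0.0, d + rng.gauss(0, 0.3)) for d in driver]
    return expr


rng = random.Random(0)
gold = simulate_gold(300, rng)
gold_edges = correlation_edges(gold)

# Overlap between the gold network and the network inferred after dropout.
overlap = {}
for rate in (0.0, 0.1, 0.3, 0.6):
    sparse = apply_dropout(gold, rate, rng)
    overlap[rate] = jaccard(gold_edges, correlation_edges(sparse))
    print(f"dropout={rate:.1f}  edge overlap with gold={overlap[rate]:.2f}")
```

In a fuller benchmark, an imputation step would sit between `apply_dropout` and `correlation_edges`, so that each tool's recovered network could be scored against the same gold edges.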
Based on the encouraging results of the benchmarking analysis, a window of opportunity was identified that allows meaningful network inference from imputed single-cell RNA-seq data. Therefore, the inference of cell type-specific networks after DCA imputation was evaluated in a human retina organoid data set. To understand the differences and commonalities of cell type-specific networks, these were analysed for cones and rods, two closely related photoreceptor cell types of the retina. Comparing the importance of rod and cone marker genes between the respective cell type-specific networks showed that these genes were of high importance, i.e. had hub-gene-like properties, in one module of the corresponding network but were less important in the opposing network. Furthermore, it was analysed how many hub genes in general preserved their status across cell type-specific networks and whether they were associated with similar or diverging sub-networks. While a set of preserved hub genes was identified, a few were linked to completely different network structures. One candidate was EIF4EBP1, a eukaryotic translation initiation factor binding protein, which is associated with the retinal pathology age-related macular degeneration (AMD). These results suggest that, given well-defined prerequisites, data imputation via DCA can indeed facilitate cell type-specific network inference, delivering promising biological insights.
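The comparison of hub genes across two cell type-specific networks can be sketched as follows. This is an illustrative assumption rather than the dissertation's method: hubs are defined here simply as the highest-degree nodes, the gene names are placeholders, and the two toy edge lists merely stand in for a rod and a cone network.

```python
from collections import Counter


def degrees(edges):
    """Node degree in an undirected network given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def hubs(edges, top_k):
    """The top_k highest-degree nodes (ties broken alphabetically)."""
    deg = degrees(edges)
    ranked = sorted(deg, key=lambda g: (-deg[g], g))
    return set(ranked[:top_k])


def neighbours(edges, gene):
    """Set of nodes directly connected to `gene`."""
    return {v if u == gene else u for u, v in edges if gene in (u, v)}


def neighbourhood_overlap(edges_a, edges_b, gene):
    """Jaccard overlap of a gene's neighbours in two networks."""
    na, nb = neighbours(edges_a, gene), neighbours(edges_b, gene)
    union = na | nb
    return len(na & nb) / len(union) if union else 1.0


# Placeholder networks: gene names A..H and X..Z are hypothetical.
rod_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("E", "F"), ("E", "G"), ("E", "H")]
cone_edges = [("A", "X"), ("A", "Y"), ("A", "Z"), ("X", "Y"),
              ("E", "F"), ("E", "G"), ("E", "H")]

preserved = hubs(rod_edges, top_k=2) & hubs(cone_edges, top_k=2)
for gene in sorted(preserved):
    score = neighbourhood_overlap(rod_edges, cone_edges, gene)
    print(f"{gene}: hub in both networks, neighbourhood overlap={score:.2f}")
```

In this toy case both hubs keep their status, but only one of them keeps its neighbours, which mirrors the distinction drawn above between preserved hubs with similar sub-networks and preserved hubs with diverging ones.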
Referring back to AMD, a major cause of the loss of central vision in patients older than 65, neither well-defined mechanisms of pathogenesis nor treatment options are available. However, light can be shed on this disease by employing organoid model systems, since they resemble the in vivo organ composition while reducing its complexity and raising fewer ethical concerns. Therefore, a recently developed human retina organoid (HRO) system was investigated using the single-cell toolbox to evaluate whether it provides a useful basis for studying the onset and progression of AMD in the future. In particular, different workflows for a robust and in-depth annotation of cell types were used, including literature-based and transfer learning approaches. These indicated that the organoid system may reproduce hallmarks of a more central retina, which is an important determinant of AMD pathogenesis. In addition, trajectory analysis showed that the organoids in part reproduce major developmental hallmarks of the retina, but that different HRO samples exhibited developmental differences that point to different degrees of maturation. Altogether, this analysis provided a deep characterisation of a human retinal organoid system, revealing in vivo-like features as well as pinpointing discrepancies. These results could be used to refine culture conditions during organoid differentiation to optimise its utility as a disease model.
In summary, this dissertation describes a workflow that, in contrast to the current state of the art in the literature, enables the inference of cell type-specific gene regulatory networks.
The thesis illustrated that such networks indeed differ even between closely related cell types.
Thus, single-cell transcriptomics can yield unprecedented insights into cell regulatory principles that are not yet understood, particularly in rare cell types that are hardly reflected in bulk RNA-seq data.
|
2 |
BioNetStat: uma ferramenta para análise diferencial de redes biológicas / BioNetStat: a tool for biological networks differential analysis
Carvalho, Vinícius Jardim 08 February 2018
The diversity of interactions that occur within biological systems, from the organelles of a cell to the entire biosphere, can be modelled using network theory. The dynamics of the interactions among elements is an intrinsic property of these systems. Several tools have been proposed to compare networks that represent the many states a system can assume. However, none of them can compare structural characteristics of more than two networks simultaneously. Given the large number of states a system can assume, we built a statistical tool to compare two or more networks and to indicate key variables in the process under study. The main aim of this work was to compare correlation networks using measures based on graph spectra (the set of eigenvalues of the adjacency matrices), such as the spectral distribution. This measure is associated with several structural characteristics of networks, such as the number of walks, diameter, and cliques. In addition to the spectral distribution, we also compared networks by spectral entropy, degree distribution, and node centralities. We used two different biological data sets (gene expression of tumour cells and plant metabolism) to test the performance of the tool and as case studies. The proposed method is implemented in an R package called BioNetStat, with a graphical interface for users without programming experience. We found that the tests are effective at differentiating more than two networks. Moreover, increasing the number of networks compared and decreasing the number of sample units both reduce the statistical power of the test. We also showed that running a single analysis to compare many networks saves considerable time relative to pairwise comparisons.
In addition, the method identified groups of variables with a central role in the biological systems studied that were not found in analyses considering only the expression or concentration of the elements. It was thus possible to differentiate cells of different cancer types, or organs of plant organisms, by their network centralities. The variables identified allow the user to generate hypotheses about their roles in the processes under study. BioNetStat can thus help to detect possible new discoveries associated with the mechanisms by which systems function. / The diversity of interactions among the elements of biological systems can be studied using network theory. Moreover, the dynamics of these interactions is an inherent trait of those systems. Accordingly, several tools have been proposed to compare networks, where each network represents a state assumed by the system. However, biological systems can generally assume many more than two states, and none of these tools can compare structural characteristics among more than two networks simultaneously. To solve this issue, we developed a statistical tool to compare two or more networks and highlight key variables of a system. Here we describe the new method, called BioNetStat, which compares correlation networks using measures based on graph spectra (the set of eigenvalues of the adjacency matrix), such as the spectral distribution. This measure is associated with several structural characteristics of networks, such as the number of walks, diameter, and cliques. In addition to the spectral distribution, BioNetStat can also compare networks by their node centralities. We used two different biological data sets, gene expression in tumour cells and plant metabolism, to evaluate the performance of BioNetStat and as case studies. The tool is implemented in an R package and also has a user-friendly interface.
We showed that BioNetStat is effective at distinguishing more than two networks. In comparison with a similar tool (GSCA), increasing the number of compared networks reduces the statistical power of BioNetStat less than that of GSCA. Furthermore, BioNetStat finds a larger proportion of signalling pathways than GSCA, complementing tools proposed in the literature. In the case studies, the method pointed out variables, and sets of variables, with a central role in biological systems that were not highlighted when only gene expression patterns or metabolomics were studied. For instance, BioNetStat allowed us to differentiate among cancer types and among plant organs. The BioNetStat results bring new findings on what differentiates the states, giving a systemic view of the study subject and supporting new hypotheses about the studied processes.
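The general idea of comparing more than two correlation networks in a single test can be sketched with a label-permutation scheme. This is a hedged illustration under stated assumptions and not BioNetStat's implementation or API: to stay dependency-free it uses weighted degree centrality as the network measure (the abstract names node centralities as one option, alongside spectral measures), and the functions `weighted_degrees`, `divergence`, `permutation_test` and `simulate_state` are hypothetical names introduced here.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def weighted_degrees(samples):
    """Weighted degree of each variable in the |correlation| network of one state."""
    n_vars = len(samples[0])
    cols = [[s[j] for s in samples] for j in range(n_vars)]
    return [
        sum(abs(pearson(cols[i], cols[j])) for j in range(n_vars) if j != i)
        for i in range(n_vars)
    ]


def divergence(groups):
    """Total absolute deviation of each state's degree vector from the mean vector."""
    degs = [weighted_degrees(g) for g in groups]
    n_vars = len(degs[0])
    mean = [sum(d[i] for d in degs) / len(degs) for i in range(n_vars)]
    return sum(abs(d[i] - mean[i]) for d in degs for i in range(n_vars))


def permutation_test(groups, n_perm=200, seed=0):
    """Shuffle state labels to obtain a p-value for the observed divergence."""
    rng = random.Random(seed)
    observed = divergence(groups)
    pooled = [s for g in groups for s in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        if divergence(permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def simulate_state(n_samples, n_vars, coupled, rng):
    """Toy data: variables share a common driver only when `coupled` is True."""
    samples = []
    for _ in range(n_samples):
        shared = rng.gauss(0, 1)
        if coupled:
            samples.append([shared + rng.gauss(0, 0.3) for _ in range(n_vars)])
        else:
            samples.append([rng.gauss(0, 1) for _ in range(n_vars)])
    return samples


rng = random.Random(1)
states = [
    simulate_state(40, 4, coupled=False, rng=rng),
    simulate_state(40, 4, coupled=False, rng=rng),
    simulate_state(40, 4, coupled=True, rng=rng),
]
observed, p_value = permutation_test(states)
print(f"divergence={observed:.2f}  permutation p-value={p_value:.3f}")
```

One analysis over all three states replaces three pairwise comparisons, which is the time saving the abstract refers to; a spectral measure could be substituted for `weighted_degrees` without changing the permutation logic.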
|
3 |
Microbial Functional Diversity and the Associated Biogeochemical Interactions Across Miami-Dade County, Florida Soils
Kushwaha, Priyanka 02 November 2016
Decomposition of soil organic matter by microbial processes results in carbon sequestration within soils and/or carbon loss via atmospheric emission of carbon dioxide and methane. Natural as well as anthropogenic factors have been documented to impact soil microbial diversity and the associated biogeochemical functions. The microbial communities inhabiting the soils of Miami-Dade County, Florida, are under threat because of the ongoing restoration efforts in the adjoining Florida Everglades Parks, predicted climatic changes such as sea-level rise and high rainfall, and urbanization. Therefore, an improved understanding of the current microbial functional communities is essential to better assess the impact on soil communities when anthropogenic or climatic disturbances occur. The objectives of the current study were to characterize the biodiversity and distribution of (a) the cellulose-degrading microbial community and (b) the methanogenic guilds responsible for producing methane, across four different Miami-Dade County, Florida soil types, using the high-throughput GeoChip 5.0 functional microarray. In addition, the influence of vegetation cover, organic content, soil moisture content, pH, and soil texture in shaping the soil functional microbial community was investigated. The function of cellulose degradation was distributed across a wide range of taxonomic lineages, with the majority belonging to the bacterial groups Actinobacteria, Firmicutes, Alphaproteobacteria, and Gammaproteobacteria, whereas Ascomycota and Basidiomycota were the only fungal phyla detected. The cellulolytic bacterial community correlated more strongly with vegetation cover, while the fungal groups were influenced by moisture and organic content as well as percent silt. Six of the seven methanogenic orders were identified across all four soil types of Miami-Dade, with the greatest numbers found in the Methanomicrobiales, Methanosarcinales, and Methanomassiliicoccales.
The abundance of mcrA gene sequences increased significantly with soil moisture content. Additionally, the recently classified order Methanomassiliicoccales was identified across all four soils, including soils with lower moisture content that are not thought to provide ideal redox conditions for methanogens. The greater number of correlation network interactions among the methanogenic guilds in the Florida Everglades wetlands than in the urbanized Miami-Dade County soils reflected the impact of the historical drainage of the Florida Everglades on the methanogenic community. Overall, the current study characterized the biodiversity of cellulolytic and methanogenic organisms across dry and saturated soils of Miami-Dade County and demonstrated that microbial guilds were functionally redundant and were influenced to some extent by soil abiotic factors. The results of the network analyses also provide a platform for assessing the future impacts of disturbances on the microbial community.
|
4 |
DNA methylation correlation networks in overweight and normal-weight adolescents reveal differential coordination
Bringeland, Nathalie January 2013
Multiple health issues are associated with obesity, and numerous factors contribute to the disease. The role of genetic factors is well established, as is the knowledge that diet and sedentary behavior promote weight gain. Although epigenetics is strongly suspected to be a driving force of the disease, this field remains largely unexplored in the context of obesity. DNA methylation correlation networks were profiled from blood samples of 69 adolescents in two distinct weight classes: obese (n=35) and normal-weight (n=34). The network analysis revealed major differences in the organization of the networks, with the network of the obese group showing less modularity than that of the normal-weight group. This is manifested by more and smaller clusters in the obese network, pertaining to genes of related functions and pathways, than in the normal-weight network. This suggests that biological pathways are less coordinated with each other in terms of DNA methylation in obese than in normal-weight individuals. Analysis of highly connected genes (hubs) in the two networks suggests that the difference in coordination between biological pathways may be driven by changes in the methylation pattern of these hubs: highly connected genes in one network had an intriguingly low connectivity in the other. In conclusion, the results suggest differential regulation of transcription through changes in the coordination of DNA methylation in overweight and normal-weight individuals. The findings of this study are a major step towards understanding the role of DNA methylation in obesity and provide potential biomarkers for diagnosing and predicting obesity.
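For readers unfamiliar with the term, network modularity for a given clustering can be computed as in the following minimal sketch. It uses the standard Newman modularity for an undirected, unweighted graph; whether the thesis used this exact definition is not stated in the abstract, so treat the choice as an assumption. The toy graph and cluster labels are placeholders, not data from the study.

```python
def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs; community: dict mapping node -> cluster label.
    Q = sum over clusters of (internal edges / m) - (cluster degree sum / 2m)^2.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # number of edges with both ends inside the cluster
    degree_sum = {}  # sum of node degrees inside the cluster
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Placeholder graph: two triangles joined by a single bridging edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(toy_edges, two_clusters)
q_one = modularity(toy_edges, one_cluster)
print(f"two clusters: Q={q_two:.3f}; single cluster: Q={q_one:.3f}")
```

Higher Q means that edges fall within clusters more often than expected from node degrees alone, so comparing Q between two networks gives one number for how strongly each is organised into modules.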
|