Global ETD Search

1	Conditional Multifactorial Contingency (CMC) Model and Its Applications
Cheng, Zuolin
17 January 2023
In biology and bioinformatics, a wide variety of data share a common property that challenges many cutting-edge research studies: heterogeneity at the individual level with respect to more than one factor. Examples include, but are not limited to: 1) unequal susceptibility among patients, and 2) large diversity in gene length, GC content, etc., together with the resulting gene characteristics. For many biological data analyses, the critical first step is usually to infer the null probability distribution of the observed data with the heterogeneity in multiple (confounding) factors taken into account, so that we can further investigate the impact of other factor(s) of interest. These heterogeneities heavily influence the conclusions that may be drawn from statistical analyses of the data. However, modeling them has been challenging, not only because explicitly modeling every factor with a heterogeneous effect on the data is infeasible, but also because many of these factors are not independent of one another. Existing methods either partially or fully neglected the heterogeneity issue, or handled each factor's heterogeneity in isolation. Evidence has shown that such strategies are insufficient and can produce errors in downstream analyses. The emergence of large-scale data sets provides the opportunity to learn the heterogeneity directly and comprehensively from the data, without explicitly modeling the underlying mechanisms or imposing strong assumptions. Because such data are often stored or organized as multidimensional contingency tensors, a natural perspective is to model heterogeneity with each impact factor of interest as one dimension.
The heterogeneity in each factor's impact on the variable of interest can be captured by the marginal property of the data tensor along the corresponding dimension. For instance, a single-cell sequencing dataset can be organized as a matrix in which each row represents a gene and each column represents a cell, and the heterogeneity caused by both the gene and the cell factors can then be modeled. In this dissertation, we develop a novel model, Conditional Multifactorial Contingency (CMC), that models the intertwined heterogeneities in all dimensions of the data tensor and infers the probability distribution of each entry of the tensor jointly conditioned on these heterogeneities. In CMC, the problem is formulated as maximizing the entropy of the contingency tensor's probability distribution subject to marginal constraints, under the assumption that the individuals within each dimension are independent. The marginal constraints are applied to expected values rather than to observed trial outcomes; this is key to avoiding the innumerable combinations of trial outcomes, and it leads to an elegant expression for each entry's probability distribution. The model is first developed for 2D binary data matrices, then extended to multidimensional data tensors and integer data tensors. It is further extended to handle data with missing values. Empowered by CMC, we conducted four case studies on real-world bioinformatics research problems: (1) driving transcription factor (TF) identification; (2) scRNA-seq data normalization; (3) cancer-associated gene identification; and (4) cell similarity quantification. For each case study, we proposed a complete analysis framework and a specific adaptation of CMC.
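As a rough illustration of why constraining expected rather than observed margins yields a simple entry-wise form, consider the 2D binary case. Maximizing entropy subject only to expected row sums r_i and column sums c_j gives independent Bernoulli entries with p_ij = x_i y_j / (1 + x_i y_j) for positive multipliers x_i and y_j. The Python sketch below fits those multipliers with a standard fixed-point iteration. The function name `fit_binary_margins`, the iteration scheme, and the tolerances are hypothetical choices for illustration, not the dissertation's code, and the sketch assumes every margin lies strictly inside its feasible range.

```python
import numpy as np


def fit_binary_margins(X, tol=1e-10, max_iter=10_000):
    """Hypothetical helper: maximum-entropy probabilities for a binary matrix.

    Returns P with P[i, j] = x_i * y_j / (1 + x_i * y_j), where the positive
    multipliers x, y are tuned so that the EXPECTED row and column sums of P
    equal the observed row and column sums of X. Assumes no row or column
    is all ones (otherwise the multipliers diverge).
    """
    X = np.asarray(X, dtype=float)
    r = X.sum(axis=1)  # observed row margins
    c = X.sum(axis=0)  # observed column margins
    x = np.ones(X.shape[0])
    y = np.ones(X.shape[1])
    for _ in range(max_iter):
        # Row update: solve sum_j x_i y_j / (1 + x_i y_j) = r_i for x_i,
        # holding the x_i inside the denominator at its previous value.
        x = r / (y / (1.0 + np.outer(x, y))).sum(axis=1)
        # Column update, using the freshly updated row multipliers.
        y = c / (x[:, None] / (1.0 + np.outer(x, y))).sum(axis=0)
        xy = np.outer(x, y)
        P = xy / (1.0 + xy)
        # Stop once both sets of expected margins match the observed ones.
        err = max(np.abs(P.sum(axis=1) - r).max(),
                  np.abs(P.sum(axis=0) - c).max())
        if err < tol:
            break
    return P


# Usage on a small toy matrix (made-up data, not from the dissertation).
X = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])
P = fit_binary_margins(X)
print(P.sum(axis=1), P.sum(axis=0))  # expected margins, matching X's margins
```

Each entry gets its own probability even though only row and column totals were constrained; that per-entry probability is what a "conditioned on the heterogeneities" null model provides.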
For driving-TF identification, in contrast to traditional methods, we considered variation in genes' binding affinity in addition to the commonly considered variation in TFs' binding affinity. Driving TFs were identified by comparing the observed binding states with the estimated binding probabilities conditioned on TF and gene binding affinities. For scRNA-seq data normalization, besides the gene and cell factors, we identified an additional factor affecting read counts, cDNA length, and applied CMC to analyze all three factors jointly. For cancer-associated gene identification, CMC is applied to systematically model the patient, gene, and mutation-type factors in mutation count data. For the last application, to the best of our knowledge, our solution is the first cell-to-cell-type similarity quantification method, made possible by CMC's ability to systematically model and remove the impact of cell and gene factors. We studied the theoretical properties of the proposed model and validated the effectiveness and efficiency of our method through experiments. The uniqueness of the probability solution and the convergence of the algorithm were proved. In identifying true driving TFs, CMC significantly improved on the best previously recorded success rate, as demonstrated on data with ground truth. In addition, in an exploratory study without ground truth, besides the previously known TFs Olig1 (ranked 2nd), Olig2 (ranked 3rd), and Sox10 (ranked 4th), we identified Ppp1r14b (ranked 1st) and Zfp36l1 (ranked 6th) as functioning in oligodendrocyte lineage development; this was validated by biological knock-out experiments and has led to genuine biological discoveries. In scRNA-seq data normalization, experimental results show that, by accounting for the cell, gene, and cDNA-length factors, the normalized data achieve lower variance for housekeeping genes than peer methods.
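The comparison of observed binding states with estimated binding probabilities can be pictured with a simple standardized excess score per TF. In the sketch below, the gene set of interest (`target_genes`), the z-like statistic, and the function name `score_tfs` are all hypothetical; the dissertation's actual ranking statistic is not stated here. Restricting to a gene subset matters: if the probabilities reproduce the observed row sums exactly, a TF's excess over all genes is zero by construction.

```python
import numpy as np


def score_tfs(X, P, target_genes):
    """Hypothetical scoring rule: over-representation of each TF's binding
    within a gene set of interest, relative to model-based expectations.

    X: observed binary TF-by-gene binding matrix.
    P: estimated binding probabilities of the same shape, assumed to lie
       strictly between 0 and 1.
    target_genes: column indices of the genes of interest.
    Returns (order, z): TF indices from most to least over-represented and a
    z-like score per TF, treating entries as independent Bernoulli variables.
    """
    X = np.asarray(X, dtype=float)[:, target_genes]
    P = np.asarray(P, dtype=float)[:, target_genes]
    observed = X.sum(axis=1)                # bound target genes per TF
    expected = P.sum(axis=1)                # expected bound target genes per TF
    variance = (P * (1.0 - P)).sum(axis=1)  # Bernoulli variance of that count
    z = (observed - expected) / np.sqrt(variance)
    order = np.argsort(-z, kind="stable")   # highest excess first
    return order, z


# Toy example: two TFs, three genes, uniform probabilities of 0.5.
X = [[1, 1, 0],
     [0, 0, 1]]
P = [[0.5, 0.5, 0.5],
     [0.5, 0.5, 0.5]]
order, z = score_tfs(X, P, target_genes=[0, 1])
print(order, z)
```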
In addition, data normalized by CMC yield more accurate downstream differentially expressed gene (DEG) detection than data normalized by peer methods. In cancer-associated gene identification, CMC eliminates most of the likely artifactual findings that result from considering the hidden factors separately. In cell similarity quantification, the CMC-based model enables cell-type identification by quantifying between-species cell similarity, regardless of contamination in the scRNA-seq data. / Doctor of Philosophy / Biological data are complicated and are typically influenced by numerous factors, including characteristics of the biological subjects, physical or chemical properties of molecules, artifacts introduced by experimental procedures, and so on. The information of real interest in a biology or bioinformatics study can be buried under all sorts of irrelevant factors and their effects on the data. Consider a simple example: a study aims to determine whether a specific gene is associated with a cancer. Even if the gene shows clearly different mutation frequencies in two groups of people, patients and healthy individuals, we cannot safely confirm the association from this observation alone. Such differential mutation levels may also result from differences among these people in how easily the gene mutates, which depend on many personal characteristics besides cancer status. We call this diversity "heterogeneity", and it appears everywhere: among people, genes, cells, cell types, and so on. Such heterogeneity must be handled carefully in order to draw firm statistical, and hence scientific, conclusions. However, handling it is far from trivial. On the one hand, it is generally impossible to fully understand the mechanisms behind these diversities, let alone formulate them explicitly and rigorously.
On the other hand, multiple factors often intertwine with one another, in which case all of them must be considered systematically to model the data precisely. Existing methods either partially or fully neglected the heterogeneity issue, or handled each factor's heterogeneity in isolation. Evidence has shown that such strategies are insufficient and can produce errors in downstream analyses. Because the exact mechanisms behind heterogeneity are usually unavailable, we aim to learn and infer the effects of heterogeneity from the data themselves. A large group of biological data can be stored or organized as multidimensional contingency tensors, with each impact factor of interest as one dimension. The heterogeneity in each factor's impact on the variable of interest can be captured by the marginal property of the data tensor along the corresponding dimension, for example, the row sums and column sums of a 2D tensor. In this dissertation, under the assumption that the individuals within each dimension are independent, we propose a novel model, Conditional Multifactorial Contingency (CMC), that models the intertwined heterogeneities in all dimensions of the data tensor and infers the probability distribution of each entry jointly conditioned on these heterogeneities. The final and most comprehensive version of CMC works on multidimensional binary or integer data tensors, even when some values in the tensor are missing. CMC originated from simple, elegant statistical principles and was derived through rigorous theoretical proofs, yet it became a powerful tool widely applicable to real-world biology and bioinformatics studies.
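The marginal property described above can be made concrete: for each dimension of a tensor, sum over all the other dimensions. The sketch below does this for a tensor of any order. Storing missing values as NaN and skipping them is an assumption for illustration, and `tensor_margins` is a hypothetical name.

```python
import numpy as np


def tensor_margins(T):
    """Return one vector of marginal sums per dimension of tensor T.

    Entry i of the k-th vector is the sum of all entries whose k-th index
    is i. Missing entries are assumed to be stored as NaN and are skipped.
    """
    T = np.asarray(T, dtype=float)
    margins = []
    for k in range(T.ndim):
        other_axes = tuple(a for a in range(T.ndim) if a != k)
        margins.append(np.nansum(T, axis=other_axes))
    return margins


# 2D case: the margins are the row sums and the column sums.
M = [[1, 0, 1],
     [0, np.nan, 1]]   # one missing entry
rows, cols = tensor_margins(M)
print(rows, cols)      # [2. 1.] [1. 0. 2.]
```

The same function applied to a 3D tensor (for example patient by gene by mutation type) returns three margin vectors, one per factor.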
Empowered by CMC, we conducted four case studies on real-world bioinformatics research problems: (1) driving transcription factor (TF) identification; (2) scRNA-seq data normalization; (3) cancer-associated gene identification; and (4) cell similarity quantification. For each case study, we proposed a complete analysis framework and a specific adaptation of CMC. In each of them, our CMC-based method outperformed existing methods and provided promising clues for biological discoveries, which have been validated by biological experiments.
Keywords: tensor; heterogeneity; multiple factors
2	Expatriate adjustment revisited: an exploration of the factors explaining expatriate adjustment in MNCs and UN organizations in Egypt
Khedr, Wessam
January 2011
This thesis aims to understand the relative influence of institutional, cultural, and organizational factors on the adjustment of expatriates working for the United Nations (UN) and for multinational companies in Egypt. The research contributes to the field of expatriate research by applying an institutional lens to the factors affecting adjustment, and by testing a traditional adjustment model in an under-researched host context. As a result, this thesis proposes a new framework for understanding the factors affecting adjustment, which adopts a contingency perspective and places a stronger focus on institutional determinants and on the organisational infrastructure that supports the management of expatriates. For its theoretical basis, the study draws on certain cultural and organizational factors from the expatriate literature, and also introduces other factors (mainly institutional ones) that have not previously been examined as predictors of adjustment. The research questions the utility of these organizational, cultural, and institutional factors, especially those from traditional models, when applied to relatively new national and organizational contexts: the Egyptian national context and the United Nations organizational context. Both contexts are under-researched in the expatriate adjustment literature and in the international human resource management literature in general. The Arab cultural context differs in many ways from the Anglo-Saxon and European contexts that have more traditionally been the subject of research, and thus provides an opportunity to test the wider applicability of expatriate models.
Equally, the UN is a highly multicultural organisational context with a socio-political mission that is highly distinct from that of the for-profit multinational. Both contextual factors therefore offer fertile ground for further developing a framework for understanding expatriate adjustment in contemporary times. In addition, the novelty of the context creates an opportunity to examine the utility of institutional theory as an alternative or complement to cultural theory for understanding the factors influencing expatriate adjustment. Methodologically, the research relies mainly on quantitative data obtained by surveying expatriates in multinational and United Nations organizations working in Egypt. In addition, a qualitative technique (interviews) was used to aid questionnaire development and data contextualization. The results highlight the role of institutional measures in explaining expatriate adjustment. The evidence suggests that the institutional variables provide explanatory power beyond that of the traditional factors studied. However, the research also shows that the institutional measures do not replace the cultural measures, so no substitution effect is at work. Rather, we would argue that the institutional lens provides additional understanding and taps into factors not already captured by measures of culture. The research puts forward a contingency model incorporating additional organisational and institutional variables that are often overlooked or underemphasised in some of the traditional, organisationally focused models. 342.08
3	A Study on Contingency Learning in Introductory Physics Concepts
Scaife, Thomas Mark
16 December 2010
No description available.
Keywords: Physics; Science Education; Physics Education Research; Contingency Model; Control-of-variable; Causal Learning
