Global ETD Search

1	Conditional Multifactorial Contingency (CMC) Model and Its Applications / Cheng, Zuolin / 17 January 2023 / In biology and bioinformatics, a variety of data share a common property that challenges numerous cutting-edge research studies: heterogeneities at the individual level with respect to more than one factor. Examples of such heterogeneities include but are not limited to: 1) unequal susceptibility of different patients, and 2) large diversity in gene length, GC content, etc., along with the resulting gene characteristics. For many biological data analysis studies, the critical first step is usually to infer the null probability distribution of the observed data with the heterogeneities in multiple (confounding) factors taken into account, so that we can further investigate the impact of other factor(s) of interest. The heterogeneities heavily influence the conclusions that we may draw from statistical analyses of the data. However, modeling such heterogeneities has been challenging, not only because explicitly modeling all factors with heterogeneous effects on the data is infeasible, but also because many factors are not independent of one another. Existing methods either neglected the heterogeneity issue, partially or fully, or handled each factor's heterogeneity in isolation. Evidence has shown the insufficiency of such strategies and the errors they may produce in downstream analyses. The emergence of large-scale data sets provides the opportunity to learn the heterogeneity directly and comprehensively from the data, without explicitly modeling the underlying mechanisms or imposing strong assumptions. The data, often stored or organized as multidimensional contingency tensors, lead to a natural perspective of modeling heterogeneity with each impact factor of interest being one dimension. 
The heterogeneity in each factor's impact on the variable of interest can be captured by the marginal property of the data tensor with respect to the corresponding dimension. For instance, in a single-cell sequencing dataset, which can be organized as a matrix with each row representing a gene and each column representing a cell, the heterogeneity caused by both the gene and cell factors can be modeled. In this dissertation, we develop a novel model, Conditional Multifactorial Contingency (CMC), that models the intertwined heterogeneities in all dimensions of the data tensor and infers the probability distribution of each entry of the data tensor jointly conditioned on these heterogeneities. In the proposed CMC model, the problem is formulated as maximizing the entropy of the contingency tensor's probability distribution subject to the marginal constraints, under the assumption that the individuals within each dimension are independent. The marginal constraints are applied to the expected values instead of the observed trial outcomes, which plays a key role in avoiding the innumerable combinations of trial outcomes and leads to an elegant expression for each entry's probability distribution. The model is first developed for a 3D binary data matrix, then extended to multidimensional data tensors and integer data tensors. Furthermore, CMC is extended to handle data with missing values. Empowered by CMC, we conducted four case studies on real-world bioinformatics research problems: (1) driving transcription factor (TF) identification; (2) scRNA-seq data normalization; (3) cancer-associated gene identification; (4) cell similarity quantification. For each of these case studies, we proposed a complete analysis framework and a specific adaptation of CMC. 
For the driving-TF identification, compared with traditional methods, we considered the variations in the gene's binding affinity in addition to the typically considered variations in the TF's binding affinity. The driving TFs were identified by comparing the observed binding state with the estimated binding probability conditioned on TF/gene binding affinities. For the scRNA-seq data normalization, besides the gene factor and the cell factor, we identified one more factor affecting the read counts, cDNA length, and applied CMC to analyze the three factors jointly. For cancer-associated gene identification, the CMC model is applied to systematically model the patient, gene, and mutation type factors in the mutation count data. As for the last application, to the best of our knowledge, our solution is the first proposed cell-to-cell-type similarity quantification method, made possible by CMC's ability to systematically model and remove the impact of cell and gene factors. We studied the theoretical properties of the proposed model and validated the effectiveness and efficiency of our method through experiments. The uniqueness of the probability solution and the convergence of the algorithm were proved. In the endeavor to identify true driving TFs, CMC significantly improved on the best previously reported success rate, as demonstrated on data with ground truth. In addition, in an exploratory study without ground truth, besides the previously known TFs Olig1 (ranked 2nd), Olig2 (ranked 3rd), and Sox10 (ranked 4th), we successfully identified Ppp1r14b (ranked 1st) and Zfp36l1 (ranked 6th), which function in oligodendrocyte lineage development; this was validated via biological knock-out experiments and has led to genuine biological discoveries. In the scRNA-seq data normalization, experimental results show that, by taking the cell, gene, and cDNA-length factors into account, the normalized data achieve lower variances for housekeeping genes than data normalized by peer methods. 
In addition, the data normalized by the CMC model lead to more accurate downstream detection of differentially expressed genes (DEGs) than data normalized by peer methods. In cancer-associated gene identification, the CMC model is able to eliminate most of the likely artefactual findings that result from considering the hidden factors separately. In the cell similarity quantification, the CMC-based model enables the identification of cell types by establishing between-species cell similarity quantification, regardless of contamination in scRNA-seq data. / Doctor of Philosophy / Biological data are complicated and typically influenced by numerous factors, including characteristics of biological subjects, physical or chemical properties of molecules, artifacts created by experimental operations, and so on. The information of real interest in a biology/bioinformatics study can be buried under all sorts of irrelevant factors and their impacts on the data. Consider a simple example where a study is conducted to find out whether an association exists between a specific gene and a cancer. Although this gene shows clearly different mutation frequencies in two groups of people, patients and healthy individuals, we cannot safely confirm the association from this observation. Such differential mutation levels can also result from the diversity among all these people in how easily this gene is mutated in a person (related to many characteristics of this person besides "cancer/not"). We call this diversity "heterogeneity", and it can be seen everywhere: in people, in genes, in cells, in cell types, etc. One needs to take good care of such heterogeneities in order to draw firm statistical, and hence scientific, conclusions. However, handling the heterogeneities is far from trivial. On the one hand, it is generally impossible to fully understand the mechanisms behind those diversities, let alone to formulate them explicitly and rigorously. 
On the other hand, it is not rare that multiple factors intertwine with one another, in which case all these factors must be considered systematically in order to model the data precisely. Existing methods either neglected the heterogeneity issue, partially or fully, or handled each factor's heterogeneity in isolation. Evidence has shown the insufficiency of such strategies and the errors they may produce in downstream analyses. As the exact mechanisms behind heterogeneities are usually not available, we aim to learn and infer the heterogeneities' effects on the data from the data themselves. A large group of biological data can be stored or organized as multidimensional contingency tensors, with each impact factor of interest being one dimension. The heterogeneity in each factor's impact on the variable of interest can be captured by the marginal property of the data tensor with respect to the corresponding dimension, for example, the row sum and the column sum in a 2D tensor. In this dissertation, under the assumption that the individuals of each dimension are independent, we proposed a novel model, Conditional Multifactorial Contingency (CMC), that models the intertwined heterogeneities in all dimensions of the data tensor and infers the probability distribution of each entry of the data tensor jointly conditioned on these heterogeneities. The final and most comprehensive version of CMC can work on multidimensional binary or integer data tensors, even in cases where some values in the tensor are missing. CMC starts from simple and elegant statistical principles and is derived through rigorous theoretical proofs, yet it ends up as a powerful tool that is widely applicable to real-world biology/bioinformatics studies. 
Empowered by CMC, we conducted four case studies on real-world bioinformatics research problems: (1) driving transcription factor (TF) identification; (2) scRNA-seq data normalization; (3) cancer-associated gene identification; (4) cell similarity quantification. For each of these case studies, we proposed a complete analysis framework and a specific adaptation of CMC. In each of them, our CMC-based method outperformed existing methods and provided inspiring clues for biological discoveries, which have been validated by biological experiments. / Keywords: tensor; heterogeneity; multiple factors
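The maximum-entropy formulation described in the CMC abstract above can be illustrated on the simplest case, a 2D binary matrix. What follows is a minimal sketch under stated assumptions, not the dissertation's algorithm: it relies on the standard result that, when only the expected row sums and expected column sums are constrained, the maximum-entropy distribution has independent Bernoulli entries with p_ij = 1 / (1 + exp(-(a_i + b_j))). The fitting procedure (alternating bisection), the function names, and the toy matrix are all hypothetical choices made for this illustration, and the sketch assumes no row or column is all zeros or all ones.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def solve_offsets(other, target, steps=60):
    """For each k, find t_k with sum_l sigmoid(t_k + other_l) == target_k.

    The sum is monotone in t_k, so plain bisection on [-30, 30] suffices.
    """
    lo = np.full(target.shape, -30.0)
    hi = np.full(target.shape, 30.0)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        total = sigmoid(mid[:, None] + other[None, :]).sum(axis=1)
        too_small = total < target
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
    return 0.5 * (lo + hi)

def fit_entry_probabilities(x, max_iter=1000, tol=1e-9):
    """Fit p_ij = sigmoid(a_i + b_j) so that the expected row and column
    sums equal the observed ones (illustrative procedure, an assumption)."""
    x = np.asarray(x, dtype=float)
    row_sums, col_sums = x.sum(axis=1), x.sum(axis=0)
    a = np.zeros(x.shape[0])
    b = np.zeros(x.shape[1])
    for _ in range(max_iter):
        a = solve_offsets(b, row_sums)   # match expected row sums
        b = solve_offsets(a, col_sums)   # match expected column sums
        p = sigmoid(a[:, None] + b[None, :])
        if np.abs(p.sum(axis=1) - row_sums).max() < tol:
            break
    return p

# Hypothetical toy data: rows could be genes, columns could be cells or TFs.
x = np.array([[1, 1, 0, 1, 0],
              [0, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]])
p = fit_entry_probabilities(x)

# Entries observed as 1 despite a low fitted probability are the
# "surprising" ones once row and column heterogeneity is accounted for.
residual = x - p
```

In the spirit of the driving-TF case study, `residual` compares each observed entry with its probability conditioned on both margins; how the dissertation actually scores such deviations is not stated in the abstract.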
2	[en] USE OF MULTI-FATORIAL MODEL OF BARRA TO FORECAST STOCK RETURNS / [pt] UTILIZAÇÃO DO MODELO MULTI-FATORIAL DA CONSULTORIA BARRA NA PREVISÃO DE RETORNO DE AÇÕES / FREDERICO FERREIRA SARMENTO / 25 July 2002 / [pt] Esta pesquisa tem como objetivo principal estimar e analisar previsões dos retornos das ações utilizando o modelo multi-fatorial desenvolvido pela empresa de consultoria BARRA. Para tanto, foram empregadas três metodologias no cálculo das projeções dos retornos dos fatores contra mudanças inesperadas em variáveis macroeconômicas. Tais projeções foram, então, traduzidas em previsões dos retornos das ações. A análise dos resultados obtidos indica que as previsões geradas contêm informações úteis na identificação dos movimentos relativos nos preços das ações. / [en] The main objective of this work is to estimate stock return forecasts using the BARRA multiple factor model developed for the Brazilian market. Three methodologies were applied to estimate the projections of the factor returns. The first one is based on a moving average approach, and the other two are based on regressions of the factor returns against unexpected changes in some macroeconomic variables. These projections were then translated into forecasts of stock returns. The results show that the obtained forecasts contain useful information for identifying relative movements in stock prices. / [pt] MERCADO DE ACOES / [pt] PREVISAO DE RETORNO DE ACOES / [pt] MODELOS MULTI-FATORIAIS / [en] ACTIONS MARKET / [en] STOCKS RETURN FORECAST / [en] MULTIPLE FACTORS MODEL
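The moving-average approach mentioned in the abstract above can be sketched as follows. This is only an illustration under assumptions: it uses the generic linear multi-factor form, in which a stock's return is its factor exposures times the factor returns, and it does not reproduce BARRA's actual model, factors, or estimation procedure. The function names, window length, exposures, and return figures are all hypothetical.

```python
import numpy as np

def moving_average_forecast(factor_returns, window):
    """Forecast next-period factor returns as the mean of the last `window` periods."""
    history = np.asarray(factor_returns, dtype=float)
    return history[-window:].mean(axis=0)

def stock_return_forecast(exposures, factor_forecast):
    """Translate factor-return forecasts into stock-return forecasts (r = X f)."""
    return np.asarray(exposures, dtype=float) @ factor_forecast

# Hypothetical history: 4 periods (rows) of returns for 2 factors (columns).
factor_returns = np.array([[0.010, -0.002],
                           [0.004,  0.001],
                           [0.006,  0.003],
                           [0.002, -0.001]])

# Hypothetical exposures of 3 stocks (rows) to the 2 factors (columns).
exposures = np.array([[1.0,  0.5],
                      [0.8, -0.2],
                      [1.2,  0.0]])

f_hat = moving_average_forecast(factor_returns, window=3)
r_hat = stock_return_forecast(exposures, f_hat)

# Ranking stocks by forecast return focuses on relative movements,
# which is what the abstract reports the forecasts are useful for.
ranking = np.argsort(-r_hat)
```

The two regression-based methodologies from the abstract would replace `moving_average_forecast` with a fit of factor returns against unexpected macroeconomic changes; the abstract gives no further detail on them, so they are not sketched here.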
3	Situation analysis of perceptions on comprehensiveness of rape prevention interventions by implementing agencies in Addis Ababa / Difabachew Setegn Hailegeorgis / 02 1900 / Abstract in English, Xhosa and Afrikaans / The victimization of women and children represents one of the public health problems deserving urgent attention in Ethiopia, making the prevention of rape in all its forms a matter of vital importance. The purpose of the study was mainly to describe the extent of rape prevention interventions in Addis Ababa and examine efforts to assist rape survivors based on the perceptions of professionals working for organizations operating in this context. The study had a further purpose of identifying difficulties faced by government institutions and making suitable recommendations for the improvement of rape prevention interventions and programs in the future. A qualitative descriptive research approach was adopted mainly involving in-depth interviews for primary data collection. The study involved 14 research participants purposively selected from five government institutions. The study findings indicated Gandhi Memorial Hospital to be the only institution in Ethiopia implementing an integrated rape prevention intervention. Efforts were directed largely at secondary prevention, with little attention being paid to primary prevention. Recommendations included tackling the multiple factors influencing rape at different levels of the social-ecological model simultaneously through the implementation, strengthening, and intensification of well-designed, comprehensive rape prevention interventions and programs. / Ukuxhatshazwa kwabafazi nabantwana e-Ethiopia kufana nenye yeengxaki zempilo kwaye kudinga ukuthathelwa ingqalelo ngokungxamisekileyo. Oku kwenza ukuba ukuthintela ukudlwengulwa ngazo zonke iindlela kube ngumbandela obaluleke kakhulu. 
Injongo yesi sifundo ibikukucacisa iindlela zokuthintela ukudlwengulwa eAddis Ababa, nokuvavanya imizamo yokunceda abo bakhe badlwengulwa, ngokokubona kwabo basebenzela amaqumrhu aququzelela lo msebenzi. Enye injongo yesi sifundo ibikukuchonga ubunzima obufunyanwa ngamaziko aseburhulumenteni ajongene neli candelo ukuze kunikwe iingcebiso zokuphucula amacebo neenkqubo zokuthintela ukudlwengulwa. Kuqhutywe uhlobo lophando lomgangatho nolucacisayo, apho kuqokelelwe iinkcukacha zolwazi ngokwenza udliwano ndlebe olunzulu. Kusetyenzwe nabathathi nxaxheba abali-14 abakhethwe ngobuchule kumaziko aseburhulumenteni ama-5. Okufunyaniswe sesi sifundo kubonakalise ukuba isibhedlele esiyiGandhi Memorial siso sodwa esinenkqubo elungelelaniswe kakuhle yokuthintela ukudlwengulwa. Imigudu yokhukhusela ijoliswe ekuncedeni kwiziqhamo zodlwengulo nasekufundiseni ngodlwengulo (secondary prevention) hayi kudlwengulo ngqo (primary prevention). Amacebiso esifundo aquka ukulwa neemeko eziphembelela udlwengulo olwenzeka kumazinga ahlukeneyo oluntu, ngaxeshanye nokuqinisa ukusetyenziswa kweenkqubo eziqulunqwe kakuhle zokuthintela udlwengulo. / Die viktimisering van vroue en kinders is een van talle kwessies in die openbare gesondheid van Ethiopië wat dringend aandag vereis, aangesien die voorkoming van verkragting in enige vorm van die allergrootste belang is. Die doel van hierdie studie was om die omvang te bepaal van intervensies om verkragting in Addis Abeba te voorkom, en om die hulp wat aan verkragtingslagoffers verleen word, te ondersoek op grond van die belewenisse van beroepslui wat in hierdie verband vir organisasies werk. Hierdie studie het dit verder ten doel gehad om die probleme aan te toon waarmee staatsinstellings in hierdie opsig te kampe het, en om beter intervensies en programme vir die voorkoming van verkragting aan te beveel. ŉ Kwalitatiewe en deskriptiewe navorsingsbenadering is gevolg. Dit het omvattende onderhoude behels waartydens primêre data versamel is. 
Altesame 14 deelnemers by vyf staatsinstellings is vir hierdie doel gekies. Volgens die bevindings is die Gandhi Gedenkhospitaal die enigste instelling in Ethiopië wat ŉ geïntegreerde program vir die voorkoming van verkragting ingestel het. Sekondêre voorkoming geniet voorrang, terwyl primêre voorkoming min aandag geniet. Daar word aanbeveel dat tegelykertyd werk gemaak word van die veelvuldige faktore wat verkragting op verskillende vlakke van die sosiaal-ekologiese model beïnvloed. Dit moet gedoen word deur deeglik ontwerpte, omvattende intervensies en programme om verkragting te voorkom in werking te stel, uit te bou en te verskerp. / Sociology / M.A. (Sociology) / Rape; Prevention interventions; Comprehensiveness of rape prevention; Social-ecological model; Multiple factors influencing rape; Primary prevention; Udlwengulo; Amacebo okuthintela; Iimeko ezininzi eziphembelela udlwengulo; Uthintelo ngqo; Verkragting; Voorkomingsintervensies; Omvattendheid van verkragtingsvoorkoming; Sosiaal-ekologiese model; Primêre voorkoming / 364.409633
