361 |
Modélisation et apprentissage de dépendances à l’aide de copules dans les modèles probabilistes latents / Modeling and learning dependencies with copulas in latent topic models
Amoualian, Hesam 12 December 2017
This thesis examines a class of hierarchical Bayesian models, known as topic models, used to model large document collections, in particular when the documents arrive sequentially. To this end, Chapter 3 introduces three new models that account for dependencies between the topics of two successive documents. The first model is a direct generalization of LDA (Latent Dirichlet Allocation): a Dirichlet distribution captures the influence on a document of the topic parameters of the preceding document. The second model uses copulas, a generic tool for modeling dependencies between random variables. The copula family used is the Archimedean family, and more precisely the Frank copulas, which have good properties (symmetry, associativity) and are therefore suited to modeling exchangeable variables. The last model is a nonparametric extension of the second, in which copulas are integrated into the stick-breaking construction of Hierarchical Dirichlet Processes (HDP). Our numerical experiments, carried out on five standard collections, demonstrate the performance of our approach relative to existing approaches in the literature, such as dynamic topic models, temporal LDA and Evolving Hierarchical Processes, both in terms of perplexity and when detecting similar topics in document streams. Compared with the others, our approach can model a wider range of situations, from strong dependence between documents to complete independence.
Moreover, the exchangeability assumption underlying all LDA-type topic models often leads to different topics being estimated for words that belong to the same phrase segment, which is not coherent. In Chapter 4, we introduce copulaLDA (copLDA), which generalizes LDA by integrating the structure of the text into the model and relaxing the conditional independence assumption. To do so, we assume that the groups of words in a text are thematically related to one another, and we model this dependence with copulas. We show empirically the effectiveness of copLDA on both intrinsic and extrinsic tasks on several publicly available corpora. To complement the previous model (copLDA), Chapter 5 presents an LDA-type model that generates segments whose topics are coherent within each document by simultaneously segmenting the documents and assigning a topic to each word. Coherence between the topics within each group of words is ensured by copulas that bind the topics together. In addition, this model relies on topic distributions specific to each document and to each group of words, which makes it possible to capture different degrees of granularity. We show that the proposed model naturally generalizes several LDA-type models introduced for similar tasks. Furthermore, our experiments, carried out on six different datasets, demonstrate the performance of our model measured in several ways: perplexity, Normalized Pointwise Mutual Information, which captures the coherence between topics, and the Micro F1 measure used in text classification. / This thesis focuses on scaling latent topic models for big data collections, especially when dealing with document streams.
Although the main goal of probabilistic modeling is to find word topics, an equally interesting objective is to examine topic evolutions and transitions. To accomplish this task, we propose, in Chapter 3, three new models for modeling topic and word-topic dependencies between consecutive documents in document streams. The first model is a direct extension of the Latent Dirichlet Allocation (LDA) model and makes use of a Dirichlet distribution to balance the influence of the LDA prior parameters with respect to the topic and word-topic distributions of the previous document. The second extension makes use of copulas, which constitute a generic tool to model dependencies between random variables. We rely here on Archimedean copulas, and more precisely on the Frank copula, as they are symmetric and associative and are thus appropriate for exchangeable random variables. Lastly, the third model is a non-parametric extension of the second one through the integration of copulas in the stick-breaking construction of Hierarchical Dirichlet Processes (HDP). Our experiments, conducted on five standard collections that have been used in several studies on topic modeling, show that our proposals outperform previous ones, such as dynamic topic models, temporal LDA and the Evolving Hierarchical Processes, both in terms of perplexity and for tracking similar topics in document streams. Compared to previous proposals, our models have extra flexibility and can adapt to situations where there are no dependencies between the documents. On the other hand, the "exchangeability" assumption in topic models like LDA often results in inferring inconsistent topics for the words of text spans like noun phrases, which are usually expected to be topically coherent. In Chapter 4, we propose copulaLDA (copLDA), which extends LDA by integrating part of the text structure into the model and relaxes the conditional independence assumption between the word-specific latent topics given the per-document topic distributions.
To this end, we assume that the words of text spans like noun phrases are topically bound and we model this dependence with copulas. We demonstrate empirically the effectiveness of copLDA on both intrinsic and extrinsic evaluation tasks on several publicly available corpora. To complete the previous model (copLDA), Chapter 5 presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula, binding the topics associated with the words of a segment. In addition, this model relies on both document- and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information, which captures the coherence between the generated topics, and the Micro F1 measure for text classification.
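The Frank copula named in the abstract above has a closed form that makes its stated properties easy to check. The sketch below is only an illustration, not the thesis's implementation: the function name `frank_copula_cdf` and the parameter name `theta` are assumptions introduced here. It evaluates the bivariate CDF, which is symmetric in its two arguments and approaches the independence copula u·v as `theta` tends to 0, the "no dependencies" end of the range the abstract describes.

```python
import math


def frank_copula_cdf(u: float, v: float, theta: float) -> float:
    """Bivariate Frank copula C(u, v) for u, v in [0, 1] and theta != 0.

    C(u, v) = -(1/theta) * ln(1 + (e^{-theta*u} - 1)(e^{-theta*v} - 1) / (e^{-theta} - 1))
    """
    if theta == 0.0:
        raise ValueError("theta must be non-zero; the limit theta -> 0 is u * v")
    # expm1 and log1p keep precision when theta is close to zero.
    numerator = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(numerator / math.expm1(-theta)) / theta


if __name__ == "__main__":
    u, v = 0.3, 0.7
    # Small theta is close to independence; larger theta means stronger dependence.
    for theta in (1e-6, 2.0, 10.0):
        print(theta, frank_copula_cdf(u, v, theta))
```

For positive `theta` the value lies between u·v (independence) and min(u, v) (perfect positive dependence), so a single parameter spans the range from no dependence to strong dependence.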
|
362 |
Log analysis aided by latent semantic mapping
Buys, Stephanus 14 April 2013
In an age of zero-day exploits and increased on-line attacks on computing infrastructure, operational security practitioners are becoming increasingly aware of the value of the information captured in log events. Analysis of these events is critical during incident response, forensic investigations related to network breaches, hacking attacks and data leaks. Such analysis has led to the discipline of Security Event Analysis, also known as Log Analysis. There are several challenges when dealing with events, foremost being the increased volumes at which events are often generated and stored. Furthermore, events are often captured as unstructured data, with very little consistency in the formats or contents of the events. In this environment, security analysts and implementers of Log Management (LM) or Security Information and Event Management (SIEM) systems face the daunting task of identifying, classifying and disambiguating massive volumes of events in order for security analysis and automation to proceed. Latent Semantic Mapping (LSM) is a proven paradigm shown to be an effective method of, among other things, enabling word clustering, document clustering, topic clustering and semantic inference. This research is an investigation into the practical application of LSM in the discipline of Security Event Analysis, showing the value of using LSM to assist practitioners in identifying types of events, classifying events as belonging to certain sources or technologies and disambiguating different events from each other. The culmination of this research presents adaptations to traditional natural language processing techniques that resulted in improved efficacy of LSM when dealing with Security Event Analysis. This research provides strong evidence supporting the wider adoption and use of LSM, as well as further investigation into Security Event Analysis assisted by LSM and other natural language or computer-learning processing techniques. 
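Latent semantic mapping is usually described as a low-rank decomposition of a term-by-unit co-occurrence matrix, after which units (here, log events) are compared in the reduced space. The sketch below illustrates that idea under loud assumptions: the three log events are invented, raw term counts stand in for whatever weighting the thesis used, and power iteration with deflation is swapped in for a library SVD so that the example needs only the standard library. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical log events, invented for illustration only.
EVENTS = [
    "sshd failed password root",
    "sshd failed password admin",
    "kernel firewall drop packet",
]


def term_event_matrix(events):
    """Return (vocabulary, matrix) with one row per term and one column per event."""
    counts = [Counter(event.split()) for event in events]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[term]) for c in counts] for term in vocab]


def gram(matrix):
    """Event-by-event Gram matrix A^T A of the term-event matrix A."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]


def top_eigenpairs(sym, k, iterations=200):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0] * n
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Deflate: subtract the component that was just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def latent_coordinates(events, k=2):
    """Coordinates of each event in the k-dimensional latent space (sigma_r * v_r)."""
    _, matrix = term_event_matrix(events)
    pairs = top_eigenpairs(gram(matrix), k)
    return [[math.sqrt(max(val, 0.0)) * vec[e] for val, vec in pairs]
            for e in range(len(events))]


def cosine(a, b):
    """Cosine similarity between two coordinate vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


if __name__ == "__main__":
    coords = latent_coordinates(EVENTS, k=2)
    print("sshd vs sshd:    ", cosine(coords[0], coords[1]))
    print("sshd vs firewall:", cosine(coords[0], coords[2]))
```

In this toy corpus the two `sshd` events land on the same latent direction while the firewall event lands on an orthogonal one. That grouping is the kind of behavior the abstract relies on for identifying and classifying event types.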
|
363 |
Proposta de sistemática para prevenção de acidentes a partir da avaliação de erros ativos e condições latentes
Oliveira, Paulo Apelles Camboim de January 2011
The general objective of this thesis was to devise a framework for drawing up a prevention plan, based on the mapping of human failures, in order to minimize accidents in an organization. The framework rests on the assumption that organizations can learn from accidents, and that accidents do not stem from inappropriate worker behavior but are the consequence of an unfavorable organizational context, and on the concepts of active errors and latent conditions, proposed as causal factors of an accident. To achieve this objective, a literature review of the relevant subjects was carried out; from this study the initial proposal of the framework was devised and then submitted to a case study. The literature review covered theories of how accidents happen, the role of human error in these events, the types of errors, how they manifest themselves and the available prevention techniques. In addition, the literature review made it possible to assess the Human Factors Analysis and Classification System (HFACS), a technique developed to identify and classify human errors in an orderly manner; it was found that this system has limitations and that prevention techniques emphasize actions centered on operational safety, without covering other levels of the organization. The initial proposal of the framework was designed in two modules: the Investigation module, which aims to understand how the organization conducts the accident analysis process and to determine the main active errors and latent conditions, through multiple sources of evidence, based on the categories and subcategories of the HFACS system and using focus group interviews and non-participant observation; and the Prevention module, which seeks, together with the company’s management team, to determine strategic prevention actions for the organization.
With the empirical results obtained, it was possible to evaluate the use of the framework in an electric power utility, identifying points for improvement and establishing its final version, as well as defining parameters for how to apply it. It was also found that the framework makes it possible, through the picture of active errors and latent conditions, to identify sectors that need safety interventions, thereby assisting the safety function within the organization, and to assess the performance of the company’s Management of the Occupational Safety and Health System (GSST). / The objective of this thesis was to conceive a framework to develop a prevention plan, based on the outlining of human errors, in order to minimize accidents in organizations. This work is based on the assumption that organizations can learn from accidents, and that these are not due to workers’ inappropriate behavior but to an unfavorable organizational context; and on concepts originating from active errors and latent conditions, proposed as causal factors in an accident. To reach this objective, we reviewed the literature on relevant subjects; from that study the original proposal of the framework was conceived and subjected to a case study. The literature review covered the theories on how accidents happen, the role of human error in such events, the types of errors concerned, how they manifest themselves in accidents and which prevention techniques exist. In addition, the literature review allowed an evaluation of the Human Factors Analysis and Classification System (HFACS), a framework developed to identify and classify human error in an orderly manner; the evaluation showed that this system has limitations and that prevention techniques are centered on operational safety, without involving other levels of the organization.
The initial framework proposal was designed in two modules: the Investigation Module, which aims to understand how the organization conducts its accident analysis process and to determine the main active errors and latent conditions using multiple sources of evidence, based on the categories and subcategories of the HFACS, on focus group interviews and on non-participant observation; and the Prevention Module, which aims to determine prevention strategies for the organization together with its management team. With the results obtained in the case study, it was possible to evaluate the performance of the framework in an electric utility company, detect points for improvement, establish its final version and set the parameters for how to apply it. It was also noted that, by mapping active errors and latent conditions, the framework helps a company see which sectors need safety interventions, and allows the organization to evaluate the performance of its Occupational Safety and Health management system.
|
364 |
A dynamic network model to measure exposure diversification in the Austrian interbank market
Hledik, Juraj, Rastelli, Riccardo 08 August 2018
We propose a statistical model for weighted temporal networks capable of measuring the level of heterogeneity in a financial system. Our model focuses on the level of diversification of financial institutions; that is, whether they are more inclined to distribute their assets equally among partners, or whether they instead concentrate their commitment on a limited number of institutions. Crucially, a Markov property is introduced to capture time dependencies and to make our measures comparable across time. We apply the model to an original dataset of Austrian interbank exposures. The temporal span encompasses the onset and development of the financial crisis in 2008 as well as the beginnings of the European sovereign debt crisis in 2011. Our analysis highlights an overall increasing trend for network homogeneity, whereby core banks have a tendency to distribute their market exposures more equally across their partners.
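The contrast between spreading exposures equally and concentrating them on a few partners can be illustrated with a simple descriptive statistic. The sketch below is not the authors' statistical model: it swaps in the normalized Shannon entropy of one bank's exposure shares as a stand-in measure, and the function name and the exposure figures are hypothetical.

```python
import math


def diversification(exposures):
    """Normalized Shannon entropy of exposure shares, in [0, 1].

    1.0 means exposures are spread equally across the partners with a
    positive exposure; 0.0 means everything sits with a single partner.
    """
    positive = [x for x in exposures if x > 0]
    if len(positive) < 2:
        return 0.0
    total = sum(positive)
    shares = [x / total for x in positive]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(positive))


if __name__ == "__main__":
    print(diversification([25, 25, 25, 25]))  # evenly spread
    print(diversification([97, 1, 1, 1]))     # heavily concentrated
```

Tracking a statistic like this per bank and per period gives a rough, model-free view of the homogeneity trend the abstract reports. Unlike the model in the abstract, it carries no Markov dependence between periods.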
|
365 |
Beyond One-Size Fits All: Using Heterogeneous Models to Estimate School Performance in Mathematics
Melton, Joshua 01 May 2017
This dissertation explored the academic growth in mathematics of a longitudinal cohort of 21,567 Oregon students during middle school on a state accountability test. The student test scores were used to calculate estimates of school performance based on four different accountability models (percent proficient [PP], change in PP, multilevel growth, and growth mixture). On average, 72% of Oregon eighth graders were proficient in mathematics in 2012, 71% in the average school, and 6% more students in this cohort demonstrated mathematics proficiency compared to 2011. The two-level unconditional multilevel growth model estimated the average intercept (Grade 6) to be 228.4 (SE = 0.07) scale score points with an average middle school growth rate of 5.40 scale points per year (SE = 0.02) on the state mathematics test. Adding student demographic characteristics yielded a statistically significant improvement over the unconditional model. A major shortcoming of this research, however, was the inability to achieve model convergence for any three-level growth model or any growth mixture model.
A latent class growth analysis was used to uncover groups of students who shared common growth trajectories. A five-latent class solution best represented the data with the lowest BIC and a significant LMR p. Two of the latent classes stood out: students who had high achievement in Grade 6 and demonstrated high growth across middle school, and students with low sixth-grade achievement who had below-average growth in middle school. Student-level demographic predictors had statistically significant relations with growth characteristics and latent class membership.
In comparing school performance based on the four different models, it was found that, although statistically correlated, the models of school performance ranked schools differently. A school’s percentage of proficient students in Grade 8 correlated moderately (r = [.60, .70]) with growth over the middle school years as estimated by the growth and LCGA models. About 70% to 80% of schools ranked more than 10 percentiles differently for every pairwise comparison of models. These results, like previous research, call into question whether currently used models of school performance produce consistent and valid descriptions of school performance using state test scores.
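The unconditional growth estimates quoted in the abstract above imply a simple average trajectory. As a worked illustration, the sketch below assumes linear growth with time coded in years since Grade 6 (which the abstract implies by calling Grade 6 the intercept); the constant and function names are hypothetical.

```python
INTERCEPT_GRADE6 = 228.4  # average scale score at Grade 6, from the abstract
GROWTH_PER_YEAR = 5.40    # average scale points gained per year, from the abstract


def expected_score(grade: int) -> float:
    """Average expected scale score under a linear growth model anchored at Grade 6."""
    years_since_grade6 = grade - 6
    return INTERCEPT_GRADE6 + GROWTH_PER_YEAR * years_since_grade6


if __name__ == "__main__":
    for grade in (6, 7, 8):
        print(grade, round(expected_score(grade), 1))
```

Under these assumptions the average student moves from 228.4 at Grade 6 to about 233.8 at Grade 7 and 239.2 at Grade 8. Individual students and schools vary around this line, which is what the multilevel and latent class models are meant to capture.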
|
366 |
A Study of Latent Heat of Vaporization in Aqueous Nanofluids
January 2015
abstract: Nanoparticle suspensions, popularly termed “nanofluids,” have been extensively investigated for their thermal and radiative properties. Such work has generated great controversy, although it is arguably accepted today that the presence of nanoparticles rarely leads to useful enhancements in either thermal conductivity or convective heat transfer. On the other hand, there are still examples of unanticipated enhancements to some properties, such as the reported specific heat of molten salt-based nanofluids and the critical heat flux. Another largely overlooked example is the apparent effect of nanoparticles on the effective latent heat of vaporization (hfg) of aqueous nanofluids. A previous study focused on molecular dynamics (MD) modeling supplemented with limited experimental data to suggest that hfg increases with increasing nanoparticle concentration.
This research extends that exploratory work to determine whether the hfg of aqueous nanofluids can be manipulated, i.e., increased or decreased, by the addition of graphite or silver nanoparticles. Our results to date indicate that hfg can be changed substantially, by up to ±30%, depending on the type of nanoparticle. Moreover, this dissertation reports further experiments that vary surface area through volume fraction (0.005% to 2%) and nanoparticle size to investigate the mechanisms for hfg modification in aqueous graphite and silver nanofluids. This research also investigates thermophysical properties, namely the density and surface tension of aqueous nanofluids, to support the experimental hfg results on the basis of the Clausius-Clapeyron equation. This theoretical investigation agrees well with the experimental results. Furthermore, this research investigates the hfg change of aqueous nanofluids through nanoscale studies of the melting of silver nanoparticles and the hydrophobic interactions of graphite nanofluid. The results suggest that the entropy change due to these mechanisms could be a main cause of the changes in hfg in silver and graphite nanofluids.
Finally, this work suggests applying the latent heat results for graphite and silver nanofluids to an actual solar thermal system with a Rankine cycle, to show that the tunable latent heat of vaporization of nanofluids could improve the efficiency of real-world solar thermal applications. / Dissertation/Thesis / Doctoral Dissertation Mechanical Engineering 2015
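The Clausius-Clapeyron equation mentioned in the abstract above links the latent heat of vaporization to the slope of the saturation curve. The abstract does not say which form the dissertation used, so the sketch below assumes the common integrated form, which treats the vapor as an ideal gas, neglects the liquid volume and holds hfg constant between the two points. The function name is hypothetical, and the saturation data are approximate textbook values for pure water, not nanofluid measurements.

```python
import math

R_WATER_VAPOR = 0.4615  # specific gas constant of water vapor, kJ/(kg*K)


def latent_heat_from_saturation(t1_k, p1_kpa, t2_k, p2_kpa, gas_constant=R_WATER_VAPOR):
    """Estimate h_fg (kJ/kg) from two saturation points.

    Uses the integrated Clausius-Clapeyron relation
        ln(p2 / p1) = (h_fg / R) * (1 / T1 - 1 / T2),
    which assumes ideal-gas vapor, negligible liquid volume and constant h_fg.
    """
    return gas_constant * math.log(p2_kpa / p1_kpa) / (1.0 / t1_k - 1.0 / t2_k)


if __name__ == "__main__":
    # Approximate saturation pressures of pure water at 90 C and 100 C.
    h_fg = latent_heat_from_saturation(363.15, 70.1, 373.15, 101.3)
    print(f"h_fg ~ {h_fg:.0f} kJ/kg")
```

With these inputs the estimate comes out at roughly 2300 kJ/kg, within a few percent of the tabulated value of about 2257 kJ/kg for water at 100 °C. A shift in a nanofluid's saturation curve would show up as a change in this estimate.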
|
367 |
The Impact of Partial Measurement Invariance on Between-group Comparisons of Latent Means for a Second-Order Factor
January 2016
abstract: A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across the 5,000 replications in each condition included Type I error rates and powers for the chi-square difference test and the Wald test of the second-order factor mean difference, estimation bias and efficiency for this latent mean difference, and means of the standardized root mean square residual (SRMR) and the root mean square error of approximation (RMSEA).
When the model was correctly specified, no obvious estimation bias was observed; when the model was misspecified by constraining noninvariant loadings or intercepts to be equal, the latent mean difference was overestimated if the direction of the difference in loadings or intercepts was consistent with the direction of the latent mean difference, and underestimated if the directions were opposite. Increasing the number of noninvariant loadings or intercepts resulted in larger estimation bias if these noninvariant loadings or intercepts were constrained to be equal. Power to detect the latent mean difference was influenced by estimation bias and the estimated variance of the difference in the second-order factor mean, in addition to sample size and effect size. Constraining more parameters to be equal between groups, even when they were unequal in the population, led to a decrease in the variance of the estimated latent mean difference, which increased power somewhat. Finally, RMSEA was very sensitive for detecting misspecification due to improper equality constraints in all conditions in the current scenario, including the nonzero latent mean difference, but SRMR did not increase as expected when noninvariant parameters were constrained. / Dissertation/Thesis / Masters Thesis Educational Psychology 2016
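RMSEA, one of the two fit indices summarized above, is a direct function of the model chi-square, its degrees of freedom and the sample size. The sketch below uses the standard single-group point estimate. The function name and the example numbers are hypothetical, and multi-group software often applies an additional correction for the number of groups, which is omitted here.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Single-group RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    # Hypothetical fit results for a sample of 501 cases.
    print(round(rmsea(85.0, 50, 501), 3))  # chi-square above df: some misfit
    print(rmsea(40.0, 50, 501))            # chi-square below df: truncated to 0.0
```

The first call gives about 0.037. Because the index grows with the excess of chi-square over its degrees of freedom, improper equality constraints that inflate chi-square push RMSEA up, which is consistent with the sensitivity reported in the abstract.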
|
368 |
Time Metric in Latent Difference Score Models
January 2016
abstract: Time metric is an important consideration for all longitudinal models because it can influence the interpretation of estimates, parameter estimate accuracy, and model convergence in longitudinal models with latent variables. Currently, the literature on latent difference score (LDS) models does not discuss the importance of time metric. Furthermore, there is little research using simulations to investigate LDS models. This study examined the influence of time metric on model estimation, interpretation, parameter estimate accuracy, and convergence in LDS models using empirical simulations. Results indicated that for a time structure with a true time metric where participants had different starting points and unequally spaced intervals, LDS models fit with a restructured and less informative time metric resulted in biased parameter estimates. However, models examined using the true time metric were less likely to converge than models using the restructured time metric, likely due to missing data. Where participants had different starting points but equally spaced intervals, LDS models fit with a restructured time metric resulted in biased estimates of intercept means, but all other parameter estimates were unbiased; models using the true time metric again converged less often than models using the restructured time metric, also due to missing data. The findings of this study support prior research on time metric in longitudinal models, and further research should examine these findings under alternative conditions. The importance of these findings for substantive researchers is discussed. / Dissertation/Thesis / Doctoral Dissertation Psychology 2016
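Time metric matters in LDS models because each change score is defined between adjacent occasions, so the choice of what counts as one occasion fixes the meaning of the change parameters. The sketch below generates the expected trajectory of a dual change score model without residuals, a commonly used LDS specification. It is an assumption that the study used this form, and the function name, argument names and parameter values are hypothetical.

```python
def lds_trajectory(y0, slope, beta, occasions, alpha=1.0):
    """Expected scores under a dual change score model without residuals.

    delta_y[t] = alpha * slope + beta * y[t - 1]
    y[t]       = y[t - 1] + delta_y[t]
    """
    scores = [y0]
    for _ in range(occasions - 1):
        change = alpha * slope + beta * scores[-1]
        scores.append(scores[-1] + change)
    return scores


if __name__ == "__main__":
    # Hypothetical parameters: start at 10, constant change of 5, proportional change of -0.2.
    print(lds_trajectory(10.0, 5.0, -0.2, 4))
```

With these values the expected scores rise from 10 to roughly 13, 15.4 and 17.32, with the gains shrinking each step. If the same process were recoded onto a different time metric, the same `beta` would describe change over a different interval, which is one way a restructured metric can bias estimates.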
|
369 |
Risk and Protective Factors on Mexican-Origin Youths’ Academic Achievement, Educational Expectations and Postsecondary Enrollment
January 2017
abstract: Both theoretical and empirical research has recognized the importance of contextual factors for Mexican-origin youths' educational outcomes. The roles of parents, teachers, and peers have been predictive of Mexican-origin youths' academic achievement, educational expectations, and decision to enroll in postsecondary education. However, few studies have examined the interdependence among sociocultural context characteristics in predicting Mexican-origin youths' educational outcomes. In this dissertation, two studies address this limitation by using a person-centered analytical approach. The first study identified profiles of Mexican-origin youth using culturally relevant family characteristics. The second study identified profiles of Mexican-origin youth using culturally relevant school characteristics. The links between profiles and youths' academic achievement, educational expectations, and postsecondary enrollment were examined in both studies. Overall, this dissertation contributes to the growing body of literature that aims to understand risk and protective processes related to Mexican-origin youths' academic achievement, educational expectations, and postsecondary enrollment. / Dissertation/Thesis / Doctoral Dissertation Family and Human Development 2017
|
370 |
Mechanisms Linking Daily Pain and Depressive Symptoms: The Application of Diary Assessment and Bio-Psycho-Social Profiling
January 2018
abstract: Despite the strong link between pain and depressive symptoms, the mechanisms by which they are connected in the everyday lives of individuals with chronic pain are not well understood. In addition, previous investigations have tended to ignore biopsychosocial individual difference factors, assuming that all individuals respond to pain-related experiences and affect in the same manner. The present study tried to address these gaps in the existing literature. Two hundred twenty individuals with Fibromyalgia completed daily diaries during the morning, afternoon, and evening for 21 days. Findings were generally consistent with the hypotheses. Multilevel structural equation modeling revealed that morning pain and positive and negative affect are uniquely associated with morning negative pain appraisal, which in turn, is positively related to pain’s activity interference in the afternoon. Pain’s activity interference was the strongest predictor of evening depressive symptoms. Latent profile analysis using biopsychosocial measures identified three theoretically and clinically important subgroups (i.e., Low Functioning, Normative, and High Functioning groups). Although the daily pain-depressive symptoms link was not significantly moderated by these subgroups, individuals in the High Functioning group reported the lowest levels of average morning pain, negative affect, negative pain appraisal, afternoon pain’s activity interference, and evening depressive symptoms, and the highest levels of average morning positive affect across 21 days relative to the other two groups. The Normative group fared better on all measures than did the Low Functioning group. The findings of the present study suggest the importance of promoting morning positive affect and decreasing negative affect in disconnecting the within-day pain-depressive symptoms link, as well as the potential value of tailoring chronic pain interventions to those individuals who are in the greatest need. 
/ Dissertation/Thesis / Doctoral Dissertation Psychology 2018
|