391 |
A Quality Criteria Based Evaluation of Topic Models. Sathi, Veer Reddy; Ramanujapura, Jai Simha. January 2016.
Context. Software testing is the process in which a particular software product or system is executed in order to find bugs or issues that may otherwise degrade its performance. Software testing is usually done based on pre-defined test cases. A test case can be defined as a set of terms or conditions that software testers use to determine whether a particular system under test operates as it is supposed to. However, in numerous situations there can be so many test cases that executing every one of them is practically impossible, as there may be many constraints. This forces testers to prioritize the functions to be tested, and this is where the ability of topic models can be exploited. Topic models are unsupervised machine learning algorithms that can explore large corpora of data and classify them by identifying the hidden thematic structure in those corpora. Using topic models for test case prioritization can save a lot of time and resources. Objectives. In our study, we provide an overview of the amount of research that has been done in relation to topic models. We want to uncover the various quality criteria, evaluation methods, and metrics that can be used to evaluate topic models. Furthermore, we compare the performance of two topic models that are optimized for different quality criteria on a particular interpretability task, and thereby determine the topic model that produces the better results for that task. Methods. A systematic mapping study was performed to gain an overview of the previous research on the evaluation of topic models. The mapping study focused on identifying quality criteria, evaluation methods, and metrics that have been used to evaluate topic models. The results of the mapping study were then used to identify the most-used quality criteria. The evaluation methods related to those criteria were then used to generate two optimized topic models. An experiment was conducted in which the topics generated from those two topic models were provided to a group of 20 subjects; the task was designed to evaluate the interpretability of the generated topics. The performance of the two topic models was then compared using Precision, Recall, and F-measure. Results. Based on the results obtained from the mapping study, Latent Dirichlet Allocation (LDA) was found to be the most widely used topic model. Two LDA topic models were created, one optimized for the quality criterion Generalizability (TG) and one for Interpretability (TI), using the Perplexity and Point-wise Mutual Information (PMI) measures respectively. For the selected metrics, TI showed better performance than TG in Precision and F-measure, while the performance of the two models was comparable in Recall. The total run time of TI was also found to be significantly higher than that of TG: 46 hours and 35 minutes for TI, versus 3 hours and 30 minutes for TG. Conclusions. Looking at the F-measure, it can be concluded that the interpretability topic model (TI) performs better than the generalizability topic model (TG). However, while TI performed better in Precision, Recall was comparable. Furthermore, the computational cost of creating TI is significantly higher than that of TG.
Hence, we conclude that the choice of topic model optimization should be based on the aim of the task the model is used for. If the task requires high interpretability of the model and precision is important, such as for the prioritization of test cases based on content, then TI is the right choice, provided time is not a limiting factor. However, if the task aims at generating topics that provide a basic understanding of the concepts (i.e., interpretability is not a high priority), then TG is the more suitable choice, which also makes it the better option for time-critical tasks.
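The two optimisation routes described above can be reproduced in outline with off-the-shelf tooling. Below is a minimal sketch, not the thesis's code: it assumes gensim and a toy tokenised corpus, and scores candidate LDA models by a held-out perplexity bound (a generalizability proxy) and by the PMI-based "c_uci" coherence (an interpretability proxy).

```python
# Hedged sketch: scoring LDA models by perplexity vs. PMI-based coherence.
# The corpus is an invented stand-in for real test-case descriptions.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [
    ["login", "password", "reset", "email"],
    ["payment", "checkout", "cart", "invoice"],
    ["login", "session", "timeout", "password"],
    ["invoice", "payment", "refund", "cart"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

for k in (2, 3):
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   passes=20, random_state=0)
    # gensim reports a per-word likelihood bound; lower perplexity
    # (2 ** -bound, per gensim's convention) suggests better generalizability.
    bound = lda.log_perplexity(corpus)
    # "c_uci" averages PMI over top-word pairs; higher values are usually
    # read as more interpretable topics. topn is kept small for the toy data.
    pmi = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                         coherence="c_uci", topn=3).get_coherence()
    print(f"k={k}: per-word bound={bound:.3f}, PMI coherence={pmi:.3f}")
```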
|
392 |
Email Mining Classifier: The empirical study on combining the topic modelling with Random Forest classification. Halmann, Marju. January 2017.
Filtering emails and replying to them automatically are of interest to many, but both are hard due to the complexity of language and to dependencies on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on a literature study and on empirical experimentation using two real-life datasets. Firstly, a literature study is performed to gain insight into the accuracy of other available email classifiers. Secondly, the proposed model's accuracy is explored through experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly across user sets. The proposed model's accuracy is within the reported accuracy range, albeit in the lower part, which indicates that the proposed model performs poorly compared to other classifiers. On average, however, the classifier's performance improves by 15 percentage points with additional information. This indicates that Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier is promising, but future studies are needed to explore the model and ways to further increase its accuracy.
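As a rough illustration of the pipeline this abstract describes, LDA-derived topic proportions used as features for a Random Forest, the following scikit-learn sketch chains the two models. The emails and labels are invented stand-ins, not the study's datasets.

```python
# Hedged sketch: topic proportions from LDA feed a Random Forest classifier.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

emails = [
    "meeting rescheduled to friday please confirm",
    "your invoice is attached payment due next week",
    "team lunch friday anyone joining",
    "payment reminder invoice overdue",
]
labels = ["work", "billing", "social", "billing"]  # hypothetical classes

clf = make_pipeline(
    CountVectorizer(),                                   # bag of words
    LatentDirichletAllocation(n_components=2,            # topic features
                              random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(emails, labels)
print(clf.predict(["invoice for the friday meeting"]))
```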
|
393 |
Multivariate Ordinal Regression Models: An Analysis of Corporate Credit Ratings. Hirk, Rainer; Hornik, Kurt; Vana, Laura.
Correlated ordinal data typically arise from multiple measurements on a collection of subjects. Motivated by an application in credit risk, where multiple credit rating agencies assess the creditworthiness of a firm on an ordinal scale, we consider multivariate ordinal models with a latent variable specification and correlated error terms. Two different link functions are employed, by assuming a multivariate normal and a multivariate logistic distribution for the latent variables underlying the ordinal outcomes. Composite likelihood methods, more specifically the pairwise and tripletwise likelihood approach, are applied for estimating the model parameters. We investigate how sensitive the pairwise likelihood estimates are to the number of subjects and to the presence of observations missing completely at random, and find that these estimates are robust for both link functions and reasonable sample size. The empirical application consists of an analysis of corporate credit ratings from the big three credit rating agencies (Standard & Poor's, Moody's and Fitch). Firm-level and stock price data for publicly traded US companies as well as an incomplete panel of issuer credit ratings are collected and analyzed to illustrate the proposed framework. / Series: Research Report Series / Department of Statistics and Mathematics
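The pairwise likelihood applied here reduces a high-dimensional multivariate ordinal likelihood to a sum of bivariate rectangle probabilities. The following numpy/scipy sketch shows only that core computation for a bivariate ordinal probit; the thresholds, correlation grid, and rating pairs are illustrative, and large finite bounds stand in for infinite ones.

```python
# Hedged sketch of a pairwise log-likelihood for two correlated ordinal
# outcomes with a latent bivariate-normal specification.
import numpy as np
from scipy.stats import multivariate_normal

def rect_prob(lo, hi, rho):
    """P(lo1 < Z1 <= hi1, lo2 < Z2 <= hi2) for a standard bivariate normal."""
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return (mvn.cdf([hi[0], hi[1]]) - mvn.cdf([lo[0], hi[1]])
            - mvn.cdf([hi[0], lo[1]]) + mvn.cdf([lo[0], lo[1]]))

# Thresholds for a 3-category rating scale; +/-8 acts as practical infinity.
cut = np.array([-8.0, -0.5, 0.8, 8.0])

def pairwise_loglik(rho, ratings):
    """Sum of log rectangle probabilities over observed rating pairs."""
    return sum(
        np.log(rect_prob((cut[r1], cut[r2]), (cut[r1 + 1], cut[r2 + 1]), rho))
        for r1, r2 in ratings
    )

ratings = [(0, 0), (1, 1), (1, 2), (2, 2), (0, 1)]  # toy agency rating pairs
for rho in (0.0, 0.4, 0.8):  # profile the likelihood over the correlation
    print(rho, round(pairwise_loglik(rho, ratings), 3))
```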
|
394 |
Oncolytic Viruses as a Potential Approach to Eliminate Cells That Constitute the Latent HIV Reservoir. Ranganath, Nischal. 3 April 2018.
HIV infection represents a major health and socioeconomic challenge worldwide. Despite significant advances in therapy, a cure for HIV continues to be elusive. The design of novel curative strategies will require targeting and elimination of cells that constitute the latent HIV-1 reservoir. However, such an approach is impeded by the inability to distinguish latently HIV-infected cells from uninfected cells.
The type-I interferon (IFN-I) response is an integral antiviral defense mechanism, but is impaired at multiple levels during productive HIV infection. Interestingly, similar global impairments in IFN-I signaling have been observed in various human cancers. This led to the development of IFN-sensitive oncolytic viruses, including the recombinant Vesicular Stomatitis Virus (VSVΔ51) and Maraba virus (MG1), as virotherapies designed to treat various cancers.
Based on this, it was hypothesized that IFN-I signaling is impaired in latently HIV-infected cells (as observed in productively infected cells) and that VSVΔ51 and MG1 may be able to exploit such intracellular defects to target and eliminate latently HIV-infected cells, while sparing healthy cells. First, using cell line models of HIV-1 latency, intracellular defects in IFN-I responses, including impaired IFNα/β production and expression of IFNAR1, MHC-I, ISG15, and PKR, were demonstrated to represent an important feature of latently HIV-infected cells. Consistent with this, the latently HIV-infected cell lines were observed to have a greater sensitivity to VSVΔ51 and MG1 infection, and to MG1-mediated killing, than the HIV-uninfected parental cells.
Next, the ability of oncolytic viruses to kill latently HIV-infected human primary cells was demonstrated using an in vitro resting CD4+ T cell model of latency. Interestingly, while both VSVΔ51 and MG1 infection resulted in a significant reduction in inducible p24 expression, a dose-dependent decrease in integrated HIV-1 DNA was only observed following MG1 infection. In keeping with this, MG1 infection of memory CD4+ T cells from HIV-1 infected individuals on HAART also resulted in a significant decrease in inducible HIV-1 gag RNA expression.
By targeting an intracellular pathway that is impaired in latently HIV-infected cells, the findings presented in this dissertation highlight a novel, proof-of-concept approach to eliminating the latent HIV-1 reservoir. Given that VSVΔ51 and MG1 are currently being studied in cancer clinical trials, there is significant potential to translate this work to in vivo studies.
|
395 |
Design, development and evaluation of an algorithm to detect overlapping sub-communities using social network analysis and data mining (Diseño, desarrollo y evaluación de un algoritmo para detectar sub-comunidades traslapadas usando análisis de redes sociales y minería de datos). Muñoz Cancino, Ricardo Luis. January 2013.
Master in Operations Management / Industrial Civil Engineer / Virtual social network sites have grown enormously over the last decade. Their main purpose is to facilitate the creation of links between people who, for example, share interests, activities, knowledge, or connections in real life. The interaction between users generates a community in the social network.
There are several types of communities; communities of interest and communities of practice stand out. A community of interest is a group of people interested in sharing and discussing a particular topic of interest. In a community of practice, by contrast, people share a concern or passion for something they do and learn how to do it better. When the interactions take place over the internet, they are called virtual communities (VCoP/VCoI for their English acronyms). It is common for members to interact with only some of the other users, thereby forming sub-communities, and a member may belong to more than one. Identifying these substructures is necessary, because that is where the interactions that create and develop the community's knowledge are generated. Many algorithms have been designed to detect sub-communities. However, most of them detect disjoint sub-communities and, moreover, do not consider the content generated by the community's members. The main objective of this work is to design, develop, and evaluate an algorithm to detect overlapping sub-communities using social network analysis (SNA) and text mining.
To this end, the SNA-KDD methodology proposed by Ríos et al. [79], which combines Knowledge Discovery in Databases (KDD) and SNA, is used. It was applied to two virtual communities, Plexilandia (VCoP) and The Dark Web Portal (VCoI). In the KDD stage, the users' posts were preprocessed, and Latent Dirichlet Allocation (LDA) was then applied to describe each post in terms of topics. In the SNA stage, filtered networks were built from the information obtained in the previous stage. Two algorithms developed in this thesis, SLTA and TPA, were then used to find overlapping sub-communities.
The results show that SLTA achieves, on average, performance 5% better than the best existing algorithm when applied to a VCoP. In addition, the quality of the detected sub-community structure was found to increase, on average, by 64% when the semantic filter is strengthened. With respect to TPA, this algorithm achieves an average modularity of 0.33, whereas the best existing algorithm achieves 0.043, when applied to a VCoI. Furthermore, the joint application of our algorithms appears to offer a way of determining the type of community being analyzed; however, this must be verified by analyzing more virtual communities.
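SLTA and TPA themselves are not public, so they are not reproduced here. As a generic stand-in for the semantic-filter-then-detect idea, the sketch below weights interaction edges by LDA topic similarity, drops weak edges, and extracts overlapping groups with networkx's k-clique percolation; all users, topic vectors, and thresholds are invented.

```python
# Hedged sketch: topic-filtered interaction graph + overlapping communities.
import networkx as nx
import numpy as np
from networkx.algorithms.community import k_clique_communities

# Per-user topic proportions (e.g., averaged LDA output); illustrative only.
user_topics = {
    "u1": np.array([0.8, 0.1, 0.1]),
    "u2": np.array([0.7, 0.2, 0.1]),
    "u3": np.array([0.1, 0.8, 0.1]),
    "u4": np.array([0.2, 0.7, 0.1]),
    "u5": np.array([0.4, 0.4, 0.2]),  # straddles two topics -> overlap
}
replies = [("u1", "u2"), ("u1", "u5"), ("u2", "u5"),
           ("u3", "u4"), ("u3", "u5"), ("u4", "u5")]

G = nx.Graph()
for a, b in replies:
    # Semantic filter: keep an edge only if topic profiles agree enough.
    sim = float(user_topics[a] @ user_topics[b])
    if sim > 0.15:  # illustrative threshold
        G.add_edge(a, b, weight=sim)

# k-clique communities may share nodes, i.e. they overlap.
for comm in k_clique_communities(G, 3):
    print(sorted(comm))
```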
|
396 |
Profiles of Trauma Exposure and Biopsychosocial Health among Sex Trafficking Survivors: Exploring Differences in Help-Seeking Attitudes and Intentions. Ruhlman, Lauren.
Doctor of Philosophy / School of Family Studies and Human Services / Briana S. Goff / Human sex trafficking is a complex and unique phenomenon involving the commercial sexual exploitation (CSE) of persons by means of force, fraud, or coercion. The purpose of this study was to investigate unique patterns of trauma exposure and biopsychosocial health among a sample of CSE survivors. Results from a latent profile analysis with 135 adults trafficked in the United States yielded three distinct survivor sub-groups: mildly distressed, moderately distressed, and severely distressed. The mildly distressed class (18.5%) was characterized by the lowest reports of trauma exposure and an absence of clinically significant psycho-social stress symptoms. The moderately distressed class (48.89%) endorsed comparatively medial levels of trauma exposure, as well as clinically significant disturbance in six domains of psycho-social health. The severely distressed class (32.59%) reported the highest degree of trauma exposure and exhibited clinically significant symptoms of pervasive psycho-social stress across all domains assessed. To better understand variation in CSE survivors’ engagement with formal support services, this study also examined differences in help-seeking attitudes and intentions between latent classes. Results indicated that compared to those in the mildly and moderately distressed classes, severely distressed survivors endorsed significantly more unfavorable attitudes toward seeking professional help, along with no intention to seek help from any source when facing a personal or emotional crisis. Findings from this study provide a snapshot of significant heterogeneity in trauma exposure and biopsychosocial health among CSE survivors, as well as associated differences in help-seeking attitudes and intentions. The identification of distinct survivor sub-groups in these and future analyses mark an important intermediate step toward developing empirically-testable support services that are specifically designed to meet the unique needs of CSE survivors.
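Latent profile analysis over continuous indicators is closely related to fitting a Gaussian mixture model. The sketch below, on synthetic distress indicators rather than the study's data, recovers three profiles and selects the class count by BIC using scikit-learn.

```python
# Hedged sketch: latent-profile-style analysis via a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "trauma exposure + symptom" indicators for three latent profiles.
mild = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.5, size=(50, 3))
moderate = rng.normal(loc=[3.0, 3.0, 2.5], scale=0.5, size=(60, 3))
severe = rng.normal(loc=[5.0, 5.5, 5.0], scale=0.5, size=(40, 3))
X = np.vstack([mild, moderate, severe])

for k in range(1, 5):  # compare class counts by BIC (lower is better)
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         random_state=0).fit(X)
    print(f"{k} classes: BIC={gm.bic(X):.1f}")

best = GaussianMixture(n_components=3, covariance_type="diag",
                       random_state=0).fit(X)
print("class sizes:", np.bincount(best.predict(X)))
print("class means:\n", best.means_.round(2))
```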
|
397 |
Zdrojové faktory indexů ekonomické svobody (Factors of Economic Freedom Indices). Ondruš, Martin. January 2015.
This work addresses the detection of the latent variables that underlie indices of economic freedom. Firstly, we present the best-known indices of economic freedom (IEF, EFW). Secondly, we discuss a multivariate statistical method, factor analysis, which we use to detect the latent variables. We present different estimation methods for factor analysis and focus on the principal factor method. Furthermore, we compare these methods by analysing the structure of the EFW index. Based on the estimated models, we interpret the detected latent variables. We use the statistical software SPSS and R for the factor analysis of the EFW index.
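The principal factor (principal axis) method mentioned above can be written in a few lines of numpy: replace the diagonal of the correlation matrix with communality estimates, eigendecompose, and take loadings from the leading eigenpairs. The data here are simulated indicators, not the EFW components, and the factor count is fixed for illustration.

```python
# Hedged sketch of the principal factor method on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=(500, 2))                   # two latent variables
L = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
              [0.1, 0.8], [0.0, 0.9]])          # true loadings
X = F @ L.T + 0.4 * rng.normal(size=(500, 5))   # observed indicators

R = np.corrcoef(X, rowvar=False)
# Communality estimates: squared multiple correlations from R's inverse.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

eigvals, eigvecs = np.linalg.eigh(R_reduced)
order = np.argsort(eigvals)[::-1]               # largest eigenvalues first
k = 2                                           # number of factors retained
loadings = eigvecs[:, order[:k]] * np.sqrt(eigvals[order[:k]])
# Loadings are identified only up to sign (and rotation).
print("estimated loadings:\n", loadings.round(2))
```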
|
398 |
Food safety, perceptions and preferences: empirical studies on risks, responsibility, trust, and consumer choices. Erdem, Seda. January 2011.
This thesis addresses various food safety issues and investigates them from an economic perspective within four different, but related, studies. The studies are intended to provide policy-makers and other decision-makers in the industry with valuable information that will help them to implement better mitigation strategies and policies. The studies also present applications of advancements in choice modelling, and thus contribute to the literature. To address these issues, various surveys were conducted in the UK.
The first study investigates different stakeholder groups' perceptions of responsibility among the stages of the meat chain for ensuring that the meat they eat does not cause them to become ill, and how this differs with food types. The means by which this is achieved is novel, as we elicit stakeholders' relative degrees of responsibility using the Best-Worst Scaling (BWS) technique. BWS is particularly useful because it avoids the necessity of ranking a large set of items, which people have been found to struggle with. The results from this analysis reveal a consistent pattern among respondents of downplaying the extent of their own responsibility.
The second study explores people's perceptions of various food and non-food risks within a framework characterised by the level of control that respondents believe they have over the risks and the level of worry that the risks prompt. The means by which this is done differs from past risk perception analyses in that it questions people directly regarding their relative assessments of the levels of control and worry over the risks presented. The substantive analysis of the risk perceptions has three main foci concerning the relative assessment of (i) novel vs. more familiar risks, (ii) food vs. non-food risks, and (iii) differences in risk perceptions across farmers and consumers, with a particular orientation on E. coli.
The third study investigates consumers' willingness to pay (WTP) for reductions in the level of foodborne health risk achieved by (1) nanotechnology and (2) less controversial means in the food system. The difference between consumers' valuations provides an implicit value for nanotechnology. This comparison is achieved via a split-sample Discrete Choice Experiment. Valuations of the risk reductions are derived from conditional, heteroskedastic conditional, mixed, and heteroskedastic mixed logit models. General results show the existence of heterogeneity in British consumers' preferences and variances, and that the value of nanotechnology differs for different types of consumers.
The fourth study investigates consumers' trust in institutions to provide information about nanotechnology and its use in food production and packaging. It is shown how the use of BWS and Latent Class modelling of survey data can provide in-depth information on consumer categories useful for the design of effective public policy, which in turn would allow the development of best practice in risk communication for novel technologies. Results show heterogeneity in British consumers' preferences. Three distinct consumer segments are identified: Class-1, who trust "government institutions and scientists" most; Class-2, who trust "non-profit organisations and environmental groups" most; and Class-3, who trust "food producers and handlers, and media" most.
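For the Best-Worst Scaling used in the first study, the simplest summary is the best-minus-worst count score per item, normalized by how often the item appeared. The sketch below uses invented meat-chain stages and choices, not the survey's tasks.

```python
# Hedged sketch: best-minus-worst count scores for BWS responses.
from collections import Counter

# Each task: (items shown, item picked "best", item picked "worst").
tasks = [
    (("farmer", "processor", "retailer", "consumer"), "processor", "consumer"),
    (("farmer", "processor", "retailer", "consumer"), "retailer", "consumer"),
    (("farmer", "processor", "retailer", "consumer"), "processor", "farmer"),
]

best = Counter(b for _, b, _ in tasks)
worst = Counter(w for _, _, w in tasks)
appearances = Counter(i for items, _, _ in tasks for i in items)

for item in sorted(appearances):
    # Positive scores mean an item was chosen "best" more often than "worst".
    score = (best[item] - worst[item]) / appearances[item]
    print(f"{item:10s} best-worst score = {score:+.2f}")
```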
|
399 |
Exploring the latent structure of IT employees' intention to resign in South Africa. Le Roux, Mark. January 2013.
One of the major challenges facing South African IT organisations today is the dramatic shortage of IT professionals. Both the literature and business sentiment indicate that employee turnover within the IT sector is on a continually rising trend. These high turnover rates translate into exorbitant direct and indirect costs to organisations. The purpose of this research was to identify the factors pertaining to the underlying structure of the turnover intention of these employees. A deeper understanding of these drivers may enable management to reduce the turnover intention of employees within their organisations.
A quantitative, multi-disciplinary research approach, focussing on the antecedents of turnover intention and the three systemic levels of organisational behaviour (micro, meso and macro), was used to operationalise the main research construct of this study. Data was collected by means of an anonymous, self-administered, web-based survey. A sample of 188 completed questionnaires was collected using a snowball sampling technique from the population of employees in the IT industry in South Africa. A statistical data reduction method, exploratory factor analysis, was conducted on the dataset to determine the underlying nature of the construct, IT employees' perceived intention to resign from employment.
After an appropriate number of factor analytic rounds, a robust 4-factor model of the data set was established. The results indicated that the factor Personal Enrichment from Management Support possibly plays the most significant role in understanding, monitoring, and managing IT employees' perceived intention to resign from employment. The study provided support that monetary factors have the most significant influence on an employee's decision to join an organisation; however, non-monetary benefits, such as job satisfaction and skills development, were found to be more effective in retaining employees. The practical implications uncovered by this study will enable management to gain further insight into the underlying factors and drivers of turnover intention and thereby minimise its impact on the organisation. / Dissertation (MBA)--University of Pretoria, 2013. / Gordon Institute of Business Science (GIBS) / MBA
|
400 |
The Effects of Age and Gender on Pedestrian Traffic Injuries: A Random Parameters and Latent Class Analysis. Raharjo, Tatok. 21 June 2016.
Pedestrians are vulnerable road users because they have no protection while they walk, unlike cyclists and motorcyclists, who often have at least helmet protection and sometimes additional body protection (in the case of motorcyclists, body-armored jackets and pants). In the US, pedestrian fatalities are increasing and becoming an ever larger proportion of overall roadway fatalities (NHTSA, 2016), thus underscoring the need to study the factors that influence pedestrian-injury severity and potentially develop appropriate countermeasures. One of the critical elements in the study of pedestrian-injury severities is to understand how injuries vary across age and gender, two elements that have been shown to be critical injury determinants in past research. In the current research effort, 4829 police-reported pedestrian crashes from Chicago in 2011 and 2012 are used to estimate multinomial logit, mixed logit, and latent class logit models to study the effects of age and gender on the resulting injury severities in pedestrian crashes. The results from these model estimations show that the injury severity levels for older males, younger males, older females, and younger females are statistically different. Moreover, the overall findings show that older males and older females are more likely to sustain higher injury-severity levels in many instances (when a crash occurs on city streets or state-maintained urban roads, the primary cause of the crash is failing to yield right-of-way, the pedestrian is entering, leaving, or crossing away from an intersection, the road surface is dry, or the road functional class is a local road or street). The findings suggest that well-designed and well-placed crosswalks, small islands in two-way streets, narrow streets, clear road signs, provisions for resting places, and wide, flat sidewalks all have the potential to reduce pedestrian-injury severities across age/gender combinations.
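As a baseline for the three model families named above, a multinomial logit can be estimated with statsmodels. The sketch uses simulated age/gender indicators and severity outcomes in place of the Chicago crash records; the coefficients and severity coding are invented for illustration.

```python
# Hedged sketch: multinomial logit of injury severity on age/gender dummies.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 2000
older = rng.integers(0, 2, n)      # 1 = older pedestrian (hypothetical cut)
male = rng.integers(0, 2, n)       # 1 = male

# Simulate three severity levels (0 = no injury, 1 = minor, 2 = severe)
# from utilities that make severe outcomes more likely for older pedestrians.
util = np.column_stack([np.zeros(n),
                        0.3 + 0.4 * older,
                        -0.5 + 0.9 * older + 0.2 * male])
probs = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)
severity = np.array([rng.choice(3, p=p) for p in probs])

X = sm.add_constant(pd.DataFrame({"older": older, "male": male}))
fit = sm.MNLogit(severity, X).fit(disp=False)
print(fit.summary())
```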
|