101 |
Comparing Three Effect Sizes for Latent Class Analysis. Granado, Elvalicia A. 12 1900
Traditional latent class analysis (LCA) considers entropy R2 as the only measure of effect size. However, entropy may not always be reliable: no lower bound is agreed upon, and good separation is limited to values greater than .80. As applications of LCA grow in popularity, it is imperative to use additional measures to quantify LCA classification accuracy. Greater classification accuracy helps to ensure that the profiles of the latent classes reflect the profiles of the true underlying subgroups. This Monte Carlo study compared the quantification of classification accuracy and the confidence intervals of three effect sizes: entropy R2, I-index, and Cohen’s d. Study conditions included total sample size, number of dichotomous indicators, latent class membership probabilities (γ), conditional item-response probabilities (ρ), variance ratio, sample size ratio, and distribution type for a 2-class model. Overall, entropy R2 and I-index showed the best accuracy and standard errors, along with the smallest confidence interval widths, though the I-index performed well in only a few conditions.
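The entropy R2 measure discussed above has a standard closed form based on the posterior class membership probabilities. A minimal sketch of that computation (the posterior matrices here are hypothetical toy inputs, not the study's simulated data):

```python
import numpy as np

def entropy_r2(post):
    """Relative entropy (entropy R2) from an N x K matrix of posterior
    class membership probabilities; 1.0 means perfect class separation."""
    n, k = post.shape
    # Clip to avoid log(0); entries with p == 0 contribute nothing.
    p = np.clip(post, 1e-12, 1.0)
    return 1.0 - (-(post * np.log(p)).sum()) / (n * np.log(k))

# Perfectly separated classes give entropy R2 of 1.0;
# completely uncertain (uniform) posteriors give 0.0.
crisp = np.array([[1.0, 0.0], [0.0, 1.0]])
fuzzy = np.full((2, 2), 0.5)
print(entropy_r2(crisp))  # 1.0
print(entropy_r2(fuzzy))  # 0.0
```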
102 |
Tag recommendation using Latent Dirichlet Allocation. Choubey, Rahul. January 1900
Master of Science / Department of Computing and Information Sciences / Doina Caragea / The vast amount of data present on the internet calls for ways to label and organize this data according to specific categories, in order to facilitate search and browsing activities.
This can be accomplished by making use of folksonomies and user-provided tags.
However, it can be difficult for users to provide meaningful tags. Tag recommendation
systems can guide users towards informative tags for online resources such as websites, pictures, etc. The aim of this thesis is to build a system for recommending tags to URLs available through a bookmark sharing service called BibSonomy. We assume that the URLs for which we recommend tags do not have any prior tags assigned to them.
Two approaches are proposed to address the tagging problem, both of them based on
Latent Dirichlet Allocation (LDA; Blei et al. [2003]). LDA is a generative and probabilistic
topic model which aims to infer the hidden topical structure in a collection of documents.
According to LDA, documents can be seen as mixtures of topics, while topics can be seen as mixtures of words (in our case, tags). The first approach that we propose, called the topic words based approach, recommends the top words in the top topics representing a resource as tags for that particular resource. The second approach, called the topic distance based approach, uses the tags of the most similar training resources (identified using the KL-divergence; Kullback and Leibler [1951]) to recommend tags for an untagged test resource.
The dataset used in this work was made available through the ECML/PKDD Discovery
Challenge 2009. We construct the documents that are provided as input to LDA in two
ways, thus producing two different datasets. In the first dataset, we use only the description and the tags (when available) corresponding to a URL. In the second dataset, we crawl the URL content and use it to construct the document. Experimental results show that the LDA approach is not very effective at recommending tags for new untagged resources. However, using the resource content gives better results than using the description only. Furthermore,
the topic distance based approach is better than the topic words based approach, when only the descriptions are used to construct documents, while the topic words based approach works better when the contents are used to construct documents.
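The topic distance based approach described above can be sketched as follows, assuming the per-resource topic distributions have already been inferred by LDA; the resources, tags, and distributions below are hypothetical toy data:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two topic distributions."""
    p = np.asarray(p) + eps
    q = np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def recommend_tags(test_theta, train_thetas, train_tags):
    """Topic distance based approach: return the tags of the training
    resource whose topic distribution is closest (in KL divergence)
    to the untagged test resource's distribution."""
    dists = [kl(test_theta, th) for th in train_thetas]
    return train_tags[int(np.argmin(dists))]

# Toy topic distributions standing in for LDA output (hypothetical).
train_thetas = [np.array([0.90, 0.05, 0.05]),   # mostly topic 0
                np.array([0.05, 0.90, 0.05])]   # mostly topic 1
train_tags = [["python", "programming"], ["cooking", "recipes"]]
test_theta = np.array([0.8, 0.1, 0.1])          # resembles topic 0
print(recommend_tags(test_theta, train_thetas, train_tags))
# ['python', 'programming']
```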
103 |
Mathematical modelling of malaria transmission and pathogenesis. Okrinya, Aniayam. January 2015
In this thesis we will consider two mathematical models of malaria transmission and pathogenesis. The transmission model is a human-mosquito interaction model that describes the development of malaria in a human population. It accounts for the various phases of the disease in humans and mosquitoes, together with treatment of both sick and partially immune humans. The partially immune humans (termed asymptomatic) have recovered from the worst of the symptoms, but can still transmit the disease. We will present a mathematical model consisting of a system of ordinary differential equations that describes the evolution of humans and mosquitoes in a range of malarial states. A new feature, in what turns out to be a key class, is the consideration of reinfected asymptomatic humans. The analysis will include establishment of the basic reproduction number, R0, and asymptotic analysis to draw out the major timescales of events in the process of malaria going from non-endemic to endemic in a region following the introduction of a few infected mosquitoes. We will study the model to ascertain the possible time scales on which intervention programmes may yield better results. We will also show, through our analysis of the model, some evidence of disease control and possible eradication. The model of malaria pathogenesis describes the evolution of the disease in the human host. We model the effect of immune response on the interaction between malaria parasites and erythrocytes with a system of delay differential equations in which there is a time lag between the advent of malaria merozoites in the blood and the training of adaptive immune cells. We will study the model to ascertain whether or not a single successful bite of an infected mosquito would result in death in the absence of innate and adaptive immune responses. Stability analysis will be carried out on the parasite-free state in both the immune and non-immune cases.
We will also perform numerical simulations on the model to track the development of adaptive immunity, and use asymptotic methods, assuming a small delay, to study the evolution of the disease in a naive individual following the injection of a small amount of merozoites into the blood stream. The effect of different levels of innate immune response on the pathogenesis of the disease will be considered in the simulations, to elicit a possible immune level that can serve as a guide to producing a vaccine with a high efficacy level.
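The thesis's full transmission model is not reproduced in this abstract, but the flavour of such human-mosquito ODE systems, including the role of the basic reproduction number R0, can be illustrated with the classical Ross-Macdonald two-compartment model (the parameter values below are illustrative assumptions, not the thesis's fitted values):

```python
def ross_macdonald(t_end=200.0, dt=0.01, a=0.3, b=0.5, c=0.5,
                   m=5.0, r=0.05, mu=0.1, iv0=0.01):
    """Forward-Euler integration of the classical Ross-Macdonald model:
    ih, iv are the infected fractions of humans and mosquitoes.
    Starts from a few infected mosquitoes (iv0) and no infected humans."""
    ih, iv = 0.0, iv0
    for _ in range(int(t_end / dt)):
        dih = a * b * m * iv * (1.0 - ih) - r * ih   # human infection - recovery
        div = a * c * ih * (1.0 - iv) - mu * iv      # mosquito infection - death
        ih += dt * dih
        iv += dt * div
    return ih, iv

# For this simple model, R0 = (m * a^2 * b * c) / (r * mu);
# R0 > 1 means the disease settles at an endemic equilibrium.
a, b, c, m, r, mu = 0.3, 0.5, 0.5, 5.0, 0.05, 0.1
r0 = (m * a * a * b * c) / (r * mu)
ih, iv = ross_macdonald()
print(f"R0 = {r0:.1f}, endemic human prevalence ~ {ih:.3f}")
```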
104 |
Social Tag-based Community Recommendation Using Latent Semantic Analysis. Akther, Aysha. 07 September 2012
Collaboration and sharing of information are the basis of modern social web systems. Users of social web systems establish and join online communities in order to collectively share their content with a group of people having a common topic of interest. Group and community activities have increased exponentially in modern social web systems. With the explosive growth of social communities, users have experienced considerable difficulty discovering communities relevant to their interests. In this study, we address the problem of recommending communities to individual users. Recommender techniques based solely on community affiliation may fail to find a wide range of suitable communities for users when the available data are insufficient. We regard this problem as a tag-based personalized search. Based on the social tags used by members of communities, we first represent communities in a low-dimensional space, the so-called latent semantic space, using Latent Semantic Analysis. Then, to recommend communities to a given user, we capture how each community relates to both the user’s personal tag usage and other community members’ tagging patterns in the latent space. We especially focus on the challenging problem of recommending communities to users who have joined very few communities or have no prior community membership. Our evaluation on two heterogeneous datasets shows that our approach can significantly improve recommendation quality.
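The latent-space recommendation step can be sketched as follows; the community-tag matrix, tag names, and choice of k below are hypothetical toy inputs, not the study's datasets:

```python
import numpy as np

# Toy community-by-tag count matrix; rows are communities, columns are
# tags. Names and counts are illustrative stand-ins for real data.
tags = ["python", "code", "ml", "stats", "travel", "hiking", "camera", "photo"]
X = np.array([[5, 4, 1, 0, 0, 0, 0, 0],   # 0: programming
              [2, 1, 5, 4, 0, 0, 0, 0],   # 1: data science
              [0, 0, 0, 0, 4, 5, 0, 1],   # 2: hiking
              [0, 0, 0, 0, 2, 0, 5, 4]],  # 3: photography
             dtype=float)

# Map communities into a k-dimensional latent semantic space
# via truncated SVD (the core of Latent Semantic Analysis).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
community_vecs = U[:, :k] * s[:k]

def recommend(user_tag_counts, top_n=2):
    """Fold the user's tag-usage vector into the latent space and rank
    communities there by cosine similarity."""
    u = user_tag_counts @ Vt[:k].T
    sims = community_vecs @ u / (
        np.linalg.norm(community_vecs, axis=1) * np.linalg.norm(u) + 1e-12)
    return list(np.argsort(-sims)[:top_n])

user = np.array([3.0, 2.0, 1.0, 0, 0, 0, 0, 0])  # uses python, code, ml
print(recommend(user))  # the two technical communities rank first
```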
105 |
Towards generic relation extraction. Hachey, Benjamin. January 2009
A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g., PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database that can be more effectively used for querying and automated reasoning. However, adapting conventional relation extraction systems to new domains or tasks requires significant effort from annotators and developers. Furthermore, previous adaptation approaches based on bootstrapping start from example instances of the target relations, thus requiring that the correct relation type schema be known in advance. Generic relation extraction (GRE) addresses the adaptation problem by applying generic techniques that achieve comparable accuracy when transferred, without modification of model parameters, across domains and tasks. Previous work on GRE has relied extensively on various lexical and shallow syntactic indicators. I present new state-of-the-art models for GRE that incorporate governor-dependency information. I also introduce a dimensionality reduction step into the GRE relation characterisation sub-task, which serves to capture latent semantic information and leads to significant improvements over an unreduced model. Comparison of dimensionality reduction techniques suggests that latent Dirichlet allocation (LDA) – a probabilistic generative approach – successfully incorporates a larger and more interdependent feature set than a model based on singular value decomposition (SVD) and performs as well as or better than SVD in all experimental settings. Finally, I introduce multi-document summarisation as an extrinsic test bed for GRE and present results which demonstrate that the relative performance of GRE models is consistent across tasks and that the GRE-based representation leads to significant improvements over a standard baseline from the literature.
Taken together, the experimental results 1) show that GRE can be improved using dependency parsing and dimensionality reduction, 2) demonstrate the utility of GRE for the content selection step of extractive summarisation and 3) validate the GRE claim of modification-free adaptation for the first time with respect to both domain and task. This thesis also introduces data sets derived from publicly available corpora for the purpose of rigorous intrinsic evaluation in the news and biomedical domains.
106 |
Policing priorities in London : do borough characteristics make a difference? Norris, Paul Andrew. January 2009
Much current discourse around policing in the UK stresses the need for a partnership between the police and public and, in particular, the need for the police to be responsive to the concerns of local communities. It is argued that appearing responsive to local needs, and showing a willingness to consult the public in the process of decision making, is likely to increase support for the police. Despite this, detailed analysis of the public’s preferences for policing remains relatively sparse. This thesis uses data from the 2003-04 Metropolitan Police’s Public Attitude Survey (PAS) to consider whether survey data can provide a useful indication of a respondent’s preferences, and how these preferences may vary depending on the characteristics of respondents and the boroughs in which they live. This thesis argues that rather than simply considering some overall measure of the level of policing individuals would like to see, or investigating attitudes towards different functions of the police individually, a more interesting and complete view of preferences for policing can be developed by looking at the mix of policing that individuals believe will best meet their needs. Additionally, it will be shown that differences in respondents’ preferences can be related to both the characteristics of individuals and the nature of the boroughs in which they live. It will be suggested that some of these relationships provide evidence that respondents favour a mix of policing they believe will protect them from perceived threats and reflect their perception of the police’s role within society. In addition, this thesis provides an example of how the techniques of Factor Analysis and Latent Class Analysis can provide greater insight into the data collected in large-scale surveys. It is suggested that responses provided to different questions are often related and may represent a more general underlying attitude held by the respondent.
It is also argued that using techniques which can handle multilevel data provides greater explanatory depth by suggesting how a respondent’s attitude may be influenced by the context in which they live. The analysis presented offers new insights into the public’s priorities for policing and demonstrates the worth of the statistical methods employed. However, it is, to some extent, limited by the form of the questions within the PAS dataset and by the lack of information about the thought process underlying a respondent’s answers. These concerns are discussed, along with suggestions for future research.
107 |
Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling. Shen, Yao. 08 1900
Conventional pattern recognition systems have two components: feature analysis and pattern classification. For any object in an image, features can be considered the major characteristics of the object, for either object recognition or object tracking purposes. Features extracted from a training image can be used to identify the object when attempting to locate it in a test image containing many other objects. To perform reliable scene analysis, it is important that the features extracted from the training image be detectable even under changes in image scale, noise, and illumination. Scale-invariant features have wide applications in the image processing area, such as image classification, object recognition, and object tracking. In this thesis, color features and SIFT (scale-invariant feature transform) features are considered as scale-invariant features. The classification, recognition, and tracking results were evaluated with a novel evaluation criterion and compared with existing methods. I also studied different types of scale-invariant features for the purpose of solving scene analysis problems. I propose probabilistic models as the foundation for analyzing scene scenarios in images. In order to differentiate the content of images, I develop novel algorithms for the adaptive combination of multiple features extracted from images. I demonstrate the performance of the developed algorithms on several scene analysis tasks, including object tracking, video stabilization, medical video segmentation, and scene classification.
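The abstract does not specify the thesis's adaptive combination algorithms; as one hedged illustration of the general idea, per-feature match-score maps can be weighted by how discriminative (peaked, i.e. low-entropy) each map is before fusing them (all scores and names below are hypothetical):

```python
import numpy as np

def fuse_scores(score_maps):
    """Adaptively combine per-feature match-score maps: each feature
    gets a weight inversely proportional to the entropy of its map
    (a peaked, discriminative map gets a high weight), then the
    max-normalized maps are summed with those weights."""
    weights = []
    for sm in score_maps:
        p = sm / (sm.sum() + 1e-12)                 # normalize to a distribution
        entropy = -(p * np.log(p + 1e-12)).sum()
        weights.append(1.0 / (entropy + 1e-12))     # low entropy -> high weight
    weights = np.array(weights) / sum(weights)
    fused = np.zeros_like(score_maps[0])
    for w, sm in zip(weights, score_maps):
        fused += w * sm / (sm.max() + 1e-12)
    return fused, weights

# Hypothetical maps over 4 candidate locations: the color scores are
# ambiguous (nearly flat), the SIFT scores peak clearly at location 2.
color_scores = np.array([0.5, 0.5, 0.6, 0.5])
sift_scores = np.array([0.1, 0.1, 0.9, 0.1])
fused, w = fuse_scores([color_scores, sift_scores])
print(int(np.argmax(fused)), w.round(2))
```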
108 |
An Investigation of Factors Affecting Test Equating in Latent Trait Theory. Suanthong, Surintorn. 08 1900
The study investigated five factors which can affect the equating of scores from two tests onto a common score scale. The five factors studied were: (a) distribution type (i.e., normal versus uniform); (b) standard deviation of item difficulties (i.e., .68, .95, .99); (c) test length, or number of test items (i.e., 50, 100, 200); (d) number of common items (i.e., 10, 20, 30); and (e) sample size (i.e., 100, 300, 500). The significant two-way interaction effects were for common item length and test length, standard deviation of item difficulties and distribution type, and standard deviation of item difficulties and sample size.
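One standard way to place item difficulties from two test forms on a common scale via their common (anchor) items is mean-sigma linking. This sketch illustrates the general technique, not necessarily the study's equating procedure; the anchor-item values are hypothetical:

```python
import numpy as np

def mean_sigma_link(b_common_new, b_common_ref):
    """Mean-sigma linking: find A, B such that A*b_new + B places the
    new form's item difficulties on the reference form's scale, by
    matching the mean and standard deviation of the common items."""
    b_new = np.asarray(b_common_new)
    b_ref = np.asarray(b_common_ref)
    A = b_ref.std(ddof=1) / b_new.std(ddof=1)
    B = b_ref.mean() - A * b_new.mean()
    return A, B

# Hypothetical anchor-item difficulties: here the reference scale is
# exactly a linear transform (x -> 0.5*x + 1.0) of the new form's scale,
# so the linking constants should be recovered exactly.
b_new = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b_ref = 0.5 * b_new + 1.0
A, B = mean_sigma_link(b_new, b_ref)
print(A, B)  # recovers 0.5 and 1.0
```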
109 |
Exploring Weakly Labeled Data Across the Noise-Bias Spectrum. Fisher, Robert W. H. 01 April 2016
As the availability of unstructured data on the web continues to increase, it is becoming increasingly necessary to develop machine learning methods that rely less on human annotated training data. In this thesis, we present methods for learning from weakly labeled data. We present a unifying framework to understand weakly labeled data in terms of bias and noise and identify methods that are well suited to learning from certain types of weak labels. To compensate for the tremendous sizes of weakly labeled datasets, we leverage computationally efficient and statistically consistent spectral methods. Using these methods, we present results from four diverse, real-world applications coupled with a unifying simulation environment. This allows us to make general observations that would not be apparent when examining any one application on its own. These contributions allow us to significantly improve prediction when labeled data is available, and they also make learning tractable when the cost of acquiring annotated data is prohibitively high.
110 |
Deconstructing Anesthesia Handoffs During Simulated Intraoperative Anesthesia Care. Lowe, Jason S. 01 January 2015
Anesthesia patient handoffs are a vulnerable time for patient care, and handoffs occur frequently during anesthesia care. Communication failures contribute to patient harm during anesthesia patient handoffs. The Joint Commission has recognized the potential for communication failure during patient handoffs and has recommended processes to improve handoff safety. Handoffs are made more difficult by latent conditions such as time constraints, pressure, and distractions, which often result in incomplete or inaccurate handoff reports. This nonexperimental, correlational study identified the latent conditions that occur during the handoff process and their relationship to the quality of the handoff. This research shows an inverse relationship between latent conditions and anesthesia patient handoff scores. Both the number and the types of latent conditions affected handoff scores. Handoffs that were not interactive, or that had unsafe timing, predictably resulted in poor handoff communication. Clinicians must acknowledge that handoffs are a high-risk event that can result in patient harm. Clear and effective communication is key to safe, quality care, and this includes being aware of and minimizing the impact of latent conditions during the anesthesia patient handoff.