
Multiset Model Selection and Averaging, and Interactive Storytelling

Maiti, Dipayan 23 August 2012 (has links)
The Multiset Sampler [Leman et al., 2009] has previously been deployed and developed for efficient sampling from complex stochastic processes. We extend the sampler and the surrounding theory to model selection problems. In such problems, efficient exploration of the model space becomes a challenge, since independent and ad hoc proposals might not be able to jointly propose multiple parameter sets that correctly explain a newly proposed model. To overcome this, we propose a multiset on the model space to enable efficient exploration of multiple model modes with almost no tuning. The Multiset Model Selection (MSMS) framework is based on independent priors for the parameters and model indicators on variables. We show that posterior model probabilities can be easily obtained from multiset-averaged posterior model probabilities in MSMS. We also obtain typical Bayesian model-averaged estimates for the parameters from MSMS. We apply our algorithm to linear regression, where it allows easy moves between parameter modes of different models, and to probit regression, where it allows jumps between widely varying model-specific covariance structures in the latent space of a hierarchical model. The Storytelling algorithm [Kumar et al., 2006] constructs stories by discovering and connecting latent connections between documents in a network. Such automated algorithms often do not agree with a user's mental map of the data, so systems that incorporate feedback through visual interaction with the user are of immediate importance. We propose a visual analytic framework in which such interactions are naturally incorporated into the existing Storytelling algorithm through a redefinition of the latent topic space used in the similarity measure of the network. The document network can be explored using the newly learned normalized topic weights for each document.
Hence our algorithm compensates for the limits of human sensemaking in large document networks by providing a collaborative framework between the underlying model and the user. We formulate the problem as a supervised topic modeling problem, where the supervision is based on relationships imposed by the user as a set of inequalities derived from tolerances on edge costs from an inverse shortest path problem. We give a probabilistic model of the relationships based on auxiliary variables and propose a Gibbs sampling strategy. We provide detailed results from a simulated data set and the Atlantic Storm data set. / Ph. D.
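The MSMS framework ultimately yields posterior model probabilities and model-averaged parameter estimates. As a point of reference, the basic Bayesian model averaging computation it builds on can be sketched as follows; all model names, marginal likelihoods, and coefficient values here are hypothetical, not taken from the dissertation:

```python
import math

# Hypothetical log marginal likelihoods for three candidate regression models.
log_ml = {"M1": -10.0, "M2": -8.0, "M3": -12.0}
prior = {m: 1.0 / 3.0 for m in log_ml}  # uniform model prior

# Posterior model probabilities, normalized in log space for stability.
mx = max(log_ml.values())
unnorm = {m: prior[m] * math.exp(v - mx) for m, v in log_ml.items()}
z = sum(unnorm.values())
post = {m: w / z for m, w in unnorm.items()}

# Model-averaged estimate of one coefficient, given per-model estimates
# (a variable absent from a model contributes an estimate of 0).
beta_hat = {"M1": 1.2, "M2": 0.8, "M3": 0.0}
beta_bma = sum(post[m] * beta_hat[m] for m in post)
```

The multiset construction in the dissertation is aimed at making the sampling behind these quantities mix well across model modes; the arithmetic above is only the final averaging step.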

News Analytics for Global Infectious Disease Surveillance

Ghosh, Saurav 29 November 2017 (has links)
Traditional disease surveillance can be augmented with a wide variety of open sources, such as online news media, Twitter, blogs, and web search records. Rapidly increasing volumes of these open sources are proving to be extremely valuable resources for analyzing, detecting, and forecasting outbreaks of infectious diseases, especially new diseases or diseases spreading to new regions. However, these sources are in general unstructured (noisy), and constructing surveillance tools from them, ranging from real-time outbreak monitoring to the creation of epidemiological line lists, involves considerable human supervision. Intelligent modeling of such sources using text mining methods such as topic models, deep learning, and dependency parsing can lead to automated generation of these surveillance tools. Moreover, the real-time global availability of these open sources through web-based bio-surveillance systems, such as HealthMap and WHO Disease Outbreak News (DONs), can aid in the development of generic tools applicable to a wide range of diseases (rare, endemic, and emerging) across different regions of the world. In this dissertation, we explore various methods of using internet news reports to develop generic surveillance tools that can supplement traditional surveillance systems and aid in early detection of outbreaks. We primarily investigate three problems related to infectious disease surveillance. (i) Can trends in online news reporting monitor, and possibly estimate, infectious disease outbreaks? We introduce approaches that use temporal topic models over the HealthMap corpus to detect rare and endemic disease topics and to capture temporal trends (seasonality, abrupt peaks) for each disease topic. The discovery of temporal topic trends is followed by time-series regression techniques to estimate future disease incidence.
(ii) In the second problem, we seek to automate the creation of epidemiological line lists for emerging diseases from WHO DONs in a near real-time setting. For this purpose, we formulate Guided Epidemiological Line List (GELL), an approach that combines neural word embeddings with information extracted from dependency parse trees at the sentence level to extract line list features. (iii) Finally, for the third problem, we aim to characterize diseases automatically from the HealthMap corpus using a disease-specific word embedding model, which was subsequently evaluated against human-curated characterizations for accuracy. / Ph. D.
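The first thread, turning a topic's temporal trend into an incidence estimate, amounts to regressing the topic's intensity series on trend and seasonal features. A minimal sketch on synthetic data (the weekly series, its period, and the feature set are invented for illustration, not drawn from the dissertation):

```python
import numpy as np

# Synthetic weekly intensity of one disease topic: linear trend + seasonality.
t = np.arange(52, dtype=float)
y = 10.0 + 0.2 * t + 5.0 * np.sin(2 * np.pi * t / 26)

# Ordinary least squares on [intercept, trend, seasonal] features.
X = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t / 26), np.cos(2 * np.pi * t / 26)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead forecast for week 52.
t_new = 52.0
x_new = np.array([1.0, t_new,
                  np.sin(2 * np.pi * t_new / 26), np.cos(2 * np.pi * t_new / 26)])
forecast = float(x_new @ beta)
```

In practice the intensity series would come from the fitted temporal topic model rather than a closed-form curve, and the regression would be validated against held-out incidence data.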

Identifying Job Categories and Required Competencies for Instructional Technologist: A Text Mining and Content Analysis

Chen, Le 06 July 2020 (has links)
This study applied both human-based and computer-based techniques to conduct a job analysis in the field of instructional technology. The primary research focus of the job analysis was to examine the efficacy of text mining by comparing text mining results with content analysis results. This agenda was fulfilled by using job announcement data as an example to determine essential job categories and required competencies. In phase one, a job title analysis was conducted: different categorizing strategies were explored, and primary job categories were reported. In phase two, a human-based content analysis was conducted, which identified 20 competencies in the knowledge domain, 22 in the ability domain, 23 in the skill domain, and 13 other competencies. In phase three, text mining (topic modeling) was applied to the entire data set, resulting in 50 themes. From these 50 themes, the researcher selected the 20 themes most relevant to instructional technology competencies. The findings of the two research techniques differ in granularity, comprehensibility, and objectivity. Based on evidence from the current study, the author recommends that future studies explore ways to combine the two techniques to complement one another. / Doctor of Philosophy / According to Kimmons and Veletsianos (2018), text mining has not been widely applied in the field of instructional technology. This study provides an example of using text mining techniques to discover a set of required job competencies. It can be helpful to researchers unfamiliar with text mining methodology, allowing them to better understand its potential and limitations. The primary research focus was to examine the efficacy of text mining by comparing text mining results with content analysis results. Both content analysis and text mining procedures were applied to the same data set to extract job competencies.
Similarities and differences between the results were compared, and the pros and cons of each methodology were discussed.
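The text mining side of such a comparison starts from term statistics over the announcement corpus. A toy sketch of surfacing candidate competency terms by document frequency (the job ads and stop list below are invented, and real pipelines would use a topic model rather than raw counts):

```python
from collections import Counter
import re

# Hypothetical job announcements (stand-ins for a real corpus).
ads = [
    "Design online courses and apply instructional design models",
    "Experience with learning management systems and instructional design",
    "Support faculty with multimedia production and learning management systems",
]
stop = {"and", "with", "the", "apply", "experience", "support"}

def candidate_terms(docs, k=10):
    """Rank unigrams by how many documents they appear in (document frequency)."""
    df = Counter()
    for doc in docs:
        tokens = set(re.findall(r"[a-z]+", doc.lower())) - stop
        df.update(tokens)
    return [term for term, _ in df.most_common(k)]
```

A topic model run over the same counts would group such terms into themes, which is what the study's phase three then compares against the human-coded competency domains.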

Discovering Hidden Networks Using Topic Modeling

Cooper, Wyatt 01 January 2017 (has links)
This paper explores topic modeling via unsupervised non-negative matrix factorization. This technique is used on a variety of sources in order to extract salient topics. From these topics, hidden entity networks are discovered and visualized in a graph representation. In addition, other visualization techniques such as examining the time series of a topic and examining the top words of a topic are used for evaluation and analysis. There is a large software component to this project, and so this paper will also focus on the design decisions that were made in order to make the program developed as versatile and extensible as possible.
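A minimal version of the unsupervised non-negative matrix factorization at the core of this approach, using multiplicative updates on a toy document-term matrix (the matrix, topic count, and iteration budget are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, k, iters=200, eps=1e-9):
    """Factor a nonnegative matrix V (docs x terms) into W (docs x k) and
    H (k x terms) via multiplicative updates minimizing Frobenius error."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy doc-term counts with two obvious "topics" (word blocks).
V = np.array([[3, 2, 0, 0],
              [4, 3, 0, 0],
              [0, 0, 2, 3],
              [0, 0, 3, 4]], dtype=float)
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)
```

The rows of H are the topics (top-weighted words per row), and the columns of W give each document's topic mixture, which is the starting point for the entity-network and time-series visualizations the paper describes.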

Mathematical Modeling of Public Opinion using Traditional and Social Media

Cody, Emily 01 January 2016 (has links)
With the growth of the internet, data from text sources has become increasingly available to researchers in the form of online newspapers, journals, and blogs. This data presents a unique opportunity to analyze human opinions and behaviors without soliciting the public explicitly. In this research, I utilize newspaper articles and the social media service Twitter to infer self-reported public opinions and awareness of climate change. Climate change is one of the most important and heavily debated issues of our time, and analyzing large-scale text surrounding this issue reveals insights into self-reported public opinion. First, I inquire about public discourse on both climate change and energy system vulnerability following two large hurricanes. I apply topic modeling techniques to a corpus of articles about each hurricane in order to determine how these topics were reported on in the post-event news media. Next, I perform sentiment analysis on a large collection of data from Twitter using a previously developed tool called the "hedonometer". I use this sentiment scoring technique to investigate how the Twitter community reports feeling about climate change. Finally, I generalize the sentiment analysis technique to many other topics of global importance, and compare it to more traditional public opinion polling methods. I determine that since traditional public opinion polls have limited reach and high associated costs, text data from Twitter may be the future of public opinion polling.
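The hedonometer's scoring step itself is simple: average per-word happiness scores from a lexicon, discarding neutral words inside a "lens" around the scale's midpoint. A sketch with a tiny hypothetical lexicon (the real labMT lexicon behind the hedonometer contains roughly 10,000 scored words; these six entries and their scores are invented stand-ins):

```python
# Hypothetical mini-lexicon of word happiness scores on a 1-9 scale.
happs = {"love": 8.42, "happy": 8.30, "climate": 5.0,
         "disaster": 1.8, "storm": 2.8, "hope": 7.5}

def hedonometer_score(text, lens_low=4.0, lens_high=6.0):
    """Average per-word happiness, excluding near-neutral words inside the
    'lens' (scores in (4, 6) are discarded to sharpen the signal)."""
    scores = [happs[w] for w in text.lower().split()
              if w in happs and not (lens_low < happs[w] < lens_high)]
    return sum(scores) / len(scores) if scores else None
```

Applied to daily slices of a tweet stream, this yields the sentiment time series that the dissertation analyzes around climate-related events.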

Nonparametric Bayesian Dictionary Learning and Count and Mixture Modeling

Zhou, Mingyuan January 2013 (has links)
Analyzing the ever-increasing data of unprecedented scale, dimensionality, diversity, and complexity poses considerable challenges to conventional approaches of statistical modeling. Bayesian nonparametrics constitute a promising research direction, in that such techniques can fit the data with a model that can grow with complexity to match the data. In this dissertation we consider nonparametric Bayesian modeling with completely random measures, a family of pure-jump stochastic processes with nonnegative increments. In particular, we study dictionary learning for sparse image representation using the beta process and the dependent hierarchical beta process, and we present the negative binomial process, a novel nonparametric Bayesian prior that unites the seemingly disjoint problems of count and mixture modeling. We show a wide variety of successful applications of our nonparametric Bayesian latent variable models to real problems in science and engineering, including count modeling, text analysis, image processing, compressive sensing, and computer vision. / Dissertation
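The negative binomial distribution at the heart of the proposed process admits a gamma-mixed Poisson representation, which is what makes it convenient for hierarchical count modeling. A sketch verifying the mixture's mean on simulated draws (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
r, p = 5.0, 0.3  # NB shape and probability parameters

# NB(r, p) as a gamma-Poisson mixture:
#   lambda ~ Gamma(shape=r, scale=p/(1-p)),  x | lambda ~ Poisson(lambda)
lam = rng.gamma(shape=r, scale=p / (1 - p), size=100_000)
x = rng.poisson(lam)

empirical_mean = x.mean()  # theory: r * p / (1 - p) = 15/7 ~ 2.14
```

In the dissertation's setting the gamma rates themselves are drawn from a random measure, tying the counts to a shared, growing set of latent components; the snippet above shows only the single-distribution building block.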

Novel document representations based on labels and sequential information

Kim, Seungyeon 21 September 2015 (has links)
A wide variety of text analysis applications are based on statistical machine learning techniques. The success of those applications is critically affected by how we represent a document. Learning an efficient document representation faces two major challenges: sparsity and sequentiality. Sparsity often causes high estimation error, and the sequential nature of text, the interdependency between words, complicates matters further. This thesis presents novel document representations to overcome these two challenges. First, I employ label characteristics to estimate a compact document representation. Because label attributes implicitly describe the geometry of a dense subspace that carries most of the signal, the sparsity issue can be resolved effectively by focusing only on that compact subspace. Second, by modeling a document as a joint or conditional distribution between words and their sequential information, I can efficiently reflect the sequential nature of text in my document representations. Lastly, the thesis concludes with a document representation that employs both labels and sequential information in a unified formulation. Four criteria are used to evaluate the goodness of a representation: how close it is to the original data, how strongly representations can be distinguished from one another, how easily a human can interpret them, and how much computational effort they require. While pursuing these criteria, I was able to obtain document representations that are closer to the original data, stronger in discrimination, and easier to understand than traditional document representations. Efficient computation algorithms make the proposed approaches highly scalable. This thesis examines emotion prediction, temporal emotion analysis, modeling documents with edit histories, locally coherent topic modeling, and text categorization tasks as possible applications.

Finding early signals of emerging trends in text through topic modeling and anomaly detection

Redyuk, Sergey January 2018 (has links)
Trend prediction has become an extremely popular practice in many industrial sectors and in academia. It is beneficial for strategic planning and decision making, and it facilitates exploring new research directions that have not yet matured. To anticipate future trends in an academic environment, a researcher needs to analyze an extensive amount of literature and scientific publications, and to gain expertise in the particular research domain. This approach is time-consuming and extremely complicated due to the abundance and diversity of the data. Modern machine learning tools, on the other hand, are capable of processing tremendous volumes of data, reaching real-time, human-level performance in various applications. Achieving high performance in unsupervised prediction of emerging trends in text can indicate promising directions for future research and potentially lead to breakthrough discoveries in any field of science. This thesis addresses the problem of emerging trend prediction in text with three main components: it utilizes an HDP topic model to represent the latent topic space of a given temporal collection of documents, uses the DBSCAN clustering algorithm to detect high-density groups in the document space that potentially lead to emerging trends, and applies KL divergence to capture deviating text that might indicate the birth of a new, not-yet-seen phenomenon. To empirically evaluate the effectiveness of the proposed framework and estimate its predictive capability, both synthetically generated corpora and real-world text collections are used: publications from arXiv.org, an open-access electronic archive of scientific publications (category: Computer Science), and NIPS publications. For the synthetic data, a text generator is designed that provides ground truth for evaluating the anomaly detection algorithms. This work contributes to the body of knowledge in the area of emerging trend prediction in several ways.
First, incorporating topic modeling and anomaly detection algorithms for emerging trend prediction is a novel approach and highlights new perspectives in the subject area. Second, a three-level word-document-topic topology of anomalies is formalized in order to detect anomalies in temporal text collections that might lead to emerging trends. Finally, a framework for unsupervised detection of early signals of emerging trends in text is designed. The framework captures new vocabulary, documents with deviating word/topic distributions, and drifts in the latent topic space as the three main indicators of a novel phenomenon, in accordance with the three-level topology of anomalies. The framework is not limited to particular data sources and can be applied to any temporal text collection in combination with any online method for soft clustering.
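The deviation-capturing step can be sketched as a KL divergence between a document's word distribution and the corpus-level distribution, flagging documents that score far above typical ones (the vocabulary, documents, and smoothing constant below are invented for illustration):

```python
import math
from collections import Counter

def distribution(tokens, vocab, eps=1e-6):
    """Smoothed word distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + eps * len(vocab)
    return [(counts[w] + eps) / total for w in vocab]

def kl(p, q):
    """KL(p || q): how surprising p is under the reference distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

vocab = ["neural", "network", "training", "quantum", "annealing"]
corpus = "neural network training neural network training network".split()
typical = "neural network training".split()
deviating = "quantum annealing quantum annealing quantum".split()

q = distribution(corpus, vocab)
score_typical = kl(distribution(typical, vocab), q)
score_deviating = kl(distribution(deviating, vocab), q)
```

In the thesis the reference distributions come from the HDP topic space rather than raw word counts, and the same divergence idea is applied at the word, document, and topic levels of the anomaly topology.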

Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization

Shokat, Imran January 2018 (has links)
Text visualization is a field dedicated to the visual representation of textual data by using computer technology. A large number of visualization techniques are available, and it is becoming harder for researchers and practitioners to choose an optimal technique for a particular task among the existing ones. To address this problem, the ISOVIS Group developed an interactive survey browser for text visualization techniques. ISOVIS researchers gathered papers that describe text visualization techniques or tools and categorized them according to a taxonomy, manually assigning several categories to each visualization technique. In this thesis, we aim to analyze the dataset of this browser. We carried out several analyses to find temporal trends and correlations among the categories present in the browser dataset. In addition, we compared these categories with the results of a computational approach. Our results show that some categories have become more popular than before, whereas others have declined in popularity. Cases of positive and negative correlation between various categories were found and analyzed. Comparisons between the manually labeled dataset and the results of computational text analyses were presented to the experts, with an opportunity to refine the dataset. The data analyzed in this thesis project are specific to the text visualization field; however, the methods used in the analyses can be generalized to other datasets of scientific literature surveys or, more generally, to other manually curated collections of textual documents.
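The trend and correlation analyses described here reduce to computing correlations between per-year category counts. A sketch with invented counts for two hypothetical taxonomy categories (the category names and numbers are not from the survey browser):

```python
import numpy as np

# Hypothetical per-year paper counts for two taxonomy categories.
years = np.arange(2008, 2018)
word_clouds = np.array([2, 3, 5, 8, 9, 11, 12, 10, 9, 8])
pixel_maps = np.array([6, 6, 5, 4, 4, 3, 3, 2, 2, 1])

# Pearson correlation between the two category time series.
r = np.corrcoef(word_clouds, pixel_maps)[0, 1]
```

A strongly negative r, as in this toy example, would correspond to one category rising in popularity while the other declines, which is the kind of pattern the thesis reports.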

Visualização em multirresolução do fluxo de tópicos em coleções de texto [Multiresolution visualization of topic flow in text collections]

Schneider, Bruno 21 March 2014 (has links)
The combined use of algorithms for topic discovery in document collections with topic flow visualization techniques allows the exploration of thematic patterns in large corpora, where those patterns can be revealed through compact visual representations. This research investigated the requirements for visualizing data on the thematic composition of documents obtained through topic modeling, which is sparse and multi-attribute, at different levels of detail, comparing a purpose-built visualization technique with an open-source data visualization library. For the topic flow visualization problem studied, we observed conflicting display requirements at different data resolutions, which led to a detailed investigation of ways of manipulating and displaying these data. The hypothesis put forward was that the integrated use of more than one visualization technique, chosen according to the resolution of the data, expands the possibilities for exploring the object under study beyond what could be obtained with a single method.
Delineating the limits of these techniques according to the resolution at which the data are explored is the main contribution of this work, intended to inform the development of new applications.
