About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Fifty Years of Information Management Research: A Conceptual Structure Analysis using Structural Topic Modeling

Sharma, A., Rana, Nripendra P., Nunkoo, R. 10 January 2021
Information management is the management of the organizational processes, technologies, and people that collectively create, acquire, integrate, organize, process, store, disseminate, access, and dispose of information. It is a vast, multidisciplinary domain that unites several subdomains and overlaps substantially with neighbouring fields. This study provides a comprehensive overview of the information management domain from 1970 to 2019. Drawing on methodology from statistical text analysis research, it summarizes the evolution of knowledge in the domain by examining publication trends by author, institution, and country. The study then proposes a probabilistic generative model based on structural topic modeling to extract the latent themes in research articles on information management, and graphically visualizes variation in topic prevalence over the period 1970 to 2019. The results highlight data management, knowledge management, environmental management, project management, service management, and mobile and web management as the most common themes, and identify knowledge management, environmental management, project management, and social communication as academic hotspots for future research.
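The prevalence analysis described above can be sketched in miniature: once a topic model has estimated per-document topic proportions, prevalence over time is just the average proportion within each time slice. The `theta` and `years` arrays below are invented toy data, not the study's corpus.

```python
import numpy as np

def topic_prevalence_by_year(theta, years):
    """Average per-document topic proportions within each year.

    theta : (n_docs, n_topics) array of topic proportions (rows sum to 1).
    years : length-n_docs array of publication years.
    Returns (sorted unique years, (n_years, n_topics) prevalence matrix).
    """
    theta = np.asarray(theta, dtype=float)
    years = np.asarray(years)
    uniq = np.unique(years)
    prevalence = np.vstack([theta[years == y].mean(axis=0) for y in uniq])
    return uniq, prevalence

# Toy corpus: 4 documents, 2 topics, spanning two years.
theta = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.1, 0.9]])
years = np.array([1970, 1970, 2019, 2019])
uniq, prev = topic_prevalence_by_year(theta, years)
print(uniq)   # [1970 2019]
print(prev)   # ≈ [[0.85 0.15], [0.2 0.8]] -- topic 1 gains prevalence over time
```

Plotting each column of `prev` against `uniq` gives the kind of topic-prevalence trend lines the study visualizes.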
22

Modeling Mortality Rates In The WikiLeaks Afghanistan War Logs

Rusch, Thomas, Hofmarcher, Paul, Hatzinger, Reinhold, Hornik, Kurt 09 1900
The WikiLeaks Afghanistan war logs contain more than 76,000 reports about fatalities and their circumstances in the US-led Afghanistan war, covering the period from January 2004 to December 2009. In this paper we use those reports to build statistical models that help us understand the mortality rates associated with specific circumstances. Our approach combines latent Dirichlet allocation (LDA) with negative-binomial-based recursive partitioning. LDA is used to process the natural-language information contained in each report summary: we estimate latent topics and assign each report to one of them. These topics, together with other variables in the data set, subsequently serve as explanatory variables for modeling the number of fatalities among the civilian population, ISAF forces, anti-coalition forces, and the Afghan national police or military, as well as the combined number of fatalities. Modeling is carried out with manifest mixtures of negative binomial distributions estimated by model-based recursive partitioning. For each group of fatalities, we identify segments with different mortality rates that correspond to a small number of topics and other explanatory variables, as well as their interactions. Furthermore, we carve out the similarities between segments and connect them to stories that have been covered in the media. This provides an unprecedented description of the war in Afghanistan as covered by the war logs. Our approach also serves as an example of how modern statistical methods can yield additional insight when applied to problems of data journalism. / Series: Research Report Series / Department of Statistics and Mathematics
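The two-stage idea above - assign each report to an LDA topic, then model fatality counts within topic-defined segments - can be caricatured as follows. This sketch collapses the paper's negative binomial recursive partitioning to simple per-segment mean rates, and all data are invented.

```python
import numpy as np

def mortality_rate_by_topic(topic_of_report, fatalities):
    """Mean fatalities per report within each latent-topic segment.

    topic_of_report : length-n array of topic indices, one per report,
                      e.g. each report's most probable LDA topic.
    fatalities      : length-n array of fatality counts per report.
    Returns a dict mapping topic index to its mean fatality rate.
    """
    topic_of_report = np.asarray(topic_of_report)
    fatalities = np.asarray(fatalities, dtype=float)
    rates = {}
    for t in np.unique(topic_of_report):
        rates[int(t)] = float(fatalities[topic_of_report == t].mean())
    return rates

# Toy data: five reports assigned to two topics with different mortality rates.
topics = np.array([0, 0, 0, 1, 1])
deaths = np.array([0, 1, 2, 4, 6])
print(mortality_rate_by_topic(topics, deaths))  # {0: 1.0, 1: 5.0}
```

In the paper, these segments are instead found by recursively splitting on topics and covariates, with fatality counts modeled as negative binomial rather than by raw means.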
23

Probabilistic Models for the Analysis of Gene Expression Profiles

Quon, Gerald 16 August 2013
Gene expression profiles are some of the most abundant sources of data about the cellular state of a collection of cells in an organism. Comparison of the expression profiles of multiple samples allows biologists to find associations between observations at the molecular level and the phenotype of the samples. A key challenge is to distinguish variation in expression due to biological factors of interest from variation due to confounding factors that can arise for unrelated technical or biological reasons. This thesis presents models that can explicitly adjust the comparison of expression profiles to account for specific types of confounding factors. One such confounding factor arises when comparing tissue-specific expression profiles across multiple organisms to identify differences in expression that are indicative of changes in gene function. When the organisms are separated by long evolutionary distances, tissue functions may be re-distributed and introduce expression changes unrelated to changes in gene function. We developed Brownian Factor Phylogenetic Analysis, a model that can account for such re-distribution of function, and demonstrate that removing this confounding factor improves tasks such as predicting gene function. Another confounding factor arises because current protocols for expression profiling require RNA extracts from multiple cells. Often biological samples are heterogeneous mixtures of multiple cell types, so the measured expression profile is an average of the RNA levels of the constituent cells. When the biological sample contains both cells of interest and nuisance cells, the confounding expression from the nuisance cells can mask the expression of the cells of interest. We developed ISOLATE and ISOpure, two models for addressing the heterogeneity of tumor samples. 
We demonstrated that modeling tumor heterogeneity leads to an improvement in two tasks: identifying the site of origin of metastatic tumors, and predicting the risk of death of lung cancer patients.
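The deconvolution problem described above can be illustrated with a deliberately simplified one-parameter version (not the ISOLATE or ISOpure models themselves): if a measured profile is a convex combination of a tumor reference and a normal reference, the tumor fraction can be recovered by least squares. All profiles below are invented.

```python
import numpy as np

def estimate_tumor_fraction(mixed, tumor_ref, normal_ref):
    """Least-squares estimate of the tumor fraction a in
    mixed ≈ a * tumor_ref + (1 - a) * normal_ref.

    All arguments are expression vectors over the same genes.
    """
    mixed = np.asarray(mixed, dtype=float)
    t = np.asarray(tumor_ref, dtype=float)
    n = np.asarray(normal_ref, dtype=float)
    d = t - n                                  # direction from normal toward tumor
    a = np.dot(mixed - n, d) / np.dot(d, d)    # project the sample onto that line
    return float(np.clip(a, 0.0, 1.0))         # fractions must lie in [0, 1]

# Toy profiles over 3 genes: the sample is 70% tumor, 30% normal.
tumor = np.array([10.0, 0.0, 5.0])
normal = np.array([2.0, 8.0, 5.0])
mixed = 0.7 * tumor + 0.3 * normal
print(estimate_tumor_fraction(mixed, tumor, normal))  # ≈ 0.7
```

The actual models handle many cell types, per-patient profile variation, and noise; this sketch only shows why separating nuisance-cell signal from the cells of interest is a well-posed estimation problem.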
25

Using Topic Models to Study Journalist-Audience Convergence and Divergence: The Case of Human Trafficking Coverage on British Online Newspapers

Papadouka, Maria Eirini 08 1900
Despite the accessibility of online news and the availability of sophisticated methods for analyzing news content, no previous study has simultaneously examined news coverage of human trafficking and audiences' interpretations of that coverage. In my research, I examined both journalists' and commenters' topic choices in coverage and discussion of human trafficking on the online platforms of three British newspapers, covering the period 2009–2015. I used latent semantic analysis (LSA) to identify emergent topics in my corpus of newspaper articles and readers' comments, and I then quantitatively investigated topic preferences to identify convergence and divergence between the topics discussed by journalists and those discussed by their readers. I addressed my research questions in two distinct studies. The first, a case study, applied topic modelling techniques and further quantitative analyses to article and comment paragraphs from The Guardian. The second, more extensive study included article and comment paragraphs from the online platforms of three British newspapers: The Guardian, The Times and the Daily Mail. The findings indicate that the theories of "agenda setting" and of the "active audience" are not mutually exclusive, and that the explanatory scope of each depends partly on the specific topic or subtopic analyzed. Taking into account further theoretical concepts related to agenda setting, four additional research questions were addressed. Topic convergence and divergence were further identified when taking into account the newspapers' political orientation and the articles' and comments' year of publication.
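The core computation behind LSA is a truncated singular value decomposition of the term-document matrix; the leading singular directions act as latent topics. A minimal sketch, with an invented four-term, four-document toy corpus rather than the study's articles and comments:

```python
import numpy as np

def lsa_topics(term_doc, k):
    """Truncated SVD of a term-document count matrix.

    Returns (term_vectors, singular_values, doc_vectors) for the top-k
    latent dimensions, the core computation behind latent semantic
    analysis (LSA).
    """
    U, s, Vt = np.linalg.svd(np.asarray(term_doc, dtype=float),
                             full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Toy corpus: docs 0-1 share one vocabulary block, docs 2-3 another.
X = np.array([[3, 1, 0, 0],   # "trafficking"
              [1, 3, 0, 0],   # "victims"
              [0, 0, 2, 1],   # "police"
              [0, 0, 1, 2]])  # "border"
terms, sing, docs = lsa_topics(X, 2)
# Each document loads almost entirely on one of the two latent dimensions:
print(np.round(np.abs(docs), 2))  # ≈ rows [0.71 0.71 0 0] and [0 0 0.71 0.71]
```

Comparing how strongly article paragraphs versus comment paragraphs load on each latent dimension is one way to quantify the journalist-audience convergence and divergence the study investigates.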
26

Time Dynamic Topic Models

Jähnichen, Patrick 30 March 2016
Information extraction from large corpora is a useful tool for many applications in industry and academia. Political communication science, for instance, has only recently begun to exploit the massive amounts of information available through the Internet and the computational tools that natural language processing provides. We give a linguistically motivated interpretation of topic modeling, a state-of-the-art approach for extracting latent semantic sets of words from large text corpora, and extend this interpretation to cover issues and issue-cycles as theoretical constructs from political communication science. We build on the dynamic topic model, in which the semantic sets of words evolve over time according to a Brownian-motion stochastic process, and apply a new form of analysis to its results. This analysis is based on the notion of volatility, known from econometrics as the rate of change of stocks or derivatives. We claim that the rate of change of sets of semantically related words can be interpreted as an issue-cycle, with the word sets describing the underlying issue. Generalizing over existing work, we introduce dynamic topic models driven by general Gaussian processes (Brownian motion is a special case of our model), a family of stochastic processes defined by the function that determines their covariance structure. We apply a class of covariance functions that allows an appropriate rate of change in word sets while preserving the semantic relatedness among words. Applying our findings to a large newspaper data set, the New York Times Annotated Corpus (all articles between 1987 and 2007), we are able to identify sub-topics in time, time-localized topics, and find patterns in their behavior over time. However, we have to drop the assumption of semantic relatedness over all available time for any one topic. Time-localized topics are internally consistent but do not necessarily share semantic meaning with one another. They can, however, be interpreted as capturing the notion of issues, and their behavior that of issue-cycles.
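The role of the covariance function can be illustrated with a small sketch that draws latent word-weight trajectories from zero-mean Gaussian processes under two kernels, Brownian motion and the squared-exponential. The timestamps, variances, and lengthscale here are illustrative only, not the model's fitted values.

```python
import numpy as np

def brownian_cov(times, var=1.0):
    """Brownian-motion covariance: k(s, t) = var * min(s, t)."""
    t = np.asarray(times, dtype=float)
    return var * np.minimum.outer(t, t)

def squared_exp_cov(times, var=1.0, length=1.0):
    """Squared-exponential covariance: k(s, t) = var * exp(-(s - t)^2 / (2 l^2))."""
    t = np.asarray(times, dtype=float)
    d = np.subtract.outer(t, t)
    return var * np.exp(-(d ** 2) / (2.0 * length ** 2))

def sample_trajectory(cov, rng):
    """Draw one latent word-weight trajectory from a zero-mean GP."""
    jitter = 1e-8 * np.eye(cov.shape[0])   # numerical safeguard for Cholesky
    L = np.linalg.cholesky(cov + jitter)
    return L @ rng.standard_normal(cov.shape[0])

times = np.linspace(0.1, 5.0, 50)          # observation timestamps
rng = np.random.default_rng(0)
bm_path = sample_trajectory(brownian_cov(times), rng)                 # rough, drifting
se_path = sample_trajectory(squared_exp_cov(times, length=2.0), rng)  # smooth
```

The Brownian kernel yields rough, non-stationary paths (the classic dynamic topic model), while the squared-exponential kernel yields smooth paths whose lengthscale tunes how fast a word set may change, which is the lever the dissertation uses to shape issue-cycle behavior.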
27

A framework for exploiting electronic documentation in support of innovation processes

Uys, J. W. 03 1900
Thesis (PhD (Industrial Engineering))--University of Stellenbosch, 2010. / The crucial role of innovation in creating sustainable competitive advantage is widely recognised in industry today. Likewise, the importance of having the required information accessible to the right employees at the right time is well appreciated. More specifically, the literature has pointed out that effective, efficient innovation processes depend on the availability of information. A great challenge is countering the effects of the information-overload phenomenon in organisations, so that employees can find the information appropriate to their needs without having to wade through excessively large quantities of information. The initial stages of the innovation process, characterised by free association, semi-formal activities, conceptualisation, and experimentation, have been identified as a key focus area for improving the effectiveness of the entire innovation process, and the dependency on information during these early stages is especially high. Any organisation requires a strategy for innovation, together with a number of well-defined, implemented processes and measures, to innovate effectively and efficiently and to drive its innovation endeavours. In addition, the organisation requires certain enablers to support its innovation efforts, including core competencies, technologies, and knowledge; most importantly for this research, enablers are required to manage and utilise innovation-related information more effectively. Information residing both inside and outside the boundaries of the organisation is required to feed the innovation process. The specific sources of such information are numerous, and the information may be structured or unstructured in nature; an ever-increasing share of available innovation-related information is unstructured, for example the textual content of reports, books, e-mail messages, and web pages. This research explores the innovation landscape and typical sources of innovation-related information. In addition, it explores the landscape of text-analytical approaches and techniques in search of ways to deal more effectively and efficiently with unstructured, textual information. A framework that provides a unified, dynamic view of an organisation's innovation-related information, both structured and unstructured, is presented. Once implemented, this framework constitutes an innovation-focused knowledge base that organises such innovation-related information and makes it accessible to the stakeholders of the innovation process. Two complementary text-analytical techniques, latent Dirichlet allocation and the concept-topic model, were identified for application with the framework, and their potential value as part of the information systems that would embody the framework is illustrated. The resulting knowledge base would dramatically improve the accessibility of information and may significantly improve the way innovation is done and managed in the target organisation.
28

Continuous-time infinite dynamic topic models

Elshamy, Wesam Samy January 1900
Doctor of Philosophy / Department of Computing and Information Sciences / William Henry Hsu / Topic models are probabilistic models for discovering topical themes in collections of documents. In real-world applications, these models provide a means of organizing what would otherwise be unstructured collections: they can cluster a huge collection into different topics, or find the subset of the collection that resembles the topical theme of an article at hand. The first wave of topic models could discover the prevailing topics in a large collection of documents spanning a period of time. It was later realized that these time-invariant models could model neither 1) the time-varying number of topics they discover nor 2) the time-changing structure of those topics. A few models have been developed to address these two deficiencies. The online hierarchical Dirichlet process models documents with a time-varying number of topics, and varies the structure of the topics over time as well; however, it relies on document order, not timestamps, to evolve the model over time. The continuous-time dynamic topic model evolves topic structure in continuous time, but uses a fixed number of topics. In this dissertation, I present the continuous-time infinite dynamic topic model, which combines the advantages of these two models: it is a probabilistic topic model that 1) changes the number of topics over continuous time, and 2) changes the topic structure over continuous time. I compared my model with the two other models under different settings. The results were favorable to my model and showed the need for a model with a continuous-time-varying number of topics and topic structure.
29

Labeling Clinical Reports with Active Learning and Topic Modeling / Uppmärkning av kliniska rapporter med active learning och topic modeller

Lindblad, Simon January 2018
Supervised machine learning models require a labeled data set of high quality in order to perform well. Text data often exists in abundance, but it is usually not labeled. Labeling text data is a time-consuming process, especially when multiple labels can be assigned to a single document. The purpose of this thesis was to make the labeling of clinical reports as effective and effortless as possible by evaluating different multi-label active learning strategies. The goal of the strategies was to reduce the number of labeled documents a model needs and to increase the quality of those documents. With the strategies, an accuracy of 89% was achieved with 2,500 reports, compared to 85% with random sampling. In addition, 85% accuracy could be reached after labeling 975 reports, compared to 1,700 reports with random sampling.
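A common selection step in active learning strategies of this kind is uncertainty sampling: ask the annotator to label the documents the current model is least sure about. This is a generic sketch, not necessarily one of the exact strategies evaluated in the thesis, and the predicted probabilities are invented.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of each row of a probability matrix."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_most_uncertain(probs, n):
    """Indices of the n unlabeled documents the model is least sure about.

    probs : (n_docs, n_labels) predicted label probabilities.
    """
    return np.argsort(-entropy(probs))[:n]

# Toy predictions for 4 unlabeled reports over 2 labels.
probs = np.array([[0.99, 0.01],   # confident
                  [0.55, 0.45],   # uncertain
                  [0.50, 0.50],   # most uncertain
                  [0.90, 0.10]])
print(select_most_uncertain(probs, 2))  # [2 1]
```

Each round, the selected documents are labeled, the model is retrained, and the probabilities are recomputed, which is how such strategies reach a target accuracy with fewer labeled reports than random sampling.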
30

The institutional pluralism of the state

Holperin, Michelle Moretzsohn 05 June 2017
What are the logics that public organizations enact in their daily activities? This doctoral dissertation investigated the institutional logics of the State. The concept of institutional logic adopted is that of Friedland and colleagues: institutional logics are 'stable constellations of practice', the necessary coupling of substances and material practices that constitutes the organizing principles of institutions (Friedland et al., 2014). The State is understood as one of the central institutions of society, composed of two dimensions. One is the bureaucratic dimension, permeated by different ideas about how things should be done in the State. The other is the capitalist dimension, permeated by different ideas about what should be done, i.e., what the role of the State should be. I chose a specific type of public organization to explore the logic of the State: the Brazilian independent regulatory agencies (IRAs). IRAs have diffused widely in recent years, and the literature suggests that they represent the 'appropriate model of governance' of the capitalist economy (Levi-Faur, 2005). They changed both how things were done - emphasizing the state's rule-making instruments - and what should be done - focusing on promoting competition and correcting market failures (Majone, 1994).
In Brazil, IRAs were part of a broader process of State reform, and represented an important innovation both in organizational design, based on autonomy, and in the role to be performed, based on promoting competition. However, the diffusion of IRAs was strongly shaped by the local context, and despite being idealized as purely regulatory, their policies and activities indicate that they do much more than promote competition. In fact, state policies in general, and regulatory policies in particular, 'are rooted in changing conceptions of what the state is, what it can and should do' (Friedland & Alford, 1991). To assess the institutional logics of the State, this research investigated over 9,000 press releases published by three formal independent regulatory agencies in Brazil between 2002 and 2016, covering all the news they released since their creation. Press releases are used frequently by Brazilian IRAs and serve as a good proxy for the policies and activities these agencies conduct. I applied a correlated topic model (CTM) to extract the main themes discussed by the agencies over this period. Originating in natural language processing and machine learning, topic models are probabilistic models that uncover the semantic structure of a collection of documents, or corpus (Blei, 2012; Blei, Ng & Jordan, 2003). Unlike other content-analysis techniques, topic models are purely inductive and conform to the 'relationality of meaning' assumption of the institutional logics literature (DiMaggio, Nag & Blei, 2013). The results indicate that the logics enacted by independent agencies do not refer only to procedural correctness (Meyer & Hammerschmid, 2006) or democracy (Ocasio, Mauskapf & Steele, 2015). In fact, much of what they do is grounded in broader substantive values, reflecting developmental, pro-competition, and social-oriented interpretations of the role of the State. Yet the bureaucratic logic is pervasive within IRAs: it permeates the substantive logics, but also stands as a logic of its own. Regulatory agencies enact it most often when they are unable to perform their substantive mission. During periods of crisis, IRAs re-frame at their discretion the practices of administrative policing (standard setting and inspections) and public participation (procedural fairness) in order to justify their actions. By doing so, they were able to legitimate their existence, gain a new sense of mission, and avoid blame for their actions.
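Topic-model output of this kind is typically interpreted by inspecting each topic's most probable words and attaching a substantive label. A minimal sketch, with an invented vocabulary and topic-word matrix rather than the dissertation's actual CTM output:

```python
import numpy as np

def top_words(topic_word, vocab, n=3):
    """Top-n most probable words for each topic of a fitted topic model.

    topic_word : (n_topics, n_vocab) matrix of word probabilities per topic.
    vocab      : list of vocabulary terms, aligned with the columns.
    """
    topic_word = np.asarray(topic_word, dtype=float)
    return [[vocab[i] for i in np.argsort(-row)[:n]] for row in topic_word]

# Toy topic-word matrix over a 5-term vocabulary.
vocab = ["tariff", "inspection", "competition", "consumer", "audit"]
beta = np.array([[0.40, 0.05, 0.35, 0.15, 0.05],   # a pro-competition theme
                 [0.05, 0.45, 0.05, 0.10, 0.35]])  # a procedural theme
print(top_words(beta, vocab))
# [['tariff', 'competition', 'consumer'], ['inspection', 'audit', 'consumer']]
```

Labeling such word lists as substantive (developmental, pro-competition, social) versus bureaucratic logics, and tracking their weight in press releases over time, is the interpretive step on which the dissertation's findings rest.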