91 |
Semantic Overflow of Powerful Feelings: Digital Humanities Approaches and the 1805 and 1850 Versions of Wordsworth's Prelude
Hansen, Dylan 25 April 2023
Scholars have repeatedly contrasted the 1805 and 1850 versions of William Wordsworth’s The Prelude since the discovery and publication of the former by Ernest De Sélincourt in 1926. Points of contention have included the 1850 poem’s grammatical revisions and shifts toward greater political and religious orthodoxy. While these discussions have waned in recent decades, digital humanities tools allow us to revisit oft-debated texts through new lenses. Wanting to examine scholarly claims about The Prelude from a digital humanities perspective, I collaborated with Dr. Billy Hall to enter both versions of the poem into a data analysis and visualization tool, which displayed the results in topic-modeling outputs and most-frequent-words lists. The 1805 and 1850 topic modeling outputs were essentially identical to one another, suggesting either that scholars have overstated differences between the versions or that the themes of the poem may have evolved in ways not easily captured by my digital humanities methods. On the other hand, the most-frequent-words lists revealed some notable discrepancies between the two Preludes. One set of lists included articles, conjunctions, pronouns, and linking verbs (otherwise known as “stop words”), demonstrating, for instance, that the word “was” appeared with significantly less frequency in the 1850 Prelude. I found that other linking verbs also decreased in the 1850 Prelude, and this discovery prompted me to conduct a stylistic analysis of said verbs. Knowing that a raw statistical count of linking verbs in both texts would reveal only an incomplete portrait of Wordsworth’s shifting verb usage, I divided the verb revisions into two primary categories: replacements of linking verbs with dynamic verbs and descriptors, and removals of lines containing linking verbs. 
While scholars have previously highlighted the replacement of linking verbs with dynamic verbs and descriptors in the 1850 Prelude, these revisions only account for 30% of the 1850 linking verb revisions. In fact, the majority of linking verb revisions consist of removed 1805 lines. Many of these lines are declarative statements—the removal of which suggests that Wordsworth preferred, in some cases, a less prescriptive approach in the 1850 Prelude.
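The frequency comparison described above can be illustrated with a minimal sketch. It assumes a simple regex tokenizer and a hypothetical list of linking verbs; the abstract does not say which tool, tokenizer, or verb list was actually used, and the two sample strings are placeholders rather than lines from either Prelude.

```python
import re
from collections import Counter

# Assumed set of linking verbs for illustration only; the thesis may use another list.
LINKING_VERBS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def linking_verb_rates(text, per=1000):
    """Return occurrences of each linking verb per `per` tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter(t for t in tokens if t in LINKING_VERBS)
    return {verb: per * n / len(tokens) for verb, n in counts.items()}

# Placeholder strings, not quotations from the poem.
sample_a = "It was a day that was bright and the hills were still"
sample_b = "The bright day rose and the hills stood still"

rates_a = linking_verb_rates(sample_a)
rates_b = linking_verb_rates(sample_b)
print(rates_a, rates_b)
```

Normalizing by text length matters here because the two versions differ in length. As the abstract notes, such counts alone cannot separate replaced verbs from removed lines; that still requires a line-by-line comparison.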
|
92 |
Bayesian Text Analytics for Document Collections
Walker, Daniel David 15 November 2012
Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved modeling of document collections that may contain textual noise or that may include real-valued metadata associated with the documents. This class of documents includes many historical document collections. Indeed, our specific motivation for this work is to help improve the modeling of historical documents, which are often noisy and/or have historical context represented by metadata. Many historical documents are digitized by means of Optical Character Recognition (OCR) from document images of old and degraded original documents. Historical documents also often include associated metadata, such as timestamps, which can be incorporated in an analysis of their topical content. Many techniques, such as topic models, have been developed to automatically discover patterns of meaning in large collections of text. While these methods are useful, they can break down in the presence of OCR errors. We show the extent to which this performance breakdown occurs. The specific types of analyses covered in this dissertation are document clustering, feature selection, unsupervised and supervised topic modeling for documents with and without OCR errors, and a new supervised topic model that uses Bayesian nonparametrics to improve the modeling of document metadata.
We present results in each of these areas, with an emphasis on studying the effects of noise on the performance of the algorithms and on modeling the metadata associated with the documents. In this research we effectively: improve the state of the art in both document clustering and topic modeling; introduce a useful synthetic dataset for historical document researchers; and present analyses that empirically show how existing algorithms break down in the presence of OCR errors.
|
93 |
Topic classification of Monetary Policy Minutes from the Swedish Central Bank / Ämnesklassificering av Riksbankens penningpolitiska mötesprotokoll
Cedervall, Andreas, Jansson, Daniel January 2018
Over the last couple of years, the use of Machine Learning has increased sharply. Many previously manual tasks are becoming automated, and it stands to reason that this development will continue at a rapid pace. This paper builds on prior work in Topic Classification and attempts to provide a baseline for analysing the Swedish Central Bank Minutes and gathering information using both Latent Dirichlet Allocation and a simple Neural Network. Topic Classification is performed on Monetary Policy Minutes from 2004 to 2018 to find how the distributions of topics change over time. The results are compared to empirical evidence that would confirm trends. Finally, a business perspective on the work is analysed to reveal what the benefits of implementing this type of technique could be. The results of the two methods differ. Specifically, the Neural Network shows larger changes in topic distributions than the Latent Dirichlet Allocation. The Neural Network also yielded more trends that correlated with other observations, such as the start of bond purchasing by the Swedish Central Bank. Thus, our results indicate that a Neural Network would perform better than the Latent Dirichlet Allocation when analysing Swedish Monetary Policy Minutes. / In recent years, artificial intelligence and machine learning have received a great deal of attention and grown enormously. Previously manual work is now being automated, and much suggests that this development will continue at a high pace. This work builds on earlier work in topic modeling and applies it to a previously unexplored area, the minutes of the Riksbank. Latent Dirichlet Allocation and a Neural Network are used to examine whether the distribution of discussion points (topics) changes over time. Finally, a theoretical discussion of the potential business value of implementing a similar method is presented.
The results of the different models show large differences over time. While Latent Dirichlet Allocation finds no major trends in the discussion points, the Neural Network shows larger changes over time. The latter also agree well with other observations, such as the start of bond purchases. The results therefore indicate that a Neural Network is a more suitable method for analysing the Riksbank's meeting minutes.
|
94 |
COP TOPICS: TOPIC MODELING-ASSISTED DISCOVERIES OF POLICE-RELATED THEMES IN AFRICAN-AMERICAN JOURNALISTIC TEXTS
Lemire Garlic, Nicole January 2017
Mainstream newspaper content has long been mined by communication scholars and researchers for insights into public opinion and perceptions. In recent years, scholars have been examining African-American authored periodicals to obtain similar insights. Hearkening back to the 1950s and 1960s civil rights movement in the United States, the highly publicized killings of African-American men by police officers during the past several years have highlighted longstanding strained police-community relations. As part of their role as both a reflection of, and an advocate for, the African-American community, African-American journalistic texts contain a wealth of data about African-American public opinion about, and perceptions of, police. In years past, media content analysts would manually sift through newspapers to divine interesting police-related themes and variables worthy of study. But, with the exponential growth of digitized texts, communication scholars are experimenting with computerized text analysis tools like topic modeling software to aid them in their content analyses. This thesis considers to what degree topic modeling software can be used at the exploratory stage of designing a content analysis study to aid in uncovering themes and variables worthy of further investigation. Appendix A contains results of the manual exploratory content analysis. The list of topics generated by the topic modeling software may be found in Appendix B. / Media Studies & Production / Accompanied by one .pdf file: NLG Thesis Appendices Final.pdf
|
95 |
Essays on Utilizing Data Analytics and Dynamic Modeling to Inform Complex Science and Innovation Policies
Baghaei Lakeh, Arash 27 April 2018
In many ways, science represents a complex system which involves technical, social, and economic aspects. An analysis of such a system requires employing and combining different methodological perspectives and incorporation of different sources of data. In this dissertation, we use a variety of methods to analyze large sets of data in order to examine the effects of various domestic and institutional factors on scientific activities. First, we evaluate how the contributions of behavioral and social sciences to studies of health have evolved over time. We use data analytics to conduct a textual analysis of more than 200,000 publications on the topic of HIV/AIDS. We find that the focus of the scientific community within the context of the same problem varies as the societal context of the problem changes. Specifically, we uncover that the focus on the behavioral and social aspects of HIV/AIDS has increased over time and varies in different countries. Further, we show that this variation is related to the mortality level that the disease causes in each country. Second, we investigate how different sources of funding affect the science enterprise differently. We use data analytics to analyze more than 60,000 papers published on the subject of specific diseases globally and highlight the role of philanthropic money in these domains. We find that philanthropies tend to have a more practical approach in health studies as compared with public funders. We further show that they are also concerned with the economic, policy related, social, and behavioral aspects of the diseases. We uncover that philanthropies tend to mix and combine approaches and contents supported both by public and private sources of funding for science. We further show that in doing so, philanthropies tend to be closer to the position held by the public sector in the context of health studies. 
Finally, we find that studies funded by philanthropies tend to receive higher citations, and hence have higher impact, in comparison to those funded by the public sector. Third, we study the effect of different schemes of funding distribution on the career of scientists. In this study, we develop a system dynamics model for analyzing a scientist's career under different funding and competition contexts. We investigate the characteristics of optimal strategies and also the equilibrium points for the cases of scientists competing for financial resources. We show that a policy to fund the best can lead scientists to spend more time on writing proposals, in order to secure funding, rather than writing papers. We find that when everyone receives funding (or has the same chance of receiving funding) the overall optimal payoff of the scientists reaches its highest level and at this optimum, scientists spend all their time on writing papers rather than writing proposals. Our analysis suggests that more egalitarian distributions of funding result in higher overall research output by scientists. We also find that luck plays an important role in the success of scientists. We show that following the optimal strategies does not guarantee success. Due to the stochastic nature of funding decisions, some will eventually fail. The failure is not due to scientists' faulty decisions, but rather simply due to their lack of luck. / Ph. D. / Science helps us understand the world and enables us to improve how we interact with our environment. But science itself has also been the subject of inquiry by philosophers, sociologists, economists, historians, and scientists. The goal in the investigations of science has been to better understand how scientific advances occur, how to foster innovation, and how to improve the institutions that push science forward. This dissertation contributes to this area of research by asking and responding to several questions about the science enterprise.
First, we study how communities of scientists in different parts of the world look at the seemingly same problem differently. We use a computational method to read through a large set of publications on the topic of HIV/AIDS (which includes more than 200,000 papers) and uncover the topics of these papers. We find that in the context of HIV/AIDS, contributions of behavioral and social scientists have increased over time. Moreover, we show that the share of these contributions in each country's total research output differs significantly. We further find that there is a significant relationship between a country's rate of death due to HIV/AIDS and the share of behavioral and social studies in the overall research profile of that country on the topic of HIV/AIDS. Second, we investigate how different sources of research funding affect scientific activities differently. Specifically, we focus on the role of philanthropic money in science and its effect on the content and impact of research studies. In our analysis, we rely on computational techniques that distinguish between different themes of research in the studies of a few diseases, and also on different statistical methods. We find that philanthropies tend to have a more practical approach to health studies as compared with public sources of funding. Meanwhile, we find that they are also concerned with the economic, policy related, social, and behavioral aspects of the diseases. Moreover, we show that philanthropies tend to mix and combine approaches and contents supported both by public and private sources of funding for science. We find that, in doing so, philanthropies tend to be closer to the position held by the public sector in the context of health studies. Finally, we show that studies funded by philanthropies tend to receive higher citations. This finding suggests that these studies have a higher impact in comparison to those funded by the public sector.
Third, we study how different mechanisms for distributing research funding among scientists can affect their career and success. Many scientists must spend time on writing both papers and research grant proposals. In this work, we aim to understand how a scientist should allocate her time between these two activities to maximize her career-long number of papers. We develop a small mathematical model to capture the mechanisms related to the research career of a scientist in an academic setting. Then, for different schemes of funding distribution, we find the scientist’s time allocation that maximizes the number of papers she publishes over her career. We find that when funding is being allocated to the best scientists and best grant proposals, scientists’ best strategy is to spend more time on writing research grant proposals rather than papers. This decreases the total number of papers published by the scientists over their career. We also find that luck is important in determining the career success of scientists. Due to errors in the evaluation of proposal quality, a scientist may fail in her career regardless of whether she has followed the best strategy that she could.
|
96 |
HBTQI-personer, en utsatt grupp i en diskursiv kamp : En analys av riksdagsanföranden mellan 2010–2023 med hjälp av temamodellering och diskursteori / LGBTQI people, a vulnerable group in a discursive battle : An analysis of Swedish Riksdag speeches between 2010-2023 using topic modeling and discourse analysis
Thelin, Alice January 2024
Sweden is generally regarded as progressive in politics related to LGBTQI people, and the work for LGBTQI-rights is often described as a success story. Nevertheless, success and resistance have coincided throughout history with different discourses characterizing the political conversation. The study’s aim was to identify and deconstruct the subject positioning of LGBTQI people in Swedish politics. Furthermore, I analysed how LGBTQI people are constructed as a threat or as threatened in relation to the majority society. Using the AI-based topic modeling tool BERTopic, speeches from parliamentary debates from the period 2010–2023 were sampled for a qualitative discourse analysis. The theoretical framework consists of discourse theory, intersectionality, and concepts from queer- and postcolonial theory. The results show that the positioning of LGBTQI people is made in a hegemonic vulnerability discourse. The construction of LGBTQI people as vulnerable relates to an unwanted social development in which LGBTQI people are positioned as an already vulnerable group risking further vulnerability. Two competing discourses emerge, one that constructs threats to LGBTQI people as imported problems, and one that constructs LGBTQI people as threatened by right-wing nationalism. When LGBTQI people are constructed as a threat, it is primarily a threat to the prevailing gender order.
|
97 |
Same same, but different? On the Relation of Information Science and the Digital Humanities: A Scientometric Comparison of Academic Journals Using LDA and Hierarchical Clustering
Burghardt, Manuel, Luhmann, Jan 26 June 2024
In this paper we investigate the relationship of Information Science (IS) and the
Digital Humanities (DH) by means of a scientometric comparison of academic
journals from the respective disciplines. In order to identify scholarly practices
for both disciplines, we apply a recent variant of LDA topic modeling that makes
use of additional hierarchical clustering. The results reveal the existence of characteristic topic areas for both IS (information retrieval, information seeking behavior, scientometrics) and DH (computational linguistics, distant reading and
digital editions) that can be used to distinguish them as disciplines in their own
right. However, there is also a larger shared area of practices related to information management and also a few shared topic clusters that indicate a common
ground for – mostly methodological – exchange between the two disciplines.
|
98 |
Comparison of Causal Models for Bibliometric and Scientometric Analysis Applications / Jämförelse av orsakssambandsmodeller för bibliometriska och scientometriska analysapplikationer
Gholamniaetakhsami, Hirbod January 2024
Keyword analysis in scientific articles is a method used to identify and evaluate the importance and relevance of specific words or phrases (keywords) within scientific literature. The primary goal of keyword analysis is to uncover the core themes, research trends, and conceptual frameworks within a given field or across multiple disciplines. It helps researchers understand the focus of scientific discourse and the evolution of ideas over time. This thesis performs keyword analysis on a repository of scientific publications through a combination of methods. It starts by extracting the available keywords, and it handles missing keyword data through data augmentation. Then, it utilizes a variety of statistical methods to gain insight into the publications. The study employs an implementation of LDA topic modeling to accurately categorize keywords into thematic groups, and a vector autoregression to explore keyword relationships and the temporal dynamics of keywords. Next, the research further examines the interdisciplinary connectivity of keywords, clarifying the collective nature of modern science. In conclusion, the thesis presents a comprehensive framework for keyword analysis in scientific literature. Through a blend of data augmentation, natural language processing, temporal dynamics, and interdisciplinary examination, the study provides a robust tool for understanding the development and structure of scientific literature. The findings of this research have important implications for scholars: they allow scholars to navigate the vast amount of scientific literature more effectively and to discern the most influential ideas and trends shaping target fields. The methodologies implemented here offer an opportunity for any study to methodically search, extract, and identify keywords to find relevant papers and interpret the complex landscape of scientific communication.
/ Keyword analysis in scientific articles is a method used to identify and evaluate the importance and relevance of specific words or phrases (keywords) within scientific literature. The primary goal of keyword analysis is to uncover the core themes, research trends, and conceptual frameworks within a given field or across multiple disciplines. It helps researchers understand the focus of scientific discourse and the evolution of ideas over time. This thesis performs keyword analysis on a repository of scientific publications through a combination of methods. It begins by extracting the available keywords and handles missing keyword data through data augmentation. It then uses a variety of statistical methods to gain insight into the publications. The study uses an implementation of LDA topic modeling to accurately categorize keywords into thematic groups, and a vector autoregression to explore keyword relationships and the temporal dynamics of keywords. The next step of the research further examines the interdisciplinary connectivity of keywords, clarifying the collective nature of modern science. In summary, the thesis presents a comprehensive framework for keyword analysis in scientific literature. Through a blend of data augmentation, natural language processing, temporal dynamics, and interdisciplinary examination, the study offers a robust tool for understanding the development and structure of scientific literature. The findings have important implications for researchers: they enable more effective navigation of the vast amount of scientific literature and help discern the most influential ideas and trends shaping the target fields. The methods introduced here offer any study the opportunity to methodically search, extract, and identify keywords in order to find relevant articles and interpret the complex landscape of scientific communication.
|
99 |
The Salience of Issues in Parliamentary Debates : Its Development and Relation to the Support of the Sweden Democrats
Alexander, Ödlund Lindholm January 2020
The aim of this study was to analyze the salience of issue dimensions in Swedish parliamentary debates among the established parties during the rise of the Sweden Democrats (SD). Structural topic modeling was used to construct a measurement of the salience of issues, examining the full body of speeches in the Swedish parliament between September 2006 and December 2019. Trend analysis revealed a realignment from a focus on socio-economic to socio-cultural issues in Swedish politics. Cross-correlation analyses had conflicting results, indicating a weak positive relationship between the salience of issues and the support of SD – but low predictive ability; they also showed that changes in the support of SD did lead (precede) changes in the salience of issues in the parliament. The ramification of socio-cultural issues being the most salient is that so-called radical right-wing populist parties (RRPs), or neo-nationalist parties, have a greater opportunity to gain support. It can make voters more inclined to base their voting decision on socio-cultural issues, which favors parties who fight for and are trustworthy in those issues – giving them more valence in the eyes of the voters.
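To make the lead-lag idea behind the cross-correlation analysis concrete, here is a minimal sketch that correlates one series with a later-shifted copy of another. The two series are synthetic numbers built so that the second trails the first by two steps; they are not data from the thesis, and the function names are illustrative assumptions.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

# Synthetic series: `salience` repeats `support` two steps later.
support = [1, 3, 2, 5, 4, 6, 5, 8]
salience = [0, 0, 1, 3, 2, 5, 4, 6]

# The lag with the highest correlation suggests how far `support` leads.
best_lag = max(range(4), key=lambda k: lagged_correlation(support, salience, k))
print(best_lag)
```

A peak at a positive lag is what "lead (precede)" means in the abstract. On real polling and salience series, one would typically remove trends first, since shared trends inflate correlations at every lag.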
|
100 |
Topic Analysis of Tweets on the European Refugee Crisis Using Non-negative Matrix Factorization
Shen, Chong 01 January 2016
The ongoing European Refugee Crisis has been one of the most popular trending topics on Twitter for the past 8 months. This paper applies topic modeling to large batches of tweets to discover the hidden patterns within these social media discussions. In particular, we perform topic analysis by solving Non-negative Matrix Factorization (NMF) as an Inexact Alternating Least Squares problem. We accelerate the computation using techniques including tweet sampling and augmented NMF, compare NMF results with different ranks, and visualize the outputs through topic representation and frequency plots. We observe that supportive sentiments maintained a strong presence while negative sentiments such as safety concerns have emerged over time.
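As a rough illustration of NMF solved by alternating least squares, the sketch below alternates two unconstrained least-squares solves and clips negative entries to zero after each one. This projection step is one common way of making ALS inexact. The abstract does not specify the solver, so the scheme, the function name `nmf_projected_als`, and the tiny term-by-tweet matrix are all assumptions for illustration.

```python
import numpy as np

def nmf_projected_als(A, rank, iters=50, seed=0):
    """Approximate A (terms x documents) as W @ H with W, H >= 0.

    Each half-step solves an unconstrained least-squares problem and then
    clips negatives to zero, so the solve is inexact by construction.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))
    for _ in range(iters):
        # Fix W, solve min ||W H - A|| for H, then project onto H >= 0.
        H = np.clip(np.linalg.lstsq(W, A, rcond=None)[0], 0, None)
        # Fix H, solve min ||H^T W^T - A^T|| for W, then project onto W >= 0.
        W = np.clip(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0, None)
    return W, H

# Made-up term-by-tweet counts with two obvious blocks (two "topics").
A = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)

W, H = nmf_projected_als(A, rank=2)
print(np.round(W @ H, 2))
```

Columns of `W` can be read as topics (weights over terms) and columns of `H` as the topic mix of each tweet. Comparing results across several values of `rank`, as the abstract describes, amounts to rerunning the factorization with different `rank` arguments.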
|