Global ETD Search

1	Descriptive Labeling of Document Clusters / Deskriptiv märkning av dokumentkluster
Österberg, Adam. January 2022.
Abstract: Labeling is the process of giving a set of data a descriptive name. This thesis dealt with documents that carry no additional information and aimed to cluster them using topic modeling and to label them using Wikipedia as a secondary source. Labeling documents is a new field with many potential solutions; this thesis examined one method in a practical setting. Unstructured data was preprocessed and clustered using a topic model. Frequent words from each cluster were used to generate a search query sent to Wikipedia, where titles and categories from the most relevant pages were stored as candidate labels. Each candidate label was scored by the frequency of common cluster words among the candidate labels, weighted in proportion to the relevance of the originating Wikipedia article. Relevance was determined by the article's position in the search results. The five labels with the highest scores were chosen to describe the cluster. The clustered documents consisted of exam questions that students use to practice before a course exam. Each question in a cluster was assessed by someone experienced in the relevant topic, who judged whether at least one of the five labels correctly described its content. The method proved unreliable, with only one course receiving labels considered descriptive for most of its questions. A significant problem was that the data was closely related: all documents belonged to one overarching category rather than forming a dataset of independent topics. However, for one dataset, 80% of the documents received a descriptive label, indicating that labeling using secondary sources has potential but needs to be investigated further.
Abstract (translated from Swedish): Labeling is about giving unknown data a description. This thesis deals with data in the form of documents that, without any additional information, are clustered using topic modeling and labeled using Wikipedia as a secondary source. Labeling documents is a new research field with several conceivable ways forward. This thesis examines one possible method in a practical setting. The documents are preprocessed and grouped into clusters using a topic model. Common words from each cluster are then used to generate a search query that is sent to Wikipedia, where titles and categories from the most relevant pages are stored as candidates. Each candidate is then evaluated based on the frequency of the candidate word among the titles in the cluster and the relevance of the original Wikipedia article. The relevance of the articles was based on the order in which they appeared in the search results. The five labels with the highest scores were selected to describe the cluster. The clustered documents consisted of exam questions that students use to practice before an exam. Each question in the cluster was evaluated by someone with experience in the subject covered by the question. The evaluation was based on whether any of the five labels was considered to describe the content. The method proved unreliable, with only one course receiving labels considered descriptive for the majority of its questions. A major problem was that the data was closely related, with all documents belonging to one overarching category rather than to independent topics. For one dataset, however, 80% of the documents received a descriptive label. This shows that labeling using secondary sources has potential but needs further investigation.
Keywords: Natural Language Processing (Språkteknologi); Wikipedia; Topic Modeling (Temamodellering); Labeling (Märkning)
Subject: Computer and Information Sciences (Data- och informationsvetenskap)
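The candidate-scoring step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `score_candidate_labels`, the `1 / rank` relevance weight, and the simple word tokenizer are all hypothetical choices, since the abstract only says that frequency was weighted in proportion to the article's position in the search results. The Wikipedia search itself is left out; the function takes already-retrieved pages in search-result order.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_candidate_labels(cluster_words, ranked_pages, top_k=5):
    """Return up to top_k candidate labels, best first.

    cluster_words: frequent words of one cluster (from the topic model).
    ranked_pages:  list of (title, categories) tuples in search-result order.
    The 1 / rank weight is an assumption made for this sketch.
    """
    cluster_vocab = {word.lower() for word in cluster_words}
    scores = defaultdict(float)

    for rank, (title, categories) in enumerate(ranked_pages, start=1):
        weight = 1.0 / rank  # earlier search results count more (assumed form)
        for label in [title, *categories]:
            # Count how many tokens of the label are frequent cluster words.
            hits = sum(1 for token in tokenize(label) if token in cluster_vocab)
            scores[label] += weight * hits

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, score in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    # Made-up example data, not taken from the thesis.
    words = ["matrix", "vector", "linear", "eigenvalue"]
    pages = [
        ("Linear algebra", ["Linear algebra", "Numerical analysis"]),
        ("Matrix (mathematics)", ["Matrices", "Linear algebra"]),
        ("Vector space", ["Vector spaces", "Abstract algebra"]),
    ]
    print(score_candidate_labels(words, pages))
```

Because matching is on exact tokens, a category such as "Matrices" scores zero against the cluster word "matrix"; stemming or lemmatization would be one way to address that, though the abstract does not say how the thesis handled it.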
2	HBTQI-personer, en utsatt grupp i en diskursiv kamp : En analys av riksdagsanföranden mellan 2010–2023 med hjälp av temamodellering och diskursteori / LGBTQI people, a vulnerable group in a discursive battle : An analysis of Swedish Riksdag speeches between 2010-2023 using topic modeling and discourse analysis
Thelin, Alice. January 2024.
Abstract: Sweden is generally regarded as progressive in politics related to LGBTQI people, and the work for LGBTQI rights is often described as a success story. Nevertheless, success and resistance have coincided throughout history, with different discourses characterizing the political conversation. The study's aim was to identify and deconstruct the subject positioning of LGBTQI people in Swedish politics. Furthermore, I analysed how LGBTQI people are constructed as a threat or as threatened in relation to the majority society. Using the AI-based topic modeling tool BERTopic, speeches from parliamentary debates from the period 2010–2023 were sampled for a qualitative discourse analysis. The theoretical framework consists of discourse theory, intersectionality, and concepts from queer and postcolonial theory. The results show that LGBTQI people are positioned within a hegemonic vulnerability discourse. The construction of LGBTQI people as vulnerable relates to an unwanted social development in which LGBTQI people are positioned as an already vulnerable group at risk of further vulnerability. Two competing discourses emerge: one that constructs threats to LGBTQI people as imported problems, and one that constructs LGBTQI people as threatened by right-wing nationalism. When LGBTQI people are constructed as a threat, it is primarily as a threat to the prevailing gender order.
Keywords: LGBTQI (HBTQI); discourse theory (diskursteori); subject positions (subjektspositioner); intersectionality (intersektionalitet); topic modeling (temamodellering)
Subject: Social Work (Socialt arbete)
