Global ETD Search

121	Mapping out the impact of surveillance technology: research, professionals, and public opinion : A mixed methods approach Karlsson, Kalle January 2022 (has links) Combating crime is a complex task with cultural, political, and legal dimensions. In technologically advanced societies, surveillance technology can be used to aid law enforcement. Examples of such tools include drones, cameras, and wiretaps. As such tools become more commonplace, the need increases to address the associated issues, which relate to cultural, political, and legal dimensions and to different stakeholders. Hence, the purpose of this thesis is to discern the impact of informatics research on surveillance technology and map out similarities and discrepancies between the views of social media users, researchers, and professionals within law enforcement. The thesis adopts a heuristic perspective and stems from both the positivist and interpretivist traditions. The Panopticon metaphor and Panopticism are used as a theoretical lens, mainly to discuss and contextualize the findings. Data was collected from Twitter and Scopus using scripts, and through an interview with law enforcement staff in Sweden. A total of 88 989 tweets and 4 874 research papers were retrieved. These were analyzed using topic modeling, which assigned a dominant topic to each tweet and research paper. The interview was thematized using both the literature review and the topic modeling findings as a guiding framework. The findings showed seven topics within the Scopus dataset and four topics within the Twitter dataset. It was found that privacy was one of the least mentioned aspects in all three datasets and that law enforcement personnel see it as closely related to efficiency. Military applications and usage were found in both research papers and tweets, and law enforcement staff use a variety of ICT in their daily work.
Based on the findings, it seems that surveillance technology today can suitably be characterized as bi-directional, taking the form of both sousveillance and surveillance, which relates to the Deleuzian perspectives on the Panopticon. It was concluded that concrete implementations of surveillance technology attracted the most attention compared to more abstract themes such as ethics and privacy. In both datasets, however, specific ICT was addressed from a critical perspective. Similarly, law enforcement personnel viewed privacy and integrity from the organization’s perspective and highlighted rules and regulations. For future work, sentiment analysis is suggested to supplement topic modeling, as well as adopting a longitudinal approach or adding additional social media sources. Topic modeling Panopticon Surveillance Technology Twitter Altmetrics Surveillance Theory CCTV AI Big Data Information Systems, Social aspects
122	Elucidating AI Policy Discourse : Uncovering Themes Through Latent Dirichlet Allocation Zetterblom, Patrik January 2023 (has links) This thesis investigates the discourse contained within the examined policy documents by using the topic modeling technique Latent Dirichlet Allocation. The investigation is conducted through the theoretical lens of Systems Theory and Discourse Analysis Theory. The thesis aims to identify the core constituents, form a consensus, and enrich the scientific community’s understanding of how these core constituents, alongside the discourse contained within the policy documents, shape the overall landscape of AI governance in continental Europe. Before the in-depth investigation of the methods and theoretical frameworks mentioned above, an introduction gives additional insight into the background of AI and the problem formulation. The results of this study reveal 8 inferred themes. These inferred themes are then thoroughly discussed in alignment with the principles and concepts set forth by the theoretical frameworks. The thesis then provides a conclusive penultimate subchapter that encapsulates the key points and directly addresses the research question before highlighting possible future research opportunities. Artificial Intelligence Machine Learning Topic Modeling Regulation National Strategy Latent Dirichlet Allocation Systems Theory Discourse Analysis Theory Information Systems, Social aspects
123	Efficient Sentiment Analysis and Topic Modeling in NLP using Knowledge Distillation and Transfer Learning / Effektiv sentimentanalys och ämnesmodellering inom NLP med användning av kunskapsdestillation och överföringsinlärning Malki, George January 2023 (has links) This abstract presents a study in which knowledge distillation techniques were applied to a Large Language Model (LLM) to create smaller, more efficient models without sacrificing performance. Three configurations of the RoBERTa model were selected as “student” models to gain knowledge from a pre-trained “teacher” model. Multiple steps were used to improve the knowledge distillation process, such as copying some weights from the teacher to the student model and defining a custom loss function. The selected task for the knowledge distillation process was sentiment analysis on the Amazon Reviews for Sentiment Analysis dataset. The resulting student models showed promising performance on the sentiment analysis task, capturing sentiment-related information from text. The smallest of the student models managed to obtain 98% of the performance of the teacher model while being 45% lighter and taking less than a third of the time to analyze the entire IMDB Dataset of 50K Movie Reviews dataset. However, the student models struggled to produce meaningful results on the topic modeling task. These results were consistent with the topic modeling results from the teacher model. In conclusion, the study showcases the efficacy of knowledge distillation techniques in enhancing the performance of LLMs on specific downstream tasks. While the model excelled in sentiment analysis, further improvements are needed to achieve desirable outcomes in topic modeling. These findings highlight the complexity of language understanding tasks and emphasize the importance of ongoing research and development to further advance the capabilities of NLP models.
/ Denna sammanfattning presenterar en studie där kunskapsdestilleringstekniker tillämpades på en stor språkmodell (Large Language Model, LLM) för att skapa mindre och mer effektiva modeller utan att kompromissa på prestandan. Tre konfigurationer av RoBERTa-modellen valdes som ”student”-modeller för att inhämta kunskap från en förtränad ”teacher”-modell. Studien mäter även modellernas prestanda på två ”DOWNSTREAM” uppgifter, sentimentanalys och ämnesmodellering. Flera steg användes för att förbättra kunskapsdestilleringsprocessen, såsom att kopiera vissa vikter från lärarmodellen till studentmodellen och definiera en anpassad förlustfunktion. Uppgiften som valdes för kunskapsdestilleringen var sentimentanalys på datamängden Amazon Reviews for Sentiment Analysis. De resulterande studentmodellerna visade lovande prestanda på sentimentanalysuppgiften genom att fånga upp information relaterad till sentiment från texten. Den minsta av studentmodellerna lyckades erhålla 98% av prestandan hos lärarmodellen samtidigt som den var 45% lättare och tog mindre än en tredjedel av tiden att analysera hela IMDB Dataset of 50K Movie Reviews datasettet. Dock hade studentmodellerna svårt att producera meningsfulla resultat på ämnesmodelleringsuppgiften. Dessa resultat överensstämde med ämnesmodelleringsresultaten från lärarmodellen. Large Language Model RoBERTa Knowledge distillation Transfer learning Sentiment analysis Topic modeling Stor språkmodell RoBERTa Kunskapsdestillation överföringsinlärning Sentimentanalys Ämnesmodellering Computer and Information Sciences Data- och informationsvetenskap
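The abstract above mentions a custom loss function for distillation but does not specify it. As a rough illustration only, the sketch below shows the common soft-target formulation (temperature-scaled KL divergence between teacher and student, blended with cross-entropy on the true label). The function names, the `temperature` and `alpha` parameters, and the toy logits are assumptions for illustration, not details taken from the thesis.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    # Hypothetical blend: alpha weights the soft-target term,
    # (1 - alpha) weights the ordinary cross-entropy on the hard label.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student's unsoftened prediction on the true label.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Toy two-class sentiment example (0 = negative, 1 = positive).
teacher = [2.0, 0.0]
agreeing_student = [2.0, 0.0]
disagreeing_student = [0.0, 2.0]
loss_agree = distillation_loss(agreeing_student, teacher, label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, label=0)
```

In this sketch a student that matches the teacher incurs no soft-target penalty, so `loss_agree` is smaller than `loss_disagree`; the actual weighting used in the study may differ.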
124	Investigating Performance of Different Models at Short Text Topic Modelling / En jämförelse av textrepresentationsmodellers prestanda tillämpade för ämnesinnehåll i korta texter Akinepally, Pratima Rao January 2020 (has links) The key objective of this project was to quantitatively and qualitatively assess the performance of a sentence embedding model, Universal Sentence Encoder (USE), and a word embedding model, word2vec, at the task of topic modelling. The first step in the process was data collection. The data used for the project was podcast descriptions available at Spotify, and the topics associated with them. Following this, the data was used to generate description vectors and topic vectors using the embedding models, which were then used to assign topics to descriptions. The results from this study led to the conclusion that embedding models are well suited to this task, and that overall the USE outperforms the word2vec models. / Det huvudsakliga syftet med det i denna uppsats rapporterade projektet är att kvantitativt och kvalitativt utvärdera och jämföra hur väl Universal Sentence Encoder USE, ett semantiskt vektorrum för meningar, och word2vec, ett semantiskt vektorrum för ord, fungerar för att modellera ämnesinnehåll i text. Projektet har som träningsdata använt skriftliga sammanfattningar och ämnesetiketter för podd-episoder som gjorts tillgängliga av Spotify. De skriftliga sammanfattningarna har använts för att generera både vektorer för de enskilda podd-episoderna och för de ämnen de behandlar. De båda ansatsernas vektorer har sedan utvärderats genom att de använts för att tilldela ämnen till beskrivningar ur en testmängd. Resultaten har sedan jämförts och leder både till den allmänna slutsatsen att semantiska vektorrum är väl lämpade för den här sortens uppgifter, och att USE totalt sett överträffar word2vec-modellerna. 
Topic Modeling NLP word2vec Universal Sentence Encoder Podcast Textanalys semantiska vektorrum NLP word2vec Universal Sentence Encoder Poddar Computer and Information Sciences Data- och informationsvetenskap
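The abstract above describes assigning topics to descriptions by comparing description vectors with topic vectors. A minimal sketch of that idea follows, assuming cosine similarity as the comparison measure (the abstract does not name one) and using tiny hand-made vectors in place of real USE or word2vec embeddings. All names and values here are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def assign_topic(description_vec, topic_vecs):
    # Return the name of the topic whose vector is most similar
    # to the description vector.
    return max(topic_vecs,
               key=lambda name: cosine(description_vec, topic_vecs[name]))

# Toy 3-dimensional stand-ins for embedding vectors.
topic_vecs = {
    "sports": [1.0, 0.1, 0.0],
    "cooking": [0.0, 1.0, 0.2],
}
description_vec = [0.9, 0.2, 0.0]
best_topic = assign_topic(description_vec, topic_vecs)
```

With real embeddings, the same nearest-topic rule would apply to vectors of a few hundred dimensions; only the source of the vectors changes.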
125	Supporting Source Code Comprehension During Software Evolution and Maintenance Alhindawi, Nouh Talal 30 July 2013 (has links) No description available. Computer Science Software Comprehension Software Evolution Software Maintenance Information Retrieval Latent Semantic Indexing Stereotypes Traceability Commits Topic Modeling Corpus Feature Location Concept Location Source Code Querying
126	Vers une représentation du contexte thématique en Recherche d'Information / Generative models of topical context for Information Retrieval Deveaud, Romain 29 November 2013 (has links) Quand des humains cherchent des informations au sein de bases de connaissances ou de collections de documents, ils utilisent un système de recherche d’information (SRI) faisant office d’interface. Les utilisateurs doivent alors transmettre au SRI une représentation de leur besoin d’information afin que celui-ci puisse chercher des documents contenant des informations pertinentes. De nos jours, la représentation du besoin d’information est constituée d’un petit ensemble de mots-clés plus souvent connu sous la dénomination de « requête ». Or, quelques mots peuvent ne pas être suffisants pour représenter précisément et efficacement l’état cognitif complet d’un humain par rapport à son besoin d’information initial. Sans une certaine forme de contexte thématique complémentaire, le SRI peut ne pas renvoyer certains documents pertinents exprimant des concepts n’étant pas explicitement évoqués dans la requête. Dans cette thèse, nous explorons et proposons différentes méthodes statistiques, automatiques et non supervisées pour la représentation du contexte thématique de la requête. Plus spécifiquement, nous cherchons à identifier les différents concepts implicites d’une requête formulée par un utilisateur sans qu’aucune action de sa part ne soit nécessaire. Nous expérimentons pour cela l’utilisation et la combinaison de différentes sources d’information générales représentant les grands types d’information auxquels nous sommes confrontés quotidiennement sur internet. Nous tirons également parti d’algorithmes de modélisation thématique probabiliste (tels que l’allocation de Dirichlet latente) dans le cadre d’un retour de pertinence simulé.
Nous proposons par ailleurs une méthode permettant d’estimer conjointement le nombre de concepts implicites d’une requête ainsi que l’ensemble de documents pseudo-pertinent le plus approprié afin de modéliser ces concepts. Nous évaluons nos approches en utilisant quatre collections de test TREC de grande taille. En annexes, nous proposons également une approche de contextualisation de messages courts exploitant des méthodes de recherche d’information et de résumé automatique / When searching for information within knowledge bases or document collections, humans use an information retrieval system (IRS). So that it can retrieve documents containing relevant information, users have to provide the IRS with a representation of their information need. Nowadays, this representation of the information need is composed of a small set of keywords often referred to as the « query ». A few words may however not be sufficient to accurately and effectively represent the complete cognitive state of a human with respect to her initial information need. A query may not contain sufficient information if the user is searching for some topic in which she is not confident at all. Hence, without some kind of context, the IRS could simply miss some nuances or details that the user did not – or could not – provide in the query. In this thesis, we explore and propose various statistical, automatic and unsupervised methods for representing the topical context of the query. More specifically, we aim to identify the latent concepts of a query without involving the user in the process or requiring explicit feedback. We experiment with using and combining several general information sources representing the main types of information we deal with on a daily basis while browsing the Web. We also leverage probabilistic topic models (such as Latent Dirichlet Allocation) in a pseudo-relevance feedback setting.
Besides, we propose a method that jointly estimates the number of latent concepts of a query and the set of pseudo-relevant feedback documents most suitable for modeling these concepts. We evaluate our approaches using four main large TREC test collections. In the appendix of this thesis, we also propose an approach for contextualizing short messages which leverages both information retrieval and automatic summarization techniques Recherche d’information Contextualisation Concepts implicites Modélisation thématique probabiliste Retour de pertinence simulé Modèles de pertinence TREC Information retrieval Contextualization Latent concepts Probabilistic topic modeling Information sources Pseudo-relevance feedback Relevance models TREC 025.042
127	Miljöpartiet and the never-ending nuclear energy debate : A computational rhetorical analysis of Swedish climate policy Dickerson, Claire January 2022 (has links) The domain of rhetoric has changed dramatically since its inception as the art of persuasion. It has adapted to encompass many forms of digital media, including, for example, data visualization and coding as a form of literature, but the approach has frequently been that of an outsider looking in. The use of comprehensive computational tools as a part of rhetorical analysis has largely been lacking. In this report, we attempt to address this lack by means of three case studies in natural language processing tasks, all of which can be used as part of a computational approach to rhetoric. At the same time, it is becoming all the more important to transition to renewable energy in order to keep global warming under 1.5 degrees Celsius and ensure that countries meet the conditions of the Paris Agreement. Thus, we make use of speech data on climate policy from the Swedish parliament to ground these three analyses in semantic textual similarity, topic modeling, and political party attribution. We find that speeches are, to a certain extent, consistent within parties, given that a slight majority of the most semantically similar speeches come from the same party. We also find that some of the most common topics discussed in these speeches are nuclear energy and the Swedish Green party, purported environmental risks due to renewable energy sources, and the job market. Finally, we find that though pairs of speeches are semantically similar, party rhetoric on the whole is generally not unique enough for speeches to be distinguishable by party. These results then open the door for a broader exploration of computational rhetoric for Swedish political science in the future.
Computational linguistics language technology natural language processing NLP computational rhetoric topic modeling semantic textual similarity STS political party attribution SBERT KB-BERT machine learning climate policy Swedish climate policy
128	Reclaiming the “C” in ICT4D: A Critical Examination of the Discursive (Un)Freedoms in Digital State Policy and News Media of Bangladesh and Norway Ala-Uddin, Mohammad 11 May 2022 (has links) No description available. Communication Mass Media Mass Communications ICT4D Digitalization CDA Critical Discourse Analysis LDA Topic Modeling Mixed Method C-CDA Communication as Critical Freedom Digital Media Democratization Capabilities
129	Neural Methods Towards Concept Discovery from Text via Knowledge Transfer Das, Manirupa January 2019 (has links) No description available. Computer Engineering Computer Science Information Science Library Science Linguistics
130	[en] EXTRACTING RELIABLE INFORMATION FROM LARGE COLLECTIONS OF LEGAL DECISIONS / [pt] EXTRAINDO INFORMAÇÕES CONFIÁVEIS DE GRANDES COLEÇÕES DE DECISÕES JUDICIAIS FERNANDO ALBERTO CORREIA DOS SANTOS JUNIOR 09 June 2022 (has links) [pt] Como uma consequência natural da digitalização do sistema judiciário brasileiro, um grande e crescente número de documentos jurídicos tornou-se disponível na internet, especialmente decisões judiciais. Como ilustração, em 2020, o Judiciário brasileiro produziu 25 milhões de decisões. Neste mesmo ano, o Supremo Tribunal Federal (STF), a mais alta corte do judiciário brasileiro, produziu 99.5 mil decisões. Alinhados a esses valores, observamos uma demanda crescente por estudos voltados para a extração e exploração do conhecimento jurídico de grandes acervos de documentos legais. Porém, ao contrário do conteúdo de textos comuns (como por exemplo, livro, notícias e postagem de blog), o texto jurídico constitui um caso particular de uso de uma linguagem altamente convencionalizada. Infelizmente, pouca atenção é dada à extração de informações em domínios especializados, como textos legais. Do ponto de vista temporal, o Judiciário é uma instituição em constante evolução, que se molda para atender às demandas da sociedade. Com isso, o nosso objetivo é propor um processo confiável de extração de informações jurídicas de grandes acervos de documentos jurídicos, tomando como base o STF e as decisões monocráticas publicadas por este tribunal nos anos entre 2000 e 2018. Para tanto, pretendemos explorar a combinação de diferentes técnicas de Processamento de Linguagem Natural (PLN) e Extração de Informação (EI) no contexto jurídico. Da PLN, pretendemos explorar as estratégias automatizadas de reconhecimento de entidades nomeadas no domínio legal. 
Do ponto da EI, pretendemos explorar a modelagem dinâmica de tópicos utilizando a decomposição tensorial como ferramenta para investigar mudanças no raciocínio jurídico presente nas decisões ao longo do tempo, a partir da evolução dos textos e da presença de entidades nomeadas legais. Para avaliar a confiabilidade, exploramos a interpretabilidade do método empregado, e recursos visuais para facilitar a interpretação por parte de um especialista de domínio. Como resultado final, a proposta de um processo confiável e de baixo custo para subsidiar novos estudos no domínio jurídico e, também, propostas de novas estratégias de extração de informações em grandes acervos de documentos. / [en] As a natural consequence of the Brazilian Judicial System’s digitization, a large and increasing number of legal documents have become available on the Internet, especially judicial decisions. As an illustration, in 2020, 25 million decisions were produced by the Brazilian Judiciary. Meanwhile, the Brazilian Supreme Court (STF), the highest judicial body in Brazil, alone produced 99.5 thousand decisions. In line with those numbers, we face a growing demand for studies focused on extracting and exploring the legal knowledge hidden in those large collections of legal documents. However, unlike typical textual content (e.g., books, news, and blog posts), legal text constitutes a particular case of highly conventionalized language. Little attention is paid to information extraction in specialized domains such as legal texts. From a temporal perspective, the Judiciary itself is a constantly evolving institution, which molds itself to cope with the demands of society. Therefore, our goal is to propose a reliable process for legal information extraction from large collections of legal documents, based on the STF scenario and the monocratic decisions published by it between 2000 and 2018.
To do so, we intend to explore the combination of different Natural Language Processing (NLP) and Information Extraction (IE) techniques in the legal domain. From NLP, we explore automated named entity recognition strategies in the legal domain. From IE, we explore dynamic topic modeling with tensor decomposition as a tool to investigate the legal reasoning changes embedded in those decisions over time through textual evolution and the presence of the legal named entities. For reliability, we explore the interpretability of the methods employed. Also, we add visual resources to facilitate interpretation by a domain specialist. As a final result, we expect to propose a reliable and cost-effective process to support further studies in the legal domain and, also, to propose new strategies for information extraction on a large collection of documents. [pt] DIREITO [pt] DECOMPOSICAO TENSORIAL [pt] MODELAGEM DINAMICA DE TOPICOS [pt] RECONHECIMENTO DE ENTIDADE NOMEADA [pt] DOMINIO JURIDICO [pt] EXTRACAO DE INFORMACAO [en] LAW [en] TENSOR DECOMPOSITION [en] DYNAMIC TOPIC MODELING [en] NAMED ENTITY RECOGNITION [en] LEGAL DOMAIN [en] EXTRACTION OF INFORMATION
