11

Perceiving Umeå : Instagram's Lens on Neighborhoods in the City

Fuhler, Rick January 2023 (has links)
This master's thesis in human geography explores how neighborhoods are represented and perceived on the popular social media platform Instagram. By analyzing user-generated content, both visually and textually, the study aims to uncover the predominant themes, characteristics, and subjective perspectives associated with neighborhood representation on Instagram. Through a systematic analysis of content shared by Instagram users, applying topic modelling and sentiment analysis in Orange, the research identifies recurring themes, visual motifs, and distinguishing features that emerge when different neighborhoods are portrayed and experienced. The study focuses specifically on Umeå, allowing for a deeper understanding of how Instagram users perceive and portray the various neighborhoods within the city. The findings hold potential implications for urban planning practice, as they shed light on the factors influencing neighborhood representation on Instagram and their relevance to decision-making on urban development, community engagement, and social well-being. Overall, the study provides valuable insights into the interplay between social media and neighborhood representation.
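As an illustration of the kind of pipeline the abstract describes (the thesis itself used Orange), a minimal Python sketch combining topic modelling with lexicon-based sentiment scoring might look as follows; scikit-learn, NLTK's VADER, and the example captions are stand-ins and placeholders, not the thesis's actual tools or data.

```python
# Minimal sketch: topic modelling plus sentiment scoring of Instagram captions.
# scikit-learn and NLTK stand in for the Orange workflow used in the thesis;
# the captions below are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

nltk.download("vader_lexicon", quiet=True)

captions = [
    "sunset walk along the river tonight",
    "new coffee place opened downtown, great vibe",
    "snowy morning commute through the old neighborhood",
]

# Bag-of-words representation of the captions.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(captions)

# Fit a small LDA model and report the top words per topic.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_words}")

# Sentiment per caption via VADER's compound score (-1 negative .. +1 positive).
sia = SentimentIntensityAnalyzer()
for text in captions:
    print(text, sia.polarity_scores(text)["compound"])
```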
12

Unveiling the Swedish philosophical landscape : A topic model study of the articles of a Swedish philosophical journal from 1980-2020

Lindqvist, Björn January 2023 (has links)
Bibliometric research is an important tool for examining the scientific output of various fields of study. By conducting such research, it is possible to see how the influences of different people, ideologies and discoveries have affected scientific discourse. One way of doing this is topic modelling, which organizes the words used within a set of text data into different topics. To the author's knowledge, no topic modelling study of Swedish philosophy had previously been conducted. This study therefore aimed to partially fill that gap by exploring the publications of one specific Swedish philosophical journal. Using Python, a topic model with 14 topics was created from the journal Filosofisk tidskrift, and the change in these topics between 1980 and 2020 was examined. Specific attention was given to possible differences between analytic and Continental philosophy. To validate the results, an interview was also held with Fredrik Stjernberg, professor of theoretical philosophy. The results showed varied popularity and change for each topic. Too little Continental philosophy was found for a proper comparison, leading to the conclusion that Continental philosophy is not very influential in Swedish philosophical discourse. Future research should be conducted on peer-reviewed articles and be supported by greater professional philosophical expertise.
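A minimal sketch of the kind of 14-topic pipeline described above, using gensim as one plausible Python choice; the documents and preprocessing are placeholders, not the Filosofisk tidskrift corpus.

```python
# Minimal sketch of building a 14-topic LDA model with gensim, in the spirit of
# the pipeline described above. The documents here are invented placeholders.
from gensim import corpora
from gensim.models import LdaModel

documents = [
    "kunskap och sanning i den analytiska traditionen",
    "etik moral och praktisk filosofi",
    "medvetande sprak och mening",
    # ... in the thesis, one entry per journal article, preprocessed beforehand
]

# Tokenize and build the dictionary and bag-of-words corpus.
texts = [doc.lower().split() for doc in documents]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train an LDA model; the thesis settled on 14 topics.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=14,
               random_state=0, passes=10)

# Inspect the topics as weighted word lists.
for topic_id, words in lda.print_topics(num_topics=14, num_words=5):
    print(topic_id, words)
```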
13

The Cognitive Revolution – Fact or Fiction? : Using topic modelling to look for signs of a paradigm shift in a Swedish journal

Fagerlind, Johannes January 2023 (has links)
Traditionally, when social scientists have wanted to analyze large numbers of documents, they have resorted to manual coding techniques. This process can be made easier with machine learning approaches. One such approach, topic modelling, finds which words commonly occur together and thereby provides the researcher with semantically coherent topics. This thesis uses topic modelling to investigate Nordic Psychology, a psychology journal published in the Nordic languages. Articles published between 1949 and 2005 are examined to map how the discourse changed during the second half of the 20th century. Psychology textbooks and researchers active in the late sixties frequently refer to a cognitive revolution taking place, and accounts of this revolution paint a picture of something resembling a paradigm shift. This thesis therefore sets out to look for signs that the cognitive revolution was a paradigm shift. The topic model used in this thesis does not, however, find traces of a paradigm shift in the dataset, suggesting that if a paradigm shift did take place, it was not reflected in the Nordic Psychology journal.
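A hedged sketch of the central step, tracking topic prevalence by publication year and comparing it around a candidate break point, is shown below; the document-topic matrix, years, and break year are all invented placeholders.

```python
# Minimal sketch: average topic prevalence per publication year, the kind of
# trend one would inspect for signs of a shift toward cognitive topics.
# doc_topics and years are placeholder arrays, not the thesis data.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics = 200, 10
doc_topics = rng.dirichlet(np.ones(n_topics), size=n_docs)  # document-topic proportions
years = rng.integers(1949, 2006, size=n_docs)               # publication year per article

# Mean prevalence of each topic for each year.
prevalence = {}
for year in np.unique(years):
    prevalence[year] = doc_topics[years == year].mean(axis=0)

# Compare a topic's average share before and after a candidate break point.
topic, break_year = 3, 1970
before = doc_topics[years < break_year][:, topic].mean()
after = doc_topics[years >= break_year][:, topic].mean()
print(f"topic {topic}: mean share {before:.3f} before {break_year}, {after:.3f} after")
```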
14

Spatial Regularization for Analysis of Text and Epidemiological Data

Maiti, Aniruddha (ORCID: 0000-0002-1142-6344) January 2022 (has links)
The use of spatial data has become an important aspect of data analysis, as location information can provide useful insight into a dataset. Advances in sensor technologies and improved data connectivity have made it possible to generate large amounts of passively collected user location data. Apart from such passively generated data, commercial vendors have made explicit efforts to curate large amounts of location-related data, such as residential histories, from a variety of sources including credit records, litigation data, and driving license records. Such spatial data, when linked with other datasets, can provide useful insights. In this dissertation, we show that spatial information enables us to derive useful insights in the domains of text analysis and epidemiology. We investigated primarily two types of data with spatial information: text data with location information and disease-related data with residential address information. We show that in the case of text data, spatial information helps us find spatially informative topics. In the case of epidemiological data, we show that residential information can be used to identify high-risk spatial regions. There are instances where a primary analysis is not sufficient to establish a statistically robust conclusion; in domains such as epidemiology, for instance, a finding is not considered relevant unless statistical significance is established. We therefore propose techniques for significance tests that can be applied to text analysis, topic modelling, and disease mapping tasks in order to establish the significance of the findings. / Computer and Information Science
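One simple way to frame the significance testing the dissertation calls for is a permutation test of a candidate region's case rate; the sketch below uses invented indicators and is only an illustration of the idea, not the dissertation's actual method.

```python
# Minimal sketch of a permutation test for whether a candidate spatial region
# has an elevated case rate. All data here are invented placeholders; the
# dissertation's actual tests are more involved.
import numpy as np

rng = np.random.default_rng(42)
n_points = 1000
in_region = rng.random(n_points) < 0.1   # indicator: point lies in the candidate region
cases = rng.random(n_points) < 0.05      # indicator: point is a case

observed_rate = cases[in_region].mean()

# Null distribution: shuffle the region labels and recompute the rate.
n_perm = 2000
null_rates = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(in_region)
    null_rates[i] = cases[shuffled].mean()

p_value = (null_rates >= observed_rate).mean()
print(f"observed rate in region: {observed_rate:.3f}, permutation p-value: {p_value:.3f}")
```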
15

Simplifying Q&A Systems with Topic Modelling

Kozee, Troy January 2017 (has links)
No description available.
16

Latent Dirichlet Allocation for the Detection of Multi-Stage Attacks

Lefoane, Moemedi, Ghafir, Ibrahim, Kabir, Sohag, Awan, Irfan U. 19 December 2023 (has links)
The rapid shift to, and increase in, remote access to organisation resources have led to a significant increase in the number of attack vectors and attack surfaces, which in turn has motivated the development of newer and more sophisticated cyber-attacks. Such attacks include Multi-Stage Attacks (MSAs). In MSAs, the attack is executed through several stages, and classifying malicious traffic into stages to get more information about the attack life-cycle becomes a challenge. This paper proposes a malicious traffic clustering approach based on Latent Dirichlet Allocation (LDA), a topic modelling approach used in natural language processing to address similar problems. The proposed approach is unsupervised and will therefore be beneficial in scenarios where traffic data is not labeled and analysis still needs to be performed. The approach uncovers intrinsic contexts that relate to different categories of attack stages in MSAs. These are vital insights needed across different areas of cybersecurity teams, such as Incident Response (IR) within the Security Operations Center (SOC), and they could have a positive impact in ensuring that attacks are detected at early stages of MSAs. For IR in particular, these insights help in understanding attack behavioural patterns and lead to reduced recovery time following an incident. The proposed approach is evaluated on a publicly available MSAs dataset, and the performance results are promising, as evidenced by over 99% accuracy in the identified malicious traffic clusters.
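A rough sketch of the clustering idea, treating each traffic flow as a "document" of categorical feature tokens and assigning it to its dominant LDA topic, is given below; the flow tokens and the number of stages are invented placeholders.

```python
# Minimal sketch: cluster traffic flows by dominant LDA topic, treating each
# flow's categorical features as tokens. Tokens below are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np

flows = [
    "dst_port_445 proto_tcp flag_syn long_duration",
    "dst_port_80 proto_tcp flag_ack short_duration",
    "dst_port_445 proto_tcp flag_syn many_bytes",
    "dst_port_53 proto_udp short_duration",
]

vectorizer = CountVectorizer(token_pattern=r"\S+")
counts = vectorizer.fit_transform(flows)

# One topic per assumed attack stage; the number of stages is a modelling choice.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)

# Assign each flow to the cluster given by its dominant topic.
clusters = np.argmax(doc_topics, axis=1)
for flow, cluster in zip(flows, clusters):
    print(cluster, flow)
```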
17

Fifty Years of Information Management Research: A Conceptual Structure Analysis using Structural Topic Modeling

Sharma, A., Rana, Nripendra P., Nunkoo, R. 10 January 2021 (has links)
Information management is the management of the organizational processes, technologies, and people that collectively create, acquire, integrate, organize, process, store, disseminate, access, and dispose of information. It is a vast, multi-disciplinary domain that brings together various subdomains and intermingles closely with other domains. This study aims to provide a comprehensive overview of the information management domain from 1970 to 2019. Drawing on methodology from statistical text analysis research, it summarizes the evolution of knowledge in the domain by examining publication trends by author, institution, country, and so on. Further, the study proposes a probabilistic generative model based on structural topic modeling to understand and extract the latent themes from research articles related to information management, and it graphically visualizes the variations in topic prevalence over the period 1970 to 2019. The results highlight that the most common themes are data management, knowledge management, environmental management, project management, service management, and mobile and web management. The findings also identify themes such as knowledge management, environmental management, project management, and social communication as academic hotspots for future research.
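The study's prevalence analysis relies on structural topic modeling (commonly the R stm package); the sketch below only mimics the prevalence-over-time aspect with a plain linear trend on placeholder topic proportions, as a rough illustration rather than the authors' model.

```python
# Illustrative sketch only: structural topic modelling jointly models topic
# prevalence with covariates, whereas here a plain linear fit of topic
# proportion against publication year stands in for the prevalence trend.
# All data are invented placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_docs, n_topics = 500, 6
years = rng.integers(1970, 2020, size=n_docs)
doc_topics = rng.dirichlet(np.ones(n_topics), size=n_docs)

# Slope of each topic's proportion over time (rising or declining theme).
for topic in range(n_topics):
    slope, intercept = np.polyfit(years, doc_topics[:, topic], deg=1)
    trend = "rising" if slope > 0 else "declining"
    print(f"topic {topic}: slope {slope:+.5f} per year ({trend})")
```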
18

Characterisation of a developer’s experience fields using topic modelling

Déhaye, Vincent January 2020 (has links)
Finding the most relevant candidate for a position is a ubiquitous challenge for organisations. It can also be arduous for candidates to convey on a concise resume what they have experience with: because they usually have to select which experiences to present and filter out others, the person carrying out the search may not detect that they do in fact have the desired experience. In the field of software engineering, building up one's experience usually leaves traces behind: the code one has produced. This project explores approaches to tackle these screening challenges with an automated way of extracting experience directly from code, by defining common lexical patterns in code for different experience fields using topic modelling. Two techniques were compared. On one hand, Latent Dirichlet Allocation (LDA) is a generative statistical model which has proven to yield good results in topic modelling. On the other hand, Non-Negative Matrix Factorization (NMF) factorizes a matrix representing the code corpus as word counts per piece of code. The code gathered consisted of 30 random repositories from the collaborators of the open-source Ruby on Rails project on GitHub, to which common natural language processing transformation steps were then applied. The results of the two techniques were compared using perplexity for LDA, reconstruction error for NMF, and topic coherence for both. The first two represent how well the data can be represented by the topics produced, while the latter estimates how well the elements of a topic hang and fit together, and can reflect human understandability and interpretability. Given that we did not have any similar work to benchmark against, the performance indicated by the values obtained is hard to assess scientifically. However, the method seems promising, as we would have been rather confident in assigning labels to 10 of the topics generated. The results imply that one could probably use natural language processing methods directly on code in order to extend the detected fields of experience of a developer, with a finer granularity than traditional resumes and with field definitions evolving dynamically with the technology.
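A rough sketch of the comparison described above, with LDA scored by perplexity and NMF by reconstruction error in scikit-learn, is given below; the code-token snippets are invented, and topic coherence (for example via gensim's CoherenceModel) is omitted to keep the sketch short.

```python
# Minimal sketch of the LDA vs NMF comparison: LDA scored by perplexity,
# NMF by reconstruction error. The "documents" are invented code-token snippets.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

snippets = [
    "def render view template html partial",
    "class ActiveRecord migration schema table column",
    "describe it expect spec test assert",
    "socket connect send recv buffer thread",
]

n_topics = 2

# LDA on raw word counts, evaluated with perplexity (lower is better).
count_vec = CountVectorizer(token_pattern=r"\S+")
counts = count_vec.fit_transform(snippets)
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
print("LDA perplexity:", lda.perplexity(counts))

# NMF on tf-idf weights, evaluated with reconstruction error (lower is better).
tfidf_vec = TfidfVectorizer(token_pattern=r"\S+")
tfidf = tfidf_vec.fit_transform(snippets)
nmf = NMF(n_components=n_topics, random_state=0).fit(tfidf)
print("NMF reconstruction error:", nmf.reconstruction_err_)
```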
19

Improving the speed and quality of an Adverse Event cluster analysis with Stepwise Expectation Maximization and Community Detection

Erlanson, Nils January 2020 (has links)
Adverse drug reactions are unwanted effects alongside the intended benefit of a drug and may be responsible for 3-7% of hospitalizations. Finding such reactions is partly done by analysing individual case safety reports (ICSRs) of adverse events. The reports consist of categorical terms that describe the event. Data-driven identification of suspected adverse drug reactions using this data typically considers single adverse event terms, one at a time. This single-term approach narrows the identification of reports, and information in the reports is ignored during the search. If one instead assumes that each report is connected to a topic, then a cluster of the reports connected to that topic would identify more reports, and the topics themselves would provide more context. This thesis takes place at Uppsala Monitoring Centre, which has implemented a probabilistic model of how an ICSR, and its topic, is assumed to be generated. The parameters of the model are estimated with expectation maximization (EM), which also assigns the reports to clusters. The clusters are improved with consensus clustering, which identifies groups of reports that tend to be grouped together across several runs of EM. Additionally, in order not to cluster outlying reports, all clusters below a certain size are excluded. The objective of the thesis is to improve the algorithm in terms of computational efficiency and quality, as measured by stability and clinical coherence. The convergence of EM is improved using stepwise EM, which resulted in a speed-up of at least 1.4 and a decrease in computational complexity. With all the speed improvements, the speed-up factor of the entire algorithm can reach 2, but it is constrained by the size of the data. To improve cluster quality, the community detection algorithm Leiden is used. It improves the stability, with the added benefit of increasing the number of clustered reports, although the clinical coherence score is worse with Leiden. There are good reasons to further investigate the benefits of Leiden, as there were indications that community detection identified clusters with greater resolution that still appeared clinically coherent in a post hoc analysis.
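A rough sketch of the consensus step, building a co-clustering graph from several runs and applying Leiden community detection, is shown below; it assumes the python-igraph and leidenalg packages are available, and the run labels are invented placeholders rather than actual EM output.

```python
# Rough sketch of the consensus step: count how often pairs of reports are
# clustered together across several runs, turn the co-occurrence matrix into a
# weighted graph, and apply Leiden community detection. Labels are invented
# placeholders; the real pipeline uses the EM cluster assignments.
import numpy as np
import igraph as ig
import leidenalg as la

n_reports, n_runs = 50, 10
rng = np.random.default_rng(0)

# Placeholder: cluster label per report for each EM run.
runs = rng.integers(0, 5, size=(n_runs, n_reports))

# Co-occurrence matrix: fraction of runs in which two reports share a cluster.
cooccurrence = np.zeros((n_reports, n_reports))
for labels in runs:
    cooccurrence += (labels[:, None] == labels[None, :]).astype(float)
cooccurrence /= n_runs
np.fill_diagonal(cooccurrence, 0.0)

# Weighted undirected graph and Leiden partition (modularity objective as one choice).
graph = ig.Graph.Weighted_Adjacency(cooccurrence.tolist(), mode="undirected")
partition = la.find_partition(graph, la.ModularityVertexPartition, weights="weight")
print("number of consensus clusters:", len(partition))
```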
20

Cop Topics: Topic Modeling-Assisted Discoveries of Police-Related Themes in African-American Journalistic Texts

Lemire Garlic, Nicole January 2017 (has links)
Mainstream newspaper content has long been mined by communication scholars and researchers for insights into public opinion and perceptions, and in recent years scholars have been examining African-American-authored periodicals to obtain similar insights. Hearkening back to the civil rights movement of the 1950s and 1960s in the United States, the highly publicized killings of African-American men by police officers during the past several years have highlighted longstanding strained police-community relations. Because African-American journalistic texts serve as both a reflection of, and an advocate for, the African-American community, they contain a wealth of data about African-American public opinion about, and perceptions of, the police. In years past, media content analysts would manually sift through newspapers to identify interesting police-related themes and variables worthy of study. But with the exponential growth of digitized texts, communication scholars are experimenting with computerized text analysis tools, such as topic modeling software, to aid them in their content analyses. This thesis considers to what degree topic modeling software can be used at the exploratory stage of designing a content analysis study to aid in uncovering themes and variables worthy of further investigation. Appendix A contains the results of the manual exploratory content analysis; the list of topics generated by the topic modeling software may be found in Appendix B. / Media Studies & Production / Accompanied by one .pdf file: NLG Thesis Appendices Final.pdf
