61 |
Topic propagation over time in internet security conferences : Topic modeling as a tool to investigate trends for future research / Ämnesspridning över tid inom säkerhetskonferenser med hjälp av topic modeling
Johansson, Richard; Engström Heino, Otto (January 2021)
When conducting research, it is valuable to find high-ranked papers closely related to the specific research area without spending too much time reading insignificant papers. An automated process for extracting topics from documents would make this search more effective, and topic modeling makes such a process possible. Topic modeling can also reveal topic trends, such as where a topic was first mentioned and who its original author was. In this paper, over 5000 articles are scraped from four top-ranked internet security conferences using a web scraper built in Python. Fourteen topics are extracted from the articles using the topic modeling library Gensim and LDA Mallet, and the topics are visualized in graphs to identify which topics emerged and which faded away over twenty years. This research finds that topic modeling is a powerful tool for extracting topics and that, when put into a time perspective, it makes it possible to identify topic trends, which can be explained when placed in a broader context.
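The LDA Mallet model mentioned above is MALLET's LDA implementation, which is trained with Gibbs sampling. As a rough illustration of what such a model does, the following is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a sketch, not the authors' pipeline and not Gensim's or MALLET's code; the function name `lda_gibbs` and the default hyperparameters (`alpha`, `beta`, `iterations`) are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word tokens.
    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word distributions (dict word -> probability).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_word
```

Given tokenized abstracts, a call such as `lda_gibbs(tokenized_abstracts, num_topics=14)` would return per-document topic proportions and per-topic word distributions; sorting each topic's dictionary by probability gives its top words. A corpus of 5000 articles would call for an optimized library rather than this pure-Python loop.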
|
62 |
Ab-initio Untersuchungen von Phosphorfehlstellen in der Silizium(111)-(2x1) Oberfläche
Pötter, Mirco (26 May 2017)
This thesis examines defects in the silicon (111)-(2x1) surface using the ab initio methods DFT-LDA and GdW. First, the Pandey chains that form on this surface are examined in more detail within DFT-LDA, and the interactions between and within the chains are investigated. To study the influence of the phosphorus defect, a system is constructed that is large enough to investigate individual defects in the surface.
The DFT results are extended by calculations using the many-body perturbation theory GdW, which also demonstrated that GdW can be applied to systems with 600 electrons. Furthermore, the interaction between the phosphorus doping and the silicon crystal is investigated by examining the spatially resolved DOS.
|
63 |
IONA: Intelligent Online News Analysis
Doumit, Sarjoun S. (January 2018)
No description available.
|
64 |
Unveiling the Swedish philosophical landscape : A topic model study of the articles of a Swedish philosophical journal from 1980-2020
Lindqvist, Björn (January 2023)
Bibliometric research is an important tool for examining the scientific output of various fields of study. Such research makes it possible to see how the influence of different people, ideologies, and discoveries has affected scientific discourse. One way of doing this is through topic modelling, which organizes the words used within a set of text data into different topics. To the author's knowledge, no topic modelling study of Swedish philosophy had previously been conducted. This study therefore aimed to partially fill that gap by exploring the publications of one Swedish philosophical journal. Using Python, a topic model with 14 topics was created from the articles of the journal Filosofisk tidskrift, and the change in these topics between 1980 and 2020 was examined. Particular attention was given to possible differences between analytic and Continental philosophy. To validate the results, an interview was also held with Fredrik Stjernberg, professor of theoretical philosophy. The results showed that each topic varied in popularity and changed differently over time. Too little Continental philosophy was found for a proper comparison, leading to the conclusion that Continental philosophy is not very influential in Swedish philosophical discourse. Future research should be conducted on peer-reviewed articles and be supported by more input from professional philosophers.
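Once a topic model has been fitted, one simple way to study how topics change over time is to average the document-topic proportions per publication year and fit a straight line to each topic's yearly share. The sketch below assumes each article already has a year and a topic distribution (for example from an LDA model); the helper names `topic_trends` and `trend_slope` are hypothetical, and a linear slope is only one crude indicator of whether a topic is growing or fading.

```python
from collections import defaultdict


def topic_trends(doc_years, doc_topic):
    """Mean topic proportions per year.

    doc_years: publication year of each document.
    doc_topic: per-document topic distributions (lists of equal length K).
    Returns {year: [mean share of topic k for k in range(K)]}, years sorted.
    """
    K = len(doc_topic[0])
    totals = defaultdict(lambda: [0.0] * K)
    counts = defaultdict(int)
    for year, dist in zip(doc_years, doc_topic):
        row = totals[year]
        for k, p in enumerate(dist):
            row[k] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in totals[y]] for y in sorted(totals)}


def trend_slope(yearly, topic):
    """Least-squares slope of one topic's yearly share (positive = growing)."""
    years = list(yearly)
    shares = [yearly[y][topic] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    return sxy / sxx if sxx else 0.0
```

Plotting the per-year shares directly is usually more informative than the slope alone, since a topic can rise and then fall within the period.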
|
65 |
The Cognitive Revolution – Fact or Fiction? : Using topic modelling to look for signs of a paradigm shift in a Swedish journal
Fagerlind, Johannes (January 2023)
Traditionally, social scientists who wanted to analyze large numbers of documents have resorted to manual coding techniques. This process can be made easier with machine learning approaches. One such approach, called topic modelling, finds which words commonly occur together and thereby provides the researcher with semantically coherent topics. This thesis uses topic modelling to investigate Nordic Psychology, a psychology journal published in the Nordic languages. Articles published between 1949 and 2005 are examined to map out how the discourse changed during the second half of the 20th century. Psychology textbooks and researchers active in the late sixties frequently refer to a so-called cognitive revolution taking place, and accounts of this revolution paint a picture of something resembling a paradigm shift. This thesis therefore looks for signs that the cognitive revolution was a paradigm shift. However, the topic model used in this thesis finds no traces of a paradigm shift within the dataset, suggesting that if a paradigm shift did take place, it was not reflected in the Nordic Psychology journal.
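One hypothetical way to quantify a shift in discourse is to compare the average topic mix of articles published before and after a chosen split year, for example with the Jensen-Shannon divergence (0 for identical distributions, 1 for distributions with no overlap when base-2 logarithms are used). This is not necessarily the method used in the thesis; the split year and the helper names are assumptions for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical p and q, at most 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_distribution(dists):
    """Component-wise mean of equal-length probability vectors."""
    return [sum(column) / len(dists) for column in zip(*dists)]


def period_shift(doc_years, doc_topic, split_year):
    """JSD between the mean topic mix before split_year and from split_year on."""
    before = [d for y, d in zip(doc_years, doc_topic) if y < split_year]
    after = [d for y, d in zip(doc_years, doc_topic) if y >= split_year]
    if not before or not after:
        raise ValueError("both periods need at least one document")
    return jensen_shannon(mean_distribution(before), mean_distribution(after))
```

A value near 0 would indicate little change in the overall topic mix, while a large value would suggest the kind of discontinuity a paradigm shift might leave; interpreting the number still requires domain judgment.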
|
66 |
Facilitating Corpus Annotation by Improving Annotation Aggregation
Felt, Paul L (01 December 2015)
Annotated text corpora facilitate the linguistic investigation of language as well as the automation of natural language processing (NLP) tasks. NLP tasks include problems such as spam email detection, grammatical analysis, and identifying mentions of people, places, and events in text. However, constructing high-quality annotated corpora can be expensive. Cost can be reduced by employing low-cost internet workers in a practice known as crowdsourcing, but the resulting annotations are often inaccurate, decreasing the usefulness of a corpus. This inaccuracy is typically mitigated by collecting multiple redundant judgments and aggregating them (e.g., via majority vote) to produce high-quality consensus answers. We improve the quality of consensus labels inferred from imperfect annotations in a number of ways. We show that transfer learning can be used to derive benefit from outdated annotations that would typically be discarded. We show that, contrary to popular preference, annotation aggregation models that take a generative data modeling approach tend to outperform those that take a conditional approach. We leverage this insight to develop csLDA, a novel annotation aggregation model that improves on the state of the art for a variety of annotation tasks. When the data do not permit generative data modeling, we identify a conditional data modeling approach based on vector-space text representations that achieves state-of-the-art results on several unusual semantic annotation tasks. Finally, we identify a family of models capable of aggregating annotation data containing heterogeneous annotation types, such as label frequencies and labeled features. We present a multi-annotator active learning algorithm for this model family that jointly selects an annotator, data items, and annotation type.
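Majority-vote aggregation, mentioned above as the typical baseline, can be sketched in a few lines, together with a reliability-weighted variant that hints at why modeling individual annotators can help. This is not csLDA or any model from the dissertation; the data layout and the `reliability` weights are assumptions, and in practice such weights would be estimated (for example by a generative model) rather than given.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Consensus label per item by simple majority.

    annotations: dict mapping item id -> list of labels from annotators.
    Ties go to the tied label that was seen first (Counter.most_common order).
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}


def weighted_vote(annotations, reliability):
    """Consensus label per item, weighting each vote by annotator reliability.

    annotations: dict mapping item id -> list of (annotator, label) pairs.
    reliability: dict mapping annotator -> weight (hypothetical values);
    annotators missing from the dict get weight 1.0.
    """
    consensus = {}
    for item, pairs in annotations.items():
        scores = defaultdict(float)
        for annotator, label in pairs:
            scores[label] += reliability.get(annotator, 1.0)
        consensus[item] = max(scores, key=scores.get)
    return consensus
```

With three annotators voting spam, ham, ham, the majority vote is ham; if the first annotator is known to be three times as reliable as the others, the weighted vote flips to spam.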
|
67 |
Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis
Riley, Owen G. (23 April 2014)
Technical disciplines are evolving rapidly, leading to changes in their associated vocabularies, and this evolving terminology causes confusion in interdisciplinary communication. Two causes of confusion are multiple definitions (overloaded terms) and synonymous terms; the formal names for these two problems are polysemy and synonymy. Termediator-I, a web application built on top of a collection of glossaries, uses definition count as a measure of term confusion. This tool was an attempt to identify confusing cross-disciplinary terms, but as more glossaries were added to the collection, this measure became ineffective. This thesis provides a measure of term polysemy: term polysemy is effectively measured by semantically clustering the text concepts, or definitions, of each term and counting the number of resulting clusters. Hierarchical clustering uses a measure of proximity between the text concepts. Three such measures are evaluated: cosine similarity, latent semantic indexing, and latent Dirichlet allocation. Two linkage types for determining cluster proximity during the hierarchical clustering process are also evaluated: complete linkage and average linkage. An attempt to obtain a viable clustering threshold by public consensus, through crowdsourcing via a web application, was unsuccessful. An alternative metric of polysemy, the convergence value, is identified and tested as a viable clustering threshold. Six lists of terms ranked by cluster count based on convergence values are generated, one for each combination of similarity measure and linkage type. Each combination produces a competitive list, and no combination can be clearly determined to be superior. Semantic clustering successfully identifies polysemous terms, but each combination of similarity measure and linkage type gives slightly different results.
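The core procedure, clustering a term's definitions hierarchically and counting the clusters that remain above a similarity threshold, can be sketched with bag-of-words cosine similarity and either average or complete linkage. This is a simplified illustration under stated assumptions: whitespace tokenization, raw term counts, and a fixed `threshold` argument standing in for the convergence value, which is not reproduced here. The function name `count_senses` is hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def count_senses(definitions, threshold, linkage="average"):
    """Agglomerative clustering of a term's definitions; returns the cluster count.

    Clusters are merged while the most similar pair is at least `threshold`.
    linkage: "average" (mean pairwise similarity) or "complete" (minimum
    pairwise similarity, i.e. maximum distance).
    """
    vecs = [Counter(d.lower().split()) for d in definitions]
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    clusters = [[i] for i in range(len(vecs))]

    def link(c1, c2):
        pair_sims = [sim[i][j] for i in c1 for j in c2]
        if linkage == "complete":
            return min(pair_sims)
        return sum(pair_sims) / len(pair_sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # y > x, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return len(clusters)
```

For a hypothetical term with two animal-related and two device-related definitions that share no words across the groups, a threshold of 0.5 yields two clusters, i.e., two senses; raising the threshold splits the definitions further, which is why the choice of threshold matters so much in the thesis.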
|
68 |
Boronic Acids as Optical Chemosensors for Saccharides and Phosphate Related Analytes
Penavic, Andrej (29 August 2022)
No description available.
|
69 |
Exploring the Potential of Twitter Data and Natural Language Processing Techniques to Understand the Usage of Parks in Stockholm / Utforska potentialen för användning av Natural Language Processing på Twitter data för att förstå användningen av parker i Stockholm
Norsten, Theodor (January 2020)
Traditional methods for investigating park usage rely on questionnaires, which are very time- and resource-consuming. Today, more than four billion people use some form of social media platform daily. As a result, huge amounts of data are generated every day on social media platforms, creating a potential new source of large amounts of data. This report investigates a modern approach: using Natural Language Processing on Twitter data to understand how parks in Stockholm are used. Natural Language Processing (NLP) is an area within artificial intelligence that refers to the process of reading, analyzing, and understanding large amounts of text data, and it is considered to be the future for understanding unstructured text. Twitter data were obtained through Twitter's open API. Data from three parks in Stockholm were collected over the period 2015-2019. Three analyses were then performed: temporal analysis, sentiment analysis, and topic modeling. The results from these analyses show that it is possible to understand which attitudes and activities are associated with visiting parks by applying NLP to social media data. Sentiment analysis is clearly a difficult task for computers and is still at an early stage of development, and the results of the sentiment analysis indicate some uncertainties. To achieve more reliable results, the analysis would need to be based on much more data, use more thorough cleaning methods, and use English tweets. One significant conclusion is that people's attitudes and activities linked to each park are clearly correlated with the attributes of that park. Another clear pattern is that park usage peaks markedly during holiday celebrations, and that positive sentiment is the emotion most strongly linked with park visits.
The findings suggest that future studies should combine the approach in this report with geospatial data from a social media platform where users share their geolocation to a greater extent.
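The sentiment and temporal analyses can be illustrated with a lexicon-based polarity score aggregated per day, which would make holiday peaks in tweet volume and sentiment visible. The tiny lexicon, the helper names, and the data layout below are hypothetical, and this is not necessarily the method used in the thesis; real studies use curated lexicons or trained classifiers (and Swedish tweets would need a Swedish lexicon).

```python
from collections import defaultdict

# Hypothetical toy lexicon for illustration only.
LEXICON = {
    "lovely": 1, "beautiful": 1, "fun": 1, "great": 1,
    "crowded": -1, "dirty": -1, "cold": -1, "boring": -1,
}


def polarity(text):
    """Sum of lexicon scores over a tweet's tokens (>0 positive, <0 negative)."""
    return sum(LEXICON.get(token.strip(".,!?#@"), 0)
               for token in text.lower().split())


def daily_summary(tweets):
    """Aggregate tweets per day.

    tweets: list of (day, text) pairs, where day is an ISO date string
    such as "2019-06-21". Returns {day: (tweet_count, mean_polarity)}
    with days in chronological order.
    """
    scores_by_day = defaultdict(list)
    for day, text in tweets:
        scores_by_day[day].append(polarity(text))
    return {day: (len(scores), sum(scores) / len(scores))
            for day, scores in sorted(scores_by_day.items())}
```

Days whose tweet count stands far above the surrounding days would be candidates for the holiday peaks described above; a lexicon approach like this one also illustrates why sentiment results are uncertain, since it ignores negation, sarcasm, and context.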
|
70 |
Experimental investigation of the near wall flow structure of a low Reynolds number 3-D turbulent boundary layer
Fleming, Jonathan Lee (08 August 2007)
Laser Doppler velocimetry (LDV) measurements and hydrogen-bubble flow-visualization techniques were used to examine the near-wall flow structure of 2-D and 3-D turbulent boundary layers (TBLs) over a range of low Reynolds numbers. The goals of this research were (1) to increase understanding of the flow physics in the near-wall region of turbulent boundary layers, (2) to observe and quantify differences between 2-D and 3-D TBL flow structures, and (3) to document Reynolds number effects for 3-D TBLs. An ultimate application of this work would be improved turbulence modeling for 3-D flows.
The LDV data provide details of the turbulence structure of the 2-D and 3-D TBLs, as well as low-uncertainty skin friction estimates. These results include mean Reynolds stress distributions, flow skewing results, and U and V spectra. Effects of Reynolds number on the 3-D flow were examined where possible. Comparison with results for the same 3-D flow geometry at a significantly higher Reynolds number provided unique insight into the structure of 3-D TBLs. While the 3-D mean and fluctuating velocities were found to be highly dependent on Reynolds number, a previously defined shear stress parameter was found to be invariant with Reynolds number.
The hydrogen-bubble technique was used as a flow-visualization tool to examine the near-wall flow structure of 2-D and 3-D TBLs. Both the quantitative and qualitative results displayed larger turbulent fluctuations with more highly concentrated vorticity regions for the 2-D flow. The 2-D low-speed streaky structures experienced greater interaction with the outer-region high-momentum fluid than was observed for the 3-D flow. The near-wall 3-D flow structures were generally more quiescent. Numerical parameters quantified the observed differences and characterized the low-speed streak and high-speed sweep events. All observations indicated a more stable near-wall flow structure, with fewer turbulent interactions occurring between the inner and log regions, for a 3-D TBL. / Ph. D.
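The Reynolds stresses and flow skewing mentioned above are computed from coincident velocity samples by averaging products of fluctuations about the mean. The sketch below shows the basic arithmetic under a simplifying assumption: plain (unweighted) ensemble averages, whereas LDV data usually require bias corrections such as transit-time weighting. The function name and the flow-angle definition, `atan2(W, U)` in the wall-parallel plane, are assumptions for illustration.

```python
import math


def reynolds_stats(u, v, w):
    """Mean velocities, Reynolds stress components and mean flow angle.

    u, v, w: equal-length lists of coincident velocity samples.
    Uses simple unweighted averages; no LDV bias correction is applied.
    The "uv" entry is the correlation <u'v'>; the kinematic Reynolds
    shear stress is its negative, -<u'v'>.
    """
    n = len(u)
    U, V, W = sum(u) / n, sum(v) / n, sum(w) / n
    # Velocity fluctuations about the mean.
    up = [x - U for x in u]
    vp = [x - V for x in v]
    wp = [x - W for x in w]

    def avg(a, b):
        return sum(x * y for x, y in zip(a, b)) / n

    return {
        "U": U, "V": V, "W": W,
        "uu": avg(up, up), "vv": avg(vp, vp), "ww": avg(wp, wp),
        "uv": avg(up, vp), "uw": avg(up, wp), "vw": avg(vp, wp),
        # Mean flow direction in the wall-parallel plane, in degrees.
        "flow_angle_deg": math.degrees(math.atan2(W, U)),
    }
```

Evaluating such statistics at each wall-normal position gives profiles like the mean Reynolds stress distributions, and the change in flow angle with distance from the wall is one common way of expressing the skewing of a 3-D boundary layer.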
|