321 |
OLLDA: Dynamic and Scalable Topic Modelling for Twitter : AN ONLINE SUPERVISED LATENT DIRICHLET ALLOCATION ALGORITHM
Jaradat, Shatha January 2015 (has links)
Inferring high-quality topics from today's large and dynamic corpora, such as Twitter, is a challenging task, especially because the content consists of short texts with many abbreviations. This project proposes an improvement to a popular online topic modelling algorithm for Latent Dirichlet Allocation (LDA) by incorporating supervision to make it suitable for the Twitter context. The improvement is motivated by the need for a single algorithm that achieves two objectives: analyzing huge numbers of documents, including new documents arriving in a stream, and at the same time detecting topics with high quality in special-case environments such as Twitter. The proposed algorithm combines an online algorithm for LDA with a supervised variant of LDA, Labeled LDA. The performance and quality of the proposed algorithm are compared with those of these two algorithms. The results show that the proposed algorithm achieves better performance and quality than the supervised variant of LDA, and better quality than the online algorithm. These improvements make the algorithm an attractive option for dynamic environments like Twitter. An environment for analyzing and labelling data was designed to prepare the dataset before the experiments were run. Possible application areas for the proposed algorithm are tweet recommendation and trend detection.
|
322 |
Topic propagation over time in internet security conferences : Topic modeling as a tool to investigate trends for future research / Ämnesspridning över tid inom säkerhetskonferenser med hjälp av topic modeling
Johansson, Richard, Engström Heino, Otto January 2021 (has links)
When conducting research, it is valuable to find high-ranked papers closely related to the specific research area without spending too much time reading insignificant papers. An automated process for extracting topics from documents would make this search more effective, and topic modeling makes such a process possible. Topic modeling can also be used to reveal topic trends, when a topic was first mentioned, and who the original author was. In this paper, over 5000 articles are scraped from four top-ranked internet security conferences using a web scraper built in Python. Fourteen topics are extracted from the articles using the topic modeling library Gensim and LDA Mallet, and the topics are visualized in graphs to show which topics emerge and which fade away over twenty years. The result of this research is that topic modeling is a powerful tool for extracting topics and that, when the topics are placed in a time perspective, it is possible to identify topic trends, which can be explained when put into a bigger context.
|
323 |
Automatic variance adjusted Bayesian inference with pseudo likelihood under unequal probability sampling: imputation and data synthetic
Almomani, Ayat January 2021 (has links)
No description available.
|
324 |
Semi-parametric Survival Analysis via Dirichlet Process Mixtures of the First Hitting Time Model
Race, Jonathan Andrew January 2019 (has links)
No description available.
|
325 |
Evaluation of applying Crum-based transformation in solving two point boundary value problems
Jogiat, Aasif January 2016 (has links)
A dissertation submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science in Engineering, Johannesburg, 2016 / The aim of this research project is to evaluate the application of the Crum-based transformation in solving engineering systems modelled as two-point boundary value problems. The boundary value problems were subjected to various combinations of Dirichlet, Non-Dirichlet and Affine boundary conditions. The engineering systems modelled were in the fields of electrostatics, heat conduction and longitudinal vibrations. Other methods, such as Z-transforms and iterative methods, are also discussed. An attractive property of the Crum-based transformation is that it can be applied to cases where the eigenparameters (a function of the eigenvalues) generated in the discrete case are negative, and it was therefore chosen for further exploration in this dissertation. An alternative matrix method was proposed and used instead of the algebraic method in the Crum-based transformation. The matrix method was tested against the algebraic method using three unit intervals. The analysis revealed that, as the number of unit intervals increases, there is a general increase in the accuracy of the approximated continuous-case eigenvalues generated for the discrete case. The other general trend observed was that the accuracy of the approximated continuous-case eigenvalues decreases as one ascends the continuous-case eigenvalue spectrum. Three cases, (Affine, Dirichlet), (Affine, Non-Dirichlet) and (Affine, Affine), generated negative eigenparameters. The approximated continuous-case eigenvalues derived from the negative eigenparameters were shown not to represent true physical natural frequencies, since the discrete eigenvalues derived from negative eigenparameters do not satisfy the condition for purely oscillatory behaviour. The research has also shown that the Crum-based transformation method is useful for approximating the shifted eigenvalues of the continuous case where the generated eigenparameters are negative, since the post-transformed approximated eigenvalues improve in accuracy as the number of unit intervals increases. The accuracy was also found to be better in the post-transformed case than in the pre-transformed case. Furthermore, the approximated non-shifted and shifted continuous-case eigenvalues (except the approximated continuous-case eigenvalues generated from negative eigenparameters) satisfied the condition for purely oscillatory behaviour. / MT2017
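The two accuracy trends described above can be seen in a much simpler setting than the dissertation's. The sketch below does not implement the Crum-based transformation; it assumes the textbook second-difference discretization of -y'' = λy on [0, 1] with Dirichlet conditions at both ends, for which the discrete eigenvalues have a closed form. The function names are illustrative only.

```python
import math

def discrete_dirichlet_eigenvalues(n):
    """Eigenvalues of the second-difference approximation of -y'' = lambda*y
    on [0, 1] with y(0) = y(1) = 0, using n equal subintervals (h = 1/n)."""
    h = 1.0 / n
    return [(4.0 / h**2) * math.sin(k * math.pi * h / 2.0) ** 2
            for k in range(1, n)]

def relative_errors(n, count):
    """Relative error of the first `count` discrete eigenvalues against the
    continuous-case eigenvalues (k*pi)^2. Requires count <= n - 1."""
    exact = [(k * math.pi) ** 2 for k in range(1, count + 1)]
    approx = discrete_dirichlet_eigenvalues(n)[:count]
    return [abs(a - e) / e for a, e in zip(approx, exact)]

if __name__ == "__main__":
    for n in (10, 20, 40):
        # Errors shrink as n grows and grow with the eigenvalue index k.
        print(n, [f"{err:.2e}" for err in relative_errors(n, 3)])
```

In this simplified case the relative error shrinks as the number of intervals grows and grows with the eigenvalue index, which matches the general trends reported in the abstract for the cases studied there.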
|
326 |
The pro-C anabelian geometry of number fields / 数体の副C遠アーベル幾何について
Shimizu, Ryoji 23 March 2023 (has links)
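Kyoto University / New-system doctorate by coursework / Doctor of Science / Kō No. 24392 / Rihaku No. 4891 / 新制||理||1699 (University Library) / Division of Mathematics and Mathematical Sciences, Graduate School of Science, Kyoto University / (Chief examiner) Professor Akio Tamagawa, Professor Yoshinori Namikawa, Professor Shinichi Mochizuki / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DGAM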
|
327 |
Weak functional inequalities in the setting of discrete graphs
Popert, Aldo 04 March 2024 (has links)
Abstract: This thesis explores the application of isoperimetric functions to derive weak functional inequalities involving Dirichlet forms. The connection between such weak functional inequalities and bounds on the convergence speed of the corresponding Markov semigroup is established. Three examples of discrete graphs and their corresponding Dirichlet forms are discussed.
|
328 |
Topic Modeling for Customer Insights : A Comparative Analysis of LDA and BERTopic in Categorizing Customer Calls
Axelborn, Henrik, Berggren, John January 2023 (has links)
Customer calls serve as a valuable source of feedback for financial service providers, potentially containing a wealth of unexplored insights into customer questions and concerns. However, call data are typically unstructured and challenging to analyze effectively. This thesis project focuses on using Topic Modeling techniques, a sub-field of Natural Language Processing, to extract meaningful customer insights from recorded customer calls to a European financial service provider. The objective of the study is to compare two widely used Topic Modeling algorithms, Latent Dirichlet Allocation (LDA) and BERTopic, for categorizing and analyzing the content of the calls. With these algorithms, the thesis aims to provide the company with a comprehensive understanding of customer needs, preferences, and concerns, ultimately facilitating more effective decision-making. Following a literature review and dataset analysis, i.e., pre-processing to ensure data quality and consistency, the two algorithms are applied to extract latent topics. Their performance is then evaluated using quantitative and qualitative measures, i.e., perplexity and coherence scores as well as the interpretability and usefulness of the topics. The findings contribute to knowledge on Topic Modeling for customer insights and enable the company to improve customer engagement and satisfaction and to tailor its customer strategies. The results show that LDA outperforms BERTopic in terms of topic quality and business value. Although BERTopic demonstrates slightly better quantitative performance, LDA aligns much better with human interpretation, indicating a stronger ability to capture meaningful and coherent topics within the company's customer call data.
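For readers unfamiliar with perplexity, one of the quantitative measures named above, the following minimal sketch shows the standard definition: the exponential of the negative mean log-likelihood per held-out token. It is not taken from the thesis; the function name and the example probabilities are illustrative assumptions.

```python
import math

def perplexity(token_probabilities):
    """Perplexity of held-out tokens, given the probability the model
    assigns to each token: exp(-(1/N) * sum(log p_i))."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    log_likelihood = sum(math.log(p) for p in token_probabilities)
    return math.exp(-log_likelihood / len(token_probabilities))

if __name__ == "__main__":
    # Hypothetical per-token probabilities from two models on the same text.
    print(perplexity([0.20, 0.10, 0.25, 0.05]))
    print(perplexity([0.02, 0.01, 0.03, 0.01]))
```

Lower perplexity means the model assigns higher probability to unseen text. As the abstract notes, a better score on such measures does not necessarily mean the topics are easier for people to interpret.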
|
329 |
Dirichlet-to-Neumann maps and Nonlinear eigenvalue problems
Jernström, Tindra, Öhman, Anna January 2023 (has links)
Differential equations arise frequently in the modeling of physical systems, often resulting in linear eigenvalue problems. However, when dealing with large physical domains, solving such problems can be computationally expensive. This thesis examines an alternative approach, which uses absorbing boundary conditions and Dirichlet-to-Neumann maps to transform the large sparse linear eigenvalue problem into a smaller nonlinear eigenvalue problem (NEP). The NEP is then solved using the augmented Newton's method. The specific equation investigated in this thesis is the two-dimensional Helmholtz equation, defined on the domain (x, y) ∈ [0, 10] × [0, 1], with the absorbing boundary condition introduced at x = 1. The results show a significant reduction in computational time when using this method compared to solving the original linear problem, making it a valuable tool for solving large linear eigenvalue problems. Another result is that the NEP does not increase the computational error compared to solving the linear problem, which further supports the NEP as an attractive alternative method.
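The abstract does not spell out the augmented formulation used in the thesis. A common way to write Newton's method for an NEP is sketched below as an assumption, with $T(\lambda)$ denoting the matrix-valued function of the NEP, $v$ the eigenvector and $c$ a fixed normalization vector (all notation introduced here for illustration):

```latex
% Assumed notation: T(\lambda) is the NEP matrix function, v the eigenvector,
% c a fixed normalization vector; none of these symbols come from the abstract.
\[
F(v,\lambda) =
\begin{pmatrix} T(\lambda)\,v \\ c^{H} v - 1 \end{pmatrix} = 0,
\qquad
\begin{pmatrix} T(\lambda_k) & T'(\lambda_k)\,v_k \\ c^{H} & 0 \end{pmatrix}
\begin{pmatrix} \Delta v_k \\ \Delta\lambda_k \end{pmatrix}
= -F(v_k,\lambda_k),
\]
\[
v_{k+1} = v_k + \Delta v_k, \qquad \lambda_{k+1} = \lambda_k + \Delta\lambda_k .
\]
```

The extra row $c^{H} v - 1 = 0$ fixes the scaling of the eigenvector, so that the eigenpair $(v, \lambda)$ is an isolated root of $F$ and Newton's method can be applied to both unknowns at once.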
|
330 |
Views or news? : Exploring the interplay of production and consumption of political news content on YouTube
Darin, Jasper January 2023 (has links)
YouTube is the second largest social media platform in the world, with a multitude of popular channels that combine politicised commentary with news reporting. The platform provides direct access to data, which makes it possible for commentators to adjust their content to reach wider audiences. Taken to an extreme, however, this could mean that creators pick the topics that are most financially beneficial or most likely to lead to fame. If this were the case, it would highlight populist newsmaking and the mechanisms behind it. To investigate the production-consumption interaction, data from the 10 most popular channels for 2021 was collected. Using latent Dirichlet allocation and preferential attachment analysis, the effect of cumulative advantage, and whether topic choice was driven by views, were measured. A positive feedback loop, where prevalent topics become more prevalent, was found in all but two channels, but picking topics that generated more views was present for only one channel. The findings imply that the top political news commentators have a set of topics to which they return frequently over a year, but that choosing whichever topics are most popular at the time is not a general feature.
|