• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 77
  • 7
  • 4
  • 3
  • 2
  • 1
  • 1
  • Tagged with
  • 132
  • 132
  • 43
  • 40
  • 36
  • 33
  • 32
  • 28
  • 25
  • 24
  • 23
  • 23
  • 22
  • 21
  • 20
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
101

Réseaux de service web : construction, analyse et applications / Web service networks : analysis, construction and applications

Naim, Hafida 13 December 2017 (has links)
Cette thèse se place dans le cadre de services web en dépassant leur description pour considérer leur structuration en réseaux (réseaux d'interaction et réseaux de similitude). Nous proposons des méthodes basées sur les motifs, la modélisation probabiliste et l'analyse des concepts formels, pour améliorer la qualité des services découverts. Trois contributions sont alors proposées: découverte de services diversifiés, recommandation de services et cohérence des communautés de services détectées. Nous structurons d'abord les services sous forme de réseaux. Afin de diversifier les résultats de la découverte, nous proposons une méthode probabiliste qui se base à la fois sur la pertinence, la diversité et la densité des services. Dans le cas de requêtes complexes, nous exploitons le réseau d'interaction de services construit et la notion de diversité dans les graphes pour identifier les services web qui sont susceptibles d'être composables. Nous proposons également un système de recommandation hybride basé sur le contenu et le filtrage collaboratif. L'originalité de la méthode proposée vient de la combinaison des modèles thématiques et les motifs fréquents pour capturer la sémantique commune maximale d'un ensemble de services. Enfin, au lieu de ne traiter que des services individuels, nous considérons aussi un ensemble de services regroupés sous forme de communautés de services pour la recommandation. Nous proposons dans ce contexte, une méthode qui combine la sémantique et la topologie dans les réseaux afin d'évaluer la qualité et la cohérence sémantique des communautés détectées, et classer également les algorithmes de détection de communautés. / As a part of this thesis, we exceed the description of web services to consider their structure as networks (i.e. similarity and interaction web service networks). We propose methods based on patterns, topic models and formal concept analysis, to improve the quality of discovered services. Three contributions are then proposed: (1) diversified services discovery, (2) services recommendation and (3) consistency of detected communities. Firstly, we propose modeling the space of web services through networks. To discover the diversified services corresponding to a given query, we propose a probabilistic method to diversify the discovery results based on relevancy, diversity and service density. In case of complex requests, it is necessary to combine multiple web services to fulfill this kind of requests. In this regard, we use the interaction web service network and the diversity notion in graphs to identify all possible services compositions. We also propose a new hybrid recommendation system based on both content and collaborative filtering. Its originality comes from the combination of probabilistic topic models and pattern mining to capture the maximal common semantic of a set of services. Finally, instead of processing individual services, we consider a set of services grouped into service communities for the recommendation. We propose in this context, a new method combining both topology and semantics to evaluate the quality and the semantic consistency of detected communities, and also rank the detection communities algorithms.
102

What makes an (audio)book popular? / Vad gör en (ljud)bok populär?

Barakat, Arian January 2018 (has links)
Audiobook reading has traditionally been used for educational purposes but has in recent times grown into a popular alternative to the more traditional means of consuming literature. In order to differentiate themselves from other players in the market, but also provide their users enjoyable literature, several audiobook companies have lately directed their efforts on producing own content. Creating highly rated content is, however, no easy task and one reoccurring challenge is how to make a bestselling story. In an attempt to identify latent features shared by successful audiobooks and evaluate proposed methods for literary quantification, this thesis employs an array of frameworks from the field of Statistics, Machine Learning and Natural Language Processing on data and literature provided by Storytel - Sweden’s largest audiobook company. We analyze and identify important features from a collection of 3077 Swedish books concerning their promotional and literary success. By considering features from the aspects Metadata, Theme, Plot, Style and Readability, we found that popular books are typically published as a book series, cover 1-3 central topics, write about, e.g., daughter-mother relationships and human closeness but that they also hold, on average, a higher proportion of verbs and a lower degree of short words. Despite successfully identifying these, but also other factors, we recognized that none of our models predicted “bestseller” adequately and that future work may desire to study additional factors, employ other models or even use different metrics to define and measure popularity. From our evaluation of the literary quantification methods, namely topic modeling and narrative approximation, we found that these methods are, in general, suitable for Swedish texts but that they require further improvement and experimentation to be successfully deployed for Swedish literature. For topic modeling, we recognized that the sole use of nouns provided more interpretable topics and that the inclusion of character names tended to pollute the topics. We also identified and discussed the possible problem of word inflections when modeling topics for more morphologically complex languages, and that additional preprocessing treatments such as word lemmatization or post-training text normalization may improve the quality and interpretability of topics. For the narrative approximation, we discovered that the method currently suffers from three shortcomings: (1) unreliable sentence segmentation, (2) unsatisfactory dictionary-based sentiment analysis and (3) the possible loss of sentiment information induced by translations. Despite only examining a handful of literary work, we further found that books written initially in Swedish had narratives that were more cross-language consistent compared to books written in English and then translated to Swedish.
103

Labeling Clinical Reports with Active Learning and Topic Modeling / Uppmärkning av kliniska rapporter med active learning och topic modeller

Lindblad, Simon January 2018 (has links)
Supervised machine learning models require a labeled data set of high quality in order to perform well. Available text data often exists in abundance, but it is usually not labeled. Labeling text data is a time consuming process, especially in the case where multiple labels can be assigned to a single text document. The purpose of this thesis was to make the labeling process of clinical reports as effective and effortless as possible by evaluating different multi-label active learning strategies. The goal of the strategies was to reduce the number of labeled documents a model needs, and increase the quality of those documents. With the strategies, an accuracy of 89% was achieved with 2500 reports, compared to 85% with random sampling. In addition to this, 85% accuracy could be reached after labeling 975 reports, compared to 1700 reports with random sampling.
104

Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds / Mining and learning from multilingual text collections using topic models and word embeddings

Balikas, Georgios 20 October 2017 (has links)
Le texte est l'une des sources d'informations les plus répandues et les plus persistantes. L'analyse de contenu du texte se réfère à des méthodes d'étude et de récupération d'informations à partir de documents. Aujourd'hui, avec une quantité de texte disponible en ligne toujours croissante l'analyse de contenu du texte revêt une grande importance parce qu' elle permet une variété d'applications. À cette fin, les méthodes d'apprentissage de la représentation sans supervision telles que les modèles thématiques et les word embeddings constituent des outils importants.L'objectif de cette dissertation est d'étudier et de relever des défis dans ce domaine.Dans la première partie de la thèse, nous nous concentrons sur les modèles thématiques et plus précisément sur la manière d'incorporer des informations antérieures sur la structure du texte à ces modèles.Les modèles de sujets sont basés sur le principe du sac-de-mots et, par conséquent, les mots sont échangeables. Bien que cette hypothèse profite les calculs des probabilités conditionnelles, cela entraîne une perte d'information.Pour éviter cette limitation, nous proposons deux mécanismes qui étendent les modèles de sujets en intégrant leur connaissance de la structure du texte. Nous supposons que les documents sont répartis dans des segments de texte cohérents. Le premier mécanisme attribue le même sujet aux mots d'un segment. La seconde, capitalise sur les propriétés de copulas, un outil principalement utilisé dans les domaines de l'économie et de la gestion des risques, qui sert à modéliser les distributions communes de densité de probabilité des variables aléatoires tout en n'accédant qu'à leurs marginaux.La deuxième partie de la thèse explore les modèles de sujets bilingues pour les collections comparables avec des alignements de documents explicites. En règle générale, une collection de documents pour ces modèles se présente sous la forme de paires de documents comparables. Les documents d'une paire sont écrits dans différentes langues et sont thématiquement similaires. À moins de traductions, les documents d'une paire sont semblables dans une certaine mesure seulement. Pendant ce temps, les modèles de sujets représentatifs supposent que les documents ont des distributions thématiques identiques, ce qui constitue une hypothèse forte et limitante. Pour le surmonter, nous proposons de nouveaux modèles thématiques bilingues qui intègrent la notion de similitude interlingue des documents qui constituent les paires dans leurs processus générateurs et d'inférence.La dernière partie de la thèse porte sur l'utilisation d'embeddings de mots et de réseaux de neurones pour trois applications d'exploration de texte. Tout d'abord, nous abordons la classification du document polylinguistique où nous soutenons que les traductions d'un document peuvent être utilisées pour enrichir sa représentation. À l'aide d'un codeur automatique pour obtenir ces représentations de documents robustes, nous démontrons des améliorations dans la tâche de classification de documents multi-classes. Deuxièmement, nous explorons la classification des tweets à plusieurs tâches en soutenant que, en formant conjointement des systèmes de classification utilisant des tâches corrélées, on peut améliorer la performance obtenue. À cette fin, nous montrons comment réaliser des performances de pointe sur une tâche de classification du sentiment en utilisant des réseaux neuronaux récurrents. La troisième application que nous explorons est la récupération d'informations entre langues. Compte tenu d'un document écrit dans une langue, la tâche consiste à récupérer les documents les plus similaires à partir d'un ensemble de documents écrits dans une autre langue. Dans cette ligne de recherche, nous montrons qu'en adaptant le problème du transport pour la tâche d'estimation des distances documentaires, on peut obtenir des améliorations importantes. / Text is one of the most pervasive and persistent sources of information. Content analysis of text in its broad sense refers to methods for studying and retrieving information from documents. Nowadays, with the ever increasing amounts of text becoming available online is several languages and different styles, content analysis of text is of tremendous importance as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools.The goal of this dissertation is to study and address challengingproblems in this area, focusing on both the design of novel text miningalgorithms and tools, as well as on studying how these tools can be applied to text collections written in a single or several languages.In the first part of the thesis we focus on topic models and more precisely on how to incorporate prior information of text structure to such models.Topic models are built on the premise of bag-of-words, and therefore words are exchangeable. While this assumption benefits the calculations of the conditional probabilities it results in loss of information.To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure to them. We assume that the documents are partitioned in thematically coherent text segments. The first mechanism assigns the same topic to the words of a segment. The second, capitalizes on the properties of copulas, a tool mainly used in the fields of economics and risk management that is used to model the joint probability density distributions of random variables while having access only to their marginals.The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models is in the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. Unless translations, the documents of a pair are similar to some extent only. Meanwhile, representative topic models assume that the documents have identical topic distributions, which is a strong and limiting assumption. To overcome it we propose novel bilingual topic models that incorporate the notion of cross-lingual similarity of the documents that constitute the pairs in their generative and inference processes. Calculating this cross-lingual document similarity is a task on itself, which we propose to address using cross-lingual word embeddings.The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification where we argue that translations of a document can be used to enrich its representation. Using an auto-encoder to obtain these robust document representations we demonstrate improvements in the task of multi-class document classification. Second, we explore multi-task sentiment classification of tweets arguing that by jointly training classification systems using correlated tasks can improve the obtained performance. To this end we show how can achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval. Given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that by adapting the transportation problem for the task of estimating document distances one can achieve important improvements.
105

Exploring NMF and LDA Topic Models of Swedish News Articles

Svensson, Karin, Blad, Johan January 2020 (has links)
The ability to automatically analyze and segment news articles by their content is a growing research field. This thesis explores the unsupervised machine learning method topic modeling applied on Swedish news articles for generating topics to describe and segment articles. Specifically, the algorithms non-negative matrix factorization (NMF) and the latent Dirichlet allocation (LDA) are implemented and evaluated. Their usefulness in the news media industry is assessed by its ability to serve as a uniform categorization framework for news articles. This thesis fills a research gap by studying the application of topic modeling on Swedish news articles and contributes by showing that this can yield meaningful results. It is shown that Swedish text data requires extensive data preparation for successful topic models and that nouns exclusively and especially common nouns are the most suitable words to use. Furthermore, the results show that both NMF and LDA are valuable as content analysis tools and categorization frameworks, but they have different characteristics, hence optimal for different use cases. Lastly, the conclusion is that topic models have issues since they can generate unreliable topics that could be misleading for news consumers, but that they nonetheless can be powerful methods for analyzing and segmenting articles efficiently on a grand scale by organizations internally. The thesis project is a collaboration with one of Sweden’s largest media groups and its results have led to a topic modeling implementation for large-scale content analysis to gain insight into readers’ interests.
106

Sdílená ekonomika v kontextu postmateriálních hodnot: případ segmentu ubytování v Praze / Sharing Economy in the Context of Postmaterial Values: The Case of Accommodation Segment in Prague

Svobodová, Tereza January 2020 (has links)
This master's thesis is about the success of sharing economy in the accommodation segment in Prague. The thesis is based on theories conceptualizing sharing economy as a result of social and value change, not only as technological one. Using online review data, the user experience of shared accommodation via Airbnb and traditional via Booking are compared. Analysis is conducted with focus on users' satisfied needs and fulfilled values. For processing the data, text mining techniques (topic modelling and sentiment analysis) were employed. The major result is that in Prague the models of sharing economy accommodation meets the growing need in society to fulfil post-material values in the market much better than the models of traditional accommodation (hotels, hostels, boarding houses). In their experiences, Airbnb users reflect social and emotional values more often, even though most sharing economy accommodations in Prague do not involve any physical sharing with the host. The thesis thus brings a unique perspective on the Airbnb phenomenon in the Czech context and contributes to the discussion of why the market share of the sharing economy in the accommodation segment in Prague has been growing, while traditional models stagnated.
107

Neural probabilistic topic modeling of short and messy text / Neuronprobabilistisk ämnesmodellering av kort och stökig text

Harrysson, Mattias January 2016 (has links)
Exploring massive amount of user generated data with topics posits a new way to find useful information. The topics are assumed to be “hidden” and must be “uncovered” by statistical methods such as topic modeling. However, the user generated data is typically short and messy e.g. informal chat conversations, heavy use of slang words and “noise” which could be URL’s or other forms of pseudo-text. This type of data is difficult to process for most natural language processing methods, including topic modeling. This thesis attempts to find the approach that objectively give the better topics from short and messy text in a comparative study. The compared approaches are latent Dirichlet allocation (LDA), Re-organized LDA (RO-LDA), Gaussian Mixture Model (GMM) with distributed representation of words, and a new approach based on previous work named Neural Probabilistic Topic Modeling (NPTM). It could only be concluded that NPTM have a tendency to achieve better topics on short and messy text than LDA and RO-LDA. GMM on the other hand could not produce any meaningful results at all. The results are less conclusive since NPTM suffers from long running times which prevented enough samples to be obtained for a statistical test. / Att utforska enorma mängder användargenererad data med ämnen postulerar ett nytt sätt att hitta användbar information. Ämnena antas vara “gömda” och måste “avtäckas” med statistiska metoder såsom ämnesmodellering. Dock är användargenererad data generellt sätt kort och stökig t.ex. informella chattkonversationer, mycket slangord och “brus” som kan vara URL:er eller andra former av pseudo-text. Denna typ av data är svår att bearbeta för de flesta algoritmer i naturligt språk, inklusive ämnesmodellering. Det här arbetet har försökt hitta den metod som objektivt ger dem bättre ämnena ur kort och stökig text i en jämförande studie. De metoder som jämfördes var latent Dirichlet allocation (LDA), Re-organized LDA (RO-LDA), Gaussian Mixture Model (GMM) with distributed representation of words samt en egen metod med namnet Neural Probabilistic Topic Modeling (NPTM) baserat på tidigare arbeten. Den slutsats som kan dras är att NPTM har en tendens att ge bättre ämnen på kort och stökig text jämfört med LDA och RO-LDA. GMM lyckades inte ge några meningsfulla resultat alls. Resultaten är mindre bevisande eftersom NPTM har problem med långa körtider vilket innebär att tillräckligt många stickprov inte kunde erhållas för ett statistiskt test.
108

The impact of sentiment and misinformation cycling through the social media platform, Twitter, during the initial phase of the COVID-19 vaccine rollout

Burwell, Emily Grace 01 June 2022 (has links)
No description available.
109

Exploring Hybrid Topic Based Sentiment Analysis as Author Identification Method on Swedish Documents

Jakob, Bremer January 2021 (has links)
The Swedish national bank has had shifting policies when it comes to publicity and confidentiality concerning publishing of texts within the bank. For some time, texts written by commissioners within the bank were decided to be published anonymously. Later they revoked the confidentiality policy, publishing all documents publicly again. This led to emerged interests in possible shifting attitudes toward topics discussed by the commissioners when writing anonymously versus publicly. On a request, based on the interests, there are ongoing analyses being conducted with the help of language technology where topics are extracted from the anonymous and public documents respectively. The aim is to find topics related to individual commissioners with the purpose of, as accurately as possible, identifying which of the anonymous documents is written by who. To discover unique relations between the commissioners and the generated topics, this thesis proposes hybrid topic based sentiment analysis as an author identification method to be able to use sentiments of topics as identifying features of commissioners. The results showed promise in the proposed approach. Though, further research is substantial, conducting comparisons with other acknowledged author identification methods, to confirm some level of efficacy, especially on documents containing close similarities among topics.
110

Sentiment Analysis of MOOC learner reviews : What motivates learners to complete a course?

Knöös, Johanna, Rääf, Siri Amanda January 2021 (has links)
In the last decade, development of Information and Communication Technology (ICT) thatsupports online learning has increased the demand for e-learning and Massive Open OnlineCourses (MOOCs). Despite their increased popularity, MOOCs are struggling with highdropout rates and only a small percentage of learners complete the courses they enrolled in. Thepurpose of this thesis is to gain knowledge about MOOC learner behaviour. The aim of thestudy is to identify the motivations of learners and how these differ between learners whocompleted a course and those who dropped out. Research on MOOC learners has mostly beencarried out using a quantitative approach. While quantitative methodologies are effective inhandling the large amount of data produced by MOOCs, qualitative methods can give deeperinsights into online learners’ motivations. Therefore, this thesis employs an explanatorysequential mixed methods research, in which sentiment analysis and topic modeling of learnerreviews from the platform Coursera are further explained by qualitative interviews with MOOClearners. In the study 28,000 reviews scraped from five courses within the fields of data sciencewere analyzed and ten interviews were held with learners who either completed, dropped outfrom or both completed and dropped out from a MOOC. In the quantitative analysis nine coursefactors were found that learners wrote about: content, delivery, assessment, learning experience,tools, video material, teaching style, instructor skills and course provider. In addition, eighteenthemes were yielded from the interviews: self-discipline, just for fun, certificates, personaldevelopment, knowledge, career, time, equipment, practical exercise, interaction, instructor,reality, structure, external material, cost, community, degree of difficulty and other. In thediscussion the empirical findings are reflected upon using the theoretical framework of theresearch and the literature review. The result does not reveal any differences in motivationsbetween learners who completed a course and those who dropped out, however, it does identifyfactors that caused learners’ to drop out and the topics that most negative learner reviews wereabout. This research contributes to the body of knowledge in the field of research on MOOClearner retention and motivations. The topic is relevant for research in education informaticsand for continued improvements in delivery of MOOCs.

Page generated in 0.0687 seconds