1 |
Aspect discovery and sentiment classification for online reviews
Burns, Nicola, January 2013
Buying products and services online is becoming increasingly popular, and as a result there is a vast number of online reviews. Automatic classification of this increasingly large body of data has become a popular area of recent research, because the information contained in these reviews is valuable to potential customers and for marketing intelligence. The work in this thesis focuses on discovering the aspects and sentiment of online reviews using an approach based on topic modelling. Sentiment analysis aims to discover opinions automatically, whereas topic modelling discovers latent topics; here, topic modelling is combined with sentiment analysis techniques to create an effective approach to sentiment analysis. Three problems are addressed in this work. Firstly, the classes of real-world product reviews tend to be highly imbalanced. When dealing with unbalanced data, data miners usually pre-process the data so that the classes are balanced. This work therefore compares balanced and unbalanced datasets and aims to answer the question: should unbalanced datasets be artificially balanced, or modelled as they are? A series of experiments is performed to investigate the datasets in different scenarios. The experimental results provide evidence that, within the product review domain, there is no need to artificially balance a dataset, as sentiment analysis on an unbalanced dataset performs better than on a balanced one. Secondly, the LDA (Latent Dirichlet Allocation) model is a popular choice for topic modelling; however, it has shortcomings, including identifying topics that could be considered too broad and requiring manual work to label all the topics produced.
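The abstract contrasts keeping review data unbalanced with artificially balancing it. A common way to balance a dataset is random undersampling of the majority class. The sketch below illustrates that generic technique under the assumption that it is the kind of balancing meant; the abstract does not say which method the thesis used, and `undersample` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def undersample(reviews, labels, seed=0):
    """Random undersampling: keep a random subset of each class so that every
    class ends up with as many examples as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for review, label in zip(reviews, labels):
        by_class[label].append(review)
    # Size of the minority class is the target size for every class.
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for review in rng.sample(items, target):
            balanced.append((review, label))
    rng.shuffle(balanced)
    return balanced
```

With 8 positive and 2 negative reviews, this keeps both negative reviews and a random 2 of the positive ones. The remaining 6 positive reviews are discarded, which is the loss of information that keeping the data unbalanced avoids.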
This work proposes a novel method, the Twofold-LDA model, to identify aspects and quantify sentiment. It incorporates domain knowledge, removes the one-aspect-per-sentence assumption, and extracts information that allows the sentiment analysis results to be presented in a user-friendly way. Finally, no previous work is known to focus on improving topic modelling for sentiment classification. As past studies show that sentiment analysis techniques perform well at identifying sentiment, this work examines how to incorporate sentiment analysis techniques into the topic modelling process. The Enhanced Twofold-LDA model is proposed, which incorporates part-of-speech tagging into the topic modelling process by altering the Gibbs sampling process. A case study demonstrates the ability of the Enhanced Twofold-LDA model to solve practical problems, in particular by creating an end-user application aimed at hotel customers.
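The Enhanced Twofold-LDA model is described as altering the Gibbs sampling process. For orientation, here is a minimal sketch of the standard collapsed Gibbs sampler for plain LDA, which is the kind of baseline such modifications start from. It is not the Twofold-LDA model or its part-of-speech extension, and the hyperparameter defaults are illustrative assumptions. Each token's topic is resampled from the full conditional p(z = t | rest) proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns the final topic assignment of every token and the estimated
    topic-word distributions phi[k][w].
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    nk = [0] * K                        # tokens assigned to topic k overall

    # Random initial assignment of a topic to every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional for each candidate topic t.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimate of each topic's word distribution.
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)] for k in range(K)]
    return z, phi
```

A modified sampler of the kind the abstract describes would change how `weights` is computed, for example by also conditioning on a token's part-of-speech tag. The abstract does not specify how the thesis does this.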
|
2 |
Analyse et application de la diffusion d'information dans les microblogs / The analysis and applications of information diffusion in microblogs
Wang, Dong, 22 October 2015
Microblogging services (such as Twitter or Sina Weibo) have in recent years become very important platforms for sharing information on the Internet. Microblogs are frequently used for opinion analysis, viral marketing and political campaigns. It is important to understand the underlying mechanisms of information diffusion on microblogs and how content becomes popular. Analysing information diffusion in microblogs requires collecting microblog data, modelling information diffusion and applying the resulting models. Processing the massive data produced by microblogs is a challenge in itself, so designing efficient and unbiased algorithms for sampling microblogs is fundamental. This must take into account the complexity of the "retweet" phenomenon, which depends on the ephemeral value of information, the topology of the microblogging network and the particular characteristics of publishers and retweeters. Two models have traditionally been applied to information diffusion: independent cascades and the linear threshold model. Neither of these models can describe the retweeting process correctly, so it becomes necessary to characterise information diffusion. Moreover, a complete description of the relationship between information diffusion in microblogs and the popularity of search terms on the Internet would be useful. This thesis presents a complete analysis of information diffusion in microblogs. Its contributions are as follows: 1) There are two unbiased sampling techniques for social networks: the Metropolis-Hastings random walk (MHRW) and the unbiased sampling method for directed graphs (USDSG).
Nevertheless, both methods can lead to a high rate of self-sampling when applied to microblogs. To solve this problem, I modelled the sampling of an OSN as a Markov process and derived the necessary and sufficient conditions for unbiased sampling. These conditions allowed me to propose an efficient, unbiased sampling algorithm, which I named unbiased sampling with dummy edges (USDE). This new sampling method greatly reduces the self-sampling of MHRW. Empirical evaluation shows that the average degree of the sampled nodes is close to the ground truth, whereas for MHRW and USDSG it is 2 to 4 times higher. 2) The second contribution of this thesis addresses the shortcomings of the independent cascade and linear threshold models. I developed a model based on Galton-Watson processes with killing (GWK) that takes into account all the important factors of the retweeting process. This new model is validated by applying it to data from Twitter and Weibo. 3) The third contribution concerns the development of an economic model of the market of players active in keyword marketing on search sites. I developed methods for managing keyword portfolios and showed that these portfolios can greatly improve returns without increasing the level of risk. / Microblog services (such as Twitter and Sina Weibo) have become an important platform for Internet content sharing.
As information on microblogs is widely used in public opinion mining, viral marketing and political campaigns, it is important to understand how information diffuses over microblogs and to explain the process through which some tweets become popular. Analysing information diffusion in microblogs involves collecting data from microblogs, modelling information spreading, and applying the resulting models. Dealing with the huge amount of data flowing through microblogs is a challenge in itself, so designing an efficient and unbiased sampling algorithm for microblogs is essential. In addition, the retweeting process in microblogs is complex because of the ephemerality of information, the topology of the microblog network, and the particular features (such as the number of followers) of publishers and retweeters. Two traditional models have been used for information diffusion: the Independent Cascades and Linear Threshold models. However, neither of them describes the retweeting process in microblogs completely and accurately, so the analysis and design of new models to characterise information diffusion in microblogs is necessary. Moreover, a comprehensive description of the correlation between information diffusion in microblogs and the search trends of keywords on search engines is lacking, although some work has found preliminary relationships. This work presents a complete analysis of information diffusion in microblogs. The contributions and innovations of this thesis are as follows: 1) There are two popular unbiased Online Social Network (OSN) sampling algorithms, the Metropolis-Hastings Random Walk (MHRW) and the Unbiased Sampling for Directed Social Graph (USDSG) method. However, both are likely to yield considerable self-sampling probabilities when applied to microblogs where there is local.
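The self-sampling problem mentioned for MHRW comes from its acceptance rule. A minimal sketch of the textbook MHRW on an undirected graph, targeting a uniform distribution over nodes, makes the mechanism visible. A rejected proposal keeps the walk at the current node, so that node is sampled again. This sketch does not model the directed-graph USDSG method or the thesis's USDE algorithm, and `mhrw` is a hypothetical helper name.

```python
import random


def mhrw(adj, start, steps, seed=0):
    """Metropolis-Hastings random walk on an undirected graph.

    adj maps each node to a list of its neighbours. The walk's stationary
    distribution is uniform over nodes. Returns the visited sequence and the
    number of steps on which the walk stayed put (self-sampling).
    """
    rng = random.Random(seed)
    u = start
    samples = [u]
    self_samples = 0
    for _ in range(steps):
        v = rng.choice(adj[u])  # propose a uniformly random neighbour
        # Accept with probability min(1, deg(u) / deg(v)).
        if rng.random() < len(adj[u]) / len(adj[v]):
            u = v
        else:
            self_samples += 1  # rejected: the current node is sampled again
        samples.append(u)
    return samples, self_samples
```

On a star graph with one hub and five leaves, a move from a leaf to the hub is accepted only with probability 1/5. The walk therefore re-samples leaves many times in a row, even though the hub's long-run share of samples still approaches its uniform target of 1/6.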
To solve this problem, I modelled the process of OSN sampling as a Markov process and deduced the necessary and sufficient conditions for unbiased sampling. Based on these conditions, I proposed an efficient and unbiased sampling algorithm, the Unbiased Sampling method with Dummy Edges (USDE), which strongly reduces the self-sampling probabilities of MHRW. The experimental evaluation demonstrates that the average node degree of samples from MHRW and USDSG is 2 to 4 times as high as the ground truth, while USDE provides a close approximation of the ground truth when sampling repetitions are removed. Moreover, the average sampling time per node in USDE is only half that of MHRW and USDSG. 2) A second contribution targets the shortcomings of the Independent Cascades (IC) and Linear Threshold (LT) models in characterising the retweeting process in microblogs. I achieve this by introducing a Galton-Watson with Killing (GWK) model, which accurately considers all three important factors: the ephemerality of information, the topology of the network, and the features of publishers and retweeters. We validated the applicability of the GWK model on two datasets from Sina Weibo and Twitter and showed that the GWK model can fit 82% of information receivers and 90% of the maximum numbers of hops in the real retweeting process. In addition, the GWK model is useful for revealing the endogenous and exogenous factors that affect the popularity of tweets. 3) Motivated by the correlation between the popularity and trendiness of topics in microblogs and search trends, I developed an economic analysis of the market involving a third-party ad broker, which is a popular market in current search engine marketing (SEM), and found that augmenting the adwords strategy with trending and popular topics from Twitter enables the broker to achieve, on average, four times the return on investment of a non-augmented strategy while maintaining the same level of risk.
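The GWK model is described as a branching process for retweets that accounts for ephemerality, network topology and user features. The abstract does not give its offspring distribution or its killing rule, so the following is only a hypothetical sketch of a Galton-Watson process with killing. A fixed follower count stands in for topology, a per-follower retweet probability stands in for user features, and a per-hop kill probability stands in for ephemerality. All three are illustrative assumptions, and `gwk_cascade` is a hypothetical name.

```python
import random


def gwk_cascade(followers, retweet_prob, kill_prob, max_hops=50, seed=0):
    """Simulate one retweet cascade as a Galton-Watson process with killing.

    Each active user exposes `followers` users, each of whom retweets
    independently with probability `retweet_prob`. Before each new hop the
    whole cascade is killed with probability `kill_prob` (assumed stand-in
    for the ephemerality of information).
    Returns (total receivers, number of hops reached).
    """
    rng = random.Random(seed)
    active = 1      # the original publisher
    receivers = 0
    hops = 0
    while active > 0 and hops < max_hops:
        if rng.random() < kill_prob:
            break  # the tweet has lost its value; spreading stops
        exposed = active * followers
        receivers += exposed
        # Offspring of this generation: exposed followers who retweet.
        active = sum(1 for _ in range(exposed) if rng.random() < retweet_prob)
        hops += 1
    return receivers, hops
```

The returned pair corresponds to the two quantities the abstract reports fitting: the number of information receivers and the number of hops. Averaging many runs with different seeds gives their simulated distributions under these assumed parameters.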
|
3 |
Use of Social Media and Online-based Applications in Academia: First Results of the Science 2.0 Survey 2013 of the Leibniz Research Alliance "Science 2.0"
Pscheida, Daniela, Albrecht, Steffen, Herbst, Sabrina, Minet, Claudia, Köhler, Thomas, January 2014
The Science 2.0 Survey examines the role that social media and online-based applications play for scientists at German universities. Its results show that these tools have become indispensable to academic work. The most widely used applications include the online encyclopedia Wikipedia (used professionally by 95% of respondents), online archives and databases (79%), mailing lists (76%) and content sharing or cloud services such as Dropbox or Slideshare (68%), each of which is used for professional purposes by more than two thirds of scientists.
Researchers deliberately choose the applications that are particularly efficient for their work. Practical usefulness and making everyday work easier and faster are the most frequently cited reasons for using online tools. In addition to applications developed specifically for academia, they also choose widely used general-purpose tools. To access the applications, more than half of the respondents use smartphones in addition to notebooks (90%) and PCs (76%).
While almost all applications play a role in research, science communication relies mainly on Web 2.0 applications but also on communication tools. For science administration, mailing lists and content sharing services are used most. In teaching, by contrast, Wikipedia and learning management systems dominate.
Web 2.0 services that are familiar and popular in everyday communication, such as weblogs, social networks, microblogs and social bookmarking services, are used by scientists for professional purposes only to a small extent. However, applications developed specifically for teaching and research, such as learning management systems, reference management software and virtual research environments, also fall short of their potential. Here the study identifies a need for development in order to put the World Wide Web, which was originally invented for science, even more firmly at its service.
This data report documents initial results of a nationwide online survey of a total of 778 scientists at German universities. The survey was conducted from early September to mid-October 2013 as a joint project within the Leibniz Research Alliance Science 2.0. The project was led by the Media Center of TU Dresden.
Executive Summary
1. Introduction
2. Method and study design
3. Characterisation of the data sample, weighting
3.1 Sample response rate
3.2 Weighting factors
3.3 Gender
3.4 Age groups
3.5 Academic status
3.6 Duration of employment in higher education
3.7 Subject groups
3.8 Respondents by subject group and by gender, age and academic status
3.9 Main field of activity
3.10 Conference participation, memberships
4. Use of Web 2.0 applications and online tools
4.1 General and professional use of online tools
4.2 Frequency of professional use
4.3 Context of professional use of tools
4.4 Context of use within research activities
4.5 Reasons for professional use of online tools
4.6 Reasons for professional non-use of online tools
4.7 Active and passive use of online tools
4.8 Virtual research environments
4.9 Use of devices
4.10 Information sources
5. Attitudes towards the use of Web 2.0 applications and online tools in everyday academic life
5.1 Overall attitudes
5.2 Attitudes by gender
5.3 Attitudes by age group
5.4 Attitudes by academic status
5.5 Attitudes by subject group
6. References
Cover letter
Questionnaire
|
4 |
Use of Social Media and Online-based Tools in Academia: Results of the Science 2.0-Survey 2014: Data Report 2014
Pscheida, Daniela, Minet, Claudia, Herbst, Sabrina, Albrecht, Steffen, Köhler, Thomas, January 2015
The Science 2.0-Survey investigates the dissemination and use of online tools and social media applications among scientists of all disciplines at German universities (institutions of higher education) and research institutions (Leibniz, Helmholtz and Max Planck institutes). Results show that digital, online-based tools have found widespread use and acceptance in academia and must therefore be considered a central component of scientific working processes. Furthermore, the data make it clear that certain usage patterns are beginning to emerge and stabilise as routines in everyday academic work.
The most popular tools are the online encyclopedia Wikipedia (95% of all respondents use it professionally), mailing lists (78%), online archives/databases (75%) and content sharing/cloud services such as Dropbox or Slideshare (70%). Meanwhile, social bookmarking services remain largely untapped and unknown among scientists (only 5% professional usage).
Online tools and social media applications are most commonly utilised in a research context. In addition to Wikipedia (67%), the top three tools used for research purposes are online archives/databases (63%), reference management software (49%) and content sharing/cloud services (43%). In teaching, learning management systems (32%) play a significant role, although this applies mainly to universities. Video/photo communities (25%), online archives/databases (23%) and content sharing/cloud services (21%) are also used by scientists in the context of teaching. However, there seems to be some backlog in the field of science communication. Scientists are rarely active in this area: 45 per cent of respondents say science communication is not part of their range of duties, while for another 40 per cent such activities comprise no more than 10 per cent of their daily workload. When active in science communication, scientists seem to favour classic online-based tools such as mailing lists (44%) or videoconferences/VoIP (35%), while typical Web 2.0 tools such as weblogs (10%) or microblogs (6%) are rarely used in this context. Social network sites (SNS) with a professional and/or academic orientation (30%), however, are relatively common for communication purposes in academia. The situation is similar for science administration: although online-based tools and social media applications are used more often there, no single tool is used by much more than a quarter of scientists, with personal organizers/schedule managers (27%) the most common.
The main factors cited by scientists as preventing them from using online-based tools and social media applications professionally are a lack of added value for their own work (30%), insufficient technical assistance (21%) and insufficient time to become familiar with the handling of the tools (15%). In particular, many scientists do not use microblogs (53%), discussion forums (41%) and weblogs (40%) professionally because they cannot see any added value in using them.
With regard to the attitudes of scientists in relation to the use of online tools and social media applications, results show that they are aware of privacy issues and have relatively high concerns about the spread of and access to personal data on the Internet. However, scientists generally have few reservations about dealing with social media and show themselves to be open to new technological developments.
This report documents the results of a Germany-wide online survey of a total of 2,084 scientists at German universities (1,419) and research institutions (665). The survey explores the usage of 18 online tools and social media applications for daily work in research, teaching, science administration and science communication. In addition to the frequency and context of use, the survey also documents reasons for the non-use of tools, as well as general attitudes towards the Internet and social media. The survey was conducted between 23 June 2014 and 20 July 2014 and is a joint project of the Leibniz Research Alliance "Science 2.0", led by the Technische Universität Dresden's Media Center.
Executive summary
1. Introduction
2. Methodology and research design
3. Characterisation of the data sample
Gender
Age
Type of institution
Academic position
Duration of employment in academic context
Subject group
Fields of activity
4. Use of social media and online-based tools
4.1 General use of social media and online-based tools
General usage
Devices
4.2 Use of social media and online-based tools in academic work
Professional and private usage
Frequency of professional usage
Professional usage by gender
Professional usage by age
Professional usage by subject group
Professional usage by position
4.3 Use of online-based tools and social media applications in various areas of academic activity
4.3.1 Use of online-based tools and social media applications in research
4.3.2 Use of online-based tools and social media applications in teaching
4.3.3 Use of online-based tools and social media applications in science administration
4.3.4 Use of online-based tools and social media applications in science communication
4.4 Barriers to the use of social media applications and online-based tools in everyday academic life
Reasons for professional non-use of online tools
4.5 Active and passive use of social media applications in everyday academic life
5. Attitudes to the use of social media applications and online-based tools in everyday academic life
Overall attitudes
Attitude measurement reliability analysis
Attitudes by gender
Attitudes by age
Attitudes by position
Attitudes by subject group
References
Cover letter English
Cover letter German
Questionnaire English
Questionnaire German
|
5 |
Use of Social Media and Online-based Tools in Academia
Pscheida, Daniela, Minet, Claudia, Herbst, Sabrina, Albrecht, Steffen, Köhler, Thomas, 12 January 2016
|
6 |
Use of Social Media and Online-based Applications in Academia
Pscheida, Daniela, Albrecht, Steffen, Herbst, Sabrina, Minet, Claudia, Köhler, Thomas, 20 February 2014
|